Deploy Enterprise Data Preparation on the AWS Cloud Marketplace (10.5)

Informatica Domain

The Informatica domain is a server component that hosts application services, such as the Model Repository Service and the Data Integration Service. These services, together with domain clients, enable you to create and run mappings and other objects to extract, transform, and write data.

Application Services

Enterprise Data Preparation Service
The Enterprise Data Preparation Service is an application service in the Informatica domain that runs the Enterprise Data Preparation application.
Interactive Data Preparation Service
The Interactive Data Preparation Service is an application service in the Informatica domain that manages data preparation within the Enterprise Data Preparation application.
Catalog Service
The Catalog Service is an application service in the Informatica domain that runs the Enterprise Data Catalog application and manages connections between service components and external applications.
Model Repository Service
The Model Repository Service is an application service in the Informatica domain that manages the Model repository. The Model repository stores metadata created by Informatica products in a relational database to enable collaboration among the products. Informatica Developer, the Data Integration Service, and the Administrator tool store metadata in the Model repository.
Data Integration Service
The Data Integration Service is an application service in the Informatica domain that performs data integration tasks for the Developer tool and for external clients.
Metadata Access Service
The Metadata Access Service is an application service that allows the Developer tool to access Hadoop connection information to import and preview metadata. If the Hadoop cluster uses Kerberos authentication, the Metadata Access Service also holds the Service Principal Name (SPN) and keytab information.
Content Management Service
The Content Management Service is an application service in the Informatica domain that manages reference data and is responsible for compiling rule specifications into mapplets. The Content Management Service provides reference data information to the Data Integration Service and to the Developer tool and Analyst tool. The Content Management Service stores reference data in a database that you specify.
Analyst Service
The Analyst Service is an application service in the Informatica domain that runs Informatica Analyst. The Analyst Service manages the connection between the service components and the users who log in to the Analyst tool. In the Analyst tool, you can perform column and rule profiling, manage scorecards, and manage bad records and duplicate records. The Analyst Service stores profiling, scorecarding, and bad and duplicate record data in databases that you specify.
Informatica Cluster Service
The Informatica Cluster Service is an application service that runs and manages the associated services that are required to run Enterprise Data Catalog in the Informatica domain. The associated services include MongoDB, Nomad, Solr, PostgreSQL, and ZooKeeper.
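As a quick reference, the service descriptions above can be captured in a small lookup table. The following Python sketch is purely illustrative: `APPLICATION_SERVICES` and `services_mentioning` are hypothetical names, not part of any Informatica API, and the role strings only paraphrase the descriptions in this section.

```python
from typing import List

# Illustrative inventory of the application services described above.
# These names are hypothetical; this is not an Informatica API.
APPLICATION_SERVICES = {
    "Enterprise Data Preparation Service":
        "runs the Enterprise Data Preparation application",
    "Interactive Data Preparation Service":
        "manages data preparation within Enterprise Data Preparation",
    "Catalog Service":
        "runs the Enterprise Data Catalog application",
    "Model Repository Service":
        "manages the Model repository",
    "Data Integration Service":
        "performs data integration tasks for the Developer tool and external clients",
    "Metadata Access Service":
        "lets the Developer tool access Hadoop connection information",
    "Content Management Service":
        "manages reference data and compiles rule specifications into mapplets",
    "Analyst Service":
        "runs Informatica Analyst",
    "Informatica Cluster Service":
        "runs and manages the associated services required to run Enterprise Data Catalog",
}


def services_mentioning(keyword: str) -> List[str]:
    """Return the names of services whose role description contains the keyword."""
    needle = keyword.lower()
    return sorted(
        name for name, role in APPLICATION_SERVICES.items()
        if needle in role.lower()
    )
```

For example, `services_mentioning("reference data")` returns only the Content Management Service, while `services_mentioning("Enterprise Data Catalog")` returns both the Catalog Service and the Informatica Cluster Service.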
The Informatica domain can run several other services. For more information about Informatica services, see the

Repositories

Informatica repositories, hosted on Oracle or Microsoft SQL Server databases, store metadata about domain objects. Informatica repositories include the following:
Domain configuration repository
The domain configuration repository stores configuration metadata about the Informatica domain. It also stores user privileges and permissions.
Model repository
The Model repository stores metadata for projects and folders and their contents, including all repository objects such as mappings and workflows. The repository also stores rules you apply during data preparation.
Data Preparation repository
The Data Preparation repository stores worksheet metadata created when you use the Enterprise Data Preparation application to prepare data.
In addition to these domain repositories, the solution also requires a repository for Hive metadata. This repository is hosted on an SQL database. It stores Hive table metadata to enable Hadoop operations.
For more information about domain repositories, see the

Clusters

The deployment uses the following clusters of Amazon EC2 nodes:
Informatica Cluster
A cluster deployed on Amazon EC2 instances that the Catalog Service uses to run metadata processing and profiling jobs.
Informatica compute cluster on EMR
A compute cluster of Amazon EC2 instances with autoscaling enabled that Enterprise Data Preparation uses to publish prepared data to the data lake.
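The division of work between the two clusters can be summarized in a short data model. This is a minimal sketch for orientation only: the `Cluster` class, the `CLUSTERS` tuple, and the `cluster_for` helper are hypothetical names introduced here, and the field values restate only what this section says.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    name: str
    used_by: str      # component that submits work to the cluster
    purpose: str      # what the cluster is used for
    # None means this section does not state whether autoscaling is enabled.
    autoscaling: Optional[bool] = None


# Hypothetical summary of the clusters described above; not an Informatica API.
CLUSTERS = (
    Cluster(
        name="Informatica cluster",
        used_by="Catalog Service",
        purpose="run metadata processing and profiling jobs",
    ),
    Cluster(
        name="Informatica compute cluster on EMR",
        used_by="Enterprise Data Preparation",
        purpose="publish prepared data to the data lake",
        autoscaling=True,
    ),
)


def cluster_for(consumer: str) -> Cluster:
    """Return the cluster used by the given component, or raise KeyError."""
    for cluster in CLUSTERS:
        if cluster.used_by == consumer:
            return cluster
    raise KeyError(consumer)
```

Calling `cluster_for("Catalog Service")` returns the Informatica cluster entry, and `cluster_for("Enterprise Data Preparation")` returns the EMR compute cluster with `autoscaling` set to `True`.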