Common Content for Data Engineering | 10.2.1 Service Pack 2

This content is shared by multiple products within the Data Engineering product family. Visit the product pages to see content that is specific to each product.

Big Data Release Notes

This document contains important information about Emergency Bug Fixes in Informatica 10.2.1 Service Pack 2.

Introduction to the Python Transformation

Watch an introduction to the Python transformation and learn how to use it to solve a classic Titanic use case.

Installing Python for the Python Transformation on Hadoop

Learn how to install Python on a Data Integration Service machine so that the Spark engine can run Python transformations for Data Engineering products.

Configuring a Databricks Cluster to Run Python Transformations

Learn how to configure a Databricks cluster to run Python transformations.

Running the Python Transformation on an Azure Databricks Cluster

Learn how to run a Python transformation on an Azure Databricks cluster.

Configuring Git Version Control for Model Repository Service in 10.2 HotFix 1

You can integrate a Model repository with a Perforce, Subversion, or Git version control system. This article discusses how to integrate a Git system with a Model Repository Service in 10.2 HotFix 1.

Creating a Parameter File for Informatica Developer

A parameter file is an .xml file that lists user-defined parameters and their assigned values. Parameter files provide the flexibility to change parameter values each time that you run a mapping or a workflow. Generate a parameter file based on a mapping or workflow using the Developer tool or the command line. Edit the contents of the file …

Creating a REST Web Service in the Informatica Developer Tool

You can create an Informatica REST web service that returns data to a web service client in JSON or XML format. This article explains how to define a REST web service in the Developer tool. The REST web service runs a mapping that returns hierarchical data in JSON format to a web service client browser.

Enabling SAML Authentication with NetScaler for Web Applications

You can enable users to log in to Informatica web applications using single sign-on. This article explains how to configure single sign-on in an Informatica domain using Security Assertion Markup Language (SAML) v2.0 and the Citrix NetScaler 13.0 identity provider.

How to Create Ephemeral Clusters with Data Engineering 10.4.1

You can use a workflow to automate the creation of a cluster on supported cloud platforms. The workflow creates a cluster and runs mappings and other workflow tasks. When you include a Delete Cluster task, so that the cluster terminates when workflow tasks are complete, the cluster is known as an ephemeral cluster.

How to Migrate Microsoft SQL Server Connections from the OLE DB to the ODBC Provider Type

When you create a Microsoft SQL Server connection, you can use the OLE DB or ODBC provider types. If required, you can migrate the OLE DB provider type to the ODBC provider type. This article explains how to migrate Microsoft SQL Server connections from the OLE DB provider type to the ODBC provider type.

Using Operating System Profiles in a Hadoop Environment

An operating system profile is a type of security that the Data Integration Service uses to run mappings. You can define an operating system profile as a user to run mappings. Use operating system profiles to increase security and to isolate the run-time environment for users.

Using the SQL Transformation in an Informatica Developer Mapping

You can run SQL queries against a relational database midstream in a mapping. This article describes how to configure an SQL transformation in a logical data object mapping in the Developer tool.

Integrate

Implementing Big Data Management 10.2 with Ephemeral Clusters in a MS Azure Cloud Environment

You can take advantage of cloud computing efficiencies and power by deploying the Informatica Big Data Management solution in the Microsoft Azure environment. You can use a hybrid solution to offload or extend on-premises applications to the cloud. You can also use a lift-and-shift strategy to move an existing on-premises big data solution …

Install Python for the Python Transformation on Hadoop

Follow the steps to install Python on each Data Integration Service machine so that the Spark engine can run the Python transformation. This article uses Python 3.6.5 on Cloudera CDH 6.1, but you can follow similar steps for other Python and Hadoop distributions such as Amazon EMR.

Optimize and Tune

Improving the Performance of the Model Repository

The Model repository is a relational database that contains metadata about connections, applications and workflows, transformations and functions, and other objects. You can perform several tasks to improve the performance of the Model repository and its interactions with other Informatica services, and with databases and external clients.

Informatica Developer Tool Naming Conventions

This article provides standardized naming conventions for repository objects. Naming conventions improve readability for anyone reviewing or carrying out maintenance on repository objects. The application and enforcement of naming standards establishes consistency in the repository and creates a developer-friendly environment. In addition, …

Optimizing the Data Integration Service to Process Concurrent Web Services

The Data Integration Service runs concurrent web service requests according to the properties that you configure on the Data Integration Service and the application properties that you configure for each web service object. When you optimize the properties that affect web service concurrency, you can improve performance.

Tuning the Hardware and Hadoop Cluster for Informatica Big Data Products

You can tune the hardware and the Hadoop cluster for better performance of Informatica big data products. This article provides tuning recommendations for Hadoop administrators and system administrators who set up the Hadoop cluster and hardware for Informatica big data products.

Administration

High Availability and Disaster Recovery in a Data Engineering Domain

Learn how the Informatica domain and application services in Data Engineering meet disaster recovery and high availability requirements.

Informatica Architecture: Nodes and Domains

The Informatica domain consists of one or more servers, one or more installations of the Informatica software, and at least one relational database. This article discusses how nodes work with the database, how nodes communicate with each other, what happens when a node fails, and basic troubleshooting for your domain.

Security

Enabling SAML Authentication in an Informatica 10.2.x Domain

You can enable users to log in to Informatica web applications using single sign-on. This article explains how to configure single sign-on in an Informatica 10.2.x domain using Security Assertion Markup Language (SAML) and Microsoft Active Directory Federation Services (AD FS).

Install and Upgrade

10.5 Upgrade Paths

How to Migrate Mappings from the Hive Engine

Effective in version 10.2.2, Informatica dropped support for the Hive engine. You can run mappings on the Blaze and Spark engines in the Hadoop environment or on the Databricks Spark engine in the Databricks environment. This article explains how to change the validation and run-time environments for mappings, and it describes processing …

Additional Articles

Configuring SAML-based Single Sign-on for Informatica 10.1.1 Web Applications

You can enable users to log in to the Administrator tool, the Analyst tool, and the Monitoring tool using single sign-on. This article explains how to configure single sign-on in an Informatica domain using Security Assertion Markup Language (SAML) and Microsoft Active Directory Federation Services (AD FS).

Tuning the Performance of the Monitoring Model Repository

The monitoring Model repository is a relational database instance. The monitoring Model Repository Service monitors the Data Integration Service jobs, and stores the statistics in the monitoring Model repository. This article discusses the methods that you can use to improve monitoring Model repository performance.
