Configure the Files to Use S3

To run mappings that use S3 resources, you must configure files from the master node on the Data Integration Service machine.
Perform this task in the following situations:
  • You are integrating for the first time.
  • You upgraded from an earlier Informatica version and changed the distribution version.
Perform the following steps to configure the master node files for integration with EMR:
  1. Copy the hadoop-assembly .JAR file.
  2. Create an AWS configuration file on the Data Integration Service machine.
  3. Create and configure an environment variable.
  4. Copy and replace the hadoop-common .JAR file.
  5. Recycle the Data Integration Service.
Perform these steps for any version of EMR. Only the version-number parts of the file names change from one EMR version to another.
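Because only the version numbers change, it can help to keep the version-specific names in one place. The following is a minimal Python sketch, not an Informatica tool: the class name EmrFileSet and the installation directory /opt/informatica are hypothetical placeholders, and the file names are the EMR 5.20 names used in this section's example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class EmrFileSet:
    """Version-specific file names for one EMR integration (hypothetical helper)."""
    informatica_home: str    # placeholder for the Informatica installation directory
    distribution_dir: str    # distribution folder name, e.g. "EMR_5.16"
    emrfs_jar: str           # EMRFS hadoop-assembly .jar name
    hadoop_common_jar: str   # hadoop-common .jar name from the cluster

    @property
    def lib_dir(self) -> PurePosixPath:
        # <Informatica installation directory>/services/shared/hadoop/<distribution>/lib
        return (PurePosixPath(self.informatica_home)
                / "services" / "shared" / "hadoop" / self.distribution_dir / "lib")


# EMR 5.20 file names from the example; /opt/informatica is an assumed path.
EMR_5_20 = EmrFileSet(
    informatica_home="/opt/informatica",
    distribution_dir="EMR_5.16",
    emrfs_jar="emrfs-hadoop-assembly-2.29.0.jar",
    hadoop_common_jar="hadoop-common-2.8.5-amzn-1.jar",
)

print(EMR_5_20.lib_dir)  # prints /opt/informatica/services/shared/hadoop/EMR_5.16/lib
```

For another EMR version, you would create a second EmrFileSet with that version's file names and leave the rest of the procedure unchanged.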

Example

This example contains file names that support integration with EMR 5.20.

Copy the .jar file
To integrate with EMR 5.20, get emrfs-hadoop-assembly-2.29.0.jar from the Hadoop administrator. Copy the file to the following location on each Data Integration Service machine:
/<Informatica installation directory>/services/shared/hadoop/EMR_5.16/lib
Required when you run mappings on the Spark engine.
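A minimal Python sketch of the copy step follows. It assumes the .jar has already been obtained from the Hadoop administrator and placed on the Data Integration Service machine. The function name copy_emrfs_jar is hypothetical.

```python
import shutil
from pathlib import Path


def copy_emrfs_jar(jar_path: Path, lib_dir: Path) -> Path:
    """Copy the EMRFS hadoop-assembly .jar into the distribution lib directory.

    Returns the path of the copied file. Raises FileNotFoundError if the
    source .jar or the target directory does not exist.
    """
    if not jar_path.is_file():
        raise FileNotFoundError(f"jar not found: {jar_path}")
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"lib directory not found: {lib_dir}")
    target = lib_dir / jar_path.name
    shutil.copy2(jar_path, target)  # copy2 also preserves timestamps and permission bits
    return target
```

For example, copy_emrfs_jar(Path("emrfs-hadoop-assembly-2.29.0.jar"), Path("/opt/informatica/services/shared/hadoop/EMR_5.16/lib")), where /opt/informatica is an assumed installation directory. The function copies only to the local machine, so you would run it on each Data Integration Service machine.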

Create a file
Create an AWS configuration file named ~/.aws/config on the Data Integration Service machine. The file must specify the AWS region.
For example:
[default]
region=us-west-2
Required when you run mappings on the Spark engine.
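A minimal Python sketch that adds or updates the region in the [default] section of the file follows. The function name write_aws_config is hypothetical. Python's configparser writes the key as "region = us-west-2", with spaces around the equals sign. It keeps existing sections and keys but drops any comments in an existing file.

```python
import configparser
from pathlib import Path
from typing import Optional


def write_aws_config(region: str, path: Optional[Path] = None) -> Path:
    """Set the region of the [default] section in an AWS config file.

    Other sections and keys already in the file are kept; only the
    [default] region is added or overwritten. Defaults to ~/.aws/config.
    """
    if path is None:
        path = Path.home() / ".aws" / "config"
    config = configparser.ConfigParser()
    config.read(path)  # a file that does not exist yet is silently skipped
    if not config.has_section("default"):
        config.add_section("default")
    config.set("default", "region", region)
    path.parent.mkdir(parents=True, exist_ok=True)  # create ~/.aws if needed
    with path.open("w") as fh:
        config.write(fh)
    return path
```

For example, write_aws_config("us-west-2") run as the user that owns the Data Integration Service process would produce the file in the example above.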

Create an environment variable
Create the AWS_CONFIG_FILE environment variable on the Data Integration Service machine. Set the value to the following path:
<EMR_5.16>/conf/aws.default
Required when you run mappings on the Spark and Blaze engines.
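Setting a variable inside a script affects only that process and its children. AWS_CONFIG_FILE therefore has to be present in the environment that the Data Integration Service process starts with. The following minimal Python sketch only checks such an environment. It assumes the check should confirm that the variable is set and names an existing file. The function name check_aws_config_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_aws_config_file(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the file named by AWS_CONFIG_FILE, or raise if it is unset or missing.

    Pass `env` to check an environment other than this process's own.
    """
    env = os.environ if env is None else env
    value = env.get("AWS_CONFIG_FILE")
    if not value:
        raise KeyError("AWS_CONFIG_FILE is not set")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"AWS_CONFIG_FILE points to a missing file: {path}")
    return path
```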

Copy and replace a file
Copy hadoop-common-2.8.5-amzn-1.jar from the following location in the EMR 5.20 cluster:
/usr/lib/hadoop
Replace the file in the following location on the Data Integration Service machine:
<Informatica installation directory>/services/shared/hadoop/EMR_5.16/lib
Required when you run mappings on the Spark engine.
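A minimal Python sketch of the replace step follows. It assumes the cluster's hadoop-common .jar has already been copied from /usr/lib/hadoop to a directory on the Data Integration Service machine outside the lib directory. It also assumes that every file matching hadoop-common-*.jar in the lib directory is a file to be replaced. Old files are moved to a backup directory rather than deleted. The function name replace_hadoop_common and the backup directory are hypothetical.

```python
import shutil
from pathlib import Path


def replace_hadoop_common(new_jar: Path, lib_dir: Path, backup_dir: Path) -> Path:
    """Replace hadoop-common-*.jar files in lib_dir with new_jar.

    Existing matches are moved to backup_dir so the change can be rolled back.
    Returns the path of the new .jar inside lib_dir.
    """
    if not new_jar.is_file():
        raise FileNotFoundError(f"jar not found: {new_jar}")
    if new_jar.resolve().parent == lib_dir.resolve():
        raise ValueError("new_jar must not already be inside lib_dir")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Materialize the glob before moving files out of the directory.
    for old in list(lib_dir.glob("hadoop-common-*.jar")):
        shutil.move(str(old), str(backup_dir / old.name))
    target = lib_dir / new_jar.name
    shutil.copy2(new_jar, target)
    return target
```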

Recycle the Data Integration Service
Recycle the Data Integration Service for the changes to take effect.
