Integration Guide


Configure the Files on S3 on the Spark Engine

To run mappings on S3 on the Spark engine, you must copy files from the cluster master node to the Data Integration Service machine and configure them there.

Perform this task in the following situations:
You are integrating for the first time.
You upgraded from any Informatica version and changed the distribution version.

You can perform the following steps to configure the files to integrate with EMR 5.16:

Copy the .jar file: To integrate with EMR 5.16, get emrfs-hadoop-assembly-2.25.0.jar from the Hadoop administrator. Copy the file to the following location on each Data Integration Service machine:
/<Informatica installation directory>/services/shared/hadoop/EMR_<version number>/lib
If you upgraded from EMR 5.10 to EMR 5.14, the part of the file path that includes EMR_<version number> remains EMR_5.10.

Create a file: Create a ~/.aws/config file on the Data Integration Service machine. The file must contain the AWS region. For example:
[default]
region=us-west-2
Create an environment variable: Create the AWS_CONFIG_FILE environment variable on the Data Integration Service machine. Set the value to <EMR_5.10>/conf/aws.default
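
The copy and file-creation steps above could be scripted. The following is a minimal Python sketch, not an Informatica utility. The function names install_jar and write_aws_config are hypothetical, and the caller supplies every path, so nothing about your installation layout is assumed.

```python
import configparser
import shutil
from pathlib import Path


def install_jar(jar_path, lib_dir):
    """Copy a .jar into an existing lib directory, replacing a same-named file."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        # Fail early instead of creating a directory in the wrong place.
        raise FileNotFoundError(f"lib directory not found: {lib_dir}")
    # shutil.copy2 returns the path of the copied file inside lib_dir.
    return Path(shutil.copy2(jar_path, lib_dir))


def write_aws_config(config_path, region):
    """Write an AWS config file whose [default] section holds the region."""
    config = configparser.ConfigParser()
    config["default"] = {"region": region}
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with config_path.open("w") as handle:
        # Produces "region=us-west-2" rather than "region = us-west-2".
        config.write(handle, space_around_delimiters=False)
    return config_path
```

For example, you might call install_jar with the emrfs-hadoop-assembly .jar file that the Hadoop administrator provides and the lib directory for your distribution, and then call write_aws_config with Path.home() / ".aws" / "config" and "us-west-2". Run the script on each Data Integration Service machine.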

You can perform the following steps to configure the files to integrate with EMR 5.20:

Copy the .jar file: To integrate with EMR 5.20, get emrfs-hadoop-assembly-2.29.0.jar from the Hadoop administrator. Copy the file to the following location on each Data Integration Service machine:
/<Informatica installation directory>/services/shared/hadoop/EMR_5.16/lib
Create a file: Create a ~/.aws/config file on the Data Integration Service machine. The file must contain the AWS region. For example:
[default]
region=us-west-2
Create an environment variable: Create the AWS_CONFIG_FILE environment variable on the Data Integration Service machine. Set the value to <EMR_5.16>/conf/aws.default
Copy and replace a file: Copy hadoop-common-2.8.5-amzn-1.jar from the following location in the EMR 5.20 cluster:
/usr/lib/hadoop
Replace the file in the following location:
<Informatica installation directory>/services/shared/hadoop/EMR_5.16/lib
Recycle the Data Integration Service: You must recycle the Data Integration Service for the changes to take effect.
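
Before you recycle the service, you might confirm that the earlier steps are in place. The following Python sketch is one way to do that, under stated assumptions: check_s3_setup is a hypothetical helper, not part of Informatica or AWS tooling, and it only checks that the named .jar files exist and that AWS_CONFIG_FILE points to a file with a region under [default].

```python
import configparser
import os
from pathlib import Path


def check_s3_setup(lib_dir, required_jars, environ=os.environ):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    lib_dir = Path(lib_dir)

    # Each required .jar must be present in the distribution lib directory.
    for jar in required_jars:
        if not (lib_dir / jar).is_file():
            problems.append(f"missing jar: {lib_dir / jar}")

    # AWS_CONFIG_FILE must be set and must point to a readable config file.
    config_file = environ.get("AWS_CONFIG_FILE")
    if not config_file:
        problems.append("AWS_CONFIG_FILE is not set")
    elif not Path(config_file).is_file():
        problems.append(f"AWS_CONFIG_FILE points to a missing file: {config_file}")
    else:
        parser = configparser.ConfigParser()
        parser.read(config_file)
        if not parser.has_option("default", "region"):
            problems.append(f"no region under [default] in {config_file}")
    return problems
```

For EMR 5.20, required_jars would list emrfs-hadoop-assembly-2.29.0.jar and hadoop-common-2.8.5-amzn-1.jar. The check does not verify that the replaced hadoop-common file is the copy from the cluster, so treat an empty result as a basic sanity check only.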