Integration Guide

10.2.2

Configure *-site.xml Files for MapR

The Hadoop administrator needs to configure *-site.xml file properties and restart impacted services before the Informatica administrator imports cluster information into the domain.

core-site.xml

Configure the following properties in the core-site.xml file:

fs.s3.enableServerSideEncryption: Enables server side encryption for S3 buckets. Required for SSE and SSE-KMS encryption.; Set to: TRUE
fs.s3a.access.key: The ID for the Blaze and Spark engines to connect to the Amazon S3 file system.; Set to your access key.
fs.s3a.secret.key: The password for the Blaze and Spark engines to connect to the Amazon S3 file system.; Set to your secret key.
fs.s3a.server-side-encryption-algorithm: The server-side encryption algorithm for S3. Required for SSE and SSE-KMS encryption.; Set to the encryption algorithm used.
hadoop.proxyuser.<proxy user>.groups: Defines the groups that the proxy user account can impersonate. On a secure cluster, the <proxy user> is the Service Principal Name that corresponds to the cluster keytab file. On a non-secure cluster, the <proxy user> is the system user that runs the Informatica daemon.; Set to the group names of impersonation users, separated by commas. If less security is preferred, use the wildcard "*" to allow impersonation from any group.
hadoop.proxyuser.<proxy user>.hosts: Defines the host machines from which the proxy user account can impersonate other users. On a secure cluster, the <proxy user> is the Service Principal Name that corresponds to the cluster keytab file. On a non-secure cluster, the <proxy user> is the system user that runs the Informatica daemon.; Set to a single host name or IP address, or set to a comma-separated list. If less security is preferred, use the wildcard "*" to allow impersonation from any host.
hadoop.proxyuser.yarn.groups: Comma-separated list of groups that you want to allow the YARN user to impersonate on a non-secure cluster.; Set to the group names of impersonation users, separated by commas. If less security is preferred, use the wildcard "*" to allow impersonation from any group.
hadoop.proxyuser.yarn.hosts: Comma-separated list of hosts from which you want to allow the YARN user to impersonate on a non-secure cluster.; Set to a single host name or IP address, or set to a comma-separated list. If less security is preferred, use the wildcard "*" to allow impersonation from any host.
io.compression.codecs: Enables compression on temporary staging tables.; Set to a comma-separated list of compression codec classes on the cluster.
hadoop.security.auth_to_local: Translates the principal names from the Active Directory and MIT realm into local names within the Hadoop cluster. Based on the Hadoop cluster used, you can set multiple rules.; Set to: RULE:[1:$1@$0](^.*@YOUR.REALM)s/^(.*)@YOUR.REALM\.COM$/$1/g; Set to: RULE:[2:$1@$0](^.*@YOUR.REALM\.$)s/^(.*)@YOUR.REALM\.COM$/$1/g
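
As a rough illustration of how a few of the properties above might look in core-site.xml, the following fragment is a sketch only. The proxy user name infa_user, the group names, the host names, the key placeholders, and the AES256 algorithm value are all assumptions chosen for the example; substitute the values that apply to your cluster.

```xml
<!-- Sketch only: merge these entries into the existing configuration element of core-site.xml. -->
<configuration>
  <!-- "infa_user" is a hypothetical proxy user; use the real system user or Service Principal Name. -->
  <property>
    <name>hadoop.proxyuser.infa_user.groups</name>
    <!-- Hypothetical group names; "*" would allow impersonation from any group. -->
    <value>etl_users,analysts</value>
  </property>
  <property>
    <name>hadoop.proxyuser.infa_user.hosts</name>
    <!-- Hypothetical host names; "*" would allow impersonation from any host. -->
    <value>dis-host1.example.com,dis-host2.example.com</value>
  </property>
  <!-- S3 access for the Blaze and Spark engines; both values are placeholders. -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <!-- Server-side encryption; AES256 is an assumed example of "the encryption algorithm used". -->
  <property>
    <name>fs.s3.enableServerSideEncryption</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>AES256</value>
  </property>
</configuration>
```

Listing explicit groups and hosts, as in this sketch, is the more restrictive choice. The wildcard is simpler to maintain but allows impersonation from anywhere.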

hbase-site.xml

Configure the following properties in the hbase-site.xml file:

zookeeper.znode.parent: Identifies HBase master and region servers.; Set to the relative path to the znode directory of HBase.
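
A minimal sketch of this entry in hbase-site.xml might look as follows. The path /hbase is an assumption used for illustration; the actual znode path depends on how HBase is configured on the cluster.

```xml
<!-- Sketch only: merge this entry into the existing configuration element of hbase-site.xml. -->
<configuration>
  <property>
    <name>zookeeper.znode.parent</name>
    <!-- Hypothetical value; set to the znode directory that your HBase installation uses. -->
    <value>/hbase</value>
  </property>
</configuration>
```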

hive-site.xml

Configure the following properties in the hive-site.xml file:

hive.cluster.delegation.token.store.class: The token store implementation. Required for HiveServer2 high availability and load balancing.; Set to: org.apache.hadoop.hive.thrift.DBTokenStore
hive.compactor.initiator.on: Runs the initiator and cleaner threads on metastore instance. Required for an Update Strategy transformation in a mapping that writes to a Hive target.; Set to: TRUE
hive.compactor.worker.threads: The number of worker threads to run in a metastore instance. Required for an Update Strategy transformation in a mapping that writes to a Hive target.; Set to: 1
hive.enforce.bucketing: Enables dynamic bucketing while loading to Hive. Required for an Update Strategy transformation in a mapping that writes to a Hive target.; Set to: TRUE
hive.exec.dynamic.partition: Enables dynamic partitioned tables for Hive tables. Applicable for Hive versions 0.9 and earlier.; Set to: TRUE
hive.exec.dynamic.partition.mode: Allows all partitions to be dynamic. Required for the Update Strategy transformation in a mapping that writes to a Hive target. Also required if you use Sqoop and define a DDL query to create or replace a partitioned Hive target at run time.; Set to: nonstrict
hive.support.concurrency: Enables table locking in Hive. Required for an Update Strategy transformation in a mapping that writes to a Hive target.; Set to: TRUE
hive.server2.support.dynamic.service.discovery: Enables HiveServer2 dynamic service discovery. Required for HiveServer2 high availability.; Set to: TRUE
hive.server2.zookeeper.namespace: The value of the ZooKeeper namespace in the JDBC connection string. Required for HiveServer2 high availability.; Set to the zooKeeperNamespace value that appears in the connection string: jdbc:hive2://<zookeeper_ensemble>/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
hive.txn.manager: Turns on transaction support. Required for an Update Strategy transformation in a mapping that writes to a Hive target.; Set to: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.zookeeper.quorum: Comma-separated list of ZooKeeper server host:ports in a cluster. The value of the ZooKeeper ensemble in the JDBC connection string. Required for HiveServer2 high availability.; Set to the <zookeeper_ensemble> value that appears in the connection string: jdbc:hive2://<zookeeper_ensemble>/default;serviceDiscoveryMode=zooKeeper;
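
The properties that the list above marks as required for an Update Strategy transformation writing to a Hive target could be grouped in hive-site.xml roughly as follows. This is a sketch that uses only the values stated above. It is not a complete hive-site.xml, and the lowercase true is an assumed spelling of TRUE.

```xml
<!-- Sketch only: merge these entries into the existing configuration element of hive-site.xml. -->
<configuration>
  <!-- Transaction and locking support. -->
  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <!-- Compaction: initiator and cleaner threads, plus one worker thread on the metastore instance. -->
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
  </property>
  <!-- Bucketing and dynamic partitioning for loads to Hive. -->
  <property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
</configuration>
```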

mapred-site.xml

Configure the following properties in the mapred-site.xml file:

mapreduce.framework.name: The run-time framework to run MapReduce jobs. Values can be local, classic, or yarn. Required for Sqoop.; Set to: yarn
mapreduce.jobhistory.address: Location of the MapReduce JobHistory Server. The default port is 10020. Required for Sqoop.; Set to: <MapReduce JobHistory Server>:<port>
yarn.app.mapreduce.am.staging-dir: The HDFS staging directory used while submitting jobs.; Set to the staging directory path.
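
As an illustration, the three entries above might appear in mapred-site.xml as shown in this sketch. The host name jobhistory.example.com and the staging path /user/staging are assumptions made for the example; port 10020 is the default noted above.

```xml
<!-- Sketch only: merge these entries into the existing configuration element of mapred-site.xml. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- Hypothetical host name with the default port 10020. -->
    <value>jobhistory.example.com:10020</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <!-- Hypothetical staging directory path. -->
    <value>/user/staging</value>
  </property>
</configuration>
```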

yarn-site.xml

Configure the following properties in the yarn-site.xml file:

yarn.application.classpath: Required for dynamic resource allocation.; Add spark_shuffle.jar to the class path. The .jar file must contain the class "org.apache.spark.network.yarn.YarnShuffleService".
yarn.nodemanager.resource.memory-mb: The maximum RAM available for each container. Set the maximum memory on the cluster to increase resource memory available to the Blaze engine.; Set to 16 GB if the value is less than 16 GB.
yarn.nodemanager.resource.cpu-vcores: The number of virtual cores for each container. Required for Blaze engine resource allocation.; Set to 10 if the value is less than 10.
yarn.scheduler.minimum-allocation-mb: The minimum RAM available for each container. Required for Blaze engine resource allocation.; Set to 6 GB if the value is less than 6 GB.
yarn.nodemanager.vmem-check-enabled: Disables virtual memory limits for containers. Required for the Blaze and Spark engines.; Set to: FALSE
yarn.nodemanager.aux-services: Required for dynamic resource allocation for the Spark engine.; Add an entry for "spark_shuffle".
yarn.nodemanager.aux-services.spark_shuffle.class: Required for dynamic resource allocation for the Spark engine.; Set to: org.apache.spark.network.yarn.YarnShuffleService
yarn.resourcemanager.scheduler.class: Defines the YARN scheduler that the Data Integration Service uses to assign resources.; Set to: org.apache.hadoop.yarn.server.resourcemanager.scheduler
yarn.node-labels.enabled: Enables node labeling.; Set to: TRUE
yarn.node-labels.fs-store.root-dir: The HDFS location used to update node labels dynamically.; Set to: <hdfs://[Node name]:[Port]/[Path to store]/[Node labels]/>
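
The resource and shuffle-service settings above might be expressed in yarn-site.xml along the lines of the following sketch. The two memory properties take values in megabytes, so 16 GB and 6 GB are written as 16384 and 6144. The existing mapreduce_shuffle entry in the aux-services list is an assumption; keep whatever entries your cluster already defines and append spark_shuffle to them.

```xml
<!-- Sketch only: merge these entries into the existing configuration element of yarn-site.xml. -->
<configuration>
  <!-- Blaze engine resource allocation; raise these only if the current values are lower. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>10</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>6144</value>
  </property>
  <!-- Disable virtual memory limits for containers (Blaze and Spark engines). -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- Spark dynamic resource allocation; "mapreduce_shuffle" stands in for entries already present. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <!-- Node labeling. -->
  <property>
    <name>yarn.node-labels.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

As noted at the start of this topic, the impacted services need a restart before the Informatica administrator imports cluster information into the domain.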
