fs.s3.awsAccessKeyID
The access key ID for the run-time engine to connect to the Amazon S3 file system. Required for the Blaze engine, and for the Spark engine if the S3 policy does not allow EMR access.
If the Data Integration Service is deployed on an EC2 instance and the IAM roles and policies allow access to S3 and other resources, this property is not required. If the Data Integration Service is deployed on-premises, you can configure the value for this property in the cluster configuration on the Data Integration Service after you import the cluster configuration. Configuring the access key ID in the cluster configuration is more secure than configuring it in core-site.xml on the cluster.
Set to your access key ID.
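For example, where YOUR_ACCESS_KEY_ID is a placeholder for your own key ID:
<property>
    <name>fs.s3.awsAccessKeyID</name>
    <value>YOUR_ACCESS_KEY_ID</value>
</property>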
fs.s3.awsSecretAccessKey
The secret access key for the Blaze and Spark engines to connect to the Amazon S3 file system. Required for the Blaze engine, and for the Spark engine if the S3 policy does not allow EMR access.
If the Data Integration Service is deployed on an EC2 instance and the IAM roles and policies allow access to S3 and other resources, this property is not required. If the Data Integration Service is deployed on-premises, you can configure the value for this property in the cluster configuration on the Data Integration Service after you import the cluster configuration. Configuring the secret access key in the cluster configuration is more secure than configuring it in core-site.xml on the cluster.
Set to your secret access key.
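For example, where YOUR_SECRET_ACCESS_KEY is a placeholder for your own key:
<property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
</property>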
fs.s3.enableServerSideEncryption
Enables server-side encryption for Hive buckets. Required if the S3 bucket is encrypted. Required for EMR 5.14 integration if the S3 bucket is encrypted with SSE-KMS.
Set to: TRUE
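For example:
<property>
    <name>fs.s3.enableServerSideEncryption</name>
    <value>TRUE</value>
</property>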
fs.s3a.server-side-encryption-algorithm
The server-side encryption algorithm for S3. Required if the S3 bucket is encrypted with server-side encryption. Required for EMR 5.14 integration if the S3 bucket is encrypted with SSE-KMS.
Set to the encryption algorithm used.
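For example, if the bucket is encrypted with SSE-KMS:
<property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>SSE-KMS</value>
</property>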
fs.s3a.endpoint
URL of the entry point for the web service. Required for EMR 5.14 integration if the S3 bucket is encrypted with SSE-KMS.
For example:
<property>
    <name>fs.s3a.endpoint</name>
    <value>s3-us-west-1.amazonaws.com</value>
</property>
fs.s3a.bucket.BUCKET_NAME.server-side-encryption.key
Server-side encryption key for the S3 bucket. Required for EMR 5.14 integration if the S3 bucket is encrypted with SSE-KMS.
For example:
<property>
    <name>fs.s3a.bucket.BUCKET_NAME.server-side-encryption.key</name>
    <value>arn:aws:kms:us-west-1*******</value>
</property>
where BUCKET_NAME is the name of the S3 bucket.
hadoop.proxyuser.<proxy user>.groups
Defines the groups that the proxy user account can impersonate. On a secure cluster the <proxy user> is the Service Principal Name that corresponds to the cluster keytab file. On a non-secure cluster, the <proxy user> is the system user that runs the Informatica daemon.
Set to the group names of impersonation users, separated by commas. If less security is preferred, use the wildcard "*" to allow impersonation from any group.
hadoop.proxyuser.<proxy user>.hosts
Defines the host machines that a user account can impersonate. On a secure cluster the <proxy user> is the Service Principal Name that corresponds to the cluster keytab file. On a non-secure cluster, the <proxy user> is the system user that runs the Informatica daemon.
Set to a single host name or IP address, or set to a comma-separated list. If less security is preferred, use the wildcard "*" to allow impersonation from any host.
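For example, the following entries allow a hypothetical proxy user named infa to impersonate users in two groups from two hosts; substitute your own proxy user, group, and host names:
<property>
    <name>hadoop.proxyuser.infa.groups</name>
    <value>hadoopusers,hivegroup</value>
</property>
<property>
    <name>hadoop.proxyuser.infa.hosts</name>
    <value>host1.example.com,host2.example.com</value>
</property>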
hadoop.proxyuser.yarn.groups
Comma-separated list of groups that you want to allow the YARN user to impersonate on a non-secure cluster.
Set to the group names of impersonation users, separated by commas. If less security is preferred, use the wildcard "*" to allow impersonation from any group.
hadoop.proxyuser.yarn.hosts
Comma-separated list of hosts that you want to allow the YARN user to impersonate on a non-secure cluster.
Set to a single host name or IP address, or set to a comma-separated list. If less security is preferred, use the wildcard "*" to allow impersonation from any host.
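For example, to allow the YARN user to impersonate any group from any host on a non-secure cluster:
<property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
</property>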
hadoop.security.auth_to_local
Translates principal names from the Active Directory and MIT realms into local names within the Hadoop cluster. Depending on the Hadoop cluster, you can set multiple rules.
Set to: RULE:[1:$1@$0](^.*@YOUR\.REALM\.COM$)s/^(.*)@YOUR\.REALM\.COM$/$1/g
Set to: RULE:[2:$1@$0](^.*@YOUR\.REALM\.COM$)s/^(.*)@YOUR\.REALM\.COM$/$1/g
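For example, both rules and the standard Hadoop DEFAULT fallback can be combined in a single value, where YOUR.REALM.COM is a placeholder for your Kerberos realm:
<property>
    <name>hadoop.security.auth_to_local</name>
    <value>
    RULE:[1:$1@$0](^.*@YOUR\.REALM\.COM$)s/^(.*)@YOUR\.REALM\.COM$/$1/g
    RULE:[2:$1@$0](^.*@YOUR\.REALM\.COM$)s/^(.*)@YOUR\.REALM\.COM$/$1/g
    DEFAULT
    </value>
</property>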
io.compression.codecs
Enables compression on temporary staging tables.
Set to a comma-separated list of compression codec classes on the cluster.
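For example, a typical list of codec classes that ship with Hadoop; the exact list depends on the codecs installed on your cluster:
<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
</property>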