Big Data Streaming User Guide

Prerequisites to Read From or Write to a Kerberised Kafka Cluster

To read from or write to a Kerberised Kafka cluster, configure the default realm, KDC, Hadoop connection properties, and Kafka data object read or write data operation properties.
Before you read from or write to a Kerberised Kafka cluster, perform the following tasks:
  1. Ensure that you have the krb5.conf file for the Kerberised Kafka server.
  2. Configure the default realm and KDC. If the default /etc/krb5.conf file is not configured or you want to change the configuration, add the following lines to the /etc/krb5.conf file:
    [libdefaults]
    default_realm = <REALM NAME>
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true

    [realms]
    <REALM NAME> = {
      kdc = <Location where KDC is installed>
      admin_server = <Location where KDC is installed>
    }

    [domain_realm]
    .<domain name or hostname> = <KERBEROS DOMAIN NAME>
    <domain name or hostname> = <KERBEROS DOMAIN NAME>
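The krb5.conf lines in step 2 can also be produced from a small script when the same values have to be applied on several hosts. The following Python sketch is only an illustration: the function name `render_krb5_conf` and the sample realm, KDC host, and domain values are hypothetical and are not part of the product.

```python
def render_krb5_conf(realm: str, kdc_host: str, domain: str, kerberos_domain: str) -> str:
    """Return krb5.conf text with the three sections listed in step 2.

    All argument values are supplied by the caller; nothing is looked up.
    """
    lines = [
        "[libdefaults]",
        f"default_realm = {realm}",
        "dns_lookup_realm = false",
        "dns_lookup_kdc = false",
        "ticket_lifetime = 24h",
        "renew_lifetime = 7d",
        "forwardable = true",
        "",
        "[realms]",
        f"{realm} = {{",
        f"  kdc = {kdc_host}",
        f"  admin_server = {kdc_host}",
        "}",
        "",
        "[domain_realm]",
        f".{domain} = {kerberos_domain}",
        f"{domain} = {kerberos_domain}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(render_krb5_conf("EXAMPLE.COM", "kdc.example.com", "example.com", "EXAMPLE.COM"))
```

Review the generated text before you copy it into /etc/krb5.conf, because the sketch does not validate the realm or host names.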
  3. To pass a static JAAS configuration file into the JVM using the java.security.auth.login.config property at runtime, perform the following tasks:
    1. Ensure that you have a JAAS configuration file.
      For information about creating a JAAS configuration and configuring a keytab for Kafka clients, see the Apache Kafka documentation at https://kafka.apache.org/0101/documentation/#security
      For example, the JAAS configuration file can contain the following lines of configuration:
      //Kafka Client Authentication. Used for client to kafka broker connection
      KafkaClient {
        com.sun.security.auth.module.Krb5LoginModule required
        doNotPrompt=true
        useKeyTab=true
        storeKey=true
        keyTab="<path to keytab file>/<keytab file name>"
        principal="<principal name>"
        client=true;
      };
    2. Place the JAAS config file and keytab file in the same location on all the nodes of the Hadoop cluster.
      Informatica recommends that you place the files in a location that is accessible to all the nodes in the cluster. Example: /etc or /temp
    3. On the Spark Engine tab of the Hadoop connection properties, update the extraJavaOptions property of the executor and the driver in the Advanced Properties property. Click Edit and update the properties in the following format:
      infaspark.executor.extraJavaOptions=-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M
      -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf -Djava.security.auth.login.config=/<path to JAAS config>/<kafka_client_jaas>.config
      infaspark.driver.cluster.mode.extraJavaOptions=-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M
      -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf -Djava.security.auth.login.config=/<path to JAAS config>/<kafka_client_jaas>.config
    4. Configure the following properties in the data object read or write operation:
      • Data object read operation. Configure the Consumer Configuration Properties property in the advanced properties.
      • Data object write operation. Configure the Producer Configuration Properties property in the advanced properties.
      Specify the following value:
      security.protocol=SASL_PLAINTEXT,sasl.kerberos.service.name=kafka,sasl.mechanism=GSSAPI
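The extraJavaOptions values in step 3 differ only in the file paths, so they can be assembled from those two paths. The following Python sketch shows one way to do that. The function name `static_jaas_java_options` and the sample paths are hypothetical; the JVM flags and property names are the ones shown in step 3.

```python
# JVM flags that appear at the start of both property values in step 3.
BASE_OPTS = "-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M"


def static_jaas_java_options(krb5_conf: str, jaas_conf: str) -> dict:
    """Build the executor and driver extraJavaOptions for a static JAAS file."""
    opts = " ".join([
        BASE_OPTS,
        "-Djavax.security.auth.useSubjectCredsOnly=true",
        f"-Djava.security.krb5.conf={krb5_conf}",
        f"-Djava.security.auth.login.config={jaas_conf}",
    ])
    # The executor and the driver receive the same option string.
    return {
        "infaspark.executor.extraJavaOptions": opts,
        "infaspark.driver.cluster.mode.extraJavaOptions": opts,
    }


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    props = static_jaas_java_options("/etc/krb5.conf", "/etc/kafka_client_jaas.config")
    for name, value in props.items():
        print(f"{name}={value}")
```

Each printed line is a single name=value pair that can be pasted into the Advanced Properties editor.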
  4. To embed the JAAS configuration in the sasl.jaas.config configuration property, perform the following tasks:
    1. On the Spark Engine tab of the Hadoop connection properties, update the extraJavaOptions property of the executor and the driver in the Advanced Properties property. Click Edit and update the properties in the following format:
      infaspark.executor.extraJavaOptions = -Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500
      -Djava.security.krb5.conf=<path to krb5.conf file>
      infaspark.driver.cluster.mode.extraJavaOptions = -Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500
      -Djava.security.krb5.conf=<path to krb5.conf file>
    2. Configure the following properties in the data object read or write operation:
      • Data object read operation. Configure the Consumer Configuration Properties property in the advanced properties.
      • Data object write operation. Configure the Producer Configuration Properties property in the advanced properties.
      Specify the following value:
      security.protocol=SASL_PLAINTEXT,sasl.kerberos.service.name=kafka,sasl.mechanism=GSSAPI,
      sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true doNotPrompt=true serviceName="<service_name>" keyTab="<location of keytab file>" client=true principal="<principal_name>";
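The embedded value in step 4 is a single comma-separated string, and the quoting inside sasl.jaas.config is easy to get wrong by hand. The following Python sketch assembles it from a keytab path and a principal. The function name `embedded_jaas_properties` and the sample values are hypothetical; the property names and login module options are the ones shown in step 4.

```python
def embedded_jaas_properties(keytab: str, principal: str, service_name: str = "kafka") -> str:
    """Return the comma-separated value for the consumer or producer
    configuration properties, with the JAAS entry embedded inline."""
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true storeKey=true doNotPrompt=true "
        f'serviceName="{service_name}" keyTab="{keytab}" '
        f'client=true principal="{principal}";'
    )
    pairs = [
        ("security.protocol", "SASL_PLAINTEXT"),
        # Step 4 shows this property with the literal value kafka.
        ("sasl.kerberos.service.name", "kafka"),
        ("sasl.mechanism", "GSSAPI"),
        ("sasl.jaas.config", jaas),
    ]
    return ",".join(f"{key}={value}" for key, value in pairs)


if __name__ == "__main__":
    # Hypothetical keytab path and principal, for illustration only.
    print(embedded_jaas_properties("/etc/kafka_client.keytab", "kafkaclient@EXAMPLE.COM"))
```

The sketch does not check that the keytab exists or that the principal matches it, so verify both before you use the generated value.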
The following image shows the Advanced Properties property in the Hadoop connection: