Blaze Configuration

The following table describes the connection properties that you configure for the Blaze engine:
Property
Description
Blaze Staging Directory
The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory.
Default is /blaze/workdir. If you clear this property, the staging files are written to the Hadoop staging directory /tmp/blaze_<user name>.
Blaze User Name
The owner of the Blaze service and Blaze service logs.
When the Hadoop cluster uses Kerberos authentication, the default user is the Data Integration Service SPN user. When the Hadoop cluster does not use Kerberos authentication and the Blaze user is not configured, the default user is the Data Integration Service user.
Minimum Port
The minimum value for the port number range for the Blaze engine. Default is 12300.
Maximum Port
The maximum value for the port number range for the Blaze engine. Default is 12600.
YARN Queue Name
The name of the YARN scheduler queue that the Blaze engine uses. The queue specifies the resources available to the Blaze engine on the cluster.
If YARN preemption is enabled on the cluster, verify with the Hadoop administrator that preemption is disabled on the queue associated with the Blaze engine.
Blaze Job Monitor Address
The host name and port number for the Blaze Job Monitor.
Use the following format:
<hostname>:<port>
Where
  • <hostname> is the host name or IP address of the Blaze Job Monitor server.
  • <port> is the port on which the Blaze Job Monitor listens for remote procedure calls (RPC).
For example, enter:
myhostname:9080
Blaze YARN Node Label
Node label that determines the node on the Hadoop cluster where the Blaze engine runs. If you do not specify a node label, the Blaze engine runs on the nodes in the default partition.
If the Hadoop cluster supports logical operators for node labels, you can specify a list of node labels. To combine the node labels, use the operators && (AND), || (OR), and ! (NOT).
You cannot use node labels on a Cloudera CDH cluster.
Advanced Properties
List of advanced properties that are unique to the Blaze engine. The advanced properties include a list of default properties.
You can configure run-time properties for the Hadoop environment in the Data Integration Service, the Hadoop connection, and in the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities:
  1. Mapping custom properties set using infacmd ms runMapping with the -cp option
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
When a mapping uses Hive Server 2 to run a job or parts of a job, you cannot override properties that are configured at the cluster level in pre-SQL or post-SQL queries or SQL override statements.
Workaround: Instead of attempting to use the cluster configuration on the domain to override cluster properties, pass the override settings to the JDBC URL. For example:
beeline -u "jdbc:hive2://<domain host>:<port_number>/tpch_text_100" --hiveconf hive.execution.engine=tez
Informatica does not recommend changing these property values before you consult with third-party documentation, Informatica documentation, or Informatica Global Customer Support. If you change a value without knowledge of the property, you might experience performance degradation or other unexpected results.
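The override priorities in the Advanced Properties description behave like a layered lookup. When the same property is set at more than one level, the value from the higher-priority level wins. The following Python sketch illustrates that rule. It assumes that each level's properties are available as a plain dictionary. The level names, the `resolve_properties` function, and the property names in the usage example are hypothetical and are not part of any Informatica API.

```python
from typing import Dict, List, Tuple

# Hypothetical level names, ordered from highest to lowest priority,
# following the documented Data Integration Service override priorities.
PRIORITY_ORDER: List[str] = [
    "mapping_cp",          # 1. infacmd ms runMapping -cp custom properties
    "mapping_runtime",     # 2. Mapping run-time properties for the Hadoop environment
    "connection_engine",   # 3. Hadoop connection advanced properties for run-time engines
    "connection_general",  # 4. Hadoop connection general properties, env vars, classpaths
    "dis_custom",          # 5. Data Integration Service custom properties
]


def resolve_properties(levels: Dict[str, Dict[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Return {property: (value, level)} using the highest-priority level that sets it."""
    resolved: Dict[str, Tuple[str, str]] = {}
    # Walk from lowest to highest priority so that higher levels overwrite lower ones.
    for level in reversed(PRIORITY_ORDER):
        for name, value in levels.get(level, {}).items():
            resolved[name] = (value, level)
    return resolved


if __name__ == "__main__":
    # Hypothetical property names used only for illustration.
    levels = {
        "dis_custom": {"example.property": "a", "other.property": "x"},
        "connection_engine": {"example.property": "b"},
        "mapping_cp": {"example.property": "c"},
    }
    for name, (value, level) in sorted(resolve_properties(levels).items()):
        print(f"{name} = {value} (from {level})")
    # example.property = c (from mapping_cp)
    # other.property = x (from dis_custom)
```

Recording the winning level next to each value makes it easier to see why a property did not take effect. For example, a value set in the Data Integration Service custom properties is ignored when the mapping run passes the same property with -cp.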
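The Blaze Job Monitor Address uses the <hostname>:<port> format, for example myhostname:9080. The following Python sketch shows one way to check that format before you enter the value. The `parse_monitor_address` helper is hypothetical and is not an Informatica tool. It accepts host names and IPv4 addresses only, not bracketed IPv6 addresses. It only checks that the port is a number from 1 through 65535, so it does not verify that the Blaze Job Monitor actually listens on that port.

```python
from typing import Tuple


def parse_monitor_address(address: str) -> Tuple[str, int]:
    """Split a '<hostname>:<port>' string into (hostname, port).

    Hypothetical helper: supports host names and IPv4 addresses only.
    Raises ValueError if the value does not match the expected format.
    """
    host, sep, port_text = address.strip().rpartition(":")
    # Reject a missing separator, an empty host, or extra colons in the host part.
    if not sep or not host or ":" in host:
        raise ValueError(f"expected <hostname>:<port>, got {address!r}")
    if not port_text.isdecimal():
        raise ValueError(f"port must be numeric, got {port_text!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


if __name__ == "__main__":
    print(parse_monitor_address("myhostname:9080"))  # ('myhostname', 9080)
```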
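The Blaze Staging Directory and Blaze User Name descriptions each define a fallback rule. The following Python sketch restates those rules as plain functions. It is only an illustration of the documented behavior, not how the Data Integration Service implements it. The function names and parameters are hypothetical. The sketch assumes that an empty or missing value means the property was cleared or not configured. It also assumes that an explicitly configured Blaze user name takes precedence whether or not the cluster uses Kerberos.

```python
from typing import Optional


def effective_staging_dir(configured: Optional[str], user_name: str) -> str:
    """Return the directory that receives Blaze staging files.

    The property defaults to /blaze/workdir in the connection. If the
    property is cleared (empty or None), files go to /tmp/blaze_<user name>.
    """
    if configured:
        return configured
    return f"/tmp/blaze_{user_name}"


def effective_blaze_user(
    configured: Optional[str],
    kerberos_enabled: bool,
    dis_spn_user: str,
    dis_user: str,
) -> str:
    """Return the owner of the Blaze service and Blaze service logs."""
    # Assumption: an explicitly configured Blaze user always wins.
    if configured:
        return configured
    # Otherwise the default depends on whether the cluster uses Kerberos.
    return dis_spn_user if kerberos_enabled else dis_user


if __name__ == "__main__":
    print(effective_staging_dir("", "infauser"))  # /tmp/blaze_infauser
```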