User Guide

Troubleshooting Batch Jobs

If you encounter any issues with the batch jobs, use the following information to troubleshoot.

The region splitter job takes a long time to run.

The region splitter job uses random sampling to analyze the input data. The random sampling is based on the default values of the following parameters:
--sortmaxsamples
Maximum number of samples that the job uses to analyze the input data. Default is 100,000.
--sortmaxsplitssampled
Maximum number of splits that the job uses to extract the sample data. Default is 20.
--sortsampleprobability
Frequency to sample the input data in each split. Specify a value between 0.0 and 1.0. A higher value results in dense sampling of each split and a lower value results in sparse sampling of each split. Default is 1.0.
If the input data is uniformly distributed, you can reduce the running time of the job by using a smaller sample size, fewer splits, and a higher sampling frequency. If the input data is skewed, you can reduce the running time of the job by using a larger sample size, more splits, and a lower sampling frequency.
For example, the following command runs the region splitter job with the additional parameters:
run_hbase_region_analysis.sh --config=/usr/local/conf/config_big.xml --input=/usr/hdfs/workingdir/MDMBDRMInitialBatch/MDMBDE0063_1602999447744334391/output/dir/pass-join --hdfsdir=/usr/hdfs/workingdir --rule=/usr/local/conf/matching_rules.xml --regions=14 --sortmaxsamples=200000 --sortmaxsplitssampled=30 --sortsampleprobability=0.5
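The tuning guidance above can be captured in a small helper. The following is a minimal sketch, not part of the product: the function name `sampling_options` and the profile names are hypothetical, the `uniform` values are illustrative assumptions chosen only to be smaller than the documented defaults, and the `skewed` values are copied from the example command above.

```shell
# Sketch: print the sampling options for a given data profile.
# "sampling_options" and the profile names are hypothetical helpers.
sampling_options() {
  case "$1" in
    # Assumed values for uniformly distributed data:
    # smaller sample, fewer splits, highest sampling frequency.
    uniform) echo "--sortmaxsamples=50000 --sortmaxsplitssampled=10 --sortsampleprobability=1.0" ;;
    # Values taken from the example command for skewed data:
    # larger sample, more splits, lower sampling frequency.
    skewed)  echo "--sortmaxsamples=200000 --sortmaxsplitssampled=30 --sortsampleprobability=0.5" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Possible usage (not executed here; the remaining options come from your environment):
#   run_hbase_region_analysis.sh --config=... --input=... --hdfsdir=... --rule=... \
#     --regions=14 $(sampling_options skewed)
```

Treat the numbers as starting points and adjust them based on the running time that you observe for your data.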

When you rerun the Hive enabler job with the same Hive-related options, the job fails.

When you run the Hive enabler job for the first time, the job creates an output table and an internal table in Hive. When you rerun the Hive enabler job with the same Hive-related options, you get an error in the following format:
AlreadyExistsException(message:Table <Table Name>|<Table Name>_internal already exists)
where <Table Name> indicates the output table, and <Table Name>_internal indicates the internal table. For example, mdmbdrm002_emp indicates the output table and mdmbdrm002_emp_internal indicates the internal table.
To rerun the Hive enabler job with the same Hive-related options, perform the following tasks:
  1. If you ran the Hive enabler job without the --linkHBase parameter, drop the output table as a view.
  2. If you ran the Hive enabler job with the --linkHBase parameter, drop the output table.
  3. If the <Table Name>_internal table exists, drop it.
  4. Rerun the Hive enabler job.
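The cleanup steps can be expressed as HiveQL statements. The following is a minimal sketch that assumes "drop the output table as a view" corresponds to a `DROP VIEW` statement. The function name `hive_cleanup_sql` and its `yes`/`no` argument are hypothetical; the table name comes from the example above.

```shell
# Sketch: print the HiveQL statements that remove the objects created by a
# previous Hive enabler run. "hive_cleanup_sql" is a hypothetical helper.
#   $1 = output table name
#   $2 = "yes" if the job ran with --linkHBase, anything else otherwise
hive_cleanup_sql() {
  local table="$1" linked="$2"
  if [ "$linked" = "yes" ]; then
    # With --linkHBase: drop the output table.
    echo "DROP TABLE IF EXISTS ${table};"
  else
    # Without --linkHBase: drop the output table as a view (assumed DROP VIEW).
    echo "DROP VIEW IF EXISTS ${table};"
  fi
  # Drop the internal table if it exists.
  echo "DROP TABLE IF EXISTS ${table}_internal;"
}

# Possible usage (not executed here):
#   hive -e "$(hive_cleanup_sql mdmbdrm002_emp no)"
```

Review the printed statements before you run them, because dropping a table or view cannot be undone from the Hive command line.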

In an encrypted environment, when you run the Hive enabler job, the job fails.

In an encrypted environment, when you run the Hive enabler job, you get the following error:
ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: No common protection layer between client and server
This error occurs when the configuration file does not specify the authentication type that your environment uses.
To fix this issue, in the HiveConfiguration section of the configuration file, add the sasl.qop parameter to the value of the JDBCUrl parameter.
For more information about the JDBCUrl parameter, see the Informatica MDM - Relate 360 Installation and Configuration Guide.
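As an illustration of a JDBC URL that carries the sasl.qop parameter, the following sketch composes a HiveServer2 URL for a Kerberos-secured cluster. The function name `hive_jdbc_url`, the host, the port, the `default` database, and the principal are all hypothetical placeholders. The sasl.qop value must match the quality of protection that the Hive server is configured to use, typically the `hive.server2.thrift.sasl.qop` property in hive-site.xml.

```shell
# Sketch: build a HiveServer2 JDBC URL that includes sasl.qop.
# "hive_jdbc_url" is a hypothetical helper; all argument values are placeholders.
#   $1 = host, $2 = port, $3 = Kerberos principal, $4 = sasl.qop value
hive_jdbc_url() {
  local host="$1" port="$2" principal="$3" qop="$4"
  case "$qop" in
    # SASL quality-of-protection levels: authentication only,
    # integrity protection, or integrity plus confidentiality.
    auth|auth-int|auth-conf) ;;
    *) echo "invalid sasl.qop: $qop" >&2; return 1 ;;
  esac
  echo "jdbc:hive2://${host}:${port}/default;principal=${principal};sasl.qop=${qop}"
}

# Possible usage:
#   hive_jdbc_url hive.example.com 10000 hive/_HOST@EXAMPLE.COM auth-conf
```

Use the printed string as the value of the JDBCUrl parameter. The exact layout of the HiveConfiguration section is described in the Installation and Configuration Guide and is not reproduced here.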

When you run the load clustering job, the job fails.

If the repository configuration in the configuration file does not match the hbase-site.xml file, the job fails. You can find the hbase-site.xml file in the following location:
${HBASE_HOME}/conf/hbase-site.xml
Ensure that the values that you specify in the HBASEConfiguration section of the configuration file match the values in the hbase-site.xml file.
For more information about the repository configuration, see the Informatica MDM - Relate 360 Installation and Configuration Guide.
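To compare the two files, it can help to print individual values from hbase-site.xml. The following is a minimal sketch that assumes the conventional Hadoop layout, where each `<value>` element appears on the same line as its `<name>` element or within the two lines after it. The function name `hbase_site_value` is hypothetical, and the properties that you need to compare depend on your HBASEConfiguration section.

```shell
# Sketch: print the value of one property from an hbase-site.xml file.
# "hbase_site_value" is a hypothetical helper.
#   $1 = property name, $2 = path to hbase-site.xml
hbase_site_value() {
  local prop="$1" file="$2"
  # Find the <name> line literally (-F), keep the two lines after it,
  # then extract the text of the first <value> element found.
  grep -F -A2 "<name>${prop}</name>" "$file" \
    | sed -n 's|.*<value>\(.*\)</value>.*|\1|p' \
    | head -n 1
}

# Possible usage (hbase.zookeeper.quorum is a standard HBase property):
#   hbase_site_value hbase.zookeeper.quorum "${HBASE_HOME}/conf/hbase-site.xml"
```

Compare each printed value with the corresponding entry in the configuration file, and correct the configuration file if the values differ.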
