To replicate change data to Cloudera and Hortonworks targets on a Hadoop Distributed File System (HDFS), you must complete several prerequisite tasks to prepare the systems where the Applier and Data Replication Console run.
Install the 64-bit Java Development Kit (JDK) 1.7 or 1.8 if you have not done so already.
For Cloudera or Hortonworks targets that use Kerberos authentication, ensure that JDK 1.7u65 or later is installed.
Define the JAVA_HOME environment variable to point to the root Java installation directory.
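On Linux or UNIX, for example, you can set the variable in a shell profile and confirm that it points to a valid JDK. The installation path below is a placeholder; substitute your actual Java directory:

```shell
# Placeholder JDK location -- replace with your actual installation path.
export JAVA_HOME=/usr/java/jdk1.8.0

# Sanity check: a valid JAVA_HOME contains the bin/java launcher.
if [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is set correctly"
else
    echo "warning: $JAVA_HOME does not look like a JDK" >&2
fi
```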
Add the Java virtual machine (JVM) library to the system path.
On Windows, add the directory that contains the jvm.dll library to the PATH environment variable. For example, use the following command:
PATH=%PATH%;%JAVA_HOME%\jre\bin\server
On Linux and UNIX, add the directory that contains the libjvm.so library to the library path environment variable for your operating system, for example, LD_LIBRARY_PATH on Linux and Solaris, LIBPATH on AIX, or SHLIB_PATH on HP-UX.
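On Linux, for example, you can append the directory that holds libjvm.so to LD_LIBRARY_PATH. The jre/lib/amd64/server subpath shown is typical for a 64-bit JDK 1.7 or 1.8 layout; locate libjvm.so under your own JAVA_HOME if it differs:

```shell
# Append the JVM server library directory to the library path.
# The subpath is typical for a 64-bit JDK 1.8; verify it against
# your installation (find "$JAVA_HOME" -name libjvm.so).
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server
```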
On Windows, add the bin subdirectory that contains the WinUtils executable file and the required .dll libraries to the HADOOP_HOME environment variable.
On Windows, for Cloudera or Hortonworks targets that use Kerberos authentication, define the DBSYNC_KERBEROS_CACHE_NAME environment variable. This environment variable points to the file that contains the Kerberos credential cache.
You can get the path to the Kerberos credential cache from the KRB5CCNAME environment variable.
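For example, at a Windows command prompt, you could set the variable as follows. The cache file path is purely illustrative; use the location reported by your KRB5CCNAME variable:

```
:: Illustrative cache file path -- take the real path from KRB5CCNAME.
set DBSYNC_KERBEROS_CACHE_NAME=C:\Users\repl\krb5cc_repl
```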
Download the hadoop_libs.zip file that Data Replication provides, which contains the required .jar files. Extract this zip file into the DataReplication_installation directory.
Verify that the DataReplication_installation/lib directory contains the hadoop subdirectory.
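On Linux, the extract-and-verify step can be sketched as follows. The installation path is a placeholder for your actual DataReplication_installation directory:

```shell
# Placeholder installation directory -- substitute your own.
DR_HOME=/opt/DataReplication

# Extract the downloaded archive into the installation directory.
unzip -o hadoop_libs.zip -d "$DR_HOME"

# Confirm that the hadoop subdirectory now exists under lib.
if [ -d "$DR_HOME/lib/hadoop" ]; then
    echo "hadoop libraries extracted"
else
    echo "error: $DR_HOME/lib/hadoop is missing" >&2
fi
```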
For Cloudera and Hortonworks targets, copy the following configuration files to the DataReplication_installation/lib/hadoop/hadoop_distribution directory:
hdfs-site.xml
core-site.xml
yarn-site.xml
The yarn-site.xml file is required only if the target uses HDFS high availability.
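On a Linux machine, for example, the files can be copied from the cluster's Hadoop configuration directory. All paths below are placeholders: /etc/hadoop/conf is a common but not universal configuration location, and hadoop_distribution stands for your distribution-specific subdirectory:

```shell
# Placeholder paths -- adjust for your cluster and installation.
HADOOP_CONF=/etc/hadoop/conf
DR_HADOOP_LIB=/opt/DataReplication/lib/hadoop/hadoop_distribution

# hdfs-site.xml and core-site.xml are always required.
cp "$HADOOP_CONF/hdfs-site.xml" "$DR_HADOOP_LIB/"
cp "$HADOOP_CONF/core-site.xml" "$DR_HADOOP_LIB/"

# yarn-site.xml is needed only when the target uses HDFS high availability.
cp "$HADOOP_CONF/yarn-site.xml" "$DR_HADOOP_LIB/"
```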