Load Overview

The load mechanism that Fast Clone uses depends on the target type and whether you use DataStreamer.
  • For target types other than Amazon Redshift, the Hadoop-based targets, and flat files, you can use the native load utilities to load data from output files or pipes to the target database.
  • For Greenplum, Netezza, Teradata, and Vertica targets, you can optionally use the DataStreamer add-on component to stream data to the target by using the appropriate Greenplum or Teradata load utility, Netezza external tables, or the Vertica COPY or LCOPY command. This method is faster than loading data from output files or pipes.
  • For Amazon Redshift targets, Fast Clone always uses DataStreamer to write source data to Amazon Simple Storage Service (Amazon S3) and then issues a COPY command to copy the data to the Amazon Redshift target. You do not need to manually enable the use of DataStreamer for Amazon Redshift targets.
  • For Hadoop-based and flat file targets, Fast Clone loads data directly to the targets. Fast Clone does not generate load scripts and control files.
The Fast Clone cloning configuration file contains the target connection information and load settings that are required to load data.

Loading Data from Output Files or Pipes

If you unload data to output files or pipes, Fast Clone generates a load script that you run to load data to the target. In the Fast Clone Console, you can specify the script base name in the Load script base name field on the Runtime Settings tab > File Locations view, or accept the source table owner or schema as the base name. Fast Clone adds the extension .cmd (on Windows) or .sh (on Linux or UNIX) to the script name. The generated load script invokes the appropriate load utility or command-line tool for the target type. The following table identifies the load utilities that Fast Clone uses:
Target Type            Load Utility
Greenplum              gpload or gpfdist utility
Microsoft SQL Server   osql utility
Netezza                nzload utility
Oracle                 SQL*Loader (sqlldr) utility
Teradata               Teradata FastLoad, Teradata MultiLoad, or Teradata Parallel
                       Transporter (TPT) Load Operator, Update Operator, or Stream
                       Operator
Vertica                COPY command (server side) or LCOPY command (client side)
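For example, for an Oracle target, the generated load script calls the SQL*Loader utility once per unloaded table. The following sketch shows what such an invocation might look like, assuming hypothetical control, data, and log file names produced by an unload of the table EMP in the SCOTT schema; the scripts that Fast Clone actually generates can differ:

  sqlldr userid=scott/tiger@orcl control=SCOTT_EMP.ctl data=SCOTT_EMP.dat log=SCOTT_EMP.log direct=true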
To load data from output files or pipes to a target, perform the following tasks:
  1. Create a configuration file.
  2. Run the data unload job with the configuration file.
  3. Generate the target tables.
  4. Optional. Validate the target schema.
  5. Optional. Copy the output files to the target system if you are not running a Fast Clone Server on the target that is enabled to accept output files from a remote system.
  6. Optional. Disable or drop constraints on the target.
  7. Run the load script (see the example after this list).
  8. Optional. Enable or add the constraints that you previously disabled or dropped on the target.
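For example, if you accepted the source schema name SCOTT as the load script base name, run the generated script from the directory that contains the output files:

  ./SCOTT.sh    (on Linux or UNIX)
  SCOTT.cmd     (on Windows)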

Loading Data with DataStreamer

If you use DataStreamer, Fast Clone does not generate output files or load scripts. DataStreamer streams data directly to the target by using the following load mechanisms:
Target Type        Load Mechanism
Amazon Redshift    Data is sent to temporary files in Amazon Simple Storage Service
                   (Amazon S3) by using the PostgreSQL ODBC driver on Windows or the
                   DataDirect ODBC driver on Linux and UNIX. Then the Amazon Redshift
                   COPY command moves the data from the S3 data files to the target.
Greenplum          gpfdist utility
Netezza            Netezza external tables and the Netezza ODBC driver
Teradata           Teradata Parallel Transporter (TPT) Load Operator, Update Operator,
                   or Stream Operator
Vertica            COPY command (server side) or LCOPY command (client side)
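For example, the copy step for an Amazon Redshift target is conceptually similar to the following COPY statement. The table name, bucket name, file prefix, IAM role, and delimiter here are illustrative assumptions only; DataStreamer builds the actual statement internally:

  COPY sales.orders
  FROM 's3://example-bucket/fastclone/orders/'
  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
  DELIMITER ',';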
To load data with DataStreamer, perform the following tasks:
  1. Install connectivity software to connect to the target on the system where you run data unload jobs.
    • For Teradata targets, install the TPT libraries and the JDBC driver.
    • For Netezza targets, install the Netezza ODBC driver.
    • For Amazon Redshift targets, if you run Fast Clone on Windows, install the PostgreSQL ODBC driver.
    For more information, see the Informatica Fast Clone Installation Guide.
  2. Create a configuration file.
    • For Greenplum targets, select the Enable Greenplum gpfdist Direct Data Stream option on the Runtime Settings tab > Greenplum Load Settings view.
    • For Netezza targets, select the Enable Netezza Direct Data Stream option on the Runtime Settings tab > Netezza Load Settings view.
    • For Teradata targets, select the Enable Teradata Direct Data Stream option on the Runtime Settings tab > Teradata Load Settings view.
    • For Vertica targets, select the Enable DataStream option on the Runtime Settings tab > Vertica Load Settings view.
    If you create the configuration file for Greenplum, Netezza, or Teradata targets in a text editor, set the direct_data_stream parameter to true (see the configuration sketch after this list).
  3. For Amazon Redshift, Greenplum, Netezza, Teradata, and Vertica targets, clear the Suppress trailing null columns option on the Runtime Settings tab > Format Settings view of the Fast Clone Console. If you create the configuration file in a text editor, set the suppress_trailing_nullcolls parameter to false.
  4. Generate the target tables.
  5. Optional. Validate the target schema.
  6. Optional. Disable or drop constraints on the target.
  7. Run the data unload job with the configuration file.
  8. Optional. Enable or add the constraints that you previously disabled or dropped on the target.
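The following minimal sketch shows how the two text-editor settings from steps 2 and 3 might appear in a cloning configuration file. The parameter names direct_data_stream and suppress_trailing_nullcolls come from this guide, but the key=value layout is an assumption for illustration; check a configuration file generated by the Fast Clone Console for the exact structure:

  direct_data_stream = true
  suppress_trailing_nullcolls = false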

Loading Data to Hadoop and Flat File Targets

For Hadoop-based and flat file targets, Fast Clone does not generate load scripts and control files because Fast Clone loads the source data directly to the targets. The Hadoop-based targets are Cloudera, Hive, and Hortonworks.
To load data to Hadoop-based and flat file targets, perform the following tasks:
  1. Create a configuration file.
  2. For Hive targets, generate the target tables.
  3. Optional. Validate the target schema.
  4. Run the data unload job with the configuration file.
