Load Overview

The load mechanism that Fast Clone uses depends on the target type and whether you use DataStreamer.
  • For target types other than Amazon Redshift, the Hadoop-based targets, and flat files, you can use the native load utilities to load data from output files or pipes to the target database.
  • For Greenplum, Netezza, Teradata, and Vertica targets, you can optionally use the DataStreamer add-on component to stream data to the target by using the appropriate Greenplum or Teradata load utility, Netezza external tables, or the Vertica COPY or LCOPY command. This method is faster than loading data from output files or pipes.
  • For Amazon Redshift targets, Fast Clone always uses DataStreamer to write source data to Amazon Simple Storage Service (Amazon S3) and then issues a COPY command to copy the data to the Amazon Redshift target. You do not need to manually enable the use of DataStreamer for Amazon Redshift targets.
  • For Hadoop-based and flat file targets, Fast Clone loads data directly to the targets. Fast Clone does not generate load scripts and control files.
The Fast Clone cloning configuration file contains the target connection information and load settings that are required to load data.

Loading Data from Output Files or Pipes

If you unload data to output files or pipes, Fast Clone generates a load script that you run to load data to the target. In the Fast Clone Console, you can specify the script base name in the Load script base name field on the Runtime Settings tab > File Locations view, or accept the source table owner or schema as the base name. Fast Clone adds the extension .cmd (on Windows) or .sh (on Linux or UNIX) to the script name. The generated load script invokes the appropriate load utility or command-line tool for the target type. The following table identifies the load utilities that Fast Clone uses:
Target Type            Load Utility
Greenplum              gpload or gpfdist utility
Microsoft SQL Server   osql utility
Netezza                nzload utility
Oracle                 SQL*Loader (sqlldr) utility
Teradata               Teradata FastLoad, Teradata MultiLoad, or Teradata Parallel
                       Transporter (TPT) Load Operator, Update Operator, or Stream
                       Operator
Vertica                COPY command (server side) or LCOPY command (client side)
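For example, for an Oracle target, the generated load script calls the SQL*Loader utility once per unloaded table. The following sketch shows what such an invocation might look like, assuming hypothetical control, data, and log file names produced by an unload of the table EMP in the SCOTT schema; the scripts that Fast Clone actually generates can differ:

  sqlldr userid=scott/tiger@orcl control=SCOTT_EMP.ctl data=SCOTT_EMP.dat log=SCOTT_EMP.log direct=true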
To load data from output files or pipes to a target, perform the following tasks:
  1. Create a configuration file.
  2. Run the data unload job with the configuration file.
  3. Generate the target tables.
  4. Optional. Validate the target schema.
  5. Optional. Copy the output files to the target system if you are not running a Fast Clone Server on the target that is enabled to accept output files from a remote system.
  6. Optional. Disable or drop constraints on the target.
  7. Run the load script (see the example after this list).
  8. Optional. Enable or add the constraints that you previously disabled or dropped on the target.
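For example, if you accepted the source schema name SCOTT as the load script base name, run the generated script from the directory that contains the output files:

  ./SCOTT.sh    (on Linux or UNIX)
  SCOTT.cmd     (on Windows)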

Loading Data with DataStreamer

If you use DataStreamer, Fast Clone does not generate output files or load scripts. DataStreamer streams data directly to the target by using the following load mechanisms:
Target Type        Load Mechanism
Amazon Redshift    Data is sent to temporary files in Amazon Simple Storage Service
                   (Amazon S3) by using the PostgreSQL ODBC driver on Windows or the
                   DataDirect ODBC driver on Linux and UNIX. Then the Amazon Redshift
                   COPY command moves the data from the S3 data files to the target.
Greenplum          gpfdist utility
Netezza            Netezza external tables and the Netezza ODBC driver
Teradata           Teradata Parallel Transporter (TPT) Load Operator, Update Operator,
                   or Stream Operator
Vertica            COPY command (server side) or LCOPY command (client side)
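For example, the copy step for an Amazon Redshift target is conceptually similar to the following COPY statement. The table name, bucket name, file prefix, IAM role, and delimiter here are illustrative assumptions only; DataStreamer builds the actual statement internally:

  COPY sales.orders
  FROM 's3://example-bucket/fastclone/orders/'
  IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
  DELIMITER ',';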
To load data with DataStreamer, perform the following tasks:
  1. Install connectivity software to connect to the target on the system where you run data unload jobs.
    • For Teradata targets, install the TPT libraries and the JDBC driver.
    • For Netezza targets, install the Netezza ODBC driver.
    • For Amazon Redshift targets, if you run Fast Clone on Windows, install the PostgreSQL ODBC driver.
    For more information, see the Informatica Fast Clone Installation Guide.
  2. Create a configuration file.
    • For Greenplum targets, select the Enable Greenplum gpfdist Direct Data Stream option on the Runtime Settings tab > Greenplum Load Settings view.
    • For Netezza targets, select the Enable Netezza Direct Data Stream option on the Runtime Settings tab > Netezza Load Settings view.
    • For Teradata targets, select the Enable Teradata Direct Data Stream option on the Runtime Settings tab > Teradata Load Settings view.
    • For Vertica targets, select the Enable DataStream option on the Runtime Settings tab > Vertica Load Settings view.
    If you create the configuration file for Greenplum, Netezza, or Teradata targets in a text editor, set the direct_data_stream parameter to true (see the configuration sketch after this list).
  3. For Amazon Redshift, Greenplum, Netezza, Teradata, and Vertica targets, clear the Suppress trailing null columns option on the Runtime Settings tab > Format Settings view of the Fast Clone Console. If you create the configuration file in a text editor, set the suppress_trailing_nullcolls parameter to false.
  4. Generate the target tables.
  5. Optional. Validate the target schema.
  6. Optional. Disable or drop constraints on the target.
  7. Run the data unload job with the configuration file.
  8. Optional. Enable or add the constraints that you previously disabled or dropped on the target.
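The following minimal sketch shows how the two text-editor settings from steps 2 and 3 might appear in a cloning configuration file. The parameter names direct_data_stream and suppress_trailing_nullcolls come from this guide, but the key=value layout is an assumption for illustration; check a configuration file generated by the Fast Clone Console for the exact structure:

  direct_data_stream = true
  suppress_trailing_nullcolls = false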

Loading Data to Hadoop and Flat File Targets

For Hadoop-based and flat file targets, Fast Clone does not generate load scripts and control files because Fast Clone loads the source data directly to the targets. The Hadoop-based targets are Cloudera, Hive, and Hortonworks.
To load data to Hadoop-based and flat file targets, perform the following tasks:
  1. Create a configuration file.
  2. For Hive targets, generate the target tables.
  3. Optional. Validate the target schema.
  4. Run the data unload job with the configuration file.
