Database Ingestion and Replication

Amazon S3, Flat File, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Object Storage targets
The following list identifies considerations for using Amazon S3, Flat File, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Infrastructure (OCI) Object Storage targets:
  • When you define a database ingestion and replication task that has an Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, or Oracle Cloud Object Storage target, you can select CSV, Avro, or Parquet as the format for the generated output files that contain the source data to be applied to the target. For Flat File targets, you can select either CSV or Avro as the output file format.
  • If you select the CSV output format, Database Ingestion and Replication creates the following files on the target for each source table:
    • A schema.ini file that describes the schema and includes some settings for the output file on the target.
    • One or multiple output files, which contain the source data. Database Ingestion and Replication names these text files based on the name of the source table with an appended date and time.
    The schema.ini file lists a sequence of columns for the rows in the corresponding output file. The schema.ini file contains the following columns:
    • ColNameHeader: Indicates whether the source data files include column headers.
    • Format: Describes the format of the output files. Database Ingestion and Replication uses a comma (,) to delimit column values.
    • CharacterSet: Specifies the character set that is used for the output files. Database Ingestion and Replication generates the files in the UTF-8 character set.
    • COL<sequence_number>: The name and data type of the column.
    • If you selected any of the Add Operation... properties under Advanced on the Target page of the task wizard, the list of columns includes metadata columns for the operation type, time, owner, or transaction ID.
    • If you selected the Add Before Images check box, the job creates a column_name_OLD column for UNDO data and a column_name_NEW column for REDO data for each source column.
    Do not edit the schema.ini file. For an illustration of a generated schema.ini file, see Example 1 after this list.
  • If you select the Avro output format, you can select an Avro format type, a file compression type, an Avro data compression type, and the directory that stores the Avro schema definitions generated for each source table. The schema definition files have the naming pattern schemaname_tablename.txt. For an illustration of a schema definition file, see Example 2 after this list.
  • If you select the Parquet output format, you can optionally select a compression type that Parquet supports.
  • On Flat File, Microsoft Azure Data Lake Storage, and Microsoft Fabric OneLake targets, Database Ingestion and Replication creates an empty directory for each empty source table. Database Ingestion and Replication does not create empty directories on Amazon S3, Google Cloud Storage, and Oracle Cloud Object Storage targets.
  • If you do not specify an access key and secret key in the Amazon S3 connection properties, Database Ingestion and Replication tries to find AWS credentials by using the default credential provider chain that is implemented by the DefaultAWSCredentialsProviderChain class. For more information, see the Amazon Web Services documentation. For a sketch of how the chain resolves credentials, see Example 3 after this list.
  • When database ingestion and replication incremental load and combined initial and incremental load jobs replicate Update operations that change primary key values on the source to any of these targets that use the CSV output format, the jobs process each Update record as two records on the target: a Delete followed by an Insert. The Delete contains the before image, and the Insert contains the after image for the same row.
    For Update operations that do not change primary key values, database ingestion and replication jobs process each Update as one operation and write only the after image to the target.
    If source tables do not have primary keys, Database Ingestion and Replication treats the tables as if all columns were part of the primary key. In this case, each Update operation is processed as a Delete followed by an Insert. For an illustration of this behavior, see Example 4 after this list.
  • Database Ingestion and Replication jobs unload binary data in hexadecimal format when the data is sent to an Amazon S3, Flat File, Microsoft Azure Data Lake Storage, or Microsoft Fabric OneLake target. Each hexadecimal column value has the "0x" prefix. If you want to use the output files to load the data to a target, you might need to edit the files to remove the "0x" prefixes, as shown in Example 5 after this list.
  • If you run a Secure Agent service on Windows and plan to use Flat File connections, ensure that the logon account for the Secure Agent is an Administrator account. Otherwise, an error occurs when you try to configure a Flat File connection.
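
Example 1: A generated schema.ini file. The following sketch shows what a schema.ini file might look like for a hypothetical PRODUCTS source table that produced the output file PRODUCTS_20240115103000.csv. The file name, column names, and setting values are illustrative only; the actual content depends on the source table and the task configuration.

    [PRODUCTS_20240115103000.csv]
    ColNameHeader=False
    Format=Delimited(,)
    CharacterSet=UTF-8
    COL1=PRODUCT_ID Integer
    COL2=PRODUCT_NAME Text
    COL3=UNIT_PRICE Double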
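
Example 2: A generated Avro schema definition file. The following sketch shows what a schema definition file, such as a hypothetical SALES_PRODUCTS.txt, might contain. The record and field names are illustrative; the actual schema reflects the source table definition.

    {
      "type": "record",
      "name": "PRODUCTS",
      "fields": [
        {"name": "PRODUCT_ID", "type": ["null", "int"]},
        {"name": "PRODUCT_NAME", "type": ["null", "string"]},
        {"name": "UNIT_PRICE", "type": ["null", "double"]}
      ]
    }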
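
Example 3: AWS credential resolution. The following minimal Java sketch is not part of the product. It only illustrates how the AWS SDK for Java DefaultAWSCredentialsProviderChain class, which Database Ingestion and Replication relies on when no keys are specified in the connection, resolves credentials.

    import com.amazonaws.auth.AWSCredentials;
    import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

    public class ShowResolvedCredentials {
        public static void main(String[] args) {
            // The chain checks, in order: environment variables
            // (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), Java system
            // properties (aws.accessKeyId and aws.secretKey), the shared
            // credentials file (~/.aws/credentials), and container or
            // EC2 instance profile credentials.
            AWSCredentials credentials =
                    new DefaultAWSCredentialsProviderChain().getCredentials();
            System.out.println("Resolved access key ID: "
                    + credentials.getAWSAccessKeyId());
        }
    }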
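
Example 4: Update handling for primary key changes. Suppose an Update changes the primary key PRODUCT_ID of a row from 101 to 202. In the CSV output, the job writes a Delete record that carries the before image, followed by an Insert record that carries the after image. The column layout and operation-type values shown here are simplified and hypothetical:

    DELETE,101,Widget,9.99     <- before image
    INSERT,202,Widget,9.99     <- after image

An Update that does not change PRODUCT_ID would produce a single record that carries only the after image.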
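
Example 5: Removing "0x" prefixes from output files. The following Java sketch is illustrative only. It assumes a hypothetical output file name with unquoted, comma-delimited values and strips the "0x" prefix from the start of each value:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.stream.Collectors;

    public class StripHexPrefixes {
        public static void main(String[] args) throws IOException {
            // Hypothetical file names; substitute the actual generated file.
            Path input = Path.of("PRODUCTS_20240115103000.csv");
            Path output = Path.of("PRODUCTS_20240115103000_clean.csv");

            List<String> cleaned = Files.readAllLines(input).stream()
                    // Drop "0x" at the start of a line or after a comma.
                    // This simple pattern does not handle quoted values.
                    .map(line -> line.replaceAll("(^|,)0x", "$1"))
                    .collect(Collectors.toList());

            Files.write(output, cleaned);
        }
    }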
