Database Ingestion and Replication

Amazon S3, Flat File, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Object Storage targets
The following list identifies considerations for using Amazon S3, Flat File, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Infrastructure (OCI) Object Storage targets:
  • When you define a database ingestion and replication task that has an Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, or Oracle Cloud Object Storage target, you can select CSV, Avro, or Parquet as the format for the generated output files that contain the source data to be applied to the target. For Flat File targets, you can select either CSV or Avro as the output file format.
  • If you select the CSV output format, Database Ingestion and Replication creates the following files on the target for each source table:
    • A schema.ini file that describes the schema and includes some settings for the output file on the target.
    • One or multiple output files, which contain the source data. Database Ingestion and Replication names these text files based on the name of the source table with an appended date and time.
    The schema.ini file lists a sequence of columns for the rows in the corresponding output file. The schema.ini file contains the following columns:
    • ColNameHeader: Indicates whether the source data files include column headers.
    • Format: Describes the format of the output files. Database Ingestion and Replication uses a comma (,) to delimit column values.
    • CharacterSet: Specifies the character set that is used for the output files. Database Ingestion and Replication generates the files in the UTF-8 character set.
    • COL<sequence_number>: The name and data type of the column.
    • If you selected any of the Add Operation... properties under Advanced on the Target page of the task wizard, the list of columns includes metadata columns for the operation type, time, owner, or transaction ID.
    • If you selected the Add Before Images check box, the job creates a column_name_OLD column for UNDO data and a column_name_NEW column for REDO data for each source column.
    Do not edit the schema.ini file. For an illustration of a generated schema.ini file, see Example 1 after this list.
  • If you select the Avro output format, you can select an Avro format type, a file compression type, an Avro data compression type, and the directory that stores the Avro schema definitions generated for each source table. The schema definition files have the naming pattern schemaname_tablename.txt. For an illustration of a schema definition file, see Example 2 after this list.
  • If you select the Parquet output format, you can optionally select a compression type that Parquet supports.
  • On Flat File, Microsoft Azure Data Lake Storage, and Microsoft Fabric OneLake targets, Database Ingestion and Replication creates an empty directory for each empty source table. Database Ingestion and Replication does not create empty directories on Amazon S3, Google Cloud Storage, and Oracle Cloud Object Storage targets.
  • If you do not specify an access key and secret key in the Amazon S3 connection properties, Database Ingestion and Replication tries to find AWS credentials by using the default credential provider chain that is implemented by the DefaultAWSCredentialsProviderChain class. For more information, see the Amazon Web Services documentation. For a sketch of how the chain resolves credentials, see Example 3 after this list.
  • When database ingestion and replication incremental load and combined initial and incremental load jobs replicate Update operations that change primary key values on the source to any of these targets that use the CSV output format, the jobs process each Update record as two records on the target: a Delete followed by an Insert. The Delete contains the before image, and the Insert contains the after image for the same row.
    For Update operations that do not change primary key values, database ingestion and replication jobs process each Update as one operation and write only the after image to the target.
    If source tables do not have primary keys, Database Ingestion and Replication treats the tables as if all columns were part of the primary key. In this case, each Update operation is processed as a Delete followed by an Insert. For an illustration of this behavior, see Example 4 after this list.
  • Database Ingestion and Replication jobs unload binary data in hexadecimal format when the data is sent to an Amazon S3, Flat File, Microsoft Azure Data Lake Storage, or Microsoft Fabric OneLake target. Each hexadecimal column value has the "0x" prefix. If you want to use the output files to load the data to a target, you might need to edit the files to remove the "0x" prefixes, as shown in Example 5 after this list.
  • If you run a Secure Agent service on Windows and plan to use Flat File connections, ensure that the logon account for the Secure Agent is an Administrator account. Otherwise, an error occurs when you try to configure a Flat File connection.
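
Example 1: A generated schema.ini file. The following sketch shows what a schema.ini file might look like for a hypothetical PRODUCTS source table that produced the output file PRODUCTS_20240115103000.csv. The file name, column names, and setting values are illustrative only; the actual content depends on the source table and the task configuration.

    [PRODUCTS_20240115103000.csv]
    ColNameHeader=False
    Format=Delimited(,)
    CharacterSet=UTF-8
    COL1=PRODUCT_ID Integer
    COL2=PRODUCT_NAME Text
    COL3=UNIT_PRICE Double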
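
Example 2: A generated Avro schema definition file. The following sketch shows what a schema definition file, such as a hypothetical SALES_PRODUCTS.txt, might contain. The record and field names are illustrative; the actual schema reflects the source table definition.

    {
      "type": "record",
      "name": "PRODUCTS",
      "fields": [
        {"name": "PRODUCT_ID", "type": ["null", "int"]},
        {"name": "PRODUCT_NAME", "type": ["null", "string"]},
        {"name": "UNIT_PRICE", "type": ["null", "double"]}
      ]
    }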
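
Example 3: AWS credential resolution. The following minimal Java sketch is not part of the product. It only illustrates how the AWS SDK for Java DefaultAWSCredentialsProviderChain class, which Database Ingestion and Replication relies on when no keys are specified in the connection, resolves credentials.

    import com.amazonaws.auth.AWSCredentials;
    import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

    public class ShowResolvedCredentials {
        public static void main(String[] args) {
            // The chain checks, in order: environment variables
            // (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), Java system
            // properties (aws.accessKeyId and aws.secretKey), the shared
            // credentials file (~/.aws/credentials), and container or
            // EC2 instance profile credentials.
            AWSCredentials credentials =
                    new DefaultAWSCredentialsProviderChain().getCredentials();
            System.out.println("Resolved access key ID: "
                    + credentials.getAWSAccessKeyId());
        }
    }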
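
Example 4: Update handling for primary key changes. Suppose an Update changes the primary key PRODUCT_ID of a row from 101 to 202. In the CSV output, the job writes a Delete record that carries the before image, followed by an Insert record that carries the after image. The column layout and operation-type values shown here are simplified and hypothetical:

    DELETE,101,Widget,9.99     <- before image
    INSERT,202,Widget,9.99     <- after image

An Update that does not change PRODUCT_ID would produce a single record that carries only the after image.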
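
Example 5: Removing "0x" prefixes from output files. The following Java sketch is illustrative only. It assumes a hypothetical output file name with unquoted, comma-delimited values and strips the "0x" prefix from the start of each value:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.stream.Collectors;

    public class StripHexPrefixes {
        public static void main(String[] args) throws IOException {
            // Hypothetical file names; substitute the actual generated file.
            Path input = Path.of("PRODUCTS_20240115103000.csv");
            Path output = Path.of("PRODUCTS_20240115103000_clean.csv");

            List<String> cleaned = Files.readAllLines(input).stream()
                    // Drop "0x" at the start of a line or after a comma.
                    // This simple pattern does not handle quoted values.
                    .map(line -> line.replaceAll("(^|,)0x", "$1"))
                    .collect(Collectors.toList());

            Files.write(output, cleaned);
        }
    }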
