Table of Contents

Search

  1. Preface
  2. Introduction to Mass Ingestion
  3. Prepare
  4. Create
  5. Deploy
  6. Run
  7. Monitor
  8. Appendix A: infacmd mi Command Reference

Mass Ingestion Guide

Mass Ingestion Guide

DDL Query

DDL Query

When you configure a mass ingestion specification to ingest data to a Hive target, you can configure a custom DDL query to define how data from the source tables is loaded to the target tables.
You can define the DDL query to customize the target table or specify additional parameters. The target table contains the columns that you define in the DDL query.
To define a DDL query, use SQL statements and placeholders. Use the placeholders to fetch the table name, column list, and column names. The Data Integration Service substitutes the placeholders with actual values at run time according to the tables that you ingest. You must enclose the placeholders within curly brackets. For example, {INFA_TABLE_NAME}.
You can use the following placeholders:
INFA_TABLE_NAME
Fetches the target table name at run time.
INFA_COLUMN_LIST
Fetches a list of columns in the target table at run time.
For example, you might ingest a table
CUSTOMER
. To define how you want to ingest the table in the target, you can enter the following DDL query:
CREATE TABLE {INFA_TABLE_NAME} ({INFA_COLUMN_LIST}) CLUSTERED BY (LAST_NAME) INTO 10 BUCKETS STORED AS TEXT
At run time, the Data Integration Service substitutes
{INFA_TABLE_NAME}
with
CUSTOMER
, and it substitutes
{INFA_COLUMN_LIST}
with the list of columns that appear in the table
CUSTOMER
. The Data Integration Service might expand the DDL query to the following query:
CREATE TABLE CUSTOMER (FIRST_NAME STRING, LAST_NAME STRING, EMAIL STRING, GENDER STRING, CREDIT_CARD DECIMAL (38,0), CREDIT_CARD_TYPE STRING, STATE, STRING, USSTATE STRING, CITY STRING) CLUSTERED BY (LAST_NAME) INTO 10 BUCKETS STORED AS TEXT
You cannot use a placeholder to specify the partition columns and clustered by columns. When you specify the partition columns and clustered by columns, enter the column name in the DDL query.
The following image shows the option to configure a DDL query for a Hive target:
This image shows the Hive target table properties where DDL Query is selected. The options to configure the storage format and external location are replaced with the option to configure the query.

0 COMMENTS

We’d like to hear from you!