Table of Contents


  1. About the Enterprise Data Preparation Administrator Guide
  2. Introduction to Enterprise Data Preparation Administration
  3. Getting Started
  4. Administration Process
  5. User Account Setup
  6. Search Configuration
  7. Roles, Privileges, and Profiles
  8. Data Asset Access and Publication Management
  9. Masking Sensitive Data
  10. Monitoring Enterprise Data Preparation
  11. Backing Up and Restoring Enterprise Data Preparation
  12. Managing the Data Lakehouse
  13. Schedule Export, Import and Publish Activities
  14. Interactive Data Preparation Service
  15. Enterprise Data Preparation Service

Enterprise Data Preparation Administrator Guide

Operationalizing Mappings Generated for Avro, JSON, and Parquet Files

To operationalize a mapping generated for an Avro, JSON, or Parquet file during publication to the data lake, you must run queries to modify the Read transformations within the mapping. You must also run queries to modify the Lookup transformations within a mapplet that looks up data in the file.
When you open a mapping or mapplet generated for an Avro, JSON, or Parquet file in the Developer tool, you see the syntax for two queries in the Description field for Read or Lookup transformations within the mapping or mapplet. You use the Hive CLI to run the first query, which creates an external table in Hive for the mapping. You then run the second query in the Developer tool to update the mapping with the external table that you created in Hive.
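The exact query text is generated during publication and varies with the file format and schema. As an illustration only, the two queries for a JSON file might look similar to the following sketch; the column list and the SCHEMA_NAME, TABLE_NAME, and LOCATION placeholders shown here are hypothetical, not values generated by the product:

    -- Step 2 query (hypothetical sketch): create an external Hive table over the file.
    -- A JSON file can use the JsonSerDe that ships with Hive HCatalog; an Avro or
    -- Parquet file would instead be declared STORED AS AVRO or STORED AS PARQUET.
    CREATE EXTERNAL TABLE SCHEMA_NAME.TABLE_NAME (
      id     INT,
      name   STRING,
      amount DOUBLE
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE
    LOCATION 'LOCATION';

    -- Step 3 query (hypothetical sketch): the custom query that points the Read or
    -- Lookup transformation at the external table.
    SELECT id, name, amount FROM SCHEMA_NAME.TABLE_NAME;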
The following image shows a mapping generated for a JSON file selected in the Developer tool. The instructions to follow to operationalize the mapping appear in the Description field in the General tab.
Perform the following steps on each Read and Lookup transformation within a mapping or mapplet:
  1. Open a mapping or mapplet generated for an Avro, JSON, or Parquet file in the Developer tool.
  2. Select a Read transformation within the mapping, or a Lookup transformation within the mapplet.
  3. Click the General tab.
  4. Copy the file from the HDFS location referenced in Step 1 of the instructions in the Description field to a directory in the cluster.
  5. Copy the CREATE EXTERNAL TABLE query displayed for Step 2 in the Description field.
  6. Replace the variables in the query with the actual values. For a filled-in example, see the sketch that follows this procedure.
    The following table lists the variables to set:
    Variable       Description
    SCHEMA_NAME    Name of the Hive schema in the data lake in which to publish the data.
    TABLE_NAME     Name of the external table to create.
    LOCATION       Directory in the cluster to which you copied the file.
  7. Use the Hive CLI to run the query on Hive.
    The query creates the external table for the file.
  8. In the Developer tool, copy the SELECT query displayed for Step 3 in the Description field.
  9. Click the Query tab.
  10. Select Advanced, and then click Custom Query.
  11. Paste the query into the SQL Query field.
  12. Replace the variables in the query with the actual values. For a filled-in example, see the sketch that follows this procedure.
    The following table lists the variables to set:
    Variable       Description
    SCHEMA_NAME    Name of the Hive schema in the data lake in which to publish the data.
    TABLE_NAME     Name of the external table created for the file.
  13. Save the mapping.
    The query updates the mapping with the external table.
    If the instruction text is longer than 4000 characters, it is truncated in the Description field. If the text is truncated, you can copy the queries provided in Step 2 and Step 3 directly from the publication log file.
    If you publish an Avro file, you must copy the query from the publication log file.
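
The following sketch illustrates steps 6, 7, and 12 with hypothetical values substituted for the variables. The schema sales_lake, the table customer_ext, the column list, and the directory /data/lake/customer are assumptions for illustration only, not values generated by the product or taken from your environment:

    -- Step 6 (hypothetical values in place of SCHEMA_NAME, TABLE_NAME, and LOCATION):
    CREATE EXTERNAL TABLE sales_lake.customer_ext (
      id   INT,
      name STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE
    LOCATION '/data/lake/customer';

    -- Step 12 (the same hypothetical schema and table in place of SCHEMA_NAME and TABLE_NAME):
    SELECT id, name FROM sales_lake.customer_ext;

For step 7, one way to run the query is to save it to a file and execute it from the Hive CLI with a command such as hive -f create_customer_ext.hql, where the file name is likewise a hypothetical example.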