Profile Guide

Creating a Complex File Data Object from an Avro or Parquet Data Source

You can create a complex file data object from an Avro or Parquet data source with File or Connection as the access type. You can create a column profile on the data object.
  1. In the Object Explorer view, select a project.
  2. Click File > New > Data Object.
    The New dialog box appears.
  3. Select Physical Data Objects > Complex File Data Object and click Next.
    The New Complex File Data Object dialog box appears.
  4. Enter a name for the data object.
  5. Choose Connection or File as the access type.
    • If you choose Connection as the Access Type, perform the following steps:
      1. Click Browse to choose an HDFS connection.
      2. In the Choose Connection dialog box, choose a data source, and click OK.
      3. In the New Complex File Data Object dialog box, click Finish.
        The data object appears in the project folder.
    • If you choose File as the Access Type and Binary as the Resource Format, perform the following steps:
      1. Click Browse to choose an Avro or Parquet file on the local machine.
      2. In the New Complex File Data Object dialog box, click Finish.
        The data object appears in the project folder.
      3. Select the data object in the project folder and click the Data Object Operations view.
      4. In the Data Object Operations view, click the Read > Advanced tab.
      5. In the Advanced tab, enter the file path of the data source on the Linux or Windows machine in the File path field.
      6. Enter the File Format as Custom Input.
      7. In the Input Format field, enter com.informatica.avro.AvroToXML for Avro data sources or com.informatica.parquet.ParquetToXML for Parquet data sources. When you add the input format, the Data Processor transformation converts the Avro or Parquet data source to XML format at run time.
    • If you choose File as the Access Type and Avro or Parquet as the Resource Format, perform the following steps:
      1. Click Browse to choose an Avro or Parquet file on the local machine.
      2. In the New Complex File Data Object dialog box, click Finish.
        The data object appears in the project folder.
      3. After you create the data object, navigate to the Data Object Operations > Read > Advanced tab, and verify that the file path in the File path field corresponds to the data source on the Linux or Windows machine.
      You can choose the Resource Format as Avro or Parquet only for flat-structured Avro and Parquet data sources.
      You can choose a folder with multiple Avro or multiple Parquet files to create a data object. After you create the data object, navigate to the Data Object Operations > Read > Advanced tab, and verify that the file path in the File path field points to the folder of the data sources on the Linux or Windows machine.
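
Before you browse to a file or folder, it can help to confirm which format the data source actually uses. The following Python sketch is not part of the Informatica tooling. It assumes that each file starts with the standard magic bytes of its format (Obj followed by byte 1 for an Avro container file, PAR1 for an unencrypted Parquet file). The helper names detect_format and folder_format are hypothetical. The two Input Format class names are the ones listed in the Binary procedure above.

```python
from pathlib import Path

# Leading magic bytes from the Avro and Parquet file format specifications.
AVRO_MAGIC = b"Obj\x01"
PARQUET_MAGIC = b"PAR1"

# Input Format values quoted from the Binary resource format procedure.
INPUT_FORMAT_CLASS = {
    "avro": "com.informatica.avro.AvroToXML",
    "parquet": "com.informatica.parquet.ParquetToXML",
}


def detect_format(path):
    """Return "avro", "parquet", or None based on the first four bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head == AVRO_MAGIC:
        return "avro"
    if head == PARQUET_MAGIC:
        return "parquet"
    return None


def folder_format(folder):
    """Return the one format shared by every file in the folder.

    Raises ValueError if the folder is empty, mixes formats, or contains
    a file that is neither Avro nor Parquet.
    """
    formats = {detect_format(p) for p in Path(folder).iterdir() if p.is_file()}
    if len(formats) != 1 or None in formats:
        raise ValueError(f"expected only Avro or only Parquet files, found: {formats}")
    return formats.pop()
```

For example, INPUT_FORMAT_CLASS[detect_format(path)] gives the value to enter in the Input Format field for a single file, and folder_format(folder) checks that a folder contains only Avro or only Parquet files. The folder check is deliberately strict, so any other file in the folder raises an error.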
