Selecting a sample file

When you select a file on which to base an intelligent structure, choose a file that closely resembles the files used in production. The file should contain each type of data that you want to parse, in the same format as the production data.
Use a simplified sample file to generate the model. For example, if the input data has tables, provide a table with just a few sample rows rather than many rows of data. If you use a JSON input file that contains repeating groups of data, limit the number of repetitions.
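One way to produce such a simplified JSON sample is to trim every repeating group before you hand the file to the model. The following is a minimal sketch, not part of the product: the `trim_repeats` helper, the `max_items` parameter, and the `orders` data are hypothetical names chosen for illustration.

```python
import json

def trim_repeats(node, max_items=3):
    """Recursively keep at most max_items elements of every JSON array."""
    if isinstance(node, list):
        return [trim_repeats(item, max_items) for item in node[:max_items]]
    if isinstance(node, dict):
        return {key: trim_repeats(value, max_items) for key, value in node.items()}
    return node  # scalars are returned unchanged

# Hypothetical production-sized input: 100 orders with 10 line numbers each.
production = {"orders": [{"id": i, "lines": list(range(10))} for i in range(100)]}

# Keep only two repetitions of each group for the sample file.
sample = trim_repeats(production, max_items=2)
print(json.dumps(sample))
# prints {"orders": [{"id": 0, "lines": [0, 1]}, {"id": 1, "lines": [0, 1]}]}
```

The trimmed sample keeps every field and nesting level of the original, so the structure is still represented while the number of repetitions stays small.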
If the intelligent structure does not match the input file that you plan to use, or only partially matches it, there might be a large amount of unidentified data and data loss. However, some variations, such as differences in date format, are still parsed.
If your production data varies from the original file used to create the intelligent structure model, the model might still be able to capture the data, depending on the type of variation. For example, you might create a model from a sample that uses a date format such as 18/Sep/2014, while your production data uses a different format, such as 4-Apr-2014. The model still recognizes and parses the date.
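To see why two such layouts can resolve to the same kind of value, consider the following sketch. It only illustrates the general idea of trying several date layouts in turn; it does not describe how the intelligent structure model works internally. The `CANDIDATE_FORMATS` list and the `parse_date` helper are hypothetical, and the month abbreviations assume an English locale.

```python
from datetime import datetime

# Hypothetical list of accepted layouts; extend it for your own data.
CANDIDATE_FORMATS = ("%d/%b/%Y", "%d-%b-%Y")

def parse_date(text):
    """Return a date for the first layout that matches, or None if none does."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next layout
    return None

sample_value = parse_date("18/Sep/2014")      # layout used in the sample file
production_value = parse_date("4-Apr-2014")   # layout used in production data
print(sample_value, production_value)
```

Both strings yield a date value even though their separators and day padding differ.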
The model can also address data drift in certain cases, as in the following example. In this example, the sample data used to create the model contains the following text:
96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:54:24 +0300] 'GET /dx-console/ HTTP/1.1' 302 -
96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:54:25 +0300] 'GET /dx-console/com.informatica.b2b.dx.Main/main.jsp HTTP/1.1' 302 -
96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:54:25 +0300] 'GET /dx-console/login.jsp HTTP/1.1' 200 4472
The data that you parse with the model contains the following text:
96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat this_is_new_version_data [4-Apr-2014 05:14:24WIT] 'GET /dx-console/ HTTP/1.1' 302 -
96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat this_is_new_version_data [4-Apr-2014 05:14:24WIT] 'GET /dx-console/com.informatica.b2b.dx.Main/main.jsp HTTP/1.1' 302 -
96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat this_is_new_version_data [4-Apr-2014 05:14:24WIT] 'GET /dx-console/login.jsp HTTP/1.1' 200 4472
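Some of the data has drifted: a new field, this_is_new_version_data, appears before the timestamp, so the timestamp and the fields that follow it are in a different position relative to the other data. The timestamp also uses a different format. However, the model can parse this variation.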
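The effect of this kind of drift can be illustrated with a small parser that locates fields by their delimiters rather than by their position in the line. This is only a sketch for comparison and makes no claim about the model's actual algorithm; the `LINE` pattern, the group names, and the `parse_line` helper are hypothetical.

```python
import re

# Fields are found by their delimiters ([...] and '...'), so extra tokens
# inserted before the timestamp do not break the match.
LINE = re.compile(
    r"^(?P<session>\S+)"             # session identifier, always first
    r"(?P<extra>(?:\s+[^\[\s]+)*)"   # zero or more drifted tokens
    r"\s+\[(?P<timestamp>[^\]]+)\]"  # timestamp in square brackets
    r"\s+'(?P<request>[^']*)'"       # request in single quotes
    r"\s+(?P<status>\d{3})"          # HTTP status code
    r"\s+(?P<size>\d+|-)$"           # response size, or "-" when absent
)

def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["extra"] = fields["extra"].split()  # list of drifted tokens
    return fields

original = ("96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat "
            "[18/Sep/2014:14:54:24 +0300] 'GET /dx-console/ HTTP/1.1' 302 -")
drifted = ("96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat this_is_new_version_data "
           "[4-Apr-2014 05:14:24WIT] 'GET /dx-console/ HTTP/1.1' 302 -")

for line in (original, drifted):
    print(parse_line(line))
```

Both lines produce the same set of fields. In the drifted line, the added token is collected in `extra` and the timestamp text is captured as is, regardless of its format.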


Updated August 03, 2020