Use Case

You work in an operations group for a manufacturing company. Your team wants to process web logs from your server farms to obtain operations analytics and to identify maintenance issues.
Your back-end system collects data about server access and system load in your server farms. Your team wants to identify the operations that created the most server load in the past few weeks. You also want to store the data afterward for auditing purposes.
Before your data analysts can begin working with the data, you need to parse it. However, the logs are semi-structured. After server upgrades, the log file structure might change slightly and some of the information might take a different format. With a standard transformation, these changes could cause data loss or log processing failures.
Your initial log files have the following structure:
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:54:24 +0300] 'GET /dx-console/ HTTP/1.1' 302 -
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:54:25 +0300] 'GET /dx-console/com.informatica.b2b.dx.Main/main.jsp HTTP/1.1' 302 -
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:54:25 +0300] 'GET /dx-console/login.jsp HTTP/1.1' 200 4472
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:55:47 +0300] 'POST /dx-console/j_spring_security_check HTTP/1.1' 302 -
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat [18/Sep/2014:14:55:47 +0300] 'GET /dx-console/com.informatica.b2b.dx.Main/main.jsp HTTP/1.1' 200 167244
Following server upgrades, some log files have the following structure:
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat 96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] 'GET /dx-console/ HTTP/1.1' 302 -
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat 96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] 'GET /dx-console/com.informatica.b2b.dx.Main/main.jsp HTTP/1.1' 302 -
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat 96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] 'GET /dx-console/login.jsp HTTP/1.1' 200 4472
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat 96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] 'POST /dx-console/j_spring_security_check HTTP/1.1' 302 -
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat 96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] 'GET /dx-console/com.informatica.b2b.dx.Main/main.jsp HTTP/1.1' 200 167244
0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat 96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] 'GET /dx-console/com.informatica.b2b.dx.Main/com.informatica.b2b.dx.Main.nocache.js HTTP/1.1' 200 7955 0
The data format varies, and some of the data has drifted to a different location.
The following image shows the data variations:
This image shows differences in the expected input format for log data. The date format differs in different input files, and some data has drifted to a different location.
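Intelligent Structure Discovery builds the model for you, so you do not write parsing code. To make the two kinds of variation concrete, the following hand-written Python sketch parses both sample layouts into the same record shape. It is an illustration under stated assumptions, not how Intelligent Structure Discovery works: the regular expression, the field names, and the choice to treat a "-" size as 0 are all hypothetical, and zone names such as WIT are kept as text rather than resolved to UTC offsets.

```python
import re
from datetime import datetime

# Hypothetical pattern for one web log record, written for the two sample
# layouts above: client, "-", session id, an optional second id (added after
# the upgrade), a bracketed timestamp, a quoted request, a status code, and a
# response size ("-" when no body was sent).
LOG_RECORD = re.compile(
    r"(?P<client>\S+) - (?P<session>\S+)(?: (?P<extra_id>[0-9A-F]+))? "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"'(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^']+)' "
    r"(?P<status>\d{3}) (?P<size>\d+|-)"
)

# Each timestamp layout: a regex that splits off the zone text, plus the
# strptime format for the remaining local date and time.
TIMESTAMP_LAYOUTS = [
    # Original layout, e.g. 18/Sep/2014:14:54:24 +0300
    (re.compile(r"^(.+) ([+-]\d{4})$"), "%d/%b/%Y:%H:%M:%S"),
    # Post-upgrade layout, e.g. 4-Apr-2014 05:14:24WIT
    (re.compile(r"^(.+\d)([A-Za-z]+)$"), "%d-%b-%Y %H:%M:%S"),
]


def parse_timestamp(text):
    """Return (naive local datetime, zone text) for either timestamp layout."""
    for pattern, layout in TIMESTAMP_LAYOUTS:
        match = pattern.match(text)
        if match:
            try:
                return datetime.strptime(match.group(1), layout), match.group(2)
            except ValueError:
                continue  # Looked similar but did not parse; try the next layout.
    raise ValueError(f"unrecognized timestamp: {text!r}")


def parse_records(text):
    """Extract every record from a run of concatenated log entries."""
    records = []
    for match in LOG_RECORD.finditer(text):
        local_time, zone = parse_timestamp(match.group("timestamp"))
        size = match.group("size")
        records.append({
            "client": match.group("client"),
            "session": match.group("session"),
            "time": local_time,
            "zone": zone,  # Kept as text; offsets for names like WIT are not resolved.
            "operation": f"{match.group('method')} {match.group('path')}",
            "status": int(match.group("status")),
            # Assumption: "-" (no body) counts as a response size of 0.
            "response_size": 0 if size == "-" else int(size),
        })
    return records


ORIGINAL_SAMPLE = (
    "0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat "
    "[18/Sep/2014:14:54:24 +0300] 'GET /dx-console/ HTTP/1.1' 302 - "
    "0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat "
    "[18/Sep/2014:14:54:25 +0300] 'GET /dx-console/login.jsp HTTP/1.1' 200 4472"
)
UPGRADED_SAMPLE = (
    "0:0:0:0:0:0:0:1 - 96E8FA61FD3B02CA0349ACAF0C5152EC.dxTomcat "
    "96E8FA61FD3B02CA0349 [4-Apr-2014 05:14:24WIT] "
    "'GET /dx-console/login.jsp HTTP/1.1' 200 4472"
)

if __name__ == "__main__":
    for record in parse_records(ORIGINAL_SAMPLE) + parse_records(UPGRADED_SAMPLE):
        print(record["time"], record["zone"], record["operation"], record["response_size"])
```

Even this small sketch needs one branch per timestamp layout and an optional group for the drifted identifier, and every further upgrade would mean another hand edit. That maintenance burden is what the generated model is meant to avoid.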
Instead of manually creating individual transformations, your team can generate an intelligent structure model to determine the relevant data sets. You create an intelligent structure in Intelligent Structure Discovery and automatically identify the structure of the data.
The following image shows the intelligent structure that you create:
This image shows the intelligent structure that you generate from a web log input file.
When you examine the data, you realize that the final element in the model, number, actually represents the server response size, or system load. You change the element name to responseSize.
The following image shows the updated intelligent structure:
This image shows the intelligent structure after you rename the number node to responseSize. You can see the intelligent structure in the Visual Model tab and the output data that relates to each node in the Table tab.
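With responseSize identified, the team's original question, which operations created the most server load, reduces to summing responseSize per operation. The following Python sketch shows that aggregation on values copied from the first sample log. It assumes the records have already been parsed into hypothetical (operation, responseSize) pairs with "-" sizes mapped to 0; in practice a downstream transformation in the mapping would do this work.

```python
from collections import defaultdict


def top_operations_by_load(records, n=3):
    """Sum responseSize per operation and return the n heaviest operations.

    records: iterable of (operation, response_size) pairs, where response_size
    is already an int (a "-" size is assumed to have been mapped to 0).
    Ties are broken alphabetically so the result is deterministic.
    """
    totals = defaultdict(int)
    for operation, response_size in records:
        totals[operation] += response_size
    # Largest total first; equal totals ordered by operation name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


# Values taken from the first sample log; "-" sizes are shown as 0.
SAMPLE = [
    ("GET /dx-console/", 0),
    ("GET /dx-console/com.informatica.b2b.dx.Main/main.jsp", 0),
    ("GET /dx-console/login.jsp", 4472),
    ("POST /dx-console/j_spring_security_check", 0),
    ("GET /dx-console/com.informatica.b2b.dx.Main/main.jsp", 167244),
]

if __name__ == "__main__":
    for operation, total in top_operations_by_load(SAMPLE, n=2):
        print(f"{total:>8}  {operation}")
```

On this sample, the main.jsp request accounts for 167244 bytes and login.jsp for 4472, so main.jsp ranks first.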
After you save the intelligent structure as an intelligent structure model, you create a Structure Parser transformation and assign the model to it. You can add the transformation to a Data Integration mapping with a source, a target, and other transformations. After the mapping fetches data from a source connection, such as Amazon S3 input buckets, the Structure Parser transformation processes the data with the intelligent structure model. The transformation passes the web log data to downstream transformations for further processing, and then to a target, such as Amazon S3 output buckets.


Updated August 03, 2020