JupyterLab Extension for INFACore

Configure the structure parser function

INFACore parses unstructured or semi-structured data using the Intelligent Structure Discovery (ISD) JAR files, which are bundled with the INFACore installation.
To parse data, select the Parse Unstructured Data function, and specify the following fields:
  • New DataFrame Name: Specify a name for the new DataFrame. A DataFrame is a two-dimensional data structure in which data is arranged in rows and columns.
  • Schema file path: Specify the file path to the sample schema file.
  • Input file path: Specify the path to the source file that contains the unstructured data.
Example
The json_input.json file contains the unstructured data in JSON format that you want to parse:
[Image: The input file contains data in unstructured format.]
Provide the path to the sample schema file, sample_schema.txt, that you want INFACore to refer to when it parses the unstructured data:
[Image: The sample schema file.]
The following sample Python code appears when you apply the parser function with the input file and the sample schema file:
import informatica.infacore as ic
pf = ic.ParserFunctions()
parser_data = pf.parse_unstructured_data("C:\\Users\\John\\Documents\\FF_SOURCES\\json_input.json", "C:\\Users\\John\\Documents\\FF_SOURCES\\sample_schema.txt")
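The doubled backslashes in the sample paths are ordinary Python string escapes. As a minimal sketch that uses only the standard library, the same two path strings can be built with a raw string and pathlib. The variable names base, input_path, and schema_path are illustrative and are not part of the INFACore SDK; the directory and file names are the ones from the sample.

```python
from pathlib import PureWindowsPath

# Directory from the sample code; the raw string avoids doubling each backslash.
base = PureWindowsPath(r"C:\Users\John\Documents\FF_SOURCES")

# Illustrative variable names (not part of the INFACore SDK).
input_path = str(base / "json_input.json")
schema_path = str(base / "sample_schema.txt")

# Both values are identical to the escaped string literals in the sample.
print(input_path)
print(schema_path)
```

Because the resulting strings are character-for-character the same as the escaped literals, they can be passed to the parser call in the same argument positions as in the sample.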
To apply pandas functions, use the Python SDK to convert the INFACore DataFrame to a pandas DataFrame, and then return the first rows:
df_reader = ic.DataFrameReader(parser_data)
p_df = df_reader.to_pandas()
p_df.head()
For more information, see the INFACore SDK Reference for Python.
When you run the code, the structure parser function returns data in a structured format:
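State | Account Length | Area Code | Phone    | Int'l Plan | VMail Plan | VMail Message | token | Mins      | Calls | Charge   | CustServ Calls | Churn
------|----------------|-----------|----------|------------|------------|---------------|-------|-----------|-------|----------|----------------|-------
PA    | 163            | 806       | 403-2562 | no         | yes        | 300           | Day   | 8.162204  | 3     | 7.579174 | 3              | True.
PA    | 163            | 806       | 403-2562 | no         | yes        | 300           | Eve   | 3.933035  | 4     | 6.508639 | 3              | True.
PA    | 163            | 806       | 403-2562 | no         | yes        | 300           | Night | 4.065759  | 100   | 5.111624 | 3              | True.
PA    | 163            | 806       | 403-2562 | no         | yes        | 300           | Intl  | 4.92816   | 6     | 5.673203 | 3              | True.
SC    | 15             | 836       | 158-8416 | yes        | no         | 0             | Day   | 10.018993 | 4     | 4.226289 | 8              | False.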
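The parsed output has one row per customer and token value (Day, Eve, Night, Intl), so a common next step is to aggregate or reshape it with pandas. The following is a minimal sketch, not INFACore output: it rebuilds a subset of the sample rows by hand so that it runs without the SDK, and the names rows, totals, and mins_wide are illustrative. In practice, p_df would come from df_reader.to_pandas().

```python
import pandas as pd

# Subset of the sample output columns, copied by hand from the rows above.
rows = [
    ("PA", "403-2562", "Day", 8.162204, 3, 7.579174),
    ("PA", "403-2562", "Eve", 3.933035, 4, 6.508639),
    ("PA", "403-2562", "Night", 4.065759, 100, 5.111624),
    ("PA", "403-2562", "Intl", 4.92816, 6, 5.673203),
    ("SC", "158-8416", "Day", 10.018993, 4, 4.226289),
]
p_df = pd.DataFrame(
    rows, columns=["State", "Phone", "token", "Mins", "Calls", "Charge"]
)

# Total minutes, calls, and charges per customer across all token values.
totals = p_df.groupby(["State", "Phone"], as_index=False)[
    ["Mins", "Calls", "Charge"]
].sum()

# One row per customer and one column per token value; combinations that
# are missing from the sample rows become NaN.
mins_wide = p_df.pivot(index="Phone", columns="token", values="Mins")

print(totals)
print(mins_wide)
```

The pivot assumes that each Phone and token combination appears at most once, which holds for the sample rows; if the real data contains duplicates, pivot_table with an explicit aggregation function is the safer choice.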