Configure a Data Processor transformation Parser in the IntelliScript editor. To create mapping statements, first define marker anchors and content anchors for each data value in the PDF sample file. Then define data holders that identify the XML hierarchy element that is associated with each unstructured data element.
To open the IntelliScript editor, click the Script object that you want to edit. When you create a Script with a Parser, the IntelliScript editor displays it automatically.
In the following image, the IntelliScript editor displays the parser:
To preview the example source in text, perform the following steps:
Next to the example_source property, double-click the equals sign and select LocalFile.
Expand the example_source property, then click the double-right arrows.
Next to the pre_processor property, double-click the equals sign and select PDFToTxt_4.
Next to the format property, double-click the equals sign and select TextFormat.
In the following image, the Data Viewer view displays the PDF in text format:
To define a marker anchor that identifies where to find the company name, perform the following steps:
In the Data Viewer view, find and select the text Company Name that marks the location of the name.
Right-click, and then select Insert Marker.
In the Data Viewer view, the text Company Name is highlighted in yellow to identify a marker anchor. In the following image, the Data Viewer view displays the highlighted Marker element in the Input panel:
The IntelliScript editor also adds a Marker element. In the following image, the IntelliScript editor displays the Marker element:
To define a content anchor that shows where the parser reads the company name, perform the following steps:
In the Data Viewer view, find and select the text Container Shipping Industries that marks the text to parse.
Right-click, and then select Insert Content.
In the Data Viewer view, the text Container Shipping Industries is highlighted in red to identify a content anchor. In the following image, the Data Viewer view displays the highlighted Content element:
The IntelliScript editor also adds a Content element. In the following image, the IntelliScript editor displays the Content element:
To transform the text Container Shipping Industries into the Company_Name element in the output XML data, perform the following steps:
In the IntelliScript Editor view, find the Content anchor and the data_holder property that it contains.
Double-click the data_holder property to display the Choose Node picker.
Expand the no target namespace element and select the /Invoice/@Company_Name output node. Then click OK.
In the following image, the IntelliScript editor displays the completed Marker and Content elements:
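As an analogy for how a Marker and a Content anchor work together, the following Python sketch finds the marker text and then reads the value that follows it. It is not how the Data Processor engine is implemented. The sample text layout, the function name read_after_marker, and the choice to read up to the end of the line are assumptions made for illustration.

```python
# Hypothetical excerpt of the PDF text; the real layout may differ.
SAMPLE = "Company Name: Container Shipping Industries\nINVOICE NUMBER: 536524\n"


def read_after_marker(text, marker):
    """Find the marker text, then return the content that follows it on the same line."""
    start = text.find(marker)
    if start == -1:
        return None  # the marker does not occur in the text
    start += len(marker)
    end = text.find("\n", start)
    if end == -1:
        end = len(text)
    # Drop the separator characters between the marker and the value.
    return text[start:end].strip(" :\t")


company_name = read_after_marker(SAMPLE, "Company Name")
print(company_name)  # Container Shipping Industries
```

The marker only locates the position. The content step decides how much text to read from there, which is why the two anchors are defined separately.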
To define a marker anchor that identifies where to find the invoice number value, perform the following steps:
In the Data Viewer view, find and select the text INVOICE NUMBER that marks the location of the value.
Right-click, and then select Insert Marker.
To define a content anchor that shows where the parser reads the value of the invoice number, perform the following steps:
In the Data Viewer view, find and select the text 536524 that marks the text to parse.
Right-click, and then select Insert Content.
To transform the invoice number into the Invoice_No element in the output XML data, perform the following steps:
In the IntelliScript Editor view, find the Content anchor and the data_holder property that it contains.
Double-click the data_holder property to display the Choose Node picker.
Expand the no target namespace element and select the /Invoice/*s/Invoice_No output node. Then click OK.
To transform the value for the verified vendors into the Verified_Vendors element in the output XML data, perform the following steps:
In the Data Viewer view, find and define the Verified Vendors text as a Marker anchor.
Find and select the text 9 and define the text as a Content anchor.
In the IntelliScript Editor view, double-click the data_holder property. In the Choose Node picker, expand the nodes to select the /Invoice/@Verified_Vendors element. Then click OK.
To transform the total amount of the checks into the Total_Checks element in the output XML data, perform the following steps:
In the Data Viewer view, find and define the Total Amount of Checks text as a Marker anchor.
Find and select the text 10998.68 and define the text as a Content anchor.
In the IntelliScript Editor view, find the Content anchor and change the closing_marker to NewlineSearch, in case the amount is longer than in the example source.
Double-click the data_holder property. In the Choose Node picker, expand the nodes to select the /Invoice/@Total_Checks element. Then click OK.
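The reason for switching the closing marker to NewlineSearch can be illustrated with a small Python sketch. It contrasts reading a fixed number of characters with reading up to the next newline. The function names, the one-line sample layout, and the larger total are hypothetical. Only the value 10998.68 comes from the example source.

```python
def read_fixed(text, marker, length):
    """Read exactly `length` characters after the marker (fixed-width content)."""
    start = text.index(marker) + len(marker)
    return text[start:start + length]


def read_to_newline(text, marker):
    """Read from the end of the marker up to the next newline, whatever the length."""
    start = text.index(marker) + len(marker)
    end = text.find("\n", start)
    return text[start:] if end == -1 else text[start:end]


marker = "Total Amount of Checks "
example = "Total Amount of Checks 10998.68\n"
longer = "Total Amount of Checks 1210998.68\n"  # hypothetical larger total

width = len("10998.68")  # width of the value selected in the example source

print(read_fixed(example, marker, width))  # 10998.68
print(read_fixed(longer, marker, width))   # 1210998. (truncated)
print(read_to_newline(longer, marker))     # 1210998.68
```

A fixed width silently truncates any value that is longer than the one in the sample, while a newline-terminated read adapts to the actual length.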
To transform the table of check-related data, add a group to hold a logical set of statements and a repeating group to process each line of the table. Perform the following steps:
In the IntelliScript Editor view, double-click the last heavy double-arrows under the parser element and select Group.
In the Data Viewer view, find and define the Check No. text as a Marker anchor.
In the IntelliScript Editor view, double-click the heavy double-arrows under the contains dividing line and select RepeatingGroup.
Expand the RepeatingGroup element and change the value for separator to Marker.
Expand the separator element and change the value for search to NewlineSearch.
To parse the check number for each line of the table, create a content anchor for that value. Double-click the heavy double-arrows under the RepeatingGroup element and select Content.
To assign the check number to the CheckNo element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/CheckNo.
Because the data holder type is a number, the parser takes the first number in each line as the check number.
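The behavior of a repeating group with a newline separator can be approximated in Python as a loop over lines, with the first number of each line taken as the value. This is a rough analogy under stated assumptions, not the parser's actual algorithm. The table rows below are hypothetical apart from the vendor name and date that appear in the example source.

```python
import re

# Hypothetical table rows; check numbers, the second vendor, and amounts are made up.
TABLE = (
    "1001  Skipper Industries  March 20, 2014  $250.00\n"
    "1002  Harbor Supply Co.   April 2, 2014   $1,075.50\n"
)


def first_number_per_line(block):
    """Treat each line as one iteration of the group and take its first integer."""
    values = []
    for line in block.splitlines():
        match = re.search(r"\d+", line)
        if match:
            values.append(int(match.group()))
    return values


check_numbers = first_number_per_line(TABLE)
print(check_numbers)  # [1001, 1002]
```

Each iteration starts fresh at the beginning of a line, so later numbers on the same line, such as the day of the month or the amount, are not picked up by this first anchor.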
To parse the vendor name for each line of the order, create a content anchor for that value. Double-click the heavy double-arrows under the previous element and select Content.
In the Data Viewer view, find and define the text Skipper Industries as an Offset Content anchor.
Since some names might be longer than the first vendor name, change the closing marker offset amount. In the IntelliScript Editor view, expand the closing_marker element and change the offset amount from 18 to 50.
To assign the vendor name to the Vendor_Name element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/Vendor_Name.
To parse the check date for each line of the order, create a content anchor for that value. Double-click the heavy double-arrows under the previous element and select Content.
In the Data Viewer view, find and define the text March 20, 2014 as an Offset Content anchor.
Because some dates might be longer than this date, change the closing marker offset amount. In the IntelliScript Editor view, expand the closing_marker element and change the offset amount from 14 to 26.
To assign the date to the Check_Date element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/Check_Date.
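The default offsets match the lengths of the selected sample text: Skipper Industries is 18 characters long and March 20, 2014 is 14 characters long. The following Python sketch shows why a wider offset is safer for longer values. It assumes a fixed-width row in which each column is padded with spaces, which may not match the real PDF text. The function read_offset, the column positions, and the second row are hypothetical.

```python
def read_offset(line, start, width):
    """Return `width` characters beginning at column `start`, with padding stripped."""
    return line[start:start + width].strip()


# Hypothetical fixed-width rows: check number in columns 0-7, vendor name in the next 50.
row_short = "1001".ljust(8) + "Skipper Industries".ljust(50) + "March 20, 2014"
row_long = "1002".ljust(8) + "Intercontinental Freight Forwarding Ltd".ljust(50) + "September 30, 2014"

print(len("Skipper Industries"))       # 18
print(read_offset(row_short, 8, 18))   # Skipper Industries
print(read_offset(row_long, 8, 18))    # Intercontinental F (cut off)
print(read_offset(row_long, 8, 50))    # Intercontinental Freight Forwarding Ltd
```

An offset sized to the first sample value cuts off anything longer, so the offset should cover the widest value that the column can hold.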
To parse the check value for each line of the order, create a content anchor for that value. Double-click the heavy double-arrows under the previous element and select Content.
Expand the Content element and change the value for phase to final.
Each check value is preceded by a dollar sign. The parser can use the dollar sign to find the check value. Change the value of the opening_marker element to TextSearch. Change the value of text to $.
To assign the check value to the Check_Value element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/Check_Value.
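Using a literal text search as the opening marker can be sketched in Python as follows. The sketch looks for the dollar sign and returns the token that comes right after it. The function name read_after_text is hypothetical, and the check number and amount in the sample line are made-up values.

```python
def read_after_text(line, opening_text):
    """Return the first whitespace-delimited token that follows opening_text, or None."""
    start = line.find(opening_text)
    if start == -1:
        return None  # the opening text does not occur in this line
    tokens = line[start + len(opening_text):].split()
    return tokens[0] if tokens else None


# Hypothetical table line; the check number and the amount are sample values.
line = "1001  Skipper Industries  March 20, 2014  $250.00"
check_value = read_after_text(line, "$")
print(check_value)  # 250.00
```

Searching for the dollar sign avoids depending on column positions, which vary with the length of the vendor name and the date.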
In the Data Viewer view, the repeating group text is highlighted in red to identify the content anchors. In the following image, the Data Viewer view displays the highlighted content anchor elements:
In the following image, the IntelliScript editor displays the entire Group element and all the sub-elements:
Collapse the Group element.
To transform the subtotal to the Sub_Total element in the XML output, perform the following steps:
In the Data Viewer view, find and define the SUBTOTAL text as a Marker anchor.
In the IntelliScript Editor view, find the Marker anchor element under the RepeatingGroup element. Because this element does not repeat, remove it from the repeating group. To remove it, click and drag the Marker anchor element to the main double arrows.
In the Data Viewer view, find and select the text 10381.56 and define the text as a Content anchor.
In the IntelliScript Editor view, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Sub_Total.
To transform the tax to the Tax element in the XML output, perform the following steps:
In the Data Viewer view, find and define the TAX text as a Marker anchor.
Find and select the text 717.12 and define the text as a Content anchor.
In the IntelliScript Editor view, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Tax.
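To picture the output that the data holders describe, the following Python sketch builds an XML document from the values in the example source. It assumes that an @ in a node path denotes an attribute and that *s denotes a sequence of child elements. The actual schema and element order may differ. The Order element here contains only the two values that the example text states.

```python
import xml.etree.ElementTree as ET

# Assumption: paths such as /Invoice/@Company_Name map to attributes of Invoice.
invoice = ET.Element("Invoice", {
    "Company_Name": "Container Shipping Industries",
    "Verified_Vendors": "9",
    "Total_Checks": "10998.68",
})

# Assumption: paths such as /Invoice/*s/Invoice_No map to child elements of Invoice.
ET.SubElement(invoice, "Invoice_No").text = "536524"

# One Order element per table line; CheckNo and Check_Value are omitted because
# the example text does not state their values.
order = ET.SubElement(invoice, "Order")
ET.SubElement(order, "Vendor_Name").text = "Skipper Industries"
ET.SubElement(order, "Check_Date").text = "March 20, 2014"

ET.SubElement(invoice, "Sub_Total").text = "10381.56"
ET.SubElement(invoice, "Tax").text = "717.12"

xml_text = ET.tostring(invoice, encoding="unicode")
print(xml_text)
```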
To save the transformation, in the Developer tool select the transformation, then click