Data Transformation Getting Started Guide

Step 3. Configure the Parser

Configure a Data Processor transformation Parser in the IntelliScript editor. To create mapping statements, first define marker anchors and content anchors for each data value in the PDF sample file. Then define data holders that identify the XML hierarchy element that is associated with each unstructured data element.
  1. To open the IntelliScript editor, click the Script object that you want to edit. When you create a Script with a Parser, the IntelliScript editor displays it automatically.
    In the following image, the IntelliScript editor displays the Parser:
    "The Parser uses mapping statements to define how to transform input data into output data. Use the IntelliScript editor to define mapping statements in the Parser based on an example source that has the same structure as the data that you expect to transform."
  2. To preview the example source in text, perform the following steps:
    1. Next to the example_source property, double-click the equals sign and select LocalFile.
    2. Expand the example_source property, then click the double-right arrows.
    3. Next to the pre_processor property, double-click the equals sign and select PDFToTxt_4.
    4. Next to the format property, double-click the equals sign and select TextFormat.
      In the following image, the Data Viewer view displays the PDF in text format:
      "The Input panel in the Data Viewer view displays the input text from the example file. If the input is in PDF format, it appears in a binary format that is not readable. After you assign a pre-processor to the Parser, the transformation pre-processes the example source so that it appears as text in the Input panel of the Data Viewer."
  3. To define a marker anchor that identifies where to find the company name, perform the following steps:
    1. In the Data Viewer view, find and select the text Company Name that marks the location of the name.
    2. Right-click, and then select Insert Marker.
      In the Data Viewer view, the text Company Name is highlighted in yellow to identify a marker anchor. In the following image, the Data Viewer view displays the highlighted Marker element in the Input panel:
      "The Input panel in the Data Viewer view displays the input text from the example file. Use this view to define marker anchors and content anchors that define where the Parser looks for data. To signify to the Parser that you want to transform specific data, highlight the text in the Input panel and assign an anchor to it. To identify where to look for data, you assign a marker anchor."
      The IntelliScript editor also adds a Marker element. In the following image, the IntelliScript editor displays the Marker element:
      "The IntelliScript editor displays the mapping statement that the Developer tool creates when you assign a Marker anchor to text in the example source. When the transformation processes input, it applies this statement to the input file."
  4. To define a content anchor that shows where the parser reads the company name, perform the following steps:
    1. In the Data Viewer view, find and select the text Container Shipping Industries that marks the text to parse.
    2. Right-click, and then select Insert Content.
      In the Data Viewer view, the text Container Shipping Industries is highlighted in red to identify a content anchor. In the following image, the Data Viewer view displays the highlighted Content element:
      The IntelliScript editor also adds a Content element. In the following image, the IntelliScript editor displays the Content element:
  5. To transform the text Container Shipping Industries into the Company_Name element in the output XML data, perform the following steps:
    1. In the IntelliScript Editor view, find the Content anchor and the data_holder property that it contains.
    2. Double-click the data_holder property to display the Choose Node picker.
    3. Expand the no target namespace element and select the /Invoice/@Company_Name output node. Then, click OK.
      In the following image, the IntelliScript editor displays the completed Marker and Content elements:
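
As a rough illustration of what the Marker, Content, and data_holder settings from steps 3 through 5 express, the following Python sketch finds the marker text, reads the text that follows it, and stores the result in an output attribute. It is not IntelliScript and does not call any Informatica API. The sample text layout and the helper name `read_after_marker` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical text layout; the real output of the PDFToTxt_4 pre-processor may differ.
SAMPLE = "Company Name: Container Shipping Industries\nINVOICE NUMBER: 536524\n"


def read_after_marker(text, marker):
    """Return the text that follows `marker` up to the end of its line, or None."""
    pos = text.find(marker)              # marker anchor: where to start looking
    if pos == -1:
        return None
    begin = pos + len(marker)
    end = text.find("\n", begin)         # content ends at the end of the line
    if end == -1:
        end = len(text)
    return text[begin:end].strip(" :\t")  # drop the separator and padding


invoice = ET.Element("Invoice")
company = read_after_marker(SAMPLE, "Company Name")
# Mirrors the data holder /Invoice/@Company_Name: the value becomes an attribute.
invoice.set("Company_Name", company)
print(ET.tostring(invoice, encoding="unicode"))
```

The marker only fixes a position in the input; the content step is what captures a value, and the data holder decides where that value lands in the output.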
  6. To define a marker anchor that identifies where to find the invoice number value, perform the following steps:
    1. In the Data Viewer view, find and select the text INVOICE NUMBER that marks the location of the value.
    2. Right-click, and then select Insert Marker.
  7. To define a content anchor that shows where the parser reads the value of the invoice number, perform the following steps:
    1. In the Data Viewer view, find and select the text 536524 that marks the text to parse.
    2. Right-click, and then select Insert Content.
  8. To transform the invoice number into the Invoice_No element in the output XML data, perform the following steps:
    1. In the IntelliScript Editor view, find the Content anchor and the data_holder property that it contains.
    2. Double-click the data_holder property to display the Choose Node picker.
    3. Expand the no target namespace element and select the /Invoice/*s/Invoice_No output node. Then, click OK.
  9. To transform the value for the verified vendors into the Verified_Vendors element in the output XML data, perform the following steps:
    1. In the Data Viewer view, find and define the Verified Vendors text as a Marker anchor.
    2. Find and select the text 9 and define the text as a Content anchor.
    3. In the IntelliScript Editor view, double-click the data_holder property. In the Choose Node picker, expand the nodes to select the /Invoice/@Verified_Vendors element. Then click OK.
  10. To transform the value for the total amount of checks into the Total_Checks element in the output XML data, perform the following steps:
    1. In the Data Viewer view, find and define the Total Amount of Checks text as a Marker anchor.
    2. Find and select the text 10998.68 and define the text as a Content anchor.
    3. In the IntelliScript Editor view, find the Content anchor and change the closing_marker to NewlineSearch, in case the value is longer than the one in the example source.
    4. Double-click the data_holder property. In the Choose Node picker, expand the nodes to select the /Invoice/@Total_Checks element. Then click OK.
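
The reason for ending the content at the newline can be sketched in Python. The example below assumes that a content anchor created from selected example text would otherwise be limited to the length of that selection; that behavior, the text layout, and both function names are assumptions for illustration, not a description of the product's internals.

```python
MARKER = "Total Amount of Checks"


def content_fixed_width(text, marker, width):
    """Read a fixed number of characters after the marker (length of the example)."""
    begin = text.index(marker) + len(marker)
    return text[begin:begin + width].strip()


def content_to_newline(text, marker):
    """Read everything after the marker up to the end of the line."""
    begin = text.index(marker) + len(marker)
    end = text.find("\n", begin)
    if end == -1:
        end = len(text)
    return text[begin:end].strip()


# Hypothetical layouts: the example source and a later input with a longer amount.
example = "Total Amount of Checks 10998.68\n"
longer = "Total Amount of Checks 110998.68\n"

width = len(" 10998.68")  # span selected in the example source, including the space

print(content_fixed_width(example, MARKER, width))  # fits the example exactly
print(content_fixed_width(longer, MARKER, width))   # loses the last digit
print(content_to_newline(longer, MARKER))           # keeps the whole amount
```

A newline-bounded read adapts to values of any length on the line, which is the effect that the NewlineSearch setting in this step is meant to achieve.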
  11. To transform the table of check-related data, add a group to hold a logical set of statements and a repeating group to process each line of the table. Perform the following steps:
    1. In the IntelliScript Editor view, double-click the last heavy double-arrows under the parser element and select Group.
    2. In the Data Viewer view, find and define the Check No. text as a Marker anchor.
    3. In the IntelliScript Editor view, double-click the heavy double-arrows under the contains dividing line and select RepeatingGroup.
    4. Expand the RepeatingGroup element and change the value for separator to Marker.
    5. Expand the separator element and change the value for search to NewlineSearch.
    6. To parse the check number for each line of the table, create a content anchor for that value. Double-click the heavy double-arrows under the RepeatingGroup element and select Content.
    7. To assign the check number to the CheckNo element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/CheckNo.
      Because the data holder type is a number, the parser takes the first number in each line as the check number.
    8. To parse the vendor name for each line of the order, create a content anchor for that value. Double-click the heavy double-arrows under the previous element and select Content.
    9. In the Data Viewer view, find and define the text Skipper Industries as an Offset Content anchor.
    10. Because some names might be longer than the first vendor name, change the closing marker offset amount. In the IntelliScript Editor view, expand the closing_marker element and change the offset amount from 18 to 50.
    11. To assign the vendor name to the Vendor_Name element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/Vendor_Name.
    12. To parse the check date for each line of the order, create a content anchor for that value. Double-click the heavy double-arrows under the previous element and select Content.
    13. In the Data Viewer view, find and define the text March 20, 2014 as an Offset Content anchor.
    14. Because some dates might be longer than this date, change the closing marker offset amount. In the IntelliScript Editor view, expand the closing_marker element and change the offset amount from 14 to 26.
    15. To assign the date to the Check_Date element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/Check_Date.
    16. To parse the check value for each line of the order, create a content anchor for that value. Double-click the heavy double-arrows under the previous element and select Content.
    17. Expand the Content element and change the value for phase to final.
    18. Each check value is preceded by a dollar sign, which the parser can use to find the check value. Change the value of the opening_marker element to TextSearch, and then change the value of text to $.
    19. To assign the check value to the Check_Value element in the XML output, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Order/*s/Check_Value.
      In the Data Viewer view, the repeating group text is highlighted in red to identify the content anchors. In the following image, the Data Viewer view displays the highlighted content anchor elements:
      In the following image, the IntelliScript editor displays the entire Group element and all the sub-elements:
    20. Collapse the Group element.
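
The repeating group in step 11 can be pictured as a loop over the lines of the table. The Python sketch below is a simplified analogy, not IntelliScript: it splits rows on newlines, takes the first number in each row as the check number, and reads the check value after the dollar sign. Instead of the offset-based anchors used in the step, it separates the vendor name and date by runs of two or more spaces, which is a substitute technique chosen to keep the sketch short. The row layout, the check numbers, the amounts, and the second vendor are made-up placeholders.

```python
import re

# Hypothetical table rows; only "Skipper Industries" and "March 20, 2014"
# come from the example source, everything else is a placeholder.
SAMPLE_TABLE = (
    "1001  Skipper Industries  March 20, 2014  $250.00\n"
    "1002  Hypothetical Vendor With A Longer Name  September 30, 2014  $1310.40\n"
)


def parse_rows(table_text):
    """Return one dict per table row, keyed by the output element names."""
    rows = []
    for line in table_text.split("\n"):         # separator: newline between rows
        fields = re.split(r"\s{2,}", line.strip())
        first_number = re.search(r"\d+", line)  # first number in the row
        dollar = line.find("$")                 # text search for the "$" marker
        if len(fields) < 4 or first_number is None or dollar == -1:
            continue                            # skip blank or malformed rows
        rows.append({
            "CheckNo": int(first_number.group()),
            "Vendor_Name": fields[1],
            "Check_Date": fields[2],
            "Check_Value": line[dollar + 1:].strip(),
        })
    return rows


for row in parse_rows(SAMPLE_TABLE):
    print(row)
```

Each pass through the loop corresponds to one iteration of the repeating group, and each dictionary key corresponds to one data holder under /Invoice/*s/Order.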
  12. To transform the subtotal to the Sub_Total element in the XML output, perform the following steps:
    1. In the Data Viewer view, find and define the SUBTOTAL text as a Marker anchor.
    2. In the IntelliScript Editor view, find the Marker anchor element under the RepeatingGroup element. Because this element does not repeat, remove it from the repeating group. To remove it, click and drag the Marker anchor element to the main double arrows.
    3. In the Data Viewer view, find and select the text 10381.56 and define the text as a Content anchor.
    4. In the IntelliScript Editor view, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Sub_Total.
  13. To transform the tax to the Tax element in the XML output, perform the following steps:
    1. In the Data Viewer view, find and define the TAX text as a Marker anchor.
    2. Find and select the text 717.12 and define the text as a Content anchor.
    3. In the IntelliScript Editor view, expand the Content anchor and double-click the data_holder element. Expand the nodes to select /Invoice/*s/Tax.
  14. To save the transformation, in the Developer tool select the transformation, then click File > Save.
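
Taken together, the data holders selected in this step suggest an output document shaped roughly like the one built below. This Python sketch only assembles the values named in the steps into an XML tree for orientation. The element order and the reading of `*s` in the node paths as a sequence of child elements are assumptions, and CheckNo and Check_Value are omitted because the steps do not state their example values. The actual output depends on the schema that the transformation uses.

```python
import xml.etree.ElementTree as ET

# Attributes correspond to the /Invoice/@... data holders.
invoice = ET.Element("Invoice", {
    "Company_Name": "Container Shipping Industries",
    "Verified_Vendors": "9",
    "Total_Checks": "10998.68",
})

# Child elements correspond to the /Invoice/*s/... data holders (assumed order).
ET.SubElement(invoice, "Invoice_No").text = "536524"

# One Order element per table row; only the first row's known values are shown.
order = ET.SubElement(invoice, "Order")
ET.SubElement(order, "Vendor_Name").text = "Skipper Industries"
ET.SubElement(order, "Check_Date").text = "March 20, 2014"

ET.SubElement(invoice, "Sub_Total").text = "10381.56"
ET.SubElement(invoice, "Tax").text = "717.12"

xml_text = ET.tostring(invoice, encoding="unicode")
print(xml_text)
```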