Data Transformation User Guide

Parser

A Parser reads a source document in any format. You can add child components to perform transformations on the data.
Define Parsers at the global level of the Script. Set a main Parser as the startup component. Call a secondary Parser with the RunParser action. For more information, see Parser.
The properties of the Parser appear above the contains line. Below the line, you can insert child components such as anchors and actions.
The following table describes the properties of the Parser component:
Property
Description
example_source
Defines a sample source document to process in the development environment.
You can choose one of the following options:
  • Empty. The Developer tool prompts you for a source document when you run the Parser.
  • InputPort. Defines an input port.
  • LocalFile. Defines a file on the local file system.
  • Text. Defines a string.
  • URL. Defines a URL.
Default is empty.
If the sources_to_extract property is set, the example_values property is ignored in the design environment.
example_values
Defines simulated values that another transformation might pass to the Parser. Use this property to design a Parser that is called by another Parser. A Parser uses the example_values property only when it processes the example source. It ignores the property when it parses a source document.
In the nested ExampleValue components, specify the data holders that the calling Parser passes to this Parser and their simulated values.
ExampleValue
Defines an example value under the example_values property.
format
Defines the format of the source document. You can choose one of the following options:
  • BinaryFormat
  • CustomFormat
  • HtmlFormat
  • RtfFormat
  • TextFormat
  • XmlFormat
Default is CustomFormat. For more information, see Format Component Reference.
name
A descriptive label for the component. This label appears in the log file and the Events view. Use the name property to identify the component that caused the event.
no_initial_phase
Determines whether the Script searches for nested anchors in the main phase. You can choose one of the following options:
  • Cleared. Search for nested anchors according to their individual properties.
  • Selected. Search for nested anchors in the main phase.
Default is cleared.
notifications
Defines a list of NotificationHandler components that the Parser runs on notifications triggered by nested components. For more information, see Notifications.
on_fail
The action to take if the component fails. You can choose one of the following options:
  • Cleared. Take no action.
  • CustomLog. Write to the user log.
  • LogError. Write an error message to the engine log.
  • LogInfo. Write an information message to the engine log.
  • LogWarning. Write a warning message to the engine log.
  • NotifyFailure. Send a notification.
Default is cleared. For more information about handling component failures, see Failure Handling.
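The on_fail options listed above amount to a dispatch on the selected value. The following Python sketch is a conceptual model only, not Data Transformation code: the function `handle_failure`, the logger names, and the `notify` callback are hypothetical, and the level used for the user log is an arbitrary choice.

```python
import logging
from typing import Callable, Optional

# Hypothetical stand-ins for the engine log and the user log.
engine_log = logging.getLogger("engine")
user_log = logging.getLogger("user")

# Options that write to the engine log, mapped to a Python log level.
_ENGINE_LEVELS = {
    "LogError": logging.ERROR,
    "LogInfo": logging.INFO,
    "LogWarning": logging.WARNING,
}


def handle_failure(
    on_fail: Optional[str],
    component_name: str,
    notify: Optional[Callable[[str], None]] = None,
) -> str:
    """Model the on_fail options; None stands in for "Cleared".

    Returns a short label describing the action taken.
    """
    # The name property identifies the component that caused the event.
    message = f"component {component_name!r} failed"
    if on_fail is None:
        return "no action"
    if on_fail in _ENGINE_LEVELS:
        engine_log.log(_ENGINE_LEVELS[on_fail], message)
        return "engine log"
    if on_fail == "CustomLog":
        user_log.info(message)  # level chosen arbitrarily for this sketch
        return "user log"
    if on_fail == "NotifyFailure":
        if notify is not None:
            notify(message)
        return "notification"
    raise ValueError(f"unknown on_fail option: {on_fail}")
```

In this model, a descriptive name makes the resulting log entry or notification traceable to the failing component.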
reject_recurring_pages
Determines the number of times the Parser parses the same page. You can choose one of the following options:
  • Selected. The Parser parses a page only once.
  • Cleared. The Parser parses a page each time it follows a link to the page.
Use reject_recurring_pages when a web site contains many links to the same page.
The ResetVisitedPages action resets the history list and allows a Parser to process a page again, even if reject_recurring_pages is selected.
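The interaction between reject_recurring_pages and ResetVisitedPages can be pictured as a history set that a link-following loop consults. The following Python sketch is a conceptual model only, not Data Transformation code; the class `PageHistory` and the function `parse_links` are hypothetical names.

```python
class PageHistory:
    """Conceptual model of the visited-page history (not a product API)."""

    def __init__(self, reject_recurring_pages: bool) -> None:
        self.reject_recurring_pages = reject_recurring_pages
        self._visited: set[str] = set()

    def should_parse(self, url: str) -> bool:
        """Return True if the page should be parsed, and record the visit."""
        if self.reject_recurring_pages and url in self._visited:
            return False  # selected: a page is parsed only once
        self._visited.add(url)
        return True

    def reset_visited_pages(self) -> None:
        """Model of the ResetVisitedPages action: clear the history list."""
        self._visited.clear()


def parse_links(links: list[str], history: PageHistory) -> list[str]:
    """Return the pages that would be parsed, in the order they are followed."""
    return [url for url in links if history.should_parse(url)]
```

With the option selected, following the links `a`, `b`, `a` parses only `a` and `b`. After the reset, `a` is eligible again.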
remark
A user-defined comment that describes the purpose or action of the component.
serialization_mode
Defines how the Script processes portions of the example source that the Parser does not output to XML, when you create a serializer from a Parser. For more information, see Controlling How the Create Serializer Command Works.
You can choose one of the following options:
  • Full. Causes the Create Serializer command to copy the non-XML text to the serializer configuration.
  • Outline. Causes the Create Serializer command to copy only the delimiters of the non-XML text to the serializer configuration. When Outline is selected, you can set the use_markers property.
source
Defines a sequence of data holders for input to the Parser. Each data holder is identified by one of the following properties:
  • Locator. Identifies a single-occurrence or a multiple-occurrence data holder. For multiple-occurrence data holders, each iteration accesses a new occurrence.
  • LocatorByKey. Identifies a multiple-occurrence data holder by key.
  • LocatorByOccurence. Identifies a multiple-occurrence data holder by sequence number.
In a secondary Parser, set Parser source Locator data_holder to the data holder defined in the associated AdditionalInputPort data_holder. For more information, see Source Property.
sources_to_extract
Defines a hard-coded list of source documents that the Parser processes. You can choose one of the following options:
  • DocList. Defines a list of LocalFile, Text, and URL components.
  • Empty. The Parser processes the example_source.
  • FileSearch. Defines a folder on the local file system and a file name filter.
  • InputPort. Defines an input port. Do not use this option.
  • LocalFile. Defines a file on the local file system.
  • Text. Defines a string.
  • URL. Defines a URL.
Default is empty.
Use the sources_to_extract property only in the design environment.
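The design-time relationship between sources_to_extract and example_source can be summarized as a simple fallback. The following Python sketch is a conceptual model under stated assumptions, not Data Transformation code: the function `design_time_sources` is hypothetical, documents are represented as plain strings, and `None` stands in for the Empty option.

```python
from typing import Optional


def design_time_sources(
    sources_to_extract: Optional[list[str]],
    example_source: Optional[str],
) -> list[str]:
    """Pick the documents a Parser would process in the design environment."""
    if sources_to_extract:
        # A hard-coded list is set, so the Parser processes that list.
        return list(sources_to_extract)
    if example_source is not None:
        # sources_to_extract is Empty: the Parser processes the example source.
        return [example_source]
    # Both are Empty: the Developer tool would prompt for a source document.
    return []
```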
target
Defines a sequence of data holders for output from the Parser. If the data holder does not yet exist, the Parser creates it. Each data holder is identified by one of the following properties:
  • Locator. Identifies a single-occurrence or a multiple-occurrence data holder. For multiple-occurrence data holders, each iteration creates a new occurrence.
  • LocatorByKey. Identifies a multiple-occurrence data holder by key.
  • LocatorByOccurence. Identifies a multiple-occurrence data holder by sequence number.
Use the target property when the output of the Parser is used by another component. For more information, see Target Property.
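The three ways of identifying a target data holder differ mainly in how an occurrence is selected. The following Python sketch is a conceptual model only, not Data Transformation code. The class `MultiOccurrenceHolder` and its methods are hypothetical, the 1-based sequence number is an assumption, and the sketch raises an error when nothing matches, which may differ from product behavior.

```python
class MultiOccurrenceHolder:
    """Hypothetical stand-in for a multiple-occurrence data holder."""

    def __init__(self) -> None:
        self.occurrences: list[dict] = []

    def locator(self) -> dict:
        """Locator: each call (iteration) creates a new occurrence."""
        occurrence: dict = {}
        self.occurrences.append(occurrence)
        return occurrence

    def locator_by_key(self, key_field: str, key_value: str) -> dict:
        """LocatorByKey: select the occurrence whose key field matches."""
        for occurrence in self.occurrences:
            if occurrence.get(key_field) == key_value:
                return occurrence
        raise KeyError(key_value)

    def locator_by_occurrence(self, number: int) -> dict:
        """LocatorByOccurence: select by sequence number (1-based, an assumption)."""
        if not 1 <= number <= len(self.occurrences):
            raise IndexError(number)
        return self.occurrences[number - 1]
```

For example, two calls to `locator()` yield two separate occurrences, which can then be addressed either by a key value stored in them or by their position.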
use_markers
Determines whether the Create Serializer command copies the content of the Marker anchors but only the delimiters of other non-XML text.
use_markers is an option under the serialization_mode property when Outline is selected. Default is selected.


Updated September 26, 2018