Data Transformation User Guide

Content

A Content anchor retrieves text from the source document. The Parser searches in a defined region according to specified search criteria and stores the retrieved text in a data holder.
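The search-and-store behavior described above can be sketched as a small Python model. This is a hypothetical illustration, not Data Transformation Script syntax or API. The function content_anchor, the (start, end) region tuple, and the use of a dict as the data holder are assumptions, and the search criterion is modeled only as a regular expression. The sketch also assumes that an empty result makes the anchor fail unless allow_empty_values is selected.

```python
import re
from typing import Dict, Optional, Tuple


def content_anchor(
    source: str,
    region: Tuple[int, int],
    pattern: Optional[str],
    data_holder: Dict[str, str],
    key: str,
    allow_empty_values: bool = False,
) -> bool:
    """Hypothetical model of a Content anchor, not a Data Transformation API.

    Searches source[start:end] for the first match of `pattern`, or takes the
    whole region when `pattern` is None, and stores the text in data_holder[key].
    Returns False to model an anchor failure.
    """
    start, end = region
    scope = source[start:end]  # the defined region (search scope)
    if pattern is None:
        text = scope  # no search criterion: retrieve the entire search scope
    else:
        match = re.search(pattern, scope)
        if match is None:
            return False  # no text in the region satisfies the criterion
        text = match.group(0)
    if text == "" and not allow_empty_values:
        return False  # assumption: an empty value fails unless it is allowed
    data_holder[key] = text
    return True


holder: Dict[str, str] = {}
doc = "Name: Ada Lovelace\nBorn: 1815\n"
born_at = doc.index("Born:") + len("Born:")
content_anchor(doc, (born_at, len(doc)), r"\d+", holder, "born")
print(holder)  # {'born': '1815'}
```

In this model, the region, the search criterion, and the data holder are independent inputs, which mirrors how the properties in the following table are configured separately.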
The following table describes the properties of the Content anchor:
Property
Description
allow_empty_values
Determines whether the Content anchor can be empty. You can choose one of the following options:
  • Selected. Empty values are allowed, and the data_holder is assigned an empty value.
  • Cleared. Empty values are not allowed.
You must select allow_empty_values in the following situations:
  • When the anchor is configured with value = LearnByExample and there is nothing between the delimiters.
  • When there is nothing between the opening_marker and the closing_marker.
closing_marker
Defines the end of a region where the Parser searches for the Content anchor. You can choose one of the following options:
  • NewlineSearch. The end of the Content anchor is the next newline character.
  • OffsetSearch. The end of the Content anchor is determined by the number of characters specified in offset.
  • PatternSearch. The end of the Content anchor is the first text that matches a specified regular expression.
  • TextSearch. The end of the Content anchor is a specified text string.
data_holder
Defines a data holder where the Content anchor stores the retrieved text.
direction
Defines the search direction for the anchor within the search scope. You can choose one of the following options:
  • backward. Searches from the end of the search scope and finds the last instance of the anchor.
  • forward. Searches from the start of the search scope and finds the first instance of the anchor.
For a Marker anchor, you can modify this behavior by using the count property. For example, if direction = backward and count = 2, the Script finds the second-to-last instance.
Default is forward. For more information, see How a Parser Searches for Anchors.
disable_XSD_type_search
Determines whether the Parser searches for content that matches the data type of the data holder. You can choose one of the following options:
  • Selected. The Parser searches without regard to the data type. After transformers are applied to the content, if the result does not match the data type of the data holder, the anchor fails.
  • Cleared. The Parser searches for content that matches the data type.
Default is cleared. For more information, see Using Data Types to Narrow the Search Criteria.
disabled
Determines whether the Script ignores the component and all of its child components. Use this property to test, debug, and modify a Script. You can choose one of the following options:
  • Selected. The Script ignores the component.
  • Cleared. The Script applies the component.
Default is cleared.
ignore_default_transformers
Determines whether the Parser applies the default transformers to the content. Default is cleared.
For more information, see Transformers Overview.
marking
Determines whether an anchor is used as the start of the search scope for the succeeding anchor. You can choose one of the following options:
  • begin position. Place a reference point before the current anchor.
  • end position. Place a reference point after the current anchor.
  • full. Place a reference point before and after the current anchor.
  • none. Do not create a reference point.
For more information, see How a Parser Searches for Anchors.
name
A descriptive label for the component. This label appears in the log file and the Events view. Use the name property to identify the component that caused the event.
on_fail
The action to take if the component fails. You can choose one of the following options:
  • Cleared. Take no action.
  • CustomLog. Write to the user log.
  • LogError. Write an error message to the engine log.
  • LogInfo. Write an information message to the engine log.
  • LogWarning. Write a warning message to the engine log.
  • NotifyFailure. Send a notification.
Default is cleared. For more information about handling component failures, see Failure Handling.
opening_marker
Defines the start of a region where the Parser searches for the Content anchor. You can choose one of the following options:
  • NewlineSearch. The start of the Content anchor is the next newline character.
  • OffsetSearch. The start of the Content anchor is determined by the number of characters specified in offset.
  • PatternSearch. The start of the Content anchor is the first text that matches a specified regular expression.
  • TextSearch. The start of the Content anchor is a specified text string.
optional
Determines whether a component failure causes the parent component to fail. You can choose one of the following options:
  • Selected. Component failure does not cause the parent component to fail.
  • Cleared. Component failure causes the parent component to fail.
Default is cleared. For more information about component failure, see Failure Handling.
phase
Determines when the Script processes the component. You can choose one of the following options:
  • initial. The Script processes the component during the initial phase.
  • main. The Script processes the component during the main phase.
  • final. The Script processes the component during the final phase.
Default is main. For more information, see How a Parser Searches for Anchors.
remark
A user-defined comment that describes the purpose or action of the component.
transformers
Defines a sequence of transformers that the Parser applies to the retrieved text. For more information, see Transformers.
validators
Defines a list of validators applied to the data. For more information, see Validators.
value
Defines criteria for a search in the region defined by the opening_marker and closing_marker properties. If opening_marker is not defined, the search is between the surrounding reference points. For more information, see How a Parser Searches for Anchors. You can choose one of the following options:
  • Empty. The Content anchor retrieves the entire search scope.
  • AttributeSearch. The Content anchor retrieves the value from an AttributeName=... expression. Use this option to retrieve attribute values from an XML or HTML source document.
  • LearnByExample. The Parser learns what text to retrieve according to the Parser format and the example source. For example, if the Parser has a tab-delimited format, it counts the number of tabs from the start of the search scope to the example text. It then retrieves the text between the corresponding tabs in the source document.
  • PatternSearch. The Content anchor retrieves the first text that matches a specified regular expression.
  • TypeSearch. The Content anchor retrieves the first text that matches a specified data type.
Default is empty. For more information about these options, see the Searcher Component Reference. In addition to the searcher components, the Parser uses the data type of the data_holder as a search criterion. For more information, see Using Data Types to Narrow the Search Criteria.
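The LearnByExample, AttributeSearch, and TypeSearch options can be approximated with the following Python sketch. The function names are hypothetical and are not part of Data Transformation. The sketch assumes a single tab-delimited record per search scope, handles only double-quoted attribute values, and approximates TypeSearch for an integer data type with a regular expression.

```python
import re
from typing import Optional


def learn_field_index(example_scope: str, example_text: str) -> int:
    """LearnByExample, learning step: count the tabs before the example text."""
    position = example_scope.index(example_text)
    return example_scope.count("\t", 0, position)


def learn_by_example(scope: str, field_index: int) -> Optional[str]:
    """LearnByExample, retrieval step: take the text between the matching tabs."""
    fields = scope.split("\t")
    return fields[field_index] if field_index < len(fields) else None


def attribute_search(scope: str, attribute_name: str) -> Optional[str]:
    """AttributeSearch: value of an AttributeName="..." expression (double quotes only)."""
    pattern = r"\b" + re.escape(attribute_name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, scope)
    return match.group(1) if match else None


def type_search_integer(scope: str) -> Optional[str]:
    """TypeSearch, approximated for an integer data type: first signed run of digits."""
    match = re.search(r"[-+]?\d+", scope)
    return match.group(0) if match else None


# Learn the field position from the example source, then reuse it on new records.
index = learn_field_index("1001\tAda\tLondon", "Ada")
print(index)                                            # 1
print(learn_by_example("1002\tAlan\tWilmslow", index))  # Alan
print(attribute_search('<img src="logo.png" alt="Logo">', "src"))  # logo.png
print(type_search_integer("Order total: 42 units"))     # 42
```

The learned field index is computed once from the example source and then applied to every record. The sketch ignores any other delimiters that a Parser format may define.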
The opening_marker and closing_marker properties are equivalent to Marker anchors in a Group component:
  • A Content anchor with the opening_marker set is equivalent to a Group component with the following sequence of anchors:
    1. Marker
    2. Content
  • A Content anchor with the closing_marker set is equivalent to a Group component with the following sequence of anchors:
    1. Content
    2. Marker
  • A Content anchor with both the opening_marker and closing_marker set is equivalent to a Group component with the following sequence of anchors:
    1. Marker
    2. Content
    3. Marker
For more information, see the Searcher Component Reference.
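The equivalence between markers and Group sequences can be illustrated with a hypothetical Python model in which each marker is a plain TextSearch string. The names find_marker, content_anchor, and run_group are illustrative assumptions, not Data Transformation APIs, and the model ignores search criteria, transformers, and data types. For each of the three cases, the Content anchor with markers and the corresponding Group sequence retrieve the same text.

```python
from typing import List, Optional, Tuple


def find_marker(source: str, text: str, start: int) -> Optional[Tuple[int, int]]:
    """Marker anchor modeled as a TextSearch: (begin, end) of the next match."""
    begin = source.find(text, start)
    return None if begin < 0 else (begin, begin + len(text))


def content_anchor(source: str, opening: Optional[str] = None,
                   closing: Optional[str] = None) -> Optional[str]:
    """Hypothetical Content anchor with optional opening and closing marker text."""
    start = 0
    if opening is not None:
        span = find_marker(source, opening, 0)
        if span is None:
            return None
        start = span[1]  # region begins after the opening marker
    end = len(source)
    if closing is not None:
        span = find_marker(source, closing, start)
        if span is None:
            return None
        end = span[0]  # region ends before the closing marker
    return source[start:end]


def run_group(source: str, anchors: List[Tuple[str, str]]) -> Optional[str]:
    """Hypothetical Group: the Content anchor takes the text between the
    reference points set by the surrounding Marker anchors."""
    left, right, cursor = 0, len(source), 0
    seen_content = False
    for kind, text in anchors:
        if kind == "Content":
            seen_content = True
            continue
        span = find_marker(source, text, cursor)
        if span is None:
            return None  # a missing Marker makes the whole Group fail
        if seen_content:
            right = span[0]  # Marker after Content closes the region
            break
        left = cursor = span[1]  # Marker before Content opens the region
    return source[left:right] if seen_content else None


doc = "ID: 7\nName: Ada\nCity: London\n"
print(repr(content_anchor(doc, "Name:", "\n")))  # ' Ada'
print(repr(run_group(doc, [("Marker", "Name:"), ("Content", ""), ("Marker", "\n")])))  # ' Ada'
```

The two functions are written independently so that the equivalence is observed rather than built in. Only find_marker is shared between them.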


Updated September 26, 2018