Data Transformation User Guide

Delimiters Component Reference

A delimiters component defines a hierarchy of characters or strings that organize the information in a document, such as newlines, spaces, tabs, commas, or vertical bars. You can also use a wildcard pattern to define the delimiters.
The delimiter concept applies both to rigidly structured documents that use predefined delimiter characters to separate the data fields, and to loosely structured text or HTML documents that are delimited by newlines and syntactic markup. The delimiter concept also encompasses positionally structured data, where the fields are located at fixed offsets from one another.
The Parser uses the delimiters to determine the search criteria of Content anchors configured with the LearnByExample option.
For example, suppose you configure a format with the TabDelimited delimiters component. This defines a hierarchy using the following characters as delimiters, from the highest level to the lowest:
Newline
Tab
You might define a Content anchor that is located two tab characters after the preceding Marker anchor in the example source, like this:
MARKER<tab>abc<tab>CONTENT
When a Parser processes a source document, it searches for the Content two tabs after the Marker.
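The search described above can be sketched in Python. This is only an illustration of counting delimiters after a marker, not how the Parser is implemented. The function name `content_after_delimiters` and its parameters are hypothetical, and the sketch assumes the content ends at the next tab, the next newline, or the end of the text.

```python
def content_after_delimiters(text, marker, delimiter="\t", count=2):
    """Return the field found `count` delimiters after `marker`, or None.

    Hypothetical helper for illustration only.
    """
    start = text.find(marker)
    if start == -1:
        return None
    pos = start + len(marker)
    # Skip forward past `count` occurrences of the delimiter.
    for _ in range(count):
        pos = text.find(delimiter, pos)
        if pos == -1:
            return None
        pos += len(delimiter)
    # Assumption: the content runs to the next tab, newline, or end of text.
    end = len(text)
    for stop in (delimiter, "\n"):
        idx = text.find(stop, pos)
        if idx != -1:
            end = min(end, idx)
    return text[pos:end]


source = "MARKER\tabc\tCONTENT"
print(content_after_delimiters(source, "MARKER"))  # prints CONTENT
```

The same call applied to a different source document with the same layout, such as `"MARKER\txyz\t12345"`, would return `12345`, which mirrors the idea of learning a position from one example and applying it to other sources.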
In a second example, you might define a Content anchor that is located three newlines and one tab after a Marker anchor in the example source, like this:
MARKER
abc<tab>de
fghi<tab>jkl<tab>mnop
pqrst<tab>CONTENT
Within the intermediate lines, the tabs are not counted because the newlines are higher in the hierarchy.
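A minimal Python sketch of this hierarchical counting follows. It is not the Parser's actual algorithm. The function `locate_content`, the `steps` list of (delimiter, count) pairs, and the rule that a lower-level search stops at the next higher-level delimiter are all assumptions made for illustration.

```python
def locate_content(text, marker, steps, hierarchy=("\n", "\t")):
    """Find the content reached by following `steps` after `marker`.

    `steps` lists (delimiter, count) pairs from the highest hierarchy
    level to the lowest. Hypothetical helper for illustration only.
    """
    start = text.find(marker)
    if start == -1:
        return None
    pos = start + len(marker)
    for delimiter, count in steps:
        level = hierarchy.index(delimiter)
        # Assumption: a lower-level search may not cross a
        # higher-level delimiter, so bound it at the next one.
        limit = len(text)
        for higher in hierarchy[:level]:
            idx = text.find(higher, pos)
            if idx != -1:
                limit = min(limit, idx)
        for _ in range(count):
            pos = text.find(delimiter, pos, limit)
            if pos == -1:
                return None
            pos += len(delimiter)
    # The content ends at the next delimiter of any level, or at end of text.
    ends = [i for i in (text.find(d, pos) for d in hierarchy) if i != -1]
    end = min(ends) if ends else len(text)
    return text[pos:end]


source = "MARKER\nabc\tde\nfghi\tjkl\tmnop\npqrst\tCONTENT"
# Three newlines, then one tab; tabs on the intermediate lines are skipped
# because only newlines are counted at the higher level.
print(locate_content(source, "MARKER", [("\n", 3), ("\t", 1)]))  # prints CONTENT
```

Counting newlines first means the number of tabs on the intermediate lines can vary between source documents without changing the result.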
Many of the delimiters components, such as TabDelimited or CommaDelimited, display a predefined hierarchy of delimiters, which you can edit as required.
The DelimiterHierarchy component does not have a predefined hierarchy. You can insert whatever delimiters you need.