User Guide

TextFormat

The TextFormat format defines the format of text files.
You can also use this format with a document processor to process other types of documents. For example, you can use it with the PdfToTxt_4 document processor, which converts PDF documents to text, to process PDF documents.
The following table describes the properties of the TextFormat format:
Property
Description
default_transformers
Defines a list of Transformers that the Parser applies to the output of each content anchor.
Default is the following list of Transformers:
  • HtmlProcessor. Converts all combinations of tab, space, or newline to a single space character.
  • RemoveMarginSpace. Removes leading and trailing space.
delimiters
Defines the structure of information in the document. You can choose one of the following options:
  • CommaDelimited. Data fields are separated by commas.
  • DelimiterHierarchy. Data fields are separated or surrounded by text characters.
  • HL7. Data fields are separated as defined in the HL7 standard.
  • Positional. Data fields are defined by the number of characters between them.
  • PostScript. Data fields are defined according to the PostScript format.
  • RTF. Data fields are defined according to the RTF format.
  • SGML. Data fields are defined according to the SGML format.
  • SpaceDelimited. Data fields are separated by spaces.
  • TabDelimited. Data fields are separated by tabs.
For more information, see Delimiters Component Reference.
Default is DelimiterHierarchy.
name
A descriptive label for the component. This label appears in the log file and the Events view. Use the name property to identify the component that caused the event.
pre_processor
Defines a format preprocessor that processes the input after any document processor that you defined for the pre_processor property of the example_source. You can choose one of the following options:
  • HtmlProcessor. Converts all combinations of tab, space, or newline to a single space character. It is not restricted to HTML documents.
  • RtfProcessor. Normalizes RTF files.
Default is empty.
remark
A user-defined comment that describes the purpose or action of the component.
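The following Python sketch illustrates what the default_transformers and a few of the simpler delimiters options do to a line of text. It is not Informatica code and does not use any Informatica API: the function names (normalize_space, remove_margin_space, split_fields, split_positional, parse_line) are hypothetical, the DelimiterHierarchy, HL7, PostScript, RTF, and SGML options are not modeled, and treating consecutive spaces as one separator for SpaceDelimited is an assumption.

```python
from __future__ import annotations

import re

# Illustrative sketch only: these hypothetical functions mimic the effects
# described in the TextFormat property table. They are not Informatica APIs.


def normalize_space(text: str) -> str:
    """Collapse each run of tabs, spaces, and newlines into one space,
    as described for HtmlProcessor. Carriage returns are also collapsed
    so that Windows line endings behave the same (an assumption)."""
    return re.sub(r"[ \t\r\n]+", " ", text)


def remove_margin_space(text: str) -> str:
    """Remove leading and trailing whitespace, as described for RemoveMarginSpace."""
    return text.strip()


# Applied in the same order as the documented default list.
DEFAULT_TRANSFORMERS = [normalize_space, remove_margin_space]


def apply_transformers(value: str, transformers=DEFAULT_TRANSFORMERS) -> str:
    """Run each transformer in turn on the output of one content anchor."""
    for transformer in transformers:
        value = transformer(value)
    return value


def split_fields(line: str, delimiters: str) -> list[str]:
    """Split one line for a few of the simpler delimiters options."""
    if delimiters == "CommaDelimited":
        return line.split(",")
    if delimiters == "TabDelimited":
        return line.split("\t")
    if delimiters == "SpaceDelimited":
        # Assumption: consecutive spaces count as a single separator.
        return [field for field in line.split(" ") if field]
    raise ValueError(f"not modeled in this sketch: {delimiters}")


def split_positional(line: str, widths: list[int]) -> list[str]:
    """Positional: each field is a fixed number of characters wide."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


def parse_line(line: str, delimiters: str = "", widths: list[int] | None = None) -> list[str]:
    """Split a line into fields, then apply the default transformers to each field."""
    if widths is not None:
        raw_fields = split_positional(line, widths)
    else:
        raw_fields = split_fields(line, delimiters)
    return [apply_transformers(field) for field in raw_fields]


print(parse_line("  Smith ,\tJohn  ,  42 ", "CommaDelimited"))  # → ['Smith', 'John', '42']
print(parse_line("Smith     John      42", widths=[10, 10, 2]))  # → ['Smith', 'John', '42']
```

Running the transformers after splitting mirrors the table's statement that the Parser applies default_transformers to the output of each content anchor; treating each split field as the output of one anchor is a simplification made for this sketch.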
