User Guide

PdfToTxt_4 Table Configuration Editor

The table configuration editor customizes the way the PdfToTxt_4 document processor converts tables in PDF documents.
Use the table configuration editor when default settings of the PdfToTxt_4 document processor do not correctly render column alignment, word wrapping, line spacing, or overflow from one cell to another.
The user interface for the table configuration editor appears only in English.
  1. Add a Parser, Mapper, Serializer, or AdditionalInputPort to the Script.
  2. Under the example_source property, set the pre_processor property to PdfToTxt_4.
  3. Under the pre_processor property, double-click the value property.
    The table configuration editor appears. The upper panel displays the input PDF document, and the lower panel displays the PdfToTxt_4 output.
    Table editing commands appear in the toolbar at the top of the window. You can right-click to display an editing menu.
  4. Browse to a table in the PDF document and click Add Table.
    The name of the table appears in the Tables field and in the Name field.
  5. Select Use Regular Expressions. In the Table Start field, enter a regular expression that defines the upper left corner of the table.
    Use the headings of the first two columns as the regular expression. Add more column headings as needed to make Table Start unique. Separate the headings by a single space character, even if the columns are widely separated.
  6. In the Table End field, enter a regular expression that defines the text immediately after the table.
    The value of Table End must appear in the body of the document, not in a page footer.
  7. Click Process.
    The editor displays the table configuration that PdfToTxt_4 detects. The top and bottom of the table appear as horizontal blue lines. The default column borders appear as vertical red lines.
  8. To edit the column borders, perform one or more of the following steps:
    • Drag a column border to the right or left to change its position.
    • Click Add Column to add a column.
    • Click Remove Column and select a column border to delete a column.
    If the table contains horizontally merged cells, PdfToTxt_4 might truncate the entries.
  9. Examine the output window to confirm that the table is converted properly. If not, correct the table definitions.
  10. Repeat steps 4-9 for each table in the PDF document.
  11. Click OK to return to the Developer tool.
    An XML string that defines the table configuration appears in the value property of the PdfToTxt_4 document processor.
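
The guidance in steps 5 and 6 for the Table Start and Table End fields can be tried out before you open the editor. The following Python sketch is only an illustration: the helper names and the sample text are hypothetical, the sample uses single spaces between headings, and the regular expression dialect that PdfToTxt_4 accepts may differ from Python's `re` module. It joins column headings with a single space and adds headings until the expression matches exactly one place in the text.

```python
import re

# Hypothetical text standing in for a document that contains two tables
# whose first two column headings are identical.
SAMPLE_TEXT = """Item Description Quantity
A100 Widget 4
Total items: 1

Item Description Price
A100 Widget 9.50
Total due: 9.50
"""


def table_start_pattern(headings):
    """Join escaped column headings with a single space character."""
    return " ".join(re.escape(heading) for heading in headings)


def shortest_unique_start(headings, text):
    """Start with two headings and add more until the pattern is unique.

    Returns None if no prefix of the headings matches exactly once.
    """
    for count in range(2, len(headings) + 1):
        pattern = table_start_pattern(headings[:count])
        if len(re.findall(pattern, text)) == 1:
            return pattern
    return None


# Candidate Table Start value for the second table in the sample.
start_pattern = shortest_unique_start(["Item", "Description", "Price"], SAMPLE_TEXT)

# Candidate Table End value: text that follows the table in the document body.
start_match = re.search(start_pattern, SAMPLE_TEXT)
end_match = re.search(r"Total due", SAMPLE_TEXT[start_match.end():])
```

In this sample, the first two headings alone match both tables, so the helper adds the third heading to make the expression unique. Paste the resulting values into the Table Start and Table End fields, then confirm the result in the editor by clicking Process.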
