Data Transformation User Guide

PdfToTxt_4 Table Configuration Editor

The table configuration editor customizes the way the PdfToTxt_4 document processor converts tables in PDF documents.
Use the table configuration editor when default settings of the PdfToTxt_4 document processor do not correctly render column alignment, word wrapping, line spacing, or overflow from one cell to another.
The user interface for the table configuration editor appears only in English.
  1. Add a Parser, Mapper, Serializer, or AdditionalInputPort to the Script.
  2. Under the example_source property, set the pre_processor property to PdfToTxt_4.
  3. Under the pre_processor property, double-click the value property.
    The table configuration editor appears. The upper panel displays the input PDF document, and the lower panel displays the PdfToTxt_4 output.
    Table editing commands appear in the toolbar at the top of the window. You can right-click to display an editing menu.
  4. Browse to a table in the PDF document and click Add Table.
    The name of the table appears in the Tables field and in the Name field.
  5. Select Use Regular Expressions. In the Table Start field, enter a regular expression that defines the upper left corner of the table.
    Use the headings of the first two columns as the regular expression. Add more column headings as needed to make Table Start unique. Separate the headings by a single space character, even if the columns are widely separated.
  6. In the Table End field, enter a regular expression that defines the text immediately after the table.
    The value of Table End must appear in the body of the document, not in a page footer.
  7. Click Process.
    The editor displays the table configuration that PdfToTxt_4 detects. The top and bottom of the table appear as horizontal blue lines. The default column borders appear as vertical red lines.
  8. To edit the column borders, perform one or more of the following steps:
    • Drag a column border to the right or left to change its position.
    • Click Add Column to add a column.
    • Click Remove Column and select a column border to delete a column.
    If the table contains horizontally merged cells, PdfToTxt_4 might truncate the entries.
  9. Examine the output window to confirm that the table is converted properly. If not, correct the table definitions.
  10. Repeat steps 4-9 for each additional table in the PDF document.
  11. Click OK to return to the Developer tool.
    An XML string that defines the table configuration appears in the value property of the PdfToTxt_4 document processor.
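
The Table Start and Table End expressions from steps 5 and 6 can be drafted and checked outside the editor before you click Process. The following Python sketch is an illustration only. It uses Python's re module as a stand-in, because this guide does not state which regular expression dialect the editor uses. The function names, column headings, and sample text are hypothetical and are not part of PdfToTxt_4.

```python
import re


def table_start_pattern(headings):
    """Join escaped column headings with a single space, as step 5 describes."""
    return " ".join(re.escape(h) for h in headings)


def count_matches(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))


# Hypothetical text for illustration; real PDF content will differ.
sample_text = "\n".join([
    "Order summary",
    "Item Quantity Price",
    "Widget 4 9.99",
    "Gadget 2 24.50",
    "Total due: 88.96",
])

# Step 5: start with the first two headings; add more if the count is above 1.
table_start = table_start_pattern(["Item", "Quantity"])
start_count = count_matches(table_start, sample_text)

# Step 6: text that immediately follows the table in the document body.
table_end = r"Total due:"
end_count = count_matches(table_end, sample_text)

print(table_start, start_count, end_count)
```

If start_count is greater than 1 for your own text, pass a longer list of headings to table_start_pattern until the expression matches exactly once. Confirm the final expressions in the editor itself, because its matching behavior may differ from this sketch.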


Updated September 26, 2018