PdfToTxt_3_02

The PdfToTxt_3_02 document processor converts PDF files to text.

The following table describes the properties of the PdfToTxt_3_02 document processor:

Property
    Description

enabled
    Defines the value of param2 or param4.

param1
    Defines a string or variable that contains the word spacing factor. The param1 property is named WordSpacingFactor and has only one property, value, which contains the string or variable. Default is 1.8.

param2
    Determines whether the output document is optimized for tables. The param2 property is named OptimizeForTables and has only one property, enabled, which has the following options:
      • Selected. The output document is optimized for tables.
      • Cleared. The output document is not optimized for tables.
    Default is cleared.

param3
    Defines a string or variable that contains the password. The param3 property is named Password and has only one property, value, which contains the string or variable.

param4
    Determines whether new page characters are hidden in the output document. The param4 property is named HideNewPageChar and has only one property, enabled, which has the following options:
      • Selected. New page characters are hidden.
      • Cleared. New page characters are not hidden.
    Default is cleared.

param5
    Defines a string or variable that contains advanced optimizations. The param5 property is named AdvancedOptimizations and has only one property, value, which contains the string or variable.

value
    Defines the value of param1, param3, or param5.

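
The mapping between the generic paramN slots and their named settings can be easier to see in one place. The following Python sketch only restates the table as a data structure. The class name, field names, and the as_params helper are hypothetical and are not part of any product API. Defaults for Password and AdvancedOptimizations are not stated in the table, so the sketch leaves them unset.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of the property table; not a product API.
@dataclass
class PdfToTxtSettings:
    word_spacing_factor: str = "1.8"              # param1 / WordSpacingFactor (value)
    optimize_for_tables: bool = False             # param2 / OptimizeForTables (enabled)
    password: Optional[str] = None                # param3 / Password (value), default not documented
    hide_new_page_char: bool = False              # param4 / HideNewPageChar (enabled)
    advanced_optimizations: Optional[str] = None  # param5 / AdvancedOptimizations (value), default not documented

    def as_params(self) -> dict:
        """Return the settings keyed by the generic paramN slot names."""
        return {
            "param1": {"name": "WordSpacingFactor", "value": self.word_spacing_factor},
            "param2": {"name": "OptimizeForTables", "enabled": self.optimize_for_tables},
            "param3": {"name": "Password", "value": self.password},
            "param4": {"name": "HideNewPageChar", "enabled": self.hide_new_page_char},
            "param5": {"name": "AdvancedOptimizations", "value": self.advanced_optimizations},
        }
```

Slots that hold a string or variable carry a value entry, and slots that act as switches carry an enabled entry, which mirrors the enabled and value rows of the table.
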
The PdfToTxt pre-processor might not support certain PDFs with embedded fonts. If the pre-processor fails, copy the text from the input PDF into a plain-text editor such as Notepad to check for embedded fonts. If you cannot paste the text, or if the pasted text is corrupted, the PDF probably contains embedded fonts.
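
The manual copy-and-paste check can be approximated in code when many files need screening. The following Python sketch is an illustration only: the function name, the 20 percent threshold, and the choice of character categories that count as unreadable are assumptions, not documented product behavior. It takes text that has already been copied or extracted from the PDF and reports whether it is empty or largely unreadable.

```python
import unicodedata


def looks_corrupted(text: str, threshold: float = 0.2) -> bool:
    """Heuristic check: True if the text is empty or mostly unreadable.

    The threshold and the character classes below are assumptions
    chosen for illustration.
    """
    stripped = text.strip()
    if not stripped:
        return True  # nothing could be pasted

    bad = 0
    for ch in stripped:
        category = unicodedata.category(ch)
        if ch == "\ufffd":                      # Unicode replacement character
            bad += 1
        elif category in ("Co", "Cn"):          # private-use or unassigned code points
            bad += 1
        elif category == "Cc" and ch not in "\n\r\t\f":  # stray control characters
            bad += 1
    return bad / len(stripped) > threshold
```

For example, looks_corrupted("") and looks_corrupted("\ufffd\ufffd\ue000\ue001ab") both return True, while ordinary text such as "Invoice total: 42.00" returns False. A True result suggests, but does not prove, that the PDF contains embedded fonts.
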
This component is deprecated. The IntelliScript editor displays it for legacy projects. Do not use it in new Scripts.


Updated February 12, 2020