User Guide

10.4.0

PdfToTxt_4

The PdfToTxt_4 document processor converts PDF files to text or XML.

The following table describes the properties of the PdfToTxt_4 document processor:

Property	Description
param1	Defines the PDF table layout. The param1 property has only one option: PdfLayout.
value	Defines the PDF table layout. Double-click the value property to open the table configuration editor.

The table configuration editor customizes the way tables are read. Use it to correct problems with column alignment, word wrapping, line spacing, and overflow from one cell to another. For more information, see PdfToTxt_4 Table Configuration Editor.

The PdfToTxt_4 document processor generates text output by default. Use the table configuration editor to select XML output. The XML conforms to the PDF4.xsd schema, which you can find in the following directory:

<INSTALL_DIR>\DataTransformation\doc

When you use the PdfToTxt_4 document processor, set the input encoding to UTF-8 to enable the Parser, Mapper, or Serializer to correctly read the document.

The PdfToTxt pre-processor might not support certain PDFs with embedded fonts. If the pre-processor fails, copy the text from the input PDF into Notepad to check for embedded fonts. If you cannot paste the text, or if the pasted text is corrupted, the PDF probably contains embedded fonts.
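
The manual copy-and-paste check can be approximated in a script when you need to screen many PDFs. The following Python sketch is not part of Data Transformation. It is a heuristic that assumes corrupted text shows up as an empty paste or as a high share of replacement, control, private-use, or unassigned characters. The function names and the 10 percent threshold are hypothetical choices for illustration.

```python
import unicodedata

# Hypothetical threshold: share of suspicious characters above which
# the pasted text is treated as corrupted. Tune it for your documents.
SUSPICIOUS_RATIO = 0.10


def is_suspicious(ch: str) -> bool:
    """Return True for characters that rarely appear in cleanly copied text."""
    if ch in "\t\n\r":
        return False  # ordinary whitespace control characters are fine
    if ch == "\ufffd":
        return True  # Unicode replacement character
    # Cc = control, Co = private use, Cn = unassigned
    return unicodedata.category(ch) in {"Cc", "Co", "Cn"}


def looks_corrupted(text: str, ratio: float = SUSPICIOUS_RATIO) -> bool:
    """Heuristic: empty text, or a high share of suspicious characters."""
    stripped = text.strip()
    if not stripped:
        return True  # nothing could be pasted
    bad = sum(1 for ch in stripped if is_suspicious(ch))
    return bad / len(stripped) > ratio
```

Pass the text that you copied from the PDF to looks_corrupted. A True result suggests, but does not prove, that the PDF contains embedded fonts with a custom encoding. Clean text in any script returns False, because the check counts only character categories that should not occur in readable text.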
