
User Guide

9.8.0 HotFix 2


Applier Processing of Intermediate Files

An Applier run can include multiple apply cycles. During an apply cycle, the Applier processes intermediate files and commits the changes to the target at the end of the cycle.

During each apply cycle, the Applier processes one or more intermediate files, depending on the value of the apply.process_intermediate_size_per_job parameter. This parameter limits how much intermediate file data the Applier processes during a single apply cycle. If the parameter is set to 0, the Applier processes all available intermediate files. If the parameter is set to a value greater than 0, the value specifies the maximum total size, in megabytes, of all intermediate files that the Applier processes during a single apply cycle. Data Replication always processes entire intermediate files and never splits an intermediate file to avoid exceeding this maximum total size. To control the maximum size of a single intermediate file, set the Maximum size of each intermediate file option on the Runtime Settings tab > General view.
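The size limit can be pictured with a minimal Python sketch. It is illustrative only, not the product's implementation: the function name and the (name, size) tuples are hypothetical, and it assumes the Applier stops before a file that would push the total over the limit while always taking at least one whole file. The product documentation does not state how the boundary case is handled.

```python
from typing import List, Tuple


def select_files_for_cycle(files: List[Tuple[str, int]], limit_mb: int) -> List[str]:
    """Pick whole intermediate files for one apply cycle.

    files: (name, size_in_mb) pairs in processing order (hypothetical shape).
    limit_mb: stand-in for the apply.process_intermediate_size_per_job value.
    """
    if limit_mb == 0:
        # 0 means no limit: take every available intermediate file.
        return [name for name, _ in files]

    selected: List[str] = []
    total = 0
    for name, size in files:
        # Assumption: stop before a file that would exceed the limit, but
        # always take at least one file. Files are never split.
        if selected and total + size > limit_mb:
            break
        selected.append(name)
        total += size
    return selected
```

With three 40 MB files and a limit of 100, this sketch selects the first two files and leaves the third for the next cycle.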

An intermediate file is composed of a data file (.dat) and a transaction file (.trn). The .trn files contain transaction metadata. The .dat files contain the transaction data changes and can be very large.

When the Applier processes the intermediate files during an apply cycle, it looks for a commit in the .trn files. After the Applier encounters a commit in a .trn file, the Applier starts reading the corresponding .dat files. If the Applier does not encounter a commit in the .trn files during the current apply cycle, the Applier queues the corresponding .dat files. Then, whenever the Applier encounters a commit during a subsequent apply cycle, it processes all of the queued .dat files.
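The queueing behavior described above can be sketched as follows. This is a simplified model under stated assumptions: each intermediate file is reduced to a hypothetical (dat_name, trn_has_commit) pair, and the function name is invented for illustration. The real Applier works on the file contents, not on a precomputed flag.

```python
from typing import List, Tuple


def plan_cycle(
    cycle_files: List[Tuple[str, bool]], queued: List[str]
) -> Tuple[List[str], List[str]]:
    """Decide which .dat files to read in this cycle.

    cycle_files: (dat_name, trn_has_commit) pairs for the current cycle.
    queued: .dat files carried over from earlier cycles without a commit.
    Returns (dat files to read now, dat files that remain queued).
    """
    pending = list(queued)
    to_read: List[str] = []
    for dat_name, trn_has_commit in cycle_files:
        pending.append(dat_name)
        if trn_has_commit:
            # A commit releases everything queued so far, oldest first.
            to_read.extend(pending)
            pending.clear()
    return to_read, pending
```

Calling the function once per cycle and passing the second return value back in as `queued` models how .dat files wait until a later cycle finds a commit.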

When the Applier processes a .dat file, it applies all committed transactions to the target. The Applier accumulates changes that belong to open transactions in memory buffers. Change data for each long-running transaction is stored in a separate buffer. After the Applier encounters a commit for a long-running transaction during a subsequent apply cycle, the Applier applies the data from the corresponding buffer to the target database and then clears the buffer.
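As a rough illustration of the per-transaction buffers, the following Python sketch keeps one list per transaction ID and releases it when a commit arrives. The record shape (xid, kind, payload) and the function name are assumptions made for this example; they do not reflect the actual .dat or .trn formats.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, Optional[str]]  # (xid, "change" | "commit", payload)


def apply_records(records: List[Record], open_buffers: Dict[str, List[str]]) -> List[str]:
    """Process one cycle's records and return the payloads applied to the target.

    open_buffers maps each open transaction ID to its buffered changes and is
    kept by the caller across apply cycles.
    """
    applied: List[str] = []
    for xid, kind, payload in records:
        if kind == "change":
            # Each transaction accumulates changes in its own buffer.
            open_buffers.setdefault(xid, []).append(payload)
        elif kind == "commit":
            # On commit, apply the buffered changes and clear the buffer.
            applied.extend(open_buffers.pop(xid, []))
    return applied
```

A transaction that is still open at the end of a cycle stays in `open_buffers` and is applied in the later cycle in which its commit is read.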

For target data warehouse appliances that restrict the number of load connections, the Applier concurrently loads data to a batch of target tables. The number of tables in a batch cannot exceed the number of available Applier threads. For the remaining tables, the Applier accumulates change data from each source table in a separate memory buffer. After the Applier loads the data for the batch of tables to the target, it reads the data for the next batch of tables from the corresponding buffers. After the Applier loads data to all of the target tables, it commits the changes and finalizes the apply cycle.
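The batching rule can be sketched in a few lines of Python. The function name and the plain list of table names are hypothetical; the sketch only shows that no batch is larger than the number of available Applier threads.

```python
from typing import List


def table_batches(tables: List[str], applier_threads: int) -> List[List[str]]:
    """Split target tables into load batches of at most applier_threads tables."""
    if applier_threads < 1:
        raise ValueError("applier_threads must be at least 1")
    return [
        tables[i:i + applier_threads]
        for i in range(0, len(tables), applier_threads)
    ]
```

In this model, the first batch is loaded concurrently while change data for the tables in later batches waits in their buffers. The commit happens only after the last batch is loaded.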

The apply.buffer_size_for_split_records runtime parameter specifies the maximum size of the buffers that accumulate data for long-running transactions and target tables. If the amount of data in a buffer exceeds the specified limit, the Applier flushes the data from the buffer to a temporary spill file in the DataReplication_installation/output/configuration_name/tmp directory. The Applier then writes subsequent changes to the spill file instead of to the buffer. The spill file names have the following formats:

For long-running transactions: configuration_name_transaction_xid.spill

For target tables: configuration_name_table_id.spill
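The spill behavior can be modeled with a small Python class. This is a sketch under assumptions, not the Applier's code: the class name, the byte-based limit, and the caller-supplied spill path are hypothetical, and the sketch does not construct the product's actual spill file names.

```python
from typing import List


class SpillBuffer:
    """In-memory buffer that switches to a spill file once a size limit is exceeded."""

    def __init__(self, spill_path: str, max_bytes: int) -> None:
        self.spill_path = spill_path  # hypothetical path chosen by the caller
        self.max_bytes = max_bytes    # stand-in for apply.buffer_size_for_split_records
        self.records: List[bytes] = []
        self.size = 0
        self.spilled = False

    def add(self, record: bytes) -> None:
        if self.spilled:
            # After the first spill, later changes go straight to the file.
            with open(self.spill_path, "ab") as f:
                f.write(record)
            return
        self.records.append(record)
        self.size += len(record)
        if self.size > self.max_bytes:
            # Limit exceeded: flush the buffered data to the spill file.
            with open(self.spill_path, "ab") as f:
                for buffered in self.records:
                    f.write(buffered)
            self.records.clear()
            self.size = 0
            self.spilled = True
```

One buffer of this kind would exist per long-running transaction or per target table, each with its own spill file.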

Because the spill files can be large, the Applier does not flush existing spill files to disk when taking checkpoints that are used to resume processing after an outage. Consequently, when the Applier restarts, it deletes existing spill files that might be incomplete and re-reads the intermediate files to process all of the records in a long-running transaction.
