To accurately replicate character data, verify character set settings for the source and target databases.
Data Replication can use the International Components for Unicode (ICU) library to convert character data from the source database encoding to the target database encoding. Data Replication supports character set conversion for configurations that have a DB2 for Linux, UNIX, and Windows, Microsoft SQL Server, MySQL, or Oracle source and an Amazon Redshift, Greenplum, Netezza, Vertica, or Teradata target. When you create a replication configuration, the Data Replication Console queries the source and target databases to determine the source and target character sets and then writes these character set names to the replication configuration.
If the NLS_LANG environment variable is defined on the Oracle source, InitialSync uses this variable to determine the source database character set. If the NLS_LANG environment variable is not defined, InitialSync uses the character set from the replication configuration. The Applier always uses the source and target character sets from the replication configuration.
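The precedence described above for InitialSync on an Oracle source can be sketched as follows. This is only an illustration, not the product's implementation: the function name and the sample character set names are hypothetical. It relies on the standard Oracle NLS_LANG format LANGUAGE_TERRITORY.CHARSET, and it assumes that a value without a character set part falls back to the replication configuration, which the text above does not state.

```python
import os


def resolve_initialsync_charset(config_charset, environ=None):
    """Pick the source character set the way the text describes for InitialSync.

    config_charset: character set name stored in the replication configuration.
    environ: mapping of environment variables (defaults to os.environ).
    """
    environ = os.environ if environ is None else environ
    nls_lang = environ.get("NLS_LANG", "").strip()

    # NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET, so the character set
    # is whatever follows the last dot.
    if "." in nls_lang:
        charset = nls_lang.rsplit(".", 1)[1]
        if charset:
            return charset

    # Assumption: if NLS_LANG is undefined or carries no character set part,
    # fall back to the value from the replication configuration.
    return config_charset
```

For example, with NLS_LANG set to AMERICAN_AMERICA.AL32UTF8 the sketch returns AL32UTF8; with the variable unset it returns the configured value.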
Virtual columns do not have the character set property. Instead, a virtual column uses the character set of the mapped source table to which you add the column. If the character set is not defined for the source table, the virtual column uses the character set that is defined for the source schema or database.
Data Replication does not support character set conversion for configurations that include Tcl scripts.
Data Replication does not support character set conversion for constants that are used in SQL expressions.
Data Replication does not support character set conversion for non-Latin characters in source database object names.
If the source character data includes only Latin characters but the source and target databases use incompatible character sets, Informatica recommends that you disable character set conversion to avoid performance degradation. To disable character set conversion, set the global.icu_enabled runtime parameter to 0. For example, disable character set conversion if the source character set is UTF-8 and the target character set is Latin 9.
If the source character data includes non-Latin characters and the source and target databases use incompatible character sets, Data Replication ends with an error regardless of the global.icu_enabled setting.
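The following sketch illustrates the UTF-8 and Latin 9 example. It uses Python's built-in codecs rather than ICU, and the helper names are hypothetical. It also assumes that "Latin characters" refers to basic Latin (ASCII) text, which has the same byte representation in both encodings, so skipping conversion leaves such data unchanged. Text outside the Latin 9 repertoire cannot be represented in the target at all.

```python
def same_bytes_in_both(text, source="utf-8", target="iso8859_15"):
    """Return True if text encodes to identical bytes in both encodings."""
    try:
        return text.encode(source) == text.encode(target)
    except UnicodeEncodeError:
        # At least one of the encodings cannot represent the text.
        return False


def representable(text, encoding="iso8859_15"):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ascii_only = "ORDER-1042 shipped"   # basic Latin: bytes match in UTF-8 and Latin 9
accented = "Café"                   # in Latin 9, but encoded differently than in UTF-8
cyrillic = "Москва"                 # not representable in Latin 9
```

Here same_bytes_in_both(ascii_only) is true, while representable(cyrillic) is false, which corresponds to the case where replication cannot succeed with either setting.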
For configurations that have sources other than DB2 for Linux, UNIX, and Windows, Microsoft SQL Server, MySQL, or Oracle and that have targets other than Amazon Redshift, Greenplum, Netezza, Vertica, or Teradata, the Extractor can convert source character data only from UTF-16 to UTF-8 encoding. For other source character encodings, the Extractor writes extracted change data to intermediate files in the original source character set. The Applier does not convert the character set of the change data when applying the data to the target. In this case, Data Replication requires the source and target databases to use the same character set.
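As a rough illustration of what a UTF-16 to UTF-8 conversion involves, the sketch below re-encodes a byte string with Python's built-in codecs. It is not the Extractor's code. The function name is hypothetical, and the byte order of the source data is an assumption that the caller must supply.

```python
def utf16_to_utf8(raw, byteorder="little"):
    """Re-encode UTF-16 bytes (without a byte order mark) as UTF-8 bytes.

    byteorder: "little" or "big"; assumed to be known for the source data.
    """
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    # Decoding combines surrogate pairs into single code points, and
    # encoding then writes each code point as 1 to 4 UTF-8 bytes.
    return raw.decode(codec).encode("utf-8")
```

Because both encodings cover the full Unicode repertoire, this conversion is lossless for well-formed input, unlike a conversion into a single-byte character set.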
For configurations with Oracle sources and Oracle targets, you can configure the Oracle target databases to convert the change data that the Applier and InitialSync load to the target character set. Define the NLS_LANG parameter on the systems where the Applier and InitialSync run. To accurately replicate data, set this parameter value to match the source database character set. In the Data Replication Console, create an environment variables list and add the NLS_LANG variable to it. Then assign this environment variables list to the Server Manager that runs the Applier or InitialSync. Oracle performs the conversion if the