Incremental Data Extraction for Sqoop Mappings

You can configure a Sqoop mapping to perform incremental data extraction based on an ID or timestamp source column. With incremental data extraction, Sqoop extracts only the data that changed since the last data extraction. Because Sqoop reads only the changed data, incremental data extraction can improve mapping performance.
To perform incremental data extraction, configure arguments in the Additional Sqoop Import Arguments text box of the Read transformation in the Sqoop mapping.
You must configure all of the following arguments for incremental data extraction:
--infa-incremental-type
Indicates whether you want to perform the incremental data extraction based on an ID or timestamp column.
If the source table contains an ID column that acts as a primary key or unique key column, you can configure incremental data extraction based on the ID column. Set the value of the --infa-incremental-type argument to ID. Sqoop extracts rows whose IDs are greater than the last extracted ID.
If the source table contains a timestamp column that contains the last updated time for all rows, you can configure incremental data extraction based on the timestamp column. Set the value of the --infa-incremental-type argument to timestamp. Sqoop extracts rows whose timestamps are greater than the last read timestamp value or the maximum timestamp value.
Use the following syntax:
--infa-incremental-type <ID or timestamp>
--infa-incremental-key
Indicates the column name based on which Sqoop must perform the incremental data extraction.
Use the following syntax:
--infa-incremental-key <column_name>
--infa-incremental-value
Indicates the column value that Sqoop must use as the baseline value to perform the incremental data extraction. Sqoop extracts all rows that have a value greater than the value defined in the --infa-incremental-value argument.
Enclose the column value within double quotes.
Use the following syntax:
--infa-incremental-value "<column_value>"
--infa-incremental-value-format
Applicable if you configure incremental data extraction based on a timestamp column. Indicates the timestamp format of the column value defined in the --infa-incremental-value argument.
Enclose the format value within double quotes.
Use the following syntax:
--infa-incremental-value-format "<format>"
Default is "MM/dd/yyyy HH:mm:ss.SSSSSSSSS".
If you want to perform incremental data extraction but you do not configure all the required arguments, the Sqoop mapping fails. The argument values are not case sensitive.
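The arguments described above can be assembled and checked before you paste them into the Additional Sqoop Import Arguments text box. The following is a minimal sketch in Python, not part of Informatica or Sqoop. The helper name build_incremental_args and the column name CUSTOMER_ID are hypothetical. The sketch assumes that the format argument applies only to timestamp columns and can be omitted so that the documented default is used.

```python
def build_incremental_args(inc_type, key, value, value_format=None):
    """Return an argument string for incremental extraction.

    Hypothetical helper for illustration; it only formats text and
    does not call Sqoop or the Data Integration Service.
    """
    # The section states that argument values are not case sensitive.
    kind = inc_type.lower()
    if kind not in ("id", "timestamp"):
        raise ValueError("inc_type must be ID or timestamp")
    if not key or value is None or str(value) == "":
        raise ValueError("key and value are required")
    # Assumption: the format argument is meaningful only for timestamps.
    if value_format is not None and kind != "timestamp":
        raise ValueError("value_format applies only to timestamp columns")

    parts = [
        f"--infa-incremental-type {inc_type}",
        f"--infa-incremental-key {key}",
        # The column value is enclosed within double quotes.
        f'--infa-incremental-value "{value}"',
    ]
    if value_format is not None:
        parts.append(f'--infa-incremental-value-format "{value_format}"')
    return " ".join(parts)


# ID-based extraction on a hypothetical CUSTOMER_ID column.
id_args = build_incremental_args("ID", "CUSTOMER_ID", 1000)
print(id_args)
```

Validating the type and the required values up front mirrors the behavior described above, where a mapping with incomplete incremental arguments fails at run time.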

Example

The Sales department in your organization stores information related to new customers in Salesforce. You want to create a Sqoop mapping to read customer records that were created after a certain date and write the data to SAP.
In the Sqoop mapping, you can define the Sqoop import arguments as follows:
--infa-incremental-type timestamp --infa-incremental-key CreatedDate --infa-incremental-value "10/20/2018 00:00:00.000000000" --infa-incremental-value-format "MM/dd/yyyy HH:mm:ss.SSSSSSSSS"
The Data Integration Service reads all customer records whose CreatedDate column contains timestamp values greater than 10/20/2018 00:00:00.000000000. The Sqoop mapping uses the MM/dd/yyyy HH:mm:ss.SSSSSSSSS format to interpret the baseline value while reading data.
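To make the comparison in this example concrete, the following Python sketch applies the same "greater than the baseline" rule to a few made-up rows. It only illustrates the selection logic as this section describes it. The actual filtering is performed by Sqoop against the source, and the sample rows and the parse_ts helper are assumptions introduced for illustration.

```python
from datetime import datetime


def parse_ts(text):
    """Parse 'MM/dd/yyyy HH:mm:ss.SSSSSSSSS' into (datetime, nanoseconds)."""
    whole, _, frac = text.partition(".")
    # strptime's %f accepts at most 6 fractional digits, so the 9-digit
    # fraction is kept separately as an integer number of nanoseconds.
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return datetime.strptime(whole, "%m/%d/%Y %H:%M:%S"), nanos


# Baseline from --infa-incremental-value in the example.
baseline = parse_ts("10/20/2018 00:00:00.000000000")

# Made-up sample rows: (customer name, CreatedDate).
rows = [
    ("Acme", "10/19/2018 23:59:59.999999999"),
    ("Birch", "10/20/2018 00:00:00.000000000"),
    ("Cedar", "10/20/2018 00:00:00.000000001"),
    ("Dune", "11/02/2018 08:15:00.000000000"),
]

# Keep only rows whose timestamp is strictly greater than the baseline.
extracted = [name for name, created in rows if parse_ts(created) > baseline]
print(extracted)  # ['Cedar', 'Dune']
```

The row whose CreatedDate equals the baseline is not selected, because the rule is "greater than" rather than "greater than or equal to".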
