You can configure a Sqoop mapping to perform incremental data extraction based on an ID or timestamp source column. With incremental data extraction, Sqoop extracts only the data that changed since the last data extraction, which improves mapping performance.
To perform incremental data extraction, configure arguments in the Additional Sqoop Import Arguments text box of the Read transformation in the Sqoop mapping.
You must configure all of the following arguments for incremental data extraction:
--infa-incremental-type
Indicates whether you want to perform the incremental data extraction based on an ID or timestamp column.
If the source table contains an ID column that acts as a primary key or unique key column, you can configure incremental data extraction based on the ID column. Set the value of the --infa-incremental-type argument to ID. Sqoop extracts rows whose IDs are greater than the last extracted ID.
If the source table contains a timestamp column that contains the last updated time for all rows, you can configure incremental data extraction based on the timestamp column. Set the value of the --infa-incremental-type argument to timestamp. Sqoop extracts rows whose timestamps are greater than the last read timestamp value or the maximum timestamp value.
Use the following syntax:
--infa-incremental-type <ID or timestamp>
--infa-incremental-key
Indicates the name of the column on which Sqoop must base the incremental data extraction.
Use the following syntax:
--infa-incremental-key <column_name>
--infa-incremental-value
Indicates the column value that Sqoop must use as the baseline value to perform the incremental data extraction. Sqoop extracts all rows that have a value greater than the value defined in the --infa-incremental-value argument.
Enclose the column value within double quotes.
Use the following syntax:
--infa-incremental-value "<column_value>"
--infa-incremental-value-format
Applicable if you configure incremental data extraction based on a timestamp column. Indicates the timestamp format of the column value defined in the --infa-incremental-value argument.
Enclose the format value within double quotes.
Use the following syntax:
--infa-incremental-value-format "<format>"
Default is "MM/dd/yyyy HH:mm:ss.SSSSSSSSS".
If you do not configure all the required arguments for incremental data extraction, the Sqoop mapping fails. The argument values are not case sensitive.
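As a minimal sketch of an ID-based configuration, the arguments might be combined as shown below. The column name ORDER_ID and the baseline value 1000 are hypothetical and used only for illustration. The sketch leaves out --infa-incremental-value-format on the assumption that it is needed only for timestamp columns, as its description suggests. Verify this against your product version.

```
--infa-incremental-type ID --infa-incremental-key ORDER_ID --infa-incremental-value "1000"
```

With these values, Sqoop would be expected to extract only the rows whose ORDER_ID is greater than 1000.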
Example
The Sales department in your organization stores information related to new customers in Salesforce. You want to create a Sqoop mapping to read customer records that were created after a certain date and write the data to SAP.
In the Sqoop mapping, you can define the Sqoop import arguments as follows:
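For instance, assuming the record creation time is stored in a column named CreatedDate (a hypothetical name used here for illustration) and the cutoff is October 20, 2018 (an illustrative date), the arguments might look like this:

```
--infa-incremental-type timestamp --infa-incremental-key CreatedDate --infa-incremental-value "10/20/2018 00:00:00.000000000" --infa-incremental-value-format "MM/dd/yyyy HH:mm:ss.SSSSSSSSS"
```

The baseline value is written in the default format, so the --infa-incremental-value-format argument here only restates that default. It is included to show how the two arguments must agree.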