If the incremental key is a timestamp, the incremental key column must store date/time data, and the timestamp must indicate the last time that the row of data was modified.
When you run the mass ingestion specification, the Spark engine fetches the rows in the source table whose timestamp is more recent than the most recent timestamp that was previously ingested. Each row that meets this condition is fetched as incremental data.
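The selection rule described above can be sketched in plain Python. This is an illustrative sketch only, not the Spark engine's implementation: the function names `select_incremental` and `next_reference_timestamp`, the `id` field, and the sample values are hypothetical. The sketch assumes a strict comparison (a row whose timestamp equals the reference timestamp is not fetched again) and assumes that the most recent timestamp seen becomes the reference for the next run.

```python
from datetime import datetime

def select_incremental(rows, last_ingested, key="LastModified"):
    # Strict comparison: a row whose timestamp equals the reference
    # timestamp is treated as already ingested and is skipped.
    return [row for row in rows if row[key] > last_ingested]

def next_reference_timestamp(rows, last_ingested, key="LastModified"):
    # Assumption: the most recent timestamp seen so far becomes the
    # reference timestamp for the next run.
    return max([last_ingested] + [row[key] for row in rows])

# Hypothetical sample values, chosen only to show the three cases.
reference = datetime(2020, 1, 1, 0, 0, 0)
rows = [
    {"id": 1, "LastModified": datetime(2019, 12, 31, 23, 59, 59)},  # older: skipped
    {"id": 2, "LastModified": datetime(2020, 1, 1, 0, 0, 0)},       # equal: skipped
    {"id": 3, "LastModified": datetime(2020, 1, 1, 0, 0, 1)},       # newer: fetched
]

fetched = select_incremental(rows, reference)
new_reference = next_reference_timestamp(rows, reference)
```

Only the row with `id` 3 is selected, because it is the only row modified after the reference timestamp.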
For example, you might have ingested the following source table in the previous run of the specification:
| EmpLastName | LastModified |
| --- | --- |
| 'Basquez' | 01/27/2017 02:43:05 |
| 'Savage' | 03/15/2014 07:16:20 |
| 'Greene' | 12/13/2012 09:42:11 |
Note that the most recent timestamp is 01/27/2017 02:43:05.
The following table shows the current data in the source table:
| EmpLastName | LastModified |
| --- | --- |
| 'Basquez' | 10/22/2018 04:20:57 |
| 'Savage' | 03/15/2014 07:16:20 |
| 'Greene' | 12/13/2012 09:42:11 |
| 'Caldwell' | 09/13/2018 04:24:26 |
Because the most recent timestamp that was ingested is 01/27/2017 02:43:05, the Spark engine fetches the rows from the source table with a timestamp that is more recent than 01/27/2017 02:43:05.
In the current source table, two timestamps are more recent: 10/22/2018 04:20:57 and 09/13/2018 04:24:26. The rows that are associated with these timestamps are incremental data.
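The comparison in this example can be reproduced with a short Python sketch. It is a hedged illustration of the reasoning, not the way the Spark engine is implemented: the variable names and the `parse_timestamp` helper are hypothetical, and the sketch assumes that the timestamps use the `MM/DD/YYYY HH:MM:SS` format shown in the tables.

```python
from datetime import datetime

# Assumed format of the LastModified values shown in the tables.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

def parse_timestamp(text):
    # Hypothetical helper: convert the displayed text into a comparable datetime.
    return datetime.strptime(text, TIMESTAMP_FORMAT)

# Rows ingested in the previous run of the specification.
previous_run = [
    ("Basquez", parse_timestamp("01/27/2017 02:43:05")),
    ("Savage", parse_timestamp("03/15/2014 07:16:20")),
    ("Greene", parse_timestamp("12/13/2012 09:42:11")),
]

# Current data in the source table.
current_source = [
    ("Basquez", parse_timestamp("10/22/2018 04:20:57")),
    ("Savage", parse_timestamp("03/15/2014 07:16:20")),
    ("Greene", parse_timestamp("12/13/2012 09:42:11")),
    ("Caldwell", parse_timestamp("09/13/2018 04:24:26")),
]

# Reference timestamp: the latest LastModified value from the previous run.
reference = max(modified for _, modified in previous_run)

# Keep only the rows modified after the reference timestamp.
incremental = [
    (name, modified)
    for name, modified in current_source
    if modified > reference
]

for name, modified in incremental:
    print(name, modified.strftime(TIMESTAMP_FORMAT))
# Prints:
# Basquez 10/22/2018 04:20:57
# Caldwell 09/13/2018 04:24:26
```

The rows for 'Savage' and 'Greene' are unchanged since the previous run, so their timestamps are not more recent than the reference timestamp and they are skipped.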
When you run the specification, the Spark engine ingests the following rows of data: