Table of Contents

Search

  1. Preface
  2. The Transformation Language
  3. Constants
  4. Operators
  5. Variables
  6. Dates
  7. Functions

Transformation Language Reference

Transformation Language Reference

LAG

LAG

Returns the value that is an offset number of rows before the current row in an Expression transformation. Use this function to compare values in the current row with values in a previous row when you run a mapping on the Spark engine in the Hadoop environment.
A lag value appears before the current row in a set of data.
When you use LAG in a transformation, you must configure the transformation for windowing. Windowing properties define how the data is partitioned and ordered.

Syntax

LAG (
column_name
,
offset
,
default
)
The following table describes the arguments for this command:
Argument
Required/
Optional
Description
column_name
Required
The target column or expression that the function operates on.
offset
Required
Integer data type. The number of rows before the current row to obtain a value from.
default
Optional
The default value to be returned in case the offset is outside the bounds of the partition or table. Default is NULL.

Return Value

The data type of the specified
column_name
.
Default
if the return value is outside the bounds of the specified partition.
NULL if
default
is omitted or set to NULL.

Examples

The following expression returns the date that the previous order was placed:
LAG ( ORDER_DATE, 1, NULL )
ORDER_DATE
ORDER_ID
RETURN VALUE
2017/09/25
1
NULL
2017/09/26
2
2017/09/25
2017/09/27
3
2017/09/26
2017/09/28
4
2017/09/27
2017/09/29
5
2017/09/28
2017/09/30
6
2017/09/29
The lag value of the first row is outside the partition, so the function returned the default value of NULL.
In the following example, your organization receives GPS pings from vehicles that include trip and event IDs and a time stamp. You want to calculate the time difference between each ping.
The following expression calculates the time difference between the current row and the previous row for two separate trips:
DATE_DIFF( EVENT_TIME, LAG ( EVENT_TIME, 1, NULL ), 'ss' )
You partition the data by trip ID and order by event ID.
TRIP_ID
EVENT_ID
EVENT_TIME
RETURN VALUE
101
1
2017-05-03 12:00:00
NULL
101
2
2017-05-03 12:00:34
34
101
3
2017-05-03 12:02:00
86
102
1
2017-05-03 12:00:00
NULL
102
2
2017-05-03 12:01:56
116
102
3
2017-05-03 12:02:00
4
The lag values of the first and fourth row are outside the specified partition, so the function returned two default NULL values.

0 COMMENTS

We’d like to hear from you!