Table of Contents

Search

  1. About the Data Vault SQL Reference
  2. Introduction to SQL Reference for Informatica Data Vault
  3. Date and Time Arithmetic
  4. WHERE Clauses
  5. UNION Operator
  6. Parameterized Query
  7. Functions

Data Vault SQL Reference

Data Vault SQL Reference

CSV File Lookup

CSV File Lookup

The CSV file form of the Lookup predicate generally works the same way as the inline form, except the array of values is contained in an external file referenced by the Lookup instance. The values in the external file are typically in the CSV (Comma-Separated Values) format, although Lookup can also read some custom formats.
The Lookup predicate adheres to the following CSV file characteristics:
  • Each row of values is located on a separate line, delimited by a line break.
  • The last record in the file may or may not have an ending line break.
  • Within each record, there may be one or more fields, separated by commas, and each line contains the same number of fields throughout the file.
  • Each field may or may not be enclosed in double quotation marks (
    "
    ). If fields are not enclosed by double quotation marks, then double quotation marks may not appear inside the fields.
  • Fields that contain line breaks, double quotes, and commas should be enclosed by double quotation marks.
  • If double quotation marks are used to enclose fields, then a double quotation mark appearing inside a field must be escaped by preceding it with another double quotation mark.
One CSV file feature not supported by Lookup is the optional header line. A CSV file referenced by Lookup must not contain the field names on the first line.
Note that there are no limitations on the size of a CSV file, except those imposed by hardware resources or the operating system.
The syntax for a CSV file LOOKUP predicate appears below:
lookup condition
The conditional expression containing placeholders for the CSV values. Note that a comma (,) must follow the
lookup condition
parameter.
The following table describes the lookup condition parameters.
Parameter
Description
SOURCEFILE '
CSV file
'
Specifies the CSV file containing an array of values that will be used in the lookup condition. The CSV file path, which must be enclosed by single quotation marks, can be absolute or relative.
DELIMITER '
delimiter character
'
Optionally specifies which single character separates the values in the CSV file. By default, the
delimiter character
is a comma (
,
), as is standard in CSV files. Note that the rows in the CSV file are delimited by line breaks, and this row delimiter cannot be changed.
REMOTE
Indicates that the specified CSV file is on the Data Vault Service side. If this keyword is omitted, the file is assumed to be on the client side (the machine where the query originated).
While the Lookup predicate is capable of processing a typical CSV file, it is flexible enough to also handle nonstandard CSV files or non-CSV files. As mentioned above, the DELIMITER parameter in the Lookup syntax allows the specification of any other single character as the value delimiter in the file. For example, the following CSV rows use the default comma, pound symbol (
#
), pipe symbol (
|
), and space character, respectively, as value delimiters:
1,"aubergine",-6e23,"Y" 1#"aubergine"#-6e23#"Y" 1|"aubergine"|-6e23|"Y" 1 "aubergine" -6e23 "Y"
As with the inline form of Lookup, columns in the array of values in the CSV file can be ignored by omitting the associated placeholders in the conditional expression.
Unlike the inline form of Lookup, hexadecimal values cannot be specified in the CSV file. However, this should not be a problem, since any literal character can appear in the CSV file. The CSV file can even include control characters like the escape character (ASCII code 27), or special whitespace characters like the vertical tab (ASCII code 11), as long as the containing string is enclosed by double quotation marks.
Line breaks that are part of a string appear as line breaks in the CSV file, in exactly the same way that it would appear if the string were printed out. For instance:
24786,"Dear Joan, Get well soon. Regards, Tim",1997 24787,"shipping notice",2004
In the above example, the string "Dear Joan,<line break><line break>Get well soon.<line break><line break>Regards,<line break>Tim" is interpreted as a single string in the same record as the values "24786" and "1997" that appear on either side of the string. The row that follows ('24787,"shipping notice",2004') is a more typical record.
Optionally, any field value can be enclosed by double quotation marks in the CSV file, whether the value is string or numeric. However, certain characters require that the whole string be contained in quotes:
  • control characters (ASCII codes 1-31 and 127)
  • space character (ASCII code 32)
  • comma (
    ,
    )
  • double quotation marks (
    "
    ), if they are at the start of the string
    Note that if control characters and spaces are not part of a quoted string, they are ignored or treated as delimiters.
Repeated delimiter characters (for example, ",,,,,") are counted as a single delimiter.
Literal double quotation marks in a CSV file string must be escaped by doubling the character (
""
). For example:
24788,"She said, ""OUCH!""",1998
The string in the above example is interpreted as:
She said, "OUCH!"
As with the inline form of Lookup, null values cannot be specified. An empty string can be specified using
""
.
If the null control character (ASCII code 00) is present in the CSV file, either as part of a string literal or as a delimiter, the query will hang when executed against the Data Vault Service.

0 COMMENTS

We’d like to hear from you!