Table of Contents

Search

  1. About the Data Vault SQL Reference
  2. Introduction to SQL Reference for Informatica Data Vault
  3. Date and Time Arithmetic
  4. WHERE Clauses
  5. UNION Operator
  6. Parameterized Query
  7. Functions

Data Vault SQL Reference

Data Vault SQL Reference

SELECT Clauses

SELECT Clauses

The following list provides SELECT clauses:
DISTINCT
The DISTINCT keyword dictates that duplicate rows be omitted from query results. If DISTINCT is specified, the projection list must consist only of named columns, with no functions or expressions. Note that this requirement also applies to all Column Rules defined on columns in the queried table.
AS
The AS keyword can be used to name result columns in a SELECT clause. This is particularly useful for a column derived from an expression or from a function applied to a column value.
The name of a result column is determined as follows:
  • If the AS clause is specified, the name of the result column is the correlation name.
  • If the AS clause is not specified and the result column is derived from a column name, the result column name is the unqualified name of that column.
  • If the AS clause is not specified and the result column is derived from a function or value expression, the result column name is the number of the column, counting from left to right.
    Note that the AS keyword is optional when naming result columns.
FROM
This keyword introduces the archived tables or derived tables from which data is retrieved. The FROM clause
must
be included in a SELECT statement.
Correlation names can be used in place of the actual table name when qualifying columns within the query statement. The correlation name appears immediately after the corresponding table in the FROM clause (separated by whitespace) or after the keyword AS. Correlation names must satisfy the same naming conventions as table names (except that they cannot be delimited by quotation marks). The correlation name qualifies columns that appear in either SELECT or WHERE clauses.
Apart from individual archived table names, the FROM clause can include explicit join syntax.
Explicit Joins
You can use the following explicit join types:
  • INNER JOIN
    Inner joins return matched rows from both tables.
  • LEFT [OUTER] JOIN
    Left outer joins include non-matching rows only from the table named before (that is, to the left of) the LEFT OUTER JOIN clause. Missing values in a row are filled with nulls.
  • RIGHT [OUTER] JOIN
    Right outer joins include non-matching rows only from the table named after (that is, to the right of) the RIGHT [OUTER] JOIN clause. Missing values in a row are filled with nulls.
  • FULL [OUTER] JOIN
    The result of the full outer join contains all matched as well as unmatched rows from both tables.
An explicit join is allowed in the FROM clause with the following restrictions:
  • a single explicit join tree can be specified (that is, multiple explicit joins, separated by commas, are not permitted in the query)
  • the explicit join must appear after all implicit joins (individual archived table names, separated by commas) in the FROM clause, if implicit joins are included in the query.
The FROM clause must therefore comprise
one
of the following:
  • one or more comma-delimited table names
  • a single explicit join tree
  • one or more comma-delimited table names, followed by a single explicit join tree
    For example, the following query constructions are syntactically legal: SELECT * FROM t1, t2, t3 WHERE ... ; SELECT * FROM t4 INNER JOIN t1 ON t4.c1 = t1.c1 AND t4.c2 != t1.c2 AND t1.c3 > t4.c3 WHERE ... ; SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1 FULL OUTER JOIN t3 ON t2.c2 != t3.c2 WHERE ... ; SELECT * FROM t1, t2, t7, (t3 INNER JOIN t4 ON t3.c2 = t4.c2) FULL OUTER JOIN (t5 INNER JOIN t6 ON t5.col1 != t6.c1) ON t3.c3 < t6.c3 WHERE ... ;
while the following are syntactically illegal queries:
-- implicit joins (t3, t4) follow explicit join: SELECT * FROM t1 INNER JOIN t2 ON t1.c1 > t2.c1, t3, t4 WHERE ... ; -- multiple join trees: SELECT * FROM t1 INNER JOIN t2 ON t1.c1 > t2.c1, t3 RIGHT JOIN t4 ON t3.c3 != t4.c3 WHERE ... ;
WHERE
The optional WHERE clause sets conditions that each record must meet before being retrieved by a query. If the SELECT statement does not include a WHERE clause, every row in the table being queried will appear in the result.
In the WHERE clause, a
Boolean value expression
defines a selection criterion (or criteria) applied to each row of data values in the table being queried by the SELECT statement. If the data in a particular record matches the selection criteria, then that record is retrieved by the query; otherwise the record is passed over.
Consult the next section for more information about the possible Boolean value expressions.
GROUP BY
The optional GROUP BY clause allows query results from aggregate functions on columns to be grouped in terms of the distinct values appearing in specified columns. The GROUP BY clause collects the rows that meet the specified search criteria into groups containing common values in the specified columns. There must be at least one expression in the SELECT column list that represents a group sharing a single value.
The grouping-column argument(s) must consist of a column name (or names) from among those specified in the column list following the SELECT command keyword. Multiple grouping-column arguments may be specified, separated by commas. A direct column reference need not correspond to a column in the projection list; the query output can be grouped by any column from the archived table.
An alternative to specifying a column by name is to use a
positive integer
to identify the column by its position in the projection list of the SELECT statement. That is, the first item in the projection list is
1
, the second is
2
, and so on. One restriction, however, is that a position number cannot refer to a constant, or an expression composed entirely of constants, in the projection list.
Additionally, grouping can be done by specifying a correlation name from the projection list, which is useful for grouping by a named value expression. For instance, if the SELECT list contains the item
"col1*col2 AS derived_col"
, the GROUP BY clause would reference
derived_col
to group by that derived column.
Grouping can also be based on a value expression from the projection list by including the same expression in the GROUP BY clause. The value expression in the GROUP BY clause must match the one in the projection list exactly (excluding whitespace differences and extraneous parentheses), otherwise an error will be returned.
A HAVING clause can be applied to qualify further the results of a GROUP BY clause.
HAVING
The HAVING clause further qualifies groups in the same way that the WHERE clause qualifies rows. Groups of rows from the tables referenced in the FROM clause that satisfy all the selection criteria arguments are used in composing the query result. Note that a GROUP BY clause must appear in the SELECT statement if there is a HAVING clause.
A HAVING clause may contain Boolean value expressions, simply or compounded together using AND or OR operators. Each Boolean predicate value expression must consist of a grouped column (that is, a column referenced in the GROUP BY clause), an aggregate function, a constant, or some arithmetic combination of grouped columns, aggregate functions, and constants. To optimize query execution speeds, avoid using compound expressions in the HAVING clause and in the projection list; it is best to specify columns only.
Subqueries may not appear in the HAVING clause.
WITH SAMPLE OF
The WITH SAMPLE OF clause returns a random sampling of the result set produced by the SELECT statement. That is, if the same SELECT...WITH SAMPLE OF statement is issued multiple times, a different subset of the total result set is returned to the caller each time.
The number of rows in the returned sample is proportional to the total number of rows in the result set. The "
number
IN
factor
" part of the clause indicates the proportion of returned rows to total rows, which is calculated as
number
:
factor
. For example, the following SELECT statement returns approximately a 10% subset, (a 1:10 proportion) of the entire result set:
SELECT p_no FROM products WITH SAMPLE OF 1 IN 10;
If 100 rows were contained in the result set, then about 10 of those rows, determined randomly, would be returned to the caller.
The syntax rules for
number
and
factor
are as follows:
  • number
    and
    factor
    must be integers greater than zero.
  • number
    must be less than or equal to
    factor
    .
If
number
is equal to
factor
, then no sampling is performed: the entire result set is returned. If more than zero rows are selected but the sample size is calculated as zero, then one row will be randomly chosen. The sampling logic will never return zero rows, unless the result set itself contains zero rows.
ORDER BY
The optional ORDER BY clause sorts the rows returned by the query according to the values in the specified columns, in the order in which they are specified. That is, results are sorted according to the values in the first column argument first; the values in the second column argument next, and so on. The column argument can be a column from among those returned by the query (that is, one of the select list expressions), or it can be any column from the archived table (regardless of whether the column is included in the select list). Multiple column arguments may be specified, separated by commas. Results can be sorted in ascending order (the default) or descending order by specifying the ASC or DESC keywords, respectively.
As an alternative to specifying a column by name, a
positive integer
may be used to identify the column by its position in the projection list of the SELECT statement. That is, the first item in the projection list is
1
, the second is
2
, and so on. In the case of SELECT *, the positive integer refers to the ordinal position of a column in the archived table.
Ordering can also be specified using a correlation name from the projection list, which is useful for ordering results by derived column.
Another way to sort by a value expression from the projection list is to include the same expression in the ORDER BY clause. The value expression in the ORDER BY clause must match the one in the projection list exactly (excluding whitespace differences and extraneous parentheses), otherwise an error will be returned.
EXPORT INTO
The "Direct Export" query option is a method of exporting data from an archived table to one or more flat files. By including the EXPORT INTO clause at the end of a standard SELECT statement, the data fetched by the query will be written to the specified files rather than displayed at the console. The Direct Export method is highly scalable, so increasing the number of computers with active Data Vault Agent processes should decrease the total amount of time required for the export operation. The key principle is to maximize the number of Data Vault Agents writing data to separate flat files in parallel with minimal hardware contention.
As the SELECT...EXPORT INTO command is executing, the fetched data will not be displayed at the console. Instead, as each Data Vault data file is processed, the data file name will appear on the screen. By default, this output is buffered by the client, so the data file names will actually be written in clusters sporadically during the export operation. To have the data file names displayed as they are processed, issue the following command prior to the SELECT...EXPORT INTO command:
ALTER SESSION SET FETCHROWS=1;
This does not affect the export operation performance in any way, only the presentation of information on the client side. (To restore the default data display buffering, execute the same command with FETCHROWS=0.)
Note that if the export operation fails before completion, the flat files written up to that point will most likely be incomplete.

0 COMMENTS

We’d like to hear from you!