Use standard relational database indexing principles to determine your indexing strategy.
Consider the following rules and guidelines for data indexing:
Index table columns that you frequently use in queries. The Data Vault uses the indexes to increase the performance of query WHERE clauses. The optimizer does not consider indexes for multiple table joins.
Consider the partition key of the data files that you want to query. Indexes work best on columns that you did not use as the original partition key.
Consider the query operation. By default, the system uses a range index when you run queries. Index columns if the queries use operations other than range operations. For example, exact comparison operations such as equal to, not equal to, in, or not in. The indexes that you create supplement the default range index.
Consider the cardinality of the table column against the total number of rows in the table. Index table columns that have a large amount of unique values such as transaction IDs, customer IDs, and phone numbers. Indexes on table columns with low cardinality, such as gender, age, and country, do not significantly increase the query performance.
Consider the expected size of the query result set. Index columns if you expect a small result set. If the query result set selects a large amount of rows in a table, then the indexes do not significantly increase the query performance.
Consider query performance versus database space requirements. When you create an index, the query performance might be drastically reduced. However, the overall database size increases. It might take while to create the index. The time it takes to create an index decreases with the number of Data Vault agents that you run. Balance the needs of query performance with the database space requirements and the time it takes to create the index.