Use the following rules and guidelines when you update the search query:
The query that you enter must use valid Cloudera Navigator search syntax.
For information about Cloudera metadata search syntax, see the Cloudera documentation. Metadata Manager does not validate the search syntax automatically.
To validate the search syntax, click Test Connection. If the search query is not valid, an error message appears. You can also run the search in Cloudera Navigator before you update the search query in Metadata Manager.
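One way to try a query in Cloudera Navigator before you update it in Metadata Manager is to send it to the Navigator entities search endpoint. The following is a minimal sketch that only builds the request URL. The host name, port 7187, and API version v9 are assumptions that depend on your installation, and the query is the sample string used later in this section.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def navigator_search_url(base_url, query, api_version="v9"):
    """Build a URL for the Navigator entities search endpoint.

    base_url and api_version are assumptions; check your Navigator
    installation for the actual host, port, and supported API version.
    """
    params = urlencode({"query": query})  # percent-encodes spaces, ':' and '*'
    return f"{base_url.rstrip('/')}/api/{api_version}/entities/?{params}"

# Hypothetical host; the query is a sample string only.
url = navigator_search_url("http://navigator.example.com:7187", "NOT (fileSystemPath:*)")

# Round-trip check: decoding the URL gives back the original query text.
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
```

You can open the resulting URL in a browser or pass it to an HTTP client with your Navigator credentials to review what the query returns.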
Before you exclude an HDFS directory, verify that no files in the directory or its subdirectories are used in the data flow.
If you exclude any HDFS entity that is used in a data flow, lineage links can break. For example, your Cloudera distribution contains a Pig job template that writes temporary files to directory /tmp, and the temporary files are used as inputs for another Pig job template. When you run data lineage, Metadata Manager shows lineage links between the upstream Pig job template and the temporary files and between the temporary files and the downstream Pig job template. If you exclude directory /tmp from the metadata load, Metadata Manager shows no lineage links between the two Pig job templates.
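The /tmp example can be pictured as a small graph in which the temporary file is the only connection between the two Pig job templates. The following is a toy model, not Metadata Manager code. The node names and the helper functions are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: upstream Pig job -> temporary file -> downstream Pig job.
EDGES = [
    ("pig_job_upstream", "/tmp/part-00000"),
    ("/tmp/part-00000", "pig_job_downstream"),
]

def is_excluded(node, excluded_dirs):
    """Return True if the node is an excluded directory or lies under one."""
    return any(node == d or node.startswith(d.rstrip("/") + "/") for d in excluded_dirs)

def has_lineage(edges, source, target, excluded_dirs=()):
    """Breadth-first search from source to target, skipping excluded entities."""
    graph = {}
    for src, dst in edges:
        if is_excluded(src, excluded_dirs) or is_excluded(dst, excluded_dirs):
            continue  # an excluded entity removes every link that touches it
        graph.setdefault(src, []).append(dst)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

linked_before = has_lineage(EDGES, "pig_job_upstream", "pig_job_downstream")
linked_after = has_lineage(EDGES, "pig_job_upstream", "pig_job_downstream", excluded_dirs=["/tmp"])
```

In this model, `linked_before` is true and `linked_after` is false, because no path between the two jobs remains once the files under /tmp are excluded.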
Purge the resource after you update the search query.
To extract metadata from a Cloudera Hadoop cluster, the Metadata Manager Service creates temporary files on the machine where the Metadata Manager Service runs. The Metadata Manager Service uses the temporary files to create the IME files that extract metadata from the Hadoop cluster. These files remain on the server until you purge the resource.
The contents of the temporary files vary based on the search query. If you do not purge the resource after you change the search query, Metadata Manager adds the search results from the new query to the temporary files but does not delete the contents from the previous query. This can cause unpredictable search results, especially when the new search query extracts fewer objects than the previous query.
For example, you update the default search query to exclude HDFS directory /user/test. If you do not purge the resource, the temporary files related to the default query remain on the server. Metadata Manager still extracts entities from /user/test because the default query did not exclude this directory from the metadata load.
To delete the temporary files, purge the resource. The next time you load the resource, Metadata Manager creates new temporary files and extracts metadata based only on the new search query.
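The effect of skipping the purge can be illustrated with a simplified model in which the temporary files behave like a set that each load adds to. This is a sketch of the behavior described above, not the actual file format or implementation. The function names and file paths are hypothetical.

```python
def load_resource(temp_store, query_results):
    """Add the current query's results to the store and return what would be extracted."""
    temp_store.update(query_results)  # earlier contents are kept, not replaced
    return set(temp_store)

def purge_resource(temp_store):
    """Delete everything left over from previous loads."""
    temp_store.clear()

# Hypothetical results: the default query includes /user/test, the new query excludes it.
default_results = {"/user/test/a.csv", "/user/prod/b.csv"}
new_results = {"/user/prod/b.csv"}

temp_store = set()
load_resource(temp_store, default_results)

# Load with the new query without purging: the /user/test entry is still present.
stale = load_resource(temp_store, new_results)

# Purge, then load again: only the results of the new query remain.
purge_resource(temp_store)
clean = load_resource(temp_store, new_results)
```

Here `stale` still contains the file under /user/test, while `clean` contains only the results of the new query.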
You can include all entities in the metadata load.
To include all entities in the metadata load, replace the default search query with an asterisk (*) or delete the default query and leave the Search query property blank.
Do not exclude all HDFS entities by entering a wildcard character for the file system path.
If you try to exclude all HDFS entities by entering a wildcard character for the file system path, Metadata Manager excludes every entity that has the fileSystemPath property, not only HDFS entities.
For example, you enter the following search query to exclude all HDFS entities:
NOT (fileSystemPath:*)
If you enter this query, Metadata Manager excludes all HDFS entities. However, because Hive tables, Hive partitions, and Pig tables have the fileSystemPath property, Metadata Manager also excludes these entity types.
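The side effect of the wildcard query can be sketched by filtering a few sample entities on whether they have the fileSystemPath property. The entity records and the `kind` labels below are hypothetical and greatly simplified. The filter only emulates the outcome described above and is not Cloudera Navigator search logic.

```python
# Hypothetical, simplified entity records. Only fileSystemPath comes from this section.
ENTITIES = [
    {"name": "sales.csv", "kind": "HDFS file", "fileSystemPath": "/data/sales.csv"},
    {"name": "sales", "kind": "Hive table", "fileSystemPath": "/user/hive/warehouse/sales"},
    {"name": "load_sales", "kind": "Pig job template"},  # no fileSystemPath property
]

def exclude_any_path(entities):
    """Emulate NOT (fileSystemPath:*): drop every entity that has the property."""
    return [e for e in entities if "fileSystemPath" not in e]

remaining = [e["name"] for e in exclude_any_path(ENTITIES)]
```

In this sample, only `load_sales` remains. The Hive table is dropped together with the HDFS file because both records carry a fileSystemPath value.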