Update the default search query to exclude specific entities from the metadata load.
Example 1: Excluding HDFS Entities in a Specific Directory
Your Cloudera distribution contains a temporary user named "test." When you view the HDFS directory /user/test in Cloudera Navigator, you see that all of the files owned by the test user are written to the directory /user/test/.Trash. Therefore, you do not want Metadata Manager to extract HDFS entities in the /user/test directory or its subdirectories.
To prevent Metadata Manager from extracting the entities, append the /user/test file path to the search query as follows:
NOT ((fileSystemPath:*\/.cloudera_manager_hive_metastore_canary*) OR (fileSystemPath:\/hbase\/oldWALs*) OR (fileSystemPath:\/hbase\/WALs*) OR (fileSystemPath:\/tmp\/logs*) OR (fileSystemPath:\/user\/history\/done*) OR (fileSystemPath:\/tmp\/hive-cloudera*) OR (fileSystemPath:\/tmp\/hive-hive*) OR (fileSystemPath:*\/.Trash*) OR (fileSystemPath:*\/user\/test*))
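Because the query is long and every forward slash must be escaped, it can be easier to generate the string than to edit it by hand. The following Python sketch is only an illustration and is not part of Metadata Manager or Cloudera Navigator: the names DEFAULT_EXCLUDED_PATHS and build_path_exclusion are hypothetical, and the pattern list is copied from the query above. It assumes that escaping "/" as "\/" is the only transformation needed for these patterns.

```python
# Path patterns copied from the default query, written without escaping.
# DEFAULT_EXCLUDED_PATHS and build_path_exclusion are hypothetical helper
# names used for illustration only.
DEFAULT_EXCLUDED_PATHS = [
    "*/.cloudera_manager_hive_metastore_canary*",
    "/hbase/oldWALs*",
    "/hbase/WALs*",
    "/tmp/logs*",
    "/user/history/done*",
    "/tmp/hive-cloudera*",
    "/tmp/hive-hive*",
    "*/.Trash*",
]


def build_path_exclusion(patterns):
    """Return a NOT (...) clause that excludes each fileSystemPath pattern."""
    # Escape every forward slash as \/ and wrap each pattern in its own clause.
    clauses = ["(fileSystemPath:" + p.replace("/", "\\/") + ")" for p in patterns]
    return "NOT (" + " OR ".join(clauses) + ")"


# Append the /user/test pattern from Example 1 to the default patterns.
query = build_path_exclusion(DEFAULT_EXCLUDED_PATHS + ["*/user/test*"])
print(query)
```

The printed string can be pasted into the search query property. To exclude another directory, add its pattern to the list passed to build_path_exclusion.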
Example 2: Excluding Job Executions
To prevent Metadata Manager from loading YARN, Oozie, and MapReduce job executions and all Sqoop job templates and executions, update the default search query as follows:
NOT ((fileSystemPath:*\/.cloudera_manager_hive_metastore_canary*) OR (fileSystemPath:\/hbase\/oldWALs*) OR (fileSystemPath:\/hbase\/WALs*) OR (fileSystemPath:\/tmp\/logs*) OR (fileSystemPath:\/user\/history\/done*) OR (fileSystemPath:\/tmp\/hive-cloudera*) OR (fileSystemPath:\/tmp\/hive-hive*) OR (fileSystemPath:*\/.Trash*)) AND NOT (((sourceType:YARN OR sourceType:OOZIE OR sourceType:MAPREDUCE) AND type:OPERATION_EXECUTION) OR sourceType:SQOOP)
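The job-execution clause in Example 2 has a regular shape: some source types are excluded only when the entity is an operation execution, and others are excluded entirely. The following Python sketch shows one way to assemble that clause and join it to an existing path clause. It is an illustration only: build_job_exclusion is a hypothetical helper name, and the shortened path_clause value stands in for the full default path exclusion.

```python
def build_job_exclusion(execution_sources, excluded_sources):
    """Return a NOT (...) clause for job executions and fully excluded sources.

    execution_sources: source types excluded only for OPERATION_EXECUTION entities.
    excluded_sources:  source types excluded for every entity type.
    """
    executions = " OR ".join("sourceType:" + s for s in execution_sources)
    parts = ["((" + executions + ") AND type:OPERATION_EXECUTION)"]
    parts += ["sourceType:" + s for s in excluded_sources]
    return "NOT (" + " OR ".join(parts) + ")"


# Source types taken from Example 2.
job_clause = build_job_exclusion(["YARN", "OOZIE", "MAPREDUCE"], ["SQOOP"])

# Shortened stand-in for the default path exclusion (assumption for brevity).
path_clause = r"NOT ((fileSystemPath:*\/.Trash*))"

# Both exclusions must hold, so the clauses are joined with AND.
full_query = path_clause + " AND " + job_clause
print(full_query)
```

Joining the clauses with AND means an entity is loaded only if it matches neither the path exclusions nor the job exclusions.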