Table of Contents

  1. About the Enterprise Data Preparation Administrator Guide
  2. Introduction to Enterprise Data Preparation Administration
  3. Getting Started
  4. Administration Process
  5. User Account Setup
  6. Search Configuration
  7. Roles, Privileges, and Profiles
  8. Data Asset Access and Publication Management
  9. Masking Sensitive Data
  10. Monitoring Enterprise Data Preparation
  11. Backing Up and Restoring Enterprise Data Preparation
  12. Managing the Data Lake
  13. Schedule Export, Import and Publish Activities
  14. Interactive Data Preparation Service
  15. Enterprise Data Preparation Service

Enterprise Data Preparation Administrator Guide

Create Catalog Resources

Use Informatica Catalog Administrator to create Hive and HDFS resources in Enterprise Data Catalog. A resource represents a data source from which scanners extract metadata for use in the data lake, or a data lake target to which users upload and publish data assets. Scanners attached to a resource extract metadata from the resource and store the metadata in Enterprise Data Catalog.

You must create an HDFS resource for each HDFS location in the data lake into which Enterprise Data Preparation users import, upload, or publish assets.

For more information about creating resources and scanners, see "Creating a Resource" in the Informatica Catalog Administrator Guide.
  1. Create a Hive resource that Enterprise Data Catalog uses to extract metadata from the Hive tables in the data lake, and that users access to upload or publish data to Hive. Configure the Hive resource with the following settings:
    • In the URL property on the General Connection Properties panel, specify the fully qualified domain name (FQDN) of the Hive server in the JDBC connection URL (see the example URL after these steps).
    • If you are using operating system profiles, the Hive user name that you specify as the value for the User property must be a Hive superuser. For more information about operating system profiles, see Using Operating System Profiles.
    • Import the relevant connectors to extract metadata from Hive sources.
    For more information about Hive scanner properties, see "Hive Resource Prerequisites and Connection Properties" in the Informatica Catalog Administrator Guide.
  2. Create an HDFS resource for each HDFS location that users access to upload or publish data in the data lake.
    Select the Recursive Scan property for each HDFS resource that users can access to publish data to the data lake. If the property is not selected, an error occurs when a user publishes data. To confirm that each location is accessible before you create the resource, see the verification example after these steps.
    For more information about HDFS resource properties, see "HDFS Resource Connection Properties" in the Informatica Catalog Administrator Guide.
  3. Run a scan on the resources to load metadata into the catalog.
  4. Create schedules for the resources so that Enterprise Data Catalog regularly scans the resources. As a best practice, schedule the resource scans to run during non-business hours.
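
The URL property in step 1 takes a standard HiveServer2 JDBC connection URL that includes the FQDN of the Hive server. The URL below is a representative example only: the host name, port, and database are placeholders, and your cluster might require additional parameters such as a Kerberos principal.

  jdbc:hive2://hiveserver.example.com:10000/default

To confirm that the URL resolves and accepts connections before you configure the resource, you can run Beeline from a cluster node. The user name shown is a placeholder for the Hive superuser account:

  beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" -n <hive_superuser> -e "show databases;"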
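
Similarly, before you create each HDFS resource in step 2, you can verify that the HDFS location exists and is readable. The path below is a hypothetical example of a data lake publish location; substitute the locations that Enterprise Data Preparation users actually access:

  hdfs dfs -ls /datalake/publish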

Tools to complete this step:

  • Informatica Catalog Administrator