Hadoop Utilities

Big Data Management uses third-party Hadoop utilities such as Sqoop to process data efficiently.
Sqoop is a Hadoop command-line program that transfers data between relational databases and HDFS through MapReduce programs. You can use Sqoop to import data from a relational database into HDFS and to export data from HDFS to a relational database. When you use Sqoop, you do not need to install the relational database client and software on any node in the Hadoop cluster.
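For orientation, the standalone Sqoop command lines for an import and an export look roughly like the following. This is a minimal Python sketch that only assembles the argument lists. It is not what the Data Integration Service literally generates. The flags (--connect, --username, --password-file, --table, --target-dir, --export-dir, --num-mappers, --split-by) are standard Sqoop 1 options. The host, user, password file, table, and directory names are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def sqoop_import_args(jdbc_url: str, username: str, password_file: str,
                      table: str, target_dir: str, num_mappers: int = 4,
                      split_by: Optional[str] = None) -> List[str]:
    """Assemble a `sqoop import` command: relational table -> HDFS directory."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,         # HDFS directory that receives the files
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if split_by:
        args += ["--split-by", split_by]    # column used to divide rows among mappers
    return args


def sqoop_export_args(jdbc_url: str, username: str, password_file: str,
                      table: str, export_dir: str, num_mappers: int = 4) -> List[str]:
    """Assemble a `sqoop export` command: HDFS directory -> relational table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,         # HDFS directory whose files are written out
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical Oracle database; host, user, paths, and tables are placeholders.
url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
print(shlex.join(sqoop_import_args(url, "scott", "/user/scott/.password", "EMP",
                                   "/data/emp", split_by="EMPNO")))
print(shlex.join(sqoop_export_args(url, "scott", "/user/scott/.password", "EMP_COPY",
                                   "/data/emp")))
```

The import and export commands are mirror images. An import names an HDFS --target-dir to write into. An export names an HDFS --export-dir to read from.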
To use Sqoop, you must configure Sqoop properties in a JDBC connection and run the mapping in the Hadoop environment. You can configure Sqoop connectivity for relational data objects, customized data objects, and logical data objects that are based on a JDBC-compliant database. For example, you can configure Sqoop connectivity for the following databases:
  • Aurora
  • Greenplum
  • IBM DB2
  • IBM DB2 for z/OS
  • Microsoft SQL Server
  • Netezza
  • Oracle
  • Teradata
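The two preconditions stated earlier are that Sqoop must be configured on the JDBC connection and that the mapping must run in the Hadoop environment. The following hypothetical sketch models them. The class and field names, including the free-form sqoop_arguments string, are assumptions for illustration and may not match the property names shown in the Informatica tools.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class JdbcConnection:
    """Hypothetical model of a JDBC connection that also carries Sqoop settings."""
    name: str
    connection_string: str     # JDBC URL of the source or target database
    use_sqoop: bool = False    # assumed flag: Sqoop connectivity is configured
    sqoop_arguments: str = ""  # assumed free-form argument string passed to Sqoop


def sqoop_arguments_for_run(conn: JdbcConnection, environment: str) -> List[str]:
    """Return the Sqoop arguments that apply to a run, or [] if Sqoop is not used.

    Encodes the two preconditions from the text: Sqoop must be configured on the
    JDBC connection, and the mapping must run in the Hadoop environment.
    """
    if not conn.use_sqoop or environment != "Hadoop":
        return []
    # Tokenize the way a shell would, so quoted values stay intact.
    return shlex.split(conn.sqoop_arguments)


conn = JdbcConnection("ora_jdbc", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
                      use_sqoop=True, sqoop_arguments="-m 4 --split-by EMPNO")
print(sqoop_arguments_for_run(conn, "Hadoop"))  # ['-m', '4', '--split-by', 'EMPNO']
print(sqoop_arguments_for_run(conn, "Native"))  # []
```

Here -m is Sqoop's short form of --num-mappers. The same connection yields no Sqoop arguments when the mapping runs in the native environment.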
The Model Repository Service uses JDBC to import metadata. The Data Integration Service runs the mapping in the Hadoop run-time environment and pushes the job processing to Sqoop. Sqoop then creates MapReduce jobs in the Hadoop cluster, and these jobs perform the import and export operations in parallel.
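The parallelism comes from Sqoop dividing the source table into ranges on a split-by column and assigning one range to each map task. The sketch below is a simplified, assumed model of that partitioning for an integer key. It splits an inclusive MIN/MAX range into near-equal slices and renders the WHERE predicate each mapper would apply. Sqoop's own splitter differs in detail, for example in how it computes boundaries and handles non-integer columns, so treat the output as illustrative. The table and column names are hypothetical.

```python
from typing import List, Tuple


def integer_splits(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into near-equal contiguous slices.

    Simplified model: Sqoop first finds MIN and MAX of the split-by column,
    then gives each map task one slice of that range.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        return []                      # no keys in range: nothing to split
    span = hi - lo + 1
    n = min(num_mappers, span)         # avoid slices that contain no keys
    size, extra = divmod(span, n)      # the first `extra` slices get one more key
    splits, start = [], lo
    for i in range(n):
        end = start + size + (1 if i < extra else 0) - 1
        splits.append((start, end))
        start = end + 1
    return splits


def where_clauses(column: str, splits: List[Tuple[int, int]]) -> List[str]:
    """Render the bounding predicate each mapper would add to its query."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in splits]


# Hypothetical table whose EMPNO values run from 1 to 10, imported with 4 mappers.
for clause in where_clauses("EMPNO", integer_splits(1, 10, 4)):
    print(clause)
```

With these inputs the four slices cover keys 1 to 3, 4 to 6, 7 to 8, and 9 to 10. Each map task issues its own SELECT with one of these predicates, so the four reads can run against the database at the same time.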


Updated July 03, 2018