Table of Contents

Search

  1. Preface
  2. Introduction to Informatica Data Engineering Integration
  3. Mappings
  4. Mapping Optimization
  5. Sources
  6. Targets
  7. Transformations
  8. Python Transformation
  9. Data Preview
  10. Cluster Workflows
  11. Profiles
  12. Monitoring
  13. Hierarchical Data Processing
  14. Hierarchical Data Processing Configuration
  15. Hierarchical Data Processing with Schema Changes
  16. Intelligent Structure Models
  17. Blockchain
  18. Stateful Computing
  19. Appendix A: Connections Reference
  20. Appendix B: Data Type Reference
  21. Appendix C: Function Reference

Hadoop Utilities

Hadoop Utilities

Data Engineering Integration uses third-party Hadoop utilities such as Sqoop to process data efficiently.
Sqoop is a Hadoop command line program to process data between relational databases and HDFS through MapReduce programs. You can use Sqoop to import and export data. When you use Sqoop, you do not need to install the relational database client and software on any node in the Hadoop cluster.
To use Sqoop, you must configure Sqoop properties in a JDBC connection and run the mapping in the Hadoop environment. You can configure Sqoop connectivity for relational data objects, customized data objects, and logical data objects that are based on a JDBC-compliant database. For example, you can configure Sqoop connectivity for the following databases:
  • Aurora
  • Greenplum
  • IBM DB2
  • IBM DB2 for z/OS
  • Microsoft SQL Server
  • Netezza
  • Oracle
  • Teradata
  • Vertica
The Model Repository Service uses JDBC to import metadata. The Data Integration Service runs the mapping in the Hadoop run-time environment and pushes the job processing to Sqoop. Sqoop then creates map-reduce jobs in the Hadoop cluster, which perform the import and export job in parallel.

Specialized Sqoop Connectors

When you run mappings through Sqoop, you can use the following specialized connectors:
OraOop
You can use OraOop with Sqoop to optimize performance when you read data from or write data to Oracle. OraOop is a specialized Sqoop plug-in for Oracle that uses native protocols to connect to the Oracle database.
You can configure OraOop when you run Sqoop mappings on the Spark engine.
Teradata Connector for Hadoop (TDCH) Specialized Connectors for Sqoop
You can use the following TDCH specialized connectors for Sqoop to read data from or write data to Teradata:
  • Cloudera Connector Powered by Teradata
  • Hortonworks Connector for Teradata (powered by the Teradata Connector for Hadoop)
  • MapR Connector for Teradata
These connectors are specialized Sqoop plug-ins that Cloudera, Hortonworks, and MapR provide for Teradata. They use native protocols to connect to the Teradata database.
Informatica supports Cloudera Connector Powered by Teradata and Hortonworks Connector for Teradata on the Blaze and Spark engines. When you run Sqoop mappings on the Blaze engine, you must configure these connectors. When you run Sqoop mappings on the Spark engine, the Data Integration Service invokes these connectors by default.
Informatica supports MapR Connector for Teradata on the Spark engine. When you run Sqoop mappings on the Spark engine, the Data Integration Service invokes the connector by default.
For information about running native Teradata mappings with Sqoop, see the
Informatica PowerExchange for Teradata Parallel Transporter API User Guide
.


Updated September 28, 2020