Big Data Management

This section describes new big data features in version 10.1.1 Update 2.
Truncate Hive table partitions on mappings that use the Blaze run-time engine
Effective in version 10.1.1 Update 2, you can truncate Hive table partitions on mappings that use the Blaze run-time engine.
For more information about truncating partitions in a Hive target, see the Informatica 10.1.1 Update 2 Big Data Management User Guide.
Filters for partitioned columns on the Blaze engine
Effective in version 10.1.1 Update 2, the Blaze engine can push filters on partitioned columns down to the Hive source to increase performance.
When a mapping contains a Filter transformation on a partitioned column of a Hive source, the Blaze engine reads only the partitions with data that satisfies the filter condition. To enable the Blaze engine to read specific partitions, the Filter transformation must be the next transformation after the source in the mapping.
For more information, see the Informatica 10.1.1 Update 2 Big Data Management User Guide.
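The effect of reading only the matching partitions can be illustrated with a minimal Python sketch. It is not how the Blaze engine is implemented. The `PARTITIONS` dictionary, the `year` partition column, and the `read_with_pruning` function are hypothetical stand-ins for a partitioned Hive source and a filter on its partitioned column.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for a Hive table partitioned by "year".
# Keys are partition values; values are the rows stored in that partition.
PARTITIONS: Dict[int, List[dict]] = {
    2015: [{"id": 1, "year": 2015}, {"id": 2, "year": 2015}],
    2016: [{"id": 3, "year": 2016}],
    2017: [{"id": 4, "year": 2017}, {"id": 5, "year": 2017}],
}


def read_with_pruning(
    partitions: Dict[int, List[dict]],
    partition_filter: Callable[[int], bool],
) -> Tuple[List[int], List[dict]]:
    """Read only the partitions whose key satisfies the filter.

    Returns the list of partitions that were scanned and the rows read.
    """
    scanned: List[int] = []
    rows: List[dict] = []
    for key, part_rows in partitions.items():
        if partition_filter(key):
            # Partitions that fail the filter are skipped entirely,
            # so their rows are never read.
            scanned.append(key)
            rows.extend(part_rows)
    return scanned, rows


scanned, rows = read_with_pruning(PARTITIONS, lambda year: year >= 2016)
print(scanned)    # [2016, 2017]
print(len(rows))  # 3
```

In this sketch the 2015 partition is never touched, which is the source of the performance gain when a filter on a partitioned column is applied at the source.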
OraOop support on the Spark engine
Effective in version 10.1.1 Update 2, you can configure OraOop to run Sqoop mappings on the Spark engine. When you read data from or write data to Oracle, you can configure the direct argument to enable Sqoop to use OraOop.
OraOop is a specialized Sqoop plug-in for Oracle that uses native protocols to connect to the Oracle database. Configuring OraOop improves the performance of Sqoop mappings that read from or write to Oracle.
For more information, see the Informatica 10.1.1 Update 2 Big Data Management User Guide.
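As a rough illustration of what the direct argument looks like at the Sqoop level, the following Python sketch assembles a Sqoop import argument list with and without `--direct`. The `--connect`, `--table`, and `--direct` arguments are standard Sqoop arguments. The `build_sqoop_import_args` helper, the JDBC URL, and the table name are hypothetical, and the sketch does not show how Informatica itself passes the argument to Sqoop.

```python
from typing import List


def build_sqoop_import_args(jdbc_url: str, table: str, use_direct: bool) -> List[str]:
    """Assemble a Sqoop import argument list (illustrative helper only)."""
    args = ["sqoop", "import", "--connect", jdbc_url, "--table", table]
    if use_direct:
        # --direct asks Sqoop to use a database-specific direct connector,
        # which for Oracle is OraOop.
        args.append("--direct")
    return args


# Hypothetical connection details, used only for illustration.
args = build_sqoop_import_args(
    "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL",
    "SALES.ORDERS",
    use_direct=True,
)
print(" ".join(args))
```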
Sqoop support for native Teradata mappings on Cloudera clusters
Effective in version 10.1.1 Update 2, if you use a Teradata PT connection to run a mapping on a Cloudera cluster and on the Blaze engine, the Data Integration Service invokes the Cloudera Connector Powered by Teradata at run time. The Data Integration Service then runs the mapping through Sqoop.
For more information, see the Informatica 10.1.1 Update 2 PowerExchange for Teradata Parallel Transporter API User Guide.
Scheduler support on Blaze and Spark engines
Effective in version 10.1.1 Update 2, the following schedulers are valid for Hadoop distributions on both Blaze and Spark engines:
  • Fair Scheduler. Assigns resources to jobs such that all jobs receive, on average, an equal share of resources over time.
  • Capacity Scheduler. Designed to run Hadoop applications on a shared, multi-tenant cluster. You can configure Capacity Scheduler with or without node labeling. Node labels group nodes with similar characteristics.
For more information, see the Mappings in the Hadoop Environment chapter of the Informatica 10.1.1 Update 2 Big Data Management User Guide.
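The idea behind fair sharing can be sketched in Python as a max-min fair split of a single resource. This is a simplified illustration, not the YARN Fair Scheduler algorithm, which also handles queues, weights, and multiple resource types. The `fair_shares` function and the job names and demands are hypothetical.

```python
from typing import Dict


def fair_shares(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair split of one resource among jobs with given demands.

    Each round divides the unallocated capacity equally among jobs that
    still have unmet demand. A job never receives more than it asked for,
    and whatever it leaves over is redistributed in the next round.
    """
    allocation = {job: 0.0 for job in demands}
    remaining = dict(demands)  # unmet demand per job
    left = capacity            # capacity not yet allocated
    eps = 1e-9                 # tolerance for floating-point comparisons

    while remaining and left > eps:
        share = left / len(remaining)
        for job in list(remaining):
            grant = min(share, remaining[job])
            allocation[job] += grant
            remaining[job] -= grant
            left -= grant
            if remaining[job] <= eps:
                del remaining[job]  # job is fully satisfied
    return allocation


# Hypothetical jobs competing for 100 units of a resource.
shares = fair_shares(100, {"job_a": 20, "job_b": 50, "job_c": 70})
print({job: round(value, 2) for job, value in shares.items()})
```

With these inputs, `job_a` receives its full demand of 20 units, and the remaining 80 units are split evenly between `job_b` and `job_c`.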
Support for YARN queues on Blaze and Spark engines
Effective in version 10.1.1 Update 2, you can direct Blaze and Spark jobs to a specific YARN scheduler queue. Queues allow multiple tenants to share the cluster. As you submit applications to YARN, the scheduler assigns them to a queue. You configure the YARN queue in the Hadoop connection properties.
For more information, see the Mappings in the Hadoop Environment chapter of the Informatica 10.1.1 Update 2 Big Data Management User Guide.
Hadoop security features on IBM BigInsights 4.2
Effective in version 10.1.1 Update 2, you can use the following Hadoop security features on the IBM BigInsights 4.2 Hadoop distribution:
  • Apache Knox
  • Apache Ranger
  • HDFS Transparent Encryption
For more information, see the Informatica 10.1.1 Update 2 Big Data Management Security Guide.
SSL/TLS security modes
Effective in version 10.1.1 Update 2, you can use the SSL and TLS security modes on the Cloudera and Hortonworks Hadoop distributions, including the following security methods and plugins:
  • Kerberos authentication
  • Apache Ranger
  • Apache Sentry
  • Name node high availability
  • Resource Manager high availability
For more information, see the Informatica 10.1.1 Update 2 Big Data Management Installation and Configuration Guide.
Hive sources and targets on Amazon S3
Effective in version 10.1.1 Update 2, Big Data Management supports reading from and writing to Hive on Amazon S3 buckets for clusters configured with the following Hadoop distributions:
  • Amazon EMR
  • Cloudera
  • Hortonworks
  • MapR
  • BigInsights
For more information, see the Informatica 10.1.1 Update 2 Big Data Management User Guide.


Updated August 28, 2020