
Big Data Management User Guide


Step 1. Collect the Data

Identify the data sources from which you need to collect the data.

Big Data Management provides several ways to access data in and out of Hadoop, depending on the data types, data volumes, and data latencies involved.

You can use PowerExchange adapters to connect to multiple big data sources. You can schedule batch loads to move data from multiple source systems to HDFS without the need to stage the data. You can move changed data from relational and mainframe systems into HDFS or the Hive warehouse. For real-time data feeds, you can move data off message queues and into HDFS.

You can collect the following types of data:

Transactional

Interactive

Log file

Sensor device

Document and file

Industry format

