Read from and Write to Big Data Sources and Targets
In addition to relational and flat file data, you can access unstructured and semi-structured data, social media data, and data in a Hive or Hadoop Distributed File System (HDFS) environment.
You can access the following types of data:
Transaction data
You can access different types of transaction data, including data from relational database management systems, online transaction processing systems, online analytical processing systems, enterprise resource planning systems, customer relationship management systems, mainframe systems, and cloud systems.
Unstructured and semi-structured data
You can use data objects with an intelligent structure model, or Data Processor transformations, to read and transform unstructured and semi-structured data.
You can use data objects with an intelligent structure model to read and transform unstructured and semi-structured data on a Spark engine. For example, you can use a complex file data object with an intelligent structure model in a mapping to parse a Microsoft Excel file and load accounting data into S3 storage buckets. The intelligent structure model is generated automatically from a representative file, and you can update or customize it. For more information, see Processing Unstructured and Semi-structured Data with Intelligent Structure Model Overview.
Alternatively, you can use the Data Processor transformation in a workflow to parse unstructured and semi-structured data. For example, you can parse a Microsoft Excel file to load customer and order data into relational database tables. Data Processor transformations have broad functionality and format support, but require manual setup. For more information, see the Data Transformation User Guide.
You can use HParser with a Data Transformation service to transform complex data into flattened, usable formats for Hive, Pig, and MapReduce processing. HParser processes complex files, such as messaging formats, HTML pages, and PDF documents. HParser also transforms formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT. For more information, see the Data Transformation HParser Operator Guide.
Social media data
You can use PowerExchange® adapters for social media to read data from social media web sites such as Facebook, Twitter, and LinkedIn. You can also use PowerExchange for DataSift to extract real-time data from different social media web sites and to capture sentiment and language analysis data from DataSift. You can use PowerExchange for Web Content-Kapow to extract data from any web site.
Data in Hadoop
You can use PowerExchange adapters to read data from or write data to Hadoop. For example, you can use PowerExchange for Hive to read data from or write data to Hive. You can use PowerExchange for HDFS to extract data from and load data to HDFS. Also, you can use PowerExchange for HBase to extract data from and load data to HBase.
Data in Amazon Web Services
You can use PowerExchange adapters to read data from or write data to Amazon Web Services. For example, you can use PowerExchange for Amazon Redshift to read data from or write data to Amazon Redshift. Also, you can use PowerExchange for Amazon S3 to extract data from and load data to Amazon S3.
For more information about PowerExchange adapters, see the related PowerExchange adapter guides.