Informatica® Big Data Management 10.2.1 on Microsoft Azure: Architecture and Best Practices

Physical storage

The Big Data Management integration with HDInsight provides native, high-volume connectivity to each of the following storage types:
Azure Data Lake Storage (ADLS Gen1)
Azure Data Lake Storage provides massively scalable data storage optimized for Hadoop analytics engines. You can use ADLS to archive structured and unstructured data, and access it via Hive, Spark, or the native Informatica run-time engine.
General purpose storage v1 and v2
General purpose storage is available in version 1 (GPv1) and version 2 (GPv2). Both come in standard and premium versions. The standard version is backed by magnetic hard disk drives (HDDs). Only the standard version supports Hadoop. For more information about general purpose storage, see the Azure documentation.
Use this disk storage for data sources and targets.
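As an illustration of how sources and targets on these storage types are typically addressed from Hadoop tools, the following minimal sketch builds file system URIs. It assumes the standard Hadoop URI schemes for Azure (`adl://` for ADLS Gen1 and `wasb://` or `wasbs://` for blobs in a general purpose storage account). The function names, account names, container name, and paths are hypothetical and are not taken from the Informatica product.

```python
def adls_gen1_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a path in an ADLS Gen1 account (hypothetical helper)."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def blob_uri(account: str, container: str, path: str, secure: bool = True) -> str:
    """Build a wasb:// or wasbs:// URI for a blob in a general purpose storage account."""
    # wasbs uses TLS; wasb is the unencrypted variant of the same scheme.
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Example account, container, and path names are placeholders.
print(adls_gen1_uri("examplelake", "/archive/orders.csv"))
print(blob_uri("examplestore", "landing", "targets/orders.parquet"))
```

The helpers only format strings. They do not contact Azure or check that the account exists.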

Storage Type Features

The following table describes features of each storage type:
Storage Type: Storage (General purpose v1)
Description:
  • Does not have the latest features of Azure storage.
  • More expensive per-gigabyte pricing model.
  • Pricing is lower for transactions.
  • Oldest type of storage account.
  • Supports blobs, files, queues, and tables.
Accessibility: Accessible from all Azure storage services.

Storage Type: Storage v2 (General purpose v2)
Description:
  • Supports hot, cool, and archive storage.
  • Supports the lowest per-gigabyte pricing model.
  • When you create a new storage account, General Purpose v2 is the default option.
  • Recommended for source and target data.
  • Supports blobs, Azure files, messages, queues, and unmanaged disks (page blobs).
Accessibility: Accessible from all Azure storage services.

Storage Type: ADLS
Description:
  • Compatible with the Apache Hadoop Distributed File System (HDFS).
  • WebHDFS file system compatible.
  • No limits on account sizes, file sizes, or the amount of data that can be stored in a data lake.
  • Individual files can range from kilobytes to petabytes in size.
  • Performance-tuned for big data analytics.
  • Stores data of any type in its native format.
Accessibility: Accessible from all Azure storage services. You can access ADLS through the Azure API. An HDInsight cluster is not required. For more information, see the Azure documentation.

Blob storage accounts are also available, but they are not supported with HDInsight.

Storage Type Security

The following table describes security characteristics of each storage type:
Storage Type: Storage (General purpose v1)
Description: Uses Resource Manager Role-Based Access Control (RBAC) with storage account keys.

Storage Type: Storage v2 (General purpose v2)
Description: Uses Resource Manager Role-Based Access Control (RBAC) with storage account keys.

Storage Type: ADLS
Description:
  • POSIX-compliant fine-grained ACL support.
  • At-rest encryption.
    For more information, see the Azure documentation.
  • Azure Active Directory integration.
  • Storage account firewalls.
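To make the POSIX-style ACL model more concrete, here is a minimal sketch that parses ACL entries written in the conventional POSIX short text form (`scope:qualifier:permissions`, separated by commas). The helper names and the example users are hypothetical. The code does not call any Azure API, and it checks only named-user entries rather than implementing the full POSIX evaluation order (owner, mask, groups, other).

```python
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    scope: str      # "user", "group", "mask", or "other"
    qualifier: str  # named user or group; empty for the owning user or group
    perms: str      # three characters, for example "r-x"


def parse_acl(text: str) -> List[AclEntry]:
    """Parse a comma-separated POSIX-style ACL string into entries."""
    entries = []
    for item in text.split(","):
        scope, qualifier, perms = item.strip().split(":")
        valid_perms = (
            len(perms) == 3
            and perms[0] in "r-" and perms[1] in "w-" and perms[2] in "x-"
        )
        if scope not in {"user", "group", "mask", "other"} or not valid_perms:
            raise ValueError(f"malformed ACL entry: {item!r}")
        entries.append(AclEntry(scope, qualifier, perms))
    return entries


def named_user_allows(entries: List[AclEntry], user: str, perm: str) -> bool:
    """Return True if a named-user entry for `user` includes `perm` ("r", "w", or "x")."""
    for entry in entries:
        if entry.scope == "user" and entry.qualifier == user:
            return perm in entry.perms
    return False


# Hypothetical ACL: the owner has full access, alice can read and traverse.
acl = parse_acl("user::rwx,user:alice:r-x,group::r--,other::---")
print(named_user_allows(acl, "alice", "r"), named_user_allows(acl, "alice", "w"))
```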

Storage Tier Support for HDInsight

The following table shows which storage types are supported with HDInsight:
| Storage Account Type | Storage Tier | Supported with HDInsight |
|---|---|---|
| General purpose storage | Standard | Yes |
| General purpose storage | Premium | No |
| Blob storage | Hot / Cool | No |
| ADLS | Gen 1 | Yes |
For more information about storage types to use with HDInsight, see the Azure documentation.
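The support matrix above can be encoded as a small lookup, which may be handy for validating a planned deployment before provisioning. This is a sketch only: the dictionary and function names are hypothetical, and the values simply restate the table for this product version rather than querying Azure.

```python
# Values restate the support table; keys are lowercased for lenient matching.
HDINSIGHT_SUPPORT = {
    ("general purpose storage", "standard"): True,
    ("general purpose storage", "premium"): False,
    ("blob storage", "hot"): False,
    ("blob storage", "cool"): False,
    ("adls", "gen 1"): True,
}


def is_supported_with_hdinsight(account_type: str, tier: str) -> bool:
    """Look up whether a storage account type and tier is listed as supported."""
    key = (account_type.strip().lower(), tier.strip().lower())
    if key not in HDINSIGHT_SUPPORT:
        # Combinations that the table does not list are rejected, not guessed.
        raise KeyError(f"combination not listed: {account_type!r}, {tier!r}")
    return HDINSIGHT_SUPPORT[key]


print(is_supported_with_hdinsight("General purpose storage", "Standard"))
print(is_supported_with_hdinsight("Blob storage", "Cool"))
```

Raising an error for unlisted combinations avoids silently treating an unknown tier as unsupported or supported.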
