Installation for Data Engineering

Create the Cluster Configuration

After you configure the data profiling warehouse connection, you can create the cluster configuration to connect to the non-native environment.

Enter the name of the cluster configuration to create.

Specify the non-native distribution for the cluster.

The following table describes the options you can specify:

Prompt	Description
1	Cloudera. You can create a cluster configuration for a Cloudera cluster on either Cloudera Data Platform (CDP) or Cloudera Distribution Hadoop (CDH).
2	Hortonworks
3	Azure HDInsight
4	MapR. You must import MapR cluster configuration properties from an archive file.
5	Amazon EMR. You must import Amazon EMR cluster configuration properties from an archive file.
6	Databricks
7	Google Dataproc
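
When scripting pre-installation checks, the prompt menu above can be captured as a small lookup table. The following Python sketch is illustrative only: the names `DISTRIBUTIONS`, `ARCHIVE_ONLY`, and `requires_archive_import` are hypothetical and are not part of the installer. The archive-only set reflects the distributions this guide says must be imported from an archive file.

```python
# Illustrative lookup of the installer's distribution prompt.
# All names here are hypothetical; the installer exposes no such API.
DISTRIBUTIONS = {
    1: "Cloudera",
    2: "Hortonworks",
    3: "Azure HDInsight",
    4: "MapR",
    5: "Amazon EMR",
    6: "Databricks",
    7: "Google Dataproc",
}

# Distributions that this guide says must be imported from an archive file.
ARCHIVE_ONLY = {"MapR", "Amazon EMR", "Google Dataproc"}


def requires_archive_import(prompt: int) -> bool:
    """Return True if the distribution at this prompt must use an archive file."""
    return DISTRIBUTIONS[prompt] in ARCHIVE_ONLY
```

For example, `requires_archive_import(5)` reports that an Amazon EMR configuration needs an archive file, while `requires_archive_import(1)` shows that Cloudera does not.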

Before you import Amazon EMR cluster configuration properties, verify that the following ports associated with Amazon EMR are available:

Import configuration properties from the non-native environment to create the cluster configuration.

To import the properties from an archive file, choose the archive file option at the prompt. If you create a cluster configuration for an Amazon EMR, MapR, or Google Dataproc cluster, you must import the properties from an archive file.

To import the properties directly from the cluster, choose the direct import option at the prompt.

If you choose to import the properties from an archive file, you must specify the name and path of the configuration archive file.

If you choose to import the properties directly from the cluster, specify the connection properties.

The following table describes the Cloudera, Hortonworks, or Azure HDInsight cluster properties you specify:

Property	Description
Host	The host name or IP address of the cluster manager.
Port	Port of the cluster manager.
User ID	Cluster user name.
Password	Password for the cluster user.
Cluster Name	Name of the cluster. Use the display name if the cluster manager manages multiple clusters. If you do not provide a cluster name, the wizard imports information based on the default cluster.
Engine type	If you specified a Cloudera cluster, the installer prompts for the engine type. If you are on a CDP cluster, accept the default engine type of Tez. If you are on a CDH cluster, press 2 to set the engine type to MRv2. Default is 1 (Tez).
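
The engine type rule in the table above can be summarized in a few lines of Python. This is a minimal sketch for planning a scripted or documented install; `ENGINE_TYPES` and `engine_prompt_for` are hypothetical names, and the installer itself only accepts the numeric answer at the prompt.

```python
# Engine type prompt values as described in the table above.
ENGINE_TYPES = {1: "Tez", 2: "MRv2"}


def engine_prompt_for(platform: str) -> int:
    """Return the engine type prompt value for a Cloudera platform.

    CDP clusters keep the default (1, Tez); CDH clusters use 2 (MRv2).
    """
    normalized = platform.strip().upper()
    if normalized == "CDP":
        return 1
    if normalized == "CDH":
        return 2
    raise ValueError(f"unknown Cloudera platform: {platform!r}")
```

Calling `engine_prompt_for("CDH")` returns the value to enter for a CDH cluster, and `ENGINE_TYPES` maps that value back to the engine name.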

The following table describes the Databricks cluster properties you specify:

Property	Description
Databricks domain	Enter the URL of the Databricks cluster.
Databricks token ID	Enter the token ID of the Databricks cluster.
Databricks cluster ID	Enter the cluster ID of the Databricks cluster.
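
Before you run the installer, it can help to collect the three Databricks values in one place and check them for obvious mistakes. The sketch below is an assumption-laden example, not installer behavior: the class `DatabricksProperties` and its `problems` method are hypothetical, and the check that the domain is a full `http` or `https` URL is a guess about what a usable value looks like.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class DatabricksProperties:
    """Hypothetical container for the three values the installer asks for."""

    domain: str      # URL of the Databricks cluster
    token_id: str    # token ID of the Databricks cluster
    cluster_id: str  # cluster ID of the Databricks cluster

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        parsed = urlparse(self.domain)
        # Assumption: a usable domain is a full URL with a scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            found.append("domain should be a full URL")
        if not self.token_id.strip():
            found.append("token ID is empty")
        if not self.cluster_id.strip():
            found.append("cluster ID is empty")
        return found
```

A call such as `DatabricksProperties("https://example.invalid", "my-token", "my-cluster").problems()` uses placeholder values and returns an empty list when all three fields look plausible.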

To create the Hadoop, HDFS, Hive, HBase, or Databricks connections to the cluster, choose the option to create connections at the prompt.

The installer appends the connection type to the cluster configuration name to create a connection name.
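
The naming rule can be pictured with a short Python sketch. The exact separator and letter case that the installer uses are not stated in this guide, so the underscore separator and the type labels below are assumptions, and `connection_names` is a hypothetical helper rather than an installer function.

```python
# Connection types the installer can create, as listed above.
CONNECTION_TYPES = ["Hadoop", "HDFS", "Hive", "HBase", "Databricks"]


def connection_names(cluster_config_name: str, separator: str = "_") -> dict[str, str]:
    """Append each connection type to the cluster configuration name.

    The separator is an assumption; check the names the installer
    actually creates in your domain.
    """
    return {
        conn_type: f"{cluster_config_name}{separator}{conn_type}"
        for conn_type in CONNECTION_TYPES
    }
```

Under these assumptions, a cluster configuration named `cc_prod` would yield a Hive connection named `cc_prod_Hive`.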