Installation and Configuration Guide

Configuring the Repository Parameters

You must configure parameters related to the repository, such as the host server name and the port number on which the server listens, in the configuration file.
To configure the repository parameters, add the following parameters to the HBASEConfiguration section in the configuration file:
HbaseMaster
Host name of the HBase Master server and the port number on which the Master server listens, for example 60000. The port number must match the hbase.master.port parameter configured in HBase.
Use the following format to specify a value for the HbaseMaster parameter:
<Master Server Host Name>:<Port>
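As a minimal sketch of how the two settings line up, the fragment below shows an hbase.master.port entry as it might appear in hbase-site.xml, followed by the matching HbaseMaster value. The host name master1.domain.com is a placeholder and does not come from this guide. If hbase.master.port is not set in hbase-site.xml, HBase uses its built-in default for that property.

```xml
<!-- In ${HBASE_HOME}/conf/hbase-site.xml (illustrative value) -->
<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>

<!-- In the HBASEConfiguration section; master1.domain.com is a placeholder host name -->
<HbaseMaster>master1.domain.com:60000</HbaseMaster>
```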
HbaseZookeeperQuorum
Comma-separated list of ZooKeeper servers when HBase uses an ensemble of ZooKeeper servers.
For example: server1.domain.com,server3.domain.com
You can get the list of servers from the hbase.zookeeper.quorum property in the following file:
${HBASE_HOME}/conf/hbase-site.xml
HbaseZookeeperClientPort
Optional. Port number on which the ZooKeeper server listens for client connections. Default is 2181.
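The following sketch shows how the ZooKeeper settings in hbase-site.xml might map to the HbaseZookeeperQuorum and HbaseZookeeperClientPort parameters. The server names reuse the example above. This guide does not name the HBase property that holds the client port; the fragment assumes it is hbase.zookeeper.property.clientPort, which is the property HBase normally uses for this purpose.

```xml
<!-- In ${HBASE_HOME}/conf/hbase-site.xml -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>server1.domain.com,server3.domain.com</value>
</property>
<property>
  <!-- Assumed source of the client port; not named in this guide -->
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

<!-- Matching entries in the HBASEConfiguration section -->
<HbaseZookeeperQuorum>server1.domain.com,server3.domain.com</HbaseZookeeperQuorum>
<HbaseZookeeperClientPort>2181</HbaseZookeeperClientPort>
```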
HbaseRootDirectory
Required if you use Hortonworks Data Platform. Directory that the region servers share on the local file system. Default is ${hbase.tmp.dir}/local.
The HbaseRootDirectory parameter overrides the hbase.rootdir parameter configured in the following file:
${HBASE_HOME}/conf/hbase-site.xml
HbaseDistributed
Optional. Indicates whether HBase runs in standalone or distributed mode.
Set to true to indicate distributed mode and set to false to indicate standalone mode. In standalone mode, HBase runs all the HBase and ZooKeeper daemons in a single Java virtual machine (JVM). Default is false.
The HbaseDistributed parameter overrides the hbase.cluster.distributed parameter configured in the following file:
${HBASE_HOME}/conf/hbase-site.xml
HbaseZookeeperZnodeParent
Required if you use Hortonworks Data Platform. Root ZNode path for the ZooKeeper files. HBase stores all the ZooKeeper files configured with the relative path in the root ZNode path. Default is /hbase.
The HbaseZookeeperZnodeParent parameter overrides the zookeeper.znode.parent parameter configured in the following file:
${HBASE_HOME}/conf/hbase-site.xml
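The HbaseRootDirectory, HbaseDistributed, and HbaseZookeeperZnodeParent parameters each override a property in hbase-site.xml, so it may help to keep the two files consistent. The fragment below is a sketch of that mapping. Every value in it is an illustrative placeholder rather than a value from this guide; copy the actual values from the hbase-site.xml file of your cluster.

```xml
<!-- In ${HBASE_HOME}/conf/hbase-site.xml (illustrative placeholder values) -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode.domain.com:8020/apps/hbase/data</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>

<!-- Corresponding entries in the HBASEConfiguration section -->
<HbaseRootDirectory>hdfs://namenode.domain.com:8020/apps/hbase/data</HbaseRootDirectory>
<HbaseDistributed>true</HbaseDistributed>
<HbaseZookeeperZnodeParent>/hbase-unsecure</HbaseZookeeperZnodeParent>
```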
HbaseCompressionAlgorithm
Optional. Compression algorithm that you want HBase to use. Use one of the following compression algorithms:
  • SNAPPY
  • LZO
  • LZ4
  • GZ
  • NONE
Default is NONE.
HbaseDataBlockEncoding
Optional. Type of data block encoding that you want HBase to use. Use one of the following data block encoding types:
  • PREFIX
  • DIFF
  • FAST_DIFF
  • NONE
Default is NONE.
ScanCacheSize
Optional. Number of records that you want to pass to scanners at once. Default is 500.
ScanBatchSize
Optional. Number of records that you want to return on each scan. Default is 100.
CacheBlock
Optional. Indicates whether you want to enable block cache for the scan.
Set to true to enable block cache and set to false to disable block cache. Default is false.
AutoFlush
Optional. Indicates whether you want to enable auto flush behavior.
Set to true if you want to enable auto flush and set to false if you want to disable auto flush. Default is false.
WALonPUT
Optional. Indicates whether you want to enable Write Ahead Log (WAL) edits for a put method.
Set to true to enable WAL edits for a put method and set to false to disable WAL edits for the put method. When you set the parameter to false, the region server does not write the logs to the file-based storage, and your data might be at risk. Default is false.
EnableSmallScan
Optional. Indicates whether you want to enable small scan.
Set to true if you want to enable small scan and set to false if you want to disable small scan. Default is false.
RegionSplitSize
Optional. Key size to split the region. The value enables region split policy and groups records based on the prefix of the row key. Default is 8 bytes.
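The scan and write parameters from ScanCacheSize through RegionSplitSize can be set together in the HBASEConfiguration section. The sketch below uses the documented default for each parameter except WALonPUT, which is set to true on the assumption that durability matters more than write throughput in your environment.

```xml
<!-- Scan behavior: documented defaults -->
<ScanCacheSize>500</ScanCacheSize>
<ScanBatchSize>100</ScanBatchSize>
<CacheBlock>false</CacheBlock>
<EnableSmallScan>false</EnableSmallScan>

<!-- Write behavior: WALonPUT differs from the default (false) to keep WAL edits -->
<AutoFlush>false</AutoFlush>
<WALonPUT>true</WALonPUT>

<!-- Region split key size in bytes: documented default -->
<RegionSplitSize>8</RegionSplitSize>
```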
DriverName
Optional. Name of the HBase driver to use. Use one of the following driver names based on the HBase version:
  • For HBase version 0.94.x,
    com.informatica.mdmbde.database.hbase.HBaseDatabaseAdapterImplV1
  • For HBase version 0.96.x,
    com.informatica.mdmbde.database.hbase.HBaseDatabaseAdapterImpl
Default is com.informatica.mdmbde.database.hbase.HBaseDatabaseAdapterImpl.
CoprocessorPath
Absolute path and file name for the coprocessor JAR file that you must generate and deploy in HDFS. The JAR file contains the search logic that the region servers use to perform searches.
For example, /user/cloudera/db-hbase-coprocessorDeploy.jar.
Use the following name format for the JAR file name:
db-hbase-coprocessor<id>.jar
The name format uses the id parameter, which is a unique identifier for the JAR file. For example, if id=Deploy, the JAR file name must be db-hbase-coprocessorDeploy.jar.
CoprocessorClass
Name of the class that the coprocessor uses. Specify com.informatica.mdmbde.database.hbase.coprocessor.BDRMRegionObserver as the parameter value.
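As a sketch, the two coprocessor parameters might appear in the HBASEConfiguration section as follows. The fragment assumes that they are written as child elements named after the parameters, like the other repository parameters. The path reuses the example given for CoprocessorPath, with Deploy as the identifier.

```xml
<!-- JAR file deployed in HDFS; Deploy is the unique identifier in the file name -->
<CoprocessorPath>/user/cloudera/db-hbase-coprocessorDeploy.jar</CoprocessorPath>
<!-- Class name specified by this guide -->
<CoprocessorClass>com.informatica.mdmbde.database.hbase.coprocessor.BDRMRegionObserver</CoprocessorClass>
```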
SearchTokenValidity
Optional. Number of seconds that a search token remains valid. When you enable pagination, a search request returns a token with the search results. You can use the token to get the subsequent pages of the search results from cache to avoid performing the search again. The token expires after the specified time. Default is 600 seconds.
KeyTabFile
Optional. Absolute path and file name of the keytab file. A keytab file contains a list of keys that are analogous to user passwords. Applicable if you use Kerberos for authentication.
If you use the KeyTabFile parameter, ensure that the name of the file and the absolute path to the file are the same for all the nodes in a distributed Hadoop cluster.
PrincipalName
Required if you use Kerberos for authentication. Service Principal Name (SPN) of the HBase master server. For example, hbase/_Host@realm.com.
You can get the SPN of the HBase master server from the hbase.master.kerberos.principal property in the following file:
${HBASE_HOME}/conf/hbase-site.xml
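The following sketch shows how the Kerberos settings might line up. The principal reuses the example value above, and the keytab path is an illustrative location that must exist at the same path on every node of the cluster.

```xml
<!-- In ${HBASE_HOME}/conf/hbase-site.xml -->
<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/_Host@realm.com</value>
</property>

<!-- Matching entries in the HBASEConfiguration section; keytab path is illustrative -->
<KeyTabFile>/etc/security/keytabs/hbase.keytab</KeyTabFile>
<PrincipalName>hbase/_Host@realm.com</PrincipalName>
```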
The following sample code shows the parameters for HBase:
<HBASEConfiguration>
    <HbaseMaster>HadoopServer:60000</HbaseMaster>
    <HbaseZookeeperClientPort>2181</HbaseZookeeperClientPort>
    <HbaseZookeeperQuorum>iir-hadoop-test1</HbaseZookeeperQuorum>
    <HbaseRootDirectory />
    <HbaseDistributed>true</HbaseDistributed>
    <HbaseZookeeperZnodeParent />
    <HbaseCompressionAlgorithm>SNAPPY</HbaseCompressionAlgorithm>
    <HbaseDataBlockEncoding>PREFIX</HbaseDataBlockEncoding>
    <ScanCacheSize>100000</ScanCacheSize>
    <CacheBlock>false</CacheBlock>
    <AutoFlush>false</AutoFlush>
    <WALonPUT>false</WALonPUT>
    <ScanBatchSize>100</ScanBatchSize>
    <EnableSmallScan>false</EnableSmallScan>
    <RegionSplitSize>8</RegionSplitSize>
    <DriverName>com.informatica.mdmbde.database.hbase.HBaseDatabaseAdapterImplV1</DriverName>
    <SearchTokenValidity>1000</SearchTokenValidity>
    <KeyTabFile>/etc/security/keytabs/hbase.keytab</KeyTabFile>
    <PrincipalName>hbase/_Host@realm.com</PrincipalName>
</HBASEConfiguration>


Updated June 27, 2019