You can configure parameters related to the Hadoop distribution, such as split size, in the configuration file.
The number of MapReduce jobs that process the input file depends on the split size. The larger the split size, the longer the run time for a single job. If you split the input file into multiple parts based on a smaller split size, a separate job processes each part, which improves the run time.
For example, consider an input file of 112 MB. If the block size is 128 MB, HDFS stores the input file in a single block, and a single job processes the file. If you set the split size to 32 MB, the input file is split into four parts and four jobs process the file.
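For reference, the following sketch (illustrative only, not part of the product or the configuration file) reproduces the arithmetic from this example: the number of parts, and therefore jobs, is the file size divided by the split size, rounded up.

    import math

    MB = 1024 * 1024

    def split_count(file_size_bytes, split_size_bytes):
        # Number of parts (and therefore jobs): file size / split size, rounded up.
        return math.ceil(file_size_bytes / split_size_bytes)

    print(split_count(112 * MB, 128 * MB))  # 1 -> the whole file fits in one split
    print(split_count(112 * MB, 32 * MB))   # 4 -> 32 + 32 + 32 + 16 MB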
To configure the Hadoop distribution, add the following parameters to the HadoopConfiguration section in the configuration file:
JobName
Optional. Name for the configuration that you create.
MinInputSplitSize
Optional. Minimum valid size in bytes to split a file. Default is 0.
The MinInputSplitSize parameter overrides the mapred.min.split.size property of Hadoop when you run a job.
MaxInputSplitSize
Optional. Maximum valid size in bytes to split a file.
By default, the split size is equal to the HDFS block size.
The MaxInputSplitSize parameter overrides the mapred.max.split.size property of Hadoop when you run a job.
The following sample code shows the parameters for the Hadoop distribution:
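The exact element syntax depends on your configuration file format; this is a minimal sketch that assumes XML-style elements, with an illustrative job name and a 32 MB (33554432-byte) maximum split size rather than product defaults:

    <HadoopConfiguration>
        <!-- Illustrative values; replace with values for your environment. -->
        <JobName>SampleHadoopJob</JobName>
        <MinInputSplitSize>0</MinInputSplitSize>
        <MaxInputSplitSize>33554432</MaxInputSplitSize>
    </HadoopConfiguration>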