mapred.compress.map.output
Determines whether the output of the map phase is compressed. Default is false. Compressing map output reduces the amount of data written to local disk and transferred to the reducers during the shuffle. Informatica recommends setting this property to true for better performance.
mapred.map.output.compression.codec
Specifies the compression codec used to compress map output. Default is org.apache.hadoop.io.compress.DefaultCodec. Informatica recommends the Snappy codec, org.apache.hadoop.io.compress.SnappyCodec, for better performance.
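The following minimal sketch renders the two recommended map-output compression settings as a Hadoop <configuration> XML fragment, the format used by files such as mapred-site.xml. The helper name to_hadoop_xml is hypothetical, and where these properties are actually set (a site file, a job configuration, or a connection setting in your tooling) depends on your deployment.

```python
from xml.sax.saxutils import escape

# Recommended map-output compression settings from this section.
MAP_OUTPUT_COMPRESSION = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec": "org.apache.hadoop.io.compress.SnappyCodec",
}


def to_hadoop_xml(props):
    """Render a dict of properties as a Hadoop <configuration> XML fragment (hypothetical helper)."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


print(to_hadoop_xml(MAP_OUTPUT_COMPRESSION))
```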
mapred.map.tasks.speculative.execution
Specifies whether map tasks can be executed speculatively. Default is true. With speculative execution, Hadoop launches duplicate attempts of map tasks that are progressing more slowly than expected. The original and speculative attempts do identical work: the output of whichever attempt finishes first is used, and the other attempt is killed.
Informatica recommends keeping the default value set to true for better performance.
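To illustrate why speculation can shorten a job that has a straggler, here is a toy model, not Hadoop's actual scheduling heuristic. It assumes each task needs 1.0 unit of work at a fixed rate, treats a task as slow if its rate is below a hypothetical slow_factor times the mean rate, and launches a duplicate at a hypothetical time launch_at that runs at the mean rate. The first attempt to finish supplies the task result.

```python
from statistics import mean


def simulate(rates, launch_at=1.0, slow_factor=0.5):
    """Toy speculative-execution model (all thresholds are hypothetical).

    Returns a list of (task_index, winning_attempt, finish_time).
    """
    avg = mean(rates)
    results = []
    for i, rate in enumerate(rates):
        original_finish = 1.0 / rate  # time for the original attempt to finish
        if rate < slow_factor * avg:
            # Slow task: a duplicate starts at launch_at and runs at the mean rate.
            spec_finish = launch_at + 1.0 / avg
            winner = "speculative" if spec_finish < original_finish else "original"
            # The first attempt to finish wins; the other would be killed.
            results.append((i, winner, min(original_finish, spec_finish)))
        else:
            results.append((i, "original", original_finish))
    return results


for task in simulate([1.0, 1.0, 0.1]):
    print(task)
```

In this model the straggler (rate 0.1) would finish at time 10.0 on its own, while its speculative copy finishes at about 2.43, so the job completes much earlier at the cost of some duplicated work.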
mapred.reduce.tasks.speculative.execution
Specifies whether reduce tasks can be executed speculatively. Default is true. It works the same way as speculative execution for map tasks.
Informatica recommends setting mapred.reduce.tasks.speculative.execution to false to disable speculative execution of reduce tasks.
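One way to apply the recommended speculative-execution settings to a single job is through Hadoop's generic -D name=value options, which are honored by drivers that use ToolRunner/GenericOptionsParser. This sketch only builds the argument list; the helper name generic_options is hypothetical, and the jar and driver class you would pass them to are not shown.

```python
# Recommended speculative-execution settings from this section.
SPECULATION = {
    "mapred.map.tasks.speculative.execution": "true",    # default; keep as-is
    "mapred.reduce.tasks.speculative.execution": "false",
}


def generic_options(props):
    """Build Hadoop generic '-D name=value' arguments (hypothetical helper)."""
    args = []
    for name, value in props.items():
        args += ["-D", f"{name}={value}"]
    return args


print(" ".join(generic_options(SPECULATION)))
```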
mapred.min.split.size and mapred.max.split.size
Use these two properties together with the dfs.block.size property. Together they determine the input split size, which in turn determines the number of input splits and therefore the number of map tasks that run in parallel.
To have each map task process exactly one data block, Informatica recommends setting the properties so that:
mapred.min.split.size < dfs.block.size < mapred.max.split.size
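The input split size is calculated with the following formula, where minimumSize is mapred.min.split.size, maximumSize is mapred.max.split.size, and blockSize is dfs.block.size: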
max(minimumSize, min(maximumSize, blockSize))
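The formula can be checked with a few values. This sketch assumes a 128 MB block size for illustration; the function name compute_split_size is hypothetical. When the recommended ordering holds, the split size equals the block size, so each map task reads one block.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Input split size: max(minimumSize, min(maximumSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


block = 128 * MB  # assumed dfs.block.size for this example

# min.split.size < dfs.block.size < max.split.size: one block per split
print(compute_split_size(block, min_size=1 * MB, max_size=256 * MB) // MB)    # 128
# max.split.size below the block size: smaller splits, more map tasks
print(compute_split_size(block, min_size=1 * MB, max_size=64 * MB) // MB)     # 64
# min.split.size above the block size: larger splits, fewer map tasks
print(compute_split_size(block, min_size=256 * MB, max_size=512 * MB) // MB)  # 256
```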
hive.exec.compress.intermediate
Determines whether the intermediate output that Hive writes between the MapReduce jobs of a multi-stage query is compressed. Default is false. Do not confuse this property with mapred.compress.map.output, which controls compression of map task output within a single job.
Informatica recommends setting hive.exec.compress.intermediate to true to enable intermediate compression. Intermediate compression uses the codec specified by mapred.output.compression.codec, and SnappyCodec is recommended for better performance.
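For a single Hive session, the recommended settings can be applied with Hive's SET name=value; statements. This sketch only generates those statements; the helper name hive_set_statements is hypothetical. Because mapred.output.compression.codec is a general job output codec, setting it at session level may also affect other compressed output in that session.

```python
# Recommended Hive intermediate-compression settings from this section.
HIVE_INTERMEDIATE = {
    "hive.exec.compress.intermediate": "true",
    "mapred.output.compression.codec": "org.apache.hadoop.io.compress.SnappyCodec",
}


def hive_set_statements(props):
    """Build Hive session-level SET statements (hypothetical helper)."""
    return [f"SET {name}={value};" for name, value in props.items()]


for stmt in hive_set_statements(HIVE_INTERMEDIATE):
    print(stmt)
```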