Effective in version 10.1, you can push mappings to the Apache Spark engine in the Hadoop environment.
Spark is an Apache project with a run-time engine that can run mappings on the Hadoop cluster. Configure the Hadoop connection properties specific to the Spark engine. After you create a mapping, you can validate it and view the execution plan in the same way as you do for the Blaze and Hive engines.
When you push mapping logic to the Spark engine, the Data Integration Service generates a Scala program and packages it into an application. It sends the application to the Spark executor, which submits it to the Resource Manager on the Hadoop cluster. The Resource Manager identifies resources to run the application. You can monitor the job in the Administrator tool.
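The submission flow described above can be pictured with a small toy model. This is only an illustrative sketch, not the product's implementation: every class, function, and field name below (`DataIntegrationService`, `SparkExecutor`, `ResourceManager`, `generate_scala`, the container counts, and the application ID format) is hypothetical and chosen for the example.

```python
from dataclasses import dataclass, field


def generate_scala(mapping_name: str, logic: str) -> str:
    """Hypothetical stand-in for code generation: wrap the logic in a Scala object."""
    return f"object {mapping_name} {{ /* {logic} */ }}"


@dataclass
class Application:
    """A packaged program ready for submission (hypothetical structure)."""
    name: str
    scala_source: str


@dataclass
class ResourceManager:
    """Toy cluster resource manager that hands out a fixed pool of containers."""
    free_containers: int
    running: dict = field(default_factory=dict)

    def submit(self, app: Application, containers_needed: int) -> str:
        # Identify resources for the application, or refuse if none are left.
        if containers_needed > self.free_containers:
            raise RuntimeError("not enough free containers")
        self.free_containers -= containers_needed
        self.running[app.name] = containers_needed
        return f"application_{len(self.running):04d}"


class SparkExecutor:
    """Receives a packaged application and submits it to the resource manager."""

    def __init__(self, resource_manager: ResourceManager):
        self.resource_manager = resource_manager

    def run(self, app: Application, containers_needed: int = 2) -> str:
        return self.resource_manager.submit(app, containers_needed)


class DataIntegrationService:
    """Generates the program, packages it, and passes it to the executor."""

    def __init__(self, executor: SparkExecutor):
        self.executor = executor

    def push_mapping(self, mapping_name: str, logic: str) -> str:
        source = generate_scala(mapping_name, logic)
        app = Application(name=mapping_name, scala_source=source)
        return self.executor.run(app)
```

In this sketch, calling `push_mapping` on a `DataIntegrationService` returns an identifier assigned by the toy `ResourceManager`, mirroring the order of the steps in the text: generate, package, hand to the executor, submit to the cluster.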
For more information about using Spark to run mappings, see the