Release Notes (10.4.1.2)

Third-Party Known Issues (10.4.1.2)

The following table describes third-party known issues:
Bug
Description
BDM-35661
The Spark engine fails mappings on a Cloudera CDP Public Cloud cluster when the following conditions are true:
  • The mapping reads from a Hive source created with a custom query.
  • The query uses an arithmetic operation on a column. For example, to add 100 to every value in INT_1, you use the following query:
    SELECT INT_1 + 100 FROM Hive_table
You might see the following exception in the log file:
java.lang.reflect.InvocationTargetException ... Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '<column name>' given input columns: [<column names>]
Workaround: In the SQL override query, provide an alias for each column that uses an arithmetic operation. For example:
SELECT INT_1 + 100 AS <alias name> FROM Hive_table
Cloudera ticket number: CDPD-3293
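The aliasing pattern described for BDM-35661 can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Hive and the Spark engine, the alias INT_1_PLUS_100 is a hypothetical name, and the sample rows are invented. It shows that an aliased arithmetic expression gets a stable, predictable column name. It does not reproduce the Spark failure itself.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the Hive source.
# Only the aliasing pattern carries over to the SQL override query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hive_table (INT_1 INTEGER)")
conn.executemany(
    "INSERT INTO Hive_table (INT_1) VALUES (?)",
    [(1,), (2,), (3,)],  # invented sample data
)

# INT_1_PLUS_100 is a hypothetical alias; any valid column name would do.
cursor = conn.execute(
    "SELECT INT_1 + 100 AS INT_1_PLUS_100 FROM Hive_table ORDER BY INT_1"
)

# The result column is exposed under the alias rather than the raw expression.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()
```

With the alias in place, downstream logic can refer to the column by a fixed name instead of depending on how the engine names an unaliased expression.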
BDM-35570
When the Spark engine runs a mapping on an Amazon EMR 6.0 cluster, the mapping fails with an error like:
org.apache.spark.sql.AnalysisException: Column <list of columns> are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
Workaround: Disable the ambiguous self-join check by adding the following advanced property to the Hadoop connection:
spark.sql.analyzer.failAmbiguousSelfJoin=false
Apache ticket number: SPARK-32551
BDM-35133
When the Spark engine runs a mapping that contains an Update Strategy transformation with a DD_DELETE condition on an EMR 6.0 cluster, the mapping fails with an error like:
java.io.IOException: Corrupted records with different bucket ids from the containing bucket file found! Expected bucket id 0, however found the bucket id 1
Apache ticket number: HIVE-20719
BDM-35513
When the Spark engine runs a mapping that contains an Update Strategy transformation with a DD_INSERT condition on an EMR 6.0 cluster, the mapping fails with an error like:
java.io.IOException: Corrupted records with different bucket ids from the containing bucket file found! Expected bucket id 0, however found the bucket id 1
Apache ticket number: HIVE-20719


Updated March 26, 2021