Informatica Data Quality
- Informatica Data Quality 10.4.1
Bug
| Description
|
---|---|
BDM-35661
| The Spark engine fails mappings on a Cloudera CDP Public Cloud cluster when the following conditions are true:
You might see the following exception in the log file:
java.lang.reflect.InvocationTargetException ... Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '<column name>' given input columns: [<column names>]
Workaround: In the SQL override query, provide an alias for each column that uses an arithmetic operation. For example:
SELECT INT_1 + 100 AS <alias name> FROM Hive_table
Cloudera ticket number: CDPD-3293
|
BDM-35570
| When the Spark engine runs a mapping on an Amazon EMR 6.0 cluster, the mapping fails with an error like:
org.apache.spark.sql.AnalysisException: Column <list of columns> are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
Workaround: Disable the ambiguous self-join check by adding the following advanced property to the Hadoop connection:
spark.sql.analyzer.failAmbiguousSelfJoin=false
Apache ticket number: SPARK-32551
|
BDM-35133
| When the Spark engine runs a mapping that contains an Update Strategy transformation with a DD_DELETE condition on an EMR 6.0 cluster, the mapping fails with an error like:
java.io.IOException: Corrupted records with different bucket ids from the containing bucket file found! Expected bucket id 0, however found the bucket id 1
Apache ticket number: HIVE-20719
|
BDM-35513
| When the Spark engine runs a mapping that contains an Update Strategy transformation with a DD_INSERT condition on an EMR 6.0 cluster, the mapping fails with an error like:
java.io.IOException: Corrupted records with different bucket ids from the containing bucket file found! Expected bucket id 0, however found the bucket id 1
Apache ticket number: HIVE-20719
|
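The alias workaround described for BDM-35661 can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the Hive source, so it shows only the general effect of aliasing an arithmetic expression, not the Spark engine's behavior on a CDP cluster. The alias INT_1_PLUS_100 and the sample row are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the Hive source purely for illustration.
# The table contents and the alias name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hive_table (INT_1 INTEGER)")
conn.execute("INSERT INTO Hive_table VALUES (1)")

# Give the arithmetic expression an explicit alias, as the workaround suggests,
# so that downstream consumers can refer to the column by a fixed name.
cursor = conn.execute("SELECT INT_1 + 100 AS INT_1_PLUS_100 FROM Hive_table")
aliased_name = cursor.description[0][0]  # column name reported by the engine
aliased_rows = cursor.fetchall()
conn.close()

print(aliased_name, aliased_rows)
```

With the alias in place, the result column carries the name given in the query rather than one derived by the engine from the expression text.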