to parse semi-structured or unstructured data in mappings that run on the Spark engine or the Databricks Spark engine.
Long, complex files with little or no structure can be difficult to understand, much less parse. CLAIRE Intelligent Structure Discovery can automatically discover the structure in unstructured data.
CLAIRE uses machine learning algorithms to decipher data in semi-structured or unstructured data files and create a model of the underlying structure of the data. In Informatica Intelligent Cloud Services, you can generate an intelligent structure model, which is a model of the patterns, repetitions, relationships, and field types discovered in a file.
To use the model, you associate it with a data object that represents a complex file source in a data engineering mapping. You can run the mapping on the Spark engine or the Databricks Spark engine to process the data. The mapping uses the intelligent structure model to extract and parse data from input files based on the structure expressed in the model.
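The idea of discovering a structure from a sample and then using it to parse other input can be illustrated with a toy sketch. This is not how CLAIRE or Intelligent Structure Discovery is implemented, and it uses no Informatica API. The function names discover_structure and extract, the path notation, and the sample data are all hypothetical, chosen only to show a "model" that records fields, repeating groups, and types, and a parser that keeps only the fields the model knows about.

```python
import json
from collections import defaultdict


def discover_structure(sample, path="$", model=None):
    """Walk a parsed JSON sample and record the value types seen at each path."""
    if model is None:
        model = defaultdict(set)
    if isinstance(sample, dict):
        model[path].add("object")
        for key, value in sample.items():
            discover_structure(value, f"{path}.{key}", model)
    elif isinstance(sample, list):
        model[path].add("array")  # a repeating group
        for item in sample:
            discover_structure(item, f"{path}[]", model)
    else:
        model[path].add(type(sample).__name__)
    return model


def extract(record, model, path="$", out=None):
    """Collect leaf values from a record, keeping only paths known to the model."""
    if out is None:
        out = defaultdict(list)
    if path not in model:
        return out  # field was not in the sample, so it is skipped
    if isinstance(record, dict):
        for key, value in record.items():
            extract(value, model, f"{path}.{key}", out)
    elif isinstance(record, list):
        for item in record:
            extract(item, model, f"{path}[]", out)
    else:
        out[path].append(record)
    return out


# Hypothetical sample file content used to build the toy model.
sample = json.loads('{"order": {"id": 1, "items": [{"sku": "A1", "qty": 2}]}}')
model = discover_structure(sample)

# Hypothetical input file parsed against the model; "note" is not in the model.
incoming = json.loads(
    '{"order": {"id": 7, "note": "rush",'
    ' "items": [{"sku": "B2", "qty": 1}, {"sku": "C3", "qty": 4}]}}'
)
parsed = extract(incoming, model)

for field_path, values in parsed.items():
    print(field_path, values)
```

In this sketch the model is just a mapping from field paths to observed types. A real intelligent structure model is generated by Intelligent Structure Discovery and applied by the mapping at run time.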
To associate a model with a data object, you can perform one of the following tasks:
Create and export an intelligent structure model in Cloud Data Integration and then select the model when you create the data object.
Select an XML, JSON, ORC, Avro, or Parquet sample file when you create the data object. Intelligent Structure Discovery creates an intelligent structure model based on the sample file that you select.
You cannot edit or refine a model that Intelligent Structure Discovery creates automatically. You can only edit a model that you create in Cloud