In advanced mode, you can use a Vector Embedding transformation to generate vector
embeddings for input text, capturing the semantic meaning of the text in a vector format.
Before using a Vector Embedding transformation, use a Chunking transformation to split
the text into chunks and process it to make the data cleaner and semantically more
consistent for vector embedding. Then, the Vector Embedding transformation can generate
vector embeddings for each chunk of text using an embedding model like Word2Vec or BERT,
or your own embedding model. For more information about the Chunking transformation, see
Chunking transformation.
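The chunk-then-embed order described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not how the transformations are implemented: `chunk_text` and `toy_embed` are hypothetical helpers, the fixed word-count chunking is an assumption, and the hash-based vector is a stand-in for a real embedding model such as Word2Vec or BERT (it does not capture semantic meaning).

```python
import hashlib
import math


def chunk_text(text, max_words=5):
    """Split text into chunks of at most max_words words (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(chunk, dim=8):
    """Stand-in for an embedding model: hash each word into one of dim buckets,
    then scale the vector to unit length. A real model would be called here instead."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


text = "Vector embeddings capture the semantic meaning of text in numeric form"
chunks = chunk_text(text)                  # step 1: chunk the input text
vectors = [toy_embed(c) for c in chunks]   # step 2: one vector per chunk
```

Each chunk yields exactly one vector, which is why every vector later needs its own identifier.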
To create an identifier for each vector, you can use either the UUID_STRING function in an
Expression transformation or a Sequence Generator transformation:
If you use the UUID_STRING function in an Expression transformation, use the function
without passing any arguments. The function returns a globally unique ID that can be
stored in a string field with a precision of 100. For more information, see
Function Reference.
If you use a Sequence Generator
transformation, create a shared sequence to use across all mappings that load data
to the same index in the vector database.
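The two identifier strategies can be compared with a rough Python analogy. This is a sketch under stated assumptions, not the product's behavior: Python's `uuid.uuid4()` stands in for UUID_STRING (the exact format that UUID_STRING returns is not specified here), `itertools.count` stands in for a shared sequence, and `uuid_ids` and `sequence_ids` are hypothetical helper names.

```python
import itertools
import uuid


def uuid_ids(n):
    """Generate n globally unique string IDs, analogous to calling UUID_STRING with no arguments."""
    return [str(uuid.uuid4()) for _ in range(n)]


def sequence_ids(n, sequence):
    """Draw the next n values from a sequence that several loads share."""
    return [next(sequence) for _ in range(n)]


# One counter shared by every load into the same index (stand-in for a shared sequence).
shared_sequence = itertools.count(start=1)

ids_a = sequence_ids(3, shared_sequence)  # first mapping's load
ids_b = sequence_ids(2, shared_sequence)  # second mapping continues where the first stopped

string_ids = uuid_ids(3)                  # each ID is a 36-character string
```

Because both loads draw from the same counter, their IDs never collide within the index. If each mapping had its own sequence starting at 1, the second load would reuse IDs that the first load had already written.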
A Target transformation can write the vectors to a vector database.
The Vector Embedding transformation can't run in a serverless runtime environment on
AWS, on an advanced cluster on Google Cloud, or on GPUs. If the transformation runs on a
GPU-enabled cluster, GPUs are disabled and the transformation consumes CPUs.