In advanced mode, you can use a Vector Embedding transformation to generate vector
embeddings for input text, capturing the semantic meaning of the text in a vector format.
Before using a Vector Embedding transformation, use a Chunking transformation to split
the text into chunks. Then, the Vector Embedding transformation can generate vector
embeddings for each chunk of text using an embedding model like Word2Vec or BERT. For
more information about the Chunking transformation, see Chunking transformation.
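The chunk-then-embed order described above can be sketched in plain Python. This is only an illustration of the data flow, not the transformation's implementation: `chunk_text` and `toy_embed` are hypothetical helpers, the word-count chunking rule is an assumption, and the hashing-based `toy_embed` is a deliberately simple stand-in for a real embedding model such as Word2Vec or BERT.

```python
import hashlib
import math


def chunk_text(text: str, max_words: int = 5) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(chunk: str, dim: int = 8) -> list[float]:
    """Stand-in for a real embedding model: hash each word into one of dim buckets."""
    vector = [0.0] * dim
    for word in chunk.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    # L2-normalize so every non-empty chunk maps to a unit-length vector.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


text = "Vector embeddings capture the semantic meaning of text in a numeric format"
chunks = chunk_text(text)                    # step 1: split the text into chunks
embeddings = [toy_embed(c) for c in chunks]  # step 2: one vector per chunk
```

Each chunk yields exactly one fixed-length vector, which is why the text has to be chunked before it reaches the embedding step.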
To create an identifier for each vector, you can use either the UUID_STRING function in an
Expression transformation or a Sequence Generator transformation:
If you use the UUID_STRING function in an Expression transformation, use the function
without passing any arguments. The function returns a globally unique ID that can be
stored in a string field with a precision of 100.
UUID_STRING is an internal function that you can use only in advanced mode. Using it to
create identifiers for other use cases might produce unexpected results.
If you use a Sequence Generator transformation, create a shared sequence to use across
all mappings that load data to the same index in the vector database.
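The two identifier strategies can be compared with a small Python analogy. It does not call UUID_STRING or a Sequence Generator transformation: `uuid.uuid4()` is assumed here as a rough stand-in for a globally unique string ID, and a single `itertools.count` object stands in for a shared sequence. The names `uuid_ids`, `sequence_ids`, and the two "mappings" are hypothetical.

```python
import itertools
import uuid


def uuid_ids(n: int) -> list[str]:
    """Return n random UUID strings (a Python analog, not UUID_STRING itself)."""
    return [str(uuid.uuid4()) for _ in range(n)]


# One counter shared by every hypothetical mapping that loads the same index.
shared_sequence = itertools.count(start=1)


def sequence_ids(n: int) -> list[int]:
    """Draw the next n values from the shared sequence."""
    return [next(shared_sequence) for _ in range(n)]


ids_a = uuid_ids(3)             # no coordination needed between mappings
mapping_1_ids = sequence_ids(3) # first mapping draws from the shared sequence
mapping_2_ids = sequence_ids(2) # second mapping continues where the first stopped
```

The sketch suggests why the sequence needs to be shared: if each mapping kept its own counter starting at 1, two mappings that load the same index would produce colliding identifiers, whereas randomly generated UUID strings need no such coordination.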
A Target transformation can write the vectors to a vector database.
The Vector Embedding transformation can't run in a serverless runtime environment, on an
advanced cluster on Google Cloud, or on GPUs. If the transformation runs on a GPU-enabled
cluster, GPUs are disabled and the transformation consumes CPUs.