Creating a RAG ingestion pipeline


In advanced mode, you can create a retrieval-augmented generation (RAG) ingestion pipeline to build a knowledge base for your large language model (LLM) application.
To create a RAG ingestion pipeline, you can use a mapping in advanced mode to upload documents such as articles, invoices, and reports. You can split the text into chunks and convert the chunked text into vector embeddings. Then, you can store both the chunked text and the vector embeddings in a vector database.
When you submit a query to your LLM application, you can provide assisting text by calculating the similarity between the query’s embedding and the embeddings stored in the vector database. The closest matches identify the chunks of text that are most semantically relevant to the query. The LLM incorporates both the query and the assisting text in the response that it generates and returns to the user.
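The similarity lookup described above can be illustrated with a minimal sketch. It assumes cosine similarity as the measure and an in-memory list in place of a vector database. The function names, the sample chunks, and the three-dimensional embeddings are hypothetical and chosen only for illustration. A real vector database performs this search internally, usually with an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, store, k=2):
    """Return the text of the k stored chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, record["embedding"]), record["text"])
        for record in store
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical stand-in for a vector database: chunk text plus its embedding.
store = [
    {"text": "Invoices are due within 30 days.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "The report covers Q3 revenue.", "embedding": [0.1, 0.9, 0.1]},
    {"text": "Late invoices incur a 2% fee.", "embedding": [0.8, 0.2, 0.1]},
]

# Hypothetical embedding of a user query about invoices.
query_embedding = [1.0, 0.0, 0.0]
assisting_text = top_k_chunks(query_embedding, store, k=2)
```

In this sketch, `assisting_text` holds the two invoice-related chunks, which would be passed to the LLM together with the query.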
Create the mapping using the following transformations, in order:
  1. Source transformation. Read PDFs to extract the text.
  2. Chunking transformation. Split large pieces of text into smaller segments, or chunks, to increase the content's relevance.
  3. Vector Embedding transformation. Generate vector embeddings for input text, capturing the semantic meaning of the text in a vector format.
  4. Expression or Sequence Generator transformation. Create an identifier for each vector.
    • If you use an Expression transformation, use the UUID_STRING function without passing any arguments. The function returns a globally unique ID that can be stored in a string field with a precision of 100.
      UUID_STRING is an internal function that you can use only in advanced mode. Using it to create identifiers for other use cases might produce unexpected results.
    • If you use a Sequence Generator transformation, create a shared sequence to use across all mappings that load data to the same index in the vector database.
  5. Target transformation. Write vectors to a vector database.
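The data flow through these transformations can be sketched outside the product in plain Python. This is an illustrative approximation only, not how the mapping runs internally. Every name here is hypothetical: `chunk_text` uses fixed-size character windows with overlap, `toy_embedding` is a hash-based placeholder that carries no semantic meaning and stands in for a real embedding model, `uuid.uuid4()` stands in for the UUID_STRING function, and a dictionary stands in for the vector database index.

```python
import hashlib
import uuid


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embedding(text, dims=8):
    """Placeholder embedding: deterministic, but NOT semantically meaningful."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def build_records(document_text):
    """Chunk the text, embed each chunk, and attach a unique string ID."""
    records = []
    for chunk in chunk_text(document_text):
        records.append({
            "id": str(uuid.uuid4()),  # assumption: stands in for UUID_STRING
            "text": chunk,
            "embedding": toy_embedding(chunk),
        })
    return records


# Hypothetical stand-in for the target index, keyed by vector ID.
vector_store = {}
sample_text = "Payment terms: invoices are due within 30 days of receipt. " * 3
for record in build_records(sample_text):
    vector_store[record["id"]] = record
```

Storing the chunk text next to its embedding, as the sketch does, is what lets a later similarity search return readable text rather than only vectors.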
For more information about each transformation, see Transformations.
