Static corpus transforms

When working with static corpuses, you can apply the following types of transforms:

  • Query transforms to adjust the AI agent's reasoning and refine its decision-making logic

  • Output transforms to modify and format responses provided by the AI agent

Query transforms

Query transforms used with static corpuses allow you to refine the AI agent’s decision-making and direct it to the appropriate data corpus. To adjust the AI reasoning process, you can:

  • Instruct the AI agent which corpus to use for particular types of queries

  • Instruct the AI agent to skip specific corpuses for particular types of queries

Example of use

Assume you have two corpuses that provide information about HTTP requests:

  • HTTP basics

  • HTTP status codes

Dialog script
corpus({
    title: `HTTP basics`,
    urls: [`https://www.tutorialspoint.com/http/http_responses.htm`],
    depth: 1,
    maxPages: 10,
    priority: 1,
});

corpus({
    title: `HTTP status codes`,
    urls: [`https://developer.mozilla.org/en-US/docs/Web/HTTP/Status`],
    depth: 1,
    maxPages: 1,
    priority: 2,
});

The AI agent should retrieve information from corpuses in the following way:

  • If the user asks a question about status codes, the AI agent should use the HTTP status codes corpus.

  • For all other questions, the AI agent should refer to the HTTP basics corpus.

To direct the AI agent to a specific corpus:

  1. In the AI agent project, under Transforms, create the code_queries transform with the following data:

    1. In the Instruction field, provide general instructions on how to handle status code queries:

      Instruction
      If a question relates to HTTP status codes, convert it to JSON with an array of search queries and the question; else generate null.
      
    2. In the Examples section, add an example:

      • At the top of the Query field, select the data format: text. In the field below, enter the user query: What does code 401 mean?

      • At the top of the Result field, select the data format: json. In the field below, add the result query the AI agent should generate:

      Transform example
      {
          "search": [
              "Code 401",
              "401 meaning",
              "401 error"
          ],
          "question": "What does code 401 mean?"
      }
      
    ../../../_images/corpus-query-transforms-direct.png
  2. In the HTTP status codes corpus, add the query parameter and set it to the transform you created:

    Dialog script
    corpus({
        title: `HTTP status codes`,
        urls: [`https://developer.mozilla.org/en-US/docs/Web/HTTP/Status`],
        query: transforms.code_queries, // apply the code_queries transform to user questions
        depth: 1,
        maxPages: 1,
        priority: 2,
    });
    

Now, the AI agent will use the HTTP status codes corpus to answer questions about status codes.

../../../_images/corpus-query-transforms-direct-result.png
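When you write more examples for the code_queries transform, it can help to check that each result has the shape the instruction describes: either null, or an object with an array of search queries and the question. The following is a minimal plain-JavaScript sketch of such a check. The helper name isValidCodeQueryResult is hypothetical and not part of the platform, and the rule that the search array must not be empty is an assumption.

```javascript
// Hypothetical helper, not part of the platform API: checks that a value has
// the shape the code_queries instruction asks for.
function isValidCodeQueryResult(result) {
    // null means "this question is not about status codes"
    if (result === null) return true;
    if (typeof result !== "object" || Array.isArray(result)) return false;

    const { search, question } = result;
    // Assumption: at least one non-empty search query is required
    const searchOk =
        Array.isArray(search) &&
        search.length > 0 &&
        search.every((q) => typeof q === "string" && q.trim() !== "");
    const questionOk = typeof question === "string" && question.trim() !== "";
    return searchOk && questionOk;
}

// The example result from the transform above
const example = {
    search: ["Code 401", "401 meaning", "401 error"],
    question: "What does code 401 mean?",
};

console.log(isValidCodeQueryResult(example)); // true
console.log(isValidCodeQueryResult(null)); // true
console.log(isValidCodeQueryResult({ search: [] })); // false
```

You can run a check like this locally on draft examples before pasting them into the Examples section.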

To instruct the AI agent to omit a specific corpus for particular types of queries:

  1. In the AI agent project, under Transforms, create the basic_queries transform with the following data:

    1. In the Instruction field, provide general instructions on how to handle status code queries:

      Instruction
      If a question relates to status codes, generate null.
      
    2. In the Examples section, add an example:

      • At the top of the Query field, select the data format: text. In the field below, enter the user query: What is a status code?

      • At the top of the Result field, select the data format: json. In the field below, add the reasoning for the AI agent:

      Transform example
       <thinking>
       This query is about status codes.
       </thinking>
      
       null
      
    ../../../_images/corpus-query-transforms.png
  2. In the HTTP basics corpus, add the query parameter and set it to the transform you created:

    Dialog script
    corpus({
        title: `HTTP basics`,
        urls: [`https://www.tutorialspoint.com/http/http_responses.htm`],
        query: transforms.basic_queries, // apply the basic_queries transform to user questions
        depth: 1,
        maxPages: 10,
        priority: 1,
    });
    

Now, the AI agent will use only the HTTP status codes corpus to answer questions about status codes.
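The combined effect of the two transforms can be modeled in plain JavaScript. This is only an illustrative sketch, not how the platform is implemented. The regular expression is a stand-in for the AI agent's reasoning, all function names are hypothetical, and two behaviors are assumptions: that a corpus is skipped when its query transform generates null, and that basic_queries passes other questions through unchanged.

```javascript
// Stand-in for the agent's reasoning. A real transform is evaluated by the
// AI agent, not by a regular expression.
const isStatusCodeQuestion = (q) =>
    /\bstatus codes?\b|\bcode \d{3}\b/i.test(q);

// Hypothetical models of the two transforms described above
const codeQueries = (q) =>
    isStatusCodeQuestion(q) ? { search: [q], question: q } : null;
// Assumption: questions that are not about status codes pass through as-is
const basicQueries = (q) => (isStatusCodeQuestion(q) ? null : q);

const corpuses = [
    { title: "HTTP basics", query: basicQueries },
    { title: "HTTP status codes", query: codeQueries },
];

// Assumption: a corpus is consulted only if its query transform returns
// a non-null value
function corpusesFor(question) {
    return corpuses
        .filter((c) => c.query(question) !== null)
        .map((c) => c.title);
}

console.log(corpusesFor("What does code 401 mean?")); // only "HTTP status codes"
console.log(corpusesFor("What is an HTTP response?")); // only "HTTP basics"
```

In this model, each question reaches exactly one corpus because the two transforms generate null for complementary sets of questions.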

Output transforms

Assume you want the AI agent to format responses in a specific way. To do this:

  1. In the dialog script, add the corpus() function, then specify the data source and the name of the transform used to format the answer. This example uses a transform named output.

    Dialog script
    corpus({
        urls: [
            `https://developer.mozilla.org/en-US/docs/Web/HTTP/Status`,
        ],
        transforms: transforms.output,
        maxPages: 3,
        depth: 1,
    });
    
  2. In the AI agent project, under Transforms, create the output transform with the following data:

    1. In the Instruction field, describe the fields used in the transform examples:

      Instruction
      Input is the source text, query is the user query, result describes the final text formatted in Markdown.
      
    2. In the Examples section, set the Result data format to markdown.

      ../../../_images/corpus-output-transform.png
  3. In the Debugging Chat, ask a question whose answer you want to transform, for example: What does code 304 mean?

  4. In the code editor, open the output transform and in the top right corner, click History.

  5. The Transforms Explorer displays the results of data transformation. To the right of the row, click the plus icon to add the result to the output examples.

    ../../../_images/transforms-corpus.png
  6. In the Result field, edit the Full Answer and Links sections to format the output as needed.

    ../../../_images/transforms-corpus-result.png

Now you can ask questions about status codes and see how the formatting rules are applied to the output:

../../../_images/transforms-corpus-testing.png
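To plan the Markdown you want in the Result field, it can help to draft the target layout first. The sketch below builds one possible layout in plain JavaScript. The formatAnswer function is hypothetical and not part of the platform. The section names Full Answer and Links come from the Result field described above, but the heading levels and list style are assumptions.

```javascript
// Hypothetical formatter: produces one possible Markdown layout with the
// Full Answer and Links sections. The exact layout is an assumption.
function formatAnswer({ answer, links }) {
    // Render each link as a Markdown list item
    const linkLines = links.map((url) => `- ${url}`).join("\n");
    return `## Full Answer\n\n${answer.trim()}\n\n## Links\n\n${linkLines}`;
}

const markdown = formatAnswer({
    answer:
        "The 304 Not Modified status code indicates that the cached copy " +
        "of the resource is still valid and does not need to be retransmitted.",
    links: ["https://developer.mozilla.org/en-US/docs/Web/HTTP/Status"],
});

console.log(markdown);
```

You can paste a draft like this into the Result field of an output example and adjust it until the formatting matches what you need.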