Dynamic corpus¶

You can define dynamic data corpuses for your AI agent.

Dynamic data corpuses allow you to retrieve semi-structured data (JSON) from different data sources, process it using transforms and utilize the retrieved data to answer user queries in natural language. Dynamic data corpuses can be connected with various external systems, such as corporate platforms, databases, API services and other specialized data sources, to provide users with accurate, timely and contextually relevant information.

Corpus parameters¶

To define a dynamic corpus, use the corpus() function:

Dialog script¶

corpus({
   title: `Infrastructure requests`,
   description: `Corpus to answer user queries about infrastructure objects`,
   input: project.objects,
   query: transforms.vms_queries,
   output: project.cleanObjects,
   transforms: transforms.vms_answer,
   priority: 1
});

Name	Type	Required/Optional	Description
`title`	string	Optional	Corpus title.
`description`	string	Optional	Corpus description.
`input`	function	Optional	Function used to populate the `Input` field of the `query` transform.
`query`	function	Required	Transforms function used to process user queries and generate code to retrieve necessary data.
`output`	function	Optional	Function used to process obtained data, before it is passed to the `transforms` function.
`transforms`	function	Optional	Transforms function used to process and format data obtained with the `query` transform and, optionally, the `output` function.
`priority`	integer	Optional	Priority level assigned to the corpus. Corpuses with higher priority are considered more relevant when user requests are processed. For details, see Corpus priority.

How the dynamic corpus works¶

The implementation of a dynamic corpus can vary depending on the specific business use case and scenario. Typically, the data flows through the following stages:

The user makes a request to the dynamic corpus.
Alan AI retrieves JSON data from an external system and applies the query function to it to handle the user’s request and generate the code needed to retrieve the relevant information. This process can involve:
- The input data passed to the query transform
- Functions with JSDoc comments
[Optional] Alan AI may perform additional processing on the data using the output function.
The data is passed to the transforms function. Alan AI applies the transforms function instructions to process and format the output for the user.
The response data is presented to the user.

../../../_images/dynamic-corpus-scheme.png

Example of use¶

Assume you have a JSON object that lists virtual machines (VMs) in a cloud environment. You want to use this data as a dynamic source so users can ask questions about the VMs, and the AI agent can provide a formatted response in natural language.

To do this, perform the following steps:

Retrieve VM data: add a function that retrieves the VM data from the data source.
Add a query transform: instruct the AI agent on how to generate code that will get the necessary data to answer user queries.
Add a data formatting transform: instruct the AI agent on how to format the output of the corpus data.
Add a dynamic corpus: define a dynamic corpus.
[Advanced] Clean up the input data: process the input data before it is passed to the formatting transform.

Step 1. Retrieve data¶

To retrieve JSON data from the data source, you will typically make an API call to the data provider. To keep things simple, we will add a JSON object defining VMs data directly to the dialog script.

In the dialog script, create the project.infrastructure variable:

Dialog script¶

project.infrastructure = {
    "vms": [
        {
            "name": "prod-web-server",
            "cpu": 4,
            "memoryGB": 8,
            "diskGB": 256,
            "location": "US-West",
            "status": "Running",
            "lastUpdated": "2024-08-26T12:00:00Z",
            "createdBy": "AdminUser",
            "notes": "No issues reported."
        },
        {
            "name": "prod-db-server",
            "cpu": 2,
            "memoryGB": 4,
            "diskGB": 128,
            "location": "EU-Central",
            "status": "Stopped",
            "lastUpdated": "2024-08-26T12:00:00Z",
            "createdBy": "AdminUser",
            "notes": "No issues reported."
        },
        {
            "name": "stage-app-server",
            "cpu": 8,
            "memoryGB": 16,
            "diskGB": 512,
            "location": "Asia-East",
            "status": "Running",
            "lastUpdated": "2024-08-26T12:00:00Z",
            "createdBy": "AdminUser",
            "notes": "No issues reported."
        }
    ]
}

Step 2. Add a query transform¶

With a query transform, you can instruct the AI agent on how to process the input data and generate code that returns the information needed to answer user queries.

In this example, we will instruct the AI agent using functions added to transforms.

Note

Each function used in transforms must have an explanation formatted as a JSDoc comment preceding the function code.

To the dialog script, add the getAllVMs() function with the function description:

Dialog script¶

/**
* @tool: Get virtual machines for a group.
* @param: None
* @return: Array with virtual machines descriptions.
* [
*     {
*         "name": "stage-app-server",
*         "cpu": 8,
*         "memoryGB": 16,
*         "diskGB": 512,
*         "location": "Asia-East",
*         "status": "Running/Stopped",
*         "lastUpdated": "2024-08-26T12:00:00Z",
*         "createdBy": "AdminUser",
*         "notes": "No issues reported."
*     },
*     ...
* ]
*/
function getAllVms() {
    const data = project.infrastructure;
    const objects = [];
    if (data.vms && Array.isArray(data.vms)) {
        objects.push(...data.vms);
    }
    return objects;
}

In the AI agent project, under Transforms, create the vms_queries transform with the following data:
1. In the Instruction field, import getAllVms function and provide general instructions on how to process VMs data. Then save the transform.
  Transform instruction¶
  #import getAllVms When a question is asked, make a decision if the question relates to VMs or not. If question does not relate to VMs or is too generic, generate null. If question relates to VMs, write an async function getRequestedData() that takes no parameters. getRequestedData() must call the provided functions to construct a JSON that will have all the necessary information to answer the question.
2. In the Examples section, add an example to answer the Show all VMs question. At the bottom of the view, click Add Row and create a transform example:
  
  Note
  
  To open an example in preview mode, in the top left corner of any cell, click the magnifying glass icon.
  - At the top of the Input field, select the data format: json.
  - At the top of the Query field, select the data format: text. In the field below, enter the user query: Show all VMs.
  - At the top of the Result field, select the data format: javascript. In the field below, add steps in natural language to retrieve all VMs data wrapped with <thinking></thinking> tags:
    Transform example¶
    <thinking> To return all VMs info: 1. Use getAllVms() to get all VMs data. 2. Return the result. </thinking>
  - At the top of the Result field, click the Generate result button to automatically generate code for the instructions specified in the <thinking></thinking> block:
  - To test if the generated function works correctly, in the top right corner of the Result field, click the Run script button:
3. In a similar way, add another example to ask a question: Show VMs with the Stopped status: add steps to retrieve stopped VMs data wrapped with <thinking></thinking> tags and click the Generate result button to automatically generate code for the instructions specified in the <thinking></thinking> block.

Step 3. Add a data formatting transform¶

With a data formatting transform, you can define the output format for the AI agent response.

In the AI agent project, under Transforms, create the vms_answers transform with the following data:

In the Instruction field, provide general instructions on how to format the VMs data. Then save the transform.

Instruction¶

The input contains sample JSON with VM data, the query contains a set of sample user questions, the result field contains the formatted answer to be provided.

../../../_images/transforms-output-general.png

In the Examples section, add an example:

At the top of the Input field, select the data format: json.
At the top of the Query field, select the data format: text. In the field below, enter the user query: Show all VMs.

At the top of the Result field, select the data format: markdown. In the field below, add the VM description formatted in Markdown:

Transform example¶

Here is a list of all VMs:

| Name                 | CPU | Memory (GB) | Disk (GB) | Location    | Status  | Last Updated          | Created By | Notes                |
|----------------------|-----|-------------|-----------|-------------|---------|-----------------------|------------|----------------------|
| **prod-web-server**  | 4   | 8           | 256       | US-West     | Running | 2024-08-26T12:00:00Z  | AdminUser  | No issues reported.  |
| **prod-db-server**   | 2   | 4           | 128       | EU-Central  | Stopped | 2024-08-26T12:00:00Z  | AdminUser  | No issues reported.  |
| **stage-app-server** | 8   | 16          | 512       | Asia-East   | Running | 2024-08-26T12:00:00Z  | AdminUser  | No issues reported.  |

../../../_images/transforms-output-format.png

In a similar way, add another example for a query: Show all running VMs:

Step 4. Add a dynamic corpus¶

To define a dynamic corpus, add the corpus() function with the following parameters to the dialog script:

Dialog script¶

corpus({
    title: `Infrastructure requests`,
    description: `Corpus to answer user queries about infrastructure objects`,
    query: transforms.vms_queries,
    transforms: transforms.vms_answers,
    priority: 1
});

Now, you can ask the AI agent questions like:

Show all VMs
Show all stopped VMs
Show all running VMs

and so on.

../../../_images/transforms-code-result.png

Note

To adjust the generated code and output data for new queries, open the necessary transform, in the top right corner, click History and click the add icon to the right of the necessary query row. The query will be added to transform examples. Here, you can edit it as described above.

Step 5. Clean up the input data¶

Note

This step is required if you want to process the data retrieved with the query function before sending it to formatting transform.

Assume we only want to remove auxiliary VM fields from the answer: lastUpdated, createdBy and notes.

To do this:

In the dialog script, create a set of fields you want to exclude:

Dialog script¶

const excludeFields = new Set([
    "lastUpdated",
    "createdBy",
    "notes"
]);

Add the cleanObjects() function that will return the VM data without excluded fields and save it to project.cleanObjects:

Dialog script¶

function cleanObjects(obj) {
    if (typeof obj !== 'object' || obj === null) {
        return obj;
    }

    const result = {};

    for (const key in obj) {
        if (obj.hasOwnProperty(key) && !excludeFields.has(key)) {
            result[key] = cleanObjects(obj[key]);
        }
    }
    console.log(result);
    return result;
}

project.cleanObjects = cleanObjects;

Update the dynamic corpus to include the output parameter with the project.cleanObjects function:

Dialog script¶

corpus({
    title: `Infrastructure requests`,
    query: transforms.vms_queries,
    output: project.cleanObjects,
    transforms: transforms.vms_answers,
    priority: 1
});

Now, you can ask the AI agent questions like:

Show all VMs
Show all stopped VMs
Show all running VMs

The AI agent will use the cleaned data to provide a response.

../../../_images/transforms-code-result2.png