Data corpuses

With Alan AI, you can easily create an AI Agentic Interface that integrates both static and dynamic data sources to respond to user queries. Alan AI uses advanced techniques like semantic search, question answering and AI-driven code generation to provide accurate and relevant responses while maintaining a natural, multi-turn conversation flow.

You can set up data corpuses for your AI Agentic Interface if you already have a pool of information that you want to use for the following purposes:

Handle user requests
Offer 24/7 automated support
Onboard new customers and employees
Provide instructions, training and so on

When building a Q&A Agentic Interface, you can combine diverse types of data sources. The Q&A service accepts the following data formats:

Web pages
Plain text
PDF
CSV
JSON data

Alan AI provides several tools to define and work with data corpuses:

Static corpus

Learn how to crawl static data sources such as websites, web pages and PDF files.

Dynamic corpus

Learn how to crawl dynamic data sources: APIs, databases and so on.

Puppeteer crawler

Learn how to crawl dynamically loaded data and specific page sections with the Puppeteer crawler.

Crawling depth

Understand how Alan AI crawls documents and what the depth parameter means in different types of corpuses.

Corpus priority

Learn how to prioritize one data corpus above the others.

Corpus filtering

Learn how to filter data corpuses based on criteria like user roles, product versions or individual preferences.

Corpus includes and excludes

Learn how to include and exclude specific documents from data corpuses.

Protected resources

Find out how to crawl websites and web pages that require basic authentication.

Crawler tasks

Discover how to manage data crawling tasks in Alan AI Studio.

Corpus Explorer

Learn how to review the data sources and content that your Q&A AI Agentic Interface uses to converse with users.

See also