The media is abuzz with news about large language models (LLMs) doing things that were virtually impossible for computers before. From generating text to summarizing articles and answering questions, LLMs are enhancing existing applications and unlocking new ones.
However, when it comes to enterprise applications, LLMs can’t be used as is. Out of the box, LLMs are not very robust and can make errors that degrade the user experience or even cause irreversible damage.
To solve these problems, enterprises need to constrain their LLMs to their business rules and knowledge base. One way to do this is by fine-tuning language models with proprietary data. Here is what you need to know.
The hallucination problem
LLMs are trained for “next token prediction.” Basically, it means that during training, they take a chunk from an existing document (e.g., Wikipedia, news websites, code repositories) and try to predict the next word. They then compare their prediction with what actually exists in the document and adjust their internal parameters to improve their predictions. By repeating this process over a very large corpus of curated text, the LLM develops a “model” of the language and of the knowledge contained in the documents. It can then produce long stretches of high-quality text.
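To make that objective concrete, here is a toy sketch of the loss behind next-token prediction. The four-token vocabulary and the logits are made up for illustration; a real model works over tens of thousands of tokens:

```typescript
// Toy illustration of the next-token prediction objective.
// A real LLM produces logits over a vocabulary of tens of thousands of
// tokens; here we use a 4-token vocabulary and made-up numbers.

const vocab = ["the", "cat", "sat", "mat"];

// Hypothetical logits the model assigns to each candidate next token.
const logits = [1.2, 0.1, -0.5, 2.3];

// Softmax turns logits into a probability distribution over the vocabulary.
function softmax(xs: number[]): number[] {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Cross-entropy loss: negative log-probability of the token that
// actually appears next in the training document.
const probs = softmax(logits);
const target = vocab.indexOf("mat"); // the true next token
const loss = -Math.log(probs[target]);

console.log(probs, loss); // training nudges parameters to reduce this loss
```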
However, LLMs don’t have working models of the real world or the context of the conversation. They are missing many of the things that humans possess, such as multi-modal perception, common sense, intuitive physics, and more. This is why they can get into all kinds of trouble, including hallucinating facts: generating text that is plausible but factually incorrect. And because they have been trained on a very wide corpus of data, they can make up wildly incorrect claims with high confidence.
Hallucination can be fun and entertaining when you’re using an LLM chatbot casually or to post memes on the internet. But when used in an enterprise application, hallucination can have very adverse effects. In healthcare, finance, commerce, sales, customer service, and many other areas, there is very little room for making factual mistakes.
Researchers have made solid progress in addressing the hallucination problem, but it has not been solved yet. This is why it is important that app developers take measures to make sure the LLMs that power their AI assistants are robust and remain true to the knowledge and rules set for them.
Fine-tuning large language models
One solution to the hallucination problem is to fine-tune LLMs on application-specific data. The developer curates a dataset of text that is relevant to their application, then takes a pretrained model and gives it a few extra rounds of training on the proprietary data. Fine-tuning improves the model’s performance by constraining its output to the knowledge contained in the application-specific documents. This is a very effective method for use cases where the LLM is applied to a narrow domain, as in enterprise settings.
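As an illustration, a fine-tuning dataset is often just a file of prompt/completion pairs curated from the application’s own content. The records and file name below are hypothetical, and the exact record schema depends on the training pipeline you use:

```typescript
import { writeFileSync } from "fs";

// Hypothetical examples curated from an application's own documentation.
// Each record pairs a user-style prompt with the answer the model
// should learn to give.
const examples = [
  {
    prompt: "What is your refund policy?",
    completion: "Orders can be refunded within 30 days of purchase.",
  },
  {
    prompt: "Do you ship internationally?",
    completion: "We currently ship to the US, Canada, and the EU.",
  },
];

// JSON Lines (one JSON object per line) is a common format accepted
// by fine-tuning pipelines and APIs.
const jsonl = examples.map((e) => JSON.stringify(e)).join("\n");
writeFileSync("finetune-data.jsonl", jsonl);
```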
A more advanced fine-tuning technique is “reinforcement learning from human feedback” (RLHF). In RLHF, human annotators provide the LLM with a prompt and let it generate several outputs, which they then rank from best to worst, repeating the process across many prompts. The prompts, outputs, and rankings are used to train a separate “reward model” that scores the LLM’s outputs. This reward model is then used in a reinforcement learning process to align the model with the user’s intent. RLHF is part of the training process behind ChatGPT.
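Under the hood, the reward model is typically trained with a pairwise ranking loss that pushes the score of the human-preferred output above the score of the rejected one. A minimal sketch of that loss, with made-up scores:

```typescript
// Pairwise ranking loss used to train a reward model (Bradley-Terry style):
// loss = -log(sigmoid(scoreChosen - scoreRejected)).
// The scores below are hypothetical reward-model outputs for two responses
// to the same prompt, where annotators preferred the first response.

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function pairwiseLoss(scoreChosen: number, scoreRejected: number): number {
  return -Math.log(sigmoid(scoreChosen - scoreRejected));
}

console.log(pairwiseLoss(1.8, 0.3)); // small loss: ranking already correct
console.log(pairwiseLoss(0.2, 1.5)); // large loss: ranking is wrong

// During RLHF proper, the trained reward model scores the LLM's outputs,
// and a reinforcement learning algorithm (commonly PPO) updates the LLM
// to produce higher-scoring responses.
```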
Another approach is to use ensembles of LLMs and other types of machine learning models. In this case, several models (hence the name ensemble) process the user input and generate outputs. The ML system then uses a voting mechanism to choose the best output (e.g., the one that receives the most votes).
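A simple majority vote over model outputs can look like the sketch below (the three answers are hypothetical outputs from three different models):

```typescript
// Majority voting over the outputs of several models.
// Ties are broken by whichever answer has been seen first.
function majorityVote(outputs: string[]): string {
  const counts = new Map<string, number>();
  for (const out of outputs) {
    counts.set(out, (counts.get(out) ?? 0) + 1);
  }
  let best = outputs[0];
  let bestCount = 0;
  for (const [out, count] of counts) {
    if (count > bestCount) {
      best = out;
      bestCount = count;
    }
  }
  return best;
}

// Hypothetical outputs from three different models for the same query.
const answers = [
  "Your order ships in 3-5 business days.",
  "Your order ships in 3-5 business days.",
  "Shipping usually takes about a week.",
];

console.log(majorityVote(answers)); // "Your order ships in 3-5 business days."
```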
While ensembling and fine-tuning language models is very effective, it is not trivial. Depending on the type of model or service used, developers must overcome technical barriers. For example, if a company wants to self-host its own model, it must set up servers and GPU clusters, create an entire MLOps pipeline, curate data from across its entire knowledge base, and format that data so it can be read by the tools that will retrain the model. The high costs and the shortage of machine learning and data engineering talent often make it prohibitive for companies to fine-tune and use LLMs.
API services remove some of these complexities, but they still require significant effort and manual labor on the part of app developers.
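As one concrete example of such a service, OpenAI exposes fine-tuning through its API, which can be driven from a short script like the hedged sketch below (model names and dataset schemas change over time, so check the provider’s documentation):

```typescript
import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  // Upload the curated dataset. For chat models, each JSONL line must
  // follow the provider's expected schema (e.g., a "messages" array).
  const file = await client.files.create({
    file: fs.createReadStream("finetune-data.jsonl"),
    purpose: "fine-tune",
  });

  // Kick off a fine-tuning job on a fine-tunable base model.
  const job = await client.fineTuning.jobs.create({
    training_file: file.id,
    model: "gpt-3.5-turbo",
  });

  console.log(`started job ${job.id} with status ${job.status}`);
}

main().catch(console.error);
```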
Fine-tuning language models with Alan AI Platform
Alan AI is committed to providing a high-quality, easy-to-use Actionable AI platform for enterprise applications. From the start, our vision has been to create an AI platform that makes it easy for app developers to deploy AI solutions and create the next-generation user experience.
Our approach ensures that the underlying AI system has the right context and knowledge to avoid the kind of mistakes that current LLMs make. The architecture of the Alan AI Platform is designed to combine the power of LLMs with your existing knowledge base, APIs, databases, or even raw web data.
To further improve the performance of the language model that powers the Alan AI Platform, we have added fine-tuning tools that are versatile and easy to use. Our general approach to fine-tuning models for the enterprise is to provide “grounding” and “affordance.” Grounding means making sure the model’s responses are based on real facts, not hallucinations. This is done by keeping the model within the boundaries of the enterprise’s knowledge base and training data, as well as the context provided by the user. Affordance means knowing the limits of the model and making sure it only responds to prompts and requests that fall within its capabilities.
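Here is a deliberately simplified sketch of those two checks. The in-memory knowledge base, word-overlap scoring, and threshold are all stand-ins for real retrieval and generation machinery, not the platform’s actual internals:

```typescript
// Toy illustration of grounding and affordance around an LLM-backed answer.

const knowledgeBase = [
  "Orders can be refunded within 30 days of purchase.",
  "We ship to the US, Canada, and the EU.",
];

// Crude relevance score: fraction of query words found in the passage.
function relevance(query: string, passage: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = words.filter((w) => passage.toLowerCase().includes(w));
  return hits.length / Math.max(words.length, 1);
}

const THRESHOLD = 0.5; // tuned per application

function answer(query: string): string {
  const scored = knowledgeBase
    .map((text) => ({ text, score: relevance(query, text) }))
    .filter((p) => p.score >= THRESHOLD)
    .sort((a, b) => b.score - a.score);

  // Affordance: if the knowledge base doesn't cover the question,
  // decline rather than let the model guess.
  if (scored.length === 0) {
    return "I don't have information about that.";
  }

  // Grounding: answer only from retrieved content. A real system would
  // pass the top passages to an LLM as context; here we return the top hit.
  return scored[0].text;
}

console.log(answer("can I get a refund?")); // grounded answer
console.log(answer("what's the weather today?")); // declines
```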
You can see this in the Q&A Service by Alan AI, which allows you to add an Actionable AI assistant on top of your existing content.
The Q&A service is a useful tool that can provide your website with 24/7 support for your visitors. However, it is important that the AI assistant stays true to the content and knowledge of your business. Naturally, the solution is to fine-tune the underlying language model with the content of your website.
To simplify the fine-tuning process, we provide a simple function called corpus, which developers can use to supply the content they want to fine-tune their AI model on. You can pass the function a list of plain-text strings that represent your fine-tuning dataset. To further simplify the process, we also support URL-based data. Instead of providing raw text, you can pass the function a list of URLs that point to the pages where the relevant information is located. These could be links to documentation pages, FAQs, knowledge bases, or any other content that is relevant to your application. Alan AI automatically scrapes the content of those pages and uses it to fine-tune the model, saving you the manual labor of extracting the data. This is especially convenient when you already have a large corpus of documentation and want to use it to train your model.
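Putting it together, a dialog script using corpus might look like the sketch below. Alan AI dialog scripts are written in JavaScript; the declaration at the top is our assumption for illustration, so check the Alan AI documentation for the exact signature:

```typescript
// Hedged sketch of fine-tuning the Q&A model on your own content from an
// Alan AI dialog script. The signature below is an assumed declaration,
// not the documented API.
declare function corpus(data: string[]): void;

// Option 1: plain-text strings that make up the fine-tuning dataset.
corpus([
  "Our support team is available 24/7 via chat and email.",
  "Enterprise plans include a dedicated account manager.",
]);

// Option 2: URLs pointing at the relevant pages; Alan AI scrapes the
// content automatically.
corpus([
  "https://example.com/docs/getting-started",
  "https://example.com/faq",
]);
```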
During inference, Alan AI uses the fine-tuned model together with the other proprietary features of its Actionable AI platform, which takes into account visuals, user interactions, and other data that provide further context for the assistant.
Building robust language models will be key to success in the coming wave of Actionable AI innovation. Fine-tuning is the first step we are taking to make sure all enterprises have access to the best-in-class AI technologies for their applications.