Yesterday, our CTO, Andrey Ryabov, shared what we learned while developing a Neural Network for Spoken Language Understanding with students at Stanford University. The talk was part of the AI course on Neural Networks and Deep Learning. Here is a brief summary of the topics covered.
Spoken language differs from written language in the following ways:
- When speaking, people don’t always follow grammar rules, don’t use punctuation, and often split their sentences.
- Automatic Speech Recognition (ASR) introduces errors.
- Users tend to use more anaphora (references back to something said earlier, such as “it” or “that one”).
- When writing, a person can go back and edit sentences. A speaker cannot, so corrections are appended to the sentence.
These specifics of spoken language have to be considered when developing Natural Language systems. Many classical NLP models trained on datasets of written language don’t perform well on spoken language.
How does one develop a Voice AI service that converts speech to meaning and offers human-like conversations? Here are some of the techniques we shared on Spoken Language Understanding.
- Develop Word Vectors for sentences using a classical NLP training set.
- Augment the LSTM to use word positions and context information.
- Use Attention so that the important words contribute more.
- Use an Augmented Dataset and Transfer Learning to better train the Neural Network.
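To make the position and Attention ideas above concrete, here is a minimal sketch in plain Python. It is not the model used in Alan: the toy vectors, the fixed `query`, and the helper names `add_position_features` and `attention_pool` are all assumptions made for illustration. In a real network the word states would come from the LSTM and the query would be learned during training.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def add_position_features(word_vectors):
    # Hypothetical helper: append a normalized position (0.0 to 1.0)
    # to each word vector so the model can see where a word occurs.
    n = len(word_vectors)
    denom = max(n - 1, 1)
    return [vec + [i / denom] for i, vec in enumerate(word_vectors)]

def attention_pool(states, query):
    # Hypothetical helper: dot-product attention over word states.
    # Each word gets a score against the query; softmax turns the
    # scores into weights; the summary is the weighted sum of states.
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    summary = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return summary, weights

# Toy 2-dimensional "word vectors" (assumed values, not trained ones).
# Think of them as a filler word, a content word, and another filler word.
word_vectors = [[0.1, 0.0], [0.9, 0.8], [0.0, 0.1]]
states = add_position_features(word_vectors)

# Assumed query; it ignores the position feature in this toy example.
query = [1.0, 1.0, 0.0]
summary, weights = attention_pool(states, query)
```

In this toy setup the middle word scores highest against the query, so it receives the largest attention weight and dominates the summary vector, which is the effect the Attention bullet describes.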
These are some of the Neural Network enhancements we shared; they will soon be released in our Alan service. If the spoken language understanding problem appeals to you and you are an engineer, email firstname.lastname@example.org to learn more. If you are an enterprise that wants to leverage the Voice AI service, contact us at Alan.