Marco Radic / June 1, 2020

Almost everyone has by now interacted with a system that responds to spoken language. Whether on a smartphone, in the car or through a smart home assistant, the big technology companies have put speech recognition into the hands of end users.

What stands out is how much these systems have improved over the years. Smartphones now wake up on a spoken command and can recognize their owner by voice. We owe this to the massive progress in what is probably the most popular approach in AI: deep learning with neural networks.

Thanks to application-oriented research and advances in both software and hardware, state-of-the-art AI models can now solve even highly complex problems by recognizing intricate patterns. This allows computers to hold conversations with people or to steer cars autonomously through traffic. We regard understanding and processing spoken language as a deeply human task, since it makes up a large part of our own communication. Nowadays, however, intelligent systems can support us even there, as an interface between human and machine.

The requirements for using such a system in production, however, are often specialized and full of subtleties. Spoken language varies between speakers, accents and dialects. A wide range of input channels also has to be supported: the sound, quality and background noise of telephone systems, smartphones or conference microphones can differ, sometimes significantly. The vocabulary brings requirements of its own. In some cases, the system has to support technical terms and a constantly growing vocabulary.

The final integration of the system, which must run in real time and over reliable connections, is another typical software engineering task. One example is speech recognition as part of an assistance system for call center operations. Call center employees often have to lead the conversation with the customer while simultaneously filling in and navigating forms and screens on the computer. This requires training and a consistently high level of attention from the employee over several hours to keep the service quality high. To support employees in their daily work, the system can ‘listen in’ on telephone calls, recognize and process what all participants are saying in real time and, depending on the detected concern or context, display information or trigger actions automatically on the screen. This increases throughput in the call center, reduces training costs and improves the quality of consulting.
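The flow described above, from recognized speech to a detected concern to an on-screen action, could be sketched roughly as follows. This is only an illustration under strong assumptions: the utterances are plain text standing in for the output of a speech recognizer, simple keyword matching stands in for real language understanding, and all names (`Utterance`, `CONCERN_KEYWORDS`, `ACTIONS`, `detect_concern`, `suggest_actions`) are hypothetical and not part of any actual product.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Utterance:
    speaker: str  # e.g. "customer" or "agent"
    text: str     # text as a speech recognizer might return it (assumption)


# Hypothetical keyword table; a real system would use a trained model instead.
CONCERN_KEYWORDS = {
    "address_change": ("moved", "new address"),
    "card_blocking": ("lost my card", "stolen", "block my card"),
}

# Hypothetical mapping from a detected concern to an on-screen action.
ACTIONS = {
    "address_change": "open_form:address_change",
    "card_blocking": "open_form:card_blocking",
}


def detect_concern(text: str) -> Optional[str]:
    """Return the first concern whose keywords occur in the text, else None."""
    lowered = text.lower()
    for concern, keywords in CONCERN_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return concern
    return None


def suggest_actions(utterances: Iterable[Utterance]) -> Iterator[str]:
    """Yield one action per newly detected concern as the utterances arrive."""
    seen = set()
    for utterance in utterances:
        concern = detect_concern(utterance.text)
        if concern is not None and concern not in seen:
            seen.add(concern)
            yield ACTIONS[concern]


if __name__ == "__main__":
    call = [
        Utterance("agent", "Good morning, how can I help you?"),
        Utterance("customer", "Hello, I have moved and need to give you my new address."),
        Utterance("customer", "Also, I think I lost my card yesterday."),
    ]
    for action in suggest_actions(call):
        print(action)
```

Because `suggest_actions` is a generator, an action can be shown as soon as the matching utterance arrives, without waiting for the call to end. In a production setting the keyword table would be replaced by a proper language understanding component.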

targens offers a solution in this area that provides speech recognition for German. It is highly adaptable, both linguistically and technologically, can be operated on premises or in the cloud, and integrates with the rest of the ‘Conversational AI’ offering. This modern AI stack offers a complete solution for speech processing, speech comprehension and speech output, and can thus drive AI-supported digitization.

More on this topic in our blog post ‘Artificial intelligence is revolutionizing the bank’s customer relationship’.

Image by Gerd Altmann from Pixabay