AMAI, a voice technology startup, has launched a multi-speaker text-to-speech model whose output, it claims, 97 percent of people can't distinguish from a real person's voice.
The multi-speaker model can be used to dub movie translations in original celebrity voices, create personalized ads, and localize games. Media outlets will be able to voice all their articles with one click in any language.
“Voice technology is in demand in the market, but many companies are not sure how to use AI voice: a voice assistant at the entrance to a supermarket, selling goods in different countries, localizing content, etc. The good news is that people no longer fear bots. With a multi-speaker model, text-to-speech becomes even more accessible to ordinary users. And there is no other technology in the world that allows you to do this in real time with such a high level of quality,” said Pavel Osokin, Co-founder and CEO of AMAI, in a statement.
“Currently, there are no other production-ready models on the market due to the complex nature of this technology. There is a lot of complexity in training models and in the proper approach to writing neural networks, as it is critical to be able to devise your own models and to work with problem samples like acronyms,” said Max Baluev, Co-founder and Chief Technology Officer of AMAI, in a statement.