A text-to-speech (TTS) specialist that is the only firm of its kind in the UK was last night set to premiere the world’s first singing TTS synthesiser as it seeks new tie-ups to help commercialise its technology.
Edinburgh-based CereProc created the first-of-its-kind technology, which was to debut yesterday on US entertainment programme The Tonight Show with Jimmy Fallon, using the social humanoid robot Sophia.
In a jointly funded project with Hong Kong-headquartered Hanson Robotics, which developed Sophia, CereProc has given the robot, for which it also created a "characterful" speaking voice, the ability to sing. The Scottish firm said it had "pushed its technology to the next level" by creating the world's first voice cloning TTS synthesiser, which will make the capability to sing "accessible to all individuals" and has the "potential to change the music industry".
To reproduce the expression, timbre and other characteristics associated with singing, CereProc trained a Deep Neural Network system and built a database specifically designed for singing synthesis.
The firm is in the process of commercialising the system and expects to release a beta version for testing next year.
Similar technology is already used as a singing aid in parts of Asia, particularly where karaoke is a popular hobby, and in Japan the "synthetic vocaloid" Maika has become a popular music act, performing with an animated avatar.
In 2010, CereProc used its technology for the first time to build a voice from past recordings for the film critic Roger Ebert.
Matthew Aylett, chief scientific officer at CereProc, told The Scotsman that the TTS synthesiser has practical applications in the music, toys and games industries.
He said: “At the moment the market is not clear, but there is a lot of potential as we’ve seen with synthetic instruments over the years. What we would need to commercialise this technology is to find a good partner.”
The technology works particularly well for creating backing vocals, enabling musicians to produce music with fewer resources, he said, adding: "This will probably get a mixed response, just as people had very mixed responses to pitch-changing technology, which was controversial in various ways.
“We are not trying to replace human singing. The synthesiser doesn’t know what it’s singing and one of the most fundamental elements of a human singer is that they understand the sentiment, the reality of what they’re singing. But at the same time, as computers enter the social domain, being able to do these things becomes more and more important.”
Paul Welham, chief executive at CereProc, said: “We believe strongly in continuous innovation.
“CereProc are sure to be partnering Hanson in the future, developing new expressive TTS voices with even more novel functionality.”