IITM: Converting written text to synthesised speech – The Hindu


Automated conversion of written text to spoken form is very useful, especially in this time of online classes. Making lectures originally presented in English available in all Indian languages has obvious uses. A group from the computer science department of the Indian Institute of Technology Madras is working on this. The researchers are developing technology to enable text-to-speech conversion for 13 Indian languages: Assamese, Bodo, Bengali, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil and Telugu, along with their corresponding Indian English flavours. A study on this was published in the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Indian languages

For synthesised speech to sound as natural as possible, and close to a sentence read out by a human being, punctuation marks need to be converted into pauses of suitable lengths. This is the approach when converting English language text into synthesised speech. When applying this to Indian languages, the first difficulty one encounters is that there are no punctuation marks, save the period. There are other such differences. “The longest English sentence could be about 6 seconds long, while in Indian languages sentences can last as long as 30 seconds,” says Hema A. Murthy from the Computer Science and Engineering department of IIT Madras, who led the study. Such long sentences are essentially phrase-based, the researchers found, and each phrase is almost a complete unit.

In the study, voice professionals (news readers and radio jockeys) were asked to read out text carefully selected to be representative of various fields. “The audio signal and the text were aligned including pauses. Text was syllabified using rules, and syllables and pauses were identified in the audio using acoustic properties,” explains Prof. Murthy in an email to The Hindu. “Since the text and audio are aligned at the syllable-level, computing syllable-rate, number of syllables between pauses was straightforward,” she adds.
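To illustrate why a syllable-level alignment makes these statistics easy to compute, here is a minimal Python sketch. It is not the researchers' code: the alignment format (a list of label, start time, end time entries), the "PAU" pause label and the function names are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical alignment format: (label, start_sec, end_sec).
# The label "PAU" marks a pause; every other label is one syllable.
Segment = Tuple[str, float, float]


def syllable_rate(segments: List[Segment], pause_label: str = "PAU") -> float:
    """Syllables per second of speech, with pause time excluded."""
    syllables = [s for s in segments if s[0] != pause_label]
    speech_time = sum(end - start for _, start, end in syllables)
    return len(syllables) / speech_time if speech_time > 0 else 0.0


def syllables_between_pauses(
    segments: List[Segment], pause_label: str = "PAU"
) -> List[int]:
    """Number of syllables in each stretch of speech between pauses."""
    counts: List[int] = []
    run = 0
    for label, _, _ in segments:
        if label == pause_label:
            if run:
                counts.append(run)
            run = 0
        else:
            run += 1
    if run:
        counts.append(run)
    return counts


# Made-up alignment for one short utterance.
example = [
    ("ka", 0.0, 0.2), ("ma", 0.2, 0.4), ("PAU", 0.4, 0.7),
    ("la", 0.7, 0.9), ("PAU", 0.9, 1.0),
    ("ti", 1.0, 1.2), ("ra", 1.2, 1.4), ("na", 1.4, 1.6),
]
print(syllable_rate(example))
print(syllables_between_pauses(example))
```

Whether the study measured syllable rate with or without pause time is not stated in the article; excluding pauses is a choice made here for the sketch.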


Domains covered

An hour of speech contains about 350-400 sentences. The researchers collected 10 hours of data for every language. “Five hours of data was used for hypothesising, and a set of held out sentences from the database was used for testing the hypothesis,” says Prof. Murthy. The text sentences were chosen so as to ensure maximum domain coverage. “This includes news, sports, fiction, etc, as we work on open domain text-to-speech synthesis systems,” adds Jeena J Prakash from Uniphore Software Systems, IITM Research Park, Chennai, who is the first author of the paper.

Phrase-based synthesis

Using these findings, the text is split into phrases. “A phrase location–based speech synthesis system was built [which delineates the first phrase, last phrase and middle phrases]. The phrases of the text were synthesised using the appropriate phrase-based synthesis systems. The synthesised waveforms were concatenated,” explains Dr Prakash.
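The flow Dr Prakash describes (tag each phrase by its location, synthesise it with the matching system, then join the waveforms) can be sketched roughly as follows. This is only an illustration under stated assumptions: the phrases are taken as already split, the names `phrase_positions`, `synthesise_sentence` and `make_fake_voice` are hypothetical, and the stand-in "voices" return dummy samples instead of real audio.

```python
from typing import Callable, Dict, List

Waveform = List[float]
Voice = Callable[[str], Waveform]


def phrase_positions(n: int) -> List[str]:
    """Label n phrases as 'first', 'middle' or 'last' by location."""
    if n <= 0:
        return []
    if n == 1:
        # Assumption: the article does not say how a one-phrase
        # sentence is handled; it is treated as 'first' here.
        return ["first"]
    return ["first"] + ["middle"] * (n - 2) + ["last"]


def synthesise_sentence(phrases: List[str], voices: Dict[str, Voice]) -> Waveform:
    """Synthesise each phrase with its location-specific voice, then concatenate."""
    waveform: Waveform = []
    for phrase, position in zip(phrases, phrase_positions(len(phrases))):
        waveform.extend(voices[position](phrase))
    return waveform


def make_fake_voice(sample: float) -> Voice:
    """Hypothetical stand-in for a real synthesiser: one sample per word."""
    return lambda phrase: [sample] * len(phrase.split())


voices = {
    "first": make_fake_voice(1.0),
    "middle": make_fake_voice(2.0),
    "last": make_fake_voice(3.0),
}
print(synthesise_sentence(["a b", "c", "d e f"], voices))
```

In a real system each entry in `voices` would be a separately built synthesis model, and the concatenation step would probably need smoothing at the phrase joins, which this sketch leaves out.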

The results were tested on listeners to get a subjective evaluation. The original spoken sentences and the synthesised sentences were played out in random order. The researchers found a uniform improvement across all Indian languages. “Currently we are part of a consortium on building speech-to-speech systems, where the objective is to replace the audio in the NPTEL/Swayam Lectures (in English) to vernacular,” says Prof. Murthy.
