Automatic Speech Recognition and Access: 20 Years, 20 Months, or Tomorrow?

Source: wenku168.com  Document ID: WK1685569  Rating: ★★★★★

Document Introduction

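Wouldn't it be wonderful if people could speak into a machine and have their speech automatically converted to text and displayed on a screen? That technology is here. The author explains how it might be used by people with hearing loss who want easy access to the spoken word.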

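By now you have probably heard reports about automatic speech recognition (ASR) technology for computers and the claims about how it will change the world. You can write a paper, compose an e-mail message, and open programs without ever touching the keyboard. The new claims come with the arrival of continuous speech recognition, as opposed to discrete speech, sometimes called stop speech. In the past, a person had to speak one word at a time, pausing distinctly between words. Although this was fairly easy to do with practice, most people found it cumbersome and undesirable.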

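Recently, several breakthroughs in the technology, such as larger memory and faster processors, have made continuous speech recognition a reality. This means that a person can speak at up to 160 words per minute and have the computer automatically convert the speech to text, with an error rate of only three to four percent. The more you stop and correct errors, the better the program becomes at recognizing your speech. In fact, many programs suggest about 30 minutes of voice training, although you can spend more time or keep training the ASR system as you use it. (In the past, initial training took one to two hours, during which the user had to read prepared text into the computer through a microphone.) In addition, prices for these programs keep falling, which is exciting news.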
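To put the speaking rate and error rate quoted above in perspective, the arithmetic can be sketched in a few lines of Python. The function name `expected_errors_per_minute` is hypothetical, and the calculation assumes that errors are spread evenly across all words, which the article does not claim.

```python
def expected_errors_per_minute(words_per_minute: float, error_rate: float) -> float:
    """Return the expected number of misrecognized words per minute of speech.

    Assumes every word has the same chance of being misrecognized.
    """
    if words_per_minute < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("words_per_minute must be >= 0 and error_rate in [0, 1]")
    return words_per_minute * error_rate


if __name__ == "__main__":
    # Figures quoted in the article: 160 words per minute, 3 to 4 percent errors.
    for rate in (0.03, 0.04):
        errors = expected_errors_per_minute(160, rate)
        print(f"{rate:.0%} error rate at 160 wpm: about {errors:.1f} errors per minute")
```

Under these assumptions, a speaker at the top rate would still see roughly five to six misrecognized words every minute, which helps explain why correcting errors remains part of using such a program.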

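If a computer can respond to a speaker this way, we are halfway there, right? Voila! You might imagine your Western Civilization professor, spouse, or boss speaking into a microphone while the speech appears as text on a computer screen. Unfortunately, it is not that simple. I would like to address the current limitations of speech recognition technology, and also explain why some people are interested in it and consider it both necessary and useful. This use of the technology is not the same as the hands-free dictation systems now being developed.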

Automatic Speech Recognition and Hands-Free Dictation

Automatic Speech Recognition and Access: 20 Years, 20 Months, or Tomorrow?

By Cheryl D. Davis

Wouldn't it be great if people could speak into a machine and have it automatically converted to text to be read on a screen? The technology is here. The author explains how it might be used for the person with hearing loss who wants easy access to the spoken word.

By now you have probably heard news reports on automatic speech recognition (ASR) technology for use with computers and the claims of how it will change the world. You can write a paper, compose an e-mail message, and open programs without ever touching the keyboard. The new claims come about with the advent of continuous speech rather than discrete speech, or stop speech, as it is sometimes called. In the past, a person had to speak one word at a time, pausing distinctly between words. Although this was fairly easy to achieve with practice, most people still found it cumbersome and undesirable.

Recently, though, several breakthroughs in the technology, such as greater memory allowing for larger dictionaries and faster processors, have made continuous speech a reality. This means that a person can speak at rates of up to 160 words per minute and have the computer automatically convert this speech to text, with error rates of three or four percent. The more you stop and correct your errors, the better the program becomes at recognizing your speech patterns. In fact, some programs advertise that