Speech Chip Listens Well, Talks Clearly, Plays Music

By Dave Bursky

April 01, 2002


Low-cost, high-quality speech recognition and synthesis have been the holy grail of the electronics industry for several decades. They're here now in the RSC-4x family of speech-recognition and synthesis chips from Sensory Inc. Beyond the chip itself, a design needs just a power supply, a low-cost microphone, a speaker, and a crystal or resonator.

Integration, the use of advanced DSP technology, and an innovative architecture played key roles in the chips' development. All necessary digital-signal processing, data storage, digital control, and analog input and output support are on-chip. As a single-chip solution, the RSC-4x family allows speech I/O to be easily incorporated into a wide range of consumer, automotive, cellular, and portable computing applications, among others.

Some of these chips sell for less than $2.00 in large quantities. They all provide high recognition accuracy for both speaker-dependent and speaker-independent applications, as well as high-quality speech synthesis.

The silicon works in tandem with the company's Sensory Speech 7 firmware, which consists of a suite of tools, algorithms, and libraries. With these tools, a single chip can hold more than five minutes of compressed speech, multiple speaker-dependent and speaker-independent vocabularies, speaker verification, and all of the application code.

The chips can perform speaker-independent or speaker-dependent recognition, speaker verification, speaker-adaptive recognition, word spotting, and continuous-listening recognition. When coupled with the speech-synthesis capability, the RSC-4x processors can also perform voice recording and playback. Data rates of less than 14 kbits/s can be achieved while maintaining very high-quality reproduction.

In the synthesis mode, the chips can deliver high-quality speech synthesis based on a proprietary version of the linear-predictive coding (LPC) algorithm that allows the data rate to go as low as 5 kbits/s. The low data rate permits a considerable amount of prerecorded speech to be stored on-chip, or in an external ROM or flash device. A MIDI-like synthesis mode with four "voices" lets multiple instruments harmonize and generate music simultaneously.

Initially, there will be three versions of the chip: the RSC-4000, RSC-4128, and RSC-4256. The first, a ROMless implementation with address and data buses, enables users to store the vocabulary in external ROMs, flash memories, or static RAMs. The other two are ROM-based versions, with 128 and 256 kbytes of ROM, respectively.
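
The data rates and ROM sizes quoted above can be combined into a rough storage estimate. The Python sketch below is illustrative only: the function names are made up for this example, and it assumes 1 kbit = 1000 bits and 1 kbyte = 1024 bytes, conventions the figures above do not spell out. It also ignores the ROM space that application code and vocabularies would take.

```python
def speech_storage_bytes(seconds, kbits_per_second):
    """Bytes needed to hold `seconds` of speech at the given data rate."""
    bits = seconds * kbits_per_second * 1000  # assumption: 1 kbit = 1000 bits
    return int(bits // 8)


def seconds_that_fit(rom_bytes, kbits_per_second):
    """Seconds of speech that fit in `rom_bytes` if the ROM held nothing else."""
    return rom_bytes * 8 / (kbits_per_second * 1000)


# Five minutes of synthesized speech at the 5-kbit/s floor.
print(speech_storage_bytes(5 * 60, 5))  # 187500

# Upper bounds for the two ROM-based parts (assumption: 1 kbyte = 1024 bytes).
for rom_kbytes in (128, 256):
    minutes = seconds_that_fit(rom_kbytes * 1024, 5) / 60
    print(rom_kbytes, "kbytes holds at most", round(minutes, 1), "minutes")
```

Under these assumptions, five minutes of speech at 5 kbits/s occupies 187,500 bytes, which is consistent with the claim of more than five minutes of speech on the 256-kbyte part.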

Customized Core Controls All: To keep costs low, Sensory's designers based their signal-processing architecture around an 8051-compatible controller and a vector-processor unit (see the figure). While the controller incorporates an extended instruction set optimized for managing data operations, the custom vector processor handles many of the signal-processing tasks. The vector accelerator includes a single-cycle multiplier and twin DMA-channel controllers. The DMA controllers stream data from the on-chip or external memory to the vector engine.

Also contained on-chip is a 5-kbyte block of static RAM. Most of this RAM can be used to hold word data, but 256 bytes are set aside for the embedded microcontroller to handle the application program parameters.

In the recognition mode, the RSC-4x can execute both hidden-Markov modeling and a neural-network-based algorithm. Speaker-dependent recognition, which uses a dynamic time-warping technique, may require some external memory to store speech information (10 words can be stored on-chip), while speaker-independent vocabularies can be stored on- or off-chip.
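
Dynamic time warping aligns two utterances spoken at different speeds before comparing them. The Python sketch below is the generic textbook form, not Sensory's implementation: the feature vectors, the Euclidean frame distance, and the names `dtw_distance` and `recognize` are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time-warping cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])  # Euclidean (assumed metric)
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # stretch b: another frame of a maps to b[j-1]
                cost[i][j - 1],      # stretch a: another frame of b maps to a[i-1]
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the trained word whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical one-dimensional "features" standing in for real speech frames.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (5.0,)],
}
slow_yes = [(0.0,), (0.0,), (1.0,), (2.0,)]  # same word, first frame held longer
print(recognize(slow_yes, templates))  # yes
```

Because each trained word needs its own stored template, memory use grows with the vocabulary, which is why larger speaker-dependent vocabularies call for off-chip storage.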

Speaker-independent recognition requires no training and leverages the word libraries developed by Sensory, or customer-developed libraries incorporated into the on-chip or external ROM. The RSC-4x recognizes up to 16 words in an active set. (An active set is a limited list of words that the chip will recognize. In turn, each word in an active list can open another active list, and so on.)

Only the amount of internal or external memory limits the number of active sets. By using cascaded active lists, up to about 1000 words can be stored in the on-chip memories. But off-chip memory can handle an almost unlimited number of words.
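
One way to picture cascaded active lists is as a table in which each recognized word either ends the dialog or names the next list to open. The Python sketch below is only an illustration: the vocabulary, the `MENU` layout, and the helper names are hypothetical and do not represent Sensory's data format. Only the 16-word limit comes from the text above.

```python
MAX_WORDS_PER_SET = 16  # speaker-independent limit quoted in the article

# Hypothetical vocabulary: each word maps to the next active set, or None to finish.
MENU = {
    "top": {"lights": "lights", "radio": "radio", "stop": None},
    "lights": {"on": None, "off": None, "dim": None},
    "radio": {"louder": None, "softer": None, "next": None},
}


def check_sets(menu):
    """Raise ValueError if a set is too large or points at a set that does not exist."""
    for name, words in menu.items():
        if len(words) > MAX_WORDS_PER_SET:
            raise ValueError(f"active set {name!r} has more than {MAX_WORDS_PER_SET} words")
        for word, next_set in words.items():
            if next_set is not None and next_set not in menu:
                raise ValueError(f"{word!r} opens unknown set {next_set!r}")


def total_words(menu):
    """Total vocabulary size across all cascaded sets."""
    return sum(len(words) for words in menu.values())


def follow(menu, start, spoken):
    """Walk the cascade; return the set left open, or None if the dialog ended."""
    current = start
    for word in spoken:
        if current is None:
            raise ValueError("no active set is open")
        if word not in menu[current]:
            raise KeyError(word)  # word is not in the currently active set
        current = menu[current][word]
    return current


check_sets(MENU)
print(total_words(MENU))                 # 9
print(follow(MENU, "top", ["lights"]))   # lights
```

At 16 words per set, a vocabulary of about 1000 words would take roughly 63 such sets.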

Speaker-dependent recognition lets the user create custom vocabularies. Up to 100 words can be recognized in an active set, but the on-chip RAM can hold only six of them. Off-chip memory must be used for larger vocabularies. In the continuous-listening mode, the chip listens continuously for a specific word. With this feature, a product can be used in a normal environment and "activates" only when a specific word, framed by silence, is spoken.

Additionally, the chips have a word-spotting feature that allows them to continuously listen for up to five speaker-independent or five speaker-dependent words at a time. In the word-spotting mode, the word doesn't require framing by silence.
