Algorithm, Simple Circuit Add Natural Voice To A Design
Working on an embedded project? An 8-bit microcontroller with a pulse-width modulation (PWM) peripheral offers a low-cost, easy way to add natural voice to your next embedded endeavor.
We recently implemented this technique in SuitSat-1 (see "Latest Radio Amateur Satellite Is No Empty Suit," March 16, 2006, p. 25). Listeners could hear recordings of schoolchildren saying "greetings from space," as well as telemetry readings (time, temperature, and battery voltage).
One way to encode speech is through a technique called adaptive differential pulse code modulation (ADPCM), which digitizes analog signals. ADPCM takes advantage of the high correlation between consecutive speech samples and encodes the difference between a predicted sample and the actual speech sample. It also provides efficient compression with good-quality speech playback.
An algorithm developed by the Interactive Multimedia Association (IMA) significantly reduces the conversion's mathematical complexity by simplifying many of the operations and using table lookups where appropriate. Consequently, it's a good choice for 8-bit microcontrollers.
A custom-written PC conversion program converts standard audio files to the ADPCM format used by the system. In this case, the source data is 16-bit signed integer samples at an 8-kHz sample rate. The encoded data is stored in the flash file system.
The figure shows the hardware for this system. The microcontroller reads the voice file from memory, decodes it, and plays the result through the PWM module. The output of the PWM module is low-pass filtered with a 4000-Hz cutoff. The resulting analog signal can be amplified and played through a speaker.
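As a rough illustration of the last step before the filter, each decoded sample has to be mapped onto the PWM duty-cycle value. The sketch below assumes signed 16-bit decoder output and a PWM whose resolution is passed in as a parameter. The name pwm_duty_from_sample is hypothetical and does not come from the article's code, and the actual register write is device-specific.

```c
#include <stdint.h>

/* Hypothetical helper: map a signed 16-bit sample onto an unsigned duty
 * value for a PWM with pwm_bits of resolution (assumed to be 1..16). */
uint16_t pwm_duty_from_sample(int16_t sample, uint8_t pwm_bits)
{
    /* Shift the signed range -32768..32767 up to 0..65535. */
    uint16_t offset = (uint16_t)((int32_t)sample + 32768);

    /* Keep only the most significant bits the PWM can resolve. */
    return (uint16_t)(offset >> (16 - pwm_bits));
}
```

With a 10-bit PWM, for instance, a sample of zero lands at mid-scale, so silence corresponds to a 50% duty cycle.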
To make playback interactive, the voice snippets are separated into individual, addressable files. For example, to speak a numeric value for temperature, the numbers one through nine, 10 through 19, 20, 30, 40, 50, 60, 70, 80, and 90 are recorded in separate files. So when the temperature is 21°, the voice will speak two files one after the other: "twenty" followed by "one." A simple file system is used to store and retrieve the individual ADPCM voice files.
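The file selection described above might be sketched as follows. This is only an illustration: the function name speech_files_for_number and the convention that each file's ID equals the number it speaks are assumptions, since the article does not describe how files are identified. Because no recording for zero is listed, the sketch returns no files for that value.

```c
#include <stdint.h>

/* Assumption: each recording's file ID equals the number it speaks
 * (1..19, then 20, 30, ... 90). The article does not specify IDs.
 * Returns how many files to play, in order, for a value of 1..99. */
int speech_files_for_number(unsigned value, uint8_t files[2])
{
    if (value == 0 || value > 99) {
        return 0;                      /* no recording listed for these */
    }
    if (value < 20) {
        files[0] = (uint8_t)value;     /* "one" .. "nineteen": one file */
        return 1;
    }

    int count = 0;
    files[count++] = (uint8_t)(value - value % 10);   /* tens word */
    if (value % 10 != 0) {
        files[count++] = (uint8_t)(value % 10);       /* ones word */
    }
    return count;
}
```

For 21, the function yields the "twenty" file and then the "one" file. For 40 or 15 it yields a single file.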
The amount of memory needed to store the voice files depends on the number of bits per sample, the sample rate, and the duration of speech stored. For toll-quality sound, the number of bits is 16 at a rate of 8000 samples per second. (This equates to a 4000-Hz bandwidth.) Thus, the size of one second of uncompressed voice is 16,000 bytes.
Once the voice file is encoded with the IMA ADPCM algorithm, the file compresses to one-quarter its original size, or 4,000 bytes per second of voice. Therefore, a 1-Mbit (128-kbyte) serial flash memory can hold approximately 32 seconds of voice. Depending on the amount of voice needed for a project, it can be stored in the program memory of the microcontroller or in an external serial flash memory.
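The storage arithmetic can be checked with a few lines of C. The helper names below are hypothetical, and the sketch assumes that one-quarter compression of 16-bit samples means 4 bits stored per sample and that a 128-kbyte memory holds 131,072 bytes.

```c
#include <stdint.h>

/* Bytes needed to hold one second of audio at the given format. */
uint32_t voice_bytes_per_second(uint32_t sample_rate_hz,
                                uint32_t bits_per_sample)
{
    return sample_rate_hz * bits_per_sample / 8u;
}

/* Whole seconds of audio that fit in memory_bytes (0 if the format is
 * degenerate and would need less than one byte per second). */
uint32_t voice_seconds_in_memory(uint32_t memory_bytes,
                                 uint32_t sample_rate_hz,
                                 uint32_t bits_per_sample)
{
    uint32_t rate = voice_bytes_per_second(sample_rate_hz, bits_per_sample);
    return (rate == 0u) ? 0u : memory_bytes / rate;
}
```

At 8000 samples per second, 16-bit audio needs 16,000 bytes per second and 4-bit ADPCM needs 4,000, so 131,072 bytes hold 32 full seconds of compressed voice versus 8 seconds uncompressed.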
The adpcm.c source file, which contains the definition of the ADPCMDecoder function, is available at ED Online 12167 at www.electronicdesign.com. The adpcm.h code file is available online as well.
The ADPCM transformation uses a table lookup to combine the prior state with the next data item in the ADPCM data stream and return the 16-bit result that's used to drive the PWM. It also updates the state structure so that the next iteration will be based on this result. The process can be done quickly, leaving plenty of headroom for the rest of the application. A more detailed explanation of the use of ADPCM can be found at www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1824&appnote=en011118.
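For readers who want to see the shape of such a decoder, here is a minimal sketch of one decoding step using the standard IMA ADPCM step-size and index tables. It is not the article's adpcm.c, whose ADPCMDecoder signature is not shown here. The names ima_state_t and ima_decode_nibble are hypothetical, and the sketch assumes 4-bit codes and signed 16-bit output.

```c
#include <stdint.h>

/* Standard IMA ADPCM step-size table (89 entries). */
static const int16_t ima_step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Index adjustment, selected by the three magnitude bits of the code. */
static const int8_t ima_index_table[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

/* Hypothetical decoder state carried from one sample to the next. */
typedef struct {
    int16_t predicted;   /* last decoded sample */
    int8_t  index;       /* current position in ima_step_table */
} ima_state_t;

/* Decode one 4-bit code, update the state, return the 16-bit sample. */
int16_t ima_decode_nibble(ima_state_t *state, uint8_t code)
{
    int32_t step = ima_step_table[state->index];

    /* Rebuild the difference from the magnitude bits with shifts only. */
    int32_t diff = step >> 3;
    if (code & 4u) diff += step;
    if (code & 2u) diff += step >> 1;
    if (code & 1u) diff += step >> 2;

    /* Bit 3 is the sign: subtract or add the difference. */
    int32_t sample = state->predicted;
    sample = (code & 8u) ? sample - diff : sample + diff;

    /* Clamp to the signed 16-bit range. */
    if (sample > 32767) sample = 32767;
    if (sample < -32768) sample = -32768;
    state->predicted = (int16_t)sample;

    /* Adapt the step size for the next code, staying inside the table. */
    int index = state->index + ima_index_table[code & 7u];
    if (index < 0) index = 0;
    if (index > 88) index = 88;
    state->index = (int8_t)index;

    return state->predicted;
}
```

The step uses only table lookups, shifts, and adds, with no multiplication or division, which is what makes it a good fit for an 8-bit microcontroller. A caller would zero the state at the start of each voice file and feed it one 4-bit code per output sample.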
With a little effort in recording voices, encoding them in ADPCM format, and storing them in memory, an embedded project can indeed have a natural voice. But it doesn't stop there. Since voice files are merely recordings, various chimes, tones, and buzzing sounds can be introduced as well. The only limit is your imagination. Now, go ahead and enhance the user experience of your next project.