Amazon’s Alexa is a cloud-based service with voice recognition and natural-language processing that can interact with users by voice. It powers Amazon’s Echo devices, which compete with other voice-activated systems like Google Home and Apple HomePod. Amazon has done a great job currying favor with hardware vendors, which now deliver development kits for building your own Echo-style device or for incorporating Alexa into products ranging from smart washing machines and refrigerators to cars and other environments.
I recently tried out Cirrus Logic’s Alexa Voice Capture Development Kit for Amazon AVS. It comes with a Raspberry Pi in addition to a Cirrus Logic DSP that handles noise cancellation, voice recognition, and voice processing. It’s designed as a low-cost solution with two microphones.
This time I’m taking a look at XMOS’s VocalFusion 4-Mic Dev Kit for Amazon AVS (Fig. 1). Although it’s also designed to work with a Raspberry Pi 3, the Pi isn’t included in the $499 price. The higher cost reflects the four-microphone array and the matching DSP processing support.
1. XMOS’s VocalFusion 4-Mic Dev Kit for Amazon AVS is designed to work with a Raspberry Pi (not shown), but it can easily interface with other devices.
The DSP board has an XMOS VocalFusion XVF3000 that implements acoustic echo cancellation (AEC), beamforming, dereverberation, noise suppression, and gain control. It can support linear arrays, like the one provided, to deliver 180-deg. coverage. The board also supports circular arrays for 360-deg. coverage. Both provide far-field, hands-free voice control that can be used with the Alexa cloud-based support or host-based voice recognition and processing.
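To make “beamforming with a linear array” concrete, here’s a minimal delay-and-sum sketch in Python. It’s the generic textbook technique, not XMOS’s algorithm (the XVF3000 firmware is far more sophisticated). The 35-mm mic spacing and 16-kHz sample rate are assumptions chosen for illustration, and delays are rounded to whole samples for simplicity.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at ~20 degrees C


def arrival_offsets(num_mics, spacing_m, angle_deg, sample_rate_hz):
    """Arrival time of a far-field plane wave at each mic of a uniform linear
    array, relative to the earliest mic, rounded to whole samples.

    angle_deg is measured from broadside (0 = straight ahead); positive angles
    place the source toward mic 0, so higher-numbered mics hear it later.
    """
    theta = math.radians(angle_deg)
    secs = [m * spacing_m * math.sin(theta) / SPEED_OF_SOUND_M_S
            for m in range(num_mics)]
    samples = [round(s * sample_rate_hz) for s in secs]
    earliest = min(samples)
    return [s - earliest for s in samples]


def delay_and_sum(channels, offsets):
    """Align each channel by skipping its arrival offset, then average.

    Sound from the steered direction adds coherently; sound from other
    directions stays misaligned and is partially cancelled.
    """
    length = min(len(ch) - k for ch, k in zip(channels, offsets))
    return [sum(ch[n + k] for ch, k in zip(channels, offsets)) / len(channels)
            for n in range(length)]


if __name__ == "__main__":
    FS = 16_000       # assumed sample rate (Hz)
    SPACING = 0.035   # assumed mic spacing (m)
    # Simulate a single click arriving from +40 degrees at a 4-mic array
    true_offsets = arrival_offsets(4, SPACING, 40.0, FS)
    mics = [[0.0] * 64 for _ in range(4)]
    for ch, k in zip(mics, true_offsets):
        ch[10 + k] = 1.0
    on_target = max(delay_and_sum(mics, arrival_offsets(4, SPACING, 40.0, FS)))
    off_target = max(delay_and_sum(mics, arrival_offsets(4, SPACING, -40.0, FS)))
    print(on_target, off_target)  # prints: 1.0 0.25
```

Steering toward the talker sums the four copies coherently, while steering the wrong way spreads them out, which is the basic reason an array improves far-field pickup.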
The XVF3000 is based on XMOS’s xCORE platform and includes 2 MB of flash memory. The platform has sixteen 32-bit logical cores with hardware scheduling support. A core can block while waiting for an external event, which delivers very fast response times and makes it practical to implement peripherals in software (“soft” peripherals). XMOS’s xC is an extended version of C whose compiler handles the extensions for scheduling, parallel processing, and other functionality provided by the platform. The VocalFusion firmware can hide this functionality from developers.
Like most Alexa development platforms, the XMOS system delivers processed audio from the microphone array to the host via I2S (Fig. 2). Audio output is also carried over I2S, while the I2C interface handles command and control. This approach allows the Raspberry Pi to be replaced by other hosts.
2. The VocalFusion XVF3000 does the heavy lifting for voice input and output, while the Raspberry Pi runs the Alexa AVS client.
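On a Linux host, I2S audio typically shows up as an ALSA capture device that delivers interleaved PCM frames (a left sample, then a right sample). The sketch below splits such a buffer into its two channels. It assumes 16-bit little-endian stereo (S16_LE); the actual sample format, and which channel carries the processed beam, depend on the XMOS firmware configuration and are assumptions here.

```python
import struct


def split_stereo_s16le(raw):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right).

    Any trailing bytes that don't form a complete frame are ignored.
    """
    frames = len(raw) // 4  # 2 channels x 2 bytes per sample
    samples = struct.unpack("<%dh" % (frames * 2), raw[: frames * 4])
    return list(samples[0::2]), list(samples[1::2])


if __name__ == "__main__":
    # Two frames: (L=100, R=-1) and (L=200, R=-2)
    buf = struct.pack("<4h", 100, -1, 200, -2)
    print(split_stereo_s16le(buf))  # prints: ([100, 200], [-1, -2])
```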
Connecting the Pi
Mating the XMOS DSP board to the Raspberry Pi 3 is accomplished with a cable that links the two headers on the DSP board to the single header on the Pi. The Plexiglass stand holds the microphone array, the DSP board, and the Raspberry Pi. The 2-A USB power supply is external and not included in the kit; likewise, the user must provide a powered speaker.
In addition to serving as the power input, the USB interface can double as a USB audio device. The kit also comes with an xTAG debugger for the XVF3000, which is used to download new firmware or to develop software for the XVF3000 itself. Most developers are likely to treat the XVF3000 as a black box, though, since the voice system’s parameters can be tweaked via the I2C interface.
Download and Configure
The first step is to connect the Raspberry Pi. The next is to download NOOBS, the Raspberry Pi’s Linux installer, and put it on a microSD card that plugs into the Pi. This is on par with the Cirrus Logic kit, as are the next few steps, which include creating a new security profile on the Amazon Developer Portal. I already had an account set up for the Cirrus Logic platform, so it was just a matter of creating a new entry and recording the keys for installation on the microSD card. All of this is spelled out nicely in an XMOS Getting Started document.
XMOS provides shell scripts to download, install, and configure the Alexa Voice Service (AVS) client software, including the security keys generated earlier. Most of the Raspberry Pi configuration is left up to the user, whereas the Cirrus Logic platform provides a web interface for configuration.
The end result after installing and configuring the software is functionally the same as an Amazon Echo with a directional microphone array. The system can be up and running in an afternoon with most of the time spent reading the guide and downloading the software.
Moving past this point is where most documentation falls down, including XMOS’s. Cirrus Logic’s web interface makes tweaking some parameters easier, and it provides graphical feedback for the audio processing. On both platforms, it takes some digging to find out how to tweak things programmatically. The XMOS configuration options appear to be extensive, although you will have to determine whether they’re sufficient for your needs. Most developers are likely to do minimal tweaking, because the system’s default performance is quite good.
Software from Sensory handles recognition of the “Alexa” wake word. Working with Sensory is one way to create a standalone voice-recognition system, but it’s also a major investment and not one to enter into lightly.
A Raspberry Pi-based system can be controlled via Alexa “skills,” which are functions invoked by voice commands through AVS. Skills require a link to the cloud: the Alexa service essentially communicates with a skill service, which could run on the Raspberry Pi. Of course, skills can also initiate actions on any networked device that has AVS support and is appropriately configured.
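For readers who haven’t written one, a custom skill’s back end is simply a service that receives a JSON request from the Alexa cloud and returns a JSON response. Here’s a minimal Python sketch written against the Alexa Skills Kit JSON request/response format directly, without an SDK. The intent name `TurnOnLightIntent` and the `turn_on_light()` helper are hypothetical placeholders, and the `lambda_handler(event, context)` signature assumes hosting on AWS Lambda. A self-hosted HTTPS endpoint (such as one on the Pi) would also need the request-signature verification Amazon requires, which this sketch omits.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the Alexa Skills Kit response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def turn_on_light():
    """Hypothetical local action, e.g., driving a GPIO pin on the Pi."""
    return True


def lambda_handler(event, context):
    """Dispatch on the request type and intent name sent by the Alexa cloud."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_response("What would you like me to do?", end_session=False)
    if request["type"] == "IntentRequest":
        if request["intent"]["name"] == "TurnOnLightIntent":  # hypothetical intent
            turn_on_light()
            return build_response("The light is on.")
        return build_response("Sorry, I don't know that one.")
    # SessionEndedRequest: the reply may not contain speech
    return {"version": "1.0", "response": {}}


if __name__ == "__main__":
    sample = {"request": {"type": "IntentRequest",
                          "intent": {"name": "TurnOnLightIntent"}}}
    print(lambda_handler(sample, None)["response"]["outputSpeech"]["text"])
```

The voice interaction model (which utterances map to `TurnOnLightIntent`) is defined separately in the Alexa developer console; the handler only sees the resolved intent name.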
Overall, the XMOS VocalFusion 4-Mic Dev Kit for Amazon AVS is a solid platform. The key is its advanced voice-processing system running on the XVF3000 multicore processor.