Control your Raspberry Pi via Voice-Control

With a few lines of code you can turn your Raspberry Pi into a pleasant chat partner that carries out your spoken commands and even answers you via Google Translate. The conversion from speech to text, and back again, is performed by Google services. This works astonishingly well: word recognition achieves good hit rates without any training, and the speech output is clear and easy to understand.

For voice recognition you record a sound file, convert it to FLAC format and send it to Google. The Google servers analyse the FLAC file and send back the text of the word(s) most likely spoken. Simple string comparisons are then enough to link different words to different actions.

Because the Raspberry Pi has no audio input, you have to improvise. Many webcams have a built-in USB microphone (for example the PlayStation Eye), or you can simply buy a PlayStation 3 wireless SingStar microphone set. The rPi automatically detects the USB receiver as a sound card. The SingStar USB interface provides a stereo signal: the red microphone supplies the right channel, the blue one the left channel. Because the Google servers only interpret the left channel, you must use the blue microphone.


After connecting the microphone you can find its hardware address by typing arecord -l at the command prompt. If the SingStar device is the only device connected to your Raspberry Pi, the hardware address should be hw:1,0. You may have to raise the microphone level for recognition to work properly. Start the tool alsamixer, press F6 to select the right sound card and F4 to show the capture controls. Pressing cursor up increases the input sensitivity.
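
If you would rather not read the address off by eye, the lookup can be scripted. The following is only a sketch: card_addresses is a made-up helper name, and it assumes English-language arecord -l output in which each device line has the form "card N: ..., device M: ...".

```shell
#!/bin/sh
# Turn device lines such as
#   card 1: CAMERA [USB 2.0 CAMERA], device 0: USB Audio [USB Audio]
# into ALSA hardware addresses such as hw:1,0.
# card_addresses is a hypothetical helper; it reads arecord -l output on stdin
# and ignores every line that does not describe a device.
card_addresses() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Typical use on the Pi (needs the alsa-utils package):
#   arecord -l | card_addresses
```

If more than one capture device is attached, the function prints one address per line and you still have to pick the right one.
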
The most elegant way to start recording a command is one that needs no key press or magic word. This can be done with the small tool Sound eXchange (SoX). SoX continuously monitors the sound level and starts recording automatically when a specified level is reached. During silence it stops recording.

sox -t alsa hw:1,0 test.wav silence 1 0 0.5% -1 1.0 1% &
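
The six numbers after silence are easier to adjust once they have names. The sketch below rebuilds the command above from variables; the variable names and the sox_command helper are invented for illustration, and the comments reflect my reading of the silence effect in the SoX manual, so check them against your SoX version.

```shell
#!/bin/sh
# Hypothetical variable names; the values are the ones from the command above.
MIC="hw:1,0"             # ALSA capture device
OUT="test.wav"           # file the recording is written to
START_PERIODS=1          # begin after one stretch of sound ...
START_DURATION=0         # ... of any length ...
START_THRESHOLD="0.5%"   # ... that is louder than 0.5% of full scale
STOP_PERIODS=-1          # negative: skip later pauses and wait for sound again
STOP_DURATION=1.0        # a pause counts as silence after 1.0 seconds ...
STOP_THRESHOLD="1%"      # ... below 1% of full scale

# Print the sox command line assembled from the settings above.
sox_command() {
    echo "sox -t alsa $MIC $OUT silence" \
         "$START_PERIODS $START_DURATION $START_THRESHOLD" \
         "$STOP_PERIODS $STOP_DURATION $STOP_THRESHOLD"
}

sox_command
# On the Pi the printed command can be started in the background with:
#   $(sox_command) &
```

If recording starts on background noise, raising START_THRESHOLD is the first value to try.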

By the way, SOX can be installed by typing:
sudo apt-get install sox mplayer ffmpeg
into the command prompt.

If you look at the shell script (download here GERMAN VERSION) you will see that it starts SoX and monitors the size of the recorded file. When the file stops growing, the script ends the recording, converts it to FLAC format and sends the file as an HTTP POST request via wget, with a few essential parameters, to the Google server:
wget -q -U "Mozilla/5.0" --post-file file.flac \
--header "Content-Type: audio/x-flac; rate=16000" \
-O - "" |
cut -d\" -f12 >stt.txt
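
Why field 12? The cut call splits the server reply at every double quote and keeps the twelfth piece. The sketch below shows this on a sample reply. The JSON shape used here is an assumption (the article does not show a real reply), and extract_text is a made-up name.

```shell
#!/bin/sh
# Hypothetical sample reply; the real one comes from the server and may differ.
reply='{"status":0,"id":"abc","hypotheses":[{"utterance":"light on","confidence":0.9}]}'

# Split the input at double quotes and keep the 12th piece, as the script does.
# In a reply of the assumed shape that piece is the recognized text.
extract_text() {
    cut -d\" -f12
}

printf '%s\n' "$reply" | extract_text   # prints: light on
```

Counting fields this way is fragile: if the reply format changes, a different field number (or a proper JSON parser) is needed.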

Please note that the script tries to recognize German words; for English you will have to change the URL to the one above.
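
The "wait until the file stops growing" step can be sketched roughly as follows. This is not the code from the downloadable script: wait_until_stable is a made-up name, and the one-second default polling interval is an assumption.

```shell
#!/bin/sh
# Return 0 once FILE has kept the same size for one polling interval.
# Return 1 immediately if FILE does not exist.
# wait_until_stable is a hypothetical helper, not part of the original script.
wait_until_stable() {
    file=$1
    interval=${2:-1}      # seconds between two size checks (assumed default)
    last=-1
    [ -f "$file" ] || return 1
    while :; do
        # wc -c prints the size in bytes; tr strips padding some systems add.
        size=$(wc -c < "$file" | tr -d ' ')
        [ "$size" -eq "$last" ] && return 0
        last=$size
        sleep "$interval"
    done
}

# Possible use after starting sox in the background:
#   sox ... &  SOX_PID=$!
#   wait_until_stable test.wav && kill "$SOX_PID"
```

Note that this simple version cannot tell "nothing said yet" from "finished speaking", so a real script would also want to check that the file has grown beyond its initial size.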

The returned result is saved in a text file, and the rest is trivial. The script checks whether the recognized text contains a specified string and, if so, performs the associated action.
In our example the script looks up TV programme information and switches a light on and off via a radio-controlled power socket.
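
Such a comparison could look like the sketch below. The keywords and action names are placeholders chosen for illustration, not the ones from the downloadable script, and choose_action is a made-up helper.

```shell
#!/bin/sh
# Map recognized text to an action name.
# Keywords and action names are hypothetical placeholders.
choose_action() {
    # Lower-case the text so "Light ON" and "light on" are treated alike.
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$text" in
        *"light on"*)  echo "switch_light_on" ;;
        *"light off"*) echo "switch_light_off" ;;
        *"tv"*)        echo "show_tv_program" ;;
        *)             echo "unknown" ;;
    esac
}

choose_action "please turn the light on"   # prints: switch_light_on
```

In the real script the argument would be the content of the result file, for example choose_action "$(cat stt.txt)", and each branch would call the actual command instead of printing a name.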

Until now the conversation has been rather one-sided. The following function gives the Raspberry Pi a voice with mplayer:
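function say {
# The speech URL that receives the text to speak ($1) follows the header string; it is not shown here.
mplayer -ao alsa:device=hw=0.0 -really-quiet \
-http-header-fields "User-Agent:Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22m"
}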

mplayer plays the sound through the audio output (hw=0.0). To suppress the error message “no socket…” you have to add the line
to mplayer.conf.

With the command say “something” the Raspberry Pi says “something”; any words or sentences are allowed. The text is spoken in the gentle female voice of the Google server.

A complete Raspbian image with all required packages and scripts pre-installed is ready to be downloaded here.

By the way, with small changes to the script you can also control your other PCs using their built-in or external microphones.
