Objectives

Voice Control of Media Front Ends and Automation Hub.
Resilient to noise interference
Low bandwidth
Low latency

Ideas

Simple Offline Control

Pocketsphinx with very limited vocabulary
Every command is keyword triggered
Quick timeouts
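
A minimal sketch of the keyword-triggered idea, assuming the recogniser hands back plain-text hypotheses. The trigger word, the command vocabulary, the timeout value and the `next_hypothesis` callable are all hypothetical placeholders, not taken from a working setup.

```python
import time

TRIGGER = "kodi"  # hypothetical trigger keyword
VOCABULARY = {"play", "pause", "stop", "next", "previous"}  # hypothetical commands
TIMEOUT_S = 3.0  # hypothetical "quick timeout"


def parse_command(hypothesis, vocabulary=VOCABULARY, trigger=TRIGGER):
    """Return the command word that directly follows the trigger, or None."""
    words = hypothesis.lower().split()
    if trigger not in words:
        return None
    idx = words.index(trigger)
    if idx + 1 < len(words) and words[idx + 1] in vocabulary:
        return words[idx + 1]
    return None


def listen_for_command(next_hypothesis, timeout_s=TIMEOUT_S, clock=time.monotonic):
    """Poll next_hypothesis() until a command is heard or the timeout expires."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        text = next_hypothesis()
        if text:
            command = parse_command(text)
            if command is not None:
                return command
    return None
```

Requiring the trigger word directly before every command keeps the vocabulary small and makes false positives from background noise less likely, at the cost of more rigid phrasing.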

Hybrid Pocketsphinx Google API

Recognise trigger using pocketsphinx
Acknowledge with beeps
- Need to manage mixer controls
Pass commands to online STT engine (http://wit.ai)
Process and control Kodi and openHAB
Fall-back to Simple Offline Control
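
The flow above can be sketched roughly as follows. This is only an outline of the control flow once the trigger has been recognised; every callable passed in (`record`, `online_stt`, `offline_stt`, `beep`, `dispatch`) is a hypothetical stand-in, and treating a network error as the fall-back condition is an assumption.

```python
def handle_utterance(record, online_stt, offline_stt, beep, dispatch):
    """Run one command cycle after the trigger word has been recognised.

    All callables are hypothetical stand-ins:
      record()           -> captured audio for the command
      online_stt(audio)  -> text; assumed to raise OSError on network failure
      offline_stt(audio) -> text from the limited-vocabulary recogniser
      beep()             -> audible acknowledgement
      dispatch(text)     -> forwards the command to Kodi or openHAB
    """
    beep()  # acknowledge the trigger before recording the command
    audio = record()
    try:
        text = online_stt(audio)
        source = "online"
    except OSError:
        # Network trouble: fall back to Simple Offline Control
        text = offline_stt(audio)
        source = "offline"
    if text:
        dispatch(text)
    return source, text
```

Returning the source alongside the text makes it easy to log how often the fall-back path is actually taken.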

Hardware

PS3 Eye
Integrated into RaspBMC / OSMC

Software

http://blog.hekkers.net/2014/04/16/home-automation-and-voice-control/

Prerequisites

Support packages

sudo apt-get install alsa-utils python-pip python-yaml python-dateutil python-pyaudio
sudo pip install apscheduler # need newer versions, apt versions are too old

ALSA playback

sudo modprobe snd_usb_audio # USB mic, loads as card1 on RPi (after snd-bcm2835)

THIS DOES NOT WORK:

options snd-usb-audio index=0
options snd-bcm2835 index=1

Don't even bother trying to force index=1 for snd-bcm2835; it doesn't support the index parameter:

osmc@osmc:~$ /sbin/modinfo snd-bcm2835
filename:       /lib/modules/4.3.3-3-osmc/kernel/sound/arm/snd-bcm2835.ko
alias:          platform:bcm2835_alsa
license:        GPL
description:    Alsa driver for BCM2835 chip
author:         Dom Cobley
srcversion:     46AE410DEA6D239DB70D2C9
alias:          of:N*T*Cbrcm,bcm2835-audio*
depends:        snd-pcm,snd
intree:         Y
vermagic:       4.3.3-3-osmc preempt mod_unload modversions ARMv6 
parm:           force_bulk:Force use of vchiq bulk for audio (bool)

Let snd-bcm2835 be card0 and load snd-usb-audio as card1:

osmc@osmc:~$ cat /etc/modprobe.d/jasper.conf 
options snd-usb-audio index=1

Then configure defaults in .asoundrc accordingly.

osmc@osmc:~$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
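
To avoid hard-coding the card number, the `arecord -l` listing can be parsed. The sketch below works on the output shown above, pasted in as a sample string; the regular expression and the function name are illustrative assumptions, and other devices or locales may format the line differently.

```python
import re

# Matches lines such as:
# card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):")


def list_capture_devices(arecord_output):
    """Return (card, device, short_name, long_name) tuples from `arecord -l` text."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            devices.append(
                (int(match.group(1)), int(match.group(4)), match.group(2), match.group(3))
            )
    return devices


SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

card, device, short_name, long_name = list_capture_devices(SAMPLE)[0]
print("plughw:%d,%d" % (card, device))  # prints "plughw:1,0"
```

On a live system the text could come from `subprocess.check_output(["arecord", "-l"], text=True)` instead of the sample string.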

Audio configuration for PS3 Eye

The PS3 Eye is a camera with a 4-channel array mic.

Local ~/.asoundrc

## Suggested by http://julius.sourceforge.jp/forum/viewtopic.php?f=9&t=66
pcm.array {
  type hw
  card 1  # USB mic (PS3 Eye); snd-bcm2835 stays card 0
}

pcm.array_gain {
  type softvol
  slave {
    pcm "array"
  }
  control {
    name "Mic Gain"
    count 2
  }
  min_dB -10.0
  max_dB 5.0
}

pcm.cap {
  type plug
  slave {
    pcm "array_gain"
    channels 4
  }
  route_policy sum
}

pcm.!default {
    type asym

    playback.pcm {
     type plug
      slave.pcm {
        @func getenv
        vars [ ALSAPCM ]
        default "hw:0,0"
      }
    }
    capture.pcm {
        type plug
        slave.pcm "cap"
    }
}

Jasper

Project : http://jasperproject.github.io/

Passive STT : pocketsphinx

Active STT : wit.ai

TTS : Flite

Integrates STT and TTS systems. Python-based.

Configuration

~/.jasper/profile.yml

...
stt_passive_engine: sphinx
stt_engine: witai
witai-stt:
  access_token: A0VERY0LONG0ALPHA0NUMERIC0STRING
tts_engine: flite-tts
flite-tts:
  voice: slt
...
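
A quick sanity check of the profile can catch a missing token before Jasper starts. This sketch only knows about the keys shown above; the function name is made up, and real Jasper may accept other engines or key layouts.

```python
REQUIRED_KEYS = ("stt_passive_engine", "stt_engine", "tts_engine")


def check_profile(profile):
    """Return a list of problems found in a Jasper-style profile dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in profile:
            problems.append("missing key: " + key)
    # Only the wit.ai case from the excerpt above is checked here.
    if profile.get("stt_engine") == "witai":
        token = (profile.get("witai-stt") or {}).get("access_token")
        if not token:
            problems.append("witai-stt.access_token is not set")
    return problems


# Same values as the excerpt above, written as the dict a YAML loader would produce.
PROFILE = {
    "stt_passive_engine": "sphinx",
    "stt_engine": "witai",
    "witai-stt": {"access_token": "A0VERY0LONG0ALPHA0NUMERIC0STRING"},
    "tts_engine": "flite-tts",
    "flite-tts": {"voice": "slt"},
}

print(check_profile(PROFILE))  # prints []
```

With python-yaml installed, the dict can be loaded from the real file using `yaml.safe_load`.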

For split active and passive STT we need pocketsphinx and related packages.

RPi2 Installation

For RPi2 we can use packages from Debian experimental:

sudo su -c "echo 'deb http://ftp.debian.org/debian experimental main contrib non-free' > /etc/apt/sources.list.d/experimental.list"
sudo apt-get update
sudo apt-get -t experimental install cmuclmtk phonetisaurus m2m-aligner mitlm libfst-tools libfst1-plugins-base libfst-dev

RPi1 Installation

For RPi1 we can't use packages from Debian experimental, so we must build from source or install from elsewhere.

Install cognomen packages

add repo

sudo su -c "echo 'deb http://cognomen.co.uk/apt/debian jessie main' > /etc/apt/sources.list.d/cognomen.list"

import pgp key

gpg --keyserver keyserver.ubuntu.com --recv FC88E181D61C9391C4A49682CF36B219807AA92B && gpg --export --armor keymaster@cognomen.co.uk | sudo apt-key add -

update

sudo apt-get update
sudo apt-get install pocketsphinx pocketsphinx-hmm-en-hub4wsj python-pocketsphinx python-yaml phonetisaurus m2m-aligner mitlm libfst-tools libfst1-plugins-base libfst-dev cmuclmtk python-semantic

Building RPi1 dependencies from source

Trying to Cross Compile

crosstool-ng isn't needed; the prebuilt x86-32 Linaro cross compiler from raspberrypi-tools can be used.

Naïve openfst cross-compile

export PATH=~/src/raspberrypi-tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin:$PATH
./configure --host arm-linux-gnueabihf --enable-compact-fsts --enable-const-fsts --enable-far --enable-lookahead-fsts --enable-pdt
make -j 8

Cross compilation works but Debian RaspberryPi Packaging doesn't.

Build natively

apt-get source phonetisaurus m2m-aligner mitlm openfst

Then, in each unpacked source directory:

dpkg-buildpackage -us -uc -rfakeroot

Install to repo

On the system with the signing keys:

sshfs yuggoth:/ yuggoth-ssh
cd yuggoth-ssh/var/www/data/cognomen.co.uk/apt/debian
for i in *.deb
do
reprepro includedeb jessie "$i"
done

Other methods

wit.ai Standalone

Not used by jasper.

sudo apt-get install libsox2
wget https://github.com/wit-ai/witd/releases/download/v0.1/witd-armv6
chmod a+x witd-armv6
./witd-armv6

Voice Command for RPi

http://stevenhickson.blogspot.co.uk/2013/06/voice-command-v30-for-raspberry-pi.html

CMU Sphinx, PocketSphinx, KodiVC

http://cmusphinx.sourceforge.net/wiki/raspberrypi

sudo apt-get install build-essential sshfs automake libtool

RaspBMC/Kodi uses pulseaudio so use that for kodivc.

sudo apt-get install bison libpulse-dev

KodiVC: GitHub

Google Voice API

V1 API probably doesn't work any more. V2 needs at least a new API key (limited to 50 calls per day).

Old script

From http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/ :

#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12 >stt.txt

echo -n "You Said: "
cat stt.txt

rm file.flac > /dev/null 2>&1
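
The `cut -d\" -f12` step picks the twelfth quote-delimited field out of the response, which breaks as soon as the field order changes. A less brittle sketch is below, assuming the V1 response was JSON with a `hypotheses` list of `utterance` entries; that shape is inferred from the field position used in the script, and the sample response is made up for illustration.

```python
import json


def first_utterance(response_text):
    """Return the first recognised utterance, or None if there is none."""
    try:
        data = json.loads(response_text)
    except ValueError:
        return None  # empty or non-JSON response
    if not isinstance(data, dict):
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    return hypotheses[0].get("utterance")


# Hypothetical response, shaped so that `cut -d\" -f12` would also yield the text.
SAMPLE_RESPONSE = (
    '{"status":0,"id":"abc","hypotheses":'
    '[{"utterance":"hello world","confidence":0.9}]}'
)

print(first_utterance(SAMPLE_RESPONSE))  # prints "hello world"
```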
