Objectives

Voice Control of Media Front Ends and Automation Hub.
Resilient to noise interference
Low bandwidth
Low latency

Ideas

Simple Offline Control

Pocketsphinx with very limited vocabulary
Every command is keyword triggered
Quick timeouts
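
A minimal sketch of the keyword-triggered idea above. It assumes the recogniser hands back plain text plus timestamps. The trigger word, the vocabulary, the action names and the 3 second timeout are all made-up placeholders, not anything pocketsphinx or Jasper defines.

```python
KEYWORD = "jasper"        # hypothetical trigger word
TIMEOUT_S = 3.0           # assumed "quick timeout", in seconds
COMMANDS = {              # hypothetical, deliberately tiny vocabulary
    "play": "player.play",
    "pause": "player.pause",
    "stop": "player.stop",
    "lights on": "lights.on",
    "lights off": "lights.off",
}


def parse_command(utterance, started, finished, timeout_s=TIMEOUT_S):
    """Return an action name, or None if the utterance should be ignored.

    started/finished are timestamps (seconds) for when listening began
    and when the recogniser returned its text.
    """
    if finished - started > timeout_s:
        return None                      # took too long: drop it
    words = utterance.lower().split()
    if not words or words[0] != KEYWORD:
        return None                      # every command must start with the keyword
    return COMMANDS.get(" ".join(words[1:]))


action = parse_command("Jasper lights on", started=0.0, finished=1.2)
```

Anything outside the vocabulary returns None, which keeps false positives from noise low at the cost of flexibility.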

Hybrid Pocketsphinx Google API

Recognise trigger using pocketsphinx
Acknowledge with beeps
- Need to manage mixer controls
Pass commands to online STT engine (http://wit.ai)
Process and control Kodi and openHAB
Fall back to Simple Offline Control
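
The hybrid flow above could be sketched roughly as follows. Every callable passed in (listen, passive_stt, beep, active_stt, offline_stt) is a hypothetical stand-in for the real microphone, pocketsphinx, mixer/beep and online STT pieces; this is not Jasper's actual code path.

```python
def hybrid_listen(listen, passive_stt, beep, active_stt, offline_stt,
                  trigger="jasper"):
    """Run one pass of the hybrid flow; return command text or None.

    listen()            -> audio (any object)
    passive_stt(audio)  -> text from the offline trigger recogniser
    beep()              -> acknowledge the trigger
    active_stt(audio)   -> text from the online engine, may raise on network errors
    offline_stt(audio)  -> text from the limited offline vocabulary
    """
    heard = passive_stt(listen())
    if trigger not in heard.lower().split():
        return None                      # no trigger word, keep idling
    beep()                               # acknowledge before recording the command
    command_audio = listen()
    try:
        return active_stt(command_audio)
    except (IOError, OSError):
        # Network unavailable or timed out: fall back to offline control.
        return offline_stt(command_audio)
```

Passing the engines in as callables keeps the control flow testable without a microphone or network connection.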

Hardware

PS3 Eye
Integrated into RaspBMC / OSMC

Software

http://blog.hekkers.net/2014/04/16/home-automation-and-voice-control/

Prerequisites

Support packages

sudo apt-get install alsa-utils python-pip python-yaml python-dateutil python-pyaudio
sudo pip install apscheduler # need a newer version; the apt version is too old

ALSA playback

sudo modprobe snd_usb_audio # USB mic, loads as card1 on RPi (after snd-bcm2835)

THIS DOES NOT WORK:

options snd-usb-audio index=0
options snd-bcm2835 index=1

Don't bother trying to force index=1 for snd-bcm2835; the module doesn't support the index parameter:

osmc@osmc:~$ /sbin/modinfo snd-bcm2835
filename:       /lib/modules/4.3.3-3-osmc/kernel/sound/arm/snd-bcm2835.ko
alias:          platform:bcm2835_alsa
license:        GPL
description:    Alsa driver for BCM2835 chip
author:         Dom Cobley
srcversion:     46AE410DEA6D239DB70D2C9
alias:          of:N*T*Cbrcm,bcm2835-audio*
depends:        snd-pcm,snd
intree:         Y
vermagic:       4.3.3-3-osmc preempt mod_unload modversions ARMv6 
parm:           force_bulk:Force use of vchiq bulk for audio (bool)

Let snd-bcm2835 be card0 and load snd-usb-audio as card1:

osmc@osmc:~$ cat /etc/modprobe.d/jasper.conf 
options snd-usb-audio index=1

Then configure defaults in .asoundrc accordingly.

osmc@osmc:~$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
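
To pick up the card number without hard-coding it, the arecord -l output can be parsed. A small sketch, assuming the output always has the "card N: NAME [LONG NAME], device M:" shape shown above; the function name is made up for illustration.

```python
import re

# Matches lines such as:
# card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):")


def capture_devices(arecord_output):
    """Return (card, device, short_name, long_name) for each capture device."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, short_name, long_name, device = match.groups()
            devices.append((int(card), int(device), short_name, long_name))
    return devices


SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 1: CameraB409241 [USB Camera-B4.09.24.1], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

devices = capture_devices(SAMPLE)
# Build ALSA device strings in the "plughw:CARD,DEVICE" form.
alsa_names = ["plughw:%d,%d" % (card, dev) for card, dev, _, _ in devices]
```

On a live system the input would come from running arecord -l (for example via subprocess.check_output) rather than from SAMPLE.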

Audio configuration for PS3 Eye

The PS3 Eye is a camera with a 4-channel array mic.

Local ~/.asoundrc

## Suggested by http://julius.sourceforge.jp/forum/viewtopic.php?f=9&t=66
pcm.array {
  type hw
  # USB mic is card 1 here (see arecord -l above); the forum example used card 0
  card 1
}

pcm.array_gain {
  type softvol
  slave {
    pcm "array"
  }
  control {
    name "Mic Gain"
    count 2
  }
  min_dB -10.0
  max_dB 5.0
}

pcm.cap {
  type plug
  slave {
    pcm "array_gain"
    channels 4
  }
  route_policy sum
}

pcm.!default {
    type asym

    playback.pcm {
      type plug
      slave.pcm {
        @func getenv
        vars [ ALSAPCM ]
        default "hw:0,0"
      }
    }
    capture.pcm {
        type plug
        slave.pcm "cap"
    }
}

Jasper

Project : http://jasperproject.github.io/

Passive STT : pocketsphinx

Active STT : wit.ai

TTS : Flite

Integrates STT and TTS systems. Python-based.

Configuration

~/.jasper/profile.yml

...
stt_passive_engine: sphinx
stt_engine: witai
witai-stt:
  access_token: A0VERY0LONG0ALPHA0NUMERIC0STRING
tts_engine: flite-tts
flite-tts:
  voice: slt
...
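
A quick sanity check for the profile can catch a missing engine or token before starting Jasper. This is only a sketch: it works on an already-loaded dict (for instance from yaml.safe_load, assuming python-yaml is installed as above), and check_profile is a made-up helper, not part of Jasper.

```python
REQUIRED_KEYS = ("stt_passive_engine", "stt_engine", "tts_engine")


def check_profile(profile):
    """Return a list of problems found in a Jasper-style profile dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in profile:
            problems.append("missing key: %s" % key)
    # The wit.ai engine needs an access token under the witai-stt section.
    if profile.get("stt_engine") == "witai":
        token = (profile.get("witai-stt") or {}).get("access_token")
        if not token:
            problems.append("witai-stt.access_token is not set")
    return problems


profile = {
    "stt_passive_engine": "sphinx",
    "stt_engine": "witai",
    "witai-stt": {"access_token": "A0VERY0LONG0ALPHA0NUMERIC0STRING"},
    "tts_engine": "flite-tts",
    "flite-tts": {"voice": "slt"},
}
problems = check_profile(profile)
```

An empty list means the keys used in the snippet above are all present.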

For split active and passive STT we need pocketsphinx and related packages.

RPi2 Installation

For RPi2 (armv7) we can use packages from Debian experimental:

sudo su -c "echo 'deb http://ftp.debian.org/debian experimental main contrib non-free' > /etc/apt/sources.list.d/experimental.list"
sudo apt-get update
sudo apt-get -t experimental install cmuclmtk phonetisaurus m2m-aligner mitlm libfst-tools libfst1-plugins-base libfst-dev

RPi1 Installation

For RPi1 (armv6) we can't use packages from Debian experimental, so we must either build from source or install from another repository.

Install cognomen packages

# add repo
sudo su -c "echo 'deb http://cognomen.co.uk/apt/debian jessie main' > /etc/apt/sources.list.d/cognomen.list"
# import pgp key
gpg --keyserver keyserver.ubuntu.com --recv  FC88E181D61C9391C4A49682CF36B219807AA92B && gpg --export --armor keymaster@cognomen.co.uk | sudo apt-key add -
# update
sudo apt-get update
sudo apt-get install pocketsphinx pocketsphinx-hmm-en-hub4wsj python-pocketsphinx python-yaml phonetisaurus m2m-aligner mitlm libfst-tools libfst1-plugins-base libfst-dev cmuclmtk python-semantic

Building RPi1 dependencies from source

Trying to Cross Compile

crosstool-ng is not needed; the prebuilt 32-bit x86 Linaro cross compiler from raspberrypi-tools can be used instead.

Naïve openfst cross-compile

export PATH=~/src/raspberrypi-tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin:$PATH
./configure --host arm-linux-gnueabihf --enable-compact-fsts --enable-const-fsts --enable-far --enable-lookahead-fsts --enable-pdt
make -j 8

Cross-compilation works, but building the Debian packages for the Raspberry Pi this way doesn't.

Build natively

apt-get source phonetisaurus m2m-aligner mitlm openfst
# for each
dpkg-buildpackage -us -uc -rfakeroot

Install to repo

On the system with the signing keys:

sshfs yuggoth:/ yuggoth-ssh
cd yuggoth-ssh/var/www/data/cognomen.co.uk/apt/debian
for i in *.deb
do
    reprepro includedeb jessie "$i"
done

Other methods

wit.ai Standalone

Not used by jasper.

sudo apt-get install libsox2
wget https://github.com/wit-ai/witd/releases/download/v0.1/witd-armv6
chmod a+x witd-armv6
./witd-armv6

Voice Command for RPi

http://stevenhickson.blogspot.co.uk/2013/06/voice-command-v30-for-raspberry-pi.html

CMU Sphinx, PocketSphinx, KodiVC

http://cmusphinx.sourceforge.net/wiki/raspberrypi

sudo apt-get install build-essential sshfs automake libtool

RaspBMC/Kodi uses pulseaudio so use that for kodivc.

sudo apt-get install bison libpulse-dev

KodiVC: github

Google Voice API

V1 API probably doesn't work any more. V2 needs at least a new API key (limited to 50 calls per day).

Old script

From http://blog.oscarliang.net/raspberry-pi-voice-recognition-works-like-siri/ :

#!/bin/bash

echo "Recording... Press Ctrl+C to Stop."
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt

echo -n "You Said: "
cat stt.txt

rm file.flac  > /dev/null 2>&1
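
The cut -d\" -f12 step in the script above simply takes the twelfth double-quote-delimited field of the response. The sketch below mimics that in Python and shows a sturdier JSON-based alternative. The sample response is an assumption about what the old V1 API returned, chosen so that field 12 is the utterance; it is not taken from real output.

```python
import json

# Assumed shape of a V1 response (illustrative only).
SAMPLE = ('{"status":0,"id":"abc","hypotheses":'
          '[{"utterance":"hello world","confidence":0.9}]}')


def cut_field(line, delimiter='"', field=12):
    """Mimic cut with a delimiter: return the 1-based Nth field of one line."""
    if delimiter not in line:
        return line                      # cut prints undelimited lines unchanged
    parts = line.split(delimiter)
    return parts[field - 1] if len(parts) >= field else ""


def utterance(response_text):
    """Parse the response as JSON instead of counting quote characters."""
    data = json.loads(response_text)
    hypotheses = data.get("hypotheses") or []
    return hypotheses[0]["utterance"] if hypotheses else ""


text_via_cut = cut_field(SAMPLE)
text_via_json = utterance(SAMPLE)
```

The field-counting approach breaks as soon as the key order changes or extra keys appear, which is the main reason to prefer parsing the JSON.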