Python having trouble accessing usb microphone using Gstreamer to perform speech recognition with Pocketsphinx on a Raspberry Pi

So I finally got this guy working.

A couple of key things I needed to realize:

1. Even if you're using Pulseaudio on your Raspberry Pi, as long as Alsa is still installed you can still use it. (This might seem like a no-brainer to others, but I honestly didn't realize I could use both of these at the same time.) Hint via (syb0rg).

2. When it comes to sending large amounts of raw audio data (.wav format in my case) to Pocketsphinx via Gstreamer, (queues) are your friend.

After messing around with gst-launch-0.10 on the command line for a while, I came across something that actually worked:

gst-launch-0.10 alsasrc device=hw:1 ! queue ! audioconvert ! audioresample \
    ! queue ! vader name=vader auto-threshold=true \
    ! pocketsphinx lm=/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm \
      dict=/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic \
      hmm=/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
      name=listener \
    ! fakesink dump=1

So what's happening here?

  • Gstreamer is listening to device hw:1 (which is my PS3 Eye USB device). This device number might vary on your system; you can determine it by running:
pi@scarlettpi ~ $ pacmd dump
Welcome to PulseAudio! Use "help" for usage information.

....

load-module module-alsa-card device_id="0" name="platform-bcm2835_AUD0.0" card_name="alsa_card.platform-bcm2835_AUD0.0" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"
load-module module-udev-detect
load-module module-bluetooth-discover
load-module module-esound-protocol-unix
load-module module-native-protocol-unix
load-module module-gconf
load-module module-default-device-restore
load-module module-rescue-streams
load-module module-always-sink
load-module module-intended-roles
load-module module-console-kit
load-module module-systemd-login
load-module module-position-event-sounds
load-module module-role-cork
load-module module-filter-heuristics
load-module module-filter-apply
load-module module-dbus-protocol
load-module module-switch-on-port-available
load-module module-cli-protocol-unix
load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

....

The important line to notice is:

load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

That's my PlayStation 3 Eye, and it's on device_id=1. Hence hw:1.

  • The audio data coming in from the PS3 Eye gets resampled, added to a Gstreamer queue, and passed through a (vader) element before moving on to pocketsphinx. By passing the audio through the vader element with the auto-threshold=true flag on, Gstreamer can determine the background noise level, which is important if you have a lousy soundcard or a far-field microphone. This is how the pocketsphinx element knows when an utterance starts and ends.

  • Add the regular pocketsphinx arguments to the pipeline that we already determined (here).

  • Pass everything into a fakesink, since we don't need to hear anything right now; we only need pocketsphinx to listen to everything. The dump=1 flag provides extra debugging output so we can see what's being processed and whether audio is being accepted at all.

After getting that to run successfully, the new Python code looks like this:

self.pipeline = gst.parse_launch(' ! '.join([
    'alsasrc device=' + scarlett_config.gimmie('audio_input_device'),
    'queue',
    'audioconvert',
    'audioresample',
    'queue',
    'vader name=vader auto-threshold=true',
    'pocketsphinx lm=' + scarlett_config.gimmie('LM') +
        ' dict=' + scarlett_config.gimmie('DICT') +
        ' hmm=' + scarlett_config.gimmie('HMM') +
        ' name=listener',
    'fakesink dump=1']))
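
For reference, here is a minimal standalone sketch of how a pipeline like this gets wired up and started with pygst/gstreamer-0.10, using the partial_result and result signals plus the configured property that the 0.10 pocketsphinx element exposes (the same pattern as the CMU livedemo example). The callback names are just placeholders I picked, and the paths are the ones from my gst-launch line above:

import gobject
import pygst
pygst.require('0.10')
import gst

gobject.threads_init()


def on_partial_result(asr, text, uttid):
    # Fires repeatedly while vader still considers the utterance in progress.
    print('partial: %s (%s)' % (text, uttid))


def on_result(asr, text, uttid):
    # Fires once vader decides the utterance has ended.
    print('final: %s (%s)' % (text, uttid))


pipeline = gst.parse_launch(
    'alsasrc device=hw:1 ! queue ! audioconvert ! audioresample ! queue ! '
    'vader name=vader auto-threshold=true ! '
    'pocketsphinx'
    ' lm=/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm'
    ' dict=/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic'
    ' hmm=/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k'
    ' name=listener ! fakesink dump=1')

# Grab the pocketsphinx element by the name we gave it and hook up callbacks.
listener = pipeline.get_by_name('listener')
listener.connect('partial_result', on_partial_result)
listener.connect('result', on_result)
listener.set_property('configured', True)

# Start listening; the GLib main loop keeps the pipeline alive.
pipeline.set_state(gst.STATE_PLAYING)
gobject.MainLoop().run()

Once the pipeline is playing, partial hypotheses print while you're speaking and a final hypothesis prints when vader signals the end of the utterance.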

Hope this helps someone.

NOTE: Please excuse me if my Gstreamer pipeline is using excessive elements. I'm fairly new to Gstreamer, and I'm open to more efficient ways of doing this.




