GSoC Update 3 - HTML5 Speech API
From MozillaWiki
Six weeks have passed since coding started. Progress on the project is looking good, though I would have liked to have finished a lot more by now. I lost about 10 days because of examinations, but the last week has been quite productive.
Things accomplished since last time:
- Got audio recording to work on Mac. (It turns out that the issue I was running into was fixed in a newer version of PortAudio: http://www.portaudio.com/trac/ticket/88)
- Figured out how to send audio and receive results. This turned out to be easier than expected: a simple HTTP POST with the audio data gives me the recognition results.
curl -H "Content-Type: audio/x-flac; rate=16000" -F"myfile=@untitle.flac" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"
gives me the result in JSON:
{ "status":0, "id":"b4bbd509bedafc435393b59ce374447d-1", "hypotheses": [ { "utterance":"this is a audio recording", "confidence":0.7447412 } ] }
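To make the shape of that response concrete, here is a minimal Python sketch that pulls the most confident utterance out of it. The function name `best_hypothesis` is hypothetical, and treating a non-zero "status" as a failed recognition is an assumption on my part, since the only response shown here has status 0.

```python
import json


def best_hypothesis(response_text):
    """Return (utterance, confidence) for the most confident hypothesis, or None."""
    data = json.loads(response_text)
    # Assumption: any non-zero "status" means no usable recognition result.
    if data.get("status") != 0:
        return None
    hypotheses = data.get("hypotheses", [])
    if not hypotheses:
        return None
    # Pick the entry with the highest confidence; a missing confidence counts as 0.
    top = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return top["utterance"], top.get("confidence")


# The response shown above, copied verbatim.
sample = (
    '{ "status":0, "id":"b4bbd509bedafc435393b59ce374447d-1", '
    '"hypotheses": [ { "utterance":"this is a audio recording", '
    '"confidence":0.7447412 } ] }'
)
print(best_hypothesis(sample))
```

Returning None for both the error case and the empty list keeps the caller's check simple; a real client would probably want to tell the two apart.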
I'm working on getting the same request to work using XMLHttpRequest.
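As a rough illustration of what any non-curl client has to reproduce, here is a Python sketch that builds the same POST without sending it. The URL, query parameters and Content-Type header are taken from the curl command above. The helper name `build_recognize_request` is hypothetical, and putting the raw FLAC bytes directly in the request body is an assumption: the curl command uses `-F`, which wraps the file in a multipart form, so the bodies are not byte-for-byte identical.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://www.google.com/speech-api/v1/recognize"


def build_recognize_request(flac_bytes, lang="en-US", rate=16000):
    """Build (but do not send) a POST carrying FLAC audio to the endpoint."""
    # Same query parameters as in the curl command.
    query = urllib.parse.urlencode(
        {"xjerr": 1, "client": "chromium", "lang": lang}
    )
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=flac_bytes,  # assumption: raw audio bytes as the request body
        headers={"Content-Type": f"audio/x-flac; rate={rate}"},
        method="POST",
    )


# Placeholder bytes only; a real call would read a recorded .flac file.
req = build_recognize_request(b"fLaC placeholder, not real audio")
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))
```

Passing the resulting object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so the request can be inspected first.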
Lots more to be done this week:
- UI to get user permission for speech.
- Integrating endpointing, speechrecognizer and everything else.