User:Tedd/Speech Recognition
Architecture
API limitation
Access to speech recognition is limited to certified apps (at the time of writing). This is enforced with the 'Func' parameter in the WebIDL constructor declaration:
Func="SpeechRecognition::IsAuthorized"
IsAuthorized is implemented inside the SpeechRecognition class:
bool
SpeechRecognition::IsAuthorized(JSContext* aCx, JSObject* aGlobal)
{
  bool inCertifiedApp = IsInCertifiedApp(aCx, aGlobal);
  bool enableTests = Preferences::GetBool(TEST_PREFERENCE_ENABLE);
  bool enableRecognitionEnable =
    Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_ENABLE);
  bool enableRecognitionForceEnable =
    Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE);
  return (inCertifiedApp || enableRecognitionForceEnable || enableTests) &&
         enableRecognitionEnable;
}
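The return expression is easier to reason about as a truth table. The following is a minimal sketch that restates it as a pure function. The name IsAuthorizedSketch and its four boolean parameters are assumptions made for illustration: they stand in for IsInCertifiedApp and the three preference lookups, and are not part of the Gecko tree.

```cpp
#include <cassert>

// Hypothetical stand-in for SpeechRecognition::IsAuthorized. The four inputs
// replace IsInCertifiedApp() and the three Preferences::GetBool() lookups.
constexpr bool IsAuthorizedSketch(bool inCertifiedApp, bool enableTests,
                                  bool recognitionEnable, bool forceEnable) {
  return (inCertifiedApp || forceEnable || enableTests) && recognitionEnable;
}

// Walks through the interesting rows of the truth table.
inline void CheckAuthorizationGate() {
  // A certified app is allowed only while recognition is enabled.
  assert(IsAuthorizedSketch(true, false, true, false));
  assert(!IsAuthorizedSketch(true, false, false, false));
  // A non-certified caller needs the force-enable or the test preference.
  assert(!IsAuthorizedSketch(false, false, true, false));
  assert(IsAuthorizedSketch(false, false, true, true));
  assert(IsAuthorizedSketch(false, true, true, false));
  // Nothing overrides a disabled recognition preference.
  assert(!IsAuthorizedSketch(false, true, false, true));
}
```

In other words, the recognition-enable preference acts as a master switch, and the certified-app check, the force-enable preference, and the test preference are three alternative ways of passing the second condition.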
Relevant files in the Gecko source tree
pocketsphinx library code:
./media/pocketsphinx
Speech recognition WebIDL:
./dom/webidl/SpeechRecognitionResultList.webidl
./dom/webidl/SpeechRecognitionResult.webidl
./dom/webidl/SpeechRecognitionAlternative.webidl
./dom/webidl/SpeechRecognition.webidl
./dom/webidl/SpeechRecognitionEvent.webidl
./dom/webidl/SpeechRecognitionError.webidl
WebIDL implementation (C++):
gecko/dom/media/webspeech/recognition/SpeechRecognitionResult.h
gecko/dom/media/webspeech/recognition/SpeechRecognition.h
gecko/dom/media/webspeech/recognition/SpeechRecognitionResultList.h
gecko/dom/media/webspeech/recognition/SpeechRecognitionResult.cpp
gecko/dom/media/webspeech/recognition/SpeechRecognitionAlternative.cpp
gecko/dom/media/webspeech/recognition/SpeechRecognitionResultList.cpp
gecko/dom/media/webspeech/recognition/SpeechRecognition.cpp
gecko/dom/media/webspeech/recognition/SpeechRecognitionAlternative.h
Recognition service IDL:
./dom/media/webspeech/recognition/nsISpeechRecognitionService.idl
Implementation of the IDL interface (C++):
./dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp
./dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.h
Events Implementation (C++):
./dom/events/SpeechRecognitionError.h
./dom/events/SpeechRecognitionError.cpp
Association between components for speech recognition
Speech recognition functionality is exposed to JavaScript through WebIDL, which is bound to a C++ class. That class in turn uses the nsISpeechRecognitionService interface to communicate with the actual recognition service. This section illustrates how the components are associated with one another.
In JavaScript (given the right permissions) a 'SpeechRecognition' object can be created:
var speech = new SpeechRecognition();
speech.start(stream);
The invoked function is defined inside a WebIDL file (SpeechRecognition.webidl):
interface SpeechRecognition : EventTarget {
  ...
  void start(optional MediaStream stream);
  ...
}
The SpeechRecognition interface and its start method are implemented in a C++ class (SpeechRecognition::Start):
void
SpeechRecognition::Start(const Optional<NonNull<DOMMediaStream>>& aStream,
                         ErrorResult& aRv)
{
  ...
  nsresult rv;
  rv = mRecognitionService->Initialize(this);
  ...
}
mRecognitionService is an instance of the class that implements the nsISpeechRecognitionService interface.
interface nsISpeechRecognitionService : nsISupports {
  void initialize(in SpeechRecognitionWeakPtr aSpeechRecognition);
  ...
}
In the case of pocketsphinx, this class is PocketSphinxSpeechRecognitionService, declared in PocketSphinxSpeechRecognitionService.h, which implements the Initialize function as well:
NS_IMETHODIMP
PocketSphinxSpeechRecognitionService::Initialize(
  WeakPtr<SpeechRecognition> aSpeechRecognition)
{
  ...
}
This class uses the pocketsphinx library for the actual speech recognition. An example of the library in use:
rv = ps_process_raw(mPs, &mAudiovector[0], mAudiovector.Length(), FALSE, FALSE);
rv = ps_end_utt(mPs);
confidence = 0;
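The chain described in this section (a front-end object that hands a weak reference to itself to a service behind an interface) can be sketched in standard C++. This is only an illustration under stated assumptions: Recognition, RecognitionService and FakeService are hypothetical stand-ins for SpeechRecognition, nsISpeechRecognitionService and the pocketsphinx service, and std::shared_ptr / std::weak_ptr are used in place of the Gecko reference-counting and WeakPtr types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Recognition;  // hypothetical stand-in for SpeechRecognition

// Hypothetical stand-in for the nsISpeechRecognitionService interface.
class RecognitionService {
 public:
  virtual ~RecognitionService() = default;
  virtual bool Initialize(std::weak_ptr<Recognition> aRecognition) = 0;
};

// Front-end object: holds a service and passes it a weak reference to itself.
class Recognition : public std::enable_shared_from_this<Recognition> {
 public:
  explicit Recognition(std::shared_ptr<RecognitionService> aService)
      : mService(std::move(aService)) {}

  // Mirrors the shape of SpeechRecognition::Start: delegate to the service.
  bool Start() { return mService->Initialize(shared_from_this()); }

  void OnResult(const std::string& aText) { mResults.push_back(aText); }
  const std::vector<std::string>& Results() const { return mResults; }

 private:
  std::shared_ptr<RecognitionService> mService;
  std::vector<std::string> mResults;
};

// Hypothetical back end playing the role of the pocketsphinx service.
class FakeService : public RecognitionService {
 public:
  bool Initialize(std::weak_ptr<Recognition> aRecognition) override {
    mRecognition = std::move(aRecognition);
    return !mRecognition.expired();
  }

  // Reports a result only if the front-end object is still alive.
  bool Deliver(const std::string& aText) {
    if (auto recognition = mRecognition.lock()) {
      recognition->OnResult(aText);
      return true;
    }
    return false;
  }

 private:
  std::weak_ptr<Recognition> mRecognition;
};

// Usage: results reach the front end while it lives, and are dropped after.
inline void ExampleAssociation() {
  auto service = std::make_shared<FakeService>();
  auto recognition = std::make_shared<Recognition>(service);
  bool started = recognition->Start();
  assert(started);
  bool delivered = service->Deliver("hello");
  assert(delivered);
  assert(recognition->Results().size() == 1);
  recognition.reset();  // the front-end object goes away
  bool late = service->Deliver("late");
  assert(!late);  // the weak reference has expired
}
```

The weak reference is the relevant design point: the service can report results back to the object that started it without keeping that object alive.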
Library functions used in pocketsphinx service class
The following functions exported by the library are used inside the SpeechRecognition (pocketsphinx) service:
int ps_start_utt(ps_decoder_t *ps);
int ps_process_raw(ps_decoder_t *ps, int16 const *data, size_t n_samples, int no_search, int full_utt);
int ps_end_utt(ps_decoder_t *ps);
char const *ps_get_hyp_final(ps_decoder_t *ps, int32 *out_is_final);
int32 ps_get_prob(ps_decoder_t *ps);
logmath_t *ps_get_logmath(ps_decoder_t *ps);
arg_t const *ps_args(void);
ps_decoder_t *ps_init(cmd_ln_t *config);
int ps_set_jsgf_string(ps_decoder_t *ps, const char *name, const char *jsgf_string);
int ps_set_search(ps_decoder_t *ps, const char *name);
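The utterance-related functions are normally called in a fixed order: start the utterance, feed the samples, end the utterance, then read the hypothesis. The sketch below shows only that ordering and does not link against pocketsphinx. FakeDecoder and DecodeUtterance are hypothetical names introduced for illustration; each FakeDecoder method merely records which library function it represents, and the negative-status checks assume the library convention of returning a value below zero on error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical recorder standing in for ps_decoder_t. Each method only logs
// the name of the pocketsphinx function it represents.
struct FakeDecoder {
  std::vector<std::string> calls;
  std::size_t samplesSeen = 0;

  int StartUtt() {
    calls.push_back("ps_start_utt");
    return 0;
  }
  int ProcessRaw(const std::int16_t* /*data*/, std::size_t nSamples) {
    calls.push_back("ps_process_raw");
    samplesSeen += nSamples;
    return 0;
  }
  int EndUtt() {
    calls.push_back("ps_end_utt");
    return 0;
  }
  std::string Hypothesis() {
    calls.push_back("ps_get_hyp_final");
    return "placeholder hypothesis";
  }
};

// Decodes one buffered utterance: start, feed all samples, end, read result.
// Returns an empty string if any step reports a negative status.
template <typename Decoder>
std::string DecodeUtterance(Decoder& aDecoder,
                            const std::vector<std::int16_t>& aAudio) {
  if (aDecoder.StartUtt() < 0) return "";
  if (aDecoder.ProcessRaw(aAudio.data(), aAudio.size()) < 0) return "";
  if (aDecoder.EndUtt() < 0) return "";
  return aDecoder.Hypothesis();
}

// Usage: 160 silent samples pass through the four steps in order.
inline void ExampleDecode() {
  FakeDecoder decoder;
  std::vector<std::int16_t> audio(160, 0);
  std::string text = DecodeUtterance(decoder, audio);
  assert(text == "placeholder hypothesis");
  assert(decoder.calls.size() == 4);
  assert(decoder.calls.front() == "ps_start_utt");
  assert(decoder.calls.back() == "ps_get_hyp_final");
}
```

The remaining functions in the list (ps_args, ps_init, ps_set_jsgf_string, ps_set_search, ps_get_prob, ps_get_logmath) concern decoder setup, grammar selection and scoring, and are not modelled here.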