The Vivoka SDK is meant to greatly facilitate the integration of Voice Recognition, Speech Synthesis and Voice Biometrics into your application, regardless of who's providing the underlying engine! This short guide will show you how to get started quickly.
This guide is not an installation guide and assumes everything is already in place to start developing.
All VSDK engines are initialized with a JSON configuration file. We strongly recommend putting this file under a `config` directory, because some engines will generate additional configuration files.
The content of this file is discussed in separate documents as each engine will have its own configuration block.
The SDK will throw exceptions as a way to report errors and avoid having to check every single function call. The following base program is recommended:
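A minimal sketch of such a base program, using nothing but standard C++ exception handling (no VSDK specifics are assumed here):

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

int main()
{
    try {
        // Initialize and use your VSDK engines here.
        return EXIT_SUCCESS;
    }
    catch (const std::exception & e) {
        std::cerr << "Fatal error: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
}
```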
Please note that some parts of the SDK might run on other threads of execution, and exceptions can't cross thread boundaries. You must protect your threads by either catching exceptions inside them or forwarding further execution to the main thread via callbacks.
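For example, a worker thread can be protected like this (plain standard C++ again, no VSDK specifics):

```cpp
#include <exception>
#include <iostream>
#include <thread>

void worker()
{
    try {
        // VSDK processing or callbacks running on this thread.
    }
    catch (const std::exception & e) {
        // Exceptions must not escape the thread: handle them here,
        // or forward them to the main thread yourself.
        std::cerr << "Worker error: " << e.what() << std::endl;
    }
}

int main()
{
    std::thread t(worker);
    t.join();
}
```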
You can access the versions of both VSDK and the underlying engines like so:
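A sketch of what this can look like; the accessor names below are placeholders, not the actual VSDK API, so check the reference documentation for the real ones:

```cpp
// Hypothetical accessors, shown for illustration only.
std::cout << "VSDK version:   " << vsdk::version()   << '\n';
std::cout << "Engine version: " << engine->version() << '\n';
```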
Starting from VSDK 6, you have access to the Audio Pipeline feature. Put simply, you have an audio producer that will push its audio buffers to consumers. Audio modifiers can be installed in the middle to transform the audio as it goes through the pipeline.
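Conceptually, a pipeline is assembled like the sketch below. Every class and method name here is a placeholder chosen to illustrate the producer → modifier → consumer flow, not the actual VSDK types:

```cpp
// All names below are hypothetical illustrations of the pipeline concept.
auto microphone = std::make_shared<Microphone>();       // producer: pushes audio buffers
auto gain       = std::make_shared<GainModifier>(2.0f); // modifier: transforms buffers in transit
auto recognizer = std::make_shared<Recognizer>();       // consumer: receives the resulting audio

Pipeline pipeline;
pipeline.setProducer(microphone);
pipeline.pushBackModifier(gain);       // modifiers sit in the middle of the chain
pipeline.pushBackConsumer(recognizer); // consumers sit at the end
pipeline.start();
```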
You can't create two separate instances of the same engine! Attempting to create a second one will simply return another pointer to the existing engine. Terminate the first engine (e.g. by letting it go out of scope) before making a new instance.
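Creating an engine instance typically looks like the sketch below; the factory name is an assumption, so use the one from your engine's documentation:

```cpp
// Hypothetical factory call: the real creation function depends on the engine you use.
auto engine = Engine::make("config/vsdk.json");
```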
That's it! If no exception was thrown your engine is ready to be used.
Sometimes you need to access a feature specific to the engine you chose. You can access the underlying engine like so:
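A sketch, with a placeholder accessor name (consult your engine's documentation for the real one):

```cpp
// Hypothetical accessor: returns a handle to the provider's native engine object.
auto & nativeEngine = engine->native();
```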
Although this is possible, you should avoid using the native engine unless there is no other way.
Remember, the channel must be configured beforehand!
You can also activate a voice right away:
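A sketch of both steps; the method names are assumptions, and the channel and voice names must match your JSON configuration:

```cpp
// Hypothetical calls: create a channel configured in the JSON file,
// then activate a voice on it right away.
auto channel = engine->channel("my-channel");
channel->setVoice("my-voice");
```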
Destruction order is important! Channels must be destroyed before the engine dies, or you're in for a ride.
Speech Synthesis is synchronous! That means the call will block the thread until the synthesis is done or an error occurs. If you need to keep going, run the synthesis in another thread.
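Standard C++ facilities such as `std::async` work well for this; in the sketch below, `synthesizeFromText` is a placeholder for the actual VSDK synthesis call:

```cpp
#include <future>

// Run the blocking synthesis call off the current thread.
auto future = std::async(std::launch::async, [&] {
    return channel->synthesizeFromText("Hello, world!"); // placeholder call
});

// ... do other work here ...

auto result = future.get(); // rethrows any exception from the synthesis thread
```

Note that `future.get()` conveniently rethrows exceptions on the calling thread, which fits the thread-boundary warning above.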
You can load your text or SSML input from a text file too:
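If the engine exposes a dedicated file-based call, prefer it; otherwise a plain standard C++ read works. The synthesis call below is again a placeholder name:

```cpp
#include <fstream>
#include <sstream>

// Read the whole text or SSML file into a string.
std::ifstream file("prompt.ssml");
std::stringstream buffer;
buffer << file.rdbuf();

// Hand the content to the (placeholder) synthesis call.
auto result = channel->synthesizeFromText(buffer.str());
```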
`SynthesisResult` is NOT a pointer type! Avoid copying it around; prefer move operations.
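In practice that means transferring ownership with `std::move` when you need to store or pass the result around:

```cpp
#include <utility>

// Moving transfers the underlying audio buffer; copying would duplicate it.
auto stored = std::move(result);
```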
VSDK does not provide an audio player of any sort; it is up to you to choose the one that suits your needs. Once you've chosen one, using the result is very easy:
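A sketch with placeholder accessor names; the result exposes its raw PCM buffer, which you feed to whatever player you picked:

```cpp
// Hypothetical accessors on SynthesisResult.
const auto * pcmData = result.data();
const auto   pcmSize = result.size();

// `player` stands for the audio library you chose.
player.play(pcmData, pcmSize);
```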
The audio data is a 16-bit signed little-endian PCM buffer. The channel count is always 1, and the sample rate varies depending on the engine:
| Engine | Sample Rate (Hz) |
| --- | --- |
| csdk | 22050 |
| baratinoo | 24000 |
| vtapi | 16000 |
Only the raw PCM format is available, which means the file has no audio header of any sort.
You can play it by supplying the right parameters, e.g.:
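For example, with ALSA's `aplay` on Linux (assuming the csdk engine's 22050 Hz output; adjust the rate per the table above):

```sh
aplay -f S16_LE -c 1 -r 22050 output.pcm
```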
Or add a WAV header:
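A self-contained sketch that prepends a standard 44-byte WAV header to the PCM data. It assumes a little-endian host, which matches the byte order of the PCM buffer:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Write a minimal WAV file: 44-byte header followed by the raw PCM data
// (16-bit signed little-endian, mono; pick the sample rate from the table above).
void writeWav(const std::string & path, const char * pcm, std::uint32_t pcmSize,
              std::uint32_t sampleRate)
{
    const std::uint16_t channels      = 1;
    const std::uint16_t bitsPerSample = 16;
    const std::uint32_t byteRate      = sampleRate * channels * bitsPerSample / 8;
    const std::uint16_t blockAlign    = channels * bitsPerSample / 8;
    const std::uint32_t riffSize      = 36 + pcmSize;
    const std::uint32_t fmtSize       = 16;
    const std::uint16_t pcmFormat     = 1; // 1 = uncompressed PCM

    std::ofstream out(path, std::ios::binary);
    out.write("RIFF", 4);
    out.write(reinterpret_cast<const char *>(&riffSize), 4);
    out.write("WAVE", 4);
    out.write("fmt ", 4);
    out.write(reinterpret_cast<const char *>(&fmtSize), 4);
    out.write(reinterpret_cast<const char *>(&pcmFormat), 2);
    out.write(reinterpret_cast<const char *>(&channels), 2);
    out.write(reinterpret_cast<const char *>(&sampleRate), 4);
    out.write(reinterpret_cast<const char *>(&byteRate), 4);
    out.write(reinterpret_cast<const char *>(&blockAlign), 2);
    out.write(reinterpret_cast<const char *>(&bitsPerSample), 2);
    out.write("data", 4);
    out.write(reinterpret_cast<const char *>(&pcmSize), 4);
    out.write(pcm, pcmSize);
}
```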
Two providers are supported: tssv and idvoice.
First of all, you will need an instance of the engine from the provider you chose.
To use voice biometrics, you will need to create models. You can do so via the engine by providing a name for the model and its type (text-dependent or text-independent):
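A sketch with placeholder names for the method and the type enum:

```cpp
// Hypothetical API: create a model by name and type.
auto model = engine->createModel("my-model", ModelType::TextIndependent);
```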
After that, you must add records to users via the `addRecord` method:
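The guide names the method `addRecord`; everything else in this sketch (the user identifier and the audio buffer) is a placeholder:

```cpp
// Add an enrollment record for a user; `audioBuffer` holds the user's voice sample.
model->addRecord("alice", audioBuffer);
```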
Now that you have a user in your model, you can create an authenticator or an identificator to test audio against it.
The difference is that an authenticator tests only one user, which you must provide as a parameter, whereas an identificator tests the provided audio against all users in the model.
Then you need to subscribe a callback to the result event:
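A sketch of such a subscription; the subscription method, callback signature, and result accessors are assumptions:

```cpp
// Hypothetical subscription: invoked whenever the engine recognizes a user.
authenticator->subscribe([](const auto & result) {
    std::cout << "Recognized user: " << result.user() << '\n';
});
```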
Results are only sent if the engine recognized a user; if someone speaks but isn't recognized, no error will be sent.
These classes are AudioConsumers, so you can add them to the pipeline:
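Reusing the (hypothetical) pipeline sketch from the Audio Pipeline section above:

```cpp
// Authenticators and identificators consume audio straight from the pipeline.
pipeline.pushBackConsumer(authenticator);
pipeline.start();
```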