Speech recognition using Sphinx4
- This tutorial assumes that the RAPP API is installed and built
- This tutorial uses the 0.7.0 version of the C++ API
In this tutorial we are going to recognise the speech in an audio file using CMU Sphinx4, RAPP and CMake.
Set up your project with the following directory tree:
project/
    CMakeLists.txt
    source/
        speech_recognition_sphinx4.cpp
    build/
Source code
We created a project called speech_recognition_sphinx4 with a folder source, which holds our example, called speech_recognition_sphinx4.cpp.
You can see the complete example here.
First we initialize the platform information and the service controller, which is in charge of making the cloud calls to the RAPP platform:
rapp::cloud::platform info = {"rapp.ee.auth.gr", "9001", "rapp_token"};
rapp::cloud::service_controller ctrl(info);
The callback for this example receives a vector of strings containing the words that have been recognised.
auto callback = [&](std::vector<std::string> words) {
    // Print each recognised word, or a notice if nothing was recognised.
    if (!words.empty()) {
        std::cout << "Words: " << std::endl;
        for (const auto& each_word : words) {
            std::cout << each_word << std::endl;
        }
    }
    else {
        std::cout << "No words found" << std::endl;
    }
};
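The callback logic can be checked without a connection to the RAPP platform. The following is a minimal sketch under that assumption: describe_words is a hypothetical helper, not part of the RAPP API, that builds the same text the callback prints and returns it as a string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the RAPP API): builds the same report
// that the tutorial's callback prints, but returns it as a string so it
// can be inspected without calling the platform.
std::string describe_words(const std::vector<std::string>& words) {
    if (words.empty()) {
        return "No words found\n";
    }
    std::ostringstream out;
    out << "Words: \n";
    for (const auto& word : words) {
        out << word << '\n';
    }
    return out.str();
}
```

With such a helper, the body of the callback could be reduced to a single line that writes describe_words(words) to std::cout.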
The documentation of the class rapp::cloud::speech_recognition_sphinx4 shows that it needs an audio object and its type.
rapp::object::audio audio("../data/yes-no.wav");
rapp::types::audio_source audio_src = rapp::types::nao_wav_1_ch;
NOTE: audio_source is an enum.
You can find the different types of audio in rapp-api/cpp/rapp/objects/globals.hpp.
Sphinx4 can take a JSGF grammar as an optional input to improve the accuracy of the recognition. Here the grammar accepts either "Yes" or "No".
std::string jsgf = "#JSGF V1.0;\r\n\r\n";
jsgf += "grammar simpleExample; \r\n\r\n";
jsgf += "public <greet> = Yes | No;\r\n";
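When the list of accepted words changes often, the grammar string can be generated instead of written by hand. The following is a minimal sketch: make_jsgf and its parameters are hypothetical names, not part of the RAPP API or of Sphinx4, and it only covers the simple case of one public rule with a list of alternatives.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the RAPP API): builds a JSGF grammar
// with a single public rule that accepts any one of the given
// alternatives. The caller is expected to pass at least one alternative,
// otherwise the resulting rule is empty and not a valid grammar.
std::string make_jsgf(const std::string& grammar_name,
                      const std::string& rule_name,
                      const std::vector<std::string>& alternatives) {
    std::string jsgf = "#JSGF V1.0;\r\n\r\n";
    jsgf += "grammar " + grammar_name + ";\r\n\r\n";
    jsgf += "public <" + rule_name + "> = ";
    for (std::size_t i = 0; i < alternatives.size(); ++i) {
        if (i != 0) {
            jsgf += " | ";  // separator between alternatives
        }
        jsgf += alternatives[i];
    }
    jsgf += ";\r\n";
    return jsgf;
}
```

Calling make_jsgf("simpleExample", "greet", {"Yes", "No"}) produces a grammar equivalent to the hand-written one above, differing only in one space before a line break.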
Finally, we can make the call to the platform.
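ctrl.make_call<rapp::cloud::speech_recognition_sphinx4>(
    audio.bytearray(),                        // audio data
    audio_src,                                // audio type
    "en",                                     // language
    std::vector<std::string>({{jsgf}}),       // JSGF grammar (optional)
    std::vector<std::string>({"yes", "no"}),  // keywords (optional)
    std::vector<std::string>({""}),           // sentences (optional, unused here)
    callback);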
Besides the audio and the callback, you need to indicate the language, in this case English ("en").
These are the same inputs as for speech recognition using Google.
CMU Sphinx also lets us pass some parameters to improve the accuracy: a JSGF grammar, keywords, or sentences to find in the audio.
All of them are optional inputs.
CMakeLists.txt
This example assumes that you have built the RAPP API as both static and shared libraries.
We haven't added any new library for this example, so the CMakeLists.txt is the same as in the getting_started examples.
You only have to change the name of your source file and the executable.
Build
The next step is to create our build folder and run our code.
Now we are going to work in the terminal.
- Go to your project path
- Build your project
    mkdir build
    cd build
    cmake ..
    make
- If everything is ok, you will have created your executable speech_recognition_sphinx4 in the folder build.
- Run your executable
    ./speech_recognition_sphinx4
Now you can explore and make your own projects!