I'm writing a simple calculator using the Google API, and I need to save the text recognized from a streaming audio input for further processing. The API has a configuration option, single_utterance, that looks extremely useful, but I can't find a way to set it.
The Cloud Speech-to-Text documentation has a line that says:
set the single_utterance field to true in the StreamingRecognitionConfig object.
Sample code from the speech_recognition GitHub repository:
#!/usr/bin/env python3

# NOTE: this example requires PyAudio because it uses the Microphone class

from threading import Thread
try:
    from queue import Queue  # Python 3 import
except ImportError:
    from Queue import Queue  # Python 2 import

import speech_recognition as sr

r = sr.Recognizer()
audio_queue = Queue()


def recognize_worker():
    # this runs in a background thread
    while True:
        audio = audio_queue.get()  # retrieve the next audio processing job from the main thread
        if audio is None: break  # stop processing if the main thread is done

        # received audio data, now we'll recognize it using Google Speech Recognition
        try:
            # for testing purposes, we're just using the default API key
            # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
            # instead of `r.recognize_google(audio)`
            print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

        audio_queue.task_done()  # mark the audio processing job as completed in the queue


# start a new thread to recognize audio, while this thread focuses on listening
recognize_thread = Thread(target=recognize_worker)
recognize_thread.daemon = True
recognize_thread.start()
with sr.Microphone() as source:
    try:
        while True:  # repeatedly listen for phrases and put the resulting audio on the audio processing job queue
            audio_queue.put(r.listen(source))
    except KeyboardInterrupt:  # allow Ctrl + C to shut down the program
        pass

audio_queue.join()  # block until all current audio processing jobs are done
audio_queue.put(None)  # tell the recognize_thread to stop
recognize_thread.join()  # wait for the recognize_thread to actually stop
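The sample above only prints each transcript. Since the goal is to keep the recognized text for later processing, one option is to have the worker append each result to a list. The following is a minimal, stdlib-only sketch of that producer/consumer pattern. `collect_transcripts` and its `transcribe` callable are hypothetical stand-ins: in the real code, `transcribe` would be `r.recognize_google` and the items would be the `AudioData` objects returned by `r.listen(source)`. The sketch does no error handling, so a real version should keep the `except sr.UnknownValueError` / `except sr.RequestError` branches from the sample. Otherwise one bad clip stops the worker.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional


def collect_transcripts(
    audio_items: Iterable[bytes],
    transcribe: Callable[[bytes], str],  # hypothetical stand-in for r.recognize_google
) -> List[str]:
    """Transcribe each item in a background worker and return the texts in order."""
    audio_queue: "queue.Queue[Optional[bytes]]" = queue.Queue()
    transcripts: List[str] = []

    def worker() -> None:
        while True:
            audio = audio_queue.get()
            try:
                if audio is None:  # sentinel: the producer has finished
                    return
                transcripts.append(transcribe(audio))  # keep the text instead of printing it
            finally:
                audio_queue.task_done()  # mark this job done, including the sentinel

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    for item in audio_items:  # producer side: in the sample this is r.listen(source)
        audio_queue.put(item)
    audio_queue.put(None)
    audio_queue.join()  # wait until every queued job (and the sentinel) is processed
    thread.join()
    return transcripts


if __name__ == "__main__":
    print(collect_transcripts([b"two", b"plus", b"three"], lambda a: a.decode()))
    # prints ['two', 'plus', 'three']
```

A single worker reads the queue in FIFO order, so the list preserves the order in which phrases were spoken.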
The recognize_google method of the Recognizer class, from the same repository:
def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False):
    """
    Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.

    The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.

    To obtain your own API key, simply following the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API".

    The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language tags can be found in this `StackOverflow answer <http://stackoverflow.com/a/14302134>`__.

    The profanity filter level can be adjusted with ``pfilter``: 0 - No filter, 1 - Only shows the first character and replaces the rest with asterisks. The default is level 0.

    Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.

    Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
    """
    assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
    assert key is None or isinstance(key, str), "``key`` must be ``None`` or a string"
    assert isinstance(language, str), "``language`` must be a string"

    flac_data = audio_data.get_flac_data(
        convert_rate=None if audio_data.sample_rate >= 8000 else 8000,  # audio samples must be at least 8 kHz
        convert_width=2  # audio samples must be 16-bit
    )
    if key is None: key = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw"
    url = "http://www.google.com/speech-api/v2/recognize?{}".format(urlencode({
        "client": "chromium",
        "lang": language,
        "key": key,
        "pFilter": pfilter
    }))
    request = Request(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)})

    # obtain audio transcription results
    try:
        response = urlopen(request, timeout=self.operation_timeout)
    except HTTPError as e:
        raise RequestError("recognition request failed: {}".format(e.reason))
    except URLError as e:
        raise RequestError("recognition connection failed: {}".format(e.reason))
    response_text = response.read().decode("utf-8")

    # ignore any blank blocks
    actual_result = []
    for line in response_text.split("\n"):
        if not line: continue
        result = json.loads(line)["result"]
        if len(result) != 0:
            actual_result = result[0]
            break

    # return results
    if show_all: return actual_result
    if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()

    if "confidence" in actual_result["alternative"]:
        # return alternative with highest confidence score
        best_hypothesis = max(actual_result["alternative"], key=lambda alternative: alternative["confidence"])
    else:
        # when there is no confidence available, we arbitrarily choose the first hypothesis.
        best_hypothesis = actual_result["alternative"][0]
    if "transcript" not in best_hypothesis: raise UnknownValueError()
    return best_hypothesis["transcript"]
The recognize_google() method of the Recognizer class does not seem to take any argument that could set the single_utterance field to true.
Apart from GitHub, Google's documentation doesn't actually cover the speech_recognition library.
I tried changing the key so I could connect with my own API key and look at things from the console side, but I ran into problems there too.