87 lines
4.4 KiB
YAML
- release_tracks: [ALPHA, BETA]

  help_text:
    brief: Transcribe speech in a video.
    description: |
      Transcribe speech within a video.
    examples: |
      To transcribe speech in video file 'gs://my_bucket/input_file.mp4':

        $ {command} gs://my_bucket/input_file.mp4
  request:
    _COMMON_: request
    static_fields:
      features: [SPEECH_TRANSCRIPTION]
    modify_request_hooks:
    - googlecloudsdk.command_lib.ml.video.util:UpdateRequestWithInput

  async:
    _COMMON_: async

  arguments:
    params:
    - _COMMON_args
    - api_field: videoContext.speechTranscriptionConfig.enableAutomaticPunctuation
      arg_name: enable-automatic-punctuation
      help_text: |
        If enabled, adds punctuation to recognition result hypotheses. This feature is only
        available in select languages. Setting this for requests in other languages has no
        effect. By default, punctuation is not added to result hypotheses.
    - api_field: videoContext.speechTranscriptionConfig.enableSpeakerDiarization
      arg_name: enable-speaker-diarization
      help_text: |
        If enabled, performs speaker detection for each recognized word in the top alternative of
        the recognition result, using a speaker_tag provided in the WordInfo results section. When
        this is enabled, we send all the words from the beginning of the audio for the top
        alternative in every consecutive response. This is done to improve our speaker tags as our
        models learn to identify the speakers in the conversation over time.
    - api_field: videoContext.speechTranscriptionConfig.diarizationSpeakerCount
      arg_name: diarization-speaker-count
      help_text: |
        Optional estimated number of speakers in the conversation. If not specified, defaults to
        '2'. Ignored unless --enable-speaker-diarization is set.
    - api_field: videoContext.speechTranscriptionConfig.maxAlternatives
      arg_name: max-alternatives
      help_text: |
        Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of
        `SpeechRecognitionAlternative` messages within each `SpeechTranscription`. The server may
        return fewer than `max_alternatives`. Valid values are `0`-`30`. A value of `0` or `1` will
        return a maximum of one. If omitted, a maximum of one is returned.
    - api_field: videoContext.speechTranscriptionConfig.languageCode
      arg_name: language-code
      required: true
      help_text: |
        The language of the supplied audio as a
        [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt)
        language tag. Example: "en-US". See
        [Language Support](https://cloud.google.com/speech/docs/languages) for a list of the
        currently supported language codes.
    - api_field: videoContext.speechTranscriptionConfig.filterProfanity
      arg_name: filter-profanity
      help_text: |
        If enabled, the server will attempt to filter out profanities, replacing all but the
        initial character in each filtered word with asterisks, e.g. ```f***```. If disabled or
        omitted, profanities will not be filtered out.
    - api_field: videoContext.speechTranscriptionConfig.enableWordConfidence
      arg_name: enable-word-confidence
      help_text: |
        If enabled, the top result includes a list of words and the confidence for those words. If
        disabled, no word-level confidence information is returned. The default is disabled.
    - api_field: videoContext.speechTranscriptionConfig.audioTracks
      arg_name: audio-tracks
      processor: googlecloudsdk.command_lib.ml.video.util:AudioTrackProcessor
      help_text: |
        For file formats that support multiple audio tracks, such as MXF or MKV, specify up to two
        comma-separated tracks. The default track is 0.
    - api_field: videoContext.speechTranscriptionConfig.speechContexts.phrases
      arg_name: hints
      default: []
      help_text: |
        List of strings containing word and phrase "hints" so that the speech recognition is
        more likely to recognize them. This can be used to improve the accuracy for specific words
        and phrases, for example, if specific commands are typically spoken by the user. This can
        also be used to add additional words to the vocabulary of the recognizer. See
        https://cloud.google.com/speech/limits#content.
  output:
    format: yaml