WEBINAR Monitor LLMs and ML Models in Production with Label Studio 📈

Automatic Speech Recognition using Segments

Listen to an audio file and segment it, then transcribe the contents of each segment in natural language, performing speech recognition using segments.

Interactive Template Preview

Labeling Configuration

<View>
  <Labels name="labels" toName="audio">
    <Label value="Speech" />
    <Label value="Noise" />
  </Labels>
  <Audio name="audio" value="$audio"/>
  <TextArea name="transcription" toName="audio"
            rows="2" editable="true"
            perRegion="true" required="true" />
</View>

About the labeling configuration

All labeling configurations must be wrapped in View tags.

Use the Labels control tag to allow annotators to highlight portions of the audio that represent different types of noise:

<Labels name="labels" toName="audio">
    <Label value="Speech" />
    <Label value="Noise" />
</Labels>

Use the Audio object tag to display a waveform of audio that can be labeled:

<Audio name="audio" value="$audio"/>

Use the TextArea control tag to prompt annotators to provide a transcript for each segment of audio:

<TextArea name="transcription" toName="audio"
          rows="2" editable="true"
          perRegion="true" required="true" />

The editable="true" argument specifies that the transcript can be edited, and required="true" sets the transcript as a required field for the annotator. Without a transcript provided for each segment of the audio clip (set by the perRegion="true" argument), the annotation can’t be submitted.

Enhance this template

Add context to specific audio segments

If you want to prompt annotators to add context to specific audio segments, such as by selecting the accent or assumed gender of the speakers in a given audio clip, you can add the following to your labeling configuration:

<View visibleWhen="region-selected">
  <Header value="Select the assumed gender of the speaker:" />
  <Choices name="gender" toName="audio"
           perRegion="true" required="true">
    <Choice value="Man" />
    <Choice value="Woman" />
  </Choices>
</View>

The visibleWhen parameter for the View tag means that the choice is only visible when a specific audio segment is selected. The Header tag provides instructions to the annotator. The Choices tag includes the perRegion parameter to apply the selected choice only to the selected audio segment.