by Lynn H.

Describe modality from the point of view of information theory / informatics

From an information theory / informatics point of view, a modality is the use of a specific channel within our actions in order to provide information to a system.

What is machine learning?

Machine learning is a field of study in artificial intelligence that focuses on developing algorithms and models that enable computers to learn and make predictions or decisions based on data without being explicitly programmed.

Approaches for machine learning

Explain the following approaches

Supervised Learning

Semi-supervised Learning

Unsupervised Learning

Self-supervised Learning

Reinforcement Learning

Approaches for machine learning

Supervised Learning: learns from (a large amount of) labeled data

Semi-supervised Learning: learns from labeled and unlabeled data

Unsupervised Learning: learns patterns and structures from unlabeled data

Self-supervised Learning: learns from unlabeled data by predicting hidden parts

Reinforcement Learning: learns from interacting with an environment by maximizing a reward
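
As a rough illustration of the difference between supervised and unsupervised learning, the sketch below fits a nearest-centroid classifier on labeled points and then runs a two-cluster k-means on the same points without their labels. The data, the function names, and the choice of these two algorithms are assumptions made for illustration only; they are not taken from the course.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Supervised: compute one centroid per class from labeled 1-D points."""
    classes = sorted(set(labels))
    return {c: mean(p for p, l in zip(points, labels) if l == c) for c in classes}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two clusters (k-means, k=2).

    Assumes the points are not all identical, so neither cluster is empty.
    """
    lo, hi = min(points), max(points)  # initial cluster centres
    for _ in range(iterations):
        left = [p for p in points if abs(p - lo) <= abs(p - hi)]
        right = [p for p in points if abs(p - lo) > abs(p - hi)]
        lo, hi = mean(left), mean(right)
    return lo, hi

# Toy data, invented for this example.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(points, labels)  # uses the labels
print(predict(centroids, 1.1))             # -> low
print(two_means(points))                   # finds two cluster centres without labels
```

Both methods end up with similar cluster centres here, but only the supervised one can name the groups, because only it was given labels.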

Describe the classification of interaction modalities (Bernsen, 1999)

Overview of multimodal interactive systems. Classification of interaction modalities (Bernsen, 1999)

Linguistic vs. non-linguistic
Analogue vs. non-analogue
Arbitrary vs. non-arbitrary
Graphical (visually perceivable), acoustic (auditorily perceivable), haptic
Static vs. dynamic (only output modalities)

Which contexts are important in automatic speech recognition?

Speech: Language, dialect, speaking style, …
Speaker:
- dependent vs. independent vs. adaptive
- known vs. unknown
- cooperative vs. uncooperative
Target units: Type, number, complexity
Environment: Background noise, transmission channels, etc.

→ Automatic speech recognition is a very difficult task!

Why is speech recognition so difficult (for machines)?

Task complexity:

Variances and invariances in the speech signal have to be differentiated
"to recognize speech" vs. "to wreck a nice beach"
→ Contextual knowledge helps to understand
Speech is a continuous signal, not a sequence of elementary sounds (even across word boundaries)
Articulation depends on the surrounding sounds (co-articulation)

Speech Recognition

Complete the block diagram for an automatic speech recognition system:

Describe it

The feature extraction block finds the information in the speech signal that is relevant for detecting which words or sentences were actually spoken. Most importantly, the resulting features should not include information that holds no value for the detection, such as information about the speaker and the environment. This is done for speech blocks of about 20 ms.
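
The block-wise processing can be sketched as follows: a signal is cut into frames of 20 ms, and one feature value is computed per frame. The sampling rate of 16 kHz, the synthetic test tone, the non-overlapping frames, and the use of log energy as the only feature are all assumptions made to keep the example short. Practical front ends typically use overlapping windows and spectral features such as MFCCs.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # an incomplete final frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def log_energy(frame):
    """One scalar feature per frame: log of the mean squared amplitude."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(energy + 1e-12)  # small constant avoids log(0) on silence

sample_rate = 16000  # assumed sampling rate in Hz
# Synthetic 100 ms tone at 440 Hz, standing in for recorded speech.
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

frames = frame_signal(signal, sample_rate)
features = [log_energy(f) for f in frames]
print(len(frames), len(frames[0]))  # -> 5 320
```

Each entry of `features` corresponds to one 20 ms block and would be passed on to the classification step.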

In the next step, the phoneme probabilities must be estimated from these feature vectors (classification). This is done using both the acoustic model and the vocabulary. Different approaches exist for this, such as Hidden Markov Models (HMMs) or neural networks.

From these probabilities, the decoder combines consecutive phonemes into full words and finally into a full sentence. To do this, information about the grammar (language model) is used in addition to the vocabulary.
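
The role of the language model in the decoder can be sketched with the ambiguous pair mentioned earlier. The probabilities below are invented purely for illustration, and the scoring rule (sum of log probabilities with an adjustable language model weight) is one common way to combine the two knowledge sources, not necessarily the one used in the course.

```python
import math

# Hypothetical scores, chosen only to illustrate the idea.
# Acoustic model: both word sequences fit the audio almost equally well.
acoustic_prob = {
    "to recognize speech": 0.40,
    "to wreck a nice beach": 0.45,
}
# Language model: how plausible each word sequence is as a sentence.
language_prob = {
    "to recognize speech": 0.010,
    "to wreck a nice beach": 0.0001,
}

def decode(hypotheses, lm_weight=1.0):
    """Return the hypothesis with the highest combined log score."""
    def score(h):
        return math.log(acoustic_prob[h]) + lm_weight * math.log(language_prob[h])
    return max(hypotheses, key=score)

print(decode(list(acoustic_prob)))               # -> to recognize speech
print(decode(list(acoustic_prob), lm_weight=0))  # -> to wreck a nice beach
```

With the language model switched off (`lm_weight=0`), the acoustically slightly better but implausible sentence wins, which is why contextual knowledge matters.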

Which approaches exist to generate a speech signal, and what are their advantages and disadvantages?

(PSOLA = Concatenative)

Speech signal generation approaches

Which of the following is NOT a synthesis method for generating speech signals?

a) Parametric synthesis

b) HMM-based synthesis

c) Random shift synthesis

d) Unit-selection synthesis

Answer: c) Random shift synthesis

Word Accuracy (WA)

Transcribed text:

Can you please give me the best connection between Munich and Duisburg. I have to arrive on Saturday at noon latest. Thank you.

Recognized text (hypothesis of the speech recognizer):

Could you please give me the connection and between Munich and Duisburg. I have arrive on Saturday at noon latest fest. Thank you.

Calculate the WA!

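WA = (N - S - D - I) / N = (23 - 1 - 2 - 2) / 23 = 18/23 ≈ 0.7826

N = 23 words in the transcribed text
S = 1 substitution (Can → Could)
D = 2 deletions (best, to)
I = 2 insertions (and, fest)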
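
The result can be checked with a short script. It is a minimal sketch that assumes the usual definition WA = (N - S - D - I) / N and obtains the total number of errors S + D + I as the word-level edit distance between the two texts. The function names and the normalization (lowercasing, ignoring punctuation) are choices made for this example.

```python
import re

def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def word_errors(reference, hypothesis):
    """Return (minimum number of word errors, number of reference words)."""
    ref, hyp = words(reference), words(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h),  # match or substitution
                            prev[j] + 1,             # deletion
                            curr[j - 1] + 1))        # insertion
        prev = curr
    return prev[-1], len(ref)

reference = ("Can you please give me the best connection between Munich "
             "and Duisburg. I have to arrive on Saturday at noon latest. Thank you.")
hypothesis = ("Could you please give me the connection and between Munich "
              "and Duisburg. I have arrive on Saturday at noon latest fest. Thank you.")

errors, n = word_errors(reference, hypothesis)
wa = (n - errors) / n
print(errors, n, round(wa, 4))  # -> 5 23 0.7826
```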


Chapter 8: Multimodal Input Systems
