Describe modality from the point of view of information theory / informatics
From an information theory / informatics point of view, a modality is the use of a specific channel within our actions in order to provide information to a system.
What is machine learning?
Machine learning is a field of study in artificial intelligence that focuses on developing algorithms and models that enable computers to learn and make predictions or decisions based on data without being explicitly programmed.
Explain the following approaches
Supervised Learning
Semi-supervised Learning
Unsupervised Learning
Self-supervised Learning
Reinforcement Learning
Supervised Learning: learns from (a large amount of) labeled data
Semi-supervised Learning: learns from labeled and unlabeled data
Unsupervised Learning: learns patterns and structures from unlabeled data
Self-supervised Learning: learns from unlabeled data by predicting hidden parts
Reinforcement Learning: learns from interacting with an environment by maximizing a reward
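The difference between learning with and without labels can be sketched on toy data. This is a minimal illustration, not a reference implementation: the data points, the nearest-centroid classifier and the 2-means clustering are all chosen here for illustration and do not come from the lecture.

```python
# Toy 1-D data: two groups of points (values invented for illustration).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["a", "a", "a", "b", "b", "b"]  # only the supervised learner sees these


def fit_supervised(xs, ys):
    """Supervised: compute one centroid per label from labeled examples."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def fit_unsupervised(xs, iterations=10):
    """Unsupervised: 2-means clustering, no labels involved.

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)  # simple initialisation
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1


centroids = fit_supervised(points, labels)
clusters = fit_unsupervised(points)
```

The supervised learner can name the class of a new point (`predict(centroids, 1.1)`), whereas the unsupervised learner only finds two cluster centres and has no notion of what they are called.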
Describe the classification of interaction modalities (Bernsen, 1999)
Linguistic vs. non-linguistic
Analogue vs. non-analogue
Arbitrary vs. non-arbitrary
Graphical (visually perceivable), acoustic (auditorily perceivable), haptic
Static vs. dynamic (only output modalities)
Which contexts are important in automatic speech recognition?
Speech: Language, dialect, speaking style, …
Speaker:
dependent vs. independent vs. adaptive
known vs. unknown
cooperative vs. uncooperative
Target units: Type, number, complexity
Environment: Background noise, transmission channels, etc.
→ Automatic speech recognition is a very difficult task!
Why is speech recognition so difficult (for machines)?
Task complexity:
Variances and invariances in the speech signal have to be differentiated
"to recognize speech" vs. "to wreck a nice beach"
→ Contextual knowledge helps to understand
Speech is a continuous signal, not a sequence of elementary sounds (even across word boundaries)
Articulation depends on the surrounding sounds (co-articulation)
Complete the block diagram for an automatic speech recognition (ASR) system:
Describe it
The feature extraction block finds the information in the speech signal that is relevant for detecting which words or sentences were actually spoken. Most importantly, the resulting features should not include information that holds no value for the detection, such as information about the speaker and the environment. This is done for speech blocks of about 20 ms.
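Splitting the signal into blocks can be sketched as follows. The 20 ms block length is taken from the text; the 16 kHz sample rate, the 10 ms shift between blocks and the use of log-energy as the only feature are assumptions made for this sketch. Real systems use richer spectral features.

```python
import math

SAMPLE_RATE = 16_000  # Hz, assumed for this sketch
FRAME_MS = 20         # block length from the text
HOP_MS = 10           # assumed shift between blocks (overlapping blocks)


def split_into_frames(signal, sample_rate=SAMPLE_RATE,
                      frame_ms=FRAME_MS, hop_ms=HOP_MS):
    """Cut the signal into fixed-length, overlapping blocks."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frames.append(signal[start:start + frame_len])
    return frames


def log_energy(frame):
    """A single toy feature per block: the logarithm of its energy."""
    energy = sum(s * s for s in frame)
    return math.log(energy + 1e-10)  # small offset avoids log(0) on silence


# One second of a 440 Hz test tone stands in for a speech recording.
signal = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]
frames = split_into_frames(signal)
features = [log_energy(f) for f in frames]
```

With these assumed settings each block holds 320 samples, and one second of audio yields 99 blocks, each reduced to one feature value.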
In the next step, the phoneme probabilities must be estimated from these feature vectors (classification). This is done using both the acoustic model and the vocabulary. Different approaches exist for this, such as Hidden Markov Models (HMMs) or neural networks.
From these probabilities, the decoder combines consecutive phonemes into full words and finally into a full sentence. To do this, it uses the information about the grammar (language model) in addition to the vocabulary.
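How the language model helps the decoder can be sketched with the two similar-sounding word sequences from the earlier card. This is only an illustration: all probabilities are invented, and the combination rule (adding the log acoustic score and a weighted log language-model score) is the common textbook formulation, assumed here rather than taken from the lecture.

```python
import math

# Hypothetical scores for two word sequences that sound almost alike.
# All probabilities below are invented for illustration.
acoustic_prob = {
    "to recognize speech": 0.40,
    "to wreck a nice beach": 0.45,
}
language_model_prob = {
    "to recognize speech": 1e-4,
    "to wreck a nice beach": 1e-7,
}


def combined_log_score(sentence, lm_weight=1.0):
    """Log acoustic score plus weighted log language-model score."""
    return (math.log(acoustic_prob[sentence])
            + lm_weight * math.log(language_model_prob[sentence]))


# Acoustics alone slightly prefer the wrong sentence ...
best_acoustic_only = max(acoustic_prob, key=acoustic_prob.get)
# ... while adding the language model picks the plausible one.
best_combined = max(acoustic_prob, key=combined_log_score)
```

In this toy setup the acoustic scores alone favour "to wreck a nice beach", and only the language model tips the decision towards "to recognize speech".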
Which approaches exist to generate a speech signal, and what are their advantages and disadvantages?
(PSOLA = Concatenative)
Which of the following is NOT a synthesis method for generating speech signals?
a) Parametric synthesis
b) HMM-based synthesis
c) Random shift synthesis
d) Unit-selection synthesis
Reference: Can you please give me the best connection between Munich and Duisburg. I have to arrive on Saturday at noon latest. Thank you.
Recognized: Could you please give me the connection and between Munich and Duisburg. I have arrive on Saturday at noon latest fest.
Calculate the word accuracy (WA)!
0.782608695652174
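The calculation can be reproduced with a word-level edit distance. The sketch below uses the usual definition WA = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions and insertions; the card does not state the formula, so treat it as an assumption. The first utterance is taken as the reference and the second as the recognizer output. A further assumption: the recognized text is taken to end with "Thank you." as well. That gives 5 errors on 23 reference words (18/23), which matches the value above. For the recognized text exactly as printed, the same procedure yields 6 errors, i.e. 17/23 ≈ 0.739.

```python
import re


def tokenize(text):
    """Lower-case the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """WA = (N - S - D - I) / N, with N the number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    return 1 - edit_distance(ref, hyp) / len(ref)


reference = ("Can you please give me the best connection between Munich "
             "and Duisburg. I have to arrive on Saturday at noon latest. "
             "Thank you.")
# Assumption: the recognized text also ends with "Thank you."
hypothesis = ("Could you please give me the connection and between Munich "
              "and Duisburg. I have arrive on Saturday at noon latest fest. "
              "Thank you.")

wa = word_accuracy(reference, hypothesis)
```

The five errors under this assumption are one substitution (Can → Could), two deletions (best, to) and two insertions (and, fest).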