Assume again your network is E_w → O ← E_b → M. Give the formula for P(O | M, E_b) in terms of the entries of the probability tables of the network.
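[A sketch of an answer, reading the network as drawn, where M's only parent is E_b: given E_b, O is independent of M, so
P(O | M, E_b) = P(O | E_b) = Σ_{e_w} P(O | e_w, E_b) · P(e_w),
which uses only the entries of the table of O and the prior table of E_w.]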
You have already placed the bet. What does that mean for the relationship between O and M? How does that affect the memory needed for the conditional probability tables of the network?
The edge O → M is deterministic. We do not need to store a probability table for M; instead, we only have to store the function that computes the value of M from the value of O.
When is a node X in a Bayesian network deterministic?
If its value is completely determined by the values of Parents(X).
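A minimal Python sketch of the memory point from the card above (the variable names and payoff values are illustrative assumptions): the non-deterministic node O needs a full conditional probability table, while the deterministic node M only needs the function computing it from its parent.

```python
# Full conditional probability table for the non-deterministic node O:
# one distribution for every combination of parent values (E_w, E_b).
cpt_O = {
    (True, True):   {True: 0.9, False: 0.1},
    (True, False):  {True: 0.6, False: 0.4},
    (False, True):  {True: 0.3, False: 0.7},
    (False, False): {True: 0.1, False: 0.9},
}

# Deterministic node M: no table needed, only the function M = f(O).
def money(o: bool) -> float:
    # Hypothetical payoff of the already-placed bet.
    return 100.0 if o else -10.0
```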
Hidden Markov models: smoothing algorithm; give the formula for the backward message b.
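[A standard form of the backward recursion, assuming this is the intended "b formula":
b_{k+1:t}(x_k) = P(e_{k+1:t} | X_k = x_k) = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) · b_{k+2:t}(x_{k+1}) · P(x_{k+1} | x_k).
Smoothing then combines forward and backward messages: P(X_k | e_{1:t}) = α · f_{1:k} × b_{k+1:t}.]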
Give the matrix form of the smoothing algorithm.
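[A sketch of the usual matrix form, with transition matrix T and the diagonal observation matrices O_i explained on the next card:
f_{1:i+1} = α · O_{i+1} · T^T · f_{1:i}
b_{k+1:t} = T · O_{k+1} · b_{k+2:t}]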
Explain the values of the matrices O_i.
O_i is a diagonal matrix obtained from the column of the sensor matrix S corresponding to the observation e_i.
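A runnable Python sketch of the matrix-form smoothing algorithm (the two-state model and all numbers are made up for illustration):

```python
import numpy as np

# Two hidden states, two possible observations (values are illustrative).
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])        # T[i, j] = P(X_next = j | X = i)
S = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # S[x, e] = P(E = e | X = x)
prior = np.array([0.5, 0.5])
evidence = [0, 0, 1]              # observed e_1..e_t as column indices of S

def O(e):
    # Diagonal matrix built from the column of S for observation e.
    return np.diag(S[:, e])

# Forward pass: f_{1:i+1} = alpha * O_{i+1} T^T f_{1:i}
f = prior
forwards = []
for e in evidence:
    f = O(e) @ T.T @ f
    f = f / f.sum()               # normalization (alpha)
    forwards.append(f)

# Backward pass: b_{k+1:t} = T O_{k+1} b_{k+2:t}, starting from all ones.
b = np.ones(2)
smoothed = [None] * len(evidence)
for k in range(len(evidence) - 1, -1, -1):
    s = forwards[k] * b
    smoothed[k] = s / s.sum()     # P(X_k | e_{1:t})
    b = T @ O(evidence[k]) @ b

print(smoothed)
```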
Give the hypothesis space for finding a linear separator.
The set of functions w ⋅ x + b for real numbers w_1, w_2, b.
[Alternatively, one can use ℝ³ with some explanation that it holds the tuples (w_1, w_2, b).]
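A minimal sketch of one hypothesis from this space (names are illustrative): a point is classified by the sign of w ⋅ x + b.

```python
def h(x1: float, x2: float, w1: float, w2: float, b: float) -> int:
    # Classify by which side of the separating line the point lies on.
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else -1
```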
What does it mean, intuitively, if a linear separator exists for a dataset after this transformation?
The two categories are the inside and the outside of a circle around the origin.
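A sketch assuming the transformation maps (x_1, x_2) to the squared features (x_1², x_2²); the inside of a circle of radius r then becomes a half-plane in the transformed space.

```python
def transform(x1: float, x2: float) -> tuple[float, float]:
    # Assumed feature map: squared coordinates.
    return (x1 ** 2, x2 ** 2)

# x1^2 + x2^2 < r^2 (inside the circle) becomes the linear condition
# 1*z1 + 1*z2 - r^2 < 0, i.e. the separator w = (1, 1), b = -r^2.
def inside_circle(x1: float, x2: float, r: float) -> bool:
    z1, z2 = transform(x1, x2)
    return z1 + z2 - r ** 2 < 0
```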
Briefly explain what part-of-speech tagging means.
The process of attributing to every word in a corpus its syntactic category, such as noun, participle, etc.
What is the role of the window width when machine-learning part-of-speech tags?
The size of the context that is kept around the word to be tagged. For example, with a window width of 5, the two words before and after it are added as input to the learning system.
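A small Python sketch of how such windows could be built (the padding token and names are illustrative assumptions):

```python
def windows(words: list[str], width: int = 5) -> list[list[str]]:
    # width is odd: the word itself plus (width - 1) / 2 words on each side.
    k = (width - 1) // 2
    padded = ["<pad>"] * k + words + ["<pad>"] * k
    return [padded[i:i + width] for i in range(len(words))]

print(windows(["the", "dog", "barks"], width=5))
# [['<pad>', '<pad>', 'the', 'dog', 'barks'],
#  ['<pad>', 'the', 'dog', 'barks', '<pad>'],
#  ['the', 'dog', 'barks', '<pad>', '<pad>']]
```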
Explain the role of word embeddings when learning part-of-speech tags, and the idea behind tfidf
A word embedding maps a word to a vector of numbers that can be used as input to a neural network. tfidf is a specific embedding whose definition uses the frequency of words in the documents of the corpus to map words to numbers.
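[One common definition of the tfidf weight (variants differ in scaling): with tf(t, d) the count of term t in document d, N the total number of documents, and df(t) the number of documents containing t,
tfidf(t, d) = tf(t, d) · log(N / df(t)).]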
Using this grammar as an example, explain the difference between grammar rules and lexicon.
Both are productions of the grammar. Grammar rules define the structure of the language in general (the S through VP productions above); the lexicon lists the specific words of each category (the Article through TransVerb productions above).
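[The grammar itself is not reproduced on this card; an illustrative stand-in showing the two kinds of productions:
S → NP VP (grammar rule)
NP → Article Noun (grammar rule)
VP → TransVerb NP (grammar rule)
Article → the | a (lexicon)
Noun → dog | stench (lexicon)
TransVerb → sees | likes (lexicon)]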
What is the purpose of the smoothing algorithm?
To estimate the distribution over past states based on all observed evidence, including evidence from after the state in question.