Buffl

by Moritz K.

What is the goal of RL?

Learn a behavior (policy) that maximizes the cumulative reward in the long term.
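
A minimal sketch of what "reward in the long term" can mean in practice. It assumes the common discounted-return formulation, which this card does not state; the function name `discounted_return` and the discount factor `gamma` are hypothetical choices for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma**t, where t is the step index.

    Assumption: discounting with factor gamma is one common way to define
    long-term reward; the card itself does not specify it.
    """
    total = 0.0
    # Walk backwards so that each earlier step discounts everything after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two hypothetical reward sequences: one pays immediately, one pays later.
greedy = [5.0, 0.0, 0.0, 0.0]
patient = [0.0, 0.0, 0.0, 10.0]

if __name__ == "__main__":
    print(discounted_return(greedy))
    print(discounted_return(patient))
```

With these made-up numbers the delayed payoff still yields the higher return, which is why an agent that only looks at the immediate reward can end up with the worse behavior.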

Which model describes the RL problem setting? Name 4 of its components.

A Markov decision process (MDP), consisting of states, actions, transition probabilities, and a reward function.

What does the Markov condition state?

The current state s_t comprises all relevant information from the past.
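
Written as an equation, the condition is usually stated as follows. The action symbol a_t and the transition probability P are assumed notation here, since the card only names the state s_t.

```latex
\[
P(s_{t+1} \mid s_t, a_t) \;=\; P(s_{t+1} \mid s_1, a_1, \dots, s_t, a_t)
\]
```

In words: once s_t (and the chosen action) is known, the earlier history adds nothing to the prediction of the next state.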

Last modified
a year ago