Markov Decision Process
In a Reinforcement Learning problem,
- An agent learns how to behave in an environment by taking actions.
- It then observes the consequences (reward and next state) of the action taken.
- The agent's control objective is to learn a policy that maximizes the cumulative reward over a period of time.
- All RL problems are based on the Markov assumption: the current state contains all the information relevant to choosing the current action.
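The interaction loop described in the bullets above can be sketched roughly as follows. This is only an illustrative toy, not something from the course material: the `ToyChain` environment, the `run_episode` helper, and the `always_right` policy are all hypothetical names made up for this note. The environment is Markov in the sense that the next state and reward depend only on the current state and the action taken.

```python
class ToyChain:
    """Hypothetical 5-state chain: start in state 0, reward 1 on reaching state 4."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped at both ends of the chain)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the cumulative (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent acts using only the current state
        state, reward, done = env.step(action)   # then observes the reward and next state
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """A fixed policy: always move right."""
    return 1


def always_left(state):
    """A fixed policy: always move left (never reaches the goal)."""
    return 0
```

Calling `run_episode(ToyChain(), always_right)` gives the cumulative reward of one episode under that policy; a learning agent would adjust its policy to make this number as large as possible.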
Running log of questions to ask ...
1. Re https://learn.upgrad.com/v/course/272/session/60574/segment/336735, which defines Policy Evaluation and Policy Improvement: the Policy Evaluation description states, "Say you know a policy and you want to evaluate how good it is, i.e., compute the state-value functions for the existing policy". While it is clear how to compute the state-value functions, what is the actual evaluation being done here? Are we comparing different state-value functions?
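To make the question concrete, here is a minimal sketch of iterative policy evaluation on a tiny made-up MDP. Everything in it is an assumption for illustration (the `policy_evaluation` function, the `P` transition table layout, the two-state MDP, and the discount `gamma=0.9`); it is not taken from the course. In the usual textbook reading, "evaluation" just means computing the state-value function of one given policy; comparing the values of different policies is what policy improvement then builds on.

```python
def policy_evaluation(P, policy, gamma=0.9, tol=1e-8):
    """Iteratively compute the state values of a fixed policy.

    P[s][a]      : list of (probability, next_state, reward) tuples (assumed layout)
    policy[s][a] : probability of taking action a in state s
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman expectation backup for state s under the given policy
            v_new = sum(
                policy[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


# Hypothetical MDP: from "A" you can "stay" (reward 0) or "go" to terminal "T" (reward 1).
# "T" is modeled as an absorbing state with zero reward.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "T", 1.0)]},
    "T": {"stay": [(1.0, "T", 0.0)], "go": [(1.0, "T", 0.0)]},
}

policy_go = {"A": {"stay": 0.0, "go": 1.0}, "T": {"stay": 1.0, "go": 0.0}}
policy_mixed = {"A": {"stay": 0.5, "go": 0.5}, "T": {"stay": 1.0, "go": 0.0}}

V_go = policy_evaluation(P, policy_go)        # V_go["A"] is 1.0
V_mixed = policy_evaluation(P, policy_mixed)  # V_mixed["A"] is about 0.909
```

Here each call evaluates a single policy on its own. Putting `V_go` and `V_mixed` side by side shows that `policy_go` is better in state "A", which is the kind of comparison the question asks about.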