Monday, January 6, 2020

Reinforcement Learning



Markov Decision Process

In a Reinforcement Learning problem,
  • An agent learns how to behave in an environment by taking actions,
  • then observing the consequences of each action taken (the reward received and the next state).
  • The agent's control objective is to learn a policy that maximizes the cumulative reward accumulated over time.
  • RL problems framed as MDPs rest on the Markov assumption: the current state contains all the information relevant to choosing the current action.
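The act, observe, repeat loop described above can be sketched in Python. This is a minimal illustration under stated assumptions: the 5-state corridor environment and the discount factor `gamma` are hypothetical choices made for the example, not part of the notes (`gamma = 1.0` gives the plain undiscounted sum). `step` plays the role of the environment, and a policy is simply a function from state to action.

```python
from typing import Callable, Tuple

# HYPOTHETICAL toy environment (not from the notes): a corridor of states 0..4.
# Action 0 moves left, action 1 moves right. Entering state 4 ends the episode
# with reward +1; every other transition gives reward 0.
N_STATES = 5
GOAL = N_STATES - 1


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Environment: return (next_state, reward, done).

    The result depends only on the current state and action, which is
    exactly the Markov assumption.
    """
    if action == 1:
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def run_episode(policy: Callable[[int], int], start: int = 0,
                gamma: float = 0.9, max_steps: int = 100) -> float:
    """Act, observe (reward, next state), repeat.

    Returns the discounted cumulative reward r1 + gamma*r2 + gamma**2*r3 + ...
    (gamma is an assumed discount factor; gamma=1.0 gives the plain sum).
    """
    state, total, discount = start, 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                     # agent chooses an action
        state, reward, done = step(state, action)  # environment responds
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def always_right(state: int) -> int:
    return 1


def always_left(state: int) -> int:
    return 0


print(run_episode(always_right))  # about 0.729, i.e. 0.9**3
print(run_episode(always_left))   # 0.0: the goal is never reached
```

The larger return for `always_right` is what "accumulating maximum cumulative reward" means in this toy setting: the control objective is to find the policy that makes this number as large as possible.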

Running log of questions to ask ...

1. The segment at https://learn.upgrad.com/v/course/272/session/60574/segment/336735 defines Policy Evaluation and Policy Improvement. Its description of Policy Evaluation states: "Say you know a policy and you want to evaluate how good it is, i.e., compute the state-value functions for the existing policy". While it is clear how the state-value functions are computed, what is the actual evaluation being done here? Are we comparing different state-value functions?
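A minimal sketch of iterative policy evaluation may help frame the question. It assumes a hypothetical 5-state corridor MDP, a discount `gamma = 0.9` and a stopping tolerance `theta` (all illustrative choices, not taken from the course). One way to read "evaluation" here: the output, v_pi(s), scores the policy by the expected discounted return from each state when following it. Evaluating two policies yields two value functions that can then be compared state by state.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL MDP for illustration: corridor states 0..4, action 0 = left,
# action 1 = right, state 4 is terminal and entering it gives reward +1.
N_STATES = 5
GOAL = N_STATES - 1

# A policy maps each state to a distribution over actions: pi[s][a] = P(a | s).
Policy = Dict[int, Dict[int, float]]


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic transition: return (next_state, reward)."""
    if action == 1:
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def evaluate_policy(pi: Policy, gamma: float = 0.9,
                    theta: float = 1e-10) -> List[float]:
    """Iterative policy evaluation.

    Sweeps over the states applying the Bellman expectation backup
        v(s) <- sum_a pi(a|s) * (r + gamma * v(s'))
    until the largest change in a sweep is below theta. The result
    approximates v_pi(s), the expected discounted return from each state.
    """
    v = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: value stays 0
            new_v = 0.0
            for a, prob in pi[s].items():
                s_next, r = step(s, a)
                new_v += prob * (r + gamma * v[s_next])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v  # in-place update
        if delta < theta:
            return v


always_right: Policy = {s: {1: 1.0} for s in range(N_STATES)}
uniform_random: Policy = {s: {0: 0.5, 1: 0.5} for s in range(N_STATES)}

v_right = evaluate_policy(always_right)
v_random = evaluate_policy(uniform_random)
for s in range(GOAL):
    print(f"state {s}: v_right={v_right[s]:.3f}  v_random={v_random[s]:.3f}")
```

In this toy MDP `v_right[s] >= v_random[s]` holds in every state, so the always-right policy is at least as good as the random one everywhere. A state-by-state comparison of value functions like this is the kind of information Policy Improvement then acts on.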