Data Science (Incl AI/ML): January 2020

Markov Decision Process

In a Reinforcement Learning problem:

An agent learns how to behave in an environment by taking actions
and then observing the consequences (reward and next state) of each action taken.
The control objective of the agent is to learn a policy that maximizes the cumulative reward collected over time.
All RL problems are based on the Markov assumption: the current state contains all the information relevant to choosing the current action.
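
To make the loop above concrete, here is a minimal sketch of an agent interacting with an environment. Everything in it is made up for illustration and does not come from the course: the `TwoStateEnv` class, its reward rule, and the two policies `go_to_one` and `always_move`. The environment is Markov in the sense noted above, since the next state and reward depend only on the current state and action.

```python
class TwoStateEnv:
    """Hypothetical toy environment: states 0 and 1, actions 0 (stay) and 1 (move)."""

    def __init__(self):
        self.state = 0  # every episode starts in state 0

    def step(self, action):
        # Markov property: the outcome depends only on (current state, action).
        if action == 1:
            self.state = 1 - self.state
        # Assumed reward rule: +1 whenever the agent ends the step in state 1.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def run_episode(env, policy, steps=10):
    """Act for a fixed number of steps and return the cumulative reward."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        action = policy(state)            # the agent picks an action from the state
        state, reward = env.step(action)  # the environment returns next state and reward
        total += reward
    return total


def go_to_one(state):
    """Policy: move to state 1, then stay there."""
    return 1 if state == 0 else 0


def always_move(state):
    """Policy: switch state on every step."""
    return 1


print(run_episode(TwoStateEnv(), go_to_one))    # cumulative reward over 10 steps
print(run_episode(TwoStateEnv(), always_move))
```

Under these assumptions `go_to_one` collects more reward than `always_move`, which is the sense in which one policy is "better" than another for the control objective.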

Running log of questions to ask ...

1. Re https://learn.upgrad.com/v/course/272/session/60574/segment/336735, which defines Policy Evaluation and Policy Improvement: the Policy Evaluation description states "Say you know a policy and you want to evaluate how good it is, i.e., compute the state-value functions for the existing policy". While it is clear how to compute the state-value functions, what is the actual evaluation being done here? Are we comparing different state-value functions?
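
A possible way to pin this question down, as a sketch rather than the course's own method: in the standard textbook usage, "evaluating" a policy just means computing its state-value function, and comparing value functions across policies is what Policy Improvement builds on. The code below runs iterative policy evaluation on a made-up two-state MDP. The transition table `P`, the two policies, and `gamma=0.9` are all assumptions chosen for illustration.

```python
def policy_evaluation(P, policy, gamma=0.9, tol=1e-10):
    """Iteratively compute V(s) for a fixed policy.

    P[s][a] is a list of (probability, next_state, reward) tuples.
    policy[s][a] is the probability of taking action a in state s.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman expectation backup for state s under the given policy.
            v_new = sum(
                policy[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:  # stop once no state value changed noticeably in a sweep
            return V


# Hypothetical MDP: action 0 stays, action 1 switches state; reward 1 for landing in state 1.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}

go_to_one = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}}    # move to state 1, then stay
always_move = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}  # switch state every step

V_a = policy_evaluation(P, go_to_one)
V_b = policy_evaluation(P, always_move)
print(V_a)
print(V_b)
```

With these assumed numbers, `go_to_one` has a higher value than `always_move` in both states, so the comparison between policies happens on top of the evaluation step, not inside it.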

Monday, January 6, 2020

Reinforcement Learning

Markov Decision Process