1 Partially Observable Markov Decision Processes (POMDPs)
Geoff Hollinger
Graduate Artificial Intelligence, Fall 2007
*Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and Leslie Kaelbling

2 Outline for POMDP Lecture
• Introduction
  • What is a POMDP anyway?
  • A simple example
• Solving POMDPs
  • Exact value iteration
  • Policy iteration
  • Witness algorithm, HSVI
  • Greedy solutions
• Applications and extensions
  • When am I ever going to use this (other than in homework five)?

3 So who is this Markov guy?
• Andrey Andreyevich Markov (1856-1922)
• Russian mathematician
• Known for his work on stochastic processes, later known as Markov chains

4 What is a Markov Chain?
• Finite number of discrete states
• Probabilistic transitions between states
• Next state determined only by the current state (this is the Markov property)
Rewards: S1 = 10, S2 = 0

5 What is a Hidden Markov Model?
• Finite number of discrete states
• Probabilistic transitions between states
• Next state determined only by the current state
• We're unsure which state we're in
• The current state emits an observation (see the filtering sketch after Slide 15)
Rewards: S1 = 10, S2 = 0
We do not know the state: S1 emits O1 with prob 0.75; S2 emits O2 with prob 0.75

6 What is a Markov Decision Process?
• Finite number of discrete states
• Probabilistic transitions between states, with controllable actions in each state
• Next state determined only by the current state and current action (this is still the Markov property)
Rewards: S1 = 10, S2 = 0

7 What is a Partially Observable Markov Decision Process?
• Finite number of discrete states
• Probabilistic transitions between states, with controllable actions
• Next state determined only by the current state and current action
• We're unsure which state we're in
• The current state emits observations
Rewards: S1 = 10, S2 = 0
We do not know the state: S1 emits O1 with prob 0.75; S2 emits O2 with prob 0.75

8 A Very Helpful Chart
(The chart itself was a figure; its content follows from the four definitions above.)

                         States fully observable    States partially observable
    No actions           Markov chain               Hidden Markov model
    Actions              MDP                        POMDP

9 POMDP versus MDP
• MDP
  + Tractable to solve
  + Relatively easy to specify
  - Assumes perfect knowledge of the state
• POMDP
  + Treats all sources of uncertainty uniformly
  + Allows for information-gathering actions
  - Hugely intractable to solve optimally

10 Simple Example
• Initial distribution: [0.1, 0.9]
• Discount factor: 0.5
• Reward: S1 = 10, S2 = 0
• Observations: S1 emits O1 with prob 1.0, S2 emits O2 with prob 1.0
(Slides 10-14 vary only the initial belief and the observation model; they are worked through in the sketches following Slide 15.)

11 Simple Example
• Initial distribution: [0.9, 0.1]
• Discount factor: 0.5
• Reward: S1 = 10, S2 = 0
• Observations: S1 emits O1 with prob 1.0, S2 emits O2 with prob 1.0

12 Simple Example
• Initial distribution: [0.1, 0.9]
• Discount factor: 0.5
• Reward: S1 = 10, S2 = 0
• Observations: S1 emits O1 with prob 0.75, S2 emits O2 with prob 0.75

13 Simple Example
• Initial distribution: [0.5, 0.5]
• Discount factor: 0.5
• Reward: S1 = 10, S2 = 0
• Observations: S1 emits O1 with prob 1.0, S2 emits O2 with prob 1.0

14 Simple Example
• Initial distribution: [0.5, 0.5]
• Discount factor: 0.5
• Reward: S1 = 10, S2 = 0
• Observations: S1 emits O1 with prob 0.5, S2 emits O2 with prob 0.5

15 Time for Some Formalism
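The transition diagram for the two-state chain on Slides 4-7 did not survive this transcript, so the sketch below has to assume a concrete transition matrix T; the rewards (S1 = 10, S2 = 0) and the 0.5 discount factor are the slides' own numbers. Under those assumptions, the exact discounted value of each state of the Slide 4 Markov chain solves the linear system V = R + gamma * T * V:

```python
# Discounted state values for the two-state Markov chain of Slide 4.
# T is an ASSUMED "sticky" transition matrix (the slide's diagram was lost);
# R and gamma are the numbers given on the slides.
gamma = 0.5
R = [10.0, 0.0]          # reward: S1 = 10, S2 = 0
T = [[0.9, 0.1],         # P(S1->S1), P(S1->S2)  (assumed)
     [0.1, 0.9]]         # P(S2->S1), P(S2->S2)  (assumed)

# V = R + gamma*T*V  =>  (I - gamma*T) V = R; solve the 2x2 system directly.
a = 1.0 - gamma * T[0][0]; b = -gamma * T[0][1]
c = -gamma * T[1][0];      d = 1.0 - gamma * T[1][1]
det = a * d - b * c
V = [(R[0] * d - b * R[1]) / det,
     (a * R[1] - c * R[0]) / det]
print(V)   # [18.33..., 1.66...]
```

Because this assumed chain tends to stay where it is, starting in S1 is worth about 18.3 while starting in S2 is worth only about 1.7; a less sticky T would shrink that gap.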
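Slide 5's HMM (and a POMDP's state estimator, Slide 7) never knows the state; it maintains a belief, a probability distribution over states, and updates it by Bayes' rule after each observation. A minimal filtering sketch: the emission probabilities (0.75) come from the slides, while T is the same assumed matrix as in the previous sketch.

```python
# HMM/POMDP belief filtering for the two-state example of Slide 5.
T = [[0.9, 0.1],    # assumed transition matrix, as in the previous sketch
     [0.1, 0.9]]
O = [[0.75, 0.25],  # P(O1|S1), P(O2|S1)  (from the slides)
     [0.25, 0.75]]  # P(O1|S2), P(O2|S2)

def belief_update(belief, obs):
    """Predict the belief through T, weight by the observation
    likelihood, and renormalize (Bayes' rule)."""
    predicted = [sum(belief[s] * T[s][s2] for s in range(2)) for s2 in range(2)]
    unnorm = [predicted[s2] * O[s2][obs] for s2 in range(2)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

belief = [0.5, 0.5]           # start fully uncertain
for obs in (0, 0, 1):         # observe O1, O1, then O2
    belief = belief_update(belief, obs)
    print(belief)             # drifts toward S1, then partly back
```

Note what the Slide 14 observation model (both emission probabilities 0.5) would do here: the likelihood term is identical for both states, so the update returns the prediction unchanged. Uninformative observations never sharpen the belief, which is what makes Slide 14 the hardest of the five variants.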
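One fact the Simple Example slides lean on: the expected immediate reward of a belief is linear in the belief, r(b) = b(S1)·R(S1) + b(S2)·R(S2). This sketch uses only numbers that appear on Slides 10-14:

```python
# Expected immediate reward of a belief state: r(b) = sum_s b(s) * R(s).
R = [10.0, 0.0]   # reward: S1 = 10, S2 = 0 (from the slides)

def expected_reward(belief):
    return sum(b * r for b, r in zip(belief, R))

# The three distinct initial beliefs used across Slides 10-14:
for belief in ([0.1, 0.9], [0.9, 0.1], [0.5, 0.5]):
    print(belief, "->", expected_reward(belief))   # 1.0, 9.0, 5.0
```

For scale: with the discount factor of 0.5, a per-step reward held constant at r is worth r / (1 - 0.5) = 2r in total, so even sitting in S1 forever caps the total discounted reward in these examples at 20.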