# Partially Observable Markov Decision Processes (POMDPs)

Geoff Hollinger
Graduate Artificial Intelligence, Fall 2007

*Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and Leslie Kaelbling.*

## Outline for POMDP Lecture

- Introduction
  - What is a POMDP anyway?
  - A simple example
- Solving POMDPs
  - Exact value iteration
  - Policy iteration
  - Witness algorithm, HSVI
  - Greedy solutions
- Applications and extensions
  - When am I ever going to use this (other than in homework five)?

## So Who Is This Markov Guy?

- Andrey Andreyevich Markov (1856–1922)
- Russian mathematician
- Known for his work on stochastic processes, later called Markov chains

## What Is a Markov Chain?

- Finite number of discrete states
- Probabilistic transitions between states
- Next state determined only by the current state (this is the Markov property)
- Rewards: S1 = 10, S2 = 0

## What Is a Hidden Markov Model?

- Finite number of discrete states
- Probabilistic transitions between states
- Next state determined only by the current state
- We are unsure which state we are in; the current state emits an observation
- Rewards: S1 = 10, S2 = 0
- Observation model (we do not know the state): S1 emits O1 with probability 0.75; S2 emits O2 with probability 0.75

## What Is a Markov Decision Process?

- Finite number of discrete states
- Probabilistic transitions between states, with a controllable action in each state
- Next state determined only by the current state and current action (this is still the Markov property)
- Rewards: S1 = 10, S2 = 0

## What Is a Partially Observable Markov Decision Process?

- Finite number of discrete states
- Probabilistic transitions between states and controllable actions
- Next state determined only by the current state and current action
- We are unsure which state we are in; the current state emits observations
- Rewards: S1 = 10, S2 = 0
- Observation model (we do not know the state): S1 emits O1 with probability 0.75; S2 emits O2 with probability 0.75

## A Very Helpful Chart

*(Chart not reproduced in the text extraction.)*

## POMDP versus MDP

- MDP
  - (+) Tractable to solve
  - (+) Relatively easy to specify
  - (−) Assumes perfect knowledge of state
- POMDP
  - (+) Treats all sources of uncertainty uniformly
  - (+) Allows for information-gathering actions
  - (−) Hugely intractable to solve optimally

## Simple Examples

Each variant below uses discount factor 0.5 and rewards S1 = 10, S2 = 0; only the initial distribution and the observation model change:

| Initial distribution [S1, S2] | Observation model |
| --- | --- |
| [0.1, 0.9] | S1 emits O1 w.p. 1.0; S2 emits O2 w.p. 1.0 |
| [0.9, 0.1] | S1 emits O1 w.p. 1.0; S2 emits O2 w.p. 1.0 |
| [0.1, 0.9] | S1 emits O1 w.p. 0.75; S2 emits O2 w.p. 0.75 |
| [0.5, 0.5] | S1 emits O1 w.p. 1.0; S2 emits O2 w.p. 1.0 |
| [0.5, 0.5] | S1 emits O1 w.p. 0.5; S2 emits O2 w.p. 0.5 |

## Time for Some Fo…
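The two-state running example can be made concrete with a short simulation. This is a minimal sketch of the fully observed chain (no actions, as in the Markov chain slide): the rewards and discount factor come from the slides, but the transition probabilities in `TRANSITIONS` are illustrative assumptions, since the slides do not give concrete transition numbers.

```python
import random

# Two-state chain matching the lecture's running example:
# rewards S1 = 10, S2 = 0; discount factor 0.5.
# The transition probabilities below are assumed for illustration.
TRANSITIONS = {
    "S1": [("S1", 0.9), ("S2", 0.1)],
    "S2": [("S1", 0.1), ("S2", 0.9)],
}
REWARDS = {"S1": 10.0, "S2": 0.0}

def step(state):
    """Sample the next state from the current state's transition row."""
    r = random.random()
    cumulative = 0.0
    for nxt, p in TRANSITIONS[state]:
        cumulative += p
        if r < cumulative:
            return nxt
    return TRANSITIONS[state][-1][0]

def simulate(start, horizon, discount=0.5, seed=0):
    """Roll out the chain and accumulate discounted reward."""
    random.seed(seed)
    state, total, gamma = start, 0.0, 1.0
    for _ in range(horizon):
        total += gamma * REWARDS[state]
        gamma *= discount
        state = step(state)
    return total
```

With discount 0.5, the total discounted reward of any rollout is bounded by 10 / (1 − 0.5) = 20, which is why the example initial distributions lead to such different value estimates.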
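The noisy-observation examples (S1 emits O1 with probability 0.75, and so on) hinge on Bayesian belief updating: each observation reweights the agent's distribution over states. Below is a minimal sketch of the observation-correction step only; a full POMDP belief update would also push the belief through the transition model after each action, which is omitted here.

```python
def belief_update(belief, observation, emit):
    """Bayes update of a belief over states given one observation.

    belief: dict mapping state -> prior probability
    emit: dict mapping (state, observation) -> P(observation | state)
    """
    posterior = {s: belief[s] * emit[(s, observation)] for s in belief}
    norm = sum(posterior.values())
    return {s: p / norm for s, p in posterior.items()}

# Emission model from the lecture: S1 emits O1 with prob 0.75,
# S2 emits O2 with prob 0.75 (each emits the other symbol with prob 0.25).
EMIT = {
    ("S1", "O1"): 0.75, ("S1", "O2"): 0.25,
    ("S2", "O1"): 0.25, ("S2", "O2"): 0.75,
}

b = belief_update({"S1": 0.5, "S2": 0.5}, "O1", EMIT)
# Seeing O1 shifts the uniform belief toward S1: b["S1"] == 0.75
```

Note how the observation model controls how informative each observation is: with the deterministic model (emission probability 1.0) a single observation pins down the state exactly, while with the uniform model (probability 0.5) the belief never changes at all.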
