Resource description:
"Logistics Management Courseware, Cases and Literature: MDP.ppt"
Brief Introduction to Markov Decision Processes

Reference books
Markov Decision Processes, Martin Puterman, 1994.
Introduction to Stochastic Dynamic Programming, Sheldon Ross, 1983.

Content
Five elements of MDP.
Finite horizon Markov decision processes
Infinite-horizon models
Discounted Markov decision problems
The expected reward criterion
Average reward criterion
Two research papers.

Model formulation

Decision epochs: either a discrete set or a continuum; either a finite or an infinite set. Corresponding problems: discrete-time or continuous-time problems; finite-horizon or infinite-horizon problems.

State and action sets: At each decision epoch, the system occupies a state s. We denote the set of possible system states by S, s ∈ S. When observing the system in state s, the decision maker may choose an action a from the set of allowable actions in state s, a ∈ A_s. The sets S and A_s can each be:
arbitrary finite sets,
arbitrary countably infinite sets,
compact subsets of finite-dimensional Euclidean space, or
non-empty Borel subsets of complete, separable metric spaces.

Rewards and transition probabilities: The decision maker receives a reward, r_t(s, a), as a result of choosing action a in state s at decision epoch t. The system state at the next decision epoch is determined by the probability distribution p_t(· | s, a). The expected value at decision epoch t is expressed as: (formula not preserved in the source)

We refer to the collection of objects {T, S, A_s, p_t(· | s, a), r_t(s, a)} as a Markov decision process. "Markov" is used because the transition probability and reward functions depend on the past only through the current state of the system and the action selected by the decision maker in that state.
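The expected-value formula referred to above did not survive in this copy of the slides. A minimal sketch of one common convention follows, assuming the reward may also depend on the next state j, written r_t(s, a, j), so that r_t(s, a) = Σ_j r_t(s, a, j) · p_t(j | s, a). This convention is an assumption, not something the slides confirm. The class name `FiniteMDP`, the function `build_toy_mdp`, and all states, actions, and numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str
Key = Tuple[int, State, Action]  # (decision epoch t, state s, action a)


@dataclass
class FiniteMDP:
    """Hypothetical container for the five elements {T, S, A_s, p_t, r_t}."""
    epochs: List[int]                    # T: decision epochs
    states: List[State]                  # S: possible system states
    actions: Dict[State, List[Action]]   # A_s: allowable actions in state s
    p: Dict[Key, Dict[State, float]]     # p[(t, s, a)][j] = p_t(j | s, a)
    r_next: Dict[Key, Dict[State, float]]  # r_next[(t, s, a)][j] = r_t(s, a, j) (assumed form)

    def expected_reward(self, t: int, s: State, a: Action) -> float:
        """Return r_t(s, a) = sum over j of r_t(s, a, j) * p_t(j | s, a)."""
        dist = self.p[(t, s, a)]
        rewards = self.r_next[(t, s, a)]
        return sum(prob * rewards[j] for j, prob in dist.items())


def build_toy_mdp() -> FiniteMDP:
    """Build a two-state, two-epoch example with made-up numbers."""
    epochs = [0, 1]
    states = ["empty", "stocked"]
    actions = {"empty": ["wait", "order"], "stocked": ["wait"]}
    # The same transition law and rewards are reused at every epoch here.
    base_p = {
        ("empty", "wait"): {"empty": 1.0},
        ("empty", "order"): {"empty": 0.25, "stocked": 0.75},
        ("stocked", "wait"): {"empty": 0.5, "stocked": 0.5},
    }
    base_r = {
        ("empty", "wait"): {"empty": 0.0},
        ("empty", "order"): {"empty": -2.0, "stocked": 2.0},
        ("stocked", "wait"): {"empty": 4.0, "stocked": 0.0},
    }
    p = {(t, s, a): dict(d) for t in epochs for (s, a), d in base_p.items()}
    r_next = {(t, s, a): dict(d) for t in epochs for (s, a), d in base_r.items()}
    return FiniteMDP(epochs, states, actions, p, r_next)


if __name__ == "__main__":
    mdp = build_toy_mdp()
    # Expected reward of ordering while empty at epoch 0: 0.25 * (-2) + 0.75 * 2
    print(mdp.expected_reward(0, "empty", "order"))
```

Keying the dictionaries by (t, s, a) mirrors the Markov property stated above: the next-state distribution and the reward are looked up from the current epoch, state, and action only, with no reference to earlier history.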
Decision rules

A decision rule prescribes a procedure for action selection in each state at a specified decision epoch.
Markovian and deterministic decision rules, d_t: S → A_s;
Deterministic and history-dependent decision rules;
Markovian and randomized decision rules;
History-dependent and randomized decision rules.
Under what conditions is it optimal to use a deterministic Markovian decision rule at each stage?

Finite horizon Markov Decision Processes

(Existence) Assume S is finite or countable, and that
A_s is finite for each s ∈ S, or
A_s is compact, r_t(s, a) is continuous in a for each s ∈ S, there exists an M < ∞ for which |r_t(s, a)| ≤ M for all a ∈ A_s and s ∈ S, and p_t(s, a) is continuou