CPSC540
Bayesian optimization, bandits and Thompson sampling
Nando de Freitas
February 2013

Multi-armed bandit problem
[Figure label: "money!"]

Multi-armed bandit problem
Actions, Reward(s), Sequence of trials
• Trade-off between Exploration and Exploitation
• Regret = Player reward - Reward of best action

[Figure: acquisition function plotted against the parameter]

Exploration-exploitation tradeoff
Recall the expressions for GP prediction:
We should choose the next point x where the mean is high (exploitation) and the variance is high (exploration). We could balance this tradeoff with an acquisition function as follows:

Acquisition functions
An acquisition function: Probability of Improvement

People as Bayesian reasoners

Bayes and decision theory
Utilitarian view: We need models to make the right decisions under uncertainty. Inference and decision making are intertwined.
Learned posterior:
P(x = healthy | data) = 0.9
P(x = cancer | data) = 0.1
Cost/Reward model: u(x, a)
We choose the action that maximizes the expected utility, or equivalently, which minimizes the expected cost.
$\mathrm{EU}(a) = \sum_x u(x, a)\, P(x \mid \text{data})$
EU(a = treatment) =
EU(a = no treatment) =

An expected utility criterion
At iteration n+1, choose the point that minimizes the distance to the objective evaluated at the maximum x*:
We don't know the true objective at the maximum. To overcome this, Mockus proposed the following acquisition function:

Expected improvement
For this acquisition, we can obtain an analytical expression:

A third criterion: GP-UCB
Define the regret and cumulative regret as follows:
The GP-UCB criterion is as follows:
Beta is set using a simple concentration bound: [Srinivas et al, 2010]

A fourth criterion: Thompson sampling

Acquisition functions
Portfolios of acquisition functions help

Why Bayesian Optimization works

Intelligent user interfaces

Example: Tuning NP hard problem solvers

Why random tuning works sometimes

Example: Tuning random forests

Example: Tuning hybrid Monte Carlo

The games industry, rich in sophisticated large-scale simulators, is a great environment for the design and study of automatic decision making systems.

Hierarchical policy example
– High-level model-based learning for deciding when to navigate, park, pick up and drop off passengers.
– Mid-level active path learning for navigating a topological map.
– Low-level active policy optimize
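
The slides name three acquisition criteria that trade off a high posterior mean against a high posterior variance: probability of improvement, expected improvement and GP-UCB. The sketch below shows one common textbook form of each, written against a posterior mean `mu` and standard deviation `sigma` at a single candidate point. These are assumptions for illustration, not the exact expressions from the slides: the function names, the offset `xi`, the fixed `beta`, and the candidate values are all hypothetical, and the objective is taken to be maximized.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best, xi=0.0):
    """Probability that the value at this point exceeds best + xi."""
    if sigma <= 0.0:
        return 1.0 if mu > best + xi else 0.0
    return norm_cdf((mu - best - xi) / sigma)


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected amount by which the value at this point exceeds best + xi."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    return gain * norm_cdf(z) + sigma * norm_pdf(z)


def gp_ucb(mu, sigma, beta):
    """Upper confidence bound: mean plus sqrt(beta) standard deviations."""
    return mu + math.sqrt(beta) * sigma


# Hypothetical posterior summaries (mean, standard deviation) at three
# candidate points, and a hypothetical best observed value so far.
candidates = {"a": (1.0, 0.1), "b": (0.8, 0.6), "c": (0.2, 1.5)}
best_so_far = 0.9


def pick(score):
    """Return the candidate name with the highest acquisition score."""
    return max(candidates, key=lambda name: score(*candidates[name]))


next_by_pi = pick(lambda m, s: probability_of_improvement(m, s, best_so_far))
next_by_ei = pick(lambda m, s: expected_improvement(m, s, best_so_far))
next_by_ucb = pick(lambda m, s: gp_ucb(m, s, beta=4.0))
```

With these made-up numbers, probability of improvement prefers the confident candidate whose mean is just above the incumbent, while expected improvement and the upper confidence bound prefer the most uncertain candidate. This illustrates how the choice of criterion shifts the balance between exploitation and exploration.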
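
Thompson sampling is listed in the slides as a fourth criterion, and the opening slides frame the problem as a multi-armed bandit with a regret measure. As a minimal sketch of the idea in the simplest bandit setting, the code below assumes Bernoulli rewards with a Beta(1, 1) prior on each arm, rather than the GP setting of the lecture. The function name `thompson_bernoulli`, the arm probabilities and the seed are hypothetical. Regret is accumulated here as the gap between the best arm's mean reward and the chosen arm's mean reward, a sign convention chosen so that it is non-negative.

```python
import random


def thompson_bernoulli(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return pull counts and regret."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms     # observed successes per arm
    losses = [0] * n_arms   # observed failures per arm
    pulls = [0] * n_arms
    best_mean = max(true_probs)
    regret = 0.0
    for _ in range(n_rounds):
        # Draw one sample from each arm's Beta posterior.
        samples = [
            rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)
        ]
        # Act greedily with respect to the sampled values.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
        regret += best_mean - true_probs[arm]
    return pulls, regret


# Hypothetical three-armed bandit; the third arm is the best one.
pulls, regret = thompson_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
```

Because each arm is chosen with the probability that it is currently believed to be best, exploration fades on its own as the posteriors sharpen. In a run like this, most pulls should end up on the best arm, and the cumulative regret should stay well below what uniform random play would incur.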