Generalized Model Selection For Unsupervised Learning In High Dimensions

Shivakumar Vaithyanathan
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95136
Shiv@almaden.ibm.com

Byron Dom
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95136
dom@almaden.ibm.com

Abstract

We describe a Bayesian approach to model selection in unsupervised learning that determines both the feature set and the number of clusters. We then evaluate this scheme (based on marginal likelihood) and one based on cross-validated likelihood. For the Bayesian scheme we derive a closed-form solution of the marginal likelihood by assuming appropriate forms of the likelihood function and prior. Extensive experiments compare these approaches, and all results are verified by comparison against ground truth. In these experiments the Bayesian scheme using our objective function gave better results than cross-validation.
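This excerpt does not state which likelihood and prior make the marginal likelihood closed-form. As an illustration only, the sketch below assumes a multinomial likelihood over term counts with a conjugate Dirichlet prior, a standard pairing whose parameters integrate out exactly: log p(n | α) = log Γ(A) - log Γ(A + N) + Σ_j [log Γ(α_j + n_j) - log Γ(α_j)], where A = Σ_j α_j and N = Σ_j n_j. The function names and the hard-partition score are hypothetical and are not the authors' objective. The score ignores mixing weights and any prior over partitions.

```python
from math import lgamma


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of a token sequence with the given term counts,
    after integrating out multinomial parameters theta ~ Dirichlet(alpha).

    The multinomial coefficient is omitted: this is the probability of one
    particular ordering of the tokens.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    a_total = sum(alpha)
    n_total = sum(counts)
    result = lgamma(a_total) - lgamma(a_total + n_total)
    for n_j, a_j in zip(counts, alpha):
        result += lgamma(a_j + n_j) - lgamma(a_j)
    return result


def log_evidence_of_partition(docs, labels, alpha):
    """Hypothetical score for a hard clustering (assumption of this sketch):
    the documents of each cluster share one multinomial with its own
    Dirichlet(alpha) prior, so the score is a sum of closed-form terms.
    Mixing weights and any prior over partitions are left out."""
    pooled = {}
    for doc, k in zip(docs, labels):
        acc = pooled.setdefault(k, [0] * len(doc))
        for j, c in enumerate(doc):
            acc[j] += c
    return sum(log_dirichlet_multinomial(c, alpha) for c in pooled.values())


if __name__ == "__main__":
    docs = [[5, 0], [4, 1], [0, 5], [1, 4]]
    alpha = [1.0, 1.0]
    one = log_evidence_of_partition(docs, [0, 0, 0, 0], alpha)
    two = log_evidence_of_partition(docs, [0, 0, 1, 1], alpha)
    print(f"K=1: {one:.2f}  K=2: {two:.2f}")
```

On this toy data the two-cluster partition scores about -9.40, compared with about -15.17 for a single cluster. For two identical documents such as [1, 1], the single cluster wins instead. This is the Occam effect that comes from integrating out the parameters rather than fitting them.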
1 Introduction

Recent efforts define the model selection problem as one of estimating the number of clusters [10, 17]. It is easy to see, particularly in applications with a large number of features, that various choices of feature subsets will reveal different structures underlying the data. It is our contention that this interplay between the feature subset and the number of clusters is essential to provide appropriate views of the data. We thus define the problem of model selection in clustering as selecting both the number of clusters and the feature subset. Towards this end we propose a unified objective function whose arguments include both the feature space and the number of clusters. We then describe two approaches to model selection using this objective function. The first approach is based on a Bayesian scheme that uses the marginal likelihood for model selection. The second approach is based on a scheme that uses cross-validated likelihood. In Section 3 we apply these approaches to document clustering by making assumptions about the document generation model. Further, for the Bayesian approach we derive a closed-form solution for the marginal likelihood using this document generation model. We also describe a heuristic for initial feature selection based on the distributional clustering of terms. Section 5 describes the experiments and our approach to validating the proposed models and algorithms. Section 6 reports and discusses …
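This excerpt does not say how the second scheme cross-validates the likelihood. The sketch below assumes a mixture of multinomials over a fixed feature subset. The mixture is fitted by EM on all but one fold and scored by the log-likelihood of the held-out documents. Multinomial coefficients are dropped because they do not depend on K. All function names, the fold split (document i goes to fold i % folds), the deterministic initialization and the smoothing constant are illustrative assumptions. The joint search over feature subsets is not reproduced here.

```python
import math


def fit_multinomial_mixture(docs, k, iters=50, smoothing=0.01):
    """EM for a mixture of k multinomials over term-count vectors.

    Initialization (an assumption of this sketch): document i starts fully
    assigned to cluster i % k.
    """
    v = len(docs[0])
    resp = [[1.0 if c == i % k else 0.0 for c in range(k)] for i in range(len(docs))]
    for _ in range(iters):
        # M-step: mixing weights (floored to avoid log(0)) and smoothed
        # per-cluster term distributions.
        weights = [(sum(r[c] for r in resp) + 1e-6) / (len(docs) + k * 1e-6)
                   for c in range(k)]
        thetas = []
        for c in range(k):
            totals = [smoothing] * v
            for doc, r in zip(docs, resp):
                for j, n in enumerate(doc):
                    totals[j] += r[c] * n
            s = sum(totals)
            thetas.append([t / s for t in totals])
        # E-step: posterior responsibility of each cluster for each document,
        # normalized with the log-sum-exp trick.
        resp = []
        for doc in docs:
            logs = [math.log(weights[c])
                    + sum(n * math.log(thetas[c][j]) for j, n in enumerate(doc))
                    for c in range(k)]
            m = max(logs)
            exps = [math.exp(x - m) for x in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
    return weights, thetas


def log_likelihood(doc, weights, thetas):
    """Mixture log-likelihood of one document (log-sum-exp over clusters)."""
    logs = [math.log(w) + sum(n * math.log(t[j]) for j, n in enumerate(doc))
            for w, t in zip(weights, thetas)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))


def cross_validated_log_likelihood(docs, k, folds=2):
    """Sum of held-out log-likelihoods over all folds."""
    total = 0.0
    for f in range(folds):
        train = [d for i, d in enumerate(docs) if i % folds != f]
        test = [d for i, d in enumerate(docs) if i % folds == f]
        weights, thetas = fit_multinomial_mixture(train, k)
        total += sum(log_likelihood(d, weights, thetas) for d in test)
    return total


if __name__ == "__main__":
    docs = [[8, 2, 0, 0], [7, 3, 0, 0], [0, 0, 8, 2], [0, 0, 7, 3],
            [9, 1, 0, 0], [6, 4, 0, 0], [0, 0, 9, 1], [0, 0, 6, 4]]
    scores = {k: cross_validated_log_likelihood(docs, k) for k in (1, 2, 3)}
    print(scores, "best K:", max(scores, key=scores.get))
```

The deterministic i % k initialization keeps the run reproducible. A practical version would use several random restarts and keep the fit with the highest training likelihood.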