10 Association analysis

The detection of association rules is another descriptive method which is very popular in data mining, especially in such areas as web mining, where it is used to analyse the pages visited by a web user, and the retail industry, where it can analyse the products bought by a customer on a single visit. This explains the alternative name for this method: market basket analysis. Of course, this method can be usefully applied to other activities as well. It does not have the same theoretical difficulties as clustering and classification methods; instead, the difficulties arise from the need to process enormous volumes of data (up to several million till receipts, for example) and to pick out new and interesting associations from the overwhelming majority of irrelevant or previously known associations.

10.1 Principles

Finding association rules is a matter of finding rules of the following type: ‘If, for any one individual, variable A = x_A, variable B = x_B, and so on, then, in 80% of cases, variable Z = x_Z, and this configuration is found for 20% of the individuals.’ In other words, the aim is to find the most frequent combined values of a set of variables of a data set. In market basket analysis, the variables are the indicators
of the products, and the rules are applied to indicators equal to 1, in other words the products bought. Note that some recent research has been carried out on ‘negative’ rules, where we are interested in the products that are not bought. The value of 80% is called the index of confidence and the value of 20% is called the support index of the rule {A = x_A, B = x_B, ...} ⇒ {Z = x_Z}. The first part of the rule is called the ‘antecedent’ or ‘condition’; the second part is called the ‘consequent’ or ‘result’; and expressions of the form {A = x_A} are called ‘items’. In an association rule, an item can never be in both the condition and the result
simultaneously.

Data Mining and Statistics for Decision Making, First Edition. Stéphane Tufféry. © 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.

A rule is therefore an expression of the form:

    If Condition, then Result.

Here is an example taken from marketing (mythical, if not veracious):

    If Nappies and Saturday, then Beer.

The support index is the probability

    Prob(Condition and Result).

The confidence index is the probability

    Prob(Condition and Resul
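
As a rough illustration of the two indices, the sketch below counts them on a handful of invented baskets. The basket contents and the names `count`, `support` and `confidence` are hypothetical. The sketch also assumes the usual reading of confidence as the share of baskets satisfying the condition that also satisfy the result, in line with the ‘in 80% of cases’ wording of the rule in Section 10.1.

```python
from typing import FrozenSet, Iterable, List

# Invented till receipts (an assumption for illustration only): each basket
# is the set of items whose indicator equals 1.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"nappies", "saturday", "beer"}),
    frozenset({"nappies", "saturday", "beer", "crisps"}),
    frozenset({"nappies", "saturday", "milk"}),
    frozenset({"nappies", "milk"}),
    frozenset({"beer", "crisps"}),
]


def count(items: Iterable[str], baskets: List[FrozenSet[str]]) -> int:
    """Number of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for basket in baskets if wanted <= basket)


def support(condition: Iterable[str], result: Iterable[str],
            baskets: List[FrozenSet[str]] = BASKETS) -> float:
    """Estimate Prob(Condition and Result) as a share of all baskets."""
    both = frozenset(condition) | frozenset(result)
    return count(both, baskets) / len(baskets)


def confidence(condition: Iterable[str], result: Iterable[str],
               baskets: List[FrozenSet[str]] = BASKETS) -> float:
    """Share of baskets meeting the condition that also meet the result.

    Assumed definition: count(Condition and Result) / count(Condition).
    """
    cond, res = frozenset(condition), frozenset(result)
    if cond & res:
        # An item can never be in both the condition and the result.
        raise ValueError("condition and result must not share items")
    n_cond = count(cond, baskets)
    if n_cond == 0:
        raise ValueError("no basket satisfies the condition")
    return count(cond | res, baskets) / n_cond


if __name__ == "__main__":
    rule = (["nappies", "saturday"], ["beer"])
    print(f"support = {support(*rule):.2f}, confidence = {confidence(*rule):.2f}")
```

With these five invented baskets, the rule ‘If Nappies and Saturday, then Beer’ holds in 2 baskets out of 5, and 3 baskets satisfy the condition, so the sketch reports a support of 2/5 and a confidence of 2/3.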