The Item-Set Tree: A Data Structure for Data Mining*

Alaaeldin Hafez, Jitender Deogun, and Vijay V. Raghavan

* This research was supported in part by the U.S. Department of Energy, Grant No. DE-FG02-97ER1220, and by the Army Research Office, Grant No. DAAH04-96-1-0325, under the DEPSCoR program of the Advanced Research Projects Agency

Abstract. Enhancements in data capturing technology have led to exponential growth in the amounts of data being stored in information systems. This growth in turn has motivated researchers to seek new techniques for extracting knowledge that is implicit or hidden in the data. In this paper, we motivate the need for an incremental data mining approach based on a data structure called the item-set tree. The proposed approach is shown to be effective for solving problems related to the efficiency of handling data updates, the accuracy of data mining results, processing input transactions, and answering user queries. We present efficient algorithms to insert transactions into the item-set tree and to count frequencies of itemsets for queries about the strength of association among items. We prove that the expected complexity of inserting a transaction is $\approx O(1)$, and that of frequency counting is $O(n)$, where $n$ is the cardinality of the domain of items.

1 Introduction

Association mining, which discovers dependencies among values of an attribute, was introduced by Agrawal et al. [1] and has emerged as a prominent research area. The association mining problem, also referred to as the market basket problem, can be formally defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items and $S = \{s_1, s_2, \ldots, s_m\}$ be a set of transactions, where each transaction $s_i \in S$ is a set of items, that is, $s_i \subseteq I$. An association rule, denoted by $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$, describes the existence of a relationship between the two itemsets $X$ and $Y$.

Several measures have been introduced to define the strength of the relationship between itemsets $X$ and $Y$, such as support, confidence, and interest. The definitions of these measures, from a probabilistic model, are given below.

I. Support$(X \Rightarrow Y) = P(X, Y)$, or the percentage of transactions in the database that contain both $X$ and $Y$.
II. Confidence$(X \Rightarrow Y) = P(X, Y) / P(X)$, or the percentage of transactions containing $Y$ among those transactions that contain $X$.
III. Interest$(X \Rightarrow Y) = P(X, Y) / (P(X) P(Y))$, which represents a test of statistical independence.
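The support, confidence, and interest measures defined above can be illustrated with a small sketch. It estimates each probability by a brute-force scan over a list of transactions, which is only a baseline for checking the definitions and is not the item-set tree algorithm of this paper. The function names (`probability`, `support`, `confidence`, `interest`) and the sample transactions are assumptions made for illustration.

```python
from typing import AbstractSet, Sequence


def probability(itemset: AbstractSet[str],
                transactions: Sequence[AbstractSet[str]]) -> float:
    """Estimate P(itemset): the fraction of transactions containing every item."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)


def support(x: AbstractSet[str], y: AbstractSet[str],
            transactions: Sequence[AbstractSet[str]]) -> float:
    """Support(X => Y) = P(X, Y)."""
    return probability(x | y, transactions)


def confidence(x: AbstractSet[str], y: AbstractSet[str],
               transactions: Sequence[AbstractSet[str]]) -> float:
    """Confidence(X => Y) = P(X, Y) / P(X)."""
    p_x = probability(x, transactions)
    if p_x == 0:
        raise ValueError("X does not occur in any transaction")
    return support(x, y, transactions) / p_x


def interest(x: AbstractSet[str], y: AbstractSet[str],
             transactions: Sequence[AbstractSet[str]]) -> float:
    """Interest(X => Y) = P(X, Y) / (P(X) * P(Y)); equals 1 under independence."""
    p_x = probability(x, transactions)
    p_y = probability(y, transactions)
    if p_x == 0 or p_y == 0:
        raise ValueError("X and Y must each occur in some transaction")
    return support(x, y, transactions) / (p_x * p_y)


# Hypothetical market-basket data, chosen only to exercise the definitions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(x, y, transactions))
print(confidence(x, y, transactions))
print(interest(x, y, transactions))
```

For this sample, two of the four transactions contain both items, so the support is 0.5; bread appears in three transactions, so the confidence is 2/3; and the interest is 0.5 / (0.75 × 0.75) = 8/9. An interest below 1 suggests that the two items co-occur slightly less often than they would if they were independent.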