ADVANCED METHODS FOR RECORD LINKAGE 940920

William E. Winkler*, Bureau of the Census, Washington DC 20233-9100, bwinkler@census.gov

KEY WORDS: string comparator, assignment algorithm, EM algorithm, latent class

Record linkage, or computer matching, is needed for the creation and maintenance of name and address lists that support operations for and evaluations of a Year 2000 Census. This paper describes three advances. The first is an enhanced method of string comparison for dealing with typographical variations and scanning errors. It improves upon string comparators in computer science. The second is a linear assignment algorithm that can use only 0.002 as much storage as existing algorithms in operations research, requires at most an additional 0.03 increase in time, and has less of a tendency to make erroneous matching assignments than existing sparse-array algorithms because of how it deals with most arcs. The third is an expectation-maximization algorithm for estimating parameters in latent class, loglinear models of the type arising in record linkage. The associated theory and software are the only known means of dealing with general interaction patterns and allow weak use of a priori information via a generalization to the MCECM algorithm of Meng and Rubin. Models assuming that interactions are conditionally independent given the class are typically considered in biostatistics and social science.

Record linkage, or computer matching, is a means of creating, updating, and unduplicating lists that may be used in surveys. It serves as a means of linking individual records via name and address information from differing administrative files. If the files are linked using proper mathematical models, then the files can be analyzed using statistical methods such as regression and loglinear models (Scheuren and Winkler 1993). Modern record linkage represents a collection of methods from three different disciplines: computer science, statistics, and operations research. While the foundations are from statistics, beginning with the seminal work of Newcombe (Newcombe et al. 1959, also Newcombe 1988) and Fellegi and Sunter (1969), the means of implementing the methods have primarily involved computer science. Record linkage begins with highly evolved software for parsing and standardizing names and addresses t