Feature-Rich Memory-Based Classification for Shallow NLP and Information Extraction

Jakub Zavrel¹ and Walter Daelemans²

¹ Textkernel BV, Nieuwendammerkade 28/a17, 1022 AB, Amsterdam, The Netherlands
zavrel@textkernel.nl
² CNTS, University of Antwerp, Universiteitsplein 1, Building A, B-2610 Antwerpen,
Belgium
walter.daelemans@uia.ua.ac.be

Abstract. Memory-Based Learning (MBL) is based on the storage of all available training data, and similarity-based reasoning for handling new cases. By interpreting tasks such as POS tagging and shallow parsing as classification tasks, the advantages of MBL (implicit smoothing of sparse data, automatic integration and relevance weighting of information sources, handling exceptional data) contribute to state-of-the-art accuracy. However, Hidden Markov Models (HMM) typically achieve higher accuracy than MBL (and other Machine Learning approaches) for tasks such as POS tagging and chunking. In this paper, we investigate how the advantages of MBL, such as its potential to integrate various sources of information, come to play when we compare our approach to HMMs on two Information Extraction (IE) datasets: the well-known Seminar Announcement data set and a new German Curriculum Vitae data set.

1 Memory-Based Language Processing

Memory-Based Learning (MBL) is a supervised classification-based learning method. A vector of feature values (an instance) is associated with a class by a classifier that lazily extrapolates from the most similar set (nearest neighbors) selected from all stored training examples. This is in contrast to eager learning methods like decision tree learning [26], rule induction [9], or Inductive Logic Programming [7], which abstract a generalized structure from the training set beforehand (forgetting the examples themselves), and use that to derive a classification for a new instance. In MBL, a distance metric on the feature space defines what are the nearest neighbors of an instance. Metrics with feature weights based on information theory or other relevance statistics allow us to use rich representations of instances and their context, and to balance the influences of diverse information sources in computing distance.

Natural Language Processing (NLP) tasks typically concern the mapping of an input representation (e.g., a series of words) int