APPLICATION OF CHINESE WORD SEGMENTATION TO TEXT FILTERING ON ENTERPRISE INTERNET WEBSITES

ABSTRACT

Monitoring Internet information is an important research problem that countries around the world are working to solve, and text is the most important form in which information spreads on the Internet. How to manage Internet text effectively has therefore become a popular research direction. In this paper, we design a web text filtering system based on Chinese word segmentation, which we call WFSonCWS (The Word Filtration System Based on Chinese Word Segmentation), and have successfully applied it to a company website with good results. Unlike traditional text filtering systems based on simple keyword matching, this approach takes the integrity of the article fully into account and judges it from the perspective of the full text, which effectively avoids several major shortcomings of traditional filters.

First, traditional methods identify keywords in a largely one-sided way: because they match keywords as bare character strings rather than as segmented words, they produce a large number of false positives. Segmenting the full text into words solves this problem effectively.
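The first shortcoming can be illustrated with a small sketch. The abstract does not say which segmentation algorithm WFSonCWS uses, so the code below assumes forward maximum matching (FMM), a common dictionary-based method, over a hypothetical toy lexicon. The blocked term is likewise a placeholder, chosen only because its two characters straddle a word boundary in the sample sentence: a plain substring check flags it, while matching against segmented words does not.

```python
# Hypothetical toy lexicon; a real filter would load a full dictionary.
LEXICON = {"今日", "本店", "特价", "日本", "商品"}
# Hypothetical blocked-term list, chosen only to show the word-boundary problem.
BLOCKED = {"日本"}


def fmm_segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon
    word that matches, falling back to a single character."""
    max_len = max(len(w) for w in lexicon)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def substring_hits(text, blocked):
    """Traditional filter: flag a term wherever its characters appear."""
    return sorted(t for t in blocked if t in text)


def token_hits(text, blocked, lexicon):
    """Segmentation-based filter: flag a term only when it is a whole word."""
    return sorted(set(fmm_segment(text, lexicon)) & blocked)


if __name__ == "__main__":
    sample = "今日本店特价"  # "Special offers in our shop today"
    print(fmm_segment(sample, LEXICON))          # ['今日', '本店', '特价']
    print(substring_hits(sample, BLOCKED))       # ['日本']  <- false positive
    print(token_hits(sample, BLOCKED, LEXICON))  # []
```

FMM is only a baseline: it can mis-segment ambiguous strings, so a production filter would need the kind of ambiguity identification listed among the paper's keywords.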
Second, the filter makes its decision on the basis of a full analysis of the text, replacing the traditional mechanism of "block only what fails a check" with "pass only what qualifies". This effectively handles the deformed words and altered spellings commonly used to evade keyword filters.

Third, the fundamental defect of traditional keyword filters is that they judge words out of context, so their audit results are absolute, which does not match how text is actually expressed. With word segmentation, we can process the full text of an article and then decide whether it is acceptable from the commendatory and derogatory words it contains and from its sensitivity characteristics. This gives the audit filter a degree of "flexibility" and makes it more practical.

In addition, this paper proposes learning new words from the Internet, which offers an effective approach and research ideas for the new-word identification problem in Chinese word segmentation.

Keywords: Text Filter, Chinese word segmentation, Ambiguity Identification, New word recognition, Full analysis
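The second and third points, handling deformed spellings and judging a sensitive word in context, can be sketched together. The abstract gives no formula, so everything below is a hypothetical illustration: the lexicon, the word lists, the weights, the 50% adjustment per stance word, and the pass/review/reject thresholds are all made up. The text is first normalized by deleting inserted symbols and spaces (one common evasion trick), then segmented with forward maximum matching (an assumed algorithm), and the summed sensitivity weight is raised by commendatory words and lowered by derogatory ones.

```python
import re

# Every list, weight, and threshold below is a hypothetical placeholder.
LEXICON = {"坚决", "打击", "抵制", "危害", "网上", "赌博", "诈骗",
           "欢迎", "加入", "推荐", "精彩"}
SENSITIVITY = {"赌博": 2.0, "诈骗": 2.0}  # sensitive word -> weight
COMMENDATORY = {"欢迎", "推荐", "精彩"}   # praise: raises the risk score
DEROGATORY = {"打击", "抵制", "危害"}     # condemnation: lowers the risk score

# Runs of spaces, punctuation, symbols, or underscores, e.g. the "*" in "赌*博".
NOISE = re.compile(r"[\W_]+")


def normalize(text):
    """Delete characters commonly inserted to split a word and evade filters."""
    return NOISE.sub("", text)


def fmm_segment(text, lexicon):
    """Forward maximum matching with a single-character fallback."""
    max_len = max(len(w) for w in lexicon)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in lexicon:
                tokens.append(text[i:i + size])
                i += size
                break
    return tokens


def risk_score(tokens):
    """Sum sensitivity weights, then adjust by 50% per stance word (hypothetical rule)."""
    base = sum(SENSITIVITY.get(t, 0.0) for t in tokens)
    if base == 0.0:
        return 0.0
    praise = sum(t in COMMENDATORY for t in tokens)
    blame = sum(t in DEROGATORY for t in tokens)
    return max(0.0, base * (1 + 0.5 * (praise - blame)))


def judge(text, low=1.5, high=3.0):
    """Return 'pass', 'review', or 'reject' for a piece of text."""
    score = risk_score(fmm_segment(normalize(text), LEXICON))
    if score < low:
        return "pass"
    if score >= high:
        return "reject"
    return "review"


if __name__ == "__main__":
    print(judge("坚决打击网上赌博"))  # condemns online gambling -> pass
    print(judge("网上赌博"))          # no stance words -> review
    print(judge("欢迎加入网上赌博"))  # promotes online gambling -> reject
    print(judge("网上赌*博推荐"))     # split spelling is normalized -> reject
```

A three-way outcome with a manual-review band is one way to express the "flexibility" the abstract describes; the paper itself does not specify its decision tiers.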
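The abstract proposes learning new words from the Internet but does not describe the method. As a stand-in, the sketch below uses a simple, commonly used heuristic: after segmenting crawled text with the current lexicon (forward maximum matching is assumed), runs of two or more unknown single characters are treated as candidate new words and kept if they recur across documents. The lexicon, the sample corpus, and both thresholds are hypothetical.

```python
from collections import Counter

# Hypothetical lexicon and corpus standing in for pages crawled from the web.
LEXICON = {"我们", "喜欢", "使用", "手机", "发布", "新款", "的", "了"}
CORPUS = ["我们喜欢使用微信", "新款微信发布了", "手机里的微信"]


def fmm_segment(text, lexicon):
    """Forward maximum matching with a single-character fallback."""
    max_len = max(len(w) for w in lexicon)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in lexicon:
                tokens.append(text[i:i + size])
                i += size
                break
    return tokens


def candidate_new_words(corpus, lexicon, min_count=2, max_word_len=4):
    """Collect runs of unknown single characters left by segmentation and
    keep those that occur at least `min_count` times across the corpus."""
    counts = Counter()
    for doc in corpus:
        run = []
        # The trailing None acts as a sentinel that flushes the final run.
        for tok in fmm_segment(doc, lexicon) + [None]:
            if (tok is not None and len(tok) == 1
                    and tok not in lexicon and tok.isalnum()):
                run.append(tok)
                continue
            if 2 <= len(run) <= max_word_len:
                counts["".join(run)] += 1
            run = []
    return sorted(w for w, c in counts.items() if c >= min_count)


if __name__ == "__main__":
    print(candidate_new_words(CORPUS, LEXICON))  # ['微信']
```

Frequency alone also surfaces recurring fragments that are not words, so candidates found this way would normally be reviewed or checked with further statistics before being added to the lexicon.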