Microsoft Lecture: Evaluation in IR (Evaluation Methods for Information Retrieval)
Evaluation in Information Retrieval
Zhicheng Dou
Web Search and Data Mining Group, MSR Asia

Outline
• Basic IR Evaluation Metrics
• IR Evaluation Methodology
• Standard Test Collections
• Evaluation Based on Implicit Feedback

Motivating Examples
• Which set is better?
– S1 = {r, r, r, n, n} vs. S2 = {r, r, n, n, n}
– S3 = {r} vs. S4 = {r, r, n}
• Which ranking list is better?
– L1 = … vs. L2 = …
– L3 = … vs. L4 = …
– r: relevant, n: non-relevant, h: highly relevant

Precision & Recall
• Precision is the fraction of the retrieved documents that are relevant
  $\text{Precision} = \frac{|R_a|}{|A|}$
• Recall is the fraction of the relevant documents that are retrieved
  $\text{Recall} = \frac{|R_a|}{|R|}$
[Figure: Venn diagram of R (relevant documents), A (retrieved documents), and their intersection R_a]
• Example 3: L1 = … vs. L2 = …
– P = 0.6, r = 0.3, for both ranked lists
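The set-based definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `precision_recall`, the document ids, and both example lists are hypothetical, chosen only so that the numbers match the P = 0.6, r = 0.3 of Example 3 (whose actual lists did not survive extraction). It shows why two lists holding the same documents in a different order get identical scores.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall.

    retrieved: iterable of document ids returned by the system (A)
    relevant:  iterable of document ids judged relevant (R)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # R_a: relevant AND retrieved
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: 10 relevant documents, 5 retrieved, 3 of them relevant.
relevant = {f"d{i}" for i in range(1, 11)}
list_a = ["d1", "d2", "d3", "x1", "x2"]  # relevant documents ranked first
list_b = ["x1", "x2", "d1", "d2", "d3"]  # same documents, relevant ones last

print(precision_recall(list_a, relevant))
print(precision_recall(list_b, relevant))
```

Both calls return the same pair because the rank order is discarded when the lists are converted to sets, which is the limitation that rank-aware metrics are meant to address.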
• Users usually scan the results from top to bottom

Average Precision (AP)

Other Metrics based on Binary Judgments

Metrics based on Graded Relevance
• Example 4: L3 = … vs. L4 = …
– r: relevant, n: non-relevant, h: highly relevant
– Which ranking list is better?
• Normalized Discounted Cumulated Gains (NDCG)
• Two assumptions about a ranked result list
– Usefulness: highly relevant documents > marginally relevant documents > irrelevant documents
– Highly relevant documents are more useful when appearing earlier in a result list

CG

DCG

nDCG

Other Metrics based on Graded Judgments

Incomplete Judgments
• Unjudged = Irrelevant
• bpref
– R: the number of judged relevant documents
– N: the number of judged irrelevant documents
– r: a relevant retrieved document
– n: a member of the first R judged irrelevant documents as retrieved by the system
– bpref is the inverse of the fraction of judged irrelevant documents that are retrieved before relevant ones

Evaluating Result Diversity
• Search Result Diversification: cover major intents in the top results to accommodate different user needs for multi-intent queries

IR Evaluation Metrics: Summary
• Ad-hoc, binary judgments
– Set: Precision, Recall
– Ranked list: AP/MAP, R-Recall, P@10, RR/MRR
• Ad-hoc, graded judgments
– nDCG@10, RBP, ERR, GAP
• Diversity, binary judgments
– α-nDCG, MAP-IA
• Diversity, graded judgments
– nDCG-IA

Questions?
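The formulas on the nDCG and bpref slides did not survive extraction, so the following is only a sketch under stated assumptions, not the lecture's own definitions. For nDCG it assumes gain values of 2, 1, and 0 for h, r, and n (the slides give no numbers), a log2(rank + 1) discount, and an ideal list built by re-sorting the same labels. For bpref it assumes the variant that divides by min(R, N) and uses a hypothetical label 'u' for unjudged documents. All function and variable names are illustrative.

```python
import math

# Assumed gain values for the slide's labels; not given in the source.
GAIN = {"h": 2, "r": 1, "n": 0}


def dcg(labels):
    """Discounted cumulated gain with a log2(rank + 1) discount (one common variant)."""
    return sum(GAIN[label] / math.log2(rank + 1)
               for rank, label in enumerate(labels, start=1))


def ndcg(labels):
    """DCG divided by the DCG of the same labels sorted best-first."""
    ideal = sorted(labels, key=GAIN.get, reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(labels) / ideal_dcg if ideal_dcg > 0 else 0.0


def bpref(ranking, num_rel, num_nonrel):
    """bpref computed over judged documents only (min(R, N) variant, an assumption).

    ranking:    labels in rank order; 'r' = judged relevant,
                'n' = judged irrelevant, 'u' = unjudged (skipped)
    num_rel:    R, the number of judged relevant documents
    num_nonrel: N, the number of judged irrelevant documents
    """
    if num_rel == 0:
        return 0.0
    denom = min(num_rel, num_nonrel)
    nonrel_seen = 0  # judged irrelevant documents ranked above the current one
    total = 0.0
    for label in ranking:
        if label == "n":
            nonrel_seen += 1
        elif label == "r":
            if denom > 0:
                # Only the first R judged irrelevant documents are counted.
                total += 1.0 - min(nonrel_seen, num_rel) / denom
            else:
                total += 1.0
    return total / num_rel


print(ndcg(["h", "r", "n"]), ndcg(["n", "r", "h"]))
print(bpref(["r", "n", "r", "u", "n"], num_rel=2, num_nonrel=2))
```

Under these assumptions, a list that places the highly relevant document first scores higher than the same documents in reverse order, matching the two usefulness assumptions listed for graded relevance. The bpref call ignores the unjudged document entirely instead of treating it as irrelevant.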