An introduction to categorical data analysis Using R.pdf

ID：34816183

大小：93.76 KB

页数：38页

时间：2019-03-11

上传者：不努力梦想只是梦

An introduction to categorical data analysis Using R.pdf_第1页

An introduction to categorical data analysis Using R.pdf_第2页

An introduction to categorical data analysis Using R.pdf_第3页

An introduction to categorical data analysis Using R.pdf_第4页

An introduction to categorical data analysis Using R.pdf_第5页

资源描述：

《An introduction to categorical data analysis Using R.pdf》由会员上传分享，免费在线阅读，更多相关内容在学术论文-天天文库。

AnIntroductiontoCategoricalDataAnalysisUsingRBrettPresnellMarch28,2000 AbstractThisdocumentattemptstoreproducetheexamplesandsomeoftheexercisesinAnIntroductiontoCategor-icalDataAnalysis[1]usingtheRstatisticalprogrammingenvironment. Chapter0AboutThisDocumentThisdocumentattemptstoreproducetheexamplesandsomeoftheexercisesinAnIntroductiontoCategori-calDataAnalysis[1]usingtheRstatisticalprogrammingenvironment.NumberingandtitlesofchapterswillfollowthatofAgresti’stext,soifaparticularexample/analysisisofinterest,itshouldnotbehardtoﬁnd,assumingthatitishere.SinceRisparticularlyversatile,thereareoftenanumberofdifferentwaystoaccomplishatask,andnaturallythisdocumentcanonlydemonstratealimitednumberofpossibilities.Thereaderisurgedtoexploreotherapproachesontheirown.InthisregarditcanbeveryhelpfultoreadtheonlinedocumentationforthevariousfunctionsofR,aswellasothertutorials.ThehelpﬁlesformanyoftheRfunctionsusedherearealsoincludedintheappendixforeasyreference,buttheonlinehelpsystemisdeﬁnitelythepreferredwaytoaccessthisinformation.Itisalsoworthnotingthatasofthiswriting(early2000),Risstillverymuchunderdevelopment.Thusnewfunctionalityislikelytobecomeavailablethatmightbemoreconvenienttousethansomeoftheapproachestakenhere.OfcourseanyusercanalsowritetheirownRfunctionstoautomateanytask,sothepossibilitiesareendless.Donotbeintimidatedthough,forthisisreallythefunofusingRanditsbestfeature:youcanteachittodowhateverisneede,insteadofbeingconstrainedonlytowhatis“builtin.”ANoteontheDatasetsOfteninthisdocumentIwillshowhowtoenterthedataintoRasaparttheexample.However,mostofthedatasetsareavaiablealreadyinRformatintheRpackageforthecourse,sta4504,availablefromthecoursewebsite.AfterinstallingthelibraryonyourcomputerandstartingR,youcanlistthefunctionsanddataﬁlesavailableinthepackagebytyping>library(help=sta4504)>data(package=sta4504)YoucanmaketheﬁlesinthepackagetoyourRsessionbytyping>library(sta4504)andyoucanreadoneofthepackage’sdatasetsintoyourRsessionsimplybytyping,e.g.,>data(deathpen)1 Chapter1Introduction1.3Inferencefora(Single)ProportionThefunctionprop.test(appendixA.1.3)willcarryouttestofhypothesesandproduceconﬁdenceintervalsinproblemsinvolvingoneorseveralproportions.Intheexampleconcerningopiniononabortion,therewere424“yes”responsesoutof950subjects.Hereisonewaytouseprop.testtoanalyzethesedata:>prop.test(424,950)1-sampleproportionstestwithcontinuitycorrectiondata:424outof950,nullprobability0.5X-squared=10.7379,df=1,p-value=0.001050alternativehypothesis:truepisnotequalto0.595percentconfidenceinterval:0.41446340.4786078sampleestimates:p0.4463158Notethatbydefault:thenullhypothesis=:5istestedagainstthetwo-sidedalternative6=:5;a95%conﬁdenceintervalforiscalculated;andboththetestandtheCIincorporateacontinuitycorrection.Anyofthesedefaultscanbechanged.Thecallaboveisequivalenttoprop.test(424,950,p=.5,alternative="two.sided",conf.level=0.95,correct=TRUE)Thus,forexample,totestthenullhypothesisthat=:4versustheone-sidedalternative>:4anda99%(one-sided)CIfor,allwithoutcontinuitycorrection,justtypeprop.test(424,950,p=.4,alternative="greater",conf.level=0.99,correct=FALSE)2 Chapter2Two-WayContingencyTablesEnteringandManipulatingDataThereareanumberofwaystoentercountsforatwo-waytableintoR.Forasimpleconcreteexample,weconsiderthreedifferentwaysofenteringthe“beliefinafterlife”data.Othermethodsandtoolswillbeintroducedaswegoalong.EnteringTwo-WayTablesasaMatrixOnewayistosimplyenterthedatausingthematrixfunction(thisissimilartousingthearrayfunctionwhichwewillencounterlater).Forthe“beliefinafterlife”example,wemighttype:>afterlife<-matrix(c(435,147,375,134),nrow=2,byrow=TRUE)>afterlife[,1][,2][1,]435147[2,]375134Thingsaresomewhateasiertoreadifwenametherowsandcolumns:>dimnames(afterlife)<-list(c("Female","Male"),c("Yes","No"))>afterlifeYesNoFemale435147Male375134Wecandressthingsevenmorebyprovidingnamesfortherowandcolumnvariables:>names(dimnames(afterlife))<-c("Gender","Believer")>afterlifeBelieverGenderYesNoFemale435147Male375134Calculatingthetotalsamplesize,n,andtheoverallproportions,fpijgiseasy:>tot<-sum(afterlife)>tot[1]10913 >afterlife/totBelieverGenderYesNoFemale0.39871680.1347388Male0.34372140.1228231Tocalculatetherowandcolumntotals,ni+andn+jandtherowandcolumnproportions,pi+andp+j,onecanusetheapply(appendixA.1.1)andsweep(appendixA.1.4)functions:>rowtot<-apply(afterlife,1,sum)>coltot<-apply(afterlife,2,sum)>rowtotFemaleMale582509>coltotYesNo810281>rowpct<-sweep(afterlife,1,rowtot,"/")>rowpctBelieverGenderYesNoFemale0.74742270.2525773Male0.73673870.2632613>round(rowpct,3)BelieverGenderYesNoFemale0.7470.253Male0.7370.263>sweep(afterlife,2,coltot,"/")BelieverGenderYesNoFemale0.5370370.5231317Male0.4629630.4768683EnteringTwo-WayTablesasaDataFrameOnemightalsoputthedataintoadataframe,treatingtherowandcolumnvariablesasfactorvariables.ThisapproachisactuallybemoreconvenientwhenthedataisstoredinaseparateﬁletobereadintoR,butwewillconsideritnowanyway.>Gender<-c("Female","Female","Male","Male")>Believer<-c("Yes","No","Yes","No")>Count<-c(435,147,375,134)>afterlife<-data.frame(Gender,Believer,Count)>afterlifeGenderBelieverCount1FemaleYes4352FemaleNo1473MaleYes3754MaleNo134>rm(Gender,Believer,Count)#NolongerneededAsmentionedabove,youcanalsojustenterthedataintoatextﬁletobereadintoRusingtheread.tablecommand.Forexample,iftheﬁleafterlife.datcontainedthelines4 GenderBelieverCountFemaleYes435FemaleNo147MaleYes375MaleNo134thenthecommand>read.table("afterlife.dat",header=TRUE)wouldgetyoutothesamepointasabove.1Toextractacontingencytable(amatrixinthiscase)forthesedata,youcanusethetapply(appendixA.1.5)functioninthefollowingway:>attach(afterlife)#attachthedataframe>beliefs<-tapply(Count,list(Gender,Believer),c)>beliefsNoYesFemale147435Male134375>detach(afterlife)#candetachthedatawhenlongerneeded>names(dimnames(beliefs))<-c("Gender","Believer")>beliefsBelieverGenderNoYesFemale147435Male134375>beliefs<-beliefs[,c(2,1)]#reversethecolumns?>beliefsBelieverGenderYesNoFemale435147Male375134Atthisstage,beliefscanbemanipulatedasintheprevioussubsection.2.3ComparingProportionsinTwo-by-TwoTablesAsexplainedbythedocumentationforprop.test(appendixA.1.3),thedatamayberepresentedinseveraldifferentwaysforuseinprop.test.WewillusethematrixrepresentationofthelastsectioninexaminingthePhysician’sHealthStudyexample.>phs<-matrix(c(189,10845,104,10933),byrow=TRUE,ncol=2)>phs[,1][,2][1,]18910845[2,]10410933>dimnames(phs)<-+list(Group=c("Placebo","Aspirin"),MI=c("Yes","No"))>phs1Actually,thereisonesmalldifference:thelevelsofthefactor“Believer”willbeorderedalphabetically,andthiswillmakeasmalldifferenceinhowsomethingsarepresented.Ifyouwanttomakesurethatthelevelsofthefactorsareorderedastheyappearinthedataﬁle,youcanusetheread.table2functionprovidedinthesta4504packageforR.Orusetherelevelcommand.5 MIGroupYesNoPlacebo18910845Aspirin10410933>prop.test(phs)2-sampletestforequalityofproportionswithcontinuitycorrectiondata:phsX-squared=24.4291,df=1,p-value=7.71e-07alternativehypothesis:two.sided95percentconfidenceinterval:0.0045971340.010814914sampleestimates:prop1prop20.017128870.00942285Acontinuitycorrectionisusedbydefault,butitmakesverylittledifferenceinthisexample:>prop.test(phs,correct=F)2-sampletestforequalityofproportionswithoutcontinuitycorrectiondata:phsX-squared=25.0139,df=1,p-value=5.692e-07alternativehypothesis:two.sided95percentconfidenceinterval:0.0046877510.010724297sampleestimates:prop1prop20.017128870.00942285Youcanalsosavetheoutputofthetestandmanipulateitinvariousways:>phs.test<-prop.test(phs)>names(phs.test)[1]"statistic""parameter""p.value""estimate"[5]"null.value""conf.int""alternative""method"[9]"data.name">phs.test$estimateprop1prop20.017128870.00942285>phs.test$conf.int[1]0.0045971340.010814914attr(,"conf.level")[1]0.95>round(phs.test$conf.int,3)[1]0.0050.011attr(,"conf.level")[1]0.95>phs.test$estimate[1]/phs.test$estimate[2]%relativerisk6 prop11.8178022.4TheOddsRatioRelativeriskandtheoddsratioareeasytocalculate(youcandoitinlotsofwaysofcourse):>phs.test$estimateprop1prop20.017128870.00942285>odds<-phs.test$estimate/(1-phs.test$estimate)>oddsprop1prop20.0174273860.009512485>odds[1]/odds[2]prop11.832054>(phs[1,1]*phs[2,2])/(phs[2,1]*phs[1,2])#ascross-prodratio[1]1.832054Here’sonewaytocalculatetheCIfortheoddsratio:>theta<-odds[1]/odds[2]>ASE<-sqrt(sum(1/phs))>ASE[1]0.1228416>logtheta.CI<-log(theta)+c(-1,1)*1.96*ASE>logtheta.CI[1]0.36466810.8462073>exp(logtheta.CI)[1]1.4400362.330790Itiseasytowriteaquickanddirtyfunctiontodothesecalculationsfora22table.odds.ratio<-function(x,pad.zeros=FALSE,conf.level=0.95){if(pad.zeros){if(any(x==0))x<-x+0.5}theta<-x[1,1]*x[2,2]/(x[2,1]*x[1,2])ASE<-sqrt(sum(1/x))CI<-exp(log(theta)+c(-1,1)*qnorm(0.5*(1+conf.level))*ASE)list(estimator=theta,ASE=ASE,conf.interval=CI,conf.level=conf.level)}Thishasbeenaddedtothesta4504package.Fortheexampleabove:>odds.ratio(phs)$estimator7 [1]1.832054$ASE[1]0.1228416$conf.interval[1]1.4400422.330780$conf.level[1]0.952.5Chi-SquaredTestsofIndependenceGenderGapExampleThechisq.testfunctionwillcomputePearson’schi-squaredteststatistic(X2)andthecorrespondingP-value.Hereitisappliedtothegendergapexample:>gendergap<-matrix(c(279,73,225,165,47,191),byrow=TRUE,nrow=2)>dimnames(gendergap)<-list(Gender=c("Females","Males"),+PartyID=c("Democrat","Independent","Republican"))>gendergapPartyIDGenderDemocratIndependentRepublicanFemales27973225Males16547191>chisq.test(gendergap)Pearson’sChi-squaretestdata:gendergapX-squared=7.0095,df=2,p-value=0.03005Incaseyouareworriedaboutthechi-squaredapproximationtothesamplingdistributionofthestatistic,youcanusesimulationtocomputeanapproximateP-value(oruseanexacttest;seebelow).TheargumentB(defaultis2000)controlshowmanysimulatedtablesareusedtocomputethisvalue.Moreisbetter,buteventuallyyouwillrunoutofeithercomputememoryortime,sodon’tgetcarriedaway.ItisinterestingtodoitafewtimesthoughtoseehowstablethesimulatedP-valueis(doesitchangemuchfromruntorun).InthiscasethesimulatedP-valuesagreecloselywiththechi-squaredapproximation,suggestingthatthechi-squaredapproximationisgoodinthisexample.>chisq.test(gendergap,simulate.p.value=TRUE,B=10000)Pearson’sChi-squaretestwithsimulatedp-value(basedon10000replicates)data:gendergapX-squared=7.0095,df=NA,p-value=0.032>chisq.test(gendergap,simulate.p.value=TRUE,B=10000)Pearson’sChi-squaretestwithsimulatedp-value(basedon10000replicates)8 data:gendergapX-squared=7.0095,df=NA,p-value=0.0294AnexacttestofindependenceinIJtablesisimplementedinthefunctionﬁsher.testofthectest(classicaltests)package(thispackageisnowpartofthebasedistributionofRandisloadedautomaticallywhenanyofitsfunctionsarecalled).ThistestisjustageneralizationofFisher’sexacttestfor22ta-bles.NotethattheP-valuehereisinprettygoodagreementwiththesimulatedvaluesandthechi-squaredapproximation.>library(ctest)#thisisnotneededwithRversions>=0.99>fisher.test(gendergap)Fisher’sExactTestforCountDatadata:gendergapp-value=0.03115alternativehypothesis:two.sidedJobSatisfactionExampleForthejobsatisfactionexamplegiveninclass,therewassomeworryaboutthechi-squaredapproximationtothenulldistributionoftheteststatistic.HowevertheP-valueagainagreescloselywiththesimulatedP-valuesandP-valueforthetheexacttest:>jobsatis<-c(2,4,13,3,2,6,22,4,0,1,15,8,0,3,13,8)>jobsatis<-matrix(jobsatis,byrow=TRUE,nrow=4)>dimnames(jobsatis)<-list(+Income=c("<5","5-15","15-25",">25"),+Satisfac=c("VD","LS","MS","VS"))>jobsatisSatisfacIncomeVDLSMSVS<5241335-152622415-2501158>2503138>chisq.test(jobsatis)Pearson’sChi-squaretestdata:jobsatisX-squared=11.5243,df=9,p-value=0.2415Warningmessage:Chi-squareapproximationmaybeincorrectin:chisq.test(jobsatis)>chisq.test(jobsatis,simulate.p.value=TRUE,B=10000)Pearson’sChi-squaretestwithsimulatedp-value(basedon10000replicates)data:jobsatisX-squared=11.5243,df=NA,p-value=0.2408>fisher.test(jobsatis)9 Fisher’sExactTestforCountDatadata:jobsatisp-value=0.2315alternativehypothesis:two.sidedA”PROCFREQ”forRHereisalittleRfunctiontodosomeofthecalculationsthatSAS’sPROCFREQdoes.Thereareotherwaystogetallofthisinformation,sothemainideaissimplytoillustratehowyoucanwriteRfunctionstodothesortsofcalculationsthatyoumightﬁndyourselfdoingrepeatedly.Also,youcanalwaysgobacklaterandmodifyyourfunctionaddcapabilitiesthatyouneed.Notethatthisisjustsupposedtoserveasasimpleutilityfunction.IfIwantedittobereallynice,Iwouldwriteageneralmethodfunctionandaprintmethodfortheoutput(youcanalsoﬁndsourceforthisfunctiononthecoursewebpage)."procfreq"<-function(x,digits=4){total<-sum(x)rowsum<-apply(x,1,sum)colsum<-apply(x,2,sum)prop<-x/totalrowprop<-sweep(x,1,rowsum,"/")colprop<-sweep(x,2,colsum,"/")expected<-(matrix(rowsum)%*%t(matrix(colsum)))/totaldimnames(expected)<-dimnames(x)resid<-(x-expected)/sqrt(expected)adj.resid<-resid/sqrt((1-matrix(rowsum)/total)%*%t(1-matrix(colsum)/total))df<-prod(dim(x)-1)X2<-sum(residˆ2)attr(X2,"P-value")<-1-pchisq(X2,df)##Mustbecarefulaboutzerofreqencies.Want0*log(0)=0.tmp<-x*log(x/expected)tmp[x==0]<-0G2<-2*sum(tmp)attr(G2,"P-value")<-1-pchisq(G2,df)list(sample.size=total,row.totals=rowsum,col.totals=colsum,overall.proportions=prop,row.proportions=rowprop,col.proportions=colprop,expected.freqs=expected,residuals=resid,adjusted.residuals=adj.resid,chi.square=X2,likelihood.ratio.stat=G2,df=df)}Ifyousavethisfunctiondeﬁnitioninaﬁlecalled“procfreq.R”andthen“source”itintoR,youcanuseitjustlikeanybuilt-infunction.Hereisprocfreqinactionontheincomedata:10 >source("procfreq.R")>jobsat.freq<-procfreq(jobsatis)>names(jobsat.freq)[1]"sample.size""row.totals"[3]"col.totals""overall.proportions"[5]"row.proportions""col.proportions"[7]"expected.freqs""residuals"[9]"adjusted.residuals""chi.square"[11]"likelihood.ratio.stat""df">jobsat.freq$expectedSatisfacIncomeVDLSMSVS<50.84615382.96153813.326924.8653855-151.30769234.57692320.596157.51923115-250.92307693.23076914.538465.307692>250.92307693.23076914.538465.307692>round(jobsat.freq$adjusted.residuals,2)SatisfacIncomeVDLSMSVS<51.440.73-0.16-1.085-150.750.870.60-1.7715-25-1.12-1.520.221.51>25-1.12-0.16-0.731.51>jobsat.freq$chi.square[1]11.52426attr(,"P-value")[1]0.2414764>jobsat.freq$likelihood.ratio.stat[1]13.46730attr(,"P-value")[1]0.1425759Fisher’sExactTestAsmentionedabove,Fisher’sexacttestisimplementedasﬁsher.testinthectest(classicaltests)package.HereistheteatastingexampleinR.Notethatthedefaultistotestthetwo-sidedalternative.>library(ctest)#notneededwithRversions>=0.99>tea<-matrix(c(3,1,1,3),ncol=2)>dimnames(tea)<-+list(Poured=c("milk","tea"),Guess=c("milk","tea"))>teaGuessPouredmilkteamilk31tea13>fisher.test(tea)Fisher’sExactTestforCountDatadata:teap-value=0.485711 alternativehypothesis:trueoddsratioisnotequalto195percentconfidenceinterval:0.2117329621.9337505sampleestimates:oddsratio6.408309>fisher.test(tea,alternative="greater")Fisher’sExactTestforCountDatadata:teap-value=0.2429alternativehypothesis:trueoddsratioisgreaterthan195percentconfidenceinterval:0.3135693Infsampleestimates:oddsratio6.40830912 Chapter3Three-WayContingencyTablesTheCochran-Mantel-Haenszeltestisimplementedinthemantelhaen.testfunctionofthectestlibrary.TheDeathPenaltyExampleHereweillustratetheuseofmantelhaen.testaswellastheftablefunctiontopresentamultiwaycontigencytableina“ﬂat”format.BothoftheseareincludedinbaseRasofversion0.99.Notethatbydefaultmantelhaen.testappliesacontinuitycorrectionindoingthetest.>dp<-c(53,414,11,37,0,16,4,139)>dp<-array(dp,dim=c(2,2,2))>dimnames(dp)<-list(DeathPen=c("yes","no"),+Defendant=c("white","black"),Victim=c("white","black"))>dp,,Victim=whiteDefendantDeathPenwhiteblackyes5311no41437,,Victim=blackDefendantDeathPenwhiteblackyes04no16139>ftable(dp,row.vars=c("Victim","Defendant"),col.vars="DeathPen")DeathPenyesnoVictimDefendantwhitewhite53414black1137blackwhite016black4139>mantelhaen.test(dp)Mantel-Haenszelchi-squaretestwithcontinuitycorrection13 data:dpMantel-HaenszelX-square=4.779,df=1,p-value=0.02881>mantelhaen.test(dp,correct=FALSE)Mantel-Haenszelchi-squaretestwithoutcontinuitycorrectiondata:dpMantel-HaenszelX-square=5.7959,df=1,p-value=0.01606SmokingandLungCancerinChinaExampleThisisabiggerexamplethatusestheCochran-Mantel-Haenszeltest.Firstwewillenterthedataasa“dataframe”insteadofasanarrayasinthepreviousexample.Thisismostlyjusttodemonstrateanotherwaytodothings.>cities<-c("Beijing","Shanghai","Shenyang","Nanjing","Harbin",+"Zhengzhou","Taiyuan","Nanchang")>City<-factor(rep(cities,rep(4,length(cities))),levels=cities)>Smoker<-+factor(rep(rep(c("Yes","No"),c(2,2)),8),levels=c("Yes","No"))>Cancer<-factor(rep(c("Yes","No"),16),levels=c("Yes","No"))>Count<-c(126,100,35,61,908,688,497,807,913,747,336,598,235,+172,58,121,402,308,121,215,182,156,72,98,60,99,11,43,104,89,21,36)>chismoke<-data.frame(City,Smoker,Cancer,Count)>chismokeCitySmokerCancerCount1BeijingYesYes1262BeijingYesNo1003BeijingNoYes354BeijingNoNo615ShanghaiYesYes9086ShanghaiYesNo6887ShanghaiNoYes4978ShanghaiNoNo8079ShenyangYesYes91310ShenyangYesNo74711ShenyangNoYes33612ShenyangNoNo59813NanjingYesYes23514NanjingYesNo17215NanjingNoYes5816NanjingNoNo12117HarbinYesYes40218HarbinYesNo30819HarbinNoYes12120HarbinNoNo21521ZhengzhouYesYes18222ZhengzhouYesNo15623ZhengzhouNoYes7224ZhengzhouNoNo9825TaiyuanYesYes6026TaiyuanYesNo9914 27TaiyuanNoYes1128TaiyuanNoNo4329NanchangYesYes10430NanchangYesNo8931NanchangNoYes2132NanchangNoNo36>rm(cities,City,Smoker,Cancer,Count)#CleaningupAlternatively,wecanreadthedatadirectlyfromtheﬁlechismoke.dat.Notethatifwewant“Yes”before“No”wehavetorelevelthefactors,becauseread.tableputsthelevelsinalphabeticalorder.>chismoke<-read.table("chismoke.dat",header=TRUE)>names(chismoke)[1]"City""Smoker""Cancer""Count">levels(chismoke$Smoker)[1]"No""Yes">chismoke$Smoker<-relevel(chismoke$Smoker,c("Yes","No"))>levels(chismoke$Smoker)[1]"Yes""No">levels(chismoke$Cancer)[1]"No""Yes">chismoke$Cancer<-relevel(chismoke$Cancer,c("Yes","No"))>levels(chismoke$Cancer)[1]"Yes""No"Ifyouusethefunctionread.table2inthesta4504package,youwillnothavetorelevelthefactors.Ofcourseifyouhavethepackage,thenNow,returningtotheexample:>attach(chismoke)>x<-tapply(Count,list(Smoker,Cancer,City),c)>detach(chismoke)>names(dimnames(x))<-c("Smoker","Cancer","City")>#ftablewillbeinthenextreleaseofR>ftable(x,row.vars=c("City","Smoker"),col.vars="Cancer")CancerYesNoCitySmokerBeijingYes126100No3561ShanghaiYes908688No497807ShenyangYes913747No336598NanjingYes235172No58121HarbinYes402308No121215ZhengzhouYes182156No7298TaiyuanYes609915 No1143NanchangYes10489No2136>ni.k<-apply(x,c(1,3),sum)>ni.kCitySmokerBeijingShanghaiShenyangNanjingHarbinZhengzhouYes22615961660407710338No961304934179336170CitySmokerTaiyuanNanchangYes159193No5457>n.jk<-apply(x,c(2,3),sum)>n.jkCityCancerBeijingShanghaiShenyangNanjingHarbinZhengzhouYes16114051249293523254No16114951345293523254CityCancerTaiyuanNanchangYes71125No142125>n..k<-apply(x,3,sum)>mu11k<-ni.k[1,]*n.jk[1,]/n..k>mu11kBeijingShanghaiShenyangNanjingHarbinZhengzhou113.0000773.2345799.2830203.5000355.0000169.0000TaiyuanNanchang53.000096.5000>sum(mu11k)[1]2562.517>sum(x[1,1,])[1]2930>varn11k<-ni.k[1,]*ni.k[2,]*n.jk[1,]*n.jk[2,]/(n..kˆ2*(n..k-1))>sum(varn11k)[1]482.0612>>MH<-(sum(x[1,1,]-mu11k))ˆ2/sum(varn11k)>MH[1]280.137516 Chapter4Chapter4:GeneralizedLinearModelsSnoringandHeartDiseaseThiscoverstheexampleinSection4.2.2andalsoExercise4.2.ThereareseveralwaystoﬁtalogisticregressioninRusingtheglmfunction(moreonthisinChapter5).Inthemethodillustratedhere,theresponseinthemodelformula(e.g.,snoringinsnoringscores.a)isamatrixwhoseﬁrstcolumnisthenumberof“successes”andwhosesecondcolumnisthenumberof“failures”foreachobservedbinomial.>snoring<-+matrix(c(24,1355,35,603,21,192,30,224),ncol=2,byrow=TRUE)>dimnames(snoring)<-+list(snore=c("never","sometimes","often","always"),+heartdisease=c("yes","no"))>snoringheartdiseasesnoreyesnonever241355sometimes35603often21192always30224>scores.a<-c(0,2,4,5)>scores.b<-c(0,2,4,6)>scores.c<-0:3>scores.d<-1:4>#Fittingandcomparinglogisticregressionmodels>snoring.lg.a<-glm(snoring˜scores.a,family=binomial())>snoring.lg.b<-glm(snoring˜scores.b,family=binomial())>snoring.lg.c<-glm(snoring˜scores.c,family=binomial())>snoring.lg.d<-glm(snoring˜scores.d,family=binomial())>coef(snoring.lg.a)(Intercept)scores.a-3.86624810.3973366>coef(snoring.lg.b)(Intercept)scores.b-3.77737550.3272648>coef(snoring.lg.c)(Intercept)scores.c-3.77737550.6545295>coef(snoring.lg.d)17 (Intercept)scores.d-4.43190500.6545295>predict(snoring.lg.a,type="response")#comparetotable4.1[1]0.020507420.044295110.093054110.13243885>predict(snoring.lg.b,type="response")[1]0.022370770.042174660.078109380.14018107>predict(snoring.lg.c,type="response")[1]0.022370770.042174660.078109380.14018107>predict(snoring.lg.d,type="response")[1]0.022370770.042174660.078109380.14018107Notethatthedefaultlinkfunctionwiththebinomialfamilyisthelogitlink.Todoaprobitanalysis,sayusingtheoriginalscoresusedinTable4.1:>snoring.probit<-+glm(snoring˜scores.a,family=binomial(link="probit"))>summary(snoring.probit)Call:glm(formula=snoring˜scores.a,family=binomial(link="probit"))DevianceResiduals:[1]-0.61881.03880.1684-0.6175Coefficients:EstimateStd.ErrorzvaluePr(>|z|)(Intercept)-2.060550.07017-29.367<2e-16***scores.a0.187770.023487.9971.28e-15***---Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1(Dispersionparameterforbinomialfamilytakentobe1)Nulldeviance:65.9045on3degreesoffreedomResidualdeviance:1.8716on2degreesoffreedomAIC:26.124NumberofFisherScoringiterations:3>predict(snoring.probit,type="response")#comparewithTable4.1[1]0.019672920.045993250.095187620.13099512Thereisnoidentitylinkprovidedforthebinomialfamily,sowecannotreproducethethirdﬁtgiveninTable4.1.Thisisnotsuchagreatlossofcourse,sincelinearprobabilitymodelsarerarelyused.GroupedCrabsDataThisistheexampledoneinclass(slightlydifferentfromthatdoneinthetext.Thedataareintheﬁle“crabs.dat”(availableonthecoursewebsite)andcanbereadintoRusingtheread.tablefunction:>crabs<-read.table("crabs.dat",header=TRUE)Alternatively,thesedatacanaccesseddirectlyfromthesta4504packagebytyping18 >library(sta4504)>data(crabs)Bytheway,youwillnoticethatthesedatalookslightlydifferentfromthosegiveninTable4.2(pp.82–3)ofthetext.Thisisbecauseherethecolorcodesgofrom2to5(theyareonemorethanthecodesusedinthetext),andtheweightsarerecordedingrams,notkilograms.Forthisanalysiswewanttocreateagroupedversionofthedatabydividingthefemalecrabsintoweightcategories.Againthegroupeddataisavaiblealreadyfromthesta4504package(data(crabsgp)),buthereishowtheywerecreatedfromtherawdata,incaseyouareinterested.>attach(crabs)>table(cut(weight,+breaks=c(0,1775,2025,2275,2525,2775,3025,Inf),+dig.lab=4))(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]1631302224(2775,3025](3025,Inf]2129>grp<-cut(weight,+breaks=c(0,1775,2025,2275,2525,2775,3025,Inf),+dig.lab=4)>cases<-table(grp)>avgwt<-tapply(weight,grp,mean)>avgwt(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]1526.5621917.4842174.1672372.7272642.708(2775,3025](3025,Inf]2900.8103310.345>avgwt<-round(avgwt/1000,2)>avgwt(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]1.531.922.172.372.64(2775,3025](3025,Inf]2.903.31>totsat<-tapply(satell,grp,sum)>totsat(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]1355727861(2775,3025](3025,Inf]86140>crabsgp<-+data.frame(cbind(weight=avgwt,cases=cases,satell=totsat))>crabsgpweightcasessatell(0,1775]1.531613(1775,2025]1.923155(2025,2275]2.173072(2275,2525]2.372278(2525,2775]2.642461(2775,3025]2.902186(3025,Inf]3.312914019 >rm(cases,avgwt,totsat)#cleaningup>detach(crabs)Toﬁttheloglinearmodelonthesedata,wejustusetheglmcommand.Heretheﬁrstargumentisthemodelformulafortheﬁt,thesecondistheoffset,andthethirdspeciﬁestherandomcomponentfortheGLM.Notethelog-linkisthedefaultforthePoissonfamily.Finally“data=crabsgp”tellsthefunctionthatthedatawillbetakenfromthedataframecrabsgp.>crabsgp.glm1<-+glm(satell˜weight,offset=log(cases),family=poisson,+data=crabsgp)Theresultscanbeviewedinanumberofways.Perhapsmostimportantlythereisthesummaryfunction:>summary(crabsgp.glm1)Call:glm(formula=satell˜weight,family=poisson,data=crabsgp,offset=log(cases))DevianceResiduals:(0,1775](1775,2025](2025,2275](2275,2525]-2.1285-0.30810.65642.7036(2525,2775](2775,3025](3025,Inf]-1.67760.7307-0.6560Coefficients:EstimateStd.ErrorzvaluePr(>|z|)(Intercept)-0.78700.2229-3.530.000415***weight0.73000.08238.87<2e-16***---Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1(Dispersionparameterforpoissonfamilytakentobe1)Nulldeviance:96.312on6degreesoffreedomResidualdeviance:16.144on5degreesoffreedomAIC:61.748NumberofFisherScoringiterations:3NotethatsummaryincludestheWaldteststatistic,z=8:87andtheassociatedP-valueforthetwo-tailedtest,21016(thisisclosetothelimitsofmachineaccuracy,solet’sjustsaytheP-valueis“veryclosetozero”).Italsoshowsthedeviance(labelledresidualdeviance)anditsassociateddegreesoffreedom.The“nulldeviance”isthedevianceforthe“interceptonly”model,sothelikelihoodratiotestcanbecar-riedoutjustbydifferencingthesetwodeviancesandtheirdegreesoffreedomandreferringtoachi-squaredistribution:>96.312-16.144[1]80.168>1-pchisq(96.312-16.144,1)[1]020 Yes,that’saverysmallP-valueagain.Finally,AICistheAkaikeInformationCriterion,whichisoftenusedformodelselection(thesmallerthebetterfornestedmodels).Abetterwaytogettheresultsofthelikelihoodratiotestistousetheanovafunction,whichprintsasortofANOVAtable:>anova(crabsgp.glm1,test="Chisq")AnalysisofDevianceTableModel:poisson,link:logResponse:satellTermsaddedsequentially(firsttolast)DfDevianceResid.DfResid.DevP(>|Chi|)NULL696.312weight180.168516.1440.000Thereisanotherapproachtogettingtheresultsofthelikelihoodratiotestthatgeneralizesbettertomorecomplicatedmodelcomparisons.Thisinvolvesﬁttingthenullmodelandthencomparingmodelsusingtheanovafunction.Thenullmodelcaneitherbeﬁtasbeforeusingtheglmfunctionorusingtheupdatefunction.Wewillshowthelatterhere.Notethattheinterceptonlymodelisdenotedbysatell˜1.>crabsgp.glm0<-update(crabsgp.glm1,satell˜1)>crabsgp.glm0Call:glm(formula=satell˜1,family=poisson,data=crabsgp,offset=log(cases))Coefficients:(Intercept)1.071DegreesofFreedom:6Total(i.e.Null);6ResidualNullDeviance:96.31ResidualDeviance:96.31AIC:139.9>summary(crabsgp.glm1)Call:glm(formula=satell˜weight,family=poisson,data=crabsgp,offset=log(cases))DevianceResiduals:(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-2.1285-0.30810.65642.7036-1.6776(2775,3025](3025,Inf]0.7307-0.6560Coefficients:EstimateStd.ErrorzvaluePr(>|z|)(Intercept)-0.78700.2229-3.530.000415***21 weight0.73000.08238.87<2e-16***---Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1(Dispersionparameterforpoissonfamilytakentobe1)Nulldeviance:96.312on6degreesoffreedomResidualdeviance:16.144on5degreesoffreedomAIC:61.748NumberofFisherScoringiterations:3>anova(crabsgp.glm0,crabsgp.glm1)AnalysisofDevianceTableResponse:satellResid.DfResid.DevDfDeviance1696.312weight516.144180.168TheresidualsfunctionwillextracttheunadjustedPearson(orchi-square)anddevianceresiduals(thedefault)fromaﬁttedglmobject.Thereseemstobenobuiltinfunctionforcomputingtheadjusted,orstandardizedresiduals,butlm.inﬂuencewillextractthediagonalofthesocalled“hatmatrix”,andthisisenoughtoconstructtheadjustedresiduals:>round(residuals(crabsgp.glm1),2)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-2.13-0.310.662.70-1.68(2775,3025](3025,Inf]0.73-0.66>round(residuals(crabsgp.glm1,type="pearson"),2)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-1.96-0.310.672.86-1.62(2775,3025](3025,Inf]0.74-0.65>h<-lm.influence(crabsgp.glm1)$hat>round(residuals(crabsgp.glm1,type="deviance")/sqrt(1-h),2)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-2.43-0.370.752.92-1.82(2775,3025](3025,Inf]0.81-1.25>round(residuals(crabsgp.glm1,type="pearson")/sqrt(1-h),2)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-2.24-0.370.763.09-1.76(2775,3025](3025,Inf]0.82-1.24Ofcourseyoucouldeasilywriteasimplefunctiontoautomatethisprocess:adj.residuals<-function(fit,...){22 residuals(fit,...)/sqrt(1-lm.influence(fit)$hat)}Nowtheabovecomputationsbecome>round(adj.residuals(crabsgp.glm1),2)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-2.43-0.370.752.92-1.82(2775,3025](3025,Inf]n0.81-1.25>round(adj.residuals(crabsgp.glm1,type="pearson"),2)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]-2.24-0.370.763.09-1.76(2775,3025](3025,Inf]0.82-1.24Finally,here’showtocalculatethegroupmeansandvariancesdiscussedinlecturewhenexaminingoverdispersion.HereIusethetapplyfunction,whichappliesafunction(herethemeanandvarfunctions)toavariable(satell)aftergroupingaccordingtosomefactor(s)(grp).Recallthatthevariablegrpwascreatedealier.Thepredictfunctiongivestheactualﬁttedvaluesfromthemodel,andhereweneedtodividethisbythenumberofcasesineachgrouptogettheﬁtted“rates”.NotethatIattachthecrabsdataframetomakeitsvariableslocal(otherwiseIwouldhavetotypecrabs$satellforexample).>attach(crabs)#Makesthevariablesinthedataframelocal>tapply(satell,grp,mean)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]0.8125001.7741942.4000003.5454552.541667(2775,3025](3025,Inf]4.0952384.827586>tapply(satell,grp,var)(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]1.8958337.3139787.97241412.5454556.432971(2775,3025](3025,Inf]12.49047610.647783>predict(crabsgp.glm1,type="response")(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]22.2533157.3163066.5725656.4940575.05670(2775,3025](3025,Inf]79.40089147.90622>predict(crabsgp.glm1,type="response")/crabsgp$cases(0,1775](1775,2025](2025,2275](2275,2525](2525,2775]1.3908321.8489132.2190852.5679113.127362(2775,3025](3025,Inf]3.7809955.100214Sohere’showtogetRtoprintthetablefromclass:>round(cbind(+weight=crabsgp$weight,+fitted.mean=predict(crabsgp.glm1,type="response")/crabsgp$cases,+sample.mean=tapply(satell,grp,mean),+sample.variance=tapply(satell,grp,var)),+2)23 weightfitted.meansample.meansample.variance(0,1775]1.531.390.811.90(1775,2025]1.921.851.777.31(2025,2275]2.172.222.407.97(2275,2525]2.372.573.5512.55(2525,2775]2.643.132.546.43(2775,3025]2.903.784.1012.49(3025,Inf]3.315.104.8310.65>detach(crabs)UngroupedAnalysisofCrabDataHere’sthesinglepredictorPoissonloglinearmodelﬁttotheungroupedcrabdata.>crabs.glm1<-glm(satell˜weight,family=poisson(),data=crabs)>summary(crabs.glm1)Call:glm(formula=satell˜weight,family=poisson(),data=crabs)DevianceResiduals:Min1QMedian3QMax-2.9307-1.9981-0.56270.92984.9992Coefficients:EstimateStd.ErrorzvaluePr(>|z|)(Intercept)-4.284e-011.789e-01-2.3940.0167*weight5.893e-046.502e-059.064<2e-16***---Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1(Dispersionparameterforpoissonfamilytakentobe1)Nulldeviance:632.79on172degreesoffreedomResidualdeviance:560.87on171degreesoffreedomAIC:920.16NumberofFisherScoringiterations:5>anova(crabs.glm1,test="Chisq")AnalysisofDevianceTableModel:poisson,link:logResponse:satellTermsaddedsequentially(firsttolast)DfDevianceResid.DfResid.DevP(>|Chi|)NULL172632.79weight171.93171560.870.0024 APlotHere’showIproducedtheplotgiveninclass.Themainthinggoingonistheuseofthepredictfunctiontogetthepredictedmeanforthemodelonagridofxvalues.NotethatIhavetobecarefulaboutthefactthatweightismeasuredinkilogramsinonedatasetandingramsintheother.>x<-seq(1.2,5.2,length=100)>gp.muhat<-+predict(crabsgp.glm1,+newdata=data.frame(weight=x,cases=1),type="response")>muhat<-+predict(crabs.glm1,+newdata=data.frame(weight=1000*x),type="response")>grp.means<-tapply(satell,grp,mean)>plot(crabsgp$weight,grp.means,+xlab="Weight",ylab="NumberofSatellites")>lines(x,gp.muhat)>lines(x,muhat,lty=2)#$25 Chapter5Chapter5:LogisticRegressionTherearethreewaystoﬁtabinomialregressioninRusingglm().FollowingAnIntroductiontoR[2]andVenablesandRipley[3]:Theresponsemaybeavectorofbinary(0/1)orlogicalresponses.InthiscaseitiseasilyhandledsimilarlytoﬁttinganyotherGLM.Iftheresponseisaanumericvectorrepresentingproportionsofsuccesses,thenthenumberoftrialsfortheproportionsmustbegivenasavectorofweightsusingtheweightsargument.Iftheresponseisatwocolumnmatrixitisassumedthattheﬁrstcolumnholdsthenumberofsuccessesforthetrialsandthesecondholdsthenumberoffailures.Inthiscasenoweightsargumentisrequired.Threelinkfunctions(logit,probit,andcloglog(complementarylog-log,anasymmetriclinkfunc-tion)areprovided.Ofcourseinthischapterweareconcernedwiththelogitlink.SnoringandHeartDiseaseRevisitedHere’sanotherlookatthesnoringexamplefromthelastchapter,justtoillustratetwoofthethreewaysofcarryingoutalogisticregressionwithsuchdata.Hereweinitiallyenterthedataasadataframeinsteadofamatrix,withscores(0,2,4,5)representingthelevelsofsnoring:>snoring<-data.frame(+snore=c(0,2,4,5),+heartdisyes=c(24,35,21,30),+n=c(1379,638,213,254))>snoringsnoreheartdisyesn10241379223563834212134530254Hereishowwemightﬁtalogisticregressionmodelusingtheproportionofcaseswithheartdiseaseastheresponseandthenumberofcasesastheweights:>snoring.lg<-+glm(heartdisyes/n˜snore,weights=n,family=binomial(),data=snoring)>snoring.lgCall:glm(formula=heartdisyes/n˜snore,family=binomial(),data=snoring,weights=n)26 Coefficients:(Intercept)snore-3.86620.3973DegreesofFreedom:3Total(i.e.Null);2ResidualNullDeviance:65.9ResidualDeviance:2.809AIC:841.7Toﬁtthemodelwithamatrixofsuccessandfailurecountsastheresponse,weﬁrstaddthismatrixtothedataframe:>snoring$YN<-cbind(snoring$heartdisyes,snoring$n-snoring$heartdisyes)>snoringsnoreheartdisyesnYN.1YN.210241379241355223563835603342121321192453025430224Nowexactlythesameﬁtisachievedasbeforebytreatingthismatrixastheresponse(withnoweightsargument):>snoring.lg<-+glm(YN˜snore,family=binomial(),data=snoring)>snoring.lgCall:glm(formula=YN˜snore,family=binomial(),data=snoring)Coefficients:(Intercept)snore-3.86620.3973DegreesofFreedom:3Total(i.e.Null);2ResidualNullDeviance:65.9ResidualDeviance:2.809AIC:27.06CrabsHerewewillﬁtafewlogisticregressionmodelstothecrabsdata.YoumayalreadyhavethesedataavailableinRfromworkingwiththeminChapter4,oryoucanreadtheminasdescribedthere.Iwillloadthemfromthesta4504packagehereandaddalogicalvariablepsatindicatingthepresenceofabsenceofsatellites.Thiswillbeusedasthebinaryresponsewhenﬁttinglogisticregressionmodelstotheungroupeddata.>library(sta4504)>data(crabs)>names(crabs)[1]"color""spine""width""satell""weight">crabs$psat<-crabs$satell>0Weﬁrstﬁtasimplelogisticregressionwithcrabweightaspredictor.>crabs.lg.1<-glm(psat˜weight,family=binomial(),data=crabs)>summary(crabs.lg.1)27 Call:glm(formula=psat˜weight,family=binomial(),data=crabs)DevianceResiduals:Min1QMedian3QMax-2.1108-1.07490.54260.91221.6285Coefficients:EstimateStd.ErrorzvaluePr(>|z|)(Intercept)-3.69463380.8779167-4.2082.57e-05***weight0.00181510.00037554.8331.34e-06***---Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1(Dispersionparameterforbinomialfamilytakentobe1)Nulldeviance:225.76on172degreesoffreedomResidualdeviance:195.74on171degreesoffreedomAIC:199.74TocomparethismodelwithanullmodelhavingnopredictorswecaneitherusetheWaldtestabove,withz=4:833andP-value<:0001,orathelikelihoodratiotest.>crabs.lg.0<-glm(psat˜1,family=binomial(),data=crabs)>anova(crabs.lg.0,crabs.lg.1,test="Chisq")AnalysisofDevianceTableResponse:psatResid.DfResid.DevDfDevianceP(>|Chi|)1172225.759weight171195.737130.0214.273e-08Notethatyoudon’tactuallyneedtoﬁtthenullmodelhere:>anova(crabs.lg.1,test="Chisq")AnalysisofDevianceTableModel:binomial,link:logitResponse:psatTermsaddedsequentially(firsttolast)DfDevianceResid.DfResid.DevP(>|Chi|)NULL172225.759weight130.021171195.7374.273e-0828 Bibliography[1]AlanAgresti.AnIntroductiontoCategoricalDataAnalysis.JohnWiley&Sons,Inc.,NewYork,1996.[2]RCoreDevelopmentTeam.AnIntroductiontoR,2000.[3]W.N.VenablesandB.D.Ripley.ModernAppliedStatisticsWithS-Plus.Springer-Verlag,NewYork,secondedition,1997.29 AppendixAAppendixA.1HelpFilesforSomeRFunctionsA.1.1applyapplyApplyFunctionsOverArrayMarginsapplyUsageapply(x,MARGIN,FUN,...)Argumentsxthearraytobeused.MARGINavectorgivingthesubscriptswhichthefunctionwillbeappliedover.1indicatesrows,2indicatescolumns,c(1,2)indicatesrowsandcolumns.FUNthefunctiontobeapplied.Inthecaseoffunctionslike+,%*%,etc.,thefunctionnamemustbequoted....optionalargumentstoFUN.ValueIfeachcalltoFUNreturnsavectoroflengthn,thenapplyreturnsanarrayofdimensionc(n,dim(x)[MARGIN])ifn>1.Ifnequals1,applyreturnsavectorifMARGINhaslength1andanarrayofdimensiondim(x)[MARGIN]otherwise.IfthecallstoFUNreturnvectorsofdifferentlengths,applyreturnsalistoflengthdim(x)[MARGIN].SeeAlsolapply,tapply,andconveniencyfunctionssweepandaggregate.30 Examples##Computerowandcolumnsumsforamatrix:x<-cbind(x1=3,x2=c(4:1,2:5))dimnames(x)[[1]]<-letters[1:8]apply(x,2,mean,trim=.2)col.sums<-apply(x,2,sum)row.sums<-apply(x,1,sum)rbind(cbind(x,Rtot=row.sums),Ctot=c(col.sums,sum(col.sums)))all(apply(x,2,is.vector))#TRUE[wasnotinR<=0.63.2]##Sortthecolumnsofamatrixapply(x,2,sort)##-functionwithextraargs:cave<-function(x,c1,c2)c(mean(x[c1]),mean(x[c2]))apply(x,1,cave,c1="x1",c2=c("x1","x2"))ma<-matrix(c(1:4,1,6:8),nr=2)maapply(ma,1,table)#-->alistoflength2apply(ma,1,quantile)#5xnmatrixwithrownamesall(dim(ma)==dim(apply(ma,1:2,sum)))##wasn’tokbeforeR0.63.1A.1.2ftableftableFlatContingencyTablesftableDescriptionCreateandmanipulate“ﬂat”contingencytables.Usageftable(...,exclude=c(NA,NaN),row.vars=NULL,col.vars=NULL)ftable2table(x)Arguments...Robjectswhichcanbeinterpretedasfactors(includingcharacterstrings),oralist(ordataframe)whosecomponentscanbesointerpreted,oracontingencytableobjectofclass"table"or"ftable".excludevaluestouseintheexcludeargumentoffactorwheninterpretingnon-factorob-jects.row.varsavectorofintegersgivingthenumbersofthevariables,oracharactervectorgivingthenamesofthevariablestobeusedfortherowsoftheﬂatcontingencytable.31 col.varsavectorofintegersgivingthenumbersofthevariables,oracharactervectorgivingthenamesofthevariablestobeusedforthecolumnsoftheﬂatcontingencytable.xanarbitraryRobject.Detailsftablecreates“ﬂat”contingencytables.Similartotheusualcontingencytables,thesecontainthecountsofeachcombinationofthelevelsofthevariables(factors)involved.Thisinformationisthenre-arrangedasamatrixwhoserowsandcolumnscorrespondtouniquecombinationsofthelevelsoftherowandcolumnvariables(asspeciﬁedbyrow.varsandcol.vars,respectively).Thecombinationsarecreatedbyloopingoverthevariablesinreverseorder(sothatthelevelsofthe“left-most”variablevarytheslowest).Displayingacontingencytableinthisﬂatmatrixform(viaprint.ftable,theprintmethodforobjectsofclass"ftable")isoftenpreferabletoshowingitasahigher-dimensionalarray.ftableisagenericfunction.Itsdefaultmethod,ftable.default,ﬁrstcreatesacontingencytableinarrayformfromallargumentsexceptrow.varsandcol.vars.Iftheﬁrstargumentisofclass"table",itrepresentsacontingencytableandisusedasis;ifitisaﬂattableofclass"ftable",theinformationitcontainsisconvertedtotheusualarrayrepresentationusingftable2table.Otherwise,theargumentsshouldbeRobjectswhichcanbeinterpretedasfactors(includingcharacterstrings),oralist(ordataframe)whosecomponentscanbesointerpreted,whicharecross-tabulatedusingtable.Then,theargumentsrow.varsandcol.varsareusedtocollapsethecontingencytableintoﬂatform.Ifneitherofthesetwoisgiven,thelastvariableisusedforthecolumns.Ifbotharegivenandtheirunionisapropersubsetofallvariablesinvolved,theothervariablesaresummedout.Functionftable.formulaprovidesaformulamethodforcreatingﬂatcontingencytables.ftable2tableconvertsacontingencytableinﬂatmatrixformtooneinstandardarrayform.Valueftablereturnsanobjectofclass"ftable",whichisamatrixwithcountsofeachcombinationofthelevelsofvariableswithinformationonthenamesandlevelsofthe(rowandcolumns)variablesstoredasattributes"row.vars"and"col.vars".SeeAlsoftable.formulafortheformulainterface;tablefor“ordinary”cross-tabulation.Examples##Startwithacontingencytable.data(Titanic)ftable(Titanic,row.vars=1:3)ftable(Titanic,row.vars=1:2,col.vars="Survived")ftable(Titanic,row.vars=2:1,col.vars="Survived")##Startwithadataframe.data(mtcars)x<-ftable(mtcars[c("cyl","vs","am","gear")])xftable(x,row.vars=c(2,4))A.1.3prop.test32 prop.testTestforEqualorGivenProportionsprop.testDescriptionprop.testcanbeusedfortestingthenullthattheproportions(probabilitiesofsuccess)inseveralgroupsarethesame,orthattheyequalcertaingivenvalues.Usageprop.test(x,n=NULL,p=NULL,alternative="two.sided",conf.level=0.95,correct=TRUE)Argumentsxavectorofcountsofsuccessesoramatrixwith2columnsgivingthecountsofsuc-cessesandfailures,respectively.navectorofcountsoftrials;ignoredifxisamatrix.pavectorofprobabilitiesofsuccess.Thelengthofpmustbethesameasthenumberofgroupsspeciﬁedbyx,anditselementsmustbegreaterthan0andlessthan1.alternativeindicatesthealternativehypothesisandmustbeoneof"two.sided","greater"or"less".Youcanspecifyjusttheinitialletter.Onlyusedfortestingthenullthatasingleproportionequalsagivenvalue,orthattwoproportionsareequal;ignoredotherwise.conf.levelconﬁdencelevelofthereturnedconﬁdenceinterval.Mustbeasinglenumberbe-tween0and1.Onlyusedwhentestingthenullthatasingleproportionequalsagivenvalue,orthattwoproportionsareequal;ignoredotherwise.correctalogicalindicatingwhetherYates’continuitycorrectionshouldbeapplied.DetailsOnlygroupswithﬁnitenumbersofsuccessesandfailuresareused.Countsofsuccessesandfailuresmustbenonnegativeandhencenotgreaterthanthecorrespondingnumbersoftrialswhichmustbepositive.Allﬁnitecountsshouldbeintegers.IfpisNULLandthereismorethanonegroup,thenulltestedisthattheproportionsineachgrouparethesame.Iftherearetwogroups,thealternativesarethattheprobabilityofsuccessintheﬁrstgroupislessthan,notequalto,orgreaterthantheprobabilityofsuccessinthesecondgroup,asspeciﬁedbyalternative.Aconﬁdenceintervalforthedifferenceofproportionswithconﬁdencelevelasspeciﬁedbyconf.levelandclippedto[1;1]isreturned.Continuitycorrectionisusedonlyifitdoesnotexceedthedifferenceofthesampleproportionsinabsolutevalue.Otherwise,iftherearemorethan2groups,thealternativeisalways"two.sided",thereturnedconﬁdenceintervalisNULL,andcontinuitycorrectionisneverused.Ifthereisonlyonegroup,thenthenulltestedisthattheunderlyingprobabilityofsuccessisp,or.5ifpisnotgiven.Thealternativeisthattheprobabilityofsuccessiflessthan,notequalto,orgreaterthanpor0.5,respectively,asspeciﬁedbyalternative.Aconﬁdenceintervalfortheunderlyingproportionwithconﬁdencelevelasspeciﬁedbyconf.levelandclippedto[0;1]isreturned.Continuitycorrec-tionisusedonlyifitdoesnotexceedthedifferencebetweensampleandnullproportionsinabsolutevalue.33 Finally,ifpisgivenandtherearemorethan2groups,thenulltestedisthattheunderlyingprobabilitiesofsuccessarethosegivenbyp.Thealternativeisalways"two.sided",thereturnedconﬁdenceintervalisNULL,andcontinuitycorrectionisneverused.ValueAlistwithclass"htest"containingthefollowingcomponents:statisticthevalueofPearson’schi-squareteststatistic.parameterthedegreesoffreedomoftheapproximatechi-squaredistributionoftheteststatistic.p.valuethep-valueofthetest.estimateavectorwiththesampleproportionsx/n.conf.intaconﬁdenceintervalforthetrueproportionifthereisonegroup,orforthedifferenceinproportionsifthereare2groupsandpisnotgiven,orNULLotherwise.InthecaseswhereitisnotNULL,thereturnedconﬁdenceintervalhasanasymptoticconﬁdencelevelasspeciﬁedbyconf.level,andisappropriatetothespeciﬁedalternativehypothesis.null.valuethevalueofpifspeciﬁedbythenull,orNULLotherwise.alternativeacharacterstringdescribingthealternative.methodacharacterstringindicatingthemethodused,andwhetherYates’continuitycorrec-tionwasapplied.data.nameacharacterstringgivingthenamesofthedata.Examplesheads<-rbinom(1,size=100,pr=.5)prop.test(heads,100)#continuitycorrectionTRUEbydefaultprop.test(heads,100,correct=FALSE)##DatafromFleiss(1981),p.139.##H0:Thenullhypothesisisthatthefourpopulationsfromwhich##thepatientsweredrawnhavethesametrueproportionofsmokers.##A:Thealternativeisthatthisproportionisdifferentinat##leastoneofthepopulations.smokers<-c(83,90,129,70)patients<-c(86,93,136,82)prop.test(smokers,patients)A.1.4sweep34 sweepSweepoutArraySummariessweepUsagesweep(x,MARGIN,STATS,FUN="-",...)Argumentsxanarray.MARGINagivingtheextentsofxwhichcorrespondtoSTATS.STATSthesummarystatisticwhichistobesweptout.FUNthefunctiontobeusedtocarryoutthesweep.Inthecaseofbinaryoperatorssuchas"/"etc.,thefunctionnamemustbequoted....optionalargumentstoFUN.ValueAnarraywiththesameshapeasx,butwiththesummarystatisticssweptout.SeeAlsoapplyonwhichsweepisbased;scaleforcenteringandscaling.Examplesdata(attitude)med.att<-apply(attitude,2,median)sweep(data.matrix(attitude),2,med.att)#subtractthecolumnmediansA.1.5tapplytapplyApplyaFunctionOvera“Ragged”ArraytapplyUsagetapply(X,INDEX,FUN=NULL,simplify=TRUE,...)ArgumentsXanatomicobject,typicallyavector.INDEXlistoffactors,eachofsamelengthasX.FUNthefunctiontobeapplied.Inthecaseoffunctionslike+,%*%,etc.,thefunctionnamemustbequoted.IfFUNisNULL,tapplyreturnsavectorwhichcanbeusedtosubscriptthemulti-wayarraytapplynormallyproduces.simplifyIfFALSE,tapplyalwaysreturnsanarrayofmode"list".IfTRUE(thedefault),thenifFUNalwaysreturnsascalar,tapplyreturnsanarraywiththemodeofthescalar....optionalargumentstoFUN.35 ValueWhenFUNispresent,tapplycallsFUNforeachcellthathasanydatainit.IfFUNreturnsasingleatomicvalueforeachcell(e.g.,functionsmeanorvar)andwhensimplifyistrue,tapplyreturnsamulti-wayarraycontainingthevalues.ThearrayhasthesamenumberofdimensionsasINDEXhascomponents;thenumberoflevelsinadimensionisthenumberoflevels(nlevels(.))inthecorrespondingcomponentofINDEX.NotethatcontrarytoS,simplify=TRUEalwaysreturnsanarray,possibly1-dimensional.IfFUNdoesnotreturnasingleatomicvalue,tapplyreturnsanarrayofmodelistwhosecomponentsarethevaluesoftheindividualcallstoFUN,i.e.,theresultisalistwithadimattribute.SeeAlsotheconveniencefunctionaggregate(usingtapply);apply,lapplywithitsversionsapply.Examplesgroups<-as.factor(rbinom(32,n=5,p=.4))tapply(groups,groups,length)#-isalmostthesameastable(groups)data(warpbreaks)##contingencytablefromdata.frame:arraywithnameddimnamestapply(warpbreaks$breaks,warpbreaks[,-1],sum)tapply(warpbreaks$breaks,warpbreaks[,3,drop=F],sum)n<-17;fac<-factor(rep(1:3,len=n),levels=1:5)table(fac)tapply(1:n,fac,sum)tapply(1:n,fac,sum,simplify=FALSE)tapply(1:n,fac,range)tapply(1:n,fac,quantile)ind<-list(c(1,2,2),c("A","A","B"))table(ind)tapply(1:3,ind)#->thesplitvectortapply(1:3,ind,sum)36

当前文档最多预览五页，下载文档查看全文

侵权申诉



1 1 2 3 4 5 / 38



此文档下载收益归作者所有

当前文档最多预览五页，下载文档查看全文

温馨提示：
1. 部分包含数学公式或PPT动画的文件，查看预览时可能会显示错乱或异常，文件下载后无此问题，请放心下载。
2. 本文档由用户上传，版权归属用户，天天文库负责整理代发布。如果您对本文档版权有争议请及时联系客服。
3. 下载前请仔细阅读文档内容，确认文档内容符合您的需求后进行下载，若出现内容与标题不符可向本站投诉处理。
4. 下载文档时可能由于网络波动等原因无法下载或下载错误，付费完成后未能成功下载的用户请联系客服处理。

大家都在看

近期热门

An introduction to categorical data analysis Using R.pdf

An introduction to categorical data analysis Using R.pdf

最近更新

大家都在看

相关文章

相关标签