creating your mysql database practical design tips and techniques

creating your mysql database practical design tips and techniques

ID:14825412

大小:1.93 MB

页数:105页

时间:2018-07-30

上传者:U-14522
creating your mysql database practical design tips and techniques_第1页
creating your mysql database practical design tips and techniques_第2页
creating your mysql database practical design tips and techniques_第3页
creating your mysql database practical design tips and techniques_第4页
creating your mysql database practical design tips and techniques_第5页
资源描述:

《creating your mysql database practical design tips and techniques》由会员上传分享,免费在线阅读,更多相关内容在学术论文-天天文库

CreatingyourCreatingyourMySQLDatabase:MySQLPracticalDesignTipsandTechniquesDatabaseThepopularityofMySQLandphpMyAdminhasbroughtmanynon-ITspecialiststothefieldofdatabasedesign,usuallywithaviewtobuildingadynamicwebsitewithaMySQLbackend.Mostuserswouldbeinterestedmainlyindevelopingafunctionalwebsite,but:PracticalDesignwouldhavelittleinterestinlearningaboutgoodpracticesindesigningtheirMySQLdatabases.OnereasonisthatMySQLdesignisseenasanadvancedandcomplextopicthatrequiresalotoftime,whichmostpeoplewouldnotbeabletoaffordorjustwouldnotcaretoinvest.Thisbookattemptstoovercomethisbarrier,whichisbothperceptionalandreal,bypositioningitselfasafastandeasywaytolearnthemostimportantaspectsofMySQLdatabasedesign.TipsandWhatyouwilllearnfromthisbookTechniques•AskinguserstherightquestionswhencollectingrelevantdataforthesystemyouarebuildingFromTechnologiestoSolutions•Detectingbadstructures•Sounddatanamingtechniques,bothfortableandcolumnnames•Modelingdatawithfuturegrowthinmind•ImplementingsecuritypolicieswithdataprivilegesandviewsCreatingyour•Tuningthestructureforperformance•Producingsystemdocumentation(datadictionary,relationalschema)•TestingthemodelwithappropriateSQLqueriesMySQLDatabaseWhothisbookiswrittenforMarcDelislePracticalDesignTipsandTechniquesThisbookisfornewwebdevelopersandMySQLdatabaseadministratorswhowanttolearnhowtobuildbetterdatastructures.AbasicunderstandingofMySQLandSQLisassumed.Ashortguideforeveryoneonhowtostructuretheirdataand$19.99USPacktPublishingsetuptheirMySQLdatabasetablesefficientlyandeasily£12.99UKBirmingham-Mumbai€16.99EUwww.packtpub.comPricesdonotincludelocalsalestaxorVATwhereapplicableMarcDelisle CreatingyourMySQLDatabase:PracticalDesignTipsandTechniquesAshortguideforeveryoneonhowtostructuretheirdataandsetuptheirMySQLdatabasetablesefficientlyandeasilyMarcDelisleBIRMINGHAM-MUMBAI CreatingyourMySQLDatabase:PracticalDesignTipsandTechniquesCopyright©2006PacktPublishingAllrightsreserved.Nopartofthisbookmaybereproduced,storedinaretrievalsystem,ortransmittedinanyformorbyanymeans,withoutthepriorwrittenpermissionofthepublisher,exceptinthecaseofbriefquotationsembeddedincriticalarticlesorreviews.Everyefforthasbeenmadeinthepreparationofthisbooktoensuretheaccuracyoftheinformationpresented.However,theinformationcontainedinthisbookissoldwithoutwarranty,eitherexpressorimplied.Neithertheauthor,PacktPublishing,noritsdealersordistributorswillbeheldliableforanydamagescausedorallegedtobecauseddirectlyorindirectlybythisbook.PacktPublishinghasendeavoredtoprovidetrademarkinformationaboutallthecompaniesandproductsmentionedinthisbookbytheappropriateuseofcapitals.However,PacktPublishingcannotguaranteetheaccuracyofthisinformation.Firstpublished:November2006ProductionReference:1141106PublishedbyPacktPublishingLtd.32LincolnRoadOltonBirmingham,B276PA,UK.ISBN1-904811-30-2www.packtpub.comCoverImagebywww.visionwt.com CreditsAuthorProjectManagerMarcDelislePatriciaWeirReviewerIndexerRudyLimebackBhushanPangaonkarDevelopmentEditorProofreaderLouayFatoohiMartinBrooksAssistantDevelopmentEditorLayoutsandIllustrationsNikhilBangeraShantanuZagadeTechnicalEditorCoverDesignerMithilKulkarniShantanuZagadeEditorialManagerDipaliChittar AbouttheAuthorMarcDelisleisamemberoftheMySQLDevelopersGuild,whichregroupscommunitydevelopers—becauseofhisinvolvementwithphpMyAdmin.HestartedtocontributetothispopularMySQLwebinterfaceinDecember1998,whenhemadethefirstmulti-languageversion.HehasbeenactivelyinvolvedwiththephpMyAdminprojectsinceMay2001asadeveloperandprojectadministrator.Hehasworkedsince1980atCollègedeSherbrooke,Québec,Canada,asanapplicationprogrammerandnetworkmanager.Hehasalsobeenteachingnetworking,security,Linuxservers,andPHP/MySQLapplicationdevelopment.IwouldliketothankthewholePacktteamfortheirsupport,especiallyLouayFatoohiandNikhilBangera;theiradvicehelpedshapingthisbook.MythanksalsogotoRudyLimebackforhisinsight.ThedevelopersoftheMySQLsoftwarehaveearnedmyrespect;maytheyfindheremywarmgratitudefortheirexcellentproduct.Ihopethatthisbookwillassistreadersintobuildingeffectivedatastructures.ToCarole,André,Corinne,Annie,andGuillaume,withallmylove. AbouttheReviewerRudyLimebackisanSQLConsultantwithcloseto20yearsofexperienceusingSQLinonedatabasesystemoranother.HeislocatedinToronto,Canadabut,thankstothemiraclethatistheInternet,consultsforclientsalloverthewideworld.MoreinformationonSQLandWebdevelopmentcanbefoundonRudy'swebsite,http://www.r937.com/. TableofContentsPreface1Chapter1:IntroducingMySQLDesign5MySQL'sPopularityandImpact5TheNeedforMySQLDesign6"WhatdoIdoNext?"6DataDesignSteps6DataasaResource7ButthisismyData!7DataModeling8OverviewoftheRelationalModel9Rule#110Rule#210SimplifiedDesignTechnique10CaseStudy11OurCarDealer11TheSystem'sGoals12TheTaleoftheTooWideTable12Summary16Chapter2:DataCollecting17SystemBoundariesIdentification17ModularDevelopment18ModelFlexibility19DocumentGathering19GeneralReading19Forms20ExistingComputerizedSystems20Interviews20FindingtheRightUsers21 TableofContentsPerceptions21AskingtheRightQuestions21ExistingInformationSystems21ChronologicalEvents22SourcesandDestinations22Urgency22AvoidFocusingonReportsandScreens22DataCollectedforourCaseStudy22FromtheGeneralManager23FromtheSalesperson23FromtheStoreAssistant24OtherNotes25Summary25Chapter3:DataNaming27DataCleaning27SubdividingDataElements28DataElementsContainingFormattingCharacters29DatathatareResults29DataasaColumn'sorTable'sName30PlanningforChanges32PitfallsoftheFreeFieldsTechnique33NamingRecommendations34Designer'sCreativity34Abbreviations34ClarityversusLength:anArt35Suffixing35ThePluralForm35NamingConsistency36MySQL'sPossibilitiesversusPortability36TableNameintoaColumnName36Summary37Chapter4:DataGrouping39InitialListofTables39RulesforTableLayout40PrimaryKeysandTableNames40DataRedundancyandDependency41CompositeKeys42ImprovingtheStructure44ScalabilityoverTime44EmptyColumns45AvoidingENUMandSET46[ii] TableofContentsMultilingualPlanning48ValidatingtheStructure48Summary49Chapter5:DataStructureTuning51DataAccessPolicies51Responsibility51SecurityandPrivileges53Views53StorageEngines54ForeignKeyConstraints55Performance58Indexes58HelpingtheQueryOptimizer:AnalyzeTable60AccessingReplicationSlaveServers60SpeedandDataTypes61TableSizeReduction62In-ColumnDataEncoding62CaseStudy'sFinalStructure63Vehicle65Person68Sale69Othertables72Summary74Chapter6:SupplementalCaseStudy75ResultsfromtheDocumentGatheringPhase75PreliminaryListofDataElements80TablesandSampleValues80CodeTables81ThemedTables82Composite-KeyTables85AirlineSystemDataSchema87SampleQueries87InsertingSampleValues88BoardingPass88PassengerList88AllPersonsonaFlight89Summary90Index91[iii] PrefaceMySQL,launchedin1995,hasbecomethemostpopularopensourcedatabasesystem.ThepopularityofMySQLandphpMyAdminhasallowedmanynon-ITspecialiststobuilddynamicwebsiteswithaMySQLbackend.ThisbookisashortbutcompleteguideshowingbeginnershowtodesigngooddatastructuresforMySQL.ItteacheshowtoplanthedatastructureandhowtoimplementitphysicallyusingMySQL'smodel.WhatThisBookCoversChapter1introducestheconceptofMySQL,anddiscussesMySQL'sgrowingpopularityanditsimpactasapowerfultool.ThischaptergivesusabriefoverviewoftherelationalmodelsandCodd'srules,whicharerequiredfordesigningpurposes.Abriefintroductiontoourcasestudy—"cardealer"isprovidedattheend.Chapter2showshowtodealwiththerawdatainformationthatcomesfromtheusersorothersources,andthetechniquesthatcanhelpusbuildacomprehensivedatacollection.Also,thischaptercoverstheexactlimitsoftheanalyzedsystem,howoneshouldgatherdocuments,andinterviewactivitiesforourcasestudy.Chapter3emphasisesontransformingthedataelementsgatheredinthecollectionprocessintoacohesivesetofcolumnnames.Theconceptofdatanamingisalsodiscussedinthischapter.Chapter4providesthetechniqueofgroupingcolumnnamesintotables.Rulesfortablelayout,theconceptssuchasprimarykey,uniquekey,dataredundancy,anddatadependencyarecoveredinthischapter.Chapter5presentsvarioustechniquesforimprovingourdatastructureintermsofsecurity,performance,anddocumentation.Thefinaldatastructureforthecardealer'scasestudyisprovidedattheend. PrefaceChapter6coversasupplementalcasestudyaboutanairlinesystem.Thiscasestudyinvolvesvariousstepssuchasgatheringdocuments,preparingpreliminarylistofdataelements,preparingalistoftables,samplevalues,andqueriesfortheairlinesystem.WhatYouNeedforThisBookBasicknowledgeofSQLisrequired.EmphasisismadeonthephpMyAdminweb-basedinterfaceforreproducingtheexamples,althoughthe"mysql"command-linetoolcanbeused.NoknowledgeofMySQLserveradministrationoranyspecificoperatingsystemisrequired.ConventionsInthisbook,youwillfindanumberofstylesoftextthatdistinguishbetweendifferentkindsofinformation.Herearesomeexamplesofthesestyles,andanexplanationoftheirmeaning.Therearethreestylesforcode.Codewordsintextareshownasfollows:"InthisInthiscase,wecanaddemployeeinformation,theemployeecodetothecar_eventtable".Ablockofcodewillbesetasfollows:CREATETABLE`event`(`code`int(11)NOTNULL,`description`char(40)NOTNULL,PRIMARYKEY(`code`))ENGINE=MyISAMDEFAULTCHARSET=latin1;INSERTINTO`event`VALUES(1,'washed');Whenwewishtodrawyourattentiontoaparticularpartofacodeblock,therelevantlinesoritemswillbemadebold:CREATETABLE`event`(`code`int(11)NOTNULL,`description`char(40)NOTNULL,PRIMARYKEY(`code`))ENGINE=MyISAMDEFAULTCHARSET=latin1;INSERTINTO`event`VALUES(1,'washed');[2] PrefaceNewtermsandimportantwordsareintroducedinabold-typefont.Wordsthatyouseeonthescreen,inmenus,ordialogboxesforexample,appearinourtextlikethis:"Itbecomesimpossibletolinkthis"column"(forexamplethespecialpaintcolor)toalookuptable".Warningsorimportantnotesappearinaboxlikethis.Tipsandtricksappearlikethis.ReaderFeedbackFeedbackfromourreadersisalwayswelcome.Letusknowwhatyouthinkaboutthisbook,whatyoulikedormayhavedisliked.Readerfeedbackisimportantforustodeveloptitlesthatyoureallygetthemostoutof.Tosendusgeneralfeedback,simplydropanemailtofeedback@packtpub.com,makingsuretomentionthebooktitleinthesubjectofyourmessage.Ifthereisabookthatyouneedandwouldliketoseeuspublish,pleasesendusanoteintheSUGGESTATITLEformonwww.packtpub.comoremailsuggest@packtpub.com.Ifthereisatopicthatyouhaveexpertiseinandyouareinterestedineitherwritingorcontributingtoabook,seeourauthorguideonwww.packtpub.com/authors.CustomerSupportNowthatyouaretheproudownerofaPacktbook,wehaveanumberofthingstohelpyoutogetthemostfromyourpurchase.DownloadingtheExampleCodefortheBookVisithttp://www.packtpub.com/support,andselectthisbookfromthelistoftitlestodownloadanyexamplecodeorextraresourcesforthisbook.Thefilesavailablefordownloadwillthenbedisplayed.Thedownloadablefilescontaininstructionsonhowtousethem.[3] PrefaceErrataAlthoughwehavetakeneverycaretoensuretheaccuracyofourcontents,mistakesdohappen.Ifyoufindamistakeinoneofourbooks—maybeamistakeintextorcode—wewouldbegratefulifyouwouldreportthistous.Bydoingthisyoucansaveotherreadersfromfrustration,andhelptoimprovesubsequentversionsofthisbook.Ifyoufindanyerrata,reportthembyvisitinghttp://www.packtpub.com/support,selectingyourbook,clickingontheSubmitErratalink,andenteringthedetailsofyourerrata.Onceyourerratahavebeenverified,yoursubmissionwillbeacceptedandtheerrataaddedtothelistofexistingerrata.Theexistingerratacanbeviewedbyselectingyourtitlefromhttp://www.packtpub.com/support.QuestionsYoucancontactusatquestions@packtpub.comifyouarehavingaproblemwithsomeaspectofthebook,andwewilldoourbesttoaddressit.[4] IntroducingMySQLDesignDatadesignisanessentialpartoftheapplicationdevelopmentcycle.Byanalogy,buildinganapplicationislikebuildingahouse.Havingtherighttoolsisimportant,butweneedasolidfoundation:thedatastructure.However,producingagooddatastructurecanbeadauntingchallenge;thequestforaperfectdatastructurecanleadustonewterritorieswheremanymethodsareavailable.Whichoneisthebest?Howcanwekeepourfocusonthegoaltoachieve,withoutlosingourtime?DatadesignforMySQLdatabasesisbothascienceandanart,andtheremustbeagoodbalancebetweenthescientificandtheempiricaspectsofthemethod.Thescientificaspectreferstoinformationtechnology(IT)principles,whereastheempiricfacetismostlybasedonintuitionsandexperience.ThisbookisprimarilyorientedtowardsMySQLdatabases.ItteacheshowtoplanthedatastructureandhowtoimplementitphysicallyusingMySQL'smodel.Theplanningpartissometimesreferredtoaslogicaldesign,butitispreferabletoviewthelogical/physicalprocessasawhole.MySQL'sPopularityandImpactMySQL(www.mysql.com),launchedin1995,hasbecomethemostpopularopensourcedatabasesystem.VirtuallyallwebprovidersincludeMySQLaspartoftheirhostingplan,oftenontheubiquitousLAMP(Linux,Apache,MySQL,PHP)platform.AnotherrootcauseofMySQL'spopularityhasbeentheongoingsuccessofphpMyAdmin(www.phpmyadmin.net),awell-establishedMySQLweb-basedinterface.ThereforemanywebsitesuseMySQLastheirback-enddatarepository. IntroducingMySQLDesignTheNeedforMySQLDesignOverall,MySQL'spopularityhasattractedmanywebdevelopers,someofthemhavingnopriorITexperience.Whenfacedwiththetaskoftransformingastaticwebsiteintoadynamic/transactionalone,orintegratingcorporatedataintothesite,developersaresometimesinclinedtoimproviseadatastructure.Thisstructure(orlackofstructure)mayworkforacertaintimebutlaterfailsbecauseoflackofdepth.Maybethesysteminitiallyworksbecauseitstartedsmall,withonlyafewfunctionsplannedandimplemented,butfallsapartwhenusersaskmoreofit.Apoorlydesigneddatastructurecanonlybepatchedtoacertainextent.Itcanalsohavescalingissues,whentheinitialtestinghasbeendonewithonlyafewrowsofdata.Theapparentfacilityofusingthetoolsmayhidethefactthatdatabasedesigndependsuponessentialprinciples.Eludingthemcanrenderanapplicationcostlytomaintain,becausecorrectingdatastructuralerrorsafterapplicationcodinghasbegunistimeconsuming."WhatdoIdoNext?"HereisanexampleoftheimpactofMySQLintheranksofnon-ITpeople.IoncesawthisquestioninaphpMyAdmindiscussionforum–Iamcitingitfrommemory:"I'veinstalledMySQLandphpMyAdmin,nowIneeddirections:whatdoIdonext?"Ianswered"Maybeyoucouldcreateatable,andtheninsertsomedataintoit.Nextyoucouldbrowseforyourdata."Clearly,thosetoolswereperceivedasinterestingbythisperson,butIcanonlywonderwhatkindoftablestructurecameintoexistenceafterthisforumconversation.DataDesignStepsWecanthinkofdatadesignasasequenceofstepswhosegoalistoproducethephysicalMySQLdatabases,tables,andcolumnsnecessarytosupportanapplication.[6] Chapter1Startingwiththeoutershell,wefirstneedtolearnaboutourdatabycollectingit.Wethenstarttoorganizethesedataelementsbynamingthemappropriately.Thisisfollowedbyregroupingthedataelementsintotables,takingintoaccounttheneededkeys.Whereasthepreviousstepscouldhavebeendoneonlyonpaper,thefinalstepistoimplementthemodelwithinMySQL'sstructure.Allthesestepsarecoveredindistinctchaptersofthisbook.DataasaResourceBeforeexaminingthevarioustechniquesavailablefordesign,let'sthinkabouttheconceptofdataitself.Organizationsandenterprisesusemanyassets,forexamplebuildings,furniture,brains,butperhapsthemostvaluableassetisinformationordata.Weremarkthatdatadocumentstheenterprise'sprocedures,andbindspeopleintoanongoingexchangeofinformation,calledinformationflow.Computershelptoformalizethisdatabutwehavetorememberthatitexistsbyitself.ButthisismyData!Whenbuildingdatadesigns,wehavetomeetusersandunderstandtheenterprise'sdataflow.Inanidealworld,everydepartment,includingtheITdepartment,andeveryuserwouldcollaborateinordertohelpdatafloweasilybetweendepartments.However,fromtimetotime,onecanwitnesstwoattitudesthatimpedethenormaldataflowinenterprises.ThefirstoneisthatsomeITdepartments,having[7] IntroducingMySQLDesigntheresponsibilityforthecomputerswheredataresides,cometothinkthatthedataistheirs.Thishastheeffectofkeepingacertainlevelofsecrecythathidesdataandcanblockthedatadesignprocess.Thesecondoneisavariationofthefirstone,thistimecausedbyauser–dataoriginatesfromthisuserandhehasatendencynottoshareit.Asanexampleofthislatterattitude,let'sconsideraccountingdata.BeforethePCera,accountingsystemsexistedinsidemainframesorminicomputers,andtheITdepartmentmanagedalldataincludingaccountingdata.Sincetheadventofmicrocomputersandspreadsheetapplications,anaccountingclerkcanmanageagreatdealofdata,producinghigh-qualityreportsaboutit.However,thisdataoftenresidesonhiscomputer;heentersit,heproducesthereport,andhegetstheaccoladesforitfromhisboss.Sothedatabelongstotheaccountingclerk,right?Thiswayofthinkingimpedesdataflowbetweenindividualsanddepartmentsandhasatendencyofleadingtoredundant,disjointdatathroughouttheorganization.Afterthedatadesignprocess,bridgesarebuiltbetweentheseisolateddataislandscreatedbyusersordepartmentssothatthedatacanbenefitthewholeenterprise.Itmayalsohappenthatfewerislandsexistandredundantdataiseliminated.DataModelingDataisnormallyorganizedintoaninformationsystem.Thissystemcanbecomparedtosomethingassimpleasaloose-sheetbinder,howeverthisbookdescribesthedatadesignprocessinthecontextofcomputer-basedinformationsystems,ordatabases.Moreover,databasesfollowadesignmodel,andwewillusethemostpopularone–therelationalmodel.[8] Chapter1Thecompletedatacollectionofanenterpriseislargerthanwhatourmodelwillencompass.Wewillbuildamodelthatrepresentsonlyasubsetofthedataspectrum.Thequestioniswhichsubset?We'llseeinChapter2thatwemustsetboundariestotheanalyzedsystem'sdatascope.Tobuildinformationsystemsthatlast,datamustbetamedandmoldedtocorrectlyrepresentreality.Correctlyheremeans:•Followtheneedsoftheorganization,includingthesystem'sboundaries•Conformtothechosendatadesignmodel(here,therelationalone)•PossessahighdegreeofadaptabilitytoadjustitselftothechangingenvironmentOverviewoftheRelationalModelWeowetoDr.EdgarF.Coddtheconceptoftherelationalmodel,fromhis1970paperARelationalModelofDataforLargeSharedDataBanks(http://www.acm.org/classics/nov95/toc.html).Dr.Coddlaterexplainedhismodelbydefiningasetofrules–theso-calledCodd'sTwelverules(http://en.wikipedia.org/wiki/Codd%27s_12_rules).Anidealdatabasemanagementsystem(DBMS)wouldimplementallthoserules,butfewifanydo.Butthisisnotaprobleminpracticesincethebenefitsoftherelationalmodelareachievedeveninproductsthatdonotapplyalltherules.WeareperfectlycapableofbuildinganefficientrelationaldatadesignwithcurrentlyavailabledatabaseproductslikeMySQL.[9] IntroducingMySQLDesignWhendealingwithdatadesign,Ibelievethatthemostimportantrulesarenumber1andnumber2.HereisasummaryofthesetwoCodd'srules.Rule#1Thisrulestatesthatdataiscontainedintables.Atablelogicallyregroupsinformationaboutacertainsubject,forexample,cars.Thetabularformat–rowsandcolumnsistheimportantideahere.Arowdescribesinformationaboutasingleitem,forexample,aspecificcar,whereasacolumndescribesasinglecharacteristic(orattribute)ofeachitem,forexample,itscolor.WewillseeinChapter3thatthedecompositionofdataintowell-adjustedcolumnsisimportanttohaveaflexibleandusefulstructure.Theintersectionofarowandacolumncontainsthevalueofaspecificattributeforasingleitem.Wesometimesrefertothisintersectionasacellcontainingourdata–thisisthesameideaasinaspreadsheet.Rule#2Dataisnotretrievedorreferencedbyphysicallocation–findthethirdrecordinthisfile.Instead,datamustbefetchedbyreferencingatable,auniquekey–theprimarykey–andoneormanycolumnnames.Forexample,withthecarstable,weusethecarserialnumbertoretrievethiscar'scolor.ThisrulewillbestudiedinChapter4,wherewedescribedatagroupingandtheconceptofchoosingkeys.Properkeychoosingisofutmostimportance.SimplifiedDesignTechniqueManyyearsago,Istartedtoelaboratedatastructuresusingtherelationalmodel.Iwasusingamethodthatcouldbesummarizedbythissentence:"determinewherethedatafitsthebestinthestructure".ThenIlearnedaboutthedesigntechniquesthatweretaughttoITspecialistsandevolvedfromtherelationalmodel.Thetechnique,whichisfrequentlytaughtconsistsofbuildinganentity-relationshipdiagram.Inthiskindofdiagram,werepresentnouns,forexample,acar,acustomer,usingentities,andtherelationshipsbetweenthemareexpressedusingverbs.Anexampleofrelationshipbindingtwoentitiesis"acustomerbuysacar".Whenthediagramisdone,itmustbesomewhattransformedintoamodelconsistingoftablesandcolumns,usingatechniquecallednormalizationthatusesmanystepstorefinethemodelintoaneffectivedatastructure.Thesetechniquesproducereports,diagrams,andeventuallyatheoreticaldatadesignthatcanbeimplementedphysicallyinaDBMS.[10] Chapter1WhenIbecamefamiliarwiththosetraditionaltechniques,Ithoughtthatformeatleasttheywerealossoftime.Thosemethodsteachawaybuttheultimategoal–aworkingrelationaldatabaseandassociateddocumentationcanbeachievedmoredirectly.Moreover,thosetechniquessufferaproblem:theycannotbeappliedblindfoldedandmechanically.Thedeveloperalwayshastothinkaboutdatanaming,datagrouping,andchoosingkeyswhiletryingtobalanceusers'needsandconstraintsimposedby:•thehardware•thechosendatabasemanagementsystem•plannedgrowth•time•budgetIrealizedthatthetraditionaltechniquesaretaughteverywhere,andIrespecttheteacherswhoteachthem.Butbelieveme,whenit'stimetodeliveranapplicationnotwithstandingtheinterfaceitself,it'simportanttoavoidlosingtimetointermediateby-productsandgostraightforwardtoaworkingprototype.Usingamoredirectmethodduringthedatadesignphasefreesmoretimetorefinetheinterface,tocatchunforeseenneedsandaddressthem.Thisbook'sgoalistoteachtheminimumprinciplesonehastoapplyinordertobuildaneffectivedatastructure.CaseStudyThevariousstepsofdatadesigncanbeexplainedinaverypracticalwaybyusingtwocasestudies.Acasestudyisthebestwayofexplainingideasthatcansomewhatbecometooabstractwithoutrealexamples.Chapters1through5arebasedonasinglecasestudy:"Cardealership".Chapter6consistsofanothercasestudythatrecapitulatesallthenotionsseeninthepreviouschapters.OurCarDealerSupposewe'vebeencontactedbyacardealerwhowantstocomputerizepartsofhisbusiness.Let'sdescribealittlebitaboutthisbusiness.InChapter2,wewillexaminethedatacollectingphaseforoursystemmoreformally.Thiscardealeroperatesatasingleaddress.Theyemployninesalespersonswhodutifullywelcomepotentialcustomersandshowthemthecarmodelsthatareavailableonthefloor.Inaddition,twostoreassistantshandlecarmovements,andanofficeclerktakesnotesaboutcustomers'appointments.FontaxandLicornearethe[11] IntroducingMySQLDesigntwofictitiousbrandsofferedbythisdealer.Eachbrandhasanumberofmodels,forexampleMitsou,Wanderer,andGazelle.TheSystem'sGoalsWewanttokeepinformationaboutthecars'inventoryandsales.Thefollowingaresomesamplequestionsthatdemonstratethekindofinformationoursystemwillhavetodealwith:•HowmanycarsofFontaxMitsou2007dowehaveinstock?•Howmanyvisitorstest-drovetheWandererlastyear?•HowmanyWanderercarsdidwesellduringacertainperiod?•WhoisourbestsalespersonforMitsou,Wanderer,oroverallin2007?•Arebuyersmostlymenorwomen(percarmodel)?Herearethetitlesofsomereportsthatareneededbythiscardealer:•Detailedsalespermonth:salesperson,numberofcars,revenue•Yearlysalespersalesperson•Inventoryefficiency:averagedelayforcardeliverytothedealer,ortothecustomer•Visitorsreport:percentageofvisitorstryingacar;percentageofroadteststhatleadtoasale•Customersatisfactionaboutthesalesperson•ThesalescontractInadditiontothis,screenapplicationsmustbebuilttosupporttheinventoryandsalesactivities.Forexample,beingabletoconsultandupdatetheappointmentschedule;consultthecardeliveryscheduleforthenextweek.Afterthisdatamodelisbuilt,theremainingphasesoftheapplicationdevelopmentcycle,suchasscreenandreportdesign,willprovidethiscardealerwithreports,andon-lineapplicationstomanagethecarinventoryandthesalesinabetterway.TheTaleoftheTooWideTableThisbookfocusesonrepresentingdatainMySQL.ThecontainersoftablesinMySQL,andotherproductsarethedatabases.Itisquitepossibletohavejustonetableinadatabaseandthusavoidfullyapplyingtherelationalmodelconceptinwhichtablesarerelatedtoeachotherthroughcommonvalues;howeverwewillusethemodelinitsnormalway:havingmanytablesandcreatingrelationsbetweenthem.[12] Chapter1Thissectiondescribesanexampleofdatacrammedintoonehugetable,alsocalledatoowidetablebecauseitisformedwithtoomanycolumns.Thistoowidetableisfundamentallynon-relational.Sometimesthedatastructureneedstobereviewedorevaluated,asitmightbebasedonpoordecisionsintermsofdatanamingconventions,keychoosing,andthenumberoftables.Probablythemostcommonproblemisthatthewholedataisputintoonebig,widetable.Thereasonforthiscommonstructure(orlackofstructure)isthatmanydevelopersthinkintermsoftheresultsorevenoftheprintedresults.Maybetheyknowhowtobuildaspreadsheetandtrytoapplyspreadsheetprinciplestodatabases.Let'sassumethatthemaingoalofbuildingadatabaseistoproducethissalesreport,whichshowshowmanycarsweresoldineachmonth,byeachsalesperson,describingthebrandname,thecarmodelnumber,andthename.SalespersonPeriodBrandNameCarmodelCarmodelnameQuantitynumberandyearsoldMurray,Dan2006-01Fontax1A8Mitsou20073Murray,Dan2006-01Fontax2X12Wanderer20067Murray,Dan2006-02Fontax1A8Mitsou20074Smith,Peter2006-01Fontax1A8Mitsou20071Smith,Peter2006-01LicorneLKCGazelle20071Smith,Peter2006-02LicorneLKCGazelle20076Withoutthinkingmuchabouttheimplicationsofthisstructure,wecouldbuildjustonetable,sales:salespersonbrandmodel_numbermodel_name_yearqty_2006_01qty_2006_02Murray,DanFontax1A8Mitsou200734Murray,DanFontax2X12Wanderer20067Smith,PeterFontax1A8Mitsou20071Smith,PeterLicorneLKCGazelle200716Atfirstsight,wehavetabularizedalltheinformationthatisneededforthereport.[13] IntroducingMySQLDesignThebook'sexamplescanbereproducedusingthemysqlcommand-lineutility,orphpMyAdmin,amoreintuitivewebinterface.YoucanrefertoMasteringphpMyAdmin2.8forEffectiveMySQLManagementbookfromPacktPublishing(ISBN1-904811-60-6).InphpMyAdmin,theexactcommandsmaybetypedinusingtheSQLQueryWindow,orwecanbenefitfromthemenusandgraphicaldialogs.Bothwayswillbeshownthroughoutthebook.Hereisthestatementwewouldusetocreatethesalestablewiththemysqlcommand-lineutility:CREATETABLEsales(salespersonchar(40)NOTNULL,brandchar(40)NOTNULL,model_numberchar(40)NOTNULL,model_name_yearchar(40)NOTNULL,qty_2006_01int(11)NOTNULL,qty_2006_02int(11)NOTNULL)ENGINE=MyISAMDEFAULTCHARSET=latin1;Inthepreviousstatement,whilechar(40)meansacolumnwith40characters,int(11)meansanintegerwithadisplaywidthof11inMySQL.UsingthephpMyAdminwebinterfaceinstead,wewouldobtain:[14] Chapter1Herewehaveenteredsampledataintooursalestable:INSERTINTOsalesVALUES('Murray,Dan','Fontax','1A8','Mitsou2007',3,4);INSERTINTOsalesVALUES('Murray,Dan','Fontax','2X12','Wanderer2006',7,0);INSERTINTOsalesVALUES('Smith,Peter','Licorne','LKC','Gazelle2007',1,6);INSERTINTOsalesVALUES('Smith,Peter','Fontax','1A8','Mitsou2007',1,0);Howeverthisstructurehasmanymaintenanceproblems.Forinstance,wheredowestorethefiguresforMarch2006?Todiscoversomeoftheotherproblems,let'sexaminesampleSQLstatementswecoulduseonthistabletoqueryaboutspecificquestions,followedbytheresultsofthosestatements:/*displaysthemaximumnumberofcarsofasinglemodelsoldbyeachvendorinJanuary2006*/SELECTsalesperson,max(qty_2006_01)FROMsalesGROUPBYsalesperson/*findstheaveragenumberofcarssoldbyoursalesforcetakenasawhole,inFebruary2006*/SELECTavg(qty_2006_02)FROMsalesWHEREqty_2006_02>0[15] IntroducingMySQLDesign/*findsforwhichmodelmorethanthreecarsweresoldinJanuary*/SELECTmodel_name_year,SUM(qty_2006_01)FROMsalesGROUPBYmodel_name_yearHAVINGSUM(qty_2006_01)>3Wenoticethat,althoughwegottheanswerswewerelookingfor,withtheaboveSQLqueries,wewouldhavetomodifycolumnnamesinthequeriestoobtainresultsforothermonths.Also,itbecomestrickyifwewanttoknowthemonthforwhichthesaleshavesurpassedtheyearlyaverage,becausewehavetopotentiallydealwithtwelvecolumnnames.Anotherproblemwouldarisewhenattemptingtoreportfordifferentyears,ortocompareayearwithanotherone.Moreover,asituationthatcoulddemonstratethepoorstateofthisstructureistheneedforanewreport.Astructurethatisbasedtoocloselyonasinglereportinsteadofbeingbasedontheintrinsicrelationsbetweendataelementsdoesnotscalewellandfailstoaccommodatefutureneeds.Chapter4willunfoldthoseproblems.SummaryWesawthatMySQL'spopularityhasputapowerfultoolonthedesktopofmanyusers;someofthemarenotonparaboutdesigntechniques.Dataisanimportantresourceandwehavetothinkabouttheorganization'sdataasawhole.Thepowerfulrelationalmodelcanhelpusforstructuringactivities.Thisbookavoidsspecialized,academicvocabularyabouttherelationalmodel,focusinginsteadontheimportantprinciplesandtheminimumtasksneededtoproduceagoodstructure.Wethensawourmaincasestudy,andwenoticedhowit'sunfortunatelyeasytobuildwide,inefficienttables.[16] DataCollectingInordertostructuredata,onemustfirstgatherdataelementsandestablishthedomaintowhichthisdataapplies.Thischapterdealswithrawdatainformationthatcomesfromtheusersorothersources,andthetechniquesthatcanhelpustobuildacomprehensivedatacollection.Thiscollectionwillbecomeourinputforallfurtheractivitieslikedatanamingandgrouping.Tobeabletobuildadatacollection,wewillfirstidentifythelimitsofthesystem.Thiswillbefollowedbygatheringdocumentsinordertofindsignificantdataelements.Thenextstepwillbetoconductinterviewswithkeyusersinordertorefinethelistofdataelements.Allthesestepsaredescribedinthischapter.SystemBoundariesIdentificationLet'sestablishthescenario.Wehavebeencalledbyalocalcardealertosubmitaproposalaboutanewinformationsystem.Thestatedgoalistoproducereportsaboutcarsalesandtohelptrackthecarinventory.Reportsare,ofcourse,anoutputofthefuturesystem.Theideahiddenbehindreportscouldbetoimprovesales,tounderstanddeliverydelays,ortofindoutwhysomecarsdisappear.Thedatastructureitselfisprobablynotreallyimportantintheusers'opinion,butweknowthatthisstructurematterstothedeveloperswhoproducetherequiredoutput.It'simportanttofirstlookattheprojectscope,beforestartingtoworkonthedetailsofthesystem.Doestheprojectcover:•Thecompleteenterprise•Justoneadministrativearea•Multipleadministrativeareas•Onefunctionoftheenterprise DataCollectingAnorganizationalwayshasamainpurpose;itcanbesellingcars,teaching,orprovidingwebsolutions.Inadditiontothis,everyorganizationhassub-activitieslikehumanresourcemanagement,payroll,andmarketing.Theapproachtodatacollectingwillvary,dependingupontheexactareawearedealingwith.Let'ssaywelearnthatourcardealeralsooperatesarepairshop,whichhasitsowninventory,alongwithacarrentalservice.Doweincludetheseinventoriesinouranalyzingtasks?Wehavetocorrectlyunderstandtheplaceofthisnewinformationsysteminitscontext.Whenpreparingadatamodel,thebiggestchallengeisprobablytodrawaline,toclearlystatewheretostop.Thisischallengingforvariousreasons:•Ourusermighthaveonlyavagueideaofwhattheywant,ofthebenefitstheyexpectfromthenewsystem•Conflictinginterestsmightexistbetweenourfutureusers;someofthemmightwanttoprioritizeissuesinadifferentwayfromothers,maybebecausetheyareinvolvedwiththetedioustasksthatthenewsystempromisestoeliminate•Wemightbetemptedtoimproveenterprise-wideinformationflowbeyondthescopeofthisparticularprojectIt'snotaneasytasktobalanceuser-perceivedgoalswiththeneedsoftheorganizationasawhole.ModularDevelopmentItisgenerallyadmittedthatbreakingaproblemortaskintosmallerpartshelpsustofocusonmoremanageableunitsand,inthelongrun,permitsustoachieveabettersolution,andacompletesolution.Havingsmallersegmentsmeansthatdefiningeachpart'spurposeissimplerandthatthetestingprocessiseasier–asasmallersegmentcontainslessdetails.Thisiswhy,whenestablishingthesystemboundaries,weshouldthinkintermsofdevelopingbymodules.Inourcasestudy,asimplewayofdividingintomoduleswouldbethefollowing:•Module1:carsales•Module2:carinventoryDeliveringaninformationsysteminincrementalstepscanhelpreassurethecustomeraboutthefinalproduct.Definingthemodulesandascheduleaboutthemcanmotivateusersandalsothedevelopers.Withapublicizedschedule,everyoneknowswhattoexpect.Withtheideaofmodulescomestheideaofbudgetandthenotionofprioritiesfordevelopment.Dowehavetodeliverthecarsalesmodulebeforeoraftertheinventorymodule?Canthosemodulesbedoneseparately?Aretheresomeconstraintsthatmust[18] Chapter2beaddressed,likeanewreportaboutthecarsalesthattheChiefExecutiveOfficer(CEO)needsbyJune20?Anotherpointtotakeintoaccountishowthemodulesarerelated.Chancesaregoodthatsomedatawillbesharedbetweenmodules,sothedatamodelpreparedformodule1willprobablybereusedandrefinedduringmodule2developments.ModelFlexibilityAnotherpointnotdirectlyrelatedtoouruserbuttousasdevelopersis:canthedatamodelbebuilttobeflexibleandmoregeneral?Thisway,itcouldbeappliedtoothercardealers,alwayskeepinginmindcontractissuesbetweenthedeveloperandtheuser.(Whowillownthework?)Shouldthedatastructurebedevelopedwithothersalesdomainsinmind?Forinstance,thiscouldleadtoatablenamedgoodsinsteadofcars.Maybethiskindofgeneralizationcanhelp,maybenot,becausedataelementsdescriptionmustalwaysremainclear.DocumentGatheringThisstepcanbedonebeforetheinterviews.Thegoalistogatherdocumentsaboutthisorganizationandstartdesigningourquestionsfortheinterviews.Ofcourse,adatamodelforcarsaleshassomethingsincommonwithothersalessystems,butthereisaspecialcultureaboutcars.Anothersetofdocumentswillbecollectedduringtheinterviewswhilewelearnabouttheformsusedbytheinterviewees.GeneralReadingHerearesomereadingsuggestions:•Enterpriseannualreport•Corporategoalsstatement•President'sspeech•Publicitymaterial•BulletinboardIoncelearnedalotaboutinformationflowfromagrocerystore'sbulletinboardfortheemployees.Thereweresmallnotesfrommanagementtoemployeesexplaininghowtohandleclientswhopaybycheque(whichpersonalinformationmustbeobtainedfromtheclientbeforethestorecanaccepttheircheque),anddetailingthescheduleforsickemployees'replacement.Alsoexplainedontheboard,wastheproceduretouseonthecashregistertogiverewardpointstoclientswhopaywiththestore'screditcard.Thisinformationissometimesmoreusefulthananannual[19] DataCollectingreportbecauseweareseekingdetailsfromthepersonswhoareinvolvedwiththedailytasks.FormsTheforms,whichrepresentpaperworkbetweentheenterpriseandexternalpartners,orbetweeninternaldepartments,shouldbescrutinized.Theycanrevealamassiveamountofdata,eveniffurtheranalysisshowsunused,imprecise,orredundantdata.Manyorganizationssufferfromtheformdisease–atendencytousetoomanypaperatendencytousetoomanypaperorscreenformsandtoproducetoocomplexforms.Nonetheless,ifweareabletolookattheformscurrentlyusedtoconveyinformationaboutthecarinventoryorcarsales,forexample,apurchaseorderfromthecardealertothemanufacturer,wemightfindontheseformsessentialdataaboutthepurchasethatwillbeusefultocompleteourdatacollection.ExistingComputerizedSystemsThecardealerhasalreadystartedsalesoperationsanumberofyearsago.Tosupportthesesales,theywereprobablyusingsomekindofcomputerizedsystem,evenifthiscouldhavebeenonlyaspreadsheet.Thispre-existingsystemsurelycontainsinterestingdataelements.Weshouldtrytohavealookatthisexistinginformationsystem,ifoneexists,andifweareallowedto.Regardingthedatastructuringprocessitself,wecanlearnaboutsomedataelementsthatarenotseenonthepaperforms.Also,thiscanhelpwhenthetimecomestoimplementanewsystembyeasingtransitionandtraining.InterviewsThegoalforconductinginterviewsistolearnaboutthevocabularypertainingtothestudiedsystem.Thisbookisaboutdatastructures,buttheinformationgatheredduringtheinterviewscansurelyhelpinsubsequentactivitiesofthesystem'sdevelopmentlikecoding,testing,andrefinements.Interviewsareacriticalpartofthewholeprocess.Inourexample,acustomeraskedforasystemaboutcarsalesandinventorytracking.Atthispoint,manyuserscannotexplainfurtherwhattheywant.Theproblemisexactlythis:howcanI,asadeveloper,findoutwhattheywant?Aftertheinterviewphase,thingsbecomeclearersincewewillhavegathereddataelements.Moreover,oftenthecustomerwhoorderedanewsystemdoesnotgraspthedataflow'sfullpicture;itmightalsohappenthatthiscustomerwon'tbetheonewhowillworkwithallaspectsofthesystem,thosewhicharemoretargetedtowardsclericalpersons.[20] Chapter2FindingtheRightUsersThesuggestedapproachwouldbetocontactthebestpersonforthequestionsaboutthenewsystem.Sometimes,thepersoninchargeinsiststhathe/sheisthebestperson,itmightbetrue,ornot.Thiscanbecomedelicate,especiallyifwefinallymeetsomeonewhoknowsbetter,evenifthisisduringaninformalmeeting.Thinkingaboutthefollowingissuescanhelptofindthebestcandidates:•Whowantsthissystembuilt?•Whowillprofitfromit?•Whichuserswouldbemostcooperative?Evidently,thiscanleadtomeetingwithseveralpeopletoexplorethevarioussub-domains.Someofthesedomainsmightintersect,withapotentialnegativeimpact–divergingopinions,orwithapotentialpositiveimpact–validatingfactswithmorethanoneinterviewee.PerceptionsDuringtheinterviews,wewillmeetdifferentkindsofusers.Someofthesewillbeveryknowledgeableabouttheprocessesinvolvedwiththecardealer'sactivities,forexample,meetingwithapotentialcustomer,invitingthemforatestdrive,andorderingacar.Someotheruserswillonlyknowapartofthewholeprocess,theirknowledgescopeislimited.Duetothevaryingscope,wewillheardifferentperceptionsaboutthesamesubject.Forexample,talkingabouthowtoidentifyacar,wewillheardivergingopinions.Somewillwanttoidentifyacarwithitsserialnumber;otherswillwanttousetheirownin-housecarnumber.Theyallrefertothesamecarwithadifferentangle.Thesevariousopinionswillhavetobereconciledlaterwhenproceedingwiththedatanamingphase.AskingtheRightQuestionsTherearevariouswaystoconsiderwhichquestionsarerelevantandwhichwillenableustogathersignificantdataelements.ExistingInformationSystemsIsthereanexistinginformationsystem:manualorcomputerized?Whatwillhappenwiththisexistingsystem?Eitherweexportrelevantdatafromthisexistingsystemtofeedthenewone,tocompletelydoawaywiththeoldsystem,orwekeeptheexistingsystem–temporarilyorpermanently.[21] DataCollectingIfwemustkeeptheexistingsystem,we'llprobablybuildabridgebetweenthetwosystemsforexchangingdata.Inthiscase,doweneedaone-waybridgeoratwo-waybridge?ChronologicalEventsWhoordersacarfortheshowroomandwhy;howistheordermade–phone,fax,email,website;canacarintheshowroombesoldtoacustomer?SourcesandDestinationsHerewequestionaboutinformation,money,bills,goods,andservices.Forexample,whatisthesourceofacar?What'sitsdestination?Isthebuyerofacaralwaysanindividual,orcanitbeanothercompany?UrgencyThinkingaboutthecurrentwayinwhichyoudealwithinformation,whichproblemsdoyouconsiderthemosturgenttosolve?AvoidFocusingonReportsandScreensAnapproachtoocenteredonthe(perceived)needsoftheusersmayleadtogapsinthedatastructure,becauseeachuserdoesnotnecessarilyhaveanaccuratevisionofalltheirneedsoralltheneedsofotherusers.It'squiterareinanenterprisetofindsomeonewhograspsthewholedatapicture,withthecomplexinter-departmentalinteractionsthatfrequentlyoccur.Thisbiaswillshowupduringtheinterviews.Usersareusuallymorefamiliarwithitemstheycanseeorvisualizeandlessfamiliarwithconcepts.However,therearedistinctionsbetweentheuserinterface(UI)andtheunderlyingdata.UIdesignconsidersergonomicandaestheticissues,whereasdatastructuringhastofollowdifferent,non-visualrulestobeeffective.DataCollectedforourCaseStudyHereisalist,jotteddownduringtheinterviews,ofpotentialdataelementsanddetailswhichseemimportanttothecurrentinformationflow.It'sveryimportantduringthiscollectiontonote,notonlythedataelements'names–shallwesay"provisionalnames"atthispoint–butalsosamplevalues.ThebenefitofthiswillbecomeapparentinChapter3.Inthefollowingdatacollection,weincludesamplevaluesinbracketswhereappropriate.[22] Chapter2FromtheGeneralManagerOurfriendtheGeneralManagerkeepssurveysfilledbybuyersabouttheirbuyingexperienceasawhole.Thosesurveyscontainremarksaboutthesalespersonbehavior.Evidently,thisinformationisconfidential,asonlytheGeneralManagerandtheofficeclerkhaveaccesstoit.Surveyinformationincludes:•Date:(2006-01-02)•Salesperson'sname:(Harper,Paul)•Buyer'sname:(Smith,Joe)•Thepointstoevaluate:courtesy,qualityofinformationgiven,etc•Foreachpoint,themarkgivenbythebuyerfromonetoten.FromtheSalespersonThemainformpreparedbyasalespersonistheSalesContract,andthispersonsurelyhopestoprepareplentyofthese!HerearetheelementspresentontheSalesContract:•Buyer'sinformation:name,address,postalcode,phonenumber•Dealer'sinformation:name,address,postalcode,phonenumber•Salespersoninformation:name,address,postalcode,phonenumber•Quantityofvehiclesforthissale(usually1)•Cardescription:brand,model,year(FontaxMitsou2007)•Carcondition:new/used•Carserialnumber:(D34HTT987)•Carcolor:(aquamarine)color:(aquamarine)•Sellingprice:(32,500)•Insurancecompanyname:(MicMacCarInsuranceInc.)•Insurancepolicynumber:(J44-5764,buteachcompanyhasitsowncodesystemforthis)•Preparationcost:(800)•Taxamount:(2,400)•Totalprice:(35,700)•Vehiclegivinginexchange:°brand:(Licorne)°model:(Wanderer)[23] DataCollecting°year:(2006)°serialnumber:(D45TGH45738)°priceoftheexchange:(12,000)•Downpayment:(4,000)•Interestrate:(9%)•Interestamount:(6345)•Typeofcreditrate:fixed/variable•Datesoffirstandlastpayments:(2007-07-01,2011-06-01)•Numberofpayments:(48)•Financialinstitution'sinformation:name,address,postalcode,phonenumberFromtheStoreAssistantAstoreassistantassignsacarnumbertoeachvehiclethatentersthefloor.Thishelpstomanagewhichsetofkeysbelongstowhichcar,werefertophysicalkeyshere–thekeysneededtounlockandstartthecar,notthedatabasekeys.Thecarnumberdoesnotrefertothecar'sserialnumber;it'sassignedsequentiallyandusedinternallyonly.Storeassistantsalsoprepareadeliverycertificatewhichcontainsthefollowinginformation:•Buyer'sname:(JoeSmith)•Dealer'snumber:(53119)•Vehicleidnumber:(1400)•Keynumber:(81947)•Foursignaturesanddates,fromthebuyer,generalmanager,salesperson,andthestoreassistantFinally,thestoreassistantskeeparegisteraboutallcarmovements.Foreachcar,acard-indexcontains:•Idnumberofthecar:(432)•Carordered:date(2007-02-03)•Cararrived:date(2007-02-17)•Carplacedintheshowroom:date(2007-02-19)•Carwashed:date(2007-05-30)[24] Chapter2•Cargastankfilled-up:date(2007-05-30)•Cardeliveredtobuyer:date(2007-06-01)OtherNotes•Doweincludeinthemodelsomeinformationabouttheoldcarthatthecustomerexchangesfortheirnewcar?•Boundary:duringtheinterviewsitwasdecidedthat,fornow,themodelwillnotincludethedealer'scarrentalactivities,northeirrepairservice,althoughmuchoftheinformationaboutcarscouldbeappliedtothoseactivities.Thesubsequentchapterswillputorderinthenamingaspectsofthisdataandwillexplaingroupingtechniques.SummaryBuildingacomprehensivecollectionofdataelementsisessentialtothesuccessofadatastructuringactivity.However,weneedtoknowtheexactlimitsoftheanalyzedsystem.Then,bygatheringdocumentsandproceedingwithinterviewactivities,wecanrecordalistofpotentialdataelements–ourfuturecolumnnames.[25] DataNamingInthischapter,wefocusontransformingthedataelementsgatheredinthecollectionprocessintoacohesivesetofcolumnnames.Althoughthischapterhassectionsforthevariousstepsweshouldaccomplishforefficientdatanaming,thereisnospecificorderinwhichtoapplythosesteps.Infact,thewholeprocessisbrokendownintostepstoshedsomelightoneachoneinturn,buttheactualnamingprocessappliesallthosestepsatthesametime.Moreover,thedivisionbetweenthenamingandgroupingprocessesissomewhatartificial–you'llseethatsomedecisionsaboutnaminginfluencethegroupingphase,whichisthesubjectofthenextchapter.DataCleaningHavinggatheredinformationelementsfromvarioussources,somecleaningworkisappropriatetoimprovethesignificanceoftheseelements.Thewayeachintervieweenamedelementsmightbeinconsistent;moreover,thesignificanceofatermcanvaryfrompersontoperson.Thus,asynonymdetectionprocessisinorder.Sincewetooknoteofsamplevalues,nowitistimetocross-referenceourlistofelementswiththosesamplevalues.Hereisapracticalexample,usingthecar'sidnumber.Whenthedecisionismadetoorderacar–aMitsou2007–theofficeclerkopensanewfileandassignsasequentialnumberdubbedcar_idnumbertothefile,forinstance,725.Atthispoint,noconfirmationhasbeenreceivedfromanycarsupplier,sotheclerkdoesnotknowthefuturecar'sserialnumber–auniquenumberstampedontheengineandothercriticalpartsofthevehicle.Thiscar'sidnumberisreferredtoasthecar_numberbytheofficeclerk.Thestoreassistantswhoregistercarmovementsusethenamestock_number.Butusingthiscarnumberorthestocknumberisnotmeaningfulforfinancingandinsurancepurposes;thecar'sserialnumberisusedinsteadforthatpurpose. DataNamingAtthispoint,aconsensusmustbereachedbyconvincingusersabouttheimportanceofstandardterms.Itmustbecomecleartoeveryonethatthetermcar_numberisnotpreciseenoughtobeused,soitwillbereplacedbycar_internal_numberintheinthedataelementslist,probablyalsoinanyuserinterface(UI)orreport.Itcanbearguedthatcar_internal_numbershouldbereplacedbysomethingmoreappropriate;theimportantpointhereiswemergedtwosynonyms:car_numberandstock_number,andestablishedthedifferencebetweentwoelementsthatlookedandestablishedthedifferencebetweentwoelementsthatlookedsimilarbutwerenot,eliminatingasourceofconfusion.Thereforeweendupwiththefollowingelements:•Car_serial_number•Car_internal_number(formercaridnumberandstocknumber)Eventually,whendealingwithdatagrouping,anotherdecisionwillhavetobetaken:towhichnumber,serialorinternal,doweassociatethecar'sphysicalkeynumber.SubdividingDataElementsInthissection,wetrytofindoutifsomeelementsshouldbebrokenintomoresimpleones.Thereasonfordoingsoisthat,ifanelementiscomposedofmanyparts,applicationswillhavetobreakitforsortingandselectionpurposes.Thusit'sbettertobreaktheelementsrightnowatthesource.Recomposingitwillbeeasierattheapplicationlevel.BreakingtheelementsprovidesmoreclarityattheUIlevel.Therefore,atthislevelwewillavoid(asmuchaspossible)thewell-knownlast-name/first-nameinversionproblem.Asanexampleforthisproblem,let'stakethebuyer'sname.Duringtheinterview,wenoticedthatthenameisexpressedinvariouswaysontheforms:FormHowthenameisexpressedDeliverycertificateMrJoeSmithSalescontractSmith,JoeWenoticethat•Thereisasalutationelement,Mr•Theelementnameistooimprecise;wereallyhaveafirstnameandalastname•Onthesalescontract,thecommaafterourlastnameshouldreallybeexcludedfromtheelement,asit'sonlyaformattingcharacter[28] Chapter3Asaresult,wedeterminethatweshouldsub-dividethenameintothefollowingelements:•Salutation•Firstname•LastnameSometimesit'susefultosub-divideanelement,sometimesit'snot.Let'sconsiderthedateelements.Wecouldsub-divideeachoneintoyear,month,andday(threeintegers)butbydoingso,wewouldlosethedatecalculationpossibilitiesthatMySQLoffers.Amongthoseare,findingtheweekdayfromadate,ordeterminingthedatethatfallsthirtydaysafteracertaindate.Soforthedate(andtime),asinglecolumncanhandleitall,althoughattheUIlevel,separateentryfieldsshouldbedisplayedforyear,month,andday.Thisistoavoidanypossibilityofmix-upandalsobecausewecannotexpectuserstoknowaboutwhatMySQLacceptsasavaliddate.Thereisacertainlatitudeintherangeofvalidvaluesbutwecantakeitforgrantedthatusershaveunlimitedcreativity,regardinghowtoenterinvalidvalues.IfasinglefieldispresentontheUI,cleardirectionsshouldbeprovidedtohelpwithfillingthisfieldcorrectly.DataElementsContainingFormattingCharactersThelastcasewe'llexamineisthephonenumber.Inmanypartsoftheworld,thephonenumberfollowsaspecificpatternandalsousesformattingcharactersforlegibility.InNorthAmerica,wehavearegionalcode,anexchangenumber,andphonenumber,forexample,418-111-2222;anextensioncouldpossiblybeappendedtothephonenumber.However,inpracticeonlytheregionalcodeandextensionareseparatedfromtherestintodataelementsoftheirown.Moreover,peopleoftenenterformattingcharacterslike(418)111-2222andexpectthosetobeoutputback.So,astandardoutputformatmustbechosen,andthenthecorrectnumberofsub-elementswillhavetobesetintothemodeltobeabletorecreatetheexpectedoutput.DatathatareResultsEventhoughitmightseemnaturaltohaveadistinctelementforthetotal_priceofthecar,inpracticethisisnotjustified.Thereasonisthatthetotalpriceisacomputedresult.Havingthetotalpriceprintedonasalescontractconstitutesanoutput.Thus,weeliminatethisinformationinthelistofcolumnnames.Forthesamereason,wecouldomitthetaxcolumnbecauseitcanbecomputed.[29] DataNamingByremovingthetotalpricecolumn,wecouldencounterapitfall.Wehavetobesurethatwecanreconstructthistotalpricefromothersub-totalelements,nowandinthefuture.Thismightnotbepossibleforanumberofreasons:•Thetotalpriceincludesanamountlocatedinanothertable,andthistablewillchangeovertime(forexample,thetaxrate).Toavoidthisproblem,seetherecommendationsintheScalabilityoverTimesectioninChapter4.•Thistotalpricecontainsanarbitraryvalue,duetosomeexceptionalcases,forexample,wherethereisaspecialsale,andtherebatewasnotplannedinthesystem,orwhentheluckybuyeristhebrother-in-lawofthegeneralmanager!Inthiscase,adecisioncanbemade:addinganewcolumnother_rebate.DataasaColumn'sorTable'sNameNowisthetimetouncoverwhatisperhapstheleastknownofthedatanamingproblems:datahiddeninacolumn'sorevenatable'sname.WehadoneexampleofthisinChapter1.Remembertheqty_2006_1columnname.Althoughthisisacommonlyseenmistake,it'samistakenonetheless.Weclearlyhavetwoideashere,thequantityandthedate.Ofcourse,tobeabletousejusttwocolumns,someworkwillhavetobedoneregardingthekeys–thisiscoveredinChapter4.Fornow,weshouldjustuseelementslikequantityanddateinourelementslist,avoidingrepresentingdatainacolumn'sname.Tofindthoseproblematiccasesinourmodel,apossiblemethodistolookfornumbers.Columnnameslikeaddress1,address2orphone1,phone2shouldlooksuspicious.Now,havealookinChapter2atthedataelementswegotfromourstoreassistant.Canyoufindacaseofdatabeinghiddeninacolumnname?Ifyouhavedonethisexercise,youmighthavefoundmanypastparticipleshiddenintothecolumnnames,likeordered,arrived,andwashed.Thesedescribetheeventsthathappentoacar.Wecouldtrytoanticipateallpossibleeventsbutitmightproveimpossible.Whoknowswhenanewcolumncar_provided_with_big_ribbonwillbeneeded?Suchevents,iftreatedasdistinctcolumnnames,mustbeaddressedby•Achangeinthedatastructure•Achangeinthecode(UIandreports)Tostayflexibleandavoidthewide-tablesyndrome,weneedtwotables:car_eventandevent.[30] Chapter3Herearethestructureandsamplevaluesforthosetables:CREATETABLE`event`(`code`int(11)NOTNULL,`description`char(40)NOTNULL,PRIMARYKEY('code'))ENGINE=MyISAMDEFAULTCHARSET=latin1;INSERTINTO`event`VALUES(1,'washed');Theusageofbacktickshere('event'),althoughnotstandardSQL,isaMySQLextensionusedtoencloseandprotectidentifiers.Inthisspecificcase,itcouldhelpuswithMySQL5.1inwhichtheeventkeywordisscheduledtobecomepartofthelanguageforsomeanotherpurpose(CREATEEVENT).Atthetimeofwriting,betaversionMySQL5.1.11acceptsCREATETABLEevent,butitmightnotalwaysbetrue.ThefollowingimageshowssamplevaluesenteredintotheeventtablefromwithintheInsertsub-pageofphpMyAdmin:CREATETABLE`car_event`(`internal_number`int(11)NOTNULL,`moment`datetimeNOTNULL,`event_code`int(11)NOTNULL,PRIMARYKEY('internal_number'))ENGINE=MyISAMDEFAULTCHARSET=latin1;INSERTINTO`car_event`VALUES(412,'2006-05-2009:58:38',1);[31] DataNamingAgain,samplevaluesareenteredviaphpMyAdmin:Datacanalsohideinatablename.Let'sconsiderthecarandtrucktables.Theyshouldprobablybemergedintoavehicletable,sincethevehicle'scategory–truck,car,andothervalueslikeminivanisreallyanattributeofaparticularvehicle.Wecouldalsofindanothercaseforthistablenameproblem:atablenamedvehicle_1996.PlanningforChangesWhendesigningadatastructure,wehavetothinkabouthowtomanageitsgrowthandthepossibleimplicationsofthechosentechnique.Let'ssayanunplannedcarcharacteristic–theweight–hastobesupported.Thenormalwayofsolvingthisistofindthepropertableandaddacolumn.Indeed,thisisthebestsolution;however,someonehastoalterthetable'sstructure,andprobablytheUItoo.Thefreefieldstechnique,alsocalledsecond-leveldataorEAV(Entity-Attribute-Value)techniqueissometimesusedinthiscase.Tosummarizethistechnique,weusedinthiscase.Tosummarizethistechnique,weuseacolumnwhosevalueisacolumnnamebyitself.Evenifthistechniqueisshownhere,Idonotrecommendusingit,forthereasonsexplainedinthePitfallsoftheFreeFieldsTechniquesectionbelow.Thedifferencebetweenthistechniqueandourcar_eventtableisthat,forcar_event,thevariousattributescanallberelatedtoacommonsubject,whichistheevent.Onthecontrary,freefieldscanstoreanykindofdissimilardata.Thismightalsobeawaytostoredataspecifictoasingleinstanceorrowofatable.[32] Chapter3Inthefollowingexample,weusethecar_free_fieldtabletostoreunplannedinformationaboutthecarwhoseinternal_numberis412.Theweightandspecialpainthadnotbeenplanned,sotheUIgavetheuserthechancetospecifywhichinformationtheywanttokeep,andthecorrespondingvalue.WeseehereascreenshotfromphpMyAdminbutmostprobably,anotherUIwouldbepresentedtotheuser–forexamplethesalespersonwhomightnotbetrainedtoplayatthedatabaselevel.CREATETABLE`car_free_field`(`internal_number`int(11)NOTNULL,`free_name`varchar(30)NOTNULL,`free_value`varchar(30)NOTNULL,PRIMARYKEY('internal_number','free_name'))ENGINE=MyISAMDEFAULTCHARSET=latin1;INSERTINTO`car_free_field`VALUES(412,'weight','2000');INSERTINTO`car_free_field`VALUES(412,'specialpaintneeded','gold');PitfallsoftheFreeFieldsTechniqueEvenifit'stemptingtousethiskindoftableforaddedflexibilityandtoavoiduserinterfacemaintenance,thereareanumberofreasonswhyweshouldavoidusingit.•Itbecomesimpossibletolinkthis"column"(forexamplethespecialpaintneeded)toalookuptablecontainingthepossiblecolors,withaforeignkeyconstraint.[33] DataNaming•Thefree_valuefielditselfmustbedefinedwithagenericfieldtypelikeVARCHARwhosesizemustbewideenoughtoaccommodateallvaluesforallpossiblecorrespondingfree_namevalues.•Itpreventseasyvalidation(foraweight,weneedanumericvalue).•CodingtheSQLqueriesonthesefreefieldsbecomesmorecomplex–i.e.SELECTinternal_numberfromcar_free_fieldwherefree_name='weight'andfree_value>2000.NamingRecommendationsHerewetouchasubjectthatcanbecomesensitive.Establishinganamingconventionisnoteasilydone,becauseitcaninterferewiththepsychologyofthedesigners.Designer'sCreativityProgrammersanddesignersusuallythinkofthemselvesasimaginative,creativepeople;UIdesignanddatamodelaretheareasinwhichtheywanttoexpressthosequalities.Sincenamingiswriting,theywanttoputapersonalstamptothecolumnandtablenames.Thisiswhyworkingasateamfordatastructuredesignnecessitatesagooddoseofhumilityandachievesgoodresultsonlyifeveryoneisagoodteamplayer.Also,whenlookingattheworkofothersinthisarea,thereisagreattemptationtoimprovethedataelementsnames.Somedisciplineinthestandardizationhastobeappliedandalltheteammembershavetocollaborate.AbbreviationsProbablybecauseolderdatabasesystemshadsevererestrictionsabouttherepresentationofvariablesanddataelementsingeneral,thepracticeofabbreviatinghasbeentaughtformanyyearsandisfollowedbymanydatastructuredesignersandprogrammers.Iusedprogramminglanguagesthatacceptedonlytwocharactersforvariablenames–wehadtoextensivelycommentthecorrespondencebetweenthosecroppedvariablesandtheirmeaning.Nowadays,Iseenovalidreasonsforsystematicallyabbreviatingallcolumnandtablenames;afterall,whowillunderstandthemeaningofyourT1tableoryourB7field?[34] Chapter3ClarityversusLength:anArtAconsistentstyleofabbreviationsshouldbeused.Ingeneral,onlythemostmeaningfulwordsofasentenceshouldbeputintoaname,droppingprepositions,andothersmallwords.Asanexample,let'stakethepostalcode.Wecouldexpressthiselementwithdifferentcolumnnames:•the_postal_code•pstl_code•pstlcd•postal_codeIrecommendthelastoneforitssimplicity.SuffixingCarefullychosensuffixescanaddclaritytocolumnnames.Asanexample,forthedateoffirstpaymentelement,Iwouldsuggestfirst_payment_date.Infact,thelastwordofacolumnnameisoftenusedtodescribethetypeofcontent–likecustomer_no,color_code,interest_amount.ThePluralFormAnotherpointofcontroversyfortablenames:shouldweusethepluralformcarstable?Itcanbearguedthattheanswerisyesbecausethistablecontainsmanycars–inotherwords,itisaset.Nonetheless,Itendnottousethepluralformforthesimplereasonthatitaddsnothingintermsofinformation.Iknowthatatableisaset,sousingthepluralformwouldberedundant.Itcanbesaidalsothateachrowdescribesonecar.Ifweconsiderthesubjectontheangleofqueries,wecandrawdifferentconclusionsdependingonthequery.Aqueryreferringtothecartable–selectcar.color_codefromcarwherecar.id=34ismoreelegantifthepluralformisnotused,becausethemainideahereisthatweretrieveonecarwhoseidequals34.Someotherqueriesmightmakemoresensewithaplural,likeselectcount(*)fromcars.Asaconclusionforthissection,thedebateisnotover,butthemostimportantpointistochooseaformandbeconsistentthroughoutthewholesystem.[35] DataNamingNamingConsistencyWeshouldensurethatadataelementthatispresentinmorethanonetableisrepresentedeverywherebythesamecolumnname.InMySQL,acolumnnamedoesnotexistbyitself;itisalwaysinsideatable.Thisiswhy,unfortunately,wecannotpickupconsistentcolumnnamesfrom,say,apoolofstandardizedcolumnnamesandassociateitwiththetables.Instead,duringeachtable'screationweindicatetheexactcolumnnameswewantandtheirattributes.So,let'savoidusingdifferentnames–internal_numberandinternal_numwhentheyrefertothesamereality.Anexceptionforthis:ifthecolumn'snamereferstoakeyinanothertable–thestatecolumn–andwehavemorethanonecolumnreferringtoitlikestate_of_birth,`state_of_residence`.MySQL'sPossibilitiesversusPortabilityMySQLpermitstheuseofmanymorecharactersforidentifiers–database,table,andcolumnnamesthanitscompetitors.Theblankspaceisacceptedasareaccentedcharacters.Thesimpletrade-offisthatweneedtoenclosesuchspecialnameswithbackquoteslike'stateofresidence'.Thisprocuresagreatlibertyintheexpressionofdataelements,especiallyfornon-Englishdesigners,butintroducesastateofnon-portabilitybecausethoseidentifiersarenotacceptedinstandardSQL.EvensomeSQLimplementationsonlyacceptuppercasecharactersforidentifiers.Irecommendbeingveryprudentbeforedecidingtoincludesuchcharacters.EvenwhenstayingfaithfultoMySQL,therehasbeenaportabilityissuebetweenversionsearlierthan4.1whenupgradingto4.1.In4.1.x,MySQLstartedtorepresentidentifiersinternallyinUTF-8code,soarenamingoperationhadtobedonetoensurethatnoaccentedcharactersinthedatabase,table,columnandconstraintnameswerepresentbeforetheupgrade.Thistediousoperationisnotverypracticalina24/7systemavailabilitycontext.TableNameintoaColumnNameAnotherstyleIoftensee:onewouldsystematicallyaddthetablenameasaprefixtoeverycolumnname.Thustheeverycolumnname.Thusthecolumnname.Thusthecartablewouldbecomprisedofthecolumns:car_id_number,car_serial_number.Ithinkthisisredundantanditshowsitsinelegancewhenexaminingthequerieswebuild:selectcar_id_numberfromcarisnottoobad,butwhenjoiningtableswegetaquerysuchasselectcar.car_id_number,buyer.buyer_namefromcar,buyer[36] Chapter3Sinceattheapplicationlevel,themajorityofquerieswecodearemulti-tablesliketheoneusedabove,theclumsinessofusingatablenameevenabbreviatedaspartofcolumnnamesbecomesreadilyapparent.Ofcourse,thesameexceptionwesawintheNamingConsistencysectionapplies:acolumn–foreignkey–referringtoalookuptablenormallyincludesthistable'snameaspartofthecolumn'sname.Forexample,inthecar_eventtable,wehaveevent_codewhichreferstothecodecolumnintableevent.SummaryTogetaclearandunderstandabledatastructure,properdataelementsnamingisimportant.Weexaminedmanytechniquestoapplyinordertobuildconsistenttableandcolumnnames.[37] DataGroupingInthepreviouschapters,webuiltadatacollection,andstartedtocleanitbypropernaming.Wehadalreadyintroduced,inChapter1,thenotionofatable,whichlogicallyregroupsinformationaboutacertainsubject.Someofthecolumnswegatheredweregroupedintotablesduringthenamingprocess.Whiledoingso,wenoticedthattheprocessofnamecheckingwassometimesleadingustodecomposedataintomoretables,likewedidforthecar_eventandeventtables.Thegoalofthepresentchapteristoprovidefinishingtouchestoourstructure,byexaminingthetechniqueofgroupingcolumnnamesintotables.Ourdataelementswon'tbelivingOurdataelementswon'tbeliving"intheair";theywillhavetobeorganizedintotables.Exactlywhichcolumnsmustbeplacedintowhichtablewillbeconsideredhere.InitialListofTablesWhenbuildingthestructure,wecanstartbyfindinggeneral,naturalsubjectswhichlookpromisingforgroupingdata.Thesesubjectswillprovideourinitiallistoftables–hereisanabridgedexampleofwhatthislistmightlooklike:•vehicle•customer•event•vehiclesale•customersatisfactionsurveyWe'llbeginourcolumnsgroupingworkbyconsideringthevehicletable. DataGroupingRulesforTableLayoutTherecanbemorethanonecorrectsolution,butanycorrectsolutionwilltendtorespectthefollowingprinciples:•eachtablehasaprimarykey•noredundantdataispresentwhenconsideringalltablesasawhole•allcolumnsinatabledependdirectlyuponallsegmentsoftheprimarykeyTheseprincipleswillbestudiedindetailsinthefollowingsections.PrimaryKeysandTableNamesLet'sstartbydefiningtheconceptofauniquekey.Acolumnonwhichauniquekeyisdefinedcannotholdthesamevaluemorethanonceforthistable.Theprimarykeyiscomposedofoneormorecolumns,itisavaluethatcanbeusedtoidentifyaisavaluethatcanbeusedtoidentifyauniquerowinatable.Whydoweneedaprimarykey?MySQLitselfdoesnotforceWhydoweneedaprimarykey?MySQLitselfdoesnotforceustohaveaprimarykey,neitherauniquekeynoranyotherkindofkey,foraspecifictable.ThusMySQLputsusundernoobligationtofollowCodd'srules.However,inpracticeit'simportanttohaveaprimarykey;experienceacquiredwhilebuildingwebinterfacesandotherapplicationsshowsthatit'sveryusefultobeabletorefertoakeyidentifyingarowinauniqueway.InMySQL,aprimarykeyisauniquekeywhereallcolumnshavetobedefinedasNOTNULL;thenameofthiskeyisPRIMARY.Choosingtheprimarykeyisdonealmostatthesametimeaschoosingthetable'sname.Selectingthenameofourtablesisadelicateprocess.Wehavetobegeneralenoughtoprovideforfutureexpansion–likethevehicletableinsteadofcarandtruck.Atthesametime,wetrytoavoidhavingholes–emptycolumnsinourtables.Todecideifweshouldhaveavehicletableortwoseparatetables,welookatthepossibleattributesforeachkindofvehicle.Aretheycommonenough?Bothvehicletypeshaveacolor,amodel,ayear,aserialnumber,andaninternalidnumber.color,amodel,ayear,aserialnumber,andaninternalidnumber.,amodel,ayear,aserialnumber,andaninternalidnumber.Theoretically,thelistofcolumnsmustbeidenticalforustodecidethatagroupofcolumnswillbelongtoasingletable;butwecancheatabit,ifthereareonlyafewattributesthataredifferent.Let'ssaywedecidetohaveavehicletable.Forreasonsexplainedearlier,wewanttotrackavehiclesincethemomentweorderit–we'lluseitsinternalidnumberastheprimarykey.Whendesigningthistable,weaskourselveswhetherthistablecanbeusedtostoreinformationaboutthevehicleswereceiveinexchangefromthecustomer.Theanswerisyes,sincedescribingavehiclehasnothingtodowiththetransactionsthathappentoit(newvehiclesold,usedvehicleboughtfromthecustomer).ThesectionValidatingtheStructuregivesfurtherexamplesthatcanhelpcatchingproblemsinthestructure.Hereisversion1oftheHereisversion1ofthevehicletable,with[40] Chapter4columnnamesandsamplevalues–wemarkthecolumnscomprisingtheprimarykeywithanasterisk:table:vehiclecolumnnamesamplevalue*internal_id123serial_numberD8894JFbrandLicornemodelGazelleyear2007coloroceanblueconditionnewShouldweincludethesalesinfo,forexample,pricinganddateofsale,inthistable?Wedeterminethattheanswerisnosinceanumberofthingscanhappen:•thevehiclecanberesold•thetablemightbeusedtoholdinformationaboutavehiclereceivedinexchangeWenowhavetoexamineourworkandverifythatwehaverespectedtheprinciples.Wehaveaprimarykey,butwhataboutredundancyanddependency?DataRedundancyandDependencyWheneverpossible,weshouldevacuateredundantdataintolookuptables–alsocalledreferencetablesandstoreonlythevalueofthecodesintoourmaintables.Wedon'twanttorepeat"Licorne"intoourvehicletableforeachLicornesold.Redundantdatawastesdiskspaceandincreasesprocessingtimewhendoingdatabasemaintenance:ifamodificationneedarises,allinstancesofthesamedatamustbeupdated.�Regardingegardingthevehicletable,itwouldberedundanttostoreafulldescriptivevalueinthebrand,modelandcolorcolumns–storingthreecodeswillsuffice.Wehavetobecarefulaboutevacuatingredundantdata.Forexample,wewon'tbeewon'tbecodingtheyear;thiswouldbetoomuchcodingfornosaving–usingAfor2006,Bfor2007makesnopracticalsavingofspaceafterafewthousandyears!Evenforasmallnumberofyears,thespacesavingwouldnotbesignificant;beside,wewouldlosetheabilitytodocomputationsontheyear.Next,weverifydependency.Eachcolumnmustbedependentontheprimarykey.Istheconditionnew/usenew/useddirectlydependentonthevehicle?No,ifweconsideritddirectlydependentonthevehicle?No,ifweconsideritdirectlydependentonthevehicle?No,ifweconsiderit[41] DataGroupingoverthetimedimension.Intheory,thedealercansellacar,andthenacceptitlaterinexchange.Theconditionisrelatedmoretothetransactionitself,foraspecificdate,soitreallybelongstothesaletable–shownhereinanon-finalstate.Wenowhaveversion2:table:vehiclecolumnnamesamplevalue*internal_id123serial_numberD8894JFbrand_codeLmodel_codeGyear2007color_code1A6table:brandcolumnnamesamplevalue*codeLdescriptionLicornetable:modelcolumnnamesamplevalue*codeGdescriptionGazelletable:colorcolumnnamesamplevalue*code1A6descriptionoceanbluetable:salecolumnnamesamplevalue*date2006-03-17*internal_id123condition_codeNCompositeKeysAcompositekey,alsocalledascompoundkey,isakeythatconsistsofmorethanonecolumn.[42] Chapter4Whenlayingoutourcodetables,wemustverifythatthedatagroupingprinciplesarealsorespectedonthosetables.Usingsampledata,andalsoourimaginationtoalsorespectedrespectedonthosetables.Usingsampledata,andalsoourimaginationtoonthosetablesonthosetables.Usingsampledata,andalsoourimaginationto.Usingsampledata,andalsoourimaginationto.Usingsampledata,andalsoourimaginationtosupplementincompletesampledata,canhelptouncoverproblemsinthisarea.Inourversion2,weoverlookedonepossibility.Whatifthecompaniesmarketingtwodifferentbrandschoseanidenticalcolocolorcode1A6torepresentdifferentcolors?Thercode1A6torepresentdifferentcolors?Thecode1A6torepresentdifferentcolorscolors?The?The?Thesamecouldhappenformodelcodessoweshouldrefinethestructuretoincludethebrandcode–whichrepresentsFontax,Licorneorafuturebrandname–intothemodelandcolortables.Thusversion3displaysthetwotablesthathavechangedfromversion2:table:modelcolumnnamesamplevalue*brand_codeL*codeGdescriptionGazelletable:colorcolumnnamesamplevalue*brand_codeL*code1A6descriptionoceanblueBoththemodelandcolortablesresultinhavingacompositekey.AnotherexampleofacompositekeywasseeninChapter3:thecar_eventtable–seetheDataasaColumn'sorTable'snamesection.Inthesekindsoftables,theprimarykeyiscomposedofmorethanoneelement.Thishappenswhenwehavetodescribedatathatrelatestomorethanonetable.Usually,thenewlyformedtableforcar_eventcontainingthecarinternalnumberandtheeventcodehasfurtherattributeslikethedatewhenaspecificeventoccursforaspecificcar.Anotherpossibilityforacompositekeyariseswhenweencountersubsetslikeadepartmentofacompany.Associatinganemployeeidtojustthecompanycodeorjustthedepartmentcodewouldnotdescribethesituationcorrectly.Anemployeeidisuniqueonlywhenconsideringboththedepartmentandthecompany.Wehavetoverifythatallthenon-keydataelementsofthistabledependdirectlyuponthekeytakeninitsentirety.Hereisaproblematiccasewherethecompany_namecolumnismisplacedbecauseit'snotrelatedtodept_code:[43] DataGroupingtable:company_deptcolumnnamesamplevalue*company_code1*dept_code16dept_nameMarketingcompany_nameFontaxThepreviousexampleisnon-optimalbecausethecompanynamewouldbepresentineveryrowofatableintendedtodescribeeachdepartment.Thecorrectstructureforthepreviousexampleimpliestheuseoftwotables:table:deptcolumnnamesamplevalue*company_code1*code16nameMarketingtable:companycolumnnamesamplevalue*code1nameFontaxImprovingtheStructureEvenwhenourtablelayoutrespectstherules,wecanstillrefineitbylookingatthefollowingadditionalissues.ScalabilityoverTimeInChapter3(sectionDatathatareResults),wesawthatwecouldavoidreservingacolumnforthetaxamount,providedwehavetheexacttaxrateinareferencetable.Howeverthisratecouldchangesoweneedamorecompletetablethatcontainsdaterangesandthecorrespondingrate.Thisway,projectingthesystemoverthetimedimension,wecanensurethatitwillaccommodateratefluctuations.Notethatthefollowingsaletableisnotcomplete:table:salecolumnnamesamplevalue*date2006-03-17*internal_id123condition_codeN[44] Chapter4table:conditioncolumnnamesamplevalue*codeNdescriptionNewComparingthedatecolumnfromthesaletablewiththestart_dateandend_datefromthefollowingtax_ratetable,wecanfindtheexacttaxrateforthedateofsale:table:tax_ratecolumnnamesamplevalue*start_date2006-01-01*end_date2006-04-01rate.075Infact,alltablesshouldbeanalysedtofindwhetherthetimefactorhasbeenconsidered.Anotherexamplewouldbethecolortable.Assumingweareusingthecolorcodesdesignedbyeachcarmanufacturer,doesamanufacturerreusecolorcodesinasubsequentyearforadifferentcolor?Ifthisisthecase,wewouldaddayearcolumntothecolortable.EmptyColumnsAlthoughemptycolumnsarenotnecessarilyproblematic,havingsomerowswhereoneormanycolumnsareemptycanrevealastructuralproblem:twotablesfoldedintoone.Let'sconsiderthecarmovements.Webuiltastructurehavingacar'sinternalnumber,thecodeoftheevent,andthemoment.Butwhatifsomeeventsneedmoredatatobedescribed?Inthepaperforms,wediscoverthatwhenacariswashed,theinitialsofthestoreassistantwhodidthewashingappearontheform,andduringtheinterviews,welearnedthattheseinitialsareanimportantdataelement.Inthiscase,wecanaddemployeeinformation,theemployeecode,tothecar_eventtable.Thiswouldhavethebenefitofenablingthesystemtoidentifywhichstoreassistantparticipatedtoanyeventoccurringtoacar,leadingtobetterqualitycontrol.Anotherissuethatmightariseisthatforaspecificevent(saywashing)werequiremoredatamoredatalikethequantityofcleaningproduct,andtheamountoftimeusedtowash.Ofthosetwoelements,onecanbebeneficialtoimproveourstructure:storingthestartandendtimeoftheevent.Butaddingacolumnlikequantity_cleaning_producttothecar_eventtablehastobeanalyzedcarefully.Foralleventsexceptwashing,thiscolumnwouldremainempty,leadingtoexception[45] DataGroupingtreatmentintheapplications.Thestructurewouldonlyworsenifweaddedanothercolumnrelatedtoanotherspecialevent.Inthiscase,it'sbettertocreateanothertablewiththesamekeysandtheadditionalcolumns.Wecannotavoidhavingsomedataelementsinthisnewtablename:car_washing_event.table:car_washing_eventcolumnnamesamplevalue*internal_number412quantity_cleaning_product12AvoidingENUMandSETMySQLandSQLingeneralofferwhatlookslikeconvenientdatatypes:ENUMandSETtypes.Bothtypespermitustospecifyalistofpossiblevaluesforacolumn,alongwithadefaultvalue;thedifferencebeingthataSETcolumncanholdmultiplevalues,whereasanENUMcancontainonlyoneofthepotentialvalues.Weseehereaverysmallsaletablewiththecredit_ratecolumnbeinganENUM:CREATETABLE`sale`(`internal_number`int(11)NOTNULL,`date`dateNOTNULL,`credit_rate`ENUM('fixed','variable')NOTNULL,PRIMARYKEY(`internal_number`))ENGINE=MyISAMDEFAULTCHARSET=latin1;WhenafieldisdefinedasENUMorSETandweareusingphpMyAdmin'sinsertionordataeditpanels,adropdownlistofthevaluesisdisplayedsoitmightbetemptingtousethosedatatypes.[46] Chapter4Let'sexaminethebenefitsofsuchtypes:•Insteadofstoringthecompletevalue,MySQLstoresonlyanintegerindex,whichusesoneortwobytes,dependingonthenumberofvaluesinthelist•MySQLitselfrefusesanyvaluethatisnotcomprisedinthelistEvenafterconsideringthesebenefits,itisrecommendednottouseENUMandSETtypesforthefollowingreasons:•Changingthelistofpossiblevaluesneedsadeveloperaction,suchasastructuremodificationintervention•Therearelimitsforthosetypes:65535possiblevaluesinthelist;alsoaSETcanhave64activemembers,whicharethechosenvaluesintheset•It'sbettertokeepthesystemmoresimple,becauseifinsomecasesweuselookuptablesandinothercasesENUMorSETtypes,theprogramcodeismorecomplextobuildandmaintainItcouldbearguedthatproblemnumberonecanbesolvedbyincludingintheapplicationsomeALTERTABLEstatementstochangethelistofvalues,butthisdoesnotseemthenormalwaytodealwiththismatter.ALTERTABLEisadatadefinitionstatementthatshouldbeusedduringsystemdevelopment,notattheapplicationlevel.So,anENUMorSETcolumnshouldbecomeaseparatetablewhoseprimarykeyisacode.Then,thetablewhichreferstothiscodesimplyincludesitasaforeignkey.InthecaseofSETcolumn,adistincttablewouldcontainthekeyofthemastertableplusthekeyofthetablewhichcontainsthoseSETvalues.[47] DataGroupingtable:salecolumnnamesamplevalue*internal_number122*date2006-05-27credit_rate_codeFtable:credit_ratecolumnnamesamplevalue*codeFdescriptionfixedPropervalidationintheapplicationensuresthattheinsertedcodesbelongtothelookuptables.MultilingualPlanningThereisanotherbenefitofusingacodetable:ifwestorethecarconditionnew/used,it'smorecomplextodoamulti-lingualapplication.Ontheotherhand,ifwecodethecar'scondition,thenwecanhaveaconditiontableandalanguagetable:table:conditioncolumnnamesamplevaluelanguage_codeEcondition_codeNdescriptionnewtable:languagecolumnnamesamplevaluelanguage_codeEdescriptionEnglishValidatingtheStructureValidationisdonebyusingpreciseexamples,askingourselvesifwehaveacolumntoplaceallinformation,coveringallcases.Maybetherewillbeexceptions–whattodowiththose?Shouldourstructurehandlethem?Wecanassesstheriskfactorassociatedwiththoseexceptions,versusthecostofhandlingthemandthepossiblelossinperformanceforthequeries.[48] Chapter4Anexampleofanexception:acustomerbuystwocarsthesameday–thiscouldinfluencethechoiceofprimarykey,ifadateispartofthiskey,itwillbeconducivetoaddacolumntothiskey:thetimeofdayforthesale.ThephpMyAdminutilitycanproveusefulhere.Tablesareeasilybuiltwiththissoftware,whileitsindexmanagementfeaturepermitsustocraftourprimarykeys.Thenwecanusethemulti-tablequerygeneratortosimulatevariousreportsandwhat-ifs.SummaryWehaveseenthatourlistofcolumnsneedstobeplacedintoappropriatetables,eachhavingaprimarykeyandrespectingsomerulesforincreasedefficiencyandclarity.Wecanalsoimprovethemodelbylookingatthescalabilityandmultilingualissues;thenwelearnedawaytovalidatethismodel.[49] DataStructureTuningThischapterpresentsvarioustechniquestoimproveourdatastructureintermsofsecurity,performance,anddocumentation.Wethenpresentthefinaldatastructureforthecardealer'scasestudy.DataAccessPoliciesWesawinChapter1thatdataisanimportantresource,soaccesstothisresourcemustbecontrolledandclearlydocumented.Aseachpieceofdataoriginates,theresponsibilityfordataentrymustbeclearlyestablished.Afterthedatahasmadeitswayintothedatabase,policiesmustbeinplacetocontrolaccesstoit,andthesepoliciesareimplementedbyMySQL'sprivilegesandtheuseofviews.ResponsibilityWeshoulddeterminewhointheenterprise–intermsofaperson'snameorafunctionname–isresponsibleforeachdataelement.Thisshouldthenbedocumentedandagoodplacetodosoisdirectlyinthedatabasestructure.Analternativewouldbetodocumentdataresponsibilityonpaper,butinformationonpapercanbeeasilylostandhasatendencytobecomeobsoletequickly.Insomecases,therewillbeaprimarysourceandanapprobation-levelsource.Bothshouldbedocumented–thishelpsfor•applicationdesign,whenscreenshavetoreflectthechainofauthorityfordataentry•privilegemanagement,ifdirectMySQLdataaccessisgrantedttoendusersoendusersendusersphpMyAdminpermitsustodescribeeachcolumnbyaddingcommentstoit.IfthecurrentMySQLversionsupportsnativecomments,thosewillbeused;otherwise,phpMyAdmin'slinked-tablesinfrastructurehastobeconfiguredtoenablethestorage DataStructureTuningofcolumncommentsasmeta-data.Wewillindicateresponsibilitydetailsforthiscolumninthecorrespondingcolumncomment.ToreachthepagethatpermitsustoentercommentsinphpMyAdmin,weusetheleftnavigationpaneltoopenthedatabase(heremarc)thenthetable(herecar_event).WethenclickonStructureandchoosetoeditafield'sstructure(hereevent_code)byclickingonthepencilicon.WecanthenusephpMyAdmin'sPrintViewfromtheStructurepagetoobtainalistingofthetablewithcomments.[52] Chapter5SecurityandPrivilegesTherearetwowaysofconsideringthesecurityofourdata.Thefirstandmostcommonlyimplementedisattheapplicationlevel.Normally,applicationsshouldaskforcredentials:username,password,andusethesecredentialstogeneratewebpagesordesktopscreensthatreflectthetaskspermittedtothisuser.NotethattheunderlyingapplicationstillconnectstoMySQLwithalltheprivilegesofadeveloperaccountbut,ofcourse,onlyshowsappropriatedataaccordingtotheuser'srights.AnotherissuetoconsideriswhenauserhasdirectaccesstoMySQL,eitherusingacommand-lineutilityoraninterfacelikephpMyAdmin.Thismightbethecasebecausetheend-userapplicationhasbeendevelopedonlytoacertainpointanddoesnotpermitmaintenanceofcodetables,forexample.Inthiscase,specialMySQLusersshouldbecreatedthathaveonlytheneededrights.MySQLsupportsanaccessmatrixbasedonrightsondatabases,tables,columns,andviews.Thisway,wecouldhidespecificcolumns,likethesellingprice,toallunauthorizedpersons.ViewsSinceMySQL5.0,wecanbuildviewsviews,whichlookliketablesbutarebasedon,whichlookliketablesbutarebasedonwhichlookliketablesbutarebasedonqueries.Theseviewscanbeusedto:•hidesomecolumns•generatemodifiedinformationbasedontablecolumnsandtheuseofexpressionsonthem•procureashortcutfordataaccessbyjoiningmanytablessoastomakethemappearasasingletableSincewecanassociateprivilegestotheseviewswithoutgivingaccesstotheunderlyingtables,viewscanprovehandytoletusersdirectlyaccessMySQLandcontroltheiractionsatthesametime.Hereisanexampleofaviewshowingthecareventsandtheirdescription–here,wewanttohidetheevent_codecolumn:createviewexplained_eventsasselectcar_event.internal_number,car_event.moment,event.descriptionfromcar_eventleftjoineventoncar_event.event_code=event.code[53] DataStructureTuningBrowsingthisviewinphpMyAdmindisplaysthefollowingreport:Askingausertoworkwithviewsdoesnotmeanthatthisusercanonlyreadthisdata.Inmanycases,viewscanbeupdated.Forexample,thisstatementisallowed:UPDATE`explained_events`SET`moment`='2006-05-2709:58:38'WHERE`explained_events`.`internal_number`=412;StorageEnginesMySQLisinternallystructuredinsuchawaythatthelow-leveltasksofstoringandmanagingdataareimplementedbytheplugablestorageenginearchitecture.MySQLABandothercompaniesareactiveinR&Dtoimprovetheofferinthestorageenginesspectrum.Formoreinformationaboutthearchitectureitself,refertohttp://dev.mysql.com/tech-resources/articles/mysql_5.0_psea1.html.Everytimewecreateatable,evenifwedon'tnoticeit,weareaskingtheMySQLserver(implicitlyorexplicitly)touseoneoftheavailablestorageenginestostoreourdataphysically.ThedefaultandtraditionalstorageengineisnamedMyISAM.AwholechapterintheMySQLReferenceManual(http://dev.mysql.com/doc/refman/5.0/en/storage-engines.html)describestheavailableengines.Ourchoiceofstorageenginecanvaryfromtabletotable.Thereisnosuchthingasaperfectstorageengine;wehavetochoosethebestoneaccordingtoourneeds.Herearesomepointstoconsiderwhenmakingachoice:•MyISAMsupportsFULLTEXTindexesandcompressedread-onlystorage,andusesaboutthreetimeslessdiskspacethanInnoDBfortheequivalentamountofdata•InnoDBoffersforeignkeyconstraints,multi-statementtransactionswithROLLBACKsupport;also,duetoitslockingmechanism,itsupportsmoreconcurrentSELECTqueriesthanMyISAM•MEMORYisofcourseveryfastbutthecontent(data)isnotpermanentlystoredon-disk,whilethetabledefinitionitselfison-disk[54] Chapter5•NDB(NetworkDataBase),alsocalledMySQLCluster,offerssynchronousreplicationbetweentheservers–therecommendedminimumnumberofserversintheclusterisfour;thusthereisnosinglepointoffailureinsuchaclusterInshort,hereisageneralguideline:iftheapplicationrequiresmulti-statementtransactionsandforeign-keyconstraints,weshouldchooseInnoDB;otherwise,MyISAM,thedefaultstorageengine,issuggested.ForeignKeyConstraintsTheInnoDBstorageengine(http://www.innodb.com),whichisincludedinMySQLoffersafacilitytodescribeforeignkeysinthetable'sstructure.Aforeignkeyisacolumn(orgroupofcolumns)thatpointstoakeyinatable.Usually,thekeythatispointedtoislocatedinanothertableandisaprimarykeyofthatothertable.Foreignkeysarecommonlyusedaslookuptables.Thereareanumberofbenefitstodescribingtheserelationsdirectlyinthestructure:•referentialintegrityofthetablesismaintainedbytheengine–wecannotaddaneventcodeintothecar_eventtableifthecorrespondingcodeisnotalreadypresentintheeventtable,andwecannotremoveacodefromtheeventtableifit'sstillreferencedbyarowinthecar_eventtable•wecanprogramactionsthatMySQLwillaccomplishinreactiontocertainevents;forexample,whathappensinthereferencingtableifthereferencedcodeisupdatedLet'stransposeourcar_eventexampleintoInnoDB.Let'sfirstcreateandpopulatethereferencedtable,event–noticetheENGINE=InnoDBclause:CREATETABLE`event`(`code`int(11)NOTNULL,`description`char(40)NOTNULL,PRIMARYKEY(`code`))ENGINE=InnoDBDEFAULTCHARSET=latin1;INSERTINTO`event`VALUES(1,'washed');INSERTINTO`event`VALUES(2,'arrived');Next,thereferencingtable,car_event:CREATETABLE`car_event`(`internal_number`int(11)NOTNULLCOMMENT'Resp.:Officeclerk',`moment`datetimeNOTNULLCOMMENT'Resp.:storeassistant',`event_code`int(11)NOTNULLCOMMENT'Resp.:storeassistant',PRIMARYKEY(`internal_number`),[55] DataStructureTuningKEY`event_code`(`event_code`))ENGINE=InnoDBDEFAULTCHARSET=latin1;INSERTINTO`car_event`VALUES(412,'2006-05-2709:58:38',2);INSERTINTO`car_event`VALUES(500,'2006-05-2916:37:46',1);INSERTINTO`car_event`VALUES(600,'2006-05-3016:38:51',2);INSERTINTO`car_event`VALUES(700,'2006-05-3116:39:21',2);Wemusthaveanindexontheevent_codecolumntobeabletouseitinanInnoDBforeignkeyconstraint,whichisdefinedhere:ALTERTABLE`car_event`ADDCONSTRAINT`car_event_ibfk_1`FOREIGNKEY(`event_code`)REFERENCES``event`(`code`)ONUPDATECASCADE;eventevent`(`code`)ONUPDATECASCADE;`(`code`)ONUPDATECASCADE;(``code`)ONUPDATECASCADE;codecode`)ONUPDATECASCADE;``)ONUPDATECASCADE;)ONUPDATECASCADE;Theforeignkeyincar_eventcanalsobedefinedintheinitialCREATETABLEstatement.ThepreviousexamplewasdoneusingALTERTABLEtoshowthatforeignkeyscanbeaddedlater.AlltheseoperationscanbehandledinamorevisualwayviaphpMyAdmin.TheOperationssub-pageenablesustoswitchtheenginetoInnoDB:[56] Chapter5Also,whenthetablesareundertheInnoDBstorageengine,phpMyAdmin'sRelationviewenablesustodefineandmodifytheforeignkeyandrelatedactions:HavingdefinedthisONUPDATECASCADEclause,let'sseewhathappenswhenwemodifyacodevalueintheeventtable.Wedecidethatthecodeforwashedshouldbe10insteadof1:[57] DataStructureTuningWenowbrowsethecar_eventtable;sureenough,thecodeforwashedhasbeenchangedautomaticallytothevalue10:PerformanceAnumberofpointsmustbeexaminedifwewanttoimproveourstructure'sefficiencyintermsofaccessspeedordiskspaceused.IndexesAddingindexesoncolumnsthatareusedinaWHEREclauseisacommonwayofspeedingupthequeries.Let'ssaythatweintendtofindallvehiclesforaspecificbrand.Thevehicletablehasabrand_idcolumnandwewanttocreateanindexonthiscolumn.Inthiscase,theindexwon'tbeuniquebecauseeachbrandisrepresentedbymanyvehicles.UsingphpMyAdmin,therearetwowaystocreateanindex.First,iftheindexappliestoasinglecolumn,wecanopentheStructurepageforatableandclicktheindex(flash)icononthesamelineasthebrand_idcolumn:Thisgeneratesthefollowingstatement:ALTERTABLE``vehicle`ADDINDEX(`brand_id`)vehiclevehicle`ADDINDEX(`brand_id`)`ADDINDEX(`brand_id`)ADDINDEX(``brand_id`)brand_idbrand_id`)``))Wecouldalsocreateanindexonacompositekey,forexamplemodel_idplusyear.Forthis,weenterthenumberofcolumnsforourindex(two)ontheStructurepageandhitGo.[58] Chapter5Next,ontheindexmanagementpage,wechoosewhichcolumnswillbepartoftheindex;thenweinventanameforthisindex(heremodel-year)andclickGotocreateit.TherelatedSQLcommandforthisactionisALTERTABLE``vehicle`ADDINDEX`model-year`(`model_id`,`year`)vehiclevehicle`ADDINDEX`model-year`(`model_id`,`year`)`ADDINDEX`model-year`(`model_id`,`year`)ADDINDEX``model-year`(`model_id`,`year`)model-yearmodel-year`(`model_id`,`year`)`(`model_id`,`year`)(``model_id`,`year`)model_idmodel_id`,`year`)``,`year`),,`year`)``year`)yearyear`)``))Toascertainwhichindexareusedonaparticularquery,wecanprefixthisquerywiththeEXPLAINkeyword.Forexample,weissuethiscommandinphpMyAdmin'squerybox:explainselect*fromvehiclewherebrand_id=1[59] DataStructureTuningTheresultstellusthatanindexonthebrand_idcolumnisapossiblekeyforretrieval:HelpingtheQueryOptimizer:AnalyzeTableWhenwesendaquerytotheMySQLserver,itusesitsqueryoptimizertofindthebestwayofretrievingtherows.WecanhelpthequeryoptimizerachievebetterresultsbyloadingatablewithdataandthenexecutingtheANALYZETABLEstatementonit.ThisstatementasksMySQLtostorethekeydistributionforatable,whichmeansthatitcountsthenumberofkeysforeachindexandstoresthisinformationforlaterreuse.Forexample,aftertheANALYZETABLEonthevehicletable,MySQLmightnoticethatthereare12differentbrands,1000differentvehiclesand100differentmodel-years.Thisinformationwillbeusedlaterifweeversendaqueryusingoneoftheseindexes.Thus,theANALYZETABLEshouldbeexecutedperiodically;theexactfrequencydependsonthenumberofupdatesforthistable.AccessingReplicationSlaveServersMySQLsupportsaschemewhereone-way,asynchronousreplicationofdataoccursbetweenamasterserverandoneormoreslaveservers.Sincenormally,themajorityoftherequestssenttoMySQLareSELECTqueries,wecanimproveresponsetimebysendingthosereadrequeststoaslaveserver.Thisprocuresaload-balancingeffect.Caremustbetakentosendwrite-typestatementssuchasINSERT,UPDATEandDELETEtothemaster.IncurrentMySQLversion(5.0.26),wehavetochoosetheproperserverattheapplicationleveltoachievethisbalancing;however,MySQLplanstoofferafeaturethatwouldautomaticallysendtheSELECTqueriestoslaves.[60] Chapter5ReplicationisanadvancedfeatureofMySQLthatshouldbesetupbyaseasonedMySQLadministrator..SpeedandDataTypesWhencreatingacolumn,wehavetospecifyadatatypeforit.Characterdatatypes(CHAR,VARCHAR)areverycommonlyused.ForCHAR,weindicatethelengthofthecolumn(0to255),andthiscolumnoccupiesafixedamountofspace.ForVARCHAR,eachvalueonlytakesthespaceitneedsinthetable;theindicatedlengthisthemaximumlength–255beforeMySQL5.0.3and65532sincethisversion.Numerictypes–likeINT,FLOAT,andandDECIMALarefixed-length.Tosummarize,herearesomedatatypesandinformationabouthowtheyarestored:DatatypeStoragemethodCHARfixedINTfixedFLOATfixedDECIMALfixedVARCHARvariableWeshouldbeawarethatMySQLcansilentlydecidetoconvertadatatypetoanotherone.ThereasonsforthisareexplainedintheMySQLmanual:http://dev.mysql.com/doc/refman/5.0/en/silent-column-changes.html.Thisiswhy,afterthetable'screation,weshouldre-examineitsstructuretoverifythatsilentconversionhasoccurred.ItmightseemthatweshouldalwayschooseVARCHARforcharacterfields,sincebyusingthisdatatype,ashortervaluetakeslessspacebutthereisareasontostillwanttouseCHAR:speed.Inatable,whenallfieldsareusingnon-variabledatatypes,theMyISAMstorageengineusesafixedtableformat.Inthisformat,MySQLcanpredictthesizeofeachrowandthuscaneasilyfindthedistancefrom,say,onefirst_namecolumntothefirst_namecolumnofthenextrow.Thisimpliesthatqueriesonnon-indexedcolumnsarerelativelyquicklyexecuted.Onthecontrary,whenthereisevenoneVARCHARcolumninatable,thisisnolongerpossible,asMyISAMusesdynamictableformatinthiscase.Hence,adecisionmustbemadebetweenthespeedofdataretrievalandthespaceoverheadofusingfixed-lengthcolumns.[61] DataStructureTuningInphpMyAdmin,whenlookingatatable'sstructure,theRowStatisticssectioninformsusaboutthefixedordynamicformat:Thereisanotherpointinfavorofusingthefixedformat.Whenrowsaredeleted,thespacepreviouslyoccupiedbythese–theholesinthetablebecomeavailableforfutureinsertionssothetabledoesnotbecomephysicallyfragmented.BLOBandTEXTdatatypesarealsovariableinlength.ABLOBistypicallyusedtostorebinarydatalikeacar'soracustomer'sphoto.MySQLtakescareinternallytostorethesecolumnsseparatelyfromtheremainderofthetable'sdata,sotheimpactofhavingtheminatableisnotsignificant.TableSizeReductionAutility,myisampack,canbeusedtotransformaMyISAMtableintoaread-onlyonewhilecompressingdata.Insomecases,thetable'sphysicalsizecouldbereducedby70%.Thistechniqueisonlyavailableifwehaveaccesstothiscommand-lineutility–thereisnoSQLquerywhichwecansendtoachievethisresult.In-ColumnDataEncodingThesituationIamabouttodescribehappenedwhileIwasworkingonasearchengineforbibliographicdatabutIamtransposingitforthecardealer'ssystem.Whenwehavetomigratedatafromapre-existingsystemintoournewly-borndatastructure,wemightencounterdatathatwasformattedinaspecialway.Forexample,alistofpossiblecolorsforacarmodelcouldbeexpressedasaseriesofcolorcodes,separatedbysemi-colons:1A6;1A7;2B7;2T1A65[62] Chapter5Usersofthepre-existingsystemarecomfortablewiththismethodofenteringdatainthisformat,andinthecaseIexperienced,usersrefusedtoletgoofthiswayofenteringdata–andtheyhaddirectaccesstotheMySQLtables.Fromadeveloper'sperspective,however,suchformatmakesthetaskofquerygenerationmorecomplex.Findingthe1A6colorinvolvessplittingthedataelement,andavoidingthe2T1A65dataelement,whichalsohasthe1A6string.Aproperstructureforthiscaseimpliesgettingridofthesemicolon-basedformatcompletelyandstoringjustthepuredataintableformat:table:model_colorcolumnnamesamplevalue*model1color_code1A6Anothercaseforwhichit'sevenmorecomplextofindcoherentdatawithaqueryiswhenthereismorethanoneelementbetweentheseparators,suchasalistofnames:MurrayDan;SmithPeter;BlackPaulSpecialcaremustbetakentoavoidmatchingthislistofnameswhenwesearchforMurray,PaulbecauseMurrayandPaularepresentinthefullstring.Thiscaseonlyreinforcesthecaseformovingawayfromsuchaformatoratleast–ifwemustkeepthisformatduetopoliticalissues–forbuildinganintermediarytable,whichwillbeusedforsearching.Inthiscase,thespecialtablemustbesynchronizedwheneverthemaintable'scontentschanges.CaseStudy'sFinalStructureInthissectionweexaminethefinaldatastructureforourcasestudy.Therearemanywaystopresentthisstructure.Firstwe'llseeallthetablesthatarerelatedtoeachother–almostalltablesare–thenwewillexaminegroupofrelatedtablesandtheircolumns.ThefollowingschemaisproducedbyphpMyAdmin'sPDFPagefeature.Toaccessthisfeature,weopenadatabaseandaccesstheOperationssub-page.ThenweclickEditPDFpages.[63] DataStructureTuningWhengeneratingthePDFschema,wecanalsoaskphpMyAdmintoproduceadatadictionary.Inordertodothis,weclicktheDataDictionarycheckboxintheDisplayPDFschemadialog.Hereisthepageofthisdictionarydescribingthepersontable:Thiscombineddatadictionary/schemaoffersanoteworthyfeature:wecanclickonatablenameintheschematoreachthetable'sdescriptioninthedictionary,andtheotherwayaround.[64] Chapter5ThefollowingCREATETABLEcommandscomedirectlyfromtheExportfeatureofphpMyAdmin.Toaccessthisfeature,simplyopenadatabaseandchoosetheExportmenu,thenselectallthetables,clicktheSQLcheckboxandhitGo.Thecommandshavebeengroupedintosmallerchunksofrelatedtables,evenifultimatelytherearerelationsbetweenthetablesofthosegroups.You'llnoticethatphpMyAdminadds–intheformofcommentsintheexportfile–informationabouttherelationswithothertables.Anotherpointtonote:theprimarykeyformosttablesisid,aninteger.Thus,acolumnpointingtotheidcolumnoftablebrandisnamedbrand_id.Vehicle----Tablestructurefortable`brand`--CREATETABLE`brand`(`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`brand_color`--[65] DataStructureTuningCREATETABLE`brand_color`(`brand_id`int(11)NOTNULL,`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`brand_id`,`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`brand_color`:--`brand_id`--`brand`->`id`----------------------------------------------------------------Tablestructurefortable`brand_model`--CREATETABLE`brand_model`(`brand_id`int(11)NOTNULL,`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`brand_id`,`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`brand_model`:--`brand_id`--`brand`->`id`----------------------------------------------------------------Tablestructurefortable`event`--CREATETABLE`event`(`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`vehicle`--[66] Chapter5CREATETABLE`vehicle`(`internal_number`int(11)NOTNULL,`serial_number`varchar(50)NOTNULL,`brand_id`int(11)NOTNULL,`model_id`int(11)NOTNULL,`year`year(4)NOTNULL,`physical_key_id`int(11)NOTNULL,`color_id`int(11)NOTNULL,`category_id`int(11)NOTNULL,PRIMARYKEY(`internal_number`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`vehicle`:--`brand_id`--`brand`->`id`--`category_id`--`vehicle_category`->`id`--`color_id`--`brand_color`->`id`--`model_id`--`brand_model`->`id`----------------------------------------------------------------Tablestructurefortable`vehicle_category`--CREATETABLE`vehicle_category`(`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`vehicle_event`--CREATETABLE`vehicle_event`(`internal_number`int(11)NOTNULL,`moment`dateNOTNULL,`event_id`int(11)NOTNULL,`person_id`int(11)NOTNULL,PRIMARYKEY(`internal_number`,`moment`)[67] DataStructureTuning)ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`vehicle_event`:--`event_id`--`event`->`id`--`internal_number`--`vehicle`->`internal_number`--`person_id`--`person`->`id`--Person----Tablestructurefortable`gender`--CREATETABLE`gender`(`id`TINYINT(4)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`person`--CREATETABLE`person`(`id`int(11)NOTNULL,`category_id`int(11)NOTNULL,`gender_id`TINYINT(4)NOTNULL,`salutation_id`TINYINT(4)NOTNULL,`first_name`varchar(50)NOTNULL,`last_name`varchar(50)NOTNULL,`address`varchar(300)NOTNULL,`city`varchar(50)NOTNULL,`postal_code`varchar(20)NOTNULL,`phone_area`varchar(20)NOTNULL,`phone_number`varchar(20)NOTNULL,`phone_extension`varchar(20)NOTNULL,`email`varchar(100)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;[68] Chapter5----RELATIONSFORTABLE`person`:--`category_id`--`person_category`->`id`--`gender_id`--`gender`->`id`--`salutation_id`--`salutation`->`id`----------------------------------------------------------------Tablestructurefortable`person_category`--CREATETABLE`person_category`(`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`salutation`--CREATETABLE`salutation`(`id`TINYINT(4)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----------------------------------------------------------Sale----Tablestructurefortable`condition`--CREATETABLE`condition`(`id`int(11)NOTNULL,`description`char(15)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----------------------------------------------------------[69] DataStructureTuning----Tablestructurefortable`credit_rate`--CREATETABLE`credit_rate`(`id`int(11)NOTNULL,`description`char(30)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`sale`--CREATETABLE`sale`(`internal_number`int(11)NOTNULL,`date_sold`dateNOTNULL,`condition_id`int(11)NOTNULL,`customer_id`int(11)NOTNULL,`salesperson_id`int(11)NOTNULL,`base_price`decimal(9,2)NOTNULL,`insurance_id`int(11)NOTNULL,`insurance_policy_number`varchar(40)NOTNULL,`preparation_cost`decimal(9,2)NOTNULL,`exchange_vehicle_id`int(11)NOTNULL,`exchange_price`decimal(9,2)NOTNULL,`down_payment`decimal(9,2)NOTNULL,PRIMARYKEY(`internal_number`,`date_sold`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`sale`:--`condition_id`--`condition`->`id`--`customer_id`--`person`->`id`--`exchange_vehicle_id`--`vehicle`->`internal_number`--`insurance_id`--`organization`->`id`--`internal_number`--`vehicle`->`internal_number`--`salesperson_id`--`person`->`id`[70] Chapter5----------------------------------------------------------------Tablestructurefortable`sale_financing`--CREATETABLE`sale_financing`(`internal_number`int(11)NOTNULLauto_increment,`date_sold`dateNOTNULL,`financial_id`int(11)NOTNULL,`interest_rate`decimal(9,4)NOTNULL,`credit_rate_id`int(11)NOTNULL,`first_payment_date`dateNOTNULL,`term_years`int(11)NOTNULL,PRIMARYKEY(`internal_number`))ENGINE=MyISAMDEFAULTCHARSET=latin1AUTO_INCREMENT=1;----RELATIONSFORTABLE`sale_financing`:--`credit_rate_id`--`credit_rate`->`id`--`financial_id`--`organization`->`id`--`internal_number`--`vehicle`->`internal_number`----------------------------------------------------------------Tablestructurefortable`tax_rate`--CREATETABLE`tax_rate`(`start_date`dateNOTNULL,`end_date`dateNOTNULL,`rate`decimal(9,4)NOTNULL,PRIMARYKEY(`start_date`,`end_date`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----------------------------------------------------------[71] DataStructureTuningOthertables----Tablestructurefortable`parameters`--CREATETABLE`parameters`(`dealer_number`varchar(30)NOTNULL)ENGINE=MyISAMDEFAULTCHARSET=latin1;--------------------------------------------------------------Tablestructurefortable`organization`--CREATETABLE`organization`(`id`int(11)NOTNULL,`category_id`int(11)NOTNULL,`name`varchar(50)NOTNULL,`address`varchar(300)NOTNULL,`city`varchar(50)NOTNULL,`postal_code`varchar(20)NOTNULL,`phone_area`varchar(20)NOTNULL,`phone_number`varchar(20)NOTNULL,`phone_extension`varchar(20)NOTNULL,`email`varchar(100)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`organization`:--`category_id`--`organization_category`->`id`----------------------------------------------------------------Tablestructurefortable`organization_category`--CREATETABLE`organization_category`(`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----------------------------------------------------------[72] Chapter5----Tablestructurefortable`road_test`--CREATETABLE`road_test`(`internal_number`int(11)NOTNULL,`date`dateNOTNULL,`customer_id`int(11)NOTNULL,`salesperson_id`int(11)NOTNULL,`customer_comments`varchar(255)NOTNULL,PRIMARYKEY(`internal_number`,`date`,`customer_id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`road_test`:--`customer_id`--`person`->`id`--`internal_number`--`vehicle`->`internal_number`--`salesperson_id`--`person`->`id`----------------------------------------------------------------Tablestructurefortable`survey`--CREATETABLE`survey`(`id`int(11)NOTNULL,`date`dateNOTNULL,`customer_id`int(11)NOTNULL,`salesperson_id`int(11)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`survey`:--`customer_id`--`person`->`id`--`salesperson_id`--`person`->`id`--------------------------------------------------------------[73] --Tablestructurefortable`survey_answer`--CREATETABLE`survey_answer`(`survey_id`int(11)NOTNULL,`question_id`int(11)NOTNULL,`answer`varchar(30)NOTNULL,PRIMARYKEY(`survey_id`,`question_id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----RELATIONSFORTABLE`survey_answer`:--`question_id`--`survey_question`->`id`--`survey_id`--`survey`->`id`----------------------------------------------------------------Tablestructurefortable`survey_question`--CREATETABLE`survey_question`(`id`int(11)NOTNULL,`description`varchar(40)NOTNULL,PRIMARYKEY(`id`))ENGINE=MyISAMDEFAULTCHARSET=latin1;----------------------------------------------------------SummaryWeimprovedourdatastructure'simplementationbyassessingtheresponsiblepersonforeachdataelementandbystoringthisinformationintocolumncomments.Wethensawhowtouseprivilegesandviewstoimprovesecurity,howtochoosethebeststorageenginepertable,andhowtobenefitfromforeignkeyconstraints.Performanceissueswereconsidered,andthenwewerepresentedwiththefinalmodelforthecardealer'scasestudy. SupplementalCaseStudyNow,it'stimetoapplyournewlylearnedprinciplestoacompletelydifferenttheme.Weupgradefromcarstoplanes,coveringasimpleairlinesystem.Thischapter'scasestudydoesnotpretendtoencompassthefullcollectionofdatafromrealairline–it'sonlyasample.Nonetheless,we'llseethattheprincipleswelearnedpreviouslycanbeappliedtobuildandrefineacorrectandcoherentdatastructure.Normally,eachairlinehasitsowninformationsystem.Weassumeherethatwehavegotthemandatetobuildaninformationsystemthatencompassesmanyairlines.ResultsfromtheDocumentGatheringPhaseAfterreviewingtheairlinesystem'scurrentwebsite,abookingagent'swebsite,someelectronictickets,andboardingpasses,wegatheralargeamountofinformation.We'llfirstexpressthisinformationwithsentenceswhichpresentthesystemanddataexchangeonaratherhighlevel.Eachsentenceisfollowedbyalistofthedataelementswhichwecandeducefromit.Anelementcanbepresentinmorethanonesentence.RefertotheTablesandSampleValuessectionformoredetailsabouteachdataelement.Therearealsosomenotesthatwillhelpusinthenamingandgroupingphase.Flight456ofAirQuebecleavesMontreal-Trudeauairportat22:45onOctober2nd,2007headingforParis'sCharlesdeGaulle.Thefollowingarethedataelementsthatcanbeobtainedfromtheabovesentence:•flight_number•airline_name SupplementalCaseStudy•airport_name•flight_departure_momentWeneedtoindicatewhethertheairportisfordepartureorarrival.TheairportcodeforMontreal-TrudeauisYULandtheoneforCharlesdeGaulleisCDG.Thedataelementobtainedfromtheabovesentenceis:•airport_codeShouldweusetheairport_codeasaprimarykey?Maybenot,duetospaceconsiderations.Thisflightisscheduledtolandat11:30thedayafter(localtime).Thedataelementobtainedis:•flight_arrival_momentDoweneedtosplitthedateandtimeintotwofields?Probablynot,tobenefitfromdateandtimecalculationsfunctions(howmanyhoursandminutestakesaflight,takingthedateintoaccount).AnaircraftmodelAPM-300fromFontaxservicesthisflight.Thedataelementsobtainedfromabovesentenceare:•plane_model•plane_brandDoweneedtoassociatetheplanemodeltoaflight,butalsotowhichspecificplane.(TherecanbemorethanoneAPM-300.)[76] Chapter6ThepilotonthisflightisDanMurrayandtheflightattendantisMelanieWaters.Othercrewmembersaretobeconfirmed.Thedataelementsobtainedfromtheabovesentenceare:•pilot_first_name•pilot_last_name•flight_attendant_first_name•flight_attendant_last_nameWeshouldgeneralizeusingthenotionofacrewcategory.PeterSmithbuysaticketforthisflightfromFantasticTour,Inc.,abookingagency.Theticketnumberis01488417654.Thisisaone-wayticket.Thedataelementsobtainedfromtheabovesentenceare:•passenger_first_name•passenger_last_name•booking_agency_name•ticket_number•ticket_typeWe'llalsoneedaprimarykeyforthepassengerandprobablyforthebookingagencyifwedon'tuseitscode.Shouldtheticketitselfberepresentedinatable,orwilltheticketnumberbepartofsomethingmoregenerallikeareservation?Forthisflight,Mr.Smithisseatedat19A,locatedintheeconomysectionoftheplane.Thedataelementsobtainedfromtheabovesentenceare:•passenger_last_name•seat_id•plane_section[77] SupplementalCaseStudyThesectionsavailableonaplanedependnotonlyontheplanemodelbutalsoontheairline.Thisticketisnon-refundable.Thedataelementobtainedfromtheabovesentenceis:•ticket_refundabilityFlight456canbeboardedatgatenumber74,35minutesbeforetakeoff.Thedataelementsobtainedfromtheabovesentenceare:•flight_number•gate_id•boarding_timeIneconomyclass,passengersareentitledtoonebaginsidethecabinandtworegisteredbags–totalweight50kgmax.MrSmithhasoneregisteredbag,bearingthetagAQ636-84763.Thedataelementsobtainedfromtheabovesentenceare:•plane_section•max_number_in_cabin_bags•max_number_registered_bags•max_weight_registered_bags_kg•tag_idWehavedetectedthat"class"isasynonymfor"section".Thereareinformationscreensintheairportthatindicatethestateofeachflight:ontime,boarding,delayed,orcancelled.Thedataelementobtainedfromtheabovesentenceis:•flight_status[78] Chapter6Willneedtobecoded(idanddescription).Twomealsareservedonthisflight.AirQuebechasarrangementswithMontrealChefServiceforthepreparationanddeliveryoffood.Thedataelementsobtainedfromtheabovesentenceare:•number_meals•airline_name•meal_supplierAirQuebecownsfourFontaxAPM-300aircraftbutaircraft#302(code-namedCharlie)isscheduledforrepairsinOctober2007.Thedataelementsobtainedfromtheabovesentenceare:•airline_name•plane_brand•plane_model•plane_id•description•plane_event•plane_event_start_moment•plane_event_end_momentEachplaneisaffectionatelynicknamed,theelementforthiswillbe"description".Abouttherepairs,wegeneralizethemwiththeconceptofevents,havingastartingandendingmoment.PassengerSmithcanusethequickreferencecodeA6BCUDandhislastnametoaccesshisflightinformationontheairlinewebsite.Thedataelementsobtainedfromtheabovesentenceare:•passenger_last_name•web_site_quick_reference[79] SupplementalCaseStudyPreliminaryListofDataElementsWelistherethedataelementsastheycanbededucedfromthedocumentgatheringphase.Inmanycases,theyarenotinaformatalreadysuitableforthefinalmodelbecausetheyareprefixedwithatablename.Forexample,adataelementidentifiedaspilot_last_namewillbecomethecolumnlast_nameinthepilottable.Samplevaluesandmoredetailedinformationabouteachdataelementappearinthenextsection.Dataelementsflight_departure_momentseat_idflight_arrival_momentplane_sectiondeparture_airport_codeticket_refundabilityarrival_airport_codegate_idairline_codeboarding_timeairline_namemax_number_in_cabin_bagsairport_namemax_number_registered_bagsplane_brandmax_weight_registered_bags_kgplane_modeltag_idpilot_last_nameticket_issued_onpilot_first_namenumber_mealsflight_attendant_last_nameweb_site_quick_referenceflight_attendant_first_namemeal_supplierpassenger_last_nameplane_idpassenger_first_nameplane_eventpassenger_idplane_event_start_momentbooking_agency_nameplane_event_end_momentticket_numberflight_statusTablesandSampleValuesTopreparethelistoftables,westartwiththephysicalobjectsorpersonswecanobserveinthesentencesbuiltfromthedocumentsgatheringphase.Thenwehavealookatalltheelementsandbuildnewtablestoaccommodatethem.Inthefollowingtabledescriptions,thetablelayoutisfollowedbydesigncommentswhenappropriate.[80] Chapter6CodeTablesUsuallythefollowingtablesaredesignedfirstbecausetheyareeasiertomodelandtheyareneededforestablishingtherelationsfrommorecomplextables.table:airportcolumnnamesamplevalue*id1international_codeYULdescriptionMontreal-TrudeauTheairporttablecouldcontainothercolumnsliketheaddress,phone,andwebsite.table:airlinecolumnnamesamplevalue*id1descriptionAir-Quebectable:plane_brandcolumnnamesamplevalue*id1descriptionFontaxWeavoidnamingthistableasbrandbecauseit'satoogenericname.table:meal_suppliercolumnnamesamplevalue*id9descriptionMontrealChefServicetable:booking_agencycolumnnamesamplevalue*id1descriptionFantasticTourAgain,thistablecouldhavemoredetailsabouttheagency,likephoneandaddress.Wecouldalsomergethistablewithmeal_suppliertablebyaddingacodeidentifyingthetypeofcompany,butit'snotdoneinthepresentmodel.table:ticket_typecolumnnamesamplevalue*id1descriptionone-way[81] SupplementalCaseStudytable:crew_categorycolumnnamesamplevalue*id1descriptionPilotToavoidcolumnslikepilot_last_name,copilot_first_name,weformacrew_categorytable.Seealsotherelatedflight_crewtablelaterinthischapter.table:ticket_refundabilitycolumnnamesamplevalue*id1descriptionnon-refundabletable:flight_statuscolumnnamesamplevalue*id1descriptionboardingtable:eventcolumnnamesamplevalue*id1descriptionrepairIfweneedtoincludeothertypesofeventsinthemodel,thiseventtablewillhavetoberenamedassomethingmorepreciselikeplane_event,andanewnamewillbeneededforourcurrentplane_eventtablethatisusedtoassociateaneventwithaplane.ThemedTablesThesetablesaremorecomprehensivethanthecodetables.Eachonereferstoaspecificthemethatneedsmorecolumnsthanasimplecodetable.table:planecolumnnamesamplevalue*id302airline_id1brand_id1model_id2descriptionCharlieThistableidentifieswhichaircraftbelongstowhichairline,withthedescriptionbeinganinternalmeansofdescribingthisparticularaircraftwithintheairline.Otherfieldslikeanaircraftserialnumbercanbeaddedhere.[82] Chapter6table:passengercolumnnamesamplevalue*id1302last_nameSmithfirst_namePeterpassport_infoCDN234234table:crewcolumnnamesamplevalue*id9category_id1last_nameMurrayfirst_nameDanPassengersandcrewmemberscannotbephysicallymergedintoonetableeveniftheybelongtothesameflightbecausethesetofcolumnsusedtodescribeapassengerdivergesfromtheoneassociatedtoacrewmember.We'llcoverintheSampleQueriessectionhowtoproduceacombinedlistofallpersonsonaplane.table:flightcolumnnamesamplevalue*id34airline_id1number456departure_moment2007-10-0222:45arrival_moment2007-10-0311:30departure_airport_id1arrival_airport_id2plane_id302meal_supplier_id9number_meals2departure_gate74arrival_gateB65boarding_moment2007-10-0222:10status_id1[83] SupplementalCaseStudyThenotionofflightiscentraltothissystem,thuswe'llhaveaflighttable.Thismeansthatwehavetodetermineaprimarykeyand,atfirstsight,theflightnumberwouldbeagoodcandidate–butit'snotandthereasonforthisisthattheflightnumberisnotpaintedonanaircraft;it'sonlyalogicalwayofexpressingthemovementofaplanebetweentwoairports,andalsothepersonsorcompaniesrelatedtothismovement.Wenotethattheflightnumberiskeptshort–threeorfourdigitsforimprovedreferenceonallprintedmatterandonairportinformationscreens;thus,thisflightnumberisonlymeaningfulwhenaccompaniedbysupplementalinformation,liketheairlinecode(AQ)orcompanyname,andadate.Keepinginmindthattherewillbeothertablesassociatedwiththisflighttable,wehavetwochoicesherefortheprimarykey:•createasurrogatekey(anartificialprimarykeywhosevalueisnotderivedfromothertable'sdata)•useacombinationofcolumns–airline_id,flight_number,departure_momentItisbettertocreateasurrogatekey,id.Thisidwillbepropagatedtotherelatedtablesinjustonecolumn,whichcouldhelptospeedupretrievaltimesbecausethereisjustonefieldtocomparebetweentables.Usingtheflight'sidwillalsosimplifythewritingofqueries.Ofcourseweincludetheflightnumber–theinformationknowntothepublicintheflighttablebutnotasaprimarykey.table:reservationcolumnnamesamplevalue*flight_id34*passenger_id1302web_site_quick_referenceKARTYUticket_number01488417654ticket_issued_moment2007-01-0112:00booking_agency_id1ticket_refundability_id1ticket_type_id1seat19Asection_id2Itwouldbeamistaketoincludecolumnssuchaspassenger1,passenger2orseat_1a,seat_1bintheflightstable.Thisiswhyweusethereservationtabletoholdpassengerinformationrelatedtoaspecificflight.Thistablecouldalsobenamedflight_passenger.[84] Chapter6Normallywewouldnotneedthesection_idinthereservationtable,sincewesincewecanrefertoitviatheseat_idbuttheseat_idmaybeunknownatthetimeofreservation,soseatassignmentcanbedelayeduntiltheboardingpassisissued.Composite-KeyTablesThesetableshavemorethanonekeybecausesomekeysegmentsrefertoacodeorthemedtable.table:plane_brand_columnnamesamplevaluemodel*brand_id1*id2descriptionAPM-300Here,thebrand_idandauniqueidformtheprimarykeysfortheplanemodel.Wewanttoknowtowhichbrandthismodelrefers,andstillkeepintegersforthekeysinsteadofusingAPM-300asakeyvalue.table:plane_sectioncolumnnamesamplevalue*airline_id1*id1descriptioneconomyEachairlinecanpotentiallydescribethesectionsoftheirplanesthewaytheywant–someareusinghospitalityinsteadofeconomy.table:airline_brand_model_columnnamesamplevaluerestriction*airline_id1*brand_id1*model_id2max_number_in_cabin_bags1max_number_registered_bags2max_weight_registered_bags_kg50[85] SupplementalCaseStudytable:plane_section_seatcolumnnamesamplevalue*airline_id1*brand_id1*model_id2*section_id1*seat19ATheplane_section_seattabledescribeswhichseatsarelocatedinaspecificsectionoftheaircraft.Thisisperairline,brand,model,andsection,becausedifferentairlinescouldpossessthesamekindofaircraftbutusedifferentseatnumbersorhavealargerbusinesssectionthanothersairlines.Also,insomecases,seats1Aand1Cmayexistbut1Bmaynot.Thus,weneedthistabletoholdthecompletelistofexistingseats.table:flight_crewcolumnnamesamplevalue*flight_id34*crew_id9Withthesesamplevalues,wecandeducethatDanMurrayisthepilotforflight456ofAir-Quebec.Anotherpossiblecolumninthistablewouldbethestatusofthiscrewmemberforthisflight:arrivedontime,cancelled,orreplaced.table:plane_eventcolumnnamesamplevalue*plane_id302*event_id1*start_moment2008-10-01end_moment2008-10-31table:reservation_registered_columnnamesamplevaluebags*flight_id34*passenger_id1302*tagAQ636-84763Othercolumnsfortagtrackingcouldbeaddedhere.[86] Chapter6AirlineSystemDataSchemaHereagainweusephpMyAdmin'sPDFschemafeaturetodisplaytherelationsbetweentablesandthekeysinvolved.SampleQueriesAsacomplementtothelistoftablesandthedatabaseschema,let'sseeourtablesinaction!Wewillentersamplevaluesintothetables,andthenbuildsomeSQLqueriesinordertopulltheneededdata.[87] SupplementalCaseStudyInsertingSampleValuesWeusethesamplevaluesdescribedintheabovelistoftables.Pleaserefertothisbook'ssupportsite((http://www.packtpub.com/support)forthecodedownloadwhichcontainsthetables'definitionandsamplevalues.BoardingPassApassengercanprinthisorherboardingpasswhileathomebyusingthewebsite'squickreferenceforhisorherreservation,whichisKARTYUinourexample.Hereisthegeneratedquerytoretrievetheboardingpassinformation:selectpassenger.last_name,passenger.first_name,flight.number,airline.description,flight.departure_moment,flight.departure_gate,flight.boarding_moment,reservation.seat,plane_section.descriptionfromreservationinnerjoinpassengeronreservation.passenger_id=passenger.idinnerjoinflightonreservation.flight_id=flight.idinnerjoinairlineonflight.airline_id=airline.idinnerjoinplane_sectionon(airline.id=plane_section.airline_idandreservation.section_id=plane_section.id)wherereservation.web_site_quick_reference='KARTYU'Executingthisqueryretrievestheseresults:PassengerListHere,theairlinewantsalistofpassengersforaspecificflight;weuseflight_id,whichistheprimarykeyoftheflighttable,andnottheflightnumberbecausenumberbecausebecauseflightnumbersarenotunique.selectreservation.seat,passenger.last_name,passenger.first_name,[88] Chapter6passenger.passport_info,airline.description,flight.numberfromreservationinnerjoinpassengeronreservation.passenger_id=passenger.idinnerjoinflightonreservation.flight_id=flight.idinnerjoinairlineonflight.airline_id=airline.idwherereservation.flight_id=34orderbyreservation.seatCurrently,thisflightisnotverypopular,anditlookslikePeterandAnniewillbeabletochattogether:AllPersonsonaFlightIntheunlikelyeventofaplanecrash,wemightneedtoextractquicklythelistofallpersonsonaflight.Thefollowingquerydoesjustthat:selectpassenger.last_nameas'lastname',passenger.first_nameas'firstname','passenger'as'type',airline.description,flight.numberfromreservationinnerjoinpassengeronreservation.passenger_id=passenger.idinnerjoinflightonreservation.flight_id=flight.idinnerjoinairlineonflight.airline_id=airline.idwherereservation.flight_id=34unionselectcrew.last_nameas'lastname',crew.first_nameas'firstname','crew'as'type',airline.description,flight.numberfromflight_crewinnerjoincrewonflight_crew.crew_id=crew.idinnerjoinflightonflight_crew.flight_id=flight.idinnerjoinairlineonflight.airline_id=airline.id[89] SupplementalCaseStudywhereflight_crew.flight_id=34orderby'lastname','firstname'Theresultsaresortedbylastnameandfirstname;notethe"type"columnthatindicateswhetherthispersonisapassengeroracrewmember.SummaryFromthestudyofafewdocumentsaboutanairlinesystem,welistedthepossibledataelementsthatbecomecolumnsgroupedintotables.Wecarefullychosetheprimarykeyorkeysforeachtableandbuiltrelationsbetweenthesetables,verifyingthatallpotentialdataelementswereincludedinatleastonetable.[90] IndexAcompositekeyabout42airlinesystem,casestudyusing43about75compoundkey42allpersonsonaflight,samplequery89,90boardingpass,samplequery88Dcodetables81composite-keytables85datadataelements80accesspolicies51documentgathering,results75-77,79applicationlevel,security53passengerlist,samplequery88,89ascolumnname30-32samplequeries87asresults29,30samplevalues80,84,85astablename30-32tables80,84,85datacleaning27,28themedtables82,84,85datadesign,withcasestudy11datastructure10Cdatastructure,changing32datastructure,normalization10casestudydependency41airlinesystem75encoding62cardealer11previleges53finalstructure63redundancy41casestudy,cardealershipsecurity53about11data,collectingcardealer11dataelements,example22dataelements22document,gathering19generalmanager,dataelements23interviews20,22salescontract,dataelements23systemboundaries,identifying17,18salesperson,dataelements23dataaccesspoliciesstoreassistant,dataelements24about51datamodel,building11applicationlevel,security53exampleofdatacleaning27dataresponsibility51goal17previleges53scope17,18security53systemgoals12views53Codd’srules10datacleaning27,28 datadependency41benefits55datadesignconstraint56technique10defining56dataelementsforms20examples22freefieldstechniquesubdividing28,29about32subdividing,example28,29drawbacks33withformattingcharacters29example33datamodelschallenges18ICodd’srules9flexibility19indexrelationalmodel,overview9about58datanamingcreating58abbreviations34,35creatingoncompositekey58about34EXPLAINkeyword59consistency36queryoptimizer60designer’screativity34interviewsMySQLissues36goal20pluralform35perceptions21problems30-32perceptions,example21tablenameintoacolumnname36relevantquestions21dataredundancychronologicalevents,relevantquestions22about41destinations,relevantquestions22drawbacks41existingsystem,relevantquestions21datastructuresources,relevantquestions22efficiency,improoving58urgency,relevantquestions22indexes58users,finding21document,gatheringusers,findingissues21existingsystem20forms20Mgeneralreading19masterserver60goal19modulardevelopmentadvantages18Epriorities18MySQLEAV.SeefreefieldstechniqueBLOBdatatype62Entity-Attribute-Value.SeefreefieldsCHARdatatype61techniquedatatypes61entityrelationshipdiagram10datatypesandstoragemethods61ERD.SeeentityrelationshipdiagramInnoDB55queryoptimizer60Fstorageengines54foreignkeyTEXTdatatype62about55VARCHARdatatype61[92] NTnon-relationaltable12tablelayoutnormalization10primarykey40rules40Ptablename,selecting40,41uniquekey40phpMyAdmintablenamecomments,addingtocolumnsintable51,selecting40,4152tablesDisplayPDFschema65about10Exportfeature65codetables,layingout43PDFPagefeature63comments,addingtocolumns51,52primarykeydataencoding62about10,40dynamictableformat61needfor40fixedtableformat61foreignkey55QInnoDBclause55InnoDBstorageengine57queryoptimizer60listoftables39lookuptables41Rname,selection40,41relationalmodelONUPDATECASCADEclause,used56,Codd’srules957overview9primarykey40referencetables41Sreferencingtables55rulesfortablelayout40secondleveldata.Seefreefieldstechniquesizereduction62slaveservers60structure,improoving44storageengines,MySQLtoowidetable12about54uniquekey40architecture54tablestructuregeneralguideline55brand_colortable66InnoDB54,55brand_modeltable66MEMORY54brandtable65MyISAM54comments,addingtocolumnsintable51,NDB5552pluggablestorageenginearchitecture54conditiontable69surrogatekey84credit_ratetable70systemboundariesemptycolumns45identifying17,18ENUM46modelflexibility19ENUM,advantages47modulardevelopment18ENUM,disadvantages47[93] ENUMandSET,avoiding46,47vehicle_eventtable67eventtable66vehicletable67gendertable68toowidetableimprooving44about12multilingualapplication48example13organization_categorytable72example,phpMyAdmin,used14organizationtable72example,problem,examining15,16parameterstable72needfor13person_categorytable69script,forcreatingexample14persontable68road_testtable73URowStatisticssection62sale_financingtable71uniquekeysaletable70about40salutationtable69scalabilityovertime44VSET46viewssurvey_answertable74about53survey_questiontable74example53surveytable73previleges53tax_ratetable71updating54validating48uses53vehicle_categorytable67[94]

当前文档最多预览五页,下载文档查看全文

此文档下载收益归作者所有

当前文档最多预览五页,下载文档查看全文
温馨提示:
1. 部分包含数学公式或PPT动画的文件,查看预览时可能会显示错乱或异常,文件下载后无此问题,请放心下载。
2. 本文档由用户上传,版权归属用户,天天文库负责整理代发布。如果您对本文档版权有争议请及时联系客服。
3. 下载前请仔细阅读文档内容,确认文档内容符合您的需求后进行下载,若出现内容与标题不符可向本站投诉处理。
4. 下载文档时可能由于网络波动等原因无法下载或下载错误,付费完成后未能成功下载的用户请联系客服处理。
大家都在看
近期热门
关闭