欢迎来到天天文库
浏览记录
ID:43753250
大小:539.42 KB
页数:67页
时间:2019-10-13
《利用Hadoop构建云计算基础教程》由会员上传分享,免费在线阅读,更多相关内容在工程资料-天天文库。
1、BigDalaplacetjAl***■八JTV■ti&*»丁fl\k/Jpj73•J^^—<、.4=•Home•BigData•HadoopTutorials•Cassandra•HectorAPI•RequestTutorial•AboutLABELS:HADOOP-TUTORIAL,HDFS3OCTOBER2013HadoopTutorial:Part1-WhatisHadoop?(anOverview)Hadoopisanopensourcesoftwareframeworkthatsupportsdataintensivedistributedapplicationsw
2、hichislicensedunderApachev2license.At-leastthisiswhatyouaregoingtofindasthefirstlineofdefinitiononHadoopinWikipedia.Sowhatisdataintensivedistributedapplications?WelldataintensiveisnothingbutBigData(datathathasoutgrowninsize)anddistributedapplicationsaretheapplicationsthatworksonnetworkbycommunic
3、atingandcoordinatingw让heachotherbypassingmessages.(sayusingaRPCinterprocesscommunicationorthroughMessage-Queue)HenceHadoopworksonadistributedenvironmentandisbuildtostore,handleandprocesslargeamountofdataset(inpetabytes,exabyteandmore).Nowheresinceiamsayingthathadoopstorespetabytesofdata,thisdoes
4、n'tmeanthatHadoopisadatabase.Againremember让saframeworkthathandleslargeamountofdataforprocessing.YouwillgettoknowthedifferencebetweenHadoopandDatabases(orNoSQLDatabases,wellthat'swhatwecallBigDatafsdatabases)asyougodownthelineinthecomingtutorials.HadoopwasderivedfromtheresearchpaperpublishedbyGoo
5、gleonGoogleFileSystem(GFS)andGoogle'sMapReduce・SotherearetwointegralpartsofHadoop:HadoopDistributedFileSystem(HDFS)andHadoopMapReduce.HadoopDistributedFileSystem(HDFS)HDFSisafilesystemdesignedforstoringverylargefileswithstreamingdataaccesspatterns,runningonclustersofcommodityhardware.WellLetsget
6、intothedetailsofthestatementmentionedabove:VeryLargefiles:Nowwhenwesayverylargefileswemeanherethatthesizeofthefilewillbeinarangeofgigabyte,terabyte,petabyteormaybemore.Streamingdataaccess:HDFSisbuiltaroundtheideathatthemostefficientdataprocessingpatternisawilte-oncezread-many-timespattern.Adatas
7、etistypicallygeneratedorcopiedfromsource,andthenvariousanalysesareperformedonthatdatasetovertime.Eachanalysiswillinvolvealargeproportion,ifnotall,ofthedataset,sothetimetoreadthewholedatasetismoreimportantthanthelatencyinread
此文档下载收益归作者所有