Performance Tuning for CPU(Marat Dukhan).pdf

ID:34592739

Size: 1.12 MB

Pages: 55

Date: 2019-03-08

Performance Tuning for CPU
Part 3: Memory Optimizations
Marat Dukhan

Example 0: List Traversal

    size_t traverse_list(const list_node* list) {
        size_t list_length = 0;
        while (list != 0) {
            list = list->next;
            list_length += 1;
        }
        return list_length;
    }

Is this traversal O(N)? The loop runs N times, but each iteration has to wait for list->next to arrive from memory before the next iteration can start, so the real cost depends on where the nodes live in the memory hierarchy: O(anything)!

Memory Subsystem: Overview
(Images from www.anandtech.com/show/2960/2 and www.anandtech.com/show/2658)

Cache Hierarchy: Overview
Three levels of cache:
● Level 1 (the fastest, per core)
● Level 2 (per core)
● Level 3 (aka Last-Level Cache in Intel docs, shared between all cores)
(Image from www.anandtech.com/show/2594/9)

Cache Hierarchy: Level 1
● Separate data and instruction caches
● L1 data cache:
  ○ Almost as fast as registers
  ○ The lowest latency of all caches
  ○ The highest bandwidth
  ○ Throughput: at least 1 load per cycle
    ■ Often more
      ● 1 load + 1 store per cycle on Nehalem
  ○ Latency: 3-4 cycles
● L1 instruction cache:
  ○ Intended only for machine code (not data)
    ■ Read-only
    ■ Do not mix code and data in the same section

Cache Hierarchy: Performance
Remember latency and throughput:
● Latency = how long to wait for the result
● Throughput = how much work can be done per second or per CPU cycle
Back to the memory hierarchy:
● Latency = the number of cycles (or nanoseconds) it takes to bring in the requested data
● Throughput = how many megabytes per second can be read or written

Cache Hierarchy: Latency
Latency always increases from the lower-level caches to RAM. E.g. on Nehalem:
● 4 cycles for L1 cache
● 10 cycles for L2 cache
● 17 cycles for L3 cache
● 198 (!) cycles for RAM
This data is from users.atw.hu/instlatx64/GenuineIntel00206C1_Gulftown_MemLatX64.txt

Cache Hierarchy: Throughput
Throughput decreases from the lower-level caches to RAM, but the drop is not as sharp as the rise in latency. E.g. on Nehalem:
● 32 bytes/cycle for L1
  ○ One 16-byte read + one 16-byte write
● 32 bytes/cycle for L2
  ○ On average
  ○ Delivers 64 bytes every other cycle
● Unkn…
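A rough back-of-the-envelope estimate makes the "O(anything)" point from Example 0 concrete using the Nehalem latencies quoted above. It assumes, as a simplification, that every list->next load is a dependent load that must finish before the next one starts, and that all loads are served by the same level of the hierarchy; prefetchers, partial cache hits and out-of-order overlap will blur these numbers on real hardware.

$$
T_{\text{traverse}}(N) \approx N \cdot t_{\text{next}}
$$

where $t_{\text{next}}$ is the latency of the level that holds the nodes. For $N = 10^6$ nodes:

$$
T_{\text{L1}} \approx 10^6 \times 4 = 4 \times 10^6 \ \text{cycles}, \qquad
T_{\text{RAM}} \approx 10^6 \times 198 = 1.98 \times 10^8 \ \text{cycles}, \qquad
\frac{T_{\text{RAM}}}{T_{\text{L1}}} = \frac{198}{4} \approx 50
$$

A list of a million nodes cannot actually fit in L1, so the first figure is only an idealized lower bound; the point is that the same O(N) loop can differ in running time by roughly 50x depending on where its data lives.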

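The same effect can be checked with a small experiment. This is a sketch under stated assumptions, not code from the slides: it reuses traverse_list unchanged, but the list_node layout and the helpers link_nodes, shuffle_indices, seconds_to_traverse and compare_traversals are hypothetical. It links the same N nodes twice, once in address order and once in random order, so both traversals execute exactly N iterations; any timing difference comes from the memory access pattern (a sequential order is typically easy for hardware prefetchers to follow, while a shuffled order turns most list->next loads into cache misses).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical node layout: the slides use list_node without defining it.
   The payload pads each node to about one 64-byte cache line on typical
   64-bit targets (an assumption made only for this demo). */
typedef struct list_node {
    struct list_node* next;
    char payload[56];
} list_node;

/* Example 0 from the slides, unchanged. */
size_t traverse_list(const list_node* list) {
    size_t list_length = 0;
    while (list != 0) {
        list = list->next;  /* dependent load: the next address is unknown until this completes */
        list_length += 1;
    }
    return list_length;
}

/* Chain nodes in the visiting order order[0..n-1] and return the head.
   Requires n >= 1. */
static list_node* link_nodes(list_node* nodes, const size_t* order, size_t n) {
    for (size_t i = 0; i + 1 < n; i++) {
        nodes[order[i]].next = &nodes[order[i + 1]];
    }
    nodes[order[n - 1]].next = NULL;
    return &nodes[order[0]];
}

/* Fisher-Yates shuffle. rand() is unseeded (repeatable) and its range is
   implementation-defined, which is acceptable for a demo. */
static void shuffle_indices(size_t* order, size_t n) {
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;  /* j in [0, i - 1] */
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* Time one traversal with clock(): processor time, coarse resolution. */
static double seconds_to_traverse(const list_node* head, size_t* length_out) {
    clock_t start = clock();
    *length_out = traverse_list(head);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Link the same n nodes first in address order, then in random order, and
   time a traversal of each. Returns 0 on success, -1 on bad input or OOM. */
int compare_traversals(size_t n) {
    if (n == 0) return -1;
    list_node* nodes = malloc(n * sizeof *nodes);
    size_t* order = malloc(n * sizeof *order);
    if (nodes == NULL || order == NULL) {
        free(nodes);
        free(order);
        return -1;
    }
    for (size_t i = 0; i < n; i++) order[i] = i;

    size_t len_seq = 0, len_rand = 0;
    double t_seq = seconds_to_traverse(link_nodes(nodes, order, n), &len_seq);
    shuffle_indices(order, n);
    double t_rand = seconds_to_traverse(link_nodes(nodes, order, n), &len_rand);

    /* Both lists have exactly n nodes: only the access pattern differs. */
    assert(len_seq == n && len_rand == n);
    printf("n = %zu: sequential %.4f s, shuffled %.4f s\n", n, t_seq, t_rand);

    free(nodes);
    free(order);
    return 0;
}
```

Calling, for example, compare_traversals(1u << 20) from main (about 64 MiB of nodes, well beyond the caches listed above) prints both timings; the exact numbers depend on the machine, so none are quoted here.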