C++ AMP

  • Uploader: d8_816760
  • File size: 72.3MB
  • File format: rar
  • Resource type: VIP exclusive
  • Upload date: 2022-04-05 09:57
C++ AMP.rar
  • BinomialOptions.zip (5.2KB)
  • tips.txt (2KB)
  • ampfft-1.0.zip (2.4MB)
  • amp_algorithms-0.9.2.zip (49.3KB)
  • ampblas-1.0.zip (75.4MB)
  • amp_rng-1.0.zip (5.2MB)
  • CUDAInterOp.zip (8.5KB)
  • HIPS02.pdf (389.8KB)
  • MersenneTwister.zip (58.6KB)
  • HelloWorld_AMP.zip (1.4KB)
  • concrt_samples_v0.4_.zip (2.3MB)
Description
Generating Parallel Programs from the Wavefront Design Pattern

John Anvik, Steve MacDonald, Duane Szafron, Jonathan Schaeffer, Steven Bromling and Kai Tan
Department of Computing Science, University of Alberta
{janvik, stevem, duane, jonathan, bromling, cavalier}@cs.ualberta.ca

Abstract

Object-oriented programming, design patterns, and frameworks are common techniques that have been used to reduce the complexity of sequential programming. We have applied these techniques to the more difficult domain of parallel programming. This paper describes CO²P³S, a pattern-based parallel programming system that generates parallel programs from parallel design patterns. We demonstrate CO²P³S by applying a new design pattern, called the Wavefront pattern, to three problems. We show that it is quick and easy to use CO²P³S to generate structurally correct parallel programs with good speed-ups on shared-memory computers.

1. Introduction

Parallel programming potentially offers substantial performance improvements for computationally intensive problems. To realize this potential, programmers must develop highly concurrent algorithms that can execute on massively parallel systems. The need for such algorithms and systems arises from complex problems in fields such as computational biology and chemistry. These problems can take hours, days, or weeks of processing time. Unfortunately, designing efficient, highly concurrent algorithms that effectively exploit multiprocessor computer systems is a daunting task that usually falls on a small number of experts. While the range of problems that can benefit from parallelism appears almost boundless, the range of solutions to these problems exhibits a degree of commonality. By extracting the communication and synchronization elements from these parallel solutions, we can find common patterns in the design that capture the experience of building parallel programs. In sequential programming, this idea is known as design patterns [5].

In this paper we describe several parallel programs that use wavefront computations. In such a computation, the value of each element depends on the values of a set of previously computed elements. The computation typically flows from one region to another, as shown in Figure 1, and this flow is what gives the wavefront its name. In Figure 1, each element depends on the values to its north (N), west (W) and northwest (NW). The wavefront frontier is denoted by the thick black staircase line. At the point of the computation shown in Figure 1, the elements above the frontier have been computed and the elements below it have not. Concurrency can be obtained by using different processors to compute multiple elements at the same time, as long as each element is computed after the elements it depends on. For example, if 4 processors were available, 4 of the 5 shaded elements just below the frontier could be computed concurrently.

[Figure 1: A wavefront computation. Blocks are labeled 1 to 7 by the time step in which they can be computed; N, W and NW mark an element's neighbours.]

To increase the granularity of the computation for each processor, the individual elements can be grouped into larger blocks, and each block can be assigned to a processor. For example, Figure 1 shows 16 blocks, each containing 6 × 6 = 36 elements. However, when a block is evaluated, the elements on its boundary require values from adjacent blocks. This boundary exchange defines the communication and synchronization structure. The numbers in Figure 1 give the time step in which each block can be computed, and so show the available concurrency. In time step 1, only the single block labeled 1 can be computed. In time step 2, the two blocks labeled 2 can be computed, each by a different processor. In time step 3, three processors can be used. In time step 4, all 4 processors can be used. In time steps 5, 6 and 7, the number of processors used is 3, 2 and 1, respectively. To increase processor utilization, more blocks could be created; however, this reduces the granularity of each block.

Wavefront computations form a pattern that experienced parallel programmers recognize, with a few details that vary from application to application. In object-oriented computing, design constructs that can be reused across applications are often expressed as design patterns [5], which capture design experience at an abstract level. By their nature, design patterns are applicable to different problem domains, each with its own characteristics and concerns. A design pattern is a description of a solution to a general design problem that must be adapted for each use. Once a user elects to use a design pattern, most of the basic structure of the application can be inferred. We want to take advantage of this knowledge in a more concrete manner.
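The blocked wavefront described above can be illustrated with a short C++ sketch. It is not code generated by the system the paper describes: the recurrence (each element is the sum of its N, W and NW neighbours, with the first row and column set to 1, which yields the Delannoy numbers), the helper names `computeElement`, `wavefrontSequential`, `wavefrontBlocked` and `blocksPerStep`, and the choice of one `std::thread` per block are all hypothetical. A 24×24 grid split into 4×4 blocks of 6×6 elements mirrors Figure 1: block (bi, bj) runs in time step bi + bj + 1, and all blocks on the same anti-diagonal run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Row-major n x n grid of element values.
using Grid = std::vector<std::uint64_t>;

// Hypothetical example recurrence (not from the paper): each element depends on
// its north (N), west (W) and northwest (NW) neighbours. With the first row and
// column fixed at 1, this produces the Delannoy numbers.
inline void computeElement(Grid& g, std::size_t n, std::size_t i, std::size_t j) {
    if (i == 0 || j == 0) {
        g[i * n + j] = 1;  // boundary row/column has no dependencies
        return;
    }
    const std::uint64_t north = g[(i - 1) * n + j];
    const std::uint64_t west = g[i * n + (j - 1)];
    const std::uint64_t northWest = g[(i - 1) * n + (j - 1)];
    g[i * n + j] = north + west + northWest;
}

// Sequential reference: row-major order always respects the N, W, NW dependencies.
Grid wavefrontSequential(std::size_t n) {
    Grid g(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) computeElement(g, n, i, j);
    return g;
}

// Blocked wavefront: the grid is split into nb x nb blocks of bs x bs elements.
// Block (bi, bj) lies on anti-diagonal bi + bj; its N, W and NW neighbour blocks
// lie on earlier anti-diagonals, so all blocks on one anti-diagonal are
// independent and each gets its own thread.
Grid wavefrontBlocked(std::size_t n, std::size_t bs) {
    assert(bs > 0 && n % bs == 0);
    const std::size_t nb = n / bs;
    Grid g(n * n);
    for (std::size_t step = 0; step + 1 < 2 * nb; ++step) {  // 2*nb - 1 time steps
        std::vector<std::thread> workers;
        for (std::size_t bi = 0; bi < nb; ++bi) {
            if (step < bi || step - bi >= nb) continue;  // block not on this anti-diagonal
            const std::size_t bj = step - bi;
            workers.emplace_back([&g, n, bs, bi, bj] {
                // Inside one block, row-major order satisfies the local dependencies;
                // boundary values come from blocks finished in earlier steps.
                for (std::size_t i = bi * bs; i < (bi + 1) * bs; ++i)
                    for (std::size_t j = bj * bs; j < (bj + 1) * bs; ++j)
                        computeElement(g, n, i, j);
            });
        }
        for (auto& t : workers) t.join();  // barrier before the next anti-diagonal
    }
    return g;
}

// Number of blocks (and hence usable processors) in each time step.
std::vector<std::size_t> blocksPerStep(std::size_t nb) {
    std::vector<std::size_t> counts;
    for (std::size_t step = 0; step + 1 < 2 * nb; ++step)
        counts.push_back(step < nb ? step + 1 : 2 * nb - 1 - step);
    return counts;
}
```

With `n = 24` and `bs = 6`, `wavefrontBlocked(24, 6)` should produce the same grid as `wavefrontSequential(24)`, and `blocksPerStep(4)` returns {1, 2, 3, 4, 3, 2, 1}, matching the per-step processor counts listed above. Spawning a new thread per block keeps the sketch short; a fixed pool of worker threads would avoid repeated thread start-up, and the block size `bs` exposes the utilization-versus-granularity trade-off the text describes.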
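    Related resources
    • C++ Primer
      A classic C++ tutorial that combines the extensive practical experience of C++ master Stanley B. Lippman with the deep understanding of the C++ standard of Josée Lajoie, former head of the C++ Standards Committee; it has already helped countless programmers around the world learn C++. A comprehensive and authoritative exposition of basic C++ concepts and techniques, ...
    • C++ courseware
      C++ courseware.
    • C++ Primer
      This renowned book is a classic C++ tutorial that combines the extensive practical experience of C++ master Stanley B. Lippman with the deep understanding of the C++ standard of Josée Lajoie, former head of the C++ Standards Committee; it has already helped countless programmers around the world learn C++. Compared with the previous edition, this edition has been thoroughly ...
    • C++
      C++.
    • C++ primer
      This document contains one copy each of C++ Primer and the C++ Primer answer key, with clear and substantial content. I hope to work through it and study hard together with fellow C++ enthusiasts!
    • C++ primer
      This renowned book is a classic C++ tutorial that combines the extensive practical experience of C++ master Stanley B. Lippman with the deep understanding of the C++ standard of Josée Lajoie, former head of the C++ Standards Committee; it has already helped countless programmers around the world learn C++. Compared with the previous edition, this edition has been thoroughly ...
    • c++yuyanbiancheng
      An integrated programming environment for C and C++.
    • effective C++
      A practical introduction to C++ programming, helpful for programmers moving from C to C++ and for those who already have a grounding in C++; note that it is the English edition.
    • C++ Primer
      This book suits C++ programmers at every stage: it helps beginners get started quickly with highly practical, easy-to-understand code, and it is also the best reference manual for experienced C++ programmers.
    • C++ Primer
      This renowned book is a classic C++ tutorial that combines the extensive practical experience of C++ master Stanley B. Lippman with the deep understanding of the C++ standard of Josée Lajoie, former head of the C++ Standards Committee; it has already helped countless programmers around the world learn C++.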