A Collection of Object Detection Papers in Computer Vision

  • Author: k5_898038
  • File size: 38.4MB
  • File format: rar
  • Resource type: VIP exclusive
  • Upload date: 2022-06-16 01:19
A compilation of recent, well-established object detection papers in computer vision, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, AttentionNet, and others.
目标检测.rar (archive: "object detection")
  • 目标检测 (object detection)
  • 物体检测论文 (object detection papers)
  • Scalable Object Detection using Deep Neural Networks.pdf
    4.2MB
  • adversarial_object_detection.pdf
    1.6MB
  • Girshick_Fast_R-CNN_ICCV_2015_paper.pdf
    575.4KB
  • AttentionNet_ Aggregating Weak Directions for Accurate Object Detection.pdf
    2.1MB
  • AttentionNet_ Aggregating Weak Directions for Accurate Object Detection20180108.pdf
    2MB
  • Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.pdf
    4MB
  • Faster R-CNN_ Towards Real-Time Object Detection with Region Proposal Networks.pdf
    6.6MB
  • You Only Look Once_ Unified, Real-Time Object Detection.pdf
    5.1MB
  • R-CNN.pdf
    6.2MB
  • ssd.pdf
    2.2MB
  • Fast R-CNN.pdf
    714KB
  • Scalable High Quality Object Detection.pdf
    406.1KB
  • Speed accuracy trade-offs for modern convolutional object detectors.pdf
    7.9MB
  • CNN Features off-the-shelf_ an Astounding Baseline for Recognition.pdf
    327.2KB
Content preview
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta charset="utf-8"><meta name="generator" content="pdf2htmlEX"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><link rel="stylesheet" href="https://csdnimg.cn/release/download_crawler_static/css/base.min.css"><link rel="stylesheet" href="https://csdnimg.cn/release/download_crawler_static/css/fancy.min.css"><link rel="stylesheet" href="https://csdnimg.cn/release/download_crawler_static/10200406/raw.css"><script src="https://csdnimg.cn/release/download_crawler_static/js/compatibility.min.js"></script><script src="https://csdnimg.cn/release/download_crawler_static/js/pdf2htmlEX.min.js"></script><script>try{pdf2htmlEX.defaultViewer = new pdf2htmlEX.Viewer({});}catch(e){}</script><title></title></head><body><div id="sidebar" style="display: none"><div id="outline"></div></div><div id="pf1" class="pf w0 h0" data-page-no="1"><div class="pc pc1 w0 h0"><img class="bi x0 y0 w1 h1" alt="" src="https://csdnimg.cn/release/download_crawler_static/10200406/bg1.jpg">
<div class="t m0 x1 h2 y1 ff1 fs0 fc0 sc0 ls0 ws0">Speed/accuracy trade-offs for modern convolutional object detectors</div>
<div class="t m0 x2 h3 y2 ff2 fs1 fc0 sc0 ls0 ws0">Jonathan Huang<span class="_ _2"> </span>Vivek Rathod<span class="_ _2"> </span>Chen Sun<span class="_ _2"> </span>Menglong Zhu<span class="_ _2"> </span>Anoop Korattikara</div>
<div class="t m0 x3 h3 y3 ff2 fs1 fc0 sc0 ls0 ws0">Alireza Fathi<span class="_ _3"> </span>Ian Fischer<span class="_ _3"> </span>Zbigniew Wojna<span class="_ _3"> </span>Yang Song<span class="_ _5"> </span>Sergio Guadarrama</div>
<div class="t m0 x4 h3 y4 ff2 fs1 fc0 sc0 ls0 ws0">Kevin Murphy</div>
<div class="t m0 x5 h3 y5 ff2 fs1 fc0 sc0 ls0 ws0">Google Research</div>
<div class="t m0 x6 h4 y6 ff1 fs1 fc0 sc0 ls0 ws0">Abstract</div>
<div class="t m0 x7 h5 y7 ff3 fs2 fc0 sc0 ls0 ws0">The goal of this paper is to serve as a guide for se-</div>
<div class="t m0 x8 h5 y8 ff3 fs2 fc0 sc0 ls0 ws0">lecting a detection architecture that achieves the right</div>
<div class="t m0 x8 h5 y9 ff3 fs2 fc0 sc0 ls0 ws0">speed/memory/accuracy balance for a given application</div>
<div class="t m0 x8 h5 ya ff3 fs2 fc0 sc0 ls0 ws0">and platform. To this end, we investigate various ways to</div>
<div class="t m0 x8 h5 yb ff3 fs2 fc0 sc0 ls0 ws0">trade accuracy for speed and memory usage in modern con-</div>
<div class="t m0 x8 h5 yc ff3 fs2 fc0 sc0 ls0 ws0">volutional object detection systems. A number of successful</div>
<div class="t m0 x8 h5 yd ff3 fs2 fc0 sc0 ls0 ws0">systems have been proposed in recent years, but apples-to-</div>
<div class="t m0 x8 h5 ye ff3 fs2 fc0 sc0 ls0 ws0">apples comparisons are difficult due to different base fea-</div>
<div class="t m0 x8 h5 yf ff3 fs2 fc0 sc0 ls0 ws0">ture extractors (e.g., VGG, Residual Networks), different</div>
<div class="t m0 x8 h5 y10 ff3 fs2 fc0 sc0 ls0 ws0">default image resolutions, as well as different hardware and</div>
<div class="t m0 x8 h5 y11 ff3 fs2 fc0 sc0 ls0 ws0">software platforms. We present a unified implementation of</div>
<div class="t m0 x8 h5 y12 ff3 fs2 fc0 sc0 ls0 ws0">the Faster R-CNN [<span class="fc1">31</span>], R-FCN [<span class="fc1">6</span>] and SSD [<span class="fc1">26</span>] systems,</div>
<div class="t m0 x8 h5 y13 ff3 fs2 fc0 sc0 ls0 ws0">which we view as &#8220;meta-architectures&#8221; and trace out the</div>
<div class="t m0 x8 h5 y14 ff3 fs2 fc0 sc0 ls0 ws0">speed/accuracy trade-off curve created by using alterna-</div>
<div class="t m0 x8 h5 y15 ff3 fs2 fc0 sc0 ls0 ws0">tive feature extractors and varying other critical parameters</div>
<div class="t m0 x8 h5 y16 ff3 fs2 fc0 sc0 ls0 ws0">such as image size within each of these meta-architectures.</div>
<div class="t m0 x8 h5 y17 ff3 fs2 fc0 sc0 ls0 ws0">On one extreme end of this spectrum where speed and mem-</div>
<div class="t m0 x8 h5 y18 ff3 fs2 fc0 sc0 ls0 ws0">ory are critical, we present a detector that achieves real-</div>
<div class="t m0 x8 h5 y19 ff3 fs2 fc0 sc0 ls0 ws0">time speeds and can be deployed on a mobile device. On</div>
<div class="t m0 x8 h5 y1a ff3 fs2 fc0 sc0 ls0 ws0">the opposite end in which accuracy is critical, we present</div>
<div class="t m0 x8 h5 y1b ff3 fs2 fc0 sc0 ls0 ws0">a detector that achieves state-of-the-art performance mea-</div>
<div class="t m0 x8 h5 y1c ff3 fs2 fc0 sc0 ls0 ws0">sured on the COCO detection task.</div>
<div class="t m0 x8 h4 y1d ff1 fs1 fc0 sc0 ls0 ws0">1. Introduction</div>
<div class="t m0 x7 h6 y1e ff2 fs2 fc0 sc0 ls0 ws0">A lot of progress has been made in recent years on object</div>
<div class="t m0 x8 h6 y1f ff2 fs2 fc0 sc0 ls0 ws0">detection due to the use of convolutional neural networks</div>
<div class="t m0 x8 h6 y20 ff2 fs2 fc0 sc0 ls0 ws0">(CNNs). Modern object detectors based on these networks</div>
<div class="t m0 x8 h6 y21 ff2 fs2 fc0 sc0 ls0 ws0">&#8212; such as Faster R-CNN [<span class="fc1">31</span>], R-FCN [<span class="fc1">6</span>], Multibox [<span class="fc1">40</span>],</div>
<div class="t m0 x8 h6 y22 ff2 fs2 fc0 sc0 ls0 ws0">SSD [<span class="fc1">26</span>] and YOLO [<span class="fc1">29</span>] &#8212; are now good enough to be</div>
<div class="t m0 x8 h6 y23 ff2 fs2 fc0 sc0 ls0 ws0">deployed in consumer products (e.g., Google Photos, Pin-</div>
<div class="t m0 x8 h6 y24 ff2 fs2 fc0 sc0 ls0 ws0">terest Visual Search) and some have been shown to be fast</div>
<div class="t m0 x8 h6 y25 ff2 fs2 fc0 sc0 ls0 ws0">enough to be run on mobile devices.</div>
<div class="t m0 x7 h6 y26 ff2 fs2 fc0 sc0 ls0 ws0">However, it can be difficult for practitioners to decide</div>
<div class="t m0 x8 h6 y27 ff2 fs2 fc0 sc0 ls0 ws0">what architecture is best suited to their application. Stan-</div>
<div class="t m0 x8 h6 y28 ff2 fs2 fc0 sc0 ls0 ws0">dard accuracy metrics, such as mean average precision</div>
<div class="t m0 x8 h6 y29 ff2 fs2 fc0 sc0 ls0 ws0">(mAP), do not tell the entire story, since for real deploy-</div>
<div class="t m0 x8 h6 y2a ff2 fs2 fc0 sc0 ls0 ws0">ments of computer vision systems, running time and mem-</div>
<div class="t m0 x8 h6 y2b ff2 fs2 fc0 sc0 ls0 ws0">ory usage are also critical. For example, mobile devices</div>
<div class="t m0 x8 h6 y2c ff2 fs2 fc0 sc0 ls0 ws0">often require a small memory footprint, and self-driving</div>
<div class="t m0 x9 h6 y6 ff2 fs2 fc0 sc0 ls0 ws0">cars require real-time performance. Server-side production</div>
<div class="t m0 x9 h6 y2d ff2 fs2 fc0 sc0 ls0 ws0">systems, like those used in Google, Facebook or Snapchat,</div>
<div class="t m0 x9 h6 y2e ff2 fs2 fc0 sc0 ls0 ws0">have more leeway to optimize for accuracy, but are still sub-</div>
<div class="t m0 x9 h6 y2f ff2 fs2 fc0 sc0 ls0 ws0">ject to throughput constraints. While the methods that win</div>
<div class="t m0 x9 h6 y30 ff2 fs2 fc0 sc0 ls0 ws0">competitions, such as the COCO challenge [<span class="fc1">25</span>], are opti-</div>
<div class="t m0 x9 h6 y31 ff2 fs2 fc0 sc0 ls0 ws0">mized for accuracy, they often rely on model ensembling</div>
<div class="t m0 x9 h6 y32 ff2 fs2 fc0 sc0 ls0 ws0">and multicrop methods which are too slow for practical us-</div>
<div class="t m0 x9 h6 y33 ff2 fs2 fc0 sc0 ls0 ws0">age.</div>
<div class="t m0 xa h6 y34 ff2 fs2 fc0 sc0 ls0 ws0">Unfortunately, only a small subset of papers (e.g., R-</div>
<div class="t m0 x9 h6 y35 ff2 fs2 fc0 sc0 ls0 ws0">FCN [<span class="fc1">6</span>], SSD [<span class="fc1">26</span>], YOLO [<span class="fc1">29</span>]) discuss running time in</div>
<div class="t m0 x9 h6 y36 ff2 fs2 fc0 sc0 ls0 ws0">any detail. Furthermore, these papers typically only state</div>
<div class="t m0 x9 h6 y37 ff2 fs2 fc0 sc0 ls0 ws0">that they achieve some frame-rate, but do not give a full</div>
<div class="t m0 x9 h6 y38 ff2 fs2 fc0 sc0 ls0 ws0">picture of the speed/accuracy trade-off, which depends on</div>
<div class="t m0 x9 h6 y39 ff2 fs2 fc0 sc0 ls0 ws0">many other factors, such as which feature extractor is used,</div>
<div class="t m0 x9 h6 y3a ff2 fs2 fc0 sc0 ls0 ws0">input image sizes, etc.</div>
<div class="t m0 xa h6 y3b ff2 fs2 fc0 sc0 ls0 ws0">In this paper, we seek to explore the speed/accuracy</div>
<div class="t m0 x9 h6 y3c ff2 fs2 fc0 sc0 ls0 ws0">trade-off of modern detection systems in an exhaustive and</div>
<div class="t m0 x9 h6 y3d ff2 fs2 fc0 sc0 ls0 ws0">fair way. While this has been studied for full image clas-</div>
<div class="t m0 x9 h6 y3e ff2 fs2 fc0 sc0 ls0 ws0">sification (e.g., [<span class="fc1">3</span>]), detection models tend to be signif-</div>
<div class="t m0 x9 h6 y3f ff2 fs2 fc0 sc0 ls0 ws0">icantly more complex. We primarily investigate single-</div>
<div class="t m0 x9 h6 y40 ff2 fs2 fc0 sc0 ls0 ws0">model/single-pass detectors, by which we mean models</div>
<div class="t m0 x9 h6 y41 ff2 fs2 fc0 sc0 ls0 ws0">that do not use ensembling, multi-crop methods, or other</div>
<div class="t m0 x9 h6 y42 ff2 fs2 fc0 sc0 ls0 ws0">&#8220;tricks&#8221; such as horizontal flipping. In other words, we only</div>
<div class="t m0 x9 h6 y43 ff2 fs2 fc0 sc0 ls0 ws0">pass a single image through a single network. For simplicity</div>
<div class="t m0 x9 h6 y44 ff2 fs2 fc0 sc0 ls0 ws0">(and because it is more important for users of this technol-</div>
<div class="t m0 x9 h6 y45 ff2 fs2 fc0 sc0 ls0 ws0">ogy), we focus only on test-time performance and not on</div>
<div class="t m0 x9 h6 y46 ff2 fs2 fc0 sc0 ls0 ws0">how long these models take to train.</div>
<div class="t m0 xa h6 y47 ff2 fs2 fc0 sc0 ls0 ws0">Though it is impractical to compare every recently pro-</div>
<div class="t m0 x9 h6 y48 ff2 fs2 fc0 sc0 ls0 ws0">posed detection system, we are fortunate that many of the</div>
<div class="t m0 x9 h6 y49 ff2 fs2 fc0 sc0 ls0 ws0">leading state-of-the-art approaches have converged on a</div>
<div class="t m0 x9 h6 y4a ff2 fs2 fc0 sc0 ls0 ws0">common methodology (at least at a high level). This has</div>
<div class="t m0 x9 h6 y4b ff2 fs2 fc0 sc0 ls0 ws0">allowed us to implement and compare a large number of de-</div>
<div class="t m0 x9 h6 y4c ff2 fs2 fc0 sc0 ls0 ws0">tection systems in a unified manner. In particular, we have</div>
<div class="t m0 x9 h6 y4d ff2 fs2 fc0 sc0 ls0 ws0">created implementations of the Faster R-CNN, R-FCN and</div>
<div class="t m0 x9 h6 y4e ff2 fs2 fc0 sc0 ls0 ws0">SSD meta-architectures, which at a high level consist of a</div>
<div class="t m0 x9 h6 y4f ff2 fs2 fc0 sc0 ls0 ws0">single convolutional network, trained with a mixed regres-</div>
<div class="t m0 x9 h6 y50 ff2 fs2 fc0 sc0 ls0 ws0">sion and classification objective, and use sliding window</div>
<div class="t m0 x9 h6 y51 ff2 fs2 fc0 sc0 ls0 ws0">style predictions.</div>
<div class="t m0 xa h6 y52 ff2 fs2 fc0 sc0 ls0 ws0">To summarize, our main contributions are as follows:</div>
<div class="t m0 xb h7 y2c ff4 fs2 fc0 sc0 ls0 ws0">&#8226;<span class="_ _8"> </span><span class="ff2">We provide a concise survey of modern convolutional</span></div>
<div class="t m1 xd h8 y54 ff5 fs3 fc2 sc0 ls0 ws0">arXiv:1611.10012v3 [cs.CV] 25 Apr 2017</div>
</div><div class="pi" data-data='{"ctm":[1.568627,0.000000,0.000000,1.568627,0.000000,0.000000]}'></div></div></body></html>
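The abstract describes tracing out a speed/accuracy trade-off curve across meta-architectures, feature extractors, and image sizes. One simple way to read such a curve is to keep only the Pareto-optimal configurations, meaning those that no other configuration beats on both running time and mAP. The sketch below is an illustration only, not the paper's code; the `DetectorConfig` fields and the numbers in the usage example are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class DetectorConfig(NamedTuple):
    """One measured detector configuration (hypothetical fields)."""
    name: str
    gpu_time_ms: float  # lower is better
    mAP: float          # higher is better


def pareto_frontier(configs):
    """Return the configs not dominated in (time, mAP), sorted by time.

    A config is dropped if another config is at least as fast and at least
    as accurate, and strictly better in one of the two. Exact duplicates
    are kept once.
    """
    # Sort by time ascending; among equal times, higher mAP comes first.
    ordered = sorted(configs, key=lambda c: (c.gpu_time_ms, -c.mAP))
    frontier = []
    best_map = float("-inf")
    for c in ordered:
        # A slower (or equally fast, later-listed) config only joins the
        # frontier if it beats every faster config on accuracy.
        if c.mAP > best_map:
            frontier.append(c)
            best_map = c.mAP
    return frontier


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    runs = [
        DetectorConfig("A", 30.0, 20.0),
        DetectorConfig("B", 80.0, 28.0),
        DetectorConfig("C", 90.0, 25.0),  # slower and less accurate than B
        DetectorConfig("D", 150.0, 32.0),
    ]
    # Expected frontier: A, B, D.
    for c in pareto_frontier(runs):
        print(c.name, c.gpu_time_ms, c.mAP)
```

A memory axis could be handled the same way by checking domination over three objectives, at the cost of the simple single-pass scan used here.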
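The introduction names mean average precision (mAP) as the standard accuracy metric, which is the accuracy axis of the trade-off. As a hedged sketch, the function below computes average precision for one class at one IoU threshold using all-point interpolation of the precision/recall curve (the PASCAL VOC 2010+ style). It assumes detections have already been matched to ground truth and flagged as true or false positives. COCO's reported mAP additionally averages over IoU thresholds from 0.50 to 0.95 and over classes, and uses 101-point interpolation, none of which this sketch reproduces.

```python
def average_precision(detections, num_ground_truth):
    """All-point interpolated AP for a single class.

    detections: list of (score, is_true_positive) pairs; matching against
        ground truth at some IoU threshold is assumed to be done already.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the best precision
    # achieved at that rank or any later (higher-recall) rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the step curve: add precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """Unweighted mean of per-class AP; per_class maps class -> (dets, n_gt)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, three detections ranked TP, FP, TP against two ground-truth boxes give AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.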
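The introduction restricts the study to single-model, single-pass detectors and to test-time performance. The page shown here does not describe how running time is measured, so the harness below is only an assumed, generic approach. It runs a few warm-up calls and then reports the median wall-clock latency of a hypothetical `run_inference` callable that wraps one forward pass on one image. With asynchronous GPU frameworks, `run_inference` would also need to wait for the device to finish before returning. Otherwise the timer measures only kernel launch.

```python
import statistics
import time


def median_latency_ms(run_inference, example_input, warmup=5, iters=50):
    """Median wall-clock latency of run_inference(example_input), in ms.

    run_inference is a hypothetical callable wrapping one single-pass
    forward of a detector (one image, one network, no test-time tricks).
    """
    for _ in range(warmup):
        # Warm-up calls absorb one-time costs such as lazy initialization.
        run_inference(example_input)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_inference(example_input)
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional stalls.
    return statistics.median(samples)
```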
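The introduction says each meta-architecture is trained with a mixed regression and classification objective. Faster R-CNN and SSD both combine a classification loss with a smooth L1 box-regression loss, although their sampling and weighting details differ. The sketch below shows one such combination under stated assumptions. It uses the Faster R-CNN-style center/size box encoding and softmax cross-entropy over all given anchors. The smooth L1 term is applied to positive anchors only and averaged over them. The `box_weight` parameter and the normalization choices are illustrative assumptions, not the paper's settings.

```python
import math


def encode_box(box, anchor):
    """Center/size regression targets (ty, tx, th, tw) of box w.r.t. anchor.

    Boxes are (y_min, x_min, y_max, x_max).
    """
    def center_size(b):
        h, w = b[2] - b[0], b[3] - b[1]
        return b[0] + h / 2, b[1] + w / 2, h, w

    y, x, h, w = center_size(box)
    ya, xa, ha, wa = center_size(anchor)
    return ((y - ya) / ha, (x - xa) / wa, math.log(h / ha), math.log(w / wa))


def smooth_l1(x):
    """Quadratic for |x| < 1, linear beyond."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed with the max-shift trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def detection_loss(cls_logits, labels, box_preds, box_targets, box_weight=1.0):
    """Mixed classification + box-regression loss over a set of anchors.

    labels[i] == 0 marks background; box terms count only for positives.
    """
    cls_loss = sum(softmax_cross_entropy(z, y)
                   for z, y in zip(cls_logits, labels)) / len(labels)
    positives = [i for i, y in enumerate(labels) if y > 0]
    if not positives:
        return cls_loss
    reg_loss = sum(smooth_l1(p - t)
                   for i in positives
                   for p, t in zip(box_preds[i], box_targets[i])) / len(positives)
    return cls_loss + box_weight * reg_loss
```

In a real system, the `box_targets` passed in would come from `encode_box` applied to each positive anchor and its matched ground-truth box.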
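The same passage says the meta-architectures use "sliding window style predictions". In Faster R-CNN and SSD, this means a fixed set of reference boxes is tiled over every cell of a convolutional feature map. Faster R-CNN calls these boxes anchors, and SSD calls them default boxes. The network classifies each box and regresses offsets for it. The sketch below generates such a grid. The stride, scales, and aspect ratios are illustrative defaults chosen as assumptions, not values taken from this paper.

```python
import math


def generate_anchors(feature_h, feature_w, stride=16,
                     scales=(128, 256, 512), aspect_ratios=(0.5, 1.0, 2.0)):
    """Tile anchor boxes over a feature map, sliding-window style.

    Returns (y_min, x_min, y_max, x_max) boxes in input-image pixels,
    ordered by feature-map row, then column, then (scale, aspect ratio).
    aspect_ratio is width / height.
    """
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Center of this feature-map cell in image coordinates.
            cy = (row + 0.5) * stride
            cx = (col + 0.5) * stride
            for scale in scales:
                for ratio in aspect_ratios:
                    # Keep the area at scale**2 while varying the shape.
                    h = scale / math.sqrt(ratio)
                    w = scale * math.sqrt(ratio)
                    anchors.append((cy - h / 2, cx - w / 2,
                                    cy + h / 2, cx + w / 2))
    return anchors
```

With the defaults above, each feature-map cell gets 3 × 3 = 9 anchors. A 2 × 3 feature map therefore yields 54 boxes.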