视觉经典论文.zip

  • F3_260687
    About the uploader
  • 107.5MB
    File size
  • zip
    File format
  • 0
    Favorites
  • VIP only
    Resource type
  • 0
    Downloads
  • 2022-05-01 11:06
    Upload date
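Papers on classic computer vision networks, plus object detection and lightweight networks. Translations and study materials for these papers can be found online.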
视觉经典论文.zip
  • 07_ResNeXt.pdf
    1.3MB
  • 27_CTPN.pdf
    8MB
  • 34_AttractioNet.pdf
    4.8MB
  • 09_R-CNN.pdf
    6.2MB
  • 39_MnasNet.pdf
    1.3MB
  • 35_Densenet.pdf
    1.1MB
  • 29_MTCNN.pdf
    831.5KB
  • 01_AlexNet.pdf
    1.4MB
  • 11_Faster R-CNN.pdf
    6.6MB
  • 06_Resnet.pdf
    800.2KB
  • 05_Inception-v3.pdf
    505.5KB
  • 33_M2Det.pdf
    2.3MB
  • 29_2_cascnn.pdf
    5.1MB
  • 42.pdf
    327.4KB
  • 02_VGG.pdf
    195.3KB
  • 36_Deformable-ConvNets.pdf
    6.6MB
  • 22_MobileNets.pdf
    919.2KB
  • 21_SqueezeNet.pdf
    903.3KB
  • 18_RFCN.pdf
    8.6MB
  • 20_Cascade R-CNN.pdf
    1.8MB
  • 32_RFBnet.pdf
    1.2MB
  • 37_FSRNet:.pdf
    7.2MB
  • 30_DeepLung.pdf
    1.6MB
  • 31_RetinaNet.pdf
    1.2MB
  • 28_FaceNet.pdf
    4.5MB
  • 38_Xception.pdf
    785.6KB
  • 04_BNGooLeNet.pdf
    169.5KB
  • 17_FPN.pdf
    770.6KB
  • 17_2_PSPnet.pdf
    4.3MB
  • 12_YOLO.pdf
    5.1MB
  • 13_YOLOv2.pdf
    5MB
  • 10_Fast-RCNN.pdf
    714KB
  • 15_SSD.pdf
    2.4MB
  • 24_ShuffleNet.pdf
    362.6KB
  • 23_2_MobileNetV3.pdf
    524.3KB
  • 26_CRNN.pdf
    1MB
  • 03_GooLeNet.pdf
    1.2MB
  • 16_DSSD.pdf
    5.4MB
  • 19_MaskRCNN.pdf
    7.4MB
  • 25_ShuffleNet V2.pdf
    1.6MB
  • 08_SENet.pdf
    2.1MB
  • 23_MobileNetV2.pdf
    1.5MB
  • 14_YOLOv3.pdf
    2.1MB
Content preview
R-FCN: Object Detection via Region-based Fully Convolutional Networks

Jifeng Dai (Microsoft Research), Yi Li* (Tsinghua University), Kaiming He (Microsoft Research), Jian Sun (Microsoft Research)

arXiv:1605.06409v2 [cs.CV] 21 Jun 2016

Abstract

We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN [6, 18] that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets) [9], for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20× faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn.

1 Introduction

A prevalent family [8, 6, 18] of deep networks for object detection can be divided into two subnetworks by the Region-of-Interest (RoI) pooling layer [6]: (i) a shared, "fully convolutional" subnetwork independent of RoIs, and (ii) an RoI-wise subnetwork that does not share computation. This decomposition [8] historically resulted from the pioneering classification architectures, such as AlexNet [10] and VGG Nets [23], that consist of two subnetworks by design: a convolutional subnetwork ending with a spatial pooling layer, followed by several fully-connected (fc) layers. Thus the (last) spatial pooling layer in image classification networks is naturally turned into the RoI pooling layer in object detection networks [8, 6, 18].

But recent state-of-the-art image classification networks such as Residual Nets (ResNets) [9] and GoogLeNets [24, 26] are by design fully convolutional (see footnote 2). By analogy, it appears natural to use all convolutional layers to construct the shared, convolutional subnetwork in the object detection architecture, leaving the RoI-wise subnetwork no hidden layer. However, as empirically investigated in this work, this naïve solution turns out to have considerably inferior detection accuracy that does not match the network's superior classification accuracy. To remedy this issue, in the ResNet paper [9] the RoI pooling layer of the Faster R-CNN detector [18] is unnaturally inserted between two sets of convolutional layers; this creates a deeper RoI-wise subnetwork that improves accuracy, at the cost of lower speed due to the unshared per-RoI computation.

We argue that the aforementioned unnatural design is caused by a dilemma of increasing translation invariance for image classification vs. respecting translation variance for object detection. On one hand, the image-level classification task favors translation invariance: shift of an object inside an image should be indiscriminative. Thus, deep (fully) convolutional architectures that are as translation-invariant as possible are preferable as evidenced by the leading results on ImageNet classification

Footnotes:
* This work was done when Yi Li was an intern at Microsoft Research.
2 Only the last layer is fully-connected, which is removed and replaced when fine-tuning for object detection.
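The abstract names position-sensitive score maps but this first page does not yet describe how they are pooled. The following is a minimal sketch of the idea, assuming the usual formulation: the RoI is split into k×k bins, each bin is average-pooled from its own dedicated score map, and the k×k bin scores are averaged into one score per class. The function name `ps_roi_pool`, the channel layout, and the toy data are illustrative assumptions, not the authors' released code.

```python
import math

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling followed by average voting (sketch).

    score_maps[ch][y][x] holds k*k*num_classes planes. The channel layout
    ch = (i * k + j) * num_classes + c is an assumption made for this sketch.
    roi = (x0, y0, w, h) in feature-map coordinates.
    """
    x0, y0, w, h = roi
    scores = []
    for c in range(num_classes):
        total = 0.0
        for i in range(k):        # bin row
            for j in range(k):    # bin column
                ys = range(y0 + math.floor(i * h / k), y0 + math.ceil((i + 1) * h / k))
                xs = range(x0 + math.floor(j * w / k), x0 + math.ceil((j + 1) * w / k))
                # Each bin reads only from the score map dedicated to it.
                plane = score_maps[(i * k + j) * num_classes + c]
                vals = [plane[y][x] for y in ys for x in xs]
                total += sum(vals) / len(vals)   # average pooling inside the bin
        scores.append(total / (k * k))           # vote by averaging the bins
    return scores

# Toy input: 2x2 bins, 2 classes, 4x4 feature map; plane ch is filled with the value ch.
k, num_classes, size = 2, 2, 4
maps = [[[float(ch)] * size for _ in range(size)] for ch in range(k * k * num_classes)]
scores = ps_roi_pool(maps, (0, 0, size, size), k, num_classes)
print(scores)
```

Because every bin has its own score map, the pooled result depends on where a response falls inside the RoI, which is how the detector recovers translation variance while the maps themselves are computed once for the whole image.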