<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta name="generator" content="pdf2htmlEX">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<link rel="stylesheet" href="https://static.pudn.com/base/css/base.min.css">
<link rel="stylesheet" href="https://static.pudn.com/base/css/fancy.min.css">
<link rel="stylesheet" href="https://static.pudn.com/prod/directory_preview_static/625d2f7bbe9ad24cfa7bc4e7/raw.css">
<script src="https://static.pudn.com/base/js/compatibility.min.js"></script>
<script src="https://static.pudn.com/base/js/pdf2htmlEX.min.js"></script>
<script>
try{
pdf2htmlEX.defaultViewer = new pdf2htmlEX.Viewer({});
}catch(e){}
</script>
<title>Data Mining Lab Report: SVM Classification Algorithm</title>
</head>
<body>
<div id="sidebar" style="display: none">
<div id="outline">
</div>
</div>
<div id="pf1" class="pf w0 h0" data-page-no="1"><div class="pc pc1 w0 h0"><img class="bi x0 y0 w1 h1" alt="" src="https://static.pudn.com/prod/directory_preview_static/625d2f7bbe9ad24cfa7bc4e7/bg1.jpg"><div class="c x0 y1 w2 h2">
<div class="t m0 x1 h3 y2 ff1 fs0 fc0 sc0 ls0 ws0">Data Mining Lab Report</div>
<div class="t m0 x2 h4 y3 ff2 fs1 fc0 sc1 ls0 ws0">SVM Classification Algorithm</div>
<div class="t m0 x3 h5 y4 ff1 fs3 fc0 sc0 ls0 ws0">Student ID: <span class="ff2 sc1">111060896 </span>Name: 王建</div>
<div class="t m0 x4 h6 y5 ff1 fs1 fc0 sc0 ls0 ws0">I. Data Source Description</div>
<div class="t m0 x5 h6 y6 ff2 fs1 fc0 sc1 ls0 ws0">1<span class="ff1 sc0">. Data source description and interpretation:</span></div>
<div class="t m0 x6 h7 y7 ff1 fs4 fc0 sc1 ls0 ws0">The experiment uses <span class="fc1">the following data source:</span></div>
<div class="t m0 x6 h8 y8 ff4 fs4 fc1 sc1 ls0 ws0"></div>
<div class="t m0 x7 h7 y9 ff4 fs4 fc2 sc1 ls0 ws0"><span class="ff1">TICDATA2000.txt:<span class="fc0"> this data set is used to train and validate prediction models. It describes 5822</span></span></div>
<div class="t m0 x7 h7 ya ff1 fs4 fc0 sc1 ls0 ws0">customer records. Each record consists of 86 attributes, covering sociodemographic data (attributes 1-43) and</div>
<div class="t m0 x7 h7 yb ff1 fs4 fc0 sc1 ls0 ws0">product ownership (attributes 44-86). The sociodemographic data is derived from zip codes, so all</div>
<div class="t m0 x7 h7 yc ff1 fs4 fc0 sc1 ls0 ws0">customers living in areas with the same zip code share the same sociodemographic attributes. Attribute 86,</div>
<div class="t m0 x7 h7 yd ff1 fs4 fc0 sc1 ls0 ws0">"CARAVAN: mobile home policies", is our target variable. <span class="fc2">There are 5822 records in total; as required, all of them</span></div>
<div class="t m0 x7 h7 ye ff1 fs4 fc2 sc1 ls0 ws0">are used for training<span class="fc0">.</span></div>
<div class="t m0 x7 h7 yf ff4 fs4 fc2 sc1 ls0 ws0"><span class="ff1">TICEVAL2000.txt:<span class="fc0"> this is the data set to run predictions on (4000 customer records). It has the same</span></span></div>
<div class="t m0 x7 h7 y10 ff4 fs4 fc0 sc1 ls0 ws0"><span class="ff1">format as TICDATA2000.txt, except that the final target column is missing. We only want to</span></div>
<div class="t m0 x7 h7 y11 ff1 fs4 fc0 sc1 ls0 ws0">return the list of predicted targets. All data sets are tab-delimited. <span class="fc2">There are 4003 records in total (I added three</span></div>
<div class="t m0 x7 h7 y12 ff1 fs4 fc2 sc1 ls0 ws0">records myself); as required, they are used for prediction.</div>
<div class="t m0 x7 h7 y13 ff4 fs4 fc2 sc1 ls0 ws0"><span class="ff1">TICTGTS2000.txt<span class="fc0">: the final target data for evaluation. These are the real-world targets, against which</span></span></div>
<div class="t m0 x7 h7 y14 ff1 fs4 fc0 sc1 ls0 ws0">our predictions are checked. Our predictions are written to an output file.</div>
<div class="t m0 x7 h7 y15 ff1 fs4 fc2 sc1 ls0 ws0">Understanding the data set: the task can be treated as a classification problem with 2 classes, given by column 86</div>
<div class="t m0 x7 h7 y16 ff1 fs4 fc2 sc1 ls0 ws0">of the data source, which takes the values 0 and 1. We first train on TICDATA2000.txt to produce</div>
<div class="t m0 x7 h7 y17 ff4 fs4 fc2 sc1 ls0 ws0"><span class="ff1">a model, and then use that model to make predictions.</span></div>
<div class="t m0 x5 h6 y18 ff2 fs1 fc0 sc1 ls0 ws0">2<span class="ff1 sc0">. Data cleaning</span></div>
<div class="t m0 x6 h7 y19 ff1 fs4 fc0 sc1 ls0 ws0">The code scales the data set for two reasons:</div>
<div class="t m0 x6 h7 y1a ff4 fs4 fc0 sc1 ls0 ws0"><span class="ff1">1. To avoid some features having very large value ranges while others have very small ones;</span></div>
<div class="t m0 x6 h7 y1b ff4 fs4 fc0 sc1 ls0 ws0"><span class="ff1">2. To avoid numerical difficulties when inner products are computed for the kernel function during training. For this reason,</span></div>
<div class="t m0 x6 h7 y1c ff1 fs4 fc0 sc1 ls0 ws0">data is usually scaled to [-1, 1] or [0, 1].</div>
<div class="t m0 x4 h6 y1d ff1 fs1 fc0 sc0 ls0 ws0">II. Description of the Data Mining Algorithm</div>
<div class="t m0 x6 h6 y1e ff2 fs1 fc0 sc1 ls0 ws0">1<span class="ff1 sc0">. </span>SVM<span class="ff1 sc0"> algorithm description</span></div>
<div class="t m0 x8 h6 y1f ff4 fs1 fc0 sc1 ls0 ws0"><span class="ff1">LIBSVM is an SVM library implemented in C++ by Dr. Chih-Jen Lin (林智仁) of National Taiwan University and others.</span></div>
<div class="t m0 x8 h6 y20 ff1 fs1 fc0 sc1 ls0 ws0">It also comes with toolboxes and code for other environments, so it is easy to port and use.</div>
<div class="t m0 x8 h6 y21 ff1 fs1 fc0 sc1 ls0 ws0">It can solve classification problems (including C-SVC and ν-SVC), regression problems (including</div>
<div class="t m0 x8 h6 y22 ff4 fs1 fc0 sc1 ls0 ws0"><span class="ff1">ε-SVR and ν-SVR), and distribution estimation (one-class SVM). It offers four commonly used</span></div>
<div class="t m0 x8 h6 y23 ff1 fs1 fc0 sc1 ls0 ws0">kernel functions to choose from (linear, polynomial, radial basis, and sigmoid), and it effectively supports</div>
<div class="t m0 x8 h6 y24 ff1 fs1 fc0 sc1 ls0 ws0">multi-class problems, parameter selection by cross-validation, weighting of unbalanced samples, and multi-class probability</div>
</div><a class="l" rel='nofollow' onclick='return false;'><div class="d m1"></div></a><a class="l" rel='nofollow' onclick='return false;'><div class="d m1"></div></a></div><div class="pi" data-data='{"ctm":[1.611850,0.000000,0.000000,1.611850,0.000000,0.000000]}'></div></div>
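<div>
<p>The report's data cleaning step (数据清理) scales feature values to [-1, 1] or [0, 1]. The idea can be sketched as below. This is a minimal illustration in plain Python, not the code used in the report: the function names <code>fit_min_max</code> and <code>scale_rows</code> and the toy values are assumptions made for the example. The per-column minimum and maximum are computed on the training data only and then reused for the data to be predicted, so that both sets go through the same transformation.</p>
<pre>
```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from a list of numeric rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map every column into [lower, upper] using precomputed ranges."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (col_min, col_max) in zip(row, ranges):
            if col_max == col_min:
                # Constant column: nothing to rescale, so map it to the lower bound
                # (a choice made for this sketch only).
                new_row.append(lower)
            else:
                ratio = (value - col_min) / (col_max - col_min)
                new_row.append(lower + (upper - lower) * ratio)
        scaled.append(new_row)
    return scaled


# Hypothetical toy data: two features with very different value ranges.
train = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
test = [[2.5, 2000.0]]

ranges = fit_min_max(train)                         # ranges come from training data only
train_scaled = scale_rows(train, ranges)            # each column now spans [-1, 1]
test_scaled = scale_rows(test, ranges)              # same ranges reused for the test rows
unit_scaled = scale_rows(train, ranges, 0.0, 1.0)   # alternative target range [0, 1]
```
</pre>
<p>Because the test rows are mapped with the training ranges, a test value outside the range seen in training can fall slightly outside the target interval; that is usually acceptable and keeps the two data sets comparable.</p>
</div>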
</body>
</html>