• 秦冰
    About the author
  • WINDOWS
    Development tool
  • 205KB
    File size
  • zip
    File format
  • 0
    Favorites
  • 1 point
    Download points
  • 6
    Downloads
  • 2017-12-18 11:39
    Upload date
Rank-2 solution for the Kaggle Avazu click-through rate prediction competition, including the related data and code
kaggle-avazu-rank2.zip
  • kaggle-avazu-rank2
  • _3c_vw.py
    6.2KB
  • _3d_fm.py
    3.4KB
  • _0_run_me.sh
    1.4KB
  • _3b_gbdt.py
    1.9KB
  • _4_post_processing.py
    2.6KB
  • _2b_generate_dataset_for_vw_fm.py
    2.7KB
  • _2c_generate_fm_features.py
    3.4KB
  • _1_encode_cat_features.py
    7.9KB
  • _3a_rf.py
    2.8KB
  • LICENSE.txt
    551B
  • READ.me
    719B
  • AvazuModelDocumentation.pdf
    192.1KB
  • utils.py
    16.4KB
Description
Winning model documentation

Name: Owen Zhang
Location: NJ, USA
Email: zhonghua.zhang2006@gmail.com
Competition: Click-through rate prediction

1. Summary
The final solution is a manually tuned blend (based on public leaderboard (PB) feedback) of 4 different models (RandomForest/sklearn, GBDT/xgboost, OnlineSGD/Vowpal Wabbit, Factorization machine/3 idiots). This solution is largely based on 3 idiots' winning solution to the Criteo competition (https://github.com/guestwalk/kaggle-2014-criteo) with a moderate amount of feature engineering and manual tuning.

2. Feature Selection / Extraction
A few different approaches were used for feature engineering:
2a. Combining site/app based features. These features are complementary (in the sense that if one is missing the other one is not), so combining them will at least save space.
2b. Prior day mean(y) encoding for categorical features. This is done with both univariate and multivariate approaches.
2c. Counts and sequence of device_ip. device_ip seems to be a reasonable proxy for user identity.
2d. Factorization machine based predictions using raw features and counts/sequences.
2e. GBDT-predicted leaf nodes using raw features, counts/sequences, and prior day mean(y) encoded categorical features.
2f. Some manual interactions, especially app_site_id * C14-21.

3. Modeling techniques and Training
a. I used day 30 as validation and had very stable (and comparable in score movement) results throughout the competition.
b. 3 idiots' factorization machine turns out to be extremely effective in this problem. FM based models are the best individual models in this solution. It is worth noting that they outperform VW with manually built 2-way interactions.
    i. The best FM with GBDT features gets ~0.3830 on the public LB.
c. I spent a fair amount of time tuning VW (Vowpal Wabbit) models, especially around interactions. I defined namespaces by feature type (C14-21, device, app/site, device id/ip, GBDT prediction) and tested two-way interactions through an ad-hoc (almost step-wise) process.
    i. I tried VW's built-in FTRL optimization but could not get it to perform better than the default adaptive procedure.
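The prior day mean(y) encoding (2b) and the device_ip counts and sequence (2c) could be sketched roughly as follows. This is a minimal illustration and not the author's code: the function names, the row layout (dicts with `day`, `click`, and feature keys), the additive smoothing with `prior_weight`, and the choice to pool all earlier days instead of only the immediately preceding one are all assumptions made for the example.

```python
from collections import defaultdict


def prior_day_mean_encode(rows, feature, prior_weight=20.0):
    """Encode `feature` by a smoothed mean of `click` over strictly earlier days.

    Hypothetical helper: `rows` is a list of dicts with an integer 'day',
    a 0/1 'click', and the categorical `feature`. Returns one float per row.
    """
    by_day = defaultdict(list)
    for i, row in enumerate(rows):
        by_day[row["day"]].append(i)

    clicks = defaultdict(float)  # clicks per category, earlier days only
    views = defaultdict(float)   # impressions per category, earlier days only
    total_clicks = 0.0
    total_views = 0.0
    encoded = [0.0] * len(rows)

    for day in sorted(by_day):
        # Global click rate over earlier days acts as the prior (assumption).
        global_mean = total_clicks / total_views if total_views else 0.0
        # Encode today's rows using statistics from earlier days only,
        # so a row's own label never leaks into its feature.
        for i in by_day[day]:
            key = rows[i][feature]
            encoded[i] = (clicks[key] + prior_weight * global_mean) / (
                views[key] + prior_weight
            )
        # Only after encoding, fold today's rows into the statistics.
        for i in by_day[day]:
            key = rows[i][feature]
            clicks[key] += rows[i]["click"]
            views[key] += 1
            total_clicks += rows[i]["click"]
            total_views += 1
    return encoded


def count_and_sequence(rows, key="device_ip"):
    """Return (total count, 1-based running sequence number) of `key` per row."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
    seen = defaultdict(int)
    result = []
    for row in rows:
        seen[row[key]] += 1
        result.append((totals[row[key]], seen[row[key]]))
    return result


if __name__ == "__main__":
    sample = [
        {"day": 21, "device_ip": "a", "site_id": "s1", "click": 1},
        {"day": 21, "device_ip": "a", "site_id": "s1", "click": 0},
        {"day": 22, "device_ip": "a", "site_id": "s1", "click": 1},
    ]
    print(prior_day_mean_encode(sample, "site_id", prior_weight=2.0))
    print(count_and_sequence(sample))
```

Encoding each day only from earlier days keeps a row's own label out of its feature, which is the usual reason for a time-based target encoding. A multivariate variant could pass a tuple of several columns as the category key; that extension is likewise an assumption here.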
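    Related recommendations
    • kaggle-avazu-rank1.zip
      Rank-1 solution for Avazu ad click-through rate prediction in the Kaggle competition, including the related data and code
    • Amazon food reviews dataset
      568,454 food reviews from the Amazon website as of October 2012, including users, review text, reviewed food items, food ratings, and other data. The data comes from Kaggle.com.
    • Amazon: Kaggle Amazon competition
      This code generated part of the first-place solution to the Kaggle Amazon access competition. My teammate Paul Duan was responsible for the other parts. At the time, we also had code for blending the outputs of the various models. See: : and: : An ipython notebook using the same data is also included. I used...
    • Keras-LSTM-Text-Normalization: This repository stems from an ongoing Kaggle competition hosted by Google researchers. Currently trying...
      This repository stems from an ongoing Kaggle competition sponsored by Google researchers: text normalization. A TTS (text-to-speech) system needs raw text to be normalized before it is fed into the system so that speech can be generated from it. For example, if the text "123" is fed into the algorithm, then...
    • pytorch-kaggle-starter: PyTorch starter kit for Kaggle competitions
      Pytorch Kaggle starter is a framework for managing experiments in Kaggle competitions. By providing a set of helper functions for model training, data loading, learning-rate adjustment, making predictions, ensembling models, and formatting submissions, it reduces the time to a first submission. Inside are example Jupyter...
    • kaggle-ndsb-visualization: Scripts for visualizing Kaggle NDSB training images
      kaggle-ndsb-visualization This is a set of scripts for visualizing the plankton images from the competition. The images in each training class are compiled into a single mosaic image, and a bubble chart is created that groups the mosaics according to the provided taxonomy. Because of the large size of this image, it is...
    • planet-amazon-deforestation: Open-source repository for the Kaggle Amazon deforestation competition https
      Kaggle Amazon deforestation challenge This repository contains the source code for How to use and view the Jupyter notebooks The Jupyter notebooks are stripped to .py files for version control in the notebook/ folder. To recreate the original .ipynb files and use them with jupyter, run...
    • kaggle-crowdflower: Second-place solution to Kaggle "Search Results Relevance"
      Kaggle second-place solution Mikhail Trofimov, Stanislav Semenov, Dmitry Altukhov Scored 0.71881 on the private leaderboard How to reproduce the submission Don't forget to check the paths in ./cfg.py! ...
    • Kaggle-StackOverflow-Vis: Submission to the Stack Exchange Kaggle visualization competition
      Kaggle Stack Overflow visualization competition entry ... A working example is at: http://ec2-50-17-90-141.compute-1.amazonaws.com/Kaggle/topTagRelations.html As we can see, tags for similar technologies are strongly associated, e.g. php-mysql, HTML-j
    • SIM800C_MQTT.rar
      Uses the SIM800C module and the MQTT protocol to connect to the China Mobile OneNET platform, supporting data subscription, publishing, storage, etc.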