hnstats

Category: Feature extraction
Development tool: Java
File size: 3148 KB
Downloads: 0
Upload date: 2017-09-05 22:19:14
Uploader: sh-1993
Description: HackerNews analytics using word2vec

File list:
_config.yml (26, 2017-09-06)
d3.layout.cloud.js (14613, 2017-09-06)
d3.min.js (149699, 2017-09-06)
data.json (18848, 2017-09-06)
h2-1.4.196.jar (1821816, 2017-09-06)
hnstats-ewan-robertson-208059.jpg (277421, 2017-09-06)
hnstats-ewan-robertson-208059.png (1100447, 2017-09-06)
logging.properties (415, 2017-09-06)
pom.xml (7954, 2017-09-06)
src (0, 2017-09-06)
src\main (0, 2017-09-06)
src\main\webapp (0, 2017-09-06)
src\main\webapp\META-INF (0, 2017-09-06)
src\main\webapp\META-INF\context.xml (38, 2017-09-06)
src\main\webapp\WEB-INF (0, 2017-09-06)
src\main\webapp\WEB-INF\jboss-deployment-structure.xml (143, 2017-09-06)
src\main\webapp\WEB-INF\jboss-web.xml (325, 2017-09-06)
src\main\webapp\WEB-INF\web.xml (423, 2017-09-06)
src\main\webapp\index.html (0, 2017-09-06)
src\test (0, 2017-09-06)
src\test\java (0, 2017-09-06)
src\test\java\test (0, 2017-09-06)
src\test\java\test\BaseUtil.java (12332, 2017-09-06)
src\test\java\test\BasicLineIterator.java (4087, 2017-09-06)
src\test\java\test\DaemonMyStemProvider.java (2187, 2017-09-06)
src\test\java\test\ListSequenceIterator.java (2163, 2017-09-06)
src\test\java\test\NERDemo.java (6800, 2017-09-06)
src\test\java\test\ReadJSON.java (3307, 2017-09-06)
src\test\java\test\StanfordLemmatizer.java (4927, 2017-09-06)
src\test\java\test\TestDumpBigQuery2H2.java (5135, 2017-09-06)
src\test\java\test\TestLemmatizer.java (671, 2017-09-06)
src\test\java\test\TestMinio.java (2122, 2017-09-06)
src\test\java\test\TestNER.java (2338, 2017-09-06)
src\test\java\test\TestShowH2.java (1207, 2017-09-06)
src\test\java\test\TestStemming.java (4558, 2017-09-06)
src\test\java\test\TestWord2Vec.java (4696, 2017-09-06)
src\test\java\test\TestWord2VecDB.java (4018, 2017-09-06)
... ...

![HackerNews analytics](https://github.com/wizecore/hnstats/blob/master/hnstats-ewan-robertson-208059.png)

Using the available [HackerNews](https://news.ycombinator.com) dataset, produce some insight into the most meaningful topics.

## Ultimate reason

* Most discussed topics (and yearly shift)
* Top technology and startups

## Technology behind it

* Java
* Deeplearning4J
* Word2vec
* Stanford CoreNLP (lemmatizing)

## Project

[Online version](http://wizecore.com/hnstats/terms.html)

## Roadmap

- Gather data (DONE)
- Produce JSON (DONE)
- For selected terms: related words trending over the years 2007-2017 (DONE)
- All terms: display counts for every year (TODO)
- Term cleanup (DONE)
- Auto build SPA, i.e. push -> CI -> deploy (TODO)
- Fine tune word2vec params (see below)

## Source repo

The project is hosted on [GitHub](https://github.com/wizecore/hnstats)

## Word2vec tuning

**Help is welcome** in fine-tuning word2vec parameters. Here is the current setup:

```java
// iter (sentence iterator) and t (tokenizer factory) are created elsewhere in the project
Word2Vec vec = new Word2Vec.Builder()
    .minWordFrequency(5)               // ignore words seen fewer than 5 times
    .iterations(1)
    .layerSize(100)                    // dimensionality of the word vectors
    .seed(System.currentTimeMillis())  // time-based seed, so runs are not reproducible
    .windowSize(5)                     // context window size
    .iterate(iter)
    .tokenizerFactory(t)
    .build();
```
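
When comparing parameter settings, it can help to look at which words end up closest to a term you already know well. The sketch below is only an illustration of that idea: it ranks words by cosine similarity in plain Java, without Deeplearning4J. The class name `NearestWords`, its methods, and the three-dimensional vectors are all hypothetical and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of hnstats: ranks words by cosine similarity.
public class NearestWords {

    // Cosine similarity between two vectors of equal length.
    // Returns NaN if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to n words closest to `term`, most similar first,
    // excluding the term itself. Unknown terms give an empty list.
    static List<String> nearest(Map<String, double[]> vectors, String term, int n) {
        double[] target = vectors.get(term);
        if (target == null) {
            return new ArrayList<>();
        }
        List<String> words = new ArrayList<>(vectors.keySet());
        words.remove(term);
        // Negate the similarity so that the sort is descending.
        words.sort(Comparator.comparingDouble((String w) -> -cosine(vectors.get(w), target)));
        return new ArrayList<>(words.subList(0, Math.min(n, words.size())));
    }

    public static void main(String[] args) {
        // Made-up vectors, for illustration only.
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("java", new double[] {1.0, 0.1, 0.0});
        vectors.put("jvm", new double[] {0.9, 0.2, 0.1});
        vectors.put("startup", new double[] {0.0, 0.1, 1.0});
        System.out.println(nearest(vectors, "java", 2)); // prints [jvm, startup]
    }
}
```

Running the same lookup for a handful of familiar terms after each parameter change gives a quick, informal sanity check on whether the neighbours look more or less sensible.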
