Spark20NewsGroup

Category: Clustering algorithms
Development tool: Scala
File size: 21675KB
Downloads: 0
Upload date: 2015-01-03 19:36:10
Uploader: sh-1993
Description: Naive Bayes + TFIDF in Spark on 20 News Group Dataset

File list:
.history (5, 2015-01-04)
build.sbt (377, 2015-01-04)
data (0, 2015-01-04)
data\test (0, 2015-01-04)
data\test\alt.atheism (0, 2015-01-04)
data\test\alt.atheism\53068 (950, 2015-01-04)
data\test\alt.atheism\53257 (3995, 2015-01-04)
data\test\alt.atheism\53260 (2215, 2015-01-04)
data\test\alt.atheism\53261 (1321, 2015-01-04)
data\test\alt.atheism\53262 (7264, 2015-01-04)
data\test\alt.atheism\53265 (1994, 2015-01-04)
data\test\alt.atheism\53272 (2419, 2015-01-04)
data\test\alt.atheism\53276 (696, 2015-01-04)
data\test\alt.atheism\53277 (639, 2015-01-04)
data\test\alt.atheism\53278 (765, 2015-01-04)
data\test\alt.atheism\53279 (1778, 2015-01-04)
data\test\alt.atheism\53280 (2554, 2015-01-04)
data\test\alt.atheism\53293 (4259, 2015-01-04)
data\test\alt.atheism\53294 (1365, 2015-01-04)
data\test\alt.atheism\53297 (956, 2015-01-04)
data\test\alt.atheism\53302 (1543, 2015-01-04)
data\test\alt.atheism\53313 (2403, 2015-01-04)
data\test\alt.atheism\53315 (1172, 2015-01-04)
data\test\alt.atheism\53316 (1348, 2015-01-04)
data\test\alt.atheism\53317 (861, 2015-01-04)
data\test\alt.atheism\53319 (1120, 2015-01-04)
data\test\alt.atheism\53320 (3209, 2015-01-04)
data\test\alt.atheism\53321 (620, 2015-01-04)
data\test\alt.atheism\53322 (2443, 2015-01-04)
data\test\alt.atheism\53324 (5289, 2015-01-04)
data\test\alt.atheism\53325 (3956, 2015-01-04)
data\test\alt.atheism\53326 (1816, 2015-01-04)
data\test\alt.atheism\53327 (2124, 2015-01-04)
data\test\alt.atheism\53328 (4637, 2015-01-04)
data\test\alt.atheism\53329 (1061, 2015-01-04)
data\test\alt.atheism\53331 (1134, 2015-01-04)
data\test\alt.atheism\53332 (3908, 2015-01-04)
data\test\alt.atheism\53333 (1782, 2015-01-04)
... ...

An implementation of TF-IDF + a Naive Bayes classifier using Apache Spark and Stanford NLP utils.

- Clone the repo and `cd` into it.
- Run `sbt assembly` to build the uber jar.
- Submit the job by running `spark-submit --class com.brokendata.NaiveBayesSpark target/scala-2.10/spark20newsgroup-assembly-1.0.jar` from the repo's root.

Make sure Apache Spark is installed and on your `$PATH`. You will most likely need to create a `$SPARK_HOME/conf/spark-defaults.conf` file and add the following settings:

- `spark.executor.memory 3g`
- `spark.driver.memory 4g`
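
For readers who want a feel for the TF-IDF + Naive Bayes pipeline described above, here is a minimal sketch using Spark's RDD-based MLlib API. It is not the repository's code. The object name `TfIdfNaiveBayesSketch`, the helpers `labelOf` and `tokenize`, and the two command-line arguments (a training directory and a test directory, each with one subdirectory per newsgroup) are all assumptions made for illustration. The tokenizer is a plain regex split standing in for the Stanford NLP utilities the project mentions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object TfIdfNaiveBayesSketch {
  // Hypothetical helper: take the newsgroup name from the file's parent directory,
  // e.g. ".../data/test/alt.atheism/53068" -> "alt.atheism".
  def labelOf(path: String): String = {
    val parts = path.split("[/\\\\]")
    parts(parts.length - 2)
  }

  // Hypothetical stand-in tokenizer: lowercase, split on non-letters, drop short tokens.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.length > 2).toSeq

  def main(args: Array[String]): Unit = {
    // Assumed arguments: args(0) = training directory, args(1) = test directory.
    val Array(trainDir, testDir) = args.take(2)
    val sc = new SparkContext(new SparkConf().setAppName("TfIdfNaiveBayesSketch"))

    // Read every file under each newsgroup subdirectory as (label, tokens).
    def load(dir: String): RDD[(String, Seq[String])] =
      sc.wholeTextFiles(dir + "/*").map { case (path, text) => (labelOf(path), tokenize(text)) }

    val train = load(trainDir).cache()

    // Map each newsgroup name seen in training to a numeric class id.
    val labelIds: Map[String, Double] =
      train.map(_._1).distinct().collect().sorted.zipWithIndex
        .map { case (name, i) => (name, i.toDouble) }.toMap

    // Term frequencies via the hashing trick, then IDF weights fitted on training data only.
    val hashingTF = new HashingTF(1 << 18)
    val trainTf = train.map { case (label, toks) => (labelIds(label), hashingTF.transform(toks)) }.cache()
    val idfModel = new IDF().fit(trainTf.map(_._2))

    // Re-weight term frequencies by IDF and pair them back up with their labels.
    def toPoints(tf: RDD[(Double, org.apache.spark.mllib.linalg.Vector)]): RDD[LabeledPoint] =
      tf.map(_._1).zip(idfModel.transform(tf.map(_._2)))
        .map { case (label, features) => LabeledPoint(label, features) }

    // Multinomial Naive Bayes with additive smoothing 1.0.
    val model = NaiveBayes.train(toPoints(trainTf), 1.0)

    // Score the test set, skipping any newsgroup that never appeared in training.
    val testTf = load(testDir)
      .filter { case (label, _) => labelIds.contains(label) }
      .map { case (label, toks) => (labelIds(label), hashingTF.transform(toks)) }
      .cache()
    val testPoints = toPoints(testTf).cache()
    val correct = testPoints.filter(p => model.predict(p.features) == p.label).count()
    println(s"accuracy: ${correct.toDouble / testPoints.count()}")

    sc.stop()
  }
}
```

Fitting the IDF weights on the training documents alone keeps test-set statistics out of the model. The hashed feature dimension (2^18 here) is an arbitrary choice and trades memory for fewer hash collisions.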
