news-combinator

Category: matlab programming
Development tool: C++
File size: 0KB
Downloads: 0
Upload date: 2020-09-30 21:29:06
Uploader: sh-1993
Description: Use crawlers to get news, combine similar articles, and display their comments from different websites.

File list:
chnsegmt/ (0, 2016-11-26)
chnsegmt/basicfuncs.py (2523, 2016-11-26)
chnsegmt/categorize.py (3907, 2016-11-26)
chnsegmt/extracttags.py (1401, 2016-11-26)
chnsegmt/findsimilarpassage.py (2280, 2016-11-26)
chnsegmt/getabstract.py (3970, 2016-11-26)
chnsegmt/jieba_example/ (0, 2016-11-26)
chnsegmt/jieba_example/1.log (89762, 2016-11-26)
chnsegmt/jieba_example/2.log (77951, 2016-11-26)
chnsegmt/jieba_example/dict/ (0, 2016-11-26)
chnsegmt/jieba_example/dict/userdict.txt (110, 2016-11-26)
chnsegmt/jieba_example/docs/ (0, 2016-11-26)
chnsegmt/jieba_example/docs/000913.json (5064, 2016-11-26)
chnsegmt/jieba_example/docs/000913.tags (72, 2016-11-26)
chnsegmt/jieba_example/docs/example/ (0, 2016-11-26)
chnsegmt/jieba_example/docs/example/1-1-29416628.json (3819, 2016-11-26)
chnsegmt/jieba_example/docs/example/1-1-29416628.tags (87, 2016-11-26)
chnsegmt/jieba_example/docs/example/1-辽宁阜新官员涉嫌淫乱事件举报者被刑拘.txt (966, 2016-11-26)
chnsegmt/jieba_example/docs/example/18届三中UTF8.txt (66140, 2016-11-26)
chnsegmt/jieba_example/docs/example/18届三中全会.txt (44267, 2016-11-26)
chnsegmt/jieba_example/docs/example/4-English.txt (5593, 2016-11-26)
chnsegmt/jieba_example/docs/example/中英文混杂示例.txt (157, 2016-11-26)
chnsegmt/jieba_example/docs/example/用户词典.txt (966, 2016-11-26)
chnsegmt/jieba_example/jb_f1_cut.py (532, 2016-11-26)
chnsegmt/jieba_example/jb_f2_userdict.py (708, 2016-11-26)
chnsegmt/jieba_example/jb_f3_extracttags.py (523, 2016-11-26)
chnsegmt/jieba_example/jb_f4_posseg.py (289, 2016-11-26)
chnsegmt/jieba_example/jb_f5_parallel.py (355, 2016-11-26)
chnsegmt/jieba_example/jb_f6_tokenize.py (661, 2016-11-26)
chnsegmt/jieba_example/jb_f7_chnanalyserforwhoosh.py (1725, 2016-11-26)
chnsegmt/jieba_example/test_extracttags.py (139, 2016-11-26)
chnsegmt/jieba_example/test_optionparser.py (417, 2016-11-26)
chnsegmt/jieba_example/tmp/ (0, 2016-11-26)
chnsegmt/jieba_example/tmp/MAIN_WRITELOCK (0, 2016-11-26)
chnsegmt/jieba_example/tmp/MAIN_umsn2xib579lmm0w.seg (12036, 2016-11-26)
chnsegmt/jieba_example/tmp/_MAIN_1.toc (1764, 2016-11-26)
chnsegmt/main.py (1209, 2016-11-26)
... ...

News Crawler
============

A repo for crawling news and then combining similar articles. For now it is written in Python, using the Scrapy framework for crawling and Jieba for Chinese word segmentation.

Version 1.0 is in the current directory.
Version 2.0 is in the `reconstruction` directory.

TODO
====

1. Find a proper Chinese segmentation tool
2. Split a JSON file into small files, each containing only one piece of news
3. Read some articles about SVM (SVM was not used in the end; TF-IDF and cosine similarity were used instead)
4. Try to categorize different news
5. Build another IDF dictionary from web news
6. Categorize similar passages that appear in Sina and NetEase but not in Tencent
7. Make a website that displays the results and shows the comments
8. Improve categorization performance
9. Make the website more beautiful!
10. Reconstruct this project with PHP

Website
=======

http://news.reetsee.com
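
TODO item 3 mentions that similar news is matched with TF-IDF weights and cosine similarity rather than an SVM. The following is a minimal sketch of that idea in plain Python, not the repository's actual code. The function names (`tfidf_vectors`, `cosine_similarity`, `group_similar`), the smoothed IDF formula, the greedy grouping strategy, and the threshold value are all assumptions made for illustration. The input is assumed to be already tokenized; in this project the tokens would presumably come from Jieba.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each token appears in.
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Term frequency times a smoothed IDF (an assumed formula).
            token: (count / total) * (math.log((1 + n_docs) / (1 + df[token])) + 1.0)
            for token, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(docs, threshold=0.5):
    """Greedily group documents: each one joins the first group whose
    first member is at least `threshold` similar, else starts a new group."""
    vectors = tfidf_vectors(docs)
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    # Hypothetical, pre-tokenized headlines used only for demonstration.
    news = [
        "stock market falls sharply on monday".split(),
        "stock market falls again as investors sell".split(),
        "local team wins the football final".split(),
    ]
    print(group_similar(news, threshold=0.25))
```

Each group is a list of indices into the input, so the first two headlines land in one group and the third in its own. A real pipeline would likely need a threshold tuned on actual news text and an IDF table built from a larger corpus, as TODO item 5 suggests.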
