twitter_persian_news_tagcloud

Category: Data mining / Data warehouse
Development tool: Java
File size: 0KB
Downloads: 0
Upload date: 2019-10-27 11:26:21
Uploader: sh-1993
Description: Tag cloud generator that extracts hot keywords from the Twitter page of a Persian news agency.

File list:
ProjectNews.jar (3306625, 2019-10-27)
corpus/ (0, 2019-10-27)
corpus/Tasnimnews_Fa_2017-12-01_2017-12-31.csv (258117, 2019-10-27)
corpus/Tasnimnews_Fa_2018-01-01_2018-01-31.csv (293422, 2019-10-27)
corpus/Tasnimnews_Fa_2018-02-01_2018-02-28.csv (239808, 2019-10-27)
corpus/Tasnimnews_Fa_2018-03-01_2018-03-31.csv (215746, 2019-10-27)
corpus/Tasnimnews_Fa_2018-04-01_2018-04-30.csv (281245, 2019-10-27)
corpus/Tasnimnews_Fa_2018-05-01_2018-05-31.csv (268779, 2019-10-27)
corpus/Tasnimnews_Fa_2018-06-01_2018-06-30.csv (318236, 2019-10-27)
docs/ (0, 2019-10-27)
docs/ProjectNews.pdf (80124, 2019-10-27)
docs/ProjectNews.tex (15347, 2019-10-27)
docs/results_economics.csv (545, 2019-10-27)
docs/results_social.csv (433, 2019-10-27)
economy.txt (2506, 2019-10-27)
economy_seperated.txt (1500, 2019-10-27)
pom.xml (2223, 2019-10-27)
results/ (0, 2019-10-27)
results/Economy_Tasnimnews_Fa_2018-04-01_2018-04-30 (4521, 2019-10-27)
results/Economy_Tasnimnews_Fa_2018-05-01_2018-05-31 (2567, 2019-10-27)
results/Economy_Tasnimnews_Fa_2018-06-01_2018-06-30 (5729, 2019-10-27)
results/Social_Tasnimnews_Fa_2018-04-01_2018-04-30 (1135, 2019-10-27)
results/Social_Tasnimnews_Fa_2018-05-01_2018-05-31 (2065, 2019-10-27)
results/Social_Tasnimnews_Fa_2018-06-01_2018-06-30 (2068, 2019-10-27)
social.txt (1004, 2019-10-27)
social_seperated.txt (1004, 2019-10-27)
src/ (0, 2019-10-27)
src/main/ (0, 2019-10-27)
src/main/java/ (0, 2019-10-27)
src/main/java/META-INF/ (0, 2019-10-27)
src/main/java/META-INF/MANIFEST.MF (75, 2019-10-27)
src/main/java/ir/ (0, 2019-10-27)
src/main/java/ir/ac/ (0, 2019-10-27)
src/main/java/ir/ac/um/ (0, 2019-10-27)
src/main/java/ir/ac/um/ce/ (0, 2019-10-27)
src/main/java/ir/ac/um/ce/projectnews/ (0, 2019-10-27)
src/main/java/ir/ac/um/ce/projectnews/crawler/ (0, 2019-10-27)
... ...

## Twitter Persian news tagcloud extraction

Final project of the Information Retrieval course. TPNT is a tag cloud generator that extracts hot keywords from the Twitter page of a major Persian news agency in the fields of **Economics** and **Social** for each month of a year.

---

### Dependencies

* `GetOldTweets-java v1.2.0`
* `Lucene 7.2.1`

### News agency

* Tasnim News ([@TasnimNews_Fa](https://twitter.com/tasnimnews_fa))

### How to Run

This project has two main steps. First, tweets are stored in a `csv` file with the help of the `Crawler` class. This class needs the following **options** to work properly:

| Flag | Description | Required |
| ---- | ----------- | -------- |
| `-i` | ID of the Twitter page | yes |
| `-s` | Start date of extraction, format: `YYYY-MM-DD` | yes |
| `-e` | End date of extraction, format: `YYYY-MM-DD` | no |
| `-m` | Maximum number of tweets to retrieve | no |
| `-p` | Path of the csv file | no |
| `-n` | Name of the csv file | no |

An example that retrieves tweets from [@TasnimNews_Fa](https://twitter.com/tasnimnews_fa) from 2018-06-01 to 2018-07-01 into the `$PWD/result/` path:

```bash
java -cp ProjectNews.jar ir.ac.um.ce.projectnews.crawler.Crawler -i Tasnimnews_Fa -s 2018-06-01 -e 2018-07-01 -p result/
```

The next step is indexing the documents. After stop words are removed from the documents, the `Searcher` and `Classifier` classes, together with a bag of words, are used to build queries that estimate how strongly each document correlates with the topic context. Finally, the most correlated words are used to generate a tag cloud.

### Contributors

* [Mehdi Akbarian-Rastaghi](https://github.com/makbn)
* [Mohammad-Reza Daliri](https://github.com/mrdaliri)
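To make the `Crawler` option table concrete, here is a minimal, hypothetical sketch of how those flags could be parsed and validated in plain Java. The class name `CrawlerOptionsSketch`, its error messages, and the end-before-start check are assumptions for illustration only; the project's actual `Crawler` parsing code is not shown in this listing and may handle missing or malformed options differently.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical parser for the flags in the Crawler option table; not the project's own code. */
public class CrawlerOptionsSketch {
    // Flags documented in the README; -i and -s are the required ones.
    private static final Set<String> KNOWN_FLAGS = Set.of("-i", "-s", "-e", "-m", "-p", "-n");
    private static final String[] REQUIRED_FLAGS = {"-i", "-s"};

    final String pageId;      // -i: Twitter page ID, e.g. Tasnimnews_Fa
    final LocalDate start;    // -s: start date, YYYY-MM-DD
    final LocalDate end;      // -e: end date, null when omitted
    final Integer maxTweets;  // -m: retrieval limit, null when omitted
    final String path;        // -p: output directory, null when omitted
    final String fileName;    // -n: csv file name, null when omitted

    private CrawlerOptionsSketch(Map<String, String> v) {
        pageId = v.get("-i");
        start = LocalDate.parse(v.get("-s")); // ISO-8601 local date, i.e. YYYY-MM-DD
        end = v.containsKey("-e") ? LocalDate.parse(v.get("-e")) : null;
        maxTweets = v.containsKey("-m") ? Integer.valueOf(v.get("-m")) : null;
        path = v.get("-p");
        fileName = v.get("-n");
    }

    /** Reads "-flag value" pairs and rejects unknown flags or missing required ones. */
    static CrawlerOptionsSketch parse(String[] args) {
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            if (!KNOWN_FLAGS.contains(flag)) {
                throw new IllegalArgumentException("Unknown flag: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            values.put(flag, args[i + 1]);
        }
        for (String flag : REQUIRED_FLAGS) {
            if (!values.containsKey(flag)) {
                throw new IllegalArgumentException("Missing required flag: " + flag);
            }
        }
        CrawlerOptionsSketch opts = new CrawlerOptionsSketch(values);
        // Extra sanity check assumed by this sketch, not documented in the README.
        if (opts.end != null && opts.end.isBefore(opts.start)) {
            throw new IllegalArgumentException("End date is before start date");
        }
        return opts;
    }

    public static void main(String[] args) {
        CrawlerOptionsSketch o = parse(new String[] {
            "-i", "Tasnimnews_Fa", "-s", "2018-06-01", "-e", "2018-07-01", "-p", "result/"});
        System.out.println(o.pageId + " " + o.start + " -> " + o.end + " in " + o.path);
    }
}
```

Running `main` prints `Tasnimnews_Fa 2018-06-01 -> 2018-07-01 in result/`, mirroring the example `java -cp ProjectNews.jar ...` command in the How to Run section.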
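The indexing step (remove stop words, then correlate each tweet with a topic using a bag of words) is described only at a high level. As a rough illustration of that idea, the sketch below scores each document by the share of its non-stop-word tokens that appear in a topic bag of words, and keeps documents above a threshold. This is a swapped-in, deliberately simple term-overlap technique, not the project's Lucene-based `Searcher`/`Classifier` pipeline; the class `TopicScorerSketch`, the threshold, and the English sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical term-overlap scorer; a stand-in for the Lucene-based Searcher/Classifier. */
public class TopicScorerSketch {
    private final Set<String> stopWords;
    private final Set<String> topicBag;

    TopicScorerSketch(Set<String> stopWords, Set<String> topicBag) {
        this.stopWords = stopWords;
        this.topicBag = topicBag;
    }

    /** Lower-cases, splits on runs of non-letter/non-digit characters, and drops stop words. */
    List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty() && !stopWords.contains(t))
                .collect(Collectors.toList());
    }

    /** Share of the remaining tokens that appear in the topic bag; 0.0 if nothing remains. */
    double score(String text) {
        List<String> tokens = tokenize(text);
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long hits = tokens.stream().filter(topicBag::contains).count();
        return (double) hits / tokens.size();
    }

    /** Keeps the documents whose score is at least the threshold, in input order. */
    List<String> relevant(List<String> docs, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String doc : docs) {
            if (score(doc) >= threshold) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        TopicScorerSketch economy = new TopicScorerSketch(
                Set.of("the", "of", "a", "in"),
                Set.of("oil", "price", "inflation", "bank"));
        List<String> tweets = List.of("The price of oil rises", "A match in the stadium");
        System.out.println(economy.relevant(tweets, 0.5));
    }
}
```

Here `main` prints `[The price of oil rises]` (score 2/3 versus 0 for the second tweet). For real Persian tweets, the naive regex would also split words at the zero-width non-joiner (U+200C), so a proper Persian tokenizer or analyzer would be needed.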
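For the final step, one common way to turn the most correlated words into a tag cloud is to count term frequencies over the relevant tweets, keep the top k terms, and scale each term's font size linearly between a minimum and a maximum. The README does not say how the project sizes its tags, so `TagCloudSketch`, the linear scaling, the alphabetical tie-break, and the 12 to 36 point range below are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tag-cloud builder: top-k term counts mapped linearly to font sizes. */
public class TagCloudSketch {
    /** Counts terms across documents; returns the k most frequent, ties broken alphabetically. */
    static List<Map.Entry<String, Integer>> topTerms(List<List<String>> docs, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    /** Scales each count linearly from minPt (rarest kept term) to maxPt (most frequent). */
    static Map<String, Integer> fontSizes(List<Map.Entry<String, Integer>> terms, int minPt, int maxPt) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        if (terms.isEmpty()) {
            return sizes;
        }
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : terms) {
            lo = Math.min(lo, e.getValue());
            hi = Math.max(hi, e.getValue());
        }
        for (Map.Entry<String, Integer> e : terms) {
            // When every count is equal, all terms get the largest size.
            double ratio = (hi == lo) ? 1.0 : (double) (e.getValue() - lo) / (hi - lo);
            sizes.put(e.getKey(), (int) Math.round(minPt + ratio * (maxPt - minPt)));
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<List<String>> relevantTweets = List.of(
                List.of("oil", "price"),
                List.of("oil", "bank"),
                List.of("oil", "price", "inflation"));
        System.out.println(fontSizes(topTerms(relevantTweets, 3), 12, 36));
    }
}
```

With the sample input, `main` prints `{oil=36, price=24, bank=12}`. Scaling relative to the kept terms, rather than the whole vocabulary, keeps the full font range in use even when the top counts are close together.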
