twitter_persian_news_tagcloud
Category: Data mining / data warehousing
Development tool: Java
File size: 0KB
Downloads: 0
Upload date: 2019-10-27 11:26:21
Uploader: sh-1993
Description: Tag cloud generator that extracts hot keywords from the Twitter page of a Persian news agency
File list:
ProjectNews.jar (3306625, 2019-10-27)
corpus/ (0, 2019-10-27)
corpus/Tasnimnews_Fa_2017-12-01_2017-12-31.csv (258117, 2019-10-27)
corpus/Tasnimnews_Fa_2018-01-01_2018-01-31.csv (293422, 2019-10-27)
corpus/Tasnimnews_Fa_2018-02-01_2018-02-28.csv (239808, 2019-10-27)
corpus/Tasnimnews_Fa_2018-03-01_2018-03-31.csv (215746, 2019-10-27)
corpus/Tasnimnews_Fa_2018-04-01_2018-04-30.csv (281245, 2019-10-27)
corpus/Tasnimnews_Fa_2018-05-01_2018-05-31.csv (268779, 2019-10-27)
corpus/Tasnimnews_Fa_2018-06-01_2018-06-30.csv (318236, 2019-10-27)
docs/ (0, 2019-10-27)
docs/ProjectNews.pdf (80124, 2019-10-27)
docs/ProjectNews.tex (15347, 2019-10-27)
docs/results_economics.csv (545, 2019-10-27)
docs/results_social.csv (433, 2019-10-27)
economy.txt (2506, 2019-10-27)
economy_seperated.txt (1500, 2019-10-27)
pom.xml (2223, 2019-10-27)
results/ (0, 2019-10-27)
results/Economy_Tasnimnews_Fa_2018-04-01_2018-04-30 (4521, 2019-10-27)
results/Economy_Tasnimnews_Fa_2018-05-01_2018-05-31 (2567, 2019-10-27)
results/Economy_Tasnimnews_Fa_2018-06-01_2018-06-30 (5729, 2019-10-27)
results/Social_Tasnimnews_Fa_2018-04-01_2018-04-30 (1135, 2019-10-27)
results/Social_Tasnimnews_Fa_2018-05-01_2018-05-31 (2065, 2019-10-27)
results/Social_Tasnimnews_Fa_2018-06-01_2018-06-30 (2068, 2019-10-27)
social.txt (1004, 2019-10-27)
social_seperated.txt (1004, 2019-10-27)
src/ (0, 2019-10-27)
src/main/ (0, 2019-10-27)
src/main/java/ (0, 2019-10-27)
src/main/java/META-INF/ (0, 2019-10-27)
src/main/java/META-INF/MANIFEST.MF (75, 2019-10-27)
src/main/java/ir/ (0, 2019-10-27)
src/main/java/ir/ac/ (0, 2019-10-27)
src/main/java/ir/ac/um/ (0, 2019-10-27)
src/main/java/ir/ac/um/ce/ (0, 2019-10-27)
src/main/java/ir/ac/um/ce/projectnews/ (0, 2019-10-27)
src/main/java/ir/ac/um/ce/projectnews/crawler/ (0, 2019-10-27)
... ...
## Twitter Persian news tagcloud extraction
Final project for an Information Retrieval course.
TPNT is a tag cloud generator that extracts hot keywords from the Twitter page of a major Persian news agency in the fields of **Economics** and **Social** affairs for each month of a year.
---
### Dependencies
* `GetOldTweets-java v1.2.0`
* `Lucene 7.2.1`
### News agency
* Tasnim News ([@TasnimNews_Fa](https://twitter.com/tasnimnews_fa))
### How to Run
This project has two main steps. First, tweets are stored in a `csv` file with the help of the `Crawler` class. This class needs some **options** to work properly:
| Flag | Description | Required |
| ------------ | ------------- | ------------- |
| `-i` | ID of the Twitter page | yes |
| `-s` | Start date of extraction, format: `YYYY-MM-DD` | yes |
| `-e` | End date of extraction, format: `YYYY-MM-DD` | no |
| `-m` | Maximum number of tweets to retrieve | no |
| `-p` | Path of the csv file | no |
| `-n` | Name of the csv file | no |
An example that retrieves tweets from [@TasnimNews_Fa](https://twitter.com/tasnimnews_fa) between 2018-06-01 and 2018-07-01 and stores them under the `$PWD/result/` path:
```bash
java -cp ProjectNews.jar ir.ac.um.ce.projectnews.crawler.Crawler -i Tasnimnews_Fa -s 2018-06-01 -e 2018-07-01 -p result/
```
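The flag rules can be illustrated with a small stand-alone check. This is only a sketch, not the project's actual argument handling: the class `CrawlerOptionsSketch` and its `parse` and `validate` helpers are hypothetical names, and the sketch assumes every flag is followed by exactly one value.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Map;

public class CrawlerOptionsSketch {

    /** Collects "-flag value" pairs into a map (hypothetical helper, not the project's parser). */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            options.put(args[i], args[i + 1]);
        }
        return options;
    }

    /** Returns null when the options are usable, otherwise a short error message. */
    static String validate(Map<String, String> options) {
        if (!options.containsKey("-i")) return "missing required flag -i";
        if (!options.containsKey("-s")) return "missing required flag -s";
        try {
            // LocalDate.parse expects the ISO format, i.e. four-digit year, month, day.
            LocalDate start = LocalDate.parse(options.get("-s"));
            if (options.containsKey("-e")) {
                LocalDate end = LocalDate.parse(options.get("-e"));
                if (end.isBefore(start)) return "end date is before start date";
            }
        } catch (DateTimeParseException ex) {
            return "dates must use the YYYY-MM-DD format";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(parse(args));
        System.out.println(error == null ? "options look valid" : error);
    }
}
```

Running the sketch with the same arguments as the command above (`-i Tasnimnews_Fa -s 2018-06-01 -e 2018-07-01 -p result/`) should report that the options look valid, while omitting `-s` or passing a date such as `2018-6-1` yields an error message.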
The next step is indexing the docs. After removing stop words from the docs, we use the `Searcher` and `Classifier` classes plus a bag of words to create queries that estimate the correlation of each doc with the context. Finally, we use the most correlated words to generate a tag cloud.
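The idea behind this second step can be sketched in plain Java. This is a simplified illustration, not the project's code: it replaces the Lucene index and the `Searcher` and `Classifier` classes with in-memory counting, the class `TagCloudSketch` and its methods are hypothetical, and the English toy data stands in for Persian tweets, stop words, and topic vocabulary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TagCloudSketch {

    /** Splits on whitespace and drops stop words; real Persian text needs a proper analyzer. */
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) tokens.add(token);
        }
        return tokens;
    }

    /** Counts how many tokens of a document occur in the topic's bag of words. */
    static int relevance(List<String> tokens, Set<String> bagOfWords) {
        int hits = 0;
        for (String token : tokens) {
            if (bagOfWords.contains(token)) hits++;
        }
        return hits;
    }

    /** Returns the topN most frequent terms among documents that match the topic. */
    static List<String> topTerms(List<String> docs, Set<String> stopWords,
                                 Set<String> bagOfWords, int topN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc, stopWords);
            if (relevance(tokens, bagOfWords) == 0) continue; // unrelated to the topic
            for (String token : tokens) counts.merge(token, 1, Integer::sum);
        }
        List<String> terms = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically so the output is deterministic.
        terms.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(topN, terms.size())));
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "in"));
        Set<String> economy = new HashSet<>(Arrays.asList("bank", "currency"));
        List<String> tweets = Arrays.asList(
                "the bank raised the currency rate",
                "currency rate in the market",
                "football match in the city");
        System.out.println(topTerms(tweets, stopWords, economy, 3)); // prints [currency, rate, bank]
    }
}
```

The returned terms and their counts are what a tag cloud renderer would scale by size. A relevance score based on raw overlap is the simplest possible choice; a ranked retrieval score from a search index would weight rare topic words more heavily.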
### Contributors
* [Mehdi Akbarian-Rastaghi](https://github.com/makbn)
* [Mohammad-Reza Daliri](https://github.com/mrdaliri)