news-event-detection

Category: Content generation
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2019-09-23 13:42:09
Uploader: sh-1993
Description: This project aims to develop a news event detection framework that will detect newsworthy events from tweets and generate headlines from them. We will first pass the tweets through threshold-based clustering techniques to cluster tweets corresponding to a certain event based on their similarity, retweet map, and links. Finally, we will use those ...
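The description names three signals for grouping tweets into events: text similarity, the retweet map, and links. The repository's clustering code (ClusterClass.py, DynammicClustering.py) is not reproduced here, so the following is only a minimal single-pass sketch under stated assumptions: each tweet is already reduced to a set of tokens (for example its named entities), a retweet joins the cluster of its source tweet, a tweet sharing a URL with a cluster joins that cluster, and otherwise Jaccard similarity against the cluster's accumulated tokens is compared with a fixed threshold. `Tweet`, `Cluster`, `cluster_tweets`, and the 0.3 default are hypothetical, not the project's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tweet:
    tweet_id: str
    tokens: set[str]                 # hypothetical: entities/words already extracted
    retweet_of: str | None = None    # id of the retweeted tweet, if any
    urls: set[str] = field(default_factory=set)


@dataclass
class Cluster:
    cluster_id: int
    tokens: set[str] = field(default_factory=set)
    urls: set[str] = field(default_factory=set)
    tweet_ids: list[str] = field(default_factory=list)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, in [0, 1]; 0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_tweets(tweets: list[Tweet], threshold: float = 0.3) -> dict[str, int]:
    """Single-pass clustering of time-ordered tweets; returns tweet_id -> cluster_id."""
    clusters: list[Cluster] = []
    assignment: dict[str, int] = {}
    for tweet in tweets:
        target: Cluster | None = None
        # 1. Retweet map: a retweet follows the tweet it retweets, if already clustered.
        if tweet.retweet_of in assignment:
            target = clusters[assignment[tweet.retweet_of]]
        # 2. Links: a shared URL is treated as evidence of the same event.
        if target is None:
            target = next((c for c in clusters if tweet.urls & c.urls), None)
        # 3. Similarity: most similar cluster whose score reaches the threshold.
        if target is None:
            best_score = threshold
            for c in clusters:
                score = jaccard(tweet.tokens, c.tokens)
                if score >= best_score:
                    target, best_score = c, score
        # No match: open a new cluster (its id equals its index in `clusters`).
        if target is None:
            target = Cluster(cluster_id=len(clusters))
            clusters.append(target)
        target.tokens |= tweet.tokens
        target.urls |= tweet.urls
        target.tweet_ids.append(tweet.tweet_id)
        assignment[tweet.tweet_id] = target.cluster_id
    return assignment
```

For example, two earthquake tweets with overlapping tokens land in one cluster, a retweet of an election tweet joins that tweet's cluster, and two otherwise unrelated tweets sharing the same URL are grouped together. Merging tokens into the cluster means large clusters slowly become harder to match by Jaccard alone, which is one reason a real system might compare against a centroid or adapt the threshold instead.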

File list:
.idea/ (0, 2019-09-23)
.idea/inspectionProfiles/ (0, 2019-09-23)
.idea/inspectionProfiles/Project_Default.xml (432, 2019-09-23)
.idea/misc.xml (196, 2019-09-23)
.idea/modules.xml (292, 2019-09-23)
.idea/twitternewsdetection.iml (398, 2019-09-23)
.idea/vcs.xml (180, 2019-09-23)
DatasetA/ (0, 2019-09-23)
DatasetA/Data/ (0, 2019-09-23)
DatasetA/Data/mytweet02.csv (987491, 2019-09-23)
DatasetA/Data/mytweet15 May.csv (1068665, 2019-09-23)
DatasetA/Data/mytweet1stMarch.csv (1052196, 2019-09-23)
DatasetA/Data/mytweet2 (1531680, 2019-09-23)
DatasetA/Data/mytweet2ndMarch.csv (1051792, 2019-09-23)
DatasetA/EventDetection/ (0, 2019-09-23)
DatasetA/EventDetection/ClusterClass.py (2680, 2019-09-23)
DatasetA/EventDetection/DynammicClustering.py (9879, 2019-09-23)
DatasetA/EventDetection/MainFile.py (3135, 2019-09-23)
DatasetA/EventDetection/clusterb.py (10650, 2019-09-23)
DatasetA/EventDetection/index.py (6402, 2019-09-23)
DatasetA/EventDetection/tagging.py (4684, 2019-09-23)
DatasetA/MyOutputs/ (0, 2019-09-23)
DatasetA/MyOutputs/clusters.csv (190450, 2019-09-23)
DatasetA/MyOutputs/clustersIds.csv (128823, 2019-09-23)
DatasetA/MyOutputs/slang2.csv (82378, 2019-09-23)
DatasetA/MyOutputs/summary.csv (24749, 2019-09-23)
DatasetA/MyOutputs/topicsA.csv (13344, 2019-09-23)
DatasetA/Scrapper/ (0, 2019-09-23)
DatasetA/Scrapper/scrapper2.py (3549, 2019-09-23)
DatasetA/Training/ (0, 2019-09-23)
DatasetA/Training/Scores (324, 2019-09-23)
DatasetA/Training/clusterb.py (10386, 2019-09-23)
DatasetA/Training/index.py (6412, 2019-09-23)
DatasetA/Training/tagging.py (3810, 2019-09-23)
DatasetA/Training/trainingOutputs/ (0, 2019-09-23)
DatasetA/Training/trainingOutputs/clusters.csv (923607, 2019-09-23)
DatasetA/Training/trainingOutputs/clustersIds.csv (270395, 2019-09-23)
DatasetA/Training/trainingOutputs/clustersIds03_02_03.csv (257335, 2019-09-23)
DatasetA/Training/trainingOutputs/clustersIds03_04_04.csv (253194, 2019-09-23)
DatasetA/Training/trainingOutputs/clustersIds05_06_03.csv (276531, 2019-09-23)
... ...

# Twitter Based Newsstand

This project aims to develop a news event detection framework that will detect newsworthy events from tweets and generate headlines from them.

## Installation

Newsstand requires the [NER package](https://shafaqA15@bitbucket.org/shafaqA15/named-entity-recognizer.git) and a running `java gateway`.

_All code is in the DatasetA/EventDetection folder. DatasetA/MyOutputs holds all generated output files: clusters.csv has tweets and their assigned cluster IDs, summary.csv has headlines along with their cluster IDs, and topicsA.csv contains topics._

- [x] Clone the repository by running `git clone https://github.com/Shafaq19/Twitter-News-Event-Detection.git`
- [ ] Register for API keys for:
  - [Intelligent Tagging at OpenCalais](https://www.refinitiv.com/en/products/intelligent-tagging-text-analytics)
  - [Twitter API](https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens)
- [ ] If you are using a Python environment, install the dependencies with `pip install -r requirements.txt`; conda users can run `conda install --file requirements.txt`.
- [ ] Run `DatasetA/EventDetection/MainFile.py` and check the outputs in the `DatasetA/MyOutputs` folder.

Congrats, you are all set! :+1:

## Features

- Uses NER as a base rather than traditional TF-IDF vectorization, which does not handle unknown words flexibly and takes too much space.
- Dynamic threshold-based clustering algorithm.

You can also:

- Search for news related to a topic of interest, such as technology.
- Have news events at your beck and call in just 4 minutes!
- Use a responsive, self-refreshing interface.

> The overriding design goal for the Twitter-based newsstand is to make covering stories as easy and fast as possible for journalists. Since people are already on site and actively reporting on Twitter, journalists don't have to go on-site or worry that something might be missed.

## Plugins

| No Plugins Needed |
| ------ |

## Credits

Dataset citation: Andrew J. McMinn, Yashar Moshfeghi, Joemon M. Jose. "Building a Large-scale Corpus for Evaluating Event Detection on Twitter." Proceedings of the 22nd ACM International Conference on Information & Knowledge Management.

Similarity function taken from: Liu, Xiaomo, Quanzhi Li, Armineh Nourbakhsh, Rui Fang, Merine Thomas, Kajsa Anderson, Russ Kociuba, et al. 2016. "Reuters Tracer: A Large Scale System of Detecting and Verifying Real-Time News Events from Twitter." Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 207–216, Indianapolis, Indiana, October 24–28.

FASTNU, Shafaq Arshad
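The README lists a dynamic threshold-based clustering algorithm but does not say how the threshold changes. One common way to make a similarity threshold adapt is to track the running mean and standard deviation of observed similarity scores and accept a match only above mean + k·std. The sketch below assumes that rule (with Welford's online update for the statistics); `DynamicThreshold`, `k`, `initial`, and `warmup` are hypothetical and not taken from the repository.

```python
import math


class DynamicThreshold:
    """Hypothetical adaptive threshold: mean + k * std of the scores seen so far.

    A fixed `initial` threshold is used until enough scores have been observed
    to estimate a sample standard deviation.
    """

    def __init__(self, k: float = 1.0, initial: float = 0.5, warmup: int = 5) -> None:
        self.k = k
        self.initial = initial
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    def update(self, score: float) -> None:
        """Fold one similarity score into the running mean and variance."""
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (score - self.mean)

    @property
    def value(self) -> float:
        """Current threshold; needs at least two scores for a sample std."""
        if self.n < max(self.warmup, 2):
            return self.initial
        std = math.sqrt(self._m2 / (self.n - 1))
        return self.mean + self.k * std
```

In a clustering loop, each tweet's best similarity score would be passed to `update()` and compared against `value` instead of a fixed constant, so the bar for joining an existing event rises or falls with the score distribution of the current stream.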
