NewsCluster

Category: Data Mining / Data Warehouse
Development tool: Python
File size: 20KB
Downloads: 0
Upload date: 2020-01-10 07:25:57
Uploader: sh-1993
Description: NewsCluster, news event mining. It aggregates publicly available news data, groups the news articles that describe the same event, and generates information about each event.

File list:
Cluster.py (2820, 2020-01-09)
DataLoader.py (4115, 2020-01-09)
EventExtractor.py (2871, 2020-01-09)
Extractor (0, 2020-01-09)
Extractor\ToyExtractor.py (1188, 2020-01-09)
Extractor\__init__.py (72, 2020-01-09)
Extractor\__pycache__ (0, 2020-01-09)
Extractor\__pycache__\ToyExtractor.cpython-36.pyc (982, 2020-01-09)
Extractor\__pycache__\__init__.cpython-36.pyc (133, 2020-01-09)
Extractor\config.py (111, 2020-01-09)
Text2Vector.py (3055, 2020-01-09)
__init__.py (72, 2020-01-09)
__pycache__ (0, 2020-01-09)
__pycache__\Cluster.cpython-36.pyc (2574, 2020-01-09)
__pycache__\DataLoader.cpython-36.pyc (3284, 2020-01-09)
__pycache__\EventExtractor.cpython-36.pyc (2570, 2020-01-09)
__pycache__\Text2Vector.cpython-36.pyc (3075, 2020-01-09)
__pycache__\config.cpython-36.pyc (998, 2020-01-09)
config.py (944, 2020-01-09)
data (0, 2020-01-09)
model (0, 2020-01-09)
run.py (802, 2020-01-09)

# Introduction

News event mining. NewsCluster aggregates publicly available news data, groups the news articles that describe the same event, and generates information about each event.

# Collaboration notes

Please keep the four files DataLoader.py, Text2Vector.py, Cluster.py, and EventExtractor.py as concise as possible. Do not implement concrete algorithms in these files; implement them elsewhere, then import and call them from these files. For example, EventExtractor.py extracts event information from the clustering results. A ToyExtractor is currently implemented; its implementation lives in the Extractor folder, and EventExtractor.py only calls it.

# Database

There are currently two tables: the raw news table (news) stores the original news data, and the event table (event) stores the event information produced by the clustering analysis.

### Raw news table (news)

Table structure:

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| news_id | int(10) unsigned | NO | PRI | | auto_increment |
| source | varchar(1000) | YES | | | |
| author | varchar(1000) | YES | | | |
| title | varchar(1000) | YES | | | |
| queryKeyWord | varchar(100) | YES | | | |
| description | varchar(2000) | YES | | | |
| url | varchar(1000) | YES | | | |
| urlToImage | varchar(1000) | YES | | | |
| publishedAt | datetime | YES | | | |
| content | text | YES | | | |

Field descriptions:

| Field | Description | Example |
| --- | --- | --- |
| news_id | | 106511 |
| source | The display name of the source this article came from | "The New York Times" |
| author | The author of the article | "Michael Levenson" |
| title | The headline or title of the article | |
| queryKeyWord | Keywords or phrases to search for in the article title and body | "Donald Trump" |
| description | A description or snippet from the article | |
| url | The direct URL to the article | |
| urlToImage | The URL to a relevant image for the article | |
| publishedAt | The date and time the article was published | 2019-12-17 11:26:36 |
| content | The unformatted content of the article. This is truncated to 260 chars for Developer plan users | |

### Event table (event)

Table structure:

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| label | varchar(20) | NO | PRI | | |
| newsid | varchar(2000) | NO | | | |
| title | varchar(1000) | YES | | | |
| keyWord | varchar(100) | YES | | | |
| time | datetime | YES | | | |
| abstract | varchar(2000) | YES | | | |
| content | text | YES | | | |

Field descriptions:

| Field | Description | Example |
| --- | --- | --- |
| label | Cluster label / event id | |
| newsid | The news_id values belonging to this event, separated by spaces | "106511 106522" |
| title | Event title | |
| keyWord | Event keywords; multiple keywords are separated by \| | "keyword1 \| keyword2" |
| time | Time the event occurred | 2019-12-17 11:26:36 |
| abstract | Event summary | |
| content | Detailed description of the event | |
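
The event table packs two lists into string columns: `newsid` holds space-separated news ids and `keyWord` holds keywords separated by `|`. The following is a minimal sketch of how a row could be encoded and decoded under those conventions. The `Event` dataclass and the `encode_event` / `decode_event` helpers are hypothetical names introduced for illustration, not part of the project's code, and the exact spacing around `|` is an assumption based on the example value in the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the example "2019-12-17 11:26:36"


@dataclass
class Event:
    """Hypothetical in-memory form of one row of the event table."""
    label: str
    news_ids: List[int]
    title: Optional[str] = None
    keywords: Optional[List[str]] = None
    time: Optional[datetime] = None


def encode_event(event: Event) -> dict:
    """Flatten an Event into a dict keyed by the event table's column names."""
    return {
        "label": event.label,
        # newsid: news_id values joined by single spaces
        "newsid": " ".join(str(i) for i in event.news_ids),
        "title": event.title,
        # keyWord: keywords joined by " | " (spacing is an assumption)
        "keyWord": " | ".join(event.keywords) if event.keywords else None,
        "time": event.time.strftime(TIME_FORMAT) if event.time else None,
    }


def decode_event(row: dict) -> Event:
    """Rebuild an Event from a dict shaped like an event table row."""
    keywords = None
    if row.get("keyWord"):
        # Tolerate optional whitespace around the separator.
        keywords = [k.strip() for k in row["keyWord"].split("|") if k.strip()]
    time = None
    if row.get("time"):
        time = datetime.strptime(row["time"], TIME_FORMAT)
    return Event(
        label=row["label"],
        news_ids=[int(i) for i in row["newsid"].split()],
        title=row.get("title"),
        keywords=keywords,
        time=time,
    )
```

Because `newsid` is `varchar(2000)` and `keyWord` is `varchar(100)`, a caller would likely want to check the encoded string lengths before writing a large cluster to the table.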

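The collaboration rule (keep EventExtractor.py thin and put the concrete algorithm under the Extractor folder) can be sketched as a simple delegation pattern. The listing does not show the real interfaces, so the class names, the `extract` method, and the toy keyword rule below are all assumptions made for illustration. Both classes are shown in one block only to keep the example self-contained; in the project layout the first would live in the Extractor folder and the second would import it.

```python
from collections import Counter
from typing import Dict, List, Optional


class ToyExtractor:
    """Hypothetical stand-in for the algorithm kept in the Extractor folder."""

    def extract(self, titles: List[str]) -> Dict[str, str]:
        # Toy rule (an assumption): the first title names the event and
        # the two most frequent words become its keywords.
        words = Counter(w.lower() for t in titles for w in t.split())
        top = [w for w, _ in words.most_common(2)]
        return {"title": titles[0], "keyWord": " | ".join(top)}


class EventExtractor:
    """Thin wrapper: holds no algorithm, only delegates to an extractor."""

    def __init__(self, extractor: Optional[ToyExtractor] = None):
        # Any object with a compatible extract() method can be swapped in.
        self.extractor = extractor or ToyExtractor()

    def extract(self, clusters: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
        # clusters maps a cluster label to the titles of its news articles.
        return {
            label: self.extractor.extract(titles)
            for label, titles in clusters.items()
        }


if __name__ == "__main__":
    clusters = {"0": ["Storm hits coast", "Storm damage grows"]}
    print(EventExtractor().extract(clusters))
```

With this shape, replacing the toy algorithm means adding a new class next to ToyExtractor and passing it to the wrapper, without editing the wrapper itself.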