Real-Time-News-Article-Analysis-with-NER

Category: Docker
Development tool: Python
Upload date: 2023-09-08 23:11:25
Uploader: sh-1993
Description: A real-time data processing pipeline that ingests news articles, extracts named entities, and performs analytics on the extracted information. The project is in the area of data engineering, real-time streaming, NER, and distributed computing.

File list:
automation/ (0, 2023-09-08)
automation/commands.md (537, 2023-09-08)
elasticsearch/ (0, 2023-09-08)
elasticsearch/ingest-pipeline.json (1374, 2023-09-08)
elasticsearch/template.json (978, 2023-09-08)
images/ (0, 2023-09-08)
images/architecture.png (290095, 2023-09-08)
images/kibana-1.png (214590, 2023-09-08)
images/kibana-2.png (178046, 2023-09-08)
kibana/ (0, 2023-09-08)
kibana/export.ndjson (6533, 2023-09-08)
redpanda/ (0, 2023-09-08)
redpanda/consumer.py (2010, 2023-09-08)
redpanda/producer.py (1520, 2023-09-08)
redpanda/response.json (23893, 2023-09-08)

# Real-Time-News-Article-Analysis-with-NER

In this project, I built a real-time data processing pipeline that ingests news articles, extracts named entities (e.g., organizations, people, locations), and performs analytics on the extracted information. The project showcases skills in data engineering, real-time streaming, NER, and distributed computing.

![Global Architecture](https://github.com/RiadBensalem/Real-Time-News-Article-Analysis-with-NER/blob/master/images/architecture.png)

Key Components:

## Data Ingestion:

News articles are collected in real time using the [Currents API](https://currentsapi.services/en).

## Named Entity Recognition (NER):

A [NER model from Hugging Face](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english) is uploaded with [Eland](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/index.html) to Elastic Cloud, where it is deployed and used at ingest time to detect the entities that appear in the news articles.

## Real-Time Data Streaming and Processing:

Redpanda is integrated with the data ingestion layer to handle the incoming stream of news articles.

## Analytics and Insights:

A Python script consumes the news articles from Redpanda and sends them to an Elasticsearch cluster deployed on Elastic Cloud. In Elasticsearch, an ingest pipeline transforms the documents and calls the NER model on the title field (or any other field we choose). The transformed articles are then stored in Elasticsearch.
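The hop from Redpanda to Elasticsearch described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `redpanda/producer.py`, `redpanda/consumer.py`, or `elasticsearch/ingest-pipeline.json`, which are not reproduced here. The helper names, the `ml.ner` target field, and the model ID are assumptions. The ID follows the pattern Eland normally uses when importing a Hugging Face model (the hub name with `/` replaced by `__`), but the actual deployment may differ. Only the standard library is used, so the Kafka and Elasticsearch client calls are left out.

```python
import json
from typing import Any, Iterable, Iterator

# Assumed model ID: the Hugging Face hub name with "/" replaced by "__".
MODEL_ID = "elastic__distilbert-base-uncased-finetuned-conll03-english"


def build_ner_pipeline(model_id: str = MODEL_ID, source_field: str = "title") -> dict:
    """Return an ingest pipeline body that runs the NER model on one field."""
    inference: dict = {
        "model_id": model_id,
        # Hypothetical output location for the model's results.
        "target_field": "ml.ner",
        # NLP models read their input from a field named "text_field",
        # so the chosen document field is mapped onto it.
        "field_map": {source_field: "text_field"},
    }
    return {
        "description": "Run NER on incoming news articles",
        "processors": [{"inference": inference}],
    }


def encode_article(article: dict) -> bytes:
    """Producer side: serialize one article as UTF-8 JSON for a Redpanda topic."""
    return json.dumps(article, ensure_ascii=False).encode("utf-8")


def decode_articles(raw_messages: Iterable[bytes]) -> Iterator[Any]:
    """Consumer side: decode payloads, skipping malformed or title-less ones."""
    for raw in raw_messages:
        try:
            article = json.loads(raw.decode("utf-8"))
        except ValueError:  # covers both bad UTF-8 and bad JSON
            continue
        if isinstance(article, dict) and article.get("title"):
            yield article
```

In a real run, the bytes from `encode_article` would be sent through a Kafka-compatible client pointed at Redpanda, the pipeline body would be registered in Elasticsearch once, and each decoded article would be indexed with that pipeline selected. Skipping articles without a title is a design choice of this sketch, made because the pipeline runs NER on that field.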
## Data Visualization:

Kibana is used to build a simple dashboard showing the detected entities.

![dashboard1](https://github.com/RiadBensalem/Real-Time-News-Article-Analysis-with-NER/blob/master/images/kibana-1.png)

![dashboard2](https://github.com/RiadBensalem/Real-Time-News-Article-Analysis-with-NER/blob/master/images/kibana-2.png)
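A dashboard of detected entities essentially counts entities per class. As a rough offline equivalent, the sketch below tallies entities from already enriched documents. It assumes the NER output sits under `ml.ner.entities`, with each entry carrying `entity`, `class_name`, and `class_probability`. That layout is an assumption about this project: the repository's ingest pipeline may rename or flatten these fields, and `count_entities` is a hypothetical helper, not part of the repository.

```python
from collections import Counter
from typing import Iterable


def count_entities(docs: Iterable[dict], min_probability: float = 0.0) -> Counter:
    """Count (class_name, entity) pairs across NER-enriched documents."""
    counts: Counter = Counter()
    for doc in docs:
        # Assumed location of the inference output; adjust to the real mapping.
        entities = doc.get("ml", {}).get("ner", {}).get("entities", [])
        for ent in entities:
            # Entries without a probability are kept by default.
            if ent.get("class_probability", 1.0) >= min_probability:
                counts[(ent["class_name"], ent["entity"])] += 1
    return counts


# Made-up documents for illustration only, not real pipeline output.
sample_docs = [
    {"title": "Apple opens office in Paris",
     "ml": {"ner": {"entities": [
         {"entity": "apple", "class_name": "ORG", "class_probability": 0.98},
         {"entity": "paris", "class_name": "LOC", "class_probability": 0.99},
     ]}}},
    {"title": "Paris hosts summit",
     "ml": {"ner": {"entities": [
         {"entity": "paris", "class_name": "LOC", "class_probability": 0.97},
     ]}}},
]
top_entities = count_entities(sample_docs, min_probability=0.9).most_common(5)
```

In Kibana the same grouping would typically be done with an aggregation on the entity fields rather than in client code. The Python version is only meant to make the computation behind the charts explicit.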
