News-Articles-Sorting-Using-MLOps-DVC
Category: DevOps
Development tool: Python
Upload date: 2023-08-03 16:43:19
Uploader: sh-1993
Description: The News-Articles-Sorting project is a machine learning-based system that aims to automatically categorize and sort news articles into predefined categories.
File list:
LICENSE (11357, 2023-08-03)
News_Articles_sorting.egg-info/ (0, 2023-08-03)
News_Articles_sorting.egg-info/PKG-INFO (141, 2023-08-03)
News_Articles_sorting.egg-info/SOURCES.txt (611, 2023-08-03)
News_Articles_sorting.egg-info/dependency_links.txt (1, 2023-08-03)
News_Articles_sorting.egg-info/requires.txt (34, 2023-08-03)
News_Articles_sorting.egg-info/top_level.txt (63, 2023-08-03)
app.py (0, 2023-08-03)
app_exception/ (0, 2023-08-03)
app_exception/__init__.py (0, 2023-08-03)
app_exception/__pycache__/ (0, 2023-08-03)
app_exception/__pycache__/__init__.cpython-310.pyc (161, 2023-08-03)
app_exception/__pycache__/app_exception.cpython-310.pyc (1148, 2023-08-03)
app_exception/app_exception.py (657, 2023-08-03)
application_logging/ (0, 2023-08-03)
application_logging/__init__.py (46, 2023-08-03)
application_logging/__pycache__/ (0, 2023-08-03)
application_logging/__pycache__/__init__.cpython-310.pyc (225, 2023-08-03)
application_logging/__pycache__/logger.cpython-310.pyc (623, 2023-08-03)
application_logging/logger.py (723, 2023-08-03)
data/ (0, 2023-08-03)
data/processed/ (0, 2023-08-03)
data/raw/ (0, 2023-08-03)
data_given/ (0, 2023-08-03)
data_given/BBC News Test.csv (1711696, 2023-08-03)
data_given/BBC News Train.csv (3349715, 2023-08-03)
dvc.yaml (0, 2023-08-03)
notebooks/ (0, 2023-08-03)
params.yaml (146, 2023-08-03)
prediction_service/ (0, 2023-08-03)
... ...
# News-Articles-Sorting-using-MLOps - Ongoing
## Overview
The News-Articles-Sorting project is a machine learning-based system that aims to automatically categorize and sort news articles into predefined categories. The project leverages natural language processing (NLP) techniques and supervised learning algorithms to achieve accurate classification of news articles.
This repository contains all the necessary code and resources to train and deploy the News-Articles-Sorting model. The project utilizes a labeled dataset of news articles, where each article is assigned to a specific category. By training a machine learning model on this dataset, the system can then classify unseen news articles into the appropriate categories.
## Problem Statement
In today's world, data is power. News companies store terabytes of data on their servers, and everyone is looking for insights that add value to the organization. Among the many ways analytics is used to drive decisions, news article classification stands out.
Internet sources now generate immense amounts of news every day, and user demand for information keeps growing, so news must be classified for users to reach the information they care about quickly and effectively. A machine learning model for automated news classification can then be used to identify the topics of untracked news and/or make individual suggestions based on a user's prior interests.
## Approach
Techniques such as clustering and association rule-based algorithms can be
applied to group similar texts together. For classification, supervised ML algorithms learn the
mapping function between the text and its tags from already categorized data. Algorithms such as
SVM, neural networks, and random forests are commonly used for text classification.
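
To make the idea of learning a mapping from text to tags concrete, here is a minimal sketch using only the Python standard library. It swaps in a multinomial Naive Bayes classifier over word counts instead of the SVM, neural network, or random forest models named above, purely to stay dependency-free; it is not the model this repository trains. The class name `NaiveBayesTextClassifier`, the `tokenize` helper, and the toy sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for word in tokens:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy examples invented for illustration; they are not taken from the BBC dataset.
train_texts = [
    "the team won the match after a late goal",
    "the striker scored twice in the cup final",
    "shares fell as the bank reported lower profits",
    "the company raised its profit forecast for the year",
    "the new phone ships with a faster processor",
    "software update fixes a security flaw in the browser",
]
train_labels = ["sport", "sport", "business", "business", "tech", "tech"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
predicted = model.predict("the bank expects higher profits")
print(predicted)
```

Words that never appeared in training are ignored at prediction time, and add-one smoothing keeps a word that is missing from one category from zeroing out that category's score.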
## Website
## File structure
.
├── app_exception # Custom exceptions
├── application_logging # Logging
├── data_given # Given data
├── data # Raw / processed / transformed data
├── saved_models # Classification model
├── report # Model parameter and pipeline reports
├── src # Source files for project implementation
├── webapp # ML web application
├── dvc.yaml # Data version control (DVC) pipeline
├── app.py # Flask backend
├── params.yaml # Parameters
├── requirements.txt
└── README.md
## Dataset
[BBC News](https://www.kaggle.com/c/learn-ai-bbc/data)
### Dataset Description
- **BBC News Train.csv** - the training set of 1490 records
- **BBC News Test.csv** - the test set of 736 records
### Data fields
- **ArticleId** - unique ID number assigned to the record
- **Article** - text of the header and article
- **Category** - category of the article (tech, business, sport, entertainment, politics)
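
As a rough illustration of how these fields might be read, the sketch below parses rows with the standard `csv` module and counts articles per category. The column names are taken from the field list above and may not match the header of the actual CSV files; the function name `load_articles` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def load_articles(csv_file):
    """Read article rows from an open CSV file and count them per category.

    Assumes the header contains ArticleId, Article, and Category columns.
    """
    reader = csv.DictReader(csv_file)
    rows = [
        {
            "ArticleId": int(row["ArticleId"]),
            "Article": row["Article"],
            "Category": row["Category"].strip().lower(),
        }
        for row in reader
    ]
    return rows, Counter(row["Category"] for row in rows)


# Tiny in-memory stand-in for the training file; the rows are invented.
sample = io.StringIO(
    "ArticleId,Article,Category\n"
    '1,"Shares rose after the results, analysts said.",business\n'
    "2,The striker scored twice.,sport\n"
    "3,Profits fell at the retailer.,business\n"
)
rows, category_counts = load_articles(sample)
print(category_counts)
```

To run the same function on the real data, pass it a file opened with `open("data_given/BBC News Train.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption about the file.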
## Features
- Automatic classification of news articles into predefined categories.
- Training and evaluation of machine learning models for article categorization.
- Web-based interface for interacting with the system and classifying news articles.
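
The evaluation step mentioned in the list above could look roughly like the following sketch, which computes overall accuracy and per-category precision and recall from two lists of labels. The function names and the example labels are hypothetical and do not reflect results from this project.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of articles whose predicted category matches the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_category_precision_recall(true_labels, predicted_labels):
    """Return {category: (precision, recall)} computed from paired labels."""
    actual_count = Counter(true_labels)
    predicted_count = Counter(predicted_labels)
    true_positives = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    report = {}
    for category in sorted(set(true_labels) | set(predicted_labels)):
        tp = true_positives[category]
        # Report 0.0 when a category was never predicted or never occurred.
        precision = tp / predicted_count[category] if predicted_count[category] else 0.0
        recall = tp / actual_count[category] if actual_count[category] else 0.0
        report[category] = (precision, recall)
    return report


# Invented labels for illustration only.
y_true = ["sport", "sport", "tech", "business", "politics", "tech"]
y_pred = ["sport", "tech", "tech", "business", "business", "tech"]

print(accuracy(y_true, y_pred))
for category, (precision, recall) in per_category_precision_recall(y_true, y_pred).items():
    print(f"{category}: precision={precision:.2f} recall={recall:.2f}")
```

Per-category figures are worth reporting alongside accuracy because a model can score well overall while consistently missing one of the five categories.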
## Experiments
## Contributing
Contributions to the News-Articles-Sorting project are welcome! If you would like to contribute, please follow these steps:
1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them with descriptive commit messages.
4. Push your changes to your forked repository.
5. Submit a pull request explaining your changes.
## License
This project is licensed under the MIT License.
## Contact
If you have any questions or suggestions regarding the project, please feel free to contact the project maintainer via [Gmail](https://mail.google.com/mail/?view=cm&tf=0&to=prajwalgbdr03@gmail.com).