News-Articles-Sorting-Using-MLOps-DVC

Category: DevOps
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-08-03 16:43:19
Uploader: sh-1993
Description: The News-Articles-Sorting project is a machine learning-based system that aims to automatically categorize and sort news articles into predefined categories.

File list:
LICENSE (11357, 2023-08-03)
News_Articles_sorting.egg-info/ (0, 2023-08-03)
News_Articles_sorting.egg-info/PKG-INFO (141, 2023-08-03)
News_Articles_sorting.egg-info/SOURCES.txt (611, 2023-08-03)
News_Articles_sorting.egg-info/dependency_links.txt (1, 2023-08-03)
News_Articles_sorting.egg-info/requires.txt (34, 2023-08-03)
News_Articles_sorting.egg-info/top_level.txt (63, 2023-08-03)
app.py (0, 2023-08-03)
app_exception/ (0, 2023-08-03)
app_exception/__init__.py (0, 2023-08-03)
app_exception/__pycache__/ (0, 2023-08-03)
app_exception/__pycache__/__init__.cpython-310.pyc (161, 2023-08-03)
app_exception/__pycache__/app_exception.cpython-310.pyc (1148, 2023-08-03)
app_exception/app_exception.py (657, 2023-08-03)
application_logging/ (0, 2023-08-03)
application_logging/__init__.py (46, 2023-08-03)
application_logging/__pycache__/ (0, 2023-08-03)
application_logging/__pycache__/__init__.cpython-310.pyc (225, 2023-08-03)
application_logging/__pycache__/logger.cpython-310.pyc (623, 2023-08-03)
application_logging/logger.py (723, 2023-08-03)
data/ (0, 2023-08-03)
data/processed/ (0, 2023-08-03)
data/raw/ (0, 2023-08-03)
data_given/ (0, 2023-08-03)
data_given/BBC News Test.csv (1711696, 2023-08-03)
data_given/BBC News Train.csv (3349715, 2023-08-03)
dvc.yaml (0, 2023-08-03)
notebooks/ (0, 2023-08-03)
params.yaml (146, 2023-08-03)
prediction_service/ (0, 2023-08-03)
... ...

# News-Articles-Sorting-using-MLOps - Ongoing

## Overview

The News-Articles-Sorting project is a machine learning-based system that aims to automatically categorize and sort news articles into predefined categories. The project leverages natural language processing (NLP) techniques and supervised learning algorithms to classify news articles accurately.

This repository contains the code and resources needed to train and deploy the News-Articles-Sorting model. The project uses a labeled dataset of news articles, where each article is assigned to a specific category. By training a machine learning model on this dataset, the system can then classify unseen news articles into the appropriate categories.

## Problem Statement

In today’s world, data is power. News companies store terabytes of data on their servers, and everyone is on a quest to discover insights that add value to the organization. Of the many examples in which analytics is used to drive action, one that stands out is news article classification.

Many sources on the Internet generate immense amounts of news every day. In addition, user demand for information has been growing continuously, so it is crucial that news is classified to let users reach the information of interest quickly and effectively. A machine learning model for automated news classification could then be used to identify the topics of untracked news and/or make individual suggestions based on the user’s prior interests.

## Approach

Techniques such as clustering and association rule-based algorithms can be applied to group similar texts together. Supervised ML algorithms learn the mapping function between the text and the tags from already categorized data. Algorithms such as SVM, neural networks, and random forest are commonly used for text classification.

## Website

## File structure

    .
    ├── app_exception          # Custom exception
    ├── application_logging    # Logging
    ├── data_given             # Given data
    ├── data                   # Raw / processed / transformed data
    ├── saved_models           # Classification model
    ├── report                 # Model parameter and pipeline reports
    ├── src                    # Source files for project implementation
    ├── webapp                 # ML web application
    ├── dvc.yaml               # Data version control pipeline
    ├── app.py                 # Flask backend
    ├── params.yaml            # Parameters
    ├── requirements.txt
    └── README.md

## Dataset

[BBC News](https://www.kaggle.com/c/learn-ai-bbc/data)

### Dataset Description

- **BBC News Train.csv** - the training set of 1490 records
- **BBC News Test.csv** - the test set of 736 records

### Data fields

- **ArticleId** - unique ID number assigned to the record
- **Article** - text of the header and article
- **Category** - category of the article (tech, business, sport, entertainment, politics)

## Features

- Automatic classification of news articles into predefined categories.
- Training and evaluation of machine learning models for article categorization.
- Web-based interface for interacting with the system and classifying news articles.

## Experiments

## Contributing

Contributions to the News-Articles-Sorting project are welcome! If you would like to contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them with descriptive commit messages.
4. Push your changes to your forked repository.
5. Submit a pull request explaining your changes.

## License

This project is licensed under the MIT License.

## Contact

If you have any questions or suggestions regarding the project, please feel free to contact the project maintainer at [Gmail](https://mail.google.com/mail/?view=cm&tf=0&to=prajwalgbdr03@gmail.com)
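
## Illustrative sketch of the supervised approach

The idea described under Approach, learning a mapping from article text to category tags using already categorized data, can be sketched in a few lines. This is a minimal illustration only, not the project's actual model. It swaps in a small multinomial Naive Bayes classifier written with the Python standard library in place of the SVM, neural network, or random forest models named above. It trains on a handful of made-up sentences, not on the BBC News data. The `Article` and `Category` keys follow the data fields listed above; the class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical helper: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)        # number of documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy rows for illustration only; they are NOT taken from the dataset.
rows = [
    {"Article": "the team won the match after a late goal", "Category": "sport"},
    {"Article": "the striker scored twice in the cup final", "Category": "sport"},
    {"Article": "shares fell as the bank reported lower profits", "Category": "business"},
    {"Article": "the company raised profits and its shares rose", "Category": "business"},
    {"Article": "the new phone has a faster chip and better software", "Category": "tech"},
    {"Article": "software update fixes a security bug in the browser", "Category": "tech"},
]

model = NaiveBayesTextClassifier().fit(
    [row["Article"] for row in rows],
    [row["Category"] for row in rows],
)

print(model.predict("bank shares and profits"))  # business
```

To try the same sketch on the real data, the rows could be read from `data_given/BBC News Train.csv` with `csv.DictReader`, assuming the column names in the file match the data fields listed in the README.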
