File list:
Burmese_News_Classification.ipynb (72548, 2023-05-21)
LICENSE (1067, 2023-05-21)
app.py (1587, 2023-05-21)
media (0, 2023-05-21)
media\news.png (7689, 2023-05-21)
media\result.gif (3357812, 2023-05-21)
mm-news-classification-dataset.csv (3834791, 2023-05-21)
requirements.txt (58, 2023-05-21)
stopword.txt (5941, 2023-05-21)
svm_model.sav (714900, 2023-05-21)
system_design.png (46532, 2023-05-21)
vectorizer.pickle (1292646, 2023-05-21)
# Automatic Myanmar News Classification
## Project Overview
An automatic Myanmar news classification system built with a Linear SVM. I also evaluated other machine learning algorithms: Logistic Regression, Multinomial Naive Bayes, Random Forest, and Decision Tree. Linear SVM achieved the highest weighted F1-score.
- A. H. Khine, K. T. Nwet, and K. M. Soe proposed a Naive Bayes-based system in "Automatic Myanmar News Classification" [1]. I used their dataset to train the model.
- Tokenization is done with the pyidaungsu library, which is based on fastText.
- The vectorizer I used is TF-IDF.
- The n-gram range for TF-IDF is unigram + bigram.
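
The unigram + bigram TF-IDF features can be illustrated with a minimal pure-Python sketch. It assumes the text is already tokenized, and it uses the smoothed IDF and L2 normalization that scikit-learn documents as its `TfidfVectorizer` defaults. Whether the notebook keeps those defaults is an assumption, and the function names `ngrams` and `tfidf` are hypothetical helpers, not part of this repository.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams of length n_min..n_max as space-joined strings."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses raw term counts, the smoothed IDF ln((1 + N) / (1 + df)) + 1,
    and L2 normalization of each document vector.
    """
    grams_per_doc = [Counter(ngrams(tokens)) for tokens in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for counts in grams_per_doc:
        df.update(counts.keys())
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}

    vectors = []
    for counts in grams_per_doc:
        weights = {g: tf * idf[g] for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # An empty document has no n-grams and stays an empty vector.
        vectors.append({g: w / norm for g, w in weights.items()} if norm else weights)
    return vectors


# Placeholder tokens; real input would be word-segmented Burmese text.
docs = [["a", "b", "c"], ["a", "b", "d"]]
vectors = tfidf(docs)
```

N-grams that occur in fewer documents (here `c` and `b c`) receive a larger weight than those shared by every document (`a`, `b`, `a b`), which is what makes them useful for separating categories.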
## System Design
- The system follows the design proposed by Nwet and Darren in "Machine Learning Algorithms for Myanmar News Classification" [2].
![System](https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification/blob/master/system_design.png)
## Dataset
The dataset is taken from Aye Hnin Khine's [repository](https://github.com/ayehninnkhine/MyanmarNewsClassificationSystem).
![Dataset](https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification/blob/master/media/news.png)
## Experiments
- For feature extraction, the text data is vectorized with the TF-IDF vectorizer available in scikit-learn.
- Several machine learning models are then trained on these features and compared.
| Model | F1-score |
|:---------------------------:|:------------:|
| Decision Tree | 67% |
| Random Forest | 82% |
| Multinomial Naive Bayes | 84% |
| Logistic Regression | 86% |
| **Linear SVM** | **88%** |
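
The overview describes the comparison metric as a weighted F-score. A minimal sketch of that metric follows, assuming it means the per-class F1 averaged with each class weighted by its number of true samples. The labels below are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n_true / total
    return score


# Hypothetical labels, for illustration only.
y_true = ["sports", "sports", "politics", "business"]
y_pred = ["sports", "politics", "politics", "business"]
score = weighted_f1(y_true, y_pred)
```

Weighting by support keeps a rare category from dominating the score, which matters when the news categories are unevenly represented.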
## Demonstration
A demonstration is available [here](https://share.streamlit.io/thuraaung1601/automatic-myanmar-news-classification/main/app.py).
![Demo](https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification/blob/master/media/result.gif)
## How to run demo
- Download this repository
- Install requirements
```bash
pip install -r requirements.txt
```
- Run the main notebook, `Burmese_News_Classification.ipynb`, to train the model
- Run the demo
```bash
streamlit run app.py
```
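
For readers who want to reuse the saved artifacts outside the demo, the prediction step might look roughly like the sketch below. It is not the code in `app.py`. It assumes that `vectorizer.pickle` and `svm_model.sav` were both written with Python's `pickle` module, that the vectorizer accepts space-joined tokens, and that the objects expose scikit-learn style `transform` and `predict` methods. `load_pickle` and `classify` are hypothetical helper names.

```python
import pickle


def load_pickle(path):
    """Load one pickled object from disk. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def classify(text, tokenize, vectorizer, model):
    """Tokenize, vectorize, and classify a single news text.

    `tokenize` is any callable that returns a list of tokens;
    `vectorizer` and `model` are assumed to follow the scikit-learn
    transform/predict interface.
    """
    tokens = tokenize(text)
    features = vectorizer.transform([" ".join(tokens)])
    return model.predict(features)[0]


# Assumed usage (file formats are an assumption, see the note above):
# vectorizer = load_pickle("vectorizer.pickle")
# model = load_pickle("svm_model.sav")
# label = classify(news_text, my_tokenizer, vectorizer, model)
```

Passing the tokenizer in as a callable keeps the sketch independent of any particular tokenization library.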
## Future Work
- More data is needed
- Test hybrid methods and deep learning approaches
## References
[1] A. H. Khine, K. T. Nwet, and K. M. Soe, "Automatic Myanmar News Classification," Proceedings of the 15th International Conference on Computer Applications, February 2017, pp. 401-408.

[2] K. Nwet and S. Darren, "Machine Learning Algorithms for Myanmar News Classification," Journal of Natural Language Processing, vol. 8, 2019, pp. 17-24.