Automatic-Myanmar-News-Classification

Category: Mathematical computing
Development tool: Jupyter Notebook
File size: 4569KB
Downloads: 0
Upload date: 2023-05-21 05:12:23
Uploader: sh-1993
Description: no intro
(Automatic Myanmar News Classification System using Multinomial Naive Bayes)

File list:
Burmese_News_Classification.ipynb (72548, 2023-05-21)
LICENSE (1067, 2023-05-21)
app.py (1587, 2023-05-21)
media (0, 2023-05-21)
media\news.png (7689, 2023-05-21)
media\result.gif (3357812, 2023-05-21)
mm-news-classification-dataset.csv (3834791, 2023-05-21)
requirements.txt (58, 2023-05-21)
stopword.txt (5941, 2023-05-21)
svm_model.sav (714900, 2023-05-21)
system_design.png (46532, 2023-05-21)
vectorizer.pickle (1292646, 2023-05-21)

# Automatic Myanmar News Classification

## Project Overview
Automatic Myanmar News Classification System using Linear SVM. I also evaluated other machine learning algorithms: Logistic Regression, Multinomial Naive Bayes, Random Forest, and Decision Tree. Linear SVM achieved the highest weighted F-score.
- A.H. Khine, K.T. Nwet, and K.M. Soe, in "Automatic Myanmar News Classification", proposed a system based on Naive Bayes. I used their dataset to train the model. [1]
- Tokenization is done with the pyidaungsu library, which is based on fastText.
- The vectorizer is TF-IDF.
- The n-gram range for TF-IDF is unigram + bigram.

## System Design
- I use the system design proposed in Nwet, Khin & Darren, Seth, "Machine Learning Algorithms for Myanmar News Classification". [2]

![System](https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification/blob/master/system_design.png)

## Dataset
The dataset is taken from Aye Hnin Khine's [repository](https://github.com/ayehninnkhine/MyanmarNewsClassificationSystem).

![Dataset](https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification/blob/master/media/news.png)

## Experiments
- For feature extraction, the text data is vectorized with the TF-IDF vectorizer available in scikit-learn.
- The resulting features are then used to train different machine learning models for classification.

| Model                       | F1-score     |
|:---------------------------:|:------------:|
| Decision Tree               | 67%          |
| Random Forest               | 82%          |
| Multinomial Naive Bayes     | 84%          |
| Logistic Regression         | 86%          |
| **Linear SVM**              | **88%**      |

## Demonstration
Demonstration available [HERE](https://share.streamlit.io/thuraaung1601/automatic-myanmar-news-classification/main/app.py)

![Demo](https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification/blob/master/media/result.gif)

## How to run demo
- Download this repository
- Install requirements
```{r, engine='bash', count_lines}
tra@thura-pc:~$ pip install -r requirements.txt
```
- Run the main notebook, Burmese_News_Classification.ipynb, for training
- For Demo
```{r, engine='bash', count_lines}
tra@thura-pc:~$ streamlit run app.py
```

## Future Works
- More Data is needed to
- Test with Hybrid methods and Deep Learning Approaches

## References
[1] A.H. Khine, K.T. Nwet, K.M. Soe, Automatic Myanmar News Classification, 15th Proceedings of International Conference on Computer Applications, February 2017, pp. 401-408.
[2] Nwet, Khin & Darren, Seth, Machine Learning Algorithms for Myanmar News Classification, Journal of Natural Language Processing, 8, 2019, pp. 17-24.
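
As an illustration of the feature-extraction step listed under Experiments, here is a minimal, dependency-free sketch of TF-IDF weighting over unigrams and bigrams. It is not the project's code: the project relies on scikit-learn's TF-IDF vectorizer and pyidaungsu tokenization. The helper names `ngrams` and `tfidf`, the English stand-in tokens, and the smoothed-idf and L2-normalization choices are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (joined by spaces) for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Compute one sparse TF-IDF vector (a dict) per document.

    docs: list of token lists that are assumed to be already segmented.
    Assumed weighting: raw term count times ln((1 + N) / (1 + df)) + 1,
    followed by L2 normalization of each document vector.
    """
    grams_per_doc = [Counter(ngrams(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for counts in grams_per_doc:
        df.update(counts.keys())
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

    vectors = []
    for counts in grams_per_doc:
        weights = {g: tf * idf[g] for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vectors


# Hypothetical, already-segmented tokens standing in for news text.
docs = [
    ["team", "wins", "final", "match"],
    ["market", "prices", "rise"],
    ["team", "loses", "final"],
]
vectors = tfidf(docs)
```

In this toy corpus, `"team"` occurs in two documents and `"wins"` in one, so `"wins"` receives the larger weight in the first vector. Bigrams such as `"team wins"` appear as features alongside the unigrams.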

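The overview compares models by weighted F-score. As a rough sketch of what that metric usually means, the snippet below averages per-class F1 scores weighted by how many true examples each class has. The function name `weighted_f1` and the labels are hypothetical and only for illustration. Whether the reported figures were computed exactly this way is an assumption.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and number of predictions for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical labels for illustration only; not taken from the project's dataset.
y_true = ["sport", "sport", "sport", "business", "business", "politics"]
y_pred = ["sport", "sport", "business", "business", "business", "sport"]
print(round(weighted_f1(y_true, y_pred), 3))  # prints 0.6
```

Because classes with more examples contribute more to the average, a model that does poorly on a rare category is penalized less than it would be under an unweighted (macro) average.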