DeepNewsNet

Category: Natural Language Processing
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-09-18 22:36:41
Uploader: sh-1993
Description: DeepNewsNet - Advanced Text Classification for News

File list:
Bert_ClassificationProject/ (0, 2023-09-18)
Bert_ClassificationProject/.idea/ (0, 2023-09-18)
Bert_ClassificationProject/.idea/$CACHE_FILE$ (429, 2023-09-18)
iml (327, 2023-09-18)
Bert_ClassificationProject/.idea/dictionaries (161, 2023-09-18)
Bert_ClassificationProject/.idea/inspectionProfiles/ (0, 2023-09-18)
Bert_ClassificationProject/.idea/inspectionProfiles/Project_Default.xml (506, 2023-09-18)
Bert_ClassificationProject/.idea/inspectionProfiles/profiles_settings.xml (174, 2023-09-18)
Bert_ClassificationProject/.idea/misc.xml (298, 2023-09-18)
Bert_ClassificationProject/.idea/modules.xml (346, 2023-09-18)
Bert_ClassificationProject/LICENSE (1066, 2023-09-18)
Bert_ClassificationProject/THUCNews/ (0, 2023-09-18)
Bert_ClassificationProject/THUCNews/data/ (0, 2023-09-18)
Bert_ClassificationProject/THUCNews/data/class.txt (82, 2023-09-18)
Bert_ClassificationProject/THUCNews/data/dev.txt (551313, 2023-09-18)
Bert_ClassificationProject/THUCNews/data/test.txt (551596, 2023-09-18)
Bert_ClassificationProject/THUCNews/data/train.txt (9946122, 2023-09-18)
Bert_ClassificationProject/THUCNews/saved_dict/ (0, 2023-09-18)
Bert_ClassificationProject/THUCNews/saved_dict/DPCNN.ckpt (7377743, 2023-09-18)
Bert_ClassificationProject/THUCNews/saved_dict/TextCNN.ckpt (8514563, 2023-09-18)
Bert_ClassificationProject/THUCNews/saved_dict/TextRNN.ckpt (9069686, 2023-09-18)
Bert_ClassificationProject/THUCNews/saved_dict/model.ckpt (0, 2023-09-18)
Bert_ClassificationProject/__pycache__/ (0, 2023-09-18)
Bert_ClassificationProject/__pycache__/train_eval.cpython-38.pyc (4216, 2023-09-18)
Bert_ClassificationProject/__pycache__/utils.cpython-38.pyc (3466, 2023-09-18)
Bert_ClassificationProject/bert_pretrain/ (0, 2023-09-18)
Bert_ClassificationProject/bert_pretrain/bert_config.json (520, 2023-09-18)
Bert_ClassificationProject/bert_pretrain/vocab.txt (109540, 2023-09-18)
Bert_ClassificationProject/models/ (0, 2023-09-18)
Bert_ClassificationProject/models/ERNIE.py (2346, 2023-09-18)
Bert_ClassificationProject/models/__pycache__/ (0, 2023-09-18)
Bert_ClassificationProject/models/__pycache__/bert.cpython-38.pyc (2149, 2023-09-18)
Bert_ClassificationProject/models/bert.py (2315, 2023-09-18)
Bert_ClassificationProject/models/bert_CNN.py (3036, 2023-09-18)
Bert_ClassificationProject/models/bert_DPCNN.py (3776, 2023-09-18)
Bert_ClassificationProject/models/bert_RCNN.py (3014, 2023-09-18)
Bert_ClassificationProject/models/bert_RNN.py (2928, 2023-09-18)
Bert_ClassificationProject/pytorch_pretrained/ (0, 2023-09-18)
... ...

# DeepNewsNet - Advanced Text Classification for News

## Introduction

DeepNewsNet is a text classification system that uses both traditional machine learning and modern deep learning techniques to categorize news articles. The project started with foundational methods such as KNN, Naive Bayes, and SVM, then moved to more advanced architectures implemented in PyTorch, including TextCNN, TextRNN, TextRNN_att, FastText, TextRCNN, DPCNN, Transformer, and BERT. A Flask application provides web-based interaction with the trained models.

DeepNewsNet gives users, researchers, and developers a platform to understand, build upon, and interact with a news classification system. By combining traditional machine learning with recent deep learning architectures, we aim to push the boundaries of what's possible in news classification.

## Dataset Details

- THUCNews Sub-dataset: A subset of 200,000 news headlines spanning 10 categories. Traditional machine learning techniques were applied to this dataset first.
- Long-form News Dataset: A set of long news articles used as an alternative to THUCNews after traditional methods gave suboptimal results on the headlines.
- BBC News Classification: Available on Kaggle; this dataset was used for the Kaggle competition entry.

## Methodologies and Implementation

- Traditional ML:
  - Models: KNN, Naive Bayes, SVM
  - Results: Initial results with THUCNews were below expectations. Switching to a long-text news dataset yielded improved but still subpar performance.
- Deep Learning:
  - Framework: PyTorch
  - Models: TextCNN, TextRNN, TextRNN_att, FastText, TextRCNN, DPCNN, Transformer, BERT
  - Hybrid Models: BERT+DPCNN, BERT+TextRCNN
  - Results: Achieved significantly higher accuracy compared to traditional models.

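To make the traditional baseline listed above more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. It is not the project's implementation: the `NaiveBayesClassifier` class, the whitespace tokenization, and the toy English headlines are all assumptions made for illustration, and the real THUCNews headlines would need their own tokenization.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration; not part of the DeepNewsNet code.
    """

    def __init__(self):
        self.class_counts = Counter()             # number of documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocabulary = set()                   # all tokens seen in training

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            class_total = sum(self.token_counts[label].values())
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (class_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up headlines (not taken from THUCNews), tokenized by whitespace.
train_docs = [
    "stocks rally as markets rebound".split(),
    "central bank raises interest rates".split(),
    "team wins championship final".split(),
    "striker scores twice in league match".split(),
]
train_labels = ["finance", "finance", "sports", "sports"]

classifier = NaiveBayesClassifier().fit(train_docs, train_labels)
prediction = classifier.predict("bank cuts interest rates".split())
print(prediction)
```

Headlines are short, so each one contributes only a handful of tokens to the likelihood term. That may be one reason a bag-of-words baseline struggles on a headline-only dataset, though the source does not state the cause.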
## Web-based Demonstration

A web-based interface was built with Flask. It lets users input a news article and get its predicted category, along with the associated probability. This demonstrates the practical application of the DeepNewsNet models in a user-friendly environment.

## Kaggle Competition Results

Building on the deep learning methods, we participated in the BBC News classification challenge on Kaggle. We used models such as Bert-base-uncased, BERT+CNN, and BERT+RNN, achieving a score of 0.97959.
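The web demonstration described above returns a predicted category together with its probability. The sketch below shows one way that last step could work, assuming the model outputs one raw score (logit) per class and that a softmax turns those scores into probabilities. The function names, the example logits, and the three class names are hypothetical; the project's real labels are in `THUCNews/data/class.txt`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_probability(logits, class_names):
    """Return the most likely category and its probability.

    Hypothetical helper for illustration; not part of the DeepNewsNet code.
    """
    if len(logits) != len(class_names):
        raise ValueError("expected one logit per class name")
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {"category": class_names[best], "probability": probabilities[best]}


# Placeholder class names and logits, chosen only for this example.
CLASS_NAMES = ["finance", "sports", "technology"]
result = predict_with_probability([0.2, 2.5, -1.0], CLASS_NAMES)
print(result)
```

In a Flask view, a dictionary like `result` could be returned to the browser with `flask.jsonify`. How the actual demo formats its response is not stated in the source.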
