Fake-News-Classify-with-BidirectionalLSTM-Word2Vec


File list:
Fake News Classify- with Bidirectional LSTM, Word2Vec .ipynb

# Developed a deep learning technique to identify when an article might be fake news.

##### Dataset Description

https://www.kaggle.com/competitions/fake-news/data

`train.csv`: A full training dataset with the following attributes:

- `id`: unique id for a news article
- `title`: the title of a news article
- `author`: author of the news article
- `text`: the text of the article; could be incomplete
- `label`: a label that marks the article as potentially unreliable (`1`: unreliable, `0`: reliable)

`test.csv`: A testing dataset with all the same attributes as `train.csv`, but without the label.

`submit.csv`: A sample submission.

### Introduction

This project explored the use of Long Short-Term Memory (LSTM) networks with Word2Vec embeddings to identify when an article might be fake news, framed as binary text classification. The goal was to develop a model that could effectively categorize text data into two predefined classes.

### Methodology

(Minimal, illustrative code sketches for each of the following steps are collected under *Code Sketches* at the end of this README.)

#### Data Preprocessing and Feature Engineering

Text data was cleaned to remove noise and irrelevant information. Stop words were eliminated, and words were lemmatized to reduce them to their base form. Text was then tokenized and padded to ensure a consistent sequence length.

#### Word Embeddings

Word2Vec was used to generate numerical representations (embeddings) for each word in the vocabulary. These embeddings capture semantic relationships between words.

#### Model Architecture

*(Architecture diagram omitted.)*

A Bidirectional LSTM network was employed for text classification. Bidirectional LSTMs process text in both directions, capturing long-range dependencies. L1 and L2 regularization were applied to prevent overfitting, and dropout was used to further reduce overfitting by randomly dropping neurons during training.

#### Training and Evaluation

The data was divided into training and testing/validation sets. The model was trained on the training set and evaluated on the unseen testing set. Early stopping was used to prevent overfitting by halting training once the validation loss ceased to improve. Model performance was measured using accuracy, precision, recall, and F1-score.

### Results

*(Result figures omitted.)*

The final model achieved an accuracy of `97%` on the testing set. The classification report showed high precision, recall, and F1-score for both classes, indicating a well-balanced and effective model.

### Discussion

The project successfully demonstrated the effectiveness of LSTM networks with Word2Vec embeddings for text classification. The high accuracy and balanced performance across classes suggest the model can accurately distinguish between the two categories.

### Conclusion

This project showcased the potential of LSTMs and Word2Vec for text classification tasks. The developed model achieved a high level of accuracy and demonstrated a good balance between precision and recall.

### Future Work

- Experiment with different hyperparameters to potentially improve model performance.
- Explore alternative text classification models (e.g., GRU, convolutional neural networks).
- Apply the model to a new text classification problem.
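### Code Sketches (Illustrative)

The notebook itself is not reproduced on this page, so the sketches below only illustrate the pipeline described above; they are not the author's exact code. First, preprocessing: cleaning, stop-word removal, lemmatization, tokenization, and padding, assuming NLTK and the TensorFlow Keras tokenizer. The column names `title`, `text`, and `label` come from the dataset description; the cleaning regex, the combined title+text corpus, and `MAX_LEN` are assumptions.

```python
import re

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# One-time NLTK data downloads:
# import nltk; nltk.download("stopwords"); nltk.download("wordnet")
STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text: str) -> str:
    """Lowercase, keep letters only, drop stop words, lemmatize."""
    text = re.sub(r"[^a-zA-Z]", " ", str(text)).lower()
    words = [lemmatizer.lemmatize(w) for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)

df = pd.read_csv("train.csv").fillna("")
corpus = (df["title"] + " " + df["text"]).map(clean_text)

MAX_LEN = 500  # assumed maximum sequence length
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
X = pad_sequences(tokenizer.texts_to_sequences(corpus),
                  maxlen=MAX_LEN, padding="post")
y = df["label"].values
```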
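Next, the Word2Vec step, sketched with gensim: train embeddings on the cleaned corpus and build a weight matrix aligned with the tokenizer's vocabulary. `EMBED_DIM`, the window size, and `min_count` are assumptions; `corpus` and `tokenizer` carry over from the preprocessing sketch.

```python
import numpy as np
from gensim.models import Word2Vec

EMBED_DIM = 100  # assumed embedding size
tokenized_docs = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokenized_docs, vector_size=EMBED_DIM,
               window=5, min_count=1)

vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding index 0
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    if word in w2v.wv:               # unseen words stay as zero vectors
        embedding_matrix[idx] = w2v.wv[word]
```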
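A sketch of the described architecture, assuming the TensorFlow 2.x Keras API: a Bidirectional LSTM over the (frozen) Word2Vec embeddings, with L1/L2 kernel regularization and dropout. The layer width, penalty strengths, and dropout rate are assumed values, not taken from the notebook.

```python
from tensorflow.keras.layers import (Bidirectional, Dense, Dropout,
                                     Embedding, LSTM)
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l1_l2

model = Sequential([
    # Initialize with the Word2Vec weights and keep them fixed.
    Embedding(vocab_size, EMBED_DIM, weights=[embedding_matrix],
              input_length=MAX_LEN, trainable=False),
    # Bidirectional LSTM with combined L1 + L2 penalties on the kernel.
    Bidirectional(LSTM(64, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),  # 1 = unreliable, 0 = reliable
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```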
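Finally, training and evaluation: a hold-out split, early stopping on validation loss, and a precision/recall/F1 report. The split ratio, batch size, epoch budget, and patience are assumptions.

```python
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.1,
          epochs=20, batch_size=64, callbacks=[early_stop])

# Threshold sigmoid outputs at 0.5 and report per-class metrics.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(classification_report(y_test, y_pred,
                            target_names=["reliable", "unreliable"]))
```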
