HyperPartisan_Classification_Using_BERT

Category: Recommender systems
Development tool: Jupyter Notebook
Upload date: 2023-07-01 19:54:37
Uploader: sh-1993
Description: A hyperpartisan news article classification system using BERT-based techniques. The goal was to leverage state-of-the-art transformer models like BERT, RoBERTa, and Longformer to accurately classify news articles as hyperpartisan or non-hyperpartisan.

File list:
Best512/
Stride/
Summarization/
0.6M.png
10K.png
Report.pdf
Statistics.ipynb
conda.sh
requirements.txt

# HyperPartisan Classification Using BERT-Based Methods and Longformers

This repository contains the code and resources for developing a hyperpartisan news article classification system leveraging state-of-the-art transformer models like BERT, RoBERTa, and Longformer.

## Objective

The primary goal of this project is to accurately classify news articles as hyperpartisan or non-hyperpartisan. The project explores and compares the performance of various transformer models and assesses the impact of incorporating Part-of-Speech (POS) tagging on classification accuracy.

## Tools and Models

The project was implemented using the following tools and models:

- **Programming Language:** Python
- **Library:** Hugging Face `datasets`
- **Models:**
  - BERT (Bidirectional Encoder Representations from Transformers)
  - RoBERTa (A Robustly Optimized BERT Pretraining Approach)
  - Longformer (The Long-Document Transformer)

## Methods

### Method 1: Segment-Based Evaluation

We divide the text into 512-token segments and evaluate the performance of our models on these segments. This method helps us identify which parts of the document influence classification accuracy.

### Method 2: Summarization-Based Approach

We condense each document into a summary of at most 512 tokens using the Hugging Face summarization pipeline. The summary, rather than the full text, is then used for classification.

### Method 3: Stride-Based Segmentation

We again split each document into 512-token segments, but introduce a 'stride': the number of tokens shared between consecutive segments. This method allows us to handle documents of varying lengths.

## Detailed Report

A comprehensive report detailing the methods, experiments, and results is available in this repository. Please refer to the [report](https://github.com/AbineshSivakumar/HyperPartisan_Classification_Using_BERT/blob/master/Report.pdf) for a complete understanding of the project.

## Outcome

- **Hyperpartisan Classification System:** Developed a system that classifies news articles as hyperpartisan or non-hyperpartisan.
- **Comparative Analysis:** Conducted a detailed comparison between the BERT and RoBERTa models to understand their strengths and weaknesses.
- **Longformer Exploration:** Investigated the effectiveness of Longformer in handling lengthy news articles, a common challenge in text classification tasks.

## Getting Started

### Prerequisites

- Python 3.x
- Hugging Face `datasets` library
- PyTorch

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/AbineshSivakumar/HyperPartisan_Classification_Using_BERT
   ```

2. Install all the necessary packages:

   ```bash
   pip install -r requirements.txt
   ```

### Usage

Go into each method's directory and run the respective Jupyter notebooks.
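
For orientation before opening the notebooks, the segmentation described in Methods 1 and 3 can be sketched in plain Python. This is a minimal illustration, not the code used in this repository: the function name `split_into_segments`, the default stride of 0, and the choice to ignore special tokens such as `[CLS]` and `[SEP]` are all assumptions made for the example. With `stride=0` it yields the non-overlapping segments of Method 1; with a positive stride, consecutive segments share that many tokens, as in Method 3.

```python
from typing import List


def split_into_segments(
    token_ids: List[int], max_len: int = 512, stride: int = 0
) -> List[List[int]]:
    """Split a token sequence into windows of at most `max_len` tokens.

    `stride` is the number of tokens shared by consecutive windows
    (hypothetical helper; special tokens are not accounted for).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    if not token_ids:
        return []

    step = max_len - stride  # how far each new window advances
    segments = []
    start = 0
    while True:
        segments.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the document.
        if start + max_len >= len(token_ids):
            break
        start += step
    return segments


# Stand-in for a tokenized article of 1200 tokens.
tokens = list(range(1200))

plain = split_into_segments(tokens, max_len=512, stride=0)
overlapping = split_into_segments(tokens, max_len=512, stride=128)

print([len(s) for s in plain])        # [512, 512, 176]
print([len(s) for s in overlapping])  # [512, 512, 432]
```

In practice, Hugging Face tokenizers can produce overlapping windows directly through their `stride` and `return_overflowing_tokens` arguments; whether the notebooks here rely on that option is not stated in this README.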
