HyperPartisan_Classification_Using_BERT 联合开发网

Pudn.com > 下载中心 > 推荐系统 > HyperPartisan_Classification_Using_BERT

HyperPartisan_Classification_Using_BERT

text-classification BERT hyperpartisan-news-detection longformer large-language-models

所属分类：推荐系统
开发工具：Jupyter Notebook
文件大小：0KB
下载次数：0
上传日期：2023-07-01 19:54:37
上传者：sh-1993

说明：使用基于BERT的技术的超党派新闻文章分类系统。目标是利用最先进的transformer模块...，
(A hyperpartisan news article classification system using BERT-based techniques. The goal was to leverage state-of-the-art transformer models like BERT, ROBERTa, and Longformer to accurately classify news articles as hyperpartisan or non-hyperpartisan.)

文件列表:

Best512/
Stride/
Summarization/
0.6M.png
10K.png
Report.pdf
Statistics.ipynb
conda.sh
requirements.txt

# HyperPartisan Classification Using BERT-Based Methods and Longformers This repository contains the code and resources for developing a hyperpartisan news article classification system leveraging state-of-the-art transformer models like BERT, ROBERTa, and Longformer. ## Objective The primary goal of this project is to accurately classify news articles as hyperpartisan or non-hyperpartisan. The project explores and compares the performance of various transformer models and assesses the impact of incorporating Part-of-Speech (POS) tagging on classification accuracy. ## Tools and Models The project was implemented using the following tools and models: - **Programming Language:** Python - **Library:** Huggingface datasets - **Models:** - BERT (Bidirectional Encoder Representations from Transformers) - ROBERTa (A Robustly Optimized BERT Pretraining Approach) - Longformer (The Long-Document Transformer) ## Methods ### Method 1: Segment-Based Evaluation We divide the text into 512-token segments and evaluate the performance of our models on these segments. This method helps us identify which parts of the document influence categorization accuracy. ### Method 2: Summarization-Based Approach We condense the documents into summaries of 512 tokens each using the Hugging Face summarization pipeline. This summarized version is used for classification tasks. ### Method 3: Stride-Based Segmentation We refer to each 512-token segment of a document and introduce a 'stride' that represents the shared tokens between consecutive segments. This method allows us to handle documents of varying lengths. ## Detailed Report A comprehensive report detailing the methods, experiments, and results is available in this repository. Please refer to the [report](https://github.com/AbineshSivakumar/HyperPartisan_Classification_Using_BERT/blob/master/Report.pdf) for a complete understanding of the project. ## Outcome - **Hyperpartisan Classification System:** Developed a robust system capable of classifying news articles into hyperpartisan or non-hyperpartisan categories. - **Comparative Analysis:** Conducted a detailed comparison between BERT and ROBERTa models to understand their strengths and weaknesses. - **Longformer Exploration:** Investigated the effectiveness of Longformer in handling lengthy news articles, a common challenge in text classification tasks. ## Getting Started ### Prerequisites - Python 3.x - Huggingface datasets library - PyTorch ### Installation 1. Clone the repository: ```bash git clone https://github.com/AbineshSivakumar/HyperPartisan_Classification_Using_BERT ``` 2. Install all the necessary packages ```bash pip install -r requirements.txt ``` ### Usage Run into each method and execute the respective jupyter notebooks

近期下载者：

相关文件：

评论：[我要评论] [举报此文件]

收藏者：