WikiSummarization
所属分类:特征抽取
开发工具:Python
文件大小:0KB
下载次数:0
上传日期:2023-12-15 03:32:36
上 传 者:
sh-1993
说明: WikiSummarization(维基摘要)
(WikiSummarization)
文件列表:
LICENSE.txt
WikiSummarization.py
WikiSummarization.pyproj
WikiSummarization.sln
requirements.txt
summary.txt
text.txt
## Table of Contents
- [Wikipedia Page Summarizer](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#wikipedia-page-summarizer)
- [Coming Soon](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#coming-soon)
- [Prerequisites](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#prerequisites)
- [Usage](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#usage)
1. [Clone or Download the Repository](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#1-clone-or-download-the-repository)
2. [Create a Virtual Environment (Optional but Recommended)](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#2-create-a-virtual-environment-optional-but-recommended)
3. [Install Dependencies](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#3-install-dependencies)
4. [Run the Script](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#4-run-the-script)
5. [View the Results](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#5-view-the-results)
- [License](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/#license)
# Wikipedia Page Summarizer
This Python script allows you to fetch and summarize content from Wikipedia articles using the Wikipedia API and the BERT Extractive Summarizer. It also saves the fetched text and the generated summary to separate text files.
Please be nice to wikipedia and me, as the code uses their free API, and an agent with my email specified. If you want to run the code, please modify it with your own information in the custom user agent variable.
Currently the topic is hard-coded to be "Dog".
# Coming Soon
- More organized output, possibly a database for storing pages and summaries
- Options to choose between summarization models
- Exploring the Wikipedia API for
- A "parseable" output, e.g. sections, subsections, references, etc.
- Search functionality, checking if a similar page exists, etc.
- Creating a UI for the script allowing for:
- Tracking/saving summaries
- Summarizing using a different model
### Prerequisites
Ensure that you have Python and pip installed on your system before creating the virtual environment and installing dependencies.
## Usage
1. **Clone or Download the Repository:**
- Clone or download this repository to your local machine:
```bash
git clone https://github.com/seanpatrickduggan/WikiSummarization.git
```
2. **Create a Virtual Environment (Optional but Recommended):**
- It's a good practice to create a virtual environment to isolate project dependencies. Open your terminal and navigate to the project directory:
```bash
cd WikiSummarization
```
- Create a virtual environment:
```bash
python -m venv venv
```
- Activate the virtual environment:
On Windows:
```bash
venv\Scripts\activate
```
On macOS and Linux:
```bash
source venv/bin/activate
```
3. **Install Dependencies:**
- With the virtual environment activated, install the required Python libraries from the `requirements.txt` file:
```bash
pip install -r requirements.txt
```
This will install the necessary libraries, including Wikipedia-API, BERT Extractive Summarizer, and Torch.
4. **Run the Script:**
- Modify the `topic` variable in the `main` function of `WikiSummarization.py` to specify the Wikipedia topic you want to summarize.
- Run the script:
```bash
python WikiSummarization.py
```
5. **View the Results:**
- The script will fetch the Wikipedia page for the specified topic, extract the text content, and save it to a file named `text.txt`. It will then use the BERT Extractive Summarizer to generate a summary and save it to a file named `summary.txt`.
## License
This project is licensed under the [GNU General Public License v3.0](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/LICENSE). For more details, see the [LICENSE](https://github.com/seanpatrickduggan/WikiSummarization/blob/master/LICENSE) file.
近期下载者:
相关文件:
收藏者: