Description: Online demo of the paper "Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality", published in the Second Scholarly Document Processing Workshop at NAACL-HLT 2021.
File list:
.flaskenv (37, 2023-01-10)
LICENSE (1082, 2023-01-10)
Procfile (25, 2023-01-10)
app.db (253952, 2023-01-10)
auto_summ.py (183, 2023-01-10)
config.py (444, 2023-01-10)
engine/ (0, 2023-01-10)
engine/__init__.py (192, 2023-01-10)
engine/core/ (0, 2023-01-10)
engine/core/engine_summarization.py (6773, 2023-01-10)
engine/forms.py (754, 2023-01-10)
engine/models.py (449, 2023-01-10)
engine/routes.py (1963, 2023-01-10)
engine/templates/ (0, 2023-01-10)
engine/templates/about_me.html (511, 2023-01-10)
engine/templates/base.html (977, 2023-01-10)
engine/templates/index.html (1751, 2023-01-10)
engine/templates/response.html (759, 2023-01-10)
engine/templates/summarization.html (593, 2023-01-10)
requirements.txt (491, 2023-01-10)
runtime.txt (13, 2023-01-10)
# Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality
This repository implements an [online demo](http://selene.research.cs.dal.ca:37637/) of the paper [*Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality*](https://aclanthology.org/2021.sdp-1.14/) published in the Second Scholarly Document Processing Workshop (SDProc 2021) at NAACL-HLT 2021.
## Usage
### Starting the server
This project is based on the [Flask](https://flask.palletsprojects.com/en/2.0.x/) framework. Detailed explanations of how to use Flask, and of this project's configuration files, can be found in the excellent [Flask Mega-Tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world). To start the server, run the following command in the root folder of this project:

```shell
flask run
```
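The repository ships a `.flaskenv` file that tells `flask run` which module to load. Its exact contents are not shown here, but following the Mega-Tutorial convention it is presumably along these lines (assumed, since `auto_summ.py` is the entry-point script in the file list):

```
FLASK_APP=auto_summ.py
```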
### Using as a library
Assuming you have your whole document in a single string, running

```python
from auto_summ.engine.core.engine_summarization import algorithm

centralities = algorithm(text)
```

will parse it into sentences, compute the centrality of each one according to the algorithm described in the paper, and return a pandas DataFrame with the following columns:
* **sentence**, which contains the sentences found in your document.
* **centrality**, which contains the relevance score (essentially the degree centrality) of each sentence.
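From that DataFrame, an extractive summary can be assembled by keeping the most central sentences. A minimal sketch (the DataFrame below is a hand-built stand-in for the real output; only the column names are taken from the description above):

```python
import pandas as pd

# Stand-in for the DataFrame returned by `algorithm`; the real one holds
# the sentences of your document and their degree-centrality scores.
df = pd.DataFrame({
    "sentence": ["First point.", "Key finding.", "Aside.", "Second finding."],
    "centrality": [0.42, 0.91, 0.10, 0.77],
})

# Keep the k most central sentences as the summary.
k = 2
summary = df.sort_values("centrality", ascending=False).head(k)["sentence"].tolist()
print(summary)  # ['Key finding.', 'Second finding.']
```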
## Features
* A detailed sentence tokenization process based on regular expressions that can accurately handle most cases found in scientific literature.
* Unlike the implementation described in the paper, this online demo runs on TF-IDF embeddings for the sake of speed and efficiency. You can easily swap in any of the pre-trained language models available at [sbert.net](https://www.sbert.net/).
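The TF-IDF variant mentioned above can be sketched in plain Python, independently of the repository's code: TF-IDF sentence vectors, pairwise cosine similarity, and degree centrality (the sum of a sentence's similarities to all others) as its relevance score. The sentence splitter below is deliberately naive; the repository uses a far more careful regex tokenizer.

```python
import math
import re
from collections import Counter

def sentence_centralities(text):
    # Naive sentence split on terminal punctuation (illustration only).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    n = len(sentences)
    # Term frequencies per sentence.
    tfs = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    # Document frequency over sentences, then a smoothed IDF.
    df = Counter(w for tf in tfs for w in tf)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    # TF-IDF vectors, stored as sparse dicts.
    vecs = [{w: tf[w] * idf[w] for w in tf} for tf in tfs]

    def cos(u, v):
        dot = sum(u[w] * v[w] for w in u if w in v)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    # Degree centrality: sum of similarities to every other sentence.
    return [
        (s, sum(cos(vecs[i], vecs[j]) for j in range(n) if j != i))
        for i, s in enumerate(sentences)
    ]
```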
## Installation

```shell
git clone https://github.com/jarobyte91/auto_summ.git
cd auto_summ
pip install -r requirements.txt
```
## Support
Feel free to send an email to jarobyte91@gmail.com or contact me through any of my social media.
## Contribute
Feel free to open an issue or submit a pull request on this repository.
## License
This project is licensed under the MIT License.