auto_summ

Category: Paper
Development tool: Python
Uploaded: 2023-01-10 by sh-1993
Description: Online demo of the paper "Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality", published in the Second Scholarly Document Processing Workshop at NAACL-HLT 2021.

File list:
.flaskenv (37, 2023-01-10)
LICENSE (1082, 2023-01-10)
Procfile (25, 2023-01-10)
app.db (253952, 2023-01-10)
auto_summ.py (183, 2023-01-10)
config.py (444, 2023-01-10)
engine/ (0, 2023-01-10)
engine/__init__.py (192, 2023-01-10)
engine/core/ (0, 2023-01-10)
engine/core/engine_summarization.py (6773, 2023-01-10)
engine/forms.py (754, 2023-01-10)
engine/models.py (449, 2023-01-10)
engine/routes.py (1963, 2023-01-10)
engine/templates/ (0, 2023-01-10)
engine/templates/about_me.html (511, 2023-01-10)
engine/templates/base.html (977, 2023-01-10)
engine/templates/index.html (1751, 2023-01-10)
engine/templates/response.html (759, 2023-01-10)
engine/templates/summarization.html (593, 2023-01-10)
requirements.txt (491, 2023-01-10)
runtime.txt (13, 2023-01-10)

# Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality

This repository implements an [online demo](http://selene.research.cs.dal.ca:37637/) of the paper [*Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality*](https://aclanthology.org/2021.sdp-1.14/), published in the Second Scholarly Document Processing Workshop (SDProc 2021) at NAACL-HLT 2021.

## Usage

### Starting the server

This project is based on the [Flask](https://flask.palletsprojects.com/en/2.0.x/) framework. Detailed explanations of how to use Flask and of this project's configuration files can be found in the excellent [Mega Flask tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world). To start the server, run the following command in the root folder of this project:

```shell
flask run
```

### Using as a library

Assuming you have your whole document in a single string, running

```python
from auto_summ.engine.core.engine_summarization import algorithm

centralities = algorithm(text)
```

will parse it into sentences, compute the centrality of each sentence according to the algorithm described in the paper, and return a Pandas dataframe with the following columns:

* **sentence**, which contains the sentences found in your document.
* **centrality**, which contains the relevance score (essentially the degree centrality) of each sentence.

## Features

* A detailed sentence tokenization process based on regular expressions that can accurately handle most cases found in scientific literature.
* Unlike the implementation in the paper, this online implementation runs on TF-IDF embeddings for the sake of speed and efficiency. You can easily change this to any of the pre-trained language models found at https://www.sbert.net/.
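The overall approach (embed sentences, build a similarity graph, rank by degree centrality) can be sketched as follows. This is an illustrative reimplementation with TF-IDF embeddings, not the actual engine code; the function name `score_sentences`, the `threshold` parameter, and the naive sentence splitter are assumptions for the sake of the example.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_sentences(text, threshold=0.1):
    # Naive sentence split on terminal punctuation; the real engine uses a
    # much more detailed regex-based tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Embed each sentence with TF-IDF and compute pairwise cosine similarities.
    embeddings = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(embeddings)
    # Degree centrality: count how many other sentences each one is
    # similar to above the threshold (subtracting the self-similarity).
    adjacency = (sim >= threshold).astype(int)
    centrality = adjacency.sum(axis=1) - 1
    return pd.DataFrame({"sentence": sentences, "centrality": centrality})


df = score_sentences(
    "Graph methods rank sentences. Embeddings measure similarity. "
    "Central sentences summarize the document."
)
print(df.sort_values("centrality", ascending=False))
```

Sentences with higher centrality are the ones most representative of the document, so a summary can be formed by taking the top-scoring rows.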
## Installation

```shell
git clone https://github.com/jarobyte91/auto_summ.git
cd auto_summ
pip install -r requirements.txt
```

## Support

Feel free to send an email to jarobyte91@gmail.com or contact me through any of my social media.

## Contribute

Feel free to use the Issue Tracker or Pull Requests of this repository.

## License

This project is licensed under the MIT License.
