TezHaber

Category: Feature extraction
Development tool: Others
File size: 14KB
Downloads: 0
Upload date: 2020-01-29 00:07:00
Uploader: sh-1993
Description: Türkçe Haber Toplayıcı ve Özetleyici | Turkish News Aggregator & Summarizer

File list (size in bytes, date):
LICENSE (35149, 2020-01-29)
TezHaber-Core (0, 2020-01-29)
TezHaber-Scraper (0, 2020-01-29)
turkish-text-summarizer (0, 2020-01-29)

# TezHaber Türkçe Haber Toplayıcı ve Özetleyici | Turkish News Aggregator & Summarizer

## Our Goal

The same information is often spread across many sources, such as news outlets, Twitter topics, or educational resources. This project gathers that information and summarizes it, so that readers lose less time and are less likely to pick up incorrect information. Toward that aim, this work collects the news sites that report the same story and summarizes their coverage.

## Architecture

- News items are fetched from the RSS feeds of various news sources and saved to a JSON file.
- The news texts in that JSON file are cleaned up with Zemberek.
- The Zemberek-processed texts are classified to detect articles that report the same story.
- The articles in each group of similar stories are summarized.
- The results are presented through web and mobile interfaces.

## Usage

```
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
cd TezHaber-Core
python main.py
```

### Libraries

### References

* Centroid-based Text Summarization through Compositionality of Word Embeddings

```
@inproceedings{rossiello-etal-2017-centroid,
    title = "Centroid-based Text Summarization through Compositionality of Word Embeddings",
    author = "Rossiello, Gaetano and Basile, Pierpaolo and Semeraro, Giovanni",
    booktitle = "Proceedings of the {M}ulti{L}ing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W17-1003",
    doi = "10.18653/v1/W17-1003",
    pages = "12--21",
    abstract = "The textual similarity is a crucial aspect for many extractive text summarization methods. A bag-of-words representation does not allow to grasp the semantic relationships between concepts when comparing strongly related sentences with no words in common. To overcome this issue, in this paper we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. The evaluations on multi-document and multilingual datasets prove the effectiveness of the continuous vector representation of words compared to the bag-of-words model. Despite its simplicity, our method achieves good performance even in comparison to more complex deep learning models. Our method is unsupervised and it can be adopted in other summarization tasks.",
}
```

* [Zemberek-Python-Examples](https://github.com/ozturkberkay/Zemberek-Python-Examples)
* [text-summarizer](https://github.com/gaetangate/text-summarizer)
* [NewsScraper](https://github.com/holwech/NewsScraper)

## Licenses

GPL-3.0

### Open-source code we use

* [zemberek-nlp](https://github.com/ahmetaa/zemberek-nlp): Apache-2.0
* [text-summarizer](https://github.com/gaetangate/text-summarizer): GPL-3.0
* [gensim](https://github.com/RaRe-Technologies/gensim): LGPL-2.1
* [Zemberek-Python-Examples](https://github.com/ozturkberkay/Zemberek-Python-Examples): Apache-2.0
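The first architecture step pulls news from RSS feeds into a JSON file. The README does not document the feed URLs or the JSON schema. The following is therefore only a minimal sketch, assuming standard RSS 2.0 feeds. It uses Python's built-in `xml.etree.ElementTree` to turn each `<item>` into a dict and writes the list as UTF-8 JSON. The function names `rss_to_records`/`save_records` and the record keys are hypothetical, not the project's actual code.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET


def rss_to_records(rss_xml: str | bytes, source: str) -> list[dict]:
    """Turn every <item> of an RSS 2.0 document into a plain dict."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            # Hypothetical field names; the project's real JSON schema is not documented.
            "source": source,
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return records


def save_records(records: list[dict], path: str) -> None:
    """Write the collected records to a UTF-8 JSON file."""
    # ensure_ascii=False keeps Turkish letters (ç, ğ, ı, ö, ş, ü) readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

With a live feed, the raw bytes from `urllib.request.urlopen(url).read()` can be passed straight to `rss_to_records`, since `ET.fromstring` accepts bytes as well as strings.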
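The architecture's third step classifies the cleaned texts to find articles that report the same story, but the README does not say which method is used. One simple way to do this is sketched below, under stated assumptions rather than as TezHaber's actual classifier. Each article becomes a bag-of-words vector, and any pair with cosine similarity at or above a threshold is linked. The connected groups are then returned via union-find. The threshold value of 0.5 and all function names are hypothetical, and plain lowercasing stands in for the Zemberek processing of the previous step.

```python
from __future__ import annotations

import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware, so ç, ğ, ı, ö, ş, ü are kept


def bag_of_words(text: str) -> Counter:
    # Plain lowercasing stands in for Zemberek normalization here; note that
    # str.lower() is not Turkish-aware ("I" becomes "i", not "ı").
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_same_stories(texts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Link every pair with cosine >= threshold and return the connected groups as index lists."""
    vectors = [bag_of_words(t) for t in texts]
    parent = list(range(len(texts)))  # union-find forest over article indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Two articles can end up in the same group through a third one even when they are not directly similar to each other. That is convenient for loosely worded coverage of one event, but a stricter scheme would compare each article against a group centroid instead. The pairwise loop is quadratic, which is only reasonable for modest batches of articles.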
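The summarization step cites the centroid-based method of Rossiello et al. (2017). The core idea can be illustrated with the simplified sketch below, which is not the code of TezHaber or of `text-summarizer`. A topic centroid is built by adding the word embeddings of the cluster's most important words. Each sentence vector is the sum of its word embeddings, and sentences are ranked by cosine similarity to the centroid. Any sentence that is a near-duplicate of one already picked is skipped. The paper selects topic words with a tf-idf threshold, whereas this sketch simply takes the `top_k` most frequent in-vocabulary words. The parameter names and defaults are hypothetical, and `word_vectors` is assumed to be a plain dict of embeddings.

```python
from __future__ import annotations

import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokens(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def sum_vectors(words: list[str], word_vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Compose a text vector by adding the embeddings of its in-vocabulary words."""
    total = [0.0] * dim
    for w in words:
        vec = word_vectors.get(w)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u: list[float], v: list[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def centroid_summary(sentences: list[str], word_vectors: dict[str, list[float]],
                     top_k: int = 10, max_sentences: int = 3,
                     redundancy: float = 0.95) -> list[str]:
    """Pick up to max_sentences sentences closest to the cluster's embedding centroid."""
    dim = len(next(iter(word_vectors.values())))
    # Simplification: topic words = top_k most frequent in-vocabulary words
    # (the paper selects them with a tf-idf threshold instead).
    counts = Counter(w for s in sentences for w in tokens(s) if w in word_vectors)
    topic_words = [w for w, _ in counts.most_common(top_k)]
    centroid = sum_vectors(topic_words, word_vectors, dim)

    sent_vecs = [sum_vectors(tokens(s), word_vectors, dim) for s in sentences]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(sent_vecs[i], centroid), reverse=True)

    chosen: list[int] = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Skip sentences that are near-duplicates of one already selected.
        if all(cosine(sent_vecs[i], sent_vecs[j]) < redundancy for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

In practice, `word_vectors` would hold pretrained Turkish word embeddings converted to a plain dict for this sketch. One source would be a word2vec model trained with gensim, which is listed among the open-source code the project uses. The toy 2-dimensional vectors in a quick test are only there to exercise the logic.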
