purano
Category: Clustering algorithms
Development tool: Jupyter Notebook
File size: 1217KB
Downloads: 0
Upload date: 2022-06-08 19:20:15
Uploader: sh-1993
Description: News annotation and clustering
File list:
.dvc (0, 2021-07-28)
.dvc\config (187, 2021-07-28)
.dvc\plots (0, 2021-07-28)
.dvc\plots\confusion.json (740, 2021-07-28)
.dvc\plots\default.json (677, 2021-07-28)
.dvc\plots\scatter.json (654, 2021-07-28)
.dvc\plots\smooth.json (889, 2021-07-28)
.dvcignore (139, 2021-07-28)
.flake8 (292, 2021-07-28)
.travis.yml (472, 2021-07-28)
LICENSE (11356, 2021-07-28)
NOTICE (550, 2021-07-28)
baselines.ipynb (730406, 2021-07-28)
build_competition.sh (376, 2021-07-28)
competition (0, 2021-07-28)
competition\baselines.html (158, 2021-07-28)
competition\competition.yaml (2224, 2021-07-28)
competition\data.html (782, 2021-07-28)
competition\evaluation.html (105, 2021-07-28)
competition\logo.jpg (859470, 2021-07-28)
competition\organizers.html (288, 2021-07-28)
competition\overview.html (868, 2021-07-28)
competition\scoring_program (0, 2021-07-28)
competition\scoring_program\evaluate.py (2493, 2021-07-28)
competition\scoring_program\metadata (105, 2021-07-28)
competition\terms_and_conditions.html (634, 2021-07-28)
competition\timeline.html (301, 2021-07-28)
compile_proto.sh (70, 2021-07-28)
configs (0, 2021-07-28)
configs\annotator.jsonnet (3960, 2021-07-28)
configs\annotator_light.jsonnet (2647, 2021-07-28)
configs\cleaner.jsonnet (219, 2021-07-28)
configs\clusterer.jsonnet (673, 2021-07-28)
configs\training (0, 2021-07-28)
configs\training\distil_bert.jsonnet (222, 2021-07-28)
configs\training\gen_title.jsonnet (274, 2021-07-28)
... ...
# PuraNo - news annotation and clustering
[![Build Status](https://travis-ci.org/IlyaGusev/purano.svg?branch=master)](https://travis-ci.org/IlyaGusev/purano)
[![Code Climate](https://codeclimate.com/github/IlyaGusev/purano/badges/gpa.svg)](https://codeclimate.com/github/IlyaGusev/purano)
## Installation
Install Git, DVC and pip:
```
$ sudo wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list
$ sudo apt-get update
$ sudo apt-get install git dvc python3-pip
```
Clone the repository and install the Python requirements (Python 3.6+ recommended):
```
$ git clone https://github.com/IlyaGusev/purano
$ python3 -m pip install -r purano/requirements.txt
```
## Run pipeline
```
$ dvc pull
$ dvc repro
$ cat output/metrics.json
```
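To inspect the metrics from Python rather than with `cat`, a minimal sketch follows. It assumes that `output/metrics.json` holds a JSON object at the top level; the metric names and nesting are not documented here, so the helpers `load_metrics` and `format_metrics` are hypothetical and handle arbitrary keys.

```python
import json
from pathlib import Path


def load_metrics(path="output/metrics.json"):
    """Read the metrics file written by the pipeline (assumed to be a JSON object)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def format_metrics(metrics, prefix=""):
    """Flatten nested metric dicts into sorted 'name: value' lines."""
    lines = []
    for key, value in sorted(metrics.items()):
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested groups are joined with dots, e.g. "group.metric".
            lines.extend(format_metrics(value, prefix=name + "."))
        else:
            lines.append(f"{name}: {value}")
    return lines
```

After `dvc repro` has finished, `print("\n".join(format_metrics(load_metrics())))` lists every metric on its own line.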
WARNING: The clustering step requires more than 8 GB of RAM because it stores all N^2 pairwise distances.
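To judge whether a machine has enough memory, the quadratic growth can be estimated up front. The sketch below assumes a dense N x N matrix with a fixed number of bytes per distance (4 for float32, 8 for float64); the data type that the clusterer actually uses is not stated here, and both function names are hypothetical.

```python
import math


def dense_distance_matrix_bytes(n_items, bytes_per_value=4):
    """Bytes needed to hold a full n_items x n_items distance matrix."""
    return n_items * n_items * bytes_per_value


def max_items_for_budget(budget_bytes, bytes_per_value=4):
    """Largest N whose full N x N matrix fits into budget_bytes."""
    cells = budget_bytes // bytes_per_value
    n = int(math.sqrt(cells))
    # Correct for floating-point rounding in either direction.
    while n * n > cells:
        n -= 1
    while (n + 1) * (n + 1) <= cells:
        n += 1
    return n


for n in (10_000, 50_000, 100_000):
    gib = dense_distance_matrix_bytes(n) / 2**30
    print(f"N={n}: {gib:.1f} GiB at 4 bytes per distance")
```

Under these assumptions, 8 GiB holds the distances for roughly 46,000 documents at 4 bytes per value, or about 32,000 at 8 bytes per value, before counting any other memory the pipeline uses.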