ifaxbotcovid

Category: Chat room
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-11-06 09:00:33
Uploader: sh-1993
Description: One-task Telegram bot for news reporters. Rewrites the daily press release about new COVID-19 cases in Russia into coherent news agency material.

File list:
LICENSE (1072, 2023-11-06)
Procfile (22, 2023-11-06)
ifaxbotcovid/ (0, 2023-11-06)
ifaxbotcovid/__init__.py (0, 2023-11-06)
ifaxbotcovid/bot/ (0, 2023-11-06)
ifaxbotcovid/bot/__init__.py (0, 2023-11-06)
ifaxbotcovid/bot/bot_testmode.py (1157, 2023-11-06)
ifaxbotcovid/bot/factory.py (590, 2023-11-06)
ifaxbotcovid/bot/handlers.py (8591, 2023-11-06)
ifaxbotcovid/bot/helpers.py (12104, 2023-11-06)
ifaxbotcovid/bot/logic.py (5438, 2023-11-06)
ifaxbotcovid/config/ (0, 2023-11-06)
ifaxbotcovid/config/__init__.py (0, 2023-11-06)
ifaxbotcovid/config/logging.ini (692, 2023-11-06)
ifaxbotcovid/config/messagehelp.txt (3129, 2023-11-06)
ifaxbotcovid/config/messagestart.txt (993, 2023-11-06)
ifaxbotcovid/config/templates/ (0, 2023-11-06)
ifaxbotcovid/config/templates/flashtemplate.txt (594, 2023-11-06)
ifaxbotcovid/config/templates/rpntemplate.txt (451, 2023-11-06)
ifaxbotcovid/config/templates/texttemplate.txt (1673, 2023-11-06)
ifaxbotcovid/config/utils/ (0, 2023-11-06)
ifaxbotcovid/config/utils/__init__.py (0, 2023-11-06)
ifaxbotcovid/config/utils/helpmessage.py (119, 2023-11-06)
ifaxbotcovid/config/utils/settings.py (886, 2023-11-06)
ifaxbotcovid/config/utils/startmessage.py (202, 2023-11-06)
ifaxbotcovid/config/utils/tmploader.py (902, 2023-11-06)
ifaxbotcovid/parser/ (0, 2023-11-06)
ifaxbotcovid/parser/__init__.py (0, 2023-11-06)
ifaxbotcovid/parser/dateline.py (2737, 2023-11-06)
ifaxbotcovid/parser/lib/ (0, 2023-11-06)
ifaxbotcovid/parser/lib/__init__.py (0, 2023-11-06)
ifaxbotcovid/parser/lib/addition_lib.py (2611, 2023-11-06)
ifaxbotcovid/parser/lib/regions.py (2138, 2023-11-06)
ifaxbotcovid/parser/regexp/ (0, 2023-11-06)
... ...

# ifaxBotCovid - reporter's helper

Currently available at https://t.me/ifaxcovidbot.

Russian: [here](https://github.com/dexpiper/ifaxbotcovid/blob/config/README_rus.md)

### Table of contents

1. [Introduction](#introduction)
2. [Technologies and libraries](#technologies-and-libraries)
3. [Scope of functionalities](#scope-of-functionalities):
    * [Parser](#core-parser)
    * [Intermediate logic](#intermediate-logic-covidchef)
4. [Examples of use](#examples-of-use)
5. [Project structure](#structure)
6. [Sources](#sources)
7. [Further development](#further-development)

***

### Introduction

A simple Telegram bot. Since October 2020 it has helped journalists extract data from Russia's official daily COVID-19 press releases and write news materials based on them. It was started by a journalist for colleagues at Interfax News Group (Moscow) to save time on a daily routine.

The bot neither stores data nor pulls any from the web; it just parses the text it is sent and fills in a pre-written form:

> user sends text *==>*
> bot finds variables and organizes them *==>*
> bot fills templates with found variables *==>*
> bot sends back ready news material.

You can test the bot with any of the sample press releases from the *ifaxbotcovid/tests/test_data* folder of this repo. The ones with "corrupted" in their names will yield an error.

* The bot is set up to find *11 variables* in the COVID-19 press release issued by the Russian authorities: the numbers of registered COVID patients, COVID-related deaths and recovered patients over the last 24 hours, in Russia and in Moscow separately, the totals of these variables since the start of the pandemic, etc.
* The bot also rewrites the long tables of new cases and deaths in the release, grouping regions by the number of cases/deaths. The algorithm detects the type of the corresponding part of the text, collects all the information into a dictionary and builds the ready text: region by region, string by string (implemented in *ifaxbotcovid.regioncounter*).
* All variables, found and generated, together with the rewritten region tables go into the news material template. With all the gaps filled, the ready news material is sent to the user.

### Technologies and libraries

The project is created with:

* Python 3.9.7
* [pyTelegramBotAPI 4.2.0](https://github.com/eternnoir/pyTelegramBotAPI)
* Flask/gunicorn for handling and forwarding requests
* pytest (tests), venv (virtual environment), Git (version control)
* yaml for parsing yml files
* Key functionality (finding values in text) depends heavily on the *re* module from the Standard Library, which provides regular expression operations.
* The bot runs on *Heroku*, deployed directly from the main branch of this repository.

### Scope of functionalities

#### Core: Parser

- The core functionality is provided by the *ifaxbotcovid.parser.textparser* module.
- Textparser finds the values essential for the future news material with pre-written regular expressions (the *re* module from the Python Standard Library). The regexes are defined in [*/ifaxbotcovid/parser/regexp/ dir*](https://github.com/dexpiper/ifaxbotcovid/blob/main/ifaxbotcovid/parser/regexp/regex.py). The bot is designed to cope with some deviations in the press-release text, so most of the variables have 2-3 regexes to try.
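The idea of trying several regexes per variable can be illustrated with a minimal sketch. The patterns and the `find_first` helper below are hypothetical and written only for this example; the project's real expressions live in `regex.py` and may look quite different.

```python
import re
from typing import Sequence

# Hypothetical patterns for the daily number of new cases.
# They are tried in order; the first one that matches wins.
NEW_CASES_PATTERNS = (
    re.compile(r"COVID-19\s*[–-]\s*(\d[\d ]*)\s+в\s+\d+\s+регион"),
    re.compile(r"подтвержден\w*\s+(\d[\d ]*)\s+случа"),
)

# Assumed placeholder for a value that no pattern could find.
NO_VALUE = "NO_VALUE"


def find_first(text: str, patterns: Sequence[re.Pattern]) -> str:
    """Return the first captured number as a digit string, or NO_VALUE."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            # Drop thousands separators such as "18 283".
            return match.group(1).replace(" ", "")
    return NO_VALUE
```

Keeping the patterns in an ordered tuple means a new wording of the release can be supported by appending one more expression, without touching the lookup code.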
**Variables to find**:

- *officially registered over the last 24h, in Moscow and in Russia:*
    - COVID cases
    - COVID deaths
    - recovered patients
- *in Russia, since the start of the pandemic:*
    - total number of cases
    - total number of deaths
    - total number of recovered
- growth rate ("pace")
- quotation from the start of the release

**Variables to generate**:

- Active cases = total cases - (total deaths + total recovered)
- Dateline and name of the day of the week (from *time.time*, with the correct morphological form and preposition in Russian)
- Rewritten tables of statistics for other regions

Further notes:

- All the found variables and ready region blocks are fitted into the templates stored in */ifaxbotcovid/config/templates/*. As a result, the bot produces ready-to-use news material and sends it to the user.
- If the message with the raw text ends with the characters 'йй', the bot also provides the user with a log of the operations performed. The log is sent in a separate message.
- The bot warns the user if some of the variables are suspiciously small (defined in */ifaxbotcovid/config/settings.yml*). The user also gets a notice if some of the variables have not been found (the gap is filled with a *NO_VALUE* dummy).
- The bot accepts a special after-text command '$$*int*' to request the ready text as a file; the optional integer parameter is used to construct the list of Russian regions with new COVID-19 cases.

**Additional feature**

The bot can also parse a short piece of text provided daily by the press office of RPN (Rospotrebnadzor, the Russian federal agency in charge of virus protection), containing information about COVID tests performed in Russia.
- If the bot detects this kind of text in an incoming message (it is a simple and rather short string), it finds 3 variables in it:
    + total number of tests,
    + tests done over the last 24 h,
    + people under medical monitoring
- Then it puts the variables into a template (*/ifaxbotcovid/config/templates/*) and sends the user a ready-to-use block of text in reply.

This feature, implemented in *ifaxbotcovid.parser.rpn*, is pretty straightforward. All the regular expressions for it are defined in that module, not in *ifaxbotcovid.parser.regexp.regex*.

#### Intermediate logic: CovidChef

The bot acts as a tiny web app written in Flask, receiving new Telegram messages via a webhook. When a message arrives, Flask passes it to Telebot (pyTelegramBotAPI), which calls the corresponding internal module and sends back the ready template with all the mined data inside.

Telegram <==> Flask <=> Telebot <=> Parser

The idea behind the bot was straightforwardness and simplicity: the user sends text and receives ready material in reply - no commands, no files, no settings. This 'just-send-and-receive' approach raised two problems:

- Telegram tends to cut long messages sent to a bot into pieces, but *ifaxbotcovid.parser.textparser* should get them all in one piece.
- *ifaxbotcovid* should distinguish a raw COVID-19 press release from the short COVID-test report provided by the Russian agency RPN. The bot should also ignore any messages that are neither press releases nor RPN reports.

At first, official COVID-19 press releases in Russia were rather short - Telegram used to cut them into just two pieces. But in the fall of 2021 the daily pandemic statistics became more detailed, so I had to rewrite the previous code and add a new structural element - an intermediate logic layer between Telebot and the internal scripts:

Telegram <======> Flask <=> Telebot <=> CovidChef <=> Parser

CovidChef is organized as a custom Python class, initialized along with Flask and Telebot.
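The first problem above, reassembling a release that Telegram delivered in several pieces, could be approached roughly as follows. This is a minimal sketch and not the project's CovidChef: the `ChunkBuffer` class, the end marker and the buffer size are assumptions made for illustration only.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

# Hypothetical end-of-release marker, borrowed from the last line of the
# sample press release; the real bot reads its key words from settings.yml.
END_MARKER = "выписано по выздоровлению по России"


class ChunkBuffer:
    """Collects consecutive message chunks per user until the text looks complete."""

    def __init__(self, max_chunks: int = 10) -> None:
        # One bounded deque per user id; old chunks fall off automatically.
        self._chunks: Dict[int, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_chunks)
        )

    def add(self, user_id: int, text: str) -> Optional[str]:
        """Store a chunk; return the glued text once the end marker is seen."""
        chunks = self._chunks[user_id]
        chunks.append(text)
        if END_MARKER in text:
            glued = "\n".join(chunks)
            del self._chunks[user_id]  # start fresh for the next release
            return glued
        return None  # still waiting for the rest of the release
```

A handler would call `add()` for every incoming message and hand the result to the parser only when it is not `None`.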
In general:

1) Telebot passes every received message to the Chef
2) The Chef does its magic
3) Telebot sends the Chef's answers back to the user

Under the bonnet, the Chef stores messages in a temporary MessageStorage (using a deque), glues together sequential messages from a single user that share a certain context (defined in */ifaxbotcovid/config/settings.yml*) and calls *ifaxbotcovid.parser.textparser* when the glued text appears complete. The Chef's answer is a separate class with a special flag. The flag signals that the last message has been properly cooked and the ready answer can be sent to the user. The Chef also quickly cooks a short RPN report when it receives one, calling *ifaxbotcovid.parser.rpn*.

**Environment variables**

Two env vars should be defined:

- Telegram bot *TOKEN* (from BotFather)
- *URL* where Telegram would send messages

The webhook should be registered manually with a "*URL*/setwebhook" request. A token for testing purposes can be defined in *ifaxbotcovid.config.token.TOKEN*.

**Templates**

Templates are stored in */ifaxbotcovid/config/templates/*. Any other templates can be used instead of the default ones; placeholders for variables are written in curly brackets:

```
Some text here: {name_of_variable1}, {name_of_variable2}.
Another piece of text etc.
```

Names of variables and their descriptions are listed in the readme file.

***

### Examples of use

Raw press release:

```
Оперативная сводка на 30.10.2020

За последние сутки в России подтвержденных случаев новой коронавирусной инфекции COVID-19 – 18283 в 84 регионах, в том числе выявлено активно 4072 (22,3%) без клинических проявлений.

Распределение по субъектам
1 Москва 5268
2 Санкт-Петербург 801
3 Московская область 524
<...>

В Российской Федерации нарастающим итогом зарегистрировано 1 599 976 случая (+1,2%) коронавирусной инфекции в 85 регионах.

За последние сутки подтверждено 355 летальных случаев:
Воронежская область 1
Ивановская область 3
Калужская область 3
Костромская область 1
Курская область 4
Липецкая область 1
Московская область 13
<...>
Амурская область 2
Хабаровский край 2

За весь период по России умерло 27 656 человек.

За прошедшие сутки выписано по выздоровлению 14 519 человек:
Белгородская область 128
Брянская область 56
Владимирская область 78
<...>
Хабаровский край 68
Сахалинская область 110
Чукотский автономный округ 1

За весь период выписано по выздоровлению по России – 1 200 560
```

Outcome with default template:

```
ЭМБАРГО
МОЛНИЯ
18283 НОВЫХ СЛУЧАЕВ COVID-19 В РФ (+1,2%), 355 УМЕРШИХ, 14,52 ТЫС. ВЫПИСАННЫХ – ОПЕРШТАБ
МОЛНИЯ
В МОСКВЕ 5268 НОВЫХ СЛУЧАЕВ КОРОНАВИРУСА, 69 СМЕРТЕЙ, 3985 ВЫПИСАННЫХ – ОПЕРШТАБ
ЭМБАРГО

ЭМБАРГО
ЭКСПРЕСС-РОССИЯ-COVID-СТАТИСТИКА
18,28 тыс. новых случаев COVID-19 в РФ, 355 умерших - оперштаб

Москва. 30 октября. ИНТЕРФАКС - Суточный прирост новых заболевших коронавирусной инфекцией составил 18,28 тыс. случаев, умерли за сутки 355 пациентов, следует из данных оперативного штаба, обнародованных в пятницу.

"За последние сутки в России подтвержденных случаев новой коронавирусной инфекции COVID-19 – 18283 в 84 регионах, в том числе выявлено активно 4072 (22,3%) без клинических проявлений", - говорится в сообщении штаба.

Нарастающим итогом в России зарегистрировано 1599976 случаев коронавирусной инфекции, 27656 умерших и 1200560 выписанных (14519 выписаны за последние сутки).

Таким образом, общее количество активных случаев в стране (общее число случаев за вычетом всех выздоровевших и всех умерших) на текущий момент составляет 371760.

ПОКАЗАТЕЛИ ПРИРОСТА И СМЕРТНОСТИ В СТОЛИЦЕ И РЕГИОНАХ

В Москве в пятницу, сообщает оперативный штаб, 5268 новых случаев COVID-19 за сутки, 69 смертей и 3985 выздоровевших.

По информации оперативного штаба, еще 801 новых случаев COVID-19 зафиксировано в Санкт-Петербурге, 524 - в Московской области, 401 - в Нижегородской области, 325 - в Архангельской области, 310 - в Ростовской области, 296 - в Воронежской области, 292 - в Красноярском крае, 284 - в Свердловской области, по 232 - в Иркутской области, Забайкальском крае. 231 - в Коми, 224 - в Хабаровском крае, 219 - в Алтайском крае, 216 - в Крыму, 215 - в Бурятии, <...> 151 - в Челябинской области,

В других регионах России суточный прирост не превышает 150.

Согласно данным оперштаба о смертности, 69 пациентов скончалось за сутки в Москве, 36 - в Санкт-Петербурге, 15 - в Ростовской области, по 13 летальных случаев в Московской области, Нижегородской области. По 8 умерших в Республике Алтай, Бурятии, Якутии, Иркутской области, Красноярском крае. <...> По 2 - в Крыму, Кабардино-Балкарии, Амурской области, Орловской области, Тульской области, Пензенской области, Челябинской области, Хабаровском крае, Севастополе. По 1 - в Башкирии, Марий Эл, Мордовии, Татарстане, Калининградской области, Кировской области, Воронежской области, Костромской области, Липецкой области, Рязанской области.

1**
ЭМБАРГО
```

### Structure

```
/
    wsgi.py - *Flask routes, Chef instance creation*
    manual_parse.py - *manual testing for textparser.py*
    manual_rpn_parse.py - *manual testing for rpn.py*
    /ifaxbotcovid
        /bot
            logic.py - *CovidChef to call parsers and to bring answers*
            helpers.py - *CovidChef helpers*
            factory.py - *Flask, Telebot and Chef start here*
            handlers.py - *Telebot handlers*
            ...
        /parser
            textparser.py - *chief module for parsing the big release*
            rpn.py - *module for parsing short RPN report*
            ...
            /lib
            /regexp
                regex.py - *regular expressions used by the 'textparser.py'*
                ...
        /config
            /templates
                flashtemplate.txt
                rpntemplate.txt
                texttemplate.txt
            /utils
                tmploader.py - *template loader*
                settings.py - *settings parser*
                ...
            logging.ini - *logging settings*
            settings.yml - *admins, some base vars and key words defined here*
            messagestart.txt - *message to answer /start command*
            messagehelp.txt - *message to answer /help command*
        /tests
            ...
            /unit_tests
                /bot
                /parser
            /test_data
                sample_xxx.txt - *input for tests; .txt files > 300 are treated as textparser.py input, those less than 300 as rpn.py input. Any "sample_xxx.txt" content can be used as valid input for the Telegram bot*
                corrupted_xxx.txt - *files with "corrupted" in their name are deliberately placed to raise an error during the tests*
                ...
```

### Sources

The project is inspired by:

* [the guide](https://tproger.ru/translations/telegram-bot-create-and-deploy/) on the tproger website
* [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) by Al Sweigart
* [Fluent Python](https://www.oreilly.com/library/view/fluent-python/9781491946237/) by Luciano Ramalho
* My grateful colleagues

First written for personal use as a bunch of Python scripts called from the command line, the stuff later turned into a simple but fast and pretty straightforward solution for fellow Interfax reporters.

### Further development

With a working parsing algorithm and a bot to launch it, it is rather simple to change the ready templates the values are put into. That means the bot could be tuned to the needs of a wide circle of editors and journalists, both in Russia and outside the country.
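
To show how little is involved in swapping a template, here is a minimal sketch of the filling step. It assumes the curly-bracket placeholders behave like Python `str.format` fields; the placeholder names and the `fill_template` helper are hypothetical and not taken from the project's templates. The figures are those of the 30.10.2020 example above.

```python
from collections import defaultdict

# Dummy used for values that were not found (name taken from the README).
NO_VALUE = "NO_VALUE"


def active_cases(total_cases: int, total_deaths: int, total_recovered: int) -> int:
    """Active cases = total cases - (total deaths + total recovered)."""
    return total_cases - (total_deaths + total_recovered)


def fill_template(template: str, values: dict) -> str:
    """Fill {placeholders}; any placeholder without a value becomes NO_VALUE."""
    # format_map looks keys up directly in the mapping, so a defaultdict
    # supplies the dummy for every missing placeholder.
    return template.format_map(defaultdict(lambda: NO_VALUE, values))


# Hypothetical one-line template with one deliberately missing value.
template = "Total cases: {total_cases}, active cases: {active_cases}, tests: {tests}."
values = {
    "total_cases": 1599976,
    "active_cases": active_cases(1599976, 27656, 1200560),  # 371760
}
print(fill_template(template, values))
```

Because the template is plain text, an editor in another newsroom could replace it without touching the parsing code, as long as the placeholder names stay the same.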
