NewsHub

Category: Feature extraction
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-08-01 21:01:38
Uploader: sh-1993
Description: Newshub is a comprehensive news aggregation and summarization suite. It uses AI to scrape, categorize, and summarize news from various sources, providing a clear and concise overview of current news topics.

File list:
dailynewsapi/ (0, 2023-11-12)
dailynewsapi/application.py (4305, 2023-11-12)
dailynewsapi/requirements.txt (27, 2023-11-12)
dailynewsapi/schema.json (7027, 2023-11-12)
dailynewsapi/wsgi.py (86, 2023-11-12)
main.py (9161, 2023-11-12)
misc_scripts/ (0, 2023-11-12)
misc_scripts/NewsQuery/ (0, 2023-11-12)
misc_scripts/NewsQuery/analyzer.py (3209, 2023-11-12)
misc_scripts/NewsQuery/config.py (574, 2023-11-12)
misc_scripts/NewsQuery/data_handler.py (2807, 2023-11-12)
misc_scripts/NewsQuery/document_retriever.py (1277, 2023-11-12)
misc_scripts/NewsQuery/gpt_response.py (4177, 2023-11-12)
misc_scripts/NewsQuery/gui.py (14187, 2023-11-12)
misc_scripts/NewsQuery/main.py (179, 2023-11-12)
misc_scripts/NewsQuery/summarizer.py (424, 2023-11-12)
misc_scripts/NewsQuery/text_processor.py (1181, 2023-11-12)
misc_scripts/convert_cache_json.py (1877, 2023-11-12)
misc_scripts/summarize_isw.py (15537, 2023-11-12)
misc_scripts/telegram_gpt.py (14628, 2023-11-12)
modules/ (0, 2023-11-12)
modules/__init__.py (0, 2023-11-12)
modules/cache_files.py (1310, 2023-11-12)
modules/classifier.py (2096, 2023-11-12)
modules/config_parse.py (178, 2023-11-12)
modules/errors.py (1119, 2023-11-12)
modules/ftp_uploader.py (659, 2023-11-12)
modules/locations.py (8208, 2023-11-12)
modules/refine_sup_sum.py (7877, 2023-11-12)
modules/scraper.py (2147, 2023-11-12)
modules/similarity.py (3280, 2023-11-12)
modules/sum_summaries.py (9101, 2023-11-12)
modules/summarizer.py (6638, 2023-11-12)
modules/super_summary.py (5266, 2023-11-12)
news_twitter_similarity_proposed.py (5611, 2023-11-12)
newsletter_matrix_proposed.py (9870, 2023-11-12)
requirements.txt (80, 2023-11-12)
... ...

# NewsPlanetAI: AI-Powered News Summarization and Broadcasting

## Overview

NewsPlanetAI uses AI models to scrape, summarize, and broadcast news. The system automatically fetches news articles from various sources, categorizes and summarizes them, and compiles the summaries into a concise Super Summary. The Super Summary is converted to audio with Eleven Labs Text-to-Speech (TTS), in a voice resembling a blend of David Attenborough and Morgan Freeman, and is then uploaded to SoundCloud for public access.

## Project Structure

The project comprises several Python scripts, each with a specific role in the news aggregation and summarization process:

- **scraper.py:** Scrapes headlines from various news sources.
- **classifier.py:** Categorizes headlines using OpenAI's GPT-3.5 Turbo model.
- **cache_files.py:** Provides functions for saving and loading data from cache files.
- **errors.py:** Provides a function for making robust API calls to OpenAI.
- **ftp_uploader.py:** Uploads files to an FTP server.
- **summarizer.py:** Generates summaries for articles and organizes them by category.
- **super_summary.py:** Generates a "super summary" from a list of news summaries.
- **locations.py:** Extracts geographical locations and appends them to the news summaries.
- **main.py:** Orchestrates the overall process, from scraping to summarization.

## Process Flow

1. **Scraping Headlines**: Uses Goose3 and Feedparser to extract headlines from various news sources.
2. **Categorizing Headlines**: Employs GPT-3.5 Turbo 1106 to categorize headlines.
3. **Summarizing Articles**: Summarizes articles using GPT-3.5 Turbo 1106.
4. **Caching Summaries**: Stores summaries using the Python `pickle` module.
5. **Grouping Summaries by Category**: Organizes summaries by category for structured presentation.
6. **Generating Top Articles by Category**: Selects top articles using a Sentence Transformer model.
7. **Summarizing Daily Cache**: Condenses daily news using the BART Large CNN model.
8. **Generating Super Summary**: Compiles a comprehensive summary using GPT-4.
9. **Converting to Audio**: Transforms the Super Summary text into an audio file using Eleven Labs TTS.
10. **Uploading to SoundCloud**: The audio file is uploaded to SoundCloud manually.
11. **Social Media Promotion**: Generates promotional content for social media platforms.

## Technologies

- **AI Models**: GPT-3.5 Turbo, GPT-4, BART Large CNN, Sentence Transformers.
- **Text-to-Speech**: Eleven Labs TTS.
- **Data Handling**: Python, PyTorch, Goose3, Feedparser, `pickle`.
- **Geocoding and Mapping**: GeoPy.
- **Social Media Integration**: Custom scripts utilizing GPT models.

## Setup and Configuration

To run NewsPlanetAI, users need to set up their own website and provide its FTP details for uploading content. In addition, a `suite_config.ini` file must be created in the `modules` folder with the necessary configuration, such as API keys, model details, and category settings.

### Configuration File Structure

```ini
[General]
UseTqdm = True
TruncateMaxTokens = 2000

[FTP]
Host = ftp.fakenews.com
User = cortex
Password = AFancyDerickDotCom
Directory = /public_html/

[Cache]
Directory = cache
MaxAgeHours = 1
WeeklyCacheFile = cache/your_overall_cache.pkl
DailyCacheFile = cache/daily_summaries.p
LocCacheFile = cache/locations.pkl

[Summaries]
MaxSummaryLength = 800
MaxCacheAgeHours = 12

[Headlines]
Categories = World News, US News, ...

[Models]
CategorizeHeadlines = gpt-3.5-turbo-1106
SummarizeArticles = gpt-3.5-turbo-1106
SummarizeSuperSummary = gpt-3.5-turbo-1106
GetSuperSummary = gpt-4-1106-preview
SimilarityModel = sentence-transformers/all-MiniLM-L6-v2

[Retry]
SummarizeArticlesRetries = 3
SummarizeArticlesWaitTimeSeconds = 2
SummarizeSuperSummaryRetries = 3
SummarizeSuperSummaryWaitTimeSeconds = 2

[OPENAI]
OPENAI_API_KEY = YourOpenAIKey

[THRESHOLDS]
SIMILARITY_THRESHOLD = 0.7
TOP_N_ARTICLES = 1

[logging]
level = WARNING

[telegram]
api_id = getyourown
api_hash = getyourown
phone_number = +getyourown

[TWITTER]
api_key = getyourown
api_secret_key = getyourown
access_token = getyourown-getyourown
access_token_secret = getyourown
```

### Running the System

The system is designed to be run periodically (e.g., every hour) by a task scheduler that executes the `main.py` script.

## Contributions

NewsPlanetAI is an open-source project. Contributions, suggestions, and feedback that improve its functionality and accuracy are welcome.
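To illustrate the configuration file described under Setup and Configuration, here is a minimal sketch of reading such settings with Python's standard `configparser` module. The helper name `load_settings` and the keys of the returned dictionary are assumptions made for this example; the project's own `config_parse.py` may work differently. The sample text is a trimmed copy of the structure shown in the README, parsed from a string so the example runs without a file on disk.

```python
import configparser

# Trimmed sample of the suite_config.ini structure from the README.
SAMPLE_CONFIG = """
[General]
UseTqdm = True
TruncateMaxTokens = 2000

[Cache]
Directory = cache
MaxAgeHours = 1

[Headlines]
Categories = World News, US News

[Models]
SimilarityModel = sentence-transformers/all-MiniLM-L6-v2

[THRESHOLDS]
SIMILARITY_THRESHOLD = 0.7
TOP_N_ARTICLES = 1
"""


def load_settings(text):
    """Parse INI text and return a few typed settings (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw_categories = parser.get("Headlines", "Categories")
    return {
        "use_tqdm": parser.getboolean("General", "UseTqdm"),
        "truncate_max_tokens": parser.getint("General", "TruncateMaxTokens"),
        "cache_max_age_hours": parser.getint("Cache", "MaxAgeHours"),
        # The comma-separated category list becomes a list of clean names.
        "categories": [name.strip() for name in raw_categories.split(",")],
        "similarity_model": parser.get("Models", "SimilarityModel"),
        "similarity_threshold": parser.getfloat("THRESHOLDS", "SIMILARITY_THRESHOLD"),
        "top_n_articles": parser.getint("THRESHOLDS", "TOP_N_ARTICLES"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["categories"])
```

To read the real file instead, `parser.read("modules/suite_config.ini")` can replace `read_string`. With its default settings, `configparser` treats section names as case-sensitive and key names as case-insensitive, and it raises an error if a section defines the same key twice.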
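The `[Cache]` settings and the "Caching Summaries" step suggest a pickle file that is reused only while it is fresh. The sketch below shows one way this could look. The function names `save_cache` and `load_cache`, and the use of the file's modification time to measure age, are assumptions for illustration and are not taken from `cache_files.py`.

```python
import os
import pickle
import tempfile
import time


def save_cache(path, data):
    """Pickle `data` to `path`, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load_cache(path, max_age_hours):
    """Return the cached object, or None if the file is missing or too old."""
    if not os.path.exists(path):
        return None
    age_hours = (time.time() - os.path.getmtime(path)) / 3600
    if age_hours > max_age_hours:
        return None
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Demo in a temporary directory, using made-up summary data.
with tempfile.TemporaryDirectory() as cache_dir:
    cache_path = os.path.join(cache_dir, "daily_summaries.p")
    save_cache(cache_path, {"World News": ["Example summary"]})
    fresh = load_cache(cache_path, max_age_hours=1)

    # Backdate the file by two hours to simulate an expired cache.
    two_hours_ago = time.time() - 2 * 3600
    os.utime(cache_path, (two_hours_ago, two_hours_ago))
    stale = load_cache(cache_path, max_age_hours=1)

print(fresh, stale)
```

Because `pickle.load` can execute arbitrary code while deserializing, cache files should only be loaded when they come from a trusted source.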
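The `[Retry]` settings (a retry count and a wait time in seconds) together with the description of `errors.py` point to a retry wrapper around API calls. A minimal sketch of that pattern follows. The name `call_with_retries`, its signature, and the reading of the retry count as the total number of attempts are assumptions; the project's actual helper is not shown in the README.

```python
import time


def call_with_retries(func, retries=3, wait_seconds=2.0, sleep=time.sleep):
    """Call func(); on an exception, wait and try again, up to `retries` attempts.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as error:  # a real client would catch narrower API errors
            last_error = error
            if attempt < retries:
                sleep(wait_seconds)
    raise last_error


# Demo with a stand-in for an API call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "summary text"


result = call_with_retries(flaky_call, retries=3, wait_seconds=0)
print(result, attempts["count"])
```

Passing `sleep` as a parameter keeps the wrapper easy to test, since a test can substitute a function that does not actually wait.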
