Equity-and-ETF-Global-News-Sentiment-Analysis-R-Python

Category: Finance and securities systems
Development tool: Jupyter Notebook
File size: 0KB
Downloads: 0
Upload date: 2023-12-22 14:47:38
Uploader: sh-1993
Description: The idea behind this project is to get an overview of market news from multiple sources at the same time.

File list:
ETF News Sentiment Generator.R
Scrapping ETF News Website Link From .ipynb
data_file.csv

# Equity-and-ETF-Global-News-Sentiment-Analysis-R-Python

![Screenshot 2023-12-22 041613](https://github.com/kantrishav/Equity-and-ETF-Global-News-Sentiment-Analysis-R-Python/assets/28995985/64d5732c-9160-4034-b530-9798eba00175)

More than eight-in-ten U.S. adults (86%) say they get news from a smartphone, computer or tablet. Stock news and editorials are among the biggest sources of information on a company's financials, goals, outreach, environmental commitments, downsides and many other attributes. An attentive investor wants to read as many editorials and news blogs as possible about their favorite equities, ETFs or any other asset and debt class. Yet an average person can read at most 3-5 news blogs a day, and no human can read all of the news available on the internet before making a market decision. Hence the idea of StockIQ: a simple sentiment analysis model that captures all the news websites matching a reader's choice and offers a visual alternative, i.e. the news in the form of visualizations.

The project consists of 4 parts:

1. Searching for all news website links according to the user's choice of companies. (Python is used for this step.)
2. Extracting the content of each website and saving it in a data structure. (R is used for this step.)
3. Analyzing the extracted corpus and performing unigram, bi-gram, n-gram and tf-idf analysis. (R is used.)
4. Visualizing the analysis using ggplot2, shiny and plotly. (R is used for this step.)

One crucial aspect of this model is the incorporation of the AFINN lexicon (named after its author, Finn Årup Nielsen), a widely used resource in natural language processing. The AFINN lexicon assigns sentiment scores to words based on their emotional connotations, providing a simple yet effective sentiment analysis framework. Integrating this lexicon into the model enhances its ability to capture nuances in language, enabling a more accurate reading of sentiment in the context of financial news.

The R code is also built to handle non-working websites that cannot be scraped. Only the websites that pass this check are used for generating insights.

# Step 1: Data Collection

- Package installation: the required R packages (textdata, tidyRSS, rvest, urltools, tibble, tidytext, dplyr, ggplot2, cowplot, wordcloud2) are installed.
- Library loading: the necessary libraries are loaded into the R environment.
- Data loading: the code reads a CSV file named "data_file.csv" containing news links related to the QQQ ETF.
- Web scraping loop: the code iterates over each news link, attempts to fetch the HTML content, and checks whether retrieval succeeded. On success it prints "success" and records the URL, the status ("Pass") and the number of paragraphs in the article; on failure it prints an error message and records the URL, the status ("Fail") and 0 paragraphs. (A minimal R sketch of this loop appears after the Step 2 list below.)

# Step 2: Data Processing and Sentiment Analysis

- Filtering: URLs with a status of "Pass" and a positive number of paragraphs are retained.
- Sentiment analysis loop: for each retained URL, the code extracts the domain name, reads the HTML content and processes the paragraphs.
- Text processing: the paragraphs are tokenized, filtered for stop words, and scored with the Bing and AFINN lexicons. (See the second sketch below.)
- Data aggregation: the results of the sentiment analysis are aggregated and visualized with ggplot2.
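The block below is a minimal sketch of the Step 1 scraping loop described above, not the repository's exact script. It assumes that data_file.csv has a column named `link` with one news URL per row; the real column name may differ.

```r
# Minimal sketch of the Step 1 scraping loop (assumed column name: link).
library(rvest)
library(tibble)
library(dplyr)

news_links <- read.csv("data_file.csv", stringsAsFactors = FALSE)

# One row per attempted URL: did the fetch succeed, and how many <p> tags?
scrape_log <- tibble(url = character(), status = character(), n_paragraphs = integer())

for (link in news_links$link) {
  result <- tryCatch({
    page       <- read_html(link)
    paragraphs <- html_text2(html_elements(page, "p"))
    message("success")
    tibble(url = link, status = "Pass", n_paragraphs = length(paragraphs))
  }, error = function(e) {
    message("could not fetch: ", link, " (", conditionMessage(e), ")")
    tibble(url = link, status = "Fail", n_paragraphs = 0L)
  })
  scrape_log <- bind_rows(scrape_log, result)
}
```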
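And here is a minimal sketch of the Step 2 text processing, reusing the `scrape_log` object from the sketch above. The per-article aggregation into `sentiment_by_domain` is an assumption made for illustration; the repository script may group and summarise the scores differently.

```r
# Minimal sketch of Step 2: filter usable URLs, tokenize, score with Bing/AFINN.
library(rvest)
library(urltools)
library(tidytext)
library(textdata)   # backs get_sentiments("afinn")
library(tibble)
library(dplyr)

usable <- scrape_log %>%
  filter(status == "Pass", n_paragraphs > 0)

article_scores <- lapply(usable$url, function(u) {
  page <- read_html(u)
  text <- html_text2(html_elements(page, "p"))

  words <- tibble(domain = domain(u), text = text) %>%
    unnest_tokens(word, text) %>%        # split paragraphs into single words
    anti_join(stop_words, by = "word")   # drop common English stop words

  # AFINN gives each word a numeric score (-5..+5); Bing labels words
  # as "positive" or "negative".
  afinn <- words %>%
    inner_join(get_sentiments("afinn"), by = "word") %>%
    group_by(domain) %>%
    summarise(afinn_score = sum(value), .groups = "drop")

  bing <- words %>%
    inner_join(get_sentiments("bing"), by = "word") %>%
    count(domain, sentiment)

  list(afinn = afinn, bing = bing)
})

# One row per article with its total AFINN score, used in the Step 3 plots.
sentiment_by_domain <- bind_rows(lapply(article_scores, `[[`, "afinn"))
```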
# Step 3: Visualizations

- Domain Frequency: the frequency of each domain is calculated and visualized.
- Overall Sentiment Density Curve: histogram and density plots visualize the overall sentiment scores.
- Global Media Sentiment: bar plots depict the sentiment distribution across the different media sources.
- Word Cloud: a word cloud is generated based on the frequency of domain names.
- Final Visualization: the plot_grid function combines all the visualizations into a single display for easy interpretation. (A minimal sketch follows below.)
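Finally, a minimal sketch of the Step 3 visualizations, assuming the `sentiment_by_domain` data frame built at the end of the previous sketch (one row per article, with columns `domain` and `afinn_score`). The exact plots, bins and styling in the repository script may differ.

```r
# Minimal sketch of the Step 3 visualizations.
library(ggplot2)
library(cowplot)
library(wordcloud2)
library(dplyr)

# Domain frequency: how many scraped articles came from each media source.
p_domains <- ggplot(sentiment_by_domain, aes(x = domain)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Articles per domain", x = NULL, y = "Count")

# Overall sentiment density curve: histogram plus density of AFINN scores.
p_density <- ggplot(sentiment_by_domain, aes(x = afinn_score)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_density() +
  labs(title = "Overall sentiment", x = "AFINN score", y = "Density")

# Global media sentiment: mean AFINN score per media source.
p_media <- sentiment_by_domain %>%
  group_by(domain) %>%
  summarise(mean_score = mean(afinn_score), .groups = "drop") %>%
  ggplot(aes(x = domain, y = mean_score)) +
  geom_col() +
  coord_flip() +
  labs(title = "Sentiment by media source", x = NULL, y = "Mean AFINN score")

# Combine the ggplot panels into a single display.
plot_grid(p_domains, p_density, p_media, ncol = 2)

# Word cloud of domain frequencies. wordcloud2() returns an HTML widget,
# so it is rendered separately rather than passed to plot_grid().
sentiment_by_domain %>%
  count(domain) %>%
  rename(word = domain, freq = n) %>%
  wordcloud2()
```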
