Sentiment-analysis-of-financial-news-data

Category: Natural Language Processing
Development tool: Python
File size: 28KB
Downloads: 0
Upload date: 2022-07-29 22:29:54
Uploader: sh-1993
Description: Sentiment analysis of financial news data; sentiment analysis of news about stock prices.

File list (size in bytes, date):
LICENSE (1072, 2019-12-10)
__pycache__ (0, 2019-12-10)
__pycache__\pandas.cpython-35.pyc (565, 2019-12-10)
__pycache__\search_scrape.cpython-35.pyc (3577, 2019-12-10)
code (0, 2019-12-10)
code\archive_scraper.py (11867, 2019-12-10)
code\filter.py (3150, 2019-12-10)
code\merger.py (5356, 2019-12-10)
code\parser.py (2123, 2019-12-10)
code\quick_scraper (0, 2019-12-10)
code\quick_scraper\quick_scraper (0, 2019-12-10)
code\quick_scraper\quick_scraper\__init__.py (0, 2019-12-10)
code\quick_scraper\quick_scraper\items.py (292, 2019-12-10)
code\quick_scraper\quick_scraper\middlewares.py (3609, 2019-12-10)
code\quick_scraper\quick_scraper\pipelines.py (293, 2019-12-10)
code\quick_scraper\quick_scraper\settings.py (3163, 2019-12-10)
code\quick_scraper\quick_scraper\spiders (0, 2019-12-10)
code\quick_scraper\quick_scraper\spiders\__init__.py (161, 2019-12-10)
code\quick_scraper\quick_scraper\spiders\quick_scraper.py (5804, 2019-12-10)
code\quick_scraper\quick_scraper\spiders\scrape_with_bs4.py (3814, 2019-12-10)
code\quick_scraper\scrapy.cfg (269, 2019-12-10)
code\search_scrape.py (3410, 2019-12-10)
code\sentiment.py (1854, 2019-12-10)
data (0, 2019-12-10)
data\empty.txt (0, 2019-12-10)
regexList (2145, 2019-12-10)
requirements.txt (123, 2019-12-10)
tracker.data (0, 2019-12-10)

# Sentiment-analysis-of-financial-news-data

This was developed as part of a study-oriented project for the 6th semester, 2016-2017, and has been evolving since then. It currently fetches all the URLs and scrapes data from Google search results and the news archives of:

* economictimes.indiatimes.com
* reuters.com
* ndtv.com
* thehindubusinessline.com
* moneycontrol.com
* thehindu.com

## Setup

Download the Chrome driver from [here](http://chromedriver.chromium.org/downloads), unzip it, and place the chromedriver binary in the root directory.

To set up the dependencies, run:

```
pip install -r requirements.txt
```

## Usage

The pipeline is currently available up to the merging step. **Scrapy and sentiment integration are still to be done.**

The complete help for this repo is as follows:

```
usage: parser.py [-h] -s START -e END -w [WEB [WEB ...]] [-r REGEXP]

Runs the entire pipeline from scraping to extracting sentiments.

optional arguments:
  -h, --help            show this help message and exit
  -s START, --start START
                        Start date, format (dd/mm/yy)
  -e END, --end END     End date, format (dd/mm/yy)
  -w [WEB [WEB ...]], --web [WEB [WEB ...]]
                        Specify the website numbers to scrape from, separated by spaces:
                        0=economictimes.indiatimes.com
                        1=reuters.com
                        2=ndtv.com
                        3=thehindubusinessline.com
                        4=moneycontrol.com
                        5=thehindu.com
  -r REGEXP, --regexp REGEXP
                        Complete path to the regex list file for companies.
                        For a template, refer to the regexList file at the
                        root of this repo. By default, the regexList file at
                        the root of this repo is used.
```

A sample run from **01/01/2014** to **01/10/2014** for

* economictimes
* reuters
* thehindubusinessline

would be given as follows:

```
python parser.py -s 01/01/2014 -e 01/10/2014 -w 0 1 3
```

Note: this assumes that the companies for which data have to be fetched are specified in the default file, regexList. To use a different file, pass its path with the **-r** parameter and make sure it follows the template given in the regexList file.

## File Structure

```
.
├── code
│   ├── archive_scraper.py
│   ├── filter.py
│   ├── merger.py
│   ├── parser.py
│   ├── quick_scraper
│   │   ├── quick_scraper
│   │   │   ├── __init__.py
│   │   │   ..
│   │   │   ├── settings.py
│   │   │   └── spiders
│   │   │       ├── __init__.py
│   │   │       ├── quick_scraper.py
│   │   │       └── scrape_with_bs4.py
│   │   └── scrapy.cfg
│   ├── search_scrape.py
│   └── sentiment.py
├── data
│   └── empty.txt
├── LICENSE
├── README.md
├── regexList
├── requirements.txt
└── tracker.data
```

## File Description

* regexList - Contains the regexes for company names, used to refine searches and return more relevant results.
* search_scrape.py - Scrapes URLs from Google search results.
* filter.py - Filters the relevant URLs collected from Google search results.
* archive_scraper.py - Scrapes URLs from the archives of the various websites.
* scrape_with_bs4.py - Scrapes the content from the collected URLs.
* quick_scraper.py - Scrapes content faster, in parallel, by sending multiple requests per second.
* merger.py - Merges the data collected from Google search results and the news archives.
* sentiment.py - Calculates the sentiment of the collected data.

## Note

* regexList is used to fetch more relevant URLs from the archives, not from Google search results, so the company names used for the Google search (search_scrape.py) must be consistent with those listed in regexList.
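To illustrate the content-scraping step (scrape_with_bs4.py above), here is a minimal sketch that fetches a page with requests and pulls out its paragraph text with BeautifulSoup. It is an assumption-laden example rather than the repository's implementation: the function name `fetch_article_text` and the sample URL are invented, and a real scraper for the sites listed above would generally need site-specific selectors.

```python
# Illustrative sketch only; scrape_with_bs4.py in this repo may differ.
import requests
from bs4 import BeautifulSoup


def fetch_article_text(url, timeout=10):
    """Download a page and return its visible paragraph text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Join all <p> tags; production scraping usually needs per-site selectors.
    return " ".join(p.get_text(strip=True) for p in soup.find_all("p"))


if __name__ == "__main__":
    sample_url = "https://www.reuters.com/markets/"  # hypothetical example URL
    try:
        print(fetch_article_text(sample_url)[:300])
    except requests.RequestException as exc:
        print(f"Failed to fetch {sample_url}: {exc}")
```

Since Scrapy and sentiment integration are still pending and sentiment.py is not documented here, the following sketch shows only one plausible way the final step could score collected text, using NLTK's VADER analyzer. The library choice, the function name `score_texts`, and the sample headlines are assumptions for illustration, not the repository's actual method.

```python
# Illustrative sketch only; sentiment.py in this repo may use a different approach.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download


def score_texts(texts):
    """Return a compound sentiment score in [-1, 1] for each text."""
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


if __name__ == "__main__":
    headlines = [
        "Company posts record quarterly profit, shares surge",
        "Regulator opens probe into company's accounting practices",
    ]
    for headline, score in zip(headlines, score_texts(headlines)):
        print(f"{score:+.3f}  {headline}")
```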
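A compound score near +1 indicates strongly positive wording and near -1 strongly negative wording, which is how headline-level sentiment is commonly summarized before being merged with price data; whether this repository aggregates scores per company or per date is not stated here.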
## Contribute

If you would like to contribute to this repo or have any ideas to make it better, feel free to submit a pull request or contact me at gyaneshmalhotra [at] gmail [dot] com. For starters, there are some tasks available in the project tab of this repo that you can start working on.
