Pardigm-RiskEx

Category: Cryptocurrency
Development tool: Jupyter Notebook
File size: 7903 KB
Downloads: 0
Upload date: 2018-05-04 00:49:37
Uploader: sh-1993
Description: Data Mining, Time Series Analysis, and NLP on Bitcoin Related News Events

File list:
.Rhistory (0, 2018-05-04)
Archive (0, 2018-05-04)
Archive\Data (discontinued) (0, 2018-05-04)
Archive\Data (discontinued) \2mo_webapi_riskex.json (13200, 2018-05-04)
Archive\Data (discontinued) \90articles (260939, 2018-05-04)
Archive\Data (discontinued) \Dec-08-17 to Jan-08-28 (482233, 2018-05-04)
Archive\Data (discontinued) \March 26 2018 Articles.csv (499003, 2018-05-04)
Archive\Data (discontinued) \articles1000 (399787, 2018-05-04)
Archive\Data (discontinued) \marked_dates.csv (5088, 2018-05-04)
Archive\Data (discontinued) \marked_news_short.csv (7240715, 2018-05-04)
Archive\Data (discontinued) \marker (1, 2018-05-04)
Archive\Data (discontinued) \marker.csv (434, 2018-05-04)
Archive\Data (discontinued) \short (7240914, 2018-05-04)
Archive\Notebooks (discontinued) (0, 2018-05-04)
Archive\Notebooks (discontinued)\Bitcoin DateTime split and Rolling Max Demo.ipynb (76174, 2018-05-04)
Archive\Notebooks (discontinued)\Filter news by marker.ipynb (36606, 2018-05-04)
Archive\Notebooks (discontinued)\Final Analysis - KMeans.ipynb (25179, 2018-05-04)
Archive\Notebooks (discontinued)\Final Analysis - Sentiment.ipynb (25179, 2018-05-04)
Archive\Notebooks (discontinued)\Final Cleaning and Merging.ipynb (39065, 2018-05-04)
Archive\Notebooks (discontinued)\NLP - Manana .ipynb (109521, 2018-05-04)
Archive\Notebooks (discontinued)\NLP Clustering Articles and Analysis.ipynb (82197, 2018-05-04)
Archive\Notebooks (discontinued)\NLP Removing and identifying irrelevant articles.ipynb (57983, 2018-05-04)
Archive\Notebooks (discontinued)\NewsAPI for Riskex.ipynb (20739, 2018-05-04)
Archive\Notebooks (discontinued)\Webscraping for Riskex.ipynb (35174, 2018-05-04)
Archive\Notebooks (discontinued)\bitcoin price data.ipynb (68938, 2018-05-04)
Archive\Notebooks (discontinued)\data_construction.ipynb (181008, 2018-05-04)
Archive\Notebooks (discontinued)\gTrends for RiskEx.ipynb (26611, 2018-05-04)
Background and Resources (0, 2018-05-04)
Background and Resources\Bitcoin price data sources list .docx (8286, 2018-05-04)
Background and Resources\Research for NewsBot.docx (14227, 2018-05-04)
Background and Resources\The impact of press releases on stock prices.pdf (849394, 2018-05-04)
Background and Resources\Understanding Business Reporting and Financial Journalism.pdf (170887, 2018-05-04)
Background and Resources\dataX-Spr18 (0, 2018-05-04)
Background and Resources\dataX-Spr18\RiskEx + UC Berkeley DataX - Spring 2018 Project Summary .pdf (81785, 2018-05-04)
Background and Resources\dataX-Spr18\RiskEx Low Tech Demo - Data-X Spring 2018 UC Berkeley.pptx (5119619, 2018-05-04)
Background and Resources\dataX-Spr18\Spr18 Data-X Team Introductions.pdf (67093, 2018-05-04)
Data (0, 2018-05-04)
Data\Cleaning final articles from Jan to April.ipynb (145114, 2018-05-04)
Data\Price change marker.ipynb (62578, 2018-05-04)
... ...

# RiskEx: News and Their Effect on Bitcoin Price

## Toward a Better RSS Recommendation System

### Introduction and Motivation

News events continue to be of utmost importance during the price discovery process of any asset. Some news events, such as press releases, have been found to represent a potential explanation for up to 24% of an asset's price movements. We seek to understand and classify what causes a news event to have a statistically significant effect on price. In particular, we aim to apply machine learning (ML) and Natural Language Processing (NLP) techniques to extract what traders intuitively know when they choose which news articles, publications, and news beats to follow and base their trading decisions on. This repository represents a collaborative project for identifying, processing, predicting, and recommending relevant information from news events to sophisticated traders.

### Methodology

For this project we split our overall scope into three steps.

#### I. Data Collection

1) Web scraping and API utilization for pulling news articles
2) Metadata extraction from the articles
3) Preprocessing and cleaning the articles
4) Collecting and cleaning Bitcoin price data

#### II. Data Processing

1) Filtering of relevant news content with relevant keywords
2) Preprocessing of content for modeling purposes (vectorization, text cleaning, etc.)

#### III. Analyze and Classify

1) Perform clustering analysis on articles
2) Classify events based on sentiment analysis
3) Identify significant price movements through event study methodology
4) Establish a causal relationship between a news event and its effect on the price of Bitcoin

### Results and Findings

### Next Steps

Further steps will involve the development of a learning recommender system built around both supervised and unsupervised learning methods.
In particular, based on the foundation established by our team, we believe the following projects could be implemented to increase the strength and capacity of our model.

1. Achieve better granularity of data: create a series of crawlers that extract the desired granular data.
   1.a Examples include: pricing data, Google Trends data, etc.
2. Create two tiers of data acquisition processes.
   2.a Broad-spectrum crawlers should be scheduled to stay up to date with current events while minimizing load.
   2.b Event-driven crawlers should be developed around predictive modeling and/or event detection when a significant price movement occurs.
3. Implement a predictive pricing model for asset classes.
   3.a Use pricing predictions to trigger searches for relevant articles, and evaluate pricing and recommender models.
   3.b Predictive pricing models could also be used for arbitrage purposes and optimization of business processes.
4. Implement a distributed computation architecture.
   4.a In order to implement just-in-time analysis, future teams will need to move processes to the cloud.
5. Increase the quality of prediction by adding a magnitude/directional component to our time series modeling.

## Start Here:

Python was used for all data gathering, cleaning, and modeling purposes. Multiple Jupyter notebooks were written to carry out the project. To reproduce our procedure, run the notebooks below in order:

1) Scraping (in order):
   - news_to_features(1o3)
   - url_to_contents(2o3)
   - many_df_to_one(3o3)
2) Time Series Modeling and Event Detection (in order):
   - price_change_marker
   - filter_news_by_marker
3) Natural Language Processing and Formulations (in order):
   - clean_articles_final(1o2)
   - clean_articles_final(2o2)
4) Classification and Modeling:
   - Word2Vec_and_KMeans_clustering
   - Final_Classification

### A brief consideration:

##### The majority of labor associated with this project was consumed by data gathering and cleaning.

Should you choose to expand on our efforts, please reach out to any member directly and we can share the preprocessed and cleaned data sets we used to train our models. Or, if you find a better, faster way to get and clean data, go with that (and let us know!)
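
To make the event detection step more concrete, here is a minimal sketch of the idea behind marking price changes and then filtering news by those markers. It is not the code from the notebooks: the function names merely echo the notebook names, and the 5% threshold, the 24-hour look-back window, the prices, and the headlines are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def mark_price_changes(prices, threshold=0.05):
    """Return (timestamp, fractional_change) for every observation whose
    price moved by at least `threshold` relative to the previous one.

    `prices` is a list of (timestamp, price) tuples sorted by timestamp.
    The 5% default is an illustrative assumption, not a project setting.
    """
    markers = []
    for (_, prev_price), (ts, price) in zip(prices, prices[1:]):
        change = (price - prev_price) / prev_price
        if abs(change) >= threshold:
            markers.append((ts, change))
    return markers


def filter_news_by_marker(articles, markers, window=timedelta(hours=24)):
    """Keep articles published at most `window` before a marked price move.

    `articles` is a list of (published_at, title) tuples.
    The 24-hour default window is an illustrative assumption.
    """
    kept = []
    for published, title in articles:
        # An article qualifies if some marker falls in [published, published + window].
        if any(timedelta(0) <= ts - published <= window for ts, _ in markers):
            kept.append((published, title))
    return kept


# Made-up daily prices and headlines, purely for demonstration.
prices = [
    (datetime(2018, 1, 1), 100.0),
    (datetime(2018, 1, 2), 101.0),
    (datetime(2018, 1, 3), 110.0),
    (datetime(2018, 1, 4), 109.0),
]
articles = [
    (datetime(2018, 1, 1, 12), "Example headline A"),
    (datetime(2018, 1, 2, 18), "Example headline B"),
    (datetime(2018, 1, 3, 6), "Example headline C"),
]

markers = mark_price_changes(prices)
relevant = filter_news_by_marker(articles, markers)
print(markers)
print(relevant)
```

With this toy data, only the move on 2018-01-03 crosses the threshold, and only the article published six hours before it is kept. A real event study would typically compare returns against an expected-return baseline rather than use a fixed threshold, so treat this only as a starting point.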
