NewsSentimentAnalysis

Category: NumPy
Development tool: Jupyter Notebook
File size: 203KB
Downloads: 0
Upload date: 2018-05-24 00:21:49
Uploader: sh-1993
Description: In this exercise I utilized Python libraries - pandas, numpy, matplotlib.pyplot, tweepy, seaborn, datetime, VADER - JSON traversals, and Twitter's API to perform a sentiment analysis on the news mood based on tweets from five different news organizations - BBC, CBS, CNN, Fox News, and The New York Times.

File list:
HW7-NewsMood_AMD.ipynb (131719, 2018-05-24)
Overall_Sentiment_based_on_Twitter.png (19180, 2018-05-24)
Sentiment_Analysis_of_Media_Tweets.png (46412, 2018-05-24)
sentiments_array_pd.csv (32249, 2018-05-24)

## News Sentiment Analysis

In this exercise I utilized Python libraries - pandas, numpy, matplotlib.pyplot, tweepy, seaborn, datetime, VADER - JSON traversals, and Twitter's API to perform a sentiment analysis on the news mood based on tweets from five different news organizations - __BBC, CBS, CNN, Fox News, and The New York Times__.

## Three observable trends based on the data below

1. The scatterplot reflecting the sentiment of the one hundred most recent tweets for each of the five major news organizations was highly variable, ranging anywhere from roughly -0.95 to +0.95 on the VADER (Valence Aware Dictionary and sEntiment Reasoner) compound scale, where -1 is the most negative sentiment and +1 the most positive. Visually, it was difficult to determine from the scatterplot alone which news organizations had the most positive or negative sentiment.
2. Numerous points on the scatterplot sat at a compound score of 0 (zero) on the y-axis. My first assumption was that these points simply represented an overwhelming number of tweets with neutral sentiment, but a closer look at the tweet text showed that several of these "neutral" points were tweets in languages other than English, which VADER cannot evaluate and therefore scores as 0. I added a filter to my code so that only English tweets were counted and evaluated, but a few tweets in other languages still managed to come through in my analysis.
3. A bar plot of the mean tweet sentiment made it easier to interpret the overall sentiment of each news organization at a specific time as more positive or more negative. That said, the sentiment means for the same news organization varied tremendously from hour to hour and day to day (data not shown).
When I ran my code two days ago, which coincided with the release of Michael Wolff's book "Fire and Fury: Inside the Trump White House", all news organizations presented a negative mean sentiment. The bar plot below represents an analysis performed Sunday night (01/07/2018), with positive mean sentiment values for BBC, CBS, and the NY Times (ranging from +0.06 to +0.09), a slightly negative mean for Fox News (-0.03), and a negative sentiment mean for CNN (-0.1). I noticed that several of the tweets were about the Golden Globe Awards, which may partially explain the overall boost in tweet sentiment that evening compared to earlier in the day. Overall, it would be best to sample tweets over a couple of months or a year to get a better idea of the overall sentiment for each news organization on Twitter.

```python
# Import dependencies
import tweepy
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import json
import numpy as np
from IPython.display import display
from datetime import datetime

# Import and initialize the VADER sentiment analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
```

```python
# Load the Twitter credentials from a config document
import yaml

TWITTER_CONFIG_FILE = 'auth.yaml'
with open(TWITTER_CONFIG_FILE, 'r') as config_file:
    config = yaml.safe_load(config_file)
```

```python
# Twitter API keys
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']
consumer_key = config['twitter']['consumer_key']
consumer_secret = config['twitter']['consumer_secret']
```

```python
# Set up Tweepy API authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())

# Target search terms
news_orgs = ("BBC", "CBS", "CNN", "FoxNews", "nytimes")

# Lists to hold sentiments for all news organizations
all_sentiments = []
sentiment_means = []

# Loop through all target news organizations
for org in news_orgs:
    # Reset the tweets-ago counter for each news organization
    counter = 1

    # Lists for holding this organization's sentiment scores
    compound_list = []
    positive_list = []
    negative_list = []
    neutral_list = []

    # Pull the 100 most recent English-language tweets mentioning the organization
    public_tweets = api.search(org, count=100, result_type="recent", lang='en')

    # Loop through all tweets
    for tweet in public_tweets["statuses"]:
        # Run VADER analysis on each tweet
        scores = analyzer.polarity_scores(tweet["text"])
        compound = scores["compound"]
        pos = scores["pos"]
        neu = scores["neu"]
        neg = scores["neg"]

        # Add each value to the appropriate list above
        compound_list.append(compound)
        positive_list.append(pos)
        negative_list.append(neg)
        neutral_list.append(neu)

        # Append this tweet's sentiments to the master list
        all_sentiments.append({"Media": org,
                               "Date": tweet["created_at"],
                               "Compound": compound,
                               "Positive": pos,
                               "Neutral": neu,
                               "Negative": neg,
                               "Tweets_Ago": counter})

        # Add 1 to counter
        counter += 1

    # Store the average sentiments for this organization
    sentiment_means.append({"Media": org,
                            "Compound_Mean": np.mean(compound_list),
                            "Positive": np.mean(positive_list),
                            "Neutral": np.mean(neutral_list),
                            "Negative": np.mean(negative_list),
                            "Count": len(compound_list)})

# Convert all_sentiments to a DataFrame and save it
all_sentiments_pd = pd.DataFrame.from_dict(all_sentiments)
all_sentiments_pd.to_csv("sentiments_array_pd.csv")
display(all_sentiments_pd)

# Convert sentiment_means to a DataFrame
sentiment_means_pd = pd.DataFrame.from_dict(sentiment_means)
display(sentiment_means_pd)
```
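The config lookups above imply an `auth.yaml` of roughly the following shape. The key names are taken from the code; the values here are placeholders to replace with your own Twitter credentials:

```yaml
twitter:
  consumer_key: "YOUR_CONSUMER_KEY"
  consumer_secret: "YOUR_CONSUMER_SECRET"
  access_token: "YOUR_ACCESS_TOKEN"
  access_token_secret: "YOUR_ACCESS_TOKEN_SECRET"
```

Keeping the credentials in a separate YAML file (and out of version control) avoids hard-coding API keys in the notebook.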
A truncated preview of `all_sentiments_pd` as displayed in the notebook (some digits were masked as `***` in the archived output):

```
     Media Compound Date                           Negative Neutral Positive Tweets_Ago
0    BBC     0.0000 Mon Jan 08 07:04:28 +0000 2018    0.000   1.000    0.000          1
1    BBC     0.5719 Mon Jan 08 07:04:27 +0000 2018    0.000   0.850    0.150          2
2    BBC    -0.6597 Mon Jan 08 07:04:27 +0000 2018    0.306   0.694    0.000          3
3    BBC     0.7906 Mon Jan 08 07:04:26 +0000 2018    0.000   0.750    0.250          4
4    BBC     0.0000 Mon Jan 08 07:04:25 +0000 2018    0.000   1.000    0.000          5
5    BBC     0.5994 Mon Jan 08 07:04:25 +0000 2018    0.075   0.717    0.208          6
6    BBC    -0.5423 Mon Jan 08 07:04:25 +0000 2018    0.218   0.691    0.091          7
7    BBC     0.5719 Mon Jan 08 07:04:24 +0000 2018    0.000   0.575    0.425          8
8    BBC     0.5719 Mon Jan 08 07:04:23 +0000 2018    0.000   0.850    0.150          9
9    BBC     0.6369 Mon Jan 08 07:04:23 +0000 2018    0.000   0.755    0.245         10
10   BBC     0.6369 Mon Jan 08 07:04:23 +0000 2018    0.000   0.802    0.1***        11
11   BBC    -0.4404 Mon Jan 08 07:04:22 +0000 2018    0.253   0.***2   0.106         12
12   BBC     0.5106 Mon Jan 08 07:04:22 +0000 2018    0.092   0.683    0.225         13
13   BBC    -0.4767 Mon Jan 08 07:04:22 +0000 2018    0.256   0.744    0.000         14
14   BBC     0.0000 Mon Jan 08 07:04:20 +0000 2018    0.000   1.000    0.000         15
15   BBC     0.5719 Mon Jan 08 07:04:20 +0000 2018    0.000   0.850    0.150         16
16   BBC    -0.2263 Mon Jan 08 07:04:19 +0000 2018    0.087   0.913    0.000         17
17   BBC     0.0000 Mon Jan 08 07:04:19 +0000 2018    0.000   1.000    0.000         18
18   BBC     0.3612 Mon Jan 08 07:04:19 +0000 2018    0.000   0.8***   0.102         19
19   BBC     0.0000 Mon Jan 08 07:04:18 +0000 2018    0.000   1.000    0.000         20
20   BBC     0.5719 Mon Jan 08 07:04:18 +0000 2018    0.000   0.850    0.150         21
21   BBC     0.5719 Mon Jan 08 07:04:18 +0000 2018    0.000   0.850    0.150         22
22   BBC    -0.2617 Mon Jan 08 07:04:18 +0000 2018    0.127   0.785    0.088         23
23   BBC    -0.***86 Mon Jan 08 07:04:18 +0000 2018   0.374   0.485    0.141         24
24   BBC    -0.3595 Mon Jan 08 07:04:18 +0000 2018    0.161   0.839    0.000         25
25   BBC    -0.2732 Mon Jan 08 07:04:17 +0000 2018    0.123   0.877    0.000         26
26   BBC     0.0000 Mon Jan 08 07:04:17 +0000 2018    0.000   1.000    0.000         27
27   BBC     0.0000 Mon Jan 08 07:04:16 +0000 2018    0.000   1.000    0.000         28
28   BBC     0.5267 Mon Jan 08 07:04:15 +0000 2018    0.094   0.***4   0.262         29
29   BBC    -0.1027 Mon Jan 08 07:04:15 +0000 2018    0.123   0.877    0.000         30
...  ...        ... ...                                 ...     ...      ...        ...
470 nytimes  0.5719 Mon Jan 08 07:03:39 +0000 2018    0.000   0.861    0.139         71
471 nytimes -0.8519 Mon Jan 08 07:03:39 +0000 2018    0.283   0.717    0.000         72
472 nytimes  0.2732 Mon Jan 08 07:03:39 +0000 2018    0.107   0.741    0.152         73
473 nytimes  0.2732 Mon Jan 08 07:03:39 +0000 2018    0.000   0.806    0.194         74
474 nytimes  0.0000 Mon Jan 08 07:03:39 +0000 2018    0.000   1.000    0.000         75
475 nytimes -0.4767 Mon Jan 08 07:03:38 +0000 2018    0.147   0.853    0.000         76
476 nytimes  0.0000 Mon Jan 08 07:03:38 +0000 2018    0.000   1.000    0.000         77
477 nytimes  0.0000 Mon Jan 08 07:03:37 +0000 2018    0.000   1.000    0.000         78
478 nytimes -0.4215 Mon Jan 08 07:03:37 +0000 2018    0.109   0.891    0.000         79
479 nytimes  0.0000 Mon Jan 08 07:03:36 +0000 2018    0.000   1.000    0.000         80
480 nytimes  0.0000 Mon Jan 08 07:03:35 +0000 2018    0.000   1.000    0.000         81
481 nytimes -0.1695 Mon Jan 08 07:03:35 +0000 2018    0.180   0.702    0.118         82
482 nytimes -0.3182 Mon Jan 08 07:03:35 +0000 2018    0.091   0.909    0.000         83
483 nytimes  0.4939 Mon Jan 08 07:03:35 +0000 2018    0.000   0.802    0.1***        84
484 nytimes  0.0000 Mon Jan 08 07:03:35 +0000 2018    0.000   1.000    0.000         85
485 nytimes  0.0000 Mon Jan 08 07:03:33 +0000 2018    0.000   1.000    0.000         86
486 nytimes  0.0000 Mon Jan 08 07:03:31 +0000 2018    0.000   1.000    0.000         87
487 nytimes  0.3818 Mon Jan 08 07:03:31 +0000 2018    0.000   0.885    0.115         88
488 nytimes -0.5106 Mon Jan 08 07:03:31 +0000 2018    0.320   0.680    0.000         89
489 nytimes -0.5122 Mon Jan 08 07:03:31 +0000 2018    0.212   0.788    0.000         90
490 nytimes -0.4215 Mon Jan 08 07:03:30 +0000 2018    0.109   0.891    0.000         91
491 nytimes  0.0000 Mon Jan 08 07:03:30 +0000 2018    0.000   1.000    0.000         92
492 nytimes  0.5719 Mon Jan 08 07:03:30 +0000 2018    0.000   0.861    0.139         93
493 nytimes -0.5574 Mon Jan 08 07:03:29 +0000 2018    0.375   0.625    0.000         94
494 nytimes  0.0000 Mon Jan 08 07:03:29 +0000 2018    0.000   1.000    0.000         95
495 nytimes  0.0000 Mon Jan 08 07:03:28 +0000 2018    0.000   1.000    0.000         96
496 nytimes -0.0516 Mon Jan 08 07:03:28 +0000 2018    0.239   0.606    0.155         97
497 nytimes -0.4215 Mon Jan 08 07:03:27 +0000 2018    0.109   0.891    0.000        ***
4*** nytimes 0.0000 Mon Jan 08 07:03:27 +0000 2018    0.000   1.000
```
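The per-tweet table above can be aggregated back into per-outlet means without re-querying Twitter, by reading the saved CSV with pandas. A minimal sketch, using a small in-memory sample in place of `sentiments_array_pd.csv` (rows copied from the table above):

```python
from io import StringIO

import pandas as pd

# Small sample standing in for sentiments_array_pd.csv
csv_text = """Media,Compound,Date,Negative,Neutral,Positive,Tweets_Ago
BBC,0.0000,Mon Jan 08 07:04:28 +0000 2018,0.000,1.000,0.000,1
BBC,0.5719,Mon Jan 08 07:04:27 +0000 2018,0.000,0.850,0.150,2
nytimes,-0.8519,Mon Jan 08 07:03:39 +0000 2018,0.283,0.717,0.000,72
nytimes,0.2732,Mon Jan 08 07:03:39 +0000 2018,0.107,0.741,0.152,73
"""
df = pd.read_csv(StringIO(csv_text))

# Mean compound score per outlet -- the quantity behind the sentiment bar plot
means = df.groupby("Media")["Compound"].mean()
print(means)
```

In the real notebook you would call `pd.read_csv("sentiments_array_pd.csv")` instead; grouping by the media column and averaging the compound scores reproduces the `Compound_Mean` values of `sentiment_means_pd`.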