Sentiment analysis for stocks in the S&P 500

Vu Hiep
Sep 12, 2020


Motivation

In my last post, I detailed my valuation of Tesla and concluded that mood and momentum are in the driver’s seat in Tesla’s incredible ride to a price of USD 400 per share (USD 2,000 per share before the stock split). This underscores the importance of taking such factors into consideration even in value investing, since investors should want them working in their favor.

As a result, in this post I use financial news scraped online to quantify the market’s sentiment for each stock in the S&P 500 and to visualize it in ways that could explain certain movements in the market and assist in investment decisions.

Getting started — Data

I scraped news titles and their release times from Finviz.com. Since the website aggregates news from a variety of sources (Bloomberg, Motley Fool, Reuters, etc.), it helps reduce selection bias in the data collected. I include the release time because I time-weight the news when calculating each stock’s sentiment score: the more recent a news item, the larger its effect on the stock’s overall score. Overall, 102 of the 500 stocks have sufficient data for further sentiment analysis.
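To time-weight the news, the scraped release times first have to become proper dates. The sketch below is only an illustration: it assumes the timestamps arrive as strings such as "Sep-11-20 09:30AM" on the first headline of a day, followed by time-only strings such as "08:15AM" that inherit that date. Both the format and the function name parse_release_times are assumptions for illustration, not taken from my scraper.

```python
from datetime import datetime


def parse_release_times(raw_stamps):
    """Turn a column of scraped timestamp strings into datetimes.

    Assumed format: 'Sep-11-20 09:30AM' on the first row of each day,
    then time-only rows such as '08:15AM' that reuse the last seen date.
    """
    parsed = []
    current_date = None
    for stamp in raw_stamps:
        parts = stamp.split()
        if len(parts) == 2:
            # Row carries both a date and a time.
            current_date, clock = parts
        elif len(parts) == 1 and current_date is not None:
            # Time-only row: reuse the most recent date.
            clock = parts[0]
        else:
            raise ValueError(f"unexpected timestamp: {stamp!r}")
        parsed.append(
            datetime.strptime(f"{current_date} {clock}", "%b-%d-%y %I:%M%p")
        )
    return parsed


stamps = ["Sep-11-20 09:30AM", "08:15AM", "Sep-10-20 04:05PM"]
for moment in parse_release_times(stamps):
    print(moment.isoformat())
```

Only the date part matters for the daily weighting used later, so the time of day could be dropped after parsing.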

Sentiment Analysis — Methodology and algorithm

For each stock, I use a combination of natural language processing (NLP) packages: spaCy (for lemmatization), NLTK (for stop-word removal), and VADER (for sentiment analysis).
Lemmatization is a common data-cleaning step in NLP. It reduces each word to its base form (its lemma), which makes the text easier for sentiment analysis algorithms to process. For example, in the sentence “he goes to the park and exercises.”, the word “goes” is reduced to its base form, “go”, while “he” is replaced by a generic “Pronoun” placeholder.
Stop-words in English are common words such as “the”, “to”, “a”, etc. These words serve grammatical purposes and appear very often, but they do not convey much meaning. As a result, they should be removed before the whole sentence is analyzed.
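As a rough illustration of this cleaning step, here is a minimal sketch in plain Python. The STOP_WORDS set is a tiny hand-written stand-in for NLTK's much longer English stop-word list, and clean_title is a hypothetical helper name; the sketch also strips punctuation and skips lemmatization entirely.

```python
import string

# Tiny illustrative stop-word set (assumption); NLTK's English list is far longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "for", "is", "are"}


def clean_title(title):
    """Strip punctuation from a headline and drop stop-words."""
    no_punct = title.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    # Compare in lowercase so that "The" and "the" are both removed.
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


print(clean_title("Tesla shares jump to a record, and the rally continues."))
```

The remaining tokens can be joined back into a string with " ".join(...) before being handed to a sentiment scorer.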
After punctuation marks are also removed, the sentence is ready for sentiment analysis. The VADER algorithm scores each word. The scores are then added up and weighted to produce an overall score for the whole sentence (here, the news title).
All news items from the same day are weighted equally: news from today gets a weight of 1, while older news gets a weight below 1 (news from n days ago is weighted by 0.95^n).
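The time-weighting can be sketched as follows. The 0.95 daily decay is the factor described above; dividing by the total weight (a weighted average rather than a plain weighted sum) is an assumption made for this sketch, and time_weighted_score is a hypothetical name.

```python
from datetime import date

DECAY = 0.95  # per-day decay factor described in the post


def time_weighted_score(scored_news, today):
    """Combine (release_date, score) pairs into one score for a stock.

    Assumption: the weighted scores are divided by the total weight,
    i.e. a weighted average; the post itself only specifies the weights.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for release_date, score in scored_news:
        # Age in whole days; items dated after `today` are treated as age 0.
        age_days = max((today - release_date).days, 0)
        weight = DECAY ** age_days  # 1 for today, 0.95 for yesterday, ...
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


news = [
    (date(2020, 9, 12), 0.5),   # today: weight 1
    (date(2020, 9, 11), -0.5),  # one day old: weight 0.95
]
# Small positive number: the newer headline counts slightly more.
print(time_weighted_score(news, today=date(2020, 9, 12)))
```

The per-headline scores fed into this function could come from VADER's compound score, for example SentimentIntensityAnalyzer().polarity_scores(title)["compound"] from the vaderSentiment package, which lies between -1 and 1.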

Results

The following graph shows the distribution of sentiment scores produced by the Vader algorithm as well as the top 5 and bottom 5 companies that have the most positive (>0) and negative (<0) sentiment scores:

Surprisingly, Duke Energy (DUK) has the best score. Its new commitment to transition to sustainable energy has earned the company goodwill among investors. Elsewhere in the energy business, however, ExxonMobil (XOM) is not well perceived by investors. Still reeling from the recent collapse in oil prices, the company was recently removed from the Dow Jones Industrial Average, which has made it even less popular.

On another note, Disney (DIS) has not done that badly since the crisis began. Despite its theme parks being closed worldwide, its entry into the lucrative online streaming business saw the stock climb back up from its mid-March low. However, the recent release of Mulan, which has attracted criticism from human rights activists for being filmed in a politically controversial region of China, has not been well received by investors, resulting in a low sentiment score for the stock.

Looking across sectors, energy stocks are doing the worst sentiment-wise, having not recovered from the recent tumultuous period in the market. Despite faltering demand, Saudi Arabia continues pumping oil out of the ground, driving the oil price down further and putting pressure on the margins of energy companies.

Looking across industries, even though Communication Equipment, Gaming and Media have enjoyed a good bullish run as people adjust to working from home, their negative sentiment scores reflect the market’s correction across all tech stocks.

The ways forward

Sentiment analysis has a variety of applications. It is an especially useful tool for pricing with multiples and for explaining them. As opposed to intrinsic valuation, the essence of multiples is to understand how investors perceive other similar assets, which is why sentiment plays a large part in such calculations and should be incorporated when analyzing multiples.

Even for value investors, understanding sentiment and taking advantage of mood and momentum to time their investments better is crucial to improving returns, for timing is also one of the most important aspects of value investing.

Bonus

Here is the link to my data and to my code for this project:
