Sentiment Analysis with Tiingo and Google
A few days ago I was aimlessly surfing the net when I realized that both Microsoft and Google have text sentiment analysis APIs which are both simple to use and free to try out.
This got me wondering if I could apply them to financial news and potentially build some kind of market/ticker sentiment indicator. The answer is kinda.
Below is the output of the script that I ended up with after a few hours of hacking.
Story Time
Enter Microsoft...
The first API I saw was actually the Microsoft one, as I was wanting to play with Microsoft's Azure for totally unrelated purposes. As soon as I saw it, I knew what I wanted to do - Tiingo has a fantastic news API which lets you search for news items by financial ticker. It's super simple to use: you just do a GET on a URL with the tickers you want to look up, like the following (requires a paid Tiingo account):
https://api.tiingo.com/tiingo/news?tickers=msft,googl,aapl,fb
and it'll bring back a whole bunch of news articles about those tickers (sample output below).
The two obvious things that jumped out at me here were the 'title' and the 'description' fields. My first thought was to grab all of these, smash them together into one large string and get me some sentiment! Part of the plan involved taking the numerical average of the scores spat out for each title/description combo by Microsoft's engine. Sadly... this didn't yield great results, and mostly seemed to return 0.5 all the time.
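For the curious, here's a rough sketch of that first step: build the news URL, turn each article into one title/description string, and average whatever scores come back. This is not the script I actually ran. It assumes Node 18+ (for the built-in fetch), a hypothetical TIINGO_API_KEY environment variable, and that the token goes in an "Authorization: Token ..." header; the helper names are made up for illustration.

```javascript
// Sketch only, not the original script.
// Assumptions: Node 18+ (global fetch), a hypothetical TIINGO_API_KEY
// environment variable, and token auth via the Authorization header.

const NEWS_ENDPOINT = "https://api.tiingo.com/tiingo/news";

// Build the same kind of URL as the example above from a list of tickers.
function buildNewsUrl(tickers) {
  const list = tickers
    .map((ticker) => encodeURIComponent(ticker.trim().toLowerCase()))
    .join(",");
  return `${NEWS_ENDPOINT}?tickers=${list}`;
}

// Join an article's 'title' and 'description' into one text document,
// skipping whichever of the two is missing or empty.
function toDocument(article) {
  return [article.title, article.description]
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join(". ");
}

// Plain arithmetic mean of the per-article scores; null if there are none.
function averageScore(scores) {
  if (scores.length === 0) return null;
  return scores.reduce((sum, score) => sum + score, 0) / scores.length;
}

// Fetch the raw article list for the given tickers.
async function fetchNews(tickers, token = process.env.TIINGO_API_KEY) {
  const response = await fetch(buildNewsUrl(tickers), {
    headers: { Authorization: `Token ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Tiingo request failed with status ${response.status}`);
  }
  return response.json();
}
```

Something like `(await fetchNews(["msft", "googl"])).map(toDocument)` would then give you one string per article, ready to send off for scoring.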
So, back to the drawing board... I reasoned that perhaps just the titles and the descriptions of the articles weren't quite enough to get meaningful values back from Microsoft - so the next step, I thought, would be to grab the contents of each article and upload that.
Unfortunately, that requires screen scraping the news sites and that didn't seem like something I wanted to be writing over a lunch break. Not being a quitter, I did what every reasonable programmer would do - googled to see if anyone had already done the work for me. It turns out someone has written a decent one-size-fits-all scraper - it sometimes returns some slightly weird data, but it's good enough for this proof of concept.
So, with a copy of newspaperjs now in hand, I set about grabbing all of the news articles and uploading them to Microsoft. After an hour or so of re-remembering bits of Node.js I'd forgotten about since giving up coding in it as my day job, I had something that worked... aaaand basically got 0.5 as a result every time again. Nuts.
Thankfully, not much time was spent on this, as Microsoft's API is super easy to work with. I essentially just followed the guide here, copying the code from it and making only a few minor adjustments to grab titles from Tiingo first before sending the 'documents' to Microsoft. Genuinely, I'm super impressed with how easy this was - just a few years ago this wouldn't have been remotely possible.
In retrospect, I actually think Microsoft's API can still do what I want it to do, but I abandoned it because the initial results didn't seem fruitful.
Enter Google...
Surely, I thought, Microsoft can't be the only one with a free/cheap API capable of doing sentiment analysis on text, and 30 seconds later, I proved myself right. Google also have one. Google's demo instantly impressed me with the depth of analysis it appeared to be doing on the text. So I gave it a shot. Google's documentation isn't quite as easy to follow as Microsoft's, but I got there in the end. And guess what? While there was more variance in the results, it was much the same - everything came back nearly neutral (or weird).
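If you want a feel for what a call to Google looks like, here's a minimal sketch against the REST endpoint of the Natural Language API. It's an illustration rather than my actual code: it assumes Node 18+ and an API key in a hypothetical GOOGLE_API_KEY environment variable, whereas your own setup may well use service account credentials and the official client library instead. The score comes back between -1 and 1, with a separate magnitude for overall emotional strength.

```javascript
// Sketch only. Assumptions: Node 18+ (global fetch) and an API key stored in
// a hypothetical GOOGLE_API_KEY environment variable.

const GOOGLE_ENDPOINT =
  "https://language.googleapis.com/v1/documents:analyzeSentiment";

// Request body for scoring one plain-text document.
function buildSentimentRequest(text) {
  return {
    document: { type: "PLAIN_TEXT", content: text },
    encodingType: "UTF8",
  };
}

// Pull the document-level score (-1 to 1) and magnitude (0 and up) out of
// the response, ignoring the per-sentence breakdown.
function readSentiment(responseBody) {
  const { score, magnitude } = responseBody.documentSentiment;
  return { score, magnitude };
}

// Send one document to Google and return its overall sentiment.
async function analyzeSentiment(text, apiKey = process.env.GOOGLE_API_KEY) {
  const url = `${GOOGLE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSentimentRequest(text)),
  });
  if (!response.ok) {
    throw new Error(`Google request failed with status ${response.status}`);
  }
  return readSentiment(await response.json());
}
```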
So I changed my plan. I stopped trying to calculate an average and instead, taking inspiration from Google's demo page, categorized each article as 'Positive', 'Neutral' or 'Negative'. This seemed to yield more interesting results, and I think I finally have something I can actually work with.
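The bucketing itself is tiny. Here's a sketch of the idea, assuming scores in Google's -1 to 1 range; the 0.25 cut-off is my own guess for illustration, not an official number and not necessarily what the script uses.

```javascript
// Sketch only. THRESHOLD is an assumed cut-off for illustration.
const THRESHOLD = 0.25;

// Map a score in [-1, 1] to one of three labels.
function classify(score, threshold = THRESHOLD) {
  if (score >= threshold) return "Positive";
  if (score <= -threshold) return "Negative";
  return "Neutral";
}

// Count how many scores fall into each bucket.
function tally(scores) {
  const counts = { Positive: 0, Neutral: 0, Negative: 0 };
  for (const score of scores) {
    counts[classify(score)] += 1;
  }
  return counts;
}
```

This also shows why the average was so boring: scores like 0.8, -0.8, 0.1 and -0.1 average out to zero, while the tally still tells you there was one strongly positive and one strongly negative article in there.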
Usually Tiingo will return 100 news articles at a time, but sometimes the scraper will fail to get info about them. Hence the "Total/Actual" field: it shows the number of articles we attempted to fetch vs. the number we actually managed to get data for.
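One way to get those two numbers is to let every scrape succeed or fail on its own and count afterwards. A minimal sketch, where scrapeArticle is a hypothetical stand-in for whatever scraper you use (it is not newspaperjs's real API), taking a URL and resolving to the article text:

```javascript
// Sketch only. `scrapeArticle` is a hypothetical function you supply:
// it takes a URL and returns (or resolves to) the article text.

// Turn Promise.allSettled results into attempted vs. usable counts.
function summarize(settled) {
  const texts = settled
    .filter(
      (result) =>
        result.status === "fulfilled" &&
        typeof result.value === "string" &&
        result.value.trim() !== ""
    )
    .map((result) => result.value);
  return { total: settled.length, actual: texts.length, texts };
}

// Scrape every URL; one failure never aborts the rest of the batch.
async function scrapeAll(urls, scrapeArticle) {
  const settled = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections too.
    urls.map(async (url) => scrapeArticle(url))
  );
  return summarize(settled);
}
```

Empty results are counted as failures here, since a blank article gives the sentiment engine nothing to work with.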
Unfortunately, time marches on and for now this one will sit on the shelf. Some interesting future experiments would be to switch back to Microsoft and/or only supply the titles/descriptions - and, probably more usefully, to plot the Positive, Neutral and Negative counts for a given ticker over time to see if, and how, sentiment towards it changes.
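I haven't built the over-time version, but the data prep for it would probably look something like this sketch. It assumes each article carries an ISO timestamp in a publishedDate field (my reading of Tiingo's output) and a hypothetical label field holding 'Positive', 'Neutral' or 'Negative' from the categorization step.

```javascript
// Sketch only. Assumes each item looks like
// { publishedDate: "<ISO timestamp>", label: "Positive" | "Neutral" | "Negative" },
// where `label` is a hypothetical field added after categorizing the article.

// Group labelled articles by UTC calendar day, oldest day first.
function countsByDay(articles) {
  const byDay = new Map();
  for (const { publishedDate, label } of articles) {
    // toISOString throws on an unparseable date, which surfaces bad input early.
    const day = new Date(publishedDate).toISOString().slice(0, 10);
    if (!byDay.has(day)) {
      byDay.set(day, { Positive: 0, Neutral: 0, Negative: 0 });
    }
    byDay.get(day)[label] += 1;
  }
  return [...byDay.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, counts]) => ({ day, ...counts }));
}
```

Each row of the result is one point on the chart, so it should drop straight into whatever plotting tool you like.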
Below I've attached the full code - please note it's an absolute shambles, full of bad programming practice, and I've left half of the Microsoft-related code in there too, so don't use this for anything serious. You'll need to read the documentation from Google here on how to set up your API account and use your local token in an environment variable, and you'll also need a paid Tiingo account and API key if you want to play with this at home.
'Till next time - peace out home-slices! ✌️

