Building an (somewhat) automated analysis suite on a Mac Part 1


As a programmer I never thought I'd say it, but... I'm warming to Excel.

However, I'm also a bit of a die hard Mac User and using Excel on a Mac is not without it's challenges, but sometimes limitations force creativity - hopefully that's what's occurred here.

Very quickly I got bored with downloading CSV data and opening it in Excel, then adding new sheets and basically copying the same process over and over for each data set.

My first thought was to try using web queries to grab CSV on demand. Unfortunately on a Mac, this process isn't very well documented and didn't seem to work very well at all, so I quickly gave up.

The next thing I thought about was writing some tools to stick stuff in a database and then pull it out (automatically-ish) into Excel. This has the added benefit that I can store data locally and I'm not reliant on the back-data that a given API has, it also meant if the API only returned JSON and not CSV, I wasn't stuck as my import tool would take care of the translation.

Sadly, out the box, Excel only supports connections to Microsoft SQL Server which is prohibitively expensive to run as a service, and I don't feel like buying a Windows machine to run it one on either.

Fortunately, Excel provides the ability to connect to databases over ODBC.

Unfortunately Microsoft only officially supports two ODBC drivers both of which cost more than I really want to pay for a feature that should come out of the box.

However with a bit of work[tm] we can coax the MySQL ODBC connector into working with Excel. I say a bit of work, because, it's not very well documented and there are several Mac specific gotchas. I actually initially tried to use SQLite3 as a database (I'm not churning that much data yet, and it's pretty portable) but I couldn't for the life of me get it working - so MySQL it is.

To hopefully save a someone else (or perhaps just my future self...) some time, I've tried to document as best I possibly can the steps I took to get it working.

How-to

The first things are are going to need are unixodbc and mysql we're going to use Homebrew to install these (if you haven't already got Homebrew installed on your system, go do so, it'll make your life orders of magnitude easier as a Mac User/Developer).

Once Homebrew is installed the commands below will get you up and running pretty quickly. It's really worth setting a password on MySQL and locking it down a bit - I'm doing this on a laptop that I travel around a lot with, and I don't really want to get hacked via MySQL when I connect to some Airport wifi.

The next thing we'll want to do is create some example data, we can enter some data to play with from the command line as follows. (example also includes the output from the commands)


Ok, great progress, now the specific tricks to get excel to talk to it (note, the order of software installation is important for some of these - I found that out the hard way!)

The first thing we need to install is the iODBC Tools this is pretty straight forward, just click the link at the top for the stable version for OS X and download the dmg file (yes it really is called mxkozzzz.dmg) - just install this like any other OS X package.

The next bit of magic we need is the MySQL ODBC connector. Again, just download and install, nothing too exotic.

You'd be forgiven at this point for thinking "Well, I've got all the software, I just need to configure it now, right?" Well, I thought that too, and this is where I wasted hours.

It turns out that Excel is a sandboxed app which is great from a security perspective as Excel (and it's Macros) can't access system components and cause true havoc - however, it complicates our life, as the MySQL connector installs into a system location which Excel can't access, and as you can probably tell from all the third party software we are installing - the error message Excel does throw when this fails miserably is less than helpful.

Thankfully the solution is really simple - we just copy the library to a location that Excel can access


Tada!

Update: 14th June 2018 - at some point I installed some updates to OS X which appears to have broken some dynamic library loading the following commands below are now also required. This is probably a horrible hack, but I already learned more about dynamic library linking in OS X than I ever wanted to in figuring this one out. This doesn't quite seem to fully solve the problem, Excel freaks out and provides zero useful info as to why. Sigh.



Now we can finally configure the thing and get Excel talking to it!

The first thing we need to do is open the 'iODBC Administrator64' app (just use spotlight to do this) - do make sure it's the 64bit version though, if you use the normal 'iODBC Administrator' app (note, no 64) bad things will happen. Probably.

You then want to click the "ODBC Drivers" screen. You'll likely find that you already have two MySQL ODBC Drivers listed - but guess what? These don't work properly (partly because we need the driver in another location) so we need to add our own.


First up, click the "Add a driver" button and you'll get a screen like the one below, the only real trick here is the make sure the Driver file name is the same as the one you moved earlier - i.e. /Library/ODBC/libmyodbc8w.so give it a description and click "ok" - with any luck, this will have added the driver.


The next thing we want to do is click on the "System DSN" tab and then "Add" to add a new source.
Then we select the MySQL Driver we set up earlier

Then we fill in the details of our database - note, it's best to use 127.0.0.1 for the server name, I tried with localhost, and it seems there are sometimes some issues with Excel/ODBC/whatever resolving localhost properly - it's solvable, but just didn't seem worth the effort when I can type the IP.

If everything has gone well, we can now see this newly added source in our list and we are done with iODBC Administrator.

Next open up Excel and a new blank work book. From here, we want to click on the "Data" tab and then "New Database Query" > "From Database"


Click on "System DSN" then select the Excel Demo DSN we set up earlier and hit OK

Depending on how you set up MySQL it might ask you for a password, so just supply it with the credentials you used earlier.

We should now be connected to our a database and able to pull data from it! Expand the tree view on the left and select the 'prices' table, this will also auto generate some SQL for you. 

Then hit the "Return Data" button and follow the instructions. And voila! We have our database data in excel!


Now, to really get an idea of how this is cool, we can add some more data into MySQL by doing an insert statement similar to what we did earlier and then clicking "Refresh All" in the "Data" tab in Excel. You'll find the data updates to reflect what is in the database.

In the next few posts, I'll detail how to automatically fetch data of interest into our database so stay tuned! 







Comments

Popular posts from this blog

Sentiment Analysis with Tiingo and Google

Building an (somewhat) automated analysis suite on a Mac Part 2