Spent a day or two this weekend working through a credit-card fraud example to give Keras a try from RStudio. Along the way I dug into virtualenv, which I hadn't used before.
The example I was working with came from here:
https://tensorflow.rstudio.com/blog/keras-autoencoder.html
The same site had some other blogs that looked interesting. In the example, you use a credit-card fraud data set (transactional, I think). From a distance the steps are pretty easy, and I've documented them via R Markdown. You first normalize the data so the values all run from 0 to 1. That got me acquainted with purrr and its map functions. The biggest challenge ended up being linking keras and tensorflow to RStudio. These are really Python packages, and R offers a way to link to them. However, you need to be using Anaconda or virtualenv. Although I use Anaconda at work quite a bit, I'm not eager to go to it at home just yet. I prefer to be more of a micromanager at this point, just because it helps me learn things at a deeper level. But I hadn't heard of virtualenv. It appears to be similar to the env functionality of Anaconda: you can create a 'local' (a.k.a. 'virtual') environment with a standalone Python binary and whatever packages you want, and it comes with its own pip installer as well. So I installed it,
apt-get install python-virtualenv
You create a virtual environment as follows:
virtualenv testvirtualenv
or
virtualenv --system-site-packages testvirtualenv
The second form lets the virtualenv access the system Python packages as well.
Now in RStudio, you do
install_keras(method = "virtualenv")
Problem of the day now occurs: RStudio proceeds to create its own virtual environment and then installs tensorflow and keras into it. During installation, it loads pip via an import statement and calls pip's main() function. However, as of pip 10.0 there is no longer a top-level main(), and the install fails. Solution: do the install via pip in my own virtual environment, then tell R which virtual environment to use:
virtualenv testvirtualenv
source testvirtualenv/bin/activate
easy_install -U pip
pip install --upgrade tensorflow
use_virtualenv('/home/steve/myvirtualenv')
I can’t remember if I also installed keras there.
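For what it's worth, the pip 10 change is easy to inspect from Python itself: the top-level entry point that R was calling isn't guaranteed to exist any more (some recent pip versions re-added a deprecated wrapper, so treat the result as informational):

```python
# Quick look at what the R installer was relying on: older pip exposed
# a top-level main() that tools could call after `import pip`; pip 10
# moved that machinery into pip._internal, which is what broke
# install_keras() for me.
import pip

print("pip version:", pip.__version__)
print("top-level pip.main present:", hasattr(pip, "main"))
```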
Afterwards, the rest of the keras autoencoder example went reasonably well, after accounting for some rookie mistakes on my part, some of which looked worse than they were until I figured out what was wrong. For example, in normalizing the data I had produced a dataset with no data, and I didn't realize it until much later when building the model. I need to check that things go as expected more aggressively. purrr offers a million map functions, which is what I'm using for normalization, and you have to be a little careful to pick the right one, not just one that doesn't throw an error.
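Since the empty-dataset mistake happened in exactly this step, here is the min-max normalization sketched in plain Python (purrr does the real work in the R version; the column names below are made up), with the sanity check that would have caught my bug early:

```python
# Min-max scaling: rescale each column so its values run from 0 to 1.
# The R version uses purrr's map functions; this is the same idea in
# plain Python with hypothetical column names.

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                     # constant column: avoid divide-by-zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

data = {
    "amount": [12.5, 99.0, 3.2, 250.0],      # made-up transaction features
    "time":   [0.0, 3600.0, 7200.0, 86400.0],
}
scaled = {col: min_max_scale(vals) for col, vals in data.items()}

# The check I should have done: make sure nothing came back empty.
assert scaled and all(len(v) > 0 for v in scaled.values()), "empty result!"
```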
A section at the end of the tutorial also talks about using CloudML and Google's cloud infrastructure to support tuning of 'hyperparameters' (the settings that drive the training of the net, e.g. the number of epochs). I hadn't approached Google's cloud infrastructure yet, but not surprisingly, it looks pretty neat too.
Still couldn't get a sensible AUC at the end, though.