So on Muskegon, my playspace for tech stuff, I now have:
- Hadoop 3.0.0-alpha4 in pseudo-distributed mode using YARN.
- Spark 2.2.0 for Hadoop 2.7.0
- TensorFlow
- RStudio
- Jupyter
Missing from the stack are:
- Hive (I think I should wait for the stable Hadoop 3.0.0 release)
- Neo4J
I’d also like to repeat the Hadoop install and then extend it to a two-node cluster.
I’d also like to play with Docker.
I think for now, I’m going to focus on exploring Spark even without the full big-data infrastructure; that way I can get familiar with the API more generally. Second priority is using RStudio and Spark together to tackle some of the Kaggle problems and data I’ve espied.
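As a starting point, here’s a minimal sketch of what that local exploration might look like: a PySpark session running in `local[*]` mode, so no HDFS or YARN is needed. The `train.csv` filename and the `label` column are just placeholders standing in for whatever Kaggle dataset I end up grabbing.

```python
# Minimal local-mode Spark session for API exploration (no cluster required).
# Assumes Spark 2.2.0 with pyspark available; file/column names are placeholders.
from pyspark.sql import SparkSession

# local[*] runs Spark on all cores of this one machine.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("spark-api-exploration")
         .getOrCreate())

# Read a CSV straight from the local filesystem (hypothetical Kaggle download).
df = spark.read.csv("train.csv", header=True, inferSchema=True)

# Poke at the DataFrame API: schema, a quick aggregation, and a SQL query.
df.printSchema()
df.groupBy("label").count().show()   # assumes the dataset has a 'label' column

df.createOrReplaceTempView("train")
spark.sql("SELECT COUNT(*) AS n FROM train").show()

spark.stop()
```

The same session works just as well inside a Jupyter notebook, which is probably where most of this poking around will happen.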
While doing this, I’ll keep an eye on when the stable Hadoop 3.0.0 release emerges and look for an opportunity to get Neo4J or another similar system up and running.
