Hadoop

Going through the steps to install hadoop in pseudo-distributed mode. The idea is to run on a single server but have it behave like a cluster. I’ve done it before, maybe three years ago, and want to do it again, this time aiming at playing with Apache Spark. Lots of other items need to be installed first:

  • docker: I was unsure at first, as there’s no typical deb package out there, but the Docker folks publish their own packages with good instructions. I had also found similar instructions at the linuxbabe site. Since they largely agree and no other search result ranked higher, I started with linuxbabe’s instructions and finished off with Docker’s (hers had what seemed to me a better set of apt-get prep steps and OS checks). First step, docker-ce, looks good. Neat. I could even use docker to download an ubuntu image and start up a shell within it, kind of like a VM. It was not clear to me whether I had to install docker-compose; in the end, I decided to go ahead. It can be pulled from GitHub and the install directions seem pretty clear.
  • maven
  • node.js: a pain to install. Ended up following the instructions at digitalocean/setup_6.x, which fetch a script that adds the package source and installs it.
  • libprotobuf9
  • bats
  • cmake
  • zlib
  • java 1.8 (used openjdk-8-jre and -jdk, which needed ca-certificates-java; this meant enabling jessie-backports, then apt-get install -t jessie-backports openjdk-8-jdk, ca-certificates-java, etc., forcing the use of the backports versions). Then run update-alternatives --config java to pick the java one wants.
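The Java step above is the fiddliest one, so here it is sketched end to end (a sketch assuming Debian jessie with the backports repository not yet enabled; exact repo URL and package set may differ on your box):

```shell
# Enable jessie-backports (assumed not already present in sources.list).
echo "deb http://ftp.debian.org/debian jessie-backports main" | \
  sudo tee /etc/apt/sources.list.d/jessie-backports.list
sudo apt-get update

# Pull openjdk-8 and its certificate dependency, forcing the
# backports versions with -t as described above.
sudo apt-get install -t jessie-backports \
  openjdk-8-jre openjdk-8-jdk ca-certificates-java

# Interactively pick which java the `java` command points at.
sudo update-alternatives --config java

# Sanity check: should report a 1.8.x version.
java -version
```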

Now run the start-build-env.sh script within the hadoop package. This relies on docker (and apparently installs docker if needed). Looking at the BUILDING.txt file that ships with hadoop, it’s a little murky which steps are needed and how well I need to know maven, or whether I can mostly cut-and-paste…
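For the cut-and-paste route, the common path through BUILDING.txt looks roughly like this (a sketch; the -Pdist/-Dtar flags are the ones BUILDING.txt documents for a binary distribution, skipping tests to save time):

```shell
# From the top of the hadoop source tree: start the Docker-based build
# environment, which drops you into a container with maven, protobuf,
# cmake, etc. already set up.
./start-build-env.sh

# Inside the container: build a binary distribution without running
# the test suite; -Dtar additionally produces a tarball under
# hadoop-dist/target/.
mvn package -Pdist -DskipTests -Dtar
```

The nice part is that the container means none of the maven/protobuf versioning has to be right on the host; only docker itself does.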
