Running Hadoop or Spark locally can be a nightmare. There are many parameters to tweak, and Java is slow, memory inefficient and the JVM imposes heap caps.
In this talk, I present an efficient MR framework, GoMR, that beats Spark on the canonical wordcount app by an order of magnitude.
Frustrated trying to run/debug MapReduce applications on my laptop, I decided to spin my own MR framework to solve the problems of efficiency, configuration, and error messages. Go is a compiled language and has inherent support for distribution in the form of channels. I took advantage of these two strengths to create GoMR.
In this talk, I give an overview of the framework, the programming style I used to create it, and an evaluation of the framework against a current state of the art system, Apache Spark.
I hope this work will encourage a move away from the cumbersome JVM, and spark a next-generation of efficient distributed compute frameworks.