GoMR: A MapReduce Framework for Go

Date: May 26, 2020

More information here and here.
Slides
Code

Abstract

Running Hadoop or Spark locally can be a nightmare. There are many parameters to tweak, and Java is slow, memory inefficient and the JVM imposes heap caps.

In this talk, I present an efficient MR framework, GoMR, that beats Spark on the canonical wordcount app by an order of magnitude.

Description

Frustrated trying to run/debug MapReduce applications on my laptop, I decided to spin my own MR framework to solve the problems of efficiency, configuration, and error messages. Go is a compiled language and has inherent support for distribution in the form of channels. I took advantage of these two strengths to create GoMR.

In this talk, I give an overview of the framework, the programming style I used to create it, and an evaluation of the framework against a current state of the art system, Apache Spark.

A more complete description can be found here, and its evaluation here.

I hope this work will encourage a move away from the cumbersome JVM, and spark a next-generation of efficient distributed compute frameworks.

Connor Zanin

Abstract

Description