reduceByKey in Spark

The reduceByKey function works only on pair RDDs, i.e. RDDs whose elements are key-value tuples. It is a transformation, which means it is lazily evaluated. It takes an associative (and commutative) function as a parameter, applies that function to the values of the source RDD, and produces a new RDD of key-value pairs in which the values for each key have been merged. It is a wide transformation, because merging the values for a key may shuffle data across partitions.

The following video explains reduceByKey briefly along with an example.

Example from the video:

// Build a pair RDD of (key, 1) tuples across 3 partitions
val x = sc.parallelize(Array(("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1)), 3)

// Sum the values for each key
val y = x.reduceByKey((accum, n) => accum + n)

y.collect
// Array((a,3), (b,5)) -- the ordering of the pairs is not guaranteed
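As a further sketch (assuming an existing SparkContext named sc, as in the snippet above), reduceByKey is the standard way to implement a word count. The passed-in function only needs to be associative and commutative, because Spark pre-aggregates within each partition (a map-side combine) before shuffling the partial results across the network:

```scala
// Hypothetical word-count sketch; `sc` is assumed to be an existing SparkContext
val words = sc.parallelize(Seq("spark", "scala", "spark", "rdd"), 2)

// Map each word to a (word, 1) pair, then sum the counts per key.
// Spark combines values within each partition first, then merges
// the partial sums after the shuffle -- hence the associativity requirement.
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

counts.collect()  // e.g. Array((spark,2), (scala,1), (rdd,1)) -- order not guaranteed
```

Because of this map-side combine, reduceByKey typically shuffles far less data than grouping all values for a key and summing them afterwards.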
