I'm trying to implement a dot product using PySpark in order to learn its syntax.
I've currently implemented the dot product like so:
import operator as op
from functools import reduce

def inner(rdd, rdd2):
    return (rdd.zip(rdd2)
               .map(lambda x: reduce(op.mul, x))
               .reduce(lambda x, y: x + y)
            )
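For reference, here is a plain-Python sketch of what the pipeline above computes, using hypothetical sample data so it runs without a SparkContext (the list operations stand in for the corresponding RDD operations):

```python
import operator as op
from functools import reduce

# Hypothetical sample vectors standing in for the two RDDs.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

pairs = list(zip(a, b))                        # analogue of rdd.zip(rdd2)
products = [reduce(op.mul, p) for p in pairs]  # analogue of .map(lambda x: reduce(op.mul, x))
dot = reduce(lambda x, y: x + y, products)     # analogue of .reduce(lambda x, y: x + y)
# dot == 32.0  (1*4 + 2*5 + 3*6)
```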
My solution feels inelegant (particularly the lambda functions). I'd like to know whether there is a more 'pysparkian' way of writing this.
Furthermore, are there performance considerations I should be thinking about with this approach (i.e. does my dot product solution scale poorly)?