I am new to Spark and Scala and I have solved the following problem. I have a table in a database with the following structure:
id  name   eid  color
1   John   S1   green
2   Shaun  S2   red
3   Shaun  S2   green
4   Shaun  S2   green
5   John   S1   yellow
Now I want to know how many times each person appears with each color (red, green, or yellow). The result should look like this:
name   red  yellow  green
John   0    1       1
Shaun  1    0       2
I have written the code below and it solves the problem, but I am not sure whether this is the best way to do it. I think my code is too long for such a small problem and that it could be written more concisely and in line with best practice. I would appreciate some guidance.
import org.apache.spark.sql.Row

// Key every row by eid and collect all rows of the same eid into one list.
val rdd = df.rdd.map {
  case Row(id: Int, name: String, eid: String, color: String) =>
    (eid, List((id, name, eid, color)))
}.reduceByKey(_ ++ _)

// Count each color per eid and keep a single (name, counts) entry per key.
val result = rdd.map {
  case (key, list) =>
    val red = list.count(p => p._4 == "red")
    val yellow = list.count(p => p._4 == "yellow")
    val green = list.count(p => p._4 == "green")
    val newList = list.map(x => (x._2, red, yellow, green))
    (key, newList.take(1))
}.flatMap {
  case (eid, list) =>
    list.map {
      case (name, red, yellow, green) =>
        (eid, name, red, yellow, green)
    }
}

// Needed for the toDF conversion below.
import SparkConfig.sc.sqlContext.implicits._
val rDf = result.toDF("eid", "name", "red", "yellow", "green")
rDf.show()
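
For comparison, here is a minimal sketch of the kind of shorter version I suspect is possible with the DataFrame API, using groupBy with pivot instead of going through the RDD. It rests on a few assumptions: it uses a local SparkSession (Spark 2.x or later) in place of my real SparkConfig setup, the set of colors is known up front, and the names ColorCounts, countColors and colors are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColorCounts {
  // Assumption: the colors are known in advance. Listing them fixes the
  // column order and saves Spark a pass to discover the distinct values.
  val colors: Seq[String] = Seq("red", "yellow", "green")

  // One row per (eid, name) and one count column per color.
  // Colors that never occur for a person come back as null, so fill with 0.
  def countColors(df: DataFrame): DataFrame =
    df.groupBy("eid", "name")
      .pivot("color", colors)
      .count()
      .na.fill(0)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the real cluster setup.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("color-counts")
      .getOrCreate()
    import spark.implicits._

    // The sample table from the question.
    val df = Seq(
      (1, "John", "S1", "green"),
      (2, "Shaun", "S2", "red"),
      (3, "Shaun", "S2", "green"),
      (4, "Shaun", "S2", "green"),
      (5, "John", "S1", "yellow")
    ).toDF("id", "name", "eid", "color")

    countColors(df).orderBy("eid").show()
    spark.stop()
  }
}
```

On the sample data this should give the counts from the expected table above (John: 0 red, 1 yellow, 1 green; Shaun: 1 red, 0 yellow, 2 green), with the eid column in front. Unlike the RDD version, it never builds a list of all rows per key, which I assume matters once the table gets large.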