I have an RDD of type RDD[String]; for example, here is part of it:
1990,1990-07-08
1994,1994-06-18
1994,1994-06-18
1994,1994-06-22
1994,1994-06-22
1994,1994-06-26
1994,1994-06-26
1954,1954-06-20
2002,2002-06-26
1954,1954-06-23
2002,2002-06-29
1954,1954-06-16
2002,2002-06-30
...
Result: (1982,52) (2006,64) (1962,32) (1966,32) (1986,52) (2002,64) (1994,52) (1974,38) (1990,52) (2010,64) (1978,38) (1954,26) (2014,64) (1958,35) (1998,64) (1970,32)
I can group it fine, but my problem is the v.size part: I do not know how to calculate that length.
Just to put it in perspective, here are expected results:
It is not a mistake that 2002 appears twice, but ignore that.
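
To make the goal concrete, here is a minimal sketch of counting lines per year, assuming each line has the form "year,date" and that every line counts once (duplicates included). The object name `CountPerYear`, the local master setting, and the tiny sample data are placeholders for illustration, not my real job. It uses `reduceByKey` instead of grouping and then taking the size, which should give the same counts without building the per-year collections.

```scala
import org.apache.spark.sql.SparkSession

object CountPerYear {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use its own configuration.
    val spark = SparkSession.builder()
      .appName("CountPerYear")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample in the same "year,date" format as the data above.
    val rdd = sc.parallelize(Seq(
      "1990,1990-07-08",
      "1994,1994-06-18",
      "1994,1994-06-18",
      "1954,1954-06-20",
      "2002,2002-06-26"
    ))

    // Key each line by its year, emit 1 per line, then sum the ones per key.
    val countsPerYear = rdd
      .map(line => (line.split(",")(0), 1))
      .reduceByKey(_ + _)

    // Result order is not guaranteed.
    countsPerYear.collect().foreach(println)

    spark.stop()
  }
}
```

If grouping is preferred, something like `rdd.groupBy(_.split(",")(0)).mapValues(_.size)` should be equivalent, since each group is an `Iterable[String]` and `size` is defined on it.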