Finding the sum in a Spark Scala DataFrame with delimited values
0 votes
/ August 5, 2020

I have the DataFrame below.

// assumes a SparkSession named spark with spark.implicits._ imported (needed for toDF)
val df1 = Seq(
  ("1_2_3", "5_10"),
  ("4_5_6", "15_20")
).toDF("c1", "c2")

+-----+-----+
|   c1|   c2|
+-----+-----+
|1_2_3| 5_10|
|4_5_6|15_20|
+-----+-----+

How can I get the sum in a separate column, subject to these conditions -

- Skip the third value after the delimiter '_' in the first column.
- Add the values of the two columns position by position, i.e. drop '_3' and '_6' from 1_2_3 and 4_5_6, then add 1 + 5 and 2 + 10 for the first row. Likewise, add 4 + 15 and 5 + 20 for the second row.

Expected result -

+-----+-----+-----+
|   c1|   c2|  res|
+-----+-----+-----+
|1_2_3| 5_10| 6_12|
|4_5_6|15_20|19_25|
+-----+-----+-----+

1 Answer

1 vote
/ August 5, 2020

Try this -

zip_with + split

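    val df1 = Seq(
      ("1_2_3", "5_10"),
      ("4_5_6", "15_20")
    ).toDF("c1", "c2")
    df1.show(false)

    // zip_with pads the shorter array with nulls, so the third value of c1
    // sums to null, and concat_ws then skips that null
    df1.withColumn("res",
      expr("concat_ws('_', zip_with(split(c1, '_'), split(c2, '_'), (x, y) -> cast(x+y as int)))"))
      .show(false)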

    /**
      * +-----+-----+-----+
      * |c1   |c2   |res  |
      * +-----+-----+-----+
      * |1_2_3|5_10 |6_12 |
      * |4_5_6|15_20|19_25|
      * +-----+-----+-----+
      */
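To see why the third value of c1 disappears without being removed explicitly, the behaviour can be modelled in plain Scala with no Spark dependency. This is only a sketch of the semantics, not Spark's implementation: the object name `ZipWithModel` and the method `sumDelimited` are hypothetical, `None` stands in for SQL NULL, and the inputs are assumed to contain only integers separated by '_'.

```scala
object ZipWithModel {
  // Parse one delimited field into integers, e.g. "1_2_3" -> Seq(1, 2, 3).
  private def parse(s: String): Seq[Int] = s.split("_").toSeq.map(_.toInt)

  // Pad the shorter side with None (standing in for NULL), add position by position,
  // and keep only the positions where both sides are present, which mirrors
  // concat_ws skipping NULL entries.
  def sumDelimited(a: String, b: String): String = {
    val left: Seq[Option[Int]] = parse(a).map(Option(_))
    val right: Seq[Option[Int]] = parse(b).map(Option(_))
    left
      .zipAll(right, Option.empty[Int], Option.empty[Int])
      .collect { case (Some(x), Some(y)) => x + y }
      .mkString("_")
  }
}
```

For example, `ZipWithModel.sumDelimited("1_2_3", "5_10")` returns `"6_12"`, matching the first row of the output above.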

Update: doing the same dynamically for 50 columns

    val end = 51 // columns c1..c50
    val df = spark.sql("select '1_2_3' as c1")
    // copy c1 into c2..c50 to build a 50-column test frame
    val new_df = Range(2, end).foldLeft(df) { (acc, i) => acc.withColumn(s"c$i", $"c1") }
    new_df.show(false)
    /**
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      * |c1   |c2   |c3   |c4   |c5   |c6   |c7   |c8   |c9   |c10  |c11  |c12  |c13  |c14  |c15  |c16  |c17  |c18  |c19  |c20  |c21  |c22  |c23  |c24  |c25  |c26  |c27  |c28  |c29  |c30  |c31  |c32  |c33  |c34  |c35  |c36  |c37  |c38  |c39  |c40  |c41  |c42  |c43  |c44  |c45  |c46  |c47  |c48  |c49  |c50  |
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      * |1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      */
    // seed res with c1, then fold c2..c50 into it one column at a time
    val res = new_df.withColumn("res", $"c1")
    Range(2, end).foldLeft(res) { (acc, i) =>
      acc.withColumn("res",
        expr(s"concat_ws('_', zip_with(split(res, '_'), split(c$i, '_'), (x, y) -> cast(x+y as int)))"))
    }.show(false)
    /**
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
      * |c1   |c2   |c3   |c4   |c5   |c6   |c7   |c8   |c9   |c10  |c11  |c12  |c13  |c14  |c15  |c16  |c17  |c18  |c19  |c20  |c21  |c22  |c23  |c24  |c25  |c26  |c27  |c28  |c29  |c30  |c31  |c32  |c33  |c34  |c35  |c36  |c37  |c38  |c39  |c40  |c41  |c42  |c43  |c44  |c45  |c46  |c47  |c48  |c49  |c50  |res       |
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
      * |1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|50_100_150|
      * +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
      */
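The accumulation pattern above (seed the result with the first column, then fold each remaining column into it) can be checked on a single row in plain Scala. This is a rough sketch under stated assumptions: `FoldColumns`, `addDelimited` and `sumRow` are hypothetical names, the row is assumed non-empty, and Scala's `zip` truncates to the shorter side instead of padding with nulls, which yields the same result for this data.

```scala
object FoldColumns {
  // Position-by-position sum of two '_'-delimited strings, truncated to the shorter one.
  def addDelimited(acc: String, next: String): String =
    acc.split("_").zip(next.split("_"))
      .map { case (x, y) => x.toInt + y.toInt }
      .mkString("_")

  // Seed the accumulator with the first column and fold the rest into it.
  // Assumes columns is non-empty.
  def sumRow(columns: Seq[String]): String =
    columns.tail.foldLeft(columns.head)(addDelimited)
}
```

For example, `FoldColumns.sumRow(Seq.fill(50)("1_2_3"))` returns `"50_100_150"`, the same value as the res column in the output above.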
...