Вот одно из решений:
Input Data:
+---+---+---+---+---+---+---+---+---+
|a1 |b1 |c1 |d1 |e1 |a2 |b2 |c2 |d2 |
+---+---+---+---+---+---+---+---+---+
|24 |74 |74 |21 |66 |65 |100|27 |19 |
+---+---+---+---+---+---+---+---+---+
Сжатие столбцов для удаления несоответствующих столбцов:
val oneCols = data.schema.filter(_.name.contains("1")).map(x => x.name).sorted
val twoCols = data.schema.filter(_.name.contains("2")).map(x => x.name).sorted
val cols = oneCols.zip(twoCols)
//cols: Seq[(String, String)] = List((a1,a2), (b1,b2), (c1,c2), (d1,d2))
Использование функции foldLeft для динамического добавления столбцов:
import org.apache.spark.sql.functions._
val result = cols.foldLeft(data)((data,c) => data.withColumn(s"Diff_${c._1}",
(col(s"${lit(c._2)}") - col(s"${lit(c._1)}"))/col(s"${lit(c._2)}")))
Вот результат:
result.show(false)
+---+---+---+---+---+---+---+---+---+------------------+-------+-------------------+--------------------+
|a1 |b1 |c1 |d1 |e1 |a2 |b2 |c2 |d2 |Diff_a1 |Diff_b1|Diff_c1 |Diff_d1 |
+---+---+---+---+---+---+---+---+---+------------------+-------+-------------------+--------------------+
|24 |74 |74 |21 |66 |65 |100|27 |19 |0.6307692307692307|0.26 |-1.7407407407407407|-0.10526315789473684|
+---+---+---+---+---+---+---+---+---+------------------+-------+-------------------+--------------------+