Check whether this helps:
1. Load the test data
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
import spark.implicits._ // needed for toDS(); assumes a SparkSession named `spark`

val data =
"""
|2018-04-07 07:07:17
|2018-04-07 07:32:27
|2018-04-07 08:36:44
|2018-04-07 08:38:00
|2018-04-07 08:39:29
|2018-04-08 01:43:08
|2018-04-08 01:43:55
|2018-04-09 07:52:31
|2018-04-09 07:52:42
|2019-01-24 11:52:31
|2019-01-24 12:52:42
|2019-01-25 12:52:42
""".stripMargin
// Read the lines as a single-column CSV with a timestamp schema
val df = spark.read
  .schema(StructType(Array(StructField("startDate", DataTypes.TimestampType))))
  .csv(data.split(System.lineSeparator()).toSeq.toDS())
df.show(false)
df.printSchema()
Output:
+-------------------+
|startDate          |
+-------------------+
|2018-04-07 07:07:17|
|2018-04-07 07:32:27|
|2018-04-07 08:36:44|
|2018-04-07 08:38:00|
|2018-04-07 08:39:29|
|2018-04-08 01:43:08|
|2018-04-08 01:43:55|
|2018-04-09 07:52:31|
|2018-04-09 07:52:42|
|2019-01-24 11:52:31|
|2019-01-24 12:52:42|
|2019-01-25 12:52:42|
+-------------------+
root
|-- startDate: timestamp (nullable = true)
2. Create a filter column based on the current date
import org.apache.spark.sql.functions._

val filterCOl = (currentDate: String) => {
  val current = to_date(lit(currentDate))
  when(dayofmonth(current) === 1,
    // First day of the month: keep the previous calendar month.
    // add_months also handles January (e.g. 2019-01-01 -> 2018-12).
    date_format(col("startDate"), "yyyy-MM") === date_format(add_months(current, -1), "yyyy-MM")
  ).otherwise(
    // Otherwise: keep the first of the current month through the current date.
    to_date(col("startDate")).between(trunc(current, "month"), current))
}
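The rule this filter encodes can be stated without Spark: on the first day of a month, keep the whole previous month; on any other day, keep the first of the current month through the current date. Below is a minimal plain-Scala sketch of that rule using `java.time`. `DateWindow` and `windowFor` are hypothetical names introduced for illustration only; they are not part of the original answer.

```scala
import java.time.LocalDate

object DateWindow {
  // Hypothetical helper: returns the inclusive (start, end) dates to keep
  // for a given current date, following the rule described above.
  def windowFor(current: LocalDate): (LocalDate, LocalDate) =
    if (current.getDayOfMonth == 1) {
      // First of the month: the window is the entire previous month.
      val prev = current.minusMonths(1)
      (prev, prev.withDayOfMonth(prev.lengthOfMonth))
    } else {
      // Any other day: from the first of this month through the current date.
      (current.withDayOfMonth(1), current)
    }
}
```

For example, `DateWindow.windowFor(LocalDate.parse("2018-05-01"))` gives `(2018-04-01, 2018-04-30)`, and `DateWindow.windowFor(LocalDate.parse("2018-04-08"))` gives `(2018-04-01, 2018-04-08)`.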
3. Check when the current date falls in the middle of a month
var currentDateStr = "2018-04-08"
df.filter(filterCOl(currentDateStr)).show(false)
Output:
+-------------------+
|startDate          |
+-------------------+
|2018-04-07 07:07:17|
|2018-04-07 07:32:27|
|2018-04-07 08:36:44|
|2018-04-07 08:38:00|
|2018-04-07 08:39:29|
|2018-04-08 01:43:08|
|2018-04-08 01:43:55|
+-------------------+
4. Check when the current date is the first day of the month
currentDateStr = "2018-05-01"
df.filter(filterCOl(currentDateStr)).show(false)
Output:
+-------------------+
|startDate          |
+-------------------+
|2018-04-07 07:07:17|
|2018-04-07 07:32:27|
|2018-04-07 08:36:44|
|2018-04-07 08:38:00|
|2018-04-07 08:39:29|
|2018-04-08 01:43:08|
|2018-04-08 01:43:55|
|2018-04-09 07:52:31|
|2018-04-09 07:52:42|
+-------------------+
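The two checks above do not cover a current date of January 1, where the previous month falls in the previous year. The following is a minimal sketch of such a check, under stated assumptions: `YearBoundaryCheck` and `windowFilter` are hypothetical names, the window bounds are built here with `trunc`, `add_months`, and `date_sub` as one possible formulation of the same rule, and a local SparkSession is created only for the example.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{add_months, col, date_sub, lit, to_date, trunc, when}

object YearBoundaryCheck {
  // Hypothetical helper: the same rule, expressed as an inclusive date range on startDate.
  def windowFilter(currentDate: String): Column = {
    val current = to_date(lit(currentDate))
    val monthStart = trunc(current, "month")
    val isFirstDay = current === monthStart
    // On the first of the month the window is the whole previous month;
    // otherwise it runs from the first of this month through the current date.
    val start = when(isFirstDay, add_months(monthStart, -1)).otherwise(monthStart)
    val end = when(isFirstDay, date_sub(monthStart, 1)).otherwise(current)
    to_date(col("startDate")).between(start, end)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, used only for this check.
    val spark = SparkSession.builder().master("local[1]").appName("year-boundary-check").getOrCreate()
    import spark.implicits._

    // Illustrative rows around the year boundary (not from the original test data).
    val df = Seq("2018-11-30 10:00:00", "2018-12-01 00:00:00", "2018-12-31 23:59:59", "2019-01-01 00:00:00")
      .toDF("raw")
      .select(col("raw").cast("timestamp").as("startDate"))

    // With 2019-01-01 as the current date, only the December 2018 rows should remain.
    df.filter(windowFilter("2019-01-01")).show(false)
    spark.stop()
  }
}
```

Because `add_months` and `date_sub` operate on dates and not on separately formatted year and month parts, the year rollover needs no special handling in this formulation.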