If you can live with a directory structure like
tablename/date=2019-08-12
tablename/date=2019-08-13
instead, then DataFrameWriter.partitionBy does the trick. For example:
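import java.sql.Timestamp
import org.apache.spark.sql.functions.to_date
import spark.implicits._ // toDF and the $"..." syntax; already in scope in spark-shell

val df = Seq(
  (Timestamp.valueOf("2019-06-01 12:00:00"), 1),
  (Timestamp.valueOf("2019-06-01 12:00:01"), 2),
  (Timestamp.valueOf("2019-06-02 12:00:00"), 3)
).toDF("time", "foo")

// Derive a date column from the timestamp and use it as the partition key.
df.withColumn("date", to_date($"time"))
  .write
  .partitionBy("date")
  .format("avro") // needs the spark-avro module on the classpath (Spark 2.4+)
  .save("/tmp/foo")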
which produces the following structure:
find /tmp/foo
/tmp/foo
/tmp/foo/._SUCCESS.crc
/tmp/foo/date=2019-06-01
/tmp/foo/date=2019-06-01/.part-00000-2a7a63f2-7038-4aec-8f76-87077f91a415.c000.avro.crc
/tmp/foo/date=2019-06-01/part-00000-2a7a63f2-7038-4aec-8f76-87077f91a415.c000.avro
/tmp/foo/date=2019-06-01/.part-00001-2a7a63f2-7038-4aec-8f76-87077f91a415.c000.avro.crc
/tmp/foo/date=2019-06-01/part-00001-2a7a63f2-7038-4aec-8f76-87077f91a415.c000.avro
/tmp/foo/_SUCCESS
/tmp/foo/date=2019-06-02
/tmp/foo/date=2019-06-02/part-00002-2a7a63f2-7038-4aec-8f76-87077f91a415.c000.avro
/tmp/foo/date=2019-06-02/.part-00002-2a7a63f2-7038-4aec-8f76-87077f91a415.c000.avro.crc
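Reading the data back is the other half of this layout. The sketch below assumes the write above has already populated /tmp/foo and that the spark-avro module is available. The names `all` and `june1` are only illustrative. Loading the base path lets Spark discover the date=... directories and expose `date` as an ordinary column, and a filter on that column should let it skip the directories that do not match.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Reuses the active session in spark-shell, otherwise starts a local one.
val spark = SparkSession.builder()
  .appName("read-partitioned-avro")
  .master("local[*]")
  .getOrCreate()

// Load from the base path (not a date=... subdirectory) so that
// partition discovery adds the `date` column back to the schema.
val all = spark.read.format("avro").load("/tmp/foo")

// Filter on the partition column; only matching directories need to be scanned.
val june1 = all.where(col("date") === "2019-06-01")

june1.show()
```

With the sample data above, `june1` should contain the two rows whose `time` falls on 2019-06-01. Note that loading a single subdirectory such as /tmp/foo/date=2019-06-01 directly would drop the `date` column, because its value lives only in the directory name.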