You can use `input_file_name` together with `reduce` and `union`:
from functools import reduce
from pyspark.sql import functions as F

paths = ['first', 'second', 'third']  # your paths here
# Use a fixed column name so every DataFrame has the same schema for the union
dataframes = [spark.read.parquet(p).withColumn('source_file', F.input_file_name()) for p in paths]
# union() combines two DataFrames at a time, so fold the whole list with reduce
result = reduce(lambda x, y: x.union(y), dataframes)