Вот один подход в Spark> = 1.5 с использованием встроенной функции size
:
from pyspark.sql import Row
from pyspark.sql.functions import size
df = spark.createDataFrame([Row(a=[9, 3, 4], b=[8,9,10]),Row(a=[7, 2, 6, 4], b=[2,1,5]), Row(a=[7, 2, 4], b=[8,2,1,5]), Row(a=[2, 4], b=[8,2,10,12,20])])
df.where(size(df['a']) <= 3).show()
Выход:
+---------+------------------+
| a| b|
+---------+------------------+
|[9, 3, 4]| [8, 9, 10]|
|[7, 2, 4]| [8, 2, 1, 5]|
| [2, 4]|[8, 2, 10, 12, 20]|
+---------+------------------+