Use boolean indexing, filtering with isin:
out = df.loc[df['col1'].isin(anyOfThese), 'gr'].unique()
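Here is a minimal self-contained sketch of the isin filter on a small DataFrame. The data and value list are made up for illustration. The mask keeps the rows whose col1 is in the list, and unique() returns each matching gr once, in order of first appearance.

```python
import pandas as pd

# Hypothetical toy data: groups 0..2, two rows each
df = pd.DataFrame({'gr': [0, 0, 1, 1, 2, 2],
                   'col1': [5, 50, 7, 8, 60, 9]})
anyOfThese = [50, 60]

# Boolean mask: True where col1 equals any value in anyOfThese
mask = df['col1'].isin(anyOfThese)
# Take gr for the matching rows and drop duplicate labels
out = df.loc[mask, 'gr'].unique()
print(out)  # [0 2]
```

Group 1 has no value from the list, so it does not appear in the result.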
Or check membership with numpy.in1d:
out = df.loc[np.in1d(df['col1'], anyOfThese), 'gr'].unique()
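Note that NumPy 2.0 deprecated np.in1d in favour of np.isin, which also works on arrays of any shape. The sketch below applies the same filter with np.isin, assuming NumPy 1.13 or later (np.isin was added in that release). The toy data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'gr': [0, 0, 1, 1, 2, 2],
                   'col1': [5, 50, 7, 8, 60, 9]})
anyOfThese = np.array([50, 60])

# np.isin on the underlying NumPy array returns a plain boolean ndarray
mask = np.isin(df['col1'].to_numpy(), anyOfThese)
out = df.loc[mask, 'gr'].unique()

# Same result as the pandas-level Series.isin filter
assert (out == df.loc[df['col1'].isin(anyOfThese), 'gr'].unique()).all()
print(out)  # [0 2]
```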
Timings:
import numpy as np
import pandas as pd

np.random.seed(218)
gr = []
for i in range(12000):
    gr.extend([i] * 2)  # each group label appears in two consecutive rows
np.random.seed(0)
df = pd.DataFrame({'gr': gr,
                   'col1': np.random.choice(200, 24000)})
anyOfThese = np.array([50, 60])  # randomly chosen

# Per-group check with a Python lambda (slow)
a = df[df.groupby('gr')['col1']
         .transform(lambda x: np.any(np.in1d(np.array(x), anyOfThese)))
         .astype(bool)].gr.unique()
# Row-level filter (fast)
out = df.loc[df['col1'].isin(anyOfThese), 'gr'].unique()
print((a == out).all())  # True
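The slow approach asks, for each group, whether any of its rows matches. As an alternative sketch (not one of the timed variants above), the same question can be answered without a Python lambda by reducing the row mask per group with groupby().any(). Because groupby sorts keys by default, the labels come back sorted; here that matches the order of appearance because gr is already ascending. np.repeat is used as a shorter equivalent of the loop that builds gr.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
gr = np.repeat(np.arange(12000), 2)  # same labels as the extend() loop
df = pd.DataFrame({'gr': gr, 'col1': np.random.choice(200, 24000)})
anyOfThese = np.array([50, 60])

# Row-level mask, then one boolean per group: does any row match?
has_match = df['col1'].isin(anyOfThese).groupby(df['gr']).any()
groups = has_match[has_match].index.to_numpy()

# Row-filter approach from the answer
out = df.loc[df['col1'].isin(anyOfThese), 'gr'].unique()
print(np.array_equal(groups, out))  # True
```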
In [314]: %timeit df[df.groupby('gr')['col1'].transform(lambda x: np.any(np.in1d(np.array(x), anyOfThese))).astype(bool)].gr.unique()
2.9 s ± 79.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [315]: %timeit df.loc[df['col1'].isin(anyOfThese), 'gr'].unique()
746 µs ± 32.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [316]: %timeit df.loc[np.in1d(df['col1'], anyOfThese), 'gr'].unique()
325 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
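%timeit is an IPython magic. Outside IPython, the standard-library timeit module gives a comparable measurement. The sketch below rebuilds the same data and times the two fast variants. It substitutes np.isin for the deprecated np.in1d, so its second variant is not exactly the one timed above. The function names are hypothetical, and absolute numbers depend on the machine and library versions, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'gr': np.repeat(np.arange(12000), 2),
                   'col1': np.random.choice(200, 24000)})
anyOfThese = np.array([50, 60])

def with_series_isin():
    return df.loc[df['col1'].isin(anyOfThese), 'gr'].unique()

def with_np_isin():
    # np.isin substituted for np.in1d (deprecated in NumPy 2.0)
    return df.loc[np.isin(df['col1'], anyOfThese), 'gr'].unique()

for fn in (with_series_isin, with_np_isin):
    # Best of 3 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f'{fn.__name__}: {best * 1e6:.1f} µs per call')
```

Taking the minimum over repeats, as %timeit's underlying module recommends, reduces noise from other processes.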