Question

Предположим, у меня есть фрейм данных, один столбец которого представляет собой список (с неизвестными значениями и длиной), например:

df = pd.DataFrame(
 {'messageLabels': [['Good', 'Other', 'Bad'],['Bad','Terrible']]}
)

Я столкнулся с этим решением, но это не то, что я ищу. Как лучше всего извлечь столбец Pandas, содержащий списки или кортежи, в несколько столбцов

теоретически полученный df будет выглядеть так:

messageLabels             | Good| Other| Bad| Terrible
--------------------------------------------------------
['Good', 'Other', 'Bad']  | True| True |True| False
--------------------------------------------------------
['Bad','Terrible']        |False|False |True| True

См. Выше

piRSquared · Answer 1 · 24 мая 2019

Succint

df.join(df.messageLabels.str.join('|').str.get_dummies().astype(bool))

        messageLabels   Bad   Good  Other  Terrible
0  [Good, Other, Bad]  True   True   True     False
1     [Bad, Terrible]  True  False  False      True

`sklearn`

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
dum = mlb.fit_transform(df.messageLabels)

df.join(pd.DataFrame(dum.astype(bool), df.index, mlb.classes_))

        messageLabels   Bad   Good  Other  Terrible
0  [Good, Other, Bad]  True   True   True     False
1     [Bad, Terrible]  True  False  False      True

пережаренное

n = len(df)
i = np.arange(n)
l = [*map(len, df.messageLabels)]
j, u = pd.factorize(np.concatenate(df.messageLabels))

o = np.zeros((n, len(u)), bool)
o[i.repeat(l), j] = True

df.join(pd.DataFrame(o, df.index, u))

        messageLabels   Good  Other   Bad  Terrible
0  [Good, Other, Bad]   True   True  True     False
1     [Bad, Terrible]  False  False  True      True

Возиться

И вдохновлять Энди

df.join(pd.DataFrame([dict.fromkeys(x, True) for x in df.messageLabels]).fillna(False))

        messageLabels   Bad   Good  Other  Terrible
0  [Good, Other, Bad]  True   True   True     False
1     [Bad, Terrible]  True  False  False      True

Andy Hayden · Answer 2 · 24 мая 2019

Другой способ - использовать apply и конструктор Series:

In [11]: pd.get_dummies(df.messageLabels.apply(lambda x: pd.Series(1, x)) == 1)
Out[11]:
    Good  Other   Bad  Terrible
0   True   True  True     False
1  False  False  True      True

, где

In [12]: df.messageLabels.apply(lambda x: pd.Series(1, x))
Out[12]:
   Good  Other  Bad  Terrible
0   1.0    1.0  1.0       NaN
1   NaN    NaN  1.0       1.0

Чтобы получить желаемый результат:

In [21]: res = pd.get_dummies(df.messageLabels.apply(lambda x: pd.Series(1, x)) == 1)

In [22]: df[res.columns] = res

In [23]: df
Out[23]:
        messageLabels   Good  Other   Bad  Terrible
0  [Good, Other, Bad]   True   True  True     False
1     [Bad, Terrible]  False  False  True      True

cs95 · Answer 3 · 24 мая 2019

Я бы сделал это, используя get_dummies и sum (или max, любой из них работает):

tmp = pd.DataFrame(df['messageLabels'].tolist())
pd.get_dummies(tmp, prefix='', prefix_sep='').max(level=0, axis=1).astype(bool)

    Bad   Good  Other  Terrible
0  True   True   True     False
1  True  False  False      True

Вы можете объединить это с df, используя join:

df.join(pd.get_dummies(tmp, prefix='', prefix_sep='')
          .max(level=0, axis=1)
          .astype(bool))

        messageLabels   Bad   Good  Other  Terrible
0  [Good, Other, Bad]  True   True   True     False
1     [Bad, Terrible]  True  False  False      True

Вы также можете stack и pivot_table:

(pd.DataFrame(df['messageLabels'].tolist())
   .stack()
   .reset_index()
   .pivot_table(index='level_0', columns=0, aggfunc='size', fill_value=0)
   .astype(bool))

0         Bad   Good  Other  Terrible
level_0                              
0        True   True   True     False
1        True  False  False      True

Как выполнить One Hot Encoding для списков в столбце панд?

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 3 ]

Succint

`sklearn`

пережаренное

Возиться

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Как выполнить One Hot Encoding для списков в столбце панд?

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 3 ]

Succint

sklearn

пережаренное

Возиться

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Нет похожих вопросов

`sklearn`