Can someone explain to me why this works (as a standalone example):
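import numpy as np
import pandas as pd
# dtype=object keeps the nested lists as cell values (NumPy 1.24+ rejects ragged input without it)
numpy_data = np.array([[1, [{'id': 1495, 'name': 'fishing'}, {'id': 12392, 'name': 'best friend'}]],
                       [3, [{'id': 818, 'name': 'based on novel'}, {'id': 10131, 'name': 'interracial relationship'}]]], dtype=object)
df = pd.DataFrame(data=numpy_data, index=["row1", "row2"], columns=["id", "keywords_text"])
# Join the 'name' of every keyword dict in a cell into one space-separated string
df['keywords_list'] = df['keywords_text'].apply(lambda column_value: " ".join([sub['name'] for sub in column_value]))
df.head(20)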
Here is the output of the head command:
df is a <class 'pandas.core.frame.DataFrame'> datatype
      id     keywords_text                                        keywords_list
====  =====  ===================================================  ========================
row1  1      [{'id': 1495, 'name': 'fishing'}, {'id': 12392...    fishing best friend
row2  3      [{'id': 818, 'name': 'based on novel'}, {'id':...    based on novel interracial relationship
But this does not (the data comes from the Kaggle Movies dataset, the keywords file):
df_movie_keywords['keywords_list'] = df_movie_keywords['keywords'].apply(lambda column_value: " ".join([sub['name'] for sub in column_value]))
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1473-18a756783d63> in <module>
15
16 # df_movie_keywords['keywords_list'] = df_movie_keywords.apply(lambda row: string_all_keywords(row), axis=1)
---> 17 df_movie_keywords['keywords_list'] = df_movie_keywords['keywords'].apply(lambda column_value : " ".join([sub['name'] for sub in column_value]))
18
19 # df['keywords_list'] = df['keywords_text'].apply(lambda column_value : " ".join([sub['name'] for sub in column_value]))
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
3846 else:
3847 values = self.astype(object).values
-> 3848 mapped = lib.map_infer(values, f, convert=convert_dtype)
3849
3850 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-1473-18a756783d63> in <lambda>(column_value)
15
16 # df_movie_keywords['keywords_list'] = df_movie_keywords.apply(lambda row: string_all_keywords(row), axis=1)
---> 17 df_movie_keywords['keywords_list'] = df_movie_keywords['keywords'].apply(lambda column_value : " ".join([sub['name'] for sub in column_value]))
18
19 # df['keywords_list'] = df['keywords_text'].apply(lambda column_value : " ".join([sub['name'] for sub in column_value]))
<ipython-input-1473-18a756783d63> in <listcomp>(.0)
15
16 # df_movie_keywords['keywords_list'] = df_movie_keywords.apply(lambda row: string_all_keywords(row), axis=1)
---> 17 df_movie_keywords['keywords_list'] = df_movie_keywords['keywords'].apply(lambda column_value : " ".join([sub['name'] for sub in column_value]))
18
19 # df['keywords_list'] = df['keywords_text'].apply(lambda column_value : " ".join([sub['name'] for sub in column_value]))
TypeError: string indices must be integers
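One likely reading of this traceback, sketched under an assumption: "string indices must be integers" means `sub` is a `str`, which happens when the `keywords` cell is itself a string (for example after `pd.read_csv`), so iterating it yields single characters instead of dicts. The DataFrame below is a hypothetical stand-in for the Kaggle file, and `join_keyword_names` is an illustrative helper name, not something from the original code. It parses the string with `ast.literal_eval` before joining the names:

```python
import ast
import pandas as pd

# Hypothetical stand-in for the Kaggle keywords file (assumption): after
# pd.read_csv the 'keywords' column holds plain strings, not lists of dicts.
df_movie_keywords = pd.DataFrame({
    "id": [1, 3],
    "keywords": [
        "[{'id': 1495, 'name': 'fishing'}, {'id': 12392, 'name': 'best friend'}]",
        "[{'id': 818, 'name': 'based on novel'}, {'id': 10131, 'name': 'interracial relationship'}]",
    ],
})

# Each cell is a str, so `for sub in column_value` walks over characters
# and sub['name'] raises the TypeError shown above.
print(type(df_movie_keywords["keywords"].iloc[0]))


def join_keyword_names(column_value):
    """Return the keyword names in one cell as a space-separated string."""
    # Turn the string representation into a real list of dicts when needed.
    if isinstance(column_value, str):
        column_value = ast.literal_eval(column_value)
    return " ".join(sub["name"] for sub in column_value)


df_movie_keywords["keywords_list"] = df_movie_keywords["keywords"].apply(join_keyword_names)
print(df_movie_keywords["keywords_list"].tolist())
```

Checking `type(df_movie_keywords['keywords'].iloc[0])` on the real data would confirm or rule out this assumption. If it prints `<class 'list'>`, the cause lies elsewhere.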