Используйте Series.reindex
по уникальному movieId
для формирования другого столбца, для того же порядка добавьте Series.sort_values
:
movies_data = pd.read_csv('ml-latest-small/movies.csv')
ratings_data = pd.read_csv('ml-latest-small/ratings.csv')
mov = movies_data['movieId'].sort_values().drop_duplicates()
df = ratings_data.groupby('movieId').rating.mean().reindex(mov).reset_index()
print (df)
movieId rating
0 1 3.920930
1 2 3.431818
2 3 3.259615
3 4 2.357143
4 5 3.071429
... ...
9737 193581 4.000000
9738 193583 3.500000
9739 193585 3.500000
9740 193587 3.500000
9741 193609 4.000000
[9742 rows x 2 columns]
df1 = df[df['rating'].isna()]
print (df1)
movieId rating
816 1076 NaN
2211 2939 NaN
2499 3338 NaN
2587 3456 NaN
3118 4194 NaN
4037 5721 NaN
4506 6668 NaN
4598 6849 NaN
4704 7020 NaN
5020 7792 NaN
5293 8765 NaN
5421 25855 NaN
5452 26085 NaN
5749 30892 NaN
5824 32160 NaN
5837 32371 NaN
5957 34482 NaN
7565 85565 NaN
РЕДАКТИРОВАТЬ:
Если нужен новый столбец для movie_data
DataFrame, используйте DataFrame.merge
с левым соединением:
movies_data = pd.read_csv('ml-latest-small/movies.csv')
ratings_data = pd.read_csv('ml-latest-small/ratings.csv')
df = ratings_data.groupby('movieId', as_index=False).rating.mean()
print (df)
movieId rating
0 1 3.920930
1 2 3.431818
2 3 3.259615
3 4 2.357143
4 5 3.071429
... ...
9719 193581 4.000000
9720 193583 3.500000
9721 193585 3.500000
9722 193587 3.500000
9723 193609 4.000000
[9724 rows x 2 columns]
df = movies_data.merge(df, on='movieId', how='left')
print (df)
movieId title \
0 1 Toy Story (1995)
1 2 Jumanji (1995)
2 3 Grumpier Old Men (1995)
3 4 Waiting to Exhale (1995)
4 5 Father of the Bride Part II (1995)
... ...
9737 193581 Black Butler: Book of the Atlantic (2017)
9738 193583 No Game No Life: Zero (2017)
9739 193585 Flint (2017)
9740 193587 Bungo Stray Dogs: Dead Apple (2018)
9741 193609 Andrew Dice Clay: Dice Rules (1991)
genres rating
0 Adventure|Animation|Children|Comedy|Fantasy 3.920930
1 Adventure|Children|Fantasy 3.431818
2 Comedy|Romance 3.259615
3 Comedy|Drama|Romance 2.357143
4 Comedy 3.071429
... ...
9737 Action|Animation|Comedy|Fantasy 4.000000
9738 Animation|Comedy|Fantasy 3.500000
9739 Drama 3.500000
9740 Action|Animation 3.500000
9741 Comedy 4.000000
[9742 rows x 4 columns]