To reproduce, use the Kaggle beer reviews dataset:
https://www.kaggle.com/rdoume/beerreviews
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1504037 entries, 1586613 to 39648
Data columns (total 13 columns):
brewery_id 1504037 non-null int64
brewery_name 1504037 non-null object
review_time 1504037 non-null int64
review_overall 1504037 non-null float64
review_aroma 1504037 non-null float64
review_appearance 1504037 non-null float64
review_profilename 1504037 non-null object
beer_style 1504037 non-null object
review_palate 1504037 non-null float64
review_taste 1504037 non-null float64
beer_name 1504037 non-null object
beer_abv 1504037 non-null float64
beer_beerid 1504037 non-null int64
dtypes: float64(6), int64(3), object(4)
memory usage: 160.6+ MB
I just made a pivot table, and it returns the following results:
review_stat_by_beer = df[['beer_name','review_overall','review_aroma','review_appearance','review_palate','review_taste']]\
.drop_duplicates(['beer_name'])\
.pivot_table(index="beer_name", aggfunc=("count",'mean','median'))
review_stat_by_beer.info()
<class 'pandas.core.frame.DataFrame'>
Index: 44075 entries, ! (Old Ale) to 葉山ビール (Hayama Beer)
Data columns (total 15 columns):
(review_appearance, count) 44075 non-null int64
(review_appearance, mean) 44075 non-null float64
(review_appearance, median) 44075 non-null float64
(review_aroma, count) 44075 non-null int64
(review_aroma, mean) 44075 non-null float64
(review_aroma, median) 44075 non-null float64
(review_overall, count) 44075 non-null int64
(review_overall, mean) 44075 non-null float64
(review_overall, median) 44075 non-null float64
(review_palate, count) 44075 non-null int64
(review_palate, mean) 44075 non-null float64
(review_palate, median) 44075 non-null float64
(review_taste, count) 44075 non-null int64
(review_taste, mean) 44075 non-null float64
(review_taste, median) 44075 non-null float64
dtypes: float64(10), int64(5)
memory usage: 5.4+ MB
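The labels printed as (review_appearance, count) are tuples, not strings. The result has two-level MultiIndex columns, with the original column on the outer level and the statistic on the inner one. Below is a minimal sketch that confirms this. It assumes a hypothetical three-row toy DataFrame instead of the real data and builds the same layout with groupby(...).agg([...]). Note that pivot_table with a list, rather than the tuple used above, puts the statistic on the outer level, e.g. ('mean', 'review_overall'). It is therefore worth checking columns.tolist() on your own frame.

```python
import pandas as pd

# Hypothetical toy data standing in for the Kaggle reviews (values are made up)
df = pd.DataFrame({
    "beer_name": ["A", "A", "B"],
    "review_overall": [4.0, 3.0, 5.0],
    "review_taste": [3.5, 3.0, 4.5],
})

# groupby + agg with a list of functions gives (column, statistic) labels,
# the same layout that review_stat_by_beer.info() prints above
stats = df.groupby("beer_name")[["review_overall", "review_taste"]].agg(
    ["count", "mean", "median"]
)

print(type(stats.columns).__name__)  # MultiIndex
print(stats.columns.tolist()[:3])
# [('review_overall', 'count'), ('review_overall', 'mean'), ('review_overall', 'median')]
```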
Trying to select these columns:
review_stat_by_beer.(review_appearance, count) # SyntaxError: invalid syntax
review_stat_by_beer[(review_appearance, count)] #NameError: name 'review_appearance' is not defined
review_stat_by_beer['(review_appearance, count)'] #KeyError: '(review_appearance, count)'
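Each attempt above fails for a different reason:

- Attribute access only works for labels that are valid Python identifiers.
- The bare names in [(review_appearance, count)] are looked up as variables.
- '(review_appearance, count)' is a single string, not the tuple that is the actual label.

Below is a sketch of selections that should work. It assumes a small hypothetical stand-in for review_stat_by_beer with the same (column, statistic) layout:

```python
import pandas as pd

# Hypothetical stand-in for review_stat_by_beer; tuple keys become MultiIndex columns
review_stat_by_beer = pd.DataFrame(
    {("review_appearance", "count"): [2, 1],
     ("review_appearance", "mean"): [3.75, 4.0]},
    index=pd.Index(["A", "B"], name="beer_name"),
)

# 1. Index with a tuple of strings: one label per column level
counts = review_stat_by_beer[("review_appearance", "count")]

# 2. The same column through .loc (all rows, one column)
counts_loc = review_stat_by_beer.loc[:, ("review_appearance", "count")]

# 3. Every "mean" column at once, via a cross-section on the inner level
means = review_stat_by_beer.xs("mean", axis=1, level=1)

# 4. Flatten the labels to plain strings if attribute access is preferred
flat = review_stat_by_beer.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat.review_appearance_count)
```

Flattening the labels (option 4) trades the level-based tools such as xs for simple string names.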
How do I select these pivot table results? My end goal is to do arithmetic between two columns:
(review_overall, mean) minus (review_taste, mean)
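For the end goal, both operands are Series selected with tuple keys. They share the beer_name index, so plain subtraction aligns them row by row. The sketch below again uses hypothetical, made-up numbers. The new column name overall_minus_taste is an illustrative choice, not something from the question.

```python
import pandas as pd

# Hypothetical stand-in for review_stat_by_beer (made-up numbers)
review_stat_by_beer = pd.DataFrame(
    {("review_overall", "mean"): [3.5, 5.0],
     ("review_taste", "mean"): [3.25, 4.5]},
    index=pd.Index(["A", "B"], name="beer_name"),
)

# Element-wise difference; both Series share the beer_name index, so rows align
diff = (
    review_stat_by_beer[("review_overall", "mean")]
    - review_stat_by_beer[("review_taste", "mean")]
)

# Optionally store it back as a new column, keyed by a (column, statistic) tuple
review_stat_by_beer[("overall_minus_taste", "mean")] = diff
print(diff)
```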
Any ideas? Thanks!