Pandas накопленная сумма с оконной функцией - PullRequest
0 голосов
/ 13 января 2020

Я использую набор данных tips от seaborn:

import pandas as pd
import seaborn as sns
tips = sns.load_dataset("tips")
tips['rowid'] = tips.index

Я хотел бы создать столбец, который будет представлять собой совокупное число людей, которые дали чаевые более 3, с фильтром мужчина и ужин. В число не должна входить текущая строка (ср. 1 PRECEDING в следующем запросе).

SQL эквивалент будет:

SELECT *, 
    SUM(CASE WHEN tip >= 3 AND sex='male' AND time='Dinner' THEN 1 ELSE NULL END) 
            OVER (PARTITION BY sex, time ORDER BY rowid ROWS BETWEEN unbounded PRECEDING AND 1 PRECEDING) as cnt
FROM tips
ORDER BY rowid ;

Как можно добиться того же с Pandas? Из того, что я прочитал, я мог бы использовать некоторые функции прокрутки и преобразования, но мне это не удалось.

Конечный кадр данных должен выглядеть следующим образом:

enter image description here

edit : фрагмент кадра данных, запрошенный ansev

    total_bill  tip sex smoker  day time    size    rowid   cnt
index                                   
0   16.99   1.01    Female  No  Sun Dinner  2   0   NaN
1   10.34   1.66    Male    No  Sun Dinner  3   1   NaN
2   21.01   3.50    Male    No  Sun Dinner  3   2   NaN
3   23.68   3.31    Male    No  Sun Dinner  2   3   1.0
4   24.59   3.61    Female  No  Sun Dinner  4   4   NaN
5   25.29   4.71    Male    No  Sun Dinner  4   5   2.0
6   8.77    2.00    Male    No  Sun Dinner  2   6   3.0
7   26.88   3.12    Male    No  Sun Dinner  4   7   3.0
8   15.04   1.96    Male    No  Sun Dinner  2   8   4.0
9   14.78   3.23    Male    No  Sun Dinner  2   9   4.0
10  10.27   1.71    Male    No  Sun Dinner  2   10  5.0
11  35.26   5.00    Female  No  Sun Dinner  4   11  NaN
12  15.42   1.57    Male    No  Sun Dinner  2   12  5.0
13  18.43   3.00    Male    No  Sun Dinner  4   13  5.0
14  14.83   3.02    Female  No  Sun Dinner  2   14  NaN
15  21.58   3.92    Male    No  Sun Dinner  2   15  6.0
16  10.33   1.67    Female  No  Sun Dinner  3   16  NaN
17  16.29   3.71    Male    No  Sun Dinner  3   17  7.0
18  16.97   3.50    Female  No  Sun Dinner  3   18  NaN
19  20.65   3.35    Male    No  Sat Dinner  3   19  8.0
20  17.92   4.08    Male    No  Sat Dinner  2   20  9.0
21  20.29   2.75    Female  No  Sat Dinner  2   21  NaN
22  15.77   2.23    Female  No  Sat Dinner  2   22  NaN
23  39.42   7.58    Male    No  Sat Dinner  4   23  10.0
24  19.82   3.18    Male    No  Sat Dinner  2   24  11.0
25  17.81   2.34    Male    No  Sat Dinner  4   25  12.0
26  13.37   2.00    Male    No  Sat Dinner  2   26  12.0
27  12.69   2.00    Male    No  Sat Dinner  2   27  12.0
28  21.70   4.30    Male    No  Sat Dinner  2   28  12.0
29  19.65   3.00    Female  No  Sat Dinner  2   29  NaN

1 Ответ

1 голос
/ 13 января 2020

Я думаю, вам нужно

df['cnt'] = ( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
                .ge(3)
                .cumsum()
                .shift() )

# if not ordered
#df['cnt'] = ( df.sort_values('rowid')
#                .loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
#                .ge(3)
#                .cumsum()
#                .shift() )

ОБНОВЛЕНИЕ

df['cnt']=( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
              .ge(3)
              .cumsum()
              .shift()
              .where(lambda x: x.gt(0))
            )

#       total_bill   tip     sex smoker  day    time  size  rowid   cnt
#index                                                                 
#0           16.99  1.01  Female     No  Sun  Dinner     2      0   NaN
#1           10.34  1.66    Male     No  Sun  Dinner     3      1   NaN
#2           21.01  3.50    Male     No  Sun  Dinner     3      2   NaN
#3           23.68  3.31    Male     No  Sun  Dinner     2      3   1.0
#4           24.59  3.61  Female     No  Sun  Dinner     4      4   NaN
#5           25.29  4.71    Male     No  Sun  Dinner     4      5   2.0
#6            8.77  2.00    Male     No  Sun  Dinner     2      6   3.0
#7           26.88  3.12    Male     No  Sun  Dinner     4      7   3.0
#8           15.04  1.96    Male     No  Sun  Dinner     2      8   4.0
#9           14.78  3.23    Male     No  Sun  Dinner     2      9   4.0
#10          10.27  1.71    Male     No  Sun  Dinner     2     10   5.0
#11          35.26  5.00  Female     No  Sun  Dinner     4     11   NaN
#12          15.42  1.57    Male     No  Sun  Dinner     2     12   5.0
#13          18.43  3.00    Male     No  Sun  Dinner     4     13   5.0
#14          14.83  3.02  Female     No  Sun  Dinner     2     14   NaN
#15          21.58  3.92    Male     No  Sun  Dinner     2     15   6.0
#16          10.33  1.67  Female     No  Sun  Dinner     3     16   NaN
#17          16.29  3.71    Male     No  Sun  Dinner     3     17   7.0
#18          16.97  3.50  Female     No  Sun  Dinner     3     18   NaN
#19          20.65  3.35    Male     No  Sat  Dinner     3     19   8.0
#20          17.92  4.08    Male     No  Sat  Dinner     2     20   9.0
#21          20.29  2.75  Female     No  Sat  Dinner     2     21   NaN
#22          15.77  2.23  Female     No  Sat  Dinner     2     22   NaN
#23          39.42  7.58    Male     No  Sat  Dinner     4     23  10.0
#24          19.82  3.18    Male     No  Sat  Dinner     2     24  11.0
#25          17.81  2.34    Male     No  Sat  Dinner     4     25  12.0
#26          13.37  2.00    Male     No  Sat  Dinner     2     26  12.0
#27          12.69  2.00    Male     No  Sat  Dinner     2     27  12.0
#28          21.70  4.30    Male     No  Sat  Dinner     2     28  12.0
#29          19.65  3.00  Female     No  Sat  Dinner     2     29   NaN
...