Break-time condition on a Pandas DataFrame - PullRequest
0 votes
/ August 4, 2020

Below is a DataFrame (df) containing different shops and dates for the period from January to August.

| datetime         | shop    | val |
|------------------|---------|-----|
| 04-07-2020 13:32 | ASSY#1  | 23  |
| 06-07-2020 07:25 | ASSY#1  | 22  |
| 06-07-2020 21:26 | BODY#1  | 22  |
| 07-07-2020 15:22 | ASSY#1  | 20  |
| 07-07-2020 19:55 | PAINT#1 | 22  |
| 07-07-2020 16:55 | ETM#1   | 60  |

Desired output: a new column 'break'

| datetime         | shop    | val |  break |
|------------------|---------|-----|--------|
| 04-07-2020 13:32 | ASSY#1  | 23  | Tea    |
| 06-07-2020 07:25 | ASSY#1  | 22  | Normal |
| 06-07-2020 21:26 | BODY#1  | 22  | Normal |
| 07-07-2020 15:22 | ASSY#1  | 20  | Normal |
| 07-07-2020 19:55 | PAINT#1 | 22  | Normal |
| 07-07-2020 16:55 | ETM#1   | 60  | Normal |

The condition to check (I have several such shops): if the shop is 'ASSY#1' and 'datetime' falls within one of the break intervals below, 'break' should get that break's category; otherwise it should be 'Normal'.

| Break Category | Body Shop#1  | Paint#1 Shop | Assy#1 Shop  |
|----------------|--------------|--------------|--------------|
| Tea            | 8.53 ~9.00   | 8.53 ~9.00   | 8.53 ~9.00   |
| Tea            | 13.30 ~13.37 | 13.30 ~13.37 | 13.30 ~13.37 |
| Tea            | 17.23 ~17.30 | 17.23 ~17.30 | 17.23 ~17.30 |
| Tea            | 22.30 ~22.37 | 22.30 ~22.37 | 22.30 ~22.37 |
| Lunch          | 11.00 ~11.30 | 11.15 ~11.45 | 11.30 ~12.00 |
| Dinner         | 20.00 ~20.30 | 20.15 ~20.45 | 20.30 ~21.00 |
| Supper         | 02.20 ~02.40 | 02.40 ~3.00  | 02.40 ~03.00 |
| Tea            | 05.00 ~05.17 | 05.00 ~05.17 | 05.00 ~05.17 |

My code

df['break'] = np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '08:53:00') & (df['TIME'] <= '08:59:59'), 'Tea',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '13:30:00') & (df['TIME'] <= '13:36:59'), 'Tea',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '17:23:00') & (df['TIME'] <= '17:29:59'), 'Tea',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '22:30:00') & (df['TIME'] <= '22:36:59'), 'Tea',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '11:30:00') & (df['TIME'] <= '11:59:59'), 'Lunch',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '20:30:00') & (df['TIME'] <= '20:59:59'), 'Dinner',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '02:40:00') & (df['TIME'] <= '02:59:59'), 'Supper',
                 np.where((df['SHOP'] == 'ASSY#1') & (df['TIME'] >= '05:00:00') & (df['TIME'] <= '05:16:59'), 'Tea',

                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '08:53:00') & (df['TIME'] <= '08:59:59'), 'Tea',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '13:30:00') & (df['TIME'] <= '13:36:59'), 'Tea',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '17:23:00') & (df['TIME'] <= '17:29:59'), 'Tea',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '22:30:00') & (df['TIME'] <= '22:36:59'), 'Tea',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '11:15:00') & (df['TIME'] <= '11:44:59'), 'Lunch',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '20:15:00') & (df['TIME'] <= '20:44:59'), 'Dinner',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '02:40:00') & (df['TIME'] <= '02:59:59'), 'Supper',
                 np.where((df['SHOP'] == 'PAINT#1') & (df['TIME'] >= '05:00:00') & (df['TIME'] <= '05:16:59'), 'Tea',

                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '08:53:00') & (df['TIME'] <= '08:59:59'), 'Tea',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '13:30:00') & (df['TIME'] <= '13:36:59'), 'Tea',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '17:23:00') & (df['TIME'] <= '17:29:59'), 'Tea',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '22:30:00') & (df['TIME'] <= '22:36:59'), 'Tea',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '11:00:00') & (df['TIME'] <= '11:29:59'), 'Lunch',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '20:00:00') & (df['TIME'] <= '20:29:59'), 'Dinner',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '02:20:00') & (df['TIME'] <= '02:39:59'), 'Supper',
                 np.where((df['SHOP'] == 'BODY#1') & (df['TIME'] >= '05:00:00') & (df['TIME'] <= '05:16:59'), 'Tea', 'Normal'))))))))))))))))))))))))

Is this the best method, or is there a more efficient one?

1 Answer

1 vote
/ August 5, 2020

Here is one way to do it. Parsing the break times takes some work; see below:

df.datetime = pd.to_datetime(df.datetime)
df["hour"] = df.datetime.dt.strftime("%H.%M")

intervals = times_df.melt(id_vars="Break_Category", value_name="interval", var_name="shop")
intervals[["start", "end"]] = intervals.interval.str.split("~", expand=True)
intervals.start = intervals.start.str.strip()
intervals.end = intervals.end.str.strip()

# Parse the shop names and make them upper case to match the main dataframe.
intervals.shop = intervals.shop.str.extract(r"(.*)_Shop", expand=False)
intervals.shop = intervals.shop.str.upper()

# Some of the break times are written like 8.53. They should be zero-padded
# to 08.53 so that the string comparison below works.
intervals.loc[intervals.start.str.len() == 4, "start"] = "0" + intervals.start
intervals.loc[intervals.end.str.len() == 4, "end"] = "0" + intervals.end

df = pd.merge(df, intervals, on="shop", how = "left")
df["break"] = ""

# The main logic - check if the time of sale is during a break.

# In this line, rows that fall within a break get the name of that
# break. E.g., if the tea break is between 1 and 2 PM, and the sale
# took place at 1:30 PM, the value of "break" for that row would become
# "Tea". Both comparisons are strict, so a time exactly equal to a
# break's start or end is not matched.
df.loc[(df["start"] < df.hour) & (df["end"] > df.hour), "break"] = df.Break_Category

res = pd.DataFrame(df.groupby(["datetime", "shop", "val"])["break"].max())
# (If 'val' can be None, then group by datetime and shop only.)

res.loc[res["break"] == "", "break"] = "Normal"

Result:

                                  break
datetime            shop    val        
2020-04-07 13:32:00 ASSY#1  23      Tea
2020-06-07 07:25:00 ASSY#1  22   Normal
2020-06-07 21:26:00 BODY#1  22   Normal
2020-07-07 15:22:00 ASSY#1  20   Normal
2020-07-07 16:55:00 ETM#1   60   Normal
2020-07-07 19:55:00 PAINT#1 22   Normal
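
The answer reads the break schedule from `times_df` but never shows how that frame is built. Below is a minimal end-to-end sketch under stated assumptions: the column names `Break_Category`, `Body#1_Shop`, `Paint#1_Shop` and `Assy#1_Shop` are guesses chosen to fit the `_Shop` pattern used above, the dates are assumed to be day-first, and the helper `to_minutes` is hypothetical. It is also a variation on the answer rather than the same method: times are compared as minutes since midnight instead of as strings, and the start of each interval is inclusive, as in the question's own code.

```python
import pandas as pd

# Break schedule from the question, one column per shop.
# Column names are assumptions, not taken from the original data.
tea = ["8.53 ~9.00", "13.30 ~13.37", "17.23 ~17.30", "22.30 ~22.37"]
times_df = pd.DataFrame({
    "Break_Category": ["Tea", "Tea", "Tea", "Tea", "Lunch", "Dinner", "Supper", "Tea"],
    "Body#1_Shop": tea + ["11.00 ~11.30", "20.00 ~20.30", "02.20 ~02.40", "05.00 ~05.17"],
    "Paint#1_Shop": tea + ["11.15 ~11.45", "20.15 ~20.45", "02.40 ~3.00", "05.00 ~05.17"],
    "Assy#1_Shop": tea + ["11.30 ~12.00", "20.30 ~21.00", "02.40 ~03.00", "05.00 ~05.17"],
})


def to_minutes(hhmm: pd.Series) -> pd.Series:
    """Convert 'H.MM' or 'HH.MM' strings to minutes since midnight (hypothetical helper)."""
    parts = hhmm.str.strip().str.split(".", expand=True).astype(int)
    return parts[0] * 60 + parts[1]


# Reshape to one row per (break, shop) with numeric start/end bounds.
intervals = times_df.melt(id_vars="Break_Category", var_name="shop", value_name="interval")
intervals["shop"] = intervals["shop"].str.replace("_Shop", "", regex=False).str.upper()
bounds = intervals["interval"].str.split("~", expand=True)
intervals["start"] = to_minutes(bounds[0])
intervals["end"] = to_minutes(bounds[1])

# Sample data from the question; the day-first date format is an assumption.
df = pd.DataFrame({
    "datetime": ["04-07-2020 13:32", "06-07-2020 07:25", "06-07-2020 21:26",
                 "07-07-2020 15:22", "07-07-2020 19:55", "07-07-2020 16:55"],
    "shop": ["ASSY#1", "ASSY#1", "BODY#1", "ASSY#1", "PAINT#1", "ETM#1"],
    "val": [23, 22, 22, 20, 22, 60],
})
df["datetime"] = pd.to_datetime(df["datetime"], format="%d-%m-%Y %H:%M")
df["minute"] = df["datetime"].dt.hour * 60 + df["datetime"].dt.minute

# The left merge keeps shops without a schedule (ETM#1); their bounds are NaN,
# so the comparisons below evaluate to False for them.
merged = df.reset_index().merge(intervals, on="shop", how="left")
in_break = (merged["minute"] >= merged["start"]) & (merged["minute"] < merged["end"])

# Map each original row back to the break it falls in, defaulting to "Normal".
labels = merged.loc[in_break].groupby("index")["Break_Category"].first()
df["break"] = labels.reindex(df.index).fillna("Normal")

print(df[["datetime", "shop", "val", "break"]])
```

Keeping the schedule as data means that adding a shop or changing a break only touches `times_df`, not the classification logic. Writing the result back onto `df` by its original index also avoids the final `groupby` on `datetime`, `shop` and `val`.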