Create a rolling sum and mean of different variables in a pandas DataFrame - PullRequest
0 votes
/ July 14, 2020

I have a DataFrame with a lot of environmental data that looks roughly like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'temperature' : np.random.uniform(5,15, 500), 'Precipitation' : np.random.uniform(0, 3, 500)}, index=pd.date_range('1/1/2020', periods=500, freq='H'))
df['TempUnit'] = 'celsius'
df['PrecipUnit'] = 'mm'

I want to create new columns from this data that calculate the 6-hourly average of temperature and the 6-hourly sum of precipitation. I am using the following method:

df['rolling_sum_by_time'] = df.groupby(df.index.time)['Precipitation'].apply(lambda x: x.rolling('6h').sum())

This does not sum the data, and I am not sure where I am going wrong.
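A likely reason, sketched on a small hourly sample (the seed, the 48-row length and the use of `transform` instead of `apply` are illustrative choices, not from the question): grouping by `df.index.time` puts all 00:00 rows in one group, all 01:00 rows in another, and so on. Inside one group consecutive rows are 24 hours apart, so a 6-hour window never holds more than one row and each "sum" is just the original value. Rolling on the column itself, without the `groupby`, gives a trailing 6-hour sum.

```python
import numpy as np
import pandas as pd

# Hourly sample data similar to the question (fixed seed so the run is repeatable)
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=48, freq="h")
df = pd.DataFrame({"Precipitation": rng.uniform(0, 3, len(idx))}, index=idx)

# Grouping by time of day: rows in one group are 24 h apart, so every
# 6-hour window contains only the current row. transform keeps the
# result aligned with the original rows.
by_time = df.groupby(df.index.time)["Precipitation"].transform(
    lambda s: s.rolling("6h").sum()
)

# Without the groupby, each window is (t - 6h, t]: the current hour
# plus the 5 hours before it.
trailing = df["Precipitation"].rolling("6h").sum()
```

Here `by_time` reproduces the input column unchanged, while `trailing.iloc[10]` is the sum of the readings from 05:00 to 10:00.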

I have also tried creating a new dataframe using groupby like so:

temp_6h = df.groupby('temperature').rolling('6H').mean()

which scrambles the dates and puts them out of order (not what I want).
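The reordering is expected from `groupby('temperature')`: every distinct reading becomes its own group, so each rolling window contains a single row, and the result is indexed by (temperature, timestamp) and sorted by temperature rather than by time. A minimal sketch with made-up numeric data (the unit string columns are left out to keep the example numeric):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2020-01-01", periods=24, freq="h")
df = pd.DataFrame(
    {"temperature": rng.uniform(5, 15, len(idx)),
     "Precipitation": rng.uniform(0, 3, len(idx))},
    index=idx,
)

# Grouping by the temperature values: one group per distinct float,
# result indexed by (temperature, timestamp) in temperature order.
grouped = df.groupby("temperature").rolling("6h").mean()

# Rolling directly on the column keeps the chronological index.
temp_6h = df["temperature"].rolling("6h").mean()
```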

Ideally, I am looking for a DataFrame in which, every six hours, the values from the previous 6 hours are summed (for precipitation) and averaged (for temperature). I would like this to work at fixed 6-hour intervals, i.e. at 0600, 1200, 1800 and 2400 hours.

Any help would be much appreciated! Thanks.

Answers [ 2 ]

1 vote
/ July 14, 2020

You can do it this way:

df['temp_avg'] = df.temperature.resample('6h',label = 'right', closed = 'right').mean()
df['precip_sum'] = df.Precipitation.resample('6h', label ='right', closed='right').sum()
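To make the bin boundaries explicit: with `closed='right'` and `label='right'` each 6-hour bin is (label - 6h, label], so the row labelled 06:00 aggregates 01:00 to 06:00, and 00:00 of the following day plays the role of "2400". Assigning the resampled Series back to the hourly frame aligns on the index, so only the 00:00, 06:00, 12:00 and 18:00 rows get a value and the rest are NaN. If a separate 6-hourly table is preferred, both columns can be aggregated in one `resample(...).agg(...)` call. A sketch with made-up data (`six_hourly` is a hypothetical name):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range("2020-01-01", periods=48, freq="h")
df = pd.DataFrame(
    {"temperature": rng.uniform(5, 15, len(idx)),
     "Precipitation": rng.uniform(0, 3, len(idx))},
    index=idx,
)

# One 6-hourly table: each row covers (label - 6h, label]
six_hourly = df.resample("6h", closed="right", label="right").agg(
    {"temperature": "mean", "Precipitation": "sum"}
)

# Assigning back to the hourly frame aligns on the index:
# only rows whose timestamp matches a bin label receive a value.
df["temp_avg"] = df["temperature"].resample("6h", label="right", closed="right").mean()
```

Note that the very first bin (labelled 2020-01-01 00:00) contains only the 00:00 reading, because the data starts exactly on a bin edge.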

0 votes
/ July 14, 2020

Try the following:

import pandas as pd
import numpy as np

df = pd.DataFrame({'temperature' : np.random.uniform(5,15, 500), 'Precipitation' : np.random.uniform(0, 3, 500)}, index=pd.date_range('1/1/2020', periods=500, freq='h'))
df['TempUnit'] = 'celsius'
df['PrecipUnit'] = 'mm'

# Trailing 6-hour window: the current row and the 5 rows before it.
# .iloc is used for positional access; plain [] with integer keys on a
# DatetimeIndex Series is deprecated in recent pandas.
rolling_sum_by_time = []
for x in range(len(df)):
    P = [df["Precipitation"].iloc[j] for j in range(x - 5, x + 1) if j >= 0]
    rolling_sum_by_time.append(sum(P))
df['rolling_sum_by_time'] = rolling_sum_by_time

# Same window, averaged; stored under its own name so it does not
# overwrite the sum above
rolling_mean_by_time = []
for x in range(len(df)):
    P = [df["Precipitation"].iloc[j] for j in range(x - 5, x + 1) if j >= 0]
    rolling_mean_by_time.append(sum(P) / len(P))
df['rolling_mean_by_time'] = rolling_mean_by_time

# 6-hour average temperature (the question asks for a mean, not a sum)
temp_6h = []
for x in range(len(df)):
    P = [df["temperature"].iloc[j] for j in range(x - 5, x + 1) if j >= 0]
    temp_6h.append(sum(P) / len(P))
df['temp_6h'] = temp_6h
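Explicit loops like these can also be replaced by pandas' time-based rolling windows. A minimal sketch, assuming a regular hourly index with no gaps (so a 6-hour window and a 6-row window coincide); the column names `precip_6h_sum` and `temp_6h_mean` are hypothetical, and the loop is included only to check that both give the same result:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("2020-01-01", periods=500, freq="h")
df = pd.DataFrame(
    {"temperature": rng.uniform(5, 15, len(idx)),
     "Precipitation": rng.uniform(0, 3, len(idx))},
    index=idx,
)

# Vectorized trailing windows over (t - 6h, t]; the first rows use
# whatever readings are available (min_periods defaults to 1 here).
df["precip_6h_sum"] = df["Precipitation"].rolling("6h").sum()
df["temp_6h_mean"] = df["temperature"].rolling("6h").mean()

# Explicit positional loop over the same trailing 6-row window, for comparison
loop_sum = []
for x in range(len(df)):
    window = df["Precipitation"].iloc[max(0, x - 5): x + 1]
    loop_sum.append(window.sum())
```

If the index can have missing hours, the time-based `'6h'` window is the safer choice, because it counts elapsed time rather than rows.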

...