I have a dataframe with a large amount of environmental data, which looks something like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'temperature' : np.random.uniform(5,15, 500), 'Precipitation' : np.random.uniform(0, 3, 500)}, index=pd.date_range('1/1/2020', periods=500, freq='H'))
df['TempUnit'] = 'celsius'
df['PrecipUnit'] = 'mm'
I want to create new columns from this data that hold the 6-hourly average of temperature and the 6-hourly sum of precipitation. I am using the following method:
df['rolling_sum_by_time'] = df.groupby(df.index.time)['Precipitation'].apply(lambda x: x.rolling('6h').sum())
This does not sum the data, and I am not sure where I am going wrong.
I have also tried creating a new dataframe using groupby, like so:
temp_6h = df.groupby('temperature').rolling('6H').mean()
This scrambles the dates and puts them out of order, which is not what I want.
What I am looking for is a dataframe in which, every six hours, the values from the previous 6 hours are summed (for precipitation) and averaged (for temperature). Ideally this would work at specified 6-hour intervals, i.e. at 0600, 1200, 1800, and 2400 hours.
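To make the target concrete, here is a minimal sketch of the kind of result described above. It assumes that `resample` with right-closed, right-labelled bins is an acceptable way to get fixed 6-hour blocks, and that "the previous 6 hours" means the six hourly readings ending at the stamp itself (e.g. 01:00 through 06:00 for the 06:00 row). The name `six_hourly` and the fixed random seed are only for illustration, and the unit columns are left out.

```python
import numpy as np
import pandas as pd

# Same shape as the frame above, with a fixed seed so the result is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "temperature": rng.uniform(5, 15, 500),
        "Precipitation": rng.uniform(0, 3, 500),
    },
    index=pd.date_range("2020-01-01", periods=500, freq="h"),
)

# closed="right", label="right": each bin covers (t - 6h, t] and is stamped with t,
# so the rows fall on 00:00, 06:00, 12:00 and 18:00
six_hourly = df.resample("6h", closed="right", label="right").agg(
    {"temperature": "mean", "Precipitation": "sum"}
)

print(six_hourly.head())
```

With these settings the very first row (00:00 on the first day) contains only the single 00:00 reading, since there is no earlier data to fill that bin.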
I would greatly appreciate any help! Thanks.