I have a dataframe with a lot of environmental data which looks something like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'temperature' : np.random.uniform(5,15, 500), 'Precipitation' : np.random.uniform(0, 3, 500)}, index=pd.date_range('1/1/2020', periods=500, freq='H'))
df['TempUnit'] = 'celsius'
df['PrecipUnit'] = 'mm'
[image: screenshot of the dataframe]
I want to create new columns from this data that hold the 6-hourly average of temperature and the 6-hourly sum of precipitation. I am using the following method:
df['rolling_sum_by_time'] = df.groupby(df.index.time)['Precipitation'].apply(lambda x: x.rolling('6h').sum())
This is not summing the data. It gives the output below, and I am not sure where I am going wrong.
[image: screenshot of the output]
I have also tried creating a new dataframe using groupby, like so:
temp_6h = df.groupby('temperature').rolling('6H').mean()
which screws up the dates and puts them out of order (not what I want):
[image: screenshot of the output]
Ideally what I am looking for is a dataframe which looks like this (below), where every six hours the values in the previous 6 hours are summed (for precipitation) and averaged (for temperature). Ideally I would like this to work at specified 6 hour intervals, i.e. at 0600, 1200, 1800, and 2400 hours.
[image: desired output]
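To make the target concrete, here is a rough sketch of the kind of result I am after. I suspect `resample` is closer to what I need than `rolling`, but I am not sure this is the right approach. It assumes each window covers the six hourly readings ending at the labelled hour (so the 06:00 row covers 01:00 to 06:00 inclusive) and that 2400 is shown as 00:00 of the next day; the name `six_hourly` is just for illustration.

```python
import numpy as np
import pandas as pd

# Same shape of data as above, with a seeded generator so runs are repeatable.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=500, freq="h")
df = pd.DataFrame(
    {
        "temperature": rng.uniform(5, 15, 500),
        "Precipitation": rng.uniform(0, 3, 500),
    },
    index=idx,
)

# Fixed 6-hour bins aligned to midnight. closed="right" puts the reading at
# 06:00 into the bin that ends at 06:00, and label="right" stamps each bin
# with its end time (assumption: this is the labelling wanted).
six_hourly = df.resample("6h", closed="right", label="right").agg(
    {"temperature": "mean", "Precipitation": "sum"}
)

print(six_hourly.head())
```

Note that the very first row (00:00 on the first day) only contains a single reading, because there is no earlier data to fill that window.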
I would really appreciate any help! Thank you.