I have a dataset of forecast data that is updated at irregular intervals over a 42-hour period. Here is an example:
import pandas as pd

df_old = pd.DataFrame({'IssueDatetime': ['2010-01-01 09:00:00', '2010-01-01 09:00:00', '2010-01-01 09:00:00','2010-01-01 09:00:00','2010-01-01 09:00:00'],
'endtime':['2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00'],
'Regions': ['EAST COAST-CAPE ST FRANCIS AND SOUTH', 'EAST COAST-CAPE ST FRANCIS AND SOUTH', 'EAST COAST-CAPE ST FRANCIS AND SOUTH','NORTHEAST COAST','NORTHEAST COAST'],
'forecastTime': ['2010-01-01 09:00:00','2010-01-01 15:00:00','2010-01-01 19:00:00','2010-01-01 09:00:00','2010-01-01 12:00:00'],
'forecast_Dir':[150,180,45,45,45],
'windSpeed':[20,90,35,45,15]})
The problem is the gaps in hours between df['forecastTime'] and df['endtime']. I tried to use my limited pandas knowledge to group and resample the data, but because the dates repeat, I cannot get a datetime index.
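To make the repeated-dates issue concrete, here is a small check on the sample values from above. This is only a sketch: the names `sample`, `unique_overall` and `unique_per_region` are made up for illustration, and only two of the columns are used.

```python
import pandas as pd

# Sample values copied from the example frame above (two columns only).
sample = pd.DataFrame({
    "Regions": ["EAST COAST-CAPE ST FRANCIS AND SOUTH"] * 3 + ["NORTHEAST COAST"] * 2,
    "forecastTime": pd.to_datetime([
        "2010-01-01 09:00:00", "2010-01-01 15:00:00", "2010-01-01 19:00:00",
        "2010-01-01 09:00:00", "2010-01-01 12:00:00",
    ]),
})

# Across the whole frame the timestamps repeat (09:00 appears for both regions),
# so forecastTime on its own cannot serve as a unique datetime index.
unique_overall = sample["forecastTime"].is_unique

# Within a single region the timestamps do not repeat in this sample.
unique_per_region = sample.groupby("Regions")["forecastTime"].apply(
    lambda s: s.is_unique
)

print(unique_overall)
print(unique_per_region)
```

So the duplicates in this sample come from several regions sharing the same hours, which suggests that any resampling would have to happen per region rather than on the frame as a whole.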
Ultimately, my goal is to expand the dataframe so that the hours between the original hours in the dataframe get their own rows, all the way up to the end period ...
Example of the desired result:
df_new = pd.DataFrame({'IssueDatetime': [ '2010-01-01 09:00:00', '2010-01-01 09:00:00', '2010-01-01 09:00:00', '2010-01-01 09:00:00', '2010-01-01 09:00:00', '2010-01-01 09:00:00','2010-01-01 09:00:00'],
'endtime':['2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00','2010-01-03 03:00:00'],
'Regions': ['EAST COAST-CAPE ST FRANCIS AND SOUTH', 'EAST COAST-CAPE ST FRANCIS AND SOUTH','EAST COAST-CAPE ST FRANCIS AND SOUTH','EAST COAST-CAPE ST FRANCIS AND SOUTH','EAST COAST-CAPE ST FRANCIS AND SOUTH','EAST COAST-CAPE ST FRANCIS AND SOUTH','EAST COAST-CAPE ST FRANCIS AND SOUTH'],
'forecastTime': ['2010-01-01 09:00:00','2010-01-01 10:00:00','2010-01-01 11:00:00','2010-01-01 12:00:00','2010-01-01 13:00:00','2010-01-01 14:00:00','2010-01-01 15:00:00'],
'forecast_Dir':[150,150,150,150,150,150,180],
'windSpeed':[20,20,20,20,20,20,90]})
Note that for the first region, the hours between df['forecastTime'] = '2010-01-01 09:00:00' and df['forecastTime'] = '2010-01-01 15:00:00' should each be a separate row. Essentially, I am looking to upsample the data to fill in the missing hours.
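For reference, the kind of upsampling described above could be sketched roughly as follows. This is a sketch under assumptions, not a confirmed solution: it assumes that forecastTime is unique within each (IssueDatetime, Regions) group, that every new hour should inherit the values of the most recent original row, and that each group should be extended up to and including its endtime. The function name `upsample_hourly` is hypothetical.

```python
import pandas as pd


def upsample_hourly(df):
    """Give every hour from a group's first forecastTime up to its endtime a row.

    Assumes forecastTime is unique within each (IssueDatetime, Regions) group.
    """
    out = df.copy()
    for col in ("IssueDatetime", "endtime", "forecastTime"):
        out[col] = pd.to_datetime(out[col])

    pieces = []
    for _, grp in out.groupby(["IssueDatetime", "Regions"], sort=False):
        grp = grp.sort_values("forecastTime").set_index("forecastTime")
        # Hourly grid from the group's first forecast hour to its endtime (inclusive).
        hours = pd.date_range(grp.index.min(), grp["endtime"].iloc[0],
                              freq=pd.Timedelta(hours=1))
        # method="ffill" copies the most recent original row onto each new hour.
        filled = grp.reindex(hours, method="ffill")
        pieces.append(filled.rename_axis("forecastTime").reset_index())
    return pd.concat(pieces, ignore_index=True)


# Sample values taken from df_old above.
sample = pd.DataFrame({
    "IssueDatetime": ["2010-01-01 09:00:00"] * 5,
    "endtime": ["2010-01-03 03:00:00"] * 5,
    "Regions": ["EAST COAST-CAPE ST FRANCIS AND SOUTH"] * 3 + ["NORTHEAST COAST"] * 2,
    "forecastTime": ["2010-01-01 09:00:00", "2010-01-01 15:00:00",
                     "2010-01-01 19:00:00", "2010-01-01 09:00:00",
                     "2010-01-01 12:00:00"],
    "forecast_Dir": [150, 180, 45, 45, 45],
    "windSpeed": [20, 90, 35, 45, 15],
})
result = upsample_hourly(sample)
print(result.head(7))
```

Looping over the groups keeps the repeated timestamps apart, since each region gets its own hourly index. Reindexing with `method="ffill"` is used here instead of a plain `resample` so that the hourly grid can run past the last forecastTime up to endtime.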
EDIT: Original dataframe
             IssueDatetime              endtime  \
0      2013-01-01 09:00:00  2013-01-03 03:00:00
1      2013-01-01 09:00:00  2013-01-03 03:00:00
2      2013-01-01 09:00:00  2013-01-03 03:00:00
3      2013-01-01 09:00:00  2013-01-03 03:00:00
4      2013-01-01 09:00:00  2013-01-03 03:00:00
...                    ...                  ...
53585  2016-12-30 09:00:00  2017-01-01 03:00:00
53586  2016-12-30 09:00:00  2017-01-01 03:00:00
53587  2016-12-30 09:00:00  2017-01-01 03:00:00
53588  2016-12-30 09:00:00  2017-01-01 03:00:00
53589  2016-12-30 09:00:00  2017-01-01 03:00:00

                                    Regions         forecastTime  \
0                               SOUTH COAST  2013-01-01 09:00:00
1                               SOUTH COAST  2013-01-01 18:00:00
2                               SOUTH COAST  2013-01-02 06:00:00
3                               SOUTH COAST  2013-01-02 13:00:00
4      EAST COAST-CAPE ST FRANCIS AND SOUTH  2013-01-01 09:00:00
...                                     ...                  ...
53585              SOUTHWESTERN GRAND BANKS  2016-12-30 18:00:00
53586              SOUTHWESTERN GRAND BANKS  2016-12-31 09:00:00
53587              SOUTHWESTERN GRAND BANKS  2016-12-31 15:00:00
53588              SOUTHWESTERN GRAND BANKS  2016-12-31 18:00:00
53589              SOUTHWESTERN GRAND BANKS  2017-01-01 00:00:00

       forecastHour  forecast_Dir  forecast_WindSpeed_low  \
0               0.0           270                      35
1               9.0           270                      25
2              21.0           225                      15
3              28.0           270                      35
4               0.0           270                      35
...             ...           ...                     ...
53585           9.0           135                      40
53586          24.0           135                      40
53587          30.0           135                      40
53588          33.0           315                      25
53589          39.0           315                      25

       forecast_WindSpeed_gust  forecast_WindSpeed_high  \
0                         None                     None
1                         None                     None
2                         None                     None
3                         None                     None
4                         None                     None
...                        ...                      ...
53585                     None                       50
53586                     None                       50
53587                     None                       50
53588                     None                       35
53589                     None                     None

       forecast_WindSpeed_exception_1_type  forecast_Dir_exception_1  \
0                                      NaN                       NaN
1                                      NaN                       NaN
2                                      NaN                       NaN
3                                      NaN                       NaN
4                                      NaN                       NaN
...                                    ...                       ...
53585                                  NaN                       NaN
53586           OVER NORTHWESTERN SECTIONS                       315
53587                                  NaN                       NaN
53588                                  NaN                       NaN
53589                                  NaN                       NaN

       forecast_WindSpeed_low_exception_1  forecast_WindSpeed_high_exception_1
0                                     NaN                                  NaN
1                                     NaN                                  NaN
2                                     NaN                                  NaN
3                                     NaN                                  NaN
4                                     NaN                                  NaN
...                                   ...                                  ...
53585                                 NaN                                  NaN
53586                                  25                                 None
53587                                 NaN                                  NaN
53588                                 NaN                                  NaN
53589                                 NaN                                  NaN