Интерполяция по датам не работает в Panda + многомерная интерполяция - PullRequest
0 голосов
/ 12 июня 2018

многомерная интерполяция с неработающим фреймом данных

import pandas as pd
import numpy as np
raw_data = {'CCY_CODE': ['SGD','USD','USD','USD','USD','USD','USD','EUR','EUR','EUR','EUR','EUR','EUR','USD'],
            'END_DATE': ['16/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018',
                        '17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018'],
            'STRIKE':[0.005,0.01,0.015,0.02,0.025,0.03,0.035,0.04,0.045,0.05,0.55,0.06,0.065,0.07],
            'VOLATILITY':[np.nan,np.nan,0.3424,np.nan,0.2617,0.2414,np.nan,np.nan,0.215,0.212,0.2103,np.nan,0.2092,np.nan]
           }
df_volsurface = pd.DataFrame(raw_data,columns = ['CCY_CODE','END_DATE','STRIKE','VOLATILITY'])
df_volsurface['END_DATE'] = pd.to_datetime(df_volsurface['END_DATE'])
df_volsurface.interpolate(method='akima',limit_direction='both')

Вывод:

<table><tbody><tr><th> </th><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>NaN</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>NaN</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.296358</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.230295</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.220911</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.209471</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>NaN</td></tr></tbody></table>

Ожидаемый результат:

<table><tbody><tr><th> </th><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>NaN</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>Expected some logical value</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.296358</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.230295</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.220911</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.209471</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>Expected some logical value</td></tr></tbody></table>

Методы линейной интерполяции позволяют копировать последние доступные значения для всех пропущенных значений вперед и назад без учета ccy_code

df_volsurface.interpolate(method='linear',limit_direction='both')

Вывод:

<table><tbody><tr><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th><th> </th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>0.3424</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>0.3424</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.30205</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.2326</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.2238</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.20975</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>0.2092</td></tr></tbody></table>

Любая помощь приветствуется!Спасибо!

1 Ответ

0 голосов
/ 18 июня 2018

Я хотел бы отметить, что это все еще одномерная интерполяция.У нас есть одна независимая переменная ('STRIKE') и одна зависимая переменная ('VOLATILITY').Интерполяция выполняется для разных условий, например, для каждого дня, каждой валюты, каждого сценария и т. Д. Ниже приведен пример того, как интерполяция может выполняться на основе 'END_DATE' и 'CCY_CODE'.

# set all the conditions as index
df_volsurface.set_index(['END_DATE', 'CCY_CODE', 'STRIKE'], inplace=True)
df_volsurface.sort_index(level=['END_DATE', 'CCY_CODE', 'STRIKE'], inplace=True)
# create separate columns for all criteria except the independent variable
df_volsurface = df_volsurface.unstack(level=['END_DATE', 'CCY_CODE'])
for ccy in df_volsurface:
    indices = df_volsurface[ccy].notna()
    if not any(indices):
        continue  # we are not interested in a column with only NaN
    x = df_volsurface.index.get_level_values(level='STRIKE')  # independent var
    y = df_volsurface[ccy]  # dependent var
    # create interpolation function
    f_interp = scipy.interpolate.interp1d(x[indices], y[indices], kind='linear', 
                            bounds_error=False, fill_value='extrapolate')
    df_volsurface['VOL_INTERP', ccy[1], ccy[2]] = f_interp(x)
print(df_volsurface)

Интерполяция для других условий должна работать аналогично.В результате получается DataFrame:

         VOLATILITY                    VOL_INTERP         
END_DATE 2018-03-16 2018-03-17         2018-03-17         
CCY_CODE        SGD        EUR     USD        EUR      USD
STRIKE                                                    
0.005           NaN        NaN     NaN    0.23900  0.42310
0.010           NaN        NaN     NaN    0.23600  0.38275
0.015           NaN        NaN  0.3424    0.23300  0.34240
0.020           NaN        NaN     NaN    0.23000  0.30205
0.025           NaN        NaN  0.2617    0.22700  0.26170
0.030           NaN        NaN  0.2414    0.22400  0.24140
0.035           NaN        NaN     NaN    0.22100  0.22110
0.040           NaN        NaN     NaN    0.21800  0.20080
0.045           NaN     0.2150     NaN    0.21500  0.18050
0.050           NaN     0.2120     NaN    0.21200  0.16020
0.055           NaN     0.2103     NaN    0.21030  0.13990
0.060           NaN        NaN     NaN    0.20975  0.11960
0.065           NaN     0.2092     NaN    0.20920  0.09930
0.070           NaN        NaN     NaN    0.20865  0.07900

Используйте df_volsurface.stack() для возврата к мультииндексу по вашему выбору.Есть также несколько pandas методов интерполяции на выбор.Однако я не нашел удовлетворительного решения вашей проблемы, используя method='akima', потому что он только интерполирует между заданными точками данных, но, кажется, не экстраполирует за пределы.

...