Question

The code block below works well on a single dataframe: it takes a dataframe with a datetime series and builds a one-hour mean lag window for four sensor columns. I have a whole list of dataframes to do this for, though. Is there a way to iterate over the list, or to write a function that does this, so that I don't end up with repeated blocks of code?

List of dataframes:

df_list = [df_t6,
           df_t7,
           df_t8,
           df_t11,
           df_t14,
           df_t15,
           df_t17,
           df_t19]

The code block that works is as follows:

# df_t6 telemetry means lag window

# create an empty list 'temp'
temp = [] 
# define the feature columns to be iterated
features = ['HP', 'Coolant1', 'AccumulatedWork', 'CuttingHP']
# loop
for column in features:
    # append to the list 'temp' a one-hour (1H) resample taking the mean for each 'column' from the 'features' list
    temp.append(pd.pivot_table(df_t6, index = 'datetime', columns = 'Tool', values = column)
                .resample('1H', closed = 'left', label = 'right').mean().unstack())
# create a dataframe to hold the information and concat the 'temp' list
sensorData1H_mean = pd.concat(temp, axis = 1)
# name the columns using the list 'features' + '1H_mean'
sensorData1H_mean.columns = [n + '_1H_mean' for n in features]
# reset the index values
sensorData1H_mean.reset_index(inplace = True)

I know I can define a function for this, as shown below, to make the repetition quicker, but I was wondering whether there is a faster/better way?

def oneHmean(d):
    # create an empty list 'temp'
    temp = [] 
    # define the feature columns to be iterated
    features = ['HP', 'Coolant1', 'AccumulatedWork', 'CuttingHP']
    # loop
    for column in features:
        # append to the list 'temp' a one-hour (1H) resample taking the mean for each 'column' from the 'features' list
        temp.append(pd.pivot_table(d, index = 'datetime', columns = 'Tool', values = column)
                    .resample('1H', closed = 'left', label = 'right').mean().unstack())
    # create a dataframe to hold the information and concat the 'temp' list
    sensorData1H_mean = pd.concat(temp, axis = 1)
    # name the columns using the list 'features' + '1H_mean'
    sensorData1H_mean.columns = [n + '_1H_mean' for n in features]
    # reset the index values
    sensorData1H_mean.reset_index(inplace = True)
    return sensorData1H_mean

df_t6_m = oneHmean(df_t6)
df_t7_m = oneHmean(df_t7)

etc.

Data samples:

df_t6:

   Unnamed: 0  IDData    HP  Coolant1  AccumulatedWork  CuttingHP Tool          datetime
0           0       0     0       388            30452      -1775   T6   2019-02-22 11:50:21 
1           1       1  1812       388            30452         37   T6   2019-02-22 11:50:21 
2           2       2  1775       388            30452          0   T6   2019-02-22 11:50:21
3           3       3  1797       382            30452         22   T6   2019-02-22 11:50:21
4           4       4  1797       382            30452         22   T6   2019-02-22 11:50:21

df_t7:

   Unnamed: 0  IDData    HP  Coolant1  AccumulatedWork  CuttingHP Tool          datetime  
0           0       0  1646        14             3291      -1912   T7   2019-02-22 11:50:42
1           1       1  1680        14             3291      -1878   T7   2019-02-22 11:50:42 
2           2       2  1719        14             3291      -1839   T7   2019-02-22 11:50:42  
3           3       3  1673        14             3291      -1885   T7   2019-02-22 11:50:42
4           4       4  1648        14             3291      -1910   T7   2019-02-22 11:50:42

Ian Thompson · Answer 1 · February 26, 2019

I think you can concatenate the dfs, group by a key, and then apply your oneHmean function.

# concat the dfs into one, add a key for each to separate them
df = pd.concat([
    df_t6,
    df_t7
], keys=[
    't6', 't7'
])

# your function
def oneHmean(d):
    # create an empty list 'temp'
    temp = [] 
    # define the feature columns to be iterated
    features = ['HP', 'Coolant1', 'AccumulatedWork', 'CuttingHP']
    # loop
    for column in features:
        # append to the list 'temp' a one-hour (1H) resample taking the mean for each 'column' from the 'features' list
        temp.append(pd.pivot_table(d, index = 'datetime', columns = 'Tool', values = column)
                    .resample('1H', closed = 'left', label = 'right').mean().unstack())
    # create a dataframe to hold the information and concat the 'temp' list
    sensorData1H_mean = pd.concat(temp, axis = 1)
    # name the columns using the list 'features' + '1H_mean'
    sensorData1H_mean.columns = [n + '_1H_mean' for n in features]
    # reset the index values
    sensorData1H_mean.reset_index(inplace = True)
    return sensorData1H_mean

# group on the keys and apply your function
df.groupby(level=0).apply(oneHmean)
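
If you would rather keep one result per dataframe instead of a single grouped frame, a dict comprehension over the frames is another option. The following is only a sketch under assumptions: `one_hour_mean`, `make_demo`, the `frames` dict and the `source` label are hypothetical names, the demo rows are made up to mimic the samples in the question, and `datetime` is assumed to already be a datetime column. The rule is written as `"1h"` because recent pandas releases prefer the lowercase alias.

```python
import pandas as pd

FEATURES = ["HP", "Coolant1", "AccumulatedWork", "CuttingHP"]


def one_hour_mean(frame, features=FEATURES, rule="1h"):
    """Hourly mean per Tool for each feature column (same steps as oneHmean)."""
    pieces = [
        pd.pivot_table(frame, index="datetime", columns="Tool", values=column)
        .resample(rule, closed="left", label="right")
        .mean()
        .unstack()
        for column in features
    ]
    result = pd.concat(pieces, axis=1)
    result.columns = [name + "_1H_mean" for name in features]
    return result.reset_index()


def make_demo(tool, hp_values):
    """Build a tiny made-up frame standing in for df_t6, df_t7, ..."""
    times = pd.to_datetime(
        ["2019-02-22 11:50:21", "2019-02-22 11:55:00", "2019-02-22 12:10:00"]
    )
    return pd.DataFrame({
        "datetime": times,
        "Tool": tool,
        "HP": hp_values,
        "Coolant1": [388, 388, 382],
        "AccumulatedWork": [30452, 30452, 30452],
        "CuttingHP": [0, 37, 22],
    })


# Hypothetical stand-ins; in the question these would be df_t6, df_t7, ...
frames = {
    "t6": make_demo("T6", [0, 1812, 1775]),
    "t7": make_demo("T7", [1646, 1680, 1719]),
}

# One aggregated frame per input frame, keyed by name
means = {name: one_hour_mean(frame) for name, frame in frames.items()}

# Optionally stack everything into one frame with a 'source' column
combined = pd.concat(means, names=["source"]).reset_index(level="source")
```

With the real data, `frames` could be built from the existing list, for example `dict(zip(names, df_list))` with a list of names you choose. `means["t6"]` then plays the role of `df_t6_m` from the question.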
