Опрос результатов моделирования Маркова - Помощь и отзывы высоко ценится - PullRequest
0 голосов
/ 30 марта 2019

Я построил цепь Маркова, с помощью которой я могу имитировать распорядок дня людей (модели активности).Каждый день симуляции делится на 144 временных шага, и человек может выполнить одно из четырнадцати действий.Это: В гостях - на работе (1) В гостях - на отдыхе (2) В гостях - за покупками (3) Спать (4) Готовить (5) Посудомоечная машина (6) Прачечная (7) Пылесосить (8) Смотреть телевизор (9) С помощью ПК(10) Личная гигиена (11) Глажка (12) Прослушивание музыки (13) Другое (14)

Я рассчитал вероятности перехода для цепи Маркова из набора данных, содержащего в общей сложности 226 дневников людей, которыезадокументировали их деятельность в 10-минутном интервале.Набор данных можно найти здесь: https://www.dropbox.com/s/tyjvh810eg5brkr/data?dl=0

Теперь я смоделировал 4000 дней с помощью цепочки Маркова и хочу проверить результаты с помощью исходного набора данных.

Для этой цели я анализирую результаты для действий в отдельности и рассчитываю ожидаемое значение, а также 95% доверительный интервал для каждого временного шага.Затем я проверяю, находится ли среднее значение для активности из исходного набора данных в этом интервале, и вычисляю количество всех точек, которые не лежат в пределах верхнего или нижнего порога доверительного интервала.

Однако для каждого моделирования я получаю отклонения разной высоты, иногда в диапазоне 1%, иногда более 10%.

Интересно, как это возможно и возможно ли?возможно вообще.

Я смоделировал 4x 4000 дней, и результаты:

Graphical results for first simulation

Results for the simulation

Первая часть кода здесь (извините, но она слишком длинная):

# import of libraries

# First, create a 143xarray for transition probability matrix since the first state of the activity pattern will be determined by the starting probability which will be calculated later.

data= data(Dataframe vom dataset see link above)

# in case activity is not observed in time step t, return 0 instead divide by 0

def a1(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        return 0

#create matrix for transition probabilities
matrix = np.empty(shape=(143, 14, 14))

#iterate through each time step and calculate probabilities for each activity (from state j to state m)

#save matrix as 3D-array
h=data
for i in range(0, 143):
    for j in range(1, 15):
        for m in range(1, 15):
            a = a1(len(h[(h.iloc[:, i] == j) & (h.iloc[:, i + 1] == m)]), len(h[h.iloc[:, i] == j]))
            matrix[i, j - 1, m - 1] = a
np.save(einp_wi_week, matrix)

# now calculate starting probabilities for time step 0 (initial start activity)

matrix_original_probability_starting_state= np.empty(shape=(14, 1))
for i in range(0, 1):
    for j in range(1, 15):
        a = a1(len(h[(h.iloc[:, i] == j)]), len(h.iloc[:, i]))
        matrix_original_probability_starting_state[j - 1, i] = a
np.save(einp_wi_week_start, matrix_original_probability_starting_state)


# import transition probabilities for markov-chain (I have several worksheets to keep track of my approach and to document the results)


# MARKOV-CHAINS TO GENERATE ACTIVITY PATTERNS


# for every time step, the markov-chain can be either in one of these states / activities
states = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"]

# Possible sequences of state changes (since transferring from state 1 to state 11, for instance has the same code as transferring from 11 to 1, all state transfers for 1 to a number higher than 9 is adjusted in the transfer code)
transitionName = [["11", "12", "13", "14", "15", "16", "17", "18", "19", "0110", "0111", "0112", "0113", "0114"], ["21", "22", "23", "24", "25", "26", "27", "28", "29", "210", "211", "212", "213", "214"], ["31", "32", "33", "34", "35", "36", "37", "38", "39", "310", "311", "312", "313", "314"], ["41", "42", "43", "44", "45", "46", "47", "48", "49", "410", "411", "412", "413", "414"], ["51", "52", "53", "54", "55", "56", "57", "58", "59", "510", "511", "512", "513", "514"], ["61", "62", "63", "64", "65", "66", "67", "68", "69", "610", "611", "612", "613", "614"], ["71", "72", "73", "74", "75", "76", "77", "78", "79", "710", "711", "712", "713", "714"], ["81", "82", "83", "84", "85", "86", "87", "88", "89", "810", "811", "812", "813", "814"], ["91", "92", "93", "94", "95", "96", "97", "98", "99", "910", "911", "912", "913", "914"], ["101", "102", "103", "104", "105", "106", "107", "108", "109", "1010", "1011", "1012", "1013", "1014"], ["111", "112", "113", "114", "115", "116", "117", "118", "119", "1110", "1111", "1112", "1113", "1114"], ["121", "122", "123", "124", "125", "126", "127", "128", "129", "1210", "1211", "1212", "1213", "1214"], ["131", "132", "133", "134", "135", "136", "137", "138", "139", "1310", "1311", "1312", "1313", "1314"], ["141", "142", "143", "144", "145", "146", "147", "148", "149", "1410", "1411", "1412", "1413", "1414"]]


def activity_forecast(simulation_days):
    # activitypattern is an empty array that stores the sequence of states in the simulation.
    activitypattern = []
    for m in range(0,simulation_days):
        # for each simulation day, the activitypattern for all 144 time steps (10-min resolution) must be generated

        # pro determines the transition probabilities matrix that is used for the simulation day and can be changed with regards to the day simulated (weekday, Friday, Saturday, Sunday). 

        # Need to be changed for the validation in order to validate activity patterns for weekdays, Fridays, Saturdays, and Sundays (see transition probability matrices above and comments later)

        pro = einp_wi_week

        #Determine starting activity. Therefore use vector with starting probabilities for all activites, generated in "Clustern_Ausgangsdaten..."
        activityToday = np.random.choice(states,replace=True,p=einp_wi_week_start[:,0])



        # create array were the sequence of states for simulation day is stored. This array will be appended to the activity pattern array

        activityList = [activityToday]

        #since the first activity state for each simulation day is determined, we have to generate activity states for the 143 following time steps
        for i in range(0,143):
            #create sequence of activities
            if activityToday == "1":
                change = np.random.choice(transitionName[0],replace=True,p=pro[i,0,:])
                if change == "11":
                    activityList.append("1")
                    pass
                elif change == "12":
                    activityToday = "2"
                    activityList.append("2")
                elif change == "13":
                    activityToday = "3"
                    activityList.append("3")
                elif change == "14":
                    activityToday = "4"
                    activityList.append("4")
                elif change == "15":
                    activityToday = "5"
                    activityList.append("5")
                elif change == "16":
                    activityToday = "6"
                    activityList.append("6")
                elif change == "17":
                    activityToday = "7"
                    activityList.append("7")
                elif change == "18":
                    activityToday = "8"
                    activityList.append("8")
                elif change == "19":
                    activityToday = "9"
                    activityList.append("9")
                elif change == "0110":
                    activityToday = "10"
                    activityList.append("10")
                elif change == "0111":
                    activityToday = "11"
                    activityList.append("11")
                elif change == "0112":
                    activityToday = "12"
                    activityList.append("12")
                elif change == "0113":
                    activityToday = "13"
                    activityList.append("13")
                else:
                    activityToday = "14"
                    activityList.append("14")
            elif activityToday == "2":
                change = np.random.choice(transitionName[1],replace=True,p=pro[i,1,:])           
                if change == "22":
                    activityList.append("2")
                    pass
                elif change == "21":
                    activityToday = "1"
                    activityList.append("1")
                elif change == "23":
                    activityToday = "3"
                    activityList.append("3")
                elif change == "24":
                    activityToday = "4"
                    activityList.append("4")
                elif change == "25":
                    activityToday = "5"
                    activityList.append("5")
                elif change == "26":
                    activityToday = "6"
                    activityList.append("6")
                elif change == "27":
                    activityToday = "7"
                    activityList.append("7")
                elif change == "28":
                    activityToday = "8"
                    activityList.append("8")
                elif change == "29":
                    activityToday = "9"
                    activityList.append("9")
                elif change == "210":
                    activityToday = "10"
                    activityList.append("10")
                elif change == "211":
                    activityToday = "11"
                    activityList.append("11")
                elif change == "212":
                    activityToday = "12"
                    activityList.append("12")
                elif change == "213":
                    activityToday = "13"
                    activityList.append("13")
                else:
                    activityToday = "14"
                    activityList.append("14")
            [code is too long - code needs to be written for each activity (1-14)]

        activitypattern.append(activityList)
    # save the files in order to use them later for comparison(and also for documentation since they can change for each simulation)
    df=pd.DataFrame(activitypattern)
    df.to_csv("Activitypatterns_synthetic_one_person_ft_aggregated_week_4000_days_new",index=False)

#call function
activity_forecast(4000)

Я рад любым советам / отзывам.

Спасибо, Феликс

1 Ответ

0 голосов
/ 30 марта 2019

Вот еще код:)

# Activity patterns have been saved

# Now, use original data and calculate the probability for each time step

k=data
matrix_original_one_person_wi_week= np.empty(shape=(14, 144))
for i in range (0,144):
    for j in range (1,15):
        a=a1(len(k[(k.iloc[:,i]==j)]),len(k.iloc[:,i]))
        matrix_original_one_person_wi_week[j-1,i]=a

#create dataframe
matrix_df_activities_original_one_person_winter_week=pd.DataFrame(matrix_original_one_person_wi_week)
print(matrix_df_activities_original_one_person_winter_week)


# Activity profiles have been generated for [4000] simulation days and stored as csv file.

# loading data
activities_one_person_winter_week_4000_days=pd.read_csv("XXX", delimiter=";")

# load timesteps csv (10-min-resolution array) to create array with 10-min time steps as x-axis
timesteps = pd.read_csv("XXX", delimiter=";", header=None)
timesteps_array = timesteps[0].values.tolist()

# now transform to dataframe
df_activities_one_person_winter_week_4000_days=pd.DataFrame(activities_one_person_winter_week_4000_days)

# upper and lower threshold for the 95% confidence interval are stored within a single array.
confidence_upper = []
confidence_lower = []

# now iterate through each time step to get the mean value and the upper and lower threshold

# this is done for each activity

# for each column of the dataframe, change values of activities to 1 (activity observed) or 0 (other activity observed)

# for changing the activity for which the values should be calculated, just change the the first value within np.where (a==)

for i in range(0,144):
    a = df_activities_one_person_winter_week_4000_days.iloc[:,i]

    #look for specifc value (first number) and replace it with (1) the rest ist (0), activity starts with 1, not zero!
    count = np.where(a == 1, 1, 0)
    mean = count.mean()

    #confidence interval is 0.95
    confidence = 0.95

    # number of observations
    n = len(count)

    #standard error
    std_err = stats.sem(count)

    # calculate value for lower and upper threshold
    threshold_value = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)

    # add / subtract threshold value from mean value mean
    upper_threshold = mean + threshold_value
    lower_threshold = mean - threshold_value

    #append values to array
    confidence_upper.append(upper_threshold*100)
    confidence_lower.append(lower_threshold*100)

#now we need to calculate the mean value for each activity for each time step from the original data

# just change the first value in iloc for the activity you want to have the probabilities for (its always "a" (activity number defined in np.where) - 1)

original_data_value=np.array(matrix_df_activities_original_one_person_winter_week.iloc[0,:])   




# Now count the frequency how many data points of the original distributiona are outside of the confidence interval

# first multiply value by 100 in order to compare (since other arrays are in XX% format)
#original_data_value_percentage=[]
#for i in range(0,144):
    #a=original_data_value[i]
    #original_data_value_percentage.append(a)
#original_data_value_percentage

# now create array where all values that are lower/upper of confidence interval are stored
count_frequency=[]

#iterate through each time step
for i in range(0,144):
    if (original_data_value[i]*100)<confidence_lower[i]:
        count_frequency.append(original_data_value[i]*100)
        print("lower",i, (original_data_value[i]*100),confidence_lower[i])
    elif (original_data_value[i]*100)>confidence_upper[i]:
        count_frequency.append(original_data_value[i]*100)
        print("upper", i, (original_data_value[i]*100),confidence_upper[i])
    else:
        pass

print((len(count_frequency)/144)*100)



...