Я добавил дополнительный округ к вашим данным и составил вымышленный индекс для HPI для округа Барбур:
state = ["Alabama"] * 12
county = ["Baldin County"] * 6 + ["Barbour County"] * 6
year = [x for y in range(2) for x in range(2012, 2018)]
hpi = [125, 130, 127.5, 142, 160, 139, 98, 108, 102, 115, 118, 114]
data = {"Year": year, "State": state, "County": county, "HPI": hpi}
df = pd.DataFrame(data)
# Sorting is necessary.
df = df.sort_values(['State', 'County', 'Year'])
print(df)
Year State County HPI
0 2012 Alabama Baldin County 125.0
1 2013 Alabama Baldin County 130.0
2 2014 Alabama Baldin County 127.5
3 2015 Alabama Baldin County 142.0
4 2016 Alabama Baldin County 160.0
5 2017 Alabama Baldin County 139.0
6 2012 Alabama Barbour County 98.0
7 2013 Alabama Barbour County 108.0
8 2014 Alabama Barbour County 102.0
9 2015 Alabama Barbour County 115.0
10 2016 Alabama Barbour County 118.0
11 2017 Alabama Barbour County 114.0
Исходя из этой базы, мы смещаем «HPI» и делим, чтобы получить результаты данных, которые вы ищем.
df["3 year appreciation"] = df.HPI / df['HPI'].shift(3)
print(df)
Year State County HPI 3 year appreciation
0 2012 Alabama Baldin County 125.0 NaN
1 2013 Alabama Baldin County 130.0 NaN
2 2014 Alabama Baldin County 127.5 NaN
3 2015 Alabama Baldin County 142.0 1.136000
4 2016 Alabama Baldin County 160.0 1.230769
5 2017 Alabama Baldin County 139.0 1.090196
6 2012 Alabama Barbour County 98.0 0.690141
7 2013 Alabama Barbour County 108.0 0.675000
8 2014 Alabama Barbour County 102.0 0.733813
9 2015 Alabama Barbour County 115.0 1.173469
10 2016 Alabama Barbour County 118.0 1.092593
11 2017 Alabama Barbour County 114.0 1.117647
Однако теперь у вас есть NaN в начале и неверные значения для первых трех лет каждого округа. Чтобы исправить это, мы группируем штат / округ, затем извлекаем первые три года для каждой группы, используя head (3), затем получаем значения индекса, затем фильтруем и устанавливаем в ноль.
df.loc[df.groupby(["State", "County"]).head(3).index, "3 year appreciation"] = 0
print(df)
Year State County HPI 3 year appreciation
0 2012 Alabama Baldin County 125.0 0.000000
1 2013 Alabama Baldin County 130.0 0.000000
2 2014 Alabama Baldin County 127.5 0.000000
3 2015 Alabama Baldin County 142.0 1.136000
4 2016 Alabama Baldin County 160.0 1.230769
5 2017 Alabama Baldin County 139.0 1.090196
6 2012 Alabama Barbour County 98.0 0.000000
7 2013 Alabama Barbour County 108.0 0.000000
8 2014 Alabama Barbour County 102.0 0.000000
9 2015 Alabama Barbour County 115.0 1.173469
10 2016 Alabama Barbour County 118.0 1.092593
11 2017 Alabama Barbour County 114.0 1.117647
Общий код :
import pandas as pd
state = ["Alabama"] * 12
county = ["Baldin County"] * 6 + ["Barbour County"] * 6
year = [x for y in range(2) for x in range(2012, 2018)]
hpi = [125, 130, 127.5, 142, 160, 139, 98, 108, 102, 115, 118, 114]
data = {"Year": year, "State": state, "County": county, "HPI": hpi}
df = pd.DataFrame(data)
df = df.sort_values(['State', 'County', 'Year'])
df["3 year appreciation"] = df.HPI / df['HPI'].shift(3)
df.loc[df.groupby(["State", "County"]).head(3).index, "3 year appreciation"] = 0