The documentation says you should expect some manual cleanup after calling pd.read_html(). I'm not sure how to extend this code to your possibly disparate HTML pages. With that said, does this achieve what you want?
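import pandas as pd
# Read every table containing the text 'Number of Plinth'; read_html returns a list of DataFrames.
# url is assumed to be the page URL from the question.
df_other = pd.read_html(url, header=0, match='Number of Plinth')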
# To keep only the targeted columns; have a look at df_other - it's cluttered.
targeted_columns = ['Sr.No.', 'Project Name', 'Name', 'Proposed Date of Completion',
'Number of Basement\'s', 'Number of Plinth', 'Number of Podium\'s',
'Number of Slab of Super Structure', 'Number of Stilts',
'Number of Open Parking', 'Number of Closed Parking']
# Filtering on 'Project Name' == 'SRUSHTI COMPLEX' is an easy way to extract the two rows of interest. Also reset the index and drop the old one.
df_other = df_other[0].loc[df_other[0]['Project Name'] == 'SRUSHTI COMPLEX', targeted_columns].reset_index(drop=True)
# This is useful for the merge step later, since Sr.No. in df_one and df_two is an int
df_other['Sr.No.'] = df_other['Sr.No.'].astype(int)
# Extract the two rows as dataframes that correspond to each frame you mentioned
df_other_one = df_other.iloc[[0]]
df_other_two = df_other.iloc[[1]]
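The double brackets in iloc[[0]] matter here: a list of positions returns a one-row DataFrame, while a bare integer returns a Series, which would not merge the same way. A minimal sketch with a made-up frame (the values below are illustrative stand-ins, not the scraped data):

```python
import pandas as pd

# Hypothetical two-row frame standing in for df_other
df = pd.DataFrame({"Sr.No.": [1, 2], "Name": ["A and B", "C and D"]})

row_as_frame = df.iloc[[0]]   # list of positions -> DataFrame with one row
row_as_series = df.iloc[0]    # scalar position -> Series indexed by column name

print(type(row_as_frame).__name__)   # DataFrame
print(type(row_as_series).__name__)  # Series
```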
Once that is done, you can use merge to combine the dataframes:
df_one_ = df_one.merge(df_other_one, on='Sr.No.')
print(df_one_)
   Sr.No.  Apartment Type  Carpet Area (in Sqmts)  Number of Apartment  \
0       1           Shops                   70.63                    6
   Number of Booked Apartment     Project Name     Name  \
0                           0  SRUSHTI COMPLEX  A and B
   Proposed Date of Completion  Number of Basement's  Number of Plinth  \
0                          NaN                     0                 1
   Number of Podium's  Number of Slab of Super Structure  Number of Stilts  \
0                   0                                  5                 1
   Number of Open Parking  Number of Closed Parking
0                      48                         1
df_two_ = df_two.merge(df_other_two, on='Sr.No.')
print(df_two_)
   Sr.No.  Apartment Type  Carpet Area (in Sqmts)  Number of Apartment  \
0       2            1BHK                 1409.68                   43
   Number of Booked Apartment     Project Name     Name  \
0                           4  SRUSHTI COMPLEX  C and D
   Proposed Date of Completion  Number of Basement's  Number of Plinth  \
0                          NaN                     0                 1
   Number of Podium's  Number of Slab of Super Structure  Number of Stilts  \
0                   0                                  5                 1
   Number of Open Parking  Number of Closed Parking
0                      51                         1
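If you want to check the merge logic without hitting the page, here is a small self-contained sketch. The frames left and right are hypothetical stand-ins for df_one and df_other_one, with made-up values; it shows why the key column is cast to int before merging on 'Sr.No.'.

```python
import pandas as pd

# Hypothetical stand-ins for df_one and df_other_one (values are illustrative only)
left = pd.DataFrame({"Sr.No.": [1], "Apartment Type": ["Shops"]})
right = pd.DataFrame({"Sr.No.": ["1"], "Number of Plinth": [1]})  # key parsed as text

# Give the key column the same dtype on both sides before merging
right["Sr.No."] = right["Sr.No."].astype(int)

# merge performs an inner join by default, matching rows on the shared key
merged = left.merge(right, on="Sr.No.")
print(merged)
```

Because the join is an inner join, a row whose Sr.No. is missing from either frame is silently dropped, so it is worth checking the row count of the result.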