Here is a simple example of how this could work:
import pandas as pd

# Set up the DataFrame with sample data
df = pd.DataFrame({'Document': ['This is ', 'first', None, 'This is ', 'second', None, 'this ', 'is ', 'third'],
                   'Score': [None, 1, None, None, 2, None, None, 3, None]})

rows = []  # collect result rows in a list; DataFrame.append was removed in pandas 2.0
doc = ''
score = None
for index, row in df.iterrows():
    if pd.notnull(row['Score']):
        # any non-NaN value within the current document is its score
        score = row['Score']
    if pd.notnull(row['Document']):
        # keep building the doc string until a NaN line is reached
        doc += row['Document']
    else:
        rows.append({'Document': doc, 'Score': score})
        doc = ''
if doc:
    # the last Document line is not NaN, so save the final result as well
    rows.append({'Document': doc, 'Score': score})
result_df = pd.DataFrame(rows, columns=['Document', 'Score'])
Output (result_df):
         Document  Score
0   This is first    1.0
1  This is second    2.0
2   this is third    3.0
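If the frame is large, the row-by-row loop can probably be replaced by a grouped aggregation. The following is only a sketch under the same assumption as above, namely that a missing Document value always marks the boundary between two documents. The helper name `merge_documents` and the intermediate name `group` are made up for illustration; taking the last non-null Score in each group mirrors what the loop does.

```python
import pandas as pd


def merge_documents(df):
    """Join consecutive Document fragments separated by missing values."""
    # Every missing Document starts a new group, so the running count of
    # missing values works as a group id for the fragments that follow it.
    group_id = df['Document'].isna().cumsum().rename('group')
    # Keep only the rows that actually carry a text fragment.
    parts = df[df['Document'].notna()]
    merged = parts.groupby(group_id[parts.index]).agg(
        Document=('Document', ''.join),  # concatenate the fragments in order
        Score=('Score', 'last'),         # last non-null score in the group
    )
    return merged.reset_index(drop=True)


df = pd.DataFrame({'Document': ['This is ', 'first', None, 'This is ', 'second', None, 'this ', 'is ', 'third'],
                   'Score': [None, 1, None, None, 2, None, None, 3, None]})
result_df = merge_documents(df)
```

For the sample data this should give the same three rows as the loop version. One difference to keep in mind: a group without any score ends up with NaN here, whereas the loop carries over the score of the previous document.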