For example, given this DataFrame:
import re
import pandas as pd

df = pd.DataFrame(data={
    'id': ['393848', '30495'],
    'text': ['This is Gabanna. @RT Her human Jose rushed past firefighters into his burning home to rescue her. She suffered burns on her nose and paws, but will be just fine. The family lost everything else. You can help them rebuild below. 14/10 for both (via @KUSINews)',
             'Meet Milo. He’s a smiley boy who tore a ligament in his back left zoomer. The surgery to fix it went well, but he’s still at the hospital being monitored. He’s going to work very hard to fetch at full speed again, and you can help him do it below. 13/10']
})
I wrote a few functions:
def tokenize(df):
    def process_tokens(df):  # adds a 'tokens' column with a list of tokens per row
        def process_reg(text):  # returns plain text (letters and spaces only)
            return " ".join([i for i in re.sub(r'[^a-zA-Z\s]', "", str(text)).split()])
        df['tokens'] = [process_reg(text).split() for text in df['text']]
    return process_tokens(df)

tokenize(df)
def process(df):  # adds a 'dic' column with a dict of token counts per row
    def process_group(token):  # converts a list of tokens into a dictionary of counts
        return pd.DataFrame(token, columns=["term"]).groupby('term').size().to_dict()
    df['dic'] = [process_group(token) for token in df['tokens']]

process(df)
They work fine when run one after the other, and I got the result I expected.
I have been looking for a way to combine all of these functions into one, so that the DataFrame only has to be passed once.
I haven't been able to find one.
Please help.
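One possible way to merge the two steps is sketched below. It is only a sketch under a few assumptions: the function name `tokenize_and_count` and the `text_col` parameter are hypothetical, the function returns a modified copy rather than changing `df` in place, and `collections.Counter` is swapped in for the `groupby(...).size()` call from the question. The counts should be the same, though the dictionary keys come out in order of first appearance rather than sorted.

```python
import re
from collections import Counter

import pandas as pd


def tokenize_and_count(df, text_col="text"):
    """Return a copy of df with 'tokens' and 'dic' columns added in one call.

    Hypothetical helper: the name and the text_col parameter are assumptions,
    not part of the original code.
    """
    out = df.copy()
    # Drop everything except ASCII letters and whitespace, then split on whitespace.
    out["tokens"] = [
        re.sub(r"[^a-zA-Z\s]", "", str(text)).split() for text in out[text_col]
    ]
    # Count how many times each token occurs within its own row.
    out["dic"] = [dict(Counter(tokens)) for tokens in out["tokens"]]
    return out


# Small made-up sample, used only to illustrate the call.
sample = pd.DataFrame({
    "id": ["1", "2"],
    "text": ["Meet Milo. He is a boy, a good boy!", "13/10 for Milo"],
})
result = tokenize_and_count(sample)
```

With the DataFrame from the question, the call would be `df = tokenize_and_count(df)`, so the frame is passed only once.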