I'm working on a text problem. I have a pandas DataFrame with many columns, one of which contains paragraphs of text. In the output I need 3 columns, defined as follows:
- The length of the longest words
- The number of longest words (in case several words share that length)
- The longest words themselves (all words of that maximum length)
I count something as a word if it is separated by whitespace. I'm looking for an answer that uses Python apply/map.
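To see what the whitespace rule means in practice: `str.split()` with no argument splits on any run of whitespace, but it leaves punctuation attached to each token. The sketch below (the helper name `tokens` is hypothetical, not from the question) compares the raw split with a variant that strips leading and trailing ASCII punctuation. Only the stripped variant agrees with the expected output further down: with a raw split, `SpokesWoman:` would be a 12-character word and `addiction.` a 10-character one.

```python
import string


def tokens(text: str, strip_punct: bool = True) -> list[str]:
    """Split on whitespace; optionally strip ASCII punctuation from token edges."""
    parts = text.split()  # any run of whitespace separates words
    if strip_punct:
        # string.punctuation is the ASCII set, e.g. ".,:;!?()-'"
        parts = [p.strip(string.punctuation) for p in parts]
    return [p for p in parts if p]  # drop tokens that were only punctuation, e.g. "-"


sample = "Medical examiner spokeswoman SpokesWoman: Oklahoma State"
print(tokens(sample, strip_punct=False))
# ['Medical', 'examiner', 'spokeswoman', 'SpokesWoman:', 'Oklahoma', 'State']
print(tokens(sample))
# ['Medical', 'examiner', 'spokeswoman', 'SpokesWoman', 'Oklahoma', 'State']
```

Apostrophes inside a word (`that's`, `I'd`) survive, because `str.strip` only removes characters from the two ends.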
Here is some sample input data:
import pandas as pd
df = pd.DataFrame({'text':[
"that's not where the biggest opportunity is - it's with heart failure drug - very very huge market....",
"Of course! I just got diagnosed with congestive heart failure and type 2 diabetes. I smoked for 12 years and ate like crap for about the same time. I quit smoking and have been on a diet for a few weeks now. Let me assure you that I'd rather have a coke, gummi bears, and a bag of cheez doodles than a pack of cigs right now. Addiction is addiction.",
"STILLWATER, Okla. (AP) ? Medical examiner spokeswoman SpokesWoman: Oklahoma State player Tyrek Coger died of enlarged heart, manner of death ruled natural."
]})
df
text
0 that's not where the biggest opportunity is - ...
1 Of course! I just got diagnosed with congestiv...
2 STILLWATER, Okla. (AP) ? Medical examiner spok...
Here is the expected output:
text word_count word_length words
0 that's not where the biggest opportunity is - ... 1 11 opportunity
1 Of course! I just got diagnosed with congestiv... 1 10 congestive
2 STILLWATER, Okla. (AP) ? Medical examiner spok... 2 11 spokeswoman SpokesWoman
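One way to produce these columns with `Series.map` is sketched below. It assumes (this is not stated in the question, but the expected output implies it) that leading and trailing ASCII punctuation is stripped from each whitespace-separated token; otherwise row 2 would yield `SpokesWoman:` with length 12. The function name `longest_words` is hypothetical. It returns a plain tuple per row, and the three result columns are built from the list of tuples and joined back onto `df`.

```python
import string

import pandas as pd


def longest_words(text: str) -> tuple[int, int, str]:
    """Return (word_count, word_length, words) for the longest words in `text`.

    Assumption: words are whitespace-separated tokens with leading/trailing
    ASCII punctuation stripped, which is what the expected output implies.
    """
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]  # ignore tokens that were only punctuation
    if not words:
        return 0, 0, ""
    max_len = max(map(len, words))
    longest = [w for w in words if len(w) == max_len]  # keeps original order
    return len(longest), max_len, " ".join(longest)


df = pd.DataFrame({"text": [
    "that's not where the biggest opportunity is - it's with heart failure drug - very very huge market....",
    "Of course! I just got diagnosed with congestive heart failure and type 2 diabetes. I smoked for 12 years and ate like crap for about the same time. I quit smoking and have been on a diet for a few weeks now. Let me assure you that I'd rather have a coke, gummi bears, and a bag of cheez doodles than a pack of cigs right now. Addiction is addiction.",
    "STILLWATER, Okla. (AP) ? Medical examiner spokeswoman SpokesWoman: Oklahoma State player Tyrek Coger died of enlarged heart, manner of death ruled natural.",
]})

# Series.map calls longest_words once per cell; each call returns a 3-tuple.
result = pd.DataFrame(
    df["text"].map(longest_words).tolist(),
    index=df.index,
    columns=["word_count", "word_length", "words"],
)
df = df.join(result)
print(df[["word_count", "word_length", "words"]])
```

Repeated occurrences of the same longest word are each counted, so `"very very big"` gives a count of 2 with `words == "very very"`. If only distinct words should count, deduplicate with `longest = list(dict.fromkeys(longest))` before returning. Building the frame from tuples with explicit `columns=` keeps the column order fixed.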