Use a list comprehension:
words = [a for sub in train['tokenize'] for a in sub]
Or use chain.from_iterable:
from itertools import chain
words = list(chain.from_iterable(train['tokenize']))
Sample:
import pandas as pd

train = pd.DataFrame({'tokenize': [['a', 's', 'd'], ['ss', 'dd'], ['aa', 'ss', 'dd']]})
print(train)
       tokenize
0     [a, s, d]
1      [ss, dd]
2  [aa, ss, dd]
words = [a for sub in train['tokenize'] for a in sub]
print(words)
['a', 's', 'd', 'ss', 'dd', 'aa', 'ss', 'dd']
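
As a quick check that both approaches give the same result, here is a minimal sketch that runs without pandas. The name token_lists is a hypothetical stand-in for train['tokenize']; it assumes the column holds plain Python lists, which iterate the same way whether they sit in a list or in a Series.

```python
from itertools import chain

# Hypothetical stand-in for train['tokenize']: a plain list of token lists.
token_lists = [['a', 's', 'd'], ['ss', 'dd'], ['aa', 'ss', 'dd']]

# Approach 1: nested list comprehension (outer loop first, then inner loop).
words_comprehension = [token for sub in token_lists for token in sub]

# Approach 2: chain.from_iterable lazily concatenates the sublists.
words_chain = list(chain.from_iterable(token_lists))

# Both flatten exactly one level of nesting and preserve order.
print(words_comprehension == words_chain)
```

Both versions flatten only one level, so a more deeply nested structure would need a different approach.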