Во-первых, какой-нибудь причудливый код для создания DataFrame.
from io import StringIO
import pandas as pd
sio = StringIO("""I am just going to type up something because you inserted an image instead ctr+c and ctr+v the code to Stackoverflow.
Actually, it's unclear what you want to do with the ngram counts.
Perhaps, it might be better to use the `nltk.everygrams()` if you want a global count.
And if you're going to build some sort of ngram language model, then it might not be efficient to do it as you have done too.""")
with sio as fin:
texts = [line for line in fin]
df = pd.DataFrame({'text': texts})
Затем вы можете легко использовать DataFrame.apply
для извлечения нграмм, например,
from collections import Counter
from functools import partial
from nltk import ngrams, word_tokenize
for i in range(1, 4):
_ngrams = partial(ngrams, n=i)
df['{}-grams'.format(i)] = df['text'].apply(lambda x: Counter(_ngrams(word_tokenize(x))))