Токенайзер NLTK по умолчанию не распознает предложения здесь, потому что отсутствует последняя пунктуация. Вы можете добавить его перед каждой новой строкой "\n"
.
Например:
comments1 = comments1.replace("\n", ".\n")
tokens = sent_tokenize(comments1)
for token in tokens:
print("sentence: " + token)
Вы получите что-то подобное (усеченное для удобства чтения):
sentence: 1, Opposition MP Namal Rajapaksa questions Environment Ministe [...] Sirisena over Wilpattu deforestation issue.
sentence: 2, but he should remember that it all started with his dad and [...] a coma in that days .
sentence: 3, Opposition MP Namal Rajapaksa questions Environment Ministe [...] Sirisena over Wilpattu deforestation issue.
sentence: 4, Pawu meya ba ba meyage [...]
sentence: 5, We visited Wilpaththu in August 2013 These are some of the [...] deforestation of Wilpattu as Srilankans .
sentence: 6, Mamath wiruddai .
sentence: 7, Yeah we should get together and do something.
sentence: B, Mama Kamathyi oka kawada hari wenna tiyana deyak .
sentence: 9, Yes Channa Karunaratne we should stand agaist this act dnt [...] as per news spreading Pls .
sentence: 10, LTTE eken elawala daapu oya minissunta awurudu 30kin passe [...] sathungena balaneka thama manussa kama .
sentence: 11, Shanthar mahaththayo ow aththa eminisunta idam denna one g [...] vikalpa yojana gena evi kiyala.
sentence: 12, You are wrong They must be given lands No one opposes it W [...]