Paragraph segmentation with spaCy
March 23, 2020

I have a text file, "scrap_txt", that looks something like this:

I take the US News College rankings with a big grain of salt. I find some aspects of the ranking useful. For example, the Business and Engineering rankings are fairly accurate. The Peer Assessment rating is also telling.\n\nBut other elements of the ranking either make little to no sense, or are easily “gamed” by universities. How is it possible to use one formula to determine the financial viability of a large public university and a small private university? How do alumni donations equal student satisfaction? How are universities measuring class sizes?\n\nFor example, how is UCLA’s Financial Resources rank #20 while Michigan’s is #40 when Michigan’s endowment is 250% larger than UCLA’s ($12 billion vs $5 billion), revenues generated from tuition at Michigan significantly exceed those generated at UCLA, institutional budget at Michigan exceeds UCLA’s, and the cost of operations at Michigan significantly lower than at UCLA?\n\n

A paragraph here means a run of lines that is wrapped by \n\n at both its beginning and its end. Could someone please help me with code for this problem? The resulting 2 paragraphs should look like this:

paragraph1 = But other elements of the ranking either make little to no sense, or are easily “gamed” by universities. How is it possible to use one formula to determine the financial viability of a large public university and a small private university? How do alumni donations equal student satisfaction? How are universities measuring class sizes?

paragraph2 = For example, how is UCLA’s Financial Resources rank #20 while Michigan’s is #40 when Michigan’s endowment is 250% larger than UCLA’s ($12 billion vs $5 billion), revenues generated from tuition at Michigan significantly exceed those generated at UCLA, institutional budget at Michigan exceeds UCLA’s, and the cost of operations at Michigan significantly lower than at UCLA?
...
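
One possible way to get the two paragraphs described above is sketched below. It uses only the standard-library `re` module rather than spaCy, and the function name `wrapped_paragraphs` is hypothetical. It assumes the file may store the separator either as real newlines or as the literal two-character sequence backslash + n, and it keeps only the chunks that have a separator on both sides, which is why the opening "I take the US News..." block is dropped.

```python
import re
from typing import List


def wrapped_paragraphs(text: str) -> List[str]:
    """Return the chunks of `text` that have a blank-line separator on both sides.

    Hypothetical helper for illustration; not part of spaCy.
    """
    # Assumption: the file may contain a literal backslash followed by "n"
    # instead of a real newline character, so normalise that first.
    text = text.replace("\\n", "\n")

    # Split on runs of two or more newlines.
    chunks = re.split(r"\n{2,}", text)

    # The first chunk has no separator before it and the last chunk has no
    # separator after it, so only the interior chunks are "wrapped".
    interior = chunks[1:-1]

    # Trim whitespace and drop any empty chunks.
    return [chunk.strip() for chunk in interior if chunk.strip()]


if __name__ == "__main__":
    sample = "Intro text.\n\nFirst wrapped paragraph.\n\nSecond wrapped paragraph?\n\n"
    for number, paragraph in enumerate(wrapped_paragraphs(sample), start=1):
        print(f"paragraph{number} = {paragraph}")
```

If sentence-level or token-level processing is needed afterwards, each returned string can be passed to a loaded spaCy pipeline on its own; splitting first keeps the paragraph boundaries explicit instead of relying on the pipeline to recover them.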