Converting a JSON response to a Pandas DataFrame produces special characters in words like "don't" and "doesn't"
September 21, 2019

I am using the NYT Community REST API (https://developer.nytimes.com/docs/community-api-product/1/overview) to retrieve user comments on an NYT article.

My Python code is as follows:

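import pandas as pd
import requests

def _get_data_from_nyc(nyc_resource_endpoint):
    # Fetch the resource and return the raw requests.Response
    response = requests.get(nyc_resource_endpoint)
    return response

# nyt_comments_endpoint holds the Community API URL for the article (defined elsewhere)
response_comments = _get_data_from_nyc(nyt_comments_endpoint).json()

# One row per comment; each JSON field becomes a column
comments = pd.DataFrame(response_comments["results"]["comments"])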

This correctly yields all the JSON fields as columns of the Pandas DataFrame, except when commentBody contains text like the following:

Like anything else there’s a spectrum of offenses from: patting a woman on her arriere to threatening threatening her job if she doesn’t comply to outright rape. All bad behavior; but requiring proportional punishment.
Let’s also not forget the flip side of sexual harassment; sexual advancement. When over the centuries women have leveraged their femininity to curry a favor; from taking the day off to getting a raise or a promotion. Off course this topic never gets the press that harassment does.
Sad part is there will be a quiet backlash from Metoo. There will be a reluctance to hire females; just to “stay out of trouble!”

In the commentBody column of the Pandas DataFrame, the text appears as:

Like anything else there’s a spectrum of offenses from: patting a woman on her arriere to threatening threatening her job if she doesn’t comply to outright rape. All bad behavior; but requiring proportional punishment.
Let’s also not forget the flip side of sexual harassment; sexual advancement. When over the centuries women have leveraged their femininity to curry a favor; from taking the day off to getting a raise or a promotion. Off course this topic never gets the press that harassment does.
Sad part is there will be a quiet backlash from Metoo. There will be a reluctance to hire females; just to “stay out of trouble!”

Here the apostrophes are converted to special characters. I would expect this to happen for every commentBody that contains punctuation, but the following comment is copied into the Pandas DataFrame correctly:

"'Actually I am that sort of girl, to look at,' [Julia said]. 'I'm good at games. I was a troop-leader in the Spies. I do voluntary work three evenings a week for the Junior Anti-Sex League. Hours and hours I've spent pasting their bloody rot all over London. I always carry one end of a banner in the processions. I always Iook cheerful and I never shirk anything. Always yell with the crowd, that's what I say. It's the only way to be safe.'"

George Orwell, 1984, Part 2, Chapter 2.  In the current climate, wiser words were never spoken. 

This one contains no special characters.
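
One difference between the two comments is visible in the text itself: the first uses typographic apostrophes (’, U+2019) and curly quotes, while the second uses only plain ASCII apostrophes and quotes. The sketch below uses only the standard library and is one way to check that. The helper name non_ascii_chars and the sample strings are made up for illustration, and the Windows-1252 step is only an assumption about where the garbling might come from, not a confirmed cause.

```python
# Sample fragments taken from the two comments above.
curly = "there\u2019s"   # typographic apostrophe, U+2019
straight = "I'm"         # ASCII apostrophe, U+0027


def non_ascii_chars(text):
    """Return the sorted code points (as 'U+XXXX') of non-ASCII characters in text."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127})


print(non_ascii_chars(curly))     # ['U+2019']
print(non_ascii_chars(straight))  # []

# Assumption: the UTF-8 bytes are read back somewhere as Windows-1252.
# U+2019 is three bytes in UTF-8 (E2 80 99); each byte becomes its own character.
garbled = curly.encode("utf-8").decode("cp1252")

# ascii() keeps the output unambiguous whatever encoding the terminal uses.
print(ascii(garbled))  # 'there\xe2\u20ac\u2122s'

# Reversing the wrong decoding recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == curly)  # True
```

If the garbled text only appears when the DataFrame is printed or opened in another program, the data itself may be fine and the encoding of the viewer would be the thing to check.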

What could the problem be? Any suggestions for handling this are welcome. Do we need opening and closing quotation marks (") around each comment for it to be converted correctly? We need to keep commentBody exactly as it is in the JSON, so we cannot strip the punctuation.
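
On the question of extra quotation marks: JSON strings are already delimited inside the payload, so a parsed commentBody should come back as an ordinary Python str with its punctuation intact. The sketch below illustrates this with a hand-made payload shaped like response_comments["results"]["comments"]. The payload, the helper comment_bodies, and the wrong-encoding case are hypothetical and meant only to show how the decoding step affects the result.

```python
import json

# Hypothetical payload; the real API response carries many more fields.
body = "Let\u2019s also not forget; just to \u201cstay out of trouble!\u201d"
raw = json.dumps(
    {"results": {"comments": [{"commentBody": body}]}},
    ensure_ascii=False,
).encode("utf-8")


def comment_bodies(raw_bytes, encoding="utf-8"):
    """Decode the raw bytes with the given encoding and return every commentBody."""
    payload = json.loads(raw_bytes.decode(encoding))
    return [c["commentBody"] for c in payload["results"]["comments"]]


# Decoded as UTF-8, the text survives unchanged, with no extra quoting needed.
good = comment_bodies(raw)
print(good[0] == body)  # True

# Decoded with the wrong encoding (latin-1 here), every non-ASCII character
# is split into several unrelated characters.
bad = comment_bodies(raw, encoding="latin-1")
print(bad[0] == body)  # False

# The damage is reversible as long as the wrong encoding is known.
print(bad[0].encode("latin-1").decode("utf-8") == body)  # True
```

With requests, a comparable check would be to compare the value of response.encoding with the encoding the server actually uses before calling .json().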

