Storing long strings in MySQL using Python - PullRequest
April 26, 2018

This is part of my scraping project. It is large, so I can't post the whole script here, but I'll try to make the situation as clear as I can. Python version: 3.6.2.

I'm trying to store a string containing emoji in a MySQL database. Here is my database schema.

Create the database:

DATABASE = "CREATE DATABASE IF NOT EXISTS testdb DEFAULT CHARACTER SET 'utf8'"

Alter the database:

ALTER_DB = "ALTER SCHEMA `testdb`  DEFAULT CHARACTER SET utf8mb4"

Tables for replies, threads, and users:

TABLES = {}
TABLES['replyes'] = (
    "CREATE TABLE IF NOT EXISTS `replyes` ("
    "  `reply_no` int(11) NOT NULL AUTO_INCREMENT,"
    "  `thread_name` TEXT NOT NULL,"
    "  `reply_text` LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,"
    "  `replyer` varchar(30) NOT NULL,"
    "  `reply_reactions` int(5),"
    "  `reply_date` varchar(11) NOT NULL,"
    "  `add_date` TIMESTAMP NOT NULL DEFAULT now(),"
    "  PRIMARY KEY (`reply_no`)"
    ") ENGINE=InnoDB")


TABLES['threads'] = (
    "CREATE TABLE IF NOT EXISTS `threads` ("
    "  `thread_no` int(11) NOT NULL AUTO_INCREMENT,"
    "  `topic_name` varchar(50) NOT NULL,"
    "  `group_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,"
    "  `thread_name` TEXT NOT NULL,"
    "  `thread_text` LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci,"
    "  `thread_starter` varchar(30) NOT NULL,"
    "  `thread_reactions` int(5),"
    "  `thread_replyes` int(5),"
    "  `thread_date` varchar(11) NOT NULL,"
    "  `thread_url`  varchar(150) NOT NULL,"
    "  `add_date` TIMESTAMP NOT NULL DEFAULT now(),"
    "  PRIMARY KEY (`thread_no`)"
    ") ENGINE=InnoDB")

TABLES['users'] = (
    "  CREATE TABLE IF NOT EXISTS `users` ("
    "  `user_no` int(11) NOT NULL AUTO_INCREMENT,"
    "  `user_name` varchar(30) NOT NULL,"
    "  `user_posts` int(11),"
    "  `user_comments` int(11),"
    "  `visibility` varchar(8),"
    "  `user_location` varchar(30),"
    "  `user_since` varchar(30),"
    "  `groups` int(3),"
    "  `group_names` LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci,"
    "  `group_urls` LONGTEXT,"
    "  `add_date` TIMESTAMP NOT NULL DEFAULT now(),"
    "  PRIMARY KEY (`user_no`)"
    ") ENGINE=InnoDB")
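
One detail of the schema above may be worth checking: the text columns are declared with CHARACTER SET utf8, and MySQL's utf8 stores at most three bytes per character, while utf8mb4 stores up to four. The sketch below is only a quick way to see which characters fall outside that limit; the helper name fits_mysql_utf8 and the sample strings are made up for illustration.

```python
# Sketch: compare UTF-8 byte lengths of a few characters.
# MySQL's legacy `utf8` charset (utf8mb3) holds at most 3 bytes per character.
MYSQL_UTF8_MAX_BYTES = 3


def fits_mysql_utf8(text):
    """Return True if every character of `text` encodes to at most 3 UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= MYSQL_UTF8_MAX_BYTES for ch in text)


samples = {
    "ascii": "Congratulations",
    "nbsp": "\u00a0",        # non-breaking space, b'\xc2\xa0' in UTF-8
    "emoji": "\U0001F91E",   # crossed fingers, b'\xf0\x9f\xa4\x9e' in UTF-8
}

for name, text in samples.items():
    # Print the label, the encoded length in bytes, and whether it fits.
    print(name, len(text.encode("utf-8")), fits_mysql_utf8(text))
```

If the emoji sample does not fit, declaring the affected columns (not only the database default) as utf8mb4 would be the thing to try.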

Insert queries:

insert_replyes = """INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("{thread_name}","{reply_text}", "{replyer}", {reply_reactions}, "{reply_date}")"""
insert_thread = """INSERT INTO threads(topic_name, group_name, thread_name,thread_text, thread_starter, thread_reactions, thread_replyes, thread_date,thread_url) values("{topic_name}","{group_name}","{thread_name}","{thread_text}", "{thread_starter}",{thread_reactions},{thread_replyes}, "{thread_date}", "{thread_url}")"""
insert_user = """INSERT INTO users(user_name, user_posts, user_comments,visibility, user_location, user_since, groups, group_names, group_urls) values("{user_name}", {user_posts}, {user_comments}, "{visibility}", "{user_location}", "{user_since}", {groups}, "{group_names}", "{group_urls}")"""

I am inserting data that contains various characters, such as emoji and special characters ($, \xc2\xa0 ~ \xc2\xa0 \xc2\xa0 ~ \xc2\xa0 \xc2\xa0 \xc2\xa0, etc.). I have tried many CHARACTER SET values for the database as well as for the tables.

Everything works, except that some of the data is not saved to the database.

Data that works:

# For replyes
INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("Is this positive??!! Plz help!! ","b'I see it in both! Congratulations \n '", "Skymomof4", 1, "2018-04-26")


INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("Sticky: Rules, Tools and Helpful Links. Updated with working photos","b'Thank you so much for ALL the info, you are amazing!!! '", "chandresteen", 0, "2018-04-26")

# For users
INSERT INTO users(user_name, user_posts, user_comments,visibility, user_location, user_since, groups, group_names, group_urls) values("chandresteen", 7, 605, "Public", "Coconut Creek,FL", "September 2013", 10, "['Big Kids', 'Getting Pregnant - Trying to Conceive ', 'High-Tech Methods for Getting Pregnant - IVF, ICSI, FET', 'May 2015 Birth Club', 'November 2018 Birth Club', 'Preschoolers', 'Soy Isoflavones, Clomid, Vitex & Femara Girls!', 'Toddlers', 'Trying to Conceive Community', 'TTC/Pregnancy South Africa']", "['https://community.babycenter.com/groups/a155/big_kids', 'https://community.babycenter.com/groups/a6720413/getting_pregnant_-_trying_to_conceive', 'https://community.babycenter.com/groups/a696465/high-tech_methods_for_getting_pregnant_-_ivf_icsi_fet', 'https://community.babycenter.com/groups/a6748015/may_2015_birth_club', 'https://community.babycenter.com/groups/a6768388/november-2018-birth-club', 'https://community.babycenter.com/groups/a145/preschoolers', 'https://community.babycenter.com/groups/a6731007/soy_isoflavones_clomid_vitex_femara_girls', 'https://community.babycenter.com/groups/a135/toddlers', 'https://community.babycenter.com/groups/a43905/trying_to_conceive_community', 'https://community.babycenter.com/groups/a6758887/ttcpregnancy_south_africa']")

# For threads
INSERT INTO threads(topic_name, group_name, thread_name,thread_text, thread_starter, thread_reactions, thread_replyes, thread_date,thread_url) values("Getting Pregnant","b'Getting Pregnant - Trying to Conceive '","Sticky: Rules, Tools and Helpful Links. Updated with working photos","b'Welcome to the BBC   Group Getting Pregnant! Below are some links to some threads that would help some of you lovely ladies! Good Luck on your TTC   journey. \xc2\xa0 ~ \xc2\xa0 \xc2\xa0 ~ \xc2\xa0 \xc2\xa0 \xc2\xa0 If there is something not here that you would like to see please feel free to message me. Sorry that the list is so short! Babydust ladies!! '", "~Dovah~",59,6, "2018-04-26", "https://community.babycenter.com/post/a60866779/sticky_rules_tools_and_helpful_links._updated_with_working_photos")

The statements above work; those rows are actually in the database.

Data that does not work:

# For threads
INSERT INTO threads(topic_name, group_name, thread_name,thread_text, thread_starter, thread_reactions, thread_replyes, thread_date,thread_url) values("Getting Pregnant","b'Getting Pregnant - Trying to Conceive '","Struggling with TTC Process","b"This has officially turned into more of a supportive post surrounding the TTC   process when its taking longer. It's been sooooo difficult trying to go through this the past 1.5 yrs... waiting to finally get our BFP   while others come and go with their journies. OP:\xc2\xa0 Hey ladies.... any input would be greatly appreciated. Hubby and I have been TTC   going on month 15, I think about 20 cycles as mine are shorter. I've been to an RE   and slowly have been getting everything done to see if anything is wrong.... nothing is so far. All that's left is an HSG test. Even hubby has super sperm... yet we're not pregnant. The Dr's office wants to put me on birth control for a week or so and start me on Clomid then do an injection for ovulation followed by IUI. This all feels so quick and sudden that I'm just all over the place with whether to go ahead and I need to let them know basically tomorrow! (No pressure!) My struggle is that a part of me feels that if God wanted it, it would have happened already.... so is doing something like IUI   going against God? I'm so scared that if i do, the baby will have problems and I don't know if I can handle that. On the other hand, I don't want to wait years and years for another baby! It crushes me every time someone else is pregnant and I'm still longing for mine. Thoughts??? "", "PaoPao820",6,50, "2018-04-26", "https://community.babycenter.com/post/a68579914/struggling-with-ttc-process")

# For replyes
INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("Is this positive??!! Plz help!! ","b'Thank you!! I took another this morning and it looks exactly the same, not any darker so fingers crossed \xf0\x9f\xa4\x9e\xf0\x9f\x8f\xbb\xf0\x9f\xa4\x9e\xf0\x9f\x8f\xbb '", "BeBeNBabY", 0, "2018-04-26")

# For user
INSERT INTO users(user_name, user_posts, user_comments,visibility, user_location, user_since, groups, group_names, group_urls) values("~Dovah~", 544, 12988, "Public", "Whiterun, Skyrim", "February 2012", 3, "['Crocheting Mamas', 'Getting Pregnant - Trying to Conceive ', "Getting Pregnant's GOT PREGNANT!"]", "['https://community.babycenter.com/groups/a90405/crocheting_mamas', 'https://community.babycenter.com/groups/a6720413/getting_pregnant_-_trying_to_conceive', 'https://community.babycenter.com/groups/a6762598/getting_pregnants_got_pregnant']")
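
The text values in these statements appear as b'...' or b"...", which suggests the scraped fields are bytes objects that str.format() turns into their repr. The sketch below, with made-up sample values and a hypothetical to_text helper, shows how that repr switches to double quotes when the bytes contain an apostrophe, and how decoding first gives plain text instead.

```python
# Sketch: what str.format() does with bytes values.
template = 'values("{reply_text}")'

plain = b"Thank you so much"
with_apostrophe = b"It's been difficult"

# format() falls back to str() for bytes, which yields the repr: a b prefix
# plus whichever quote character Python picks for that value.
print(template.format(reply_text=plain))
print(template.format(reply_text=with_apostrophe))


def to_text(value, encoding="utf-8"):
    """Decode bytes to str; leave str values unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


emoji_bytes = b"fingers crossed \xf0\x9f\xa4\x9e"
decoded = to_text(emoji_bytes)
# Byte count versus character count: the four emoji bytes become one character.
print(len(emoji_bytes), len(decoded))
```

In the second formatted line the inner double quote closes the SQL string literal early, which would be consistent with the failing threads statement above.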

This is how I build the insert query from the Scrapy items. All the code for building and storing the insert statement lives in pipelines.py, in the project's pipeline. The storing code:

import mysql.connector as sql

config = {
    'user': 'root',
    'password': 'root',
    'host': '127.0.0.1',
    'charset': "utf8",
    'use_unicode': True,
}
connection = sql.connect(**config)

cursor = connection.cursor()

string = insert_thread.format(
    topic_name=item['topic'],
    group_name=item['group'],
    thread_name=item['name'],
    thread_text=item['text'],
    thread_starter=item['starter'],
    thread_reactions=item['reactions'],
    thread_replyes=item['replyes'],
    thread_date=item['date'],
    thread_url=item['url']
)
cursor.execute(string)
connection.commit()
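
For comparison, here is a minimal sketch of the same thread insert using %s placeholders, so that the driver quotes and escapes each value instead of str.format(). The item keys come from the code above; INSERT_THREAD, THREAD_KEYS, thread_params and insert_thread_row are hypothetical names, and decoding bytes as UTF-8 is an assumption about what the spider yields.

```python
# Sketch: parameterized insert, assuming a DB-API cursor such as the one
# returned by mysql.connector's connection.cursor().
INSERT_THREAD = (
    "INSERT INTO threads (topic_name, group_name, thread_name, thread_text, "
    "thread_starter, thread_reactions, thread_replyes, thread_date, thread_url) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"
)

# Item keys in the same order as the columns listed above.
THREAD_KEYS = ("topic", "group", "name", "text", "starter",
               "reactions", "replyes", "date", "url")


def thread_params(item):
    """Build the parameter tuple in column order, decoding any bytes values."""
    return tuple(
        value.decode("utf-8") if isinstance(value, bytes) else value
        for value in (item[key] for key in THREAD_KEYS)
    )


def insert_thread_row(cursor, item):
    """Hypothetical helper: pass the values separately from the SQL text."""
    cursor.execute(INSERT_THREAD, thread_params(item))
```

With the connection from the code above this would be called as insert_thread_row(cursor, item), followed by connection.commit(). Whether 'charset' in the connection config also needs to be 'utf8mb4' for emoji is something to verify against the installed connector and server versions.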

All of the non-working data raises the same error, shown below:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/media/mnthn/work/Office/2018/james_mcallister/babycenterforum/babycenterforum/pipelines.py", line 43, in process_item
    self.insert_user(item, spider)
  File "/media/mnthn/work/Office/2018/james_mcallister/babycenterforum/babycenterforum/pipelines.py", line 100, in insert_user
    self.insert(string)
  File "/media/mnthn/work/Office/2018/james_mcallister/babycenterforum/babycenterforum/pipelines.py", line 132, in insert
    self.cursor.execute(string.replace('\n',''))
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 566, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 537, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 436, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Getting Pregnant's GOT PREGNANT!"]", "['https://community.babycenter.com/groups/' at line 1
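
The position reported in the error message falls right at the group name that contains an apostrophe. A small sketch of what may be happening, using two of the group names from the failing statement: str() of a list uses repr() for each element, so that one element comes out in double quotes. The json.dumps alternative at the end is only one possible way to serialize the list and is an assumption, not something from the project.

```python
import json

group_names = ["Crocheting Mamas", "Getting Pregnant's GOT PREGNANT!"]

# str() of a list uses repr() for each element; the element with an
# apostrophe is rendered with double quotes.
as_str = str(group_names)
print(as_str)

# Inside a double-quoted SQL literal, that inner double quote ends the
# literal early, leaving the rest of the text as invalid SQL.
statement = 'values("{group_names}")'.format(group_names=as_str)
print(statement)

# Possible alternative: serialize explicitly, then pass the result to the
# driver as a query parameter rather than formatting it into the SQL text.
as_json = json.dumps(group_names)
print(as_json)
```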

I have tried entering the data manually from the Python shell many times, searched the internet for a whole day, and spent about four days trying to solve this problem, but nothing actually works.

...