I'm continuing the Scrapy project from my previous question, "Scrapy output item as 1 list element per row". I have Scrapy code that returns UFC event data from one parse method, and then returns the totals and round-by-round details for each fight on that event from an additional parse method (which follows separate links).
The data in the resulting CSV file is correct. However, the formatting is problematic:
```
event_name event_date event_loc attendance wclass method mthdtl finround fintime winner loser bout fighters method_txt mthdtl_txt m_finround m_fintime timefrmt ref w_kd l_kd w_sigstr l_sigstr w_sigstr_perc l_sigstr_perc w_tot_str l_tot_str w_td l_td w_td_perc l_td_perc w_sub_att l_sub_att w_pass l_pass w_rev l_rev r1_w_kd r1_w_tot_str r1_w_td r1_w_td_perc r1_w_sub_att r1_w_pass r1_w_rev r1_l_kd r1_l_tot_str r1_l_td r1_l_td_perc r1_l_sub_att r1_l_pass r1_l_rev r1_w_sigstr r1_l_sigstr r1_w_sigstr_perc r1_w_sigstr_perc r1_w_sigstr_head r1_l_sigstr_head r1_w_sigstr_body r1_l_sigstr_body r1_w_sigstr_leg r1_l_sigstr_leg r1_w_sigstr_dist r1_l_sigstr_dist r1_w_sigstr_clinch r1_l_sigstr_clinch r1_w_sigstr_ground r1_l_sigstr_ground r2_w_kd r2_w_tot_str r2_w_td r2_w_td_perc r2_w_sub_att r2_w_pass r2_w_rev r2_l_kd r2_l_tot_str r2_l_td r2_l_td_perc r2_l_sub_att r2_l_pass r2_l_rev r2_w_sigstr r2_l_sigstr r2_w_sigstr_perc r2_w_sigstr_perc r2_w_sigstr_head r2_l_sigstr_head r2_w_sigstr_body r2_l_sigstr_body r2_w_sigstr_leg r2_l_sigstr_leg r2_w_sigstr_dist r2_l_sigstr_dist r2_w_sigstr_clinch r2_l_sigstr_clinch r2_w_sigstr_ground r2_l_sigstr_ground r3_w_kd r3_w_tot_str r3_w_td r3_w_td_perc r3_w_sub_att r3_w_pass r3_w_rev r3_l_kd r3_l_tot_str r3_l_td r3_l_td_perc r3_l_sub_att r3_l_pass r3_l_rev r3_w_sigstr r3_l_sigstr r3_w_sigstr_perc r3_w_sigstr_perc r3_w_sigstr_head r3_l_sigstr_head r3_w_sigstr_body r3_l_sigstr_body r3_w_sigstr_leg r3_l_sigstr_leg r3_w_sigstr_dist r3_l_sigstr_dist r3_w_sigstr_clinch r3_l_sigstr_clinch r3_w_sigstr_ground r3_l_sigstr_ground r4_w_kd r4_w_tot_str r4_w_td r4_w_td_perc r4_w_sub_att r4_w_pass r4_w_rev r4_l_kd r4_l_tot_str r4_l_td r4_l_td_perc r4_l_sub_att r4_l_pass r4_l_rev r4_w_sigstr r4_l_sigstr r4_w_sigstr_perc r4_w_sigstr_perc r4_w_sigstr_head r4_l_sigstr_head r4_w_sigstr_body r4_l_sigstr_body r4_w_sigstr_leg r4_l_sigstr_leg r4_w_sigstr_dist r4_l_sigstr_dist r4_w_sigstr_clinch r4_l_sigstr_clinch r4_w_sigstr_ground r4_l_sigstr_ground r5_w_kd r5_w_tot_str r5_w_td r5_w_td_perc r5_w_sub_att r5_w_pass r5_w_rev r5_l_kd r5_l_tot_str r5_l_td r5_l_td_perc r5_l_sub_att r5_l_pass r5_l_rev r5_w_sigstr r5_l_sigstr r5_w_sigstr_perc r5_w_sigstr_perc r5_w_sigstr_head r5_l_sigstr_head r5_w_sigstr_body r5_l_sigstr_body r5_w_sigstr_leg r5_l_sigstr_leg
UFC 241: Cormier vs. Miocic 2 August 17, 2019 Anaheim, California, USA 17,304 Heavyweight,, KO/TKO Punches 4 04:09 Stipe Miocic Daniel Cormier
UFC 241: Cormier vs. Miocic 2 August 17, 2019 Anaheim, California, USA 17,304 Welterweight, U-DEC 3 05:00 Nate Diaz Anthony Pettis
UFC 241: Cormier vs. Miocic 2 August 17, 2019 Anaheim, California, USA 17,304 Middleweight,, U-DEC 3 05:00 Paulo Costa Yoel Romero
Welterweight Bout Anthony Pettis,Nate Diaz Decision - Unanimous 3 05:00 3 Rnd (5-5-5) Mike Beltran,Guilherme Bravo,Derek Cleary,Ron McCarthy 0 1 69 of 133 114 of 201 51% 56% 86 of 153 205 of 306 0 of 0 1 of 1 0% 100% 1 0 0 4 2 1 0 23 of 41 0 of 0 0% 1 0 0 0 62 of 88 1 of 1 100% 0 2 0
14 of 31 22 of 42 45% 45% 9 of 22 15 of 33 2 of 2 5 of 6 3 of 7 2 of 3 9 of 24 9 of 23 5 of 7 6 of 9 0 of 0 7 of 10 0 40 of 70 0 of 0 0% 0 0 0 0 65 of 114 0 of 0 0% 0 0 0 36 of 66 54 of 100 54% 54% 28 of 55 45 of 87 7 of 9 7 of 11 1 of 2 2 of 2 26 of 54 29 of 63 10 of 12 25 of 37 0 of 0 0 of 0 0 23 of 42 0 of 0 0% 0 0 2 1 78 of 104 0 of 0 0% 0 2 1 19 of 36 38 of 59 52% 52% 17 of 34 34 of 52 1 of 1 4 of 6 1 of 1 0 of 1 11 of 24 13 of 23 5 of 8 12 of 17 3 of 4 13 of 19
Middleweight Bout Yoel Romero,Paulo Costa Decision - Unanimous 3 05:00 3 Rnd (5-5-5) Jason Herzog,Guilherme Bravo,Ron McCarthy,Michael Bell 1 1 125 of 284 118 of 213 44% 55% 125 of 284 118 of 213 1 of 4 0 of 0 25% 0% 0 0 0 0 0 0 1 32 of 69 0 of 2 0% 0 0 0 1 37 of 69 0 of 0 0% 0 0 0
32 of 69 37 of 69 46% 46% 23 of 54 19 of 46 2 of 7 16 of 20 7 of 8 2 of 3 31 of 68 32 of 61 1 of 1 2 of 2 0 of 0 3 of 6 0 40 of 91 1 of 1 100% 0 0 0 0 37 of 71 0 of 0 0% 0 0 0 40 of 91 37 of 71 43% 43% 28 of 77 24 of 53 6 of 7 12 of 17 6 of 7 1 of 1 39 of 90 36 of 70 1 of 1 1 of 1 0 of 0 0 of 0 0 53 of 124 0 of 1 0% 0 0 0 0 44 of 73 0 of 0 0% 0 0 0 53 of 124 44 of 73 42% 42% 45 of 113 24 of 49 3 of 6 18 of 21 5 of 5 2 of 3 48 of 118 42 of 71 5 of 6 2 of 2 0 of 0 0 of 0
UFC Heavyweight Title Bout Daniel Cormier,Stipe Miocic KO/TKO Punches to Head At Distance 4 04:09 5 Rnd (5-5-5-5-5) Herb Dean 0 1 181 of 263 123 of 229 68% 53% 230 of 317 135 of 244 1 of 3 1 of 3 33% 33% 0 0 2 0 0 0 0 71 of 83 1 of 2 50% 0 2 0 0 9 of 18 0 of 0 0% 0 0 0
37 of 46 7 of 13 80% 80% 25 of 34 3 of 8 7 of 7 0 of 0 5 of 5 4 of 5 13 of 16 6 of 12 3 of 3 0 of 0 21 of 27 1 of 1 0 59 of 85 0 of 0 0% 0 0 0 0 48 of 84 0 of 0 0% 0 0 0 56 of 82 46 of 82 68% 68% 56 of 81 37 of 72 0 of 0 8 of 9 0 of 1 1 of 1 45 of 68 42 of 76 11 of 14 4 of 6 0 of 0 0 of 0 0 69 of 100 0 of 1 0% 0 0 0 0 40 of 73 1 of 3 33% 0 0 0 57 of 86 34 of 67 66% 66% 53 of 82 28 of 61 1 of 1 5 of 5 3 of 3 1 of 1 50 of 76 24 of 50 7 of 10 10 of 17 0 of 0 0 of 0 0 31 of 49 0 of 0 0% 0 0 0 1 38 of 69 0 of 0 0% 0 0 0 31 of 49 36 of 67 63% 63% 28 of 46 18 of 47 1 of 1 14 of 16 2 of 2 4 of 4 31 of 49 30 of 57 0 of 0 5 of 5 0 of 0 1 of 5
```
First, the items from the first and second parse methods are written on separate rows. The items from the second method form a kind of subset: a separate block entirely to the right of and below the items from the first parse method.
Second, within the items from the second parse method (the block below and to the right of the first), every other row is skipped to make room for the round data produced by the if/elif/else condition; that round data is what sits between those rows. I use items and item loaders, but I'm not currently using any custom item pipelines. I run the spider from the command line and export to CSV with:
```shell
scrapy crawl stats -o stats.csv
```
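A minimal sketch of why the rows come out staggered, using the standard library's `csv.DictWriter` as a stand-in for Scrapy's CSV feed exporter. The assumption here is that, as far as I know, Scrapy's `CsvItemExporter` likewise writes one row per yielded item and leaves any field the item does not have as an empty cell. The field subset and values below are illustrative only, not the real `FEED_EXPORT_FIELDS`.

```python
import csv
import io

# Hypothetical subset of FEED_EXPORT_FIELDS, for illustration only.
FIELDS = ["event_name", "winner", "loser", "bout", "w_kd", "r1_w_kd"]

# Three separate yields: one from parse_event, two from parse_match.
items = [
    {"event_name": "UFC 241: Cormier vs. Miocic 2", "winner": "Nate Diaz", "loser": "Anthony Pettis"},
    {"bout": "Welterweight Bout", "w_kd": "0"},
    {"r1_w_kd": "0"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", lineterminator="\n")
writer.writeheader()
for item in items:
    # One row per item; fields missing from the item become empty cells.
    writer.writerow(item)
print(buf.getvalue())
```

Each item only fills its own columns, so the items from the second callback appear below and to the right of the first block, which matches the layout described above.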
Abbreviated code:
```python
import scrapy
from scrapy.loader import ItemLoader

# Assumed module path: the item classes are taken to live in the project's items.py
# (project name inferred from the commented-out 'stats.pipelines' path below).
from stats.items import (StatsItem, round_1_items, round_2_items,
                         round_3_items, round_4_items, round_5_items)


class StatsSpider(scrapy.Spider):
    name = 'stats'
    allowed_domains = ['ufcstats.com']
    start_urls = ['http://ufcstats.com/statistics/events/completed?page=all']
    #ITEM_PIPELINES = {'stats.pipelines.StatsPipeline': 300,}
    custom_settings = {
        # Per-spider settings are only read from custom_settings, so the
        # user-agent middleware override goes here, not in a class attribute.
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
            'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
        },
        # specifies exported fields and order
        'FEED_EXPORT_FIELDS': [
            # *extensive feed_export_fields* (list elided in the original)
        ],
    }

    def parse(self, response):
        rev_orderd_events = response.css('tr.b-statistics__table-row')
        # full event_links
        # event_links = rev_orderd_events.css('i>a::attr(href)').extract()
        # for url in event_links:
        #     yield scrapy.Request(url=url, callback=self.parse_event)
        event_links = rev_orderd_events.css('i>a::attr(href)')[3].extract()
        # for links in event_links:
        #     yield scrapy.Request(url=links, callback=self.parse_event)
        yield scrapy.Request(url=event_links, callback=self.parse_event, dont_filter=True)

    def parse_event(self, response):
        pg = response.css('div.l-page__container')
        for event in response.css('div.b-fight-details'):
            event_name = pg.css('h2.b-content__title>span::text').extract_first()
            event_date = event.css('ul.b-list__box-list>li:nth-child(1)::text').extract()
            event_loc = event.css('ul.b-list__box-list>li:nth-child(2)::text').extract()
            attendance = event.css('ul.b-list__box-list>li:nth-child(3)::text').extract()
            # ...child(odd)::text').extract()  (line truncated in the original)
            for fights in event.css('tr')[1:]:
                # One item per fight row of the event table.
                il = ItemLoader(StatsItem(), selector=fights)
                il.add_value('event_name', event_name)
                il.add_value('event_date', event_date)
                il.add_value('event_loc', event_loc)
                il.add_value('attendance', attendance)
                il.add_css('winner', 'td.b-fight-details__table-col:nth-child(2) p.b-fight-details__table-text:nth-child(odd)>a::text')
                il.add_css('loser', 'td.b-fight-details__table-col:nth-child(2) p.b-fight-details__table-text:nth-child(even)>a::text')
                il.add_css('wclass', 'td.b-fight-details__table-col:nth-child(7)>p:nth-child(1)::text')
                il.add_css('method', 'td.b-fight-details__table-col:nth-child(8)>p:nth-child(odd)::text')
                il.add_css('mthdtl', 'td.b-fight-details__table-col:nth-child(8)>p:nth-child(even)::text')
                il.add_css('finround', 'td.b-fight-details__table-col:nth-child(9)>p:nth-child(odd)::text')
                il.add_css('fintime', 'td.b-fight-details__table-col:nth-child(10)>p:nth-child(odd)::text')
                yield il.load_item()
            # Follow each fight's detail page.
            match_links = pg.css('tr>td:nth-child(1) a::attr(href)').extract()
            for links in match_links:
                yield scrapy.Request(url=links, callback=self.parse_match)

    def parse_match(self, response):
        section = response.css('section.b-statistics__section_details')
        f_dtl = section.css('div.b-fight-details')
        # m_event = section.css('h2>a::text').extract()
        m_info = f_dtl.css('div.b-fight-details__fight div i::text').extract()
        m_fin_dtl = f_dtl.css('div.b-fight-details__content>p::text').extract()
        ref = f_dtl.css('div.b-fight-details__content i>span::text').extract()
        #table_rows = f_dtl.css('tr.b-fight-details__table-row>td.b-fight-details__table-col>p::text').extract()
        #timefrmt = f_dtl.css('div.b-fight-details__fight div i::text')[15].extract()
        fighters = f_dtl.css('table:nth-child(1) tr.b-fight-details__table-row>td.b-fight-details__table-col>p>a::text').extract()
        m_totals = f_dtl.css('table:nth-child(1) tr.b-fight-details__table-row>td.b-fight-details__table-col>p::text').extract()
        rounds = f_dtl.css('table:nth-child(2) tr.b-fight-details__table-row>td.b-fight-details__table-col>p::text').extract()
        for info in section:
            # Totals item for this fight.
            il = ItemLoader(StatsItem(), selector=section)
            il.add_value('bout', m_info)
            il.add_value('method_txt', m_info)
            il.add_value('mthdtl_txt', m_fin_dtl)
            il.add_value('m_finround', m_info)
            il.add_value('m_fintime', m_info)
            il.add_value('timefrmt', m_info)
            il.add_value('ref', ref)
            il.add_value('fighters', fighters)
            il.add_value('w_kd', m_totals)
            il.add_value('w_sigstr', m_totals)
            il.add_value('w_sigstr_perc', m_totals)
            il.add_value('w_tot_str', m_totals)
            il.add_value('w_td', m_totals)
            il.add_value('w_td_perc', m_totals)
            il.add_value('w_sub_att', m_totals)
            il.add_value('w_pass', m_totals)
            il.add_value('w_rev', m_totals)
            il.add_value('l_kd', m_totals)
            il.add_value('l_sigstr', m_totals)
            il.add_value('l_sigstr_perc', m_totals)
            il.add_value('l_tot_str', m_totals)
            il.add_value('l_td', m_totals)
            il.add_value('l_td_perc', m_totals)
            il.add_value('l_sub_att', m_totals)
            il.add_value('l_pass', m_totals)
            il.add_value('l_rev', m_totals)
            il.add_value('r1_w_kd', rounds)
            # il.add_value('r1_w_sigstr', rounds)
            # il.add_value('r1_w_sigstr_perc', rounds)
            il.add_value('r1_w_tot_str', rounds)
            il.add_value('r1_w_td', rounds)
            il.add_value('r1_w_td_perc', rounds)
            il.add_value('r1_w_sub_att', rounds)
            il.add_value('r1_w_pass', rounds)
            il.add_value('r1_w_rev', rounds)
            il.add_value('r1_l_kd', rounds)
            # il.add_value('r1_l_sigstr', rounds)
            # il.add_value('r1_l_sigstr_perc', rounds)
            il.add_value('r1_l_tot_str', rounds)
            il.add_value('r1_l_td', rounds)
            il.add_value('r1_l_td_perc', rounds)
            il.add_value('r1_l_sub_att', rounds)
            il.add_value('r1_l_pass', rounds)
            il.add_value('r1_l_rev', rounds)
            yield il.load_item()
            # Separate round-detail item, chosen by how many values were scraped.
            if len(rounds) == 42:
                r1 = ItemLoader(round_1_items(), selector=section)
                # r1...  (round-1 fields elided in the original)
                yield r1.load_item()
            elif len(rounds) == 84:
                r2 = ItemLoader(round_2_items(), selector=section)
                # r2...  (elided in the original)
                yield r2.load_item()
            elif len(rounds) == 126:
                r3 = ItemLoader(round_3_items(), selector=section)
                # r3...  (elided in the original)
                yield r3.load_item()
            elif len(rounds) == 168:
                r4 = ItemLoader(round_4_items(), selector=section)
                # r4...  (elided in the original)
                yield r4.load_item()
            elif len(rounds) == 210:
                r5 = ItemLoader(round_5_items(), selector=section)
                # r5....  (elided in the original)
                yield r5.load_item()
            else:
                il = ItemLoader(StatsItem(), selector=section)
                il.add_value('rounders', rounds)
                yield il.load_item()
```
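The `if`/`elif` chain above checks for 42, 84, 126, 168 or 210 scraped values, i.e. a fixed 42 values per round. A minimal sketch of handling any round count with a loop instead, producing `r1_`, `r2_`, ... prefixed keys in one dict. `split_rounds`, `prefix_round_fields` and the `field_names` list are hypothetical helpers; which position in a 42-value chunk maps to which column has to be checked against the real table.

```python
from typing import Dict, List, Sequence


def split_rounds(values: Sequence[str], per_round: int = 42) -> List[List[str]]:
    """Split the flat per-round list into one chunk per round.

    per_round=42 mirrors the 42/84/126/168/210 checks in the if/elif chain.
    """
    if per_round <= 0:
        raise ValueError("per_round must be positive")
    if len(values) % per_round != 0:
        # The original code sends this case to the 'rounders' field instead.
        raise ValueError(f"expected a multiple of {per_round} values, got {len(values)}")
    return [list(values[i:i + per_round]) for i in range(0, len(values), per_round)]


def prefix_round_fields(chunks: List[List[str]], field_names: Sequence[str]) -> Dict[str, str]:
    """Flatten round chunks into one dict keyed r1_<name>, r2_<name>, ...

    field_names is a hypothetical position-to-column mapping; zip() stops at
    the shorter sequence, so chunk positions without a name are dropped.
    """
    row: Dict[str, str] = {}
    for number, chunk in enumerate(chunks, start=1):
        for name, value in zip(field_names, chunk):
            row[f"r{number}_{name}"] = value
    return row


# Usage with dummy data: two rounds' worth of values.
flat = [str(i) for i in range(84)]
rounds_row = prefix_round_fields(split_rounds(flat), ["w_kd", "l_kd"])
print(rounds_row)  # {'r1_w_kd': '0', 'r1_l_kd': '1', 'r2_w_kd': '42', 'r2_l_kd': '43'}
```

Because every round ends up in the same dict, the round data no longer needs its own item (and its own CSV row).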
I'd like each fight's data to be output as a single CSV row. So if the current CSV output looks like:
```
1 (block of rows)
2a
2b (alternating total/round detail rows)
```
I want my CSV to be:
```
1 - 2a - 2b...
```
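One way to get "1 - 2a - 2b" on a single row is to yield exactly one item per fight that already contains all three parts. On the Scrapy side, a sketch under assumptions: the per-fight values scraped in `parse_event` could be forwarded with the request, e.g. `scrapy.Request(url, callback=self.parse_match, cb_kwargs={'event_fields': ...})` (`cb_kwargs` is available from Scrapy 1.7), and `parse_match` would then yield one merged item instead of `il.load_item()` followed by a separate round item. This also assumes `StatsItem` declares every exported column. The merge step itself is plain Python; `merge_fight_row` is a hypothetical helper, and the values are illustrative, partly taken from the sample output above.

```python
import csv
import io
from typing import Dict, Mapping


def merge_fight_row(event_fields: Mapping[str, str],
                    totals: Mapping[str, str],
                    round_fields: Mapping[str, str]) -> Dict[str, str]:
    """Combine event-level, per-fight totals and per-round fields into one flat dict."""
    row: Dict[str, str] = {}
    for part in (event_fields, totals, round_fields):
        clash = set(row) & set(part)
        if clash:
            # Refuse to silently overwrite a column from an earlier part.
            raise KeyError(f"duplicate fields: {sorted(clash)}")
        row.update(part)
    return row


event_fields = {"event_name": "UFC 241: Cormier vs. Miocic 2", "winner": "Nate Diaz", "loser": "Anthony Pettis"}
totals = {"bout": "Welterweight Bout", "method_txt": "Decision - Unanimous"}
round_fields = {"r1_w_kd": "0"}

row = merge_fight_row(event_fields, totals, round_fields)

buf = io.StringIO()
fields = list(row)  # in the spider this role is played by FEED_EXPORT_FIELDS
writer = csv.DictWriter(buf, fieldnames=fields, restval="", lineterminator="\n")
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())  # header plus exactly one data row for the fight
```

Raising on duplicate keys is a deliberate choice: with hundreds of similarly named columns, a silent overwrite would be hard to spot in the CSV.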