How do I write data into new columns of a CSV when web scraping?
0 votes
/ May 26, 2018

I'm scraping Billboard's Hot R&B/Hip-Hop Songs chart and can get all of my data, but when I start writing it to a CSV, the formatting comes out all wrong.

The data for Last Week Number, Peak Position, and Weeks On Chart all appear under the first three columns of my CSV instead of under the columns with the matching headers.

This is my current code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.billboard.com/charts/r-b-hip-hop-songs'

# Opens web connection and grabs page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# HTML parsing
page_soup = soup(page_html, "html.parser")

# Grabs song title, artist and picture
mainContainer = page_soup.findAll("div", {"class":"chart-row__main-display"})

# CSV filename creation
filename = "Billboard_Hip_Hop_Charts.csv"
f = open(filename, "w")

# Creating Headers
headers = "Billboard Number, Artist Name, Song Title, Last Week Number, Peak Position, Weeks On Chart\n"
f.write(headers)

# Get Billboard Number, Artist Name and Song Title 
for container in mainContainer:
    # Gets billboard number
    billboard_number = container.div.span.text

    # Gets artist name
    artist_name_a_tag = container.findAll("", {"class":"chart-row__artist"})
    artist_name = artist_name_a_tag[0].text.strip()

    # Gets song title
    song_title = container.h2.text

    print("Billboard Number: " + billboard_number)
    print("Artist Name: " + artist_name)
    print("Song Title: " + song_title)

    f.write(billboard_number + "," + artist_name + "," + song_title + "\n")

# Grabs side container from main container
secondaryContainer = page_soup.findAll("div", {"class":"chart-row__secondary"})

# Get Last Week Number, Peak Position and Weeks On Chart
for container in secondaryContainer:
    # Gets last week number
    last_week_number_tag = container.findAll("", {"class":"chart-row__value"})
    last_week_number = last_week_number_tag[0].text

    # Gets peak position
    peak_position_tag = container.findAll("", {"class":"chart-row__value"})
    peak_position = peak_position_tag[1].text

    # Gets week on chart
    weeks_on_chart_tag = container.findAll("", {"class":"chart-row__value"})
    weeks_on_chart = weeks_on_chart_tag[2].text

    print("Last Week Number: " + last_week_number)
    print("Peak Position: " + peak_position)
    print("Weeks On Chart: " + weeks_on_chart)

    f.write(last_week_number + "," + peak_position + "," + weeks_on_chart + "\n")

f.close()

This is what my CSV looks like, with the headers Billboard Number, Artist Name, Song Title, Last Week Number, Peak Position, and Weeks On Chart.

1  Drake                                          Nice For What               
2  Post Malone Featuring Ty Dolla $ign            Psycho                      
3  Drake                                          God's Plan                  
4  Post Malone                                    Better Now                  
5  Post Malone Featuring 21 Savage                Rockstar                    
6  BlocBoy JB Featuring Drake                     Look Alive                  
7  Post Malone                                    Paranoid                    
8  Lil Dicky Featuring Chris Brown                Freaky Friday               
9  Post Malone                                    Rich & Sad                  
10 Post Malone Featuring Swae Lee                 Spoil My Night              
11 Post Malone Featuring Nicki Minaj              Ball For Me                 
12 Migos Featuring Drake                          Walk It Talk It             
13 Post Malone Featuring G-Eazy & YG              Same Bitches                
14 Cardi B| Bad Bunny & J Balvin                  I Like It                   
15 Post Malone                                    Zack And Codeine            
16 Post Malone                                    Over Now                    
17 Cardi B                                        Be Careful                  
18 Post Malone                                    Takin' Shots                
19 The Weeknd & Kendrick Lamar                    Pray For Me                 
20 Rich The Kid                                   Plug Walk                   
21 The Weeknd                                     Call Out My Name            
22 Bruno Mars & Cardi B                           Finesse                     
23 Post Malone                                    Candy Paint                 
24 Ella Mai                                       Boo'd Up                    
25 Rae Sremmurd & Juicy J                         Powerglide                  
26 Post Malone                                    92 Explorer                 
27 J. Cole                                        ATM                         
28 J. Cole                                        KOD                         
29 Post Malone                                    Otherside                   
30 Post Malone                                    Blame It On Me              
31 J. Cole                                        Kevin's Heart               
32 Kendrick Lamar & SZA                           All The Stars               
33 Nicki Minaj                                    Chun-Li                     
34 Lil Pump                                       Esskeetit                   
35 Migos                                          Stir Fry                    
36 Famous Dex                                     Japan                       
37 Post Malone                                    Sugar Wraith                
38 Cardi B Featuring Migos                        Drip                        
39 XXXTENTACION                                   Sad!                        
40 Jay Rock| Kendrick Lamar| Future & James Blake King's Dead                 
41 Rich The Kid Featuring Kendrick Lamar          New Freezer                 
42 Logic & Marshmello                             Everyday                    
43 J. Cole                                        Motiv8                      
44 YoungBoy Never Broke Again                     Outside Today               
45 Post Malone                                    Jonestown (Interlude)       
46 Cardi B Featuring 21 Savage                    Bartier Cardi               
47 YoungBoy Never Broke Again                     Overdose                    
48 J. Cole                                        1985 (Intro To The Fall Off)
49 J. Cole                                        Photograph                  
50 Khalid| Ty Dolla $ign & 6LACK                  OTW
1  1                                              2
2  1                                              6
3  1                                              17
4  2                                              12
5  3                                              14
10 6                                              8
...

Any help getting the data into the right columns would be appreciated!

1 Answer

0 votes
/ May 26, 2018

Your code is needlessly convoluted and very hard to read. You didn't need to create two containers at all, because a single container is enough to get the data you need. Try the following approach and you should find the CSV populated with the data in the right columns:

import requests, csv
from bs4 import BeautifulSoup

url = 'https://www.billboard.com/charts/r-b-hip-hop-songs'

with open('Billboard_Hip_Hop_Charts.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Billboard Number','Artist Name','Song Title','Last Week Number','Peak Position','Weeks On Chart'])

    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")

    for container in soup.find_all("article",class_="chart-row"):

        billboard_number = container.find(class_="chart-row__current-week").text

        artist_name_a_tag = container.find(class_="chart-row__artist").text.strip()

        song_title = container.find(class_="chart-row__song").text

        last_week_number_tag = container.find(class_="chart-row__value")
        last_week_number = last_week_number_tag.text

        peak_position_tag = last_week_number_tag.find_parent().find_next_sibling().find(class_="chart-row__value")
        peak_position = peak_position_tag.text

        weeks_on_chart_tag = peak_position_tag.find_parent().find_next_sibling().find(class_="chart-row__value").text

        print(billboard_number,artist_name_a_tag,song_title,last_week_number,peak_position,weeks_on_chart_tag)
        writer.writerow([billboard_number,artist_name_a_tag,song_title,last_week_number,peak_position,weeks_on_chart_tag])

The output will look like this:

1 Childish Gambino This Is America 1 1 2
2 Drake Nice For What 2 1 6
3 Drake God's Plan 3 1 17
4 Post Malone Featuring Ty Dolla $ign Psycho 4 2 12
5 BlocBoy JB Featuring Drake Look Alive 5 3 14
6 Ella Mai Boo'd Up 10 6 8
...
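
For what it's worth, the reason the original code put the values in the wrong place is that every f.write call ended its line with "\n", so the second loop could only append new rows below the first fifty instead of extending them. If you did want to keep two separate lists, a minimal sketch of pairing them up row by row might look like the one below. The sample values, the write_chart helper, and the in-memory buffer are all made up for illustration; they stand in for the scraped data and the real file.

```python
import csv
import io

# Placeholder values standing in for the two scraped lists (not real chart data).
main_rows = [
    ("1", "Artist A, Artist B", "Song A"),
    ("2", "Artist C", "Song B"),
]
secondary_rows = [
    ("1", "1", "2"),
    ("3", "2", "12"),
]

def write_chart(out, main_rows, secondary_rows):
    """Write one CSV row per song by pairing the two lists position by position."""
    writer = csv.writer(out)
    writer.writerow(["Billboard Number", "Artist Name", "Song Title",
                     "Last Week Number", "Peak Position", "Weeks On Chart"])
    # zip stops at the shorter list, so each row always has all six fields.
    for main, secondary in zip(main_rows, secondary_rows):
        writer.writerow([*main, *secondary])

# An in-memory buffer is used here; a real script would pass an open file instead.
buffer = io.StringIO()
write_chart(buffer, main_rows, secondary_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Using csv.writer rather than joining strings with "," also means a field that itself contains a comma is quoted, so it stays in a single column.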