Parsing an HTML file to Excel with Python - PullRequest
0 votes
/ October 19, 2019

I am using BeautifulSoup and trying to parse the output below into Excel.

<div id="MainContent_BuildSheetUpdatePanel">
                <div id="MainContent_BuildSheetPanel">
                    <div class="row">
                        <div class="col-sm-4 mt-2">
                            <div class="card border-primary">
                                <div class="card-header">
                                    <h4 class="card-title text-center">SCHOOL:</h4>
                                </div>
                                <div class="card-body">
                                    <div class="form-group">
                                        <label>Class ID: </label>
                                        <input name="ctl00$MainContent$ClassIdTextBox" type="text" value="250" id="MainContent_IdTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                            <span id="MainContent_rfvClassIdTextBox" style="color:Red;display:none;">Required</span>
                                    </div>
                                    <div class="form-group">
                                        <label>Profile ID: </label>
                                        <input name="ctl00$MainContent$ProfileIdTextBox" type="text" value="NA" id="MainContent_ServiceIdTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvProfileIdTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>Serial Number: </label>
                                        <input name="ctl00$MainContent$NumberTextBox" type="text" value="763" id="MainContent_NumberTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvNumberTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>MC Number: </label>
                                        <input name="ctl00$MainContent$MCSerialNumberTextBox" type="text" value="290" id="MainContent_SerialNumberTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvMCSerialNumberTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>SK: </label>
                                        <input name="ctl00$MainContent$SkTextBox" type="text" value="384xm" id="MainContent_SkTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Profile: </label>
                                        <input name="ctl00$MainContent$ProfileTextBox" type="text" value="NA" id="MainContent_ProfileTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Address: </label>
                                        <input name="ctl00$MainContent$AddressTextBox" type="text" value="192.168.56.54" id="MainContent_AddressTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Dn: </label>
                                        <input name="ctl00$MainContent$DnTextBox" type="text" value="NA" id="MainContent_DnTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvoDnTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>Hostname: </label>
                                        <input name="ctl00$MainContent$PrimaryHostNameTextBox" type="text" value="N/A" id="MainContent_HostNameTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Primary: </label>
                                        <input name="ctl00$MainContent$PrimarySidTextBox" type="text" value="N/A" id="MainContent_SidTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Server: </label>
                                        <input name="ctl00$MainContent$ServerTextBox" type="text" value="sv41" id="MainContent_ServerTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Server-Address: </label>
                                        <input name="ctl00$MainContent$AddressTextBox" type="text" value="10.56.1.41" id="MainContent_AddressTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                         <span id="MainContent_ServerIpTxtRequiredFieldValidator" style="color:Red;display:none;">Required</span>                    
                                    </div>
                                    </div>
                                </div>
                            </div>
                        </div>

Expected output:

Class ID  Profile ID  Serial Number  MC Number  SK  Profile  Address  Dn  Hostname  Primary
250  NA  763  290  384xm  NA  192.168.56.54  NA  NA  NA

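from bs4 import BeautifulSoup

html = """Inputfile """  # placeholder for the HTML shown above

# Build the parse tree; without this line `soup` is undefined
soup = BeautifulSoup(html, "html.parser")

# Prints only the visible text of each group (the labels), not the input values
for item in soup.select("div.form-group"):
    print(item.get_text())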

1 Answer

1 vote
/ October 20, 2019

You need the value attribute of each input. Depending on your full HTML, you may be able to shorten the selectors.

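from bs4 import BeautifulSoup as bs
import csv

# your_html is the HTML string from the question; the 'lxml' parser requires the lxml package
soup = bs(your_html, 'lxml')

with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    # Header row: the text of every label inside a form group
    w.writerow([i.text for i in soup.select('.form-group label')])
    # Data row: the value attribute of every disabled input
    w.writerow([i['value'] for i in soup.select('input.aspNetDisabled')])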
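
The `encoding="utf-8-sig"` and `newline=''` arguments above matter when the CSV is opened in Excel: the first writes a byte order mark so that Excel recognizes the file as UTF-8, and the second leaves line endings to the csv module. A minimal sketch of that behavior, using only the standard library; the sample rows and the `write_rows` helper are hypothetical and stand in for the values parsed from the HTML:

```python
import csv
import os
import tempfile

# Hypothetical sample rows; in the answer these come from the parsed HTML.
headers = ["Class ID: ", "Serial Number: "]
values = ["250", "763"]


def write_rows(path, rows):
    """Write rows to a CSV file that Excel can open directly."""
    # utf-8-sig prepends a byte order mark so Excel detects UTF-8;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "data.csv")
write_rows(path, [headers, values])

# Read the file back as raw bytes to inspect the BOM and line endings.
with open(path, "rb") as f:
    raw = f.read()
```

Here `raw` starts with the three BOM bytes, and every row ends with a single `\r\n`.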

Specific items:

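soup = bs(your_html, 'lxml')

# Labels to keep; the quoted strings are matched as substrings of the label text
items = ['"Class ID:"', '"Serial Number:"', '"Hostname:"']
items = ','.join(items)
# :-soup-contains is the current name in soupsieve 2.1+; older versions spell it :contains
# "+ .aspNetDisabled" picks the input that immediately follows each matching label
nodes = [i['value'] for i in soup.select(f'label:-soup-contains({items}) + .aspNetDisabled')]
headers = [i.text for i in soup.select(f'label:-soup-contains({items})')]

with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(headers)
    w.writerow(nodes)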
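
The two selectors above build the header and value lists separately, so they stay aligned only if every label has exactly one matching input. A possible alternative, not part of the original answer, is to walk each form group and pair its label with its input. The sketch below assumes a shortened, hypothetical sample in the same shape as the question's HTML; `extract_fields` and `wanted` are illustrative names, and the built-in `html.parser` is used so that lxml is not required:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample in the same shape as the question's HTML.
sample_html = """
<div class="form-group">
  <label>Class ID: </label>
  <input type="text" value="250" class="aspNetDisabled form-control">
</div>
<div class="form-group">
  <label>Serial Number: </label>
  <input type="text" value="763" class="aspNetDisabled form-control">
</div>
<div class="form-group">
  <label>Hostname: </label>
  <input type="text" value="N/A" class="aspNetDisabled form-control">
</div>
"""


def extract_fields(html):
    """Map each label's text to the value of the input in the same group."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for group in soup.select("div.form-group"):
        label = group.find("label")
        field = group.find("input")
        if label is None or field is None:
            continue  # skip groups that lack either element
        # "Class ID: " -> "Class ID"
        name = label.get_text(strip=True).rstrip(":")
        fields[name] = field.get("value", "")
    return fields


fields = extract_fields(sample_html)

# Pick specific items by name instead of by CSS text matching.
wanted = ["Class ID", "Serial Number"]
row = [fields.get(name, "") for name in wanted]
```

Because the result is a dictionary, two groups with the same label text would overwrite each other; a list of pairs would be the safer choice if labels can repeat.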