Laying out a table in Excel the way it appears in HTML, using Python and bs4
0 votes
/ May 6, 2020

I have successfully scraped a table that I want to export to .xlsx format.

I want it to be laid out in Excel exactly as it appears when rendered in the browser.

This is how it should look:

A1 = 1.

B1 = Prepare to assist with actions and activities associated with incident response

C1 = 1.1

D1 = Identify duty holders and WHS legislative requirements for incident response

A2 = empty

B2 = empty

C2 = 1.2

D2 = Identify workplace policies, procedures and processes concerning incident response planning and reporting

The code I have so far is below, followed by the HTML I scraped.

for i in Elements.findAll('tr'):
    columns = i.findAll('td')
    output_row = []
    for column in columns:
        sub_rows = column.findAll('p')
        for row in sub_rows:
            output_row.append(row.get_text(separator=' '))
    Element_rows.append(output_row)

-----------------------------------------------------------------

<table class="ait-table" width="943">
<tr>
<td style="border:1px solid ;;vertical-align: top;" width="299">
<p class="ait4"><strong class="ait24">ELEMENTS</strong></p>
</td>
<td style="border:1px solid ;;vertical-align: top;" width="766">
<p class="ait4"><strong class="ait24">PERFORMANCE CRITERIA</strong></p>
</td>
</tr>
<tr>
<td style="border:1px solid ;;vertical-align: top;" width="299">
<p class="ait4"><em class="ait7">Elements describe the essential outcomes.</em></p>
</td>
<td style="border:1px solid ;;vertical-align: top;" width="766">
<p class="ait4"><em class="ait7">Performance criteria describe the performance needed to demonstrate achievement of the element.</em></p>
</td>
</tr>
<tr>
<td style="border:1px solid #333333;;vertical-align: top;" width="299">
<p class="ait4">1. Prepare to assist with actions and activities associated with incident response</p>
</td>
<td style="border:1px solid #333333;;vertical-align: top;" width="766">
<p class="ait4">1.1 Identify duty holders and WHS legislative requirements for incident response</p>
<p class="ait4">1.2 Identify workplace policies, procedures and processes concerning incident response planning and reporting</p>
<p class="ait4">1.3 Communicate requirements for responding to incident to required personnel within scope of own role and work area</p>
<p class="ait4">1.4 Contribute to developing communication mechanisms to notify manager of incident</p>
</td>
</tr>
<tr>
<td style="border:1px solid #333333;;vertical-align: top;" width="299">
<p class="ait4">2. Assist with implementing response procedures during incident</p>
</td>
<td style="border:1px solid #333333;;vertical-align: top;" width="766">
<p class="ait4">2.1 Provide initial assistance to those involved in incident within scope of own role and expertise and according to organisational incident response policies and procedures</p>
<p class="ait4">2.2 Assist with documenting incident according to workplace procedures and processes</p>
<p class="ait4">2.3 Assist with meeting legislative requirements regarding incident, within scope of own role and expertise</p>
<p class="ait4">2.4 Assist with reporting incident to external authorities, according to legislative requirements and workplace procedures and processes </p>
</td>
</tr>
<tr>
<td style="border:1px solid #333333;;vertical-align: top;" width="299">
<p class="ait4">3. Contribute to collecting WHS information about incident</p>
</td>
<td style="border:1px solid #333333;;vertical-align: top;" width="766">
<p class="ait4">3.1 Assist with obtaining information and data from those involved about actions and events leading up to, during and after an incident, using appropriate data collection techniques</p>
<p class="ait4">3.2 Assist with identifying and accessing sources of additional information and data related to incident</p>
<p class="ait4">3.3 Compile and enter information according to record-keeping requirements</p>
</td>
</tr>
<tr>
<td style="border:1px solid #333333;;vertical-align: top;" width="299">
<p class="ait4">4. Assist with incident investigation</p>
</td>
<td style="border:1px solid #333333;;vertical-align: top;" width="766">
<p class="ait4">4.1 Assist with applying required incident investigation processes</p>
<p class="ait4">4.2 Use appropriate analysis techniques to interpret causes of incident and communicate with advisors when participating in workplace investigations</p>
<p class="ait4">4.3 Review incident reports according to organisational policies and procedures</p>
<p class="ait4">4.4 Contact responsible persons and relevant authorities as outlined in WHS laws, and organisational policies and procedures</p>
<p class="ait4">4.5 Contribute to communicating investigation outcomes to relevant stakeholders according to organisational policies and procedures</p>
</td>
</tr>
<tr>
<td style="border:1px solid #333333;;vertical-align: top;" width="299">
<p class="ait4">5. Contribute to developing and implementing recommended measures and actions arising from incident investigation</p>
</td>
<td style="border:1px solid #333333;;vertical-align: top;" width="766">
<p class="ait4">5.1 Contribute to developing incident investigation recommendations </p>
<p class="ait4">5.2 Assist with obtaining approval of developed recommendations from required stakeholders according to organisational policies and procedures</p>
<p class="ait4">5.3 Assist with communicating approved recommendations to required stakeholders according to organisational policies and procedures</p>
<p class="ait4">5.4 Contribute to implementing recommended measures and actions arising from incident investigation within scope of own role and according to WHS legislative requirements</p>
</td>
</tr>
</table>

1 Answer

1 vote
/ May 6, 2020

This example uses re and itertools.zip_longest to extract the required values, and the csv module to write the file (html_data is the HTML snippet from your question):

import re
import csv
from bs4 import BeautifulSoup
from itertools import zip_longest

soup = BeautifulSoup(html_data, 'html.parser')
tds = soup.select('td')

with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

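    # Each table row has two cells: the element (left) and its criteria (right).
    for td1, td2 in zip(tds[::2], tds[1::2]):
        # "1." plus the element text; the dot is escaped so it matches literally.
        cell_ab = re.findall(r'(\d+\.)\s*(.*)', td1.text)
        if not cell_ab:
            continue  # header and description rows have no numbered element
        # "1.1", "1.2", ... plus the text of each criterion.
        cell_cd = re.findall(r'(\d+\.\d+)\s*(.*)', td2.text)

        # Pad the shorter side with (None, None) so extra criteria get empty A/B cells.
        for (a, b), (c, d) in zip_longest(cell_ab, cell_cd, fillvalue=(None, None)):
            writer.writerow([a, b, c, d])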

The result is the file data.csv (screenshot from my LibreOffice Calc):

[Screenshot: data.csv opened in LibreOffice Calc]
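
To see why the second and later criteria land in rows with empty A and B cells, the pairing step can be tried on its own without bs4. The following is only a minimal sketch: the sample strings and the helper name align_rows are hypothetical and merely imitate the .text of the two cells in one table row.

```python
import csv
import io
import re
from itertools import zip_longest

# Hypothetical sample text, shaped like the .text of the two <td> cells in one row.
element_text = "\n1. Prepare to assist with incident response\n"
criteria_text = "\n1.1 Identify duty holders\n1.2 Identify workplace policies\n"


def align_rows(element_text, criteria_text):
    """Pair each numbered element with its criteria, padding the shorter side."""
    # Escaped dots match a literal '.', so '1.' and '1.1' are captured exactly.
    cell_ab = re.findall(r'(\d+\.)\s*(.*)', element_text)
    cell_cd = re.findall(r'(\d+\.\d+)\s*(.*)', criteria_text)
    return [
        [a, b, c, d]
        for (a, b), (c, d) in zip_longest(cell_ab, cell_cd, fillvalue=(None, None))
    ]


rows = align_rows(element_text, criteria_text)

# csv.writer writes None as an empty field, which leaves columns A and B blank.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The first row carries all four values, while the second row starts with two empty fields, which matches the A2 = empty, B2 = empty layout asked for in the question.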

...