Question

This is my HTML table.

<table class="table_c" id="myd">
<tbody>
    <tr class="grp">
        <th class="col>MyGrp1</th>
    </tr>
    <tr class="item">
        <th class="col label" scope="row">Item0.1 Header</th>
        <td class="col data" data-th="MyGrp1">Item0.1 Value</td>
    </tr>
    <tr class="grp">
        <th class="col label" colspan="2" scope="row">MyGrp</th>
    </tr>
    <tr class="item">
        <th class="col label" scope="row">Item1.1 Header</th>
        <td class="col data" >Item1.1 Value</td>
    </tr>
    <tr class="item">
        <th class="col label" scope="row">Item1.2 Header</th>
        <td class="col data">Item1.2 Value</td>
    </tr>
    <tr class="item">
    <th class="col label" scope="row">Item1.3 Header</th>
    <td class="col data"">Item1.2 Value</td>
    </tr>
</tbody>
</table>

I want the table to be parsed as shown below:

MyGrp1<new line>
<tab char>Item0.1 Header<tab char>Item0.1 Value<new line>
MyGrp2<new line>
<tab char>Item1.1 Header<tab char>Item1.1 Value<new line>
<tab char>Item1.2 Header<tab char>Item1.2 Value<new line>
<tab char>Item1.3 Header<tab char>Item1.3 Value<new line>

I can get all the 'tr' or 'th' nodes. But I don't know how to iterate over the table node by node. How can I scrape the HTML table and get the result above?

Chillie · Answer 1 · 05 September 2018

But I don't know how to iterate over the table node by node.

BeautifulSoup's find_all gives you a sequence of Tag objects that you can loop over.
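
As a small illustration of that point, the sketch below parses a two-row table and loops over what find_all returns. The sample markup and the built-in 'html.parser' backend are assumptions chosen for the illustration; they are not taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample, not the table from the question.
html = """
<table>
  <tr class="grp"><th>Group A</th></tr>
  <tr class="item"><th>Name</th><td>Value</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like result set of Tag objects.
rows = soup.find_all("tr")

# The class attribute is multi-valued, so it comes back as a list of strings.
row_classes = [tr.get("class", []) for tr in rows]

for tr in rows:
    # Each Tag can be searched again, here for its header and data cells.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    print(tr.get("class", []), cells)
```

Because each element of the result is itself a Tag, the same search and attribute methods are available at every level of the tree.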

Also note that your HTML table has syntax problems:
<th class="col>MyGrp1</th> is missing a closing quote
<td class="col data"">Item1.2 Value</td> has a doubled quote

So, assuming sample is your HTML table as a string and that it contains valid HTML, here is an example of what you can do:

from bs4 import BeautifulSoup as bs

soup = bs(sample, 'lxml-html')  # sample: the corrected HTML table as a string
trs = soup.find_all('tr')
for tr in trs:
    classes = tr.get('class', [])  # empty list for rows without a class
    if 'grp' in classes:
        # group row: print the group name on its own line
        print(tr.th.text)
    elif 'item' in classes:
        # item row: tab, label, tab, value (the layout asked for)
        label = tr.th.text
        value = tr.td.text
        print('\t{}\t{}'.format(label, value))

Smith Dwayne · Answer 2 · 05 September 2018

This is what I did to get the answer; I am posting my solution here. Please correct me if I am wrong.

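# table_t: the parsed table (a BeautifulSoup object or the <table> Tag)
result = ""
for tr in table_t.find_all('tr'):
    classes = tr.get("class", [])  # empty list for rows without a class
    if 'grp' in classes:
        # group row: one line per header cell
        for th in tr.find_all('th'):
            result += "\n" + th.text.strip()
    elif 'item' in classes:
        # item row: tab-indented label and value
        children_th = tr.find("th")
        children_td = tr.find("td")
        result += "\n\t" + children_th.text.strip() + "\t" + children_td.text.strip()
print(result)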
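
The same idea can be wrapped in a function that collects lines in a list and joins them once, which avoids the leading newline. This is only a sketch under some assumptions: format_table is a hypothetical helper name, the sample is a shortened, well-formed variant of the table, the built-in 'html.parser' backend is used, and every item row is assumed to contain both a th and a td.

```python
from bs4 import BeautifulSoup


def format_table(html):
    """Return group names flush left and items as <tab>label<tab>value lines."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for tr in soup.find_all("tr"):
        classes = tr.get("class", [])  # empty list when the row has no class
        if "grp" in classes:
            lines.append(tr.th.get_text(strip=True))
        elif "item" in classes:
            # Assumes the row has both a <th> and a <td>.
            label = tr.th.get_text(strip=True)
            value = tr.td.get_text(strip=True)
            lines.append("\t{}\t{}".format(label, value))
    # Terminate every line with a newline, as in the requested output.
    return "".join(line + "\n" for line in lines)


# Hypothetical, well-formed sample (shortened from the question's table).
sample = """
<table>
  <tr class="grp"><th>MyGrp1</th></tr>
  <tr class="item"><th>Item0.1 Header</th><td>Item0.1 Value</td></tr>
  <tr class="grp"><th colspan="2">MyGrp2</th></tr>
  <tr class="item"><th>Item1.1 Header</th><td>Item1.1 Value</td></tr>
</table>
"""

print(format_table(sample), end="")
```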

Albin Paul · Answer 3 · 05 September 2018

I used pandas for this.

from io import StringIO

import pandas as pd
import html5lib  # not used directly; pandas can use it as a parser backend

string="""<table class="table_c" id="myd">
<tbody>
    <tr class="grp">
        <th class="col">MyGrp1</th>
    </tr>
    <tr class="item">
        <th class="col label" scope="row">Item0.1 Header</th>
        <td class="col data" data-th="MyGrp1">Item0.1 Value</td>
    </tr>
    <tr class="grp">
        <th class="col label" colspan="2" scope="row">MyGrp</th>
    </tr>
    <tr class="item">
        <th class="col label" scope="row">Item1.1 Header</th>
        <td class="col data" >Item1.1 Value</td>
    </tr>
    <tr class="item">
        <th class="col label" scope="row">Item1.2 Header</th>
        <td class="col data">Item1.2 Value</td>
    </tr>
    <tr class="item">
    <th class="col label" scope="row">Item1.3 Header</th>
    <td class="col data"">Item1.2 Value</td>
    </tr>
</tbody>
</table>"""
# read_html returns a list of DataFrames, one per <table> found.
# Recent pandas versions deprecate passing literal HTML, so wrap it in StringIO.
df = pd.read_html(StringIO(string))
print(df)

Output

[                0              1
0          MyGrp1            NaN
1  Item0.1 Header  Item0.1 Value
2           MyGrp            NaN
3  Item1.1 Header  Item1.1 Value
4  Item1.2 Header  Item1.2 Value
5  Item1.3 Header  Item1.2 Value]
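
To get from a frame like the one printed above to the tab-indented text the question asks for, one option is to treat every row whose second column is missing as a group row. The sketch below rests on that assumption and rebuilds a shortened frame by hand instead of calling read_html, so it needs no HTML parser; with the code above, the frame would be df[0], since read_html returns a list.

```python
import pandas as pd

# Rebuilt by hand to mirror the first rows of the frame printed above.
table = pd.DataFrame(
    [
        ["MyGrp1", None],
        ["Item0.1 Header", "Item0.1 Value"],
        ["MyGrp", None],
        ["Item1.1 Header", "Item1.1 Value"],
    ]
)

lines = []
for label, value in table.itertuples(index=False):
    if pd.isna(value):
        # Assumption: a missing second column marks a group row.
        lines.append(label)
    else:
        lines.append("\t{}\t{}".format(label, value))

text = "\n".join(lines) + "\n"
print(text, end="")
```

The heuristic would misclassify an item row with an empty value cell, so checking the row's class with BeautifulSoup is the more robust test when the markup is available.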

Python Web Scraping Html Table using Beautiful Soup
