How to scrape the data in an HTML (XML) table row and the values of its child elements with XPath, based on a text string? - PullRequest
0 votes
/ July 1, 2018

Here is the HTML I'm trying to scrape with XPath:

<table class="ClassGrid" cellspacing="0" cellpadding="0" border="0" id="_ctl0_phMainContent_dgrdClasses" style="border-collapse:collapse;">
<tbody>
    <tr>
        <td class="ClassGridRow1" colspan="3">
            <hr>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1">Address 123
                <br>
                <br><a target="_blank" class="gridDirections" href="/Classes/Directions.aspx#104">Directions</a></div>
        </td>
        <td class="ClassGridRow2">
          <div class="ClassGridBox2">12/12/2018</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl3_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4233&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
          <div class="ClassGridBox2">1/24/2019</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl4_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4306&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, August 4</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBoxNone"></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, August 18</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl6_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4346&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Thursday, August 30</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl7_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4313&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, September 8</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl8_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4330&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Tuesday, September 18</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl9_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4331&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1" colspan="3">
            <hr>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1">Address 0000
                <br><a target="_blank" class="gridDirections" href="/Classes/Directions.aspx#190">Directions</a></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, July 21</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl11_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4242&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Tuesday, August 28</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl12_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4243&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Tuesday, September 25</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl13_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4271&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1" colspan="3">
            <hr>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1">Address 456
                <br><a target="_blank" class="gridDirections" href="/Classes/Directions.aspx#69">Directions</a></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Wednesday, August 1</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl15_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4276&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, August 25</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl16_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4277&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Thursday, September 13</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl17_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4348&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, October 6</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl18_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4278&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Wednesday, October 31</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl19_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4279&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, November 17</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl20_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4280&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1" colspan="3">
            <hr>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1">Address 789
                <br><a target="_blank" class="gridDirections" href="/Classes/Directions.aspx#223">Directions</a></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, August 4</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl22_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4347&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Saturday, August 18</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl23_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4305&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1">
            <div class="ClassGridBox1"></div>
        </td>
        <td class="ClassGridRow2">
            <div class="ClassGridBox2">Thursday, September 20</div>
        </td>
        <td class="ClassGridRow3">
            <div class="ClassGridBox3"><a id="_ctl0_phMainContent_dgrdClasses__ctl24_hplAddToCart" class="whitelight" href="/validate.aspx?ClassID1=4332&amp;ClassID2=0">Book Now</a></div>
        </td>
    </tr>
    <tr>
        <td class="ClassGridRow1" colspan="3">
            <hr>
        </td>
    </tr>
</tbody>
</table>

I'm trying to return the values of ClassGridRow1, ClassGridRow2 and ClassGridBox3 when ClassGridRow1 contains a given text string, for example

"Address 123"

So far I haven't managed to get anything other than the contents of the context node. Can anyone help? Much appreciated!
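Assuming the goal is every row listed under "Address 123" (the address row itself plus the dated rows below it, up to the next address), here is a minimal sketch using only Python's standard library. It assumes the markup is well-formed XML (XHTML-style `<br/>` and `<hr/>`), which `xml.etree.ElementTree` requires; the question's HTML with bare `<br>`/`<hr>` would first need an HTML parser or some cleanup. `SAMPLE` is a trimmed copy of the question's table, and `rows_for_address` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed (XHTML-style) copy of the question's table.
SAMPLE = """
<table class="ClassGrid">
<tbody>
  <tr><td class="ClassGridRow1" colspan="3"><hr/></td></tr>
  <tr>
    <td class="ClassGridRow1"><div class="ClassGridBox1">Address 123
      <br/><br/><a class="gridDirections" href="/Classes/Directions.aspx#104">Directions</a></div></td>
    <td class="ClassGridRow2"><div class="ClassGridBox2">12/12/2018</div></td>
    <td class="ClassGridRow3"><div class="ClassGridBox3"><a class="whitelight" href="/validate.aspx?ClassID1=4233&amp;ClassID2=0">Book Now</a></div></td>
  </tr>
  <tr>
    <td class="ClassGridRow1"><div class="ClassGridBox1"></div></td>
    <td class="ClassGridRow2"><div class="ClassGridBox2">Saturday, August 4</div></td>
    <td class="ClassGridRow3"><div class="ClassGridBoxNone"></div></td>
  </tr>
  <tr><td class="ClassGridRow1" colspan="3"><hr/></td></tr>
  <tr>
    <td class="ClassGridRow1"><div class="ClassGridBox1">Address 456
      <br/><a class="gridDirections" href="/Classes/Directions.aspx#69">Directions</a></div></td>
    <td class="ClassGridRow2"><div class="ClassGridBox2">Wednesday, August 1</div></td>
    <td class="ClassGridRow3"><div class="ClassGridBox3"><a class="whitelight" href="/validate.aspx?ClassID1=4276&amp;ClassID2=0">Book Now</a></div></td>
  </tr>
</tbody>
</table>
"""


def rows_for_address(root, address):
    """Return (address, date, booking_href) for every row grouped under `address`.

    Assumption: a row with a non-empty ClassGridBox1 starts a new group, and the
    following rows (with an empty ClassGridBox1) belong to the same address.
    """
    results = []
    current = None
    for tr in root.iterfind(".//tbody/tr"):
        box1 = tr.find("td/div[@class='ClassGridBox1']")
        if box1 is None:  # separator row that only holds <hr/>
            continue
        label = (box1.text or "").strip()
        if label:  # a new address block starts here
            current = label
        if current != address:
            continue
        date = tr.findtext("td/div[@class='ClassGridBox2']", default="").strip()
        link = tr.find("td/div[@class='ClassGridBox3']/a")
        href = link.get("href") if link is not None else None  # ClassGridBoxNone has no link
        results.append((current, date, href))
    return results


root = ET.fromstring(SAMPLE.strip())
for row in rows_for_address(root, "Address 123"):
    print(row)
# prints ('Address 123', '12/12/2018', '/validate.aspx?ClassID1=4233&ClassID2=0')
# then   ('Address 123', 'Saturday, August 4', None)
```

ElementTree implements only a subset of XPath (paths and `[@attr='value']` predicates, but no `contains()` and no sibling axes), which is why the grouping happens in the Python loop; a full XPath 1.0 engine could express the same grouping with the `preceding-sibling::` axis instead.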

1 Answer

0 votes
/ July 3, 2018

If you have the full set of XPath functions available (fn:replace requires XPath 2.0 or later), you can select the <div class="ClassGridBox1"> node and trim its text() with an fn:replace regular expression:

//tbody/tr/td/div[@class="ClassGridBox1"]/replace(text()[1],'(^[a-zA-Z.-]+ [0-9]+).*','$1', 's')

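To see what that regular expression does without an XPath 2.0 processor, here is a rough Python equivalent, assuming the div's first text node is the label followed by the newline and indentation shown in the question. In Python's `re`, the `'s'` flag corresponds to `re.DOTALL` and the `$1` back-reference is written `\1`; `keep_address` is a hypothetical helper name.

```python
import re

# Python counterpart of replace(text()[1], '(^[a-zA-Z.-]+ [0-9]+).*', '$1', 's'):
# capture "<word> <number>" at the start, and let .* (with DOTALL) swallow the rest.
PATTERN = re.compile(r"(^[a-zA-Z.-]+ [0-9]+).*", re.DOTALL)


def keep_address(text):
    """Keep only the leading '<word> <number>'; like fn:replace, return the
    input unchanged when the pattern does not match."""
    return PATTERN.sub(r"\1", text)


# First text node of the "Address 123" div (label plus trailing whitespace).
print(repr(keep_address("Address 123\n                ")))  # prints 'Address 123'
```

Because `^` is not in multiline mode, the pattern can only match at the very start of the string, so text with leading whitespace would be returned unchanged.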

Or, less strictly, select the raw text nodes and do some text post-processing afterwards:

//tbody/tr/td/div[@class="ClassGridBox1"]/text()
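The post-processing step can be as small as collapsing whitespace and dropping empty strings. The sketch below assumes the XPath results have already been collected as a list of plain Python strings; the `nodes` list is typed by hand to mirror roughly what that expression returns for the first two address blocks (indentation shortened), and `clean_text_nodes` is a hypothetical helper name.

```python
def clean_text_nodes(nodes):
    """Collapse runs of whitespace in each string (roughly what XPath's
    normalize-space() does) and drop strings that end up empty."""
    cleaned = (" ".join(node.split()) for node in nodes)
    return [text for text in cleaned if text]


# Hand-typed stand-in for the strings returned by
#   //tbody/tr/td/div[@class="ClassGridBox1"]/text()
# The "Address 123" div yields two text nodes because of its two <br> elements.
nodes = ["Address 123\n    ", "\n    ", "Address 0000\n    "]
print(clean_text_nodes(nodes))  # prints ['Address 123', 'Address 0000']
```

Python's `str.split()` treats a slightly wider set of characters as whitespace than XPath does, which makes no difference for this markup.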