How to extract values from an HTML page stored as a string after fetching it with curl
0 votes
/ November 13, 2010

I'm using PHP / curl to fetch an HTML page into a string. I then need to extract the data below and plot a graph from it.

The data I need looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />

  <title></title>
</head>

<body>
  <table>
    <tbody>
      <tr>
        <td>
          <h3>Income</h3>
        </td>
      </tr>

      <tr>
        <td>Operating income</td>

        <td class="numericalColumn">22,922.00</td>

        <td class="numericalColumn">21,507.30</td>

        <td class="numericalColumn">17,492.60</td>

        <td class="numericalColumn">13,683.90</td>

        <td class="numericalColumn">10,227.12</td>
      </tr>

      <tr>
        <td>
          <h3>Expenses</h3>
        </td>
      </tr>

      <tr>
        <td>Material consumed</td>

        <td class="numericalColumn">4,029.40</td>

        <td class="numericalColumn">3,442.60</td>

        <td class="numericalColumn">2,952.30</td>

        <td class="numericalColumn">1,889.00</td>

        <td class="numericalColumn">1,367.67</td>
      </tr>

      <tr>
        <td>Manufacturing expenses&nbsp;</td>

        <td class="numericalColumn">2,213.20</td>

        <td class="numericalColumn">1,841.80</td>

        <td class="numericalColumn">299.80</td>

        <td class="numericalColumn">120.50</td>

        <td class="numericalColumn">1,020.70</td>
      </tr>

      <tr>
        <td>Personnel expenses</td>

        <td class="numericalColumn">9,062.80</td>

        <td class="numericalColumn">9,249.80</td>

        <td class="numericalColumn">7,409.10</td>

        <td class="numericalColumn">5,768.20</td>

        <td class="numericalColumn">4,279.03</td>
      </tr>

      <tr>
        <td>Selling expenses</td>

        <td class="numericalColumn">378.10</td>

        <td class="numericalColumn">308.40</td>

        <td class="numericalColumn">532.10</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">171.05</td>
      </tr>

      <tr>
        <td>Adminstrative expenses</td>

        <td class="numericalColumn">1,737.00</td>

        <td class="numericalColumn">1,906.00</td>

        <td class="numericalColumn">2,583.70</td>

        <td class="numericalColumn">2,651.70</td>

        <td class="numericalColumn">904.78</td>
      </tr>

      <tr>
        <td>Expenses capitalised</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>
      </tr>

      <tr>
        <td>Cost of sales</td>

        <td class="numericalColumn">17,420.50</td>

        <td class="numericalColumn">16,748.60</td>

        <td class="numericalColumn">13,777.00</td>

        <td class="numericalColumn">10,429.40</td>

        <td class="numericalColumn">7,743.22</td>
      </tr>

      <tr>
        <td>Operating profit</td>

        <td class="numericalColumn">5,501.50</td>

        <td class="numericalColumn">4,758.70</td>

        <td class="numericalColumn">3,715.60</td>

        <td class="numericalColumn">3,254.50</td>

        <td class="numericalColumn">2,483.90</td>
      </tr>

      <tr>
        <td>Other recurring income</td>

        <td class="numericalColumn">434.20</td>

        <td class="numericalColumn">468.20</td>

        <td class="numericalColumn">326.90</td>

        <td class="numericalColumn">288.70</td>

        <td class="numericalColumn">113.59</td>
      </tr>

      <tr>
        <td>Adjusted PBDIT</td>

        <td class="numericalColumn">5,935.70</td>

        <td class="numericalColumn">5,226.90</td>

        <td class="numericalColumn">4,042.50</td>

        <td class="numericalColumn">3,543.20</td>

        <td class="numericalColumn">2,597.49</td>
      </tr>

      <tr>
        <td>Financial expenses</td>

        <td class="numericalColumn">108.40</td>

        <td class="numericalColumn">196.80</td>

        <td class="numericalColumn">116.80</td>

        <td class="numericalColumn">7.20</td>

        <td class="numericalColumn">3.13</td>
      </tr>

      <tr>
        <td>Depreciation&nbsp;</td>

        <td class="numericalColumn">579.60</td>

        <td class="numericalColumn">533.60</td>

        <td class="numericalColumn">456.00</td>

        <td class="numericalColumn">359.80</td>

        <td class="numericalColumn">292.26</td>
      </tr>

      <tr>
        <td>Other write offs</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>
      </tr>

      <tr>
        <td>Adjusted PBT</td>

        <td class="numericalColumn">5,247.70</td>

        <td class="numericalColumn">4,496.50</td>

        <td class="numericalColumn">3,469.70</td>

        <td class="numericalColumn">3,176.20</td>

        <td class="numericalColumn">2,302.10</td>
      </tr>

      <tr>
        <td>Tax charges&nbsp;</td>

        <td class="numericalColumn">790.80</td>

        <td class="numericalColumn">574.10</td>

        <td class="numericalColumn">406.40</td>

        <td class="numericalColumn">334.10</td>

        <td class="numericalColumn">286.10</td>
      </tr>

      <tr>
        <td>Adjusted PAT</td>

        <td class="numericalColumn">4,456.90</td>

        <td class="numericalColumn">3,922.40</td>

        <td class="numericalColumn">3,063.30</td>

        <td class="numericalColumn">2,842.10</td>

        <td class="numericalColumn">2,016.00</td>
      </tr>

      <tr>
        <td>Non recurring items</td>

        <td class="numericalColumn">441.10</td>

        <td class="numericalColumn">-948.60</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">38.33</td>
      </tr>

      <tr>
        <td>Other non cash adjustments</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-33.85</td>
      </tr>

      <tr>
        <td>Reported net profit</td>

        <td class="numericalColumn">4,898.00</td>

        <td class="numericalColumn">2,973.80</td>

        <td class="numericalColumn">3,063.30</td>

        <td class="numericalColumn">2,842.10</td>

        <td class="numericalColumn">2,020.48</td>
      </tr>

      <tr>
        <td>Earnigs before appropriation</td>

        <td class="numericalColumn">4,898.00</td>

        <td class="numericalColumn">2,973.80</td>

        <td class="numericalColumn">3,063.30</td>

        <td class="numericalColumn">2,842.10</td>

        <td class="numericalColumn">2,020.48</td>
      </tr>

      <tr>
        <td>Equity dividend</td>

        <td class="numericalColumn">880.90</td>

        <td class="numericalColumn">586.00</td>

        <td class="numericalColumn">876.50</td>

        <td class="numericalColumn">873.70</td>

        <td class="numericalColumn">712.88</td>
      </tr>

      <tr>
        <td>Preference dividend</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>
      </tr>

      <tr>
        <td>Dividend tax</td>

        <td class="numericalColumn">128.30</td>

        <td class="numericalColumn">99.60</td>

        <td class="numericalColumn">148.90</td>

        <td class="numericalColumn">126.80</td>

        <td class="numericalColumn">99.98</td>
      </tr>

      <tr>
        <td>Retained earnings</td>

        <td class="numericalColumn">3,888.80</td>

        <td class="numericalColumn">2,288.20</td>

        <td class="numericalColumn">2,037.90</td>

        <td class="numericalColumn">1,841.60</td>

        <td class="numericalColumn">1,207.62</td>
      </tr>
    </tbody>
  </table>
</body>
</html>

I want to extract each value: for example, the "Manufacturing expenses" label together with the values for all the years listed in that row. How do I do this?

I found something like preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match);, but it did not return the values I wanted.

2 Answers

0 votes
/ November 13, 2010

If I understood your question correctly, you want something like this. I wrote it myself, so I'd be happy to help if you need any clarification.

Cheers!

0 votes
/ November 13, 2010

You can use a library such as PHP Simple HTML DOM Parser to extract data from HTML / XHTML: http://simplehtmldom.sourceforge.net/manual.htm

Example:

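// Requires the library file (simple_html_dom.php) to be included first.
// $rawHtmlData is the HTML string returned by curl.
$pageDom = str_get_html( $rawHtmlData );
foreach( $pageDom->find( 'td' ) as $tblElem )
{
    // Case-insensitive substring match on the cell contents
    if( FALSE !== stristr( $tblElem->innertext, 'Manufacturing expenses' ) )
    {
        // Do stuff
    }
}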
...
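
If you would rather not add a third-party library, the same idea can probably be done with PHP's built-in DOMDocument and DOMXPath. The following is only a minimal sketch, not something from the answers above: extractRowValues() and parseAmount() are hypothetical helper names, and it assumes the value cells always carry class="numericalColumn" as in the question's HTML. It finds the first row whose label cell contains the given text, then converts each value to a float, treating "-" as missing data.

```php
<?php
// Hypothetical helper: converts a cell such as "1,020.70" to a float.
// A lone "-" (no data for that year) becomes null.
function parseAmount(string $text): ?float
{
    $clean = str_replace(',', '', trim($text));
    return is_numeric($clean) ? (float) $clean : null;
}

// Hypothetical helper: returns the values of the first row whose label
// cell contains $label (case-insensitive substring match).
function extractRowValues(string $html, string $label): array
{
    // Fetched pages are rarely valid; keep parser warnings out of the output.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//tr') as $row) {
        $labelCell = $xpath->query('./td[1]', $row)->item(0);
        // Assumption: value cells are marked with class="numericalColumn".
        $valueCells = $xpath->query('./td[@class="numericalColumn"]', $row);
        if ($labelCell === null || $valueCells->length === 0) {
            continue; // heading rows such as <h3>Income</h3> have no values
        }
        // Substring match, because some labels end with &nbsp;.
        if (stripos($labelCell->textContent, $label) === false) {
            continue;
        }
        $values = [];
        foreach ($valueCells as $cell) {
            $values[] = parseAmount($cell->textContent);
        }
        return $values;
    }
    return []; // no matching row
}

// Small sample in the same shape as the table from the question.
$html = '<table><tbody><tr><td>Manufacturing expenses&nbsp;</td>'
      . '<td class="numericalColumn">2,213.20</td>'
      . '<td class="numericalColumn">-</td></tr></tbody></table>';
var_dump(extractRowValues($html, 'Manufacturing expenses'));
```

The resulting array (one entry per year, with null where the source shows "-") can then be passed to whatever charting code you use. Matching on a substring rather than the exact label is a deliberate choice here, since labels like "Manufacturing expenses" are followed by a non-breaking space in the source.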