Parsing the Wikipedia introduction with PHP
2 votes
/ 03 May 2011

I have read the other questions on this site and used the example answer from this one:

Wikipedia API: only the parsed introduction

I've got to the point where I get the first section of a Wikipedia article. But the first section includes pictures as well as text, and all I want is the text. Here is what my cURL response contains (printed with print_r):

Array
(
[parse] => Array
    (
        [text] => Array
            (
                [*] => <div class="dablink">This article is about sports known as    football.  For the ball used in these sports, see <a href="/wiki/Football_(ball)">Football  (ball)</a>.</div> 
   <div class="thumb tright"> 
   <div class="thumbinner" style="width:227px;"><a href="/wiki/File:Football4.png" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Football4.png/225px-Football4.png" width="225" height="274" class="thumbimage" /></a> 
   <div class="thumbcaption"> 
   <div class="magnify"><a href="/wiki/File:Football4.png" class="internal" title="Enlarge"><img src="http://bits.wikimedia.org/skins-1.17/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div> 
   Some of the many different games known as football. From top left to bottom right:      <a href="/wiki/Association_football">Association football</a> or soccer, <a   href="/wiki/Australian_rules_football">Australian rules football</a>, <a  href="/wiki/International_rules_football">International rules football</a>, <a  href="/wiki/Rugby_Union" class="mw-redirect" title="Rugby Union">Rugby Union</a>, <a  href="/wiki/Rugby_League" class="mw-redirect" title="Rugby League">Rugby League</a>, and <a  href="/wiki/American_Football" class="mw-redirect" title="American Football">American   Football</a>.</div> 
  </div> 
  </div> 
  <p>The game of <b>football</b> is any of several similar <a href="/wiki/Team_sport"  title="Team sport">team sports</a>, of similar origins which involve advancing a ball into   a goal area in an attempt to score. Many of these involve <a href="/wiki/Kick_(football)"  title="Kick (football)">kicking</a> a ball with the foot to score a <a  href="/wiki/Goal_(sport)" title="Goal (sport)">goal</a>, though not all codes of football  using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is <a href="/wiki/Association_football">association football</a>,   more commonly known as just "football" or "soccer". Unqualified, the word <i><a  href="/wiki/Football_(word)" title="Football (word)">football</a></i> applies to whichever  form of football is the most popular in the regional context in which the word appears,  including <a href="/wiki/American_football">American football</a>, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a  href="/wiki/Canadian_football">Canadian football</a>, <a  href="/wiki/Gaelic_football">Gaelic football</a>, <a href="/wiki/Rugby_league">rugby  league</a>, <a href="/wiki/Rugby_union">rugby union</a> and other related games. These variations are known as "codes".</p> 
    <div class="toclimit-3"></div> 

The text I actually want is inside the paragraph tags, if that matters (it starts with the words "The game").

Here is the URL I use to fetch the data in PHP:

 'http://en.wikipedia.org/w/api.php?action=parse&page='.$search.'&redirects=1&format=json&prop=text&section=0'
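For reference, here is a minimal sketch of building that same URL and pulling the HTML string out of the JSON, assuming the parameters shown above. The helper names `buildIntroUrl()`, `extractParsedHtml()` and `fetchIntroHtml()` are hypothetical, the switch to `https` is my own choice, and the User-Agent value is a placeholder. `http_build_query()` URL-encodes the title, which plain string concatenation with `$search` does not do.

```php
<?php
// Sketch only: helper names are hypothetical, parameters copied from the question's URL.

function buildIntroUrl(string $title): string
{
    // http_build_query() URL-encodes every value, so titles containing
    // spaces or "&" cannot break the query string.
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'    => 'parse',
        'page'      => $title,
        'redirects' => 1,
        'format'    => 'json',
        'prop'      => 'text',
        'section'   => 0,
    ]);
}

function extractParsedHtml(string $json): ?string
{
    // Decode to nested arrays; with the default JSON format the rendered
    // HTML sits under ['parse']['text']['*'], as in the print_r output above.
    $data = json_decode($json, true);
    return $data['parse']['text']['*'] ?? null;
}

function fetchIntroHtml(string $title): ?string
{
    // Wikimedia asks API clients to send a descriptive User-Agent;
    // the value below is a placeholder, not a real contact.
    $context = stream_context_create([
        'http' => ['header' => "User-Agent: IntroFetcher/0.1 (you@example.com)\r\n"],
    ]);
    $body = file_get_contents(buildIntroUrl($title), false, $context);
    return $body === false ? null : extractParsedHtml($body);
}
```

Calling `fetchIntroHtml('Football')` should give back the HTML string shown in the `[*]` entry above, ready to be handed to an HTML parser instead of a live page.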

Here is the code I tried:

<?php

include_once('simple_html_dom.php');

$html = file_get_html('http://amazon.co.uk/');

foreach($html->find('p') as $element)
{
    echo $element->plaintext . '<br>';
}

?>

Unfortunately, this just returns a blank page.
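The snippet above downloads a whole web page, while the HTML you need is already inside the API response. A blank page in PHP often hides a fatal error when `display_errors` is off; one possible cause, not confirmed here, is `file_get_html()` returning `false` and `->find()` then being called on it. As an alternative that needs no third-party library, here is a sketch that swaps in PHP's built-in DOM extension (not Simple HTML DOM) and keeps only the text of the `<p>` elements. `introParagraphs()` is a hypothetical name, and the input is assumed to be the `['parse']['text']['*']` string.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of Simple HTML DOM.
// introParagraphs() is a hypothetical helper name.
function introParagraphs(string $html): array
{
    $doc = new DOMDocument();

    // Wikipedia emits HTML5 markup that libxml's HTML parser warns about;
    // buffer those warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // The meta tag declares UTF-8; without it libxml may assume ISO-8859-1.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // textContent flattens <b>, <i>, <a> etc. into plain text, much like
    // ->plaintext in Simple HTML DOM. The dablink, thumbnail and caption
    // divs contain no <p>, so they are skipped automatically.
    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Usage, assuming $json holds the API response body:
// $data = json_decode($json, true);
// echo implode("\n\n", introParagraphs($data['parse']['text']['*']));
```

Working on the fragment from the API avoids a second download and sidesteps whatever is going wrong with the remote page.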

1 Answer

1 vote
/ 03 May 2011

Just download Simple HTML DOM Parser

and then use this:

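<?php
include_once('simple_html_dom.php');

// file_get_html() can return false (e.g. for an empty or oversized page)
$html = file_get_html('http://en.wikipedia.org/wiki/Football');
if (!$html) {
    die('Could not load the page');
}

// print only the first paragraph of the article
foreach($html->find('p') as $element)
{
    echo $element->plaintext . '<br>';
    break;
}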
...
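The `break` in the answer keeps only the very first `<p>` on the page. If that first paragraph happens to be empty or whitespace-only, the loop prints nothing but `<br>`. A sketch that instead asks XPath for the first paragraph with visible text, again swapping in PHP's built-in DOM extension rather than Simple HTML DOM; `firstParagraph()` is a hypothetical name.

```php
<?php
// Sketch using DOMDocument/DOMXPath; firstParagraph() is a hypothetical helper.
function firstParagraph(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // buffer HTML5 parse warnings
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // normalize-space() is empty for whitespace-only paragraphs, so the
    // predicate skips them; (...)[1] keeps the first match in document order.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('(//p[normalize-space()])[1]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

With Simple HTML DOM itself, the same idea would be to `continue` past elements whose trimmed `->plaintext` is empty before calling `break`.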