Use DOMDocument and DOMXPath to parse the HTML fetched from the given URL:
function outputMetaTags($url) {
    // Example: $url = 'https://www.myntra.com/casual-shoes/kook-n-keech/kook-n-keech-men-white-sneakers/2154180/buy';

    // Pretend to be a browser, in case the server refuses requests without a browser User-Agent
    $streamContext = stream_context_create(array(
        'http' => array(
            'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36",
            'follow_location' => 0 // do not follow redirects; set to 1 to follow them
        )
    ));

    // Fetch the HTML from the given URL; file_get_contents() returns false on failure
    $htmlData = file_get_contents($url, false, $streamContext);
    if ($htmlData === false || $htmlData === '') {
        echo "Could not fetch $url\n"; // loadHTML() rejects an empty string
        return;
    }

    // Collect libxml warnings about malformed HTML internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument(); // parse with DOMDocument
    $doc->loadHTML($htmlData);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Create a DOMXPath object over the parsed DOM and select every <meta> element
    $xpath = new DOMXPath($doc);
    $metaData = $xpath->query('//meta');

    foreach ($metaData as $singleMeta) {
        // Not every meta has a property or name attribute.
        // For og:image, check $singleMeta->getAttribute('property') === 'og:image'; the same goes for og:url.
        if ($singleMeta->getAttribute('property') !== '') {
            echo $singleMeta->getAttribute('property') . "\n";
        } elseif ($singleMeta->getAttribute('name') !== '') {
            echo $singleMeta->getAttribute('name') . "\n";
        }
        // Print the content attribute of the meta tag
        echo $singleMeta->getAttribute('content') . "\n";
    }
}
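If you only need one value such as og:image, the check mentioned in the loop comment can also be pushed into XPath itself with an attribute predicate. The sketch below assumes you already have the HTML as a string; `getMetaContent` is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: return the content of the first <meta> whose
// property or name attribute equals $key, or null if there is none.
function getMetaContent(string $html, string $key): ?string
{
    $previous = libxml_use_internal_errors(true); // ignore malformed-HTML warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Filter in XPath instead of looping over every <meta>.
    // $key is pasted into the expression, so pass only trusted keys
    // such as 'og:image' (a key containing a double quote would break it).
    $nodes = $xpath->query(sprintf('//meta[@property="%1$s" or @name="%1$s"]', $key));

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('content');
}

$html = '<html><head>'
    . '<meta property="og:image" content="https://example.com/shoe.jpg">'
    . '<meta property="og:url" content="https://example.com/shoe">'
    . '<meta name="description" content="White sneakers">'
    . '</head><body></body></html>';

echo getMetaContent($html, 'og:image'), "\n";    // https://example.com/shoe.jpg
echo getMetaContent($html, 'description'), "\n"; // White sneakers
var_dump(getMetaContent($html, 'og:title'));     // NULL
```

Letting XPath do the filtering keeps the PHP side to a single lookup, at the cost of building the query string from the key.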
More about DOMDocument and DOMXPath:
http://php.net/manual/en/class.domdocument.php
http://php.net/manual/en/class.domxpath.php
About meta tags:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta
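Besides `name`, a meta element can identify itself with `http-equiv` or `charset` (see the MDN page), and Open Graph tags use `property`. A minimal sketch that returns the tags as an associative array instead of echoing them might look like this; `collectMetaTags` is a hypothetical name, and keeping only the last of several tags with the same key is an assumption of this sketch.

```php
<?php
// Hypothetical helper: collect <meta> tags into an array keyed by whichever
// identifying attribute they carry (property, name or http-equiv).
// A <meta charset> tag has no content attribute, so it is stored under 'charset'.
function collectMetaTags(string $html): array
{
    $previous = libxml_use_internal_errors(true); // ignore malformed-HTML warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ((new DOMXPath($doc))->query('//meta') as $meta) {
        if ($meta->hasAttribute('charset')) {
            $result['charset'] = $meta->getAttribute('charset');
            continue;
        }
        foreach (['property', 'name', 'http-equiv'] as $attr) {
            $key = $meta->getAttribute($attr);
            if ($key !== '') {
                // Later duplicates (e.g. several og:image tags) overwrite earlier ones
                $result[$key] = $meta->getAttribute('content');
                break;
            }
        }
    }
    return $result;
}

// Usage: inspect the collected tags
print_r(collectMetaTags('<head><meta property="og:url" content="https://example.com/shoe"></head>'));
```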