Extracting information from Whois output with RegExp
1 vote
/ March 22, 2020

How can I extract several segments from the result of a Whois lookup?

I get an array that holds the results of the Whois lookups (from a foreach loop).

For example, suppose I want everything from the "domain...." line to the ">>> Last update of WHOIS database:" line. How can I do that?

The Whois lookup is run with the exec command:

foreach ($query as $domain) {
    $scanUrl = 'whois '.$domain->url;
    exec($scanUrl, $output);
}

Whois works without problems, and I can get the creation date, the expiry date, and the registrar with preg_grep:

    $domainCreated  = preg_grep('/created/', $output);
    $domainExpires  = preg_grep('/expires/', $output);
    $domainRegistrar  = preg_grep('/registrar..........:/', $output);

But I need to get several sections out of the array, for example from the "domain...." line to the ">>> Last update of WHOIS database:" line.

All the Whois results are in a single array. The Whois result looks like this:

Array
(
[0] =>
[1] => domain.............: iltalehti.fi
[2] => status.............: Registered
[3] => created............: 1.1.1991 00:00:00
[4] => expires............: 31.8.2022 00:00:00
[5] => available..........: 30.9.2022 00:00:00
[6] => modified...........: 6.9.2017
[7] => holder transfer....: 13.7.2013
[8] => RegistryLock.......: no
[9] =>
[10] => Nameservers
[11] =>
[12] => nserver............: a.ns-sec.com [Technical Error]
[13] => nserver............: d.ns-sec.org [OK]
[14] => nserver............: c.ns-sec.fi [178.217.128.53] [2001:67c:224:53::53:1] [OK]
[15] => nserver............: b.ns-sec.net [OK]
[16] =>
[17] => DNSSEC
[18] =>
[19] => dnssec.............: no
[20] =>
[21] => Holder
[22] =>
[23] => name...............: Alma Media Oyj
[24] => register number....: 1944757-4
[25] => address............: PL 140
[26] => address............: 00101
[27] => address............: Helsinki
[28] => country............: Finland
[29] => phone..............: +358 10 665 000
[30] => holder email.......:
[31] =>
[32] => Registrar
[33] =>
[34] => registrar..........: Cybercom Finland Oy
[35] => www................: www.cybercom.com
[36] =>
[37] => >>> Last update of WHOIS database: 24.3.2020 12:45:05 (EET) <<<
[38] =>
[39] =>
[40] => Copyright (c) Finnish Transport and Communications Agency Traficom
[41] =>
[42] =>
[43] => domain.............: yle.fi
[44] => status.............: Registered
[45] => created............: 1.1.1991 00:00:00
[46] => expires............: 31.8.2020 00:00:00
[47] => available..........: 30.9.2020 00:00:00
[48] => modified...........: 16.1.2018
[49] => RegistryLock.......: no
[50] =>
[51] => Nameservers
[52] =>
[53] => nserver............: ns-997.awsdns-60.net [OK]
[54] => nserver............: ns-1394.awsdns-46.org [OK]
[55] => nserver............: ns-1882.awsdns-43.co.uk [OK]
[56] => nserver............: ns-76.awsdns-09.com [OK]
[57] =>
[58] => DNSSEC
[59] =>
[60] => dnssec.............: no
[61] =>
[62] => Holder
[63] =>
[64] => name...............: Yleisradio Oy
[65] => register number....: 0215438-8
[66] => address............: Radiokatu 5
[67] => address............: 00024
[68] => address............: Yleisradio
[69] => country............: Finland
[70] => phone..............: +358914801
[71] => holder email.......:
[72] =>
[73] => Registrar
[74] =>
[75] => registrar..........: Yleisradio Oy
[76] =>
[77] => >>> Last update of WHOIS database: 24.3.2020 12:45:12 (EET) <<<
[78] =>
[79] =>
[80] => Copyright (c) Finnish Transport and Communications Agency Traficom
[81] =>
[82] =>
[83] => domain.............: is.fi
[84] => status.............: Registered
[85] => created............: 12.9.2016 10:01:17
[86] => expires............: 12.9.2020 10:01:17
[87] => available..........: 12.10.2020 10:01:17
[88] => modified...........: 17.9.2017
[89] => holder transfer....: 3.2.2017
[90] => RegistryLock.......: no
[91] =>
[92] => Nameservers
[93] =>
[94] => nserver............: ns-2017.awsdns-60.co.uk [OK]
[95] => nserver............: ns-824.awsdns-39.net [OK]
[96] => nserver............: ns-111.awsdns-13.com [OK]
[97] => nserver............: ns-1159.awsdns-16.org [OK]
[98] =>
[99] => DNSSEC
[100] =>
[101] => dnssec.............: no
[102] =>
[103] => Holder
[104] =>
[105] => name...............: Sanoma Media Finland Oy
[106] => register number....: 1515901-4
[107] => address............: Töölönlahdenkatu 2
[108] => address............: 00100
[109] => address............: Helsinki
[110] => country............: Finland
[111] => phone..............: +35891221
[112] => holder email.......:
[113] =>
[114] => Registrar
[115] =>
[116] => registrar..........: Sanoma Oyj
[117] =>
[118] => >>> Last update of WHOIS database: 24.3.2020 12:46:59 (EET) <<<
[119] =>
[120] =>
[121] => Copyright (c) Finnish Transport and Communications Agency Traficom
[122] =>
[123] =>
[124] => domain.............: hs.fi
[125] => status.............: Registered
[126] => created............: 10.7.2009 00:00:00
[127] => expires............: 14.7.2020 11:17:58
[128] => available..........: 14.8.2020 11:17:58
[129] => modified...........: 7.9.2017
[130] => RegistryLock.......: no
[131] =>
[132] => Nameservers
[133] =>
[134] => nserver............: ns-83.awsdns-10.com [OK]
[135] => nserver............: ns-1635.awsdns-12.co.uk [OK]
[136] => nserver............: ns-1461.awsdns-54.org [OK]
[137] => nserver............: ns-678.awsdns-20.net [OK]
[138] =>
[139] => DNSSEC
[140] =>
[141] => dnssec.............: no
[142] =>
[143] => Holder
[144] =>
[145] => name...............: Sanoma Media Finland Oy / Helsingin Sanomat
[146] => register number....: 1515901-4
[147] => address............: Töölönlahdenkatu 2
[148] => address............: 00100
[149] => address............: Helsinki
[150] => country............: Finland
[151] => phone..............: +35891221
[152] => holder email.......:
[153] =>
[154] => Registrar
[155] =>
[156] => registrar..........: Sanoma Oyj
[157] =>
[158] => >>> Last update of WHOIS database: 24.3.2020 12:45:20 (EET) <<<
[159] =>
[160] =>
[161] => Copyright (c) Finnish Transport and Communications Agency Traficom
[162] =>
)

I tried something like:

$domainRawScan = preg_grep('/\bdomain\b.*\b>>> Last update of WHOIS database:\b/', $output);

But I am very new to RegExp and find the syntax quite confusing. Any help would be appreciated.

1 Answer

0 votes
/ March 22, 2020

One way is to take the $output array filled in by the exec command and turn it back into a single string:

$text = implode("\n", $output);

Then use preg_match_all to get all the keywords and values:

preg_match_all('/^(.*?)\\.*: (.+)/m', $text, $matches);

Then $matches[1][n] will hold the nth keyword and $matches[2][n] will hold the nth value.

Regex Demo

^             # Start of line in multiline mode
(             # Start of capture group 1
   .*?        # Match 0 or more characters until ...
)             # End of capture group 1
\.*           # Match 0 or more periods
:             # Match a colon followed by a space
(             # Start of capture group 2
   .+         # Match 1 or more characters up to but not including a newline
)             # End of capture group 2
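
To make the behavior of the pattern concrete, here is a minimal sketch that runs it on a few lines copied from the question's output. The shortened $output array and the $pairs variable are assumptions made for the illustration; a real lookup would fill $output through exec.

```php
<?php
// Sample lines copied (and shortened) from the question's output.
$output = [
    '',
    'domain.............: iltalehti.fi',
    'status.............: Registered',
    'holder email.......:',
    'Nameservers',
    '>>> Last update of WHOIS database: 24.3.2020 12:45:05 (EET) <<<',
];

$text = implode("\n", $output);
preg_match_all('/^(.*?)\\.*: (.+)/m', $text, $matches);

// Pair each captured keyword with its captured value.
$pairs = [];
foreach ($matches[1] as $i => $keyword) {
    $pairs[] = [$keyword, $matches[2][$i]];
}

print_r($pairs);
```

Section headings such as "Nameservers" and fields with no value such as "holder email" do not match. The ">>> Last update of WHOIS database" line does match, because it also contains a colon followed by a space, so it may need to be filtered out if it is not wanted.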

Update

Each time through the loop you will process one domain and its keyword/value pairs. What you do with them is up to you.

foreach ($query as $domain) {
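    // escapeshellarg() quotes the value so it reaches the shell as a single argument
    $scanUrl = 'whois ' . escapeshellarg($domain->url);
    $output = []; // start with an empty array: exec() appends to an existing one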
    exec($scanUrl, $output);
    $text = implode("\n", $output);
    preg_match_all('/^(.*?)\\.*: (.+)/m', $text, $matches);
    $n = count($matches[1]); // number of keyword/value pairs
    for ($i = 0; $i < $n; $i++) {
        // display next keyword/value pair:
        echo $matches[1][$i], "->", $matches[2][$i], "\n";
    }
}
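
If only the lines from the "domain" line to the ">>> Last update of WHOIS database:" line are wanted, as the question asks, one possible approach is to slice each domain's $output between those two markers. The sketch below assumes the output format shown in the question; extractRecord is a hypothetical helper name, not part of PHP.

```php
<?php
/**
 * Hypothetical helper: returns the lines from the first "domain" line
 * through the first ">>> Last update of WHOIS database:" line, inclusive.
 * Returns an empty array if either marker is missing.
 */
function extractRecord(array $lines): array
{
    $lines = array_values($lines); // make sure the keys are 0..n-1
    $start = null;
    foreach ($lines as $i => $line) {
        if ($start === null && preg_match('/^domain\.*:/', $line)) {
            $start = $i; // remember where the record begins
        }
        if ($start !== null && strpos($line, '>>> Last update of WHOIS database:') === 0) {
            return array_slice($lines, $start, $i - $start + 1);
        }
    }
    return [];
}

// Sample lines copied (and shortened) from the question's output.
$output = [
    '',
    'domain.............: yle.fi',
    'status.............: Registered',
    '',
    '>>> Last update of WHOIS database: 24.3.2020 12:45:12 (EET) <<<',
    '',
    'Copyright (c) Finnish Transport and Communications Agency Traficom',
];

$record = extractRecord($output);
print_r($record);
```

Because $output is reset for every domain in the loop above, each call sees a single record, so no pattern spanning several lines is needed.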

Update 2

Instead of joining the array of lines returned by the exec command into a single string and running preg_match_all, which then gives you an array of matches, it may be more convenient to call preg_match separately on each line of the exec output:

foreach ($query as $domain) {
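    // escapeshellarg() quotes the value so it reaches the shell as a single argument
    $scanUrl = 'whois ' . escapeshellarg($domain->url);
    $output = []; // start with an empty array: exec() appends to an existing one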
    exec($scanUrl, $output);
    foreach ($output as $line) {
        if (preg_match('/^(.*?)\\.*: (.+)/', $line, $matches)) {
            echo $matches[1], "->", $matches[2], "\n";
        }
    }
}
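
Rather than echoing each pair, the per-line approach can also collect them into one array per domain. The following is a sketch under the assumption that a list of values per keyword is acceptable, since keywords such as "nserver" and "address" repeat within a record; parseWhoisLines is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: turns whois output lines into an array of
 * keyword => list of values. Lists are used because some keywords
 * (for example "nserver" and "address") occur more than once.
 */
function parseWhoisLines(array $lines): array
{
    $record = [];
    foreach ($lines as $line) {
        if (preg_match('/^(.*?)\\.*: (.+)/', $line, $matches)) {
            $record[$matches[1]][] = trim($matches[2]);
        }
    }
    return $record;
}

// Sample lines copied from the question's output.
$lines = [
    'domain.............: hs.fi',
    'nserver............: ns-83.awsdns-10.com [OK]',
    'nserver............: ns-1635.awsdns-12.co.uk [OK]',
    'address............: Töölönlahdenkatu 2',
    'address............: 00100',
    'holder email.......:',
];

$record = parseWhoisLines($lines);
print_r($record);
```

Inside the loop above, the call would be parseWhoisLines($output), giving one such array for each domain.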
...