WebRequest to a website returning unexpected results
0 votes
/ June 4, 2018

I believe I'm 99% of the way there, except that the results I'm getting are not what I expect.

The query I'm trying to run is here, on the Discovery End Date tab.

I'm using a WebRequest, similar to the one found here in the Microsoft Docs, to post the data.

Input in the front end (via browser):

(screenshot not preserved)

Results in the front end (via browser):

(screenshot not preserved)

My code:

The call:

GetPage("L", "943", "17", "MID");

The function:

    internal static int ajaxId = 0;
    public static string GetPage(string docket_1, string docket_2, string docket_3, string venue)
    {
        // Create the initial request (Required for cookies/ids/uri)
        var request = (HttpWebRequest)WebRequest.Create("https://portal.njcourts.gov/webe7/prweb/PRServletPublicAuth/uoQwtbB8g8Qb57vj6yfidBDhxX-dskXV*/!STANDARD?AppName=CIVSearch");
        // set decompression
        request.AutomaticDecompression = DecompressionMethods.GZip;
        // define what the request accepts
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
        // user agent required due to NJCourts having requirements for web requests
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36";
        // create cookie container
        request.CookieContainer = new CookieContainer();
        // get response
        var r = (HttpWebResponse)request.GetResponse();

        // Take the request and get the page stream that is returned.
        var page = GetPageContent(r);

        // Get the harnessId from the page returned for the next request URI
        var harnessId =
            Regex.Match(page, "<input type='hidden' id='pzHarnessID' value='(.*?)'").Groups[1].ToString();

        // Get the specified URI that's required for the request that is returned from the initial page.
        var dynamicURI = Regex.Match(page, "\"url\": \"(.*?)\"").Groups[1].ToString();

        // Create the full URI
        var url_begin = "https://portal.njcourts.gov" + dynamicURI +
                        "&pyEncodedParameters=true" +
                        "&pzKeepPageMessages=false" +
                        "&UITemplatingStatus=N" +
                        "&StreamName=DiscoverEndDate" +
                        "&BaseReference=DscEndDteSearchPage" +
                        "&StreamClass=Rule-HTML-Section" +
                        "&bClientValidation=true" +
                        "&FormError=NONE" +
                        "&pyCustomError=CivilErrorSection" +
                        "&PreActivity=ValidateDiscoveryEndDate" +
                        "&UsingPage=true" +
                        "&HeaderButtonSectionName=" +
                        $"&pzHarnessID={harnessId}" +
                        "&inStandardsMode=true" +
                        $"&AJAXTrackID={++ajaxId}" +
                        "&ClientInt=Start";

        // Create all the required form parameters
        var parameters = new List<string>
        {
            $"$PDscEndDteSearchPage$pCounty={venue}",
            $"$PDscEndDteSearchPage$pDocketType={docket_1}",
            $"$PDscEndDteSearchPage$pDocketSeq={docket_2}",
            $"$PDscEndDteSearchPage$pDocketYear={docket_3}",
            "EXPANDEDSubSectionDiscoverEndDateBB=true",
            "PreActivitiesList=%3Cpagedata%3E%3CdataTransforms%20REPEATINGTYPE%3D%22PageList%22%3E%3Crowdata%20REPEATINGINDEX%3D%221%22%3E%3CdataTransform%3E%3C%2FdataTransform%3E%3C%2Frowdata%3E%3C%2FdataTransforms%3E%3C%2Fpagedata%3E"
        };

        // Create byte array for request stream
        var content = Encoding.UTF8.GetBytes(string.Join("&", parameters));

        // create discovery end date request
        var dedRequest = (HttpWebRequest)WebRequest.Create(url_begin);

        // submit the cookies that were originally received
        dedRequest.CookieContainer = request.CookieContainer;
        // make sure it's post
        dedRequest.Method = "POST";
        // set the content length/type
        dedRequest.ContentLength = content.Length;
        dedRequest.ContentType = "application/x-www-form-urlencoded";
        // set the rest of the header stuff.
        dedRequest.AutomaticDecompression = DecompressionMethods.GZip;
        dedRequest.Accept = "*/*";
        dedRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36";

        // get the request stream and write the form information to it.
        using (var rs = dedRequest.GetRequestStream())
        {
            rs.Write(content, 0, content.Length);
        }

        // try to submit the request and get a response.
        try
        {
            r = (HttpWebResponse)dedRequest.GetResponse();
            page = GetPageContent(r);
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex);
            page = "";
        }

        return page;
    }
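
For anyone trying to reproduce this: `GetPageContent` is not shown in the question. Below is a minimal sketch of what it might look like, assuming it simply reads the response body into a string as UTF-8. The class name `PageHelpers` and the methods `ReadAll` and `ExtractHarnessId` are hypothetical, added only for illustration. `ExtractHarnessId` uses the same pattern as `GetPage` but returns `null` when the hidden input is missing, which makes it easier to notice when the first page did not contain a harness ID at all.

```csharp
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

internal static class PageHelpers
{
    // Hypothetical stand-in for the GetPageContent helper used in GetPage.
    // Assumes the body is UTF-8; the real helper may honor the response charset.
    public static string GetPageContent(HttpWebResponse response)
    {
        using (response)
        using (var stream = response.GetResponseStream())
        {
            return ReadAll(stream);
        }
    }

    // Reads an entire stream as UTF-8 text; returns an empty string for a null stream.
    public static string ReadAll(Stream stream)
    {
        if (stream == null) return string.Empty;
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Same pattern as in GetPage, but returns null instead of an empty string
    // when the pzHarnessID input is not present in the page.
    public static string ExtractHarnessId(string page)
    {
        var match = Regex.Match(page, "<input type='hidden' id='pzHarnessID' value='(.*?)'");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With helpers like these, `GetPage` could check for a `null` harness ID before building the second URL, rather than silently sending an empty `pzHarnessID`.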

Expected results (from within the page):

    <tr class="oddRow cellCont" oaargs="NJJ-FW-CIVCaseMgtFW-Data-Case','" id="$PD_DiscoveryEndDateSearch_pa51669961004672418pz$ppxResults$l1" pl_index="1">
        <td title="" data-importance="secondary" data-attribute-name="Venue" style="height:22px;" class="dataValueRead gridCell    gridCellSelected" tabindex="0">
            <div class="oflowDivM ">
                <span>MIDDLESEX</span>
            </div>
        </td>
        <td title="" data-importance="secondary" data-attribute-name="Docket #" headers="a2" style="height:20px;" class="gridCell  gridCellSelected">
            <div class="oflowDivM ">
                <span>
                    <a href="#" onclick="pd(event);" name="DiscoveryEndDateSearchResults_D_DiscoveryEndDateSearch_pa51669961004672418pz.pxResults(1)_24" data-ctl="Link" data-click="[[&quot;processAction&quot;, [&quot;DiscoveryCaseDetail&quot;,&quot;true&quot;,&quot;:event&quot;,&quot;&quot;,&quot;Rule-HTML-Section&quot;,&quot;&quot;,&quot;CIVModalTemplate&quot;,&quot;anim-bottom&quot;,&quot;anim-bottom&quot;,&quot;NJJ-FW-CIVCaseMgtFW-Data-Case&quot;,&quot;NJJ-FW-CIVCaseMgtFW-Data-Case&quot;,&quot;false&quot;,&quot;false&quot;]]]" class="">L-943-17</a>
                </span>
            </div>
        </td>
        <td title="" data-importance="secondary" data-attribute-name="Consolidation" headers="a3" style="height:20px;" class="dataValueRead gridCell  gridCellSelected">
            <div class="oflowDivM ">
                <span>N</span>
            </div>
        </td>
        <td title="" data-importance="secondary" data-attribute-name="Caption" headers="a4" style="height:20px;" class="dataValueRead gridCell  gridCellSelected">
            <div class="oflowDivM ">
                <span>PULTRO JESSICA A VS WEGMANS FOOD MARKETS INC</span>
            </div>
        </td>
        <td title="" data-importance="secondary" data-attribute-name="Discovery End Date" headers="a5" style="height:20px;" class="dataValueRead gridCell  gridCellSelected">
            <div class="oflowDivM ">
                <span>5/10/18</span>
            </div>
        </td>
    </tr>

Actual results:

    <div class='field-item dataLabelWrite warning_dataLabelWrite' >**Search returned more than 100 records. Please refine the search criteria to get accurate results</div>

1 Answer

0 votes
/ April 1, 2019

This no longer matters: they added reCAPTCHA, so it can't be done with a web request anymore. I'm fairly sure I single-handedly made them add it.
