I'm trying to run a cURL command from Ruby. The website in question seems to have restrictions that reject requests that don't come from a browser. I found that if I run the following locally in the Rails console, it succeeds:
base = 'https://www.mrporter.com/api/inseason/search/resources/store/mrp_us/productview'
url = 'https://www.mrporter.com/en-us/mens/product/kiton/clothing/short-sleeve-polo-shirts/slim-fit-waffle-knit-cotton-polo-shirt/17957409495068194'
external_id = '17957409495068194'
agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:73.0) Gecko/20100101 Firefox/73.0'
x_ibm_client_id = '16c6e258-e6f8-4015-8c52-697f6e65ad67'
page = `curl -vs '#{base}/#{external_id}?locale=en_GB' -H 'User-Agent: #{agent}' --compressed -H 'Referer: #{url}' -H 'X-Ibm-Client-ID: #{x_ibm_client_id}'`
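Interpolating values into a backtick string runs the command through a shell. With -v, curl's diagnostic output goes to stderr, and only the body (stdout) is captured into page. Below is a minimal sketch that builds the same curl invocation as an argument array and runs it with Ruby's standard Open3.capture3. This keeps the body, the verbose log and the exit status separate. It assumes the variables defined above. The build_curl_argv helper and its tls_max: option are hypothetical names. --tls-max is a real curl option (7.54.0 and later) that caps the negotiated TLS version, and it can serve as a diagnostic.

```ruby
require 'open3'

# Hypothetical helper: builds the curl call from the question as an argv
# array. Passing an array to Open3 bypasses the shell, so values containing
# quotes or spaces need no escaping.
def build_curl_argv(base:, external_id:, agent:, referer:, client_id:, tls_max: nil)
  argv = [
    'curl', '-vs', '--compressed',
    "#{base}/#{external_id}?locale=en_GB",
    '-H', "User-Agent: #{agent}",
    '-H', "Referer: #{referer}",
    '-H', "X-Ibm-Client-ID: #{client_id}"
  ]
  # Optional diagnostic: cap the TLS version curl will negotiate
  # (--tls-max needs curl 7.54.0 or later).
  argv += ['--tls-max', tls_max] if tls_max
  argv
end

# Example (needs network access): stdout holds the body, stderr holds the
# verbose log written by -v, and status is a Process::Status.
#
#   argv = build_curl_argv(base: base, external_id: external_id, agent: agent,
#                          referer: url, client_id: x_ibm_client_id)
#   body, log, status = Open3.capture3(*argv)
```

Using an array instead of a single string also means a value such as the Referer URL can never be reinterpreted by the shell.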
The output looks like this:
[8] pry(main)> page = `curl -vs '#{base}/#{external_id}?locale=en_GB' -H 'User-Agent: #{agent}' --compressed -H 'Referer: #{url}' -H 'X-Ibm-Client-ID: #{x_ibm_client_id}'`
* Trying 184.26.19.177...
* TCP_NODELAY set
* Connected to www.mrporter.com (184.26.19.177) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [222 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4598 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=IT; ST=Milan; L=Milano; O=YOOX NET-A-PORTER GROUP S.p.A.; OU=Application Delivery; CN=corporate.ynap.com
* start date: Jun 6 00:00:00 2019 GMT
* expire date: Sep 4 12:00:00 2020 GMT
* subjectAltName: host "www.mrporter.com" matched cert's "www.mrporter.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f9567000400)
> GET /api/inseason/search/resources/store/mrp_us/productview/17957409495068194?locale=en_GB HTTP/2
> Host: www.mrporter.com
> Accept: */*
> Accept-Encoding: deflate, gzip
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:73.0) Gecko/20100101 Firefox/73.0
> Referer: https://www.mrporter.com/en-us/mens/product/kiton/clothing/short-sleeve-polo-shirts/slim-fit-waffle-knit-cotton-polo-shirt/17957409495068194
> X-Ibm-Client-ID: 16c6e258-e6f8-4015-8c52-697f6e65ad67
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< access-control-allow-methods: GET
< access-control-allow-origin: *
< access-control-expose-headers: APIm-Debug-Trans-Id, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-Global-Transaction-ID
< cached_response: true
< content-encoding: gzip
< content-language: en-US
< content-type: application/json
< etag: 1583405535492
< x-backside-transport: OK OK
< x-envoy-upstream-service-time: 20
< x-global-transaction-id: 1921b6cf5e619c6887edb601
< x-robots-tag: noindex, nofollow, noarchive
< content-length: 3950
< x-edgeconnect-midmile-rtt: 142
< x-edgeconnect-origin-mex-latency: 40
< cache-control: no-transform, max-age=52
< expires: Fri, 06 Mar 2020 00:47:07 GMT
< date: Fri, 06 Mar 2020 00:46:15 GMT
< vary: Accept-Encoding
< set-cookie: geoIP=US; expires=Sat, 07-Mar-2020 00:46:15 GMT; path=/; domain=.mrporter.com
< set-cookie: ak_bmsc=D66B64A17F10849160B617FFB5DEB643C7EFB7E4A47D0000579D615EBE40A073~plHzchkfgrBgq0TMTdRi7yaJ0ggq4X1sntmjxaffGMvL2PfUWN4RDUuaBgTaNcbc8Hv8l89iedHOeGv6rZnD+fOJJ3sCtZbJJHaM7tplhxomDyOTUvxHuOwuE6MRiQvsiqL5/qZcR2yL2J4dBWdRWlurRIc6LZRXYU+z1GCiS6fuetktq1RdFN14g4z8Qip0c1iJ8wxug0PHQ7w+i7QaQcL+K+g/XsAft88b0ieiCsDjQ=; expires=Fri, 06 Mar 2020 02:46:15 GMT; max-age=7200; path=/; domain=.mrporter.com; HttpOnly
< server-timing: cdn-cache; desc=HIT
< server-timing: edge; dur=1
< set-cookie: bm_sz=0E8BB7FAD32C46D0C6D3151410BD69CF~YAAQ5Lfvx8tbXaVwAQAAp51OrQcnC7/OQeZWE5uARSX+5Et49l1oXnIRQUy/jrYArr6Jy3pTDto+e7Z+GLlOzlO1wOmraY0LWoMw0iUhJv4hu6yy6UIskcn2tyzRZ9MnVjhwTUpUKNTX9OFoatCELMzSJQ1hWgwV5H1ADQ5hrWbGo//bwB8PQl772YaBSU/OYHc=; Domain=.mrporter.com; Path=/; Expires=Fri, 06 Mar 2020 04:46:15 GMT; Max-Age=14400; HttpOnly
< set-cookie: _abck=C6010EDE398DD59F044B99C59F6C1339~-1~YAAQ5Lfvx8xbXaVwAQAAp51OrQPDTx4fh5hG52Bs91HRkZ63ZuHyTt0zM9hPQS1w3CUculECJfbr6+GeRDD0l8ZeHnnfUa6D3TyGJ7mmjtquQ/36SMVEJoUt8IAF9IX78HPqrfXIlAm6ObssMip2p2jLer+ao0tEVx9koCJZGGaW83Q2meYEYc8Z9nExfywlMCN8Cx0Oacu3LfT9pKa/fylyiTCAje9B6E4tOrw7nQ3jb4Xn15E7NOZ0BfYSQmD0dFNELX7tNfS6uo/Krl8VenXkkgVNBShVUs+ZDC38ZB/yVuOYvw2ecUZ62R4=~-1~-1~-1; Domain=.mrporter.com; Path=/; Expires=Sat, 06 Mar 2021 00:46:15 GMT; Max-Age=31536000; Secure
<
{ [513 bytes data]
* Connection #0 to host www.mrporter.com left intact
=> "{\"recordSetTotal\":1,\"resourceId\":\"\\/search\\/resources\\/store\\/MRP_US\\/productview\\/17957409495068194?locale=en_GB\",\"recordSetCount\":1,\"recordSetComplete\":\"true\",\"recordSetStartNumber\":0,\"recordSe...
...
<snippet trimmed for brevity>
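The successful response is served as application/json, so the string captured in page can be parsed with Ruby's standard JSON library instead of being handled as raw text. The sketch below is minimal. parse_product_view is a hypothetical name, and it assumes only the recordSet* keys visible in the output above. It returns nil when the body is not JSON, for example an HTML error page.

```ruby
require 'json'

# Hypothetical helper: parse the productview body. Returns a Hash on success,
# or nil if the body is not valid JSON (e.g. an HTML error page).
def parse_product_view(body)
  JSON.parse(body)
rescue JSON::ParserError
  nil
end
```

Usage: data = parse_product_view(page), then read data['recordSetTotal'] only if data is not nil.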
When I run exactly the same thing via heroku run rails c, the results look like this:
irb(main):033:0> page = `curl -vs '#{base}/#{external_id}?locale=en_GB' -H 'User-Agent: #{agent}' --compressed -H 'Referer: #{url}' -H 'X-Ibm-Client-ID: #{x_ibm_client_id}'`
* Trying 104.102.199.162...
* TCP_NODELAY set
* Connected to www.mrporter.com (104.102.199.162) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Unknown (8):
{ [29 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [4603 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=IT; ST=Milan; L=Milano; O=YOOX NET-A-PORTER GROUP S.p.A.; OU=Application Delivery; CN=corporate.ynap.com
* start date: Jun 6 00:00:00 2019 GMT
* expire date: Sep 4 12:00:00 2020 GMT
* subjectAltName: host "www.mrporter.com" matched cert's "www.mrporter.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* Using Stream ID: 1 (easy handle 0x5557c69a6580)
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
> GET /api/inseason/search/resources/store/mrp_us/productview/17957409495068194?locale=en_GB HTTP/2
> Host: www.mrporter.com
> Accept: */*
> Accept-Encoding: deflate, gzip
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:73.0) Gecko/20100101 Firefox/73.0
> Referer: https://www.mrporter.com/en-us/mens/product/kiton/clothing/short-sleeve-polo-shirts/slim-fit-waffle-knit-cotton-polo-shirt/17957409495068194
> X-Ibm-Client-ID: 16c6e258-e6f8-4015-8c52-697f6e65ad67
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [249 bytes data]
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [249 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
} [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
< HTTP/2 403
< mime-version: 1.0
< content-type: text/html
< content-length: 375
< expires: Fri, 06 Mar 2020 01:13:39 GMT
< date: Fri, 06 Mar 2020 01:13:39 GMT
< set-cookie: geoIP=US; expires=Sat, 07-Mar-2020 01:13:39 GMT; path=/; domain=.mrporter.com
< server-timing: cdn-cache; desc=HIT
< server-timing: edge; dur=1
< set-cookie: bm_sz=FA62CF8FD79F074125FF7B574EFBC86B~YAAQZwrGFwmd3GBwAQAAiLNnrQeNNuOv9E++mvhBE9hf3j6TVZKE8+Dd38P+uy41ZTA/LO/3OpX3KchTHWZ2PN6rBUOvoT91i+KC06el+bYLPd0L+15Bwp0V/RV8qvE9GYtwhK6lJHxSZzY2Ivf6+n5QjXrhAaCM4yA/U+7e5pw7AxiBh12OUGr242P3qTUZtrM=; Domain=.mrporter.com; Path=/; Expires=Fri, 06 Mar 2020 05:13:39 GMT; Max-Age=14400; HttpOnly
< set-cookie: _abck=9B6F1ACB95F962EA85B7F3193EB21DF2~-1~YAAQZwrGFwqd3GBwAQAAiLNnrQP/tU1Alx0OCjgP4qSYoU8XgDko7fYPaoSgc0W4+CZ5e4u7zAS/LV0K1BOCq7tHwl7OL/1e1u2+K8F3IDu/j1r/oUagr/U8MfYoQkSpkFrD6vkVY8neDI9Bddh90CMs4ERjwF9pyhJM2vLEkMTzP7PzudNn9jiP6cY4V8AAduVIxuAG5CoaQF7fyQz4EbpYN6gz0k6XpVBQqJOizRj0gr6t0PIUooTUQ0HfkpnLMqiRMpCR30xO4favj8stVHsRtTX8PYybzimu9HhSJUzMRh7gqDEw0Mku44s=~-1~-1~-1; Domain=.mrporter.com; Path=/; Expires=Sat, 06 Mar 2021 01:13:39 GMT; Max-Age=31536000; Secure
<
{ [170 bytes data]
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{ [1 bytes data]
* Connection #0 to host www.mrporter.com left intact
=> "<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don't have permission to access \"http://www.mrporter.com/api/inseason/search/resources/store/mrp_us/productview/17957409495068194?\" on this server.<P>\nReference #18.670ac617.1583457219.9dbe8ce9\n</BODY>\n</HTML>\n"
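On Heroku the request gets a 403 with an HTML page titled Access Denied that carries a Reference # identifier, instead of JSON. The sketch below flags this kind of response from the status and body, so the caller can log the reference or retry. The blocked_response name and its matching rules are assumptions based only on the output above. They are not based on any documented behavior of the site.

```ruby
# Hypothetical check: treat a 403, or an HTML body titled "Access Denied",
# as a blocked request, and extract the "Reference #..." id if present.
def blocked_response(status, body)
  blocked = status.to_i == 403 || body.include?('<TITLE>Access Denied</TITLE>')
  return nil unless blocked

  { status: status.to_i, reference: body[/Reference #(\S+)/, 1] }
end
```

The reference id is useful to keep, since it is the only request-specific detail in the denial page.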
On my local machine, running curl --version gives me this:
❯ curl --version
curl 7.54.0 (x86_64-apple-darwin18.0) libcurl/7.54.0 LibreSSL/2.6.5 zlib/1.2.11 nghttp2/1.24.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz HTTP2 UnixSockets HTTPS-proxy
On Heroku, running heroku run "curl --version" gives me the following:
❯ heroku run "curl --version"
Running curl --version on ⬢ unstitched-app... up, run.7780 (Hobby)
curl 7.58.0 (x86_64-pc-linux-gnu) libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Release-Date: 2018-01-24
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL
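The two curl --version outputs differ in more than the version number. Locally, curl 7.54.0 is linked against LibreSSL/2.6.5, while the Heroku dyno runs curl 7.58.0 with OpenSSL/1.1.1. The sketch below extracts those two fields from the first line of the output so they can be logged side by side from both environments. The helper name and the list of TLS libraries it recognizes are assumptions made for illustration.

```ruby
# Hypothetical helper: extract the curl version and the TLS library from the
# first line of `curl --version` output.
def curl_build_info(version_output)
  first = version_output.lines.first.to_s
  {
    curl: first[/\Acurl (\S+)/, 1],
    tls:  first[/\b(?:OpenSSL|LibreSSL|BoringSSL|GnuTLS|NSS|mbedTLS)\/\S+/]
  }
end
```

Usage from a Rails console: curl_build_info(`curl --version`).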
On the surface, everything looks identical. Could this be related to the cURL version on Heroku, or to some platform setting? Another possibility I considered is that the site might be blocking Heroku's IP ranges, but I don't know how plausible that is.
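One concrete difference between the two verbose logs is the negotiated protocol. The local run reports "SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384", while the Heroku run reports "TLSv1.3 / TLS_AES_256_GCM_SHA384". The sketch below extracts that line from the stderr captured from curl -v, so the two environments can be compared directly. The helper name is hypothetical. A difference in this line is something to test against the IP-range hypothesis; on its own it does not prove the cause.

```ruby
# Hypothetical helper: find the "SSL connection using <version> / <cipher>"
# line in curl's verbose (stderr) output; returns nil if it is absent.
def negotiated_tls(verbose_log)
  m = verbose_log.match(/SSL connection using (\S+) \/ (\S+)/)
  m && { version: m[1], cipher: m[2] }
end
```

Running this on the stderr from both environments gives a quick check of whether the TLS setup changes between them.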
Can anyone spot something I might be missing that causes the 403 on Heroku but not on my local machine?