Если вы сохраняете некоторую информацию о странице после первого запроса, вы можете перестроить страницу позже без необходимости повторного запроса с сервера.
# 1) store the page information
# uri: a URI instance
# response: a hash of response headers
# body: a string
# code: the HTTP response code
page = agent.get(url)
uri, response, body, code = [page.uri, page.response, page.body, page.code]
# 2) rebuild the page, given the stored information
page = Mechanize::Page.new(uri, response, body, code, agent)
Я использовал эту технику в пауках/ Скреперы, так что код может быть изменен без необходимости повторного запроса всех страниц.Например:
# agent: a Mechanize instance
# storage: must respond to [] and []=, and must accept and return arbitrary ruby objects.
# for in-memory storage, you could use a Hash.
# or, you could write something that is backed by a filesystem, mongodb, riak, redis, s3, etc...
# logger: a Logger instance
class Foobar < Struct.new(:agent, :storage, :logger)
def get_cached(uri)
cache_key = "_cache/#{uri}"
if args = storage[cache_key]
logger.debug("getting (cached) #{uri}")
uri, response, body, code = args
page = Mechanize::Page.new(uri, response, body, code, agent)
agent.send(:add_to_history, page)
page
else
logger.debug("getting (UNCACHED) #{uri}")
page = agent.get(uri)
storage[cache_key] = [page.uri, page.response, page.body, page.code]
page
end
end
end
Что вы могли бы использовать следующим образом:
require 'logger'
require 'pp'
require 'rubygems'
require 'mechanize'
storage = {}
foo = Foobar.new(Mechanize.new, storage, Logger.new(STDOUT))
foo.get_cached("http://ifconfig.me/ua")
foo.get_cached("http://ifconfig.me/ua")
foo.get_cached("http://ifconfig.me/ua")
foo.get_cached("http://ifconfig.me/encoding")
foo.get_cached("http://ifconfig.me/encoding")
pp storage
Который печатает следующую информацию:
D, [2013-10-19T14:13:32.019291 #18107] DEBUG -- : getting (UNCACHED) http://ifconfig.me/ua
D, [2013-10-19T14:13:36.375649 #18107] DEBUG -- : getting (cached) http://ifconfig.me/ua
D, [2013-10-19T14:13:36.376822 #18107] DEBUG -- : getting (cached) http://ifconfig.me/ua
D, [2013-10-19T14:13:36.376910 #18107] DEBUG -- : getting (UNCACHED) http://ifconfig.me/encoding
D, [2013-10-19T14:13:52.830416 #18107] DEBUG -- : getting (cached) http://ifconfig.me/encoding
{"_cache/http://ifconfig.me/ua"=>
[#<URI::HTTP:0x007fe4ac94d098 URL:http://ifconfig.me/ua>,
{"date"=>"Sat, 19 Oct 2013 19:13:33 GMT",
"server"=>"Apache",
"vary"=>"Accept-Encoding",
"content-encoding"=>"gzip",
"content-length"=>"87",
"connection"=>"close",
"content-type"=>"text/plain"},
"Mechanize/2.7.2 Ruby/2.0.0p247 (http://github.com/sparklemotion/mechanize/)\n",
"200"],
"_cache/http://ifconfig.me/encoding"=>
[#<URI::HTTP:0x007fe4ac99d2a0 URL:http://ifconfig.me/encoding>,
{"date"=>"Sat, 19 Oct 2013 19:13:48 GMT",
"server"=>"Apache",
"vary"=>"Accept-Encoding",
"content-encoding"=>"gzip",
"content-length"=>"42",
"connection"=>"close",
"content-type"=>"text/plain"},
"gzip,deflate,identity\n",
"200"]}