Я рекомендую установить Apify SDK package , где вы можете просто использовать один из вспомогательных классов для управления страницами / браузерами кукловода.
PuppeteerPool : он управляет экземплярами браузерадля тебя. Если вы установите одну страницу для браузера. Каждая новая страница будет создавать новый экземпляр браузера.
const puppeteerPool = new PuppeteerPool({
maxOpenPagesPerInstance: 1,
});
const page1 = await puppeteerPool.newPage();
const page2 = await puppeteerPool.newPage();
const page3 = await puppeteerPool.newPage();
// ... do something with the pages ...
// Close all browsers.
await puppeteerPool.destroy();
Или PuppeteerCrawler более мощный с несколькими опциями и помощниками. Вы можете управлять всем гусеничным ходом в кукольнике. Вы можете проверить пример PuppeteerCrawler .
edit: Пример использования PuppeteerCrawler 10 concurency
const Apify = require('apify');
Apify.main(async () => {
// Apify.openRequestQueue() is a factory to get a preconfigured RequestQueue instance.
// We add our first request to it - the initial page the crawler will visit.
const requestQueue = await Apify.openRequestQueue();
await requestQueue.addRequest({ url: 'https://news.ycombinator.com/' }); // Adds URLs you want to process
// Create an instance of the PuppeteerCrawler class - a crawler
// that automatically loads the URLs in headless Chrome / Puppeteer.
const crawler = new Apify.PuppeteerCrawler({
requestQueue,
maxConcurrency: 10, // Set max concurrency
puppeteerPoolOptions: {
maxOpenPagesPerInstance: 1, // Set up just one page for one browser instance
},
// The function accepts a single parameter, which is an object with the following fields:
// - request: an instance of the Request class with information such as URL and HTTP method
// - page: Puppeteer's Page object (see https://pptr.dev/#show=api-class-page)
handlePageFunction: async ({ request, page }) => {
// Code you want to process on each page
},
// This function is called if the page processing failed more than maxRequestRetries+1 times.
handleFailedRequestFunction: async ({ request }) => {
// Code you want to process when handlePageFunction failed
},
});
// Run the crawler and wait for it to finish.
await crawler.run();
console.log('Crawler finished.');
});
Пример использования RequestList:
const Apify = require('apify');
Apify.main(async () => {
const requestList = new Apify.RequestList({
sources: [
// Separate requests
{ url: 'http://www.example.com/page-1' },
{ url: 'http://www.example.com/page-2' },
// Bulk load of URLs from file `http://www.example.com/my-url-list.txt`
{ requestsFromUrl: 'http://www.example.com/my-url-list.txt', userData: { isFromUrl: true } },
],
persistStateKey: 'my-state',
persistSourcesKey: 'my-sources',
});
// This call loads and parses the URLs from the remote file.
await requestList.initialize();
const crawler = new Apify.PuppeteerCrawler({
requestList,
maxConcurrency: 10, // Set max concurrency
puppeteerPoolOptions: {
maxOpenPagesPerInstance: 1, // Set up just one page for one browser instance
},
// The function accepts a single parameter, which is an object with the following fields:
// - request: an instance of the Request class with information such as URL and HTTP method
// - page: Puppeteer's Page object (see https://pptr.dev/#show=api-class-page)
handlePageFunction: async ({ request, page }) => {
// Code you want to process on each page
},
// This function is called if the page processing failed more than maxRequestRetries+1 times.
handleFailedRequestFunction: async ({ request }) => {
// Code you want to process when handlePageFunction failed
},
});
// Run the crawler and wait for it to finish.
await crawler.run();
console.log('Crawler finished.');
});