The landscape of web automation has been revolutionized by Puppeteer, a powerful Node.js library that provides a high-level API to control Chrome or Chromium browsers programmatically. Whether you’re building automated testing frameworks, scraping dynamic content, or generating PDF reports, Puppeteer offers unparalleled capabilities for headless browser automation.

In this comprehensive guide, we’ll explore Puppeteer’s advanced features, from basic automation to complex real-world scenarios, complete with production-ready code examples and performance optimization techniques.

puppeteer automation

Why Choose Puppeteer for Browser Automation?

Before diving into implementation details, let’s understand why Puppeteer has become the go-to solution for browser automation:

  • Native Chrome Integration: Direct access to Chrome DevTools Protocol for maximum compatibility
  • Rich API: Comprehensive methods for page interaction, element manipulation, and content extraction
  • Performance: Optimized for speed with built-in performance monitoring capabilities
  • Flexibility: Support for both headless and headed modes, custom extensions, and mobile emulation
  • Production Ready: Robust error handling, retry mechanisms, and enterprise-grade features

Getting Started

Installation and Setup

1
2
3
npm install puppeteer
# or for a specific Chrome version
npm install puppeteer-core

Basic Setup and Configuration

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
import puppeteer from 'puppeteer';

// Basic browser launch
const browser = await puppeteer.launch({
  headless: true,                    // Run in headless mode
  devtools: false,                   // Open DevTools
  slowMo: 0,                        // Slow down operations
  timeout: 30000,                   // Default timeout
  args: [
    '--no-sandbox',                 // Disable sandbox
    '--disable-setuid-sandbox',     // Disable setuid sandbox
    '--disable-dev-shm-usage',      // Overcome limited resource problems
    '--disable-accelerated-2d-canvas',
    '--no-first-run',
    '--no-zygote',
    '--disable-gpu'
  ]
});

// Create a new page
  const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 720 });

// Navigate to a URL
await page.goto('https://example.com', { 
  waitUntil: 'networkidle2',
  timeout: 30000
});

// Close browser when done
  await browser.close();

Core Concepts and Features

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// Advanced navigation with multiple wait conditions
await page.goto('https://example.com', {
  waitUntil: ['load', 'domcontentloaded', 'networkidle0'],
  timeout: 30000
});

// Wait for specific elements
await page.waitForSelector('.dynamic-content');
await page.waitForFunction(() => document.querySelectorAll('.item').length > 0);

// Handle multiple pages
const pages = await browser.pages();
const newPage = await browser.newPage();
await newPage.goto('https://another-site.com');

Element Selection and Manipulation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Multiple selection strategies
const element = await page.$('.selector');                    // Single element
const elements = await page.$$('.selector');                 // Multiple elements
const elementWithText = await page.$x('//div[contains(text(), "Hello")]'); // XPath

// Advanced element interaction
await page.click('.button', { button: 'left', clickCount: 1 });
await page.hover('.hover-element');
await page.focus('input[name="email"]');
await page.type('input[name="email"]', 'user@example.com', { delay: 100 });

// Form submission with validation
await page.select('select[name="country"]', 'US');
await page.check('input[type="checkbox"]');
await page.uncheck('input[type="checkbox"]');

Screenshot and PDF Generation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Advanced screenshot options
await page.screenshot({
  path: 'screenshot.png',
  fullPage: true,
  clip: { x: 0, y: 0, width: 1920, height: 1080 },
  omitBackground: false,
  quality: 90,
  type: 'png'
});

// PDF generation with custom options
await page.pdf({
  path: 'document.pdf',
  format: 'A4',
  printBackground: true,
  margin: {
    top: '20px',
    right: '20px',
    bottom: '20px',
    left: '20px'
  },
  displayHeaderFooter: true,
  headerTemplate: '<div style="font-size: 10px; text-align: center; width: 100%;">Header</div>',
  footerTemplate: '<div style="font-size: 10px; text-align: center; width: 100%;">Footer</div>'
});

Real-World Use Cases

1. E-commerce Product Monitoring System

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
import puppeteer from 'puppeteer';

class ProductMonitor {
  constructor() {
    this.browser = null;
  }
  
  async init() {
      this.browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
      });
    }

  async monitorProduct(productUrl, targetPrice) {
    const page = await this.browser.newPage();
    
    try {
      await page.goto(productUrl, { waitUntil: 'networkidle2' });
      
      // Wait for price element to load
      await page.waitForSelector('.price, [data-testid="price"], .product-price');
      
      const productData = await page.evaluate(() => {
        const priceElement = document.querySelector('.price, [data-testid="price"], .product-price');
        const titleElement = document.querySelector('h1, .product-title, [data-testid="product-title"]');
        const availabilityElement = document.querySelector('.stock, .availability, [data-testid="availability"]');
        
        return {
          title: titleElement?.textContent?.trim() || 'Unknown',
          price: priceElement?.textContent?.trim() || 'N/A',
          availability: availabilityElement?.textContent?.trim() || 'Unknown',
          timestamp: new Date().toISOString()
        };
      });

      const currentPrice = parseFloat(productData.price.replace(/[^0-9.]/g, ''));
      const isPriceDrop = currentPrice <= targetPrice;
      const isInStock = !productData.availability.toLowerCase().includes('out of stock');

      return {
        ...productData,
        isPriceDrop,
        isInStock,
        shouldNotify: isPriceDrop && isInStock
      };

    } finally {
      await page.close();
    }
  }

  async close() {
    if (this.browser) {
      await this.browser.close();
    }
  }
}

// Usage
const monitor = new ProductMonitor();
await monitor.init();

const result = await monitor.monitorProduct(
  'https://example-store.com/product/123',
  29.99
);

if (result.shouldNotify) {
  console.log(`Price drop alert: ${result.title} is now $${result.price}`);
}

2. Automated Social Media Content Scheduler

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
class SocialMediaScheduler {
  constructor(credentials) {
    this.credentials = credentials;
    this.browser = null;
  }

  async init() {
    this.browser = await puppeteer.launch({
      headless: false, // Need to see the browser for social media
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async scheduleTwitterPost(content, imagePath = null) {
    const page = await this.browser.newPage();
    
    try {
      // Navigate to Twitter
      await page.goto('https://twitter.com/compose/tweet');
      
      // Wait for compose area
      await page.waitForSelector('[data-testid="tweetTextarea_0"]');
      
      // Type the content
      await page.type('[data-testid="tweetTextarea_0"]', content);
      
      // Upload image if provided
      if (imagePath) {
        const fileInput = await page.$('input[type="file"]');
        await fileInput.uploadFile(imagePath);
        await page.waitForSelector('[data-testid="tweetButton"]');
      }
      
      // Schedule the tweet (if using Twitter's scheduling feature)
      await page.click('[data-testid="tweetButton"]');
      
      // Wait for confirmation
      await page.waitForSelector('[data-testid="toast"]', { timeout: 5000 });
      
      return { success: true, message: 'Tweet scheduled successfully' };
      
    } catch (error) {
      return { success: false, error: error.message };
    } finally {
      await page.close();
    }
  }

  async scheduleLinkedInPost(content, imagePath = null) {
    const page = await this.browser.newPage();
    
    try {
      await page.goto('https://www.linkedin.com/feed/');
      
      // Click on "Start a post"
      await page.waitForSelector('[data-control-name="compose_post"]');
      await page.click('[data-control-name="compose_post"]');
      
      // Wait for modal to open
      await page.waitForSelector('.ql-editor');
      
      // Type content
      await page.type('.ql-editor', content);
      
      // Upload image if provided
      if (imagePath) {
        await page.click('[data-test-id="image-upload-button"]');
        const fileInput = await page.$('input[type="file"]');
        await fileInput.uploadFile(imagePath);
        await page.waitForSelector('[data-test-id="image-preview"]');
      }
      
      // Post the content
      await page.click('[data-test-id="post-submit-button"]');
      
      return { success: true, message: 'LinkedIn post published' };
      
    } catch (error) {
      return { success: false, error: error.message };
    } finally {
      await page.close();
    }
  }
}

// Usage
const scheduler = new SocialMediaScheduler();
await scheduler.init();

await scheduler.scheduleTwitterPost('Check out our latest product! #innovation', './product-image.jpg');
await scheduler.scheduleLinkedInPost('Excited to share our company milestone...');

3. Advanced Web Application Testing Framework

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
class WebAppTester {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
    this.browser = null;
    this.testResults = [];
  }

  async init() {
    this.browser = await puppeteer.launch({
      headless: false, // Show browser for debugging
      devtools: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async runUserFlow(userFlow) {
    const page = await this.browser.newPage();
    const startTime = Date.now();
    
    try {
      // Set up page monitoring
      await this.setupPageMonitoring(page);
      
      // Execute user flow steps
      for (const step of userFlow.steps) {
        await this.executeStep(page, step);
      }
      
      const endTime = Date.now();
      const duration = endTime - startTime;
      
      this.testResults.push({
        flowName: userFlow.name,
        success: true,
        duration,
        timestamp: new Date().toISOString()
      });
      
      return { success: true, duration };
      
    } catch (error) {
      this.testResults.push({
        flowName: userFlow.name,
        success: false,
        error: error.message,
        timestamp: new Date().toISOString()
      });
      
      return { success: false, error: error.message };
    } finally {
      await page.close();
    }
  }

  async executeStep(page, step) {
    switch (step.action) {
      case 'navigate':
        await page.goto(`${this.baseUrl}${step.url}`, { waitUntil: 'networkidle2' });
        break;
        
      case 'click':
        await page.waitForSelector(step.selector);
        await page.click(step.selector);
        break;
        
      case 'type':
        await page.waitForSelector(step.selector);
        await page.type(step.selector, step.value);
        break;
        
      case 'wait':
        await page.waitForTimeout(step.duration);
        break;
        
      case 'assert':
        const element = await page.$(step.selector);
        if (!element) {
          throw new Error(`Element not found: ${step.selector}`);
        }
        break;
        
      case 'screenshot':
        await page.screenshot({ path: `screenshots/${step.name}.png` });
        break;
    }
  }

  async setupPageMonitoring(page) {
    // Monitor console errors
    page.on('console', msg => {
      if (msg.type() === 'error') {
        console.error('Browser console error:', msg.text());
      }
    });

    // Monitor network requests
    page.on('request', request => {
      console.log('Request:', request.url());
    });

    page.on('response', response => {
      if (!response.ok()) {
        console.error('Failed request:', response.url(), response.status());
      }
    });
  }

  async generateTestReport() {
    const report = {
      totalTests: this.testResults.length,
      passedTests: this.testResults.filter(r => r.success).length,
      failedTests: this.testResults.filter(r => !r.success).length,
      averageDuration: this.testResults.reduce((sum, r) => sum + (r.duration || 0), 0) / this.testResults.length,
      results: this.testResults
    };

    return report;
  }
}

// Usage
const tester = new WebAppTester('https://myapp.com');
await tester.init();

const userFlow = {
  name: 'User Registration Flow',
  steps: [
    { action: 'navigate', url: '/register' },
    { action: 'type', selector: 'input[name="email"]', value: 'test@example.com' },
    { action: 'type', selector: 'input[name="password"]', value: 'password123' },
    { action: 'click', selector: 'button[type="submit"]' },
    { action: 'wait', duration: 2000 },
    { action: 'assert', selector: '.success-message' },
    { action: 'screenshot', name: 'registration-success' }
  ]
};

const result = await tester.runUserFlow(userFlow);
console.log('Test result:', result);

Advanced Features

Custom Page Evaluation Functions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Register custom functions in the browser context
await page.evaluateOnNewDocument(() => {
  // Add custom utility functions
  window.utils = {
    waitForElement: (selector, timeout = 5000) => {
      return new Promise((resolve, reject) => {
        const element = document.querySelector(selector);
        if (element) {
          resolve(element);
          return;
        }
        
        const observer = new MutationObserver(() => {
          const element = document.querySelector(selector);
          if (element) {
            observer.disconnect();
            resolve(element);
          }
        });
        
        observer.observe(document.body, { childList: true, subtree: true });
        
        setTimeout(() => {
          observer.disconnect();
          reject(new Error(`Element ${selector} not found within ${timeout}ms`));
        }, timeout);
      });
    },
    
    scrollToElement: (selector) => {
      const element = document.querySelector(selector);
      if (element) {
        element.scrollIntoView({ behavior: 'smooth', block: 'center' });
        return true;
      }
      return false;
    }
  };
});

// Use custom functions
await page.evaluate(() => {
  return window.utils.waitForElement('.dynamic-content');
});

Advanced Request Interception and Modification

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
class AdvancedScraper {
  constructor() {
    this.browser = null;
    this.requestCache = new Map();
  }

  async init() {
    this.browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async scrapeWithInterception(url, options = {}) {
    const page = await this.browser.newPage();
    
    // Set up request interception
    await page.setRequestInterception(true);
    
    page.on('request', (request) => {
      const requestUrl = request.url();
      
      // Block unnecessary resources
      if (options.blockImages && request.resourceType() === 'image') {
        request.abort();
        return;
      }
      
      if (options.blockCSS && request.resourceType() === 'stylesheet') {
        request.abort();
        return;
      }
      
      // Modify requests
      if (options.addHeaders) {
        const headers = { ...request.headers(), ...options.addHeaders };
        request.continue({ headers });
        return;
      }
      
      // Cache responses
      if (this.requestCache.has(requestUrl)) {
        const cachedResponse = this.requestCache.get(requestUrl);
        request.respond(cachedResponse);
        return;
      }
      
      request.continue();
    });
    
    // Cache responses
    page.on('response', async (response) => {
      const url = response.url();
      if (response.ok() && options.cacheResponses) {
        try {
          const buffer = await response.buffer();
          this.requestCache.set(url, {
            status: response.status(),
            headers: response.headers(),
            body: buffer
          });
        } catch (error) {
          console.warn('Failed to cache response:', url);
        }
      }
    });
    
    await page.goto(url, { waitUntil: 'networkidle2' });
    return page;
  }
}

Mobile Device Emulation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
class MobileTester {
  constructor() {
    this.browser = null;
    this.devices = {
      iPhone: {
        name: 'iPhone 12',
        userAgent: 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15',
        viewport: { width: 390, height: 844, deviceScaleFactor: 3 }
      },
      iPad: {
        name: 'iPad Pro',
        userAgent: 'Mozilla/5.0 (iPad; CPU OS 14_0 like Mac OS X) AppleWebKit/605.1.15',
        viewport: { width: 1024, height: 1366, deviceScaleFactor: 2 }
      },
      Android: {
        name: 'Pixel 5',
        userAgent: 'Mozilla/5.0 (Linux; Android 11; Pixel 5) AppleWebKit/537.36',
        viewport: { width: 393, height: 851, deviceScaleFactor: 2.75 }
      }
    };
  }

  async init() {
    this.browser = await puppeteer.launch({
      headless: false, // Show browser for mobile testing
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async testOnDevice(url, deviceName) {
    const device = this.devices[deviceName];
    if (!device) {
      throw new Error(`Unknown device: ${deviceName}`);
    }

    const page = await this.browser.newPage();
    
    // Emulate device
    await page.emulate({
      userAgent: device.userAgent,
      viewport: device.viewport,
      deviceScaleFactor: device.viewport.deviceScaleFactor
    });

    // Add touch events support
    await page.evaluateOnNewDocument(() => {
      // Simulate touch events for mouse events
      document.addEventListener('mousedown', (e) => {
        const touchEvent = new TouchEvent('touchstart', {
          touches: [new Touch({ identifier: 1, target: e.target, clientX: e.clientX, clientY: e.clientY })]
        });
        e.target.dispatchEvent(touchEvent);
      });
    });

    await page.goto(url, { waitUntil: 'networkidle2' });
    
    // Test touch interactions
    await page.tap('.mobile-button');
    await page.tap('.mobile-link');
    
    // Test swipe gestures
    await page.evaluate(() => {
      const element = document.querySelector('.swipeable');
      if (element) {
        const touchStart = new TouchEvent('touchstart', {
          touches: [new Touch({ identifier: 1, target: element, clientX: 100, clientY: 100 })]
        });
        const touchEnd = new TouchEvent('touchend', {
          touches: [new Touch({ identifier: 1, target: element, clientX: 200, clientY: 100 })]
        });
        element.dispatchEvent(touchStart);
        element.dispatchEvent(touchEnd);
      }
    });

    return page;
  }
}

Performance Monitoring and Optimization

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
class PerformanceMonitor {
  constructor() {
    this.browser = null;
    this.metrics = [];
  }

  async init() {
    this.browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async measurePagePerformance(url) {
    const page = await this.browser.newPage();
    
    // Start performance tracing
    await page.tracing.start({ 
      path: `trace-${Date.now()}.json`,
      screenshots: true 
    });

    const startTime = Date.now();
    
    // Monitor performance metrics
    const performanceMetrics = await page.evaluateOnNewDocument(() => {
      window.performanceMetrics = {
        navigationStart: performance.timing.navigationStart,
        loadEventEnd: 0,
        domContentLoaded: 0,
        firstPaint: 0,
        firstContentfulPaint: 0
      };

      // Capture First Paint and First Contentful Paint
      const observer = new PerformanceObserver((list) => {
        for (const entry of list.getEntries()) {
          if (entry.name === 'first-paint') {
            window.performanceMetrics.firstPaint = entry.startTime;
          }
          if (entry.name === 'first-contentful-paint') {
            window.performanceMetrics.firstContentfulPaint = entry.startTime;
          }
        }
      });
      
      observer.observe({ entryTypes: ['paint'] });
    });

    await page.goto(url, { waitUntil: 'networkidle2' });
    
    // Get performance metrics
    const metrics = await page.evaluate(() => {
      const timing = performance.timing;
      const navigation = performance.getEntriesByType('navigation')[0];
      
      return {
        loadTime: timing.loadEventEnd - timing.navigationStart,
        domContentLoaded: timing.domContentLoadedEventEnd - timing.navigationStart,
        firstPaint: window.performanceMetrics?.firstPaint || 0,
        firstContentfulPaint: window.performanceMetrics?.firstContentfulPaint || 0,
        totalSize: navigation?.transferSize || 0,
        domSize: document.documentElement.innerHTML.length
      };
    });

  await page.tracing.stop();
  
    const endTime = Date.now();
    const totalTime = endTime - startTime;

    this.metrics.push({
      url,
      timestamp: new Date().toISOString(),
      ...metrics,
      totalTime
    });

    await page.close();
    return metrics;
  }

  async optimizePagePerformance(url) {
    const page = await this.browser.newPage();
    
    // Block unnecessary resources
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      const resourceType = request.resourceType();
      if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
        request.abort();
      } else {
        request.continue();
      }
    });

    // Preload critical resources
    await page.evaluateOnNewDocument(() => {
      const criticalCSS = `
        body { font-family: system-ui, sans-serif; }
        .critical { display: block; }
      `;
      
      const style = document.createElement('style');
      style.textContent = criticalCSS;
      document.head.appendChild(style);
    });

    const startTime = Date.now();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    const endTime = Date.now();

    return {
      optimizedLoadTime: endTime - startTime,
      timestamp: new Date().toISOString()
    };
  }

  generateReport() {
    const report = {
      totalTests: this.metrics.length,
      averageLoadTime: this.metrics.reduce((sum, m) => sum + m.loadTime, 0) / this.metrics.length,
      averageFirstPaint: this.metrics.reduce((sum, m) => sum + m.firstPaint, 0) / this.metrics.length,
      slowestPage: this.metrics.reduce((slowest, current) => 
        current.loadTime > slowest.loadTime ? current : slowest
      ),
      fastestPage: this.metrics.reduce((fastest, current) => 
        current.loadTime < fastest.loadTime ? current : fastest
      ),
      metrics: this.metrics
    };

    return report;
  }
}

Best Practices and Performance Optimization

1. Browser Instance Management

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
class BrowserPool {
  constructor(maxBrowsers = 5) {
    this.maxBrowsers = maxBrowsers;
    this.browsers = [];
    this.availableBrowsers = [];
  }

  async getBrowser() {
    if (this.availableBrowsers.length > 0) {
      return this.availableBrowsers.pop();
    }

    if (this.browsers.length < this.maxBrowsers) {
      const browser = await puppeteer.launch({
        headless: true,
        args: [
          '--no-sandbox',
          '--disable-setuid-sandbox',
          '--disable-dev-shm-usage',
          '--disable-accelerated-2d-canvas',
          '--no-first-run',
          '--no-zygote',
          '--disable-gpu',
          '--disable-web-security',
          '--disable-features=VizDisplayCompositor'
        ]
      });
      
      this.browsers.push(browser);
      return browser;
    }

    // Wait for a browser to become available
    return new Promise((resolve) => {
      const checkAvailable = () => {
        if (this.availableBrowsers.length > 0) {
          resolve(this.availableBrowsers.pop());
        } else {
          setTimeout(checkAvailable, 100);
        }
      };
      checkAvailable();
    });
  }

  releaseBrowser(browser) {
    this.availableBrowsers.push(browser);
  }

  async closeAll() {
    await Promise.all(this.browsers.map(browser => browser.close()));
    this.browsers = [];
    this.availableBrowsers = [];
  }
}

2. Advanced Error Handling and Retry Logic

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
class RobustPuppeteer {
  constructor(options = {}) {
    this.maxRetries = options.maxRetries || 3;
    this.retryDelay = options.retryDelay || 2000;
    this.timeout = options.timeout || 30000;
  }

  async executeWithRetry(operation, context = {}) {
    let lastError;
    
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        return await operation(context);
      } catch (error) {
        lastError = error;
        console.warn(`Attempt ${attempt} failed:`, error.message);
        
        if (attempt < this.maxRetries) {
          const delay = this.retryDelay * Math.pow(2, attempt - 1); // Exponential backoff
          await this.sleep(delay);
        }
      }
    }
    
    throw new Error(`Operation failed after ${this.maxRetries} attempts. Last error: ${lastError.message}`);
  }

  async safeScrape(url, selectors) {
    return this.executeWithRetry(async (context) => {
      const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
      });
      
      try {
        const page = await browser.newPage();
        page.setDefaultTimeout(this.timeout);
        
        // Set up error monitoring
        page.on('pageerror', (error) => {
          console.error('Page error:', error.message);
        });
        
        page.on('requestfailed', (request) => {
          console.warn('Request failed:', request.url(), request.failure().errorText);
        });
        
        await page.goto(url, { 
          waitUntil: 'networkidle2',
          timeout: this.timeout
        });
        
        const data = await page.evaluate((sel) => {
          const result = {};
          for (const [key, selector] of Object.entries(sel)) {
            const element = document.querySelector(selector);
            result[key] = element ? element.textContent.trim() : null;
          }
          return result;
        }, selectors);
        
      return data;
      
      } finally {
        await browser.close();
      }
    });
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

3. Memory Management and Resource Optimization

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
class OptimizedScraper {
  constructor() {
    this.browser = null;
    this.activePages = new Set();
  }

  async init() {
    this.browser = await puppeteer.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-accelerated-2d-canvas',
        '--no-first-run',
        '--no-zygote',
        '--disable-gpu',
        '--memory-pressure-off',
        '--max_old_space_size=4096'
      ]
    });
  }

  async createOptimizedPage() {
    const page = await this.browser.newPage();
    this.activePages.add(page);
    
    // Optimize page settings
    await page.setViewport({ width: 1280, height: 720 });
    await page.setJavaScriptEnabled(true);
    await page.setDefaultNavigationTimeout(30000);
    
    // Block unnecessary resources
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      const resourceType = request.resourceType();
      if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
        request.abort();
      } else {
        request.continue();
      }
    });
    
    // Monitor memory usage
    page.on('close', () => {
      this.activePages.delete(page);
    });
    
    return page;
  }

  async cleanup() {
    // Close all active pages
    await Promise.all([...this.activePages].map(page => page.close()));
    this.activePages.clear();
    
    // Force garbage collection if available
    if (global.gc) {
      global.gc();
    }
  }

  async close() {
    await this.cleanup();
    if (this.browser) {
      await this.browser.close();
      this.browser = null;
    }
  }
}

4. Production Monitoring and Logging

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
class ProductionPuppeteer {
  constructor(config = {}) {
    this.config = {
      logLevel: config.logLevel || 'info',
      enableMetrics: config.enableMetrics || true,
      maxConcurrent: config.maxConcurrent || 5,
      ...config
    };
    this.metrics = {
      totalRequests: 0,
      successfulRequests: 0,
      failedRequests: 0,
      averageResponseTime: 0
    };
  }

  log(level, message, data = {}) {
    const timestamp = new Date().toISOString();
    const logEntry = {
      timestamp,
      level,
      message,
      data,
      pid: process.pid
    };
    
    console.log(JSON.stringify(logEntry));
  }

  async executeWithMonitoring(operation, context = {}) {
    const startTime = Date.now();
    this.metrics.totalRequests++;
    
    try {
      this.log('info', 'Starting operation', { context });
      
      const result = await operation(context);
      
      const duration = Date.now() - startTime;
      this.metrics.successfulRequests++;
      this.updateAverageResponseTime(duration);
      
      this.log('info', 'Operation completed successfully', { 
        duration,
        context 
      });
      
      return result;
      
    } catch (error) {
      const duration = Date.now() - startTime;
      this.metrics.failedRequests++;
      
      this.log('error', 'Operation failed', {
        error: error.message,
        stack: error.stack,
        duration,
        context
      });
      
      throw error;
    }
  }

  updateAverageResponseTime(duration) {
    const total = this.metrics.successfulRequests + this.metrics.failedRequests;
    this.metrics.averageResponseTime = 
      (this.metrics.averageResponseTime * (total - 1) + duration) / total;
  }

  getMetrics() {
    return {
      ...this.metrics,
      successRate: this.metrics.totalRequests > 0 
        ? (this.metrics.successfulRequests / this.metrics.totalRequests) * 100 
        : 0
    };
  }
}

Final Thoughts: Puppeteer in Production

Puppeteer represents a paradigm shift in web automation, offering:

  • Reliability: Robust error handling and retry mechanisms for production environments
  • Performance: Optimized resource management and concurrent execution capabilities that leverage Node.js event loop efficiency
  • Flexibility: Support for complex scenarios including mobile testing, performance monitoring, and advanced scraping
  • Scalability: Built-in features for handling large-scale automation tasks with proper Node.js clustering strategies
  • Monitoring: Comprehensive logging and metrics collection for operational insights using production monitoring techniques

Whether you’re building automated testing suites, data extraction pipelines, or performance monitoring systems, Puppeteer provides the foundation for creating robust, scalable automation solutions. The combination of Chrome’s rendering engine and Node.js’s flexibility creates endless possibilities for innovative applications.

The key to success with Puppeteer lies in understanding its capabilities, implementing proper resource management, and following production best practices. With the techniques and patterns outlined in this guide, you’re equipped to build enterprise-grade automation systems that can handle the most demanding real-world scenarios.

Further Reading