TJ Solutions

Scrape Runner Tool

2022
Archived — Moved on for now...
NodeJS, Puppeteer

A configurable web scraping framework built with NodeJS and Puppeteer, designed to run multiple scraping jobs with scheduling and error handling.

The Story

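After building several one-off scrapers, I wanted a reusable framework that could handle common scraping patterns. Scrape Runner was designed to be configurable and robust: each new site needed only a small scraper definition, while retries, scheduling, and output were handled by shared framework code.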

Architecture

  • Plugin-based scraper definitions
  • Puppeteer for JavaScript-heavy sites
  • Built-in retry and error handling
  • Scheduling with cron expressions
  • Output to multiple formats (JSON, CSV, DB)
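The post does not include Scrape Runner's source, so the following is only a minimal sketch of what "built-in retry and error handling" might look like, assuming an exponential backoff policy. The helper names `backoffDelay` and `withRetry` and the default delays are hypothetical, not the project's actual API.

```javascript
// Hypothetical helper: the original post does not show Scrape Runner's retry code.
// Delay before retry number `attempt` (1-based): baseMs * 2^(attempt - 1), capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 10_000) {
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}

// Resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs the async `task` up to `maxAttempts` times, waiting backoffDelay(attempt)
// after each failure. If every attempt fails, the last error is rethrown.
async function withRetry(task, { maxAttempts = 3, baseMs = 500, maxMs = 10_000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await sleep(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}

// Usage: a flaky task that fails twice, then succeeds on the third attempt.
let calls = 0;
withRetry(async () => {
  calls += 1;
  if (calls < 3) throw new Error("timeout");
  return "page html";
}, { maxAttempts: 5, baseMs: 10 }).then((result) => {
  console.log(result, "after", calls, "attempts"); // page html after 3 attempts
});
```

Capping the delay keeps later retries from waiting unreasonably long, and rethrowing the last error leaves the caller free to log, skip, or reschedule the job.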

Technical Highlights

The framework used a job queue to manage multiple concurrent scrapes. Puppeteer rendered dynamic, JavaScript-driven content, and a plugin system let new scrapers be defined without touching the core code.
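As an illustration of how a plugin registry and a job queue could fit together, here is a minimal sketch. It assumes plugins are plain objects with a `name` and an async `run(context)` function; `registerScraper`, `JobQueue`, and the concurrency default of 2 are hypothetical names chosen for this example, and the stub plugin stands in for real Puppeteer code.

```javascript
// Hypothetical sketch: the original post does not show Scrape Runner's code,
// so registerScraper and JobQueue are illustrative names only.

// Plugin registry: core code only ever looks scrapers up by name.
const scrapers = new Map();

function registerScraper(plugin) {
  if (typeof plugin.name !== "string" || typeof plugin.run !== "function") {
    throw new TypeError("a scraper plugin needs a string name and a run() function");
  }
  if (scrapers.has(plugin.name)) {
    throw new Error(`scraper "${plugin.name}" is already registered`);
  }
  scrapers.set(plugin.name, plugin);
}

// A FIFO job queue that runs at most `concurrency` scraper jobs at once.
class JobQueue {
  constructor(concurrency = 2) {
    this.concurrency = concurrency;
    this.running = 0;
    this.pending = [];
  }

  // Enqueues a run of the named scraper; resolves with that scraper's result.
  add(scraperName, context = {}) {
    const plugin = scrapers.get(scraperName);
    if (!plugin) {
      return Promise.reject(new Error(`unknown scraper "${scraperName}"`));
    }
    return new Promise((resolve, reject) => {
      this.pending.push({ plugin, context, resolve, reject });
      this.#drain();
    });
  }

  // Starts pending jobs until the concurrency limit is reached.
  #drain() {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const job = this.pending.shift();
      this.running += 1;
      Promise.resolve()
        .then(() => job.plugin.run(job.context)) // sync throws become rejections
        .then(job.resolve, job.reject)
        .finally(() => {
          this.running -= 1;
          this.#drain(); // a slot freed up, so start the next job
        });
    }
  }
}

// Usage: register a stub plugin and queue three runs, two at a time.
registerScraper({
  name: "example",
  // A real plugin would drive Puppeteer here; this stub just echoes its input.
  run: async ({ url }) => ({ url, scrapedAt: Date.now() }),
});

const queue = new JobQueue(2);
Promise.all(
  ["https://a.example", "https://b.example", "https://c.example"].map((url) =>
    queue.add("example", { url })
  )
).then((results) => console.log(results.map((r) => r.url)));
```

Because the queue only resolves plugins by name, adding a scraper means registering one more object rather than editing the queue or scheduler.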

What I Learned

  • Headless Browsers
  • Job Scheduling
  • Error Recovery