Scrape Runner Tool
2022 · Archived: Moved on for now...
NodeJS, Puppeteer
A configurable web scraping framework built on NodeJS and Puppeteer, designed to run multiple scraping jobs with scheduling and error handling.
The Story
After building several one-off scrapers, I wanted a reusable framework that could handle common scraping patterns. Scrape Runner was designed to be configurable and robust.
Architecture
- Plugin-based scraper definitions
- Puppeteer for JavaScript-heavy sites
- Built-in retry and error handling
- Scheduling with cron expressions
- Output to multiple formats (JSON, CSV, DB)
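To make the list above more concrete, here is a minimal sketch of how a plugin definition, a retry wrapper, and a CSV output step could fit together. Every name in it (`registerScraper`, `withRetry`, `toCsv`, the `example-headlines` plugin, the example.com URL) is a hypothetical stand-in chosen for illustration, not Scrape Runner's actual API. The only Puppeteer calls used are `page.goto` and `page.$$eval`, and the `schedule` field is assumed to be a standard five-field cron expression.

```javascript
// Hypothetical plugin registry: names and shapes are illustrative only.
const scrapers = new Map();

function registerScraper(plugin) {
  // Each plugin needs a name, a cron schedule, and an async run(page) function.
  for (const key of ["name", "schedule", "run"]) {
    if (!(key in plugin)) throw new Error(`scraper plugin is missing "${key}"`);
  }
  scrapers.set(plugin.name, plugin);
}

// Retry an async function, doubling the wait after each failed attempt.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts failed
}

// Serialize rows (plain objects sharing the same keys) to CSV text.
function toCsv(rows) {
  if (rows.length === 0) return "";
  const columns = Object.keys(rows[0]);
  const escape = (value) => {
    const text = String(value ?? "");
    // Quote fields containing commas, quotes, or newlines.
    return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const body = rows.map((row) => columns.map((c) => escape(row[c])).join(","));
  return [columns.join(","), ...body].join("\n");
}

// Example plugin. `page` is expected to be a Puppeteer Page instance.
registerScraper({
  name: "example-headlines",
  schedule: "0 * * * *", // cron: at minute 0 of every hour
  async run(page) {
    await page.goto("https://example.com/");
    return page.$$eval("h1", (nodes) =>
      nodes.map((n) => ({ title: n.textContent.trim() }))
    );
  },
});
```

In this sketch a runner would look up a plugin by name, call `withRetry(() => plugin.run(page))` with a page from a launched browser, and hand the resulting rows to `toCsv` or `JSON.stringify`. Keeping the plugin down to a plain object is what lets new scrapers be added without touching the core.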
Technical Highlights
The framework used a job queue system to manage multiple concurrent scrapes. Puppeteer handled dynamic content while a plugin system allowed defining new scrapers without touching core code.
What I Learned
- Headless Browsers
- Job Scheduling
- Error Recovery