Running scheduled tasks in Heroku
Many of my projects often times have a background job that runs on a schedule:
- Text me a reminder if something happened or didn’t happen.
- Cleanup jobs for files, databases, caches, etc.
- Polling for status (e.g. a queue) and acting based on the event.
All things that in a different era I would have used the old and reliable cron.
In the new brave world of PaaS things work a slightly different way (better!). Because I host all my apps on Heroku, I heavily relied on Heroku’s built in Scheduler add-on. Despite disclaimers, it has been working great for years with no misses.
But, alas, it’s scheduling flexibility is limited so I started doing some research and, of course, I found a module on npm that gives me all the flexibility I need, and takes me back 25 years when I was a Unix admin.
I decided to structure my cronjobs using some conventions.
- Every project with a job, will have a folder
cronjobs
- An
index.js
file will export all jobs
function run(){
//Do something
}
module.exports = [{
name: 'Awesome job',
schedule: "0 * * * *", //Every hour
description: "Awesome job description",
job: () => {
run();
}
}];
Notice it exports an array
. In some cases, I have many jobs for an app. I can easily arrange all these in different files under the same folder:
/appX
/cronjobs
index.js
job1.js
job2.js
And then:
//JOB1
function run(){
//Do something
}
module.exports = {
name: 'job1',
schedule: "0 8 * * *", //Every day @ 8:00AM
description: "job1 description",
job: () => {
run();
}
};
index.js
would simply:
module.exports = [
require('./job1'),
require('./job2')
];
- All jobs are hosted in Heroku’s custom clock process which only requires defining an entry in the project’s
ProcFile
web: node index.js
clock: node clock.js
- Last but not least, the actual host process. Here’s where CronJob’s are instantiated. Because I am using this file convention, I wrote a little bit of a generic loader so I can simply add new jobs as needed and they will be automagically picked up, loaded and ran at the desired times.
require('dotenv').config();
const CronJob = require('cron').CronJob;
const _ = require('lodash');
const moment = require('moment-timezone');
const { log } = console;
var normalizedPath = require("path").join(__dirname);
const fs = require("fs");
fs
.readdirSync(normalizedPath + "/apps", { withFileTypes: true })
.forEach((file) => {
//Folders that start with "_" are ignored by convention
if(file.isDirectory() && file.name.startsWith('_') === false){
const cronPath = normalizedPath + "/apps/" + file.name + "/cronjobs";
if(fs.existsSync(cronPath)){
const job = require(cronPath); //Each cronjob file can contain many "crons"
job.forEach((cj) => {
log('Loading ' + cj.name);
log('Schedule: ', cj.schedule);
const cron = new CronJob(cj.schedule, cj.job, null, true, 'America/Los_Angeles');
cron.start();
//Setup a simple timer to show next scheduled run
setInterval(() => {
log(`${cj.name} next run: ${moment(cron.nextDates(1)[0])
.tz('America/Los_Angeles')
.format('YYYY-MM-DD - hh:mm:ss')}`)
}, 10000);
});
}
}
});
Notable missing parts / caveats
- The logging infrastructure is pretty primitive
- Minimal exception handling
- No real monitoring of jobs (just
heroku logs -t -d=clock
)