
Stalled Jobs in BullMQ: What They Are and Why They Matter

Stalled jobs are one of the most common and frustrating issues when working with BullMQ. You run your queues, everything looks fine, and then—boom! A job silently stalls and fails without any actionable error.

In this post, we’ll break down what stalled jobs are, why they happen, and how to prevent and resolve them in real-world scenarios.


What is a “Stalled” Job in BullMQ?

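A stalled job is one that a worker picked up but then stopped renewing the lock for, typically because the worker crashed or its event loop was blocked for too long. While a job is active, the worker holds a lock on it and renews that lock periodically. BullMQ’s stalled-job check detects active jobs whose lock has expired and moves them back to the wait list so they can be retried. Only when a job stalls more often than maxStalledCount allows (once, by default) is it moved to the failed state with the reason "job stalled more than allowable limit".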
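
As a rough mental model, the stalled check can be pictured as a function that inspects an active job's lock and decides what happens next: a job is normally returned to the wait list first and only failed once it exceeds the allowed number of recoveries. The sketch below is a simplified illustration, not BullMQ's actual implementation (which runs as Lua scripts in Redis); checkStalled, lockExpiresAt, and stalledCount are hypothetical names, and the default of one allowed recovery mirrors the worker's maxStalledCount option.

```javascript
// Simplified model of stall handling. NOT BullMQ's real code.
// `checkStalled`, `lockExpiresAt`, and `stalledCount` are hypothetical names.
function checkStalled(job, now, maxStalledCount = 1) {
  // A job with a valid lock (or one that is not active) is left alone.
  if (job.state !== 'active' || now < job.lockExpiresAt) {
    return job;
  }

  const stalledCount = job.stalledCount + 1;

  // Too many stalls: the job is failed with the well-known reason.
  if (stalledCount > maxStalledCount) {
    return {
      ...job,
      stalledCount,
      state: 'failed',
      failedReason: 'job stalled more than allowable limit',
    };
  }

  // Otherwise the job goes back to the wait list for another attempt.
  return { ...job, stalledCount, state: 'waiting' };
}
```

With the default limit, the first expired lock sends the job back to waiting, and the second one fails it.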


Common Causes of Stalled Jobs

  • Worker crashes or memory leaks
  • CPU-heavy or long-running jobs that block the event loop, so the worker cannot renew the job lock in time
  • Redis connectivity issues
  • Insufficient lock duration
  • Concurrency set too high for the work each job does, leaving the event loop too busy to renew locks
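
Several of these causes come down to the worker being unable to renew its lock in time. Because lock renewal runs on timers in the same Node.js event loop as your processor, a long synchronous loop can delay it. A minimal sketch of one mitigation, assuming the work can be split into independent items; processInSlices and handleItem are hypothetical names:

```javascript
// Hypothetical helper: handle items in slices and yield between slices so
// pending timers (such as a worker's lock renewal) get a chance to run.
async function processInSlices(items, handleItem, sliceSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += sliceSize) {
    for (const item of items.slice(i, i + sliceSize)) {
      results.push(handleItem(item));
    }
    // Give the event loop a turn before starting the next slice.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return results;
}
```

Inside a processor this could be called as `await processInSlices(job.data.rows, transformRow)`, where rows and transformRow stand in for your own data and logic.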

🔧 How to Prevent Stalled Jobs

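import { Queue } from 'bullmq';

// Assumed connection settings; point these at your own Redis instance.
const redisConnection = { host: 'localhost', port: 6379 };

const myQueue = new Queue('my-queue', {
  connection: redisConnection,
  defaultJobOptions: {
    removeOnComplete: true, // delete jobs from Redis once they complete
    removeOnFail: false,    // keep failed jobs so they can be inspected
    // No `timeout` here: unlike the older Bull library, recent BullMQ releases
    // have no built-in job timeout, so enforce time limits in the processor.
  },
});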
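
Stall detection is governed by worker options rather than queue options. A hedged sketch of a worker configured for longer jobs follows; the option names are BullMQ worker options, while the connection details, the chosen values, and the processor body are illustrative assumptions to be tuned for your workload:

```javascript
import { Worker } from 'bullmq';

// Assumed connection details; replace with your own Redis settings.
const redisConnection = { host: 'localhost', port: 6379 };

const worker = new Worker(
  'my-queue',
  async (job) => {
    // Placeholder processor: do the real work here.
    return { processed: job.id };
  },
  {
    connection: redisConnection,
    concurrency: 5,         // jobs this worker processes in parallel
    lockDuration: 60000,    // lock lifetime in ms (default is 30000)
    stalledInterval: 30000, // how often stalled jobs are checked, in ms
    maxStalledCount: 1,     // recoveries allowed before the job is failed
  },
);
```

A longer lockDuration gives a busy worker more slack before a job counts as stalled, at the cost of slower recovery when a worker really has died.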

Best Practices:

  • Increase lockDuration for long jobs
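  • Enforce a time limit inside the processor for job-level safety (recent BullMQ releases have no built-in job timeout)
  • Keep concurrency low enough that every active job still gets event-loop time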
  • Monitor worker health using a process watchdog
  • Log everything
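
For a job-level time limit, one option is to enforce the deadline inside the processor yourself. The sketch below assumes promise-based work; withTimeout and doWork are hypothetical names, not BullMQ APIs:

```javascript
// Hypothetical helper: reject if `work` takes longer than `ms` milliseconds.
function withTimeout(work, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Inside a processor (doWork is a placeholder for your own function):
// new Worker('my-queue', (job) => withTimeout(doWork(job), 10000), { connection });
```

This only rejects the job's promise so the job fails promptly; it does not cancel the underlying work, which needs its own cancellation mechanism.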

🛠️ How to Debug Stalled Jobs (with Logs & Tools)

  • Log entry to and exit from the processor function
  • Listen to BullMQ events (stalled, failed, completed) on the Worker or through QueueEvents
  • Observe Redis memory/latency issues
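  • Add a retry strategy (not infinite)

// `job` may be undefined, e.g. when a stalled job hits its limit and
// removeOnFail has already deleted it.
worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id ?? 'unknown'} failed due to ${err.message}`);
});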
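
To see stalls across all workers of a queue, the stalled, failed, and completed events can also be logged from one place. A minimal sketch, assuming a BullMQ QueueEvents instance (any object with a compatible on method works); logQueueEvents is a hypothetical helper, and the payload fields it reads are jobId and failedReason:

```javascript
// Hypothetical helper: attach log lines to queue-level events.
// `queueEvents` is expected to be a BullMQ QueueEvents instance.
function logQueueEvents(queueEvents, log = console.log) {
  queueEvents.on('stalled', ({ jobId }) => log(`stalled: job ${jobId}`));
  queueEvents.on('failed', ({ jobId, failedReason }) =>
    log(`failed: job ${jobId} (${failedReason})`));
  queueEvents.on('completed', ({ jobId }) => log(`completed: job ${jobId}`));
}

// Wiring it up (assumes bullmq is installed and Redis is reachable):
// import { QueueEvents } from 'bullmq';
// logQueueEvents(new QueueEvents('my-queue', { connection: redisConnection }));
```

A "stalled" line followed later by a "completed" line for the same job id usually points to a recovered job, while repeated stalls ending in "failed" suggest the worker cannot keep its lock alive.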


Bonus: Visual Monitoring with Upqueue.io

Keeping track of when and where your jobs stall is hard without the right tools. Upqueue.io helps developers observe and debug stalled jobs more easily — with visual job timelines, failed job alerts, and memory usage monitors that can reveal root causes early.

Upqueue includes:

✅ Failed Job Monitor

✅ Redis Memory Usage Alerts

✅ Connection Health Check

✅ Beautiful UI that highlights stuck queues

(You can start with a 14-day free trial — no credit card needed.)


📈 Summary

Stalled jobs aren’t just annoying; they can seriously hurt your queue reliability. With sensible lock and stall settings, better worker observability, and the right tools like Upqueue, you can stay ahead of these silent failures.