Monitoring Celery Queue Length with RabbitMQ

Earlier this week Matthew Schinckel wrote a post about how he monitors Celery queue sizes with a Redis backend.

RabbitMQ is also a popular broker for Celery, and it took us a long time to get good visibility into our task queues and backlogs. So we'd like to share our solution for monitoring RabbitMQ-backed Celery queues, for others who might be in a similar situation.

Do read Matt's post first. I'm going to assume you've done that, and just talk about how this script is different.

Link to the full code!

The trick to getting queue length details is to run the script on the RabbitMQ host itself, where rabbitmqctl is installed and allowed to talk to the cluster. We schedule it with cron.
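For example, a crontab entry on the RabbitMQ host might look like the one below. The script path, interval, and log file are assumptions for illustration, not from the original post.

```shell
# Hypothetical crontab entry: run the queue-stats script every minute,
# as a user permitted to run rabbitmqctl, logging output for debugging.
* * * * * /usr/local/bin/celery-queue-stats.sh >> /var/log/celery-queue-stats.log 2>&1
```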

First, get the distinct virtual hosts in your cluster, dropping the default "/" vhost. You might just have one, but we have several.

VHOSTS=$(/usr/sbin/rabbitmqctl list_vhosts -q | grep -v "/" | xargs -n1 | sort -u)
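To see what that pipeline does without a live cluster, here is a minimal sketch run against simulated rabbitmqctl output. The vhost names are made-up stand-ins; only the grep/xargs/sort pipeline is real.

```shell
#!/usr/bin/env bash
# Simulated `rabbitmqctl list_vhosts -q` output; these vhost names are
# hypothetical examples, including a duplicate and the default "/" vhost.
SAMPLE=$'app_prod\napp_staging\napp_prod\n/'

# Same pipeline as the cron script, minus the rabbitmqctl call:
# drop the "/" vhost, split one name per line, and de-duplicate.
VHOSTS=$(printf '%s\n' "$SAMPLE" | grep -v "/" | xargs -n1 | sort -u)

echo "$VHOSTS"
```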

Next, define the stats to fetch from RabbitMQ:

STATS2READ=( messages_ready messages_unacknowledged consumers )

Finally, loop over the virtual hosts and statistics, and push the metrics to CloudWatch (or send them to New Relic!). EC2ID, ENDPOINT and NS are set elsewhere in the full script:

for VHOST in $VHOSTS; do
    for STATISTIC in "${STATS2READ[@]}"; do
        MDATA=''
        # Each output line is "<value> <queue-name>"; skip Celery's internal
        # pidbox/celeryev queues and rabbitmqctl's header/footer chatter.
        while read -r VALUE QUEUE ; do
            MDATA+="MetricName=$STATISTIC,Value=$VALUE,Unit=Count,Dimensions=[{Name=Queue,Value=$QUEUE},{Name=InstanceId,Value=$EC2ID},{Name=VHost,Value=$VHOST}] "
        done < <(/usr/sbin/rabbitmqctl list_queues -p "$VHOST" "$STATISTIC" name | grep -Ev "pidbox|celeryev|Listing|done")
        # $MDATA is deliberately unquoted so each metric becomes its own
        # argument; skip the call entirely if every queue was filtered out.
        [ -n "$MDATA" ] && /usr/local/bin/aws cloudwatch put-metric-data --endpoint-url "$ENDPOINT" --namespace "$NS" --region ap-southeast-2 --metric-data $MDATA
    done
done
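The inner read-and-accumulate loop is easy to exercise on its own. Here is the same pattern fed with simulated list_queues output (the counts and queue names are invented), showing the MetricData strings it builds up:

```shell
#!/usr/bin/env bash
# Simulated `rabbitmqctl list_queues -p <vhost> messages_ready name` output;
# the counts and queue names below are hypothetical.
SAMPLE=$'Listing queues ...\n12 default\n3 emails\n0 celeryev.abc123\ndone'

MDATA=''
# Same filter and accumulation as the cron script, minus the rabbitmqctl call
# and the extra InstanceId/VHost dimensions.
while read -r VALUE QUEUE ; do
    MDATA+="MetricName=messages_ready,Value=$VALUE,Unit=Count,Dimensions=[{Name=Queue,Value=$QUEUE}] "
done < <(printf '%s\n' "$SAMPLE" | grep -Ev "pidbox|celeryev|Listing|done")

echo "$MDATA"
```

Only the "12 default" and "3 emails" lines survive the grep, so MDATA ends up holding one space-separated metric entry per real queue.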

Use Grafana to visualise the metrics from CloudWatch, and get pretty graphs!

[Grafana graph: Ready Tasks]