Building a Message/Job-Queue Optimized for DragonflyDB

I’ve been looking for a message/job queue for a while now and I haven’t found any good options… One of my top requirements is the ability to support QoS based on some arbitrary metric, aka the noisy neighbor problem. For example, we have many customers that try to send a lot of emails at the same time. Using a FIFO means the customer sending 1m emails will use most of the resources, and the customer sending 100 emails may not see any progress until the 1m “block” is done processing.
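One common way to avoid that starvation is per-tenant sub-queues drained round-robin instead of one global FIFO. A minimal in-memory sketch of the idea (all names hypothetical; in Redis terms each tenant would get its own list key):

```javascript
// Round-robin fair dequeueing across tenants. Plain arrays stand in for
// per-tenant Redis lists; `order` is the rotation of active tenants.
class FairQueue {
  constructor() {
    this.tenants = new Map(); // tenantId -> array of pending jobs
    this.order = [];          // rotation order of tenants with work
  }
  push(tenant, job) {
    if (!this.tenants.has(tenant)) {
      this.tenants.set(tenant, []);
      this.order.push(tenant);
    }
    this.tenants.get(tenant).push(job);
  }
  pop() {
    while (this.order.length > 0) {
      const tenant = this.order.shift();
      const jobs = this.tenants.get(tenant);
      const job = jobs.shift();
      if (jobs.length > 0) this.order.push(tenant); // keep tenant in rotation
      else this.tenants.delete(tenant);
      if (job !== undefined) return { tenant, job };
    }
    return null; // queue drained
  }
}

// A tenant with 4 jobs no longer starves a tenant with 1 job:
const q = new FairQueue();
for (let i = 0; i < 4; i++) q.push('big', `email-${i}`);
q.push('small', 'email-0');
const served = [];
let item;
while ((item = q.pop()) !== null) served.push(item.tenant);
console.log(served.join(',')); // big,small,big,big,big
```

The small tenant gets served after a single job from the big tenant, instead of waiting behind all 1m of them.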

We already built a message queue system based on a single (replicated) Redis instance. We are evaluating adding new features or switching to something pre-made (not necessarily Redis based). Our MQ wasn’t designed for a Redis Cluster, and because of how Redis distributes the load via hash slots, I don’t think it ever will be. Being on Redis, I’ve looked at BullMQ Pro, which feature-wise is like 90% of what we want, but because of --cluster_mode=emulated --lock_on_hashtags, BullMQ suffers from the same problem and won’t scale unless we create multiple queues, which we’re trying to avoid. We want to scale our MQ to 250-500k items/min (~4-8k/sec), and I don’t think a single shard could handle that load…

So to achieve our scale, I was thinking of using DragonflyDB and building a new message queue around the new capabilities it offers. Redis Cluster seems to be an ops nightmare; I’d rather scale vertically and drop all the hash slot bs. I understand there is a limit to vertical scaling, but we’re on GCP; if we hit that limit, I’ll be amazed. Do you guys have any tips or guidelines about how to approach this?

The only rules I can think of right now are:

  • No dynamic keys in Lua scripts
  • Maybe use MULTI over Lua

One thing I would really like to see is an LMOVE with a COUNT option… I don’t understand why that’s not supported in Redis :roll_eyes:. Moving multiple items from one list to another in a single atomic instruction seems like such basic functionality…

Is there a way to somehow add something like that in DragonflyDB?

You can easily implement this with Lua.
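For instance, the whole move can be a single Lua script, which Redis/Dragonfly execute atomically. A sketch (not tested against a live server; the plain-JS function below only models the script’s semantics so they can be seen without a Redis instance):

```javascript
// Lua script implementing "LMOVE src dst LEFT RIGHT" repeated COUNT times,
// atomically, returning the moved elements.
const MOVE_N = `
local moved = {}
for i = 1, tonumber(ARGV[1]) do
  local v = redis.call('LMOVE', KEYS[1], KEYS[2], 'LEFT', 'RIGHT')
  if not v then break end
  moved[#moved + 1] = v
end
return moved
`;
// With ioredis it would be invoked roughly like:
//   redis.eval(MOVE_N, 2, 'src', 'dst', 100)

// Pure-JS model of the same semantics:
function lmoveCount(src, dst, count) {
  const moved = [];
  for (let i = 0; i < count && src.length > 0; i++) {
    const v = src.shift(); // LEFT pop from source
    dst.push(v);           // RIGHT push to destination
    moved.push(v);
  }
  return moved;
}

const src = ['a', 'b', 'c'];
const dst = [];
console.log(lmoveCount(src, dst, 2)); // [ 'a', 'b' ]
```

Because Lua scripts run atomically, no other client can observe the lists mid-move, which is exactly the guarantee an LMOVE-with-COUNT would give.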

8k jobs per second is not going to be a problem for a single shard if running on a fast machine, at least from BullMQ’s side.

But I think it would not be unreasonable to consider using 8 queues and partitioning your groups evenly among them. You will then be able to use multithreading and easily surpass your performance requirements.
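The partitioning itself can be as simple as hashing the group id to pick a queue. A sketch (hash function and queue naming are illustrative assumptions, not anything BullMQ provides):

```javascript
// Statically partition groups across N queues by hashing the group id.
const NUM_QUEUES = 8;

// FNV-1a 32-bit hash; any stable string hash works here.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Every job for a given group always lands on the same queue, so ordering
// within a group is preserved while load spreads across the 8 shards.
function queueFor(groupId) {
  return `jobs-${hashString(groupId) % NUM_QUEUES}`;
}

console.log(queueFor('customer-42') === queueFor('customer-42')); // true
```

Each of the 8 queues can then live behind its own hashtag/thread, which is what lets the multithreading pay off.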

Interesting

I guess I underestimated what Bull/DF can do on a single shard. I thought that in the end the consumers/producers would all fight each other.

It will be a single shard in DFLY, but you can use as many workers as you need to perform the actual jobs. Workers can easily be instantiated on as many machines as you need, and/or use something like pm2 to utilize several cores on multicore instances.

Not sure if you have read this post: https://blog.taskforce.sh/implementing-mail-microservice-with-bullmq/

Although it is not using the Pro version I think it may be useful for you.

Thanks for your time. I think we’re gonna start by doing a PoC with BullMQ (the non-Pro version) and see from there.

We just published some benchmark results that you may be interested in: https://bullmq.io/news/101023/dragonfly-compatibility/

The preview image is so great, @jazzy-pug ! <:Laughing_Facepalm:1084929862810222622>

Yeah, I like it a lot too :slightly_smiling_face:

Thank you for publishing the article! Please note that we are still working on further improvements to the Dragonfly/BullMQ integration, and the results will most likely improve even further. It’s just a start :slightly_smiling_face:

Yes, I am aware of it. I explain this in the article as well.

Btw, have you explored the possibility of using LuaJIT?

Lua 5.4 is supposed to be about 2x faster than 5.1, and LuaJIT something like 10x faster. Most game engines use LuaJIT instead of the stock version.

Damn that image xD

Thanks for the post, I’m gonna try to look into it this week

It’s interesting to see that consumers with 100 concurrency seem to reduce the rate of processed jobs per second. It looks like a consumer bottleneck to me. Either that or they’re fighting each other too much.

Also, DF is running on 8 threads; pushing to and consuming from more than 8 queues at the same time doesn’t sound optimal. Might be interesting to dig deeper into that ratio.

@inclusive-numbat Does DF’s hashtag implementation make sure to always reuse the same thread? Are threads locked to their CPU cores, or do they move around? All of those things would probably help with getting more performance out of queues.

Worker concurrency does not reduce speed; on the contrary. Do not confuse worker concurrency, which is capable of running jobs in parallel thanks to async calls in NodeJS, with Dragonfly’s concurrency support. The case with 1 queue and more than 1 thread is what causes a drop in performance, and it is understood why it happens.
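To illustrate the distinction: a single Node process can interleave many I/O-bound jobs at once, so total wall time approaches the slowest job rather than the sum of all jobs. A toy model (names are illustrative, not BullMQ’s API):

```javascript
// Run `jobs` (async functions) with at most `limit` in flight at once.
// This models worker "concurrency" in a single-threaded event loop.
async function runWithConcurrency(jobs, limit) {
  const results = [];
  let next = 0;
  async function worker() {
    // Safe without locks: next is read and incremented synchronously
    // between awaits, and Node's event loop is single-threaded.
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i]();
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}

// 10 jobs of ~20ms each: with concurrency 10 they finish in roughly 20ms
// of wall time; with concurrency 1 it would take roughly 200ms.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const jobs = Array.from({ length: 10 }, (_, i) => async () => {
  await sleep(20);
  return i;
});
runWithConcurrency(jobs, 10).then((res) => console.log(res.length)); // 10
```

The jobs only overlap while awaiting I/O; CPU-bound work would still serialize, which is where Dragonfly’s own multithreading (and multiple queues) comes in.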