Building a Message/Job-Queue Optimized for DragonflyDB

I’ve been looking for a message/job queue for a while now and I haven’t found any good options… One of my top requirements is the ability to support QoS based on some arbitrary metric, aka the noisy neighbor problem. For example, we have many customers that try to send a lot of emails at the same time. Using a FIFO means the customer sending 1m emails will use most of the resources, and the customer sending 100 emails may not see any progress until the 1m “block” is done processing.
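One common way to avoid that starvation is per-tenant sub-queues drained round-robin instead of one global FIFO. A minimal in-memory sketch of the idea (all names hypothetical; in Redis terms each tenant would get its own list key):

```javascript
// Round-robin fair dequeueing across tenants. Plain arrays stand in for
// per-tenant Redis lists; `order` is the rotation of active tenants.
class FairQueue {
  constructor() {
    this.tenants = new Map(); // tenantId -> array of pending jobs
    this.order = [];          // rotation order of tenants with work
  }
  push(tenant, job) {
    if (!this.tenants.has(tenant)) {
      this.tenants.set(tenant, []);
      this.order.push(tenant);
    }
    this.tenants.get(tenant).push(job);
  }
  pop() {
    while (this.order.length > 0) {
      const tenant = this.order.shift();
      const jobs = this.tenants.get(tenant);
      const job = jobs.shift();
      if (jobs.length > 0) this.order.push(tenant); // keep tenant in rotation
      else this.tenants.delete(tenant);
      if (job !== undefined) return { tenant, job };
    }
    return null; // queue drained
  }
}

// A tenant with 4 jobs no longer starves a tenant with 1 job:
const q = new FairQueue();
for (let i = 0; i < 4; i++) q.push('big', `email-${i}`);
q.push('small', 'email-0');
const served = [];
let item;
while ((item = q.pop()) !== null) served.push(item.tenant);
console.log(served.join(',')); // big,small,big,big,big
```

The small tenant gets served after a single job from the big tenant, instead of waiting behind all 1m of them.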

We already built a message queue system based on a single (replicated) Redis instance. We are evaluating adding new features or switching to something pre-made (not necessarily Redis based). Our MQ wasn’t designed for a Redis Cluster, and because of how Redis distributes the load via hash slots, I don’t think it ever will be. Being on Redis, I’ve looked at BullMQ Pro, which feature-wise is like 90% of what we want, but because of --cluster_mode=emulated --lock_on_hashtags, BullMQ suffers from the same problem and won’t scale unless we create multiple queues, which we’re trying to avoid. We want to scale our MQ to 250-500k items/min (~4-8k/sec), and I don’t think a single shard could handle that load…

So to achieve our scale, I was thinking of using DragonflyDB and building a new message queue around the new capabilities it offers. Redis Cluster seems to be an ops nightmare; I’d rather scale vertically and drop all the hash slot bs. I understand there is a limit to vertical scaling, but we’re on GCP; if we hit that limit, I’ll be amazed. Do you guys have any tips or guidelines about how to approach this?

The only rules I can think of right now are:

  • No dynamic keys in Lua scripts
  • Maybe use MULTI over Lua

One thing I would really like to see is an LMOVE with a COUNT option… I don’t understand why that’s not supported in Redis :roll_eyes:. Moving multiple items from one list to another in a single atomic instruction seems like such basic functionality…

Is there a way to somehow add something like that in DragonflyDB?

You can easily implement this with Lua.
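For instance, the whole move can be a single Lua script, which Redis/Dragonfly execute atomically. A sketch (not tested against a live server; the plain-JS function below only models the script’s semantics so they can be seen without a Redis instance):

```javascript
// Lua script implementing "LMOVE src dst LEFT RIGHT" repeated COUNT times,
// atomically, returning the moved elements.
const MOVE_N = `
local moved = {}
for i = 1, tonumber(ARGV[1]) do
  local v = redis.call('LMOVE', KEYS[1], KEYS[2], 'LEFT', 'RIGHT')
  if not v then break end
  moved[#moved + 1] = v
end
return moved
`;
// With ioredis it would be invoked roughly like:
//   redis.eval(MOVE_N, 2, 'src', 'dst', 100)

// Pure-JS model of the same semantics:
function lmoveCount(src, dst, count) {
  const moved = [];
  for (let i = 0; i < count && src.length > 0; i++) {
    const v = src.shift(); // LEFT pop from source
    dst.push(v);           // RIGHT push to destination
    moved.push(v);
  }
  return moved;
}

const src = ['a', 'b', 'c'];
const dst = [];
console.log(lmoveCount(src, dst, 2)); // [ 'a', 'b' ]
```

Because Lua scripts run atomically, no other client can observe the lists mid-move, which is exactly the guarantee an LMOVE-with-COUNT would give.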

8k jobs per second is not going to be a problem for a single shard if running on a fast machine, at least from BullMQ’s side.

But I think it would not be unreasonable to consider using 8 queues and partitioning your groups evenly among them. You will then be able to use multithreading and easily surpass your performance requirements.
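The partitioning itself can be as simple as hashing the group id to pick a queue. A sketch (hash function and queue naming are illustrative assumptions, not anything BullMQ provides):

```javascript
// Statically partition groups across N queues by hashing the group id.
const NUM_QUEUES = 8;

// FNV-1a 32-bit hash; any stable string hash works here.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Every job for a given group always lands on the same queue, so ordering
// within a group is preserved while load spreads across the 8 shards.
function queueFor(groupId) {
  return `jobs-${hashString(groupId) % NUM_QUEUES}`;
}

console.log(queueFor('customer-42') === queueFor('customer-42')); // true
```

Each of the 8 queues can then live behind its own hashtag/thread, which is what lets the multithreading pay off.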

Interesting

I guess I underestimated what Bull/DF can do on a single shard. I thought that in the end the consumers/producers would all fight each other.

It will be a single shard in DFLY, but you can use as many workers as you need to perform the actual jobs. Workers can easily be instantiated on as many machines as you need, and/or use something like pm2 to utilize several cores on multicore instances.

Not sure if you have read this post: https://blog.taskforce.sh/implementing-mail-microservice-with-bullmq/

Although it is not using the Pro version I think it may be useful for you.

Thanks for your time. I think we’re gonna start by doing a PoC with BullMQ (the non-Pro version) and see from there.

We just published some benchmark results that you may be interested in: https://bullmq.io/news/101023/dragonfly-compatibility/

The preview image is so great, @jazzy-pug ! <:Laughing_Facepalm:1084929862810222622>

Yeah, I like it a lot too :slightly_smiling_face:

Thank you for publishing the article! Please note that we are still working on further improvements to the Dragonfly/BullMQ integration, and the results will most likely improve even further. It’s just a start :slightly_smiling_face:

Yes, I am aware of it. I explain this in the article as well.

Btw, have you explored the possibility of using LuaJIT?

Lua 5.4 is supposed to be about 2x faster than 5.1, and LuaJIT something like 10x faster. Most game engines use LuaJIT instead of the stock version.

Damn that image xD

Thanks for the post, I’m gonna try to look into it this week

It’s interesting to see that consumers with 100 concurrency seem to reduce the rate of processed jobs per second. It looks like a consumer bottleneck to me. Either that or they’re fighting each other too much.

Also, DF is running on 8 threads; pushing to and consuming from more than 8 queues at the same time doesn’t sound optimal. Might be interesting to dig deeper into that ratio.

@inclusive-numbat Does DF’s hashtag implementation make sure to always reuse the same thread? Are threads locked to their CPU cores, or do they move around? All of those things would probably help with getting more performance out of queues.

Worker concurrency does not reduce speed; on the contrary. Do not confuse worker concurrency, which is capable of running jobs in parallel thanks to async calls in NodeJS, with Dragonfly’s concurrency support. The case with 1 queue and more than 1 thread is what causes a drop in performance, and it is understood why it happens.
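To illustrate the distinction: a single Node process can interleave many I/O-bound jobs at once, so total wall time approaches the slowest job rather than the sum of all jobs. A toy model (names are illustrative, not BullMQ’s API):

```javascript
// Run `jobs` (async functions) with at most `limit` in flight at once.
// This models worker "concurrency" in a single-threaded event loop.
async function runWithConcurrency(jobs, limit) {
  const results = [];
  let next = 0;
  async function worker() {
    // Safe without locks: next is read and incremented synchronously
    // between awaits, and Node's event loop is single-threaded.
    while (next < jobs.length) {
      const i = next++;
      results[i] = await jobs[i]();
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}

// 10 jobs of ~20ms each: with concurrency 10 they finish in roughly 20ms
// of wall time; with concurrency 1 it would take roughly 200ms.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const jobs = Array.from({ length: 10 }, (_, i) => async () => {
  await sleep(20);
  return i;
});
runWithConcurrency(jobs, 10).then((res) => console.log(res.length)); // 10
```

The jobs only overlap while awaiting I/O; CPU-bound work would still serialize, which is where Dragonfly’s own multithreading (and multiple queues) comes in.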