and everything should be connected by either 10G or 40G, not sure about the specifics on that
there is nothing to tune on dragonfly for sure
you can run memtier locally
i’ll try that
colocated with dragonfly
so basically you have 96 threads
you can run dragonfly with `--proactor_threads=48`
so it will occupy the first 48 cpus
and you can run memtier with `taskset -c 48-95 memtier_benchmark --threads 48 --ratio=1:0 -n 100000 -c 20`
then they won’t fight over the cpus
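Putting the steps above together (the 96-thread count and the 48/48 split are from this thread; the variable names are just for illustration), a sketch:

```shell
# Sketch of the CPU split described above: dragonfly on the first 48 CPUs,
# memtier on the remaining 48, so the two never contend for cores.
TOTAL=96                        # total hardware threads on the box
HALF=$((TOTAL / 2))             # 48
DF_CPUS="0-$((HALF - 1))"       # CPUs 0-47 for dragonfly
MT_CPUS="$HALF-$((TOTAL - 1))"  # CPUs 48-95 for memtier

echo "dragonfly --proactor_threads=$HALF            # occupies CPUs $DF_CPUS"
echo "taskset -c $MT_CPUS memtier_benchmark --threads $HALF --ratio=1:0 -n 100000 -c 20"
```

The echoes just show the two commands you would actually run side by side.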
i’ll try to run that at some point this week; i’m not immediately concerned with the results and will await the next release. Thanks so much for taking a look and assisting with troubleshooting!
Just updated to 1.13.0 on our QA VM and I was still seeing strange numbers, although this time I think it was related to my overall misunderstanding of how the values should be used in conjunction with each other…
I settled on this Prometheus query:
```
sum by(cmd, agent_hostname) (rate(dragonfly_commands_duration_seconds_total{agent_hostname=~"$agent_hostname"}[$__rate_interval]))
/
sum by(cmd, agent_hostname) (rate(dragonfly_commands_total{agent_hostname=~"$agent_hostname"}[$__rate_interval]))
```
This takes the total latency and does the math to ensure that it’s per command (and averaged). I had naively assumed that the total duration, when rate()'d, was equivalent to the average latency. Now the numbers I’m seeing much more closely match the usec_per_call value in `INFO commandstats`. Think I’m all good here! Will let you know if I see anything strange on the production deployment later today.
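For anyone following along, the division in that query is just “seconds spent / calls made” over the same window. A toy check with made-up counter samples (the numbers are hypothetical, not from this deployment):

```shell
# Two hypothetical scrapes of the two counters, taken one window apart:
#   dragonfly_commands_duration_seconds_total: 120.0  -> 126.0
#   dragonfly_commands_total:                  400000 -> 430000
awk 'BEGIN {
  dur = 126.0 - 120.0     # seconds of command time accrued in the window
  cnt = 430000 - 400000   # commands executed in the window
  printf "%.1f us/call\n", dur / cnt * 1e6
}'
# prints: 200.0 us/call
```

That per-call average is the same quantity usec_per_call reports, which is why the numbers line up.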
so, @spirited-jay was right here: https://discord.com/channels/981533931486724126/1179828110590488646/1180803147241885806
Just deployed to production and all the values in usec_per_call line up with the dashboard
I really appreciate all the help, and maybe some day soon I’ll update with how everything is going on our end with the migrations away from Redis + Memcached in our colo