SSD Tiering and Docker

Hi,

I’m new to Dragonfly and am testing it as a replacement for Redis. I’m unable to run Dragonfly in a Docker container with SSD tiering enabled. Even the simplest case doesn’t work:
docker run --network=host --ulimit memlock=-1 -v /tmp/data:/data docker.dragonflydb.io/dragonflydb/dragonfly --tiered_prefix /data --tiered_max_file_size=20G

When I try to run it, it gives me:
I20241024 11:52:26.835927 1 dfly_main.cc:693] Starting dragonfly df-v1.24.0-7870f594660539f48894e37a8fd9d6a133fa21ff
I20241024 11:52:26.836308 1 dfly_main.cc:737] maxmemory has not been specified. Deciding myself…
I20241024 11:52:26.836325 1 dfly_main.cc:746] Found 15.13GiB available memory. Setting maxmemory to 12.10GiB
W20241024 11:52:26.836359 1 dfly_main.cc:370] Weird error 1 switching to epoll
I20241024 11:52:26.914208 1 proactor_pool.cc:147] Running 2 io threads
I20241024 11:52:26.916714 1 engine_shard_set.cc:83] Max file size is: 20.00GiB
F20241024 11:52:26.919064 13 engine_shard.cc:435] Only ioring based backing storage is supported. Exiting…
*** Check failure stack trace: ***
F20241024 11:52:26.919067 12 engine_shard.cc:435] Only ioring based backing storage is supported. Exiting…
*** Check failure stack trace: ***
@ 0x5a56f2b85343 google::LogMessage::SendToLog()
@ 0x5a56f2b85343 google::LogMessage::SendToLog()
@ 0x5a56f2b7db07 google::LogMessage::Flush()
@ 0x5a56f2b7db07 google::LogMessage::Flush()
@ 0x5a56f2b7f48f google::LogMessageFatal::~LogMessageFatal()
@ 0x5a56f2b7f48f google::LogMessageFatal::~LogMessageFatal()
@ 0x5a56f238de4e dfly::EngineShard::InitTieredStorage()
@ 0x5a56f23976ec _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberEN4util3fb219FixedStackAllocatorEZNS6_6detail15WorkerFiberImplIKZNS5_12ProactorPool15AwaitFiberOnAllIZN4dfly14EngineShardSet4InitEjSt8functionIFvvEEEUljPNS6_12ProactorBaseEE0_Li0EEEvOT_EUljSI_E_JRjRSI_EEC4IS7_EESt17basic_string_viewIcSt11char_traitsIcEERKNS0_12preallocatedESL_OSN_SO_SP_EUlOS4_E_EEEEvNS1_10transfer_tE
@ 0x5a56f238de4e dfly::EngineShard::InitTieredStorage()
@ 0x5a56f23976ec _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberEN4util3fb219FixedStackAllocatorEZNS6_6detail15WorkerFiberImplIKZNS5_12ProactorPool15AwaitFiberOnAllIZN4dfly14EngineShardSet4InitEjSt8functionIFvvEEEUljPNS6_12ProactorBaseEE0_Li0EEEvOT_EUljSI_E_JRjRSI_EEC4IS7_EESt17basic_string_viewIcSt11char_traitsIcEERKNS0_12preallocatedESL_OSN_SO_SP_EUlOS4_E_EEEEvNS1_10transfer_tE
@ 0x5a56f298a96f make_fcontext
*** SIGABRT received at time=1729770746 on cpu 0 ***
PC: @ 0x782ec74c09fc (unknown) pthread_kill
[failure_signal_handler.cc : 345] RAW: Signal 11 raised at PC=0x782ec7452898 while already in AbslFailureSignalHandler()
*** SIGSEGV received at time=1729770746 on cpu 0 ***
PC: @ 0x782ec7452898 (unknown) abort

I tried on a VM and on bare metal, with the same result. The drive is an NVMe SSD. When I run it in a Docker container without SSD tiering, everything is fine. When I run it as a standalone app (without Docker) with SSD tiering, it works as well. But the combination of the two (Docker + SSD tiering) doesn’t work.

^ This is where Dragonfly decides to exit. According to our docs, the io_uring API is a critical requirement for SSD tiering.

I’m not familiar with io_uring, but I looked into it for you. In the meantime, I will ask our engineers as well. Also note that SSD tiering is an experimental feature at the moment (2024-10), so issues like this might be expected.

Here’s what I found:

Docker’s support for io_uring has evolved over time, but currently io_uring is not fully supported in Docker’s default configuration. Using Docker is convenient for many development and operational tasks, but for features that depend on specific Linux kernel capabilities, running on native Linux is often necessary.

Stay tuned.

Thanks, that helps a lot. In the worst case I’ll just run it standalone, but there’s another thing that worries me:

Dragonfly’s data tiering focuses on string values exceeding 64 characters in size

Do you know if this ‘64 characters’ can be configured in any way? We’ve got almost 100GB of data, but no single value exceeds this limit. We wanted to use SSD tiering when it gets stable enough, but this requirement makes it useless in our application.

I don’t think it’s configurable at the moment. It may become configurable in the future, but keep in mind that offloading to SSD has some tradeoffs. I suppose that’s why the design started with a 64-character threshold.

Maybe you can share more about your workload, and we can suggest something.

Just checked with the team – Docker has restrictions around io_uring (similar issues here and here), and while --security-opt seccomp=unconfined can be a workaround, consider the maintenance and operational needs with this approach. For best performance and other reasons, we also recommend running Dragonfly on bare metal in many cases.
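For reference, here is a rough sketch of what that workaround could look like when applied to the command from the original post. It only adds the `--security-opt seccomp=unconfined` flag mentioned above; everything else is copied from the original command. The helper name `dragonfly_tiered_cmd` and its default data directory are made up for illustration, and the function only prints the command so it can be reviewed before running. This has not been verified against every Docker/kernel combination, so treat it as a starting point.

```shell
# Hypothetical helper: prints the original docker command with the
# seccomp workaround added. It does NOT start a container by itself.
# Assumption: Docker's default seccomp profile is what blocks io_uring.
dragonfly_tiered_cmd() {
  # Host directory for the tiered files (illustrative default).
  local data_dir="${1:-/tmp/data}"
  echo docker run --network=host --ulimit memlock=-1 \
    --security-opt seccomp=unconfined \
    -v "${data_dir}:/data" \
    docker.dragonflydb.io/dragonflydb/dragonfly \
    --tiered_prefix /data --tiered_max_file_size=20G
}

# Print the command for review; copy-paste it to actually run it.
dragonfly_tiered_cmd /tmp/data
```

Note that `seccomp=unconfined` turns off Docker's syscall filtering for that container entirely, which is the operational tradeoff mentioned above.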

Dragonfly provides efficient in-memory capabilities and is ideal for many fast-response use cases. For small string values, Dragonfly SSD tiering does not help much. Note that it’s a feature for balancing memory and disk usage. And unlike snapshotting, it’s not a persistence feature.

If persistence is a high priority for your use case, other options like ScyllaDB (Cassandra-compatible) or Pika (Redis-compatible with a focus on persistence) may be worth exploring. We want to suggest the best possible solution for you, and if there’s a better fit for Dragonfly, we are always here to help.

Thanks a lot. Currently we use Redis. The thing is that we have a pretty large amount of data (almost 100GB, all of it relatively small keys), which forces us to use at least 128GB VMs. Needless to say, the startup time of Redis with such a database is just pathetic (several minutes).
The way we use the data is this: less than 10% of it is actively used, while the remaining 90+% could be stored on a hard drive with lazy access. We do accept slower access to the data on the hard drive. So for us SSD tiering seemed to be quite an interesting option. But it looks like it doesn’t fit our needs. I’ll take a look at Pika. Something Redis-compatible would be ideal, as we use features like pub-sub and so on.

Yeah, Pika is purely on disk.

^ When that’s the case, I agree that Dragonfly SSD tiering would be a very strong fit… if it didn’t have the 64-character limit.