Hi there,
We are currently deploying multiple Dragonfly clusters that are configured to snapshot to S3, and we are seeing a very odd problem with one of those clusters.
We have created a bucket - let’s call it df-snapshots - and in each of the Dragonfly instances we define the snapshot config as:

```yaml
snapshot:
  cron: '*/5 * * * *'
  dir: s3://df-snapshots/clusterX
```

where X is a number from 1 to 22.
You will notice that there is no trailing slash on the dir config. We found that our cluster1 cluster was failing to start up: it would log that it was searching for a snapshot, eventually time out, and the readiness probes would kill it.
We realised that, without the slash, it was effectively trying to read from cluster1* - a prefix that also matches cluster10 through cluster19, which at the time contained a lot of snapshots. Adding the trailing slash to the dir seemed to fix it: we restarted the pods and they came up instantly.
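To illustrate the prefix behaviour we believe we were hitting, here is a minimal boto3 sketch (bucket and folder names as above, credentials assumed to be configured in the environment; the first page of results is enough to show the collision):

```python
import boto3

s3 = boto3.client("s3")

# Without a trailing slash, "cluster1" is a bare key prefix and also
# matches keys under cluster10/ ... cluster19/:
resp = s3.list_objects_v2(Bucket="df-snapshots", Prefix="cluster1")
print(sorted({obj["Key"].split("/")[0] for obj in resp.get("Contents", [])}))
# e.g. ['cluster1', 'cluster10', 'cluster11', ..., 'cluster19']

# With the trailing slash, only cluster1's own snapshots are listed:
resp = s3.list_objects_v2(Bucket="df-snapshots", Prefix="cluster1/")
print(sorted({obj["Key"].split("/")[0] for obj in resp.get("Contents", [])}))
# ['cluster1']
```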
However, we later found the same issue happening again: pods hanging on startup while searching for a snapshot.
We can replicate this by deleting the cluster1 folder from S3; the pod then starts with the log line:

```
W20260128 11:34:07.311956 1 server_family.cc:945] Load snapshot: No snapshot found
```

but as soon as a snapshot is created in that location and we roll the pod, we get the same hanging behaviour.
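For completeness, "deleting the cluster1 folder" means deleting every object under the cluster1/ prefix, since S3 has no real folders. Roughly what we do, again as an illustrative boto3 sketch with the names from above:

```python
import boto3

# Delete every object whose key starts with "cluster1/", i.e. the
# "folder" holding that cluster's snapshots.
s3 = boto3.resource("s3")
s3.Bucket("df-snapshots").objects.filter(Prefix="cluster1/").delete()
```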
We can edit the DF config to point to a completely different folder - e.g. cluster123 - and it appears to work just fine.
While trying to figure this out, I temporarily upgraded to 1.36.0 (we’re currently on 1.31.0) and saw an equally perplexing issue. In this case, pointing to the cluster1 folder works just fine: I can see snapshots being created and the pods can restart. However, the pods always log the following on startup, even though snapshots clearly exist at that location:

```
W20260128 11:34:07.311956 1 server_family.cc:945] Load snapshot: No snapshot found
```
What is going on here?