In this post I will show you what you can do when an OSD is full and the Ceph cluster is locked.
In theory, OSDs should never become full, and administrators should monitor how full they are. When OSDs approach 80% full, it is time to act before they fill up. Options include re-weighting the affected OSDs and/or adding more OSDs to the cluster.
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
By default, when OSDs reach 85% capacity, the nearfull_ratio warning is triggered.
By default, when OSDs reach 90% capacity, the backfillfull_ratio warning is triggered. At this point the cluster will deny backfilling to the affected OSD.
By default, when OSDs reach 95% capacity, full_ratio is triggered: all PGs (Placement Groups) on the affected OSDs are marked Read Only, as are all pools associated with the PGs on those OSDs. The cluster is marked Read Only to prevent corruption.
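The three thresholds can be sketched as a small classifier. This is only an illustration: `osd_fill_state` and the environment variable names are hypothetical, the defaults are the values printed by `ceph osd dump` above, and treating a value exactly at a threshold as having crossed it is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: classify one OSD utilization, given as a fraction
# (for example 0.87), against the three default ratios.
# Override the thresholds through the environment if your cluster differs.
osd_fill_state() {
    awk -v u="$1" \
        -v near="${NEARFULL_RATIO:-0.85}" \
        -v back="${BACKFILLFULL_RATIO:-0.90}" \
        -v full="${FULL_RATIO:-0.95}" 'BEGIN {
        # Assumption: "at or above" the ratio counts as crossed.
        if      (u + 0 >= full + 0) print "full"
        else if (u + 0 >= back + 0) print "backfillfull"
        else if (u + 0 >= near + 0) print "nearfull"
        else                        print "ok"
    }'
}

osd_fill_state 0.87
```

The function only compares numbers; it does not talk to the cluster, so the utilization has to come from the usage check that follows.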
Check OSD usage:
ceph --connect-timeout=5 osd df tree
To get the cluster out of this state, data needs to be moved off or removed from the full OSDs. In the example below a single OSD (osd.52) is full, but there could be many OSDs marked full.
The first objective is to get the full OSDs below 95% capacity so that the cluster is no longer marked Read Only. This can be achieved by lowering the OSD's reweight value to, for example, .90, .85 or .80.
ceph osd set noout
ceph osd reweight 52 .85
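Because a mistyped weight on a nearly full cluster is easy to regret, it can help to print the commands for review before running them. The sketch below is a hypothetical wrapper (`plan_reweight` is not a Ceph command); it only checks that the OSD id is an integer and that the weight lies between 0 and 1, then echoes the two commands used above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: validate the arguments and print the
# commands instead of executing them.
plan_reweight() {
    osd_id="$1"
    weight="$2"
    # The OSD id must consist of digits only (52, not osd.52).
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "error: OSD id must be a non-negative integer" >&2
            return 1 ;;
    esac
    # The reweight value must be a plain number between 0 and 1.
    if ! awk -v w="$weight" 'BEGIN {
            ok = ((w ~ /^[0-9]*\.?[0-9]+$/) && (w + 0 >= 0) && (w + 0 <= 1))
            exit !ok
        }'; then
        echo "error: weight must be between 0 and 1" >&2
        return 1
    fi
    echo "ceph osd set noout"
    echo "ceph osd reweight $osd_id $weight"
}

plan_reweight 52 .85
```

Once the printed commands look right, they can be copied and run as they are.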
Running ceph osd set-full-ratio .96 changes the full_ratio to 96% and removes the Read Only flag from OSDs that are 95-96% full. If OSDs are 96% full, it is possible to run ceph osd set-full-ratio .97; however, do NOT set this value too high.
Running ceph osd set-backfillfull-ratio .91 changes the backfillfull_ratio to 91% and allows backfill onto OSDs that are 90-91% full. This setting is helpful when multiple OSDs are full.
ceph osd set-nearfull-ratio .90
ceph osd set-backfillfull-ratio .95
ceph osd set-full-ratio .97
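The ratios are fractions, so a value such as 91 instead of .91 is an easy slip. A minimal sanity check, assuming the three values should each lie between 0 and 1 and stay in ascending order (nearfull, then backfillfull, then full), could look like this. The function name `check_ratios` is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the three ratios.
# Arguments: nearfull backfillfull full
check_ratios() {
    awk -v n="$1" -v b="$2" -v f="$3" 'BEGIN {
        r[1] = n; r[2] = b; r[3] = f
        # Each ratio must be a fraction in (0, 1]; this catches 91 vs .91.
        for (i = 1; i <= 3; i++) {
            if (r[i] + 0 <= 0 || r[i] + 0 > 1) {
                print "not a fraction between 0 and 1: " r[i]
                exit 1
            }
        }
        # Assumption: the thresholds are expected to be ascending.
        if (!(n + 0 < b + 0 && b + 0 < f + 0)) {
            print "expected nearfull < backfillfull < full"
            exit 1
        }
        print "ok"
    }'
}

check_ratios .90 .95 .97
```

The check is purely arithmetic and does not change anything on the cluster.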
Now we can either add more OSDs to the cluster or force a rebalance of the data. In this case I will do the latter, because I have no space left in my server for more OSD disks.
ceph balancer on
ceph balancer mode upmap
ceph balancer status
ceph balancer ls
If the percentage is below the 96% threshold, you can set the ratios back to their defaults:
ceph osd set-nearfull-ratio .85
ceph osd set-backfillfull-ratio .90
ceph osd set-full-ratio .95