Cluster pools got marked read-only: OSDs are near full.


In this post I will show you what you can do when an OSD is full and the Ceph cluster is locked.

In theory OSDs should never be full, and administrators should monitor how full they are. If OSDs are approaching 80% full, it’s time for the administrator to take action to prevent them from filling up. Action can include re-weighting the OSDs in question and/or adding more OSDs to the cluster.
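As a quick monitoring sketch (assuming `jq` is installed, and that your Ceph release’s `ceph osd df --format json` output carries a `utilization` field per node — check your version), you can list OSDs above a chosen threshold:

```shell
# List OSDs whose utilization exceeds 80%.
# Field names can differ between Ceph releases -- verify against your JSON output.
ceph osd df --format json | jq -r '.nodes[] | select(.utilization > 80) | "\(.name) \(.utilization)%"'
```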

# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

By default, when an OSD reaches 85% capacity, the nearfull_ratio warning is triggered. When an OSD reaches 90% capacity, the backfillfull_ratio warning is triggered, and the cluster denies backfilling to the OSD in question. When an OSD reaches 95% capacity, the full_ratio is triggered: all PGs (Placement Groups) on the OSD in question are marked read-only, as are all pools associated with those PGs. The cluster is marked read-only to prevent corruption from occurring.
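When one of these ratios is crossed, the cluster health output reflects it, so this is a quick way to see which OSDs are affected (the exact wording of the warnings varies between Ceph releases):

```shell
# Show which OSDs triggered the nearfull/backfillfull/full warnings:
ceph health detail
```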

Check OSD usage

ceph --connect-timeout=5 osd df tree
ceph osd status

+----+--------+-------+-------+--------+---------+--------+---------+-----------+
| id |  host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+--------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ceph01 | 1352G | 1441G |    0   |     0   |    0   |     0   | exists,up |
| 1  | ceph02 | 1104G | 1689G |    0   |     0   |    0   |     0   | exists,up |
| 2  | ceph03 | 1800G |  994G |    0   |     0   |    0   |     0   | exists,up |
| 3  | ceph04 | 1764G | 1030G |    0   |     0   |    0   |     0   | exists,up |
| 4  | ceph01 | 1185G | 1608G |    0   |     0   |    0   |     0   | exists,up |
| 5  | ceph01 | 1107G | 1686G |    0   |     0   |    0   |     0   | exists,up |
| 6  | ceph02 |  614G |  316G |    0   |     0   |    0   |     0   | exists,up |
| 7  | ceph03 |  370G |  560G |    0   |     0   |    0   |     0   | exists,up |
| 8  | ceph03 |  411G |  520G |    0   |     0   |    0   |     0   | exists,up |
| 9  | ceph04 |  493G |  438G |    0   |     0   |    0   |     0   | exists,up |
| 10 | ceph04 |  285G |  645G |    0   |     0   |    0   |     0   | exists,up |
+----+--------+-------+-------+--------+---------+--------+---------+-----------+

To get the cluster out of this state, data needs to be pushed away or removed from the OSDs in question. In the below example it is a single OSD in question (osd.4), but there could be many OSDs that are marked full.
The first objective is to get the full OSDs below 95% capacity, so the cluster is no longer marked read-only. One way to achieve this is to lower the weight of the OSD in question: .90, .85, .80, etc.

ceph osd set noout
ceph osd reweight 4 .85
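After lowering the weight, the cluster starts moving PGs off the OSD; you can follow the progress with the status and usage commands already shown above:

```shell
# Watch recovery/backfill progress and the usage of the reweighted OSD:
ceph -s
ceph osd df tree
```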

Temporary Fix

ceph osd set-full-ratio .96 will change the full_ratio to 96% and remove the read-only flag from OSDs which are 95-96% full. If OSDs are 96% full, it’s possible to set ceph osd set-full-ratio .97; however, do NOT set this value too high.

ceph osd set-backfillfull-ratio .91 will change the backfillfull_ratio to 91% and allow backfill to occur on OSDs which are 90-91% full. This setting is helpful when there are multiple OSDs which are full.

ceph osd set-nearfull-ratio .90
ceph osd set-backfillfull-ratio .95
ceph osd set-full-ratio .97
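After raising the ratios, verify that the new values took effect with the same dump command used earlier:

```shell
ceph osd dump | grep ratio
# full_ratio 0.97
# backfillfull_ratio 0.95
# nearfull_ratio 0.9
```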

Fix by balance

Now we can either add more OSDs to the cluster or force a rebalance of the data. In this case I will do the latter, because I have no more space in my server for additional OSD disks.

ceph balancer on
ceph balancer mode upmap
ceph balancer status
ceph balancer ls
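Before and after enabling the balancer, you can also score how evenly the PGs are distributed:

```shell
# Score the current PG distribution; a lower score is better.
ceph balancer eval
```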

I’m going to run the reweight-by-utilization command on my cluster; this will automatically lower the weight of over-utilized OSDs to redistribute placement groups and rebalance the cluster.

ceph osd reweight-by-utilization

moved 10 / 512 (1.95312%)
avg 51.2
stddev 21.9399 -> 22.4535 (expected baseline 6.78823)
min osd.10 with 20 -> 20 pgs (0.390625 -> 0.390625 * mean)
max osd.3 with 87 -> 90 pgs (1.69922 -> 1.75781 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.4747
overload_utilization 0.5697
osd.6 weight 1.0000 -> 0.9500
osd.7 weight 1.0000 -> 0.9500
osd.8 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500

We can see that Ceph made adjustments to 4 OSDs and only lowered their weight by .05; as I mentioned, only a small change is necessary. The storage was redistributed, so now let’s look at the final result of the reweighting for each OSD.
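If you prefer to preview the changes before applying them, Ceph also provides a dry-run variant that prints the same report without touching any weights:

```shell
# Dry run: report what reweight-by-utilization would do, without applying it.
ceph osd test-reweight-by-utilization
```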

ceph osd tree

ID CLASS WEIGHT   TYPE NAME          STATUS REWEIGHT PRI-AFF 
-1       20.92242 root default                               
-3        8.18697     host ceph01                         
 0   hdd  2.72899         osd.0          up  1.00000 1.00000 
 4   hdd  2.72899         osd.4          up  1.00000 1.00000 
 5   hdd  2.72899         osd.5          up  1.00000 1.00000 
-5        3.63869     host ceph02                         
 1   hdd  2.72899         osd.1          up  0.80005 1.00000 
 6   hdd  0.90970         osd.6          up  0.75006 1.00000 
-7        4.54839     host ceph03                         
 2   hdd  2.72899         osd.2          up  1.00000 1.00000 
 7   hdd  0.90970         osd.7          up  0.75006 1.00000 
 8   hdd  0.90970         osd.8          up  0.75006 1.00000 
-9        4.54839     host ceph04                         
 3   hdd  2.72899         osd.3          up  0.95001 1.00000 
 9   hdd  0.90970         osd.9          up  1.00000 1.00000 
10   hdd  0.90970         osd.10         up  1.00000 1.00000 

If OSD usage is back below the default 95% threshold, you can restore the default ratios:

ceph osd set-nearfull-ratio .85
ceph osd set-backfillfull-ratio .90
ceph osd set-full-ratio .95
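Finally, since the noout flag was set at the beginning, remember to clear it once the cluster is healthy again:

```shell
# Re-enable automatic out-marking of down OSDs:
ceph osd unset noout
```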