Replace CEPH SSD journal disk
In this post I will show you how to replace an end-of-life journal SSD in Ceph, both for FileStore (journal) and for BlueStore (DB device).
Replace an SSD disk used as journal for FileStore
Let’s suppose that we need to replace /dev/nvme0n1. This device holds the journal for osd.10 and osd.11:
[root@ceph-osd-02 ~]# ceph device ls | grep ceph-osd-02
ST6000NM0115-1YZ110_ZAD5KF07 ceph-osd-02:sda osd.10
ST6000NM0115-1YZ110_ZAD5N8P7 ceph-osd-02:sdb osd.11
Samsung_SSD_970_EVO_Plus_250GB_S4EUNJ0N111052K ceph-osd-02:nvme0n1 osd.10 osd.11
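If you want to double-check which journal partition belongs to which OSD, on a ceph-disk deployed FileStore OSD the journal is normally a symlink inside the OSD data directory, and the journal_uuid file holds its partition GUID (the paths below assume the default data directory layout):
[root@ceph-osd-02 ~]# ls -l /var/lib/ceph/osd/ceph-10/journal
[root@ceph-osd-02 ~]# cat /var/lib/ceph/osd/ceph-10/journal_uuid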
Let’s tell Ceph not to rebalance the cluster while we stop these OSDs for maintenance:
[root@ceph-osd-02 ~]# ceph osd set noout
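You can verify that the flag is set, for example with:
[root@ceph-osd-02 ~]# ceph osd dump | grep flags
(ceph -s should also report a "noout flag(s) set" health notice while the flag is active.)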
Let’s stop the affected OSDs:
[root@ceph-osd-02 ~]# systemctl stop ceph-osd@10.service
[root@ceph-osd-02 ~]# systemctl stop ceph-osd@11.service
Let’s flush the journals for these OSDs:
[root@ceph-osd-02 ~]# ceph-osd -i 10 --flush-journal
[root@ceph-osd-02 ~]# ceph-osd -i 11 --flush-journal
Back up the nvme0n1 partition table (sfdisk -d produces a dump that can be fed back to sfdisk later):
[root@ceph-osd-02 ~]# sfdisk -d /dev/nvme0n1 > nvme0n1.partition.table.txt
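It may also be worth saving the journal_uuid files of the affected OSDs, since the partitioning script below relies on them (journal_uuids.txt is just an example filename):
[root@ceph-osd-02 ~]# for i in 10 11; do echo -n "osd.$i: "; cat /var/lib/ceph/osd/ceph-$i/journal_uuid; done > journal_uuids.txt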
Let’s replace the device nvme0n1. If the new disk is not clean, zap it first:
[root@ceph-osd-02 ~]# ceph-disk zap /dev/nvme0n1
Let’s partition the new disk, using this script:
#!/bin/bash
# Recreate one journal partition per OSD on the new journal disk.
# The partition GUID of each new partition is set to the journal_uuid saved
# in the OSD data directory, so the existing journal symlinks keep working.
osds="10 11"
journal_disk=/dev/nvme0n1
part_number=0

for osd_id in $osds; do
    part_number=$((part_number+1))
    journal_uuid=$(cat /var/lib/ceph/osd/ceph-$osd_id/journal_uuid)
    echo "journal_uuid: ${journal_uuid}"
    echo "part_number: ${part_number}"
    # 30 GiB partition named 'ceph journal', with the Ceph journal type code
    sgdisk --new=${part_number}:0:+30720M --change-name=${part_number}:'ceph journal' --partition-guid=${part_number}:$journal_uuid --typecode=${part_number}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- $journal_disk
done
Or, alternatively, restore the partition table from the backup:
[root@ceph-osd-02 ~]# sfdisk /dev/nvme0n1 < nvme0n1.partition.table.txt
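Whichever method you used, you can verify the result before recreating the journals, for example:
[root@ceph-osd-02 ~]# partprobe /dev/nvme0n1
[root@ceph-osd-02 ~]# sgdisk -p /dev/nvme0n1
The partition GUIDs printed by sgdisk -i 1 /dev/nvme0n1 (and -i 2) should match the journal_uuid values saved earlier.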
Then recreate the journals on the new partitions:
[root@ceph-osd-02 ~]# ceph-osd --mkjournal -i 10
[root@ceph-osd-02 ~]# ceph-osd --mkjournal -i 11
Let’s restart the OSDs:
[root@ceph-osd-02 ~]# systemctl restart ceph-osd@10.service
[root@ceph-osd-02 ~]# systemctl restart ceph-osd@11.service
Finally, unset noout:
[root@ceph-osd-02 ~]# ceph osd unset noout
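As a final check, the journal symlinks should resolve again and the cluster should go back to HEALTH_OK:
[root@ceph-osd-02 ~]# ls -l /var/lib/ceph/osd/ceph-10/journal /var/lib/ceph/osd/ceph-11/journal
[root@ceph-osd-02 ~]# ceph -s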
Replace an SSD disk used as DB for BlueStore
Let’s suppose that we need to replace /dev/nvme0n1. This device hosts the BlueStore DB (block.db) for osd.10 and osd.11:
[root@ceph-osd-02 ~]# ceph device ls | grep ceph-osd-02
ST6000NM0115-1YZ110_ZAD5KF07 ceph-osd-02:sda osd.10
ST6000NM0115-1YZ110_ZAD5N8P7 ceph-osd-02:sdb osd.11
Samsung_SSD_970_EVO_Plus_250GB_S4EUNJ0N111052K ceph-osd-02:nvme0n1 osd.10 osd.11
Check the LVM partitioning on nvme0n1:
[root@ceph-osd-02 ~]# lsblk
sda 8:0 0 5.5T 0 disk
└─ceph--a2d09b40--caa4--4720--8953--5e86750da005-osd--block--de012ee4--60c4--4623--a98c--20b3256a6587 253:6 0 5.5T 0 lvm
sdb 8:16 0 5.5T 0 disk
└─ceph--01fefec3--2549--40dc--b03e--ea1cbf0c22f1-osd--block--df90bd50--cd26--4306--8c4a--6d97148870e8 253:8 0 5.5T 0 lvm
...
nvme0n1 259:0 0 232.9G 0 disk
├─ceph--2dd99fb0--5e5a--4795--a14d--8fea42f9b4e9-osd--db--6463679d--ccd6--4988--a4fa--6bb0037b8f7a 253:5 0 115G 0 lvm
└─ceph--2dd99fb0--5e5a--4795--a14d--8fea42f9b4e9-osd--db--3b39c364--92cb--41c4--8150--ce7f4bdb4b2c 253:7 0 115G 0 lvm
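ceph-volume can also print the OSD-to-LV mapping directly, which is handy to cross-check the lsblk output:
[root@ceph-osd-02 ~]# ceph-volume lvm list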
With vgdisplay -v we can find the volume groups used for db and block. In our case the relevant ones are:
db: ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
block1: ceph-a2d09b40-caa4-4720-8953-5e86750da005
block2: ceph-01fefec3-2549-40dc-b03e-ea1cbf0c22f1
[root@ceph-osd-02 ~]# vgdisplay -v ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
--- Volume group ---
VG Name ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 5
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 232.88 GiB
PE Size 4.00 MiB
Total PE 59618
Alloc PE / Size 58880 / 230.00 GiB
Free PE / Size 738 / 2.88 GiB
VG UUID f29Vag-1PrI-fo7x-Dvhm-TNDl-2cfY-5hFY33
--- Logical volume ---
LV Path /dev/ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9/osd-db-6463679d-ccd6-4988-a4fa-6bb0037b8f7a
LV Name osd-db-6463679d-ccd6-4988-a4fa-6bb0037b8f7a
VG Name ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
LV UUID UWFXM5-5ZmF-Kb4f-jTqc-KuqZ-IWc7-UjHhXY
LV Write Access read/write
LV Creation host, time ceph-osd-02, 2021-06-04 21:48:01 +0200
LV Status available
# open 12
LV Size 115.00 GiB
Current LE 29440
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:5
--- Logical volume ---
LV Path /dev/ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9/osd-db-3b39c364-92cb-41c4-8150-ce7f4bdb4b2c
LV Name osd-db-3b39c364-92cb-41c4-8150-ce7f4bdb4b2c
VG Name ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
LV UUID e52dYE-FuRK-TMPv-U8vx-38pv-KdfE-R0fwBo
LV Write Access read/write
LV Creation host, time ceph-osd-02, 2021-06-04 21:48:30 +0200
LV Status available
# open 12
LV Size 115.00 GiB
Current LE 29440
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:7
--- Physical volumes ---
PV Name /dev/nvme0n1
PV UUID 4bVZmc-Vku7-rWPd-RHrn-xUFf-WrYb-yidijN
PV Status allocatable
Total PE / Free PE 59618 / 738
I.e. this volume group is used as the DB device for osd.10 and osd.11.
Let’s ‘disable’ these OSDs by setting their CRUSH weight to 0:
[root@c-osd-5 /]# ceph osd crush reweight osd.10 0
reweighted item id 10 name 'osd.10' to 0 in crush map
[root@c-osd-5 /]# ceph osd crush reweight osd.11 0
reweighted item id 11 name 'osd.11' to 0 in crush map
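Ceph will now drain the PGs off these OSDs. You can follow the progress, for example with:
[root@c-osd-5 /]# ceph -s
[root@c-osd-5 /]# ceph osd df tree | grep -E 'osd\.(10|11)'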
Wait until the status is HEALTH_OK. Then destroy the OSDs (be sure to save the block/block.db mappings first, as shown below!):
[root@c-osd-5 /]# ll /var/lib/ceph/osd/ceph-10/ | grep block
lrwxrwxrwx 1 ceph ceph 93 Jun 4 21:48 block -> /dev/ceph-a2d09b40-caa4-4720-8953-5e86750da005/osd-block-de012ee4-60c4-4623-a98c-20b3256a6587
lrwxrwxrwx 1 ceph ceph 90 Jun 4 21:48 block.db -> /dev/ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9/osd-db-6463679d-ccd6-4988-a4fa-6bb0037b8f7a
[root@c-osd-5 /]# ll /var/lib/ceph/osd/ceph-11/ | grep block
lrwxrwxrwx 1 ceph ceph 93 Jun 4 21:48 block -> /dev/ceph-01fefec3-2549-40dc-b03e-ea1cbf0c22f1/osd-block-df90bd50-cd26-4306-8c4a-6d97148870e8
lrwxrwxrwx 1 ceph ceph 90 Jun 4 21:48 block.db -> /dev/ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9/osd-db-3b39c364-92cb-41c4-8150-ce7f4bdb4b2c
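Before going further it may be wise to dump these mappings to a file, since the block and block.db LV names are needed again when recreating the OSDs (the filenames below are just examples):
[root@c-osd-5 /]# ll /var/lib/ceph/osd/ceph-10/ | grep block > osd-10-mapping.txt
[root@c-osd-5 /]# ll /var/lib/ceph/osd/ceph-11/ | grep block > osd-11-mapping.txt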
[root@ceph-osd-02 ~]# ceph osd out osd.10
[root@ceph-osd-02 ~]# ceph osd out osd.11
[root@ceph-osd-02 ~]# ceph osd crush remove osd.10
[root@ceph-osd-02 ~]# ceph osd crush remove osd.11
[root@ceph-osd-02 ~]# systemctl stop ceph-osd@10.service
[root@ceph-osd-02 ~]# systemctl stop ceph-osd@11.service
[root@ceph-osd-02 ~]# ceph auth del osd.10
[root@ceph-osd-02 ~]# ceph auth del osd.11
[root@ceph-osd-02 ~]# ceph osd rm osd.10
[root@ceph-osd-02 ~]# ceph osd rm osd.11
[root@ceph-osd-02 ~]# umount /var/lib/ceph/osd/ceph-10
[root@ceph-osd-02 ~]# umount /var/lib/ceph/osd/ceph-11
Destroy the volume group created on this SSD disk (be sure to have saved the vgdisplay output first!):
[root@ceph-osd-02 ~]# vgdisplay -v ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9 > ceph-vg.txt
[root@ceph-osd-02 ~]# vgremove ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
Replace the SSD disk. Suppose the new device is called /dev/vdk. Recreate the volume group and the logical volumes (refer to the previous vgdisplay output):
[root@c-osd-5 /]# vgcreate ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9 /dev/vdk
Physical volume "/dev/vdk" successfully created.
Volume group "ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9" successfully created
[root@c-osd-5 /]# lvcreate -L 115GB -n osd-db-6463679d-ccd6-4988-a4fa-6bb0037b8f7a ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
Logical volume "osd-db-6463679d-ccd6-4988-a4fa-6bb0037b8f7a" created.
[root@c-osd-5 /]# lvcreate -L 115GB -n osd-db-3b39c364-92cb-41c4-8150-ce7f4bdb4b2c ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9
[root@c-osd-5 /]#
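A quick lvs on the recreated volume group should now show the two DB logical volumes:
[root@c-osd-5 /]# lvs ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9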
Let’s do an lvm zap of the old OSD data devices:
[root@c-osd-5 /]# ceph-volume lvm zap /var/lib/ceph/osd/ceph-10/block
[root@c-osd-5 /]# ceph-volume lvm zap /var/lib/ceph/osd/ceph-11/block
Let’s create the OSDs:
[root@c-osd-5 /]# ceph-volume lvm create --bluestore --data ceph-a2d09b40-caa4-4720-8953-5e86750da005/osd-block-de012ee4-60c4-4623-a98c-20b3256a6587 --block.db ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9/osd-db-6463679d-ccd6-4988-a4fa-6bb0037b8f7a
[root@c-osd-5 /]# ceph-volume lvm create --bluestore --data ceph-01fefec3-2549-40dc-b03e-ea1cbf0c22f1/osd-block-df90bd50-cd26-4306-8c4a-6d97148870e8 --block.db ceph-2dd99fb0-5e5a-4795-a14d-8fea42f9b4e9/osd-db-3b39c364-92cb-41c4-8150-ce7f4bdb4b2c
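Once recreated, the OSDs should register themselves in the CRUSH map with a weight based on their size (assuming the default osd_crush_update_on_start behaviour), and backfill will start. Check with:
[root@c-osd-5 /]# ceph osd tree
[root@c-osd-5 /]# ceph -s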