Ceph Operations Guide
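This guide covers operating a Rook-managed Ceph cluster on Kubernetes: Helm installation, operator and cluster configuration, OSD removal, disk preparation, dashboard exposure, troubleshooting, monitoring, and upgrades.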

Installation (Helm)
Add the Rook Helm repository:
```bash
helm repo add rook-release https://charts.rook.io/release
```
Install the operator:
```bash
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph -f values.yaml
```
Install the cluster:
```bash
helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster -f values.yaml
```
Operator Configuration
Default Values for rook-ceph-operator
Pod Resource Requests & Limits:
```yaml
resources:
  requests:
    cpu: 20m
```
Global Log Level:
```yaml
logLevel: INFO
```
CSI Configuration:
RBD Provisioner Resources:
```yaml
csiRBDProvisionerResource: |
  - name: csi-provisioner
    resource:
      requests:
        cpu: 10m
  # ...
```
RBD Plugin Resources:
```yaml
csiRBDPluginResource: |
  - name: driver-registrar
    resource:
      requests:
        memory: 128Mi
        cpu: 5m
      limits:
        memory: 256Mi
  # ...
```
CephFS Provisioner and Plugin Resources (similar format).
NFS Provisioner and Plugin Resources (similar format).
Monitoring:
```yaml
monitoring:
  enabled: true
```
Cluster Configuration
Toolbox:
```yaml
toolbox:
  enabled: true
  resources:
    requests:
      cpu: '10m'
```
Ceph Cluster Specifications:
```yaml
cephClusterSpec:
  dashboard:
    port: 7000
  labels:
    monitoring:
      release: prometheus-stack
  resources:
    mgr:
      requests:
        cpu: "50m"
    mon:
      requests:
        cpu: "100m"
    # ...
  removeOSDsIfOutAndSafeToRemove: true
```
Removing OSDs
Stop the Rook Operator so it does not recreate the OSD while you work:
```bash
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
```
Mark the OSD as out:
```bash
ceph osd out osd.<ID>
```
Stop the OSD pod and mark the OSD as down:
```bash
kubectl -n rook-ceph scale deployment rook-ceph-osd-<ID> --replicas=0
ceph osd down osd.<ID>
```
Wait for backfilling to complete (all PGs report `active+clean`).
Remove the OSD:
```bash
ceph osd purge <ID> --yes-i-really-mean-it
ceph auth del osd.<ID>
ceph osd crush remove <nodeName>
```
Verify:
```bash
ceph osd tree
```
Restart the Rook Operator by scaling its deployment back up (`--replicas=1`).
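The wait for backfilling can be scripted instead of watched by hand. The following is a minimal sketch, not part of Rook or Ceph: the function names `all_pgs_clean` and `wait_for_clean` are hypothetical, and the parser assumes the plain-text layout of `ceph pg stat` is `<total> pgs: <n> <state>, <n> <state>; ...`. It is deliberately conservative: PGs in states such as `active+clean+scrubbing` are not counted as clean.

```bash
#!/usr/bin/env bash
# Sketch: decide from `ceph pg stat` text whether every PG is exactly active+clean.
# Assumed layout (verify against your Ceph version):
#   "<total> pgs: <n> <state>, <n> <state>; <usage...>"

all_pgs_clean() {
  local line="$1" total clean
  # Total PG count is the number in front of " pgs:".
  total=$(sed -nE 's/^([0-9]+) pgs:.*/\1/p' <<<"$line")
  # Count of PGs whose state is exactly "active+clean" (followed by , ; or end of line).
  clean=$(grep -oE '[0-9]+ active\+clean([,;]|$)' <<<"$line" | grep -oE '^[0-9]+' | head -n1)
  [[ -n "$total" && "$total" == "${clean:-0}" ]]
}

# Hypothetical polling loop; run it where the ceph CLI works (for example the toolbox pod).
wait_for_clean() {
  until all_pgs_clean "$(ceph pg stat)"; do
    sleep 30
  done
}
```

Calling `wait_for_clean` between the "mark down" and "purge" steps blocks until the cluster has finished moving data off the OSD.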
Disk Partitioning
List available disks:
```bash
sudo fdisk -l
```
Partition a disk:
```bash
sudo fdisk /dev/sda  # Use `n` to create and `w` to save.
```
Clearing Devices
Clear partitions:
```bash
sgdisk --zap-all "$DISK"
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
```
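Because both commands destroy data on whatever `$DISK` points at, it may be worth guarding them. The sketch below is an illustration only: `require_block_device` is a hypothetical helper name, and it checks nothing beyond "the variable is set and names a block device".

```bash
#!/usr/bin/env bash
# Sketch: refuse to continue unless the argument names an existing block device.
require_block_device() {
  local disk="${1:-}"
  if [[ -z "$disk" ]]; then
    echo "DISK is empty; refusing to wipe" >&2
    return 1
  fi
  if [[ ! -b "$disk" ]]; then
    echo "$disk is not a block device; refusing to wipe" >&2
    return 1
  fi
}

# Example use (placeholder device name, adjust before running):
#   DISK=/dev/sdX
#   require_block_device "$DISK" && sgdisk --zap-all "$DISK"
```

The guard does not tell you whether the device is the right one, so still confirm the target with `fdisk -l` first.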
Exposing Monitoring GUI
Certificate Definition (Cert Manager):
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ceph-sololude-certificate
  namespace: istio-ingress
spec:
  secretName: ceph-ingress-cert
  commonName: ceph.sololude.com
  dnsNames:
    - ceph.sololude.com
  issuerRef:
    name: sololude-issuer
```
Gateway Definition (Istio):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: rook-ceph-dashboard-gw
  namespace: rook-ceph
spec:
  selector:
    app: istio-ingressgateway
  servers:
    - port:
        number: 443
        name: https-ceph
        protocol: HTTPS
      hosts:
        - ceph.sololude.com
      tls:
        mode: SIMPLE
        credentialName: ceph-ingress-cert
    - port:
        number: 80
        name: http-ceph
        protocol: HTTP
      hosts:
        - ceph.sololude.com
```
Virtual Service Definition (Istio):
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ceph-gateway-vs
  namespace: rook-ceph
spec:
  hosts:
    - ceph.sololude.com
  gateways:
    - rook-ceph-dashboard-gw
  http:
    - route:
        - destination:
            host: rook-ceph-mgr-dashboard
```
Issues and Troubleshooting
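Service Port Change:
Set `cephClusterSpec.dashboard.port=7000` in Helm values.
OSD Keyring Mismatch:
Retrieve the keyring the OSD is using and the key registered in the cluster (`ceph auth get osd.<ID>`), compare them, and resolve the mismatch.
Entity Exists with Key Mismatch:
Delete the older auth entry:
```bash
ceph auth del osd.<ID>
```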
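For the keyring mismatch case, the comparison itself can be sketched in a few lines. This is an assumption-laden illustration: `keyring_key` and `keys_match` are hypothetical helper names, the keyring is assumed to use the usual `key = <value>` line, and the path shown in the usage comment is the standard Ceph OSD data directory, which may differ in your deployment.

```bash
#!/usr/bin/env bash
# Sketch: compare the key in an OSD keyring file with the key registered in the cluster.

# Print the value of the first "key = ..." line from keyring text on stdin.
keyring_key() {
  sed -nE 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*([^[:space:]]+).*$/\1/p' | head -n1
}

# Succeed only when both keys are non-empty and identical.
keys_match() {
  local on_disk="$1" registered="$2"
  [[ -n "$on_disk" && "$on_disk" == "$registered" ]]
}

# Example use (assumes ceph CLI access and the standard keyring path in the OSD container):
#   on_disk=$(keyring_key < /var/lib/ceph/osd/ceph-<ID>/keyring)
#   registered=$(ceph auth get-key osd.<ID>)
#   keys_match "$on_disk" "$registered" || echo "osd.<ID>: key mismatch"
```

If the keys differ, deleting the stale auth entry as described above lets the OSD register again with its current key.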
Monitoring
Enable monitoring in Helm values:
```yaml
monitoring:
  enabled: true
```
Add labels for monitoring, so the Prometheus stack release selects the cluster's resources:
```yaml
cephClusterSpec:
  labels:
    monitoring:
      release: prometheus-stack
```
Upgrade
Upgrade Helm:
```bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```
Upgrade using Helm:
```bash
helm upgrade -n rook-ceph rook-ceph rook-release/rook-ceph -f values.yaml
```
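The command above upgrades only the operator release. As a sketch of a fuller sequence, the script below refreshes the repository and then upgrades both releases created during installation, operator first. It rests on assumptions: the release names and the shared `values.yaml` are taken from the installation section, while `DRY_RUN`, `NAMESPACE`, `run`, and `upgrade_rook` are illustrative names, not Helm or Rook features.

```bash
#!/usr/bin/env bash
# Sketch: upgrade the operator release, then the cluster release.
NAMESPACE="${NAMESPACE:-rook-ceph}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_rook() {
  local values="${1:-values.yaml}"
  run helm repo update
  run helm upgrade -n "$NAMESPACE" rook-ceph rook-release/rook-ceph -f "$values"
  run helm upgrade -n "$NAMESPACE" rook-ceph-cluster rook-release/rook-ceph-cluster \
    --set operatorNamespace="$NAMESPACE" -f "$values"
}

# Example use: preview the commands without touching the cluster.
#   DRY_RUN=1 upgrade_rook values.yaml
```

Running with `DRY_RUN=1` first makes it easy to review the exact commands before applying them.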
