Installing Ceph on K8s by Helm

Posted by Yan on January 2, 2019

This blog was written in January 2019. The k8s and ceph-helm versions used are:

  • k8s: v1.12.2
  • ceph chart: 743a7441ba4361866a6e017b4f8fa5c15e34e640, which was the head of the master branch of ceph-helm at the time of writing.

What is ceph

In simple terms:

  • Ceph is an open-source software storage platform. Its core contributors have included Canonical, Cisco, Red Hat, and Intel, among others.

  • Ceph implements object storage at its core and provides object-, block-, and file-level storage on top of it.

Why helm

Helm is a package management tool for Kubernetes. It has two basic parts: the helm client and tiller, the in-cluster server component. According to its documentation, Helm is very easy to install into a k8s cluster:

    #!/bin/bash

    # install helm client
    brew install kubernetes-helm

    # initialize Helm (this installs tiller into the cluster)
    helm init

    # if you need to change the helm repo, or for some reason you cannot reach the
    # official helm resources hosted on Google Cloud
    helm init --upgrade -i \
    registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.12.0 \
    --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts \
    --node-selectors "tiller-enabled"="true"

    # upgrade tiller
    helm init --upgrade

    # delete or reinstall tiller
    # besides, most of the time you don't need to delete the helm charts already
    # deployed in the k8s cluster;
    # they will be manageable again after tiller is re-installed
    kubectl delete deployment tiller-deploy --namespace kube-system

    # or run
    helm reset

    # then reinstall tiller.
    helm init

Install Ceph by Helm

1. Build the ceph-chart

Checking out and building the ceph helm chart is the very first thing the official tutorial introduces. According to this issue on the official ceph-helm repo, the ceph chart published in the official Helm repository is very outdated, so we build it locally.

    #!/bin/bash

    # start a local helm repo and register it with the helm client
    helm serve &
    helm repo add local http://localhost:8879/charts

    # check out the ceph helm chart repo and build the chart
    git clone https://github.com/ceph/ceph-helm
    cd ceph-helm/ceph
    make

Once the above is done, Helm will use the locally built chart to install Ceph into your k8s cluster.
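
To double-check that the locally built chart is actually registered (a quick sanity check, not part of the official tutorial), search the local repo:

    #!/bin/bash

    # the freshly built chart should show up under the local repo
    helm search local/ceph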

2. Install Ceph Cluster into K8s

Below is a typical ceph config file for k8s:

    #!/bin/bash

    cat ~/ceph-overrides.yaml
    # network:
    #   public:   10.244.0.0/16
    #   cluster:   10.244.0.0/16

    # osd_devices:
    #   - name: dev-sdb
    #     device: /dev/sdb
    #     zap: "1"
    # storageclass:
    #   name: ceph-rbd
    #   pool: rbd
    #   user_id: k8s

    # create namespace for ceph
    kubectl create namespace ceph

    # configure the ceph rbac in k8s by the rbac.yaml in the ceph chart repo
    kubectl create -f ~/ceph-helm/ceph/rbac.yaml

    # label the k8s node for deploying the ceph monitor and ceph manager
    kubectl label node <nodename> ceph-mon=enabled ceph-mgr=enabled

    # label the k8s node that actually contains the hard drive, so the ceph osd
    # agent will be deployed there
    kubectl label node <nodename> ceph-osd=enabled ceph-osd-device-dev-sdb=enabled

    # to remove the ceph osd later for any reason,
    # you just need to delete the ceph-osd-device-dev-sdb label
    kubectl label node <nodename> ceph-osd-device-dev-sdb-

    # now, deploy ceph cluster by helm
    helm install --name=ceph local/ceph --namespace=ceph -f ceph-overrides.yaml

Make sure the public and cluster networks are the same and are covered by the hosts' actual IP range; otherwise, the Ceph network will not function properly.
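
As a rough way to verify this (standard kubectl and iproute2 commands; the values will differ on your cluster), compare the pod CIDRs assigned to your nodes with the addresses configured on the hosts and with the network block in ceph-overrides.yaml:

    #!/bin/bash

    # pod CIDR assigned to each node by k8s
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

    # IPv4 addresses configured on the host itself
    ip -4 addr show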

The osd_devices part defines the physical hard drive that will be used as the storage device. You must make sure that the drive is clean and does not contain any partitions. If it does, you can use the following commands to inspect and delete them:

    #!/bin/bash

    # check the status of /dev/sdb
    lsblk /dev/sdb
    # NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    # vdb    252:16   0  40G  0 disk
    # |-vdb1 252:17   0  35G  0 part
    # `-vdb2 252:18   0   5G  0 part

    # remove the header info by zeroing out the first 110 MB of each partition
    dd if=/dev/zero of=/dev/sdb1 bs=1M count=110 oflag=sync
    dd if=/dev/zero of=/dev/sdb2 bs=1M count=110 oflag=sync

    # delete the partition by parted
    parted /dev/sdb rm 1
    parted /dev/sdb rm 2

You will need to run this every time you reinstall the Ceph cluster and want Ceph to use that hard drive as its storage again.
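
Alternatively (my own shortcut, not from the official tutorial), you can wipe the partition-table signatures in one go; double-check the device name first, as this is destructive:

    #!/bin/bash

    # destructive: remove all partition-table and filesystem signatures from /dev/sdb
    wipefs --all /dev/sdb
    # or zap the GPT/MBR structures entirely
    sgdisk --zap-all /dev/sdb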

After you’ve installed the ceph chart, you might end up with an unhealthy cluster rather than the healthy one shown in the official tutorial. Don’t worry about this if it is only unhealthy because of inactive PGs. For example, mine looked like this:

[screenshot: ceph-unhealth.png]

3. Configure the keys for a pod to use PVC

Now we need to configure the keys for a pod in a namespace so that it can use a persistent volume claim.

    #!/bin/bash

    # go to the ceph-mon pod
    kubectl -n ceph exec -ti ceph-mon-cppdk -c ceph-mon -- bash
    # and get the auth key
    ceph auth get-or-create-key client.k8s mon 'allow r' osd \
    'allow rwx pool=rbd'  | base64
    # QVFCLzdPaFoxeUxCRVJBQUVEVGdHcE9YU3BYMVBSdURHUEU0T0E9PQo=
    exit

    # put the key into the pvc-ceph-client-key secret
    kubectl -n ceph edit secrets/pvc-ceph-client-key

    # apiVersion: v1
    # data:
    #   key: QVFCLzdPaFoxeUxCRVJBQUVEVGdHcE9YU3BYMVBSdURHUEU0T0E9PQo=
    # kind: Secret
    # metadata:
    #   creationTimestamp: 2017-10-19T17:34:04Z
    #   name: pvc-ceph-client-key
    #   namespace: ceph
    #   resourceVersion: "8665522"
    #   selfLink: /api/v1/namespaces/ceph/secrets/pvc-ceph-client-key
    #   uid: b4085944-b4f3-11e7-add7-002590347682
    # type: kubernetes.io/rbd

    # Create a Pod that consumes an RBD in the default namespace.
    # Copy the user secret from the ceph namespace to default
    kubectl -n ceph get secrets/pvc-ceph-client-key -o json | \
    jq '.metadata.namespace = "default"' | kubectl create -f -
    # secret "pvc-ceph-client-key" created

    # check all the secrets in the default namespace
    kubectl get secrets
    # NAME                  TYPE                                  DATA      AGE
    # default-token-r43wl   kubernetes.io/service-account-token   3         61d
    # pvc-ceph-client-key   kubernetes.io/rbd                     1         20s
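
To make sure the copied secret really carries the key issued by the monitor (just a sanity check I like to do, not from the tutorial), print its value and compare it with the base64 string from the ceph-mon pod:

    #!/bin/bash

    # the stored value should match the base64 string printed inside the ceph-mon pod
    kubectl get secret pvc-ceph-client-key -o jsonpath='{.data.key}'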

4. Adjust the Ceph Pool size

It’s almost impossible to have a healthy ceph cluster right after the first installation. Now, let’s fix this. First, create an rbd pool:

    # create an rbd pool
    kubectl -n ceph exec -ti ceph-mon-cppdk -c ceph-mon -- ceph osd pool create rbd 256
    # pool 'rbd' created

    # initialize the rbd pool
    kubectl -n ceph exec -ti ceph-mon-cppdk -c ceph-mon -- rbd pool init rbd

From what I’ve noticed, the Ceph health status depends on the number of OSDs and the pool size (the number of replicas). My cluster was configured with only two OSDs from the very beginning but a relatively large pool size, and after trying several times none of those attempts produced a healthy cluster. The main reason is that the default pool size isn’t appropriate for such a small OSD cluster. To adjust the pool size:

    #!/bin/bash

    # list all the pool names
    ceph osd lspools

    # check each pool's replication size
    ceph osd dump | grep 'replicated size'

    # use ceph osd pool set adjust the pool size
    # ceph osd pool set {poolname} size {num-replicas}
    ceph osd pool set .rgw.root size 1
    ceph osd pool set cephfs_data size 1
    ceph osd pool set cephfs_metadata size 1
    ceph osd pool set default.rgw.control size 1
    ceph osd pool set default.rgw.meta size 1
    ceph osd pool set default.rgw.log size 1

For my cluster, I simply reduced every pool’s size to 1 since I only have 2 OSD devices. Afterwards, the cluster status should turn to HEALTH_OK.

    #!/bin/bash

    # check ceph status
    kubectl -n ceph exec -ti ceph-mon-xm7df -c ceph-mon -- ceph -s
    # cluster:
    #     id:     9ab761d6-60bd-4197-9513-2dfe47b3cf57
    #     health: HEALTH_OK

    # services:
    #     mon: 1 daemons, quorum k8s-node1
    #     mgr: k8s-node1(active)
    #     mds: cephfs-1/1/1 up  {0=mds-ceph-mds-68c79b5cc-6r54r=up:active}
    #     osd: 2 osds: 2 up, 2 in
    #     rgw: 1 daemon active

    # data:
    #     pools:   7 pools, 198 pgs
    #     objects: 299 objects, 136 MB
    #     usage:   350 MB used, 11163 GB / 11164 GB avail
    #     pgs:     198 active+clean

5. Install Ceph client on host machine

Installing the Ceph client on the host machine is missing from all the tutorials about installing Ceph with Helm. It is needed because kubelet invokes rbd on the host when it creates and mounts the PVC for a pod. If you don’t install the Ceph client, you may see errors like the following and the pod will be stuck at the ContainerCreating stage:

rbd: Error creating rbd image: executable file not found in $PATH

No extra configuration is needed after installing the rbd tooling or the Ceph client.

Installing the ceph-common package onto the host machines that run ceph-mon and ceph-mgr is all you need:

    #!/bin/bash

    yum install python-rbd
    yum install ceph-common
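
The packages above assume a yum-based host (CentOS/RHEL); on a Debian or Ubuntu node, the equivalent would be (an assumption on my side, not from the tutorial):

    #!/bin/bash

    # Debian/Ubuntu equivalent of the yum packages above
    apt-get install -y ceph-common python-rbd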

6. Test the Ceph cluster

To tell whether the Ceph cluster is working, we can create a PVC and mount it into a pod:

    #!/bin/bash

    # the pvc configuration file
    cat pvc-rbd.yaml
    # kind: PersistentVolumeClaim
    # apiVersion: v1
    # metadata:
    #   name: ceph-pvc
    # spec:
    #   accessModes:
    #     - ReadWriteOnce
    #   resources:
    #     requests:
    #       storage: 20Gi
    #   storageClassName: ceph-rbd

    # create the pvc
    kubectl create -f pvc-rbd.yaml
    # persistentvolumeclaim "ceph-pvc" created

Now the PVC and its PV should be created. Use kubectl get pvc and kubectl get pv to see them.
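
For example (the output will of course differ on your cluster):

    #!/bin/bash

    # the claim should show as Bound, backed by a dynamically provisioned pv
    kubectl get pvc ceph-pvc
    kubectl get pv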

Next, we create a pod that uses the PVC we just created:

    #!/bin/bash

    # create a k8s config file
    cat pod-with-rbd.yaml
    # kind: Pod
    # apiVersion: v1
    # metadata:
    #   name: mypod
    # spec:
    #   containers:
    #     - name: busybox
    #       image: busybox
    #       command:
    #         - sleep
    #         - "3600"
    #       volumeMounts:
    #         - mountPath: "/mnt/rbd"
    #           name: vol1
    #   volumes:
    #     - name: vol1
    #       persistentVolumeClaim:
    #         claimName: ceph-pvc

    # create the pod
    kubectl create -f pod-with-rbd.yaml
    # pod "mypod" created
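
Once the pod reaches the Running state (see the next section if it gets stuck at ContainerCreating), a quick way to confirm that the RBD volume is actually mounted is to look inside the container:

    #!/bin/bash

    # the rbd-backed filesystem should be mounted at /mnt/rbd
    kubectl exec -ti mypod -- df -h /mnt/rbd

    # write a file to prove the volume is usable
    kubectl exec -ti mypod -- touch /mnt/rbd/hello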

7. Managing Kubernetes and its network

When you create the pod, you may see the container stuck at the ContainerCreating stage, and the log could say something like:

can not resolve ceph-mon.ceph.svc.cluster.local

Remember that we installed the Ceph client onto the host machine, and rbd on the host is invoked by kubelet during pod creation. The host machine is trying to reach the Ceph monitor and fails, because that DNS name only resolves inside the k8s cluster through kube-dns. There are a few ways to fix this. One is to add the kube-dns service endpoint to the host’s /etc/resolv.conf as a nameserver. Another is to add a domain mapping to /etc/hosts on the node’s host machine:

11.7.100.123 ceph-mon.ceph.svc.cluster.local

However, the second way is fragile: if ceph-mon restarts or its IP changes inside the k8s cluster, the mapping becomes stale and the Ceph mount will break.
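
The first option is sturdier. A rough sketch of it (kube-dns is the usual service name and the search domains below are the k8s defaults; adjust both to your cluster):

    #!/bin/bash

    # find the cluster IP of the kube-dns service
    kubectl -n kube-system get svc kube-dns

    # then add it as a nameserver on the host, e.g. in /etc/resolv.conf:
    # nameserver <kube-dns cluster IP>
    # search ceph.svc.cluster.local svc.cluster.local cluster.local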

After fixing the DNS issue, the cannot resolve ceph-mon.ceph.svc.cluster.local error should be gone, but I then hit another one:

timeout expired waiting for volumes to attach/mount for pod :
Error syncing pod 096ac42b-919a-11e8-bd1d-fa163eaae838
("mypod_ceph(096ac42b-919a-11e8-bd1d-fa163eaae838)"), skipping:
timeout expired waiting for volumes to attach/mount for pod
"ceph"/"mypod". list of unattached/unmounted volumes=[vol1]

The official tutorial documents the reason:

Kubernetes uses the RBD kernel module to map RBDs to hosts.
 Luminous requires CRUSH_TUNABLES 5 (Jewel). The minimal kernel
 version for these tunables is 4.5. If your kernel does not
 support these tunables, run ceph osd crush tunables hammer

So bash into your ceph-mon pod and run ceph osd crush tunables hammer. It is caused by the ceph helm chart using an old Luminous image, whose default CRUSH tunables require a newer kernel than my hosts have (as quoted above).
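
You can also run it through kubectl without an interactive shell; the pod name below is the ceph-mon pod from my cluster, so replace it with yours:

    #!/bin/bash

    # relax the CRUSH tunables so older kernels can map RBD images
    kubectl -n ceph exec -ti ceph-mon-xm7df -c ceph-mon -- ceph osd crush tunables hammer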

Conclusion

The above is all you need to do to install Ceph with Helm.

Delete Ceph cluster

Deleting Ceph with Helm opens another can of worms. When you run helm delete --purge ceph, two extra one-time pods are started to delete the keys that Ceph created. However, if you run find / -name "*ceph*", you will see some leftovers that Helm did not clean up:

...
/var/lib/ceph-helm
/var/lib/ceph-helm/ceph
/var/lib/ceph-helm/ceph/mon/bootstrap-osd/ceph.keyring
/var/lib/ceph-helm/ceph/mon/bootstrap-mds/ceph.keyring
/var/lib/ceph-helm/ceph/mon/bootstrap-rgw/ceph.keyring
...

Those stale keyrings will block the daemons of a brand new Ceph cluster from talking to each other, so we have to delete them manually every time we want to re-install Ceph with Helm.
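
A minimal cleanup sketch based on the leftovers listed above (run it on every node that hosted Ceph daemons, and only after the helm release has been purged):

    #!/bin/bash

    # remove the stale keyrings and monitor data the chart left behind
    rm -rf /var/lib/ceph-helm

    # optionally drop the ceph namespace we created by hand earlier
    kubectl delete namespace ceph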

Also, you may need to remove the partitions as described in step 2 when a hard drive is reused.

Reference