I am looking for a way to automatically delete the PersistentVolumeClaims assigned to the Pods of a StatefulSet when I scale down the number of instances. Is there a way to do this in k8s? I haven't found anything about it in the documentation.
A preStop lifecycle handler could submit a Job that cleans up the PVC, assuming the Pod's ServiceAccount has a Role that allows it. Unfortunately, the lifecycle handler documentation says that an exec hook blocks Pod deletion, so whatever it triggers would have to run asynchronously from the Pod's point of view.
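For illustration, a rough sketch of such a hook, assuming the container image ships kubectl, the StatefulSet's volumeClaimTemplate is named data (so a Pod's PVC is data-$HOSTNAME), and the Pod created by the Job runs under a ServiceAccount that is allowed to delete PVCs; the names here are placeholders, not something Kubernetes provides out of the box:

# hypothetical container snippet: hand the PVC deletion off to a Job so the
# hook itself returns quickly; note that it fires on every Pod termination,
# not only on scale-down
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - >
          kubectl create job pvc-cleanup-$HOSTNAME --image=bitnami/kubectl --
          kubectl delete pvc data-$HOSTNAME --ignore-not-found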
Another approach might be a CronJob that unconditionally scans the cluster, or the namespace, and deletes PVCs that are no longer assigned or that match a specific selector.
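A sketch of that variant, assuming the PVCs of interest carry a label such as app=myrunner and follow the <pod>-workdir naming convention used in the example further below; the schedule, the cleanup ServiceAccount, and the selection logic are assumptions to adapt:

# hypothetical cleanup CronJob: periodically delete labelled PVCs whose
# owning Pod no longer exists
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pvc-cleanup
  namespace: myrunners
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup
          restartPolicy: Never
          containers:
            - name: cleanup
              image: bitnami/kubectl
              command:
                - /bin/sh
                - -c
                - |
                  for pvc in $(kubectl get pvc -n myrunners -l app=myrunner -o name); do
                    name=${pvc#persistentvolumeclaim/}
                    pod=${name%-workdir}
                    kubectl get pod "$pod" -n myrunners >/dev/null 2>&1 \
                      || kubectl delete -n myrunners "$pvc"
                  done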
That said, I don't believe there is any "inherent" ability to do this, because (in my own usage at least) scaling a StatefulSet up and down is perfectly reasonable, and when scaling it back up one would actually want the Pods to resume their identity in the StatefulSet, which usually includes any persisted data. For my use case I therefore moved away from the StatefulSet and implemented my own "custom controller" that essentially patches an ownerReference into each PersistentVolumeClaim, so that a PVC is garbage-collected as soon as the Pod using it is gone. This "custom controller" is implemented as a shell script living in a ConfigMap and is kept running by a simple ReplicationController. Note that it needs permissions to use kubectl. It looks like this (shortened to the scope of this question):
---
# make one such map per runner
apiVersion: v1
kind: ConfigMap
metadata:
  name: runner01
  namespace: myrunners
  labels:
    myrunner: "true"
data:
  # whatever env runner01 needs
  RUNNER_UUID: "{...}"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: scripts
  namespace: myrunners
data:
  # this is the "custom-controller"
  runner-controller.sh: |
    #!/bin/bash
    echo "pod $HOSTNAME started"
    #
    [ ! -d /etc/yaml ] && echo "error: /etc/yaml/ not existing">/dev/stderr && exit 1
    #
    function runner {
      name=$1
      # the runner's target namespace, read from its Pod manifest
      ns=$(grep 'namespace:' $name.yaml | head -1 | cut -d ':' -f2 | sed 's/\s//g')
      while true; do
        echo "--> starting pod $name in namespace=$ns..."
        kubectl apply -f PVCs.$name.yaml
        kubectl apply -f $name.yaml
        # Bind the runner's PersistentVolumeClaims to its Pod via
        # ownerReferences, so each PVC gets deleted when the Pod terminates.
        pod_uid=$(kubectl get pod $name -n $ns -o=jsonpath='{.metadata.uid}')
        PVCs=$(grep claimName $name.yaml | cut -d ':' -f 2 | sed 's/\s//g')
        for pvc in $PVCs; do
          kubectl patch pvc $pvc -n $ns --type='json' -p='[{
            "op": "add",
            "path": "/metadata/ownerReferences",
            "value": [{
              "apiVersion": "v1",
              "kind": "Pod",
              "name": "'"$name"'",
              "uid": "'"$pod_uid"'",
              "blockOwnerDeletion": true
            }]
          }]'
        done
        kubectl wait -n $ns --timeout=-1s --for=delete pod $name
        echo "$name pod got terminated, wait for its PVCs to be gone..."
        for pvc in $PVCs; do
          kubectl wait -n $ns --timeout=-1s --for=delete pvc $pvc &
        done
        wait
        echo "$name terminated"
      done
    }
    #
    function keep_runners_running {
      # note: RUNNER_BY_PID (filled in main) maps background-job PIDs to runner names
      echo "observing runners..."
      while true; do
        sleep 5
        for pid in "${!RUNNER_BY_PID[@]}"; do
          if ! kill -0 $pid 2>/dev/null; then
            name=${RUNNER_BY_PID[$pid]}
            echo "runner loop $pid has exited, restarting pod $name..."
            unset "RUNNER_BY_PID[$pid]"
            runner $name &
            RUNNER_BY_PID[$!]=$name
          fi
        done
      done
    }
    #
    # --- main
    cd /etc/yaml/
    RUNNERS=$(kubectl -n myrunners get configmap -l myrunner=true -o name | awk -F/ '{print $2}' ORS=' ')
    echo "found configMaps for runners: $RUNNERS"
    echo "starting runners..."
    declare -A RUNNER_BY_PID
    for name in $RUNNERS; do
      runner $name &            # have bash keep it as background job
      RUNNER_BY_PID[$!]=$name   # remember which runner each background job belongs to
    done
    #
    trap 'echo "controller was asked to terminate, exiting..."; jobs -p | xargs -r kill; exit;' SIGINT SIGTERM
    #
    keep_runners_running &
    wait # forever
    #
---
# -- Runner Pods
# - each runner is a Pod with PersistentVolumeClaim(s)
# - provide one "runnerXX.yaml" + "PVCs.runnerXX.yaml" pair for each runner
apiVersion: v1
kind: ConfigMap
metadata:
  name: yaml
  namespace: myrunners
data:
  runner01.yaml: |
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: runner01
      namespace: myrunners
      labels:
        app: myrunner
    spec:
      containers:
        - name: runner
          image: busybox
          command: ['/bin/sh', '-c', 'echo "I am runner $HOSTNAME"; sleep 300;']
          volumeMounts:
            - name: workdir
              mountPath: /var/tmp
      volumes:
        - name: workdir
          persistentVolumeClaim:
            claimName: runner01-workdir
    ---
  PVCs.runner01.yaml: |
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: runner01-workdir
      namespace: myrunners
      labels:
        app: myrunner
        runner: runner01
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: directpv-min-io
      resources:
        requests:
          storage: 1Gi
    ---
---
# have the "custom-controller" be running all the time
apiVersion: v1
kind: ReplicationController
metadata:
  name: controller
  namespace: myrunners
  labels:
    app: controller
spec:
  replicas: 1
  selector:
    app: runner-controller.sh
  template:
    metadata:
      name: controller
      namespace: myrunners
      labels:
        app: runner-controller.sh
    spec:
      serviceAccount: mykubectl
      containers:
        - name: controller
          imagePullPolicy: IfNotPresent
          image: bitnami/kubectl
          command: ['/etc/scripts/runner-controller.sh']
          volumeMounts:
            - name: scripts
              mountPath: /etc/scripts
            - name: yaml
              mountPath: /etc/yaml
      volumes:
        - name: scripts
          configMap:
            name: scripts
            defaultMode: 0555
        - name: yaml
          configMap:
            name: yaml
            defaultMode: 0444
---
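Not shown in the shortened listing above: the RBAC behind the mykubectl ServiceAccount. A rough sketch of what the script needs to be allowed to do; the exact verb list is an assumption, trim it to whatever your variant of the script actually uses:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: mykubectl
  namespace: myrunners
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mykubectl
  namespace: myrunners
rules:
  - apiGroups: [""]
    resources: ["pods", "persistentvolumeclaims", "configmaps"]
    verbs: ["get", "list", "watch", "create", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mykubectl
  namespace: myrunners
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: mykubectl
subjects:
  - kind: ServiceAccount
    name: mykubectl
    namespace: myrunners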
The PersistentVolumes created through the claims live on the local hard disk of the node the Pod was placed on. This creates stickiness of a particular runner to a particular node. The only way to avoid that is to delete the claim, and with it the volume, so that the next new Pod is free to be placed on any node as if it had never existed before. Essentially this is the opposite of a StatefulSet as far as storage is concerned (other Pod controllers such as a Deployment or a Job behave the same way when used with a local-disk volume manager). It is only useful if the application is stateless with respect to its disk usage.
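For illustration, the stickiness comes from the nodeAffinity that a locally provisioned PersistentVolume carries, along the lines of the (trimmed) kubectl get pv <name> -o yaml output sketched below; the affinity key and node name depend on the provisioner and the cluster:

apiVersion: v1
kind: PersistentVolume
spec:
  # capacity, claimRef, storageClassName etc. omitted
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-03   # the node the first Pod landed on; example name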