I have a k8s deployment of a Redis cluster with 3 Sentinel replicas and 3 Redis replicas. I'm getting an error on one of the sentinel pods:
*** FATAL CONFIG FILE ERROR (Redis 6.2.3) ***
Reading the configuration file, at line 4
>>> 'sentinel monitor mymaster 6379 2'
Unrecognized sentinel configuration statement.
My sentinel manifest is as follows:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sentinel
  namespace: demos
spec:
  serviceName: sentinel
  replicas: 3
  selector:
    matchLabels:
      app: sentinel
  template:
    metadata:
      labels:
        app: sentinel
    spec:
      initContainers:
      - name: config
        image: redis:6.2.3-alpine
        imagePullPolicy: "IfNotPresent"
        command: [ "sh", "-c" ]
        args:
          - |
            REDIS_PASSWORD=a-very-complex-password-here
            nodes=redis-0.redis,redis-1.redis,redis-2.redis
            for i in ${nodes//,/ }
            do
                echo "finding master at $i"
                MASTER=$(redis-cli --no-auth-warning --raw -h $i -a $REDIS_PASSWORD info replication | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
                if [ "$MASTER" == "" ]; then
                    echo "no master found"
                    MASTER=
                else
                    echo "found $MASTER"
                    break
                fi
            done
            echo "sentinel monitor mymaster $MASTER 6379 2" >> /tmp/master
            echo "port 5000
            sentinel resolve-hostnames yes
            sentinel announce-hostnames yes
            $(cat /tmp/master)
            sentinel down-after-milliseconds mymaster 5000
            sentinel failover-timeout mymaster 60000
            sentinel parallel-syncs mymaster 1
            sentinel auth-pass mymaster $REDIS_PASSWORD
            " > /etc/redis/sentinel.conf
            cat /etc/redis/sentinel.conf
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
      containers:
      - name: sentinel
        image: redis:6.2.3-alpine
        imagePullPolicy: "IfNotPresent"
        command: ["redis-sentinel"]
        args: ["/etc/redis/sentinel.conf"]
        ports:
        - containerPort: 5000
          name: sentinel
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: data
          mountPath: /data
      volumes:
      - name: redis-config
        emptyDir: {}
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "cinder-csi"
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: sentinel
  namespace: demos
spec:
  clusterIP: None
  ports:
  - port: 5000
    targetPort: 5000
    name: sentinel
  selector:
    app: sentinel
And the Redis manifest:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: demos
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:6.2.3-alpine
        imagePullPolicy: "IfNotPresent"
        command: [ "sh", "-c" ]
        args:
          - |
            cp /tmp/redis/redis.conf /etc/redis/redis.conf
            echo "finding master..."
            MASTER_FDQN=`hostname -f | sed -e 's/redis-[0-9]\./redis-0./'`
            if [ "$(redis-cli -h sentinel -p 5000 ping)" != "PONG" ]; then
                echo "master not found, defaulting to redis-0"
                if [ "$(hostname)" == "redis-0" ]; then
                    echo "this is redis-0, not updating config..."
                else
                    echo "updating redis.conf..."
                    echo "slaveof $MASTER_FDQN 6379" >> /etc/redis/redis.conf
                fi
            else
                echo "sentinel found, finding master"
                MASTER="$(redis-cli -h sentinel -p 5000 sentinel get-master-addr-by-name mymaster | grep -E '(^redis-\d{1,})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
                echo "master found : $MASTER, updating redis.conf"
                echo "slaveof $MASTER 6379" >> /etc/redis/redis.conf
            fi
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /tmp/redis/
      containers:
      - name: redis
        image: redis:6.2.3-alpine
        imagePullPolicy: "IfNotPresent"
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis/
      volumes:
      - name: redis-config
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "cinder-csi"
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: demos
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis
From k8s I get:
kubectl -n demos get pods
NAME         READY   STATUS             RESTARTS   AGE
redis-0      1/1     Running            0          3h3m
redis-1      1/1     Running            0          23m
redis-2      1/1     Running            0          23m
sentinel-0   0/1     CrashLoopBackOff   40         3h3m
sentinel-1   1/1     Running            0          126m
sentinel-2   1/1     Running            0          8m33s
I also pulled the logs from redis-2:
1:S 26 May 2022 13:58:46.924 # Server initialized
1:S 26 May 2022 13:58:46.925 * Ready to accept connections
1:S 26 May 2022 13:58:46.925 * Connecting to MASTER redis-0.redis.labs.svc.myserver-XXX:XXXX
1:S 26 May 2022 13:58:46.947 * MASTER <-> REPLICA sync started
1:S 26 May 2022 13:58:46.947 * Non blocking connect for SYNC fired the event.
1:S 26 May 2022 13:58:46.947 * Master replied to PING, replication can continue...
1:S 26 May 2022 13:58:46.951 * Partial resynchronization not possible (no cached master)
1:S 26 May 2022 13:58:46.954 * Full resync from master: 6b11d6e184b481e9112053er6575790dhjdj782:55557
1:S 26 May 2022 13:58:47.038 * MASTER <-> REPLICA sync: receiving 178 bytes from master to disk
1:S 26 May 2022 13:58:47.038 * MASTER <-> REPLICA sync: Flushing old data
1:S 26 May 2022 13:58:47.044 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 26 May 2022 13:58:47.052 * Loading RDB produced by version 6.2.3
1:S 26 May 2022 13:58:47.052 * RDB age 1 seconds
1:S 26 May 2022 13:58:47.052 * RDB memory usage when created 1.89 Mb
1:S 26 May 2022 13:58:47.053 * MASTER <-> REPLICA sync: Finished with success
1:S 26 May 2022 13:58:47.053 * Background append only file rewriting started by pid 11
1:S 26 May 2022 13:58:47.091 * AOF rewrite child asks to stop sending diffs.
11:C 26 May 2022 13:58:47.091 * Parent agreed to stop sending diffs. Finalizing AOF...
11:C 26 May 2022 13:58:47.091 * Concatenating 0.00 MB of AOF diff received from parent.
11:C 26 May 2022 13:58:47.091 * SYNC append only file rewrite performed
11:C 26 May 2022 13:58:47.091 * AOF rewrite: 0 MB of memory used by copy-on-write
1:S 26 May 2022 13:58:47.154 * Background AOF rewrite terminated with success
1:S 26 May 2022 13:58:47.154 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:S 26 May 2022 13:58:47.154 * Background AOF rewrite finished successfully
I'm not sure why this one sentinel replica fails while the others run.
What am I missing?
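I remember running into the same situation. Follow the flow: the init container config starts up and writes the Sentinel configuration, and only then does the sentinel container read it. This means Sentinel is reporting something the init container misconfigured. So let's go back to the init container and ask why the line
'sentinel monitor mymaster 6379 2'
has no master address. No master was found, but the init container carried on and wrote the configuration without one. That is the problem.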
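To make the failure mode concrete, here is a minimal sketch of a sanity check on the generated directive. The function name monitor_line_ok is my own invention, not something Redis provides; it only counts words and relies on the documented form sentinel monitor <name> <host> <port> <quorum>, which has six words in total.

```shell
# Hypothetical helper: succeeds only when the directive has all six words
# of "sentinel monitor <name> <host> <port> <quorum>".
monitor_line_ok() {
  # Unquoted on purpose: the shell splits on whitespace, so an empty
  # host simply vanishes from the word list, just as it does for Sentinel.
  set -- $1
  [ "$#" -eq 6 ] && [ "$1" = "sentinel" ] && [ "$2" = "monitor" ]
}

MASTER=""
if ! monitor_line_ok "sentinel monitor mymaster $MASTER 6379 2"; then
  echo "directive is incomplete: no master host"
fi
```

With an empty MASTER the directive collapses to five words, which matches the line quoted in the error message.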
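What are the solutions? The quick one is to stop the init container as soon as it knows no master was found, after the loop and before it writes the configuration: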
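if [ "$MASTER" = "" ]; then
    echo "Sorry! No master was found!"
    echo "Exiting..."
    # Exit non-zero so the init container fails and is retried
    # instead of handing Sentinel an incomplete config.
    exit 1
fi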
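It can also help to see why MASTER ends up empty in the first place. The sketch below pulls the parsing step out into a function so it can be tried without a cluster; parse_master_host is a name I made up, and the sample input is hand-written to mimic INFO replication output rather than captured from a real server. A node that is itself the master reports role:master and has no master_host field, so nothing is printed for it.

```shell
# Hypothetical helper: reads `INFO replication` text on stdin and prints
# the master_host value, or nothing when the field is absent.
parse_master_host() {
  # INFO lines end in CRLF, so drop the carriage returns first.
  tr -d '\r' | awk -F: '$1 == "master_host" { print $2 }'
}

# Hand-written sample of what a replica might report.
MASTER=$(printf 'role:slave\r\nmaster_host:redis-0.redis\r\nmaster_port:6379\r\n' | parse_master_host)
echo "replica reports master: $MASTER"

# Hand-written sample of what the master itself might report.
MASTER=$(printf 'role:master\r\nconnected_slaves:0\r\n' | parse_master_host)
if [ -z "$MASTER" ]; then
  echo "no master_host field on this node"
fi
```

So if the only reachable node is the master, for example because the replicas have not started yet, every lookup comes back empty and the guard above is what stops the bad config from being written.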
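The other option is a startup probe that probes the Redis network. This ensures the sentinel container is only initialized once the Redis network is ready. Personally I prefer the quick solution, but I have not implemented this one. See this issue: https://github.com/bitnami/charts/issues/9296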
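If you would rather wait for Redis inside the init script itself, a small retry wrapper is one way to do it. This is only a sketch under my own assumptions: retry is a hypothetical helper, not part of Redis or Kubernetes, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  return 0
}

# Example with a command that always succeeds.
retry 3 0 true && echo "command succeeded"
```

In the init script this could wrap the discovery step, for instance retry 30 2 master_found, where master_found is a function you would write yourself that runs the loop over the nodes and returns non-zero while MASTER is still empty.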