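We are running several versions of Redis server hosted in a Kubernetes cluster (we have tried both 6.2.5 and 7.0.10), using the default aspnet:6.0 image (bullseye). We have been reading from and writing to Redis heavily for almost a year, but for the past 2 days we have been getting timeout errors. The error details are: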
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response
(outbound=188KiB, inbound=0KiB, 37574ms elapsed, timeout is 30000ms),
command=ZADD, next: ZADD ADQ, inst: 0, qu: 0, qs: 126, aw: False,
bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 3949, in-pipe: 0, out-pipe: 0,
last-in: 0, cur-in: 0, sync-ops: 0, async-ops: 28310, serverEndpoint: redis7.sf-app.svc:6379,
conn-sec: 1583.67, mc: 1/1/0, mgr: 10 of 10 available,
clientName: orchestration-317-qzp5s(SE.Redis-v2.6.96.30123), IOCP: (Busy=0,Free=1000,Min=16,Max=1000),
WORKER: (Busy=345,Free=32422,Min=16,Max=32767), POOL: (Threads=345,QueuedItems=29523,CompletedItems=4409684),
v: 2.6.96.30123
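These timeouts occur in short bursts whose duration I have not yet measured precisely. Here is the .NET counters output for the running process: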
% Time in GC since last GC (%) 0
Allocation Rate (B / 1 sec) 227,048
CPU Usage (%) 0
Exception Count (Count / 1 sec) 0
GC Committed Bytes (MB) 2,790
GC Fragmentation (%) 1.406
GC Heap Size (MB) 2,392
Gen 0 GC Count (Count / 1 sec) 0
Gen 0 Size (B) 384
Gen 1 GC Count (Count / 1 sec) 0
Gen 1 Size (B) 63,191,160
Gen 2 GC Count (Count / 1 sec) 0
Gen 2 Size (B) 8,249,120
IL Bytes Jitted (B) 1,533,858
LOH Size (B) 5,032,512
Monitor Lock Contention Count (Count / 1 sec) 0
Number of Active Timers 9
Number of Assemblies Loaded 174
Number of Methods Jitted 19,209
POH (Pinned Object Heap) Size (B) 24,134,616
ThreadPool Completed Work Item Count (Count / 1 sec) 14
ThreadPool Queue Length 3,611
ThreadPool Thread Count 822
Time spent in JIT (ms / 1 sec) 0
.NET trace output (columns are inclusive % and exclusive %):
1. Threads 100% 0%
2. (Non-Activities) 99.96% 0%
3. Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread) 97.31% 0%
4. ManualResetEventSlim.Wait(int32,value class System.Threading.CancellationToken) 97.06% 96.93%
5. Task.InternalWaitCore(int32,value class System.Threading.CancellationToken) 97.06% 0%
6. Task.SpinThenBlockingWait(int32,value class System.Threading.CancellationToken) 97.06% 0%
7. ThreadPoolWorkQueue.Dispatch() 96.98% 0%
8. PortableThreadPool+WorkerThread.WorkerThreadStart() 96.98% 0%
9. ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading. 96.95% 0%
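I have tried increasing the PoolSize setting (up to 200) and calling ThreadPool.SetMinThreads, and I also tried cutting the workload in half, but none of that helped. We also use SignalR in the same application, and we call Redis from Hub methods (I don't know whether that is relevant). We make heavy use of commands such as zadd, zpopmin, incr, and incrby. All the methods we use are async. As I said, apart from the last two days it has been running without problems for a while. Thanks for any help or suggestions for diagnosing the bottleneck.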
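For reference, here is a minimal sketch of the kind of ThreadPool.SetMinThreads call mentioned above. The values (200/200) and the helper name RaiseMinThreads are placeholders for illustration, not the exact numbers or code from the application.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSetup
{
    // Hypothetical helper: raises the thread pool minimums without ever
    // lowering a minimum that is already higher than the requested value.
    public static bool RaiseMinThreads(int minWorker, int minIocp)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIocp);
        int worker = Math.Max(currentWorker, minWorker);
        int iocp = Math.Max(currentIocp, minIocp);
        // SetMinThreads returns false if the runtime rejects the values
        // (for example, when they exceed the configured maximums).
        return ThreadPool.SetMinThreads(worker, iocp);
    }

    public static void Main()
    {
        // Placeholder values; called once at startup, before any Redis traffic.
        bool applied = RaiseMinThreads(200, 200);
        ThreadPool.GetMinThreads(out int w, out int io);
        Console.WriteLine($"applied={applied}, minWorker={w}, minIocp={io}");
    }
}
```

Printing the values read back from GetMinThreads is a cheap way to confirm the setting actually took effect inside the container, since the timeout message above still reports Min=16 for both WORKER and IOCP.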