Possible starvation in striped pool during WAL write causes a node failure in an Ignite cluster


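A moderate workload on a 3-node Ignite cluster causes one node to fail during WAL archiving. The failure is preceded by a "Possible starvation in striped pool" warning.

This happens once or twice a week.

I have checked every IO issue I could think of that might stall the WAL rollover, but the problem persists.

I am using Ignite 2.7 (the latest release) as a library inside a Spring Boot application.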

: >>> Possible starvation in striped pool.
    Deadlock: false
    Completed: 1397
Thread [name="sys-stripe-7-#8%server.node%", id=22, state=WAITING, blockCnt=3, waitCnt=757]
    Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@b01791b, ownerName=sys-#214%server.node%, ownerId=248]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.awaitNext(FileWriteAheadLogManager.java:2871)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$2300(FileWriteAheadLogManager.java:2451)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.rollOver(FileWriteAheadLogManager.java:1205)
        at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:836)
        at o.a.i.i.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4267)
        at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6333)
        at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6082)
        at o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5782)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
        at o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
        at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
        at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
        at o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
        at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2295)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3242)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:135)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:309)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:304)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
        at o.a.i.i.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
        at o.a.i.i.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
        at o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
ERROR --- [tcp-disco-msg-worker-#2%server.node%] [] o.a.i.i.u.t.G                            : Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=sys-stripe-1, blockedFor=10s]
WARN --- [tcp-disco-msg-worker-#2%server.node%] [] o.a.i.i.u.t.G                            : Thread [name="sys-stripe-1-#2%server.node%", id=16, state=WAITING, blockCnt=0, waitCnt=754]
    Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@b01791b, ownerName=sys-#214%server.node%, ownerId=248]
1 Answer

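The default configuration of the failure detection feature in Apache Ignite 2.7 is not well tuned. You can switch it off (by setting the failure handler to NoOp) or set a larger failureDetectionTimeout to avoid these messages (and the node shutdown that follows them).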
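A minimal sketch of the two options, assuming the node is configured programmatically through `IgniteConfiguration` (the question does not show how the node is configured). The class name, method names, and the 60-second value are illustrative assumptions, not recommendations. The default failure detection timeout is 10 000 ms, which matches the `blockedFor=10s` in the log above.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

// Hypothetical helper class; only the two setters come from the answer's suggestions.
public class FailureHandlingConfigSketch {

    // Option 1: keep detecting failures but take no action on them,
    // so the node is not stopped when a system-critical thread is reported as blocked.
    public static IgniteConfiguration withNoOpHandler() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setFailureHandler(new NoOpFailureHandler());
        return cfg;
    }

    // Option 2: keep the default failure handling but raise the timeout (in milliseconds).
    // 60 seconds is an assumed value chosen only for illustration.
    public static IgniteConfiguration withLongerTimeout() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setFailureDetectionTimeout(60_000L);
        return cfg;
    }
}
```

Either configuration would then be passed to `Ignition.start(cfg)`, together with whatever persistence and discovery settings the application already uses. Both options only change how the node reacts to the blocked thread. They do not explain why `sys-#214` holds the WAL lock for so long, so the warning is still worth investigating.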
