希望我能找到你一切安好。
因此,我在具有 64Gb RAM 的服务器上的 Centos 7 上安装了 Varnish 6.x LTS,实际上按照配置完美运行,并且我将我的设置升级到具有 128Gb RAM 和 CentOS 8 的服务器。 通过这种设置,我经常会因为 OOM 而导致 Varnish 重新启动,即使配置如下:
ExecStart=/usr/sbin/varnishd \
-a :6081 \
-f /etc/varnish/default.vcl \
-l 200m \
-s malloc,38G \
-s Transient=malloc,8G \
-p thread_pool_min=300 \
-p thread_pool_max=500 \
-p workspace_client=256k
在旧服务器上我有类似的设置
-s malloc, 26Gb
而且效果很好。 配置参数的 Transient 部分只是我几天前的尝试,旨在限制瞬态缓存并看看它是否有帮助,但运气不佳,从某种意义上说,它仍然表明它将很快出现 OOM。 我读了很多关于设置此功能的文章,并且我还预计大约 50% 甚至更多的开销,但在这种情况下,它似乎只是上升没有任何限制。 我真的可以使用四眼法来解决我在这里缺少的内容,或者至少可以继续下去。
我当前的清漆统计:
MGT.uptime 269509 1.00 Management process uptime
MGT.child_start 1 0.00 Child process started
MGT.child_exit 0 0.00 Child process normal exit
MGT.child_stop 0 0.00 Child process unexpected exit
MGT.child_died 0 0.00 Child process died (signal)
MGT.child_dump 0 0.00 Child process core dumped
MGT.child_panic 0 0.00 Child process panic
MAIN.summs 7869381 29.20 stat summ operations
MAIN.uptime 269510 1.00 Child process uptime
MAIN.sess_conn 184459 0.68 Sessions accepted
MAIN.sess_drop 0 0.00 Sessions dropped
MAIN.sess_fail 0 0.00 Session accept failures
MAIN.sess_fail_econnaborted 0 0.00 Session accept failures: connection aborted
MAIN.sess_fail_eintr 0 0.00 Session accept failures: interrupted system call
MAIN.sess_fail_emfile 0 0.00 Session accept failures: too many open files
MAIN.sess_fail_ebadf 0 0.00 Session accept failures: bad file descriptor
MAIN.sess_fail_enomem 0 0.00 Session accept failures: not enough memory
MAIN.sess_fail_other 0 0.00 Session accept failures: other
MAIN.client_req_400 0 0.00 Client requests received, subject to 400 errors
MAIN.client_req_417 0 0.00 Client requests received, subject to 417 errors
MAIN.client_req 2373462 8.81 Good client requests received
MAIN.cache_hit 500010 1.86 Cache hits
MAIN.cache_hit_grace 106009 0.39 Cache grace hits
MAIN.cache_hitpass 0 0.00 Cache hits for pass.
MAIN.cache_hitmiss 0 0.00 Cache hits for miss.
MAIN.cache_miss 1611440 5.98 Cache misses
MAIN.backend_conn 12617 0.05 Backend conn. success
MAIN.backend_unhealthy 0 0.00 Backend conn. not attempted
MAIN.backend_busy 0 0.00 Backend conn. too many
MAIN.backend_fail 0 0.00 Backend conn. failures
MAIN.backend_reuse 1966574 7.30 Backend conn. reuses
MAIN.backend_recycle 1976920 7.34 Backend conn. recycles
MAIN.backend_retry 1 0.00 Backend conn. retry
MAIN.fetch_head 0 0.00 Fetch no body (HEAD)
MAIN.fetch_length 475423 1.76 Fetch with Length
MAIN.fetch_chunked 1497899 5.56 Fetch chunked
MAIN.fetch_eof 0 0.00 Fetch EOF
MAIN.fetch_bad 0 0.00 Fetch bad T-E
MAIN.fetch_none 1325 0.00 Fetch no body
MAIN.fetch_1xx 0 0.00 Fetch no body (1xx)
MAIN.fetch_204 0 0.00 Fetch no body (204)
MAIN.fetch_304 3073 0.01 Fetch no body (304)
MAIN.fetch_failed 1462 0.01 Fetch failed (all causes)
MAIN.fetch_no_thread 0 0.00 Fetch failed (no thread)
MAIN.pools 2 . Number of thread pools
MAIN.threads 600 . Total number of threads
MAIN.threads_limited 0 0.00 Threads hit max
MAIN.threads_created 923 0.00 Threads created
MAIN.threads_destroyed 323 0.00 Threads destroyed
MAIN.threads_failed 0 0.00 Thread creation failed
MAIN.thread_queue_len 0 . Length of session queue
MAIN.busy_sleep 878 0.00 Number of requests sent to sleep on busy objhdr
MAIN.busy_wakeup 878 0.00 Number of requests woken after sleep on busy objhdr
MAIN.busy_killed 0 0.00 Number of requests killed after sleep on busy objhdr
MAIN.sess_queued 134 0.00 Sessions queued for thread
MAIN.sess_dropped 0 0.00 Sessions dropped for thread
MAIN.req_dropped 0 0.00 Requests dropped
MAIN.n_object 218508 . object structs made
MAIN.n_vampireobject 0 . unresurrected objects
MAIN.n_objectcore 218665 . objectcore structs made
MAIN.n_objecthead 218773 . objecthead structs made
MAIN.n_backend 1 . Number of backends
MAIN.n_expired 1392690 5.17 Number of expired objects
MAIN.n_lru_nuked 0 0.00 Number of LRU nuked objects
MAIN.n_lru_moved 449086 1.67 Number of LRU moved objects
MAIN.n_lru_limited 0 0.00 Reached nuke_limit
MAIN.losthdr 0 0.00 HTTP header overflows
MAIN.s_sess 184459 0.68 Total sessions seen
MAIN.s_pipe 0 0.00 Total pipe sessions seen
MAIN.s_pass 262012 0.97 Total pass-ed requests seen
MAIN.s_fetch 1873452 6.95 Total backend fetches initiated
MAIN.s_synth 490 0.00 Total synthetic responses made
MAIN.s_req_hdrbytes 1975945691 7331.62 Request header bytes
MAIN.s_req_bodybytes 4279860 15.88 Request body bytes
MAIN.s_resp_hdrbytes 1272508062 4721.56 Response header bytes
MAIN.s_resp_bodybytes 1536158038935 5699818.33 Response body bytes
MAIN.s_pipe_hdrbytes 0 0.00 Pipe request header bytes
MAIN.s_pipe_in 0 0.00 Piped bytes from client
MAIN.s_pipe_out 0 0.00 Piped bytes to client
MAIN.sess_closed 356 0.00 Session Closed
MAIN.sess_closed_err 86861 0.32 Session Closed with error
MAIN.sess_readahead 0 0.00 Session Read Ahead
MAIN.sess_herd 1856881 6.89 Session herd
MAIN.sc_rem_close 97588 0.36 Session OK REM_CLOSE
MAIN.sc_req_close 0 0.00 Session OK REQ_CLOSE
MAIN.sc_req_http10 0 0.00 Session Err REQ_HTTP10
MAIN.sc_rx_bad 0 0.00 Session Err RX_BAD
MAIN.sc_rx_body 0 0.00 Session Err RX_BODY
MAIN.sc_rx_junk 0 0.00 Session Err RX_JUNK
MAIN.sc_rx_overflow 0 0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout 86861 0.32 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe 0 0.00 Session OK TX_PIPE
MAIN.sc_tx_error 0 0.00 Session Err TX_ERROR
MAIN.sc_tx_eof 0 0.00 Session OK TX_EOF
MAIN.sc_resp_close 0 0.00 Session OK RESP_CLOSE
MAIN.sc_overload 0 0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow 0 0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short 0 0.00 Session Err RANGE_SHORT
MAIN.sc_req_http20 0 0.00 Session Err REQ_HTTP20
MAIN.sc_vcl_failure 0 0.00 Session Err VCL_FAILURE
MAIN.client_resp_500 0 0.00 Delivery failed due to insufficient workspace.
MAIN.ws_backend_overflow 0 0.00 workspace_backend overflows
MAIN.ws_client_overflow 0 0.00 workspace_client overflows
MAIN.ws_thread_overflow 0 0.00 workspace_thread overflows
MAIN.ws_session_overflow 0 0.00 workspace_session overflows
MAIN.shm_records 317750020 1178.99 SHM records
MAIN.shm_writes 11262489 41.79 SHM writes
MAIN.shm_flushes 393170 1.46 SHM flushes due to overflow
MAIN.shm_cont 189298 0.70 SHM MTX contention
MAIN.shm_cycles 60 0.00 SHM cycles through buffer
MAIN.backend_req 1979183 7.34 Backend requests made
MAIN.n_vcl 1 . Number of loaded VCLs in total
MAIN.n_vcl_avail 1 . Number of VCLs available
MAIN.n_vcl_discard 0 . Number of discarded VCLs
MAIN.vcl_fail 0 0.00 VCL failures
MAIN.bans 1 . Count of bans
MAIN.bans_completed 1 . Number of bans marked 'completed'
MAIN.bans_obj 0 . Number of bans using obj.*
MAIN.bans_req 0 . Number of bans using req.*
MAIN.bans_added 1 0.00 Bans added
MAIN.bans_deleted 0 0.00 Bans deleted
MAIN.bans_tested 0 0.00 Bans tested against objects (lookup)
MAIN.bans_obj_killed 0 0.00 Objects killed by bans (lookup)
MAIN.bans_lurker_tested 0 0.00 Bans tested against objects (lurker)
MAIN.bans_tests_tested 0 0.00 Ban tests tested against objects (lookup)
MAIN.bans_lurker_tests_tested 0 0.00 Ban tests tested against objects (lurker)
MAIN.bans_lurker_obj_killed 0 0.00 Objects killed by bans (lurker)
MAIN.bans_lurker_obj_killed_cutoff 0 0.00 Objects killed by bans for cutoff (lurker)
MAIN.bans_dups 0 0.00 Bans superseded by other bans
MAIN.bans_lurker_contention 0 0.00 Lurker gave way for lookup
MAIN.bans_persisted_bytes 16 . Bytes used by the persisted ban lists
MAIN.bans_persisted_fragmentation 0 . Extra bytes in persisted ban lists due to fragmentation
MAIN.n_purges 0 0.00 Number of purge operations executed
MAIN.n_obj_purged 0 0.00 Number of purged objects
MAIN.exp_mailed 1822034 6.76 Number of objects mailed to expiry thread
MAIN.exp_received 1822034 6.76 Number of objects received by expiry thread
MAIN.hcb_nolock 2111450 7.83 HCB Lookups without lock
MAIN.hcb_lock 1599004 5.93 HCB Lookups with lock
MAIN.hcb_insert 1599003 5.93 HCB Inserts
MAIN.esi_errors 0 0.00 ESI parse errors (unlock)
MAIN.esi_warnings 0 0.00 ESI parse warnings (unlock)
MAIN.vmods 2 . Loaded VMODs
MAIN.n_gzip 0 0.00 Gzip operations
MAIN.n_gunzip 1963548 7.29 Gunzip operations
MAIN.n_test_gunzip 1604680 5.95 Test gunzip operations
LCK.backend.creat 2 0.00 Created locks
LCK.backend.destroy 0 0.00 Destroyed locks
LCK.backend.locks 3958375 14.69 Lock Operations
LCK.ban.creat 1 0.00 Created locks
LCK.ban.destroy 0 0.00 Destroyed locks
LCK.ban.locks 3507518 13.01 Lock Operations
LCK.busyobj.creat 1998678 7.42 Created locks
LCK.busyobj.destroy 1998580 7.42 Destroyed locks
LCK.busyobj.locks 139797131 518.71 Lock Operations
LCK.cli.creat 1 0.00 Created locks
LCK.cli.destroy 0 0.00 Destroyed locks
LCK.cli.locks 99103 0.37 Lock Operations
LCK.exp.creat 1 0.00 Created locks
LCK.exp.destroy 0 0.00 Destroyed locks
LCK.exp.locks 11454148 42.50 Lock Operations
LCK.hcb.creat 1 0.00 Created locks
LCK.hcb.destroy 0 0.00 Destroyed locks
LCK.hcb.locks 2982034 11.06 Lock Operations
LCK.lru.creat 2 0.00 Created locks
LCK.lru.destroy 0 0.00 Destroyed locks
LCK.lru.locks 3663810 13.59 Lock Operations
LCK.mempool.creat 5 0.00 Created locks
LCK.mempool.destroy 0 0.00 Destroyed locks
LCK.mempool.locks 10186086 37.79 Lock Operations
LCK.objhdr.creat 1599316 5.93 Created locks
LCK.objhdr.destroy 1380547 5.12 Destroyed locks
LCK.objhdr.locks 27912994 103.57 Lock Operations
LCK.pipestat.creat 1 0.00 Created locks
LCK.pipestat.destroy 0 0.00 Destroyed locks
LCK.pipestat.locks 0 0.00 Lock Operations
LCK.sess.creat 184447 0.68 Created locks
LCK.sess.destroy 184449 0.68 Destroyed locks
LCK.sess.locks 4161689 15.44 Lock Operations
LCK.tcp_pool.creat 2 0.00 Created locks
LCK.tcp_pool.destroy 0 0.00 Destroyed locks
LCK.tcp_pool.locks 7901859 29.32 Lock Operations
LCK.vbe.creat 1 0.00 Created locks
LCK.vbe.destroy 0 0.00 Destroyed locks
LCK.vbe.locks 99094 0.37 Lock Operations
LCK.vcapace.creat 1 0.00 Created locks
LCK.vcapace.destroy 0 0.00 Destroyed locks
LCK.vcapace.locks 0 0.00 Lock Operations
LCK.vcl.creat 1 0.00 Created locks
LCK.vcl.destroy 0 0.00 Destroyed locks
LCK.vcl.locks 4010919 14.88 Lock Operations
LCK.vxid.creat 1 0.00 Created locks
LCK.vxid.destroy 0 0.00 Destroyed locks
LCK.vxid.locks 768 0.00 Lock Operations
LCK.waiter.creat 2 0.00 Created locks
LCK.waiter.destroy 0 0.00 Destroyed locks
LCK.waiter.locks 11657249 43.25 Lock Operations
LCK.wq.creat 3 0.00 Created locks
LCK.wq.destroy 0 0.00 Destroyed locks
LCK.wq.locks 12078010 44.81 Lock Operations
LCK.wstat.creat 1 0.00 Created locks
LCK.wstat.destroy 0 0.00 Destroyed locks
LCK.wstat.locks 7148985 26.53 Lock Operations
MEMPOOL.busyobj.live 8 . In use
MEMPOOL.busyobj.pool 10 . In Pool
MEMPOOL.busyobj.sz_wanted 65536 . Size requested
MEMPOOL.busyobj.sz_actual 65504 . Size allocated
MEMPOOL.busyobj.allocs 1979190 7.34 Allocations
MEMPOOL.busyobj.frees 1979182 7.34 Frees
MEMPOOL.busyobj.recycle 1976206 7.33 Recycled from pool
MEMPOOL.busyobj.timeout 176828 0.66 Timed out from pool
MEMPOOL.busyobj.toosmall 0 0.00 Too small to recycle
MEMPOOL.busyobj.surplus 137 0.00 Too many for pool
MEMPOOL.busyobj.randry 2984 0.01 Pool ran dry
MEMPOOL.req0.live 4 . In use
MEMPOOL.req0.pool 11 . In Pool
MEMPOOL.req0.sz_wanted 262144 . Size requested
MEMPOOL.req0.sz_actual 262112 . Size allocated
MEMPOOL.req0.allocs 979824 3.64 Allocations
MEMPOOL.req0.frees 979820 3.64 Frees
MEMPOOL.req0.recycle 979057 3.63 Recycled from pool
MEMPOOL.req0.timeout 147071 0.55 Timed out from pool
MEMPOOL.req0.toosmall 0 0.00 Too small to recycle
MEMPOOL.req0.surplus 0 0.00 Too many for pool
MEMPOOL.req0.randry 767 0.00 Pool ran dry
MEMPOOL.sess0.live 6 . In use
MEMPOOL.sess0.pool 10 . In Pool
MEMPOOL.sess0.sz_wanted 512 . Size requested
MEMPOOL.sess0.sz_actual 480 . Size allocated
MEMPOOL.sess0.allocs 92389 0.34 Allocations
MEMPOOL.sess0.frees 92383 0.34 Frees
MEMPOOL.sess0.recycle 91719 0.34 Recycled from pool
MEMPOOL.sess0.timeout 65581 0.24 Timed out from pool
MEMPOOL.sess0.toosmall 0 0.00 Too small to recycle
MEMPOOL.sess0.surplus 0 0.00 Too many for pool
MEMPOOL.sess0.randry 670 0.00 Pool ran dry
LCK.sma.creat 2 0.00 Created locks
LCK.sma.destroy 0 0.00 Destroyed locks
LCK.sma.locks 36735852 136.31 Lock Operations
SMA.s0.c_req 16689946 61.93 Allocator requests
SMA.s0.c_fail 0 0.00 Allocator failures
SMA.s0.c_bytes 231951766672 860642.52 Bytes allocated
SMA.s0.c_freed 203375575362 754612.35 Bytes freed
SMA.s0.g_alloc 2052078 . Allocations outstanding
SMA.s0.g_bytes 28576191310 . Bytes outstanding
SMA.s0.g_space 12225998002 . Bytes available
SMA.Transient.c_req 2704021 10.03 Allocator requests
SMA.Transient.c_fail 0 0.00 Allocator failures
SMA.Transient.c_bytes 75855969615 281458.83 Bytes allocated
SMA.Transient.c_freed 75855969615 281458.83 Bytes freed
SMA.Transient.g_alloc 0 . Allocations outstanding
SMA.Transient.g_bytes 0 . Bytes outstanding
SMA.Transient.g_space 8589934592 . Bytes available
MEMPOOL.req1.live 5 . In use
MEMPOOL.req1.pool 9 . In Pool
MEMPOOL.req1.sz_wanted 262144 . Size requested
MEMPOOL.req1.sz_actual 262112 . Size allocated
MEMPOOL.req1.allocs 974654 3.62 Allocations
MEMPOOL.req1.frees 974649 3.62 Frees
MEMPOOL.req1.recycle 973872 3.61 Recycled from pool
MEMPOOL.req1.timeout 147049 0.55 Timed out from pool
MEMPOOL.req1.toosmall 0 0.00 Too small to recycle
MEMPOOL.req1.surplus 0 0.00 Too many for pool
MEMPOOL.req1.randry 782 0.00 Pool ran dry
MEMPOOL.sess1.live 5 . In use
MEMPOOL.sess1.pool 9 . In Pool
MEMPOOL.sess1.sz_wanted 512 . Size requested
MEMPOOL.sess1.sz_actual 480 . Size allocated
MEMPOOL.sess1.allocs 92071 0.34 Allocations
MEMPOOL.sess1.frees 92066 0.34 Frees
MEMPOOL.sess1.recycle 91430 0.34 Recycled from pool
MEMPOOL.sess1.timeout 65371 0.24 Timed out from pool
MEMPOOL.sess1.toosmall 0 0.00 Too small to recycle
MEMPOOL.sess1.surplus 0 0.00 Too many for pool
MEMPOOL.sess1.randry 641 0.00 Pool ran dry
VBE.boot.lcn.happy 0 . Happy health probes
VBE.boot.lcn.bereq_hdrbytes 1817898758 6745.20 Request header bytes
VBE.boot.lcn.bereq_bodybytes 4279860 15.88 Request body bytes
VBE.boot.lcn.beresp_hdrbytes 948671121 3519.98 Response header bytes
VBE.boot.lcn.beresp_bodybytes 281279816120 1043671.17 Response body bytes
VBE.boot.lcn.pipe_hdrbytes 0 0.00 Pipe request header bytes
VBE.boot.lcn.pipe_out 0 0.00 Piped bytes to backend
VBE.boot.lcn.pipe_in 0 0.00 Piped bytes from backend
VBE.boot.lcn.conn 8 . Concurrent connections to backend
VBE.boot.lcn.req 1979191 7.34 Backend requests sent
VBE.boot.lcn.unhealthy 0 0.00 Fetches not attempted due to backend being unhealthy
VBE.boot.lcn.busy 0 0.00 Fetches not attempted due to backend being busy
VBE.boot.lcn.fail 0 0.00 Connections failed
VBE.boot.lcn.fail_eacces 0 0.00 Connections failed with EACCES or EPERM
VBE.boot.lcn.fail_eaddrnotavail 0 0.00 Connections failed with EADDRNOTAVAIL
VBE.boot.lcn.fail_econnrefused 0 0.00 Connections failed with ECONNREFUSED
VBE.boot.lcn.fail_enetunreach 0 0.00 Connections failed with ENETUNREACH
VBE.boot.lcn.fail_etimedout 0 0.00 Connections failed ETIMEDOUT
VBE.boot.lcn.fail_other 0 0.00 Connections failed for other reason
VBE.boot.lcn.helddown 0 0.00 Connection opens not attempted
Jemalloc是jemalloc-5.2.1-2.el8.x86_64
最初我从 CentOS 8 的 AppStream 存储库安装了 Varnish,后来我将其更改为 Varnish 社区一,但也没有运气。 我必须提到的是,在 Varnish 安装过程中,它没有安装任何我可以检测到的 jemalloc 包。我手动安装了上面的。
除了我自己,我开始怀疑这个设置在 CentOS 8 上可能有问题。
最后,jemalloc 5.2(在 CentOS 8 上使用)似乎仍然存在一些内存泄漏,或者至少它的内存占用比我在 Centos 7 上使用的 3.6 版本要大得多。
其中,这篇文章解释得很好: https://github.com/jemalloc/jemalloc/issues/1152
显然 v5.2.1 中修复了一些与此相关的错误 https://github.com/jemalloc/jemalloc/releases
但在Linux中似乎并非如此,或者至少与v3.6x相比还是有相当大的差异。
所以,我所做的是,在从社区存储库安装 Varnish 后,我在 Centos 8 上预编译了 jemalloc v4.4,到目前为止似乎没问题。
我不应该认为 Centos 8 Varnish 模块(来自 AppStream)附带的那个模块似乎完全缺少 jemalloc,并且忽略了 malloc 设置的存储限制。
我遇到了与您类似的问题,但内存要求更有限:具有 4GB RAM 的服务器,Varnish 设置为使用 2GB (
-s malloc,2000m
)
在旧服务器上,它位于 Debian Stretch 上,我升级到带有 Debian Bullseye 的新服务器,使用 varnish 6.5.1 和 libjemalloc2 5.2.1。虽然在旧服务器上 Varnish 可以运行一年多而不会出现任何内存问题,但在新服务器上,仅仅 5 天后内存就处于高压状态,Varnish 消耗了 3.5GB RAM,直到 oom_killer 决定杀死它。
我读到的问题可能是内存碎片。
我已经调整了 Varnish 的 jemalloc 配置,情况似乎有了很大改善。 5 天后,Varnish 内存使用量为 2.6 GB,从 3 天前开始稳定。
我所做的是使用以下设置为清漆过程设置一个
MALLOC_CONF
环境变量:
thp:never,narenas:2,dirty_decay_ms:5000,muzzy_decay_ms:5000
dirty/muzzy_decay_ms 可能不会对结果产生太大影响,但禁用透明大页并减少竞技场数量以匹配 CPU 数量似乎击中了要害。一些提示可以在官方TUNING文件上阅读。
为了将该环境变量添加到 Varnish,我将其添加到 Systemd 单元文件中,使用
sudo systemctl edit varnish
,并添加:
[Service]
Environment="MALLOC_CONF=thp:never,narenas:2"
如果您不想降级 jemalloc,这可能是一个值得探索的替代方案。
我想补充一点:
毫无疑问,jemalloc 行为的改变绝对是真实的。
但一般来说,为 Varnish-Cache 的 malloc 存储引擎配置的限制是 net 限制,内存分配器可能会在顶部产生任意开销,这是 Varnish-Cache 无法控制的。
通过提供可配置总限制的存储引擎来解决此问题是SLASH/项目的目标之一。具体来说,
buddy
存储引擎被设计为malloc
的更好替代品,解决了内存限制和LRU公平性问题(请参阅SLASH/自述文件底部的部分)。
Nils,三个 Varnish-Cache 维护者之一
我尝试将 Ubuntu 22.04 libjemalloc 5.2.1 替换为 Debian SID libjemalloc 5.3.0,看看是否可以解决问题。我确认 libjemalloc 5.3.0 修复了内存分配问题,并且性能也更好。