0voice/kernel_memory_management: a curated collection of material on Linux kernel memory management, including papers, articles and videos, plus notes on application memory leaks and memory pools.

Programs on Linux request memory with malloc(). If memory is tight, why not simply have malloc() return failure instead of killing a running process? Because Linux allows a process to request more memory than the physical limit (overcommit). What malloc() hands back is virtual address space: the system only gives the program an address range, and since nothing has been written there yet, no physical memory has actually been assigned. Physical pages are allocated only when the program really writes data to those addresses.
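
A minimal sketch of this lazy allocation (Linux-specific; it reads VmRSS from /proc/self/status and assumes roughly 1 GiB of address space can be requested):

/* Sketch: malloc() only reserves virtual address space; RSS grows only when
 * the pages are actually written. Prints VmRSS before and after touching
 * the buffer. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void print_rss(const char *tag)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    while (f && fgets(line, sizeof(line), f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%s %s", tag, line);
    if (f)
        fclose(f);
}

int main(void)
{
    size_t sz = 1UL << 30;              /* ask for 1 GiB of virtual memory */
    char *p = malloc(sz);
    if (!p)
        return 1;

    print_rss("after malloc:");         /* RSS barely changes */

    for (size_t i = 0; i < sz; i += 4096)
        p[i] = 1;                       /* first touch faults in each page */

    print_rss("after touching:");       /* RSS is now ~1 GiB larger */
    free(p);
    return 0;
}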

Where is the memory that the kernel allocates for its own use accounted?

Does it count as part of a user-space process's memory usage? It should not. A VMCS, for example, is accounted as kernel memory rather than against any single process.

How to measure the memory the kernel itself occupies: 内存泄漏?从用户态跟踪到内核去 (Tencent Cloud developer community)
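
One practical approximation, assuming the standard /proc/meminfo fields are present, is to sum the entries that describe kernel-side allocations (Slab, KernelStack, PageTables, VmallocUsed). A minimal sketch:

/* Sketch: approximate kernel memory usage from /proc/meminfo.
 * This is an approximation, not an exact accounting of everything
 * the kernel uses. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *keys[] = { "Slab:", "KernelStack:", "PageTables:", "VmallocUsed:" };
    long total_kb = 0;
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
            long kb;
            if (strncmp(line, keys[i], strlen(keys[i])) == 0 &&
                sscanf(line + strlen(keys[i]), "%ld", &kb) == 1) {
                printf("%-13s %8ld kB\n", keys[i], kb);
                total_kb += kb;
            }
        }
    }
    fclose(f);
    printf("approx. kernel memory: %ld kB\n", total_kb);
    return 0;
}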

Linux memory reclaim

When memory is requested while the system is already short of it, the system may be driven to reclaim memory.

That is, the system frees memory that can be reclaimed, such as the page cache and buffers. If memory is still insufficient after those have been released, some containers (processes) have to be killed.

kernel.org/doc/Documentation/cgroup-v1/memory.txt

kswapd

kswapd is a kernel thread; each NUMA memory node runs its own kswapd. To gauge how memory is being used, kswapd defines three memory thresholds (watermarks, also called water levels):

the minimum watermark (pages_min), the low watermark (pages_low) and the high watermark (pages_high). Their per-zone values can be read from /proc/zoneinfo; see the sketch after the list below.

  • Free memory below pages_min: the memory available to processes is exhausted; only the kernel can still allocate.
  • Free memory between pages_min and pages_low: memory pressure is high and little free memory is left. kswapd now reclaims memory until the free amount rises above pages_high.
  • Free memory between pages_low and pages_high: there is some pressure, but new allocation requests can still be satisfied.
  • Free memory above pages_high: plenty of free memory remains and there is no memory pressure.
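
The per-zone watermark values (in pages) can be read from /proc/zoneinfo. A minimal parsing sketch, assuming the usual "Node N, zone X" / min / low / high layout of that file (which can vary between kernels):

/* Sketch: print the min/low/high watermarks of every zone from /proc/zoneinfo.
 * Values are in pages. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256], zone[128] = "";
    FILE *f = fopen("/proc/zoneinfo", "r");
    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        long pages;
        if (strncmp(line, "Node", 4) == 0) {
            line[strcspn(line, "\n")] = '\0';   /* remember current zone header */
            strncpy(zone, line, sizeof(zone) - 1);
        } else if (sscanf(line, " min %ld", &pages) == 1) {
            printf("%s  min=%ld", zone, pages);
        } else if (sscanf(line, " low %ld", &pages) == 1) {
            printf(" low=%ld", pages);
        } else if (sscanf(line, " high %ld", &pages) == 1) {
            printf(" high=%ld\n", pages);
        }
    }
    fclose(f);
    return 0;
}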

/proc/sys/vm/swappiness tunes how aggressively swap is used. Its range is 0-100: the higher the value, the more aggressively swap is used, i.e. reclaim prefers anonymous pages; the lower the value, the less swap is used, i.e. reclaim prefers file pages.
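
For illustration, the knob can be read (or, with root, written) like any other procfs file; `sysctl vm.swappiness` reports the same value. A minimal sketch:

/* Sketch: read the current vm.swappiness value; writing the same file
 * as root changes it (e.g. open it with "w" and write "10"). */
#include <stdio.h>

int main(void)
{
    int value;
    FILE *f = fopen("/proc/sys/vm/swappiness", "r");
    if (!f)
        return 1;
    if (fscanf(f, "%d", &value) != 1) {
        fclose(f);
        return 1;
    }
    fclose(f);
    printf("vm.swappiness = %d\n", value);
    return 0;
}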

Although kswapd is created at system boot, it sleeps most of the time. It is woken only when a process cannot get memory because the system is short of it (the allocation drops into the slow path); it then reclaims memory in the background for processes to use.

__alloc_pages_slowpath
    wake_all_kswapds // path 1: entering the slow allocation path first wakes the kswapd threads
        wakeup_kswapd
            wake_up_interruptible(&pgdat->kswapd_wait)
    __alloc_pages_direct_reclaim // if path 1 still leaves memory short, fall back to direct reclaim
        __perform_reclaim
            try_to_free_pages
                throttle_direct_reclaim
                    allow_direct_reclaim
                        wake_up_interruptible(&pgdat->kswapd_wait)

kswapd进程工作原理(一)——初始化及触发-CSDN博客

kswapd is process agnostic; it is only interested in what pages are accessed and when (it is more complex than this, of course, but to keep things simple we may as well view it this way).

So the real question is "what processes have the greatest burden on memory that are causing kswapd to need to page all the time".

/proc/meminfo

知其然知其所以然,/PROC/MEMINFO之谜

Types of memory reclaim

When memory is requested:

  • If the request would exceed the memory limit of the cgroup, one or more processes inside that cgroup are picked and killed (a memcg OOM; a minimal sketch that triggers one follows this list);
  • If the system as a whole is short of available memory (usually because memory is oversold), direct reclaim is triggered; if that is still not enough, the OOM killer chooses a process to kill (not necessarily the allocating process itself).
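
To make the memcg case concrete, the sketch below is an illustration only: it assumes root, a cgroup v2 hierarchy mounted at /sys/fs/cgroup with the memory controller enabled, and uses a made-up group name ("oomdemo") and a 64 MiB limit. It creates a cgroup, caps memory.max, moves itself into the group, and then allocates past the cap; the allocation that crosses the limit triggers a memcg OOM kill like the ones in the logs later in this section.

/* Sketch: trigger a memcg OOM under cgroup v2 (run as root; the paths and
 * the group name are assumptions for illustration). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(1); }
    fputs(val, f);
    fclose(f);
}

int main(void)
{
    char pid[32];

    mkdir("/sys/fs/cgroup/oomdemo", 0755);                        /* EEXIST is fine */
    write_file("/sys/fs/cgroup/oomdemo/memory.max", "67108864");  /* 64 MiB limit */
    snprintf(pid, sizeof(pid), "%d", getpid());
    write_file("/sys/fs/cgroup/oomdemo/cgroup.procs", pid);       /* join the group */

    for (;;) {                       /* allocate and touch 1 MiB at a time;   */
        char *p = malloc(1 << 20);   /* once usage crosses memory.max, the    */
        if (!p)                      /* memcg OOM killer terminates this task */
            break;
        memset(p, 1, 1 << 20);
    }
    return 0;
}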

The main differences between the reclaim paths are when they are triggered and what kinds of pages they handle.

  • Fast reclaim
  • Direct reclaim (synchronous, blocks the allocating process): it raises CPU utilization and system load, so it should be avoided as much as possible.
  • kswapd reclaim (asynchronous, in the background, does not block the process)

What they have in common is that they all use the LRU lists (and related structures) as the containers that reclaim works on, and their reclaim flow is similar:

  • Anonymous pages (heap memory dynamically allocated by applications) can be written out to the swap area;
  • File pages: dirty pages are written back first, clean pages can simply be freed.

Direct reclaim (memory direct reclaim)

OOM Killer

Direct reclaim runs first; if memory is still insufficient after reclaim has finished, an OOM kill occurs.

When a Linux system runs out of memory, the OOM killer kills a running process to free some memory.

Linux memory statistics

Cache, shmem

VSS/RSS/PSS/USS

内存耗用:VSS/RSS/PSS/USS 的介绍 - 简书
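
For a quick look at these numbers for one process, the sketch below reads VmSize/VmRSS (VSS/RSS) from /proc/self/status, and Pss plus the private pages (to approximate USS) from /proc/self/smaps_rollup, which exists on reasonably recent kernels:

/* Sketch: print VSS/RSS/PSS/USS for the current process.
 * USS is approximated as Private_Clean + Private_Dirty.
 * A negative result means the file or field was not found. */
#include <stdio.h>
#include <string.h>

static long grep_kb(const char *path, const char *key)
{
    char line[256];
    long kb, total = 0;
    int found = 0;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, key, strlen(key)) == 0 &&
            sscanf(line + strlen(key), "%ld", &kb) == 1) {
            total += kb;
            found = 1;
        }
    }
    fclose(f);
    return found ? total : -1;
}

int main(void)
{
    printf("VSS: %ld kB\n", grep_kb("/proc/self/status", "VmSize:"));
    printf("RSS: %ld kB\n", grep_kb("/proc/self/status", "VmRSS:"));
    printf("PSS: %ld kB\n", grep_kb("/proc/self/smaps_rollup", "Pss:"));
    printf("USS: %ld kB\n",
           grep_kb("/proc/self/smaps_rollup", "Private_Clean:") +
           grep_kb("/proc/self/smaps_rollup", "Private_Dirty:"));
    return 0;
}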

Per-process memory usage

Is a process's memory usage split into a kernel-space part and a user-space part?

How to troubleshoot an OOM

First look at the journal / dmesg from around the time of the incident (it may already have been rotated away):

cat /var/log/messages
# check where core dumps are saved
cat /proc/sys/kernel/core_pattern
journalctl -k --since "2024-12-03 23:28:11" --until "2024-12-04 00:30:00"

Process OOM log

A very direct way to check for OOM kills:

sudo cat kern-20241208 | grep reaped

A complete log is shown below. It breaks down into the following parts:

  • the call trace, i.e. the code path that led to the OOM;
  • the memory state;
  • the state of every process in the cgroup, together with its oom score;
  • finally, which process(es) were chosen to be killed:
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,mems_allowed=0-1,oom_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83,task_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83/54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,task=java,pid=161175,uid=2405

Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: oom_reaper: reaped process 161175 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The log below shows three processes being killed one after another:

Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: argusagent invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-997
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: CPU: 88 PID: 154533 Comm: argusagent Kdump: loaded Tainted: G           OE K   5.10.112-005.ali5000.al8.aarch64 #1
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 1.2.M1.AL.P.157.00 07/29/2023
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: Call trace:
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  dump_backtrace+0x0/0x1e0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  show_stack+0x1c/0x24
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  dump_stack+0xcc/0x120
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  dump_header+0x3c/0x44
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  dump_memcg_header+0x20/0x58
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  oom_kill_process+0x26c/0x274
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  out_of_memory+0x100/0x3d0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  mem_cgroup_out_of_memory+0x128/0x140
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  try_charge+0x544/0x5c0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  mem_cgroup_charge+0x80/0x27c
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  __add_to_page_cache_locked+0x290/0x4c0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  add_to_page_cache_lru+0x58/0xf4
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  pagecache_get_page+0x240/0x3f0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  filemap_fault+0x544/0x724
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  ext4_filemap_fault+0x38/0x980
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  __do_fault+0x40/0x1f4
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  do_read_fault+0x64/0x36c
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  do_fault+0x8c/0x180
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  handle_pte_fault+0x84/0x234
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  __handle_mm_fault+0x1d8/0x390
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  handle_mm_fault+0xa0/0x200
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  do_page_fault+0x16c/0x3c0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  do_translation_fault+0xac/0xc8
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  do_mem_abort+0x44/0xa0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  el0_ia+0x68/0xdc
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  el0_sync_handler+0x90/0xb0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel:  el0_sync+0x148/0x180
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: memory: usage 8388392kB, limit 8388608kB, failcnt 82406
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: memory+swap: usage 8388392kB, limit 9007199254740988kB, failcnt 0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: Memory cgroup stats for /kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83:
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: anon 8585138176
file 11624448
kernel_stack 0
percpu 0
sock 0
shmem 0
file_mapped 3784704
file_dirty 6758400
file_writeback 0
anon_thp 1730150400
file_thp 0
shmem_thp 0
inactive_anon 8589103104
active_anon 0
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 0
slab_unreclaimable 0
slab 0
workingset_refault_anon 0
workingset_refault_file 2321768
workingset_activate_anon 0
workingset_activate_file 261457
workingset_restore_anon 0
workingset_restore_file 20474
workingset_nodereclaim 0
pgfault 10609479070
pgmajfault 55256
pgrefill 4491117
pgscan 59669019
pgsteal 52371733
pgactivate 796306058
pgdeactivate 3688371
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 6925
thp_collapse_alloc 1735
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: Tasks state (memory values in pages):
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 134996]     0 134996      201        1    32768        0          -998 pause
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 135440]     0 135440      920      313    36864        0          -997 ops_container_i
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 151151]     0 151151   173921     1797   163840        0          -997 logagent
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 151753]     0 151753    81791    25909   409600        0          -997 logagent-collec
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 151759]     0 151759    21487     1213   118784        0          -997 logagent-collec
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 170234]     0 170234   773586     2704   528384        0          -997 staragentd
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [   7252]  1000  7252  1032130    70899  1024000        0          -997 java
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 169078]     0 169078   187481     5126   184320        0          -997 dfget
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 103530]     0 103530      571       88    36864        0          -997 sleep
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 113209]     0 113209    27441      703    65536        0          -997 su
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 113211]     0 113211    26770      643    57344        0          -997 cli.sh
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 113230]     0 113230    26468       89    53248        0          -997 sleep
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 145463]     0 145463     5220     3008    73728        0          -997 systemd
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 146939]     0 146939     4450      918    65536        0         -1000 sshd
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 147305]     0 147305    26993      538    53248        0          -997 crond
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 147509]     0 147509   842691     3611   479232        0          -997 staragentd
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 150780]     0 150780    41654     1390   360448        0          -997 rasp-daemon
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 154033]     0 154033   612567     1852   524288        0          -997 argusagent
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 155097]     0 155097     7419     1203    65536        0          -997 ilogtail
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 155098]     0 155098    86251    10704   315392        0          -997 ilogtail
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 161175]  2405 161175  2129386  1137647 13078528        0          -997 java
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 196134]     0 196134    35092     1638   200704        0          -997 syslog-ng
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 196516]     0 196516     5483     1005    77824        0          -997 systemd-journal
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 211042]     0 211042   287770     4936   438272        0          -997 uniagent
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 113407]     0 113407    27442      688    65536        0          -997 su
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 113415]     0 113415    26770      625    49152        0          -997 cli.sh
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 121483]     0 121483   567231   415853  3461120        0          -997 python2.7
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: [ 121604]     0 121604   439169   410811  3371008        0          -997 rpm
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,mems_allowed=0-1,oom_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83,task_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83/54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,task=java,pid=161175,uid=2405
Dec  4 00:01:23 phyhost-ecs-ali033057255136.na610 kernel: oom_reaper: reaped process 161175 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: staragentd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-997
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: CPU: 79 PID: 170287 Comm: staragentd Kdump: loaded Tainted: G           OE K   5.10.112-005.ali5000.al8.aarch64 #1
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 1.2.M1.AL.P.157.00 07/29/2023
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: Call trace:
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  dump_backtrace+0x0/0x1e0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  show_stack+0x1c/0x24
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  dump_stack+0xcc/0x120
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  dump_header+0x3c/0x44
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  dump_memcg_header+0x20/0x58
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  oom_kill_process+0x26c/0x274
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  out_of_memory+0x100/0x3d0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  mem_cgroup_out_of_memory+0x128/0x140
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  try_charge+0x544/0x5c0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  mem_cgroup_charge+0x80/0x27c
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  __add_to_page_cache_locked+0x290/0x4c0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  add_to_page_cache_lru+0x58/0xf4
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  pagecache_get_page+0x240/0x3f0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  filemap_fault+0x544/0x724
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  ext4_filemap_fault+0x38/0x980
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  __do_fault+0x40/0x1f4
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  do_read_fault+0x64/0x36c
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  do_fault+0x8c/0x180
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  handle_pte_fault+0x84/0x234
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  __handle_mm_fault+0x1d8/0x390
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  handle_mm_fault+0xa0/0x200
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  do_page_fault+0x16c/0x3c0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  do_translation_fault+0xac/0xc8
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  do_mem_abort+0x44/0xa0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  el0_ia+0x68/0xdc
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  el0_sync_handler+0x90/0xb0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel:  el0_sync+0x148/0x180
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: memory: usage 8388600kB, limit 8388608kB, failcnt 138540
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: memory+swap: usage 8388600kB, limit 9007199254740988kB, failcnt 0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: Memory cgroup stats for /kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83:
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: anon 8588054528
file 10137600
kernel_stack 0
percpu 0
sock 0
shmem 0
file_mapped 3108864
file_dirty 6893568
file_writeback 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 8591228928
active_anon 0
inactive_file 180224
active_file 0
unevictable 0
slab_reclaimable 0
slab_unreclaimable 0
slab 0
workingset_refault_anon 0
workingset_refault_file 2561744
workingset_activate_anon 0
workingset_activate_file 261919
workingset_restore_anon 0
workingset_restore_file 20474
workingset_nodereclaim 0
pgfault 10610763133
pgmajfault 64595
pgrefill 4619575
pgscan 77639847
pgsteal 52617757
pgactivate 796428620
pgdeactivate 3811580
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 6925
thp_collapse_alloc 1735
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: Tasks state (memory values in pages):
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 134996]     0 134996      201        1    32768        0          -998 pause
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 135440]     0 135440      920      313    36864        0          -997 ops_container_i
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 151151]     0 151151   173921     1797   163840        0          -997 logagent
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 151753]     0 151753    85889    25934   417792        0          -997 logagent-collec
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 151759]     0 151759    21487     1379   118784        0          -997 logagent-collec
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 170234]     0 170234   773586     2678   528384        0          -997 staragentd
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [   7252]  1000  7252  1032130    70899  1024000        0          -997 java
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 103530]     0 103530      571       88    36864        0          -997 sleep
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 113209]     0 113209    27441      703    65536        0          -997 su
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 113211]     0 113211    26770      643    57344        0          -997 cli.sh
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 140894]     0 140894     1286      259    45056        0          -997 python2.7
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 145463]     0 145463     5220     3015    73728        0          -997 systemd
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 146939]     0 146939     4450      884    65536        0         -1000 sshd
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 147305]     0 147305    26993      565    53248        0          -997 crond
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 147509]     0 147509   842691     3627   479232        0          -997 staragentd
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 150780]     0 150780    41654     1493   360448        0          -997 rasp-daemon
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 154033]     0 154033   649433     1751   544768        0          -997 argusagent
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 155097]     0 155097     7419     1203    65536        0          -997 ilogtail
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 155098]     0 155098    86251    11666   315392        0          -997 ilogtail
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 196134]     0 196134    35092     1643   200704        0          -997 syslog-ng
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 196516]     0 196516     5483     1031    77824        0          -997 systemd-journal
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 211042]     0 211042   287770     4937   438272        0          -997 uniagent
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 113407]     0 113407    27442      688    65536        0          -997 su
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 113415]     0 113415    26770      625    49152        0          -997 cli.sh
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 121483]     0 121483  1142190   990715  8065024        0          -997 python2.7
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: [ 121604]     0 121604  1007812   979443  7925760        0          -997 rpm
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3f728405d7e4e3578281464d5de9c9f697c7623bf6084488e43ac18a060798be,mems_allowed=0-1,oom_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83,task_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83/54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,task=python2.7,pid=121483,uid=0
Dec  4 00:02:19 phyhost-ecs-ali033057255136.na610 kernel: oom_reaper: reaped process 121483 (python2.7), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: LogProcess-0 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-997
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: CPU: 118 PID: 211259 Comm: LogProcess-0 Kdump: loaded Tainted: G           OE K   5.10.112-005.ali5000.al8.aarch64 #1
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 1.2.M1.AL.P.157.00 07/29/2023
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: Call trace:
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  dump_backtrace+0x0/0x1e0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  show_stack+0x1c/0x24
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  dump_stack+0xcc/0x120
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  dump_header+0x3c/0x44
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  dump_memcg_header+0x20/0x58
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  oom_kill_process+0x26c/0x274
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  out_of_memory+0x100/0x3d0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  mem_cgroup_out_of_memory+0x128/0x140
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  try_charge+0x544/0x5c0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  mem_cgroup_charge+0x80/0x27c
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  __add_to_page_cache_locked+0x290/0x4c0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  add_to_page_cache_lru+0x58/0xf4
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  pagecache_get_page+0x240/0x3f0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  filemap_fault+0x544/0x724
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  ext4_filemap_fault+0x38/0x980
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  __do_fault+0x40/0x1f4
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  do_read_fault+0x64/0x36c
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  do_fault+0x8c/0x180
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  handle_pte_fault+0x84/0x234
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  __handle_mm_fault+0x1d8/0x390
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  handle_mm_fault+0xa0/0x200
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  do_page_fault+0x16c/0x3c0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  do_translation_fault+0xac/0xc8
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  do_mem_abort+0x44/0xa0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  el0_ia+0x68/0xdc
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  el0_sync_handler+0x90/0xb0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel:  el0_sync+0x148/0x180
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: memory: usage 8388536kB, limit 8388608kB, failcnt 265013
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: memory+swap: usage 8388536kB, limit 9007199254740988kB, failcnt 0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: Memory cgroup stats for /kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83:
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: anon 8591028224
file 9867264
kernel_stack 0
percpu 0
sock 0
shmem 0
file_mapped 2162688
file_dirty 6217728
file_writeback 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 8594743296
active_anon 0
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 0
slab_unreclaimable 0
slab 0
workingset_refault_anon 0
workingset_refault_file 2948207
workingset_activate_anon 0
workingset_activate_file 263569
workingset_restore_anon 0
workingset_restore_file 21728
workingset_nodereclaim 0
pgfault 10611964828
pgmajfault 79775
pgrefill 4850797
pgscan 107324432
pgsteal 53009265
pgactivate 796650413
pgdeactivate 4036136
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 6925
thp_collapse_alloc 1735
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: Tasks state (memory values in pages):
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 134996]     0 134996      201        1    32768        0          -998 pause
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 135440]     0 135440      920      313    36864        0          -997 ops_container_i
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 151151]     0 151151   173921     1797   163840        0          -997 logagent
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 151753]     0 151753    85889    25953   417792        0          -997 logagent-collec
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 151759]     0 151759    21487     1373   118784        0          -997 logagent-collec
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 170234]     0 170234   773586     2685   528384        0          -997 staragentd
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [   7252]  1000  7252  1032130    70900  1024000        0          -997 java
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 103530]     0 103530      571       88    36864        0          -997 sleep
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 145463]     0 145463     5220     3015    73728        0          -997 systemd
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 146939]     0 146939     4450      782    65536        0         -1000 sshd
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 147305]     0 147305    26993      565    53248        0          -997 crond
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 147509]     0 147509   842691     3598   479232        0          -997 staragentd
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 150780]     0 150780    41654     1517   360448        0          -997 rasp-daemon
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 154033]     0 154033   649433     1893   544768        0          -997 argusagent
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 155097]     0 155097     7419     1203    65536        0          -997 ilogtail
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 155098]     0 155098    86251    11666   315392        0          -997 ilogtail
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 196134]     0 196134    35092     1557   200704        0          -997 syslog-ng
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 196516]     0 196516     5483     1030    77824        0          -997 systemd-journal
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 211042]     0 211042   287770     4941   438272        0          -997 uniagent
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: [ 121604]     0 121604  1997684  1969310 15863808        0          -997 rpm
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,mems_allowed=0-1,oom_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83,task_memcg=/kubepods/podcabf8137-c743-4923-96bd-1e59c6ac4a83/54966a6c9d0c88bbc8d9eaa6426451653613cece11693c665330ebc5b4d02719,task=rpm,pid=121604,uid=0
Dec  4 00:03:49 phyhost-ecs-ali033057255136.na610 kernel: oom_reaper: reaped process 121604 (rpm), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Process coredump / vmcore / kdump

kdump (the part that captures and saves the dump) is a user-space component, shipped as a systemd service; it relies on the kernel's kexec/crash-kernel mechanism underneath.

"core dump" describes the action of dumping memory, while vmcore names the resulting file (the kernel dump captured by kdump); the two terms refer to the same thing from different angles.

Does killing a process with kill -9 produce a core dump? Experiments show it does not. So when is one produced? Assuming all signals keep their default disposition, the signals whose default action is to terminate the process and dump core include SIGQUIT, SIGILL, SIGABRT, SIGFPE, SIGSEGV and SIGBUS.

By default, a Linux system does not write core dump files. Check with ulimit -c: an output of 0 means no core file will be generated; unlimited means the core file size is not limited.

Compiling with -g is not what enables the core dump itself; the core is written regardless. -g adds debug symbols so that the core file is actually useful when loaded into a debugger.

The core file's name and location are controlled by /proc/sys/kernel/core_pattern.
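
A minimal way to exercise this (a sketch: run `ulimit -c unlimited` first, then check /proc/sys/kernel/core_pattern to see where the core ends up):

/* Sketch: force a SIGSEGV so that the kernel writes a core dump.
 * Build with `gcc -g crash.c -o crash` (-g only adds symbols for the debugger),
 * run `ulimit -c unlimited` in the shell first, then after the crash inspect
 * the core with `gdb ./crash <corefile>`. */
#include <stdio.h>

int main(void)
{
    volatile int *p = NULL;
    printf("about to dereference NULL...\n");
    *p = 42;   /* SIGSEGV: default action is to terminate and dump core */
    return 0;
}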

linux 下core dump文件的生成以及错误定位_linux core dump-CSDN博客

Who actually saves the core file, and where does it usually end up? The kernel writes it according to core_pattern; if the pattern starts with '|', the dump is piped to a user-space handler such as systemd-coredump (which typically stores it under /var/lib/systemd/coredump, see coredumpctl).

Memcg OOM Strategy / oom_score

When a memcg OOMs, how does the kernel decide which process inside that memcg to kill?

You can see the oom_score of each of the processes in the /proc filesystem under the pid directory:

cat /proc/10292/oom_score

The higher the value of oom_score of any process, the higher is its likelihood of getting killed.

  • oom_score and oom_score_adj: every process has an oom_score that reflects how likely it is to be killed. The higher the score, the more likely the process is to be killed in an OOM event. An administrator can influence a process's OOM score through oom_score_adj, a tunable weight in the range -1000 to 1000; a lower value reduces the chance of being selected by the OOM killer (a small sketch of reading and adjusting these values follows).
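
As a small illustration (an assumption-laden sketch, not part of any tool mentioned above), the following reads a pid's oom_score and lowers its oom_score_adj through /proc; the write needs suitable privileges:

/* Sketch: read a pid's oom_score and lower its oom_score_adj via /proc.
 * Lowering oom_score_adj needs CAP_SYS_RESOURCE or root; the pid argument
 * is optional and defaults to the current process. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int pid = (argc > 1) ? atoi(argv[1]) : (int)getpid();
    char path[64];
    int score;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/oom_score", pid);
    f = fopen(path, "r");
    if (!f)
        return 1;
    if (fscanf(f, "%d", &score) != 1)
        score = -1;
    fclose(f);
    printf("pid %d oom_score = %d\n", pid, score);

    /* Make the process less attractive to the OOM killer, equivalent to:
     * echo -500 > /proc/<pid>/oom_score_adj */
    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", pid);
    f = fopen(path, "w");
    if (f) {
        fprintf(f, "-500\n");
        fclose(f);
    }
    return 0;
}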

How is the oom_score calculated?

The calculation turns into a simple question of what percentage of the available memory is being used by the process. If the system as a whole is short of memory, then "available memory" is the sum of all RAM and swap space available to the system.

If instead, the OOM situation is caused by exhausting the memory allowed to a given cpuset/control group, then "available memory" is the total amount allocated to that control group. A similar calculation is made if limits imposed by a memory policy have been exceeded. In each case, the memory use of the process is deemed to be the sum of its resident set (the number of RAM pages it is using) and its swap usage.

If there are 500 GB of available memory and one process uses 100 GB, it accounts for 20%; if another process lives in a memcg of only 10 GB and uses 3 GB, it accounts for 30%. Which of the two gets killed first? (Note that the two percentages belong to different OOM domains: the memcg percentage only matters when that memcg itself OOMs, while in a global OOM every process is scored against the whole system's RAM plus swap.)

This calculation produces a percent-times-ten number as a result; a process which is using every byte of the memory available to it will have a score of 1000, while a process using no memory at all will get a score of zero. There are very few heuristic tweaks to this score, but the code does still subtract a small amount (30) from the score of root-owned processes on the notion that they are slightly more valuable than user-owned processes.

One other tweak which is applied is to add the value stored in each process's oom_score_adj variable, which can be adjusted via /proc. This knob allows the adjustment of each process's attractiveness to the OOM killer in user space; setting it to -1000 will disable OOM kills entirely, while setting to +1000 is the equivalent of painting a large target on the associated process.
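
Turning the description above into code, here is a rough sketch of the score as described: percent of available memory, times ten, with a small deduction (30) for root-owned processes, plus oom_score_adj. The kernel's actual oom_badness() has since evolved, so treat this purely as an illustration of the text, applied to the two hypothetical processes from the question above:

/* Illustrative sketch of the oom_score calculation described above;
 * not the kernel's real oom_badness(), just the text turned into code. */
#include <stdio.h>

/* used:      resident set + swap usage of the process (in pages)
 * available: total RAM + swap, or the memcg/cpuset limit, depending on
 *            which kind of OOM is being handled */
static long long oom_score(long long used, long long available,
                           int is_root, int oom_score_adj)
{
    long long points = used * 1000 / available;   /* "percent times ten" */

    if (is_root)
        points -= 30;          /* root-owned processes get a small deduction */

    points += oom_score_adj;   /* user-space knob, -1000 .. +1000 */

    if (points < 0)
        points = 0;
    return points;             /* higher score => more likely to be killed */
}

int main(void)
{
    /* the two hypothetical processes from the question above, in 4 KiB pages */
    printf("100G of 500G (global OOM):     %lld\n",
           oom_score(100LL << 18, 500LL << 18, 0, 0));
    printf("3G of a 10G memcg (memcg OOM): %lld\n",
           oom_score(3LL << 18, 10LL << 18, 0, 0));
    return 0;
}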

User-space out-of-memory handling

When an OOM happens, user space can also intervene.

User-space out-of-memory handling [LWN.net]