HBase (and YARN, and containers, if colocating Hadoop MR). Sum the -Xmx
settings of every JVM executing on the server. The total should be less
than the total amount of RAM available on the server. Additionally, you
will want to reserve ~1GB for
the OS. Finally, set vm.swappiness=0 in /etc/sysctl.conf to prevent
> The snippet from /var/log/messages is as follows. I am sure the killed
> process (22827) is the RegionServer.
> ......
> May 14 12:00:38 localhost kernel: Mem-Info:
> May 14 12:00:38 localhost kernel: Node 0 DMA per-cpu:
> May 14 12:00:38 localhost kernel: CPU 0: hi: 0, btch: 1 usd: 0
> ......
> May 14 12:00:38 localhost kernel: CPU 39: hi: 0, btch: 1 usd: 0
> May 14 12:00:38 localhost kernel: Node 0 DMA32 per-cpu:
> May 14 12:00:38 localhost kernel: CPU 0: hi: 186, btch: 31 usd: 30
> ......
> May 14 12:00:38 localhost kernel: CPU 39: hi: 186, btch: 31 usd: 8
> May 14 12:00:38 localhost kernel: Node 0 Normal per-cpu:
> May 14 12:00:38 localhost kernel: CPU 0: hi: 186, btch: 31 usd: 5
> ......
> May 14 12:00:38 localhost kernel: CPU 39: hi: 186, btch: 31 usd: 20
> May 14 12:00:38 localhost kernel: Node 1 Normal per-cpu:
> May 14 12:00:38 localhost kernel: CPU 0: hi: 186, btch: 31 usd: 7
> ......
> May 14 12:00:38 localhost kernel: CPU 39: hi: 186, btch: 31 usd: 10
> May 14 12:00:38 localhost kernel: active_anon:7993118 inactive_anon:48001 isolated_anon:0
> May 14 12:00:38 localhost kernel: active_file:855 inactive_file:960 isolated_file:0
> May 14 12:00:38 localhost kernel: unevictable:0 dirty:0 writeback:0 unstable:0
> May 14 12:00:38 localhost kernel: free:39239 slab_reclaimable:14043 slab_unreclaimable:27993
> May 14 12:00:38 localhost kernel: mapped:48750 shmem:75053 pagetables:20540 bounce:0
> May 14 12:00:38 localhost kernel: Node 0 DMA free:15732kB min:40kB
> low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? yes
> May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 3211 16088 16088
> May 14 12:00:38 localhost kernel: Node 0 DMA32 free:60388kB min:8968kB
> low:11208kB high:13452kB active_anon:2811676kB inactive_anon:72kB
> active_file:0kB inactive_file:788kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:3288224kB mlocked:0kB dirty:0kB writeback:44kB
> mapped:156kB shmem:8232kB slab_reclaimable:10652kB
> slab_unreclaimable:5144kB kernel_stack:56kB pagetables:4252kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:1312 all_unreclaimable? yes
> May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 12877 12877
> May 14 12:00:38 localhost kernel: Node 0 Normal free:35772kB min:35964kB
> low:44952kB high:53944kB active_anon:13062472kB inactive_anon:4864kB
> active_file:1268kB inactive_file:1504kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:13186560kB mlocked:0kB dirty:0kB writeback:92kB
> mapped:6172kB shmem:51928kB slab_reclaimable:22732kB
> slab_unreclaimable:73204kB kernel_stack:16240kB pagetables:38040kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:10268
> all_unreclaimable? yes
> May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
> May 14 12:00:38 localhost kernel: Node 1 Normal free:45064kB min:45132kB
> low:56412kB high:67696kB active_anon:16098324kB inactive_anon:187068kB
> active_file:2192kB inactive_file:1548kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:16547840kB mlocked:0kB dirty:116kB writeback:0kB
> mapped:188672kB shmem:240052kB slab_reclaimable:22788kB
> slab_unreclaimable:33624kB kernel_stack:7352kB pagetables:39868kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12064
> all_unreclaimable? yes
> May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
> May 14 12:00:38 localhost kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 1*32kB
> 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15732kB
> May 14 12:00:38 localhost kernel: Node 0 DMA32: 659*4kB 576*8kB 485*16kB
> 338*32kB 208*64kB 106*128kB 27*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB =
> 60636kB
> May 14 12:00:38 localhost kernel: Node 0 Normal: 1166*4kB 579*8kB 337*16kB
> 203*32kB 106*64kB 61*128kB 3*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB =
> 37568kB
> May 14 12:00:38 localhost kernel: Node 1 Normal: 668*4kB 405*8kB 422*16kB
> 259*32kB 176*64kB 67*128kB 7*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB =
> 43608kB
> May 14 12:00:38 localhost kernel: 78257 total pagecache pages
> May 14 12:00:38 localhost kernel: 0 pages in swap cache
> May 14 12:00:38 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
> May 14 12:00:38 localhost kernel: Free swap = 0kB
> May 14 12:00:38 localhost kernel: Total swap = 0kB
> May 14 12:00:38 localhost kernel: 8388607 pages RAM
> May 14 12:00:38 localhost kernel: 181753 pages reserved
> May 14 12:00:38 localhost kernel: 77957 pages shared
> May 14 12:00:38 localhost kernel: 8104642 pages non-shared
> May 14 12:00:38 localhost kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> ......
> May 14 12:00:38 localhost kernel: [22827]   483 22827  4392305  4074129  23       0             0 java
> May 14 12:00:38 localhost kernel: [38727]   483 38727   428355    74385  22       0             0 java
> ......
> May 14 12:00:38 localhost kernel: Out of memory: Kill process 22827 (java) score 497 or sacrifice child
> May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java) total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB
> May 14 12:00:38 localhost kernel: sleep invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
> May 14 12:00:38 localhost kernel: sleep cpuset=/ mems_allowed=0-1
> May 14 12:00:38 localhost kernel: Pid: 31136, comm: sleep Not tainted 2.6.32-358.el6.x86_64 #1
> May 14 12:00:38 localhost kernel: Call Trace:
> May 14 12:00:38 localhost kernel: [<ffffffff810cb5d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
> May 14 12:00:38 localhost kernel: [<ffffffff8111cd10>] ? dump_header+0x90/0x1b0
> May 14 12:00:38 localhost kernel: [<ffffffff810e91ee>] ? __delayacct_freepages_end+0x2e/0x30
> May 14 12:00:38 localhost kernel: [<ffffffff8121d0bc>] ? security_real_capable_noaudit+0x3c/0x70
> May 14 12:00:38 localhost kernel: [<ffffffff8111d192>] ? oom_kill_process+0x82/0x2a0
> May 14 12:00:38 localhost kernel: [<ffffffff8111d0d1>] ? select_bad_process+0xe1/0x120
> May 14 12:00:38 localhost kernel: [<ffffffff8111d5d0>] ? out_of_memory+0x220/0x3c0
> May 14 12:00:38 localhost kernel: [<ffffffff8112c27c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
> May 14 12:00:38 localhost kernel: [<ffffffff8116087a>] ? alloc_pages_current+0xaa/0x110
> May 14 12:00:38 localhost kernel: [<ffffffff8111a0f7>] ? __page_cache_alloc+0x87/0x90
> May 14 12:00:38 localhost kernel: [<ffffffff81119ade>] ? find_get_page+0x1e/0xa0
> May 14 12:00:38 localhost kernel: [<ffffffff8111b0b7>] ? filemap_fault+0x1a7/0x500
> May 14 12:00:38 localhost kernel: [<ffffffff811430b4>] ? __do_fault+0x54/0x530
> May 14 12:00:38 localhost kernel: [<ffffffff81059784>] ? find_busiest_group+0x244/0x9f0
> May 14 12:00:38 localhost kernel: [<ffffffff81143687>] ? handle_pte_fault+0xf7/0xb50
> May 14 12:00:38 localhost kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
> May 14 12:00:38 localhost kernel: [<ffffffff8114431a>] ? handle_mm_fault+0x23a/0x310
> May 14 12:00:38 localhost kernel: [<ffffffff810474c9>] ? __do_page_fault+0x139/0x480
> May 14 12:00:38 localhost kernel: [<ffffffff8109be2f>] ? hrtimer_try_to_cancel+0x3f/0xd0
> May 14 12:00:38 localhost kernel: [<ffffffff8109bee2>] ? hrtimer_cancel+0x22/0x30
> May 14 12:00:38 localhost kernel: [<ffffffff8150f1b3>] ? do_nanosleep+0x93/0xc0
> May 14 12:00:38 localhost kernel: [<ffffffff8109bfb4>] ? hrtimer_nanosleep+0xc4/0x180
> May 14 12:00:38 localhost kernel: [<ffffffff8109ae00>] ? hrtimer_wakeup+0x0/0x30
> May 14 12:00:38 localhost kernel: [<ffffffff8151311e>] ? do_page_fault+0x3e/0xa0
> May 14 12:00:38 localhost kernel: [<ffffffff815104d5>] ? page_fault+0x25/0x30
> ......
>
> At 2015-05-16 02:39:02, "iain wright" wrote:
> >What log is this seen in? Can you paste the log line? Do you mean
> >/var/log/messages?
> >On May 12, 2015 7:44 PM, "David chen" wrote:
> >
> >> A RegionServer was killed because of an OutOfMemory (OOM) condition.
> >> Although the killed process can be seen in the Linux message log, I
> >> still have two questions:
> >> 1. How can I find the root cause of the OOM?
> >> 2. When the RegionServer runs out of memory, why can't it free some of
> >> the memory it occupies? If it could, would the OOM killer still be needed?
> >> Any ideas would be appreciated!
>
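To make the sizing advice at the top of this message and the kernel log quoted above easier to check, here is a minimal Python sketch. It assumes 4 KiB pages (the x86_64 default) and the unwrapped message format printed by this RHEL 6 kernel (2.6.32-358.el6); other kernel versions word these lines differently. The function names and the -Xmx figures in the usage lines are hypothetical illustrations, not values from this cluster.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, the x86_64 default

# OOM killer summary line, e.g.
# "Killed process 22827, UID 483, (java) total-vm:...kB, anon-rss:...kB, file-rss:...kB"
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<name>[^)]+)\)\s+"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

# Per-zone summary, e.g. "Node 0 Normal free:35772kB min:35964kB ..."
ZONE_RE = re.compile(r"(?P<zone>Node \d+ \w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB")


def parse_killed(line):
    """Return the victim's pid, uid, name and memory figures (in kB), or None."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {k: (v if k == "name" else int(v)) for k, v in m.groupdict().items()}


def zones_below_min(text):
    """List the zones whose free memory had fallen below their 'min' watermark."""
    return [m["zone"] for m in ZONE_RE.finditer(text) if int(m["free"]) < int(m["min"])]


def pages_to_gib(pages):
    """Convert a page count ('pages RAM', the rss column, active_anon) to GiB."""
    return pages * PAGE_KB / (1024 * 1024)


def heap_budget_ok(xmx_gib, ram_gib, os_reserve_gib=1.0):
    """True if the summed -Xmx of every JVM plus an OS reserve stays below RAM.

    -Xmx caps only the Java heap, so actual resident usage runs somewhat higher.
    """
    return sum(xmx_gib) + os_reserve_gib < ram_gib


if __name__ == "__main__":
    killed = parse_killed(
        "Killed process 22827, UID 483, (java) "
        "total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB"
    )
    zones = (
        "Node 0 DMA free:15732kB min:40kB\n"
        "Node 0 DMA32 free:60388kB min:8968kB\n"
        "Node 0 Normal free:35772kB min:35964kB\n"
        "Node 1 Normal free:45064kB min:45132kB\n"
    )
    ram_gib = pages_to_gib(8388607)  # from "8388607 pages RAM"
    rss_gib = (killed["anon_rss"] + killed["file_rss"]) / (1024 * 1024)
    print(f"victim pid={killed['pid']} rss={rss_gib:.1f} GiB of {ram_gib:.1f} GiB RAM")
    print("zones below min:", zones_below_min(zones))
    print("host active_anon GiB:", round(pages_to_gib(7993118), 1))
    # Hypothetical -Xmx values (GiB): RegionServer, DataNode, one MR container
    print("heap budget ok:", heap_budget_ok([16, 4, 8], ram_gib))
```

Run on the excerpt above, it reports pid 22827 at about 15.5 GiB resident on a host of about 32 GiB, and flags the Normal zones of both NUMA nodes as below their min watermark (alongside "all_unreclaimable? yes" and "Total swap = 0kB"). The host-wide active_anon:7993118 pages works out to about 30.5 GiB, so the RegionServer held only about half of the anonymous memory; the other JVMs and processes on the box belong in the -Xmx sum as well.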
Problems worthy of attack prove their worth by hitting back. - Piet Hein