CPU Flame Graph of Slab Allocators

What is a CPU Flame Graph

FlameGraph is a great visualization tool made by Brendan Gregg. It visualizes sampled call stacks and helps you understand which call stacks consume the most CPU time. Below is an example of a flame graph:

CPU Flame Graph of Slab allocators

If this is your first time using FlameGraph, read Brendan’s great tutorial. I used perf to collect the profiles.
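A typical perf-based workflow can be sketched as below. This is only a sketch under assumptions, not necessarily the exact commands behind the graphs in this post: the 99 Hz sampling rate, the bare `hackbench` invocation, the `out.svg` file name and the `./FlameGraph` checkout path are all placeholders. The helper `flamegraph_commands` is a hypothetical name that just builds the two shell command lines.

```python
import shlex

# Assumed location of a checkout of Brendan Gregg's FlameGraph repository.
FLAMEGRAPH_DIR = "./FlameGraph"


def flamegraph_commands(workload, freq_hz=99, out_svg="out.svg"):
    """Return shell commands for a perf -> folded stacks -> SVG pipeline."""
    # Sample all CPUs (-a) with call graphs (-g) while the workload runs.
    record = f"perf record -F {freq_hz} -a -g -- {shlex.join(workload)}"
    # Dump the samples, fold each stack into one line, then render the SVG.
    render = (
        "perf script"
        f" | {FLAMEGRAPH_DIR}/stackcollapse-perf.pl"
        f" | {FLAMEGRAPH_DIR}/flamegraph.pl > {shlex.quote(out_svg)}"
    )
    return [record, render]


if __name__ == "__main__":
    for cmd in flamegraph_commands(["hackbench"]):
        print(cmd)
```

Running the printed commands in order (usually as root) should leave an SVG that opens in a browser.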

In this article, I’ll show you flame graphs of hackbench, which is a scheduler benchmark. It is also used to measure the performance of slab allocators because it uses them intensively.

Before running perf, I compiled the kernel with -fno-indirect-inlining and removed every inline keyword in mm/sl[auo]b.c and mm/slab_common.c. Uninlining functions adds function call overhead, but it makes the flame graph easier to analyze because each function shows up as its own frame.
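Removing the keyword by hand is tedious, so here is one way the edit could be scripted. This is a rough sketch, not the procedure actually used for this post: `strip_inline` is a hypothetical helper, it works on plain text (so it would also touch the word inside comments or strings), and it deliberately leaves identifiers such as `__always_inline` alone.

```python
import re

# Matches only the standalone keyword. "\b" needs a word boundary, so
# identifiers like __always_inline or noinline are not matched.
INLINE_RE = re.compile(r"\binline\b[ \t]*")


def strip_inline(source):
    """Return C source text with every standalone `inline` keyword removed."""
    return INLINE_RE.sub("", source)


if __name__ == "__main__":
    # example_fn is a made-up declaration, not a real kernel function.
    print(strip_inline("static inline int example_fn(void)"))
```

Review the resulting diff before building, since a text-level substitution knows nothing about C syntax.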

SLUB

This flame graph shows that hackbench spends most of its time (96.41%) in the read/write system calls.
It spends 11.72% of its time on allocation (__kmalloc_node_track_caller (6.06%) + kmem_cache_alloc_node (5.66%)),
and 13.30% on deallocation (kmem_cache_free (6.74%) + kfree (6.56%)).

In total, SLUB consumes 25.02% of the total time in hackbench.
Roughly 18~19% of its allocation time is spent in the slowpath (___slab_alloc).
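The percentages read off a flame graph are shares of sampled stacks that contain a given function. As a rough illustration, the sketch below computes those shares from folded stacks, the `frame;frame;frame count` lines that stackcollapse-perf.pl emits. The function name `inclusive_share` is hypothetical, and the sample data is made up for the example; it is not the data behind the graphs in this post.

```python
from collections import Counter


def inclusive_share(folded_lines):
    """Map each function to the percentage of samples whose stack contains it."""
    hits = Counter()
    samples = 0
    for line in folded_lines:
        # Each folded line is "frame;frame;...;frame <sample count>".
        stack, _, count = line.rpartition(" ")
        n = int(count)
        samples += n
        # A set keeps recursive frames from being counted twice per stack.
        for func in set(stack.split(";")):
            hits[func] += n
    return {func: 100.0 * n / samples for func, n in hits.items()}


# Made-up folded stacks for illustration only.
SAMPLE = [
    "hackbench;write;kmem_cache_alloc_node 6",
    "hackbench;write;__kmalloc_node_track_caller 6",
    "hackbench;read;kmem_cache_free 7",
    "hackbench;read;kfree 6",
    "hackbench;read;copy_to_user 75",
]

if __name__ == "__main__":
    share = inclusive_share(SAMPLE)
    for func in ("kmem_cache_alloc_node", "kfree", "read"):
        print(f"{func}: {share[func]:.2f}%")
```

Summing the shares of the allocation and free entry points gives the per-allocator totals quoted in this post, as long as those functions never appear in the same stack.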

SLAB

Similarly, SLAB consumes 10.5% for allocation and 9.82% for deallocation.
In total, SLAB consumes 20.32% of the total time in hackbench.

SLOB

SLOB uses 68.89% of the total time in hackbench, which is very high. The core reason is that SLOB takes a single global lock, slob_lock, on every allocation and deallocation. Because the lock is so coarse, roughly 98~99% of the allocation/deallocation time is spent just waiting for it.

But this is OK, because SLOB was not written for workloads where allocations are this frequent. SLOB is meant for machines with very little memory, and its global lock is hard to avoid there, because per-cache locking would increase memory usage.
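The difference between the two locking schemes can be pictured with a toy model. This is not kernel code and says nothing about real timings: both classes and their method names are hypothetical. It only shows that a global lock serializes every cache through one object, while per-cache locking needs one extra lock object per cache.

```python
import threading


class GlobalLockAllocator:
    """Toy model: every cache shares one lock (the coarse, SLOB-like design)."""

    def __init__(self, cache_names):
        self._lock = threading.Lock()
        self._live = {name: 0 for name in cache_names}

    def lock_for(self, name):
        # All caches contend on the same lock object.
        return self._lock

    def alloc(self, name):
        with self.lock_for(name):
            self._live[name] += 1

    def live(self, name):
        return self._live[name]


class PerCacheLockAllocator(GlobalLockAllocator):
    """Toy model: one lock per cache, at the cost of one lock object each."""

    def __init__(self, cache_names):
        super().__init__(cache_names)
        self._locks = {name: threading.Lock() for name in cache_names}

    def lock_for(self, name):
        # Allocations from different caches no longer block each other.
        return self._locks[name]
```

With the global variant, `lock_for("a")` and `lock_for("b")` return the same object, so threads working on unrelated caches still wait on each other.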

This post is licensed under CC BY 4.0 by the author.
