I was designing a PaaS service that needed to track the memory actions of hosted processes at runtime as a black box, i.e., without knowing in advance which process would be hosted. The service could not rely on having the process's source code, or even its binary, in advance. The memory tracking covered which memory addresses the hosted processes read and wrote.
I tried Intel’s PIN, a dynamic binary instrumentation tool: it provides hooks for injecting your own code into a process at runtime. Using the hooks for memory reads and writes, I added code that tracked the addresses used in all MOV instructions.
The service worked, but then I wondered: are there a lot of MOV instructions in a typical process? More precisely:
- What percentage of all executed instructions are MOV instructions? When idle? Under load?
- How many MOV instructions are executed every second? When idle? Under load?
- If the instrumented process is a service, how much does the instrumentation degrade its response time (latency)?
To answer these questions, I tested MySQL, a popular database, with its binaries instrumented by Intel’s PIN, and measured the rate of memory read/write actions while idle and while MySQL was processing requests. The instrumentation only counted memory actions and total instructions, i.e., no other overhead was added to the MySQL binary at runtime. I also measured MySQL request latency under this very basic instrumentation.
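As a minimal sketch of how such raw counters turn into the metrics reported below (rate per second and share of all instructions), assuming the counts are read once per measurement window. The `Counters` type, the `summarize` function, and the sample numbers are hypothetical and are not taken from the actual instrumentation code:

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Raw counts collected over one measurement window (hypothetical names)."""
    reads: int         # memory-read operations observed
    writes: int        # memory-write operations observed
    instructions: int  # all executed instructions
    seconds: float     # length of the measurement window


def summarize(c: Counters) -> dict:
    """Turn raw counts into per-second rates and shares of all instructions."""
    if c.seconds <= 0 or c.instructions <= 0:
        raise ValueError("window length and instruction count must be positive")
    return {
        "read_per_sec": c.reads / c.seconds,
        "write_per_sec": c.writes / c.seconds,
        "read_share": c.reads / c.instructions,
        "write_share": c.writes / c.instructions,
    }


# Made-up numbers for illustration only; these are not measurements from the test.
sample = Counters(reads=300, writes=250, instructions=1000, seconds=2.0)
stats = summarize(sample)
print(stats)
```

Keeping the instrumentation to bare counters and doing the division afterwards keeps the work added to each instrumented instruction as small as possible.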
Results summary
- Memory instructions make up a significant share of all executed instructions: ~25%.
- The memory instruction rate can be very high: ~100M instructions/second.
- Latency degrades significantly under instrumentation: ~16x slower.
Detailed results
Table 1 below compares memory read/write activity while idle and under a minor load.
Metric | Idle (no load) | Load (SELECT and INSERT operations) |
write/sec | 20K | 180M |
read/sec | 50K | 100M |
read/total | 14% | 30% |
write/total | 25% | 25% |
Table 1
Table 2 below compares latency with and without instrumentation. The instrumentation incremented two counters for each memory instruction, covering memory reads and writes. The load was composed of 10x SELECT and INSERT requests to MySQL. The mysqld process was at 97% CPU during the load with instrumentation.
Metric | MySQL without instrumentation | MySQL with instrumentation |
latency (millisecond) | 160 | 2600 |
Table 2
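The slowdown factor quoted in the summary follows directly from the two latencies in Table 2. A small sketch of that arithmetic, where the `slowdown` helper is a hypothetical name used only for illustration:

```python
def slowdown(baseline_ms: float, instrumented_ms: float) -> float:
    """Return how many times slower the instrumented run is than the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return instrumented_ms / baseline_ms


# Latencies from Table 2, in milliseconds.
baseline_ms = 160
instrumented_ms = 2600

factor = slowdown(baseline_ms, instrumented_ms)
added_ms = instrumented_ms - baseline_ms

print(f"slowdown: {factor:.2f}x")        # slowdown: 16.25x
print(f"added latency: {added_ms} ms")   # added latency: 2440 ms
```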