Performance test for evaluating the overhead of tracing process memory activity

I was designing a PaaS service that needed to track the memory activity of hosted processes at runtime, treating each process as a black box, i.e., without knowing in advance which process would be run. The service did not need the process source code, or even its binary, in advance. The tracking recorded which memory addresses each hosted process reads and writes.

I tried Intel's PIN, a dynamic binary instrumentation tool: it provides hooks for injecting your own code into a process at runtime. Using the hooks for memory reads and writes, I added code that tracked the address accessed by every MOV instruction.
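The tracking logic itself is small. Below is a minimal Python sketch that simulates it, assuming the hooks deliver one callback per memory read or write together with the accessed address. The `MemoryTracker` class and the sample trace are hypothetical; in an actual PIN tool (written in C++), equivalent routines would be attached with `INS_InsertCall`, receiving the address through `IARG_MEMORYREAD_EA` or `IARG_MEMORYWRITE_EA`.

```python
from collections import defaultdict


class MemoryTracker:
    """Hypothetical per-address access log fed by memory read/write hooks."""

    def __init__(self) -> None:
        # address -> [read_count, write_count]
        self.accesses = defaultdict(lambda: [0, 0])

    def on_read(self, address: int) -> None:
        self.accesses[address][0] += 1

    def on_write(self, address: int) -> None:
        self.accesses[address][1] += 1

    def addresses_read(self) -> set:
        return {a for a, (reads, _) in self.accesses.items() if reads > 0}

    def addresses_written(self) -> set:
        return {a for a, (_, writes) in self.accesses.items() if writes > 0}


# Simulated trace standing in for what the instrumentation hooks would report.
tracker = MemoryTracker()
for kind, addr in [("r", 0x1000), ("w", 0x1008), ("r", 0x1000), ("w", 0x2000)]:
    if kind == "r":
        tracker.on_read(addr)
    else:
        tracker.on_write(addr)

print(sorted(hex(a) for a in tracker.addresses_read()))     # ['0x1000']
print(sorted(hex(a) for a in tracker.addresses_written()))  # ['0x1008', '0x2000']
```

Keeping a per-address counter rather than a plain set costs a little more memory but also shows how often each address is touched.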

The service worked, but then I wondered: are there a lot of MOV instructions in a typical process? More precisely:

  1. What percentage of all executed instructions are MOV instructions, when idle and under load?
  2. How many MOV instructions are executed per second, when idle and under load?
  3. If the instrumented process is a service, how much does the instrumentation degrade its response time (latency)?

To answer these questions, I instrumented the binaries of a popular database, MySQL, with Intel's PIN and measured the rate of memory read and write actions while idle and while MySQL was processing requests. The instrumentation code only counted memory actions and total executed instructions, i.e., no other overhead was added to the MySQL binary at runtime. I also measured MySQL request latency under this very basic instrumentation.
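The count-only instrumentation boils down to a few counters bumped on every executed instruction. The Python sketch below simulates that bookkeeping and derives the kind of metrics reported in Table 1 (rates per second and shares of the total). The `Counters` class and the sample instruction stream are hypothetical; in a real PIN tool the per-instruction flags would come from PIN's `INS_IsMemoryRead` and `INS_IsMemoryWrite` checks at instrumentation time.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical counters mirroring a count-only instrumentation."""
    total: int = 0   # every executed instruction
    reads: int = 0   # executed instructions that read memory
    writes: int = 0  # executed instructions that write memory

    def on_instruction(self, reads_memory: bool, writes_memory: bool) -> None:
        # One call per executed instruction; an instruction that both reads
        # and writes memory increments both counters.
        self.total += 1
        if reads_memory:
            self.reads += 1
        if writes_memory:
            self.writes += 1

    def summary(self, seconds: float) -> dict:
        return {
            "reads/sec": self.reads / seconds,
            "writes/sec": self.writes / seconds,
            "reads/total": self.reads / self.total,
            "writes/total": self.writes / self.total,
        }


# Simulated (reads_memory, writes_memory) flags observed over 2 seconds.
c = Counters()
for flags in [(True, False), (False, True), (False, False), (True, True)]:
    c.on_instruction(*flags)

print(c.summary(seconds=2.0))
# {'reads/sec': 1.0, 'writes/sec': 1.0, 'reads/total': 0.5, 'writes/total': 0.5}
```

This counts instructions, not individual memory operands, so an instruction with two memory reads is counted once as a read.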

Results summary

  1. Memory instructions make up a significant share of all executed instructions: ~25%.
  2. The rate of memory instructions can be very high: ~100M instructions/second.
  3. Application latency degrades significantly under instrumentation: ~16x slower.

Detailed results

Table 1 below compares memory reads and writes while idle and under a minor load.

Metric          Idle (no load)    Load (SELECT & INSERT operations)
writes/sec      20K               180M
reads/sec       50K               100M
reads/total     14%               30%
writes/total    25%               25%

Table 1

Table 2 below compares request latency with and without instrumentation. The instrumentation code incremented two counters for each memory instruction (on memory reads and writes). The load consisted of 10x SELECT and INSERT requests to MySQL. With instrumentation, the mysqld process ran at 97% CPU during the load.

Metric                   MySQL without instrumentation    MySQL with instrumentation
Latency (milliseconds)   160                              2600

Table 2
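The slowdown quoted in the results summary follows directly from the two latency figures in Table 2; no other data is involved. A quick check in Python:

```python
# Latency figures from Table 2, in milliseconds.
baseline_ms = 160
instrumented_ms = 2600

slowdown = instrumented_ms / baseline_ms     # ratio of instrumented to baseline
added_ms = instrumented_ms - baseline_ms     # absolute overhead per request
overhead_pct = 100 * added_ms / baseline_ms  # relative increase

print(f"slowdown: {slowdown:.2f}x, added latency: {added_ms} ms, "
      f"increase: {overhead_pct:.0f}%")
# slowdown: 16.25x, added latency: 2440 ms, increase: 1525%
```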
