Skip to content

2019

Java, The Cost of a Single Element Loop

In quite a few cases I have seen myself designing code with listeners and callbacks. It is quite common for a class that emits events, to expose an API to attach listener(s) to it. Those listeners are usually stored in a data structure (see List, Set, Array) and when an event is about to be dispatched the listeners are iterated in a loop and the appropriate callback is called.

Something along the lines of:

public class EventDispatcher {

        private final List<Listener> listeners = new ArrayList<>();

        public void dispatchEvent() {
            final MyEvent event = new MyEvent();
            for (Listener listener : this.listeners) {
                listener.onEvent(event);
            }
        }
        public void attachListener(final Listener listener) {
            this.listeners.add(listener);
        }

        public void removeListener(final Listener listener) {
            this.listeners.remove(listener);
        }
    }

    public static class Listener implements EventListener {

        void onEvent(final MyEvent myEvent) {
            // do staff
        }
    }

    public static class MyEvent {

    }

In many cases I have observed that despite the fact that the class is desinged to accept many listeners, the true is actually that just one listener is attached in the majority of the cases.

Hence I wanted to measure the performance penalty paid in case the class had just one listener vs if the class was initially designed to accept just one listener.

In essence I wanted to check the performance impact on the below two cases.

private Listener listener;
        private final List<Listener> singleElementArray = new ArrayList<Listener>(){
            {add(new Listener());}
        };

        public void dispatch() {
            this.listener.onEvent(new MyEvent());
        }

        public void dispatchInLoop() {
            for (int i = 0; i < 1; i++) {
                this.singleElementArray.get(i).onEvent(new MyEvent());
            }
        }

Assumptions Made Prior To Testing

Before creating a benchmark for the above, I made some assumptions:

  • I assumed the single element (single listener in a data container) loop would be unrolled
  • I (wrongly) assumed that the performane cost will not be significant. As effectively with the loop unrolled I would think the native code produced would more or less look close enough

JMH Benchmark

In order to test my assumptions I created the below benchmark:

SingleElementLoopBenchmark.java

Initial Observations

To my surprise I found out that an invocaiton on a single element list was about ~2,5 slower, based on the below throughput numbers:

Benchmark                                                          Mode  Cnt   Score   Error   Units
          SingleElementLoopBenchmark.directInvocation                       thrpt   10   0.317 ± 0.022  ops/ns
          SingleElementLoopBenchmark.singleElementListLoopInvocation        thrpt   10   0.114 ± 0.010  ops/ns

I couldn't really understand why and the above seemed a bit too far from my expecations/assumptions.

The first thing that I verified with JVM argument -XX:+PrintCompilation was that both methods were compiled with C2 compiler, which was the case.

I also tried to print the assembly code with -XX:+PrintAssembly but I couldn't really read/interpret the assembly code.

Resorting to Social Media

I ended up posting a tweet about my findings and asking some pointer on where/how to look for explanations on what I was observing. The answer I got was to try to find the hot methods by using something like perfasm, which would tie the assembly output to the hottest methods of my benchmark.

Which I did with -prof dtraceasm (The benchmark was running on a Mac that's why I used dtrace). The output was the below:

Direct Invocation

9.56%  ↗  0x000000010b73c950: mov    0x40(%rsp),%r10
  1.00%  │  0x000000010b73c955: mov    0xc(%r10),%r10d                ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@1 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
  0.17%  │  0x000000010b73c959: mov    0xc(%r12,%r10,8),%r11d         ; implicit exception: dispatches to 0x000000010b73ca12
 11.18%  │  0x000000010b73c95e: test   %r11d,%r11d
  0.00%  │  0x000000010b73c961: je     0x000000010b73c9c9             ;*invokevirtual performAction {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
 10.69%  │  0x000000010b73c963: mov    %r9,(%rsp)
  0.65%  │  0x000000010b73c967: mov    0x38(%rsp),%rsi
  0.00%  │  0x000000010b73c96c: mov    $0x1,%edx
  0.14%  │  0x000000010b73c971: xchg   %ax,%ax
 10.08%  │  0x000000010b73c973: callq  0x000000010b6c2900             ; ImmutableOopMap{[48]=Oop [56]=Oop [64]=Oop [0]=Oop }
         │                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
         │                                                            ;   {optimized virtual_call}
  1.44%  │  0x000000010b73c978: mov    (%rsp),%r9
  0.19%  │  0x000000010b73c97c: movzbl 0x94(%r9),%r8d                 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
  9.77%  │  0x000000010b73c984: mov    0x108(%r15),%r10
  0.99%  │  0x000000010b73c98b: add    $0x1,%rbp                      ; ImmutableOopMap{r9=Oop [48]=Oop [56]=Oop [64]=Oop }
         │                                                            ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
  0.02%  │  0x000000010b73c98f: test   %eax,(%r10)                    ;   {poll}
  0.28%  │  0x000000010b73c992: test   %r8d,%r8d
  0.00%  ╰  0x000000010b73c995: je     0x000000010b73c950             ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}

Single Element Loop Invocation

         ╭    0x000000011153fa9d: jmp    0x000000011153fad6
  0.19%  │ ↗  0x000000011153fa9f: mov    0x58(%rsp),%r13
  3.55%  │ │  0x000000011153faa4: mov    (%rsp),%rcx
  0.09%  │ │  0x000000011153faa8: mov    0x60(%rsp),%rdx
  0.22%  │ │  0x000000011153faad: mov    0x50(%rsp),%r11
  0.17%  │ │  0x000000011153fab2: mov    0x8(%rsp),%rbx                 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
         │ │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
         │ │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
         │ │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.55%  │↗│  0x000000011153fab7: movzbl 0x94(%r11),%r8d                ;*goto {reexecute=0 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@35 (line 44)
         │││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.16%  │││  0x000000011153fabf: mov    0x108(%r15),%r10
  0.28%  │││  0x000000011153fac6: add    $0x1,%rbx                      ; ImmutableOopMap{r11=Oop rcx=Oop rdx=Oop r13=Oop }
         │││                                                            ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
  0.19%  │││  0x000000011153faca: test   %eax,(%r10)                    ;   {poll}
  4.00%  │││  0x000000011153facd: test   %r8d,%r8d
         │││  0x000000011153fad0: jne    0x000000011153fbe9             ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@33 (line 124)
  0.07%  ↘││  0x000000011153fad6: mov    0xc(%rcx),%r8d                 ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.22%   ││  0x000000011153fada: mov    0x10(%r12,%r8,8),%r10d         ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
          ││                                                            ; implicit exception: dispatches to 0x000000011153ff2a
  0.21%   ││  0x000000011153fadf: mov    0x8(%r12,%r10,8),%edi          ; implicit exception: dispatches to 0x000000011153ff3e
  4.39%   ││  0x000000011153fae4: cmp    $0x237565,%edi                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
          ││  0x000000011153faea: jne    0x000000011153fc92
  0.33%   ││  0x000000011153faf0: lea    (%r12,%r10,8),%r9              ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.09%   ││  0x000000011153faf4: mov    0x10(%r9),%r9d
  0.14%   ││  0x000000011153faf8: test   %r9d,%r9d
          ╰│  0x000000011153fafb: jle    0x000000011153fab7             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.98%    │  0x000000011153fafd: lea    (%r12,%r8,8),%rdi              ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.06%    │  0x000000011153fb01: xor    %r9d,%r9d                      ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@15 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.09%    │  0x000000011153fb04: mov    0x8(%r12,%r10,8),%esi          ; implicit exception: dispatches to 0x000000011153ff4e
  0.06%    │  0x000000011153fb09: cmp    $0x237565,%esi                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
  0.00%    │  0x000000011153fb0f: jne    0x000000011153fcc2
  3.93%    │  0x000000011153fb15: lea    (%r12,%r10,8),%rax             ;*invokeinterface get {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.06%    │  0x000000011153fb19: mov    0x10(%rax),%r10d               ;*getfield size {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::get@2 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.11%    │  0x000000011153fb1d: test   %r10d,%r10d
           │  0x000000011153fb20: jl     0x000000011153fcf6             ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.Objects::checkIndex@3 (line 372)
           │                                                            ; - java.util.ArrayList::get@5 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.28%    │  0x000000011153fb26: cmp    %r10d,%r9d
  0.00%    │  0x000000011153fb29: jae    0x000000011153fc1c
  3.97%    │  0x000000011153fb2f: mov    0x14(%rax),%r10d               ;*getfield elementData {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::elementData@1 (line 442)
           │                                                            ; - java.util.ArrayList::get@11 (line 459)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.05%    │  0x000000011153fb33: mov    %r9d,%ebp                      ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.Objects::checkIndex@3 (line 372)
           │                                                            ; - java.util.ArrayList::get@5 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.08%    │  0x000000011153fb36: mov    0xc(%r12,%r10,8),%esi          ; implicit exception: dispatches to 0x000000011153ff62
  1.27%    │  0x000000011153fb3b: cmp    %esi,%ebp
           │  0x000000011153fb3d: jae    0x000000011153fc5a
  3.94%    │  0x000000011153fb43: shl    $0x3,%r10
  0.05%    │  0x000000011153fb47: mov    0x10(%r10,%rbp,4),%r9d         ;*aaload {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::elementData@5 (line 442)
           │                                                            ; - java.util.ArrayList::get@11 (line 459)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  1.71%    │  0x000000011153fb4c: mov    0x8(%r12,%r9,8),%r10d          ; implicit exception: dispatches to 0x000000011153ff72
 17.85%    │  0x000000011153fb51: cmp    $0x237522,%r10d                ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Listener')}
  0.00%    │  0x000000011153fb58: jne    0x000000011153fef6             ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@25 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.79%    │  0x000000011153fb5e: mov    %rdi,0x18(%rsp)
  0.02%    │  0x000000011153fb63: mov    %r8d,0x10(%rsp)
  0.02%    │  0x000000011153fb68: mov    %rbx,0x8(%rsp)
  0.19%    │  0x000000011153fb6d: mov    %r11,0x50(%rsp)
  3.95%    │  0x000000011153fb72: mov    %rdx,0x60(%rsp)
  0.02%    │  0x000000011153fb77: mov    %rcx,(%rsp)
  0.03%    │  0x000000011153fb7b: mov    %r13,0x58(%rsp)
  0.36%    │  0x000000011153fb80: mov    %rdx,%rsi
  3.78%    │  0x000000011153fb83: mov    $0x1,%edx
  0.01%    │  0x000000011153fb88: vzeroupper
  4.05%    │  0x000000011153fb8b: callq  0x00000001114c2900             ; ImmutableOopMap{[80]=Oop [88]=Oop [96]=Oop [0]=Oop [16]=NarrowOop [24]=Oop }
           │                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@29 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
           │                                                            ;   {optimized virtual_call}
  0.98%    │  0x000000011153fb90: mov    0x10(%rsp),%r8d
  3.61%    │  0x000000011153fb95: mov    0x10(%r12,%r8,8),%r10d         ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.24%    │  0x000000011153fb9a: mov    0x8(%r12,%r10,8),%r9d          ; implicit exception: dispatches to 0x000000011153ff9e
  0.74%    │  0x000000011153fb9f: inc    %ebp                           ;*iinc {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@32 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.04%    │  0x000000011153fba1: cmp    $0x237565,%r9d                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
  0.00%    │  0x000000011153fba8: jne    0x000000011153fd36
  3.60%    │  0x000000011153fbae: lea    (%r12,%r10,8),%r11             ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.11%    │  0x000000011153fbb2: mov    0x10(%r11),%r9d
  0.35%    │  0x000000011153fbb6: cmp    %r9d,%ebp
           ╰  0x000000011153fbb9: jge    0x000000011153fa9f             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                                        ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
                                                                        ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
                                                                        ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)

As I said I am not really able to read/interpret assembly code, but in between the lines I could see that:

  • The loop was indeed unrolled
  • A penalty was paid to cast the item to the expected type (17.85% of the CPU instructions)
  • A penalty was paid to fetch the item from the list, underlying array

In order to get some advice from someone knowledgable on this I posted the below question on StackOverflow. The answer is pretty comprehensive, as the person who answered is one of the most prominent names in JVM community

StackOveflow: Java Method Direct Invocation vs Single Element Loop

Conclusion/Observations

In summary:

  • The loop was indeed unrolled, as expected and as seen from the assembly code
  • The main penalty paid is for fetching the element from the list and casting it to the expected type
  • Some cost is also because of checks performed on the data container itself (i.e. size)
  • In general the extra cost been paid is memory access cost, rather than CPU instructions cost

As seen in the SO answer, Andrei makes the point that invoking the object's method from inside the loop is not ~2,5 times slower, but rather 3 ns slower, if we look it from a perspective of latency (ns/op) rather than throughput (ops/ns). This is a valid point, but I am not sure If i aggree 100%, as in some applications, depending on the nature, that extra cost will actually translate in ~2,5.

Finally, I have added in the JMH Benchmark test, tests for different data container types:

  • Array
  • List
  • Set

Observing the numbers of those, and as expected, an array is faster than the rest. The array is typed, hence the casting cost is not paid. The array underlying an array list is of type Object, hence the need for casting to the list's type.

Managing JDKs in MacOS

With the increasing number of JDK builds and the more frequent release candence, I found it hard to keep track what I had installed in my MacOS and switch between them on the fly.

Even in 2019 my preferred version of Java is 1.8, probably because this is the version I am using at my work. But depending on the occasion I find myself experimenting with newer features from later versions, or even from experimental builds:

  • JShell from Java 9 onwards
  • EpsilonGC
  • The use of var since Java 10
  • Value types in Project Valhalla builds
  • etc…

In addition, nowadays on my personal computer I mainly use Java builds from the AdoptOpenJDK project. But there are other builds which I have installed on my MacOS to try out:

Hence I spent some putting together some bash functions that give me a hand managing and switching between those versions.

For the complete bash script see here, but the highlights are:

  • List JKDs
  • List Different JDK builds
  • Set JDK to a specific version/vendor

Note that java_home=/usr/libexec/java_home

Identifying a high-CPU Java Thread

High CPU utilization Java application

Every now and then we find ourselves in situations when a single Java process is consuming a high percentage of CPU.

After investigating and ruling out high CPU because of continuous GC cycles or other pathogenic reasons, we find ourselves in a situation that we need to identify the business logic that causes those CPU spikes. An easy way of doing so is try to identify the thread(s) that is consuming most of the CPU and try to pinpoint the caveat.

There are a few utilities (i.e. top, htop) that let us see a process as a tree along with the threads that live inside that process' space. After identifying the thread's ID, it is pretty easy to translate the ID to its HEX value and identify the actual thread in a Java application (i.e. by taking a thread dump).

Example

As an example the following Java program, uses two application thread's (main thread and a thread created by the user), one thread is spinning forever generating random values. The main thread occasionally, reads those random values.

https://github.com/nikkatsa/nk-playground/blob/master/nk-dummies/src/main/java/com/nikoskatsanos/spinningthread/SpinningThread.java

It is expected that this would be a high CPU utilization application (see above image).

Find the Rogue Thread

After identifying the Java program's PID (i.e. with jps or something like ps, top, htop), we can run an application like htop as below

htop -p${PID}

A user can view that isolated process along with its threads. Usually htop would show user space threads by default, but if not is easy to do by going to the setup page and selecting the appropriate option on Setup -> Display Options.

Then a user should see an image like the below.

That shows the application's PID along with its threads, reporting the metrics (CPU, Memory etc) for each thread. From there someone can easily identify that thread 12820 is consuming a great percentage of CPU, hence it should be our caveat.

Translating Thread's ID to HEX

The next step would be to translate that thread's decimal ID to its HEX value, which is: 0x3214

Getting a thread dump

Knowing the thread's HEX value, the user can take a thread dump and easily locate the thread and its stack trace.

Full thread dump Java HotSpot(TM) Client VM (25.65-b01 mixed mode):

"Attach Listener" #8 daemon prio=9 os_prio=0 tid=0x64900800 nid=0x3340 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Spinner" #7 daemon prio=5 os_prio=0 tid=0x6442a800 nid=0x3214 runnable [0x6467d000]
   java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.ThreadLocalRandom.nextDouble(ThreadLocalRandom.java:442)
        at com.nikoskatsanos.spinningthread.SpinningThread.spin(SpinningThread.java:16)
        at com.nikoskatsanos.spinningthread.SpinningThread$$Lambda$1/28014437.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x659c2198> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Service Thread" #6 daemon prio=9 os_prio=0 tid=0x76183c00 nid=0x3212 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"C1 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x76180c00 nid=0x3211 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x7617f000 nid=0x3210 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x76162000 nid=0x320f in Object.wait() [0x64f9c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

   Locked ownable synchronizers:
        - None
"C1 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x76180c00 nid=0x3211 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x7617f000 nid=0x3210 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x76162000 nid=0x320f in Object.wait() [0x64f9c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

   Locked ownable synchronizers:
        - None

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x76160800 nid=0x320e in Object.wait() [0x64fec000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65805ef8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
        - locked <0x65805ef8> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
        - None

"main" #1 prio=5 os_prio=0 tid=0x76107400 nid=0x320c waiting on condition [0x762b1000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:340)
        at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
        at com.nikoskatsanos.spinningthread.SpinningThread.main(SpinningThread.java:40)

   Locked ownable synchronizers:
        - None

"VM Thread" os_prio=0 tid=0x7615d400 nid=0x320d runnable

"VM Periodic Task Thread" os_prio=0 tid=0x76185c00 nid=0x3213 waiting on condition

JNI global references: 310

The nid value (nid=0x3214) should match the HEX value of the thread's decimal ID

As seen, in the above case is obvious that thread with name 'Spinner' is the high CPU utilization thread we are looking for. After this point the user can investigate the application's logic and determine the root cause.

Introduction

I use Vim quite often in my day to day activities. I do not use it to write code on it, but I tend to find myself logged into a linux host, where I need to edit or update files. Vim is my editor of choice for doing that.

I am not really an expert on Vim, and I tend to use a particular set of commands, hence I decided to summarize the most common commands and shortcuts I use and have this blog post as a cheatsheet whenever I want to refresh my memory.

VIM Command Categories

I tried to group the commands I use into categories like Navigation, Edit, Search and Options.

Navigating in VIM can be done using the arrows. This navigates the cursor wherever the user wants. However, as VIM is really optimized to do everything with keyboard, in the most efficient way, navigation can also be achieved with keys 'h', 'j', 'k', 'l'. Those keys map nicely into the user's fingers, without the need to move your hand to reach the arrows.

Another very common function I find myself doing is going to lines. Someone can switch on line number by :set numbers (see options section). Moving directly to a line is as simple as ${lineNumber}G, which can be read as 'go to line ${lineNumber}'.

Additionally moving to absolute positions like the file's start or end, or to the begginning or end of a line is also fairly common.

Moving one word at a time, or a continuous block (like a camel case word) can be achieved with 'w' and 'W' respectively

Edit

I have grouped actions like inserting characters, deleting character, copying and pasting under the Edit category

Operations like copying and pasting are fairly common when editing files on a unix host

Search commands are very useful either in VIM or in less. Fortunately enough both tools have the same mechanism for searching

VIM can highlight any found search terms by setting the option :set hlsearch (see options section)

Options

VIM is highly customizable. User can set a profile with the desired properties and VIM behaviour at .vimrc file. The most common options I personally used are the below.