
Building A Logo Turtle App With Antlr And JavaFX

This post is about building a very simple application implementing a subset of the Logo programming language, along with a UI that visualizes the Logo programs entered by users.
The technologies used were Antlr, for creating a parser for a subset of the Logo rules, and JavaFX, for creating a UI that allows users to enter Logo programs and provides a drawing visualization of those programs.

The Logo Language

Logo is an educational language, mainly targeted at younger ages. It is a language I personally had some interactions with back in junior high school. It effectively provides a grammar of movement rules (i.e. forward, back, right 90°) along with some control flow (i.e. repeat), allowing the user to produce a set of commands. Those commands, coupled with visualization software, can draw vector graphics, or, coupled with a robotic device, can move the robot around.

It used to be a nice language for educating kids on programming; nowadays, however, there are better, more advanced ones like Scratch. I picked Logo for its simple grammar, as my main goal was to refresh some Antlr knowledge rather than build a real application.

Antlr

Antlr is a great tool for parsing structured text (imagine regular expressions on steroids). It can be used to parse grammatical rules and build applications on top of its features. A common approach is to build custom DSLs using Antlr. I will not describe Antlr at length, as the official website has lots of good documentation and I am no expert on Antlr. Additionally, the book The Definitive ANTLR 4 Reference, written by its creator, is a good resource.

In this application, I have effectively defined my own set of Logo rules using Antlr's grammar and relied on Antlr's parsing capabilities to evaluate the Logo programs and give me callbacks for the Logo commands encountered.

JavaFX

Most people would be familiar with JavaFX. It is effectively the next attempt, after Java Swing, at creating the machinery for building (modern?) UIs in Java. My UI skills are pretty bad, hence I wanted something to force me to build a UI. I picked JavaFX, instead of something more standard like HTML5 plus a JS framework, as I had done some Java Swing in the past and mainly wanted to try JavaFX out of curiosity.

Even though JavaFX is very feature rich, and the programming model resembles part Java Swing and part C# WPF (which I was a bit familiar with back in 2011), I was not impressed by it. It felt cumbersome, in ways that made me feel the whole programming model was getting in my way; maybe because I am not familiar with it, or maybe because it is just not a great programming model, which would justify the lack of widespread adoption.

Defining The Antlr Grammar

As mentioned above, Antlr needs a grammar definition, which consists of parser and lexer rules. The lexer rules are used to extract tokens out of the text, and the parser rules to extract meaningful statements.

I decided to go with only a small subset of the Logo features, so the below would be supported:

  • Moving forward
  • Moving backwards
  • Turning left/right
  • Allowing for pen up/down functionality, meaning that if the pen is up no drawing should appear even if the 'turtle' moves around

Those rules, translated into an Antlr grammar, look like Logo.g4

One can notice that the above grammar just defines the keywords (i.e. forward, back, right etc.) as lexer rules (a.k.a. tokens) and program expressions (i.e. forward 50) as parser rules. In the application layer, Antlr generates listener stubs for the parser rules, which can be implemented so the user gets callbacks on those rules. Users can then write their logic on top of that.

It is easy to see how helpful Antlr is, doing all the heavy lifting for the user. One just needs to extend the already generated listener, which propagates the events to the user's code.

Wiring Parser Callbacks

As we are mainly interested in the grammar rules that define Logo actions, we can implement only those callbacks. The class that deals with the callbacks can be made UI agnostic and act as a driver to the underlying implementation. For example, we could have various implementations of how to visualize the Logo program:

  • A JavaFX UI
  • A Swing UI
  • A plain standard out program

The below implementation deals with that:

public class LogoDriver extends LogoBaseListener {

    private final TurtlePainter painter;

    public LogoDriver(TurtlePainter painter) {
        this.painter = painter;
    }

    @Override
    public void exitForward(final ForwardContext ctx) {
        this.painter.forward(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitBack(final BackContext ctx) {
        this.painter.back(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitRight(final RightContext ctx) {
        this.painter.right(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitLeft(final LeftContext ctx) {
        this.painter.left(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitSet(final SetContext ctx) {
        final String[] point = ctx.POINT().getText().split(",");
        final int x = Integer.parseInt(point[0]);
        final int y = Integer.parseInt(point[1]);
        this.painter.set(x, y);
    }

    @Override
    public void exitPenUp(final PenUpContext ctx) {
        this.painter.penUp();
    }

    @Override
    public void exitPenDown(final PenDownContext ctx) {
        this.painter.penDown();
    }

    @Override
    public void exitClearscreen(ClearscreenContext ctx) {
        this.painter.cls();
    }

    @Override
    public void exitResetAngle(ResetAngleContext ctx) {
        this.painter.resetAngle();
    }

    @Override
    public void exitProg(ProgContext ctx) {
        this.painter.finish();
    }
}

The TurtlePainter can be anything, even a program that records the program's commands and asserts on them, like a JUnit spy.
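For example, a bare-bones standard-out painter could look like the sketch below. Note that the TurtlePainter interface shown here is an assumption of mine, trimmed down to a few of the methods the driver above calls; the real interface lives in the project's source:

```java
// Hypothetical, trimmed-down version of the TurtlePainter contract,
// inferred from the callbacks LogoDriver invokes.
interface TurtlePainter {
    void forward(int points);
    void right(int degrees);
    void penUp();
    void finish();
}

// A UI-agnostic implementation that simply records the commands.
class ConsolePainter implements TurtlePainter {
    private final StringBuilder commands = new StringBuilder();

    @Override public void forward(int points) { commands.append("forward ").append(points).append('\n'); }
    @Override public void right(int degrees)  { commands.append("right ").append(degrees).append('\n'); }
    @Override public void penUp()             { commands.append("penup\n"); }
    @Override public void finish()            { System.out.print(commands); }

    String commands() { return commands.toString(); }
}
```

An implementation like this doubles as the "JUnit spy" mentioned above: a test can feed a Logo program through the driver and assert on the recorded commands.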

The JavaFX UI

In our case, the TurtlePainter is a class that translates the commands into JavaFX constructs and delegates to the UI thread to draw those constructs. For example, the implementation of the forward command looks like:

    @Override
    public void forward(int points) {
        JavaFXThreadHelper.runOrDefer(() -> {
            final double radian = this.toRadian(this.direction);
            final double x = this.turtle.getCenterX() + points * Math.cos(radian);
            final double y = this.turtle.getCenterY() - points * Math.sin(radian);

            this.validateBounds(x, y);

            this.moveTurtle(x, y);
        });
    }

    private void moveTurtle(final double x, final double y) {
        JavaFXThreadHelper.runOrDefer(() -> {

            final Path path = new Path();
            path.getElements().add(new MoveTo(this.turtle.getCenterX(), this.turtle.getCenterY()));
            path.getElements().add(new LineTo(x, y));

            final PathTransition pathTransition = new PathTransition();
            pathTransition.setDuration(Duration.millis(this.animationDurationMs));
            pathTransition.setPath(path);
            pathTransition.setNode(this.turtle);

            if (this.isPenDown) {
                final Line line = new Line(this.turtle.getCenterX(), this.turtle.getCenterY(), x, y);
                pathTransition.setOnFinished(onFinished -> this.canvas.getChildren().add(line));
            }

            animation.getChildren().add(pathTransition);

            this.paintTurtle(x, y);
        });
    }

Effectively, this draws a line on the UI.
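Stripped of the JavaFX machinery, the position math in forward() reduces to basic trigonometry. A minimal, UI-free sketch (the class and method names here are hypothetical):

```java
final class TurtleMath {
    // Given the turtle's centre (cx, cy), its direction in degrees (0 = east)
    // and a distance, compute the new position. Screen y grows downwards,
    // hence the minus sign on the sin term.
    static double[] forward(double cx, double cy, double directionDegrees, double distance) {
        double radian = Math.toRadians(directionDegrees);
        return new double[]{cx + distance * Math.cos(radian),
                            cy - distance * Math.sin(radian)};
    }
}
```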

A simple Logo program that draws "HELLO WORLD" on the screen can be found here. The result for this one would look like:

The Source Code

Source code is checked into github.

Quite a few enhancements can be made, both on the UI side and at the language level:

  • Implement Logo flow control (i.e. loops)
  • Make the turtle an actual turtle image, also showing its facing direction
  • etc…

Feel free to fork or send a PR for any addition :)

For Loops, Allocations and Escape Analysis

For Java applications in certain domains, it is truly important that the creation of objects/garbage stays at a minimum. Those applications usually cannot afford GC pauses, hence they use specific techniques
and methodologies to avoid any garbage creation. One of those techniques has to do with iterating over a collection or an array of items. The preferred way is to use the classic for loop. The enhanced-for loop
is avoided as 'it creates garbage', by using the collection's Iterator under the cover.

In order to prove this point I was playing around with loops, as I wanted to better understand the differences and measure the amount of garbage that is being created by using the enhanced-for loop, which
arguably is a better, more intuitive syntax.

Prior to experimenting on this, I had (falsely?) made some assumptions:

  • Using a normal for loop over an array or a collection does not create any new allocations
  • Using an enhanced-for loop over an array(?) or a collection does allocate
  • Using an enhanced-for loop over an array or a collection of primitives, by accidentally autoboxing the primitive values, ends up in a pretty high rate of new object creation

In order to better understand the differences, and especially the fact that an array does not have an iterator (hence, how does the enhanced-for loop work on one?), I followed the below steps.

Step 1: Enhanced-for Loop Under The Cover

An enhanced-for loop is just syntactic sugar, but what does it actually translate into when used on an array, and when used on a collection of items?

The answer to this can be found in the Java Language Specification.

The main two points from the above link are:

If the type of Expression is a subtype of Iterable, then the translation is as follows.
If the type of Expression is a subtype of Iterable for some type argument X, then let I be the type java.util.Iterator; otherwise, let I be the raw type java.util.Iterator.
The enhanced for statement is equivalent to a basic for statement of the form:

for (I #i = Expression.iterator(); #i.hasNext(); ) {
    {VariableModifier} TargetType Identifier =
        (TargetType) #i.next();
    Statement
}

and

Otherwise, the Expression necessarily has an array type, T[].
Let L1 … Lm be the (possibly empty) sequence of labels immediately preceding the enhanced for statement.
The enhanced for statement is equivalent to a basic for statement of the form:

T[] #a = Expression;
L1: L2: ... Lm:
for (int #i = 0; #i < #a.length; #i++) {
    {VariableModifier} TargetType Identifier = #a[#i];
    Statement
}

From the above, one can observe that the Iterator is indeed used in the enhanced-for loop on collections. However, for an array, the enhanced-for loop is just syntactic sugar equivalent to a normal for loop.

After understanding how the compiler actually implements the enhanced-for loop in the different use cases, our assumptions have changed:

  • Using a normal for loop over an array or a collection does NOT create any new allocations
  • Using an enhanced-for loop over an array does NOT create any new allocations
  • Using an enhanced-for loop over a collection does allocate
  • Using an enhanced-for loop over an array or a collection of primitives, by accidentally autoboxing the primitive values, ends up in a pretty high rate of new object creation
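The two JLS translations can also be written out by hand. The sketch below places the enhanced-for loop over a collection next to its iterator-based equivalent, and shows the enhanced-for loop over an array, which is a plain index loop (the class and method names are illustrative):

```java
import java.util.Iterator;
import java.util.List;

final class DesugarDemo {

    // Enhanced-for over a collection: allocates values.iterator() under the cover.
    static long sumEnhanced(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // The equivalent basic for statement, per the JLS translation above.
    static long sumDesugared(List<Integer> values) {
        long sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    // Enhanced-for over an array: just an index loop, no allocations.
    static long sumArray(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```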

Step 2: Defining The Test

In order to test the different scenarios I have created a very simple test, which can be seen here.

The test itself is very simple; the main points to notice are:

  • The test creates a static array and a static ArrayList and prepopulates them with 100,000 integers. In the case of the array, those are primitives, but in the case of the collection, as we use a plain ArrayList, those are actual Integer objects
  • The test executes the different for loop example scenarios 1,000,000 times
  • The memory used is read before the iterations start and is compared throughout the execution of the program (every 100 invocations) in order to determine whether the memory profile has changed
  • The test scenarios include:
    • A for loop over an array
    • An enhanced-for loop over an array
    • An enhanced-for loop over an array, by also autoboxing the elements
    • A for loop over a collection
    • An enhanced-for loop over a collection
    • An iterator based for loop over a collection, replicating the behaviour of enhanced-for loop's syntactic sugar
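The memory read itself can be as simple as the sketch below. This is just the idea; the linked test may do it differently:

```java
final class MemoryProbe {

    // Heap currently in use. With EpsilonGC nothing is ever reclaimed,
    // so a steadily growing value means the loop under test is allocating.
    static long usedBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }
}
```

A baseline is taken before the iterations start, and the value is re-read every 100 invocations and compared against that baseline.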

Step 3: Running The Test

We ran the test with the below setup:

  • OS: MacOS Catalina (10.15.3), Core i5 @2.6GHz, 8GB DDR3
  • JDK: openjdk version "13.0.2" 2020-01-14
  • JVM_OPTS: -Xms512M -Xmx512M -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC

We use EpsilonGC in order to eliminate any garbage collection and let the memory just increase.

When running the test, some scenarios were easy to verify, matching our expectations:

  • A for loop over an array or a collection does not create any allocations
  • An enhanced-for loop over an array does not create any allocations
  • An enhanced-for loop over an array with autoboxing does indeed create new objects
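The autoboxing case is easy to reproduce: declaring the loop variable as Integer over an int[] forces a box per element (the names below are illustrative):

```java
final class AutoboxDemo {

    // Each primitive element is boxed into an Integer before being used,
    // producing a new object per iteration (modulo the small-value Integer
    // cache, which covers -128..127 only).
    static long sumBoxed(int[] values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v;
        }
        return sum;
    }
}
```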

However, the rest of the scenarios, and the assumption that an enhanced-for loop over a collection will allocate a new Iterator on every loop, could not be proved by running the above test with the above JVM properties. No matter what,
the memory profile was steady. No new allocations were taking place on the heap.

The first step of the investigation was to make sure that the bytecode indicates that a new object gets created. Below is the bytecode, which can be used to verify that a call to get the iterator takes place at offset 5:

  private static long forEachLoopListIterator();
    Code:
       0: lconst_0
       1: lstore_0
       2: getstatic     #5                  // Field LIST_VALUES:Ljava/util/List;
       5: invokeinterface #9,  1            // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
      10: astore_2
      11: aload_2
      12: invokeinterface #10,  1           // InterfaceMethod java/util/Iterator.hasNext:()Z
      17: ifeq          39
      20: lload_0
      21: aload_2
      22: invokeinterface #11,  1           // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
      27: checkcast     #8                  // class java/lang/Integer
      30: invokevirtual #4                  // Method java/lang/Integer.intValue:()I
      33: i2l
      34: ladd
      35: lstore_0
      36: goto          11
      39: lload_0
      40: lreturn

As we are using an ArrayList, the next step is to see what the call to #iterator() is doing. It is indeed creating a new iterator object, as can be seen in the ArrayList source code:

    public Iterator<E> iterator() {
        return new Itr();
    }

Looking at the above, the results we are getting, with a steady memory profile, do not make much sense. Something else is definitely going on. It might be that the test is wrong (i.e. some code is removed by the JIT because the returned value of that block is never used).
This should not be happening, as the returned value of all the methods that exercise the loops is used to take a decision further down in the program, hence the loops must be executed.

My final thinking was the 'unlikely' scenario that the objects were being allocated on the stack. It is known that Hotspot performs this kind of optimization, using the output of Escape Analysis.
To be honest, I had never seen it happening (or at least I never had the time to verify it was indeed happening) until now.

Step 4: Running Without Escape Analysis

The easiest and fastest way to verify the above assumption, that Escape Analysis was feeding into the JIT and causing the objects to be allocated on the stack, is to turn off Escape Analysis. This can be done by adding -XX:-DoEscapeAnalysis to our JVM options.

Indeed, by running the same test again, this time we can see that the memory profile for an enhanced-for loop over a collection is steadily increasing. The Iterator objects, created by ArrayList#iterator(), are being allocated on the heap on each loop.

Conclusion

At least for myself, the above finding was kind of interesting. On many occasions, mainly because of lack of time, we just make assumptions and empirically follow practices that are "known to be working". Especially for people working in a delivery oriented environment, without the luxury of performing some research, I would think this is normal. It is interesting, though, to actually do some research from time to time and try to prove or better understand a point.

Finally, it is worth saying that the above behaviour was observed in an experiment, rather than in actual code. I would imagine the majority of cases in a production system would not exhibit this behaviour (i.e. allocating on the stack), but the
fact that the JIT is such a sophisticated piece of software is very encouraging, as it can proactively optimize code without us even realizing the extra gains.

Java, The Cost of a Single Element Loop

In quite a few cases I have seen myself designing code with listeners and callbacks. It is quite common for a class that emits events to expose an API to attach listener(s) to it. Those listeners are usually stored in a data structure (e.g. a List, Set or array), and when an event is about to be dispatched, the listeners are iterated in a loop and the appropriate callback is called.

Something along the lines of:

public class EventDispatcher {

    private final List<Listener> listeners = new ArrayList<>();

    public void dispatchEvent() {
        final MyEvent event = new MyEvent();
        for (Listener listener : this.listeners) {
            listener.onEvent(event);
        }
    }

    public void attachListener(final Listener listener) {
        this.listeners.add(listener);
    }

    public void removeListener(final Listener listener) {
        this.listeners.remove(listener);
    }
}

public static class Listener implements EventListener {

    void onEvent(final MyEvent myEvent) {
        // do stuff
    }
}

public static class MyEvent {

}

In many cases I have observed that despite the fact that the class is designed to accept many listeners, the truth is that just one listener is attached in the majority of cases.

Hence, I wanted to measure the performance penalty paid when the class has just one listener, versus if the class had been designed from the start to accept just one listener.

In essence, I wanted to check the performance impact of the below two cases:

private Listener listener;
private final List<Listener> singleElementArray = new ArrayList<Listener>() {
    {add(new Listener());}
};

public void dispatch() {
    this.listener.onEvent(new MyEvent());
}

public void dispatchInLoop() {
    for (int i = 0; i < 1; i++) {
        this.singleElementArray.get(i).onEvent(new MyEvent());
    }
}

Assumptions Made Prior To Testing

Before creating a benchmark for the above, I made some assumptions:

  • I assumed the single-element (single listener in a data container) loop would be unrolled
  • I (wrongly) assumed that the performance cost would not be significant, as, with the loop unrolled, I would think the native code produced would look more or less close enough

JMH Benchmark

In order to test my assumptions I created the below benchmark:

SingleElementLoopBenchmark.java

Initial Observations

To my surprise, I found out that an invocation on a single element list was about ~2.5x slower, based on the below throughput numbers:

Benchmark                                                    Mode  Cnt  Score   Error  Units
SingleElementLoopBenchmark.directInvocation                 thrpt   10  0.317 ± 0.022  ops/ns
SingleElementLoopBenchmark.singleElementListLoopInvocation  thrpt   10  0.114 ± 0.010  ops/ns

I couldn't really understand why, and the above seemed a bit too far from my expectations/assumptions.

The first thing I verified, with the JVM argument -XX:+PrintCompilation, was that both methods were compiled by the C2 compiler, which was the case.

I also tried to print the assembly code with -XX:+PrintAssembly, but I couldn't really read/interpret the assembly output.

Resorting to Social Media

I ended up posting a tweet about my findings, asking for some pointers on where/how to look for explanations of what I was observing. The answer I got was to try to find the hot methods using something like perfasm, which ties the assembly output to the hottest methods of the benchmark.

Which I did with -prof dtraceasm (the benchmark was running on a Mac, that's why I used dtrace). The output was the below:

Direct Invocation

9.56%  ↗  0x000000010b73c950: mov    0x40(%rsp),%r10
  1.00%  │  0x000000010b73c955: mov    0xc(%r10),%r10d                ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@1 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
  0.17%  │  0x000000010b73c959: mov    0xc(%r12,%r10,8),%r11d         ; implicit exception: dispatches to 0x000000010b73ca12
 11.18%  │  0x000000010b73c95e: test   %r11d,%r11d
  0.00%  │  0x000000010b73c961: je     0x000000010b73c9c9             ;*invokevirtual performAction {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
 10.69%  │  0x000000010b73c963: mov    %r9,(%rsp)
  0.65%  │  0x000000010b73c967: mov    0x38(%rsp),%rsi
  0.00%  │  0x000000010b73c96c: mov    $0x1,%edx
  0.14%  │  0x000000010b73c971: xchg   %ax,%ax
 10.08%  │  0x000000010b73c973: callq  0x000000010b6c2900             ; ImmutableOopMap{[48]=Oop [56]=Oop [64]=Oop [0]=Oop }
         │                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
         │                                                            ;   {optimized virtual_call}
  1.44%  │  0x000000010b73c978: mov    (%rsp),%r9
  0.19%  │  0x000000010b73c97c: movzbl 0x94(%r9),%r8d                 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
  9.77%  │  0x000000010b73c984: mov    0x108(%r15),%r10
  0.99%  │  0x000000010b73c98b: add    $0x1,%rbp                      ; ImmutableOopMap{r9=Oop [48]=Oop [56]=Oop [64]=Oop }
         │                                                            ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
  0.02%  │  0x000000010b73c98f: test   %eax,(%r10)                    ;   {poll}
  0.28%  │  0x000000010b73c992: test   %r8d,%r8d
  0.00%  ╰  0x000000010b73c995: je     0x000000010b73c950             ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}

Single Element Loop Invocation

         ╭    0x000000011153fa9d: jmp    0x000000011153fad6
  0.19%  │ ↗  0x000000011153fa9f: mov    0x58(%rsp),%r13
  3.55%  │ │  0x000000011153faa4: mov    (%rsp),%rcx
  0.09%  │ │  0x000000011153faa8: mov    0x60(%rsp),%rdx
  0.22%  │ │  0x000000011153faad: mov    0x50(%rsp),%r11
  0.17%  │ │  0x000000011153fab2: mov    0x8(%rsp),%rbx                 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
         │ │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
         │ │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
         │ │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.55%  │↗│  0x000000011153fab7: movzbl 0x94(%r11),%r8d                ;*goto {reexecute=0 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@35 (line 44)
         │││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.16%  │││  0x000000011153fabf: mov    0x108(%r15),%r10
  0.28%  │││  0x000000011153fac6: add    $0x1,%rbx                      ; ImmutableOopMap{r11=Oop rcx=Oop rdx=Oop r13=Oop }
         │││                                                            ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
  0.19%  │││  0x000000011153faca: test   %eax,(%r10)                    ;   {poll}
  4.00%  │││  0x000000011153facd: test   %r8d,%r8d
         │││  0x000000011153fad0: jne    0x000000011153fbe9             ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@33 (line 124)
  0.07%  ↘││  0x000000011153fad6: mov    0xc(%rcx),%r8d                 ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.22%   ││  0x000000011153fada: mov    0x10(%r12,%r8,8),%r10d         ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
          ││                                                            ; implicit exception: dispatches to 0x000000011153ff2a
  0.21%   ││  0x000000011153fadf: mov    0x8(%r12,%r10,8),%edi          ; implicit exception: dispatches to 0x000000011153ff3e
  4.39%   ││  0x000000011153fae4: cmp    $0x237565,%edi                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
          ││  0x000000011153faea: jne    0x000000011153fc92
  0.33%   ││  0x000000011153faf0: lea    (%r12,%r10,8),%r9              ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.09%   ││  0x000000011153faf4: mov    0x10(%r9),%r9d
  0.14%   ││  0x000000011153faf8: test   %r9d,%r9d
          ╰│  0x000000011153fafb: jle    0x000000011153fab7             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.98%    │  0x000000011153fafd: lea    (%r12,%r8,8),%rdi              ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.06%    │  0x000000011153fb01: xor    %r9d,%r9d                      ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@15 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.09%    │  0x000000011153fb04: mov    0x8(%r12,%r10,8),%esi          ; implicit exception: dispatches to 0x000000011153ff4e
  0.06%    │  0x000000011153fb09: cmp    $0x237565,%esi                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
  0.00%    │  0x000000011153fb0f: jne    0x000000011153fcc2
  3.93%    │  0x000000011153fb15: lea    (%r12,%r10,8),%rax             ;*invokeinterface get {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.06%    │  0x000000011153fb19: mov    0x10(%rax),%r10d               ;*getfield size {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::get@2 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.11%    │  0x000000011153fb1d: test   %r10d,%r10d
           │  0x000000011153fb20: jl     0x000000011153fcf6             ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.Objects::checkIndex@3 (line 372)
           │                                                            ; - java.util.ArrayList::get@5 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.28%    │  0x000000011153fb26: cmp    %r10d,%r9d
  0.00%    │  0x000000011153fb29: jae    0x000000011153fc1c
  3.97%    │  0x000000011153fb2f: mov    0x14(%rax),%r10d               ;*getfield elementData {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::elementData@1 (line 442)
           │                                                            ; - java.util.ArrayList::get@11 (line 459)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.05%    │  0x000000011153fb33: mov    %r9d,%ebp                      ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.Objects::checkIndex@3 (line 372)
           │                                                            ; - java.util.ArrayList::get@5 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.08%    │  0x000000011153fb36: mov    0xc(%r12,%r10,8),%esi          ; implicit exception: dispatches to 0x000000011153ff62
  1.27%    │  0x000000011153fb3b: cmp    %esi,%ebp
           │  0x000000011153fb3d: jae    0x000000011153fc5a
  3.94%    │  0x000000011153fb43: shl    $0x3,%r10
  0.05%    │  0x000000011153fb47: mov    0x10(%r10,%rbp,4),%r9d         ;*aaload {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::elementData@5 (line 442)
           │                                                            ; - java.util.ArrayList::get@11 (line 459)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  1.71%    │  0x000000011153fb4c: mov    0x8(%r12,%r9,8),%r10d          ; implicit exception: dispatches to 0x000000011153ff72
 17.85%    │  0x000000011153fb51: cmp    $0x237522,%r10d                ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Listener')}
  0.00%    │  0x000000011153fb58: jne    0x000000011153fef6             ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@25 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.79%    │  0x000000011153fb5e: mov    %rdi,0x18(%rsp)
  0.02%    │  0x000000011153fb63: mov    %r8d,0x10(%rsp)
  0.02%    │  0x000000011153fb68: mov    %rbx,0x8(%rsp)
  0.19%    │  0x000000011153fb6d: mov    %r11,0x50(%rsp)
  3.95%    │  0x000000011153fb72: mov    %rdx,0x60(%rsp)
  0.02%    │  0x000000011153fb77: mov    %rcx,(%rsp)
  0.03%    │  0x000000011153fb7b: mov    %r13,0x58(%rsp)
  0.36%    │  0x000000011153fb80: mov    %rdx,%rsi
  3.78%    │  0x000000011153fb83: mov    $0x1,%edx
  0.01%    │  0x000000011153fb88: vzeroupper
  4.05%    │  0x000000011153fb8b: callq  0x00000001114c2900             ; ImmutableOopMap{[80]=Oop [88]=Oop [96]=Oop [0]=Oop [16]=NarrowOop [24]=Oop }
           │                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@29 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
           │                                                            ;   {optimized virtual_call}
  0.98%    │  0x000000011153fb90: mov    0x10(%rsp),%r8d
  3.61%    │  0x000000011153fb95: mov    0x10(%r12,%r8,8),%r10d         ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.24%    │  0x000000011153fb9a: mov    0x8(%r12,%r10,8),%r9d          ; implicit exception: dispatches to 0x000000011153ff9e
  0.74%    │  0x000000011153fb9f: inc    %ebp                           ;*iinc {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@32 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.04%    │  0x000000011153fba1: cmp    $0x237565,%r9d                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
  0.00%    │  0x000000011153fba8: jne    0x000000011153fd36
  3.60%    │  0x000000011153fbae: lea    (%r12,%r10,8),%r11             ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.11%    │  0x000000011153fbb2: mov    0x10(%r11),%r9d
  0.35%    │  0x000000011153fbb6: cmp    %r9d,%ebp
           ╰  0x000000011153fbb9: jge    0x000000011153fa9f             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                                        ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
                                                                        ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
                                                                        ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)

As I said, I am not really able to read or interpret assembly code fluently, but reading between the lines I could see that:

  • The loop was indeed unrolled
  • A penalty was paid to cast each item to the expected type (17.85% of the CPU samples)
  • A penalty was paid to fetch the item from the list's underlying array

In order to get some advice from someone knowledgeable on this, I posted the question below on StackOverflow. The answer is pretty comprehensive, as the person who answered is one of the most prominent names in the JVM community.

StackOverflow: Java Method Direct Invocation vs Single Element Loop

Conclusion/Observations

In summary:

  • The loop was indeed unrolled, as expected and as seen in the assembly code
  • The main penalty paid is for fetching the element from the list and casting it to the expected type
  • Some cost also comes from checks performed on the data container itself (i.e. its size)
  • In general, the extra cost being paid is memory access cost rather than CPU instruction cost

As seen in the SO answer, Andrei makes the point that invoking the object's method from inside the loop is not ~2.5 times slower, but rather 3 ns slower, if we look at it from the perspective of latency (ns/op) rather than throughput (ops/ns). This is a valid point, but I am not sure I agree 100%, as in some applications, depending on their nature, that extra cost will actually translate into the ~2.5x slowdown.
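To make the latency-vs-throughput framing concrete, here is a small sketch. The per-operation costs of 2 ns and 5 ns are hypothetical numbers chosen to be consistent with a ~2.5x ratio and a ~3 ns gap, not the benchmark's actual results:

```java
public class LatencyVsThroughput {
    public static void main(String[] args) {
        // Hypothetical per-operation costs (illustrative only)
        double directNs = 2.0; // direct method invocation
        double loopNs = 5.0;   // invocation through a single-element loop

        // The same two numbers support both readings of "slower"
        System.out.println("Relative: " + (loopNs / directNs) + "x slower");   // 2.5x
        System.out.println("Absolute: " + (loopNs - directNs) + " ns slower"); // 3.0 ns
    }
}
```

Whether the relative or the absolute view matters more depends on how often the call sits on the application's hot path.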

Finally, I have added to the JMH benchmark tests for different data container types:

  • Array
  • List
  • Set

Observing those numbers, and as expected, an array is faster than the rest. The array is typed, hence the casting cost is not paid. The array underlying an ArrayList is of type Object[], hence the need for casting to the list's element type.

Identifying a high-CPU Java Thread

High CPU utilization Java application

Every now and then we find ourselves in situations where a single Java process is consuming a high percentage of CPU.

After investigating and ruling out high CPU caused by continuous GC cycles or other pathological reasons, we find ourselves needing to identify the business logic that causes those CPU spikes. An easy way of doing so is to identify the thread(s) consuming most of the CPU and try to pinpoint the culprit.

There are a few utilities (i.e. top, htop) that let us see a process as a tree along with the threads that live inside that process' address space. After identifying a thread's ID, it is pretty easy to translate the ID to its hex value and locate the actual thread in the Java application (i.e. by taking a thread dump).

Example

As an example, the following Java program uses two application threads (the main thread and a thread created by the user). One thread spins forever generating random values; the main thread occasionally reads those random values.

https://github.com/nikkatsa/nk-playground/blob/master/nk-dummies/src/main/java/com/nikoskatsanos/spinningthread/SpinningThread.java

It is expected that this would be a high CPU utilization application (see the image above).

Find the Rogue Thread

After identifying the Java program's PID (i.e. with jps, or something like ps, top, htop), we can run htop as below:

htop -p${PID}

The user can then view that isolated process along with its threads. Usually htop shows user-space threads by default; if not, it is easy to enable them on the setup page, under Setup -> Display Options.

The user should then see an image like the one below.

It shows the application's PID along with its threads, reporting the metrics (CPU, memory etc.) for each thread. From there one can easily identify that thread 12820 is consuming a large percentage of CPU, hence it should be our culprit.

Translating Thread's ID to HEX

The next step is to translate that thread's decimal ID to its hex value, which is: 0x3214
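The conversion itself is trivial; for example, from Java:

```java
public class ThreadIdToHex {
    public static void main(String[] args) {
        int tid = 12820; // the thread ID as reported by htop
        // Integer.toHexString produces the lower-case hex digits
        System.out.println("0x" + Integer.toHexString(tid)); // prints 0x3214
    }
}
```

The same result can be obtained from a shell with printf '0x%x\n' 12820.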

Getting a thread dump

Knowing the thread's HEX value, the user can take a thread dump and easily locate the thread and its stack trace.

Full thread dump Java HotSpot(TM) Client VM (25.65-b01 mixed mode):

"Attach Listener" #8 daemon prio=9 os_prio=0 tid=0x64900800 nid=0x3340 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Spinner" #7 daemon prio=5 os_prio=0 tid=0x6442a800 nid=0x3214 runnable [0x6467d000]
   java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.ThreadLocalRandom.nextDouble(ThreadLocalRandom.java:442)
        at com.nikoskatsanos.spinningthread.SpinningThread.spin(SpinningThread.java:16)
        at com.nikoskatsanos.spinningthread.SpinningThread$$Lambda$1/28014437.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x659c2198> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Service Thread" #6 daemon prio=9 os_prio=0 tid=0x76183c00 nid=0x3212 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"C1 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x76180c00 nid=0x3211 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x7617f000 nid=0x3210 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x76162000 nid=0x320f in Object.wait() [0x64f9c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

   Locked ownable synchronizers:
        - None
"C1 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x76180c00 nid=0x3211 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x7617f000 nid=0x3210 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x76162000 nid=0x320f in Object.wait() [0x64f9c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x65806400> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

   Locked ownable synchronizers:
        - None

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x76160800 nid=0x320e in Object.wait() [0x64fec000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x65805ef8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
        - locked <0x65805ef8> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
        - None

"main" #1 prio=5 os_prio=0 tid=0x76107400 nid=0x320c waiting on condition [0x762b1000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:340)
        at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
        at com.nikoskatsanos.spinningthread.SpinningThread.main(SpinningThread.java:40)

   Locked ownable synchronizers:
        - None

"VM Thread" os_prio=0 tid=0x7615d400 nid=0x320d runnable

"VM Periodic Task Thread" os_prio=0 tid=0x76185c00 nid=0x3213 waiting on condition

JNI global references: 310

The nid value (nid=0x3214) should match the hex value of the thread's decimal ID.

As seen above, it is obvious that the thread named 'Spinner' is the high-CPU thread we are looking for. From this point, the user can investigate the application's logic and determine the root cause.

Python UnPickling in Java

Sometimes applications need to interact with the serialized form of a different language. This usually happens in the persistence layer. Ideally, the form chosen for persistence should be cross-platform (i.e. protobufs), but unfortunately the reality is that sometimes the developer has no control over it, or he/she needs to deal with third-party systems which use a different serialized form than his/her language of choice supports.

This article is about such a scenario and presents, in simple steps, how someone can deserialize (unpickle) in Java a Python pickled object [see: Python Pickle].

The main steps someone has to follow are outlined below:

  • Add a dependency on Jython, which will be called from inside Java
  • Create a Java interface, which will act as a proxy between Python/Jython and Java
  • Create a stub class of the Python object which the serialized form represents, and make sure it extends the Java interface defined in step two. This Python class will have to implement all the methods of that interface
  • From your Java code, use the cPickle Jython API to unpickle the serialized object into your Java interface

The below example demonstrates the steps defined above.

The Python object

Let's assume the python object that gets pickled is the below

Knowing what the Python object looks like makes it a lot easier to define the Jython stub. If the object is not known, the developer will have to reverse-engineer the serialized pickle and create the stub from that.

If an instance of the above object gets pickled it will look similar to:

Jython Dependency

If using Maven, add a dependency on Jython.

Java Interface acting as a proxy

The next step is to define a Java interface which acts as a proxy between Jython and Java. You can read more about this here.

The interface we defined is the below:

Jython stub

We need to create a stub of the original object, which will extend the Java interface we have defined. That stub is the Jython object that will hold the deserialized state after unpickling and before returning to Java.

In our case this can look like:

Note here that we import the actual Java interface we have defined, and the Python object implements that interface.

Be aware

A minor caveat to be aware of at this step is that this Python object needs to be on Jython's classpath or, even better, in its module directory. This can be achieved in various ways, but the easiest are:

  • Add the module directory to the environment variable JYTHONPATH
  • Programmatically, from Java, add the directory the module resides in to python.path (i.e. System.setProperty( "python.path", "/pythonModuleDir" ) )

UnPickle the object

The final step is to unpickle and use the object in Java. A sample code to do that is:

Java 9 Process API

In a previous blog post I wrote about one of my favourite features of Java 9, the JShell. In this post, I will write about another feature I am excited about: the new Java 9 Process API. I will also present some code showing how powerful and intuitive it is.

The new API adds greater flexibility to spawning, identifying and managing processes. As an example, before Java 9 someone would need to do the following in order to retrieve the PID of a running process:
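The usual pre-Java 9 workaround parses the PID out of the RuntimeMXBean name, relying on the conventional but undocumented pid@hostname format:

```java
import java.lang.management.ManagementFactory;

public class PidBeforeJava9 {
    public static void main(String[] args) {
        // The runtime name is conventionally "pid@hostname",
        // but that format is not guaranteed by the specification
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        long pid = Long.parseLong(jvmName.split("@")[0]);
        System.out.println("PID: " + pid);
    }
}
```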

The above is not intuitive and feels like a hack. One would expect that a Java process should at least easily expose its own PID.

Moreover, I have quite a few times needed to spawn new child processes from inside a Java process and manage them. Doing so is very cumbersome: a reference to the child process has to be kept throughout the program's execution if the developer wishes to destroy that process later. Not to mention that getting the PIDs of the child processes is also a pain.

Fortunately, Java 9 comes to fix those issues and provides a clean API for interacting with processes. More specifically, two new interfaces have been added to the JDK:

1. java.lang.ProcessHandle
2. java.lang.ProcessHandle.Info

The two new interfaces add quite a few methods. The first one provides methods for retrieving a PID, listing all the processes running on the system, and navigating relationships between processes. The second one mainly provides meta-information about a process.

As one would expect, most of the methods have native, platform-specific implementations. The OpenJDK implementation of ProcessHandle can be found here. The Unix-specific implementation can be seen here.

I have created a very simple program which makes use of most of the features of this new Process API. The program does the following:

  • Can retrieve the running process' PID
  • Can start a long running process
  • Can start a short-running process, which terminates ~5 seconds after starting
  • Can list all child processes that were spawned by the parent one
  • Can kill all child processes that were spawned by the parent one
  • Attaches a callback when a child process exits. This is done using the onExit() method of the ProcessHandle

The sample class is provided below. For the entire example please see here:
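As a minimal illustration of the API (a sketch, not the full sample class — the "sleep" command used as a child process is a Unix assumption):

```java
public class ProcessApiDemo {
    public static void main(String[] args) throws Exception {
        // The current process' own handle and PID (no more MXBean hacks)
        ProcessHandle self = ProcessHandle.current();
        System.out.println("My PID: " + self.pid());

        // Meta information about the process, exposed as Optionals
        self.info().command().ifPresent(cmd -> System.out.println("Command: " + cmd));

        // Spawn a child process ("sleep" assumes a Unix-like platform)
        Process child = new ProcessBuilder("sleep", "5").start();

        // Attach a callback to run when the child exits
        child.toHandle().onExit()
             .thenAccept(ph -> System.out.println("Child " + ph.pid() + " exited"));

        // List, then kill, every live child of this process
        self.children().forEach(ph -> System.out.println("Child PID: " + ph.pid()));
        self.children().forEach(ProcessHandle::destroy);
        child.waitFor(); // block until the child is really gone
    }
}
```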

JShell

As of now, Java 9's official release date is 27.07.2017. According to the OpenJDK mailing list, the push-back was due to the most anticipated feature of Java 9: the modularisation of the JDK, commonly known as Project Jigsaw.

I am not as excited about that feature as I am about the brand new JShell. Many people criticise the language's verbosity and sometimes the amount of code that is required to do simple things. I do not disagree that this, in many cases, is true. But what I was mainly missing from Java was the ability to quickly evaluate an expression/algorithm/piece of code.

For example, many times I find myself needing to try something quick which involves reading a file, or reading something from the web and performing some manipulation on it. Or even sometimes testing out a lambda expression to see its behaviour. Up to this point, actions like that were a bit cumbersome, as they involved creating a class with a main method and executing that program.

JShell is introduced to solve problems like that and more. Also known as Project Kulla, JShell is an extremely useful tool. It is a REPL (Read-Evaluate-Print Loop) tool; similar ones exist in various other languages like Python, Perl and even Scala.

To use JShell she/he needs to download JDK 9. Then all she/he has to do is navigate to the JDK's bin directory and execute the jshell command.

On startup, JShell itself prompts the user to type /help intro.

JShell comes with auto-completion features, so the user can press Tab and see a list of commands matching the first letters she/he typed:

A list of commands can be printed by typing /help.

By default, the user has to import the classes she/he is going to use, but JShell comes with a pre-defined set of common classes already imported:

A user can import any JDK class, or even her/his own classes, by adding them to the classpath:

As one can notice, it is not mandatory to add semicolons at the end of statements. However, they are still mandatory inside class and method definitions.

Each expression the user writes in the console is evaluated and printed on the standard output. If an expression has a value, that value is automatically assigned to a scratch variable the shell creates on the fly. Later on, the user can use that variable as normal:
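For example, a session might look like the following, with the scratch variables $1, $2 capturing each expression's value:

```
jshell> 3 + 4
$1 ==> 7

jshell> $1 * 10
$2 ==> 70
```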

The user can define methods outside of classes. Additionally, classes can be defined and referenced as normal:

A very nice feature is that the user does not need any try{}catch{} blocks around methods which declare checked exceptions:

Finally, the user can list the defined methods, types and variables, and reset her/his session:

In conclusion, I believe JShell will be a nice tool to have. By exploring it, I am pretty sure people will come up with some interesting uses for it.

JDK Evolution

I know for a fact that many people (especially in the financial technology industry) are very skeptical when a new version of Java is released. People actually stick with their old Java version (even their JRE version) for many years; there are a lot of places still using Java 6! Even though this caution is valid in some cases, especially in the early stages of a new release, I personally find it wrong. Granted, upgrading the version of Java a piece of software uses is not a simple and easy thing in most cases. Lots of testing needs to be done, to ensure at least that the application's performance has not degraded. Even more testing is needed when the application does something very tailored, like calling native code in-process.

In my opinion, upgrading to the newest version is advisable for many reasons. The one I would like to mention today is JDK evolution: in most cases a software developer gets some free gains without, in principle, doing anything. The code inside the JDK changes between releases, whether for bug fixes, for performance improvements or, even better, to follow hardware trends. Quite often CPUs introduce a new instruction which solves a problem down at the silicon level, meaning faster processing. The Java engineers, and in particular the people involved in the OpenJDK project, have lots of mechanical sympathy.

A good, oft-cited example is the commonly used java.util.concurrent.atomic.AtomicInteger class. There is a notable difference in the implementation of a couple of methods in this class between Java 7 and Java 8. The difference is presented below:

Java 7

public final int getAndSet(int newValue) {
    for (;;) {
        int current = get();
        if (compareAndSet(current, newValue))
            return current;
    }
}

Java 8

public final int getAndSet(int newValue) {
    return unsafe.getAndSetInt(this, valueOffset, newValue);
}

There is a very important difference. Java 8 delegates to code inside the Unsafe class, whereas Java 7 performs a busy loop of compare-and-set retries. Java 8 actually makes use of a dedicated CPU instruction for that; in other words, unsafe.getAndSetInt is an intrinsic function. Java's intrinsic functions can be found here.
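The observable contract of getAndSet is unchanged between the two releases; only the machinery underneath differs. For example:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GetAndSetDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(5);
        // Atomically swap in the new value, returning the previous one
        int previous = value.getAndSet(10);
        System.out.println(previous);    // 5
        System.out.println(value.get()); // 10
    }
}
```

An application using this method gets the faster Java 8 implementation simply by upgrading, with no code changes.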

This is a very simple but very important reason why someone should consider regularly upgrading his/her Java version. Simple improvements like that, spread across the newer implementations, can actually have a positive impact on every application.