Skip to content

programming-languages

Building A Logo Turtle App With Antlr And JavaFX

This post is about building a very simple application, implementing a subset of the Logo Programming Language along with a UI that visualizes the Logo programms entered by users.
The technologies used were Antlr for creating a parser for a subset of the Logo rules and also JavaFX for creating a UI allowing users to enter Logo programs and providing a drawing visualization of those programs.

The Logo Language

Logo is an educational language, mainly targeted to younger ages. It is a language that I personally had some interactions with back in the junior high school days. It effectively provides a grammar of movement rules (i.e. forward, back, right 90o) along with some control flow (i.e. repeat) allowing the user to produce a set of commands. Those commands, coupled with a visualization software can draw vector graphics, or coupled with robotic devices can move the robot around.

It used to be a nice language for educating kids on programming, nowadays however there are better more advanced ones like Scratch. I picked Logo however, for its simple grammar as my main goal was to refresh some Antlr knowledge, rather than build a real application.

Antlr

Antlr is a great tool for parsing structured text (i.e. imagine regular expressions on steroids). It is a tool that can be used to parse grammatical rules and build applications on top of its features. A common approach is to build custom DSLs using Antlr. I will not describe Antlr in length as the official website has lots of good documentation and also I am no expert on Antlr. Additionally, the book Definitive-ANTLR-4-Reference written by its creator is a good resource.

In this application, effectively I have defined my own set of Logo rules using Antlr's grammar and I relied on Antlr's parsing capabilities to evaluate the Logo programs and give me callbacks of the Logo commands encountered.

JavaFX

Most people would be familiar with JavaFX. Effectively is the next attempt after Java Swing on creating the machinery for building (modern?) UIs using Java. My UI skills are pretty bad, hence I wanted something to force me build a UI. I picked JavaFX, instead of something more standard like HTML5+JS Framework, as I had done some Java Swing in the past and wanted to try JavaFX out of curiosity mainly.

Even though JavaFX is very feature rich and the programming model resembles part of Java Swing and part of C# WPF (which i was a bit familiar back in 2011) I was not impressed by it. It felt cumbersome in ways that I thought the whole programming model was getting into my way, maybe because I am not familiar with it, also maybe because is just not a great programming model justifying the lack of widespread adoption.

Defining The Antlr Grammar

As mentioned above, Antlr needs a grammar definition, which consist of parser and lexer rules. The lexer rules are used to extract tokens out of the text, and the parser rules for extracting meaningful statements.

I decided to only go with a small subset of the Logo features, so the below would be supported:

  • Moving forward
  • Moving backwards
  • Turning left/right
  • Allowing for pen up/down functionality, meaning if pen is up no drawing should appear even if the 'turtle' moves around

Those rules, translated into an Antlr grammar look like Logo.g4

Someone can notice that the above grammar just defines the keyworks (i.e. forward, back, right etc) as lexer rules (a.k.a tokens) and programmar expressions (i.e. forward 50) as parser rules. In the application layer, Antlr generates stubs of listeners for the parser rules, which can be implemented and the user gets callbacks on those rules. Then users can write their logic on top of that.

It is easy to see how helpful Antlr is, doing all the heavylifting for the user. Someone just need to extend the already generated listener, which propagates the events to the user's code.

Wiring Parser Callbacks

As we are mainly interested in the grammar rules that define Logo actions, we can only implement those callbacks. The class that deals with the callback can be made UI agnostic and act as a driver to the underlying implementation. For example we could have various implementations of how to visualize the Logo program:

  • A JavaFX UI
  • A Swing UI
  • A plain standard out program

The below implementation deals with that

public class LogoDriver extends LogoBaseListener {

    private final TurtlePainter painter;

    public LogoDriver(TurtlePainter painter) {
        this.painter = painter;
    }

    @Override
    public void exitForward(final ForwardContext ctx) {
        this.painter.forward(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitBack(final BackContext ctx) {
        this.painter.back(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitRight(final RightContext ctx) {
        this.painter.right(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitLeft(final LeftContext ctx) {
        this.painter.left(Integer.parseInt(ctx.getChild(1).getText()));
    }

    @Override
    public void exitSet(final SetContext ctx) {
        final String[] point = ctx.POINT().getText().split(",");
        final int x = Integer.parseInt(point[0]);
        final int y = Integer.parseInt(point[1]);
        this.painter.set(x, y);
    }

    @Override
    public void exitPenUp(final PenUpContext ctx) {
        this.painter.penUp();
    }

    @Override
    public void exitPenDown(final PenDownContext ctx) {
        this.painter.penDown();
    }

    @Override
    public void exitClearscreen(ClearscreenContext ctx) {
        this.painter.cls();
    }

    @Override
    public void exitResetAngle(ResetAngleContext ctx) {
        this.painter.resetAngle();
    }

    @Override
    public void exitProg(ProgContext ctx) {
        this.painter.finish();
    }
}

The TurtlePainter can be anything, even a program that records the program's commands and asserts them, like a JUnit spy.

The JavaFX UI

In our case, the TurtlePainter is a class that translates the commands into JavaFX constructs and delegates to the UI thread to draw those constrcuts. For example the implementation for the forward command looks like:

    @Override
    public void forward(int points) {
        JavaFXThreadHelper.runOrDefer(() -> {
            final double radian = this.toRadian(this.direction);
            final double x = this.turtle.getCenterX() + points * Math.cos(radian);
            final double y = this.turtle.getCenterY() - points * Math.sin(radian);

            this.validateBounds(x, y);

            this.moveTurtle(x, y);
        });
    }

    private void moveTurtle(final double x, final double y) {
        JavaFXThreadHelper.runOrDefer(() -> {

            final Path path = new Path();
            path.getElements().add(new MoveTo(this.turtle.getCenterX(), this.turtle.getCenterY()));
            path.getElements().add(new LineTo(x, y));

            final PathTransition pathTransition = new PathTransition();
            pathTransition.setDuration(Duration.millis(this.animationDurationMs));
            pathTransition.setPath(path);
            pathTransition.setNode(this.turtle);

            if (this.isPenDown) {
                final Line line = new Line(this.turtle.getCenterX(), this.turtle.getCenterY(), x, y);
                pathTransition.setOnFinished(onFinished -> this.canvas.getChildren().add(line));
            }

            animation.getChildren().add(pathTransition);

            this.paintTurtle(x, y);
        });
    }

Effectively drawing a line to the UI.

A simple Logo program that draws "HELLO WORLD" in the screen can be found here. The result for this one would look like:

The Source Code

Source code is checked into github.

Quite a few enhancements can be made, both on the UI side, but also at the language level side:

  • Implement Logo flow control (i.e. loops)
  • Make the turtle, an actual turtle image, by also showing its facing direction
  • etc…

Feel free to fork or send a PR for any addition :)

Brushing Up My C. Building A Unix Domain Socket Client/Server (PART II)

I described in this previous blog post how to build a simplistic Unix Domain Socket client/server application.
The disadvantage with that approach is that the server can only handle one connection at a time (i.e. is not concurrent).

This blog post explains how this can be improved by using mechanisms like select(), epoll(), kqueue() etc.
Effectively all these mechanisms allow for monitoring multiple file descriptors and be called back when one or multiple of those file descriptors have data so that an action can be invoked (i.e. read, write etc).
The main differences among those are characteristics like:

  • Synchronous vs Asynchronous paradigms
  • Underlying data structures in the internals of those system calls, which play an important role on performance
  • Platforms/OS specific, as not every OS supports all the above. Some are platform agnonistic (i.e. select()), some are platform specific (i.e. epoll() is only implemented on Linux)

A superb blog post to understand the differences is this one by Julia Evans

select()

I tried to just enhance the server part of the previous blog post and I went with the select() option.
The select() system call is simpler from epoll() or kqueue() and it effectively allows for registering a number of file descriptors which are monitored for I/O events. On calling select() the thread blocks and it only unblocks when one or more file descriptors have I/O data.
The file descriptors have to manually be registered on an fd_set, which in turn is passed in the select() call. The below macros can be used to manipulate the fd_set:

  • void FD_ZERO(fd_set *set): Initialize an fd_set
  • void FD_SET(int fd, fd_set *set): Add a file descriptor to an fd_set
  • void FD_CLR(int fd, fd_set *set): Remove a file descripro from the fd_set
  • int FD_ISSET(int fd, fd_set *set): Check if a specific file descriptor, part of the fd_set is ready with I/O data

The main caveat with select() is that on every call the fd_set is cleared from the file descriptors that do not have any I/O data on that cycle, hence the developer has to manually re-register all the file descriptors again, which is also descripted in the select() documentation

Note well: Upon return, each of the file descriptor sets is modified in place to indicate which file descriptors are currently "ready". Thus, if using select() within a loop, the sets must be reinitialized before each call.

Having said that, the server.c file now looks like:

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "stddef.h"

#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h> 
#include <sys/un.h>
#include <netinet/in.h>
#include "sys/syscall.h"
#include <sys/select.h>
#include <errno.h>

#include "af_unix_sockets_common.h"

int pumpData(int fd);
int cleanupConnections(int *connections, int idx);

/*
* Open a `AF_UNIX` socket on the `path` specified. `bind()` to that address, `listen()` for incoming connections and `accept()`. Finally, wait for input from the socket and print 
* that to the `stdout`. When one connection is closed, wait for the next one.
*/
void server(char *path) {
    printf("Starting AF_UNIX server on Path=%s\n", path);
    AFUnixAddress *domainSocketAddress = open_af_unix_socket(path);

    int hasBind = bind(domainSocketAddress->fd, (struct sockaddr *)domainSocketAddress->address, sizeof(struct sockaddr));
    if(hasBind == -1){
        fprintf(stderr, "Failed to bind AF_UNIX socket on Path=%s. ErrorNo=%d\n", path, errno);
        cleanup(domainSocketAddress->fd, path);
        exit(errno);
    }

    int isListening = listen(domainSocketAddress->fd,  10);
    if(isListening == -1) {
        fprintf(stderr, "Failed to listen to AF_UNIX socket on Path=%s. ErrorNo=%d\n", path, errno);
        cleanup(domainSocketAddress->fd, path);
        exit(errno);
    }

    fd_set readfds;
    int maxFD = domainSocketAddress->fd;
    FD_ZERO(&readfds);
    FD_SET(domainSocketAddress->fd, &readfds);

    int openConnections[FD_SETSIZE];
    int closedConnections[FD_SETSIZE] = {0};// indices to openConnections that have clo
    int nextIdx = 0;

    fprintf(stdout, "Start accepting connections on Path=%s\n", path);
    while(TRUE) {
        int retVal = select(maxFD + 1, &readfds, NULL, NULL, NULL);
        if(FD_ISSET(domainSocketAddress->fd, &readfds)) {

            int connFd = accept(domainSocketAddress->fd, NULL, NULL);
            if(connFd == -1) {
                fprintf(stderr, "Error while accepting connection. Error=%s, ErrorNo=%d\n", strerror(errno), errno);
                cleanup(domainSocketAddress->fd, path);
                exit(errno);
            }
            fprintf(stdout, "New AF_UNIX connection added\n");

            openConnections[nextIdx++] = connFd;
            maxFD = maxFD >= connFd ? maxFD : connFd;
            FD_SET(connFd, &readfds);
        } else {
            for(int i = 0; i < nextIdx;i ++) {
                if(FD_ISSET(openConnections[i], &readfds)) {

                    if(!pumpData(openConnections[i])){
                        FD_CLR(openConnections[i], &readfds);
                        openConnections[i] = -1;// denotes that connection has closed
                    }
                }
            }

            nextIdx = cleanupConnections(openConnections, nextIdx);
        }

        // re-add all active FDs to fd_set
        FD_SET(domainSocketAddress->fd, &readfds);
        for(int i = 0; i < nextIdx;i ++) {
            FD_SET(openConnections[i], &readfds);
        }
    }

    cleanup(domainSocketAddress->fd, path);
}

int pumpData(int connFd) {
    char buf[BUFSIZ];
    int bytes = read(connFd, buf, BUFSIZ);
    if(bytes <= 0) {
        fprintf(stdout, "Connection closed\n");
        return FALSE;
    }
    write(1, buf, bytes);
    return TRUE;
}

int cleanupConnections(int *connections, int idx) {
    int temp[idx];
    int next = 0;
    for(int i = 0; i < idx;i++) {
        if(connections[i] != -1) {
            temp[next++] = connections[i];
        }
    }

    memcpy(connections, temp, sizeof(temp));
    return next;
}

The Changes

The few changes that worth mentioning are the below:

  • We first registered the AF_UNIX sockets file descriptor on the fd_set that is passed into the select() call.
  • On every call to select(), the first check to be done is that if the socket file descriptor has I/O data, which means a new connection. If so the server accept() that connection
if(FD_ISSET(domainSocketAddress->fd, &readfds)) {
    int connFd = accept(domainSocketAddress->fd, NULL, NULL);
    ...
  • After accepting a connection, that connection's file descriptor has to be added to the fd_set so that it can be monitored for I/O events
FD_SET(connFd, &readfds);
  • For every open connection, the program checks if the corresponding file descriptor has I/O data, and if so the server reads those data. Worth noting that when a connection closes, this also means an I/O signal, hence the program needs to check and remove the closed file descriptor from the monitoring fd_set
if(FD_ISSET(openConnections[i], &readfds)) {
    if(!pumpData(openConnections[i])){
        FD_CLR(openConnections[i], &readfds);
        openConnections[i] = -1;// denotes that connection has closed
    }
}
  • Finally, as mentioned above, after select() returns it will only contain file descriptors that have data on the fd_set. Any previously added file descriptors that did not have I/O data on that cycle are removed, hence needs to be re-added. Luckily, according to the select() documentation there is no harm trying to re-set a file descriptor that is already in the fd_set hence we just loop over the known file descriptors and re-add them all on the fd_set

FD_SET() This macro adds the file descriptor fd to set. Adding a file descriptor that is already present in the set is a no-op, and does not produce an error.

// re-add all active FDs to fd_set
FD_SET(domainSocketAddress->fd, &readfds);
for(int i = 0; i < nextIdx;i ++) {
    FD_SET(openConnections[i], &readfds);
}

Conclusion

The changes needed to allow for multiplexing of different connections were minimal and did not radically affect the programs logic. Someone can take this example and enhance it further. Some suggestions would be:

  • Try epoll() instead of select()
  • Instead of just reading what the client has sent and printing it out to the console, broadcast the message to all clients connected at that time

Brushing Up My C. Building A Unix Domain Socket Client/Server (Part I)

I haven't really done much C in my professional career. There were a couple of times that I had to look into some C code, but not really create any kind of system in C. Most of my interaction with the C programming language dates back to university times.
I always liked C though. For its simplicity as a language, for its small surface as a language (i.e. not bloated with tons of features) and finally for the perfect, artistic, abstraction it provides on top of the hardware, which is so close to the metal but at the same time hiding the complexities and providing just about the right amount of abstractions.

For all the above and thanks to the lockdown and the plenty of me-time that I now have, I decided to revise C and try to put some of its features in practice by gradually building a very simple application.

Revising Using a Book

I didn't want to follow any tutorials or knowledge in Google, hence I picked up a book in order to revise. Even though there are a couple of great books for C out there, for me it was a no brainer to actually revise one of the best books in my library The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie.
This book is written back in 1988, but personally I find it to be one of the best technical books I ever read. I read the entire book in about 2 weeks, and in parallel I was trying to do some of the plenty exercise that each chapter has. By doing so, I refreshed some of my knowledge in C and got a fresh overview of its features as a language.

Deciding On A Simple Application

After finishing the book I decided it would be nice to put some of the concepts in action and build something small, trying to use some of the features I just revised. I wanted something that wouldn't take me much time (as it would mainly be a playground, rather than a side project), but at the same time also give me some knowledge.
I was always interested and fairly familiar with network servers. I have an understanding of the underlyling kernel space functions that take place in socket programming, but I have always been using abstractions on top of this, mainly through Java frameworks. So I decided to build something around that concept, and as I wanted to keep it fairly simple and because I hadn't really use this feature in the past, I concluded building a client/server applications using Unix Domain Sockets makes sense and most likely would fullfill most of my requirements.

Simplifying it a lot, a Unix Domain Socket is like any other socket (i.e. internet socket), but it can only be used for inter-process communication on the host it is opened, as it is actually backed by the filesystem.

Rules Of Engagement

In order to actually build this all by myself (I could just google an example and get the solution in 5 minutes), I decided to impose some very basic rules:

  • I could use the book I read as a reference
  • I could use man7.org as a reference for system calls
  • I could not use any other internet resource that provided a ready baked solution
  • The purpose of this application was not to be cross-platform. I was mainly interested making it work in my Mac and effectively in BSD like systems

The Code

I decided to split the code into different source (.c) files:

  • af_unix_sockets.c: The file containing the main() method and basic logic for parsing the command line arguments in order to start a client or a server
  • af_unix_sockets_common.h: A header file containing common definitions and the prototypes for the different methods, that client or server implements and also the defininion of a simple type AFUnixAddress storing a file descriptor and the actual socket address
  • af_unix_sockets_common.c: A source file containing some common methods
  • af_unix_sockets_server.c: The server implementation, to be called by the main method in af_unix_sockets.c
  • af_unix_sockets_client.c: The client implementation, to be called by the main method in af_unix_sockets.c

The Header File

As described above af_unix_sockets_common.h is a header file defining the prototypes of various functions (which I view as the public interface) to be implemented by parts of the system and to be called by other parts.
Additionally, the header defines a type, which I mainly created for Part II of this post, encapsulating the file descriptor of an opened unix domain socket and also its address.

#define TRUE 1
#define FALSE 0
#define CLIENT "client"
#define SERVER "server"

typedef struct af_unix_address {
    int fd;
    struct sockaddr_un *address;
} AFUnixAddress;

AFUnixAddress * open_af_unix_socket(char *);
void cleanup(int fd, char *path);

void server(char *path);

void client(char *path);

The Common File

I wanted to have a common file, just for the shake of it, to be able to export some common functionality that is shared between the server and the client.

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "stddef.h"

#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>

#include "af_unix_sockets_common.h"

/*
 * @return AFUnixAddress type, containing the address and the file descriptor for the opened unix domain socket  
*/
AFUnixAddress *open_af_unix_socket(char *path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if(fd == -1) {
        fprintf(stderr, "Failed to open AF_UNIX socket. ErrorNo=%d\n", errno);
        cleanup(fd, path);
        exit(errno);
    }

    AFUnixAddress *af_unix_socket = (AFUnixAddress *)malloc(sizeof(AFUnixAddress));
    af_unix_socket->address = (struct sockaddr_un *)malloc(sizeof(struct sockaddr_un));
    af_unix_socket->fd = fd;
    af_unix_socket->address->sun_family = AF_UNIX;
    strcpy(af_unix_socket->address->sun_path, path);
    return af_unix_socket;
}

/*
* Clean up an opened file descriptor opened for 
*/
void cleanup(int fd, char *path) {
    close(fd);

    remove(path);

    if(fd != -1) {
        fprintf(stderr, "Failed to successfully close socket. ErrorNo=%d\n", errno);
    }
}

The common file is very simple, containing just two methods, one that opens a Unix Domain Socket for the specified path that was passed in and returning an AFUnixAddress which is a type defined in the af_unix_sockets_common.h file, containing the actual socket's address and the file descriptor corresponding to that socket.
The file descriptor is needed for later use, to invoke system calls on it. Finally, worth mentioning that the Unix Domain Socket opened is a SOCK_STREAM one, which based on the documentation follows TCP semantics, as opposed to SOCK_DGRAM which follows UDP semantics.

The Client

The client's behavior is defined in its own file and is pretty simplistic. A path is passed in specifying the filepath for the Unix Domain Socket. Then a socket is opened for that path and an invocation to #connect() method, passing in the file descriptor associated with the socket, forces the connection to be established.
Finally, the client reads from stdin and writes that to the socket invoking the write() method.

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "stddef.h"
#include "unistd.h"

#include <sys/socket.h>
#include <sys/types.h> 
#include <sys/un.h>
#include "sys/syscall.h"
#include <errno.h>

#include "af_unix_sockets_common.h"

/*
* AF_UNIX socket client, obtains an ```AFUnixAddress``` by opening the socket to the specified ```path``` and then invoking ```#connect()``` on the socket's file descriptor.
* Finally, the client reads input from ```stdin``` and writes that to the socket.
*/
void client(char *path) {
    fprintf(stdout, "Starting AF_UNIX client on Path=%s\n", path);
    AFUnixAddress *domainSocketAddress = open_af_unix_socket(path);
    printf("AF_UNIX client socket on Path=%s opened with fd=%d\n", domainSocketAddress->address->sun_path, domainSocketAddress->fd);

    int isConnected = connect(domainSocketAddress->fd, (struct sockaddr *)domainSocketAddress->address, sizeof(struct sockaddr));
    if(isConnected == -1) {
        fprintf(stderr, "Failed to connect to Path=%s. ErrorNo=%s\n", domainSocketAddress->address->sun_path, strerror(errno));
        cleanup(domainSocketAddress->fd, domainSocketAddress->address->sun_path);
        exit(errno);
    }

    char line[1024];
    while(fgets(line, 1024, stdin) != NULL) {
        int size = 0;
        for(;line[size] != '\0'; size++){
        }
        if(size == 0) {
            break;
        }
        int bytes = write(domainSocketAddress->fd, line, size);
    }

    cleanup(domainSocketAddress->fd, domainSocketAddress->address->sun_path);
    exit(0);
}

The Server

The server follows the same pattern with that of the client. The only difference is the system calls involved in order to bind to the opened socket and start listening. More specifically, after the socket is created
a call to bind() connects the file descriptor with the address of the socket. Then a call to listen() allows the socket to wait
for incoming connections, and finally a call to accept() accepts the first enqued connection request and retrieves a file descriptor for that connection. That file descriptor can
be passed in the read() system call to read incoming bytes. Note, that we had to call accept() because we marked the domain socket as a SOCK_STREAM one,
hence effectively a connection oriented socket.

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "stddef.h"
#include "unistd.h"

#include <sys/socket.h>
#include <sys/types.h> 
#include <sys/un.h>
#include "sys/syscall.h"
#include <errno.h>

#include "af_unix_sockets_common.h"

/*
* Open a ```AF_UNIX``` socket on the ```path``` specified. ```#bind()``` to that address, ```#listen()``` for incoming connections and ```#accept()```. Finally, wait for input from the socket and print 
* that to the ```stdout```. When one connection is closed, wait for the next one.
*/
void server(char *path) {
    printf("Starting AF_UNIX server on Path=%s\n", path);
    AFUnixAddress *domainSocketAddress = open_af_unix_socket(path);

    int hasBind = bind(domainSocketAddress->fd, (struct sockaddr *)domainSocketAddress->address, sizeof(struct sockaddr));
    if(hasBind == -1){
        fprintf(stderr, "Failed to bind AF_UNIX socket on Path=%s. ErrorNo=%d\n", path, errno);
        cleanup(domainSocketAddress->fd, path);
        exit(errno);
    }

    int isListening = listen(domainSocketAddress->fd,  10);
    if(isListening == -1) {
        fprintf(stderr, "Failed to listen to AF_UNIX socket on Path=%s. ErrorNo=%d\n", path, errno);
        cleanup(domainSocketAddress->fd, path);
        exit(errno);
    }

    fprintf(stdout, "Start accepting connections on Path=%s\n", path);
    while(TRUE) {
        int connFd = accept(domainSocketAddress->fd, NULL, NULL);
        if(connFd == -1) {
            fprintf(stderr, "Error while accepting connection. Error=%s, ErrorNo=%d\n", strerror(errno), errno);
            cleanup(domainSocketAddress->fd, path);
            exit(errno);
        }

        char buf[BUFSIZ];
        while(TRUE){
            int bytes = read(connFd, buf, BUFSIZ);
            if(bytes <= 0) {
                fprintf(stdout, "Connection closed\n");
                break;
            }
            write(1, buf, bytes);
        }
    }

    cleanup(domainSocketAddress->fd, path);
}

The Main Method

The file that contains the program's main method, reads the command line arguments, does some rudimentary parsing and based on what was passed creates an AF_UNIX socket server or client.

#include "stdlib.h"
#include "stdio.h"
#include "string.h"
#include "stddef.h"
#include "unistd.h"

#include <sys/socket.h>
#include <sys/types.h> 
#include <sys/un.h>
#include "sys/syscall.h"
#include <errno.h>

#include "af_unix_sockets_common.h"

const static char *USAGE = "Usage af_unix_socket --type [server|client] --path [path]\n";

int main(int argc, char **argv){
    if(argc != 5) {
        fprintf(stderr, "%s", USAGE);
        exit(-1);
    }

    char *type = NULL;
    char *path = NULL;
    char *nextParam;

    int idx=1;
    int expectFlag = TRUE;
    while(idx < 5) {
        if(strstr(&(*argv[idx]), "--") != NULL) {
            if(strcmp("--type", &(*argv[idx])) == 0) {
                nextParam = (char *)malloc(32);
                type = nextParam;
            } else if(strcmp("--path", &(*argv[idx])) == 0) {
                nextParam = (char *)malloc(32);
                path = nextParam;
            } else {
                fprintf(stderr, "%s", USAGE);
                exit(-1);
            }
            expectFlag = FALSE;
        } else {
            if(expectFlag) {
                fprintf(stderr, "Expected flag for positional argument %d. %s", idx, USAGE);
                exit(-1);
            }
            size_t paramSize = sizeof(&(*argv[idx]));
            memcpy(nextParam, &(*argv[idx]), paramSize);
        }
        ++idx;
    }

    if(type == NULL || path == NULL) {
        fprintf(stderr, "%s", USAGE);
        exit(-1);
    }

    fprintf(stdout, "Initializing AF_UNIX for Type=%s, Path=%s\n", type, path);
    if(strcmp(type, SERVER) == 0) {
        if(access(path, F_OK) != -1) {
            fprintf(stdout, "File=%s already exists. Deleting file to be used by AF_UNIX server\n", path);
            if(remove(path) != 0) {
                fprintf(stderr, "Failed to remove existing File=%s. File cannot be used for AF_UNIX server\n", path);
                exit(-1);
            }
        }

        server(path);
    } else if(strcmp(type, CLIENT) == 0) {
        client(path);
    } else {
        fprintf(stderr, "Unknown Type=%s\n", type);
    }
    return 0;
}

Compiling And Running The Program

As now we have all the building blocks, we can actually compile, link and run the program. Based on the operating system someone is running they will need a compatible compiler. Some standard choices are GCC or Clang.
I have personally used Clang as it is the standard on a MacOS system.

Compiling and linking the different files together can plainly be done by:

clang af_unix_sockets_common.c af_unix_sockets.c af_unix_sockets_client.c af_unix_sockets_server.c -o af_unix_sockets

Running the server:

af_unix_sockets --type server --path /tmp/af_unix_socket_example

Running the client:

af_unix_sockets --type client --path /tmp/af_unix_socket_example

Why Part I

This post is named Part I. The main reason behind this is that this program only accepts one connection at a time. A more scalable way of writing this program is by using a selector mechanism (i.e. select(), epoll(), kqueue()) which will allow for connection multiplexing and concurrent handling of those connections.
This will be described in a Part II of this blog and hopefully it will not need many alterations on the code above.

Java, The Cost of a Single Element Loop

In quite a few cases I have seen myself designing code with listeners and callbacks. It is quite common for a class that emits events, to expose an API to attach listener(s) to it. Those listeners are usually stored in a data structure (see List, Set, Array) and when an event is about to be dispatched the listeners are iterated in a loop and the appropriate callback is called.

Something along the lines of:

public class EventDispatcher {

        private final List<Listener> listeners = new ArrayList<>();

        public void dispatchEvent() {
            final MyEvent event = new MyEvent();
            for (Listener listener : this.listeners) {
                listener.onEvent(event);
            }
        }
        public void attachListener(final Listener listener) {
            this.listeners.add(listener);
        }

        public void removeListener(final Listener listener) {
            this.listeners.remove(listener);
        }
    }

    public static class Listener implements EventListener {

        void onEvent(final MyEvent myEvent) {
            // do staff
        }
    }

    public static class MyEvent {

    }

In many cases I have observed that despite the fact that the class is desinged to accept many listeners, the true is actually that just one listener is attached in the majority of the cases.

Hence I wanted to measure the performance penalty paid in case the class had just one listener vs if the class was initially designed to accept just one listener.

In essence I wanted to check the performance impact on the below two cases.

private Listener listener;
        private final List<Listener> singleElementArray = new ArrayList<Listener>(){
            {add(new Listener());}
        };

        public void dispatch() {
            this.listener.onEvent(new MyEvent());
        }

        public void dispatchInLoop() {
            for (int i = 0; i < 1; i++) {
                this.singleElementArray.get(i).onEvent(new MyEvent());
            }
        }

Assumptions Made Prior To Testing

Before creating a benchmark for the above, I made some assumptions:

  • I assumed the single element (single listener in a data container) loop would be unrolled
  • I (wrongly) assumed that the performane cost will not be significant. As effectively with the loop unrolled I would think the native code produced would more or less look close enough

JMH Benchmark

In order to test my assumptions I created the below benchmark:

SingleElementLoopBenchmark.java

Initial Observations

To my surprise I found out that an invocaiton on a single element list was about ~2,5 slower, based on the below throughput numbers:

Benchmark                                                          Mode  Cnt   Score   Error   Units
          SingleElementLoopBenchmark.directInvocation                       thrpt   10   0.317 ± 0.022  ops/ns
          SingleElementLoopBenchmark.singleElementListLoopInvocation        thrpt   10   0.114 ± 0.010  ops/ns

I couldn't really understand why and the above seemed a bit too far from my expecations/assumptions.

The first thing that I verified with JVM argument -XX:+PrintCompilation was that both methods were compiled with C2 compiler, which was the case.

I also tried to print the assembly code with -XX:+PrintAssembly but I couldn't really read/interpret the assembly code.

Resorting to Social Media

I ended up posting a tweet about my findings and asking some pointer on where/how to look for explanations on what I was observing. The answer I got was to try to find the hot methods by using something like perfasm, which would tie the assembly output to the hottest methods of my benchmark.

Which I did with -prof dtraceasm (The benchmark was running on a Mac that's why I used dtrace). The output was the below:

Direct Invocation

9.56%  ↗  0x000000010b73c950: mov    0x40(%rsp),%r10
  1.00%  │  0x000000010b73c955: mov    0xc(%r10),%r10d                ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@1 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
  0.17%  │  0x000000010b73c959: mov    0xc(%r12,%r10,8),%r11d         ; implicit exception: dispatches to 0x000000010b73ca12
 11.18%  │  0x000000010b73c95e: test   %r11d,%r11d
  0.00%  │  0x000000010b73c961: je     0x000000010b73c9c9             ;*invokevirtual performAction {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
 10.69%  │  0x000000010b73c963: mov    %r9,(%rsp)
  0.65%  │  0x000000010b73c967: mov    0x38(%rsp),%rsi
  0.00%  │  0x000000010b73c96c: mov    $0x1,%edx
  0.14%  │  0x000000010b73c971: xchg   %ax,%ax
 10.08%  │  0x000000010b73c973: callq  0x000000010b6c2900             ; ImmutableOopMap{[48]=Oop [56]=Oop [64]=Oop [0]=Oop }
         │                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
         │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
         │                                                            ;   {optimized virtual_call}
  1.44%  │  0x000000010b73c978: mov    (%rsp),%r9
  0.19%  │  0x000000010b73c97c: movzbl 0x94(%r9),%r8d                 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
  9.77%  │  0x000000010b73c984: mov    0x108(%r15),%r10
  0.99%  │  0x000000010b73c98b: add    $0x1,%rbp                      ; ImmutableOopMap{r9=Oop [48]=Oop [56]=Oop [64]=Oop }
         │                                                            ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
         │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
  0.02%  │  0x000000010b73c98f: test   %eax,(%r10)                    ;   {poll}
  0.28%  │  0x000000010b73c992: test   %r8d,%r8d
  0.00%  ╰  0x000000010b73c995: je     0x000000010b73c950             ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}

Single Element Loop Invocation

         ╭    0x000000011153fa9d: jmp    0x000000011153fad6
  0.19%  │ ↗  0x000000011153fa9f: mov    0x58(%rsp),%r13
  3.55%  │ │  0x000000011153faa4: mov    (%rsp),%rcx
  0.09%  │ │  0x000000011153faa8: mov    0x60(%rsp),%rdx
  0.22%  │ │  0x000000011153faad: mov    0x50(%rsp),%r11
  0.17%  │ │  0x000000011153fab2: mov    0x8(%rsp),%rbx                 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
         │ │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
         │ │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
         │ │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.55%  │↗│  0x000000011153fab7: movzbl 0x94(%r11),%r8d                ;*goto {reexecute=0 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@35 (line 44)
         │││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.16%  │││  0x000000011153fabf: mov    0x108(%r15),%r10
  0.28%  │││  0x000000011153fac6: add    $0x1,%rbx                      ; ImmutableOopMap{r11=Oop rcx=Oop rdx=Oop r13=Oop }
         │││                                                            ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
  0.19%  │││  0x000000011153faca: test   %eax,(%r10)                    ;   {poll}
  4.00%  │││  0x000000011153facd: test   %r8d,%r8d
         │││  0x000000011153fad0: jne    0x000000011153fbe9             ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
         │││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@33 (line 124)
  0.07%  ↘││  0x000000011153fad6: mov    0xc(%rcx),%r8d                 ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.22%   ││  0x000000011153fada: mov    0x10(%r12,%r8,8),%r10d         ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
          ││                                                            ; implicit exception: dispatches to 0x000000011153ff2a
  0.21%   ││  0x000000011153fadf: mov    0x8(%r12,%r10,8),%edi          ; implicit exception: dispatches to 0x000000011153ff3e
  4.39%   ││  0x000000011153fae4: cmp    $0x237565,%edi                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
          ││  0x000000011153faea: jne    0x000000011153fc92
  0.33%   ││  0x000000011153faf0: lea    (%r12,%r10,8),%r9              ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
          ││                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
          ││                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.09%   ││  0x000000011153faf4: mov    0x10(%r9),%r9d
  0.14%   ││  0x000000011153faf8: test   %r9d,%r9d
          ╰│  0x000000011153fafb: jle    0x000000011153fab7             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.98%    │  0x000000011153fafd: lea    (%r12,%r8,8),%rdi              ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.06%    │  0x000000011153fb01: xor    %r9d,%r9d                      ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@15 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.09%    │  0x000000011153fb04: mov    0x8(%r12,%r10,8),%esi          ; implicit exception: dispatches to 0x000000011153ff4e
  0.06%    │  0x000000011153fb09: cmp    $0x237565,%esi                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
  0.00%    │  0x000000011153fb0f: jne    0x000000011153fcc2
  3.93%    │  0x000000011153fb15: lea    (%r12,%r10,8),%rax             ;*invokeinterface get {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.06%    │  0x000000011153fb19: mov    0x10(%rax),%r10d               ;*getfield size {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::get@2 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.11%    │  0x000000011153fb1d: test   %r10d,%r10d
           │  0x000000011153fb20: jl     0x000000011153fcf6             ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.Objects::checkIndex@3 (line 372)
           │                                                            ; - java.util.ArrayList::get@5 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.28%    │  0x000000011153fb26: cmp    %r10d,%r9d
  0.00%    │  0x000000011153fb29: jae    0x000000011153fc1c
  3.97%    │  0x000000011153fb2f: mov    0x14(%rax),%r10d               ;*getfield elementData {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::elementData@1 (line 442)
           │                                                            ; - java.util.ArrayList::get@11 (line 459)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.05%    │  0x000000011153fb33: mov    %r9d,%ebp                      ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.Objects::checkIndex@3 (line 372)
           │                                                            ; - java.util.ArrayList::get@5 (line 458)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.08%    │  0x000000011153fb36: mov    0xc(%r12,%r10,8),%esi          ; implicit exception: dispatches to 0x000000011153ff62
  1.27%    │  0x000000011153fb3b: cmp    %esi,%ebp
           │  0x000000011153fb3d: jae    0x000000011153fc5a
  3.94%    │  0x000000011153fb43: shl    $0x3,%r10
  0.05%    │  0x000000011153fb47: mov    0x10(%r10,%rbp,4),%r9d         ;*aaload {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - java.util.ArrayList::elementData@5 (line 442)
           │                                                            ; - java.util.ArrayList::get@11 (line 459)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  1.71%    │  0x000000011153fb4c: mov    0x8(%r12,%r9,8),%r10d          ; implicit exception: dispatches to 0x000000011153ff72
 17.85%    │  0x000000011153fb51: cmp    $0x237522,%r10d                ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Listener')}
  0.00%    │  0x000000011153fb58: jne    0x000000011153fef6             ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@25 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  3.79%    │  0x000000011153fb5e: mov    %rdi,0x18(%rsp)
  0.02%    │  0x000000011153fb63: mov    %r8d,0x10(%rsp)
  0.02%    │  0x000000011153fb68: mov    %rbx,0x8(%rsp)
  0.19%    │  0x000000011153fb6d: mov    %r11,0x50(%rsp)
  3.95%    │  0x000000011153fb72: mov    %rdx,0x60(%rsp)
  0.02%    │  0x000000011153fb77: mov    %rcx,(%rsp)
  0.03%    │  0x000000011153fb7b: mov    %r13,0x58(%rsp)
  0.36%    │  0x000000011153fb80: mov    %rdx,%rsi
  3.78%    │  0x000000011153fb83: mov    $0x1,%edx
  0.01%    │  0x000000011153fb88: vzeroupper
  4.05%    │  0x000000011153fb8b: callq  0x00000001114c2900             ; ImmutableOopMap{[80]=Oop [88]=Oop [96]=Oop [0]=Oop [16]=NarrowOop [24]=Oop }
           │                                                            ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@29 (line 45)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
           │                                                            ;   {optimized virtual_call}
  0.98%    │  0x000000011153fb90: mov    0x10(%rsp),%r8d
  3.61%    │  0x000000011153fb95: mov    0x10(%r12,%r8,8),%r10d         ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.24%    │  0x000000011153fb9a: mov    0x8(%r12,%r10,8),%r9d          ; implicit exception: dispatches to 0x000000011153ff9e
  0.74%    │  0x000000011153fb9f: inc    %ebp                           ;*iinc {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@32 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.04%    │  0x000000011153fba1: cmp    $0x237565,%r9d                 ;   {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
  0.00%    │  0x000000011153fba8: jne    0x000000011153fd36
  3.60%    │  0x000000011153fbae: lea    (%r12,%r10,8),%r11             ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
           │                                                            ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
           │                                                            ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
  0.11%    │  0x000000011153fbb2: mov    0x10(%r11),%r9d
  0.35%    │  0x000000011153fbb6: cmp    %r9d,%ebp
           ╰  0x000000011153fbb9: jge    0x000000011153fa9f             ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                                        ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
                                                                        ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
                                                                        ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)

As I said I am not really able to read/interpret assembly code, but in between the lines I could see that:

  • The loop was indeed unrolled
  • A penalty was paid to cast the item to the expected type (17.85% of the CPU instructions)
  • A penalty was paid to fetch the item from the list, underlying array

In order to get some advice from someone knowledgable on this I posted the below question on StackOverflow. The answer is pretty comprehensive, as the person who answered is one of the most prominent names in JVM community

StackOveflow: Java Method Direct Invocation vs Single Element Loop

Conclusion/Observations

In summary:

  • The loop was indeed unrolled, as expected and as seen from the assembly code
  • The main penalty paid is for fetching the element from the list and casting it to the expected type
  • Some cost is also because of checks performed on the data container itself (i.e. size)
  • In general the extra cost been paid is memory access cost, rather than CPU instructions cost

As seen in the SO answer, Andrei makes the point that invoking the object's method from inside the loop is not ~2,5 times slower, but rather 3 ns slower, if we look it from a perspective of latency (ns/op) rather than throughput (ops/ns). This is a valid point, but I am not sure If i aggree 100%, as in some applications, depending on the nature, that extra cost will actually translate in ~2,5.

Finally, I have added in the JMH Benchmark test, tests for different data container types:

  • Array
  • List
  • Set

Observing the numbers of those, and as expected, an array is faster than the rest. The array is typed, hence the casting cost is not paid. The array underlying an array list is of type Object, hence the need for casting to the list's type.

JDK Evolution

I know for fact that many people (especially in the financial technology industry) are very skeptical when a new version of Java is released. People, actually persist to update their Java version (even the JRE version) for many years. There are a lot of places that are still using Java 6! Even though, this persistence is valid for some cases, especially in the early stages of a new release, i personally find it wrong. Indeed, to upgrade the version of Java a software is using is not a simple and easy thing in most of the cases. Lots of testing needs to be done, to ensure at least the application's performance has not degraded. Additionally, more testing is needed when the application is doing something very tailored, like calling native code in-process.

In my opinion, upgrading to the newest version is advisable for many reasons. The one i would like to mention today is the JDK evolution. Meaning, that in most of the cases a software developer will have some free gains, without him, in principal, doing anything. The code inside the JDK has some minor changes between releases. This is done for bug fixing reasons, improving performance reasons or even better for following hardware trends. There are lots of times that CPUs introduce a new instruction which solves a problem down at the silicon level, meaning faster processing. The Java engineers and in particular people who are involved in the OpenJDK project have lots of mechanical sympathy.

A well shout example is the commonly used java.util.concurrent.atomic.AtomicInteger class. There is a huge difference in the implementation for a couple of methods in this class, between Java7 and Java8. The difference is presented below:

Java 7

117  public final int getAndSet(int newValue) {118      for (;;) {119          int current = get();120          if (compareAndSet(current, newValue))121              return current;122      }123  }

Java 8

119  public final int getAndSet(int newValue) {120      return unsafe.getAndSetInt(this, valueOffset, newValue);121  }

There is a very important difference. Java 8 uses some code inside the Unsafe class, where Java 7 is performing a busy loop. Java 8 actually makes uses of a new CPU instruction for that. That means that the unsafe.getAndSetInt is an intrinsic function. Java's intrinsic functions can be found here.

This is a very simple but very important reason why someone should consider regularly upgrading his/her Java version. Simple things like that, which are spread across the newer implementations can actually have a positive impact on every application.

Being a polyglot developer

Throughout my professional career i have mainly used object oriented, statically typed programming languages. The likes of Java and C# were ideal for big scale projects. Such languages have a massive active community behind them, developing frameworks, tools, writing articles about it and responding to questions in online forums. Hence it is really hard to advocate against using them. It is rare that someone would stumble upon a problem that someone else hasn't solved already. It was not until recently that i had to switch to a different paradigm of language. A dynamically typed language which can be used as an object oriented one, but also having functional abilities.

This blog post is not about advocating which style is better, as i believe there is no silver bullet out there. Every problem is different and can be solved in multiple ways.

However, this blog post is about the benefits of being a polyglot developer, or using different styles of programming languages. At the beginning, i found myself struggling to get used to the new one. I tended to try make it work using the approach i already knew, which was not the right mentality. After a couple of weeks and as i got more familiar with the language its features and its 'mentality' i found myself trying to approach the problems from a different perspective. I was combining the various different programming styles and nice ideas were coming out of those. Even when i was going back and solving problems in the object oriented way, i found out that i could apply new techniques which were making the solution much more elegant and fun to program.

I believe that being open to different languages is a great thing for a software engineer. It opens new perspectives, adding new fundamental knowledge and giving agility on how someone approaches a problem. Having said that, I was looking at a new programming language to learn or at least get a bit familiar with. I heard really positive things about Haskell. A functional, statically type language. Even though i know i might not use such a language for a project at my workplace, i believe that i will benefit a lot by at least spending a couple of months playing with it.

Concluding, i would urge everyone to try a different programming language than the one he/she is used to. The benefits are enormous and is really fun too.