Java versions covered: 8, 11 and eventually 17.
- Introduction
- Performance Testing
- A Java Performance Toolbox
- JVM and Compilers
- Garbage collection
- Heap Memory Best Practices
- Native Memory Best Practices
- Threading and Synchronization Performance
- Java Servers
- Database Performance Best Practices
- Java SE API Tips
- Summary of Tuning Flags
- References/Bibliography
Java bytecode is the instruction set of the Java Virtual Machine (JVM). Just as a physical processor executes machine code, the JVM executes bytecode, and it uses two different mechanisms to do so:
- The Interpreter
- The Just-In-Time Compiler
When we compile our Java program (e.g., using the javac command), we'll end up with our source code compiled into the binary representation of our code (JVM bytecode).
To run a Java program, the JVM interprets the bytecode. Since an interpreter is usually much slower than native code executing on a real processor, the JVM can run another compiler that translates the bytecode into machine code the processor can execute directly. This is the so-called just-in-time (JIT) compiler.
Early JVMs executed bytecode purely by interpretation; modern JVMs such as HotSpot add JIT compilation because it improves application performance in most cases.
An interpreter reads the bytecode one instruction at a time and executes each instruction as it reads it.
The JIT compiler works alongside the interpreter, with one significant difference: it compiles the most frequently used code to native machine code, while the rest of the code is still interpreted instruction by instruction. To decide what to compile, the JVM keeps a counter for every method and updates it on every call. When the counter reaches a threshold, the compiler is triggered to compile the method to native machine code, so that code does not have to be interpreted again.
JIT compilation is performed in a separate thread inside the JVM (the JVM is itself a multithreaded application), so normal execution is not interrupted by the JIT compiler.
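As a rough illustration of the counting-and-compiling behaviour described above, the sketch below calls a small method often enough for the JVM to consider it hot, then asks the standard CompilationMXBean how much time the JIT compiler has spent so far. The class and method names are hypothetical, and the number of calls is an arbitrary choice rather than an actual JVM threshold.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitWarmupSketch {

    // A small method that becomes "hot" once it has been called many times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Calls the hot method repeatedly so the JVM's invocation counter for it grows.
    static long warmUp(int calls) {
        long checksum = 0;
        for (int i = 0; i < calls; i++) {
            checksum += sumOfSquares(100);
        }
        return checksum;
    }

    public static void main(String[] args) {
        // 200_000 is an arbitrary call count chosen for illustration.
        long checksum = warmUp(200_000);
        System.out.println("checksum = " + checksum);

        // The bean is null when the JVM has no compilation system at all.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " has spent "
                    + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```

The reported compilation time covers the whole JVM, not only sumOfSquares, so it is best read as evidence that compilation happens in the background while the program keeps running.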
The code for this example is in Performance_Example_1 :
In order to see which methods are being compiled by the JIT compiler we use the flag:
When you run this code with that flag, you can see something like this:
156 60 % 4 java.lang.String::hashCode (49 bytes) made not entrant
The 4 in this line is the compilation level: the method was compiled at level 4, the most optimized level. The % marks an on-stack replacement (OSR) compilation.
The JDK implementation contains two conventional JIT-compilers: the client compiler, also called C1 and the server compiler, called opto or C2.
C1 is designed to run faster and produce less optimized code, while C2, on the other hand, takes a little more time to run but produces a better-optimized code. The client compiler is a better fit for desktop applications since we don't want to have long pauses for the JIT-compilation. The server compiler is better for long-running server applications that can spend more time on the compilation.
The JVM decides which compilation level, from 1 to 4, is applied to your code (4 is the most optimized level).
To see when the C2 compiler is applied to a method as it is called, we can use the flags:
The code produced by the JIT compiler is stored in the code cache, a dedicated region of native memory. It has one drawback: its size is limited, so only a fraction of our code can be kept in compiled form.
If you have too much code for optimization you will see a message like this:
VM warning: CodeCache is full. Compiler has been disabled.
This means that even though there is still code to optimize, the compiler won't optimize it, because the code cache is already full.
We can see the code cache size and other basic information if we use the flag:
We specify the initial, maximum and growing rate for the code cache with the flags:
The size can be given in kilobytes (k) or megabytes (m or M).
We can use JConsole to monitor the code cache remotely: connect to the process you want to monitor and select the Memory tab.
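The same figures that JConsole shows can also be read from inside the application. The sketch below is one possible way to do it with the standard MemoryPoolMXBean API. Selecting pools by name is an assumption: the names are JVM-specific (HotSpot on Java 8 reports a single "Code Cache" pool, while later versions may report several "CodeHeap" pools), and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class CodeCacheSketch {

    // Collects the memory pools whose name suggests they hold JIT-compiled code.
    // Matching on the name is an assumption, because pool names differ between JVMs.
    static List<MemoryPoolMXBean> codePools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Code Cache") || name.contains("CodeHeap") || name.contains("CodeCache")) {
                pools.add(pool);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codePools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            System.out.println(pool.getName() + ": used " + usage.getUsed()
                    + " bytes, max " + usage.getMax() + " bytes");
        }
    }
}
```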
Our starting point is to understand the terms stack and heap. When our applications run, they need access to some of our computer's memory, for example to store the objects that we create.
The stack is a very efficient data structure, which is managed effectively by the Java virtual machine.
One important aspect of the stack is that Java knows exactly when data on the stack can be destroyed, so the stack needs no garbage collection.
In Java all local variables are created on the stack, and they are automatically popped from the stack when you reach the close of the block that created that variable. All this happens within the Java Virtual Machine.
The second area of Java's memory is called the heap. Although the stack is a very efficient data structure, it can't be used to store complex data types such as objects. Objects live on the heap, and the stack holds only references to them.
All object references in Java are passed by value. This means that a copy of the reference is passed to the method. The subtlety is that the copy still points to the same object, so changes made to the object through the copy are visible to the caller.
For objects passed to methods, the reference to the object is passed by value.
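A minimal sketch of this behaviour follows. The Customer class and the method names are hypothetical and exist only for the illustration: one method changes the object through its copy of the reference, the other only reassigns its local copy.

```java
class PassByValueSketch {

    // Hypothetical mutable type used only for this illustration.
    static final class Customer {
        String name;

        Customer(String name) {
            this.name = name;
        }
    }

    // Mutates the object that the copied reference points to: the caller sees the change.
    static void rename(Customer c) {
        c.name = "Sally";
    }

    // Reassigns only the local copy of the reference: the caller's variable is unaffected.
    static void replace(Customer c) {
        c = new Customer("Bob");
    }

    static String nameAfterRename() {
        Customer customer = new Customer("Matt");
        rename(customer);
        return customer.name;
    }

    static String nameAfterReplace() {
        Customer customer = new Customer("Matt");
        replace(customer);
        return customer.name;
    }

    public static void main(String[] args) {
        System.out.println(nameAfterRename());  // prints "Sally"
        System.out.println(nameAfterReplace()); // prints "Matt"
    }
}
```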
final keyword
The real meaning of the final keyword is not that the variable's contents can never change, but that the variable can only be assigned once. Once the variable has been assigned, it can never be reassigned.
The final keyword doesn't stop the object's state from changing; it only prevents the reference held on the stack from being pointed at a different object.
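The distinction can be seen in a short sketch (the class and variable names are hypothetical): the list behind a final variable can still be modified, while reassigning the variable is rejected by the compiler.

```java
import java.util.ArrayList;
import java.util.List;

class FinalKeywordSketch {

    static List<String> buildNames() {
        final List<String> names = new ArrayList<>();

        // Allowed: the object that the final reference points to can still change.
        names.add("Ada");
        names.add("Grace");

        // Not allowed: uncommenting the next line causes a compile-time error,
        // because a final variable cannot be assigned a second time.
        // names = new ArrayList<>();

        return names;
    }

    public static void main(String[] args) {
        System.out.println(buildNames()); // prints "[Ada, Grace]"
    }
}
```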
Here are some key differences between the 32-bit and the 64-bit Java Virtual Machine:
With a 64-bit JVM you can specify a larger heap than with a 32-bit JVM: the theoretical maximum for a 32-bit JVM is 4 GB, while the limit for a 64-bit JVM is much higher.
A 64-bit JVM is particularly useful for Java applications with large heaps, such as applications that use more than 100 GB of maximum memory.
The same Java application takes more memory when running on a 64-bit JVM than on a 32-bit JVM, because the size of an OOP (ordinary object pointer) grows from 32 to 64 bits. You can mitigate this with the -XX:+UseCompressedOops JVM option, which tells the JVM to use 32-bit pointers.
Both 32-bit and 64-bit JVM have a separate installer.
One more thing that changes in the 64-bit JVM architecture is the object header size: it is 12 bytes, compared with 8 bytes in a 32-bit JVM. The size of internal references changes as well: they can take up to 8 bytes, whereas in a 32-bit JVM they take up to 4 bytes.
From these two points, you can conclude that an application running on a 64-bit JVM consumes more space than the same application running on the 32-bit version.
To force the JVM to use only one type of JIT compiler, we can use the flags:
-client for client compiler(C1)
-server for server compiler(C2)
-d64 for the 64 bit compiler version
As mentioned in the previous section, our Java program, compiled by javac, starts its execution in interpreted mode. The JVM tracks each frequently called method and compiles it, using C1 for this first compilation. HotSpot still keeps an eye on future calls of those methods: if the number of calls keeps increasing, the JVM recompiles these methods once more, this time using C2.
This is the default strategy used by the HotSpot, called tiered compilation.
We can turn off tiered compilation with the flag:
-XX:-TieredCompilation (the minus sign turns the flag off)
Note that this does not disable JIT compilation; to run in interpreter-only mode, the flag is -Xint.
We can specify the number of threads used for compilation with the flag:
-XX:CICompilerCount=[size] (default 3, minimum 2)
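To check which values a running JVM actually uses for these settings, one option is the HotSpot-specific diagnostic bean in com.sun.management. The sketch below reads the TieredCompilation and CICompilerCount options, which are the HotSpot names for the two settings discussed above. It assumes a HotSpot-based JVM; the class and helper names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class CompilerFlagsSketch {

    // Returns the current value of a HotSpot VM option, or null if the option
    // (or the diagnostic bean itself) is not available on this JVM.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return null;
        }
        try {
            VMOption option = hotspot.getVMOption(name);
            return option.getValue();
        } catch (IllegalArgumentException unknownOption) {
            // Thrown when no VM option with the given name exists.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("TieredCompilation = " + flagValue("TieredCompilation"));
        System.out.println("CICompilerCount   = " + flagValue("CICompilerCount"));
    }
}
```

Running the program once with default settings and once with -XX:-TieredCompilation should show the first value change from true to false.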
We can also specify the invocation threshold at which the C2 compiler is triggered:
In general, the heap is huge compared to the stacks and the metaspace, which are quite small. The metaspace is used primarily to store metadata: information about classes and methods, for example which methods have been compiled into bytecode and which have been compiled into native code.
But there are other uses for the metaspace which are more interesting.
The first of these is that it's where static variables are stored. We can think of the metaspace as playing the same role as a stack for any variable that we declare as static: static primitives are stored entirely in the metaspace, and static objects are stored on the heap, with the object reference held in the metaspace.
So if, for example, we declared a static int, that primitive would be a global variable that exists solely in the metaspace. If we created a static object, in this case a HashMap, that map would be created on the heap, with the variable reference, in this case called settings, held in the metaspace.
Unlike a stack, where variables are popped off when they go out of scope, variables in the metaspace are permanently there, as we would expect for static variables. They never reach a state where they can no longer be referenced, so any objects on the heap that are referenced from the metaspace will never be garbage collected.
All classes and all threads within a Java program have access to the metaspace. That's why static variables can be accessed by any piece of code we write in our application: the thread running that code can access the metaspace, so it can access any variables that live there.
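The static int and the static map called settings mentioned above could be declared as in the following sketch. Only the name settings comes from the text; the class name, the counter name and the helper method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class StaticStorageSketch {

    // A static primitive: in the model described above, the value itself is global state.
    static int launchCount = 0;

    // A static object: the HashMap lives on the heap, and the static variable
    // "settings" holds the reference to it for as long as the program runs.
    static final Map<String, String> settings = new HashMap<>();

    // Any code, on any thread, can reach both static variables through the class.
    static int recordLaunch(String theme) {
        settings.put("theme", theme);
        launchCount++;
        return launchCount;
    }

    public static void main(String[] args) {
        recordLaunch("dark");
        System.out.println(launchCount + " launch, theme = " + settings.get("theme"));
    }
}
```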
PermGen (Permanent Generation) is a special heap space separated from the main memory heap.
The JVM keeps track of loaded class metadata in the PermGen. Additionally, the JVM stores all the static content in this memory section. This includes all the static methods, primitive variables, and references to the static objects.
Furthermore, it contains data about bytecode, names, and JIT information. Before Java 7, the String Pool was also part of this memory.
The default maximum memory size for 32-bit JVM is 64 MB and 82 MB for the 64-bit version.
However, we can change the default size with the JVM options:
-XX:PermSize=[size] is the initial or minimum size of the PermGen space
-XX:MaxPermSize=[size] is the maximum size
Most importantly, Oracle completely removed this memory space in the JDK 8 release. Therefore, if we use these tuning flags in Java 8 and newer versions, we'll get the following warnings:
>> java -XX:PermSize=100m -XX:MaxPermSize=200m -version
OpenJDK 64-Bit Server VM warning: Ignoring option PermSize; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: Ignoring option MaxPermSize; support was removed in 8.0
...
With its limited memory size, PermGen is involved in generating the famous OutOfMemoryError. Simply put, the class loaders weren't garbage collected properly and, as a result, generated a memory leak.
Therefore, we receive a memory space error; this happens mostly in the development environment while creating new class loaders.
Simply put, Metaspace is a new memory space (starting from the Java 8 version); it has replaced the older PermGen memory space. The most significant difference is how it handles memory allocation.
Specifically, this native memory region grows automatically by default.
We also have new flags to tune the memory:
- MetaspaceSize and MaxMetaspaceSize: we can set the initial Metaspace size threshold and the Metaspace upper bound.
- MinMetaspaceFreeRatio: the minimum percentage of class metadata capacity free after a garbage collection
- MaxMetaspaceFreeRatio: the maximum percentage of class metadata capacity free after a garbage collection, used to avoid a reduction in the amount of space
Additionally, the garbage collection process also gains some benefits from this change. The garbage collector now automatically triggers the cleaning of the dead classes once the class metadata usage reaches its maximum metaspace size.
Therefore, with this improvement, the JVM reduces the chance of getting an OutOfMemoryError.
Despite all of these improvements, we still need to monitor and tune the metaspace to avoid memory leaks.
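One lightweight way to monitor the metaspace from inside an application is the standard MemoryPoolMXBean API. The sketch below assumes the pool is reported under the name "Metaspace", as HotSpot does from Java 8 onwards; other JVMs may use a different name, in which case the method returns -1. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceUsageSketch {

    // Returns the bytes currently used by the pool named "Metaspace",
    // or -1 if this JVM does not report a pool with that name.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used: " + metaspaceUsedBytes() + " bytes");
    }
}
```

Logging this value periodically and watching for steady growth is a simple first check for class loader leaks.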
Java internally implements the flyweight pattern and maintains a shared pool of strings. Every time we create a new string from a literal, Java checks whether it already exists in the pool, in which case it returns a reference to the pooled instance.
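The sharing can be observed by comparing references with ==, as in this small sketch (the class and method names are hypothetical). Comparing strings with == is done here only to inspect identity; equals() remains the right way to compare contents.

```java
class StringPoolSketch {

    // Two identical literals resolve to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) always creates a separate object outside the pool.
    static boolean newStringIsSeparate() {
        String a = "hello";
        String b = new String("hello");
        return a != b;
    }

    // intern() returns the pooled instance that has the same contents.
    static boolean internReturnsPooled() {
        String a = "hello";
        String b = new String("hello").intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance()); // prints "true"
        System.out.println(newStringIsSeparate());   // prints "true"
        System.out.println(internReturnsPooled());   // prints "true"
    }
}
```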
If our string pool is very full, its buckets become dense and lookups become inefficient. The density of the string pool is worth monitoring, particularly in very big applications, because a dense pool can slow the application down.
The following flag will give us information about strings in our application:
This flag tells us how many buckets there are in the pool and how dense the pool is.
Remember that, unlike a regular HashMap, the pool doesn't get resized. Whatever number of buckets we start with is the number we keep for the lifetime of our application.
The following flag specifies the string pool size, expressed as the number of buckets:
For this to work optimally, the number you provide should be a prime number.
The flag for specifying the maximum heap size is:
-Xmx[size] as the shortcut form
If we want to specify the initial heap size we can use:
-Xms[size] as the shortcut form
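The effect of these two settings can be checked from inside the program with the standard Runtime API. The sketch below is illustrative and the class name is hypothetical; maxMemory() roughly corresponds to the maximum heap size and totalMemory() to the heap currently reserved by the JVM, although the exact figures depend on the collector in use.

```java
class HeapSizeSketch {

    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // Upper limit the heap may grow to (related to -Xmx).
        System.out.println("max heap:     " + toMegabytes(runtime.maxMemory()) + " MB");
        // Heap currently reserved by the JVM (starts near the -Xms value).
        System.out.println("current heap: " + toMegabytes(runtime.totalMemory()) + " MB");
        // Part of the current heap that is not in use yet.
        System.out.println("free in heap: " + toMegabytes(runtime.freeMemory()) + " MB");
    }
}
```

Running it with, for example, -Xms64m -Xmx256m and again without any flags shows how the reported values follow the settings.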
If we want to know the default values of these parameters we can use the flags combined:
One of the big differences between Java and some other programming languages that you might have used, such as C, is that when you have finished using an object in Java, you don't have to tell Java that it's no longer needed.
The virtual machine works this out automatically. In languages like C or C++, the exact opposite is the case for any object on the heap: the programmer must include code that tells the language that the object is finished with. In C, you do this by calling a function called free.
Java knows that an object is no longer needed when it cannot be reached through a reference from the stack.
There are some methods in the Java API that seem to have a bearing on the garbage collector. In particular, there's a method of the System class called gc(). Its Javadoc says that the method suggests that the Java virtual machine run the garbage collection process. So the call asks the virtual machine to run a garbage collection, but there's no guarantee that the virtual machine will actually do so.
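The sketch below, with hypothetical class and field names, uses a WeakReference to observe both points: an object that is still reachable is never collected, while an unreachable one may or may not be collected after System.gc(), since the call is only a hint. A static field is used as the strong reference so that reachability does not depend on how the JIT compiler treats local variables.

```java
import java.lang.ref.WeakReference;

class GcHintSketch {

    // A static field keeps its target strongly reachable.
    static Object keeper;

    // The object is still referenced, so it must survive any garbage collection.
    static boolean aliveWhileReferenced() {
        keeper = new byte[1024];
        WeakReference<Object> ref = new WeakReference<>(keeper);
        System.gc();
        return ref.get() != null;
    }

    // Nothing references the array strongly, so it is eligible for collection.
    // Whether it is actually gone after the hint is up to the JVM.
    static boolean clearedAfterHint() {
        WeakReference<Object> ref = new WeakReference<>(new byte[1024]);
        System.gc();
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced: " + aliveWhileReferenced());
        System.out.println("cleared after hint:     " + clearedAfterHint());
    }
}
```

The first line always reports true; the second is often true on HotSpot but is not guaranteed.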
When an object is garbage collected, Java runs its finalize() method. The method is available in all Java versions, but it has been deprecated since Java 9.
There are a number of different garbage collection algorithms that Java can use, and these are known as garbage collectors. The Java virtual machine decides which is the best based on your hardware, and we'll see how to find out which type of garbage collector your computer is using.
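One way to find out which collectors the running JVM has selected is the standard GarbageCollectorMXBean API, as in this sketch (the class name is hypothetical). The names that are printed depend on the JVM and on the collector in use, so no particular output is assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectorsSketch {

    // Returns the names of the garbage collectors the JVM is currently using.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined for this collector.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections so far");
        }
    }
}
```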
By default, the JVM can resize the heap in order to perform fewer garbage collections. There is a runtime flag that turns off this dynamic resizing of the heap.
Lots of objects are going to live for a short while and then find themselves without references on the stack, at which point they're eligible for garbage collection. This is the special scenario I wanted to set up, because it doesn't match the default Java assumption: our objects survive for a short while, rather than most objects not surviving any time at all.
This is not necessarily the most realistic scenario, but it means that when we start tuning the garbage collection to how the objects in our application behave, we should see some real impact. One thing we'll certainly want to do is minimize the number of full garbage collections.
One way of trying to achieve that is by using three different runtime tuning flags.
Firstly, we can resize the different parts of the heap. We could allocate more of our overall memory to the young generation and less to the old generation. Doing that means that garbage collections of the young generation happen less frequently, so more objects will have become garbage by the time a collection runs and fewer of them will make it through to the old generation.
And we can also alter the thresholds for how many generations of garbage collection an object needs to survive before it gets moved to the old generation.
We can use the following flag to specify how many times bigger the old generation should be compared to the young generation:
The second flag specifies how much of the young generation should be taken up by the survivor spaces zero and one; the rest will be the Eden space.
The third flag specifies how many garbage collections an object must survive before it becomes part of the old generation. 15/16 is the maximum value for this flag.
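The arithmetic behind the first two settings can be sketched as follows. The flag names are missing from the text above, so this example assumes they are HotSpot's -XX:NewRatio (old generation size divided by young generation size) and -XX:SurvivorRatio (Eden size divided by the size of one survivor space). The class and method names are hypothetical, and a real JVM aligns and may adapt these sizes, so the results are approximations.

```java
class GenerationSizingSketch {

    // Assumed meaning of NewRatio: old = newRatio * young, so young = heap / (newRatio + 1).
    static long youngSize(long heapSize, int newRatio) {
        return heapSize / (newRatio + 1);
    }

    static long oldSize(long heapSize, int newRatio) {
        return heapSize - youngSize(heapSize, newRatio);
    }

    // Assumed meaning of SurvivorRatio: eden = survivorRatio * survivor, and there are
    // two survivor spaces, so survivor = young / (survivorRatio + 2).
    static long survivorSize(long youngSize, int survivorRatio) {
        return youngSize / (survivorRatio + 2);
    }

    static long edenSize(long youngSize, int survivorRatio) {
        return youngSize - 2 * survivorSize(youngSize, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 600; // sizes are in megabytes to keep the numbers readable
        long youngMb = youngSize(heapMb, 2);
        System.out.println("young = " + youngMb + " MB, old = " + oldSize(heapMb, 2) + " MB");
        System.out.println("eden = " + edenSize(youngMb, 8) + " MB, each survivor = "
                + survivorSize(youngMb, 8) + " MB");
    }
}
```

With a 600 MB heap, a ratio of 2 gives a 200 MB young generation and a 400 MB old generation; a survivor ratio of 8 then splits the young generation into 160 MB of Eden and two 20 MB survivor spaces.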
There are three types of garbage collector we can use in our applications:
Mostly concurrent
(J)VisualVM is actually an Oracle project, although Oracle has released it under the GNU General Public License version 2 with the Classpath Exception. So it is absolutely free for you to download and use.
You might notice if you read the download page that what you're downloading is the Oracle JDK 8 version of Java VisualVM. That's because from version 9 onwards they've moved this into something called the Graal VisualVM.
With these tools we can monitor our application memory and identify memory issues.
In order to generate a heap dump from the JVM we use:
If we want to analyze the heap dump file, we need a memory analyzer; IntelliJ and Eclipse have tools to do this.
[1] "Java Application Performance and Memory Management", Udemy. [Online]. Available: [Accessed: 26-Sep-2021]
[2] S. Oaks, Java performance: in-depth advice for tuning and programming Java 8, 11, and beyond, Second edition. Beijing [China]; North Sebastopol, CA: O’Reilly, 2020.
[3] B. J. Evans, J. Gough, and C. Newland, Optimizing Java: practical techniques for improving JVM application performance, First edition. Sebastopol, California: O’Reilly Media, 2018.