Introduction to Java Bytecode
This is an introduction about Java Bytecode using examples.
1. Introduction
Java is an extremely popular generic, object-oriented programming language. It is based on the “Write once, run anywhere (WORA)” principles.
Java is architecture-neutral and portable. Java source code once compiled can be run on any operating system and any hardware. Example: the Java source code written on a 64-bit Windows machine, once compiled can be run on a 32-bit Ubuntu machine without making any changes or without needing recompilation. This portability is possible because of the Java Bytecode.
2. Meaning of Java Bytecode
Java bytecode, simply put, is the representation of Java source code that the Java virtual machine (JVM) can interpret and run.Java Bytecode is generated after a java program is compiled.
2.1 Who creates java bytecode?
During compilation of a Java program, the compiler(javac) converts the source code that is your “.java” file into an intermediate low-level code that is in the binary format. This intermediate low-level, binary code format is the Java bytecode and gets saved as a .class file.
3. Advantages of Java Bytecode
Java developers do not need to understand the java bytecode to write code. However, according to the IBM developer Works journal,
Understanding bytecode and what bytecode is likely to be generated by a Java compiler helps the Java programmer in the same way that knowledge of assembly helps the C or C++ programmer.
Peter Haggar, IBM
Knowing how the java compiler converts your code from source to bytecode will help you understand how your program will perform in terms of speed of execution. This is crucial when doing performance tuning.
One other advantage of knowing bytecode would be that you would be able to decompile your .class files to their source code form. While there are many “Decompilers” (programs that convert .class to .java files) available, none of them are perfect and cannot handle all the instructions. Having knowledge of the java bytecode will help you in recreating your source code again.
4. Understanding the Bytecode
To understand what Bytecode is, we must first understand how a Java Virtual Machine works. In brief, it works as follows:
- The Java Virtual Machine is both a stack-based and register-based abstract machine.
- The java stack consists of frames. The stack makes a new frame for every method call.
- Each frame consists of a last-in-first-out(LIFO) operand stack and a local variables array.
- An instruction to the JVM consists of “opcodes” which are one-byte instructions of what operation is to be performed followed by the parameter values required.
- According to the Java Docs, ignoring exceptions this is what the JVM does.
JVM Algorithm
do { atomically calculate pc and fetch opcode at pc; if (operands) fetch operands; execute the action for the opcode; } while (there is more to do);
The instruction set i.e opcodes can be broadly classified as:
- Load and store
- Arithmetic and logic
- Type conversion
- Object creation and manipulation
- Operand stack management
- Control transfer
- Method invocation and return
Most instructions encode the type information for the operations they do as a mnemonic. For example, “iadd” would add two integers(i) whereas “dadd” would add 2 double together. The detailed description of each of the opcodes is available in the Java docs here.
Given below are all the opcodes along with their broad classification.
The opcodes for the switch case is “tableswitch” and “lookupswitch”.
5. Bytecode generators
There are many java bytecode generators in the market like Jikes, Espresso, ASM, GNU Compiler for Java. The most popular one is ASM. However, the java sdk also has an inbuilt Dis-assembler known as “javap”.
5.1 Bytecode Example
To generate Java bytecode we use javap with the -c or -v (verbose) option. Next, we will see what the generated bytecode looks like and how it flows by considering a very simple calculator code.
SimpleCalculator.java
import java.util.Scanner; public class SimpleCalculator { public static void main(String[] args) { Scanner myObj = new Scanner(System.in); int result = 0; boolean incorrect = false; System.out.println("Enter the operation(+, -, *, /).:"); String oper = myObj.nextLine(); System.out.println("Enter number1:"); int num1 = myObj.nextInt(); System.out.println("Enter number2:"); int num2 = myObj.nextInt(); switch (oper) { case "+": result = num1 + num2; break; case "-": result = num1 - num2; break; case "*": result = num1 * num2; break; case "/": if (num2 != 0) { result = num1 / num2; } else incorrect = true; System.out.println("Division not possible"); break; } if (!incorrect) { System.out.println("Result is:" + result); } myObj.close(); } }
Generated bytecode using the javap -c option
Syntax: javap -c SimpleCalculator.class
SimpleCalculator.class
Compiled from "SimpleCalculator.java" public class SimpleCalculator { public SimpleCalculator(); Code: 0: aload_0 1: invokespecial #1 // Method java/lang/Object."":()V 4: return public static void main(java.lang.String[]); Code: 0: new #2 // class java/util/Scanner 3: dup 4: getstatic #3 // Field java/lang/System.in:Ljava/io/InputStream; 7: invokespecial #4 // Method java/util/Scanner."":(Ljava/io/InputStream;)V 10: astore_1 11: iconst_0 12: istore_2 13: iconst_0 14: istore_3 15: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream; 18: ldc #6 // String Enter the operation(+, -, *, /).: 20: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 23: aload_1 24: invokevirtual #8 // Method java/util/Scanner.nextLine:()Ljava/lang/String; 27: astore 4 29: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream; 32: ldc #9 // String Enter number1: 34: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 37: aload_1 38: invokevirtual #10 // Method java/util/Scanner.nextInt:()I 41: istore 5 43: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream; 46: ldc #11 // String Enter number2: 48: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 51: aload_1 52: invokevirtual #10 // Method java/util/Scanner.nextInt:()I 55: istore 6 57: aload 4 59: astore 7 61: iconst_m1 62: istore 8 64: aload 7 66: invokevirtual #12 // Method java/lang/String.hashCode:()I 69: tableswitch { // 42 to 47 42: 140 43: 108 44: 169 45: 124 46: 169 47: 156 default: 169 } 108: aload 7 110: ldc #13 // String + 112: invokevirtual #14 // Method java/lang/String.equals:(Ljava/lang/Object;)Z 115: ifeq 169 118: iconst_0 119: istore 8 121: goto 169 124: aload 7 126: ldc #15 // String - 128: invokevirtual #14 // Method java/lang/String.equals:(Ljava/lang/Object;)Z 131: ifeq 169 134: iconst_1 135: istore 8 137: goto 169 140: aload 7 142: ldc #16 // String * 144: invokevirtual #14 // Method java/lang/String.equals:(Ljava/lang/Object;)Z 147: ifeq 169 150: iconst_2 151: istore 8 153: goto 169 156: aload 7 158: ldc #17 // String / 160: invokevirtual #14 // Method java/lang/String.equals:(Ljava/lang/Object;)Z 163: ifeq 169 166: iconst_3 167: istore 8 169: iload 8 171: tableswitch { // 0 to 3 0: 200 1: 209 2: 218 3: 227 default: 251 } 200: iload 5 202: iload 6 204: iadd 205: istore_2 206: goto 251 209: iload 5 211: iload 6 213: isub 214: istore_2 215: goto 251 218: iload 5 220: iload 6 222: imul 223: istore_2 224: goto 251 227: iload 6 229: ifeq 241 232: iload 5 234: iload 6 236: idiv 237: istore_2 238: goto 243 241: iconst_1 242: istore_3 243: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream; 246: ldc #18 // String Division not possible 248: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 251: iload_3 252: ifne 267 255: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream; 258: iload_2 259: invokedynamic #19, 0 // InvokeDynamic #0:makeConcatWithConstants:(I)Ljava/lang/String; 264: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V 267: aload_1 268: invokevirtual #20 // Method java/util/Scanner.close:()V 271: return }
The bytecode flows as follows:
- The bytecode starts with the public class and methods names.
- Lines 0 to 14: initializes and stores all the constants, variables, and arrays.
- Lines 15 to 66: initialize the user input variables, scanner objects.
- Lines 69 to 108: the switch case is set up with references to when the instructions are loaded. This is called a jump table.
- Lines 108 to 169: this loads all the variables, methods, etc past the switch case code.
- Lines 171 to 271: These lines are the switch case instructions where the add, subtract, mult, and div are loaded onto the stack. The goto belongs to the break statement which exits the control from the switch command and goes to the next line of code.
6. Disadvantages of Bytecode
- Performance: The compiler generates the java bytecode. The interpreter then interprets and runs this code. This is an overhead and makes the overall program run slower than a native programming language program.
- Even for a very small program, the entire JVM needs to be loaded into the memory.
7. Download the source code
We saw an example of a Simple Calculator for which we generated the java bytecode using the javap -c command.
You can download the full source code of this example here: Introduction to Java Bytecode