A Gcc-based Java Implementation

Per Bothner [email protected]
December 1996

Short abstract

While the portability of Java bytecodes is a major factor in Java's success, we believe it cannot become a mainstream programming language without mainstream implementation techniques, specifically an optimizing ahead-of-time compiler. Such a compiler allows much better optimization, and much faster application start-up times, than JIT translators do. Cygnus is writing a Java front-end for the GNU compiler (gcc) to translate Java bytecodes to machine code. This uses proven and widely used technology. The meta-data (such as the Class objects and lists of fields) will be laid out by the compiler in static data memory, saving more startup time. We will enhance and use the GNU linker (ld) to link compiled class files into standard shared or static libraries. For the run-time environment, we are enhancing the existing Kaffe free Java VM to make it full-strength and to support linking with pre-compiled class libraries. Kaffe is a JIT system, which means that methods that have been dynamically loaded and compiled use the same calling conventions as pre-compiled methods. We will enhance the GNU debugger (gdb) to understand Java, which will provide a familiar and multi-language debugging environment (you can use the same interface to debug Java and native methods).

Extended abstract

Java has taken off because it is a decent programming language, is buzzword-compliant (object-oriented and web-enabled), and is implemented by compiling to portable bytecodes. However, interpreting bytecodes makes Java programs many times slower than comparable C or C++ programs. One approach to improving this situation is "Just-In-Time" (JIT) compilers. These dynamically translate bytecodes to machine code just before a method is executed. This can provide substantial speed-up, but it is still slower than C or C++. There are two main problems with the JIT approach compared to conventional compilers: (1) the compilation is done every time the application is executed, which increases start-up times substantially, and (2) the JIT compiler has to run fast, and therefore cannot do any substantial optimization.

While JIT compilers have an important place in a Java system, for frequently used applications it is better to use a more traditional "ahead-of-time" or batch compiler. While Java has been primarily touted as an internet/web language, many people are interested in using Java as an alternative to traditional languages such as C++, if the performance can be made adequate. For embedded applications it makes much more sense to pre-compile the Java program, especially if the program is to be in ROM.

So Cygnus is building a Java programming environment that is based on a conventional compiler, linker, and debugger, using Java-enhanced versions of the existing GNU programming tools.

The core tool is of course the compiler. This is "cc1java," a new gcc front-end. It has a structure similar to that of the existing front-ends, and shares most of its code with them. The most unusual aspect of cc1java is that its "parser" reads *either* Java source files or Java bytecode files. (The first release will directly support only bytecodes; parsing Java source will be done by invoking Sun's javac. A future version will provide an integrated Java parser, mainly for the sake of compilation speed.) In any case, it is important that cc1java can read bytecodes, for at least three reasons: (1) it is the natural way to get declarations of external classes (in this respect a Java bytecode file is like a C++ pre-compiled header file); (2) it is needed so we can support code produced by other tools that generate Java bytecodes (such as the Kawa Scheme-to-Java-bytecode compiler); and (3) some libraries are (unfortunately) distributed as Java bytecodes without source.

To "parse" a Java bytecode file involves first parsing the meta-data in the file. Each bytecode file defines one Java class, and defines the superclass, fields, and methods of the class. We use this information to build corresponding declarations and type nodes using mostly-standard gcc "tree" nodes. This information will also be used to generate the run-time meta-information (such as the Class data structure): The compiler generates initialized static data that have the same layout as the run-time data structures used by the Java VM. Thus startup is fast, and does not require allocating any data.
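As a rough illustration of the first step, the sketch below reads just enough of a class file to recover the class name and its superclass. The file layout (magic number, version, constant pool, access flags, this_class, super_class) follows the published class file format; the names ClassHeader, read, and sample are hypothetical and are not taken from cc1java. It assumes class names are plain ASCII, handles only the constant tags defined in the original format, and stops before the field and method tables.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: ClassHeader and its members are hypothetical names,
// not part of cc1java, gcc, or Kaffe.
final class ClassHeader {
    final String thisClass;
    final String superClass; // null when super_class is 0 (java/lang/Object itself)

    private ClassHeader(String thisClass, String superClass) {
        this.thisClass = thisClass;
        this.superClass = superClass;
    }

    // Read magic, version, constant pool, access flags, this_class, super_class.
    static ClassHeader read(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile); // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Java class file");
        }
        in.getShort(); // minor version
        in.getShort(); // major version
        int count = in.getShort() & 0xFFFF;
        String[] utf8 = new String[count];      // text of Utf8 entries
        int[] classNameIndex = new int[count];  // Class entries point at a Utf8 entry
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;
            switch (tag) {
                case 1: { // Utf8: u2 length, then bytes (assumed ASCII here)
                    byte[] bytes = new byte[in.getShort() & 0xFFFF];
                    in.get(bytes);
                    utf8[i] = new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 7: // Class: index of the Utf8 entry holding its name
                    classNameIndex[i] = in.getShort() & 0xFFFF;
                    break;
                case 8: // String
                    in.position(in.position() + 2);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12:
                    in.position(in.position() + 4);
                    break;
                case 5: case 6: // Long and Double occupy two pool slots
                    in.position(in.position() + 8);
                    i++;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported constant tag " + tag);
            }
        }
        in.getShort(); // access flags
        int thisIndex = in.getShort() & 0xFFFF;
        int superIndex = in.getShort() & 0xFFFF;
        return new ClassHeader(utf8[classNameIndex[thisIndex]],
                superIndex == 0 ? null : utf8[classNameIndex[superIndex]]);
    }

    // Hand-built header for a class "Foo" extending java/lang/Object (demo data).
    static byte[] sample() {
        byte[] foo = "Foo".getBytes(StandardCharsets.US_ASCII);
        byte[] object = "java/lang/Object".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer out = ByteBuffer.allocate(64);
        out.putInt(0xCAFEBABE).putShort((short) 3).putShort((short) 45);
        out.putShort((short) 5);                                        // constant_pool_count
        out.put((byte) 1).putShort((short) foo.length).put(foo);        // #1 Utf8 "Foo"
        out.put((byte) 7).putShort((short) 1);                          // #2 Class -> #1
        out.put((byte) 1).putShort((short) object.length).put(object);  // #3 Utf8
        out.put((byte) 7).putShort((short) 3);                          // #4 Class -> #3
        out.putShort((short) 0x0021).putShort((short) 2).putShort((short) 4);
        return out.array();
    }

    public static void main(String[] args) {
        ClassHeader h = read(sample());
        System.out.println(h.thisClass + " extends " + h.superClass);
    }
}
```

A real front-end would continue past this point to read the interface, field, and method tables, and would build declaration nodes from them rather than strings.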

The executable content of a bytecode file contains a vector of bytecode instructions for each (non-native) method. Code generation means converting the stack-oriented bytecodes into gcc expression nodes. The first problem is that we must know for each instruction the types of each operand (stack and local variable slots) in the Java virtual machine state. This is done with a process very similar to a Java bytecode verifier. Transforming postfix stack operations to expression nodes involves a compile-time stack of expression nodes. When necessary, we also map stack slots and local variables into gcc pseudo-registers.
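The compile-time stack idea can be sketched in a few lines. This is a toy model under stated assumptions, not the cc1java code: the Node class stands in for gcc expression nodes, instructions are given in a hypothetical textual form ("iload 1", "iconst 3", "iadd", "imul"), and only integer operands are handled, so the type-inference step is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model: StackToTree and Node are hypothetical, not gcc data structures.
final class StackToTree {
    // An expression node: a leaf (local variable or constant) or a binary operation.
    static final class Node {
        final String op;        // "local", "const", "+", or "*"
        final int value;        // slot number or constant value (leaves only)
        final Node left, right; // operands (binary operations only)

        Node(String op, int value, Node left, Node right) {
            this.op = op;
            this.value = value;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            switch (op) {
                case "local": return "v" + value;
                case "const": return Integer.toString(value);
                default: return "(" + left + " " + op + " " + right + ")";
            }
        }
    }

    // Simulate the operand stack at compile time: loads push leaf nodes,
    // arithmetic pops two nodes and pushes a node that combines them.
    static Node translate(String[] code) {
        Deque<Node> stack = new ArrayDeque<>();
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload":
                    stack.push(new Node("local", Integer.parseInt(parts[1]), null, null));
                    break;
                case "iconst":
                    stack.push(new Node("const", Integer.parseInt(parts[1]), null, null));
                    break;
                case "iadd":
                case "imul": {
                    Node right = stack.pop(); // top of stack is the right operand
                    Node left = stack.pop();
                    stack.push(new Node(parts[0].equals("iadd") ? "+" : "*", 0, left, right));
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalStateException("expected exactly one result on the stack");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        String[] code = {"iload 1", "iload 2", "iconst 3", "imul", "iadd"};
        System.out.println(translate(code)); // prints (v1 + (v2 * 3))
    }
}
```

The point of the sketch is that no stack exists at run time for this expression: the postfix sequence collapses into a single tree that an ordinary optimizer can work on.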

Generating machine code from the expression nodes uses existing code (instruction generator, optimizer, and assembler).

Linking a set of compiled Java binaries into a library or executable will use the standard linker (GNU ld). However, some enhancements are necessary or at least desirable. The linker must provide a way to build a table mapping class names to Class objects. This can be done using the same mechanism used for running C++ static initializers. Linker help is also desirable to combine multiple copies of the same literal.
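One way to picture the class-table mechanism is as a list of registration entries, one contributed by each compiled class, that startup code walks once. The sketch below models that in Java purely for illustration; ClassTableSketch, contribute, and runInitializers are hypothetical names, and the real mechanism would live in the linker and the startup code rather than in Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; nothing here is part of gcc, ld, or Kaffe.
final class ClassTableSketch {
    // Stand-in for the list of initializer entries that the linker would
    // concatenate from all object files (like the C++ constructor list).
    static final List<Runnable> INITIALIZERS = new ArrayList<>();

    // The run-time table mapping class names to Class objects.
    static final Map<String, Class<?>> CLASS_TABLE = new HashMap<>();

    // What each compiled class would contribute: one entry that registers it.
    static void contribute(Class<?> c) {
        INITIALIZERS.add(() -> CLASS_TABLE.put(c.getName(), c));
    }

    // Startup code: run every collected entry once, then discard the list.
    static void runInitializers() {
        for (Runnable r : INITIALIZERS) {
            r.run();
        }
        INITIALIZERS.clear();
    }

    public static void main(String[] args) {
        contribute(String.class);
        contribute(Integer.class);
        runInitializers();
        System.out.println(CLASS_TABLE.get("java.lang.String")); // prints class java.lang.String
    }
}
```

Nothing is registered until the startup pass runs, which mirrors how the table only becomes usable once all initializers collected at link time have executed.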

Running a compiled Java program will need a suitable Java run-time environment. This contains support for threads, garbage collection, and all the primitive Java methods. Complete Java support also means being able to dynamically load new bytecode classes. Hence the appropriate Java environment is basically a Java Virtual Machine. We are using the Kaffe free Java VM (written by Tim Wilkinson), but enhancing and modifying it to be more suitable for pre-compiled code. (For example, we are simplifying the data structures.) Kaffe includes a JIT compiler, which solves the problem of calling between pre-compiled and dynamically loaded methods (since both use the same calling convention).

We plan to enhance gdb (the GNU debugger) so it can understand Java-compiled code. This may involve accessing Java meta-data from the Java executable. We may also enhance gdb to understand dynamically-loaded bytecodes, but the need for that is reduced if we instead provide a hook so gdb knows about JIT-compiled code.