by Jordan Olin
The .class file is the fundamental unit of measure for a Java application with respect to the Java Virtual Machine (JVM). It represents a contract of sorts between a compiler and an implementation of the JVM. I mention "a compiler" versus "a Java compiler" because, as you will see, any language compiler could potentially generate .class files and Java bytecodes.
Physically, the .class file is an ordered stream of bytes containing variable-length structures and arrays that describe the compiled version (or runtime image) of an executable unit, called a class. Most of the components that make up the class file have a fixed-length portion followed by a set of variable-length structures. Some pieces are mandatory, and others are optional. The important thing to keep in mind is that a process that generates a .class file must follow the format described in this chapter exactly. Otherwise, the JVM's class loader and verifier will reject the .class file.
Keeping the .class file in a byte-stream-oriented format is critical for loading and parsing its information quickly. Implementers of class loaders can take advantage of Java's stream I/O classes, for example, to read a .class file piece by piece and parse it as it is read, or to read the whole file into a byte array and parse it manually.
The key concept is that you must read each section, in order, until its information is exhausted, and you can't read a section of the file without first reading its descriptive information. For example, one portion of the file is called the Constant Pool. The first item you read is the number of elements that follow. Then, for each element, you read a one-byte tag that tells you the format of that element. Finally, you read the actual element based on its specific format.
The file itself can be broken up into logical sections:
NOTE: The information described in this chapter was gleaned from two sources: the online reference materials at the JavaSoft Web site (www.javasoft.com) and the Sun Web site (www.sun.com), and sessions at JavaOne, Sun's Worldwide Java Developer Conference, held in San Francisco in May 1996.
To fully understand the contents of the .class file, you first need to understand some common structures that are used by its various sections. This section examines the Constant Pool, the format of a signature or type definition, and attributes.
The Constant Pool might be a new concept to you, but constant pools have been used since the early days of compilers and runtime systems. A Constant Pool contains each distinct literal value encountered while the source code for a class is being compiled. A literal value in this case might be an actual numeric value, a string literal, a class name, a type description, or a method signature.
Each time one of these literal values is encountered, the Constant Pool is searched for a matching value in order to avoid putting duplicate values into the pool. If the value was found, then its existing location in the Constant Pool is inserted into the class definition or compiled bytecode stream. If the value was not found in the Constant Pool, then it is added. At load time, the Constant Pool is placed into an array-like structure in memory for quick access. Then, as the rest of the class is loaded, and at runtime whenever a literal is needed, its value is located in the Constant Pool by its index and retrieved.
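The search-then-insert step described above can be sketched with a hash map from each literal to its index. This is a minimal illustration under simplifying assumptions, not how javac or any particular compiler is implemented: the class name ConstantPoolBuilder is hypothetical, and only string literals (Utf8 entries) are modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical compile-time helper: stores each distinct Utf8 literal once
// and hands back its Constant Pool index every time it is encountered.
class ConstantPoolBuilder {
    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    // Returns the existing index for a value, or appends it and returns the new index.
    int intern(String value) {
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing;              // already in the pool: reuse its location
        }
        entries.add(value);
        int index = entries.size();       // indexes start at 1; entry 0 is reserved
        indexByValue.put(value, index);
        return index;
    }

    // The count written to the .class file: number of entries plus one.
    int countField() {
        return entries.size() + 1;
    }
}
```

Interning "java/lang/Object" twice returns the same index both times, which is the duplicate suppression the Constant Pool relies on.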
Using a Constant Pool keeps the compiled class smaller, so it loads faster. At runtime, the Sun implementation of the JVM has a mechanism that resolves a Constant Pool reference only the first time a distinct value is needed. After that, the resolved value may be referenced directly in a special array associated with the Constant Pool. This mechanism is supported by a special set of internal bytecodes called the quick instructions. Because they are strictly implementation-dependent, they are not part of the formal definition of the Java bytecodes.
The Constant Pool is recorded in the .class file in a very compact format. It begins with a 16-bit unsigned integer that is the count of elements that follow plus one. (The extra count is for the zeroth element, which is used only at runtime and is not stored in the class file.) The count is followed by a variable-length array in which each element is a variable-length structure, with no padding between elements. One further detail from the JVM specification: each Long or Double entry occupies two consecutive indexes, so a reader must skip the index that follows one of these entries.
Twelve different types of values and associated structures may be stored in the Constant Pool. Each structure begins with a single byte-sized integer value called a tag (see Table 47.1). The tag is used to determine the format of the bytes that follow which make up the remainder of this element's structure.
Tag | Meaning | Note |
1 | Utf8 string | |
2 | Unicode string | Not used at this point. |
3 | Integer value | |
4 | Float value | |
5 | Long value | |
6 | Double value | |
7 | Class reference | Only refers to class name. |
8 | String | |
9 | Field reference | Only used in bytecode stream. |
10 | Method reference | Only used in bytecode stream. |
11 | Interface Method | Only used in bytecode stream. |
12 | Name and Type reference | |
Now that you know the tag values, let's look at each Constant Pool element type.
All tags are one byte long, and all lengths and indexes are 16-bit unsigned integer values, unless otherwise noted.
To describe the data types of fields, variables, and arguments, and the signatures of methods, in a consistent way, the .class file uses a very abbreviated notation. Each native type known by the JVM is represented by a single-character shortcut for its full name, and class and array types are marked with special characters. Each type and signature shortcut is kept in a Utf8 string in the Constant Pool. For the type of a field or variable, the string is a single type description; for a method signature, it is a series of type descriptions with the arguments first (in order, surrounded by parentheses), followed by the shortcut for the method's result type. Table 47.10 shows each abbreviated type name followed by its real data type.
Attributes are the mechanism that the designers of the .class file structure
created to allow additional descriptive information about the class to be included
in the file without changing its semantics. Attributes are dynamically structured
modifiers that contain both mandatory and optional properties affecting the class,
its fields, and its methods. For example, information on local variables, arguments,
and the compiled bytecode for a method are contained in a mandatory attribute called
the Code attribute. Attributes can also extend the information in a .class file:
for example, Microsoft's JVM implementation supports interoperability with COM
objects by adding new attributes to the .class file. A class loader and JVM
implementation need to recognize only the mandatory attributes and may ignore
the rest. That way, a class compiled for one VM may still be read (and possibly
executed) by another VM.
Tag 1: Utf8 String
Field | Number of Bytes | Value |
Tag | 1 | 1 |
Size | 2 | Length in bytes of the Utf8 string. |
Data | (Size) | The actual Utf8 string. |
Tag 2: Unicode String
Field | Number of Bytes | Value |
Tag | 1 | 2 |
Size | 2 | Number of characters in the Unicode string. |
Data | (Size * 2) | The actual Unicode string. |
Tags 3 and 4: Integer and Float Values
Field | Number of Bytes | Value |
Tag | 1 | 3 for Integer; 4 for Float. |
Data | 4 | Actual integer or float value in big-endian (MSB first) order. |
Tags 5 and 6: Long and Double Values
Field | Number of Bytes | Value |
Tag | 1 | 5 for Long; 6 for Double. |
Data | 8 | Actual long or double value in big-endian (MSB first) order. |
Tag 7: Class Reference
Field | Number of Bytes | Value |
Tag | 1 | 7 |
Index | 2 | Location of a Utf8 string in the Constant Pool containing the fully qualified class name. |
Tag 8: String Reference
Field | Number of Bytes | Value |
Tag | 1 | 8 |
Index | 2 | Location of a Utf8 string in the Constant Pool containing the actual string value. |
Tags 9, 10, and 11: Field, Method, and Interface Method Reference
Field | Number of Bytes | Value |
Tag | 1 | 9 for Field; 10 for Method; 11 for Interface Method reference. |
Class Index | 2 | Location of a Class reference in the Constant Pool for the class or interface that declares the field or method. |
Name/Type | 2 | Location of a Name and Type reference in the Constant Pool describing the field or method. |
Tag 12: Name and Type Reference
Field | Number of Bytes | Value |
Tag | 1 | 12 |
Name Index | 2 | Location of a Utf8 string in the Constant Pool containing the name of a field, variable, argument, or method. |
Description Index | 2 | Location of a Utf8 string in the Constant Pool containing the name's type or signature. |
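With the count and all the tag formats in hand, a reader can walk the Constant Pool entry by entry. The following is a sketch, not a complete loader; the class name ConstantPoolReader and the choice to store each entry as a String, a boxed number, or an int[] holding the tag and its indexes are assumptions. It relies on java.io.DataInputStream, whose readUnsignedShort, readInt, readFloat, readLong, and readDouble methods read big-endian values, and whose readUTF method expects the same 2-byte length prefix used by Utf8 entries. Following the JVM specification, Long and Double entries each take two indexes; because tag 2 is not used, it is rejected along with any unknown tag.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of a Constant Pool reader. Each slot holds a String (Utf8), a boxed
// Integer/Float/Long/Double, or an int[] of {tag, index...} for reference entries.
class ConstantPoolReader {
    static Object[] read(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();   // number of entries plus one
        Object[] pool = new Object[count];    // slot 0 is never filled from the file
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  pool[i] = in.readUTF(); break;          // 2-byte length + Utf8 bytes
                case 3:  pool[i] = in.readInt(); break;
                case 4:  pool[i] = in.readFloat(); break;
                case 5:  pool[i] = in.readLong(); i++; break;    // occupies two slots
                case 6:  pool[i] = in.readDouble(); i++; break;  // occupies two slots
                case 7:   // Class reference: one index
                case 8:   // String reference: one index
                    pool[i] = new int[] { tag, in.readUnsignedShort() };
                    break;
                case 9: case 10: case 11: case 12:               // two 2-byte indexes
                    pool[i] = new int[] { tag, in.readUnsignedShort(), in.readUnsignedShort() };
                    break;
                default:
                    throw new IOException("Unsupported constant pool tag " + tag + " at index " + i);
            }
        }
        return pool;
    }
}
```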
Type Information
To see how these abbreviations are used, take a look at Listing 47.1. It defines
a simple Java class and, for each variable and method, shows its shorthand
version in a comment.
Abbreviation | Java Type | Notes |
B | byte | |
C | char | |
D | double | |
F | float | |
I | int | |
J | long | |
S | short | |
Z | boolean | |
V | void | Only used for methods. |
L<classname>; | class | The capital letter L followed by a fully qualified class name terminated by a semicolon. Note that forward slashes, not periods, are used to delimit the package name tokens in the class name. |
[ | Array dimension | An open-bracket is used to denote each dimension of an array. |
Listing 47.1
class foo {
    // TYPE FIELD NAME                      SHORT-HAND VERSION
    int simpleInt;                          // I
    boolean simpleBool;                     // Z
    float[] floatArray;                     // [F
    char[][] twoDimCharArray;               // [[C
    String[][][] threeDimStringArray;       // [[[Ljava/lang/String;
                                            // Note the use of slashes here
    void DoSomething( long arg1, double[][] arg2 ) { }
    // (J[[D)V
    // Two arguments, a long and a two-dimensional double array, returning nothing.
    java.net.Socket OpenSocket( String hostname, int port ) {
        return null;    // placeholder body so the listing compiles
    }
    // (Ljava/lang/String;I)Ljava/net/Socket;
    // Two arguments, a String object and an integer, returning a Socket object.
    void NoArgsNoResult( ) { }
    // ()V
    // No arguments, returning nothing.
}
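Because every type description is a single character, an L...; class name, or a run of [ characters in front of one of those, a method signature can be split with one left-to-right scan. The sketch below is illustrative only; SignatureParser and its methods are hypothetical names, and malformed input simply raises an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits a method signature such as
// "(Ljava/lang/String;I)Ljava/net/Socket;" into individual type descriptions.
class SignatureParser {
    // Returns the end index (exclusive) of the type description starting at pos.
    static int endOfType(String sig, int pos) {
        while (sig.charAt(pos) == '[') {
            pos++;                                // one '[' per array dimension
        }
        if (sig.charAt(pos) == 'L') {
            int semi = sig.indexOf(';', pos);     // class names end with ';'
            if (semi < 0) {
                throw new IllegalArgumentException("Unterminated class name in " + sig);
            }
            return semi + 1;
        }
        return pos + 1;                           // single-character native type
    }

    // Returns the argument types in order, followed by the result type as the last element.
    static List<String> split(String sig) {
        if (sig.isEmpty() || sig.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a method signature: " + sig);
        }
        List<String> parts = new ArrayList<>();
        int pos = 1;
        while (sig.charAt(pos) != ')') {
            int end = endOfType(sig, pos);
            parts.add(sig.substring(pos, end));
            pos = end;
        }
        parts.add(sig.substring(pos + 1));        // everything after ')' is the result type
        return parts;
    }
}
```

For example, split("(Ljava/lang/String;I)Ljava/net/Socket;") returns [Ljava/lang/String;, I, Ljava/net/Socket;], matching the OpenSocket comment in Listing 47.1.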
Attributes
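CAUTION: Because a class loader may ignore attributes it does not recognize, a .class file
that depends on vendor-specific attributes also depends on that vendor's VM. If you created
a .class file that depended on a VM that supported COM objects, for example, it would not
run with the Sun JVM 1.0.2.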
Table 47.11 gives a brief description of the attributes that are recognized by Sun's JVM Version 1.0.2.
Attribute Name | Mandatory | Level | Purpose |
SourceFile | No | Class | Names the Java source file for this .class file. |
ConstantValue | Yes | Field | Holds the value of an initializer for a native typed field. |
Exceptions | Yes | Method | Defines the exceptions that are thrown by this method. |
Code | Yes | Method | Defines the physical structure and bytecodes for a method. |
LineNumberTable | No | Code | Contains Program Counter to Line Number table for use in debugging. |
LocalVariableTable | No | Code | Contains local variable descriptive information for use in debugging. |
When .class file elements use attributes, they are kept in a table and are preceded by an unsigned 16-bit integer count field holding the number of attributes that immediately follow. The attributes physically are named variable-length structures that are similar in some respects to the entries in the Constant Pool described earlier in this section. Each attribute begins with a fixed-length portion and is followed by a variable number of fields. Attributes may also be nested in order to allow for extensions to the information that they contain.
All attribute definitions have the same first two fields, as shown in Table 47.12.
Field | Number of Bytes | Value |
Name Index | 2 | Location of a Utf8 string in the Constant Pool containing the literal name of this attribute, as defined in Table 47.11. |
Length | 4 | An unsigned integer containing the number of bytes of data that follow, excluding the six bytes that make up the fixed portion (Name Index and Length). |
Data | (Length) | The actual variable length structure associated with this specific attribute definition. |
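Because every attribute begins with the same Name Index and Length fields, a reader can look up an attribute's name and, if it does not care about that attribute, consume exactly Length bytes and move on. That is how a class loader can ignore optional or vendor-specific attributes. In this sketch, AttributeReader is a hypothetical name, and the Constant Pool is assumed to be already loaded into an Object[] whose Utf8 entries are Strings.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch: reads an attribute table, keeping the raw data of attributes whose
// names are in 'known' and skipping all others by their Length field.
class AttributeReader {
    static Map<String, byte[]> readTable(DataInputStream in, Object[] pool, Set<String> known)
            throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        int count = in.readUnsignedShort();                       // attributes that follow
        for (int i = 0; i < count; i++) {
            String name = (String) pool[in.readUnsignedShort()];  // Name Index -> Utf8 name
            int length = in.readInt();     // unsigned in the file; assumed to fit in an int here
            byte[] data = new byte[length];
            in.readFully(data);            // always consume the data, known or not
            if (known.contains(name)) {
                result.put(name, data);
            }
        }
        return result;
    }
}
```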
NOTE: I describe each attribute's meaning and structure in context with its actual position in the .class file. In those discussions, it is assumed that each attribute begins with the Name Index and Length fields described in Table 47.12.
Now that I have defined the dynamic elements that are used in the .class file, you can finally discover its real structure. Table 47.13 shows the first level of description for the fields in the .class file.
Field | Number of Bytes | Value |
Magic Number | 4 | This value acts as a signature and is used to help ensure the validity of the class file. As of this writing, it must be the 32-bit value 0xCAFEBABE. |
Minor Version | 2 | Minor version number used by the compiler that generated this .class file. This integer value is currently 3 in the JDK 1.0.2 javac compiler. |
Major Version | 2 | Major version number used by the compiler that generated this .class file. This integer value is currently 45 in the JDK 1.0.2 javac compiler. |
Constant Pool Size | 2 | Number of entries in the following Constant Pool plus one. That is, this value represents the actual number of entries in the runtime version of the Constant Pool, which includes the zeroth entry. That entry is not stored in the .class file. |
Constant Pool | Varies | The actual Constant Pool entries as described in the earlier section "The Constant Pool." |
Class Flags | 2 | A series of bit flags (defined in the following section) that specify the access permissions for this class or interface definition. |
Class Name | 2 | Index to a Class reference in the Constant Pool representing the fully qualified name of this class. |
Superclass Name | 2 | Index to a Class reference in the Constant Pool representing the fully qualified name for the ancestor class to this one. If this value is zero, then Class Name must refer to java.lang.Object (the only class without a direct ancestor). |
No. of Interfaces | 2 | The count of interfaces implemented by this class. |
Interface List | (the number * 2) | An array of Constant Pool indexes pointing to Class reference entries that name the interfaces that this class implements. This array must be in the same order as the implements clause encountered when this class was compiled. |
No. of Fields | 2 | The count of fields (static and instance) that are defined in this class. |
Field Table | Varies | An array of field information structures as defined in the following section. |
No. of Methods | 2 | The count of methods (static and instance) that are defined in this class. |
Method Table | Varies | An array of method information structures as defined in the following section. |
No. of Attributes | 2 | The count of attributes that are defined for this class. |
Attribute Table | Varies | The table of attributes included in this .class file. The only attribute recognized at this level by the Sun JVM 1.0.2 is the SourceFile attribute defined previously. |
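The first three fields have fixed sizes, so validating them takes only a few reads. This sketch uses java.io.DataInputStream, which reads big-endian values as the .class file requires; the class name ClassFileHeader and the decision to report a bad magic number with an IOException are assumptions.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: reads and validates the fixed-size fields at the start of a .class file.
class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;

    ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    static ClassFileHeader read(DataInputStream in) throws IOException {
        int magic = in.readInt();                    // 4 bytes, big-endian
        if (magic != MAGIC) {
            throw new IOException(String.format("Bad magic number 0x%08X", magic));
        }
        int minor = in.readUnsignedShort();          // the minor version comes first
        int major = in.readUnsignedShort();
        return new ClassFileHeader(minor, major);
    }
}
```

A class compiled by the JDK 1.0.2 javac compiler would be expected to report minor version 3 and major version 45.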
The Class Flags field is a 16-bit unsigned integer that is used to represent
a set of Boolean values that define the structure and access permissions for this
.class file (see Table 47.14). They are predominantly used by the Verification Pass
of the JVM to denote whether this is a class or interface, and modifiers with respect
to class visibility and extension.
Bit Position (LSb = 1) | Logical Name | Applies to Class | Applies to Interface | Meaning When Set |
1 | PUBLIC | Yes | Yes | The class is accessible to other classes outside this package. |
5 | FINAL | Yes | No | This class may not be subclassed. |
6 | SUPER | Yes | Yes | Calls to methods in the superclass are specially cased. |
10 | INTERFACE | No | Yes | This class represents an interface definition. |
11 | ABSTRACT | Yes | Yes | This class or interface is abstract and has methods that must be coded in a subclass or interface implementation. |
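Because the table numbers bits from 1 at the least significant bit, bit position n corresponds to the mask 1 << (n - 1), giving 0x0001 for PUBLIC, 0x0010 for FINAL, 0x0020 for SUPER, 0x0200 for INTERFACE, and 0x0400 for ABSTRACT. A small decoder might look like this; ClassFlags is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: converts the Class Flags field into the logical names of Table 47.14.
class ClassFlags {
    private static final String[] NAMES = { "PUBLIC", "FINAL", "SUPER", "INTERFACE", "ABSTRACT" };
    private static final int[] BIT_POSITIONS = { 1, 5, 6, 10, 11 };   // LSb = 1

    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            int mask = 1 << (BIT_POSITIONS[i] - 1);   // bit position -> bit mask
            if ((flags & mask) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }
}
```

For example, decode(0x0021) returns [PUBLIC, SUPER].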
The Field Information structure is a second-level set of information used to describe the name, type, and access permissions associated with a field of this class (see Table 47.15). The fields may be instance or static (class variables) and may represent native types, specific object references, or arrays of either one. The JVM uses this information to allocate the appropriate amount of memory for the class definition and for each instance's data.
Field | Number of Bytes | Value |
Field Flags | 2 | A series of bit flags that define the access permissions for this field. |
Field Name | 2 | Index to a Utf8 string in the Constant Pool representing the name of this field. |
Type | 2 | Index to a Utf8 string in the Constant Pool representing the type definition in the format described in the "Type Information" section. |
No. of Attributes | 2 | The count of attributes that are defined for this field. |
Attribute Table | Varies | The table of attributes associated with this field. The only attribute recognized at this level by the Sun JVM 1.0.2 is the ConstantValue attribute defined previously. |
The bits of the Field Flags field are defined as follows:
Bit Pos. (LSb = 1) | Logical Name | Applies to Class | Applies to Interface | Meaning When Set |
1 | PUBLIC | Yes | Yes | The field is accessible from other classes outside this package. |
2 | PRIVATE | Yes | No | The field is only accessible from this class. No subclasses or classes outside this package may access it. |
3 | PROTECTED | Yes | No | The field is only accessible from this class and its subclasses. |
4 | STATIC | Yes | Yes | The field is considered a class level field, and only has one occurrence in memory that is shared by all instances of this class. |
5 | FINAL | Yes | Yes | This field is only present in this class definition and may not be overridden or have a value assigned into it after it is initialized. |
7 | VOLATILE | Yes | No | Denotes that this field's value is not guaranteed to be consistent between accesses, so the compiler will not generate optimized code with respect to this field. |
8 | TRANSIENT | Yes | No | This field's value is only valid while an instance of the class is in memory at runtime. Its value, if written to, or read from persistent storage, is ignored. |
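Each entry in the Field Table is the fixed portion shown in Table 47.15 followed by an attribute table, so it can be read with a few calls. This is a sketch: FieldInfo is a hypothetical name, the Constant Pool is assumed to be loaded into an Object[] with Utf8 entries stored as Strings, and every field attribute (including ConstantValue) is skipped to keep it short. The STATIC test uses bit position 4 from the table above, that is, mask 0x0008.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: reads one field information structure (Table 47.15).
class FieldInfo {
    final int flags;
    final String name;
    final String type;        // type description such as "I" or "[Ljava/lang/String;"

    FieldInfo(int flags, String name, String type) {
        this.flags = flags;
        this.name = name;
        this.type = type;
    }

    boolean isStatic() {
        return (flags & 0x0008) != 0;          // STATIC is bit position 4
    }

    static FieldInfo read(DataInputStream in, Object[] pool) throws IOException {
        int flags = in.readUnsignedShort();
        String name = (String) pool[in.readUnsignedShort()];
        String type = (String) pool[in.readUnsignedShort()];
        int attributeCount = in.readUnsignedShort();
        for (int i = 0; i < attributeCount; i++) {
            in.readUnsignedShort();             // attribute Name Index (ignored here)
            int length = in.readInt();          // attribute Length
            in.readFully(new byte[length]);     // skip the attribute's data
        }
        return new FieldInfo(flags, name, type);
    }
}
```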
This mandatory attribute is found in the field information structure of the .class
file and is used to hold the values that were used to initialize the native typed
(non-object) fields in a class when they were defined (see Table 47.17).
Field | Number of Bytes | Value |
Value | 2 | Location in the Constant Pool of either an Integer constant, a Long constant, a Float constant, or a Double constant. |
The kind of Constant Pool entry referenced depends on the field's type:
Constant Pool Type | Holds Values For |
Integer constant | boolean, byte, char, int, and short initializers |
Long constant | long initializers |
Float constant | float initializers |
Double constant | double initializers |
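The table above amounts to a lookup from a field's type description to the Constant Pool tag its ConstantValue attribute should point at. A check along these lines, with a hypothetical class name, might be part of a verifier; it covers only the native types listed in the table.

```java
// Sketch: maps a field's type description to the Constant Pool tag that its
// ConstantValue attribute should reference, following the table above.
class ConstantValueCheck {
    static int expectedTag(String fieldType) {
        switch (fieldType) {
            case "Z": case "B": case "C": case "S": case "I":
                return 3;                     // Integer constant
            case "J":
                return 5;                     // Long constant
            case "F":
                return 4;                     // Float constant
            case "D":
                return 6;                     // Double constant
            default:
                throw new IllegalArgumentException("No ConstantValue for type " + fieldType);
        }
    }
}
```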
The Method Information structure is a second-level set of information
that is used to describe the name, signature, and access permissions for a method
in this class (see Table 47.18). Methods may be instance-oriented (only callable
from an instance of this class), or they may be static methods (callable
whether an instance of this class is present or not). The JVM uses the information
in these structures, along with the attributes for this method, to create the internal
method table for instances of this class or interface to use.
Field | Number of Bytes | Value |
Method Flags | 2 | A series of bit flags that define the access permissions for this method. |
Method Name | 2 | Index to a Utf8 string in the Constant Pool representing the name of this method. |
Signature | 2 | Index to a Utf8 string in the Constant Pool representing this method's signature definition in the format described in the "Type Information" section. |
No. of Attributes | 2 | The count of attributes that are defined for this method. |
Attribute Table | Varies | The table of attributes associated with this method. The only attributes recognized at this level by the Sun JVM 1.0.2 are the Exceptions and Code attributes defined previously. |
The bits of the Method Flags field are defined as follows:
Bit Pos. (LSb = 1) | Logical Name | Applies to Class | Applies to Interface | Meaning When Set |
1 | PUBLIC | Yes | Yes | The method is accessible from other classes outside this package. |
2 | PRIVATE | Yes | No | The method is only accessible from this class. No subclasses or classes outside this package may access it. |
3 | PROTECTED | Yes | No | The method is only accessible from this class and its subclasses. |
4 | STATIC | Yes | No | The method is considered a class level method and may be called whether an instance of this class exists or not. |
5 | FINAL | Yes | No | This method is only present in this class definition and may not be overridden. |
6 | SYNCHRONIZED | Yes | No | This method is callable in a multi-threaded scenario and will have its access controlled and locked with a monitor. |
9 | NATIVE | Yes | No | This method's implementation is not in Java byte codes but in some other external form. It must conform to the native call interface specification of the JVM. |
11 | ABSTRACT | Yes | Yes | This method's signature is only defined in this class and must be implemented in a subclass. It effectively turns this class into an abstract class. |
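The Exceptions attribute lists the exceptions that are thrown by a method. Its data has the following structure:
Field | Number of Bytes | Value |
Count | 2 | Number of elements in the following table of Class reference Constant Pool entries. |
Table | (Count * 2) | An array of indexes to Class reference Constant Pool entries, each naming an exception class. |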
The Code attribute's data has the following structure:
Field | Number of Bytes | Value |
Stack Depth | 2 | Maximum allowable depth of the JVM's expression stack. |
No. Locals | 2 | Number of local variables (including arguments) defined in this method. |
Code Length | 4 | Number of bytes used by the following stream of bytecodes. |
bytecodes | (Code Length) | Stream of Java bytecodes representing the compiled version of this method's statements. |
Exception Count | 2 | Number of exception handler entries in the following table, each structured as described by Table 47.22. |
Exceptions | (Exception Count * 8) | An ordered table of fixed-length structures (described in Table 47.22) that detail each try-catch clause coded in this method. |
Attribute Count | 2 | Number of attributes defined in the following attribute table. |
Attribute Table | Varies | Table of attributes provided for this method's Code attribute. Currently, only the LineNumberTable and LocalVariableTable subattributes are supported. |
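Reading the Code attribute's data in the order of the table above might look like the following sketch. CodeAttribute is a hypothetical name, exception table entries are kept as rows of four unsigned 16-bit values (the layout of Table 47.22), and nested attributes such as LineNumberTable are skipped rather than parsed.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: parses the data portion of a Code attribute (after Name Index and Length).
class CodeAttribute {
    int maxStack;
    int maxLocals;
    byte[] bytecodes;
    int[][] exceptionTable;     // rows of {pcStart, pcEnd, pcHandler, exceptionTypeIndex}
    int attributeCount;         // nested attributes are skipped in this sketch

    static CodeAttribute read(DataInputStream in) throws IOException {
        CodeAttribute code = new CodeAttribute();
        code.maxStack = in.readUnsignedShort();
        code.maxLocals = in.readUnsignedShort();
        int codeLength = in.readInt();                  // 4-byte Code Length
        code.bytecodes = new byte[codeLength];
        in.readFully(code.bytecodes);
        int exceptionCount = in.readUnsignedShort();
        code.exceptionTable = new int[exceptionCount][4];
        for (int i = 0; i < exceptionCount; i++) {
            for (int j = 0; j < 4; j++) {
                code.exceptionTable[i][j] = in.readUnsignedShort();   // 8 bytes per entry
            }
        }
        code.attributeCount = in.readUnsignedShort();
        for (int i = 0; i < code.attributeCount; i++) {
            in.readUnsignedShort();                     // Name Index
            in.readFully(new byte[in.readInt()]);       // skip Length bytes of data
        }
        return code;
    }
}
```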
Each entry in the Code attribute's exception table has the following fixed-length structure:
Field | Number of Bytes | Value |
PC Start | 2 | First bytecode of the try block that this exception is to handle. |
PC End | 2 | Bytecode address where this exception handler is no longer active (the bytecode immediately after the try block). |
PC Exception Handler | 2 | Bytecode location of the beginning of the actual exception handler. |
Exception Type | 2 | Index into the Constant Pool of a Class reference constant representing the actual exception to be handled. |
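The PC Start and PC End fields describe a half-open range: a handler is active for PC Start <= pc < PC End. Because the table is ordered, a search can stop at the first entry whose range covers the current instruction and whose Exception Type matches the thrown exception. The sketch below shows that search with hypothetical names; deciding whether an exception type matches (which involves the class hierarchy) is left to a caller-supplied predicate.

```java
import java.util.function.IntPredicate;

// Sketch: searches an exception table (rows of {pcStart, pcEnd, pcHandler, typeIndex})
// for the handler that covers the given bytecode address.
class HandlerLookup {
    static int findHandler(int[][] table, int pc, IntPredicate typeMatches) {
        for (int[] entry : table) {                              // table order decides precedence
            boolean inRange = pc >= entry[0] && pc < entry[1];   // PC End is exclusive
            if (inRange && typeMatches.test(entry[3])) {
                return entry[2];                                 // start of the handler code
            }
        }
        return -1;                                               // no handler in this method
    }
}
```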
The LineNumberTable attribute, nested inside the Code attribute, has the following structure:
Field | Number of Bytes | Value |
Count | 2 | Number of elements in the following line number information table. |
Table | (Count * 4) | A table containing line number information elements as described in Table 47.24. |
Each line number information element has the following structure:
Field | Number of Bytes | Value |
PC Start | 2 | Program Counter location of the start of some bytecodes associated with a given line number. |
Line Number | 2 | The actual line number (relative to the start of the .java source file) where these generated bytecodes came from. |
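A debugger can map a Program Counter back to a source line by choosing the entry with the largest PC Start that does not exceed that PC. Here is a minimal sketch; LineNumbers is a hypothetical name, the table is assumed to be loaded into two parallel arrays, and the entries are not assumed to be sorted.

```java
// Sketch: finds the source line for a bytecode address using LineNumberTable entries.
class LineNumbers {
    static int lineFor(int[] pcStart, int[] lineNumber, int pc) {
        int bestPc = -1;
        int bestLine = -1;                               // -1 means "no line information"
        for (int i = 0; i < pcStart.length; i++) {
            if (pcStart[i] <= pc && pcStart[i] > bestPc) {
                bestPc = pcStart[i];                     // closest start at or before pc
                bestLine = lineNumber[i];
            }
        }
        return bestLine;
    }
}
```

With PC Start values {0, 4, 9} and line numbers {10, 11, 13}, a PC of 5 maps to line 11.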
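The LocalVariableTable attribute, also nested inside the Code attribute, has the following structure:
Field | Number of Bytes | Value |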
Count | 2 | Number of elements in the following local variable information table. |
Table | (Count * 10) | A table containing local variable information elements as described in Table 47.26. |
The actual local variable table elements have the following fixed length structure,
as shown in Table 47.26.
Field | Number of Bytes | Value |
PC Start | 2 | Program Counter location where this variable goes into scope. |
Scope Size | 2 | The number of bytes of code, beginning with PC Start, over which this variable remains in scope. That is, the scope runs from PC Start through PC Start + Scope Size - 1. |
Name | 2 | Location of a Utf8 string in the Constant Pool containing the literal variable name. |
Type | 2 | Location of a Utf8 string in the Constant Pool containing the type information for this variable (as defined in the "Type Information" section). |
Variable Slot | 2 | The slot, or offset, in this method's stack frame where the variable's value is kept. |
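The scope formula in Table 47.26 translates directly into a range test. This sketch, with hypothetical names, holds one table element per object and finds which variable, if any, occupies a slot at a given PC.

```java
// Sketch: one LocalVariableTable element and a helper that finds which
// variable (if any) occupies a slot at a given bytecode address.
class LocalVariable {
    final int pcStart, scopeSize, slot;
    final String name, type;

    LocalVariable(int pcStart, int scopeSize, String name, String type, int slot) {
        this.pcStart = pcStart;
        this.scopeSize = scopeSize;
        this.name = name;
        this.type = type;
        this.slot = slot;
    }

    // Scope runs from PC Start through PC Start + Scope Size - 1.
    boolean inScope(int pc) {
        return pc >= pcStart && pc <= pcStart + scopeSize - 1;
    }

    static String nameAt(LocalVariable[] table, int slot, int pc) {
        for (LocalVariable v : table) {
            if (v.slot == slot && v.inScope(pc)) {
                return v.name;
            }
        }
        return null;                                 // slot unused or not described at pc
    }
}
```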
This optional attribute is used in the high-level .class file structure to hold
the name of the source file that was used to compile this .class file (see Table
47.27). It is primarily useful for debugging systems to be able to search for the
source file and display source lines as required.
Field | Number of Bytes | Value |
File Name | 2 | Location of a Utf8 string in the Constant Pool containing the literal .java file name. |
Now that you have a fairly good understanding of the physical format of the class structure, there are lots of things that you can do with this information, such as:
Personally, I chose a derivative of the fourth alternative. In order to gain a full understanding of the nuances that a .class file reader needed to be able to deal with, I implemented a Java application to help me out. I created a package and utility for parsing a .class file and converting its information into a displayable string format. The driver utility is called ClassFileDump, and the package is called com.Que.SEUsingJava.ClassFile.
The utility itself is very simple: it reads some command-line arguments and passes them on to the main class in the package. The package comprises 32 classes contained in eight Java source files. The starting class of the package is called ClassHeader; it has a simple constructor taking no arguments and two primary methods. The first primary method is called read and takes a single argument, a java.io.DataInputStream instance, which should be associated with an open .class file. read is completely responsible for loading and parsing the .class file. It does this by passing the input stream to the 31 other support classes in the package.
Each class in the package knows about a specific structure or attribute of the .class file and understands how to read it and convert it to a String. After the read method returns, the utility calls the toString method on the ClassHeader instance. The toString method takes advantage of the other class instances in the package to convert their respective member data items to String values. The toString method then returns this large string to the driver utility where it is sent to System.out.
NOTE: The ClassFileDump utility can be found on the CD-ROM in two formats. The first one is the source to the utility and package and is called CLASSDMP_SOURCE.ZIP. The second format is the executable Java bytecode version and is in a file called CLASSDMP_LIB.ZIP. This file is in the proper format to add to your CLASSPATH environment variable. For example, if you put CLASSDMP_LIB.ZIP in your JDK's \LIB directory, you could modify your classpath to be:
;c:\java\lib\classes.zip;c:\java\lib\classdmp_lib.zip
After you have done that, you may execute the utility from anywhere that the java command is available:
java ClassFileDump <.class file name>
For example,
java ClassFileDump ClassFileDump.class
causes the contents of the ClassFileDump utility's .class file to be sent to System.out, the console. I chose to send output there because it can easily be redirected to a file.