Research Abstracts - 2007

Instrumentation of Standard Libraries in Java

Jeff H. Perkins, David Saff & Michael D. Ernst

Introduction

Program instrumentation is useful for many tasks, including memory profiling, execution profiling, debugging, logging, and other dynamic analyses. Instrumentation modifies a program by adding code at particular program points to capture dynamic information. For example, a program could be instrumented to count how many times each method is called. The instrumented code would include code at the beginning of each method that incremented the count for that method:

  class ZipFile {
    ...
    int size() {
      MethodCounter.call ("ZipFile.size");
      ...
    }    
    InputStream getInputStream (ZipEntry entry) {
      MethodCounter.call ("ZipFile.getInputStream");
      ...
    }
  } 

  class MethodCounter {

    /** Increment the call count for the named method. */
    public static void call (String name) {
      ...
    }
  }
Figure 1: Simple instrumentation example. Each method is instrumented by adding a call to MethodCounter.call.
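
The body of MethodCounter.call is elided in Figure 1. The following is a minimal sketch of one way it could be filled in, assuming a map from method name to counter. The count accessor, the InstrumentedZipFile stand-in class, and the demo class are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical runtime support class: keeps one counter per method name. */
class MethodCounter {
    private static final Map<String, AtomicLong> COUNTS = new ConcurrentHashMap<>();

    /** Increment the call count for the named method. */
    static void call(String name) {
        COUNTS.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    /** Return the number of recorded calls (0 if the method was never called). */
    static long count(String name) {
        AtomicLong counter = COUNTS.get(name);
        return counter == null ? 0 : counter.get();
    }
}

/** Stand-in for an instrumented class; not the real java.util.zip.ZipFile. */
class InstrumentedZipFile {
    private final int total;

    InstrumentedZipFile(int total) {
        this.total = total;
    }

    int size() {
        MethodCounter.call("ZipFile.size"); // line added by the instrumentation
        return total;
    }
}

class MethodCounterDemo {
    public static void main(String[] args) {
        InstrumentedZipFile zip = new InstrumentedZipFile(3);
        zip.size();
        zip.size();
        System.out.println(MethodCounter.count("ZipFile.size"));
    }
}
```

Note that even this small counter relies on standard library classes (ConcurrentHashMap and AtomicLong).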

Instrumentation can be accomplished for Java by modifying either source files or class files. Instrumenting class files is preferable to instrumenting source files in many circumstances: source is not always available, the source format changes more often and more drastically than the class-file format, and a class-file-based approach is often easier to use because it makes it unnecessary to store and compile instrumented source files.

Most instrumentation tasks require that the standard libraries (for example, Java's rt.jar) be instrumented as well as the user class files. For example, instrumentation added to determine abstract types must track the interactions between variables within the standard libraries. Similarly, memory profiling code must keep track of allocations within the standard libraries. Instrumenting user class files is straightforward, but instrumenting the standard libraries is challenging.

Standard Library Instrumentation Challenges

Instrumenting the Java standard libraries (primarily rt.jar) poses a number of challenges.

  • The instrumenting code itself often uses the standard libraries. This use should not be instrumented, or the instrumentation may change the results. For example, in the simple case of counting method calls shown in Figure 1, it is important not to instrument calls to MethodCounter.call. If such calls were instrumented, at best the information would be wrong and, at worst, an infinite recursion would result.
  • The JVM depends on internal details (field names, field types, number of fields, etc.) of some classes (e.g., Object and String). Changes to these classes may cause the JVM to crash.
  • More than 200 classes are loaded by the JVM before control is passed to user code. These classes cannot be instrumented dynamically, so it is necessary to statically modify the library (rt.jar) to add the instrumentation.
  • Instrumentation may wish to change class names or fields. Native calls can access the fields in any of their arguments or any field transitively reachable from those fields. Fields are identified by their name and type. In order for a native call to function correctly, any field accessed by the native call must have its original name, type, and contents. The Linux JDK 1.5 contains 1664 native methods in 290 classes.

Approach

Previous research proposed the Twin Class Hierarchy (TCH) approach to instrumenting the Java standard libraries [1]. TCH creates a copy of each class in the standard library with a different name. There is no inheritance relationship between the instrumented class and the original class. The instrumented classes have the same inheritance relations to each other as do the original classes. Each reference to an original class within the instrumented classes is modified so that it refers to its corresponding instrumented class.

TCH leaves the original classes unchanged, which solves many of the instrumentation challenges. It does not, however, handle native calls. Native calls require the original classes and cannot be called from the instrumented classes. This can be worked around by delegating the calls to an instance of the original class and translating each argument from the instrumented type to the original type. Since there is no type relationship between the instrumented class and the original class, there is no automatic way to do this. Each native call must thus be implemented by hand, and there is no guarantee that the mechanism will always work.

Our approach is to double (duplicate) each method within the class rather than to twin each class. The doubled method is given a unique name and each reference to a method within it is modified to call the doubled version rather than the original version. The doubled version of a native method simply forwards its call to the original method. Since the original classes and fields are available, native methods work as expected.

  class ZipFile {

    int total;  // total number of entries

    ...

    int size() {
      ensureOpen();
      return total;
    }

    // doubled version of size()
    int size_double() {
      ensureOpen_double();
      return total;
    }

    InputStream getInputStream(ZipEntry entry) {
      return getInputStream(entry.name);
    }

    // doubled version of getInputStream()
    InputStream getInputStream_double(ZipEntry entry) {
      return getInputStream_double(entry.name);
    }

    static native long open(String name, int mode,
                            long lastModified);

    // doubled version of a native method: forwards to the original
    static long open_double(String name, int mode,
                            long lastModified) {
      return open(name, mode, lastModified);
    }
  }
Figure 2: Instrumented methods (those with the _double suffix, shown in blue in the original figure) are added for each original method in the class. Except for natives, instrumented methods call only other instrumented methods.
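
The effect of doubling can be seen in a small runnable sketch. It assumes, purely for illustration, that the instrumentation records a log entry at the start of each doubled method. DoubledZip is a stand-in class rather than the real ZipFile, and CallLog is a hypothetical helper. Code that calls the original methods leaves no trace, while code that calls the doubled methods stays entirely on the instrumented path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event log written to by the instrumentation. */
class CallLog {
    static final List<String> EVENTS = new ArrayList<>();

    static void record(String event) {
        EVENTS.add(event);
    }
}

/** Stand-in for a library class after method doubling. */
class DoubledZip {
    private final int total;
    private boolean open = true;

    DoubledZip(int total) {
        this.total = total;
    }

    // Original methods: left untouched.
    void close() {
        open = false;
    }

    void ensureOpen() {
        if (!open) throw new IllegalStateException("closed");
    }

    int size() {
        ensureOpen();
        return total;
    }

    // Doubled methods: carry the instrumentation and call only doubled methods.
    void ensureOpen_double() {
        CallLog.record("ensureOpen");
        if (!open) throw new IllegalStateException("closed");
    }

    int size_double() {
        CallLog.record("size");
        ensureOpen_double();
        return total;
    }
}

class DoublingDemo {
    public static void main(String[] args) {
        DoubledZip zip = new DoubledZip(5);
        zip.size();         // original path: nothing is recorded
        zip.size_double();  // doubled path: records "size", then "ensureOpen"
        System.out.println(CallLog.EVENTS); // prints [size, ensureOpen]
    }
}
```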

If the instrumentation requires per-object data, that data is stored separately in a weak identity hash map so that the layout and fields of the original class can be left unchanged.
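
The Java standard library provides WeakHashMap (weak keys, compared with equals) and IdentityHashMap (identity comparison, strong keys), but no single class that combines both properties. The following is a minimal sketch of how such a map might be built, not our actual implementation. The class name WeakIdentityMap and its methods are assumptions made for illustration, and the sketch is not thread-safe.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a weak identity map: keys are compared with ==, and an entry
 * does not keep its key alive. Not thread-safe.
 */
class WeakIdentityMap<K, V> {

    /** Weak key wrapper that hashes and compares by object identity. */
    private static final class Key<T> extends WeakReference<T> {
        private final int hash;

        Key(T referent, ReferenceQueue<T> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Key)) return false;
            Object mine = get();
            return mine != null && mine == ((Key<?>) other).get();
        }
    }

    private final Map<Key<K>, V> map = new HashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    /** Drop entries whose keys have been garbage collected. */
    private void expunge() {
        Object stale;
        while ((stale = queue.poll()) != null) {
            map.remove(stale);
        }
    }

    V put(K key, V value) {
        expunge();
        return map.put(new Key<>(key, queue), value);
    }

    V get(K key) {
        expunge();
        return map.get(new Key<>(key, null)); // lookup key needs no queue
    }

    int size() {
        expunge();
        return map.size();
    }
}

class WeakIdentityMapDemo {
    public static void main(String[] args) {
        WeakIdentityMap<String, Integer> data = new WeakIdentityMap<>();
        String a = new String("key");
        String b = new String("key"); // equal to a, but a different object
        data.put(a, 1);
        data.put(b, 2);
        System.out.println(data.size() + " " + data.get(a) + " " + data.get(b));
    }
}
```

Because keys are compared by identity, two objects that are equal according to equals still receive separate per-object data. Because keys are held weakly, the map does not prevent an instrumented object from being garbage collected.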

Interfaces

It is useful to be able to replace a class with a differently-named class that implements the same interface. For instance, we use this technique to implement Test Factoring [2]. Test factoring replaces classes with new versions that track each interaction with the class. We also plan to use it to check dynamic mutability downcasts. It could also be used for other purposes such as logging or profiling.

Replacing one class by another is straightforward if the original design defined an interface that both classes implement, and the original design always referred to the interface, never to the original class. This is rarely the case.
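
The straightforward case can be sketched as follows. All names here (Archive, RealArchive, TrackingArchive, ArchiveClient) are hypothetical, and the use of delegation inside the replacement class is an assumption made for illustration. The point is only that a client written against the interface accepts a tracking replacement without modification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface that both implementations share. */
interface Archive {
    int size();
}

/** Original implementation. */
class RealArchive implements Archive {
    private final int total;

    RealArchive(int total) {
        this.total = total;
    }

    @Override
    public int size() {
        return total;
    }
}

/** Replacement that records each interaction before delegating. */
class TrackingArchive implements Archive {
    private final Archive delegate;
    final List<String> interactions = new ArrayList<>();

    TrackingArchive(Archive delegate) {
        this.delegate = delegate;
    }

    @Override
    public int size() {
        interactions.add("size()");
        return delegate.size();
    }
}

class ArchiveClient {
    /** Client code refers only to the interface, never to a concrete class. */
    static boolean isEmpty(Archive archive) {
        return archive.size() == 0;
    }
}

class InterfaceSwapDemo {
    public static void main(String[] args) {
        TrackingArchive tracked = new TrackingArchive(new RealArchive(0));
        System.out.println(ArchiveClient.isEmpty(tracked)); // prints true
        System.out.println(tracked.interactions);           // prints [size()]
    }
}
```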

Our interfacing technique automatically creates such interfaces for every class, effectively separating type inheritance from implementation inheritance. The new interface consists of all of the methods defined in the concrete class, plus accessors (get_f() and set_f()) for each field f in the class. The accessors are added to the concrete class as well. JDK Interfacing is built using our doubling instrumentation technique. Each doubled method is modified to use interfaces rather than concrete classes: parameters are changed from concrete classes to the corresponding interface types, virtual calls within the method become interface calls, and field accesses are modified to use the accessors defined in the interface.

  class ZipFile {

    int total;  // total number of entries

    ...

    public int get_total() {
      return total;
    }

    public void set_total (int total) {
      this.total = total;
    }

    int size_double() {
      ensureOpen_double();
      return total;
    }

    InputStream_iface getInputStream_double (ZipEntry_iface entry) {
      return getInputStream_double (entry.get_name());
    }

    static long open_double (String name, int mode,
                              long lastModified) {
      return open (name, mode, lastModified);
    }
  }

  interface ZipFile_iface {

    public int get_total();
    public void set_total (int total);
    int size_double();
    InputStream_iface getInputStream_double (ZipEntry_iface entry);
    static long open_double (String name, int mode,
                              long lastModified);
  }

Figure 3: The interfaced class and its interface, with accessors (get_total and set_total, shown in blue in the original figure) and interfacing methods. The original methods are elided for space.

Progress

Using our doubling and interfacing techniques, we have run instrumented versions of Java programs of up to 300,000 lines. The instrumented version runs about 80% slower than the original version.

Future

Our current implementation does not fully support some Java language constructs. Since we change the signatures and names of methods, reflection requires special support so that the modified names are not visible to the user program. User-defined class loaders must be augmented so that our instrumentation is applied to the loaded classes. Full support of arrays requires a wrapper for each array so that arrays can also be referenced through an interface type. We plan to complete support for these items and make the resulting tool publicly available.
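
As an illustration of the reflection issue only, the sketch below filters doubled methods out of a reflective method listing by their _double suffix. ReflectionFilter and visibleMethods are hypothetical names, and this is not a description of how the planned support will work. A complete solution would also have to cover the other reflective entry points.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a class that has been through method doubling. */
class DoubledSample {
    int size() {
        return 0;
    }

    int size_double() {
        return 0;
    }
}

/** Hypothetical helper that hides doubled methods from reflective listings. */
class ReflectionFilter {
    static final String SUFFIX = "_double"; // naming convention taken from Figure 2

    /** Return the declared methods of type, omitting doubled methods. */
    static List<Method> visibleMethods(Class<?> type) {
        List<Method> visible = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (!method.getName().endsWith(SUFFIX)) {
                visible.add(method);
            }
        }
        return visible;
    }
}

class ReflectionFilterDemo {
    public static void main(String[] args) {
        for (Method method : ReflectionFilter.visibleMethods(DoubledSample.class)) {
            System.out.println(method.getName());
        }
    }
}
```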

Research Support

This research is supported by DARPA contract FA8750-04-2-0254, NSF grants CCR-0133580 and CCR-0234651, the Deshpande Center, NTT, IBM, and the Oxygen project.

References

[1] M. Factor, Assaf Schuster, and K. Shagin. Instrumentation of Standard Libraries in Object-Oriented Languages: The Twin Class Hierarchy Approach. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 288-300, Vancouver, BC, Canada, October 2004.

[2] David Saff, Shay Artzi, Jeff H. Perkins, and Michael D. Ernst. Automatic Test Factoring for Java. In Proceedings of the 20th Annual International Conference on Automated Software Engineering, pages 114-123, Long Beach, CA, USA, November 2005.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu