CSAIL Research Abstracts - 2005

Instrumentation of Standard Libraries in Java

Jeff H. Perkins, David Saff & Michael D. Ernst

Introduction

Program instrumentation can be used to accomplish a number of useful tasks, including memory profiling, execution profiling, debugging, logging, and other dynamic analyses. Instrumentation modifies a program by adding code at particular program points to capture dynamic information. For example, a program could be instrumented to count how many times each method is called. The instrumented code would include a call at the beginning of each method that increments the count for that method:

  class ZipFile {
    ...
    int size() {
      MethodCounter.call("ZipFile.size");
      ...
    }
    InputStream getInputStream(ZipEntry entry) {
      MethodCounter.call("ZipFile.getInputStream");
      ...
    }
  }

  class MethodCounter {

    // Static so that the static call() method can reach it.
    static Map call_cnt = new HashMap();

    /** Increment call count for name */
    public static void call(String name) {
      ...
    }
  }
Figure 1: Simple instrumentation example. Each method is instrumented by adding a call to MethodCounter.call.
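
The counter in Figure 1 leaves its body elided. A minimal sketch of one way it could be filled in is shown below. The count() accessor, the use of Map.merge, and the InstrumentedExample class are assumptions made for illustration, not part of the original tool.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical completion of the MethodCounter from Figure 1. */
class MethodCounter {

  // Call counts keyed by method name. Static, because call() is static.
  private static final Map<String, Integer> callCnt = new HashMap<>();

  /** Increment the call count for name. */
  static synchronized void call(String name) {
    callCnt.merge(name, 1, Integer::sum);
  }

  /** Number of recorded calls for name (0 if never called). Assumed helper. */
  static synchronized int count(String name) {
    return callCnt.getOrDefault(name, 0);
  }
}

/** Stand-in for an instrumented class; the real ZipFile is not modified here. */
class InstrumentedExample {
  int size() {
    MethodCounter.call("InstrumentedExample.size"); // inserted by instrumentation
    return 0; // placeholder body
  }
}
```

Calling size() twice on an InstrumentedExample leaves the count for "InstrumentedExample.size" at 2. Note that call() itself uses HashMap, which is exactly the kind of library use that must stay uninstrumented.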

Instrumentation can be accomplished by modifying either source files or class files. In many circumstances, instrumenting class files is preferable: source is not always available, the source format changes more often and more drastically than the class file format, and instrumented source files must be recompiled, so a class-file-based approach is often easier to use.

Most instrumentation tasks require that the standard libraries (rt.jar) be instrumented as well as the user class files. For example, if instrumentation is being added to determine comparability, it is necessary to track the interactions between variables within the standard libraries. Similarly, when memory profiling, it is necessary to keep track of allocations within the standard libraries. Instrumenting user class files is straightforward but instrumenting the standard libraries is challenging.

Standard Library Instrumentation Challenges

Instrumenting the Java standard libraries (primarily rt.jar) presents a number of challenges.

  • The instrumenting code itself often uses the standard libraries. This use should not be instrumented, or the instrumentation may change the results. For example, in the simple case of counting method calls shown in Figure 1, it is important not to instrument the library calls that MethodCounter.call makes on call_cnt. If such calls were instrumented, at best the information would be wrong and at worst an infinite recursion would be created.

  • The JVM depends on internal details (field names, field types, number of fields, etc.) of some classes (e.g., Object and String). Changes to these classes may cause the JVM to crash.

  • More than 200 classes are loaded by the JVM before control is passed to user code. These classes cannot be instrumented dynamically, so it is necessary to statically modify the library (rt.jar) to add the instrumentation.

  • Native calls can access the fields in any of their arguments or any field transitively reachable from those fields. Fields are identified by their name and type. In order for a native call to function correctly, any field accessed by the native call must have its original name, type, and contents. There are 1664 native methods in 290 classes in the Linux 1.5 JDK.
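
The native-method figures above come from the authors' measurement of the Linux 1.5 JDK. As a rough illustration of how such a count could be gathered for a single class, the sketch below uses standard reflection. The class and method names (NativeMethodCount, countNatives) are assumptions, and the numbers it reports will differ between JDK versions.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodCount {

  /** Number of methods declared directly by cls that carry the native modifier. */
  static int countNatives(Class<?> cls) {
    int n = 0;
    for (Method m : cls.getDeclaredMethods()) {
      if (Modifier.isNative(m.getModifiers())) {
        n++;
      }
    }
    return n;
  }
}
```

For example, NativeMethodCount.countNatives(Object.class) reports the native methods declared by java.lang.Object in the running JDK.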

Approach

Previous research proposed the Twin Class Hierarchy (TCH) approach to instrumenting the Java standard libraries [1]. TCH creates a copy of each class in the standard library with a different name. The instrumented classes have the same inheritance relations to each other as do the original classes. There is no inheritance relationship between the instrumented class and the original class. Each reference to an original class within the instrumented classes is modified so that it refers to its corresponding instrumented class.

TCH leaves the original classes unchanged, which solves many of the instrumentation challenges. It does not, however, handle native calls. Native calls require the original classes and cannot be called from the instrumented classes. This can be worked around by delegating the calls to an instance of the original class and translating each argument from the instrumented type to the original type. Since there is no type relationship between the instrumented class and the original class, there is no automatic way to do this. Each native call must thus be implemented by hand, and there is no guarantee that the mechanism will always work.
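
The delegation workaround can be pictured with a small hand-written sketch. The names Name and Twin_Name are hypothetical and do not follow the naming scheme of the TCH paper, and the ordinary static method length stands in for a native method.

```java
/** Original library class; length() stands in for a native method. */
class Name {
  final String text;

  Name(String text) {
    this.text = text;
  }

  static int length(Name n) { // imagine this is declared native
    return n.text.length();
  }
}

/** Twin of Name: same shape, different name, no type relationship to Name. */
class Twin_Name {
  final String text;

  Twin_Name(String text) {
    this.text = text;
  }

  // Hand-written delegation: translate the twin argument to the original
  // type, then call the original method.
  static int length(Twin_Name n) {
    Name original = new Name(n.text);
    return Name.length(original);
  }
}
```

Because Twin_Name is not a subtype of Name, the translation step has to be written separately for each argument type, which is why this does not scale to every native method.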

Our approach is to double each method within the class rather than to twin each class. The doubled method is given a unique name and each reference to a method within it is modified to call the doubled version rather than the original version. The doubled version of a native method simply forwards its call to the original method. Since the original classes and fields are available, native methods work as expected.

class ZipFile {

  int total;  // total number of entries

  ...

  int size() {
    ensureOpen();
    return total;
  }

  int size__double() {
    ensureOpen__double();
    return total;
  }

  InputStream getInputStream(ZipEntry entry) {
    return getInputStream(entry.name);
  }

  InputStream getInputStream__double(ZipEntry entry) {
    return getInputStream__double(entry.name);
  }

  static native long open(String name, int mode,
                          long lastModified);

  // The double of a native method forwards to the original.
  static long open__double(String name, int mode,
                           long lastModified) {
    return open(name, mode, lastModified);
  }
}

Figure 2: Instrumented methods (the __double variants) are added for each original method in the class. Except for natives, instrumented methods only call other instrumented methods.
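
To make the effect of doubling concrete, here is a small hand-written sketch of a doubled class that can be compiled and run. CallLog and DoubledArchive are hypothetical names, the instrumentation shown (appending to a list) is only a placeholder, and System.nanoTime(), which is declared native in OpenJDK, stands in for a library native method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder; stands in for whatever the instrumentation collects. */
class CallLog {
  static final List<String> calls = new ArrayList<>();
}

/** Hand-written illustration of a class after method doubling. */
class DoubledArchive {
  private int total = 3; // total number of entries (placeholder value)

  // Original methods: unchanged, and they call only other original methods.
  void ensureOpen() { }

  int size() {
    ensureOpen();
    return total;
  }

  // Doubled methods: instrumented, and they call only other doubled methods.
  void ensureOpen__double() {
    CallLog.calls.add("ensureOpen");
  }

  int size__double() {
    CallLog.calls.add("size");
    ensureOpen__double();
    return total;
  }

  // A native method cannot be rewritten, so its double simply forwards.
  static long now__double() {
    return System.nanoTime();
  }
}
```

Calling size() leaves CallLog.calls empty, while size__double() records "size" followed by "ensureOpen". The instrumentation code itself can therefore keep using the original, uninstrumented methods.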

Interfaces

It is useful to be able to replace a class with a differently-named class that implements the same interface. For instance, we use this technique to implement test factoring, which replaces classes with new versions that track each interaction with the class. We also plan to use it to check dynamic mutability downcasts. It could also be used for other purposes such as logging or profiling.

Replacing one class by another is straightforward if the original design defined an interface that both classes implement and always referred to the interface, never to the original class. This is rarely the case.

Our interfacing technique automatically creates such interfaces for every class, effectively separating type inheritance from implementation inheritance. The new interface consists of all of the methods defined in the concrete class, plus accessors (get__f() and set__f()) for each field f in the class. The accessors are added to the concrete class as well. JDK interfacing is built using our doubling instrumentation technique. Each doubled method is modified to use interfaces rather than concrete classes: parameters are changed from concrete classes to the corresponding interface types, virtual calls within the method are modified to interface calls, and field accesses are modified to use the accessors defined in the interface.

  class ZipFile {

    int total;  // total number of entries

    ...

    public int get__total() {
      return total;
    }

    public void set__total(int total) {
      this.total = total;
    }

    int size__double() {
      ensureOpen__double();
      return total;
    }

    InputStream__iface getInputStream__double(ZipEntry__iface entry) {
      return getInputStream__double(entry.get__name());
    }

    static long open__double(String name, int mode,
                             long lastModified) {
      return open(name, mode, lastModified);
    }
  }

  interface ZipFile__iface {

    public int get__total();
    public void set__total(int total);
    int size__double();
    InputStream__iface getInputStream__double(ZipEntry__iface entry);
    static long open__double(String name, int mode,
                             long lastModified);
  }

Figure 3: The interfaced class and its interface, with accessors (get__total and set__total) and interfacing methods. The original methods are elided for space.
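
The payoff of interfacing is that a differently-named class can stand in for the original. The sketch below illustrates this with hand-written code. Entry, TrackingEntry, Archive, and describe__double are hypothetical names, and the read counter is only one example of the kind of interaction tracking that test factoring might perform.

```java
/** Generated-style interface: accessors for the field of the concrete class. */
interface Entry__iface {
  String get__name();
  void set__name(String name);
}

/** Original concrete class with the accessors added. */
class Entry implements Entry__iface {
  String name;

  Entry(String name) { this.name = name; }

  public String get__name() { return name; }
  public void set__name(String name) { this.name = name; }
}

/** Replacement class with a different name that tracks each read. */
class TrackingEntry implements Entry__iface {
  private final Entry__iface delegate;
  int reads = 0;

  TrackingEntry(Entry__iface delegate) { this.delegate = delegate; }

  public String get__name() {
    reads++;
    return delegate.get__name();
  }

  public void set__name(String name) { delegate.set__name(name); }
}

class Archive {
  // Doubled method: the parameter is typed by the interface and the field
  // is read through the accessor, so either implementation is accepted.
  String describe__double(Entry__iface entry) {
    return "entry:" + entry.get__name();
  }
}
```

Passing a TrackingEntry to describe__double yields the same result as passing an Entry, and afterwards its reads field is 1.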

Progress

Using our doubling and interfacing techniques, we have run instrumented versions of Java programs of up to 300,000 lines. The instrumented version runs about 80% slower than the original version.

Future

Our current implementation does not fully support some Java language constructs. Since we change the signatures and names of methods, reflection requires special support so that the modified names are not visible to the user program. User-defined class loaders must be augmented so that our instrumentation is applied to the loaded classes. Full support of arrays requires a wrapper for each array so that the array can be an interface as well. We plan to complete support for these items and make the result publicly available.
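
Reflection support is described above only as planned work, so the following is a speculative sketch rather than the authors' design. It shows one way the added names could be hidden from user code: filtering reflective results by the naming conventions used in the figures. ReflectionFilter, visibleMethods, and Sample are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ReflectionFilter {

  /** Declared methods of cls, minus those whose names mark them as added by instrumentation. */
  static List<Method> visibleMethods(Class<?> cls) {
    List<Method> visible = new ArrayList<>();
    for (Method m : cls.getDeclaredMethods()) {
      String n = m.getName();
      boolean added = n.endsWith("__double")
          || n.startsWith("get__")
          || n.startsWith("set__");
      if (!added) {
        visible.add(m);
      }
    }
    return visible;
  }
}

/** Stand-in for an instrumented class. */
class Sample {
  int size() { return 0; }
  int size__double() { return 0; }
}
```

With this filter, ReflectionFilter.visibleMethods(Sample.class) returns only the original size method.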

Research Support

This research is supported by DARPA contract FA8750-04-2-0254, NSF grants CCR-0133580 and CCR-0234651, the Deshpande Center, NTT, IBM, and the Oxygen project.

References

[1] M. Factor, A. Schuster, and K. Shagin. Instrumentation of Standard Libraries in Object-Oriented Languages: The Twin Class Hierarchy Approach. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 288-300, Vancouver, BC, Canada, October 2004.
