CSAIL Research Abstracts - 2005 link to http://publications.csail.mit.edu/abstracts/abstracts05/index.html link to http://www.csail.mit.edu
bullet Introduction bullet Architecture, Systems
& Networks
bullet Language, Learning,
Vision & Graphics
bullet Physical, Biological
& Social Systems
bullet Theory bullet

horizontal line

Migrating Java Applications to Use Generic Libraries

Adam Kiezun, Julian Dolby, Alan Donovan, Michael D. Ernst, Robert M. Fuhrer,
Markus Keller, Frank Ti p & Matthew S. Tschanz

Introduction

Generic types in Java 1.5 of the programming language, enable the creation of reusable class libraries and compiler enforcement of the type safety of their usage. Some libraries have already been enhanced with generics (most visibly, the Java standard class libraries). It is the premise of this research that application developers will want to convert their programs to take advantage of the new increased safety and documentation provided by the language feature.

For an example, consider the following program (left) and its desired version after the migration to use generics.

Original Program Transformed Program (note the removed downcast)
Map m = new HashMap();
m.put("string", 2);
m.put("string", 3.14);
Set entries = m.entrySet();
Iterator i = entries.iterator();
Entry me = (Entry)i.next();
String key = (String)me.key();
Map<String, Number> m = new HashMap<String, Number>();
m.put("string", 2);
m.put("string", 3.14);
Set<Entry<<String, Number>>> entries = m.entrySet();
Iterator
<Entry<String, Number>> i = entries.iterator();
Map.Entry
<String, Number> me = i.next();
String key = me.key();

Manual transformation of any real-size application would be very tedious and error-prone. Our goal is to enable automated migration of Java applications to use generic versions of library classes. We aim at creating sound analyses and well-performing and easy-to-use tools that help programmers in this task.

Approaches

We present here two approaches that we employed to address the problem. The first approach [1], implemented in a prototype tool Jiggetai, is in two parts: the first computes the least types of each generic allocation site in the program, and the second propagates those parametric types throughout the rest of the program, in order to make it type-correct. The first part uses a whole-program, context-sensitive pointer analysis and a unification-like algorithm to reconcile the pointer analysis results with the declared types of method parameters, return types and field types. The second part is context-insensitive and selects new types for all declarations in the program that refer to generic classes. It does so by creating and solving a type-constraint system. It can be performed on the whole program or only on the part that is required to be modified.

The second approach [2], to which we refer here as the Eclipse approach, is slightly different. It too uses type constraints but does so in a simpler, context-insensitive way. The desired accuracy of results is achieved by the use of special variables used in the type constraints to represent formal types arguments of generic types in the program. It does not need to analyze the whole program. Specifically, generic libraries need not be analyzed - only the type signature information is required. The algorithm is in three parts: the first creates a system of type constraints for the program, the second closes the constraint system with additional constraints that reflect dependencies between inferred actual type parameters, and the third solves the system, finding suitable types for all declarations in the program. The Eclipse approach is designed to be able to additionally infer actual type parameters in declarations of subclasses of generic classes. For example, it may infer that a certain class that implements Iterator should in fact implement Iterator<String>. This allows it to remove additional casts.

Results

We implemented the first approach in a tool Jiggetai, that efficiently and soundly performs the translation to use generic libraries as a source-to-source translation. It does not need to have source code for the generic libraries but rather works on the byte-code level. This has the disadvantage that, in order to correctly rewrite the source code, Jiggetai needs lexical information that is not normally present in the byte-code. We solved that problem by enhancing the Java compiler to insert such source-level details as lexical extent of declarations as byte-code ''comments''. We evaluated Jiggetai on several real-world programs (sizes 6-47KLOC). It is effective; in under seven minutes it can analyze the largest of our benchmarks and remove almost all casts (78%-99%, depending on a benchmark) related to the use of generic classes.

The Eclipse approach was implemented with practical use in mind. It performs the transformation as a source-to-source refactoring, does not analyze byte-code of the library and does not require modifications to the compiler. It is implemented in Eclipse and is planned to be included in the release 3.1 of the development environment thus making it freely available to hundreds of thousands of programmers. A disadvantage of using a source-code analysis is the more complex syntax which we dealt with by re-using a type type-constraint infrastructure already present in Eclipse (used for other, not generic-related refactorings). Because it uses a simpler, non context-sensitive analysis, this approach scales better than Jiggetai. The tool we implemented can successfully analyze programs up to 90KLOC in under two minutes. The accuracy (the number of casts and unchecked compiler warnings removed) of the results are also improved. It is due to the fact that the Eclipse approach is able to infer generic instantiation of subclasses of generic classes, thus allowing the removal of additional casts.

Future:

We plan to continue working on accuracy and scalability of our Eclipse implementation. We believe that accuracy can be improved by enhancing the analysis to infer wildcards [3]. The tool scales well in its current state, but several important optimalizations have not yet been implemented. Those include cycle elimination and simplifications that reduce the number of constraint variables generated. Also, we are planning to experiment with reducing the, now substantial, memory footprint.

Research Support

This research is supported by IBM and by NSF grants CCR-0133580 and CCR-0234651.

References

[1] Alan Donovan, Adam Kiezun, Matthew S. Tschanz and Michael D. Ernst. Converting Java Programs to Use Generic Libraries. In The Proceedings of OOPSLA, pp. 15-34, Vancouver, BC, Canada, October 2004.

[2] Robert Fuhrer, Frank Tip, Adam Kiezun, Julian Dolby and Markus Keller. Efficiently Refactoring Java Applications to Use Generic Libraries. In The Proceedings of ECOOP, Edinburgh, United Kingdom, July 2005.

[3] Mads Torgersen, Christian Plesner Hansen, Erik Ernst, Peter von der Ahe, Gilad Bracha and Neal M. Gafter. Adding wildcards to the Java programming language. In The Proceedings of ACM Symposium on Applied Computing (SAC), pp. 1289-129, Nikosia, Cyprus, 2004.

horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)