CSAIL Publications and Digital Archive header
bullet Technical Reports bullet Work Products bullet Research Abstracts bullet Historical Collections bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

 

Research Abstracts - 2006
horizontal line

horizontal line

vertical line
vertical line

Refactoring for Parameterizing Java Classes

Adam Kiezun1 & Michael Ernst1 & Frank Tip2 & Robert Fuhrer2

1CSAIL MIT
2IBM T.J. Watson Research Center
Introduction

Generics are now a feature of the Java programming language. They enable the creation of type-safe reusable libraries. Much pre-generic Java code would benefit from being upgraded to use generics, and this task can be viewed as two related technical problems [1]:

  1. The parameterization problem consists of adding type parameters to an existing class definition. For example, one might convert the class definition
    class UnionFind{ ... }
    into
    class UnionFind <E>{ ... }
    with uses of Object in the body appropriately replaced by the formal type parameter E.
  2. Once a class has been parameterized, the instantiation problem is the task of determining how uses of the generic class should be instantiated in client code. For example, this might convert the declaration
    UnionFind parameters;
    into
    UnionFind<String> parameters;

The parameterization problem subsumes the instantiation problem, because to parameterize a class, references to other, already parametric classes, may need to be instantiated. For example, if the UnionFind class uses a Map in its implementation, then to parameterize UnionFind, one needs to instantiate the Map.

In our previous work [3,4], we showed that the instantiation problem can be solved using completely automatic and scalable techniques, and the Infer Generic Type Arguments refactoring in Eclipse 3.1 is based on our results. However, thus far, class libraries such as the Java Collections Framework, Apache Collections, etc have been parameterized manually, and developers involved with this task described it as very time-consuming, tedious, and error-prone.

Our contribution in this project is a solution to the parameterization problem with the following salient characteristics:

  • The behavior of any client of the parameterized classes is preserved.
  • The source-code transformation produces a similar result to what would be produced manually by a skilled programmer.
  • Wildcard types (lower- and upper-bounded) are fully supported. This is necessary to preserve type-correctness of the resulting program and increase the usage flexibility for clients of the parameterized class.
  • In contrast to previous approaches, which aim at full automation of the parameterizing process, our approach requires a small about of user guidance. In particular, we request that the user select one type reference (e.g., return type of a method), that he wants to change into a type parameter. This information is typically enough to fully parameterize the class and classes it depends on.
  • The approach is practical --- it allows an efficient implementation that is easy to use.
  • The results are similar to what a skilled programmer would produce.
Technique

Our parameterization algorithm is based on type constraints and has 3 steps:

  1. From the program's source code, generate a set of type constraints.
  2. Solve the constraints, thus computing types for all entities in the program.
  3. Rewrite the program source code accordingly.

Type constraints and the solving algorithm are discussed below.

Type Constraints

Type constraints express subtype relationships between elements in the program. For example, given an assignment x = y, the rules of the Java language specify that in order for the program to be type-correct, the type of the right-hand side of the assignment (i.e., y) must be equal or be a subtype of the type of the right-hand side of the assignment. In our notation, [y] ≤ [x] (square brackets mean "type of", the less-than symbol means "equal or subtype of"). The two "sides" of a type constraint are called constraint variables. A program is type-correct when all type constraints are satisfied.

In addition to preserving type correctness, type constraints are used to preserve the program's behavior. For example, type constraints between parameters of overriding and overridden methods ensure that the overriding relationship is preserved. Without such constraints, the behavior of dynamic dispatch may change, thus changing the program's behavior. Context Variables

In the presence of generics, a type of a program entity is not necessarily fixed, and may depend on the context in which the entity is referenced. For example, consider the following class declaration:

class ArrayList<E>{
  ... 
  E get(int idx){ ... }
}

The return type of ArrayList.get is not the same for all callers (as it would be in the absence of generics). For example, if the method is called via  a reference of type ArrayList<String>, the return type is String, if via ArrayList<Integer>, the return type is Integer, etc.

In the parameterization problem, the placement of formal type parameters is initially unknown. The program entities whose types depend on the context are also initially unknown. Our algorithm uses context variables, a special form of constraint variables in which it is possible to express that a variable may or may not be given a type parameter. For example, consider the following call to an initially unparameterized UnionFind:

uf.union(x1, x2);

The type constraints generated for this call use context variables.
[x1] [Param(1,union)](uf)
[x2]
[Param(2,union)](uf)

The type constraint [x1] [Param(1,union)](uf) expresses the following: "the type of x1 must be equal or a subtype of the type of the first parameter of method union, in the context of reference uf". The right-hand side of the constraint is a context-variable. If UnionFind is not parameterized, the context is irrelevant.

Solving Type Constraints

Because of dependencies, classes must be parameterized in groups. For example, if the UnionFind class has a member class to iterate over its elements, both classes must be parameterized in order to parameterize UnionFind. It is impractical, and often impossible (because of mutual dependencies) to parameterize classes in the order of dependencies. Some classes, such as abstract classes and interfaces, cannot be parameterized on their own.

Our solving algorithm enables parameterization of multiple interdependent classes at once. Our constraint solver uses a non-standard step that propagates type parameter information between classes. This allows parameterization of interdependent classes.

The constraint solving algorithm works as follows. Every variable in the solver has an assigned type estimate. Type estimate for a variable is a set of types that the variable may legally have. Type estimates contain non-parametric types, type parameters and wildcard types. As solving progresses, type estimates are narrowed, according to the type constraints. When a type estimate for a context variable is narrowed to contain only a type parameter, then the main, non-context part of the variable is assigned a (new) type parameter. This is necessary to preserve type correctness of the resulting program and allows parameterization of interdependent classes.

At the end of solving, type estimates for all variables are singleton sets. Those sets do not contain parametrized types. For example, there may be a variable [uf] whose estimate is {UnionFind}, a variable [Param(1,union)] with type estimate {T} (singleton of type parameter). To assign parametric types, the algorithm: i) finds formal type parameters for classes and, ii) finds actual type parameters for every formal type parameter in every context. For example, given the estimates above, UnionFind would be parameterized as UnionFind<T>. To find the actual type parameter for uf, the estimate of context variable [Param(1,union)](uf) is used. For example, if the last estimate is {String}, the the declaration of uf is parameterized as UnionFind<String> uf;

Results

We implemented our technique as a refactoring in Eclipse. We evaluated its effectiveness by parameterizing selected classes from 6 already-generic and 2 non-generic libraries. Parameterizing already-generic classes allows direct comparison of the results automatically computed by our analysis with those that were created by a human expert (i.e., the libraries' authors).

For our experiments, we parameterized 52 classes from the selected libraries. Then, we manually compared the results with the expert-made parameterization. In 68% of cases, our tool performed equally well (i.e., the parameterization was exactly the same or equivalent). In 19% of cases, our tool perfomed worse than the human (i.e., the inferred parameterization, though correct, was not what a skilled programmer would have done). In 13% of cases, our tool performed better than the human expert (i.e., the automatically-inferred parameterization allowed for removal of more casts or more closely followed the style of the standard collections).

Reseach Support

This work was supported in part by NSF grants CCR-0133580 and CCR-0234651, DARPA contracts NBCH30390004 and FA8750-04-2-0254, and gifts from IBM and Microsoft.

References

[1] A. Donovan, M. Ernst. Inference of generic types in Java, MIT Technical Report, 2003

[2] D. Duggan. Modular type-based reverse engineering of parameterized types in Java code, In The Proceedings of International Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA) 1999

[3] A. Donovan, A. Kiezun, M. Tschantz, M. Ernst. Converting Java Programs to use Generic Libraries, In The Proceedings of International Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA) 2004

[4] R. Fuhrer, F.Tip, A.Kiezun, J.Dolby, M.Keller. Efficiently refactoring Java applications to use generic libraries, In The Proceedings of the European Conference on Object-Oriented Programming (ECOOP) 2005

[5] D. von Dincklage, A. Diwan. Converting Java Classes to Use Generic, In The Proceedings of International Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA) 2004

vertical line
vertical line
 
horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu