|
Refactoring for
Parameterizing Java Classes
Adam Kiezun1 & Michael Ernst1
& Frank Tip2 & Robert Fuhrer2
1CSAIL MIT
2IBM T.J. Watson Research Center
Introduction
Generics are now a feature of the Java programming language. They enable
the creation of type-safe reusable libraries. Much pre-generic Java code
would benefit from being upgraded to use generics, and this task can be
viewed as two related technical problems [1]:
- The parameterization
problem consists of adding type parameters to an existing class
definition. For example, one might convert the class definition
class UnionFind{ ... }
into
class UnionFind <E>{ ... }
with uses of Object in the body appropriately replaced
by the formal type parameter E .
- Once a class has been parameterized, the instantiation problem is the task of determining
how uses of the generic class should be instantiated in client code.
For example, this might convert the declaration
UnionFind parameters;
into
UnionFind<String> parameters;
The parameterization problem subsumes the instantiation problem, because
to parameterize a class, references to other, already parametric classes,
may need to be instantiated. For example, if the UnionFind
class uses a Map in its implementation, then to parameterize
UnionFind , one needs to instantiate the Map .
In our previous work [3,4], we showed
that the instantiation problem can be solved using completely automatic
and scalable techniques, and the Infer Generic Type Arguments refactoring
in Eclipse 3.1 is based on our results. However, thus far, class libraries
such as the Java Collections Framework, Apache Collections, etc have been
parameterized manually, and developers involved with this task described
it as very time-consuming, tedious, and error-prone.
Our contribution in this project is a solution to the parameterization
problem with the following salient characteristics:
- The behavior of any client of the parameterized
classes is preserved.
- The source-code transformation produces a similar result
to what would be produced manually by a skilled programmer.
- Wildcard types (lower- and upper-bounded) are fully supported.
This is necessary to preserve type-correctness of the resulting program
and increase the usage flexibility for clients of the parameterized
class.
- In contrast to previous approaches, which aim at full
automation of the parameterizing process, our approach requires a small
about of user guidance. In particular, we request that the user select
one type reference (e.g., return type of a method), that he wants to
change into a type parameter. This information is typically enough to
fully parameterize the class and classes it depends on.
- The approach is practical --- it allows an efficient
implementation that is easy to use.
- The results are similar to what a skilled programmer
would produce.
Technique
Our parameterization algorithm is based on type constraints and has 3
steps:
- From the program's source code, generate a set of type
constraints.
- Solve the constraints, thus computing types for all entities
in the program.
- Rewrite the program source code accordingly.
Type constraints and the solving algorithm are discussed below.
Type Constraints
Type constraints express subtype relationships between elements in the
program. For example, given an assignment x = y, the rules of the Java
language specify that in order for the program to be type-correct, the
type of the right-hand side of the assignment (i.e., y) must be equal
or be a subtype of the type of the right-hand side of the assignment.
In our notation, [y] ≤ [x] (square brackets mean "type of", the
less-than symbol means "equal or subtype of"). The two "sides" of a type
constraint are called constraint variables. A program is type-correct
when all type constraints are satisfied.
In addition to preserving type correctness, type constraints are used
to preserve the program's behavior. For example, type constraints between
parameters of overriding and overridden methods ensure that the overriding
relationship is preserved. Without such constraints, the behavior of dynamic
dispatch may change, thus changing the program's behavior. Context Variables
In the presence of generics, a type of a program entity is not necessarily
fixed, and may depend on the context in which the entity is referenced.
For example, consider the following class declaration:
class ArrayList<E>{
...
E get(int idx){ ... }
}
The return type of ArrayList.get is not the same for all
callers (as it would be in the absence of generics). For example, if the
method is called via a reference of type ArrayList<String> ,
the return type is String , if via ArrayList<Integer> ,
the return type is Integer , etc.
In the parameterization problem, the placement of formal type parameters
is initially unknown. The program entities whose types depend on the context
are also initially unknown. Our algorithm uses context
variables, a special form of constraint variables in which it is
possible to express that a variable may or may not be given a type parameter.
For example, consider the following call to an initially unparameterized
UnionFind :
uf.union(x1, x2);
The type constraints generated for this call use context variables.
[x1] ≤ [Param(1,union)](uf)
[x2] ≤ [Param(2,union)](uf)
The type constraint [x1] ≤ [Param(1,union)](uf)
expresses the following: "the type of x1 must be
equal or a subtype of the type of the first parameter of method union ,
in the context of reference uf ". The right-hand side of the
constraint is a context-variable. If UnionFind is not parameterized,
the context is irrelevant.
Solving Type Constraints
Because of dependencies, classes must be parameterized in groups. For
example, if the UnionFind class has a member class to iterate over its
elements, both classes must be parameterized in order to parameterize
UnionFind. It is impractical, and often impossible (because of mutual
dependencies) to parameterize classes in the order of dependencies. Some
classes, such as abstract classes and interfaces, cannot be parameterized
on their own.
Our solving algorithm enables parameterization of multiple interdependent
classes at once. Our constraint solver uses a non-standard step that propagates
type parameter information between classes. This allows parameterization
of interdependent classes.
The constraint solving algorithm works as follows. Every variable in
the solver has an assigned type estimate. Type estimate for a variable
is a set of types that the variable may legally have. Type estimates contain
non-parametric types, type parameters and wildcard types. As solving progresses,
type estimates are narrowed, according to the type constraints. When a
type estimate for a context variable is narrowed to contain only a type
parameter, then the main, non-context part of the variable is assigned
a (new) type parameter. This is necessary to preserve type correctness
of the resulting program and allows parameterization of interdependent
classes.
At the end of solving, type estimates for all variables are singleton
sets. Those sets do not contain parametrized types. For example, there
may be a variable [uf] whose estimate is {UnionFind} ,
a variable [Param(1,union)] with type estimate {T}
(singleton of type parameter). To assign parametric types, the algorithm:
i) finds formal type parameters for classes and, ii) finds actual type
parameters for every formal type parameter in every context. For example,
given the estimates above, UnionFind would be parameterized
as UnionFind<T> . To find the actual type parameter
for uf, the estimate of context variable [Param(1,union)](uf) is
used. For example, if the last estimate is {String} , the
the declaration of uf is parameterized as UnionFind<String>
uf;
Results
We implemented our technique as a refactoring in Eclipse. We evaluated
its effectiveness by parameterizing selected classes from 6 already-generic
and 2 non-generic libraries. Parameterizing already-generic classes allows
direct comparison of the results automatically computed by our analysis
with those that were created by a human expert (i.e., the libraries' authors).
For our experiments, we parameterized 52 classes from the selected libraries.
Then, we manually compared the results with the expert-made parameterization.
In 68% of cases, our tool performed equally well (i.e., the parameterization
was exactly the same or equivalent). In 19% of cases, our tool perfomed
worse than the human (i.e., the inferred parameterization, though correct,
was not what a skilled programmer would have done). In 13% of cases, our
tool performed better than the human expert (i.e., the automatically-inferred
parameterization allowed for removal of more casts or more closely followed
the style of the standard collections).
Reseach Support
This work was supported in part by NSF grants CCR-0133580 and CCR-0234651,
DARPA contracts NBCH30390004 and FA8750-04-2-0254, and gifts from IBM
and Microsoft.
References
[1] A. Donovan, M. Ernst. Inference of generic types
in Java, MIT Technical Report, 2003
[2] D. Duggan. Modular type-based reverse engineering
of parameterized types in Java code, In The Proceedings of International
Conference on Object-Oriented Programming Systems, Languages and Applications
(OOPSLA) 1999
[3] A. Donovan, A. Kiezun, M. Tschantz, M. Ernst. Converting
Java Programs to use Generic Libraries, In The Proceedings of
International Conference on Object-Oriented Programming Systems, Languages
and Applications (OOPSLA) 2004
[4] R. Fuhrer, F.Tip, A.Kiezun, J.Dolby, M.Keller. Efficiently
refactoring Java applications to use generic libraries, In The
Proceedings of the European Conference on Object-Oriented Programming
(ECOOP) 2005
[5] D. von Dincklage, A. Diwan. Converting Java
Classes to Use Generic, In The Proceedings of International Conference
on Object-Oriented Programming Systems, Languages and Applications
(OOPSLA) 2004 |
|