MIT Artificial Intelligence Lab
October 24, 2002
Mark A. Foltz   mfoltz@ai.mit.edu
Thesis Committee Meeting
Hello, today I am going to talk about my progress on my thesis research on the relationship between software diagramming and software refactoring.
My agenda is to first briefly describe the problem I’m working on and my approach to it. Then I will talk about refactorings in Dr. Jones – what Dr. Jones knows about them and what space of refactorings Dr. Jones will incorporate.  Then I’ll illustrate a scenario of multiple refactorings in Dr. Jones and end with a status update. 
First I’ll describe the problem I’m attacking.
That problem is the complexity of redesigning software.
When thinking about design, programmers prefer diagrams that abstract from the details in source code. However, current tools provide diagrams like the one on the left that quickly become as complicated as the program they’re trying to depict. The programmer can try to filter out what she doesn’t want to see, but usually it’s easier for her to pick up a pen and paper and redesign by drawing only the parts she wants to see (like the diagram on the right). Unfortunately, the computer can’t help the programmer when she redesigns with pen and paper, and I believe that it should.
I’d like to bring the computer back into this process by starting with two observations.
 First, programmers’ pen and paper diagrams are task-relevant – they draw the parts of the program they want to change and the dependencies that are involved, and leave out the rest. 
Second, there’s a growing body of commonly used design moves called refactorings – local, structural changes to the program that involve a few of its related parts. 
These observations led me to the thesis that if a diagramming tool understood the refactorings the programmer wanted to make, it could
(1) draw relevant, task-specific diagrams and
(2) use those diagrams to help the programmer interactively explore the program’s design.
Stepping back for a bit, a redesign tool could potentially support the user in a number of roles, because refactoring is a multi-step process. First, the tool can give the user a visual representation where it’s easy to spot opportunities for refactoring. Second, the tool can diagnose design problems by looking for ‘bad smells’ or ‘antipatterns’, known patterns of design weakness like code duplication. Third, the tool can show the user the results of proposed refactorings, allowing them to chain together multiple refactorings to visualize new designs. And finally, the tool can verify that the refactorings preserve behavior and implement them by changing the source code. Choosing the actual refactorings, which falls between steps two and three in this process, is the most difficult step and remains up to the user.
 
Dr. Jones addresses the first and third roles listed here.
The metaphor is that of a fellow programmer who knows the program you’re refactoring (although not what it does), can draw accurate diagrams of it, and can give you guidance while refactoring. It innovates by decoupling the steps of planning and implementing the refactorings; current tools transform the source immediately when the user makes a refactoring decision.
I see three main contributions resulting from this research. 
The motivating contribution is Dr. Jones, the tool I am developing, which requires two main innovations. The first innovation is a knowledge base of refactorings for Java programs, built from the perspective of a tool that assists the user in visual design exploration. The second innovation is a mechanism for keeping the contents of software diagrams relevant across multiple refactorings by tracking the focus of refactoring attention. The rest of this talk will focus on my progress towards realizing the first innovation.
First I will describe what Dr. Jones knows about each individual refactoring for Java.
Briefly, a refactoring is a structural change to a program that improves its design, without changing its visible behavior. Common examples are moving a method, generalizing classes to a base class, and encapsulating methods into a delegate.
Let’s look at the move method refactoring in detail, to illustrate what Dr. Jones knows about a typical refactoring. Suppose we decide, in order to reduce coupling, that miles per gallon is really a property of a vehicle and not of its engine. We refactor by moving the method to Vehicle, and leaving a skeleton method behind in Engine that delegates to the new location.
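As a rough illustration of the end state of this move, here is a minimal Java sketch. Only the class names Vehicle and Engine and the idea of a delegating skeleton come from the talk; the fields milesDriven and gallonsUsed, the constructors, and getEngine() are assumptions made up for the example.

```java
// After "move method": milesPerGallon() now lives in Vehicle.
// The fields milesDriven and gallonsUsed are hypothetical, for illustration only.
class Vehicle {
    private final double milesDriven;
    private final double gallonsUsed;
    private final Engine engine;

    Vehicle(double milesDriven, double gallonsUsed) {
        this.milesDriven = milesDriven;
        this.gallonsUsed = gallonsUsed;
        this.engine = new Engine(this);
    }

    // New home of the method body.
    double milesPerGallon() {
        return milesDriven / gallonsUsed;
    }

    Engine getEngine() {
        return engine;
    }
}

class Engine {
    private final Vehicle vehicle;

    Engine(Vehicle vehicle) {
        this.vehicle = vehicle;
    }

    // Skeleton left behind: existing callers of Engine.milesPerGallon()
    // keep working because the call is forwarded to the new location.
    double milesPerGallon() {
        return vehicle.milesPerGallon();
    }
}
```

The delegating skeleton is what lets the move be planned without immediately rewriting every caller of the old method.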
What would Dr. Jones need to know to help me plan this refactoring? 
 
In Dr. Jones, I represent a refactoring by four pieces of knowledge. 
First, what are the obvious reasons not to perform the refactoring (the guards). 
Second, how does the refactoring change Dr. Jones’ representation of the program design and thus what is shown in its diagrams.  Third, does the refactoring suggest other refactorings that are likely to improve the program design.  And finally, where are the places in the source that might have to be changed to implement the refactoring. I’ll now examine these four pieces of knowledge in detail for the move method refactoring.
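One way to picture these four pieces of knowledge is as a four-method interface. This is only a sketch: the interface, its method names, and the DesignModel placeholder are hypothetical and are not Dr. Jones’ actual representation, which is specified in a semi-formal language.

```java
import java.util.List;

// Hypothetical placeholder for Dr. Jones' representation of the program design.
class DesignModel { }

// Hypothetical interface; one method per piece of knowledge described above.
interface Refactoring {
    // 1. Guards: obvious reasons not to perform the refactoring (empty if none).
    List<String> checkGuards(DesignModel design);

    // 2. Design impact: update the design representation shown in the diagrams.
    void applyTo(DesignModel design);

    // 3. Suggestions: follow-up refactorings likely to improve the design.
    List<String> suggestFollowUps(DesignModel design);

    // 4. Source guidance: places in the source that may need to change later.
    List<String> sourceChangeSites(DesignModel design);
}
```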
First, Dr. Jones can check for guards: obvious reasons that one wouldn’t want to do the refactoring. In this case, Dr. Jones can check for name conflicts, and that you’re not trying to move a constructor. Also, a different set of rules applies for move method if the source and target classes are related by inheritance. Note that these ‘guards’ don’t completely check that the refactoring preserves the program’s behavior (since that would involve much more difficult analyses). Rather, these are more like sanity checks to help the programmer avoid refactoring mistakes.
(Dr. Jones can remind the programmer to check the more difficult safety conditions when it’s time to implement the refactorings. This runs the risk, however, of allowing the programmer to plan unsafe refactorings with Dr. Jones.)
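The two guards named above for move method could be sketched in Java roughly as follows. The class name, method signature, and message strings are assumptions for illustration, and the name-conflict check is simplified to method names only (a real check would presumably compare full signatures).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical guard check for "move method"; names and signatures are illustrative.
class MoveMethodGuards {
    // Returns the reasons not to perform the move; an empty list means no guard fired.
    static List<String> check(String methodName, boolean isConstructor,
                              Set<String> targetMethodNames) {
        List<String> problems = new ArrayList<>();
        if (isConstructor) {
            problems.add("Constructors cannot be moved.");
        }
        // Simplification: compares names only, not parameter types.
        if (targetMethodNames.contains(methodName)) {
            problems.add("Target class already declares " + methodName + ".");
        }
        return problems;
    }
}
```

These checks are sanity checks in the sense described above: passing them does not establish that the move preserves behavior.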
The impact on the design representation is straightforward (as we saw a few slides ago in the move method example). Dr. Jones copies the method signature from the source to the target and notes that the old method delegates to the new location.
Dr. Jones can make several design suggestions even for this seemingly straightforward refactoring. If the method is polymorphic in the source hierarchy, it’s likely that the programmer will want to express that polymorphism on the target hierarchy. So we leave to-dos for the programmer to move the overriding and overridden methods to appropriate places in the target hierarchy. Also, if the method is overloaded with methods of the same name but different signatures, then the programmer might want to move those as well. Finally, if the method uses fields or methods in the source class, the programmer will need to provide access to them (e.g., by encapsulating those fields).
Finally we can give some guidance to the programmer when he is ready to tackle the source, like moving the method body to the target, implementing the delegation in the source, and converting the uses of source members.  (Dr. Jones builds a cross reference of which methods use which other fields and methods for use in these last two steps.) Dr. Jones decouples the choices of the refactoring steps to take, and the actual manipulation on the source to implement the refactorings. In this way multiple alternatives can be more easily explored, and the hard work of implementing the refactorings undertaken once an alternative is chosen.
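The cross reference mentioned above can be pictured as a map from each method to the members it uses. The following Java sketch is an assumption about one possible shape for such a structure, not a description of Dr. Jones’ internals; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical cross reference: which methods use which fields and methods.
class CrossReference {
    private final Map<String, Set<String>> usesByMethod = new TreeMap<>();

    // Record that `method` reads, writes, or calls `member`.
    void recordUse(String method, String member) {
        usesByMethod.computeIfAbsent(method, m -> new TreeSet<>()).add(member);
    }

    // Members of the source class that a moved method would still need access to.
    Set<String> membersUsedBy(String method) {
        return usesByMethod.getOrDefault(method, Collections.emptySet());
    }

    // Call sites or users that may need updating when `member` changes.
    Set<String> methodsUsing(String member) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : usesByMethod.entrySet()) {
            if (entry.getValue().contains(member)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```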
Dr. Jones uses these four pieces of knowledge together to play its role as a diagramming assistant. 
I’d like to compare Dr. Jones’ knowledge about refactoring to that of another research project, the Smalltalk Refactoring Browser developed at UIUC. The Refactoring Browser was primarily concerned with giving the user a safe and reliable tool: the user could trust it to know when a refactoring is behavior-preserving, and if so to transform the source correctly. Dr. Jones, on the other hand, has knowledge that will give the user visual feedback on the new designs generated by refactoring, prevent bad refactorings, and suggest ‘follow-up’ refactorings. These kinds of knowledge haven’t been explicitly considered before in a refactoring tool, and I believe my specification of them represents a contribution to refactoring research. I also believe these two bodies of knowledge are complementary, and a tool that integrates, for example, design diagnosis, design exploration and source transformation would be a more complete solution and a fruitful direction for future work.
Next I am going to give an overview of what refactorings are included in the knowledge base.
The KB is structured around a set of refactoring verbs that can be applied to the major program elements in Java. This vocabulary was motivated by the desire to have an economical number of actions that the user can apply to elements of the diagram, instead of a flat list that would have to be learned and remembered.
The vocabulary also sets up a space of possible refactorings whose cases can be filled in for a specific language (in this case Java).
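The verb-by-element structure can be sketched as a small table in Java. The enum values below are limited to verbs and elements that are mentioned somewhere in this talk and are not the full vocabulary; the RefactoringSpace class and its methods are hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Partial, illustrative vocabulary; the actual verb and element lists are on the slide.
enum Verb { MOVE, ENCAPSULATE, HIDE }
enum Element { PACKAGE, CLASS, FIELD, METHOD, PARAMETER }

// Hypothetical table of which (verb, element) cases have a KB entry.
class RefactoringSpace {
    private final Map<Verb, EnumSet<Element>> entries = new EnumMap<>(Verb.class);

    // Mark a (verb, element) case as specified, i.e. a check mark in the table.
    void specify(Verb verb, Element element) {
        entries.computeIfAbsent(verb, v -> EnumSet.noneOf(Element.class)).add(element);
    }

    boolean isSpecified(Verb verb, Element element) {
        return entries.getOrDefault(verb, EnumSet.noneOf(Element.class)).contains(element);
    }
}
```

Filling in the table for a different language would mean calling specify() for a different set of cases while keeping the same vocabulary.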
I have specified entries in the KB for each of these check marks in a semi-formal language. 
Most of the missing marks are cases that don’t make sense in Java, [next slide]
for example, hiding a package.
Creating and removing fields and methods aren’t included because they don’t seem to be in the spirit of behavior preservation, and adding new functionality is a separate concern. The intention is to let programmers naturally express typical sequences of refactorings they would use in practice.
We can compare the coverage of Dr. Jones to the catalog in Fowler’s 1999 book and a leading refactoring CASE tool. Although we’re comparing apples and oranges, in terms of expressiveness, Dr. Jones covers a significant fraction of Fowler’s refactorings collected from practice and more than a source-transformation-only CASE tool.
To bring the two main parts of the talk together, I’ll present a scenario that shows Dr. Jones’ body of knowledge in action.
We start with a class that keeps a calendar of appointments.
The information for each appointment is kept in array fields (one for the date, one for the start time, etc.). Appointments come in three types: regular, to-do, and all-day (indicated by a numeric type code). The programmer would like to refactor this to create an extensible abstraction for an Appointment.
The first step is to encapsulate the array fields into a new Appointment class.
Dr. Jones would ask the user to name the new class, and to choose a container for the aggregation (in this case an array). It would then change the program representation as necessary and diagram the new design, including replacing the multiple aggregation edges with a single new one.
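A minimal Java sketch of the design after this step might look like the following. The field names and types (date as a String, startTime as an int), the add() signature, and the size() and get() helpers are assumptions; the talk specifies only that parallel array fields become a single array of Appointment objects.

```java
import java.util.Arrays;

// Before (sketch): Calendar kept one array per attribute, for example
//   String[] dates;  int[] startTimes;
// After "encapsulate fields": one Appointment object per entry.
class Appointment {
    final String date;      // field types are assumptions for illustration
    final int startTime;

    Appointment(String date, int startTime) {
        this.date = date;
        this.startTime = startTime;
    }
}

class Calendar {
    // The single aggregation that replaces the multiple array fields.
    private Appointment[] appointments = new Appointment[0];

    // add() still takes separate parameters at this point in the scenario.
    void add(String date, int startTime) {
        appointments = Arrays.copyOf(appointments, appointments.length + 1);
        appointments[appointments.length - 1] = new Appointment(date, startTime);
    }

    int size() {
        return appointments.length;
    }

    Appointment get(int index) {
        return appointments[index];
    }
}
```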
Now that we have an Appointment class, it makes sense to replace the multiple parameters to add() with a single Appointment parameter.
The user does this with the encapsulate parameters refactoring.
Since Dr. Jones knows where the add() method is called in the original program, it can tell the programmer where to change the calling syntax later.
Now the user would like to make the appointment-type-specific behavior explicit in the class hierarchy. The user prepares for this by moving the type code fields to the Appointment class. [ include something about moving methods in Calendar here, or come up with an example. ]
The user can then encapsulate the type codes of Appointment into subclasses.
Things look pretty good, until she realizes that the user of the calendar might want to change the type of an appointment.
Objects can’t change class in Java, so this creates a problem.
Here she can use Dr. Jones’ ability to explore alternatives to back up and try a different refactoring.
Encapsulating the type codes in a separate class avoids this problem.
An Appointment can change its type dynamically by reassigning its AppointmentType instance.
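The alternative design can be sketched in Java as follows. Only the names Appointment and AppointmentType and the three appointment kinds come from the talk; the subclass names, the label() method, and setType() are hypothetical.

```java
// Type-specific behavior lives in a separate hierarchy (names are illustrative).
abstract class AppointmentType {
    abstract String label();
}

class RegularType extends AppointmentType {
    String label() { return "regular"; }
}

class ToDoType extends AppointmentType {
    String label() { return "to-do"; }
}

class AllDayType extends AppointmentType {
    String label() { return "all-day"; }
}

class Appointment {
    private AppointmentType type;

    Appointment(AppointmentType type) {
        this.type = type;
    }

    // The object keeps its class; only the referenced type instance changes.
    void setType(AppointmentType type) {
        this.type = type;
    }

    String describe() {
        return type.label();
    }
}
```

With subclasses of Appointment, changing the type would require constructing a new object; here the same Appointment simply points at a different AppointmentType.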
We can compare this design to the original to see the improvement [flip slides].
 
[ slide copied here to illustrate improvement in design ]
Before concluding, I want to bring the talk back to the issue that got me started looking at refactoring: keeping the diagrams simple and relevant while redesigning software. This is the goal of focus tracking, and I see it as one of Dr. Jones’ major payoffs to the user. The KB I’ve described will be a major component of the focus tracking mechanism, since it knows what program elements are involved in the current refactoring and which ones are likely to be further refactored. The focus tracking mechanism will use this information to render the elements at appropriate levels of detail, shown here. For instance, it could provide some historical context by showing elements refactored in the past at a low level of detail.
The currently refactored elements will get the highest level of detail.
Likely future refactorings will also get more detail, but since we can’t predict the user’s next actions exactly, the drop off is quick.
The specifics of this mechanism remain future work in my research.
Today I’ve presented an overview of my research progress on Dr. Jones, an interactive refactoring tool for Java programs. I’ve described the four kinds of design exploration knowledge I have specified for each of Dr. Jones’ 50 refactorings. And I’ve also described a scenario that will drive the next phases of my research.
The tasks I’ve completed are the specification of 50 refactorings for Dr. Jones in a semi-formal language, and the infrastructure to analyze and flexibly diagram existing Java programs. Next I plan to implement the set of refactorings used in the scenario and understand what focus tracking would be for the scenario. This in turn will drive work on a more general focus tracking mechanism, and the implementation of the remaining refactorings I have specified.  In the final phase of my research I want to evaluate Dr. Jones by obtaining user reactions and feedback.
Before breaking for discussion I’ll give you a tour of the prototype’s current capabilities.