Hello, I am Tracy Hammond from the MIT Artificial Intelligence Laboratory. I am here to talk to you today about an agent-based system for capturing and indexing software design meetings.  This project is a joint project between the Design Rationale Group and the Intelligent Room.  Other members working on this project include Krzysztof Gajos, Randall Davis, and Howard Shrobe.
Software design meetings occur often in the Intelligent Room and in many other spaces.  During these meetings users discuss design.  The white board plays a large part in these meetings: designers draw sketches on it to aid in brainstorming ideas, visualizing program organization, and understanding requirements.
Both the interaction between the members of the room and their sketches on the white board contain a large amount of design history.
We can capture this design history by videotaping the members of the room and the sketches on the white boards.
However, this video can be long.  What if we have a question about how a particular class came into being and want to watch the part of the design session where that class was developed?
Our solution is to use sketch recognition to index the design meeting videotape. 
We have created Tahuti.
Tahuti allows users to sketch UML-type diagrams on a white board just as they would naturally. UML diagrams are an industry-standard way to represent object relationships in an object-oriented system.
The only differences are that the images are projected onto the white board and that users draw with a digital pen rather than a marker.  We then interpret these sketches using sketch recognition.
Others have made sketch recognition systems to recognize UML diagrams.  Our system is unique first because it recognizes diagrams by shape.  This gives a high recognition rate without requiring users to draw objects in a particular order.  Other systems impose drawing requirements such as stroke order, or use features that are too poorly correlated with shape, such as stroke length, causing many false positives.
It’s also unique because the system is made for designing agents: the UML class diagram library of shapes has been extended to include shapes facilitating agent design.
The sketches are recognized based on shape rather than on features requiring a particular stroke order or direction, to provide the maximum amount of drawing freedom. Because we are working with digital ink, the sketches can be edited.  And because the sketches are recognized, drawn objects can be moved naturally as a whole, affecting other objects as appropriate.  For instance, here we drag an object and the relationship stays attached.
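To make the idea concrete, here is a toy sketch of how a closed glyph can be classified by shape alone. This is not Tahuti's actual recognizer; all names are hypothetical, and the test simply measures how well the glyph's points fit the ellipse inscribed in its bounding box, so stroke order and direction never enter into it.

```java
import java.awt.geom.Point2D;
import java.util.List;

/** Toy shape-based classifier (hypothetical, for illustration only):
 *  labels a closed glyph as a RECTANGLE or an ELLIPSE from its points
 *  alone, so stroke order and direction never matter. */
public class ShapeClassifier {

    public enum Shape { RECTANGLE, ELLIPSE }

    public static Shape classify(List<Point2D> points) {
        // Bounding box of all points, regardless of drawing order.
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (Point2D p : points) {
            minX = Math.min(minX, p.getX()); maxX = Math.max(maxX, p.getX());
            minY = Math.min(minY, p.getY()); maxY = Math.max(maxY, p.getY());
        }
        double cx = (minX + maxX) / 2, cy = (minY + maxY) / 2;
        double rx = (maxX - minX) / 2, ry = (maxY - minY) / 2;

        // Mean deviation of the points from the inscribed ellipse:
        // near zero for a drawn circle, large for a rectangle whose
        // corners bulge far outside the ellipse.
        double err = 0;
        for (Point2D p : points) {
            double nx = (p.getX() - cx) / rx;
            double ny = (p.getY() - cy) / ry;
            err += Math.abs(Math.hypot(nx, ny) - 1.0);
        }
        return err / points.size() < 0.08 ? Shape.ELLIPSE : Shape.RECTANGLE;
    }
}
```

A real recognizer would of course test many more templates and reject poor fits entirely; the point here is only that nothing in the computation depends on how the strokes were drawn.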
(maybe show 5 second video of drawing one shape)
Because the diagrams are interpreted and understood, we can further process the drawn diagram by generating base code, saving time for the user.  Here we see the code generated by Rational Rose, which we have connected directly to the Tahuti sketch recognition agent.
 
Finally, as drawn objects are recognized we can signal events.  These events are time-stamped and can be mapped to the meeting’s videotape.
The Tahuti sketch recognition agent recognizes several drawn shapes.  The system recognizes UML class diagram symbols, including a general class, drawn as a square; an interface class, drawn as a circle; an interface association, drawn as a line connecting a general class to an interface class; a dependency association, drawn as an arrow with an open arrow head; an inheritance association, drawn as an arrow with a triangle arrow head; and an aggregation association, drawn as an arrow with a diamond arrow head.
The system was built for software design within the Intelligent Room of the MIT AI Lab.  Intelligent Room applications are built from agent technology, run on the Metaglue agent infrastructure, and frequently have speech interfaces to them.  Thus, we added two additional symbols to aid in describing agent-based software: an agent class, drawn as a double-edged rectangle, and a speech grammar, drawn as a triangle or pentagon.
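As a compact summary, the full symbol vocabulary just described could be tabulated in code like this (the enum and its names are mine, not taken from Tahuti's source):

```java
/** The symbol vocabulary described above, mapping each UML element to
 *  the shape Tahuti expects the user to draw. */
public enum UmlSymbol {
    GENERAL_CLASS("square"),
    INTERFACE_CLASS("circle"),
    INTERFACE_ASSOCIATION("line from a general class to an interface class"),
    DEPENDENCY_ASSOCIATION("arrow with an open arrow head"),
    INHERITANCE_ASSOCIATION("arrow with a triangle arrow head"),
    AGGREGATION_ASSOCIATION("arrow with a diamond arrow head"),
    AGENT_CLASS("double-edged rectangle"),
    SPEECH_GRAMMAR("triangle or pentagon");

    public final String drawnAs;
    UmlSymbol(String drawnAs) { this.drawnAs = drawnAs; }
}
```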
Here we show an example of a sketched diagram designing a simple fan agent.  I will switch to the interpreted view before explaining this diagram since it is easier to read.
Here is the cleaned-up view of the diagram drawn on the previous slide.
The Fan Agent inherits from the X10 Device Agent, which specifies an interface for any device that can be powered on and off.   It has a Fan Speech Agent that answers commands, such as “start the fan.”  These commands are defined within the fan speech grammar.  The Fan also has a Fan GUI Agent so that the fan may also be controlled using a screen application. The FanGUIMaker is a general class defining how to create the Fan GUI. The other agents are part of the Metaglue system, and the relationships drawn need to exist in order for everything to work properly in the room.
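For a sense of what the generated base code mentioned earlier might look like for this diagram, here is a hedged sketch; the names come from the example diagram, but actual Rational Rose output would differ in detail.

```java
/** Hypothetical skeleton of the kind a generator might emit for the
 *  fan diagram; real Rational Rose output would differ in detail. */
class X10DeviceAgent {            // any device that powers on and off
    public void powerOn()  { }
    public void powerOff() { }
}

class FanGUIAgent { }             // stub for a depended-on agent

public class FanAgent extends X10DeviceAgent {
    private FanGUIAgent fanGUI;   // dependency arrow becomes a field
}
```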
When a user draws an agent class, there is actually more happening than meets the eye.  The agent class is actually an abstraction for a more complicated relationship. In reality, an agent’s implementation is always accompanied by an interface and the inheritance structure of interfaces usually parallels that of agents.  In our sketches we omit the interfaces.
The interactions among agents involve a complex pattern of proxy objects and helper classes, which we also omit in our sketches.
The dependency arrow between two general classes means that one class uses, or relies on, another class.  The dependency arrow between two agent classes means much the same thing: it implies there is communication between the two classes, originating from the tail of the arrow.  The GUI Manager-Stub shown in the picture is a general class that is automatically generated for RMI communication so the agents can talk over the network.
Finally, reliance on a grammar always implies use of a special proxy agent and interaction with Metaglue’s speech facilities. 
The left displays the underlying object-oriented design.  But when designers build new speech objects, they don’t want to think about the structure on the left; they would prefer to worry only about what is listed on the right.
During the recognition process, events, such as addition, deletion, movement, and text annotation of drawn objects are processed and recorded.
We then associate the timestamps of the events with the video and use them to index the videotape, audio, or screen capture of the design meeting.
This allows us to ask such questions as, what was the meeting discussion when this class was added? The system could then provide us with the appropriate videotape segment.
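A minimal sketch of this indexing step, assuming the recording and the sketch events share a clock (the class and method names here are hypothetical, not Tahuti's actual API):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Minimal event-to-video index, assuming the recording and the
 *  sketch events share a clock. Hypothetical names, not Tahuti's. */
public class MeetingIndex {

    public record SketchEvent(String description, Instant when) {}

    private final Instant recordingStart;
    private final List<SketchEvent> events = new ArrayList<>();

    public MeetingIndex(Instant recordingStart) {
        this.recordingStart = recordingStart;
    }

    public void record(SketchEvent e) {
        events.add(e);   // time-stamped as drawn objects are recognized
    }

    /** Offset into the video where the event occurred, e.g. the
     *  moment a particular class was added. */
    public Duration videoOffset(SketchEvent e) {
        return Duration.between(recordingStart, e.when());
    }
}
```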
Designers may also want to ask the more general question “How did we design this system?”  We would like to present the designer with a visual description of how the scene evolved.  We don’t want to show the designer all of the significant events; rather, we want to select a small number of snapshots that, when combined, best display the evolution of the design.  We want to select the most significant events to show to the user, and the most revealing snapshot related to each of those events.  All significant events are given a rank.  For instance, creation of an agent is given the highest rank of all sketched objects, a rank of 10.
The logic behind the initial ranking is as follows. The final event is always ranked the highest.  The designer selected significant events outrank computer selected significant events.  Creation of viewable objects is considered a more significant event than the updating or movement of that object.  Creations of objects that no longer exist in the final version are considered to be much less significant than those that remained throughout the entire process. 
Within a particular category (e.g., looking only at the Creation of Agent Events), events are again ranked as more or less significant. Events that affect more objects have a higher ranking. Events specifying the creation of agents, classes, and grammars are further differentiated by the number of associations attached to them, adding to the rank the total number of associations divided by 100.
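Putting the stated rules together, a hedged sketch of the ranking computation might look like this. The rank of 10 for agent creation and the associations-divided-by-100 term come from the text; the remaining weights are purely illustrative.

```java
/** Hedged ranking sketch. The rank of 10 for agent creation and the
 *  associations/100 term come from the text; other weights are
 *  illustrative. The final event is always chosen by default, so it
 *  needs no rank bonus here. */
public class EventRanker {

    public enum Type { CREATE_AGENT, CREATE_CLASS, CREATE_GRAMMAR,
                       CREATE_ASSOCIATION, UPDATE, MOVE }

    public static double rank(Type type, int attachedAssociations,
                              boolean survivesToFinalVersion,
                              boolean designerSelected) {
        double primary = switch (type) {
            case CREATE_AGENT       -> 10;  // stated in the text
            case CREATE_CLASS       -> 9;   // illustrative
            case CREATE_GRAMMAR     -> 8;   // illustrative
            case CREATE_ASSOCIATION -> 7;   // illustrative
            case UPDATE, MOVE       -> 3;   // creations outrank edits
        };
        if (!survivesToFinalVersion) primary -= 5;  // deleted later on
        if (designerSelected)        primary += 20; // user choice wins
        // Secondary ranking: attached associations, divided by 100.
        return primary + attachedAssociations / 100.0;
    }
}
```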
A designer may want to see the three most significant sketch events of a software design session.
In this picture the drawn objects are ranked.  The number at the left of the dash shows the order in which the object was drawn.  The middle number, to the right of the dash and the left of the decimal point, gives the primary ranking of the object based on the event type.  The third number gives the secondary ranking based on how related the object is to the other objects in the diagram.  When the top three most significant events are chosen, we choose the final event by default plus the two with the highest ranking, which are marked by stars.
When the most significant events are chosen, the screen shot associated with each event is not the screen shot at the moment that event occurred, but rather the screen shot of the moment before the next significant event.  The next significant event is defined to be the next event with a primary ranking greater than or equal to the lowest primary ranking in the list of most significant events.  This allows any smaller additions, such as text or movement, to be included in the snapshot.
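A small sketch of that rule, assuming a timeline of events in drawing order (the types here are hypothetical, not the real Tahuti implementation):

```java
import java.util.List;

/** Boundary rule sketch with hypothetical types: for a chosen event,
 *  the screenshot is the frame just before the next event whose
 *  primary rank meets the threshold. */
public class SnapshotPicker {

    public record Event(int order, double primaryRank) {}

    public static int snapshotBoundary(List<Event> timeline,
                                       Event chosen,
                                       double lowestChosenPrimary) {
        for (int i = chosen.order() + 1; i < timeline.size(); i++) {
            if (timeline.get(i).primaryRank() >= lowestChosenPrimary) {
                return i;   // snapshot = moment just before this event
            }
        }
        return timeline.size();  // no later major event: final frame
    }
}
```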
In this case the lowest primary ranking was that of an agent class.  Thus the screenshots presented will contain all of the relationships and general classes drawn before the next agent class is drawn.
Here we have the first two screen shots describing the design history; the third one is the final screen shot.  Notice that the second diagram includes relationships and classes drawn after the significant event and before the next agent, since the lowest ranking chosen was that of an agent class.
If we asked to see the three most significant screen shots showing the design history, we would see the two screen shots displayed here as well as the final screen shot.
The system not only allows users to design agent software, but is also built up from agents itself.
The system runs on the Metaglue agent architecture installed in the MIT AI Lab Intelligent Room and in various other places.  The system is composed of several agent components that all interact with one another.  The Design Meeting Manager Agent is a general agent that inherits from the Meeting Manager Agent and starts the Tahuti Agent, the Tahuti Speech Agent, and the meeting capture devices.  The Meeting Manager Agent is an agent for general meetings and can be used to control the agenda as well as things like the lights and the fan.  The Tahuti Agent controls the sketch recognition and processes the sketch-related events.  The Tahuti Speech Agent allows users to command the application by voice.  Meeting capture devices, such as a video agent, audio agent, and/or screen capture agent, may be present in the room.  The Metaglue agent system will start the most appropriate capture device in the room to record the meeting.
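In plain-Java stand-ins (not the real Metaglue API), that composition looks roughly like this:

```java
/** Hedged sketch of the agent composition described above, using
 *  plain Java stand-ins rather than the real Metaglue API. */
class MeetingManagerAgent  { /* agenda, lights, fan, ... */ }
class TahutiAgent          { /* sketch recognition and events */ }
class TahutiSpeechAgent    { /* voice commands */ }
class CaptureAgent         { /* video, audio, or screen capture */ }

public class DesignMeetingManagerAgent extends MeetingManagerAgent {
    // Started when a design meeting begins.
    private final TahutiAgent       tahuti  = new TahutiAgent();
    private final TahutiSpeechAgent speech  = new TahutiSpeechAgent();
    private final CaptureAgent      capture = new CaptureAgent();
}
```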
In our research group we think of agents as a programming abstraction that extends the object-oriented programming style.  Agents are differentiated from other objects in that we think of them as independent entities that do one task well.  Agents also have communication with other agents at the forefront of their design.  An agent also has a notion of where it is running and can use this information to its advantage.
The Metaglue Agent Architecture has been built with several advantages in mind.  For more information, I invite you to look at the Metaglue website.
It is built to support synchronous and asynchronous communication among distributed agents, providing fast communication across the network.
The Metaglue agent architecture provides mechanisms for resource discovery and management.  It holds a catalogue of all the agents available in a particular space.  In the software design meeting scenario described here, this is important because the application can query for the available meeting capture devices in the room and start the most appropriate one.  For instance, if a room has only audio recording available, it will start the audio recording; if a room has both audio and video recording available, it will start the video recording.
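The preference rule could be sketched like so. The types are hypothetical (in reality the Metaglue resource manager makes this choice), and ranking screen capture lowest is my assumption, since the text only ranks video above audio.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch of the capture-device preference rule (hypothetical types;
 *  in reality the Metaglue resource manager makes this choice). */
public class CapturePicker {

    // Ascending preference. Video beats audio per the text; ranking
    // screen capture lowest is an assumption.
    public enum Capture { SCREEN, AUDIO, VIDEO }

    public static Capture best(List<Capture> availableInRoom) {
        // E.g. a room offering AUDIO and VIDEO starts VIDEO recording.
        return availableInRoom.stream()
                .max(Comparator.naturalOrder())
                .orElseThrow(() -> new IllegalStateException(
                        "no capture device available in this room"));
    }
}
```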
The Metaglue agent architecture also provides robust recovery mechanisms for failed components.  If an agent fails, Metaglue will restart it.  For instance, if the video agent has failed for any reason, Metaglue will restart it.
Metaglue has built-in persistent storage.  Agents can save information to a database as they are running.  If an agent dies and is restarted, it can keep its old state by using the information stored in the database.
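In outline, the save-and-restore pattern looks like this, with an in-memory map standing in for Metaglue's real database-backed store (so, unlike the real system, this illustration would not itself survive a process restart):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Save-and-restore sketch. The in-memory map stands in for
 *  Metaglue's database-backed store, so unlike the real system this
 *  illustration would not itself survive a process restart. */
public class PersistentAgentState {

    private static final Map<String, String> STORE = new ConcurrentHashMap<>();

    /** Called periodically while the agent runs. */
    public static void save(String agentId, String state) {
        STORE.put(agentId, state);
    }

    /** Called when a failed agent is restarted, recovering old state. */
    public static String restore(String agentId) {
        return STORE.getOrDefault(agentId, "");
    }
}
```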
The Metaglue Agent Architecture also provides support for multimodal interactions through speech, gesture, and graphical user interfaces.  We saw some of this support in the previous diagrams.  For instance, to add speech to an agent, the user need only provide a grammar describing a set of expected utterances and a handler for speech input events.
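For a feel of what that takes, here is a hedged sketch: the grammar string is JSGF-style, and the handler interface is hypothetical rather than Metaglue's actual API.

```java
/** Hedged sketch of speech-enabling an agent: a JSGF-style grammar of
 *  expected utterances plus a handler for speech input events. The
 *  handler API here is hypothetical, not Metaglue's actual interface. */
public class FanSpeech {

    // Matches utterances such as "please start the fan".
    static final String GRAMMAR = """
            #JSGF V1.0;
            grammar fan;
            public <command> = [please] (start | stop) the fan;
            """;

    /** Invoked when an utterance matches the grammar. */
    public void onSpeech(String utterance) {
        if (utterance.contains("start")) {
            // fanAgent.powerOn();   // delegate to the FanAgent
        } else {
            // fanAgent.powerOff();
        }
    }
}
```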
A small user study performed at the MIT AI lab showed that users preferred drawing and editing software diagrams in Tahuti rather than a traditional paint application or a professional UML tool.
Tahuti was used in four classrooms at Columbia University this summer to help teach 65 students object-oriented programming.  The system was well received and appeared to aid in both initial program design and progressive program development, although a formal study was not done.  Even simply having a graphical picture of the program allowed students to maintain a clear picture of the program structure throughout the coding process.
Tahuti has been integrated into the MIT AI Lab Intelligent Room for use in software design meetings, including the design of agent-based systems.
Others have been interested in recognizing sketches of UML diagrams.  Hse developed a Wizard of Oz experiment and determined that users preferred a sketch-based tool to a mouse-and-palette tool.  Damm and others created a tool called Ideogramic UML, which recognizes UML class diagrams using a graffiti-like implementation.  Users are required to draw objects in a particular direction and in a single stroke, which is not always intuitive, since the required gestures do not always resemble the shapes they produce.  Queen’s University also developed a system for recognizing UML class diagrams in which recognition is based on stroke length compared to the drawn perimeter.  This could cause false positives, since the letter M could be recognized as a rectangle using this metric.
UML diagrams have been found to lack simple ways to describe agent-based technologies (Odell, Parunak, and Bauer, 2000).  Bergenti and Poggi (2001) have created a CAD system to input UML diagrams for agent-based systems.  The system requires designers to enter their diagrams using a rigid CAD interface rather than allowing designers to sketch as they would naturally.
Much research has been done on indexing audio-visual material (Brunelli, Mich, and Modena, 1996).  Researchers have attempted to label the video with salient features within the video itself, focusing on the recognition and description of color, texture, shape, spatial location, regions of interest, facial characteristics, and specifically for motion materials, video segmentation, extraction of representative key frames, scene change detection, extraction of specific objects and audio keywords.
While not much research has been done using sketch recognition to label and index a particular moment in video, a considerable body of work has been done using sketch recognition to find a particular moment in a pre-indexed video (Kato, Kurita, Otsu, and Hirata, 1992; Cho and Yoo, 1998; Jacobs, Finkelstein, and Salesin, 1995).
Future system enhancements include allowing the user to sketch more detail about a program.  For instance, we plan to add the ability to recognize multiplicity relationships, such as “many to one,” by interpreting annotations on associations.  We would also like to recognize other software diagram types, such as flow charts or sequence diagrams.  We also hope to integrate a code editor, allowing users to alternate between diagram view and code view, using existing software to reverse engineer the code into diagrams.
We present an agent-based system for capturing and indexing software design meetings.  Design meeting history is captured using available audio, video, and screen capture services in the environment.  Tahuti, a sketch recognition agent, recognizes UML-type sketches drawn during the software design meeting and produces significant events based on the sketches drawn.  These events are then used to index the videos and audiotapes for fast retrieval of specific information.  The system is composed of multiple agents and runs in Metaglue, a multi-agent software infrastructure that provides for seamless multi-modal interaction between the various agents of the system as well as the users.