Assignment 5
Refactoring the KnownSpace Hydrogen Kernel

CSCI B490/B469

October 14, 2002

Jason Baumgartner, Ailish Byrne, Mary Clegg, Chris Dent, Bob George, Nate Johnson, Allen Lee, Matt Liggett, Brian Mcgough and Xi Rao

KnownSpace Hydrogen Kernel Overview

The KnownSpace Kernel (The Datamanager Kernel Guide) is the heart of an application framework called KnownSpace (KnownSpace, 1999). KnownSpace's raison d'être is to be “an open, programmable, computational environment suitable for arbitrary data management applications so that anyone, anywhere, armed only with a bright idea and some time can build anything they choose—for free” (The KnownSpace Hydrogen API Design, 1999). Why might you want such a thing? Well, consider the following story:

 

Imagine what appears to be an unstructured jumble of data, like a card catalog that someone has tipped over. The data is out there and very interesting to you, but not organized in any obvious way.

Now, imagine all the little cross-references and Dewey Decimal numbers on the cards. Imagine that at each cross-reference, a tiny string is taped directly to the card, and if you follow that string, you come to the card it references! Naturally, you can also follow the string in the other direction.

Okay, so in front of you is this card-and-string wreckage. What a mess, right!? Well, imagine that to your left is a big library table stacked with spider-looking robots about six inches across. Grab one. Switch it on. Tell it that you're interested in Miyamoto Musashi. The thing hops off your palm and dives into the cards. It dances along the pieces of string until it finds a card it thinks you might like. When it does, it propels itself by little anus jets until it hovers in front of your face. It holds the card in front of you: Book of the Five Rings. Exactly what you're looking for!

Now, what else do you notice? Well, when your robot friend brought that card up to your face, he dragged some of the rest of the tangled, stringy morass with him. Inevitably, some other cards are dangling nearby—close to view—because they are attached by strings to this card. Off to the right you notice something about Hagakure. Yeah, that seems interesting. Dangling off to the left is a card that says something about Lone Wolf and Cub, and hanging below are cards mentioning Machiavelli and Sun Tzu. Those might be worth pursuing, too.

KnownSpace is an empty library floor and a library table. It's also an infinite stack of card catalog cards, an infinite spool of string, and an infinite reel of tape. As a bonus, you get a factory that makes anus-jet spider robots for free.

KnownSpace is written in Java, and the Kernel supports a small set of basic operations.

 

Bits of data in KnownSpace are called Entities. The relationships among Entities are specified using directional links: one Entity (called the Base) points at another Entity (called the Attribute) in order to specify some relationship. Entities may have as many attributes as they like.

All the existent Entities are stored in a Pool. In order to search for Entities, the Pool has a search() method. This method takes a Constraint, an extensible way to specify a query against the Pool of Entities.
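A minimal sketch of the model just described: an Entity keeps directed links to its Attributes (and, so the string can be followed backwards, to its Bases), and a Pool answers search() by testing each Entity against a Constraint. Only the names Entity, Pool, search(), Constraint and the describes() method discussed later come from the kernel; the class bodies and the String value are illustrative assumptions, not the kernel's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class KernelModelSketch {

    /** Illustrative Entity: a value plus directed links to Attributes and back to Bases. */
    static class Entity {
        private final String value;
        private final List<Entity> attributes = new ArrayList<>();
        private final List<Entity> bases = new ArrayList<>();

        Entity(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }

        /** Makes this Entity the Base of the given Attribute and records the reverse link. */
        void addAttribute(Entity attribute) {
            attributes.add(attribute);
            attribute.bases.add(this);
        }

        List<Entity> getAttributes() {
            return Collections.unmodifiableList(attributes);
        }

        List<Entity> getBases() {
            return Collections.unmodifiableList(bases);
        }
    }

    /** Illustrative query object: decides whether it describes an Entity. */
    interface Constraint {
        boolean describes(Entity entity);
    }

    /** Illustrative Pool holding every existing Entity. */
    static class Pool {
        private final List<Entity> entities = new ArrayList<>();

        void add(Entity entity) {
            entities.add(entity);
        }

        /** Returns the Entities the Constraint describes, testing each Entity in turn. */
        List<Entity> search(Constraint constraint) {
            List<Entity> matches = new ArrayList<>();
            for (Entity entity : entities) {
                if (constraint.describes(entity)) {
                    matches.add(entity);
                }
            }
            return matches;
        }
    }
}
```

With a card for Book of the Five Rings linked to an Attribute for Miyamoto Musashi, pool.search(e -> e.getValue().contains("Musashi")) returns the Musashi Entity, whose getBases() leads back to the book. This search touches every Entity, which is the cost item 8 of the Left To Do list aims to remove.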

Events are implemented through an Event interface, and Constraints can match Events just as they match Entities.

Entities are also saved between sessions. This property, known as persistence, makes the system more usable because users can stop and resume their work at leisure without worrying about the state of the application.

Kernel Review and Refactoring

Team A was assigned the task of refactoring the KnownSpace kernel. The goal, as stated in Assignment 5 (Rawlins, 2002), was to “[r]efactor the KnownSpace Hydrogen alpha kernel” with a focus “on making the code: Java 1.4 compliant, thouroughly tested, flexible and efficient, free of style crime [and] well documented.” In addition, the team was to build a working persistence layer for the kernel. To reach these goals, the work was divided among pairs according to the sections described below.

 

While each pair approached the refactoring process differently, one idea provided a unifying theme: code is a form of communication with future developers. Because KnownSpace is continually being developed, its code must be easy to understand, easy to change and easy to trust. Code is easier to understand, change and trust when it is well documented and written in good style, uses loosely coupled classes, is programmed to interfaces rather than implementations, and enforces its contracts, and when a suite of tests verifies that its functionality is preserved as changes are made.

 

Details of specific changes made are provided below. Several common changes are described here:

 

To make the code Java 1.4 compliant, calls to the assert() method of the org.datamanager.tools.Debug class had to be factored out: Java 1.4 introduces formal assertions to the language (Programming with Assertions, 2002), making assert a keyword. Depending on the way Debug.assert() was being used, one of two changes was chosen:

 

Unit test cases, using JUnit (JUnit Test Infected: Programmers Love Writing Tests), were created for several classes. Tests ease future changes by increasing confidence that those changes do not break existing behavior, which in turn speeds up development. All pairs benefited from using tests, although some had difficulty testing private data and methods. One possible solution, not pursued here but potentially useful in the future, is JUnitX (Extreme Java - JUnitX), an extension to JUnit that supports access to private methods and data.
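JUnitX was not evaluated here, so the following is not its API. As a hedged alternative, a test can reach private members through standard Java reflection (Field and Method with setAccessible(true)). The Counter class is a hypothetical class under test. This works for an application's own classes; newer JDKs restrict such reflective access into JDK internals.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class PrivateAccessSketch {

    /** Hypothetical class under test with private state and a private method. */
    static class Counter {
        private int count;

        private void increment() {
            count++;
        }
    }

    /** Returns the value of the named private field of target. */
    static Object readField(Object target, String name) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(name);
        field.setAccessible(true);
        return field.get(target);
    }

    /** Invokes the named private no-argument method of target and returns its result. */
    static Object invoke(Object target, String name) throws ReflectiveOperationException {
        Method method = target.getClass().getDeclaredMethod(name);
        method.setAccessible(true);
        return method.invoke(target);
    }
}
```

The trade-off is that such tests depend on private names, so they break whenever those names change.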

 

Prior to refactoring, the classes in the KnownSpace kernel followed no consistent documentation style. To make the code clearer, pairs adjusted or added JavaDoc as necessary for classes, methods and data. A frequent change was rewriting comments in the third person, as Sun recommends (How to Write Doc Comments for the Javadoc(tm) Tool, 2000).
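A short illustration of the third-person convention on a hypothetical class (DocStyleSketch and its members are invented for the example): method descriptions read “Returns ...” and “Creates ...” rather than the imperative “Return ...”.

```java
/**
 * Holds the display label of a hypothetical user-interface component.
 */
class DocStyleSketch {

    private final String label;

    /**
     * Creates a component with the given label.
     *
     * @param label the text to display; must not be null
     * @throws IllegalArgumentException if label is null
     */
    DocStyleSketch(String label) {
        if (label == null) {
            throw new IllegalArgumentException("label must not be null");
        }
        this.label = label;
    }

    // Preferred (third person): "Returns the label of this component."
    // Avoided (second person):  "Return the label of this component."
    /**
     * Returns the label of this component.
     *
     * @return the label; never null
     */
    String getLabel() {
        return label;
    }
}
```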

 

To ensure future flexibility, many classes were changed to program to interfaces rather than concrete implementations: for example, List is used instead of Vector and Map instead of Hashtable. Discussion on the team mailing list and the class newsgroup also showed the value of using the implementations introduced with the Java Collections Framework, so ArrayList is used instead of Vector and HashMap instead of Hashtable. Similarly, Iterators replaced Enumerations where possible (and where noticed). These changes were not made throughout the code, but a start was made.
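The pattern can be sketched in modern Java (the generics shown postdate the Java 1.4 code base, which would use raw types): fields are declared with the List and Map interfaces so the implementation is named in exactly one place, and an Iterator replaces an Enumeration. The class and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class CollectionsSketch {

    // Before (pre-refactoring style):
    //   private Vector attributes = new Vector();
    //   private Hashtable properties = new Hashtable();
    //   for (Enumeration e = attributes.elements(); e.hasMoreElements(); ) { ... }

    // After: fields are typed by interface; the implementation is named only here.
    private final List<String> attributes = new ArrayList<>();
    private final Map<String, String> properties = new HashMap<>();

    void addAttribute(String attribute) {
        attributes.add(attribute);
    }

    void setProperty(String key, String value) {
        properties.put(key, value);
    }

    String getProperty(String key) {
        return properties.get(key);
    }

    /** Removes matching attributes in place; Iterator.remove() has no Enumeration equivalent. */
    int removeAttributesStartingWith(String prefix) {
        int removed = 0;
        for (Iterator<String> it = attributes.iterator(); it.hasNext(); ) {
            if (it.next().startsWith(prefix)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Returns a read-only view, again typed as the List interface. */
    List<String> getAttributes() {
        return Collections.unmodifiableList(attributes);
    }
}
```

Besides decoupling callers from the implementation, Iterator offers remove(), which Enumeration lacks; that is what makes the filtering loop safe.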

 

The common refactoring technique “Extract Method” (Fowler) was used in many places in the code to turn a fragment of code into its own method. This improved readability and, as with most of the changes, made future changes easier. Other common refactorings included “Replace Temp With Query” and “Decompose Conditional”.
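A small illustration of two of these refactorings, using a pricing example of the kind found in Fowler's catalog rather than kernel code (the Order class and its discount rule are hypothetical). The original method appears in comments; the refactored version turns the basePrice temporary into a query method and gives the compound condition a name.

```java
class Order {
    private final int quantity;
    private final double itemPrice;

    Order(int quantity, double itemPrice) {
        this.quantity = quantity;
        this.itemPrice = itemPrice;
    }

    // Before refactoring:
    //   double getPrice() {
    //       double basePrice = quantity * itemPrice;
    //       if (basePrice > 1000 && quantity >= 10) {
    //           return basePrice * 0.95;
    //       }
    //       return basePrice * 0.98;
    //   }

    /** Returns the price after the applicable discount. */
    double getPrice() {
        return isBulkOrder() ? basePrice() * 0.95 : basePrice() * 0.98;
    }

    /** Replace Temp With Query: the temporary becomes a reusable method. */
    private double basePrice() {
        return quantity * itemPrice;
    }

    /** Decompose Conditional: the condition gets a name that states its intent. */
    private boolean isBulkOrder() {
        return basePrice() > 1000 && quantity >= 10;
    }
}
```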

 

To ease code review, a ‘private-docs’ target was added to the ant build.xml file to generate JavaDoc for all classes, methods and members, including private methods and members.

Sections

Justification

To minimize collisions between pairs and aid in the division of labor, the twenty kernel classes were divided into four sections by identifying roles and the classes that participate in each. In some cases the roles are rather arbitrary, with the groupings simply a matter of convenience for the reviewers. Roles were identified with the help of the Kernel Guide on the KnownSpace web site (The Datamanager Kernel Guide). The roles are Pool, DataManager, Event and Entity. The following table lists the classes associated with each role as they existed prior to refactoring.

 

Role: Pool
Description: Classes providing a searchable pool of Entities. Searches are performed using Constraints.
Classes: Pool, AbstractEntityConstraint, AbstractEventConstraint, EntityManager, VerySimpleEntityManager

Role: DataManager
Description: Classes that start the interface to the system and are responsible for loading and managing Simpletons.
Classes: DataManager, DataManagerWorker, KernelWindow, Simpleton, SimpletonLoaderProcess

Role: Event
Description: Interfaces and classes responsible for generating and handling Events.
Classes: EventGenerator, EventHandler, EventGeneratorAssistant, EventHandlerThread

Role: Entity
Description: Classes providing the fundamental bricks for data management. Entities are the parts of the system that are persisted.
Classes: Entity, EntityProxy, EntityValue, Identifier, IdentifierManager, VerySimpleEntity

 

The division of the classes into four sections, plus the persistence task, conveniently mapped to the five pairs available in Team A. Each pair not working on persistence was expected to take a section and review and refactor it according to the goals stated above. When a pair finished reviewing its section, the section was passed to another pair.

 

An overview of the changes made to the classes associated with each role follows. For details of the changes, please see the CVS log (TeamA, 2002).

Changes

Pool

The Pool, AbstractEntityConstraint, AbstractEventConstraint, and EntityManager classes are all active participants in the role of presenting an abstract pool of Entities. The Pool acts as the concrete representation of a collection or set of Entities. AbstractEntityConstraint and AbstractEventConstraint provide base classes for concrete Constraints used in searching or filtering the Pool. EntityManager manages the instantiation and persistence of the Pool. As the EntityManager maintains the relationships between Identifiers, Entities, and EntityProxies, a Pool is essentially useless without an EntityManager to query for mappings. Hence, persistence of a Pool is intimately tied to the persistence of its associated EntityManager.

 

The major changes in this role involved:

 

1. Modifying the static serialization and deserialization performed in VerySimpleEntityManager to use the new persistence layer. Originally, creation of both the defaultPool and the VerySimpleEntityManager was attempted first via deserialization, falling back to conventional instantiation if deserialization failed.

 

We defined two new operations within the abstract EntityManager class: a static restore(File) method and an abstract persist(File) method. The VerySimpleEntityManager's Singleton instance is created statically via the createInstance() method, which first attempts to restore() an instance from the hard-coded constant PERSISTENCE_FILE_LOCATION. If this fails, the method instantiates a new VerySimpleEntityManager, which in turn instantiates a new Pool via createDefaultPool(). The instantiation of a new Pool via createDefaultPool() is discussed below.

 

2. Changing the package-private constructor of Pool to a package-private static method, createDefaultPool(). This change anticipates eventual support for multiple pools. Security and other checks that enforce the Pool's contract during instantiation are easier to perform in a static method than in a constructor.

 

The createDefaultPool() method performs persistence restoration checks similar in spirit to those of the EntityManager. One difference, however, is that the Pool does not need to load itself statically, because the EntityManager ensures that this happens. Persisting pools on their own is unnecessary now, but may become useful if EntityManagers become able to regenerate their Identifier/Entity/EntityProxy mappings from a given Pool.
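The startup sequence in items 1 and 2 can be sketched as follows. Java serialization stands in for the team's persistence layer, the file name is a placeholder for the kernel's real PERSISTENCE_FILE_LOCATION, and all method bodies are assumptions; only the names restore(File), persist(File), createInstance(), PERSISTENCE_FILE_LOCATION and createDefaultPool() come from the description above. createInstance() takes a File parameter here (an addition for testability) so it can be exercised without touching the working directory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PersistenceStartupSketch {

    /** Stand-in Pool; createDefaultPool() replaces the old package-private constructor. */
    static class Pool implements Serializable {
        private static final long serialVersionUID = 1L;

        private Pool() { }

        /** Creates the default Pool; contract and security checks would go here. */
        static Pool createDefaultPool() {
            return new Pool();
        }
    }

    static abstract class EntityManager implements Serializable {
        private static final long serialVersionUID = 1L;

        /** Returns the EntityManager saved in file, or null if it cannot be read. */
        static EntityManager restore(File file) {
            if (!file.isFile()) {
                return null;
            }
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                return (EntityManager) in.readObject();
            } catch (IOException | ClassNotFoundException | ClassCastException e) {
                return null;
            }
        }

        /** Saves this EntityManager, and everything it references, to file. */
        abstract void persist(File file) throws IOException;
    }

    static final class VerySimpleEntityManager extends EntityManager {
        private static final long serialVersionUID = 1L;

        // Placeholder value; the kernel defines its own hard-coded location.
        static final String PERSISTENCE_FILE_LOCATION = "entity-manager.ser";

        /** The Singleton, created statically: restored if possible, otherwise new. */
        static final VerySimpleEntityManager INSTANCE =
                createInstance(new File(PERSISTENCE_FILE_LOCATION));

        private final Pool pool;

        private VerySimpleEntityManager() {
            pool = Pool.createDefaultPool();
        }

        static VerySimpleEntityManager createInstance(File location) {
            EntityManager restored = restore(location);
            if (restored instanceof VerySimpleEntityManager) {
                return (VerySimpleEntityManager) restored;
            }
            return new VerySimpleEntityManager();
        }

        Pool getPool() {
            return pool;
        }

        @Override
        void persist(File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(this);
            }
        }
    }
}
```

Because the Pool is a field of the manager, serializing the manager saves the Pool too, which mirrors the point above that a Pool's persistence is tied to its EntityManager.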

 

Additional changes to Pool include:

 

Changes to AbstractEntityConstraint and AbstractEventConstraint were minor. These classes are the base classes for concrete Constraints and implement two methods, accepts(Entity) and describes(Entity). The accepts() method performs any initialization needed to invoke describes() on the given Entity without compromising the KnownSpace security model. For example, threads not owned by the kernel should not be allowed to enter kernel space and should be killed. Thread killing is not yet implemented, but if accepts() encounters a non-kernel thread, its presence is logged. The describes() method performs the actual constraining: it determines whether the given Entity is described by the Constraint (in other words, whether the Entity ‘passes the test’). Apart from the usual corrections to documentation, the only refactoring performed on these classes was to give the final accepts() method a single point of return instead of several.
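A sketch of the template-method arrangement described above: accepts() is final, performs the thread check, and has a single point of return, while subclasses supply describes(). How the kernel recognizes its own threads is not described here, so the thread-name test is purely a hypothetical placeholder, as are the Entity stand-in and the KeywordConstraint subclass.

```java
class ConstraintSketch {

    /** Hypothetical stand-in for the kernel's Entity type. */
    interface Entity {
        String getValue();
    }

    static abstract class AbstractEntityConstraint {

        // Hypothetical convention used only by this sketch to recognize kernel threads.
        private static final String KERNEL_THREAD_PREFIX = "kernel-";

        /**
         * Checks the calling thread, then asks describes() whether the Entity
         * passes. Final, with a single point of return.
         */
        public final boolean accepts(Entity entity) {
            boolean accepted = false;
            Thread caller = Thread.currentThread();
            if (!caller.getName().startsWith(KERNEL_THREAD_PREFIX)) {
                // Killing foreign threads is not implemented; their presence is only logged.
                System.err.println("Non-kernel thread in accepts(): " + caller.getName());
            }
            if (entity != null) {
                accepted = describes(entity);
            }
            return accepted;
        }

        /** Returns true if this Constraint describes the given (non-null) Entity. */
        protected abstract boolean describes(Entity entity);
    }

    /** Example subclass: describes Entities whose value contains a keyword. */
    static class KeywordConstraint extends AbstractEntityConstraint {
        private final String keyword;

        KeywordConstraint(String keyword) {
            this.keyword = keyword;
        }

        @Override
        protected boolean describes(Entity entity) {
            return entity.getValue().contains(keyword);
        }
    }
}
```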

DataManager

 

The DataManager role is responsible for starting the KnownSpace data-manager, loading the Simpletons available to the system and providing an initial interface for the user. In addition to the general changes described in the overview above, several significant modifications were made to KernelWindow, Simpleton, SimpletonLoaderProcess and DataManager.

 

KernelWindow was renamed KernelGui. Instead of extending JFrame, KernelGui now holds a reference to the JFrame it uses. Large chunks of repeated event-handling code were extracted into a private method, startSimpletonsAtIndeces(). To support persistence, a persistPoolAndExit() method was added; it persists the pool (including its Entities) and then calls System.exit(), and it is called wherever System.exit() would otherwise be called. Ideally, System.exit() itself would be hooked instead, once we learn how.
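On the open question of hooking System.exit(), one mechanism that might serve is Runtime.addShutdownHook(), available since Java 1.3: the registered thread runs when the JVM shuts down normally, including through System.exit(), though not if the JVM is halted or killed abruptly. The sketch contrasts it with the explicit persistPoolAndExit() approach; the PoolPersister interface is a hypothetical stand-in for whatever persists the Pool.

```java
class ShutdownPersistenceSketch {

    /** Hypothetical callback that persists the Pool and its Entities. */
    interface PoolPersister {
        void persistPool();
    }

    /**
     * Registers a shutdown hook that persists the pool when the JVM exits normally,
     * including via System.exit(). Returns the hook so it can be removed later.
     */
    static Thread installPersistenceHook(PoolPersister persister) {
        Thread hook = new Thread(persister::persistPool, "persist-pool-on-exit");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    /** The explicit alternative described above: persist, then exit. */
    static void persistPoolAndExit(PoolPersister persister, int status) {
        persister.persistPool();
        System.exit(status); // with a hook installed as well, the pool would be persisted twice
    }
}
```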

 

To support different kernel interfaces, including multiple interfaces or none running at one time, the direct dependency between the KernelGui class and SimpletonLoaderProcess was removed. A new SimpletonList class, which is Observable, was added. SimpletonLoaderProcess produces a SimpletonList and adds new Simpletons to it as they are loaded. Any class may observe the SimpletonList by implementing the Observer interface; KernelGui was made an Observer of SimpletonList.

 

Extra care was taken in SimpletonList to try to ensure that additions to and removals from the list, as well as notifications of Observers, are handled safely in a heavily multithreaded environment.
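A sketch of the SimpletonList arrangement built on java.util.Observable and Observer, as the text describes (both still exist, though Observable has been deprecated since Java 9). The method bodies, the Simpleton stand-in and the RecordingObserver (playing KernelGui's role) are assumptions. Making add() and remove() synchronized keeps the mutation, setChanged() and notifyObservers() together as one unit; the trade-off is that Observers run while the list's lock is held, so they should not block for long.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Observable;
import java.util.Observer;

class SimpletonListSketch {

    /** Hypothetical stand-in for the kernel's Simpleton type. */
    interface Simpleton {
        String getName();
    }

    /** Observable list of loaded Simpletons; a sketch, not the kernel's class. */
    static class SimpletonList extends Observable {
        private final List<Simpleton> simpletons = new ArrayList<>();

        /** Adds a Simpleton and notifies Observers, all under this list's lock. */
        public synchronized void add(Simpleton simpleton) {
            simpletons.add(simpleton);
            setChanged();
            notifyObservers(simpleton);
        }

        /** Removes a Simpleton and, if it was present, notifies Observers. */
        public synchronized boolean remove(Simpleton simpleton) {
            boolean removed = simpletons.remove(simpleton);
            if (removed) {
                setChanged();
                notifyObservers(simpleton);
            }
            return removed;
        }

        /** Returns a copy that callers can iterate without holding the lock. */
        public synchronized List<Simpleton> snapshot() {
            return Collections.unmodifiableList(new ArrayList<>(simpletons));
        }
    }

    /** Plays the role KernelGui plays: here it only records what it was told. */
    static class RecordingObserver implements Observer {
        final List<String> seen = new ArrayList<>();

        @Override
        public void update(Observable source, Object arg) {
            seen.add(((Simpleton) arg).getName());
        }
    }
}
```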

 

To allow multiple copies of the same Simpleton to run at the same time, we modified the Simpleton class to implement Cloneable. This introduces the expectation that the default constructor of any Simpleton implementation returns a prototypical Simpleton that can be freely cloned. There is further work to do in this area: Simpleton metadata should be expanded to include the number of copies of a given Simpleton that may run at once, and whether a Simpleton should be restarted once it finishes.
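A sketch of the prototype arrangement: a default-constructed Simpleton acts as the prototype, and each running copy is obtained with clone(). The Simpleton base class here, the CountingSimpleton subclass and the copiesOf() helper are hypothetical. Because Object.clone() makes a shallow copy, a real Simpleton holding mutable objects would need to copy them in its own clone() override.

```java
import java.util.ArrayList;
import java.util.List;

class PrototypeSketch {

    /** Hypothetical Simpleton base class that supports cloning. */
    static abstract class Simpleton implements Cloneable {

        /** Returns an independent copy of this prototype (shallow, via Object.clone()). */
        @Override
        public Simpleton clone() {
            try {
                return (Simpleton) super.clone();
            } catch (CloneNotSupportedException e) {
                // Cannot happen: this class implements Cloneable.
                throw new AssertionError(e);
            }
        }

        abstract void process();
    }

    /** Example Simpleton whose only state is a primitive counter. */
    static class CountingSimpleton extends Simpleton {
        private int timesProcessed;

        @Override
        void process() {
            timesProcessed++;
        }

        int getTimesProcessed() {
            return timesProcessed;
        }
    }

    /** Creates n copies of the prototype, each of which could run on its own thread. */
    static List<Simpleton> copiesOf(Simpleton prototype, int n) {
        List<Simpleton> copies = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            copies.add(prototype.clone());
        }
        return copies;
    }
}
```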

 

So that all semantics associated with Simpleton.Priority are kept together, the two translatePriority() methods in DataManagerWorker were moved into Simpleton.Priority and renamed for clarity, pushing the intelligence down to where it ought to be.
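A sketch of Simpleton.Priority owning its own translation to and from java.lang.Thread priorities. The level names, the mapping, and the method names toThreadPriority() and fromThreadPriority() are hypothetical (the report does not give the new names), and the sketch uses a Java 5 enum where the Java 1.4 kernel would need a typesafe-enum class.

```java
class PrioritySketch {

    /** Hypothetical outline of Simpleton, showing only its nested Priority type. */
    static class Simpleton {

        /** Simpleton priority levels together with their Thread equivalents. */
        enum Priority {
            LOW(Thread.MIN_PRIORITY),
            NORMAL(Thread.NORM_PRIORITY),
            HIGH(Thread.MAX_PRIORITY);

            private final int threadPriority;

            Priority(int threadPriority) {
                this.threadPriority = threadPriority;
            }

            /** Translates this Simpleton priority into a java.lang.Thread priority. */
            int toThreadPriority() {
                return threadPriority;
            }

            /** Translates a Thread priority into the nearest Simpleton priority. */
            static Priority fromThreadPriority(int threadPriority) {
                if (threadPriority < Thread.NORM_PRIORITY) {
                    return LOW;
                }
                if (threadPriority > Thread.NORM_PRIORITY) {
                    return HIGH;
                }
                return NORMAL;
            }
        }
    }
}
```

DataManagerWorker would then call priority.toThreadPriority() instead of translating priorities itself.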

 

DataManager now has a private method, startPoolPersistingThread(), that starts a thread that persists the system at a constant interval. The method is called from DataManager's main() method. With the new persistence mechanism in place, the deprecated System.runFinalizersOnExit(true) call is no longer needed and was removed.
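A minimal sketch of a periodic persisting thread. The Persister interface and the interval parameter are assumptions; the report does not say how the kernel's thread is written. The thread is a daemon, so it never keeps the JVM alive on its own, and it stops when interrupted. Because a daemon thread can be stopped mid-write when the JVM exits, the final save on exit still has to happen elsewhere, as persistPoolAndExit() does.

```java
class PersistingThreadSketch {

    /** Hypothetical callback that saves the Pool. */
    interface Persister {
        void persist();
    }

    /** Starts a daemon thread that calls persister.persist() every intervalMillis. */
    static Thread startPoolPersistingThread(final Persister persister, final long intervalMillis) {
        Thread thread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return; // interruption is the stop signal
                }
                persister.persist();
            }
        }, "pool-persister");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }
}
```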

Event

The Event role allows for the generation and handling of events in the system. EventGenerator and EventHandler provide interfaces for classes that wish to generate and handle events, respectively. The EventGeneratorAssistant provides a class to which an EventGenerator may delegate some of the common tasks an EventGenerator may wish to perform. EventHandlerThread is a Simpleton, created by EventGeneratorAssistant, that encapsulates the thread that calls handle() on an EventHandler. The call to handle() passes along the DataManagerEvent that an EventGenerator has fired.

 

Changes focused on the EventGeneratorAssistant and EventHandlerThread classes; the EventGenerator and EventHandler interfaces were left unchanged. Three main refactoring techniques were used to improve these two classes: “Extract Method”, “Replace Temp with Query” and “Decompose Conditional” made methods such as fireEvents() in EventGeneratorAssistant shorter and easier to follow. In fireEvents(), the bulk of the code was moved into the method fireEventsToHandlerAccordingToConstraints() and eventually refactored further into the method fireEventHandlers(). The predicate method shouldGenerateEvents() checks whether the current thread is an event-generating thread.

 

Smaller, yet still important, refactorings included renaming instance variables to accurately describe their purposes. For example, listeners was renamed to eventListeners. Private accessors for the instance variables were also added so that future renaming of variables could be isolated to one place. For example, there is now a getEventListeners() method that should be called rather than accessing the eventListeners variable directly.

 

Finally, the eventListeners variable was declared as a Map and instantiated as a HashMap, to take advantage of the Collections Framework provided by Java 2. The initial size of 5, which originally reflected memory concerns, was also eliminated; at this stage, a cleaner implementation mattered more than performance.
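Pulling the Event-role changes together, a hedged sketch of how the refactored EventGeneratorAssistant might be organized: fireEvents() delegates to fireEventHandlers() behind the shouldGenerateEvents() predicate, the listeners live in a Map instantiated as a HashMap with no initial size, and access goes through getEventListeners(). Those names come from the text; everything else is assumed, including java.util.function.Predicate in place of a Constraint, the ThreadLocal flag behind shouldGenerateEvents(), the runWithoutEvents() helper, and synchronous delivery on the calling thread (the kernel delivers through EventHandlerThread).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class EventAssistantSketch {

    /** Hypothetical stand-ins for the kernel's event types. */
    interface DataManagerEvent {
        String getType();
    }

    interface EventHandler {
        void handle(DataManagerEvent event);
    }

    static class EventGeneratorAssistant {

        // Hypothetical flag marking threads that must not generate events.
        private static final ThreadLocal<Boolean> SUPPRESSED =
                ThreadLocal.withInitial(() -> Boolean.FALSE);

        // Declared as Map, instantiated as HashMap; each handler maps to its constraint.
        private final Map<EventHandler, Predicate<DataManagerEvent>> eventListeners = new HashMap<>();

        /** Private accessor, so a future rename of the field touches one place. */
        private Map<EventHandler, Predicate<DataManagerEvent>> getEventListeners() {
            return eventListeners;
        }

        synchronized void addEventHandler(EventHandler handler, Predicate<DataManagerEvent> constraint) {
            getEventListeners().put(handler, constraint);
        }

        synchronized void removeEventHandler(EventHandler handler) {
            getEventListeners().remove(handler);
        }

        /** Fires the event to every handler whose constraint it satisfies. */
        void fireEvents(DataManagerEvent event) {
            if (shouldGenerateEvents()) {
                fireEventHandlers(event);
            }
        }

        /** Predicate method: may the current thread generate events? */
        private boolean shouldGenerateEvents() {
            return !SUPPRESSED.get();
        }

        private void fireEventHandlers(DataManagerEvent event) {
            Map<EventHandler, Predicate<DataManagerEvent>> snapshot;
            synchronized (this) {
                snapshot = new HashMap<>(getEventListeners());
            }
            for (Map.Entry<EventHandler, Predicate<DataManagerEvent>> entry : snapshot.entrySet()) {
                if (entry.getValue().test(event)) {
                    entry.getKey().handle(event); // in the kernel, an EventHandlerThread makes this call
                }
            }
        }

        /** Runs the action on the current thread with event generation switched off. */
        static void runWithoutEvents(Runnable action) {
            SUPPRESSED.set(Boolean.TRUE);
            try {
                action.run();
            } finally {
                SUPPRESSED.set(Boolean.FALSE);
            }
        }
    }
}
```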

Entity

The Entity role gathers those classes that are Entities themselves or are responsible for managing Entities. Entities are the fundamental unit of stored data in KnownSpace.

 

Special attention was paid to making the documentation of the code for these classes accurate, clear, and concise. The main motivation was to rid the documentation of its many errors and inconsistencies.

 

Many of the changes in the Entity role were to rid the classes of the Debug.assert() method. In some cases Debug.assert() was used to check for null objects passed as parameters. In those cases an IllegalArgumentException was thrown and a log message written with the new Debug.writeLogMessage() method. In other cases Debug.assert() was used when a method was not supported by the class but was required by the parent class. This is the case with VerySimpleEntity. In those cases an UnsupportedOperationException was thrown.
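The two replacement patterns can be sketched as follows. The Debug class is a minimal stand-in whose writeLogMessage(String) signature is assumed, and setValue() is a hypothetical example of an inherited method that VerySimpleEntity does not support. Throwing exceptions, rather than switching to Java 1.4's assert statement, also fits Sun's guidance that assertions should not be used to check the arguments of public methods, since assertions can be disabled at run time.

```java
import java.util.ArrayList;
import java.util.List;

class AssertReplacementSketch {

    /** Minimal stand-in for org.datamanager.tools.Debug (signature assumed). */
    static final class Debug {
        static final List<String> LOG = new ArrayList<>();

        static void writeLogMessage(String message) {
            LOG.add(message);
        }
    }

    interface Entity { }

    /** Hypothetical parent class that declares an operation. */
    static abstract class AbstractEntity implements Entity {
        abstract void setValue(Object value);
    }

    static class VerySimpleEntity extends AbstractEntity {
        private final List<Entity> attributes = new ArrayList<>();

        // Before: Debug.assert(attribute != null);
        // Once assert is a keyword (Java 1.4 source level and later), that call no longer compiles.
        void addAttribute(Entity attribute) {
            if (attribute == null) {
                Debug.writeLogMessage("VerySimpleEntity.addAttribute(): null attribute");
                throw new IllegalArgumentException("attribute must not be null");
            }
            attributes.add(attribute);
        }

        // Before: a Debug.assert() call flagged that this inherited method is unsupported.
        @Override
        void setValue(Object value) {
            throw new UnsupportedOperationException("VerySimpleEntity does not support setValue()");
        }

        int getAttributeCount() {
            return attributes.size();
        }
    }
}
```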

 

Throughout the classes, the following refactorings were applied: unclear variable and parameter names were made more informative, data structures were declared with Java Collections Framework interfaces, accessor and instance variable names were made consistent, and conditional logic was decomposed into private methods. For example, the identifier and extra instance variables of SimpleIdentifier were renamed stringSection and longSection, respectively. In VerySimpleEntity, the declarations of the attributes and bases instance variables were changed from Vector to List.

 

Several changes were made to EntityProxy to simplify how its methods work together. The loadNewAssistant() method now loads a new EventGeneratorAssistant only if one is not already loaded, so methods that call loadNewAssistant() no longer need to check for an assistant first. The removeBase() method now accepts only an EntityProxy rather than any Entity; this change followed the discovery of an unnecessary cast to EntityProxy of an object that was already an EntityProxy when passed to addBase(). Finally, the getIdentifier() method of IdentifierManager was removed because it was not being used.
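A sketch of the two EntityProxy changes: lazy, idempotent loading of the assistant, and a removeBase() signature that makes the cast unnecessary. Returning the assistant from loadNewAssistant(), the boolean result of removeBase(), and the list-based bases field are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EntityProxySketch {

    /** Stand-in for the kernel's EventGeneratorAssistant. */
    static class EventGeneratorAssistant { }

    static class EntityProxy {
        private EventGeneratorAssistant assistant;
        private final List<EntityProxy> bases = new ArrayList<>();

        /** Loads an assistant only if none is loaded, so callers need not check first. */
        synchronized EventGeneratorAssistant loadNewAssistant() {
            if (assistant == null) {
                assistant = new EventGeneratorAssistant();
            }
            return assistant;
        }

        void addBase(EntityProxy base) {
            bases.add(base);
        }

        // Before: void removeBase(Entity base) { bases.remove((EntityProxy) base); }
        /** Removes a Base; the parameter type matches what is stored, so no cast is needed. */
        boolean removeBase(EntityProxy base) {
            return bases.remove(base);
        }
    }
}
```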

Testing

To increase developer confidence and ease changes, unit tests were developed where possible and where time allowed. The tests give current and future developers greater confidence that changes to the implementation of the classes preserve their behavior. Some of the more important test classes are described below.

 

The Entity class is the data storehouse for KnownSpace and thus has the largest suite of tests. These cover the creation of Entities, linking between Entities, retrieval of linked Entities, creation of Entity Identifiers, and the subscription and unsubscription of EventHandlers.

 

A DataManagerWorkerTester class was created to support the migration of the thread priority translation function from DataManagerWorker to Simpleton.Priority. The test was written before any changes were made to the code. Once it passed against the original code, the DataManagerWorker and Simpleton.Priority classes were easily modified, and when the test still passed after these modifications, the developers were confident the change was correct.
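The test-first migration can be sketched as a characterization check: record what the old code returns for every priority, then require the new code to return the same. The legacy integer codes and mapping below are invented for illustration, and the check is written as plain Java rather than as the team's JUnit DataManagerWorkerTester.

```java
import java.util.ArrayList;
import java.util.List;

class PriorityMigrationSketch {

    /** The old translation, as it might have looked in DataManagerWorker (codes hypothetical). */
    static final class LegacyDataManagerWorker {
        static int translatePriority(int simpletonPriorityCode) {
            switch (simpletonPriorityCode) {
                case 0:
                    return Thread.MIN_PRIORITY;
                case 2:
                    return Thread.MAX_PRIORITY;
                default:
                    return Thread.NORM_PRIORITY;
            }
        }
    }

    /** The new home of the same knowledge. */
    enum Priority {
        LOW(0, Thread.MIN_PRIORITY),
        NORMAL(1, Thread.NORM_PRIORITY),
        HIGH(2, Thread.MAX_PRIORITY);

        final int legacyCode;
        private final int threadPriority;

        Priority(int legacyCode, int threadPriority) {
            this.legacyCode = legacyCode;
            this.threadPriority = threadPriority;
        }

        int toThreadPriority() {
            return threadPriority;
        }
    }

    /** Describes every priority whose translation changed; an empty list means the move is safe. */
    static List<String> findMismatches() {
        List<String> mismatches = new ArrayList<>();
        for (Priority p : Priority.values()) {
            int expected = LegacyDataManagerWorker.translatePriority(p.legacyCode);
            if (expected != p.toThreadPriority()) {
                mismatches.add(p + ": expected " + expected + " but got " + p.toThreadPriority());
            }
        }
        return mismatches;
    }
}
```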

 

The EventGeneratorAssistantTester class tests the addition and removal of EventHandlers from an EventGeneratorAssistant and ensures that fired events are properly handled. The related EventHandlerThreadTester class ensures that the EventHandlerThread (a Simpleton) calls handle() on the correct EventHandler.

 

The PoolTester class ensures that a Pool can be created, that it is a Singleton, and that Entities can be added to it.

 

Functional testing of the kernel was carried out with TeamASimpleton and EntityMakerSimpleton. These classes let a developer use KnownSpace in much the same way a user might. A ‘run’ target in the ant build.xml file starts KnownSpace and provides access to these two Simpletons.

Persistence

The persistence layer operates through the DataAccessObject interface, which allows hierarchies of objects to be saved to a nonvolatile medium. A DataAccessObjectFactory is provided for creating arbitrary types of DataAccessObjects. Team A has implemented two types: XMLDataAccessObject and SerializableDataAccessObject.
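A sketch of a DataAccessObject, its factory, and a serialization-based implementation. The method names save() and load(), the factory's string keys and the buffered object streams are assumptions, and the XML implementation is omitted.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class DataAccessSketch {

    /** Hypothetical shape of the DataAccessObject interface; real method names may differ. */
    interface DataAccessObject {
        void save(Object root, File file) throws IOException;

        Object load(File file) throws IOException;
    }

    /** Saves the whole object graph reachable from root using Java serialization. */
    static class SerializableDataAccessObject implements DataAccessObject {
        @Override
        public void save(Object root, File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(file)))) {
                out.writeObject(root); // throws NotSerializableException for non-Serializable objects
            }
        }

        @Override
        public Object load(File file) throws IOException {
            try (ObjectInputStream in = new ObjectInputStream(
                    new BufferedInputStream(new FileInputStream(file)))) {
                return in.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException("Unknown class in " + file, e);
            }
        }
    }

    /** Creates DataAccessObjects by name so callers never name a concrete class. */
    static final class DataAccessObjectFactory {
        private DataAccessObjectFactory() { }

        static DataAccessObject create(String type) {
            switch (type) {
                case "serializable":
                    return new SerializableDataAccessObject();
                // An "xml" case would return the XML implementation, omitted here.
                default:
                    throw new IllegalArgumentException("Unknown DataAccessObject type: " + type);
            }
        }
    }
}
```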

 

The XML persistence layer is implemented in the new org.datamanager.persistence package. The java.beans.XMLEncoder and XMLDecoder classes in Sun's Java API do support persistence to XML, but they require objects to conform to the JavaBeans conventions. Our purpose required a more general solution, hence the new package. It encodes all member variables of a given object and builds an object hierarchy rooted at the top-level object encoded (the object passed in to be encoded). The resulting structure conforms to a subset of the DTD used by XMLEncoder/XMLDecoder (see Appendix 1 for the DTD used). For more information on the structure of the XML persistence package, see Figure 1.
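To illustrate the reflective approach (encoding every member variable, not just JavaBeans properties), here is a much-simplified encoder using java.lang.reflect. It borrows element names from the DTD in Appendix 1 (java, object, void with a property attribute, the primitive element names, string, null) and uses id/idref for shared or cyclic references, but it is an illustration, not the team's package, and its output is not guaranteed to validate against that DTD. Unlike the team's package, it rejects arrays altogether, and it also rejects JDK classes other than String, since newer JDKs block reflective access to their private fields. The Card class is hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.Map;

class XmlEncoderSketch {

    /** Sample class for trying out the encoder (hypothetical). */
    static class Card {
        String title;
        int shelf;
        Card seeAlso;
    }

    /** Encodes root and everything reachable from its fields. */
    static String encode(Object root) throws IllegalAccessException {
        StringBuilder out = new StringBuilder("<java>\n");
        encodeObject(root, out, new IdentityHashMap<>(), 1);
        return out.append("</java>\n").toString();
    }

    private static void encodeObject(Object obj, StringBuilder out, Map<Object, String> ids, int depth)
            throws IllegalAccessException {
        String pad = indent(depth);
        if (ids.containsKey(obj)) { // already encoded: refer back to it
            out.append(pad).append("<object idref=\"").append(ids.get(obj)).append("\"/>\n");
            return;
        }
        String id = "id" + ids.size();
        ids.put(obj, id);
        out.append(pad).append("<object id=\"").append(id)
           .append("\" class=\"").append(obj.getClass().getName()).append("\">\n");
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mod = f.getModifiers();
                if (Modifier.isStatic(mod) || Modifier.isTransient(mod) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                out.append(pad).append("  <void property=\"").append(f.getName()).append("\">\n");
                encodeValue(f.getType(), f.get(obj), out, ids, depth + 2);
                out.append(pad).append("  </void>\n");
            }
        }
        out.append(pad).append("</object>\n");
    }

    private static void encodeValue(Class<?> type, Object value, StringBuilder out,
            Map<Object, String> ids, int depth) throws IllegalAccessException {
        String pad = indent(depth);
        if (value == null) {
            out.append(pad).append("<null/>\n");
        } else if (type.isPrimitive()) { // element name equals the primitive type name, e.g. <int>
            out.append(pad).append('<').append(type.getName()).append('>')
               .append(escape(String.valueOf(value)))
               .append("</").append(type.getName()).append(">\n");
        } else if (value instanceof String) {
            out.append(pad).append("<string>").append(escape((String) value)).append("</string>\n");
        } else if (value.getClass().isArray() || value.getClass().getName().startsWith("java.")) {
            throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
        } else {
            encodeObject(value, out, ids, depth);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }
}
```

The IdentityHashMap is what lets cycles such as two cards that refer to each other terminate: the second visit emits an idref instead of recursing again.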

 

Issues requiring further review include: the use of arrays, as support for multi-dimensional arrays is not yet implemented; testing for arrays of arrays; dealing with null objects; encoding of special characters in String objects and chars; and further general testing of the package.

 

Figure 1: Design Diagram for Persistence

The SerializableDataAccessObject persistence layer uses Java’s existing Serializable interface to save objects. While it is far more lightweight than the XML layer, it lacks the portability of XML.

 

At this time, persistence is used to save Entities. When the EntityManager is asked to persist, all Entities and Identifiers are saved, because they are stored in Maps held by the EntityManager.

 

A large number of JUnit test cases were created to ensure that the persistence layer can correctly save and restore diverse objects.

Left To Do

The work done to refactor the KnownSpace kernel is by no means complete. This review gave the members of Team A an opportunity to learn more about the kernel: how it works and what it can do. The changes clarify the function of the classes and make future changes easier, but do little yet to radically alter the flexibility and function of the kernel.

 

Some of the ideas for future changes include:

 

  1. Implementing shared Pools.
  2. Disposing of the KernelGui interface and allowing multiple or zero interfaces.
  3. Implementing other DataAccessObjects for the sake of multiple styles of persistence.
  4. Extending Simpleton.Information to include information about the number of concurrent Simpletons that may exist as well as whether an existing Simpleton should restart after it has reached the end of its process() method.
  5. After determining if it is worthwhile, providing a mechanism by which a Simpleton may choose to be persistable.
  6. Disposing of the knownspace_config.xml file used for loading Simpletons and returning to a directory from which Simpleton classes are loaded.
  7. Exploring a more robust method of Identifier generation that creates identifiers that are truly unique and persistent. The current implementation could have collisions if shared Pools are implemented.
  8. Optimizing Constraints so that searches of the Pool do not require a direct comparison with every Entity.

Lessons Learned

This assignment was an exercise in teamwork and communication. Several important lessons were learned which will be valuable in future team-based software projects:

 

Conclusion

The refactoring process is never complete. Team A has made the KnownSpace kernel Java 1.4 compliant and made improvements in persistence, style, documentation and flexibility. While it would give us great satisfaction to say we were done, we know that we are not, but at least we are one small step closer to getting spider robots with anus jets.

Code

The refactored kernel classes and associated helper and test classes may be found in a zip or tar file at http://www.burningchrome.com/~cdent/649/ass5/turnin/.

Colophon

The members of Team A created this document collaboratively. Each pair responsible for changes in a section or for persistence provided a draft explanation of the changes they made and the reasons for making them. These drafts were then compiled into a document reviewed by the entire group.

 

The members of Team A are: Jason Baumgartner, Ailish Byrne, Mary Clegg, Chris Dent, Bob George, Nate Johnson, Allen Lee, Matt Liggett, Brian Mcgough and Xi Rao.

References

The Datamanager Kernel Guide. Retrieved October 1, 2002, from the World Wide Web: http://developer.knownspace.org/groups/kernel/KernelGuide/index.jsp

Extreme Java - JUnitX. Retrieved October 12, 2002, from the World Wide Web: http://www.extreme-java.de/junitx/

Fowler, M. Refactorings in Alphabetical Order. Retrieved October 12, 2002, from the World Wide Web: http://www.refactoring.com/catalog/index.html

How to Write Doc Comments for the Javadoc(tm) Tool (2000). Retrieved October 12, 2002, from the World Wide Web: http://java.sun.com/j2se/javadoc/writingdoccomments/

JUnit Test Infected: Programmers Love Writing Tests. Retrieved October 1, 2002, from the World Wide Web: http://junit.sourceforge.net/doc/testinfected/testing.htm

KnownSpace (1999). Retrieved October 1, 2002, from the World Wide Web: http://www.knownspace.org/

The KnownSpace Hydrogen API Design (1999). Retrieved October 1, 2002, from the World Wide Web: http://developer.knownspace.org/designdocs/api_design.jsp

Programming with Assertions (2002). Sun Microsystems. Retrieved October 5, 2002, from the World Wide Web: http://java.sun.com/j2se/1.4.1/docs/guide/lang/assert.html

Rawlins, G. (2002, September 30). Design Patterns in Java Assignment 5. Retrieved October 1, 2002, from the World Wide Web: http://www.burningchrome.com/~cdent/dpmoin/moin.cgi/AssignmentFive

TeamA. (2002). cvsLog. Retrieved October 12, 2002, from the World Wide Web: http://www.burningchrome.com/~cdent/dpmoin/moin.cgi/cvsLog

 

Appendix 1

 

Persistence XML DTD

 

<!ELEMENT java (object)>
<!ATTLIST java version  CDATA  #IMPLIED>
 
<!ELEMENT boolean (#PCDATA)>
<!ELEMENT byte    (#PCDATA)>
<!ELEMENT char    (#PCDATA)>
<!ELEMENT short   (#PCDATA)>
<!ELEMENT int     (#PCDATA)>
<!ELEMENT long    (#PCDATA)>
<!ELEMENT float   (#PCDATA)>
<!ELEMENT double  (#PCDATA)>
<!ELEMENT string  (#PCDATA)>
<!ELEMENT null    (#PCDATA)>
 
<!ELEMENT object (void)+>
 
<!ATTLIST object 
  id       ID     #IMPLIED
  idref    IDREF  #IMPLIED
  class    CDATA  #IMPLIED
        >
 
<!ELEMENT array (
  object   | 
  string   |  
  null     | 
  array    | 
  boolean  | 
  byte     | 
  char     | 
  short    | 
  int      | 
  long     | 
  float    |  
  double
        )*>
 
<!ELEMENT void (
  object   | 
  string   |  
  class    | 
  null     | 
  array    | 
  boolean  | 
  byte     | 
  char     | 
  short    | 
  int      | 
  long     | 
  float    |  
  double
        )*>
        
<!ATTLIST void property CDATA  #IMPLIED>