Sound advice - blog

Tales from the homeworld

Fri, 2005-Feb-25

RDF encoding for UML?

Paul Gearon asks

Is there a standard for serialised UML into RDF?

The standard encoding for UML 2.0 is XMI, but RDF Schema already does a nice job of modelling some concepts equivalent to those of UML. The W3C has this to say:

Web-based and non-Web based applications can be described by a number of schema specification mechanisms, including RDF-Schema. The RDF-Schema model itself is equivalent to a subset of the class model in UML.

Here is an attempt to overcome the limitations of XMI by mapping UML to RDF more generally than is supported by RDF Schema. Since RDFS is a subset of UML, this is "similar to defining an alternative RDF Schema specification".


Fri, 2005-Feb-25

CM Synergy

Martin Pool has begun a venture into constructing a new version control system called bazaar-NG. At first glance I can't distinguish it from CVS, or the wide variety of CM tools that have come about recently. It has roughly the same set of commands, and similar concepts of how working files are updated and maintained.

This is not a criticism. In fact, at first glance it looks like we could be seeing a nice refinement of the general concepts. Martin himself notes:

I don't know if there will end up being any truly novel ideas, but perhaps the combination and presentation will appeal.

In the hope of contributing something useful to the mix, I thought I would describe the CM system I use at work. When we initially started using the product it was called Continuus Change Management (CCM). It has since been bought by Telelogic and rebadged as CM Synergy. Since our earliest use of the product it has shipped with a capability called Distributed Change Management (DCM), which has since been rebadged Distributed CM Synergy.

Before I start, I should note that I have seen no CM Synergy source code and have only user-level knowledge. On the other hand, my user-level knowledge is pretty in-depth given that I was build manager for over a year before working in actual software development for my company's product (it's considered penance in my business ;). At that time Telelogic's predecessor-in-interest, Continuus, had not yet entered Australia and we were being supported by another firm. This firm was not very familiar with the product, and for many years the CCM expertise in my company exceeded that of the support firm in many areas. Some of my detailed knowledge may be out of date, as I've been back in the software domain for a number of years.

CCM is built on an Informix database which contains objects, object attributes, and relationships. Above this level is the archive, which uses gzip to store object versions of binary types and a modified GNU RCS to store object versions of text types. Above this level is the cache, which contains extracted versions of all working-state objects and of the static (archived) objects in use within work areas. Working-state objects exist only within the cache. The final level is the work area. Each user will have at least one, and that is where software is built. Under Unix, the controlled files within the work area are usually symlinks to cache versions. Under Windows, the controlled files must be copies. Object versions that are no longer in use can be removed from the cache by using an explicit cache clean command. A work area can be completely deleted at any time and recreated from cache and database information with the sync command. Arbitrary objects (including tasks and projects, which we'll get to shortly) can be transferred between CCM databases using the DCM object version transfer mechanism.

CCM is a task-based CM environment. That means that it distinguishes between the concept of a work area and what is currently being worked on. The work area content is decided by the reconfigure activity, which uses the reconfigure properties on a project as its source data: a baseline project and a set of tasks to apply (including working-state and "checked-in" (static) tasks). This set of tasks is usually determined by a set of task folders, which can be configured to match the content of arbitrary object queries.

Once the baseline project and set of tasks is determined by updating any folder content, the tasks themselves and the baseline project are examined. Each one is equivalent to a list of specific object versions. Starting at the root directory of the project, the most-recently-created version of that directory object within the task and baseline sets is selected. The directory itself specifies not object versions, but file-ids. The slots that these ids identify are filled out in the same way, by finding the most-recently-created version of the object within the task and baseline sets.
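As a rough sketch of the selection rule just described, here is how I would model it in Python. None of this is CCM code: the ObjectVersion record, its fields, and the function names are all invented for illustration, and the directory walk is flattened into a simple pool of candidates keyed by file-id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    file_id: str   # the slot this version can fill (hypothetical field)
    version: str   # version label
    created: int   # creation time; a larger number means newer


def select_versions(candidates):
    """For each file-id, keep the most recently created candidate."""
    selected = {}
    for ov in candidates:
        current = selected.get(ov.file_id)
        if current is None or ov.created > current.created:
            selected[ov.file_id] = ov
    return selected


def reconfigure(baseline, tasks):
    """Pool the baseline and task object versions, then fill every slot."""
    candidates = list(baseline)
    for task in tasks:
        candidates.extend(task)
    return select_versions(candidates)


# Made-up data: a baseline plus two tasks touching the same files.
baseline = [ObjectVersion("main.c", "1", 100), ObjectVersion("util.c", "1", 100)]
task_a = [ObjectVersion("main.c", "2", 200)]
task_b = [ObjectVersion("util.c", "2", 150), ObjectVersion("main.c", "3", 300)]

work_area = reconfigure(baseline, [task_a, task_b])
for file_id, ov in sorted(work_area.items()):
    print(file_id, "-> version", ov.version)
```

Note that the rule only looks at creation time. It never asks how the versions are related to each other, which is why a separate check is needed.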

So, this allows you to be working on multiple tasks within the same work area. It allows you to pick up tasks that have been completed by other developers but not yet integrated into any baseline and include them in your work area for further changes. The final and perhaps most important thing it allows you to do is perform a conflicts check.

The conflicts check is a more rigorous version of the reconfigure process. Instead of just selecting the most-recently-created object version for a particular slot, it actively searches the object history graph. This graph is maintained as "successor" relationships in the Informix database. If the graph analysis shows that any of the object versions offered by the baseline or task set is not a predecessor of the version finally selected, then a conflict is declared. The user typically resolves this conflict by performing a merge between the two selected but branched versions using a three-way diff tool. Conflicts are also declared if part of a task is included "accidentally" in a reconfigure. This can occur if you have a task A and a task B where B builds on A. When B is included but A is not, some of A's objects will be pulled into the reconfigure by virtue of being predecessors of B's object versions. This is detected, and the resolution is typically either to pull A in as well or to remove B from the reconfigure properties.
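Both kinds of conflict can be sketched as a walk over the history graph for a single slot. This is my own guess at the logic in Python, not how CCM implements it: the predecessor map, the owner_task map, and the version labels are all hypothetical stand-ins for what the database holds.

```python
def ancestors(version, predecessors):
    """Return every version reachable by following predecessor links."""
    seen = set()
    stack = [version]
    while stack:
        for pred in predecessors.get(stack.pop(), ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


def find_conflicts(selected, candidates, baseline_version,
                   predecessors, owner_task, included_tasks):
    """Check one slot; return a list of (kind, detail) conflicts."""
    conflicts = []
    history = ancestors(selected, predecessors)

    # Parallel versions: a candidate the selected version does not descend from.
    for cand in candidates:
        if cand != selected and cand not in history:
            conflicts.append(("parallel", cand))

    # Implicit tasks: history newer than the baseline that belongs to a task
    # which was left out of the reconfigure properties.
    covered = ancestors(baseline_version, predecessors) | {baseline_version}
    for anc in sorted(history - covered):
        task = owner_task.get(anc)
        if task is not None and task not in included_tasks:
            conflicts.append(("missing task", task))
    return conflicts


# Made-up history for one file: 1 -> 2 -> 3, with a branch 1 -> 1.1.1.
predecessors = {"2": ["1"], "3": ["2"], "1.1.1": ["1"]}
owner_task = {"2": "A", "3": "B", "1.1.1": "C"}

# B included without A: part of A is dragged in as a predecessor.
only_b = find_conflicts("3", ["1", "3"], "1", predecessors, owner_task, {"B"})

# A, B and C included: C's version is parallel to the selected one.
with_c = find_conflicts("3", ["1", "2", "3", "1.1.1"], "1",
                        predecessors, owner_task, {"A", "B", "C"})

# A and B included: a clean line of succession.
clean = find_conflicts("3", ["1", "2", "3"], "1",
                       predecessors, owner_task, {"A", "B"})
```

In the first case the fix is to add task A or drop task B. In the second it is a merge of versions 3 and 1.1.1.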

The conflicts check is probably the most important feature of CCM from a user perspective. Not only can you see that someone else has clobbered the file you're working on, but you can see how it was clobbered and how you should fix it. On the other side, though, is the build manager perspective. Task-based CM makes the build manager role somewhat more flexible, if not actually easier.

The standard CCM model assumes you will have user work areas, an integration work area, and a software quality assurance work area. User work areas feed into integration on a continuous or daily basis, and every so often a cut of the integration work area is taken as a release candidate to be formally assessed in the slower-moving software quality assurance work area. Each fast-moving work area can use one of the slower-moving baselines as its baseline project (work area, baseline, and project are roughly interchangeable terms in CCM). Personally, I only used an SQA build within the last few months or weeks of a release. The means of delivering software to be tested by QA is usually a build, and you often don't need an explicit baseline to track what you gave them in earlier project phases.

One way we're using the CCM task and projects system at my place of employment is to delay integration of unreviewed changes. Review is probably the most useful method for validating design and code changes as they occur, whether it be document review or code review. Anything that hasn't been reviewed isn't worth its salt, yet. It certainly shouldn't be built on top of by other team members. So what we do is add an approved_by attribute to each task. While approved_by is None, the task can be explicitly picked up by developers if they really need to build upon it before the review cycle is done... but it doesn't get into the integration build (it's excluded from the folder query). When review is done, the authority who accepts the change puts their name in the approved_by field, and either that person or the original developer does a final conflicts check and merge before the nightly build occurs. That means that work is not included until it is accepted, and not accepted until it passes the conflicts check (as well as other checks such as developer testing rigour). In the meantime other developers can work on it if they are prepared to have their own work depend on the acceptance of the earlier work. In fact, users can see and compare the content of all objects, even working-state objects that have not yet been checked in. That's part of the beauty of the cache concept, and the idea of checking out objects (and having a new version number assigned to the new version) before working on them.
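The folder rule amounts to a filter over tasks. Here is a minimal Python sketch of it, assuming a made-up Task record. Only the approved_by attribute comes from our actual setup; the status strings, the other field names, and the function names are placeholders, and this is not CCM's query syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    number: int
    owner: str
    status: str                        # "working" or "checked_in" (placeholder values)
    approved_by: Optional[str] = None  # None until a reviewer accepts the task


def is_integrated(task):
    """A task reaches the integration build only once checked in and approved."""
    return task.status == "checked_in" and task.approved_by is not None


def integration_folder(tasks):
    """Tasks the nightly integration build would pick up."""
    return [t for t in tasks if is_integrated(t)]


def developer_folder(tasks, user, picked_up=frozenset()):
    """Approved work, the user's own tasks, and anything explicitly picked up."""
    return [t for t in tasks
            if is_integrated(t) or t.owner == user or t.number in picked_up]


tasks = [
    Task(101, "alice", "checked_in", approved_by="bob"),
    Task(102, "carol", "checked_in"),   # finished but not yet reviewed
    Task(103, "dave", "working"),
]

nightly = [t.number for t in integration_folder(tasks)]
daves_view = [t.number for t in developer_folder(tasks, "dave", picked_up={102})]
```

Here task 102 stays out of the nightly build until someone fills in approved_by, but dave can still choose to build on it at his own risk.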

I should note a few final things before closing out this blog entry. Firstly, I do have to use a customised GNU make to ensure that changes to a work area symlink (i.e., selection of a different file version) always cause a rebuild. It's only a one-line change, though. Also, CCM is both a command-line utility and a graphical one. The graphical version makes merging and understanding object histories much easier. There is also a set of Java GUIs which I've never gotten around to trying. Telelogic's Change Synergy (a change request tracking system similar in scope to Bugzilla) is designed to work with CCM, and should reach a reasonable level of capability in the next few months or years but is currently a bit snafued. Also, I haven't gone into any detail about the CCM object typing system or other aspects for which there are probably better solutions these days anyway. I also haven't covered project hierarchies or controlled products, which have a few interesting twists of their own.