Sound advice

I've spent some of this afternoon playing with Dia. I have played with it before and found it wanting, but that was coming from a particular use case.

At work I've used Visio extensively, starting with the version created before Microsoft purchased the program and began integrating it with their office suite. As I've mentioned previously I use a public domain stencil set for authoring UML2 that I find useful in producing high-quality print documentation. When I used Dia coming from this perspective I found it very difficult to put together diagrams that were visually appealing in any reasonable amount of time.

Today I started from the perspective of using Dia as a software authoring tool, much like Visio's standard UML stencils are supposed to support but with my own flavour to it. Dia is able to do basic UML editing and because it saves to an XML file (compressed with gzip) it is possible to actually use the information you've created. Yay!

I created a couple of XSL stylesheets to transform a tiny restricted subset of Dia UML diagrams into a tiny restricted subset of RDF Schema. I intend to add to the supported set as I find a use for it, but for now I only support statements that indicate the existence of certain classes and of certain properties. I don't currently describe range, domain, or multiplicity information in the RDFS, but this is only meant to be a rough scribble. Here's what I did:

First, uncompress the dia diagram:
$ gzip -dc foo.dia > foo.dia1
Urrgh. That XML format looks terrible:
```
    <dia:object type="UML - Class" version="0" id="O9">
      <dia:attribute name="obj_pos">
        <dia:point val="20.6,3.55"/>
      </dia:attribute>
      <dia:attribute name="obj_bb">
        <dia:rectangle val="20.55,3.5;27,5.8"/>
      </dia:attribute>
```
It's almost as bad as the one used by GNOME's Glade! I'm particularly averse to seeing "dia:attribute" elements when you could have used actual XML attributes and saved everyone a lot of typing. The other classic mistake they make is to assume that a consumer of the XML needs to be told what type to use for each attribute (the dia:point and dia:rectangle wrappers above). The fact is that the type of a piece of data is the least of a consumer's worries. They have to decide where to put it on the screen, or which field of their database to insert it into. Seriously, if they know enough to use a particular attribute they'll know its type. Just drop it and save the bandwidth. Finally (and for no apparent reason) strings are delimited by hash (#) characters. I don't understand that at all :) Here's part of the XSL stylesheet I used to clean it up:
```
  <xsl:for-each select="@*"><xsl:copy/></xsl:for-each>
  <xsl:for-each select="dia:attribute[not(dia:composite)]">
    <xsl:choose>
      <xsl:when test="dia:string">
        <xsl:attribute name="{@name}">
          <xsl:value-of select="substring(*,2,string-length(*)-2)"/>
        </xsl:attribute>
      </xsl:when>
      <xsl:otherwise>
        <xsl:attribute name="{@name}">
          <xsl:value-of select="*/@val"/>
        </xsl:attribute>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
  <xsl:apply-templates select="node()">
    <xsl:with-param name="parent" select="$parent"/>
  </xsl:apply-templates>
```
Ahh, greatly beautified:
$ xsltproc normaliseDia.xsl foo.dia1 > foo.dia2
<dia:object type="UML - Class" version="0" id="O9" obj_pos="20.6,3.55" obj_bb="20.55,3.5;27,5.8" elem_corner="20.6,3.55"...
This brings the uncompressed byte count for my particular input file from in excess of 37k down to a little over 9k, although it only reduces the size of the compressed file by 30%. Most importantly, it is now much simpler to write the final stylesheet, because I can get at all of those juicy attributes just by saying @obj_pos and @obj_bb. If I had really been a cool kid I would probably have folded the "original" attributes of the object (type, version, id, etc.) into the dia namespace while allowing other attributes to live in the null namespace.
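For anyone who would rather not touch XSLT, the same folding step can be roughly sketched with Python's standard ElementTree module. This is not the stylesheet above, only an approximation of it: the namespace URI is an assumption (check the root element of your own file), the function name `fold_attributes` is made up for illustration, and the sketch assumes the folded `dia:attribute` children should be dropped once their values have become XML attributes.

```python
import xml.etree.ElementTree as ET

# Assumed Dia namespace URI; confirm it against the root element of your file.
DIA_NS = "http://www.lysator.liu.se/~alla/dia/"
ATTRIBUTE = f"{{{DIA_NS}}}attribute"
STRING = f"{{{DIA_NS}}}string"
COMPOSITE = f"{{{DIA_NS}}}composite"


def fold_attributes(elem):
    """Fold simple <dia:attribute> children into XML attributes, in place.

    Hypothetical helper: composite attributes are left alone and recursed into.
    """
    for child in list(elem):  # copy the list, since we remove while looping
        simple = (child.tag == ATTRIBUTE
                  and len(child) > 0
                  and child.find(COMPOSITE) is None)
        if simple:
            value = child[0]
            if value.tag == STRING:
                # Dia wraps strings in '#': drop the first and last character.
                elem.set(child.get("name"), (value.text or "")[1:-1])
            else:
                # Everything else carries its value in a 'val' attribute.
                elem.set(child.get("name"), value.get("val", ""))
            elem.remove(child)
        else:
            fold_attributes(child)


SAMPLE = f"""<dia:object xmlns:dia="{DIA_NS}" type="UML - Class" version="0" id="O9">
  <dia:attribute name="obj_pos"><dia:point val="20.6,3.55"/></dia:attribute>
  <dia:attribute name="name"><dia:string>#Account#</dia:string></dia:attribute>
</dia:object>"""

obj = ET.fromstring(SAMPLE)
fold_attributes(obj)
print(obj.attrib)
```

After the call, `obj.get("obj_pos")` and `obj.get("name")` read directly off the element, which is the Python counterpart of saying @obj_pos in the stylesheet.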

So now that is complete, the final stylesheet is nice and simple (I've only cut the actual stylesheet declaration, including namespace declaration):

```
<xsl:template match="/">
  <rdf:RDF>
    <xsl:for-each select="//dia:object[@type='UML - Class']">
      <xsl:variable name="classname" select="@name"/>
      <rdfs:Class rdf:ID="{$classname}"/>
      <xsl:for-each select="dia:object.attributes">
        <rdf:Property rdf:ID="{concat($classname,'.',@name)}"/>
      </xsl:for-each>
    </xsl:for-each>
    <xsl:for-each select="//dia:object[@type='UML - Association']">
      <rdf:Property rdf:ID="{@name}"/>
    </xsl:for-each>
  </rdf:RDF>
</xsl:template>
```

Of course, it only does a simple job so far:
$ xsltproc diaUMLtoRDFS.xsl foo.dia2 > foo.rdfs

```
<rdf:RDF xmlns...>
  <rdfs:Class rdf:ID="Account"/>
  <rdf:Property rdf:ID="Account.name"/>
  <rdfs:Class rdf:ID="NumericContext"/>
  <rdf:Property rdf:ID="NumericContext.amountDenominator"/>
  <rdf:Property rdf:ID="NumericContext.commodity"/>
...
```

My only problem now is that I can't seem to do anything much useful with the RDF schema, other than describe the structure of the data to humans, which the original diagram does more intuitively. I do have a script which constructs an sqlite schema from RDFS, but I really don't have anything to validate the RDFS itself against, nor any program that I'm aware of that will validate RDF data against the schema. Perhaps there's something in the Java sphere I should look into.
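To give a feel for what an RDFS-to-sqlite step might look like, here is a minimal Python sketch. It is not the script mentioned above. It assumes one table per class and one TEXT column per property whose ID has the "Class.property" form; the function name `rdfs_to_sql`, the `id` primary key, and the choice to skip properties without a dot (associations) are all illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_ID = f"{{{RDF}}}ID"
CLASS_TAG = f"{{{RDFS}}}Class"
# RDF Schema defines properties as rdf:Property; the rdfs: spelling is accepted too.
PROPERTY_TAGS = {f"{{{RDF}}}Property", f"{{{RDFS}}}Property"}


def rdfs_to_sql(rdfs_text):
    """Return CREATE TABLE statements for the classes in an RDFS document."""
    root = ET.fromstring(rdfs_text)
    tables = {}
    for elem in root.iter(CLASS_TAG):
        tables[elem.get(RDF_ID)] = []
    for elem in root.iter():
        if elem.tag in PROPERTY_TAGS:
            owner, _, column = elem.get(RDF_ID).partition(".")
            # Properties without a "Class." prefix (associations) are skipped here.
            if column and owner in tables:
                tables[owner].append(column)
    statements = []
    for name, columns in tables.items():
        cols = ", ".join(["id INTEGER PRIMARY KEY"] + [f'"{c}" TEXT' for c in columns])
        statements.append(f'CREATE TABLE "{name}" ({cols});')
    return statements


SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="Account"/>
  <rdf:Property rdf:ID="Account.name"/>
  <rdf:Property rdf:ID="owns"/>
</rdf:RDF>"""

db = sqlite3.connect(":memory:")
for statement in rdfs_to_sql(SAMPLE):
    db.execute(statement)
```

Every column is TEXT because the schema carries no range information yet; once ranges are described in the RDFS they could drive the column types instead.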

The main point, though, is that Dia has proven a useful tool for a small class of problems. Schema information that can be most simply described in a graphical format and is compatible with Dia's way of doing things can viably be part of a software process.

I think this is important. I have already been heading down this path lately with XML files. Rather than trying to write code to describe a constrained problem space, I've been focusing on nailing down the characteristics of the space and putting them into a form that is human- and machine-readable (XML) but is also information-dense. The sparsity of actual information in some forms of code (particularly those dealing with processing of certain types of data) can lead to confusion as to what the actual pass/fail behaviour is. It can be hard to verify the coded form against a specification, and hard to reverse-engineer a specification from existing code. The XML approach allows a clear specification, from which I would typically generate rather than write the processing code. After that, hand-written code can pass that information on or process it in any appropriate way. That hand-written code is improved in density because the irrelevant rote parts have been moved out into the XML file.
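As a toy illustration of the idea, the sketch below keeps the pass/fail rules for a record in a few lines of XML and derives the checking code from them. Everything in it is hypothetical: the `record`/`field` vocabulary, the `min`/`max` attributes, and the `build_checker` function are invented for the example, and the spec is interpreted at run time rather than used to generate source code, purely to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format: the XML is the whole statement of pass/fail.
SPEC = """<record name="reading">
  <field name="sensor" type="str"/>
  <field name="value" type="float" min="0" max="100"/>
</record>"""

CASTS = {"str": str, "int": int, "float": float}


def build_checker(spec_text):
    """Return a function that lists the problems found in a dict of raw values."""
    fields = ET.fromstring(spec_text).findall("field")

    def check(record):
        problems = []
        for field in fields:
            name, kind = field.get("name"), field.get("type")
            if name not in record:
                problems.append(f"{name}: missing")
                continue
            try:
                value = CASTS[kind](record[name])
            except ValueError:
                problems.append(f"{name}: not a {kind}")
                continue
            if field.get("min") is not None and value < float(field.get("min")):
                problems.append(f"{name}: below minimum")
            if field.get("max") is not None and value > float(field.get("max")):
                problems.append(f"{name}: above maximum")
        return problems

    return check


check = build_checker(SPEC)
print(check({"sensor": "a", "value": "42"}))
print(check({"sensor": "a", "value": "120"}))
```

The rote loop is written once; changing what counts as a valid record means editing two lines of XML rather than hunting through processing code.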

So what this experiment with Dia means to me is that I have a second human- and machine-readable form to work with. This time it is in the form of a diagram, and part of a tool that appears to support some level of extension. I think this could improve the software process even more for these classes of problem.

Benjamin
