Sound advice - blog

Tales from the homeworld

My current feeds

Sat, 2005-Jul-16

File Tagging

Watching my mother trying to use Windows XP to locate her holiday snaps makes it clear to me that tagging is the right way to interact with personal documents. The traditional "one file, one location" filesystem is old and busted. The scenario begins with my mother learning how to take pictures from her camera and put them into folders. Unfortunately, my father is still the one managing short movie files. The two users have different mental models for the data. They have different filing systems. Mum wants to find files by date, or by major event. Dad thinks that movie files are different to static images and that they should end up in different places. The net result is that Mum needs to learn how to use the search feature in order to find her file, and is lucky to find what she is looking for.

Using tags we would have a largely unstructured collection of files. The operating system would be able to apply tags associated with type automatically, so "mpeg" and "video" might already appear. The operating system might even add tags for time and date. The user might add additional tags such as "21st Birthday" or "Yeppoon Trip". Tags are associated with the files themselves and can be added from anywhere you see the file. You'll then be able to find the file via the additional tag. This approach seems to work better than searching or querying does for non-expert computer users. A query has to be constructed from ideas that aren't in front of the user. Tags are already laid out before them.

Here is one attempt at achieving a tagging model in a UNIX filesystem. I'm not absolutely sure that soft links are the answer. Personally, I wonder if we need better hard links. If you deleted a hard link to a file from a particular tag set, the file would disappear from there but stay connected to the filesystem by its other links. It shouldn't be possible to make these links invalid the way it is with symbolic links. Unfortunately, hard links can't be used across different filesystems. It would be nice if the operating system itself could manage a half-way point, though I understand this would be tricky: the simplest implementation would result in a copy of the file on each partition, and deciding which copy was the correct one when they dropped out of sync would be harmful. Perhaps tagging should simply always happen within a single filesystem. Hard links do still have the problem that a rename on one instance of the file doesn't trigger a rename of the other tagged instances.
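As a rough sketch of what I mean, here is the hard-link approach in Python. The tags/ directory layout is hypothetical, just one way to organise things, and it assumes everything lives on a single filesystem:

```python
import os

def tag(path, tag_name, tag_root="tags"):
    """Tag a file by hard-linking it into a per-tag directory.

    Hypothetical layout: tags/<tag_name>/<basename>. Hard links
    share an inode, so there is no dangling-symlink risk, but this
    only works within one filesystem.
    """
    tag_dir = os.path.join(tag_root, tag_name)
    os.makedirs(tag_dir, exist_ok=True)
    link_path = os.path.join(tag_dir, os.path.basename(path))
    os.link(path, link_path)
    return link_path

def untag(path, tag_name, tag_root="tags"):
    """Remove one tag; the file survives while other links remain."""
    os.remove(os.path.join(tag_root, tag_name, os.path.basename(path)))
```

Note that this inherits exactly the rename problem described above: renaming the original leaves every tagged link under its old name.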

Benjamin

Sat, 2005-Jul-16

The Visual Display Unit is not the User Interface

One thing I've noticed as I've gotten into user interface design concepts is that the best user interface is usually not something you see on your computer screen. When you look at an iPod it is clear how to use it, and what it will do. A well designed mobile phone such as the Nokia 6230 my wife carries makes it easy both to make phone calls and to navigate its menus for more sophisticated operations. A gaming console like the PS2, the Xbox, or the Game Boy is easy to use. Easier than a PC, by miles.

Joel Spolsky has a wonderful work online describing how user interfaces should be designed. He devotes a whole chapter to "Affordances and Metaphors", which very simply amounts to giving the user hints on when and how to click. In the same document he highlights the problem that makes desktop software so hard to work with generally:

Users can't control the mouse very well.

I've noticed that whenever I'm finding a device easy to use, it is because it has a separate physical control for everything I want to do. Up and down are different buttons, or are different ends of a tactile device. I don't have to imagine that the thing in the screen is a button. Instead, the thing in the screen obviously relates to a real button. I just press it with my stubby fingers and it works.

So maybe we should be thinking just as much about what hardware we might want to give to users as we think about how to make our software work with the hardware they have already. Wouldn't it be nice if instead of a workspace switcher you had four buttons on your keyboard that would switch for you? Wouldn't it be nice if instead of icons on a panel, minimised applications appeared somewhere obvious on your keyboard, so you could press a button and make the applications appear? There wouldn't be any need for an Exposé feature. The user would always be able to see what was still running but not visible.

Leon Brooks points to a fascinating keyboard device that includes an LCD on each button. This can be controlled by the program with current focus to provide real buttons for what would normally only be "visual buttons". I think this could make the app with focus much more usable, both by freeing up screen real estate for things the user actually wants to see and by making buttons tangible instead of just hoping they look like something clickable. Personally I would have doubts about the durability of a keyboard like this, but if it could be made without huge expense and programs could be designed to work with it effectively, I think it could take off.

We can see this tactile approach working already with scrollwheel mice and keyboards. It is possible to get a tactile scroll bar onto either device without harming its utility, while making things much simpler to interact with. Ideally, good use of a keyboard with a wider range of functions would entirely remove the need for popup menus and on-screen buttons. In a way this harks back to keyboard templates like those for WordPerfect. I wonder whether, now that the excitement over GUIs and mice has died down, this approach will turn out to be practical after all.

Update 18 October 2005:
United Keys has a keyboard model that is somewhat less radical in design and looks to be a little closer to market. Thanks to commenter Tobi on the Dutch site Usabilityweb for pointing it out. I like the colour LCD promised in the Optimus keyboard, but suspect that the United Keys approach of segregating regular typing keys from programmable function keys will wear better. Ultimately it has to both look good and be a reasonable value proposition to attract users.

Benjamin

Sat, 2005-Jul-16

Solaris C++ Internationalisation

One of my colleagues has been tasked with introducing infrastructure to internationalise some of our commercial software. We are currently running Solaris 9 and using the Forte 8 C++ compiler. We decided that the best way to perform the internationalisation would be to use a combination of boost::format and something gettext-like that we put together ourselves. That's where the trouble started.
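Our actual code is C++, and I won't reproduce the home-grown gettext layer here, but the pattern we were after is easy to sketch in Python, whose standard library ships a gettext module and whose str.format placeholders play the same argument-reordering role that boost::format's %1% markers do for C++. The catalogue setup and message below are stand-ins, not our production strings:

```python
import gettext

# With no compiled catalogue installed, NullTranslations passes
# messages through unchanged; a real deployment would load a .mo file.
translations = gettext.NullTranslations()
_ = translations.gettext

def greeting(name, count):
    # Numbered placeholders let a translator reorder the arguments
    # in the translated string, which is the property that makes a
    # format-style library worth pairing with gettext in C++ too.
    return _("{0} has {1} new messages").format(name, count)
```

The point of the combination is that translators work on whole sentences with movable slots, rather than on fragments glued together in a fixed order.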

The first problem was the compiler itself. It can't compile boost::format due to some usage of template metaprogramming techniques that were slightly beyond its reach. It quickly became clear that upgrading to version 10 of the compiler would be necessary, and even then patches are required to build the rest of boost.

That was problem number one, but the hard problem turned out to be in our use of the -library=stlport4 option. Stlport appears not to support locales under Solaris. We've been tracking Forte versions since the pre-standardisation 4.2 compiler, and that's just while I've been working there. We originally used stlport because there was no alternative, but when we did upgrade to a compiler with a (Rogue Wave) STL we found problems changing over to it. When we got things building and our applications were fully loaded up with data, we found they used twice as much memory as the stlport version. At the time we didn't have an opportunity to upgrade our hardware, so that kind of change in memory profile would have really hurt us. With no impetus for change we decided to stick to the tried and true.

By the time Forte hit version 8 it had the -library=stlport4 option to use an inbuilt copy of the software, and we stopped using our own controlled version. We found at the time that a number of STL-related problems being reported through sunsolve were being written off with "just use stlport", so we weren't keen to try the default STL again. These days it looks like this inbuilt STL hasn't been modified for some years. It does support non-C locales, but moving our software over is a new world of pain.

Another alternative was to use gcc. Shockingly, the 3.4.2 version available from sunfreeware produced incorrect code for us when compiled with -O2 for sparc. This also occurred in the latest 3.4.4 version shipped by blastwave. I haven't looked into the problem personally to ensure it isn't something we're doing, but the people who did look into it know what they're doing. Funnily, although the sfw 3.4.2 version did support the full range of locales, blastwave's 3.4.4 did not. We would have been back to square one again.

So, the summary is this: if you want to internationalise C++ code under Solaris today you have very few good choices. You can run gcc, which seems to have some dodgy optimisation code for sparc... but make sure you get it from the right place or it won't work. You can use Forte 10, but you can't use the superior stlport for your standard library. C++ is essentially a dead language these days, so don't count on the situation improving. My guidance would be to drop sparc as quickly as you can, and use gcc on an intel platform where it should be producing consistently good code.

Benjamin

Sat, 2005-Jul-16

REST Content Types

So we have our REST triangle of nouns, verbs, and content types. REST is tipping us towards placing site to site and object to object variation in our nouns. Verbs and content types should be "standard", which means that they shouldn't vary needlessly but that we can support some reasonable levels of variation.

Verbs

If it were only the client and server involved in any exchange, REST verbs could be whittled down to a single "DoIt" operation. Differences between GET, PUT, POST, DELETE, COPY, LOCK, or any of the verbs which HTTP in its various forms supports today could be managed in the noun-space instead of the verb-space. After all, it's just as easy to create a http://example.com/object/resource/GET resource as it is to create http://example.com/object/resource with a GET verb on it. The server implementation is not going to be overly complicated by either approach. Likewise, it should be just as easy to supply two hyperlinks to the client as it is to provide a single hyperlink with two verbs. Current HTML "A" tags are unable to specify which verb to use in a transaction with the href resource. That has led to tool providers misusing the GET verb to perform user actions. Instead of creating a whole HTML form, they supply a simple hyperlink. This of course breaks the web, but why it does is not as straightforward as you may think.
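To see how little the server cares which design you pick, consider a toy dispatch sketch in Python. The resource tables and names here are invented purely for illustration; both arrangements reach the same handler:

```python
# Verb-space: one noun exposing several verbs.
def handle_verb_space(verb, noun, resources):
    return resources[noun][verb]()

# Noun-space: each operation is its own resource with one "DoIt" verb.
def handle_noun_space(noun, resources):
    return resources[noun]()

state = {"value": 42}

verb_space = {"/object/resource": {"GET": lambda: state["value"]}}
noun_space = {"/object/resource/GET": lambda: state["value"]}
```

Either table is a one-line lookup for the server; the interesting differences only appear once third parties join the conversation, as discussed below.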

Verbs vs Delegates

Delegates in C# and functions in Python give away how useful a single "doIt" verb approach is. In a typical O-O observer pattern you need the observer to inherit from or otherwise match the specification available for a baseclass. When the subject of the pattern changes, it looks through its list of observer objects and calls the same function on each one. It quickly becomes clear when we use this pattern that the one function may have to deal with several different scenarios. One observer may be watching several subjects, and it may be important to disambiguate between them. It may be important to name the function in a more observer-centric rather than subject-centric way. Rather than just "changed", the observer might want to call the method "openPopupWindow". Java tries to support this flexibility by making it easy to create inner classes which themselves inherit from Observer and call back your "real" object with the most appropriate function. C# and Python don't bother with any of the baseclass nonsense (and the number of keystrokes required to implement it) and supply delegates and callable objects instead. Although Java's way allows for multiple verbs to be associated with each inner object, delegates are more "fun" to work with. Delegates are effectively hyperlinks provided by the observer to the subject that should be followed on change, issuing a "doIt" call on the observer object. Because we're now hyperlinking rather than trying to conceptualise a type hierarchy, things turn out to be both simpler and more flexible.
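In Python the delegate version of the pattern needs no Observer baseclass at all. The class and method names below are invented for illustration; the point is that the subject holds bare callables and calls each one the same way:

```python
class Subject:
    """Minimal observer pattern using callables as 'hyperlinks'."""
    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        # Any callable works: a function, a bound method, a lambda.
        self._observers.append(callback)

    def changed(self):
        for callback in self._observers:
            callback()  # the single "doIt" verb

class PopupObserver:
    def __init__(self):
        self.opened = 0

    def open_popup_window(self):  # observer-centric name
        self.opened += 1

subject = Subject()
observer = PopupObserver()
subject.subscribe(observer.open_popup_window)  # no base class needed
subject.changed()
```

The bound method observer.open_popup_window is the "hyperlink": the subject follows it without knowing or caring what type stands behind it.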

The purpose of verbs

So if not for the server's benefit, and not for the client's benefit, why do we have all of these verbs? The answer for the web of today is caching, but the reasoning can be applied to any intermediary. When a user does a GET, the cache saves the result away. Other verbs either mark that cache entry dirty or may update the entry in some way. The cache is a third party to the conversation and should not be required to understand it in too much detail, so we expose the facets of the conversation that are important to the cache as verbs. This principle could apply any time we have a third party involved whose role is to manage the communication efficiently rather than to become involved in it directly.
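A toy cache makes the point: it treats message bodies as opaque, and the verb alone tells it whether to serve from its store or to invalidate. This is a sketch of the idea only, not of HTTP's actual caching rules:

```python
class Cache:
    """Toy intermediary: only verbs matter to it, never message content."""
    def __init__(self, origin):
        self.origin = origin  # callable(verb, uri) -> response body
        self.store = {}

    def request(self, verb, uri):
        if verb == "GET":
            # Safe verb: serve from the store, fetching on a miss.
            if uri not in self.store:
                self.store[uri] = self.origin(verb, uri)
            return self.store[uri]
        # Any other verb may have changed the resource: invalidate.
        self.store.pop(uri, None)
        return self.origin(verb, uri)
```

The cache never parses a body or knows what the resource means; the verb is the entire protocol between it and the conversation it accelerates.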

Server Naivety and Client Omniscience

In a client/server relationship the server can be as naive as it likes. So long as it maintains the basic service constraints it is designed for, it doesn't care whether operations succeed or fail. It isn't responsible for making the system work. Clients are the ones who do that. Clients follow hyperlinks to their servers, and they do so for a reason. Whenever a client makes a request it already knows the effect its operation should have and what it plans to do with the returned content. To the extent necessary to do its job, the client already knows what kind of document will be returned to it.

A web browser doesn't know which content type it will receive. It may be HTML, or some form of XML, or a JPEG image. It could be anything within reason, and "within reason" is a precisely definable term in this context. The web browser expects a document that can be presented to its user in a human-readable form, and one that corresponds to one of the standard content types it supports for this purpose. If we take this view of how data is handled and transfer it into a financial setting where only machines are involved, it might read like this: "An account reconciler doesn't know which content type it will receive. It may be ebXML, or some form of OFX, or an XBRL report. It could be anything within reason, and within reason is a precisely definable term in this context. The reconciler expects a document that can be used to compare its own records to those of a supplier or customer and highlight any discrepancies. The document's content type must correspond to one of the standard content types it supports for this purpose."

REST allows for variations in content type, so long as the client understands how to extract the data out of the returned document and transform it into its own internal representation. Each form must carry sufficient information to construct this representation, or it is not useful for the task and the client must report an error. Different clients may have different internal representations, and the content types must reflect those differences. HTTP supports negotiation of content types to allow for clients with differing supported sets, but when new content types are required to handle different internal data models it is typically time to introduce a new noun as well.
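In code, the client's side of this bargain is just a dispatch from content type to a parser that builds its internal representation, with an error for anything outside the supported set. A minimal Python sketch, where the text/csv handler is an arbitrary example of my own:

```python
def to_internal(content_type, document, handlers):
    """Turn a returned document into the client's internal representation.

    `handlers` maps each supported content type to a parser. A type
    outside that set is not useful for the task, so it is an error.
    """
    try:
        parse = handlers[content_type]
    except KeyError:
        raise ValueError("unsupported content type: %s" % content_type)
    return parse(document)

# One supported type; a real client would register several.
handlers = {
    "text/csv": lambda doc: [line.split(",") for line in doc.splitlines()],
}
```

Adding support for a new content type means adding a parser to the table; changing the internal representation itself is the bigger step that usually calls for a new noun as well.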

Hyperlinking

So how does the client become this all-knowing entity it must be in every transaction it participates in? Firstly, it must be configured with a set of starting points or allow them to be entered at runtime. In some applications this may be completely sufficient, and the configuration of the client could refer to all URIs it will ever have to deal with. If that is not the case, it must use its configured and entered URIs to learn more about the world.

The HTML case is simple because its use cases are simple. It has two basic forms of hyperlink: the "A" and the "IMG" tags. When it comes across an "A" it knows that whenever that hyperlink is activated it should look for a whole document to present to its user in a human-readable form. It should replace any current document on the screen. When it comes across "IMG" it knows to go looking for something human-readable (probably an actual image) and embed it into the content of the document it is currently rendering. It doesn't have to be any more intelligent than that, because that is all the web browser needs to know to get its job done.

More sophisticated processes require more sophisticated hyperlinks. If they're not configured into the program, it must learn about them. You could look at this from one of two perspectives: either you are extending the configuration of your client by telling it where to look to find further information, or the configuration itself is just another link document. Hyperlinks may be picked up indirectly as well, as the result of POST operations which return "303 See Other". As the omniscient client, it must already know what to do when it sees this response, just as a web browser knows to chase down that Location: URI and present its content to the user.
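The 303 case takes only a few lines to sketch, assuming a send function that returns a status, headers, and content. That interface is invented for illustration, not taken from any real HTTP library:

```python
def post_and_follow(send, uri, body):
    """POST, then chase a '303 See Other' redirect with a GET.

    `send(verb, uri, body)` is a hypothetical transport returning
    (status, headers, content). The client already knows, before it
    ever sees the response, that a 303 means 'GET the Location URI'.
    """
    status, headers, content = send("POST", uri, body)
    if status == 303:
        status, headers, content = send("GET", headers["Location"], None)
    return content
```

The hyperlink the client "learns" here is the Location header; everything else about how to treat it was known in advance.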

There is a danger in all things of introducing needless complexity. We can create new content types until we're blue in the face, but when it comes down to it we need to understand the client's requirements and internal data models. We must convey as much information as our clients require, and have some faith that they know enough to handle their end of the request processing. It's important not to over-explain things, or include a lot of redundant information. The same goes for types of hyperlinks. It may be possible to reduce the complexity of documents that describe relationships between resources by assuming that clients already know what kind of relationship they're looking for and what they can infer from a relationship's existence. I think we'll continue to find, as we have found in recent times, that untyped lists are most of what you want from a linking document, and that using RDF's ability to create new and arbitrary predicates is often overkill. My guide for deciding how much information to include is to think about those men in the middle who are neither client nor server. Think about which ones you'll support and how much they need to know. Don't dumb it down for the sake of anyone else. Server doesn't care, and Client already knows.

Benjamin