Friday, December 16, 2005

Alternate representations

A new problem in need of a solution is that of "alternate representations". Alternate representations are expressions that do not fit the mold of how we want expressions to appear in a lexicological resource. One of the rules has always been that capitalisation is only used for words that are always capitalised; e.g. English (the language) is always capitalised in English. There are resources, resources that we would like to include, that contain such non-conforming expressions as synonyms. Another example is "plague, bubonic"; to me, that should be "bubonic plague".

Many of these forms are a legacy of paper-based resources. In a digital resource, where some magic links "plague" and "bubonic plague", one form would suffice. The question is how to make Ultimate Wiktionary relevant. When we do include "plague, bubonic" in some way, we allow for one-to-one linking from the Unified Medical Language System (UMLS) to Ultimate Wiktionary and vice versa. It would even allow for the inclusion of UMLS data in Ultimate Wiktionary.

I am currently considering two options. The first: I know that in lexicology there is an annotation that describes the relation in which a word stands within a sentence. The other option is to have an AlternateRepresentation table that links an Expression to the preferred Expression.
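To make the second option more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It assumes a very simple schema: the table and column names (Expression, AlternateRepresentation, spelling, alternate_id, preferred_id) and the helper functions are hypothetical and only illustrate the idea; they are not the actual Ultimate Wiktionary design. The alternate form "plague, bubonic" is kept as an Expression of its own, so an external resource such as UMLS can still link to it one-to-one, while a lookup can always resolve it to the preferred "bubonic plague".

```python
import sqlite3

# Hypothetical schema, for illustration only: not the real Ultimate Wiktionary tables.
SCHEMA = """
CREATE TABLE Expression (
    id       INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL UNIQUE
);
CREATE TABLE AlternateRepresentation (
    -- each alternate Expression points to exactly one preferred Expression
    alternate_id INTEGER PRIMARY KEY REFERENCES Expression(id),
    preferred_id INTEGER NOT NULL REFERENCES Expression(id)
);
"""


def add_expression(conn, spelling):
    """Store an expression and return its id."""
    cur = conn.execute("INSERT INTO Expression (spelling) VALUES (?)", (spelling,))
    return cur.lastrowid


def add_alternate(conn, alternate, preferred):
    """Store `alternate` as its own Expression and link it to the preferred one."""
    alt_id = add_expression(conn, alternate)
    pref_id = conn.execute(
        "SELECT id FROM Expression WHERE spelling = ?", (preferred,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO AlternateRepresentation (alternate_id, preferred_id) VALUES (?, ?)",
        (alt_id, pref_id),
    )
    return alt_id


def preferred_form(conn, spelling):
    """Return the preferred spelling, or the spelling itself if it is not an alternate."""
    row = conn.execute(
        """SELECT p.spelling
           FROM Expression e
           JOIN AlternateRepresentation a ON a.alternate_id = e.id
           JOIN Expression p ON p.id = a.preferred_id
           WHERE e.spelling = ?""",
        (spelling,),
    ).fetchone()
    return row[0] if row else spelling


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_expression(conn, "bubonic plague")
    add_alternate(conn, "plague, bubonic", "bubonic plague")
    print(preferred_form(conn, "plague, bubonic"))  # bubonic plague
    print(preferred_form(conn, "bubonic plague"))   # bubonic plague
```

Making alternate_id the primary key means an alternate form can point to only one preferred form; whether that restriction fits every resource we would like to include is an open question.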

I want this annotation anyway; what I do not know is whether this annotation is aware of capitalisation.

