Monday, November 21, 2005

Berlin III

Still in Berlin, at the end of three days of Ultimate Wiktionary, we have done a lot of work. There is still a lot to do, because what people want, need and deserve is something to show for all the work done. As visibility is important, we are going to make two things happen as soon as possible: first, Wikidata milestone 1 has to be finalized and committed to the CVS release branch; then we will take an extra step and publish a read-only version of the GEMET data. This, combined with all the languages in the provisional version of ISO 639-3 (in English), will give a clear idea of what we want: great lexicological content in all languages.

Technically, some things in the data design will change. Among them, the Meaning table will become DefinedMeaning. The new name reflects that a DefinedMeaning records which MeaningText is the one that truly defines a meaning. The point is that for each meaning you have to decide which language and which word define what it is. The MeaningTexts in the other languages should be translations of that specific text.
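To make the idea concrete, here is a minimal sketch of a DefinedMeaning as an in-memory structure. The class layout, the field names (`defining_word`, `defining_text`, `translations`) and the use of ISO 639-3 codes as language keys are assumptions for illustration only, not the actual Wikidata schema. What it tries to show is the rule from the paragraph above: one text in one language defines the meaning, and every other language only adds a translation of that text.

```python
from dataclasses import dataclass, field


@dataclass
class MeaningText:
    language: str  # assumed: an ISO 639-3 code such as "eng" or "deu"
    text: str


@dataclass
class DefinedMeaning:
    # Hypothetical fields: the word and text chosen to define this meaning.
    defining_word: str
    defining_text: MeaningText
    # Translations of the defining text, keyed by language code.
    translations: dict[str, MeaningText] = field(default_factory=dict)

    def add_translation(self, text: MeaningText) -> None:
        # Other languages add translations of the defining text; the
        # defining language cannot add a second, competing definition.
        if text.language == self.defining_text.language:
            raise ValueError("the defining language already has the defining text")
        self.translations[text.language] = text

    def texts(self) -> list[MeaningText]:
        # The defining text first, followed by its translations.
        return [self.defining_text, *self.translations.values()]
```

For example, `DefinedMeaning("love", MeaningText("eng", "a strong positive emotion of regard and affection"))` would make the English text the defining one, and a German MeaningText would then be added with `add_translation`.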

One great thing is that we came up with an improvement regarding inflections. It does not make sense for every inflection to show up in the list of synonyms and translations only because it shares the same DefinedMeaning. By giving each InflectionWord the key of the word it belongs to, we can show only the headword for each part of speech. Yes, it is a database change.
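The filtering this enables can be sketched as follows. It assumes a simplified, hypothetical word record in which an inflected form carries a `headword_id` pointing at the word it belongs to, and a headword has `None` there; the real table and column names may differ. Synonyms and translations are the words sharing a DefinedMeaning, and inflected forms are simply skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    word_id: int
    spelling: str
    language: str
    part_of_speech: str
    defined_meaning_id: int
    # Hypothetical key: for an inflected form, the id of the headword it
    # belongs to; None when the word is itself a headword.
    headword_id: Optional[int] = None


def related_headwords(words: list[Word], defined_meaning_id: int, exclude_id: int) -> list[Word]:
    """Return synonyms and translations that share a DefinedMeaning.

    Inflected forms also share the DefinedMeaning, but because they point
    at their headword they are left out of the list.
    """
    return [
        w
        for w in words
        if w.defined_meaning_id == defined_meaning_id
        and w.word_id != exclude_id
        and w.headword_id is None
    ]
```

Without the `headword_id` key, "loves" would appear next to "love" simply because both share the same DefinedMeaning; with it, only headwords are listed.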

Erik was not happy with my Table table. He called it a hack, and it is a hack. So he does not want it, he does not want it, he does not want it... So it has to go. Erik is correct when he says that NOT having this hack means much cleaner code and will help with scalability, a major point. So I will create a few more Relation tables that are more specific.
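The post does not describe the columns of the Table table, so the sketch below does not try to reproduce it. It only illustrates what one "more specific" Relation table could look like: a hypothetical `BroaderRelation` table (names assumed for illustration) whose two columns are declared foreign keys to DefinedMeaning. SQLite via Python's standard `sqlite3` module is used only to keep the example self-contained.

```python
import sqlite3


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE DefinedMeaning (
            defined_meaning_id INTEGER PRIMARY KEY
        );
        -- A specific relation table: both ends are declared foreign keys,
        -- so the database checks what they point at and can index them.
        CREATE TABLE BroaderRelation (
            narrower_id INTEGER NOT NULL REFERENCES DefinedMeaning(defined_meaning_id),
            broader_id  INTEGER NOT NULL REFERENCES DefinedMeaning(defined_meaning_id),
            PRIMARY KEY (narrower_id, broader_id)
        );
        CREATE INDEX broader_idx ON BroaderRelation(broader_id);
        """
    )
    return conn


def narrower_meanings(conn: sqlite3.Connection, broader_id: int) -> list[int]:
    """Return the ids of all DefinedMeanings directly narrower than broader_id."""
    rows = conn.execute(
        "SELECT narrower_id FROM BroaderRelation WHERE broader_id = ? ORDER BY narrower_id",
        (broader_id,),
    )
    return [row[0] for row in rows]
```

The design point is that each relation kind gets its own table with typed, constrained columns, so queries stay simple and indexable instead of having to work out at run time which table a row refers to.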

Thanks.
GerardM

4 comments:

Anonymous said...

I'm not sure that forcing *every* Meaning to be officially "Defined" in one language is the best solution. I can see the benefit of this for authoritative terminologies such as GEMET (where the DefinedMeaning could be write-protected); however, for other situations, I'd rather see all languages debate the exact definition on equal ground. If the definitions are clearly differentiated, they can be split and linked by an appropriate relationship ("NearSynonym" or "Hyper-Hyponym", as the case may be).

Regards.

Anonymous said...

My knowledge about databases is only rudimentary, so I didn't understand everything. But thanks for keeping us up to date!

DefinedMeaning:

Maybe I didn't get it right, but the way I understand it, every language defines its own words and the other languages translate those definitions. So, for example, the German community defines the German word "Liebe" and then the English community translates that definition into English. The English community defines the English word "love" and the German community translates that definition into German.

Regards,

Anonymous said...

The problem is that the English word "love" has many different meanings, some of which (but not all) are shared between many cultures. I don't see why such concepts as "a strong positive emotion of regard and affection", should be balkanized, nor why some languages should have to defer to others for their definition.

Regards.

GerardM said...

The point is a different one. All translations and synonyms should mean the same thing. This is accomplished by defining one meaning as the one that rules them all.

When translations are not true translations, they will get flagged as not being endemic. In a resource where languages and communities are to work together, there will be no such thing as an English or a German community definition. There will be a best definition.

When a word has many different meanings, there will be more than one DefinedMeaning, and each of these will have its own translations.

It is not that languages defer to others. It is that meanings are shared, and because they are shared it has to be clear that the same thing is intended. Doing it this way makes it apparent that they share the same meaning.

Thanks,
GerardM