Tuesday, March 07, 2006

If I had money that I could freely spend ...

If I could hire a programmer who would give me one extra piece of functionality, something that is not currently scheduled, what would I have him do? What if I had a budget for at least a few weeks of work ...

Interwiki links
I hate these things. They are always out of sync. Many people put a lot of hard work into getting them right, while the problem keeps getting bigger and is not solved at all. It feels like a waste. When a project grows, it has articles that need to be linked. As more and more Wikipedias grow, the number of articles that need updating grows rapidly. Small projects are not easily integrated. It costs a lot of resources.

With a centralised database, we could link an article to another article, and by inference the link would be known to all the articles that are already linked to either of them. As all the articles are about the same subject, we could check whether the article names are translations of each other. When they are, that is a basis for linking to the lexicological content of WiktionaryZ ...
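
To make the "by inference" part concrete, here is a minimal sketch of such a central store. It is only an illustration under my own assumptions: the class name InterwikiStore, its methods and the (language, title) pairs are hypothetical and are not part of MediaWiki. Articles about the same subject are kept in one group (a union-find structure), so a single new link is immediately visible from every article in that group.

```python
class InterwikiStore:
    """Hypothetical central store: articles on one subject share one group."""

    def __init__(self):
        self._parent = {}  # article -> parent article in its group

    def _find(self, article):
        """Return the representative article of the group."""
        self._parent.setdefault(article, article)
        while self._parent[article] != article:
            # Path halving keeps later lookups short.
            self._parent[article] = self._parent[self._parent[article]]
            article = self._parent[article]
        return article

    def link(self, a, b):
        """Record that articles a and b are about the same subject."""
        self._parent[self._find(a)] = self._find(b)

    def links_for(self, article):
        """All other articles known to be about the same subject."""
        root = self._find(article)
        return sorted(
            other for other in list(self._parent)
            if other != article and self._find(other) == root
        )


store = InterwikiStore()
store.link(("de", "Venus"), ("it", "Venere"))
store.link(("it", "Venere"), ("en", "Venus"))

# The English article was never linked to the German one directly,
# yet the store can infer that link.
print(store.links_for(("en", "Venus")))
```

Nobody has to edit the German article when the English one is added; the link follows from the group they share.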

Inflection boxes
When a verb, a noun or an adjective changes according to given rules, it makes sense to have inflection boxes. They are generated using templates on many wiktionaries, but it makes more sense to have software that allows us to build these boxes: software that associates the inflections of one language with the inflections of other languages for the purpose of translation.
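
As a rough sketch of what such software could do, assume an inflection box is simply a mapping from a grammatical feature to a word form. The function names and the toy rules below are my own illustration; they only cover regular nouns, and real languages need many more features and exceptions.

```python
def build_box(lemma, rules):
    """Fill an inflection box by applying one rule per grammatical feature."""
    return {feature: rule(lemma) for feature, rule in rules.items()}


def align(box_a, box_b):
    """Pair the forms of two boxes that share the same grammatical feature."""
    return {f: (box_a[f], box_b[f]) for f in box_a if f in box_b}


# Toy rules for regular nouns only (an assumption for this example).
EN_NOUN = {"singular": lambda w: w, "plural": lambda w: w + "s"}
NL_NOUN = {"singular": lambda w: w, "plural": lambda w: w + "en"}

english = build_box("evening", EN_NOUN)
dutch = build_box("avond", NL_NOUN)

# Each feature now links an English form to its Dutch counterpart.
pairs = align(english, dutch)
print(pairs)
```

The alignment step is the part that templates cannot give us: once two boxes use the same feature names, the inflected forms can be offered as translations of each other.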

Better support for tools like OmegaT
OmegaT is a CAT tool; it helps translators with their work. I would do two things to OmegaT. I would have it read directly from a MediaWiki wiki and, when the translation is finished, write it to another wiki. I would also have its translation glossary function make use of WiktionaryZ ...

Yes, there would be a quid pro quo: when a translator adds a word to the glossary, it would be fed back to WiktionaryZ.
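
A minimal sketch of that feedback step, under stated assumptions: I assume the glossary is a tab-separated text file with a source term, a target term and an optional comment per line, and that we already have the set of term pairs known to WiktionaryZ. The function names are hypothetical and no real OmegaT or WiktionaryZ interface is called here.

```python
import csv
import io


def parse_glossary(text):
    """Read (source term, target term) pairs from tab-separated glossary text."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines, comment lines and rows without a target term.
        if len(row) >= 2 and not row[0].startswith("#"):
            entries.append((row[0].strip(), row[1].strip()))
    return entries


def new_entries(glossary, known):
    """Pairs the translator added that the central store does not have yet."""
    return [pair for pair in glossary if pair not in known]


glossary_text = "Abend\tevening\tnoun\nabends\tin the evening\n"
known_pairs = {("Abend", "evening")}  # assumed to come from WiktionaryZ

to_feed_back = new_entries(parse_glossary(glossary_text), known_pairs)
print(to_feed_back)
```

Only the pairs in to_feed_back would need to be offered to WiktionaryZ; everything else came from there in the first place.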

Thanks,
GerardM

PS What would your suggestion be?

5 comments:

SabineWanner said...

One thing I would like is the possibility to align two texts, click on a term in the source language and on the corresponding term in the target language, and then send them to "glossary", "tbx" or "WiktionaryZ". Another (very similar) thing is to take two Wikipedia articles and see one language on the left side of the screen and the other on the right side. Example: Venus (German) and venere (Italian), and from there be able to search for corresponding terms and add them to WiktionaryZ as above.

Sabine

MovGP0 said...

Interwiki-Links:
Well, interwiki links are really awful. I'm not very worried that they are out of date; that's what bots are for. The bigger problem is that they mostly link to a term which is not a direct translation. This makes them very inconsistent, because you end up with links like:

Lang1:Term1 ↔ Lang2:Term1 ↔ Lang1:Term2

The solution I'm recommending is a centralised store for translations.
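
A small sketch of how a central store could detect the chain shown above. This is only an illustration with hypothetical function names: links are grouped into connected components, and a component is reported when it contains two different terms of the same language.

```python
from collections import defaultdict


def components(links):
    """Group (language, term) pairs that are connected by interwiki links."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def conflicts(links):
    """Report languages that appear with more than one term in a group."""
    result = []
    for group in components(links):
        by_lang = defaultdict(list)
        for lang, term in group:
            by_lang[lang].append(term)
        dup = {lang: sorted(t) for lang, t in by_lang.items() if len(t) > 1}
        if dup:
            result.append(dup)
    return result


links = [
    (("Lang1", "Term1"), ("Lang2", "Term1")),
    (("Lang2", "Term1"), ("Lang1", "Term2")),
]
print(conflicts(links))
```

A bot working from such a report could flag the group for a human instead of silently copying the wrong link to every wiki.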

Inflection boxes:
A good resource on this issue for the German language might be canoo.net.
Developing such a tool is really hard to do, even for experts in the given language. So a more realistic step might be to let the users develop semantic schemes describing the grammar of their language.

see also: Example on wiki.ontoworld.org

GerardM said...

When it comes to inflections etc, they are something that we will want for each language. To start off it does not need to be difficult; it needs to be no more than a box where the right inflection is put in its place.

This is something we already have. The next thing would be to link an inflection to translations. This will be more interesting...

MovGP0 said...

I see. A good example might be the German Abend, which has multiple meanings:
1) Evening
2) A family name
3) A location
where each has a different inflection. And there is also the lowercase form abends.

In French this word is easier, because only the first meaning has a proper translation: soir.

Note too that French is easier, because there is inflection only for singular and plural, for the word and its article. There aren't four grammatical cases for each.

I thing that trying to map the translations of inflections doesn't make much sense, because you can't translate each and everything one-to-one.

Instead it would make sense to extract the meaning of the world by getting the base form, extracting each additional piece of information (like number: singular/plural; tense: past/present/future; etc.) and then trying to describe the meaning in the target language.

Translation has to be done in such a way that the translation target does not carry more meaning than the translation source.
Source ≥ Meaning ≥ Target

The proper representation of the meaning is key. This can be done by using a very complex language that supports every grammatical feature of every language in the world, and doing the translation by "word reordering" within this language, which seems nearly impossible to me.

The other way is to take a very basic language like RDF/OWL, add some schemes for describing common constructs like time, and handle the complex part of translation in the translation to and from the meaning, separately for each language.
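
A toy sketch of this second approach, with everything in it assumed for illustration: the two small lexicons, the concept name EVENING and the function translate are hypothetical, and a plain Python dictionary stands in for an RDF/OWL description of the meaning. The source word is analysed into a concept plus features, and the target form is generated from only those features the target language marks, so the target never carries more meaning than the source.

```python
# Hypothetical analysis lexicon: surface form -> (concept, features).
ANALYSE = {
    "de": {
        "Abend": ("EVENING", {"number": "singular", "case": "nominative"}),
        "Abenden": ("EVENING", {"number": "plural", "case": "dative"}),
    },
}

# Features each target language marks on a noun (assumed for this example).
TARGET_FEATURES = {"fr": ("number",)}

# Hypothetical generation lexicon: (concept, feature values...) -> form.
GENERATE = {
    "fr": {
        ("EVENING", "singular"): "soir",
        ("EVENING", "plural"): "soirs",
    },
}


def translate(word, source, target):
    """Go from source form to meaning, then from meaning to target form."""
    concept, features = ANALYSE[source][word]
    # Keep only what the target language can express; the rest is dropped.
    key = (concept,) + tuple(features[f] for f in TARGET_FEATURES[target])
    return GENERATE[target][key]


print(translate("Abenden", "de", "fr"))
```

The German case information is lost on the way to French, which is fine: Source ≥ Meaning ≥ Target still holds. Going the other way, the generator would need extra context to choose a case, and that is exactly the per-language complexity described above.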

MovGP0 said...

Sorry for the typing mistakes, but correcting is hard without the possibility of editing. And I'm too lazy to write the whole thing again. :-)

most important Corrections:
* "meaning of the world" -> "meaning of the word"
* "I thing" -> "I think"