Monday, November 14, 2005

The Century Dictionary

The Century Dictionary was a wonderful resource in its time, and even though it has aged, it still is one. It gives a great impression of what dictionaries were like at the end of the 19th and the beginning of the 20th century (1889-1910).

It is wonderful that so much effort went into making it available in this digital day and age. It is advertised as the biggest on-line English dictionary on the Internet; with more than 500,000 definitions, it may be just that.

The Wikimedia Foundation was asked for advice on how this splendid resource could be modernized and updated. I feel privileged to have been asked for an opinion. As I think highly of resources like the Century Dictionary, I would at most convert the digitized content, and only when this improves the usability of the data. As I value the Century Dictionary for what it is, I would definitely maintain the data as is.

This does not mean that all this lexicological information cannot be used to build a modern dictionary. This can be done in many ways. An important consideration is that the data of the Century Dictionary is firmly in the public domain. This means that any existing project that works on building a dictionary can and may use this data.

I would not mind including the data of the Century Dictionary in the Ultimate Wiktionary. It would prove a challenge to fit it into what has always been envisioned as a modern dictionary. Then again, the Ultimate Wiktionary is also meant to be inclusive. So when the opportunity comes to include whole dictionaries, I am sure we will find a way to do it that makes sense for our users as well.

The conversion of the Century Dictionary will be a lot of work. However, the skills needed for such a project are taught at universities for many professions. I could therefore see students taking on such a conversion as a term project.

Thanks,
GerardM

3 comments:

Anonymous said...

It occurs to me that such a dictionary project would be perfect for Project Gutenberg Distributed Proofreaders (www.pgdp.net).

The Distributed Proofreaders Project is a collaborative project in support of Project Gutenberg that uses the distributed nature of the Internet to allow many people to separately proofread individual scanned pages of a book and speed up the digitizing of public domain books. There are many projects underway, including the entire 1911EB and several dictionaries ("A Dictionaire of the French and English Tongves (1611)", "An Etymological Dictionary of the Scottish Language", "Dictionary English - Spanish - Tagalog", "Dictionnaire Argot-Français").

Unfortunately, it appears that global-language.com is claiming copyright on their page images, which probably means Distributed Proofreaders or Project Gutenberg cannot use these particular scans.

GerardM said...

You cannot copyright what is already in the Public Domain. Given the nature of what has been digitized and how it is published, I doubt that any court would see it otherwise.

What the Century Dictionary needs is some data-mining wizard working his magic on the content. This would allow the data to be structured and converted for the Ultimate Wiktionary.

If it turns out that major proofreading is needed, that would be the reason for a community effort.

Thanks,
GerardM

Anonymous said...

I sincerely hope that we may include data from the Century Dictionary in the Ultimate Wiktionary. It's a great resource, but the way the content is presented right now makes it simply unusable. The print is too small and hard on the eyes, and the search functions are very restricted.

But the etymology provided for each word is really excellent. :-)