Friday, February 27, 2015

Harvest #Wikipedia categories

For many categories it is obvious what they should include. The members of "Indiana State Senators", for instance, will all be human. So when we know that an article is about a human, we can safely deduce that this person holds or held the position of "member of the Indiana State Senate". We can do similar things for alumni or members of a sports team.
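A minimal sketch of that deduction in Python. P31 (instance of), P39 (position held) and Q5 (human) are real Wikidata identifiers; the rule table, the function name and the item ids starting with `Q_` are placeholders made up for this illustration, and nothing here talks to Wikidata itself.

```python
from typing import NamedTuple

HUMAN = "Q5"  # Wikidata item for "human"


class Rule(NamedTuple):
    prop: str    # property to add, e.g. P39 "position held"
    target: str  # value of the statement


# Hypothetical rule table: category name -> statement implied by membership.
# "Q_INDIANA_STATE_SENATOR" is a placeholder, not a real item id.
CATEGORY_RULES = {
    "Indiana State Senators": Rule("P39", "Q_INDIANA_STATE_SENATOR"),
}


def harvest(category, members, instance_of):
    """Return (item, property, value) triples for the human members only.

    `instance_of` maps an item id to the set of its P31 values.
    """
    rule = CATEGORY_RULES.get(category)
    if rule is None:
        return []
    return [
        (item, rule.prop, rule.target)
        for item in members
        if HUMAN in instance_of.get(item, set())
    ]


if __name__ == "__main__":
    known = {"Q_A": {"Q5"}, "Q_B": {"Q_SOME_LIST"}}
    print(harvest("Indiana State Senators", ["Q_A", "Q_B"], known))
```

The check on "human" is what keeps list articles and the like, which also sit in such categories, from getting the statement.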

When we harvest such categories regularly, Wikidata will become more inclusive than any Wikipedia. This is because we can harvest similar categories from any Wikipedia.

We can, and we should, harvest Wikipedia categories regularly. It will enrich Wikidata and we will become more aware of the full scope of the information held in all the Wikimedia wikis.

Thursday, February 26, 2015

#Wikimedia #Labs - Risk analysis

Labs is a wonderful and successful project; more virtual machines are added all the time. More data is produced all the time and more people rely on it all the time.

Sounds good? It is!

From a management point of view it becomes increasingly problematic: for many of the most valuable Wikimedians Labs has become a production resource and, as it is growing really quickly, it easily escapes the boundaries set earlier. Staffing, hardware: it could all be better and it should all be better.

Having the best possible Labs will grow Labs even more. The best will outwit and outperform expectations. Classical budget thinking is a disservice to what we may achieve: sharing more information as widely as possible. One approach is to maintain a risk analysis of the services provided by Labs. It will help management to manage, to think and to use funds when the need and the justification are bigger than the budget.

Today new virtual machines were started that produce ZIM files based on the latest dumps. This will improve off-line reading of our projects a lot. In future the ZIM files will be, and will remain, fresh.

This is just one day in the life of Labs...

Tuesday, February 24, 2015

#Wikidata - #Alumni by university or college in #India

As one of the most populous countries in the world, it is no surprise that India has many universities. The alumni of Indian universities or colleges can be found through a Wikipedia category.

Whenever tools are down I have been adding these alumni to Wikidata. It seems obvious that not all universities and colleges are represented. It is certain that many alumni cannot be found in these categories. This is because there may be no article about them or they have not been included in the category.

It is relatively easy to do this for India given that English is the main language for subjects about India. For China, Russia and Japan it is not so easy. Someone else has to get involved as I do not know the languages.

All of Labs is down again. So this time my customary hyperlinks are sadly absent.

Saturday, February 21, 2015

#Wikidata - the Stern–Gerlach Medal

The Stern–Gerlach Medal is one of many awards Wikidata knows about. Information is often available in a list within the articles. In some languages there are links to all those who received the award.

Having all the awards and all the people who received them in Wikidata is a massive undertaking. It can be argued that everyone who received an award has some notability.

Some people think that awards are not that important to categorise. Their way of thinking means that awards specifically relevant within a culture or a language become underrepresented. This is however an effect that diminishes in time.

It would be good if these lists were available for Wikipedias to use. When such lists become a service from Wikidata, it is easy to provide minimal information for the people that do not have an article yet. For best results it helps when all the associated labels are available.

Thursday, February 19, 2015

Where people died; a perspective on diversity

A wonderful new view is available thanks to Vizidata. It shows where people were born and where people died. The data is from a Wikidata dump so it is sadly static. Given that it is from Wikidata, you can safely assume that the data also exists in a Wikipedia ...

Italy is well pronounced in this view. It is because a lot of effort went into extracting data from the Italian Wikipedia. It follows that all the people the Italians care for are included as well. The fun thing about a view like this is that it is a historic view of what Wikidata covers and does not cover.

Apparently hardly anyone died in Africa in all the centuries.

Monday, February 16, 2015

25.000 books, old books

When 25.000 books, books from the early days, English texts from 1473-1700, become available it is quite something. Many of these texts are the earliest sources on many subjects in English.

All of them deserve to be registered in Wikidata. The most relevant question would be: how do we serve our public best? Yes, it starts with indicating that these books exist but it is easy enough to point people in the right direction. The direction where these books can be found to be read.

It seems obvious. When books are (finally) available under a free license, it is important for people to find them.

Rafik Tlili, member of the Constituent Assembly of #Tunisia

Mr Tlili is a Tunisian politician who died. What is refreshing is that there is at least one decent list of members of the current parliament and, as is fitting, it is in French. Without the assistance of Google Translate the articles are too difficult for me.

There is also a category, and it has a problem. It links the current members in French to every member of the Tunisian parliament. From a Wikidata point of view that is fatally flawed. It is however part and parcel of a category of subjects that is underdeveloped. Our Wikiverse does not really care about Farfarawayistan. Its diversity problem is seen as one of gender and, while that is important, it easily ignores what is far, far away. As you can see in the picture, there are a fair bunch of women in the Tunisian parliament.

Even people who do research are interested in diversity. They want to know how diversity differs in different languages. Those different languages mean different cultures, cultures that by and large are not really well known in our Wikipedias as they are far, far away. Consequently Wikidata does not serve them the data they need.

I am happy with the Tunisian list. It means that Tunisia is no longer as far, far away.

Saturday, February 14, 2015

CC-BY-SA; Creative Commons needs our support

The CC-BY-SA licenses are crucial to the Wikimedia movement. All but one of the Wikimedia Foundation projects are licensed with CC-BY-SA; Wikidata being the exception.

I find it astounding to learn that Creative Commons is in dire financial straits. As Wikimedians we are part of a world that is shaped by copyright law and the fight for free and fair licenses. When a crucial player like Creative Commons cannot take its role, it shows our weakness. It indicates that we are fighting a losing battle because our priorities are wrong.

Creative Commons deserves our support. We rely on Creative Commons.

It is one of those organisations that the WMF could do something special for. For instance a fund raiser on their behalf. <grin> WMF is good at that </grin> In this way we commit ourselves more to free and fair licenses.

Tuesday, February 10, 2015

#Wikidata - Tokyo University of the Arts alumni

Mr Kenji Ekuan died. He was a Japanese designer who studied at the Tokyo University of the Arts. As can be expected, he was not the only one who studied there. There are categories in several Wikipedias informing us about them. The category on the Japanese Wikipedia includes 642 alumni and for many of them there is no label in other languages.

It is no surprise either that these people refer to many items that have a label in Japanese and not in the languages people are familiar with. The automated description for Mr Munemoto Yanagi in Dutch for instance is "kunsthistoricus (*1917); Mainichi-Kulturpreis; kind van 柳宗悦 en 柳兼子 ♂". As more labels become available in Dutch, this information becomes easier to comprehend.

With every label that is added, all the associated descriptions are improved. Every item will be easier to understand in Reasonator as well. Adding labels in Reasonator will provide you with instant gratification. Every statement of that item will show the new label.

Monday, February 09, 2015

#Wikidata - automated #descriptions are GIGO

The great thing about automated descriptions of Wikidata items is that they reflect what is there. The consequence is that when it is not good, it will show.

I discussed automated descriptions with my friend Amir and he pointed out to me that "East Jerusalem" is not a "world heritage site". He is absolutely right; world heritage site is a qualification of whatever it is that has been recognised in this way.

Amir also pointed out that whenever labels are missing, an automated description will give you something in another language. This is not a bug, it is a feature. All it takes is for someone to add the missing label and wherever it is used, the label will show in the "current" language in future. One alternative is NOT showing statements when they have no labels. Another alternative is to make it configurable whether you want as much information as possible or only information in the current language.
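The fallback and its configurable alternative can be sketched in a few lines of Python. This is an illustration under assumptions, not how the actual tool works: the function names and the `allow_fallback` switch are made up here, and "first label found in any language" is only one possible fallback order.

```python
def pick_label(labels, lang, allow_fallback=True):
    """Return the label in `lang`, else any other label, else None.

    `labels` maps a language code to a label for one item.
    """
    if lang in labels:
        return labels[lang]
    if allow_fallback and labels:
        # Assumption: take the first label we have, whatever its language.
        return next(iter(labels.values()))
    return None


def describe(parts, lang, allow_fallback=True):
    """Join the labels of all parts; parts without a usable label are skipped."""
    shown = [pick_label(labels, lang, allow_fallback) for labels in parts]
    return "; ".join(label for label in shown if label is not None)


if __name__ == "__main__":
    parts = [
        {"nl": "kunsthistoricus", "en": "art historian"},
        {"ja": "柳宗悦"},  # no Dutch label yet
    ]
    print(describe(parts, "nl"))                        # falls back to Japanese
    print(describe(parts, "nl", allow_fallback=False))  # current language only
    parts[1]["nl"] = "Yanagi Sōetsu"                    # someone adds the label
    print(describe(parts, "nl"))
```

With the fallback on you see everything there is to see; with it off you only see what exists in your own language. Adding the one missing label makes both modes give the same result.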

The point about automated descriptions stands; adding one statement may impact descriptions in all languages. Adding one label will impact all items that include the associated statement for that language.

Obviously an automated description is superior to no description. Arguably automated descriptions are superior to manual descriptions because they reflect the item in any and all languages.

Sunday, February 08, 2015

#Wikidata - George Muchai member of the Kenyan parliament

Mr Muchai was assassinated. For parliamentarians to be killed in this way is shocking. It is not important if you agree with a politician, a journalist, anyone. You do not kill them; you argue their points, maybe.

The death of Mr Muchai was reported in the international press. Wikipedia does not have an article. When someone finds it relevant enough they may write an article about him. They may call it "the assassination of George Muchai". At least there is already a Wikidata item for him.

Saturday, February 07, 2015

#Wikidata descriptions are not best of breed

In a Wikimedia mobile application they are going to use Wikidata descriptions. I really wonder why they chose to do this because it is easy to recognise why it is substandard.

Take Mr Robert E. Hanson for instance. According to the description he is an "American politician", the description exists only in English and it does not recognise that he died.

Take Mrs Татьяна Михайловна Лебедева for instance; she died as well and there is not even a description.

At the same time there are descriptions available for both of them in any language based on the availability of statements and labels. For Mr Hanson it is "Amerikaans politicus (1947–2015) ♂" in Dutch and for Mrs Лебедева it is "Mens (1944–2015) ♀". These descriptions could easily have been in Arabic, Chinese, Zulu, Javanese or any of the other languages the WMF supports.

It has been obvious for a long time that descriptions are flawed and will not be useful in most languages supported by the WMF. It has been equally well known that automated descriptions are more useful in more languages. The developers may have created a best of breed solution but it is still very much a dog.
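To show how little is needed for such a description, here is a rough Python sketch that reproduces the format of the two Dutch examples from a label, two years and a gender. The function and its parameters are made up for this illustration; the real automated descriptions may well be built differently.

```python
GENDER_SYMBOL = {"male": "♂", "female": "♀"}


def auto_description(what, born=None, died=None, gender=None):
    """Compose '<what> (<born>–<died>) <symbol>' from what is available.

    `what` is the label, in the reader's language, of the occupation or,
    when there is none, of the class of the item (for example "Mens").
    """
    parts = [what]
    if born is not None and died is not None:
        parts.append(f"({born}–{died})")
    elif born is not None:
        parts.append(f"(*{born})")  # living person: year of birth only
    symbol = GENDER_SYMBOL.get(gender)
    if symbol:
        parts.append(symbol)
    return " ".join(parts)


if __name__ == "__main__":
    print(auto_description("Amerikaans politicus", 1947, 2015, "male"))
    print(auto_description("Mens", 1944, 2015, "female"))
```

Only `what` is language dependent. Swap in the label of another language and the same statements give a description in that language, with the date of death included.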

Sunday, February 01, 2015

#Wikidata - when qualifications fail

Mr Diego Betancur Álvarez is the son of a former Colombian president. He is the ambassador for Colombia to Australia and New Zealand. Being ambassador is a position that a diplomat may hold. It is similar to president of the United States, which is a position for a politician.

It is quite normal that people do not understand such qualifications. It is however the essence of what Wikidata is about.

It is easy enough to make any and all humans who are known as ambassador a professional diplomat. It is easy enough to give them the position of ambassador as well. It just needs to be done.

Another approach would be to have one item "ambassador to Australia", and one like it for every other country. The alternative is to rely on qualifiers.
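A small Python sketch of the qualifier approach, assuming a simplified statement model. The `Statement` class, the plain-text property names and the "to" qualifier are hypothetical stand-ins chosen for readability; they are not actual Wikidata properties.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


# One generic "ambassador" position, qualified per country.
STATEMENTS = [
    Statement("Diego Betancur Álvarez", "occupation", "diplomat"),
    Statement("Diego Betancur Álvarez", "position held", "ambassador",
              {"to": "Australia"}),
    Statement("Diego Betancur Álvarez", "position held", "ambassador",
              {"to": "New Zealand"}),
]


def ambassadors_to(statements, country):
    """Everyone holding the generic position with the matching qualifier."""
    return sorted({
        s.subject
        for s in statements
        if s.prop == "position held"
        and s.value == "ambassador"
        and s.qualifiers.get("to") == country
    })


if __name__ == "__main__":
    print(ambassadors_to(STATEMENTS, "Australia"))
```

With qualifiers one generic position serves every country; with a dedicated item per country the query gets simpler but the number of items grows with every country pair.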

PS Mr Betancur is very much alive.