Tuesday, July 26, 2016

Have #Wikipedia share the sum of available #knowledge

If Wikipedia is to succeed in sharing the sum of all knowledge, it has to first share the sum of available knowledge. To do this Wikipedians have to become more inclusive. They have to realise that Wikipedia is not about them but about its readers.

Typically the question "What do readers want" is answered by what readers find. This answer has one flaw. It assumes that Wikipedia includes what people seek and it forgets what people seek and do not find. This is a lost opportunity on many levels. To start with, Wikipedia is not singular and a subject may exist in another language. As we do not know what is missed, we do not know what to write to satisfy an existing demand. Finally more and more available information does not even have a Wikipedia article but its information is available in other projects.

A partial solution to these issues has been around for a long time. It extends search by adding results from Wikidata. It allows you to find data in any script from any project. When there is no article, it shows information using the Reasonator. It is relatively easy to revive this and it will make even more sense when its results are counted as positive results.

Once Wikipedians consider Wikidata as a tool, they will find that both red links and wiki links may link to Wikidata items. Typically they are the same links for the same subject in any language. This is relevant to editors because it is one way to clarify what links exist to an article, and it is only one step further to annotate them as statements in Wikidata and thereby document such links. They will find a lot of erroneous links and fixing them will improve overall quality.

The good news is that the links between wiki links and Wikidata items already exist. What is lacking is a verification process that these wiki links are good. Adding links to statements for red links is technically not that hard. It will add some turmoil at the Wikidata end; many items will be added and will have to be merged eventually. One benefit of this approach is that it is not necessary for everyone to collaborate, yet it will benefit the people that matter most: all the readers of all the Wikipedias.

Saturday, July 23, 2016

#Wikipedia - #GMO controversy as a red herring

#Wikipedia has/had a big discussion on the safety of GMO food. When you read what the Signpost has to say, it is only about whether it is safe for people to eat this stuff.

The problem is that many promises have been made and this is only one issue, not even the most relevant issue. Read the article "20 years of failure" by Greenpeace or read its rebuttal to what some Nobel Prize winners had to say.

The question if it is safe to eat is only one. The question if it will do us any good is more relevant. It does not bring us a more reliable food supply. It will not bring us more resiliency against climate change and it is very much in doubt that "golden rice" actually brings additional vitamins while a balanced diet does.

The important point of Greenpeace is that it backs its assertions with science. It is not in it for the money and its aim? A world that we can live in.

Monday, July 18, 2016

#Wikipedia - Dr Mary Meeker and SOI testing

Mrs Meeker and her husband Robert Meeker worked on a system used in education. She is known for applying Guilford's Structure of Intellect theory ("SI") to creating assessments and curriculum materials for use in teaching children and adults. The premise of SI is that intelligence comprises many underlying mental abilities or factors, organized along three dimensions—Operations (e.g., comprehension), Content (e.g., semantic), and Products (e.g., relations). When you are interested, read the article.

The article compared her work to the debunked Myers Briggs Type Indicator. This is something we should not do. The article on Mrs Laurie Helgoe provides all the arguments needed to restrict information on that indicator to that article. It is not best practice to use tools that are ambiguous in their results, and therefore using it in a comparison is not in the interest of our readers.

Sunday, July 17, 2016

#Wikipedia - notability of Mrs Laurie Helgoe

When popular knowledge gets debunked, it makes for notability. Mrs Helgoe debunked the Myers Briggs Type Indicator. It is used a lot even by those who should know better to classify human personality traits.

It is quite something when research shows how wrong popular methods are. Instead of representing 25-30% of the population, introverts make up 57% of the population. It means that Myers Briggs is off by roughly 100%.

The critique of the article on Mrs Helgoe is that it is an orphan; no articles link to it. Having read the article, it is more valid to find fault with the Myers Briggs article; it does state that the method is not valid but it more or less glosses over that fact.

The problem with the Myers Briggs article is that it attempts to explain the method used, a method that is invalid.

#Reasonator - the perspective on #Wikidata people do not get

#Wikidata is where Wikimedia data lives. It started with a big service to Wikipedia; it centralised its interwiki data and this was a huge step forward in its quality. There is still a lot of work being done on improving it even further because many of the remaining problems need a different perspective.

The next official challenge is to provide data to infoboxes. This problem is utterly different from the challenge of replacing interwiki links. It is impossible to import all the data from infoboxes all at once and start improving. The quality of the data in infoboxes is worse but that is not the problem.

So people have imported oodles of data and the quality is as expected: poor but improving. One problem is that all the work is happening at Wikidata and it does not transfer to Wikipedia. There is not even an official way to have a good look at the data available at Wikidata. The unofficial tool is Reasonator; it is currently broken and that is why I am reflecting.

Reasonator provides an intelligible perspective on the data of an item. It makes many problems "obvious". It shows imported statements and it shows all the references to the item that is shown. It allows you to see all (with a maximum of 500) statements that share common properties.

With a functional Reasonator, many people work on data from Wikipedia with a Wikidata perspective. When Wikidata is to fulfil its promise of improving the quality of data of Wikipedia considerably, the first thing to do is change objectives and perspective. The perspective could be Wikipedia based and the objective is not replacing data in infoboxes but quality. The good thing is that it is actually possible to achieve this.

A few observations: all wikilinks are in effect links between Wikidata items. Many of the links indicate that an article "needs" to be in a category and consequently this can be automated.

Why do this? When people look at all the wikilinks with a Wikidata perspective, it will make a lot of faulty links obvious. A painter of the 16th century did not receive a 20th century award, for instance. Quality will improve. As more statements and possibly items are created, it will affect every article about the same and related topics.
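The painter example can be sketched as a simple plausibility check. This is only an illustration in Python on made-up data; the names, years and the function `implausible_awards` are assumptions, not something Reasonator or Wikidata provides. It flags award statements dated outside a person's lifetime; posthumous awards do exist, so a flag is a reason to look, not proof of an error.

```python
def implausible_awards(lifespans, award_statements):
    """Return award statements dated outside the recipient's lifetime.

    lifespans: dict mapping a name to (birth_year, death_year or None).
    award_statements: list of (name, award, year) tuples.
    All data here is hypothetical and only illustrates the idea.
    """
    flagged = []
    for name, award, year in award_statements:
        birth, death = lifespans[name]
        too_early = year < birth
        too_late = death is not None and year > death
        if too_early or too_late:
            flagged.append((name, award, year))
    return flagged


# Hypothetical example data: a 16th century painter and a living author.
lifespans = {
    "Example Painter": (1510, 1570),
    "Example Author": (1950, None),
}
award_statements = [
    ("Example Painter", "Some 20th century award", 1975),
    ("Example Author", "Some literary prize", 2012),
]

suspicious = implausible_awards(lifespans, award_statements)
```

Here only the statement for the painter ends up in `suspicious`; the link probably points to a namesake.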

It needs only one thing: a Reasonator-like view of the data from a Wikipedia point of view.

Thursday, July 14, 2016

#Wikidata - Virginia Berninger; Samuel Torrey Orton award 2015

The Samuel Torrey Orton award is conferred by the International Dyslexia Association. It is named after Samuel Orton who was a pioneer in the field of dyslexia.

Mrs Berninger was added to Wikidata because she is the 2015 recipient of the award. It is my intent that Wikidata slowly but surely knows about the more recent award winners, one at a time. It so happened that two of my projects intersected; adding information about female psychologists and awards. Mrs Margaret J. Snowling received the award and this bit of data was added.

My notion of quality for Wikidata is that items need their statements and that more links are better. This allows for all kinds of statements: linking awards to the conferring organisation, the website of an award or an organisation, other awardees.

The funny thing is that adding Mrs Berninger may encourage Wikipedians to write an article about her or at least add her to the list of award winners :)

Monday, July 11, 2016

#Wikidata - Margaret D. Foster - a #female #scientist

I was asked to blog about Mrs Foster. The argument was: "This article missed all the points why Mrs Foster is notable". One great feature of the improved article is a picture that was lovingly restored by Adam Cuerden.

Well, to be honest, I remember a presentation by Rosie Stephenson-Goodknight where she argued that the first step to get some gender balance is to write an article warts and all. It does not have to be perfect; the least it can do is be there and invite scorn and improvements.

This sentiment is part of the original Wikipedia ethos; it is good to have stubs and red links. It is good to have a start to improve upon. In this sentimental spirit I improved the data on Mrs Foster on Wikidata a bit. I used Autolist to add the content of a few categories and I added some universities she had attended.

So yes, the article has improved and it is exactly why both Wikipedia and Rosie are a success.

Saturday, July 09, 2016

#Wikidata - Bródy Sándor-díj

When #Wikidata has really succeeded, it includes all the data of all the Wikipedias. The Sándor Bródy Prize is known on three Wikipedias and it is reasonable that the Hungarian Wikipedia has the most information.

The last known winner, Gábor Kálmán, won the prize in 2012. Currently it is a red link. There is no information about who won in later years and my Hungarian is not enough to find out more if the prize was conferred.

All this transpired from a recent idea that in order to improve the quality of Wikidata for awards, we should add all the winners of awards for 2015. Lydia suggested asking on Twitter for a query and both Magnus and Wikidatafacts provided a SPARQL query. For the Sándor Bródy Prize no winners were known; this was remedied with the "Linked Items" tool. As the objective was to only add the last winner, Mr Kálmán and the date for 2012 were added. There are some 13,881 awards known without a 2015 winner.
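For those who want to try something similar, here is a rough sketch in Python that only builds the text of such a query. It is not the query Magnus or Wikidatafacts provided. It assumes that P166 is "award received", P585 is "point in time" and Q618779 is "award"; the function name is made up for this example. A query this broad may well time out on the public query service, so treat it as a starting point.

```python
def awards_without_winner_query(year: int) -> str:
    """Build a SPARQL query for awards that have no recipient in `year`.

    Assumed identifiers: P166 (award received), P585 (point in time),
    Q618779 (award). This is a sketch, not the query from the post.
    """
    return f"""
SELECT DISTINCT ?award WHERE {{
  # every item that is an award, or an instance of a subclass of award
  ?award wdt:P31/wdt:P279* wd:Q618779 .
  # keep it only when nobody has an "award received" statement
  # for this award that is dated in the requested year
  FILTER NOT EXISTS {{
    ?person p:P166 ?statement .
    ?statement ps:P166 ?award ;
               pq:P585 ?date .
    FILTER(YEAR(?date) = {year})
  }}
}}
"""


query_2015 = awards_without_winner_query(2015)
```

The resulting text can be pasted into the Wikidata Query Service; changing the year shows where the data for other years is missing.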

The objective for the Sándor Bródy Prize has not been achieved. However, the quality of the data has improved considerably. To make it as good as the information on the Hungarian Wikipedia, dates have to be added and two items have to be added to fill in for existing red links.

The point of all this is that it is possible to quantify a lack of data in Wikidata and by inference a lack of quality. As time goes by, people can use these queries as a tool to make improvements or people will just add data and as a consequence the quality will improve. Either way it is obvious that it takes time and effort to get the desired quality. However on a micro level, it is possible for Wikidata to be better than any of the other projects because its data for a specific award is better. For the Sándor Bródy Prize all it takes is two items and a few dates.

Friday, July 08, 2016

#Wikidata - making a statement

#Statistics are powerful particularly when they tell the whole story like the ones produced by Magnus. They are a set of statistics and they indicate the progress of Wikidata. The most relevant statistics are included; they indicate the number of statements, the number of labels and the number of links over time.

There is more to the statistics. For some people references, or rather the lack of references, are the reason to oppose the use of Wikidata. Especially for them there are five statistics that indicate the progress made. The good news is that more and more items have referenced statements (62.64%). This growth can be understood because a lot of effort has been going into providing tools to add sources and many people do add them.

Improving the quality of Wikidata is complex. There are many factors that make a difference. Personally I care most about rich annotations with statements for each and every item. Others care more about references. It is important that improvements are made in every way. This is why Wikidata becomes increasingly relevant and why new ways open up to improve its quality even further.

As Wikidata matures, its quality becomes increasingly obvious. When more Wikimedia projects use its data, it will grow the number of people who are involved and Wikidata will evolve into a rich and trustworthy source of data.

Thursday, July 07, 2016

#Wikimedia chapters

In our movement, much of the local effort is channeled through chapters. A lot of important work gets done because they provide continuity to the work that we do. They enable us to foster relations and organise the more complicated activities.

The map shows beautifully where national chapters exist. In the USA chapters do exist but the map does not acknowledge this.

Typically I am content with the information that is shown through the Reasonator; it has its limits, however. This is where the lists maintained through ListeriaBot become relevant. This list of chapters shows additional columns like "country" and "start date". It enables sorting and this is relevant functionality. The same list may exist on a Wiki supporting a different language, for instance Dutch. The cool thing is that the same bot will at some stage update all these "similar" lists. It is just a matter of completing the data for better use.

Tuesday, July 05, 2016

#Wikimedia talks - The long tail

Many talks about Wikimedia products and issues are given at Wikimania. However, Wikimania is not the only place where Wikimedia is the big thing; there are also the conferences held by chapters. Many chapters, like the Dutch chapter, have a tradition of recurring conferences.

Such presentations are relevant. What was presented was the current state of affairs by the "thought leaders" at the time. Many presentations have not been recorded on video or audio but quite often the slides or a paper is available somewhere.

It is easy enough to add these conferences, these talks to Wikidata. It is relevant because it allows for lists that can be generated with a bot like ListeriaBot, it enables people to find the presentations and when they find it interesting they can even see what is available. What it also does is link presentations to the people involved. At some stage you may find how often Lydia presented and where. <grin> at this time, only once </grin>.

Monday, July 04, 2016

#Wikimania - What have we learned, how to experience it

We have not all been to Esino Lario. It is where Wikimania 2016 happened. That does not mean that it is not possible to see many of the presentations. You may find them on YouTube, maybe elsewhere. The same goes for presentations of previous Wikimanias.

In the history of Wikipedia and our movement, these presentations are notable. It even makes the people who presented notable. As I have been watching several presentations, and as I argued that we are really bad at recognising our own notability, I have created a list; it links to lists of presentations for previous Wikimanias. The cool thing is that they are updated regularly by the ListeriaBot. So when I or someone else adds a Wikimania talk, the talk will magically appear.

What you find is rather basic but it works. You will be linked to the presentation on YouTube; you will be linked to the items for the "author", the language used for the presentation and the talk itself. At this stage Wikidata, like Commons, is mostly a repository. For best effect use Reasonator. Compare Wikidata with Reasonator for the presentation of James Heilman for instance.

I am really happy to have been helped by TweetsFactsAndQueries. He helped a lot in getting the SPARQL queries in a reasonable shape. He figured out how to show only the YouTube video ID instead of the full URL. It is probably possible to show an icon instead but that is for later. What is missing are links to the presentation (the PowerPoint) and the submission paper. I have no idea yet how that is to be modelled in Wikidata.
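The idea of showing only the video ID can also be illustrated outside SPARQL. The sketch below is in Python and is not what TweetsFactsAndQueries wrote; the function name and the example ID are made up. It assumes the two common URL shapes, `youtube.com/watch?v=...` and `youtu.be/...`, and returns nothing for anything else.

```python
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url):
    """Return the video ID from a YouTube URL, or None when there is none.

    Handles the two common shapes only (an assumption of this sketch):
    https://www.youtube.com/watch?v=<id> and https://youtu.be/<id>.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in ("youtu.be", "www.youtu.be"):
        # short links carry the ID as the path
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # regular links carry the ID in the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


# "abc123XYZ_-" is a made-up ID, not a real talk.
video_id = youtube_video_id("https://www.youtube.com/watch?v=abc123XYZ_-")
```

Storing only the ID keeps the statement short and lets a tool such as Reasonator decide how to present the video.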

It is important to include such data for several reasons. First it brings access to the presentations for the people who are interested. Secondly it documents what we do and puts a timestamp on our thinking. Thirdly it documents our history.

Sunday, July 03, 2016

#Wikimania - James Heilman on #quality and #language support

James Heilman presented at Wikimania 2016. I have not been to Wikimania and even the people who went to Wikimania may not have listened in. James had a lot to say about quality and the power of having information in "other languages". The talk is powerful and the arguments are compelling. At this time the talk has only 48 views on YouTube.

Saturday, July 02, 2016

#Wikimania and #Wikidata #dogfooding

Being critical is one thing; showing how a difference can be made one item at a time is how to make a difference. At the latest Wikimania many interesting and relevant presentations were made. So far Wikimania talks were missing at Wikidata and as I have been watching several talks, I have added these talks largely using the TED talks as a model.

What you find are only some of the talks. It is easy to add more, and by adding the YouTube video ID it is possible to see the presentations directly from the Reasonator.

So when you want to add your presentation, it is easy.

#Wikimedia - #notadog and #notmemberofthepack

It is controversial when you tell the #Wikipedia crowd that they do not look after their own. I am not a Wikipedian, I do not want to be as my experiences are not that positive. It is not that I do not care deeply about free content and the opportunities the Wikimedia Foundation offers.

My point is simple; as a community we seem to be only interested in the production and maintenance of content and not really in the quality of the reader experience / the consumption of all the good stuff that exists. I will support it with a number of examples.

Wikimania is the annual Wikimedia conference and it boasts many high grade presentations important for the understanding of the past and the history of what we do. For this reason it makes sense to do a thorough job and include all the presentations in Wikidata so that we provide the same opportunity to explore this content as we do for TED conferences and presentations. When we do, we will honour the many Wikimedians who presented in the past. This may prove controversial because of the many conflicting notions of notability.

Wikisource is a "Wikimedia project, an online digital library of free content textual sources on a wiki". For readers there is one vital problem. Much of it consists of works in progress; some need a finishing touch, others still need a lot of tender loving care. With a different approach to finished goods, and it is easy enough to know the status of sources, a clean user interface can be delivered to potential readers, expanding the reach of the work that is done.

Commons is an "online repository of free-use images, sound and other media files, part of the Wikimedia Foundation". As a repository it functions really well. Large numbers of media files are deposited and used within the Wikimedia projects but my experience is that when you seek an image, it is really hard to find one. The category system is hard to navigate, there are no labels attached to images that help finding relevant content and it is English only. Finding something among 32,289,013 files is really hard. For now I have given up on Commons.

Being critical of Wikipedia is frowned upon. A typical response is "so fix it" but when solutions are offered that improve its quality, typically a suggestion falls on deaf ears. It is easy enough to improve the functionality of red links, but this idea is probably too mundane to consider even though it has been proven to be easy to implement.

People fault me for being blunt. To some extent it is part of the culture I grew up in; to some extent it is because I have lost faith in the "community". My experience is that there is too much groupthink and I am definitely not a member of the pack. I prefer to make up my own mind; I articulate my opinions and arguments and do not care too much when people react negatively without considering the arguments.