The Wikimedia Foundation is building a database of knowledge called Wikidata, one that will be usable and editable by the general public.

The Wikimedia Foundation has not launched a new project since 2006. That changed last month, when the organization announced its latest effort, Wikidata, at the Semantic Tech & Business Conference in Berlin.

Wikidata’s purpose is to collect a database of knowledge that can be read and edited not only by humans but by machines as well. This isn’t the first attempt to build a giant, machine-readable database of human knowledge, however. An earlier project, DBpedia, extracts data from Wikipedia and makes it available online. The difference with Wikidata is that its data won’t just be extracted and published — it will be directly editable by anyone.

While this project will give users access to more data, it also helps Wikipedia itself. Right now, the largest localized versions of Wikipedia are English, German, French, and Dutch. If the underlying data is extracted and made centrally available, editors of all the other language versions of Wikipedia will have more information to work with.

Right now, answering a question like “which national capitals have the largest populations?” requires someone to compile that data into a Wikipedia page by hand. With the Wikidata project, users will be able to query the database for that kind of information directly, without relying on other editors to create such a page. Initially, the database is being populated with data from Wikipedia pages, but in the future the project hopes to expand and pull data from many other sources.
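The kind of structured lookup described above can be sketched in a few lines of Python. The records and population figures here are made-up sample data, not actual Wikidata content — the point is only that once facts live in a database rather than in prose, a question like “which capitals are largest?” becomes a query instead of a hand-written page:

```python
# Hypothetical sample records of the kind Wikidata would hold.
# The population figures here are illustrative, not real data.
capitals = [
    {"city": "Beijing", "country": "China", "population": 20_000_000},
    {"city": "Tokyo", "country": "Japan", "population": 13_000_000},
    {"city": "Berlin", "country": "Germany", "population": 3_500_000},
]

# With facts stored as records, ranking capitals by population is a
# one-line query rather than a manually compiled article.
largest = sorted(capitals, key=lambda c: c["population"], reverse=True)

for record in largest:
    print(record["city"], record["population"])
```

Any language edition of Wikipedia could then render the same query result in its own formatting, which is exactly the reuse the project is aiming for.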

The push to get this project rolling came from the German branch of Wikimedia, Wikimedia Deutschland. CEO Pavel Richter is over the moon about the development of the project, which started in Germany. The chapter will continue to develop and test the project, but anticipates handing it over to the Wikimedia Foundation once the database is complete — best estimate, about a year from now.

Wikimedia Deutschland is hoping to have the first phase of the project — initially populating the database with data from every Wikipedia page in every language — completed by August 2012. In the second phase, editors will come in to add and use data; that step is anticipated to be finished by December 2012. In the third and final phase, data entered into Wikidata first will be used to create Wikipedia pages.

However, instead of trying to take the data from Wikidata and put it into a page in prose format, like most Wikipedia articles, developers are planning to simply insert the data into tables (called infoboxes) that appear on the right-hand side of a Wikipedia article. That way, making the information available in all of Wikipedia’s localized languages becomes a simple matter of formatting, and the result is a more complete encyclopedia for everyone.

Data from the project will be published under a free Creative Commons license. Governments, science departments, research facilities, and more will be able to use the data for their applications.

The tab for this project is being footed in part by the Allen Institute for Artificial Intelligence (founded by Microsoft co-founder Paul Allen in 2010), the Gordon and Betty Moore Foundation’s Science program, and Google. Undoubtedly Google could use the data from this project to help with its own effort, the semantic search the company will be implementing in the coming months. In total, the Wikidata project received 1.3 million euros in initial development funding.

Wikipedia stays up and running through user donations. Ever wonder how that money breaks down? Check out our blog post investigating just that: how Wikipedia uses users’ donations.