
Everything you should know about Wikipedia, the largest multilingual online encyclopedia.
Introduction
Wikipedia is a live collaboration that differs from paper-based reference sources in important ways. Unlike printed encyclopedias, Wikipedia is continually created and updated, with articles on new events appearing within minutes rather than months or years. Because anyone can help improve it, Wikipedia has become more comprehensive than any other encyclopedia. Wikipedia originally grew out of an earlier encyclopedia project called Nupedia.
Wikipedia is written collaboratively by largely anonymous volunteers who contribute without pay. Anyone with Internet access can write and change Wikipedia articles, except in limited cases where editing is restricted to prevent disruption or vandalism. Users can contribute anonymously, under a pseudonym, or, if they choose, under their real identity. The fundamental principles by which Wikipedia operates are known as the five pillars. The Wikipedia community has developed many policies and guidelines to improve the encyclopedia; however, contributors are not formally required to be familiar with them before editing.
History
Wikipedia was launched on January 15, 2001, by Jimmy Wales and Larry Sanger. Sanger coined its name as a portmanteau of wiki (the Hawaiian word for “quick”) and “encyclopedia”. Initially an English-language encyclopedia, Wikipedia quickly gained versions in other languages. With about 5.86 million articles at the time of writing, the English Wikipedia is the largest of the roughly 300 language editions; overall, about 72,000 active contributors work on more than 48 million articles. Since its creation in 2001, Wikipedia has grown rapidly into one of the largest reference websites: by February 2014 it had reached 18 billion page views and nearly 500 million unique visitors per month, and as of September 2015 it was attracting more than 500 million unique visitors monthly.
Every day, hundreds of thousands of visitors from around the world collectively make tens of thousands of edits and create thousands of new articles, adding to the knowledge held by the encyclopedia. People of all ages, cultures, and backgrounds can add or edit article prose, references, images, and other media. What is contributed matters more than the expertise or qualifications of the contributor. Whether a contribution remains depends on whether it is free of copyright restrictions and of contentious material about living people, and whether it fits within Wikipedia’s policies, including being verifiable against a published reliable source, which excludes editors’ opinions and beliefs as well as unreviewed research. Contributions cannot do lasting damage to Wikipedia, because the software makes mistakes easy to revert and many experienced editors watch changes to help ensure that edits are cumulative improvements.
How to edit content?
You can start editing any editable Wikipedia page by clicking the Edit link at the top of the page. Any reader can edit most Wikipedia articles, even without a Wikipedia account, and changes are published immediately. As a result, any article may contain inaccuracies such as errors, ideological biases, and nonsensical or irrelevant text. MediaWiki also allows editing a single section of a page (identified by its heading) instead of the whole page. A registered user can also mark an edit as minor: correcting spelling, grammar, or punctuation is a minor edit, whereas adding paragraphs of new text is not.
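To make the section-editing and minor-edit options concrete, here is a minimal Python sketch that builds the parameters for an edit through the MediaWiki Action API (`action=edit`). It only assembles the request: the page title, text, and summary are placeholder values, and the CSRF token is assumed to have been fetched separately (for example via `action=query&meta=tokens`), so treat this as an illustration rather than a working editing bot.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API endpoint; the parameters below would be POSTed here.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_section_edit(title: str, section: int, text: str, summary: str,
                       token: str, minor: bool = False) -> dict[str, str]:
    """Return POST parameters for editing one section of a page.

    The CSRF token is assumed to have been fetched beforehand
    (action=query&meta=tokens); this sketch does not fetch it.
    """
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "section": str(section),  # 0 is the lead section; 1, 2, ... follow the headings
        "text": text,             # new wikitext for that section only
        "summary": summary,       # edit summary shown on the History page
        "token": token,
    }
    if minor:
        # Boolean API parameters count as true when present, so omit it otherwise.
        params["minor"] = "1"
    return params


# Placeholder values: a typo fix in section 1 of the sandbox, marked as minor.
params = build_section_edit(
    title="Wikipedia:Sandbox",
    section=1,
    text="== Introduction ==\nCorrected a spelling mistake.",
    summary="typo fix",
    token="PLACEHOLDER_CSRF_TOKEN",
    minor=True,
)
body = urlencode(params)  # form-encoded request body
```

In practice these parameters would be sent as a POST body to `API_URL` from a session that already holds a valid token; login and network handling are left out on purpose.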
Sometimes, while one user is editing, a second user saves an edit to the same part of the page. When the first user then attempts to save, an edit conflict occurs. The first user is then given an opportunity to merge their content into the page as it now exists following the second user’s save. For more details, see Wikipedia’s help page on edit conflicts.
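The edit-conflict check amounts to optimistic concurrency control: each editor remembers the revision they started from, and a save is refused if the page has changed in the meantime. The Python sketch below models that idea with a hypothetical in-memory `Page` class and `EditConflict` exception; it is not MediaWiki’s code, and it skips the automatic merge that MediaWiki can attempt when the two edits do not overlap.

```python
from dataclasses import dataclass, field


class EditConflict(Exception):
    """Raised when a save is based on a revision that is no longer current."""

    def __init__(self, current_rev: int, current_text: str) -> None:
        super().__init__(f"edit conflict: page is now at revision {current_rev}")
        self.current_rev = current_rev
        self.current_text = current_text


@dataclass
class Page:
    """Hypothetical in-memory page with a revision counter."""
    text: str = ""
    rev: int = 0
    history: list[str] = field(default_factory=list)

    def open_editor(self) -> tuple[int, str]:
        # The editor remembers which revision it started from.
        return self.rev, self.text

    def save(self, base_rev: int, new_text: str) -> int:
        # Optimistic concurrency: accept the save only if nobody saved in between.
        if base_rev != self.rev:
            raise EditConflict(self.rev, self.text)
        self.history.append(self.text)
        self.text = new_text
        self.rev += 1
        return self.rev


page = Page(text="Original text")
base_first, _ = page.open_editor()   # the first user starts editing
base_second, _ = page.open_editor()  # a second user starts editing too
page.save(base_second, "Original text\nSecond user's addition")  # second user saves first

try:
    page.save(base_first, "Original text\nFirst user's addition")
except EditConflict as conflict:
    # The first user merges against the page as it now exists, then retries.
    merged = conflict.current_text + "\nFirst user's addition"
    page.save(conflict.current_rev, merged)
```

The design choice here is to detect conflicts at save time rather than lock the page while someone edits, which suits a wiki where most edits never collide.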
How to review changes made by anonymous users?
Although changes are not systematically reviewed, the software that powers Wikipedia provides tools that let anyone review changes made by others. The “History” page of each article links to every revision. On most articles, anyone can undo others’ changes by clicking a link on the article’s history page. Anyone can view the latest changes to articles, and registered users may maintain a “watchlist” of articles that interest them so they are notified of any changes. “New pages patrol” is a process whereby newly created articles are checked for obvious problems.
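The same review tools are exposed through the MediaWiki Action API. This Python sketch builds a `list=recentchanges` query restricted to unregistered editors (`rcshow=anon`) and turns one result entry into a diff link a reviewer could open. The endpoint is English Wikipedia’s, the `diff_link` helper and the sample entry are our own illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
INDEX_URL = "https://en.wikipedia.org/w/index.php"


def recent_anon_changes_url(limit: int = 50) -> str:
    """Build an Action API URL listing recent edits by unregistered (IP) users."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcshow": "anon",                  # only changes made without an account
        "rctype": "edit|new",              # ordinary edits and page creations
        "rcprop": "title|ids|user|timestamp|comment",
        "rclimit": str(limit),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def diff_link(change: dict) -> str:
    """Hypothetical helper: turn one recentchanges entry into a link to review."""
    # With rcprop=ids, revid is the new revision and old_revid the one it replaced.
    if change["old_revid"] == 0:
        # A newly created page has no previous revision, so link to the revision itself.
        return INDEX_URL + "?" + urlencode({"oldid": change["revid"]})
    return INDEX_URL + "?" + urlencode({"diff": change["revid"], "oldid": change["old_revid"]})


# An illustrative entry shaped like one item of the API's "recentchanges" list.
sample = {"title": "Example", "revid": 1002, "old_revid": 1001, "user": "192.0.2.1"}
print(recent_anon_changes_url(10))
print(diff_link(sample))
```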
Software Details
A simplified overview of the software in use as of October 2015 (a very complex LAMP “stack”):
- Wikipedia’s DNS servers run gdnsd. Geographical DNS distributes requests among four data centers (three in the US, one in Europe) depending on the client’s location.
- Wikipedia uses Linux Virtual Server (LVS) on commodity servers to load-balance incoming requests. LVS is also used as an internal load balancer to distribute MediaWiki requests. For back-end monitoring and failover, Wikimedia uses its own system, PyBal.
- For regular MediaWiki web requests (articles and API), Varnish caching proxy servers sit in front of the Apache HTTP Server.
- All servers run either Debian or Ubuntu Server.
- Wikipedia uses Swift for distributed object storage.
- Wikipedia originally ran on UseModWiki, a wiki engine written in Perl, before switching to MediaWiki.
- The main web application is MediaWiki, which is written in PHP (~70%) and JavaScript (~30%).
- Structured data has been stored in MariaDB since 2013. Wikis are grouped into clusters, and each cluster is served by several MariaDB servers replicated in a single-master configuration.
- Memcached caches database query and computation results.
- Full-text search uses Elasticsearch (Extension:CirrusSearch).
- https://noc.wikimedia.org/ – Wikimedia configuration files.
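One pattern from the list above, Memcached sitting in front of the database, is usually called cache-aside: read from the cache, and on a miss run the query and store the result with an expiry time. The sketch below shows that pattern in Python with a hypothetical in-process `TTLCache` standing in for Memcached and a plain function standing in for a MariaDB query; it is not Wikimedia’s caching code.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Hypothetical in-process stand-in for Memcached: key -> (value, expiry)."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl)


def cached_query(cache: TTLCache, key: str, run_query: Callable[[], Any],
                 ttl: float = 300.0) -> Any:
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()  # in production this would be a MariaDB query
        cache.set(key, value, ttl)
    return value


# Usage with a fake database lookup that counts how often it runs.
db_calls = 0


def fake_page_lookup() -> dict:
    global db_calls
    db_calls += 1
    return {"page_id": 42, "page_title": "Example"}


cache = TTLCache()
first = cached_query(cache, "page:Example", fake_page_lookup)
second = cached_query(cache, "page:Example", fake_page_lookup)  # served from cache
```

Because `None` is treated as a miss, a query that legitimately returns `None` would be re-run on every call; a real client would store a sentinel value instead.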
Hosting Details
As of May 2018, Wikipedia has the following colocation facilities (each name is derived from an acronym of the facility’s company and an acronym of a nearby airport):
- eqiad: application services (primary) at Equinix in Ashburn, Virginia (Washington, DC area)
- codfw: application services (secondary) at CyrusOne in Carrollton, Texas (Dallas-Fort Worth area)
- esams: caching at EvoSwitch in Amsterdam, the Netherlands
- ulsfo: caching at United Layer in San Francisco
- eqsin: caching at Equinix in Singapore
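The naming scheme can be checked mechanically. In the five codes above, the first two letters abbreviate the company and the last three are the nearby airport’s IATA code (IAD, DFW, AMS, SFO, SIN). The Python sketch assumes that two-plus-three split, which holds for these names but is our observation rather than a documented rule.

```python
# Site codes from the list above, mapped to the company each prefix stands for.
SITES = {
    "eqiad": "Equinix",
    "codfw": "CyrusOne",
    "esams": "EvoSwitch",
    "ulsfo": "United Layer",
    "eqsin": "Equinix",
}


def split_site_code(code: str) -> tuple[str, str]:
    """Split a site code into (company prefix, IATA airport code).

    Assumes the two-letter + three-letter pattern seen in the five codes above.
    """
    if len(code) != 5 or not code.isalpha():
        raise ValueError(f"unexpected site code: {code!r}")
    return code[:2], code[2:].upper()


for code, company in SITES.items():
    prefix, airport = split_site_code(code)
    print(f"{code}: {company} ({prefix}) near {airport}")
```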
How to maintain such a large database?
MediaWiki can use MySQL/MariaDB, PostgreSQL, or SQLite as its relational database management system, with limited support for Oracle Database and Microsoft SQL Server. A MediaWiki database contains several dozen tables, including a page table that holds page titles, page IDs, and other metadata, and a revision table to which a new row is added every time an edit is made, containing the page ID, a brief summary of the change, the user name of the editor (or their IP address, for an unregistered user), and a timestamp.
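To make the page/revision relationship concrete, here is a Python sketch using SQLite (one of the supported back ends) with two heavily simplified tables. The column names are loosely borrowed from older MediaWiki versions; the real schema has many more columns, and newer releases store edit summaries and user names in separate tables, so treat this as an illustration only.

```python
import sqlite3

# Heavily simplified tables loosely modeled on MediaWiki's page and revision
# tables. Column names follow older MediaWiki versions; the real schema has
# many more columns.
SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL DEFAULT 0,
    page_title     TEXT    NOT NULL,
    page_latest    INTEGER            -- rev_id of the current revision
);
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER NOT NULL REFERENCES page(page_id),
    rev_comment   TEXT    NOT NULL,   -- brief summary of the change
    rev_user_text TEXT    NOT NULL,   -- user name, or IP address if unregistered
    rev_timestamp TEXT    NOT NULL    -- 14-digit YYYYMMDDHHMMSS string
);
"""


def record_edit(db: sqlite3.Connection, page_id: int, comment: str,
                user_text: str, timestamp: str) -> int:
    """Add one revision row and make it the page's latest revision."""
    with db:  # both statements commit together or not at all
        cur = db.execute(
            "INSERT INTO revision (rev_page, rev_comment, rev_user_text, rev_timestamp) "
            "VALUES (?, ?, ?, ?)",
            (page_id, comment, user_text, timestamp),
        )
        db.execute("UPDATE page SET page_latest = ? WHERE page_id = ?",
                   (cur.lastrowid, page_id))
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO page (page_id, page_title) VALUES (1, 'Example')")
record_edit(db, 1, "Create page", "Alice", "20180501120000")
latest = record_edit(db, 1, "Fix typo", "192.0.2.7", "20180502093000")
```

Every edit only appends a row and moves one pointer, which is why the revision table grows with each change while the page table stays one row per page.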
In a 4½-year period, the MediaWiki database went through 170 schema versions. Possibly the largest schema change came in MediaWiki 1.5, when the storage of metadata was separated from that of content to improve performance and flexibility. When this upgrade was applied to Wikipedia, the site was locked for editing, and the schema was converted to the new version in about 22 hours. Some software enhancement proposals, such as allowing sections of articles to be watched via the watchlist, have been rejected because the necessary schema changes would have required excessive Wikipedia downtime.
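The 1.5 change can be pictured as splitting one wide row into a metadata row and a content row. The Python/SQLite sketch below migrates a hypothetical combined `legacy_revision` table into a `revision` table that points at a `text` table through `rev_text_id`, loosely following the layout MediaWiki 1.5 introduced. It shows the idea only and is not the actual upgrade script.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Hypothetical "before" table: metadata and wikitext share one row.
CREATE TABLE legacy_revision (
    id INTEGER PRIMARY KEY, comment TEXT, user_text TEXT, ts TEXT, body TEXT
);
-- "After" layout, loosely following MediaWiki 1.5: content lives in text...
CREATE TABLE text (
    old_id   INTEGER PRIMARY KEY,
    old_text TEXT NOT NULL
);
-- ...and revision keeps only metadata plus a pointer to the content row.
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_text_id   INTEGER NOT NULL REFERENCES text(old_id),
    rev_comment   TEXT,
    rev_user_text TEXT,
    rev_timestamp TEXT
);
INSERT INTO legacy_revision VALUES
    (1, 'Create page', 'Alice', '20050601000000', 'First draft'),
    (2, 'Expand', '192.0.2.7', '20050602000000', 'First draft, expanded');
""")

# All-or-nothing conversion in one transaction (the real upgrade locked the site instead).
with db:
    # Copy the content into text, reusing the old id so rows are easy to match.
    db.execute("INSERT INTO text (old_id, old_text) "
               "SELECT id, body FROM legacy_revision")
    # Copy only the metadata into revision, pointing at the matching text row.
    db.execute("INSERT INTO revision (rev_id, rev_text_id, rev_comment, "
               "rev_user_text, rev_timestamp) "
               "SELECT id, id, comment, user_text, ts FROM legacy_revision")

# Listing history now reads only the small metadata rows...
history = db.execute(
    "SELECT rev_id, rev_user_text, rev_comment FROM revision ORDER BY rev_id"
).fetchall()
# ...and wikitext is loaded only when a particular revision is displayed.
content = db.execute(
    "SELECT old_text FROM text JOIN revision ON rev_text_id = old_id "
    "WHERE rev_id = ?", (2,)
).fetchone()[0]
```

Keeping the bulky wikitext out of the revision rows means history listings and other metadata queries scan far less data, which is the performance benefit the split was aiming for.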