Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

New release of the MediaWiki Language Extension Bundle, and other updates

Highlights from the latest development sprint of the Language Engineering team include the release of a new version of the MediaWiki Language Extension Bundle, and continued progress on Translation User Experience (UX) and the Language Coverage Matrix.

Screenshot of the redesigned proofread view for the Translate extension showing translations in Georgian.

Design and development improvements continued for Translate UX, also known as TUX. A preliminary implementation of the Proofreading feature (per the specifications in the design document) includes features for viewing messages side by side, clickable markers for proofreading, and switching between proofreading and translation modes. Pau Giner presented these updates at an open session and invited users to join the ongoing usability tests.

Amir Aharoni announced the release of MediaWiki Language Extension Bundle (MLEB) 2013.02. Besides localization updates in most of the components within MLEB, more features were added to Translate UX. The Universal Language Selector, however, had to be rolled back to the 2012.12 version to ensure compatibility with MediaWiki 1.20.

The Language Coverage Matrix document was updated to include more information about web fonts and input methods that are currently available for use in MediaWiki and Wikimedia projects. The document aims to provide an overview of the internationalization and localization support for languages across Wikimedia projects.

As part of the ongoing effort to use a CLDR-based, data-driven approach for internationalization features, plural rules for many languages were analyzed and custom rules were removed for a few languages.

The Language Engineering team will be hosting an IRC office hour session on Wednesday, March 13, 2013, in #wikimedia-office (on the freenode server) at 17:00 UTC. Topics will include discussion, questions and feedback about current projects, open bugs, and projects planned for the next sprint.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Wikimedia engineering February 2013 report

Engineering metrics in February:

  • 110 unique committers contributed patchsets of code to MediaWiki.
  • The total number of unresolved commits went from about 650 to about 830.
  • About 69 shell requests were processed.
  • Wikimedia Labs now hosts 150 projects and 1,002 users; to date, 1,561 instances have been created.

Major news in February includes:

(more…)

Putting Commons contributions in your hand: mobile app uploads

Wikimedia Commons holds millions of images and thousands of audio and video clips, created by Wikimedians or contributed from the outside through free content licenses.

Many of the high-end photographs are created with professional-level equipment and post-processing software, but it’s often said that “the best camera is the one you have with you.” And in this day and age, that usually means the mobile phone.

Wikimedia’s Mobile Apps team has been putting together Android and iOS apps for Wikimedia Commons, allowing you to take photos on the camera you always have with you and upload them to Commons either immediately or at your leisure.

  • The Android beta is available now on the Google Play store.
  • iOS betas are distributed on an opt-in invite basis; sign up here on your iPhone or iPad and you’ll be notified when the next beta is ready. (Due to Apple restrictions, you won’t be able to install betas from before you signed up.)

At this stage of development we’re mostly looking for testers who are experienced Commons users — you’ll need to sign in to the app with an existing Wikimedia account — and hoping for feedback on the workflows and on any bugs we haven’t ironed out yet.

So far we’ve had 55 unique uploaders using the apps in the last two weeks (49 on Android, 9 on iOS, and of course a little overlap!), uploading images like these:

Taken with iPhone 4S

Taken with Samsung Galaxy S3

Our goal is to hit 1000 uploaders per month once we’re in full release.

As always, these tools are completely open source — you’re all welcome to follow our progress on the project page, file bugs, or even submit patches directly on GitHub (Android, iOS).

And for those using different mobile operating systems, we haven’t abandoned you. Photo upload support is in beta on the mobile web site as well… stay tuned for more information!



New Lua templates bring faster, more flexible pages to your wiki

Starting Wednesday, March 13th, you’ll be able to make wiki pages even more useful, no matter what language you speak: we’re adding Lua as a templating language. This will make it easier for you to create and change infoboxes, tables, and other useful MediaWiki templates. We’ve already started to deploy Scribunto (the MediaWiki extension that enables this); it’s on several of the sites, including English Wikipedia, right now.

You’ll find this useful for performing more complex tasks for which templates are too complex or slow; common examples include numeric computations, string manipulation and parsing, and decision trees. Even if you don’t write templates, you’ll enjoy seeing pages load faster and with more interesting ways to present information.
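To give a flavor of this, here are a few hypothetical Lua fragments (illustrative only; in Scribunto they would live inside a module function, a structure covered below):

    -- Hypothetical, self-contained Lua fragments. Scribunto additionally provides
    -- helpers such as mw.text.trim and the Unicode-aware mw.ustring library.
    local trimmed = ("  some input  "):match( "^%s*(.-)%s*$" )   -- string parsing
    local rounded = math.floor( 2.71828 * 100 + 0.5 ) / 100      -- numeric computation
    local label   = rounded > 2.5 and "high" or "low"            -- a simple decision

Each of these is clumsy or outright impossible to express cleanly with parser functions alone.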

Background

The text of English Wikipedia’s string length measurement template, simplified.

MediaWiki developers introduced templates and parser functions years ago to allow end-users of MediaWiki to replicate content easily and build tools using basic logic. Along the way, we found that we were turning wikitext into a limited programming language. Complex templates have caused performance issues and bottlenecks, and it’s difficult for users to write and understand templates. Therefore, the Lua scripting project aims to make it possible for MediaWiki end-users to use a proper scripting language that will be more powerful and efficient than ad hoc, parser-function-based logic. The example of Lua’s use in World of Warcraft is promising; even novices with no programming experience have been able to make large changes to their graphical experiences by quickly learning some Lua.

Lua on your wiki

As of March 13th, you’ll be able to use Lua on your home wiki (if it’s not already enabled). Lua code can be embedded into wiki templates via the {{#invoke:}} parser function provided by the Scribunto MediaWiki extension. The Lua source code is stored in pages called modules (e.g., Module:Bananas), and these modules are then invoked on template pages. For example, Template:Lua hello world uses the code {{#invoke:Bananas|hello}} to print the text “Hello, world!”. So, if you start seeing edits in the Module namespace, that’s what’s going on.
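For illustration, a module producing that output could be as small as the following sketch (the real Module:Bananas may differ in detail):

    -- A minimal Scribunto module. Saved as, e.g., Module:Bananas, its hello
    -- function becomes callable from wikitext via {{#invoke:Bananas|hello}}.
    local p = {}

    function p.hello( frame )
        return "Hello, world!"
    end

    return p

Every module returns a table (here p) whose functions receive a frame object carrying the arguments passed to {{#invoke:}}.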

Getting started

The strlen template as converted to Lua.

Check out the basic “hello, world!” instructions, then look at Brad Jorsch’s short presentation for a basic example of how to convert a wikitext template into a Lua module. After that, try Tim Starling’s tutorial.
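To make the shape of such a conversion concrete, here is a minimal sketch of a string-length module (the names are hypothetical, and the actual converted template is more elaborate), which could be invoked as {{#invoke:StringLength|len|some text}}:

    -- A sketch of a string-length module, saved as, e.g., Module:StringLength.
    local p = {}

    function p.len( frame )
        -- frame.args holds the arguments passed via {{#invoke:}};
        -- mw.ustring.len counts Unicode characters rather than bytes.
        return mw.ustring.len( frame.args[1] or "" )
    end

    return p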

To help you preview and test a converted template, try Special:TemplateSandbox on your wiki. With it, you can preview a page using sandboxed versions of templates and modules, allowing for easy testing before you make the sandbox code live.

Where to start? If you use pywikipedia, try parsercountfunction.py by Bináris, which helps you find wikitext templates that currently parse slowly and would thus be worth converting to Lua. Try fulfilling open requests for conversion on English Wikipedia, possibly using Anomie’s Greasemonkey script to see the performance gains. On English Wikipedia, some templates have already been converted; feel free to reuse them on your wiki.

The Lua hub on mediawiki.org has more information; please add to it. And enjoy your faster, more flexible templates!

Sumana Harihareswara, Engineering Community Manager

Parsoid: How Wikipedia catches up with the web

Wikitext, as a Wikipedia editor has to type it in (above), and the resulting rendered HTML that a reader sees in her browser (below)

When the first wiki saw the light of day in 1995, it simplified HTML syntax in a revolutionary way, and its inventor Ward Cunningham named it after the Hawaiian word for “fast.” When Wikipedia launched in 2001, its rapid success owed much to the ease of collaboration on a wiki. Back then, the simplicity of wiki markup made it possible to start writing Wikipedia with Netscape 4.7, when WYSIWYG editing was technically impossible. A relatively simple PHP script converted the Wikitext to HTML. Since then, Wikitext has served as both the edit interface and the storage format of MediaWiki, the software underlying Wikipedia.

About 12 years later, Wikipedia contains 25 million encyclopedia articles written in Wikitext, but the world around it has changed quite a bit. Wikitext makes it very difficult to implement visual editing, which browsers now support for HTML documents and which web users have come to expect from many other sites. It has also become a speed issue: with the many features added over the years, the conversion from Wikitext to HTML can be very slow. For large Wikipedia pages, it can take up to 40 seconds to render a new version after an edit has been saved.

The Wikimedia Foundation’s Parsoid project is working on these issues by complementing existing Wikitext with an equivalent HTML5 version of the content. In the short term, this HTML representation lets us use HTML technology for visual editing. In the longer term, using HTML as the storage format can eliminate conversion overhead when rendering pages, and can also enable more efficient updates after an edit that affects only part of the page. This might all sound pretty straightforward. So why has this not been done before?

Lossless conversion between Wikitext and HTML is really difficult

For the Wikitext and HTML5 representations to be considered equivalent, it should be possible to convert between Wikitext and HTML5 representations without introducing any semantic differences. It turns out that the ad-hoc structure of Wikitext makes such a lossless conversion to HTML and back extremely difficult.

In Wikitext, italic text is enclosed in double apostrophes (''…''), and bold text in triple apostrophes ('''…'''), but here these notations clash. The interpretation of a sequence of three or more apostrophes depends on other apostrophe sequences seen on that line.
Center: Wikitext source. Below: As interpreted and rendered by MediaWiki. Above: Alternative interpretation.

  • Context-sensitive parsing: The only complete specification of Wikitext’s syntax and semantics is the MediaWiki PHP-based runtime implementation itself, which is still heavily based on regular expression driven text transformation. The multi-pass structure of this transformation combined with complex heuristics for constructs like italic and bold formatting make it impossible to use standard parser techniques based on context-free grammars to parse Wikitext.
  • Text-based templating: MediaWiki’s PHP runtime supports an elaborate text-based preprocessor and template system. This works very similarly to a macro processor in C or C++, and creates very similar issues. As an example, there is no guarantee that the expansion of a template will parse to a self-contained DOM structure. In fact, there are many templates that produce only a table start tag (<table>), a table row (<tr>...</tr>) or a table end tag (</table>). They can even produce only the first half of an HTML tag or Wikitext element (e.g. ...</tabl), which is practically impossible to represent in HTML. Despite all this, content generated by an expanded template (or multiple templates) needs to be clearly identified in the HTML DOM.
  • (more…)

Language engineers improve translation tool and meet with their peers

Quem não arrisca não petisca (“Nothing ventured, nothing gained”) — a Portuguese proverb

During their latest development sprint, the Wikimedia Language Engineering team conducted extensive review and testing of the Translate extension, and participated and contributed to two major open source events in India: a core developers Language Summit and GNUnify.

User experience improvements to the Translate tool will notably make it easier and more pleasant to translate content on Wikimedia sites that use it.

Translate Editor Updates

Progress continued on enhancements to the MediaWiki Translate extension. Pau Giner conducted further usability testing of the translation editor, the search feature, and the prototype of the advanced editing features with five users from four different countries. The prototypes were tested in a great diversity of languages, including Nepali, Chinese, Tetum, French, Breton, and Finnish. Based on this feedback, changes to the style and specifications for the prototype were made. Details about the individual tests can be found in the final report for this round of testing.

Community Participation

The Language Engineering team participated in the Open Source Language Summit and GNUnify, both held in Pune, India. The Open Source Language Summit, co-organized by the Wikimedia Foundation and Red Hat, consisted of work-sprints that focused on internationalization (i18n) and localization (l10n) features, font support, input method tools, language search, i18n testing methods and standards. More information about the event is available in the detailed event report.

The team also participated in GNUnify 2013, held at the Symbiosis Institute of Computer Studies and Research in Pune. Besides presentations about the various projects that the team is currently working on, the event included a translation sprint on translatewiki.net, a workshop on jQuery.IME, and a BoF session to discuss issues related to Wikimedia projects in Indian languages. Details of the accomplishments from the sessions at GNUnify 2013 can be found in the event report.

Other Achievements

Additionally, some changes to MediaWiki core were backported to support the newer version of the Universal Language Selector on MediaWiki versions 1.19 and 1.20. As there is no released maintenance version yet, MediaWiki Language Extension Bundle (MLEB) users are advised to remain on MLEB version 2012.12.

Focus for the next sprint

For the next development sprint, the team will work on more features for the Translate extension, such as the proofreading mode, and on further improving the user experience. In addition, focus will be on putting together the Language Coverage Matrix as a reference for the status of language support in MediaWiki, MediaWiki extensions and Wikimedia projects.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Language Engineering team participates in GNUnify 2013

Det vatten du hämtar ur bäcken lär dig känna källan (“The water you fetch from the brook teaches you to know the source”) – a Swedish proverb.

GNUnify is an annual gathering consisting of workshops, talks & seminars, held to help increase the awareness of free and open-source software in India.

The Wikimedia Language Engineering team participated in GNUnify 2013 held in Pune, India on February 15–17. The team presented their work, conducted a translation sprint, organized workshops and also participated in discussions with local Wikipedians about using MediaWiki and Wikimedia projects in their languages.

Presentations by the team

Runa Bhattacharjee presented on the changing dynamics in the adoption of localized content and the need to develop tools that meet new demands. She introduced the projects that the Language Engineering team is working on. Siebrand Mazeland and Niklas Laxström gave a walkthrough of the MediaWiki Translate extension and the translatewiki.net platform, and showcased the new design and features of the updated translation editor.

Santhosh Thottingal showed how the jQuery libraries of Project Milkshake can be used to prepare multilingual web applications for internationalization, and presented a tutorial on their use. Amir Aharoni demonstrated how easily the input methods provided by the jQuery.IME library can be used, and how to contribute new phonetic keymaps. He encouraged the use of the library’s more than 140 input methods in web applications. Yuvaraj Pandian demonstrated how he ported jQuery.IME to Android devices.

Alolita Sharma spoke about technologies and tools that help people contribute to Wikipedia in various languages. She highlighted the need for features and tools to support non-English Wikipedias, and the solutions that the Language Engineering team is developing to eliminate fundamental obstacles that contributors face while trying to create content in Indian languages. She also spoke about the other Wikimedia projects that are open for participation.

Workshops

Amir Aharoni conducted a workshop on the jQuery.IME library, in which he demonstrated the procedure for adding a new input method and submitting it for inclusion on GitHub. A two-hour translation sprint was conducted in which almost 40 participants translated various projects hosted on translatewiki.net. At the end of the session, more than 1,000 completed translations had been logged, and prizes were distributed for the most significant contributions. Yuvaraj Pandian, Sucheta Ghoshal and Harsh Kothari conducted a workshop on building MediaWiki gadgets. Participants were introduced to the process of creating gadgets using JavaScript and CSS, and making them available to other users.

Language Engineering BoF session

The Language Engineering team also organized a session to discuss technical issues related to Wikimedia projects in Indian languages, which was attended by local Wikipedians. Issues related to following up on internationalization and localization bugs and building local technical user groups were discussed.

To conclude, participation in open source conferences such as GNUnify makes more open source developers and language Wikipedians aware of the latest tools that the Language Engineering team is developing, and gives the team direct feedback from the global communities it serves.

More information can be found in the detailed report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Suggesting tasks for new Wikipedians

If you had just signed up to become a Wikipedia contributor, what kind of experience would you like to have? Would you know exactly where to get started, or would you prefer some suggestions?

For most of Wikipedia’s 12-year history, we have done very little to proactively introduce new participants to tasks that are interesting and easy. Right after account creation, for instance, we merely suggest that you check out your preferences. If you look around, you can find guides like Wikipedia:Tutorial. Most of this documentation is focused on the rules and mechanics of how to contribute, rather than suggesting real tasks to try immediately.

Naturally, the kind of people who have tended to thrive in this environment already know what they want to contribute, or are deeply motivated to go and find it. Unless you’ve spotted an error or a missing piece of information, there is little pointing you in the right direction. That lack of direction is a big part of why only about a quarter of all newly registered accounts complete an edit.

This phenomenon is far from unique to Wikipedia, and in fact it would be surprising to hear of any site where 100% of signups become devoted content contributors. However, considering the enormous workload we face, the sheer waste of human capital is staggering. On English Wikipedia alone, there are…

  • more than 200,000 “citation needed” tags
  • 3,000 articles that need basic copyediting
  • over 14,000 pages that need more wiki links

The list goes on, and these are just the items that have been explicitly added to the backlog. Wikipedia is in fact bursting at the seams with small problems that need fixing.

So how do we match the thousands of people who sign up every day, eager and willing to help, with tasks that are easy to do? That’s the question we’re attempting to solve with our work onboarding new Wikipedians at the Wikimedia Foundation’s Editor Engagement Experiments team.

(more…)

Getting Wikipedia to the people who need it most

This post has also been published on the blog of the Knight Foundation.

Cellphone user in Mumbai, India

We’re in the middle of an information revolution that’s changing the way billions of people in developing countries obtain news and knowledge. With a $10 cell phone, a high school student in New Delhi or a cab driver in Dakar can access the Internet and — through Wikipedia and other websites — learn volumes about virtually any subject. If knowledge is power, then the developing world, with almost five billion cell-phone subscriptions, is poised to make amazing changes.

There’s just one catch: An overwhelming percentage of new mobile users in India, Senegal and other developing countries can’t afford data charges, so they’re effectively excluded from sites like Wikipedia. It’s a de facto blackout, a kind of information segregation that shunts potential Internet users to the side of a very important road.

That’s why the Wikimedia Foundation, the nonprofit that operates Wikipedia, has established Wikipedia Zero, a program where we partner with mobile operators to give their mobile users free-of-charge access to Wikipedia and its growing trove of 24 million articles.

In 2012, the Wikimedia Foundation signed Wikipedia Zero partnerships with three mobile operators, bringing free Wikipedia access to 230 million mobile users in 31 countries. In January of 2013, we signed a fourth partnership that extends Wikipedia Zero to at least 100 million more mobile users in five more countries.

And with the recent support of the Knight News Challenge grant, designed to accelerate media innovation by funding breakthrough ideas in news and information, a series of exciting new developments is on the horizon. We are:

  • speeding up the development of Wikipedia Zero;
  • hastening the development of the software that lets a simple feature phone (the dominant phone in developing countries) connect easily to Wikipedia’s mobile site;
  • augmenting the development of the engineering that, on Wikipedia, makes hundreds of native languages readable from mobile devices;
  • pioneering a program to give mobile users USSD & SMS access to Wikipedia.

We’re very excited about delivering Wikipedia via text, which we expect to roll out within the next few months. With the program, users will send a text request to Wikipedia and, within seconds, receive the article on their phone. To deliver this innovative technology, we’re partnering with the Praekelt Foundation, a nonprofit based in Johannesburg, South Africa. It’s another example of the tremendous collaborative spirit that has always driven Wikipedia and always will.

The number of mobile users who can get free access to Wikipedia is increasing rapidly, and so is its usage. In the countries where Wikipedia Zero has already been deployed, Wikipedia readership of local, non-English languages grew upwards of 400 percent in six months. On our partner’s network in Niger, Wikipedia’s mobile traffic increased by 77 percent in the first four months of Wikipedia Zero, compared to 7 percent growth on Niger’s mobile networks that don’t have Wikipedia Zero. In Kenya, the growth from Wikipedia Zero was even higher – 88 percent. The demand is there for much more growth, and word-of-mouth is spreading.

And the movement for access to knowledge is coming from all sides. Last December, a group of 11th-graders at Sinenjongo High School in Cape Town, South Africa, wrote a heartfelt letter to four mobile operators, imploring them to give their South African customers free-of-charge mobile access to Wikipedia. They had learned about Wikipedia Zero, even though the service is not yet available in South Africa. The Cape Town students have the technology in their hands, but they lack the money to pay for data charges. In their letter, which was published in Gadget, an online South African magazine that covers consumer technology, the 24 students wrote:

    “We recently heard that in some other African countries like Kenya and Uganda certain cell phone providers are offering their customers free access to Wikipedia. We think this is a wonderful idea and would really like to encourage you also to make the same offer here in South Africa. It would be totally amazing to be able to access information on our cell phones which would be affordable to us.

    Our school does not have a library at all so when we need to do research we have to walk a long way to the local library.  When we get there we have to wait in a queue to use the one or two computers which have the internet.  At school we do have 25 computers but we struggle to get to use them because they are mainly for the learners who do CAT (Computer Application Technology) as a subject. Going to an internet cafe is also not an easy option because you have to pay per half hour. 90% of us have cellphones but it is expensive for us to buy airtime so if we could get free access to Wikipedia it would make a huge difference to us…Our education system needs help and having access to Wikipedia would make a very positive difference. Just think of the boost that it will give us as students and to the whole education system of South Africa.”

Their letter is a reminder that the human spirit craves access to free information. Indeed, I firmly believe that access to free knowledge should be a universal human right. News and knowledge change lives for the better. They always have.

From the beginning of the Wikimedia movement, and more broadly across the free knowledge movement, the goal has been to break down the digital divide, and render barriers to knowledge obsolete. There’s no better time than now to make gigantic inroads in that quest. Eighty percent of all new mobile phone subscribers are in developing countries, according to the United Nations’ International Telecommunication Union. For now, of the 25 countries that have the highest rate of mobile traffic on Wikipedia, 22 are developing countries. The top eight countries are all in Africa.

We will do what it takes to get free knowledge into the hands of students like those in South Africa who are clamoring for it. We will continue partnering with mobile operators who donate their resources to the service of Wikipedia Zero. In the next two years, we will write more blog posts that detail the progress we make in the developing world.

The Knight News Challenge mobile grant is an important milestone in our movement to make free knowledge available to everyone, including every person in the developing world. We see 2013 as a year of significant transition as we make our vision a long-term reality. As I said, access to knowledge should be a human right. And the Wikimedia Foundation is thrilled to be part of the Information Revolution that is bringing free knowledge around the world. We want others to join us, and as the 11th-graders in South Africa have shown us, to also be leaders in this movement. With hard work and true partnership, this dream will become a reality for the students in South Africa, and indeed, everyone, everywhere.

Kul Takanao Wadhwa, Head of Mobile, Wikimedia Foundation


Report from the Spring 2013 Open Source Language Summit

Fortuna i forti aiuta, e i timidi rifiuta (“Fortune helps the strong and refuses the timid”) — an Italian proverb

The Wikimedia Foundation and Red Hat jointly organized the Second Open Source Language Summit on February 12th and 13th, 2013. The summit was held at the Red Hat engineering center in Pune, India. Similar to the previous summit, this face-to-face work session was focused on internationalization (i18n) and localization (l10n) features, font support, input method tools, language search, i18n testing methods and standards. The sessions were work sprints, each with special focus on a key area. Participants included core contributors from the Wikimedia Foundation, Red Hat (including Fedora SIG members), KDE, FUEL, Google and C-DAC. Below is a summary of what was accomplished during these two days.

During the summit, teams from different organizations came together to discuss language-related challenges, and worked together on features and tools to address them.

Input Methods

Parag Nemade and Santhosh Thottingal worked on making additional input methods available for the jQuery.IME library. Sixty input methods covering languages like Assamese, Esperanto, Russian, Greek and Hebrew were added, bringing the total to 144. Input methods present in the m17n library but still missing from jQuery.IME were also identified.

Translation tools, translatewiki.net & FUEL Sprint

Siebrand Mazeland and Niklas Laxström, together with Ankit Patel, Rajesh Ranjan and Red Hat language maintainers, worked to identify more tools that could be used as translation aids in a translation system. The FUEL project aims to standardize translations of frequently used terms, translation style and assessment methodology; until now, it has focused mostly on the languages of India. The FUEL project can now be translated on translatewiki.net. Pau Giner demonstrated, remotely from Spain, new designs for the translation editor and for terminology usage.

Language Coverage Matrix

To better evaluate the needs for enabling support for languages, a matrix detailing the requirements and availability of basic and extended features is being drawn up. With 285 languages currently supported in Wikimedia and more than 100 in Fedora, this document will be instrumental in bridging the gaps and porting features across projects and platforms. Key areas of evaluation include input methods, fonts, translation aids like glossaries and spell-checkers, testing and validation methods, etc. A preliminary draft was created during the summit by Alolita Sharma, Runa Bhattacharjee and Amir E. Aharoni.

Fonts, WebFonts

An initiative to document the technical aspects of fonts for the scripts of languages spoken in India was started during the language summit. For each script, a reference font will be chosen and described in detail against the OpenType font specification as a standard. The document aims to serve as a reference for any typographer working on Indian-language fonts. An initial draft and outline were prepared during the second day of the language summit, mainly by Santhosh Thottingal and Pravin Satpute.

Testing Internationalization Tools

Finding suitable methods for testing internationalized components and content was the major focus of this sprint, with the Fedora Localization Testing Group (FLTG) and Wikimedia’s Language Engineering team sharing details of their testing methods. The FLTG conducts Test Days prior to Fedora beta releases, with a test matrix targeted at specific core components, while Wikimedia uses unit tests for frequent testing of features under development. The FLTG presented its plans to integrate screenshot comparison for testing localized interfaces, a method that will be useful for Wikimedia too. Extending the method to web-based applications and to Wikimedia’s language requirements (e.g. right-to-left text) was identified as an area for collaboration.

More news from the Language Summit can be found in the tweets, the session notes and the full report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering