Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

MediaWiki

Language Engineering Events – Language Summit, Fall 2013

The Wikimedia Language Engineering team, along with Red Hat, organised the Fall edition of the Open Source Language Summit in Pune, India on November 18 and 19, 2013.

Members from the Language Engineering, Mobile, VisualEditor, and Design teams of the Wikimedia Foundation joined participants from Red Hat, Google, Adobe, Microsoft Research, Indic language projects, Open Source Projects (Fedora, Debian) and Wikipedians from various Indian languages. Google Summer of Code interns for Wikimedia Language projects were also present. The 2-day event was organised as work-sessions, focussed on fonts, input tools, content translation and language support on desktop, web and mobile platforms.

Participants at the Open Source Language Summit, Pune, India

The Fontbook project, started during the Language Summit earlier this year, was slated to be extended to 8 more Indian languages. The project aims to create a technical specification for Indic fonts based upon the OpenType 1.6 specification. Pravin Satpute and Sneha Kore of Red Hat presented their work on the next version of the Lohit font family, which is based upon the same specification and uses Harfbuzz-ng. This effort is expected to complement the outcome of the Fontbook project.

The other font sessions included a walkthrough of the Autonym font created by Santhosh Thottingal, a Q&A session by Behdad Esfahbod about the state of Indic font rendering through Harfbuzz-ng, and a session to package webfonts for Debian and Fedora for native support. Learn more about the font sessions.

Improving the input tools for multilingual input in the VisualEditor was discussed extensively. David Chan walked through the event logger system built for capturing IME input events. The logger, available at http://tinyurl.com/imelog, is being used as an automated IME testing framework to build a library of similar events across IMEs, operating systems and languages.

Santhosh Thottingal stepped through several tough use cases of handling multilingual input, to support the VisualEditor’s inherent need to provide non-native support for handling language content blocks within the contentEditable surface. Wikipedians from various Indic languages also provided their inputs. On-screen keyboards, mobile input methods like LiteratIM and predictive typing methods like ibus-typing-booster (available for Fedora) were also discussed. Read more about the input method sessions.

The Language Coverage Matrix Dashboard (LCMD), which displays the language support status for all languages in Wikimedia projects, was showcased. The Fedora Internationalization team, which currently provides resources for fewer languages than the Wikimedia projects, will identify the gap using the LCMD data and assess the resources that can be leveraged to enhance support on desktops. Dr. Kalika Bali from Microsoft Research Labs presented on leveraging content translation platforms for Indian languages and highlighted that machine translation (MT) for Indic languages could be improved significantly by using web-scale content like Wikipedia.

Learn more about the sessions, accomplishments and next steps for these projects from the Event Report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

The Autonym Font for Language Names

When an article on Wikipedia is available in multiple languages, we see the list of those languages in a column on the side of the page. The language names in the list are written in the script that the language uses (also known as language autonym).

This also means that all the appropriate fonts are needed for the autonyms to be correctly displayed. For instance, an article like the one about the Nobel Prize is available in more than 125 languages and requires approximately 35 different fonts to display the names of all the languages in the sidebar.

Language Autonyms

Initially, this was handled by the native fonts available on the reader’s device. If a font was not present, the user would see square boxes (commonly referred to as tofu) instead of the name of a language. To work around this problem, not just for the language list, but for other sections in the content area as well, the Universal Language Selector (ULS) started to provide a set of webfonts that were loaded with the page.

While this ensured that more language names would be correctly displayed, the presence of so many fonts dramatically increased the weight of the pages, which therefore loaded much more slowly for users than before. To improve client-side performance, webfonts were set not to be used for the Interlanguage links in the sidebar anymore.

Removing webfonts from the Interlanguage links was the easy and immediate solution, but it also took us back to the sub-optimal multilingual experience that we were trying to solve in the first place. Articles may be perfectly displayed thanks to webfonts, but if a link is not displayed in the language list, many users will not be able to discover that there is a version of the article in their language.

Autonyms were not needed just for Interlanguage links. They were also required for the Language Search and Selection window of the Universal Language Selector, which allows users to find their language if they are on a wiki displaying content in a script unfamiliar to them.

Missing font or “tofu”

As a solution, the Language Engineers came up with a trimmed-down font that contains only the characters required to display the names of the languages supported in MediaWiki. It has been named the Autonym font and will be used when only the autonyms are to be displayed on the page. At just over 50KB in size, it currently provides support for nearly 95% of the 400+ supported languages. The pending issues list identifies the problems with rendering and missing glyphs for some languages. If your language is missing glyphs and you know of an openly licensed font that can fill that void, please let us know so we can add it.
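To make the idea of a trimmed-down font concrete, here is a minimal Python sketch that collects the characters needed to display a handful of autonyms. The `AUTONYMS` mapping is a small hand-picked sample and `required_codepoints` is a hypothetical helper; neither is taken from MediaWiki or from the Autonym font's build process, which may work quite differently (for example, complex scripts also need shaping rules, not just characters).

```python
# Illustrative sample only; MediaWiki's real language list is far longer.
AUTONYMS = {
    "en": "English",
    "hi": "हिन्दी",
    "he": "עברית",
    "ml": "മലയാളം",
    "ru": "русский",
}


def required_codepoints(autonyms):
    """Return the sorted, de-duplicated code points used by the given names."""
    needed = set()
    for name in autonyms.values():
        needed.update(ord(ch) for ch in name)
    return sorted(needed)


if __name__ == "__main__":
    codepoints = required_codepoints(AUTONYMS)
    print(f"{len(codepoints)} distinct characters needed")
    # Show the first few code points in the usual U+XXXX notation.
    print(" ".join(f"U+{cp:04X}" for cp in codepoints[:8]))
```

A list like this could then be handed to a font subsetting tool, so that only the glyphs for these characters are kept.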

The autonym font addresses a very specific use case. There have been requests to explore the possibility of extending the use of this font to similar language lists, like the ones found on Wikimedia Commons. Within MediaWiki, the font can be used easily through a CSS class named autonym.

The Autonym font has been released for free use with the SIL Open Font License, Version 1.1.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Get introduced to Internationalization engineering through the MediaWiki Language Extension Bundle

The MediaWiki Language Extension Bundle (MLEB) is a collection of MediaWiki extensions for various internationalization features. These extensions and the Bundle are maintained by the Wikimedia Language Engineering team. Each month, a new version of the Bundle is released.

The MLEB gives webmasters who run sites with MediaWiki a convenient solution to install, manage and upgrade language tools. The monthly release cycle allows for adequate testing and compatibility across the recent stable versions of MediaWiki.

A plate depicting text in Sanskrit (Devanagari script) and Pali languages, from the Illustrirte Geschichte der Schrift by Johann Christoph Carl Faulmann

The extensions that form MLEB can be used to create a multilingual wiki:

  • UniversalLanguageSelector — allows users to configure their language preferences easily;
  • Translate — allows a MediaWiki page to be translated;
  • CLDR — is a data repository for language-specific locale data like date, time, currency etc. (used by the other extensions);
  • Babel — provides information about language proficiency on user pages;
  • LocalisationUpdate — updates MediaWiki’s multilingual user interface;
  • CleanChanges — shows RecentChanges in a way that reflects translations more clearly.

The Bundle can be downloaded as a tarball or from the Wikimedia Gerrit repository. Release announcements are generally made on the last Wednesday of the month, and details of the changes can be found in the Release Notes.

Before every release, the extensions are tested against the last two stable versions of MediaWiki on several browsers. Some extensions, such as UniversalLanguageSelector and Translate, need extensive testing due to their wide range of features. The tests are prepared as Given-When-Then scenarios, i.e. an action is checked for an expected outcome assuming certain conditions are met. Some of these tests are in the process of being automated using Selenium WebDriver and the remaining tests are run manually.
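The Given-When-Then structure can be sketched in a few lines of Python. This is only an illustration of the pattern: the team's automated tests are written with Cucumber and Ruby, and the `LanguageSettings` class below is a made-up stand-in, not code from any of the extensions.

```python
class LanguageSettings:
    """Hypothetical stand-in for a feature under test (not real ULS code)."""

    def __init__(self, interface_language="en"):
        self.interface_language = interface_language

    def select_language(self, code):
        self.interface_language = code


def test_selecting_a_language_changes_the_interface():
    # Given a user whose interface language is English
    settings = LanguageSettings(interface_language="en")

    # When the user selects Hindi
    settings.select_language("hi")

    # Then the interface language is Hindi
    assert settings.interface_language == "hi"


if __name__ == "__main__":
    test_selecting_a_language_changes_the_interface()
    print("scenario passed")
```

Keeping the three steps visibly separate makes the same scenario easy to repeat by hand in another browser.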

The automated tests currently run only on Mozilla Firefox. For the manual test runs, the Given-When-Then scenarios are replicated across several web browsers. These are mostly the Grade-A level supported browsers. Regressions or bugs are reported through Bugzilla. If time permits, they are also fixed before the monthly release, or otherwise scheduled to be fixed in the next one.

The MLEB release process allows several opportunities for participation in the development of internationalization tools. The testing workflow introduces the participants to the features of the commonly-used extensions. Finding and tracking the bugs on Bugzilla familiarizes them with the bug lifecycle and also provides an opportunity to work closely with the developers while the bugs are being fixed. Creating a patch of code to fix the bug is the next exciting step of exploration that the new participants are always encouraged to continue.

If you’d like to participate in testing, we now have a document that will help you get started with the manual tests. Alternatively, you could also help in writing the automated tests (using Cucumber and Ruby). The newest version of MLEB has been released and is now ready for download.

Runa Bhattacharjee
Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Call for Wikimedia tech projects needing contributors

Wikimedia tech needs you!

The current round of Google Summer of Code and FLOSS Outreach Program for Women is about to end, and it’s time to start a new cycle of mentored projects in Wikimedia tech.

Check and contribute to the list of Possible Projects on mediawiki.org if you are:

  • an editor on a Wikimedia project awaiting a specific software feature;
  • an organization with budget for tech activities looking for a short term goal;
  • a tech contributor with a cool idea for Wikimedia projects or MediaWiki in general.

Even if software development is a prominent activity, we also encourage proposals focusing on other technical areas: quality assurance, design, sysadmin, promotion, etc.

Post your proposal soon, edit it often. By submitting a proposal to the Possible Projects page you get attention and help from the tech community in the form of reality checks and contacts with possible mentors, interested projects and funding sources. 21 projects were selected in our last round.

We keep searching for more opportunities to channel these projects, both within the Wikimedia movement (Individual Engagement Grants, chapters…) and out there (organizations encouraging free software and diversity in tech).

We want to hear your feedback! Use the Possible Projects discussion page or comment below.

Quim Gil
Technical Contributor Coordinator (IT Communications Manager), Wikimedia Foundation

Translate the user interface of Wikipedia’s new VisualEditor

The VisualEditor beta release is being gradually rolled out to all Wikipedia editors in all languages. This is one of the most exciting developments in the history of Wikipedia, because it will make editing the site accessible to the general public, rather than just to the people who have the patience to learn Wikipedia’s arcane markup language.

To make this accessibility really complete, however, the VisualEditor’s user interface needs to be completely translated to all the languages in which there is a Wikipedia. Its interface includes over a hundred new strings, and if they aren’t translated, they will appear in a foreign language on that Wikipedia (e.g. English text on Polish Wikipedia).

Take a look at the translation statistics for the VisualEditor. As you can see, the translation to a lot of important languages is far from complete or entirely absent: Arabic, Portuguese, Hindi, Swahili, Hungarian, Bulgarian, Tagalog, Urdu, Lithuanian, and many others. If you know a language in that list and the translation to it is not at 100 percent, please click the language name and complete the translation. (You’ll have to create an account at translatewiki.net, if you don’t have one already.)
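A simplified Python model shows why an untranslated string ends up in English and how a completion percentage can be computed. The message keys, the catalogues and both helper functions are invented for this sketch; the real lookup in MediaWiki and the statistics on translatewiki.net are more elaborate.

```python
# Hypothetical message catalogues; keys and texts are illustrative only.
MESSAGES = {
    "en": {"save-page": "Save page", "cancel": "Cancel"},
    "pl": {"cancel": "Anuluj"},  # "save-page" has not been translated yet
}


def get_message(key, language, fallback="en"):
    """Return the translated string, or the fallback-language string if missing."""
    translated = MESSAGES.get(language, {})
    if key in translated:
        return translated[key]
    return MESSAGES[fallback][key]


def completion_percent(language, source="en"):
    """Share of source-language messages that have a translation."""
    source_keys = MESSAGES[source].keys()
    translated = MESSAGES.get(language, {})
    done = sum(1 for key in source_keys if key in translated)
    return 100 * done / len(source_keys)


if __name__ == "__main__":
    print(get_message("cancel", "pl"))     # translated string
    print(get_message("save-page", "pl"))  # falls back to English
    print(completion_percent("pl"))
```

In this toy data set, Polish is half translated, so one of the two buttons would still show English text.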


The article “Vilnius” in the Lithuanian Wikipedia, being edited in the VisualEditor. Note that most of the buttons are written in Lithuanian, but the buttons on the toolbar are in English: “Edit source”, “Page settings”, “Cancel”, “Save page”, “Paragraph”. These buttons weren’t translated yet, so they are unusable for people who don’t know English.

Even if the translation to your language is currently complete, please check your language’s page every few days—the VisualEditor beta is in very active development, the messages to translate are updated literally every day, and you want your language to be at 100 percent all the time.

This is also an opportunity to thank the hundreds of translatewiki.net contributors, who work quietly, but persistently, and make MediaWiki and its extensions into one of the most thoroughly localized pieces of software ever.

If you haven’t joined the translatewiki.net community yet, you are very welcome!

Amir E. Aharoni
Software Engineer, Language Engineering team, Wikimedia Foundation

The future of third-party releases on MediaWiki

MediaWiki, the software that powers the Wikimedia movement sites, is a remarkable piece of engineering. Not only does it support the very specific use cases of the various projects (Wikipedia, Wikimedia Commons, Wikisource, etc), but it is also used by many other individuals and organizations spanning the entire range of institutional size, from the biggest multinational firm to the smallest boutique site. Those third-parties (users outside Wikimedia) are an important part of the MediaWiki ecosystem, as they provide added testing and development time to the project.


Today, the Wikimedia Foundation is pleased to announce that we have contracted with two long-time members of the MediaWiki community–Mark Hershberger and Markus Glaser–to manage and drive forward third-party focused releases of MediaWiki.

Over a month ago, the Wikimedia Foundation sent out a request for proposals (RFP) to help us fill an important and underserved role in the development of MediaWiki. Two very solid proposals were produced, the community weighed in with questions and comments, and there was an open IRC office hour with the authors and interested community members. The Wikimedia Foundation is pleased with the outcome of this RFP and excited to begin this new chapter in the life of MediaWiki.

Mark and Markus bring a wealth of knowledge to this endeavor, as they are both MediaWiki contractors helping others set up, use, and do unique customizations of MediaWiki on a daily basis. They certainly know what third-party users of MediaWiki want.

“We are excited to work with the Foundation to enable the community of developers to respond in a more agile way to third-party MediaWiki users,” Mark said about the opportunity. “Together, we will develop the next generation of MediaWiki software and build strategic and lasting relationships with Open Source organizations and third-party wikis.”

Mark and Markus will be working on various things, especially leading the efforts to make new tarball releases of the software, to improve the continuous integration infrastructure, to shepherd through changes to extension maintenance, and to collaborate with others as they add documentation for third-party users. All of their progress will be documented on the MediaWiki wiki for others to follow along and help.

Please join me in congratulating Mark and Markus. We’re very excited to see what they will accomplish!

Greg Grossmeier
Release Manager, Wikimedia Foundation

Updates from the Language Engineering Google Summer of Code projects

Google Summer of Code (GSoC) 2013 is well into its coding phase. Four projects this year are related to various aspects of MediaWiki and Wikimedia internationalization initiatives; each is covered in its own section below.

These projects are being mentored by members of the Wikimedia Language Engineering team along with members from the WMF Mobile and VisualEditor teams. In this post, we touch base with each of the projects about the challenges that they have faced so far and on their accomplishments.

MediaWiki VisualEditor internationalization and right-to-left languages support

A screenshot of a draft version of the VisualEditor language inspector.

Moriel Schottlender is working on adding better support for non-English languages to the VisualEditor. Her project consists of two main parts. The first is triaging and fixing bugs in handling right-to-left text reported by volunteer editors who test the currently deployed version of the editor in languages such as Arabic and Hebrew. She has already fixed several bugs such as moving the cursor in the correct direction using the arrow keys and adapting the design of VisualEditor’s dialog boxes to right-to-left layout. The other part of the project is developing a “language inspector”–a tool for setting the language of a piece of text in an article. This is needed very frequently in Wikipedias in all languages to set properties such as font, size or direction of a foreign name or quotation. Nowadays it is done using a multitude of templates and HTML tags, and Moriel’s project will make it easy and unified.

jQuery.IME extensions for Firefox and Chrome

Part of Project Milkshake, jQuery.ime is an input method library. Making a good start, Praveen Singh has already implemented working jQuery.ime extensions for both Chrome and Firefox. Rather than loading all input methods at once, input method scripts are loaded only when the user selects a particular input method. The jQuery.ime and jQuery.uls upstream projects were added as git submodules in the extensions. This will provide a way to synchronize the extensions with the upstream projects in the future by simply updating the respective submodule. Universal Language Selector (ULS) has been successfully integrated in the extensions, thus providing the users with an easy way to choose among different languages. The extension remembers a user’s most recently selected languages and their corresponding input methods, and offers an easy way to choose among those languages.
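The two behaviours described above, loading an input method only when it is selected and remembering recent selections, can be modelled in a short Python sketch. jQuery.ime itself is a JavaScript library and its internals may differ; the `InputMethodRegistry` class, the input method ids and the one-letter rule tables below are all assumptions made for illustration.

```python
class InputMethodRegistry:
    """Loads input method rules on first use and tracks recent selections.

    A simplified, hypothetical model; it does not mirror jQuery.ime's API.
    """

    def __init__(self, loaders, max_recent=3):
        self._loaders = loaders  # id -> function returning the rules
        self._loaded = {}        # id -> rules, filled on demand
        self._recent = []        # most recently selected first
        self._max_recent = max_recent

    def select(self, method_id):
        if method_id not in self._loaded:
            # Load only when the user actually picks this input method.
            self._loaded[method_id] = self._loaders[method_id]()
        if method_id in self._recent:
            self._recent.remove(method_id)
        self._recent.insert(0, method_id)
        del self._recent[self._max_recent:]
        return self._loaded[method_id]

    @property
    def loaded_ids(self):
        return sorted(self._loaded)

    @property
    def recent(self):
        return list(self._recent)


if __name__ == "__main__":
    # Placeholder loaders; the ids and rules are made up for the example.
    registry = InputMethodRegistry({
        "hi-transliteration": lambda: {"a": "अ"},
        "ta-transliteration": lambda: {"a": "அ"},
        "ml-transliteration": lambda: {"a": "അ"},
    })
    registry.select("hi-transliteration")
    registry.select("ta-transliteration")
    registry.select("hi-transliteration")
    print(registry.loaded_ids)  # the Malayalam rules were never loaded
    print(registry.recent)
```

Deferring the load keeps the initial footprint small, which matters in a browser extension that may offer a large number of input methods.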

Language Coverage Matrix Dashboard

The Language Coverage Matrix Dashboard was conceived during the last Open Source Language Summit, held in Pune, India earlier this year. The project began as a shared spreadsheet and, over the next few months, the data was filtered to uniquely identify the internationalization support status for each language and its variants. The dashboard will provide an interface to search this data and present it visually. The test instance set up by Harsh Kothari on wmflabs showcases the search features. The spreadsheet data has been ported to a MySQL database. Plans for the coming few weeks include implementing additional queries, refactoring code and starting work on the visualizations.

Phone app for MediaWiki translation

The Translate extension is used for translating MediaWiki content. This project will bring the convenience of this widely used extension to an Android app. The iPhone app, created by the student Or Sagi, is already available for download and testing. The similarly designed Android app will provide features for translation and proofreading. Due to conflicts with examination schedules, major work on this project will effectively start in late July.

Getting more updates

More details about the progress of each of these projects can be found on the project home pages. The students and mentors also meet up every week for demos. Please let them know if you’d like to be part of any of these sessions.

Runa Bhattacharjee
Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Universal Language Selector (ULS) deployed on more than 150 wikis

The Universal Language Selector (ULS), a MediaWiki extension to configure language settings, has now been deployed on more than 150 Wikimedia wikis. Deployment started with the first phase on June 11, and over 5 phases it will be made available on all Wikimedia wikis. ULS replaces the extensions Narayam and WebFonts which were used to configure input and font settings respectively.

Click on the image to play a short video about typing in Telugu using ULS; created by Subhashish Panigrahi

During the last two development sprints, the Wikimedia Language Engineering team worked with the wiki communities to resolve critical bugs and enhancement requests and to test features, and communicated widely while completing 2 more phases of deployment. In the 2nd and 3rd phases, deployment was completed for the Wikipedias ranked 11 to 20 by size and for wikis with no language versions.

User setting features were tested in major web-browsers and operating systems. This includes recent versions of Mozilla Firefox, Google Chrome, Internet Explorer, Safari and Opera. The tests were manually done using a test plan based on user scenarios for different functionality.

The announcement on Meta-Wiki is also available in more than 40 languages. This announcement page will be used for all deployment phases. Prior to deployment, all the updated wikis were informed about this change simultaneously on their respective community portals.

Besides the wikis, the current version of the ULS can be viewed and tested on the beta installation of the English language Wikipedia. Bug fixing and integration testing will continue. The test steps can be used to verify the functionality.

Next phase of deployment

On July 2, 2013, ULS will be deployed on the English Wikipedia. More information about the Universal Language Selector can be found on the feature page, and the FAQ. Feedback can be sent using Bugzilla, via the mailing list, and on IRC (#mediawiki-i18n on freenode). The deployment of this extension was also discussed and queries were addressed during our last office hour (complete log of discussion). The Language Engineering team will be following up with more announcements, translation requests and Village Pump/Community Portal interactions to optimize the Universal Language Selector on the wikis.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Universal Language Selector coming to all wikis

The Universal Language Selector (ULS) provides a flexible way to configure and deliver language settings like interface language, fonts, and input methods (keyboard mappings). It combines the features of two earlier MediaWiki extensions, Narayam and WebFonts. From June 11, 2013, ULS will be made available to all Wikimedia wikis in 5 phases.

In the first phase, ULS will replace the Narayam and WebFonts extensions on 84 wikis. User preferences from the replaced extensions will not be preserved. Affected communities will be notified by the Wikimedia Language Engineering team of the upcoming change.

In the 5 weeks that follow, ULS will be deployed on the Wikipedias ranked 11-20 by size (phase 2), all projects without language versions (phase 3), the English language Wikipedia (phase 4) and all other wikis (phase 5).

The Universal Language Selector appears in one of two places: in the sidebar for wikis with language versions, like Wikipedia, or in the personal toolbar at the top of wiki pages for wikis without language versions, like Wikimedia Commons and Meta-Wiki. The initial set of language preferences is presented based on the geographic location of the user. Users can set the input methods and fonts that they want to use. Logged-in users can also change the language for the MediaWiki menu items.

Universal Language Selector is already available on several Wikimedia wikis like Wikimedia Commons and Meta-Wiki. The appearance on wikis like Wikipedia is available in the beta installation of the English language Wikipedia on Wikimedia Labs. A cog icon is present in the “Languages” section of the sidebar menu. Clicking the icon opens the Language settings panel that can be used to set the display and input settings.

Please have a look at the Universal Language Selector feature description or the Frequently Asked Questions for more detailed information.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Volunteers and staffers teach, learn, create at Amsterdam hackathon

149 participants from 31 countries came to Amsterdam in late May to teach each other and improve Wikimedia technology.


Developers work near sticky-notes representing topics and ideas at the Amsterdam hackathon in May 2013.

Technologists taught and attended sessions on how to write and run a bot, how to use the new Lua templating language, how to move from Toolserver to the new Wikimedia Labs, design, Wikidata, security, and the basics of Git and Gerrit. Check out the workshops page for slides, tutorials, and other reference material; video recordings of sessions are due to be uploaded to Wikimedia Commons soon.

Wikimedia Netherlands, Wikimedia Germany, and the Wikimedia Foundation subsidized travel and accommodation for dozens of participants, enabling the highest participation in this event’s history. As one subsidized participant wrote, “One of the wonderful things about the Wikipedia world is the support given to the volunteers from the different chapters and the parent Wikimedia Foundation to promote community growth and building awesome stuff that the whole world can use….It’s such surprises that makes one love contributing to open source.” Organizers also put together a social events program that included a boat cruise of Amsterdam’s canals.

Participants are still listing what they accomplished or learned during the event, but here’s a sample:

  • The Wikimaps project aims to present historical maps on Wikimedia sites, and to work together with OpenStreetMap Historic “to find a common way to model historical geodata” (more details). Maps aficionados discussed the project and made plans in Amsterdam. One volunteer, Arun Ganesh, wrote a prototype wiki atlas: an interactive SVG file that comes with automatic labelling (details).
  • Moritz Schubotz, a volunteer, worked on improving search and math functionality in MediaWiki.
  • The Foundation testing and quality assurance team improved test coverage and the test environment, and taught other participants how to do QA for Wikimedia.
  • Pau Giner, a designer at the Foundation, wrote code to use an SVG for the collapsible section arrow in MediaWiki’s Vector skin. This will make the image less fuzzy-looking.
  • User:Ruud Koot wrote a Wikivoyage listing editor that will make it easier to improve the specific parts of a travel suggestion without having to load the whole page.

    A WMF staffer holds a microphone to amplify a volunteer’s voice during the closing demo session at the Amsterdam hackathon.

  • Several volunteers worked on the account creation tool and process for English Wikipedia, to help the ACC team deal with prospective editors who have not been able to create an account via the web interface. The improved tool (code) streamlines the workflow, helping volunteers do their work faster.
  • A group of staffers and volunteers interested in statistical data improved the User Metrics API’s reliability and security. Another wrote a proof-of-concept MediaWiki extension enabling editors to embed Limn graphs in wiki pages via wikitext.

So far, 90 participants have submitted the post-event survey and results are largely positive, with (of course) several suggestions for improvements in the future. For instance, next year, organizers should help trainers prepare more, and help participants with common interests find and work with each other more easily. We don’t yet know where or when next year’s developer meeting will be, but it’ll happen; subscribe to the low-traffic wikitech-announce mailing list to hear when it’s settled.

You may also wish to read the Wikipedia Signpost report on the event.

Thanks are due to staffers at the Wikimedia Foundation, Wikimedia Netherlands, and Wikimedia Germany who made the event possible, and to volunteers who ran the event, especially lead Maarten Dammers. And thanks to all the participants who gave up their weekend to make our sites better.

Sumana Harihareswara
Engineering Community Manager, Wikimedia Foundation