Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Technology

News and information from the Wikimedia Foundation’s Technology department.

RfC: Should we support MP4 video on our sites?

A video of a cheetah, captured in slow-motion at 1200 fps. The video was released on Vimeo in MP4 format and converted to OGV format before uploading to Commons. It cannot be viewed in this format on most mobile phones and many web browsers.

The Wikimedia Foundation’s multimedia team seeks your guidance on a proposal to support the MP4 video format. This digital video standard is used widely around the world to record, edit and watch videos on mobile phones, desktop computers and home video devices. Strictly speaking, MP4 is a container format; the video it holds is usually encoded with the H.264 codec, also known as MPEG-4 AVC.

Supporting the MP4 format would make it much easier for our users to view and contribute video on Wikimedia projects. Video files could be offered in dual formats on our sites, so we could continue to support current open formats (WebM and Ogg Theora).
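To illustrate how dual formats could work in practice, here is a minimal Python sketch that builds an HTML5 video element with one source entry per format. Browsers try the sources in order and play the first one they can decode, so listing the open formats first and MP4 last keeps WebM and Ogg Theora as the preferred choice wherever they are supported. The function name, file names and ordering are assumptions for illustration, not how MediaWiki's video player actually emits its markup.

```python
from html import escape


def video_element(sources: list[tuple[str, str]]) -> str:
    """Build an HTML5 <video> element offering one clip in several formats.

    `sources` is an ordered list of (url, mime_type) pairs. Browsers try the
    <source> children in order and play the first format they support.
    """
    lines = ["<video controls>"]
    for url, mime in sources:
        # Escape attribute values so unusual file names cannot break the markup.
        lines.append(f'  <source src="{escape(url)}" type="{escape(mime)}">')
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical transcodes of the same clip: open formats first, MP4 as fallback.
page_video = video_element([
    ("Cheetah.webm", "video/webm"),
    ("Cheetah.ogv", "video/ogg"),
    ("Cheetah.mp4", "video/mp4"),
])
print(page_video)
```

A browser that can play WebM never fetches the MP4 file, so adding MP4 this way would only change what viewers without open-format support receive.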

Currently, open video files cannot be viewed on many mobile devices or web browsers without extra software, making it difficult or impossible for several hundred million monthly visitors to watch videos on our sites. Video contributions are also limited by the fact that most mobile phones and camcorders record video only in MP4 format, and by the fact that transcoding software is scarce and hard for casual users to use.

However, MP4 is a patent-encumbered format, and using a proprietary format would be a departure from our current practice of only supporting open formats on our sites—even though the licenses appear to have acceptable legal terms, with only a small fee required.

We would appreciate your guidance on whether or not to support MP4 on our sites. This Request for Comments presents views both in favor of and against MP4 support, and hundreds of community members have already posted their recommendations.

What do you think? Please post your comments on this page.

All users are welcome to participate, whether you are active on Commons, Wikipedia, other Wikimedia projects—or any site that uses content from our free media repository. We also invite you to spread the word in your community about this issue.

We look forward to a constructive discussion with you and your community, so we can make a more informed decision together about this important question.

All the best,

Fabrice Florin, Product Manager, Multimedia
On behalf of the Multimedia team

Wikimedia Foundation’s Engineering and Product Group

Wikimedia engineering report, December 2013

Major news in December includes:

  • a retrospective on Language Engineering events, including the language summit in Pune, India;
  • the launch of a draft feature on the English Wikipedia, to provide a gentler start for Wikipedia articles.

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

A Multimedia Vision for 2016

How will we use multimedia on our sites in three years?

The Wikimedia Foundation’s Multimedia team was formed to provide a richer experience and support more media contributions on Wikipedia, Commons, and MediaWiki sites. We believe that audio-visual media offer a unique opportunity to engage a wide range of users to participate productively in our collective work.

To inform our plans, we’ve created a simple vision of how we might collaborate through multimedia by 2016. This hypothetical scenario was prepared with guidance from community members and is intended for discussion purposes, to help us visualize possible improvements to our user experience over the next three years.

Vision

The best way to view this vision is to watch this video:

Multimedia Vision 2016, presented by Fabrice Florin at a Wikimedia Meetup in San Francisco on Dec. 9, 2013.


“Tech News”: Fighting technical information overload for Wikimedians

Every week, tech ambassadors assemble, simplify and translate “Tech News,” a curated newsletter that is then delivered to hundreds of subscribers across wikis. But how exactly did this start, how does it work behind the scenes, and how does it fit within our efforts to bring developers and users closer together?

Wikimedians, the tens of thousands of volunteers[1] who write and improve articles on Wikipedia and its sister sites, do not like to encounter software bugs and other changes that prevent them from curating the sum of all human knowledge. Quite understandably, they regularly complain that they were not notified of a specific change or new feature that broke their workflow. It’s a perennial topic, and it’s generally brought up independently every few months; the latest occurrence happened just a few weeks ago[2].

And yet, in a movement as transparent as Wikimedia, where almost every document, code change and mailing list is public, the problem is rarely the lack of information. Anyone can look at the more than 5,000 code changes made on average by developers every month[3]; anyone can also contribute to the more than 1,200 issues opened every month[4], or read (and reply to) the more than 1,500 mailing list messages that they exchange[5]. I’m not even mentioning code reviews, real-time IRC chat or edits to the documentation on mediawiki.org, all of which are also public.

No, the problem is rarely that information is kept private; in a situation quite representative of our time, Wikimedians who want to follow technical changes are faced with information overload. We know the information we’re looking for is out there; the problem is how to find it without being overwhelmed, and how to find it before it has consequences for us.

Wikimedia moving to Elasticsearch

We’re in the process of rolling out new search infrastructure to all Wikimedia wikis, so it’s a good time to explain what’s coming in the immediate future, why we’re changing it, and how you can get involved.

The new search engine is coming soon to all Wikimedia wikis, and may already be on your favorite wiki

First, a bit of background. All Wikimedia sites have been using a home-grown search system based on Apache Lucene since 2005 or 2006. Called lucene-search-2, it was written primarily by volunteer Robert Stojnić. It is a fantastic search engine that has powered the sites and scaled very well for roughly 8 years. Early in 2013, however, it became a source of significant operational problems. In the short term we were able to patch some of the most glaring issues in lucene-search-2, but it became increasingly apparent that a replacement was needed: Robert is no longer around, and the system is showing its age.

We’re very happy with Lucene, but we wanted to get out of the business of maintaining a special-purpose open-source search system when there are two very good general-purpose open-source search systems available: Solr and Elasticsearch. Both are based on Lucene and scale horizontally with data and query volume. After experimenting with both and implementing basic MediaWiki integration, we settled on Elasticsearch for the following reasons:

  • Elasticsearch’s reference manual and contribution documentation promised an easy start and a pleasant experience getting changes upstream when we need to.
  • Elasticsearch’s super expressive search API lets us search any way we need to search and gives us confidence that we can expand on it. Not to mention we can easily write very expressive ad-hoc queries when we need to.
  • Elasticsearch’s index maintenance API lets us maintain the index right from our MediaWiki extension, so it’s easier for us to deploy and test, and should be easier for MediaWiki users outside Wikimedia to use. At the time of the choice, Solr’s schema API was read-only.
  • Rack awareness, automatic shard rebalancing, statistics exposed over HTTP, a preference for JSON and YAML over XML, and first-party Debian packages were also nice.
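As a rough illustration of what maintaining the index from application code can look like, the following Python sketch builds, but does not send, a request that creates an index with its settings and field mappings through Elasticsearch's REST API. The index name, field names and shard counts are hypothetical; the body uses the typeless mapping format of current Elasticsearch releases rather than the format in use at the time; and none of this is CirrusSearch's actual schema.

```python
import json
import urllib.request


def create_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build a PUT request that creates an index with settings and mappings."""
    body = {
        # Hypothetical sizing; real values depend on wiki size and hardware.
        "settings": {"number_of_shards": 4, "number_of_replicas": 1},
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "text": {"type": "text"},
                "timestamp": {"type": "date"},
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("http://localhost:9200", "example_content")
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen() would send it to a running cluster.
```

Because the whole index definition is plain JSON sent over HTTP, it can live alongside the extension's code and be applied during deployment or testing.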

To provide the integration to MediaWiki, we’ve written a new extension called CirrusSearch that we’ve designed to be mostly backwards-compatible with the current search with the following exceptions:

  • Templates are expanded before indexing so text that comes from templates will be searchable but text inside templates no longer will be.
  • Page updates are reflected in search results pretty quickly after they are made, usually within seconds for single page edits.
  • Wiki communities can mark some pages as higher or lower quality and it will be reflected in the search results.
  • A few new “expert” options have been added (for example, intitle: can now be negated, and there is a new prefer-recent: option, among others).
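To make the expert syntax concrete, here is a hypothetical Python sketch of how a search string containing intitle: and negated -intitle: terms could be translated into an Elasticsearch bool query, using the query DSL's must and must_not clauses. The field names title and text are assumptions, and this is an illustration of the idea, not CirrusSearch's actual parser.

```python
import json


def build_query(search: str) -> dict:
    """Translate a search string into an Elasticsearch bool query (sketch).

    Plain words must match the page text, "intitle:word" requires the word
    in the title, and "-intitle:word" excludes pages whose title matches it.
    """
    must: list[dict] = []
    must_not: list[dict] = []
    text_terms: list[str] = []
    for token in search.split():
        if token.startswith("-intitle:"):
            word = token[len("-intitle:"):]
            if word:
                must_not.append({"match": {"title": word}})
        elif token.startswith("intitle:"):
            word = token[len("intitle:"):]
            if word:
                must.append({"match": {"title": word}})
        else:
            text_terms.append(token)
    if text_terms:
        # All plain words are matched together against the body text.
        must.append({"match": {"text": " ".join(text_terms)}})
    return {"query": {"bool": {"must": must, "must_not": must_not}}}


# The resulting dict is what would be sent as the JSON body of a _search request.
print(json.dumps(build_query("cheetah intitle:speed -intitle:car"), indent=2))
```

Negation falls out naturally here because the bool query already distinguishes clauses that must match from clauses that must not.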

We’ve documented all of these features and more on mediawiki.org, and the page is released into the public domain, so people can feel free to copy it to their wikis as a basis for documentation.

We plan for this replacement search to be a Beta Feature for all wikis by the end of February and the primary search in March or April. See our ever-evolving timeline for ever-evolving specifics.

We’ve got a lot of exciting things on the horizon now that we’ve got a modern and stable search for Wikimedia. We’re talking Wikidata, Commons metadata, faceting, real cross-wiki searching, etc. Please get involved by filing bugs, talking to us on the project page, or by finding us on IRC and pinging us there. On IRC, you can find us as ^d and manybubbles.

Chad Horohoe and Nik Everett, Wikimedia Foundation

New draft feature provides a gentler start for Wikipedia articles

For most of Wikipedia’s history, we encouraged editors to create new encyclopedia articles by publishing immediately. Just find a page that doesn’t exist, type in content, and after you hit save, it’s shared with the world. This helped Wikipedia grow to the millions of articles it has now, but the project has matured in many ways, and we need additional tools for creating great new encyclopedia articles.

Starting on the English-language Wikipedia, all users (registered or anonymous) now have the option to start drafts before publishing. A draft simply has “Draft:” before the title of the page you’re creating, like this example. Drafts are not visible to readers using Wikipedia’s default search or in external search engines such as Google, though you may find them using the advanced search options.
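In MediaWiki, a title prefix such as Draft: places a page in its own namespace, and default search covers only the main article namespace. The Python sketch below models that filtering under stated assumptions: the namespace table is a small hand-written subset, 118 is assumed here to be the Draft namespace ID, and the helper names are hypothetical rather than MediaWiki functions.

```python
# Hypothetical subset of namespace prefixes; the ID for Draft is assumed.
NAMESPACES = {"User": 2, "Draft": 118}
MAIN_NAMESPACE = 0
DEFAULT_SEARCH_NAMESPACES = frozenset({MAIN_NAMESPACE})


def namespace_of(title: str) -> int:
    """Return the namespace ID implied by a title's prefix (sketch)."""
    prefix, sep, _ = title.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix]
    # Titles like "Star Wars: A New Hope" contain a colon but no known prefix.
    return MAIN_NAMESPACE


def filter_results(titles: list[str],
                   namespaces: frozenset[int] = DEFAULT_SEARCH_NAMESPACES) -> list[str]:
    """Keep only titles whose namespace is among those being searched."""
    return [t for t in titles if namespace_of(t) in namespaces]


pages = ["Cheetah", "Draft:Cheetah speed records", "Star Wars: A New Hope"]
print(filter_results(pages))                       # ['Cheetah', 'Star Wars: A New Hope']
print(filter_results(pages, frozenset({0, 118})))  # all three titles (advanced search)
```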

Why we need drafts on Wikipedia

Wikipedia’s goal is to be the most comprehensive and reliable reference work in your language, so you might ask why we would encourage people to not publish their articles immediately so readers can enjoy them.

In small Wikipedias like Swahili or Estonian, you’d be right — we’ll probably encourage all authors to skip writing drafts. However, in larger Wikipedias where quality standards are very high, thousands of new articles are deleted (sometimes within just minutes) because they don’t meet essential requirements for what makes a good Wikipedia article.

Our most recent data from the English, Spanish, French, and Russian Wikipedias indicates that about 80% of the articles started by brand-new users are deleted. By creating a draft, authors will have more time and space to gradually work on a new topic, and can get constructive feedback from other editors. In fact, even advanced Wikipedia editors sometimes use sub-pages of their user profile (sometimes called “sandboxes”) as an unofficial draft space.

We should note that we don’t want drafts to prevent editors from following their current process for article creation. Wikipedia articles are all works in progress, even after publication, and this fact won’t change any time soon. We’re simply adding another option for people who want the time and space that drafts afford.

What’s next

This is a very early version of drafts on Wikipedia, and frankly it’s missing a great deal of functionality. In the future, we’ll be adding features to drafts that will make them more useful. We’re exploring different design concepts to make it easy to request and provide help during the draft process, better support the publication of drafts as articles (and moving them back to draft state if they need more work), and encourage collaboration between editors.

Design concepts for Search and editing of Drafts

If you’d like to help us in this effort, please sign up for a usability testing session. In these sessions, we’ll show you prototypes of new features and get your feedback. No prior experience with Wikipedia editing is required!

Pau Giner, User Experience Designer
Steven Walling, Product Manager

OpenDyslexic font now available on Polish Wikipedia

Screenshot of selecting the OpenDyslexic font

For those who suffer from dyslexia, the simple task of reading can become a monumental struggle. Because it can be hard for people without dyslexia to understand exactly what the condition means, it often goes unaddressed. Fortunately, there is hope in the form of the OpenDyslexic font.

With so much reading being done on computer screens, it is finally possible to help individuals with dyslexia. The OpenDyslexic font changes the shape of characters enough to make reading a lot easier for those who suffer from dyslexia.

Wikipedia supports OpenDyslexic for many languages, but unfortunately not for Polish. At the CEE conference in Modra, Slovakia, we learned that Polish can be supported as well. The request to enable OpenDyslexic was quickly granted and it is now fully supported. We would like to celebrate this occasion with the larger dyslexic community who we hope will benefit from this new feature on Polish Wikipedia.

Gerard Meijssen, Wikimedian


Language Engineering Events – Language Summit, Fall 2013

The Wikimedia Language Engineering team, along with Red Hat, organised the Fall edition of the Open Source Language Summit in Pune, India on November 18 and 19, 2013.

Members from the Language Engineering, Mobile, VisualEditor, and Design teams of the Wikimedia Foundation joined participants from Red Hat, Google, Adobe, Microsoft Research, Indic language projects, Open Source Projects (Fedora, Debian) and Wikipedians from various Indian languages. Google Summer of Code interns for Wikimedia Language projects were also present. The 2-day event was organised as work-sessions, focussed on fonts, input tools, content translation and language support on desktop, web and mobile platforms.

Participants at the Open Source Language Summit, Pune India

The Fontbook project, started during the Language Summit earlier this year, was slated to be extended to 8 more Indian languages. The project aims to create a technical specification for Indic fonts based on the OpenType 1.6 specification. Pravin Satpute and Sneha Kore of Red Hat presented their work on the next version of the Lohit font family, based on the same specification and using Harfbuzz-ng. This effort is expected to complement the work of the Fontbook project.

The other font sessions included a walkthrough of the Autonym font created by Santhosh Thottingal, a Q&A session by Behdad Esfahbod about the state of Indic font rendering through Harfbuzz-ng, and a session to package webfonts for Debian and Fedora for native support. Learn more about the font sessions.

Improving the input tools for multilingual input on the VisualEditor was extensively discussed. David Chan walked through the event logger system built for capturing IME input events, which is being used as an automated IME testing framework available at http://tinyurl.com/imelog to build a library of similar events across IMEs, OSs and languages.

Santhosh Thottingal stepped through several tough use cases of handling multilingual input, to support the VisualEditor’s inherent need to provide non-native support for handling language content blocks within the contentEditable surface. Wikipedians from various Indic languages also provided their inputs. On-screen keyboards, mobile input methods like LiteratIM and predictive typing methods like ibus-typing-booster (available for Fedora) were also discussed. Read more about the input method sessions.

The Language Coverage Matrix Dashboard (LCMD), which displays the status of language support for all languages in Wikimedia projects, was showcased. The Fedora Internationalization team, which currently provides resources for fewer languages than the Wikimedia projects cover, will use the LCMD data to identify the gap and assess which resources can be leveraged to improve support on desktops. Dr. Kalika Bali from Microsoft Research Labs presented on leveraging content translation platforms for Indian languages and highlighted that, for Indic languages, machine translation (MT) could be improved significantly by using web-scale content like Wikipedia.

Learn more about the sessions, accomplishments and next steps for these projects from the Event Report.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

Wikimedia engineering report, November 2013

Major news in November includes:

Note: We’re also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.


Adding musical scores to Wikimedia

Sound and musical content have long trailed behind other subjects on Wikipedia, but that is beginning to change with a new musical scores extension for MediaWiki, the software running Wikipedia and thousands of other wikis. The Score extension was added to a MediaWiki deployment earlier this year and allows users to render musical scores as PNG images and transform them into audio and MIDI files.

Score utilizes the free music-engraving program LilyPond to produce musical notations and insert them into wiki code. This code is then passed on to a LilyPond renderer, which produces images that can be uploaded to Wikipedia articles. “This is somewhat similar to the way mathematical formulas are rendered in Wikipedia,” said Markus Glaser, a Wikimedian who helped develop the extension and gave a presentation on musical scores at Wikimania in 2012. Glaser said it made sense to use LilyPond because, in addition to being free and open source, “it’s text-based, can be easy, but possesses the complexities needed to fit the needs of advanced and professional notation.”

Over time, the hope is to expand on this extension and grow it into a viable resource, encouraging music teachers, music historians and the musicology community to use Score to share their knowledge.

“Studying music on the Internet is something that remains a bit confusing and fragmented. If you are after a musical performance, you can try and hunt one down on YouTube, Spotify or other similar sites,” said Chris Keating, Chair of Wikimedia UK and an amateur violinist, who explained how many of the necessary tools to analyze music still remain largely absent on the Internet. “If you’re after free sheet music, you will probably end up looking on IMSLP. And finally, if you want to read about something, say, music theory, you are likely to come to Wikipedia.”

After setup, users can embed simple LilyPond notation into wikitext using score tags.
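For example, a snippet like the one produced below, a C major scale written in LilyPond's relative-pitch notation starting from middle C, could be placed inside score tags in an article's wikitext. The small Python helper only assembles that markup; it is a hypothetical convenience for illustration and uses the basic tag alone, without any of the extension's optional attributes.

```python
def score_tag(lilypond: str) -> str:
    """Wrap a LilyPond snippet in the <score> tag used by the Score extension."""
    return f"<score>{lilypond}</score>"


# C major scale in relative-pitch mode: c' is middle C and 4 means quarter notes;
# later notes inherit the duration and are placed nearest to the previous pitch.
scale = r"\relative c' { c4 d e f g a b c }"
print(score_tag(scale))
```

Because LilyPond input is plain text, a scale like this stays readable and editable in wikitext, much as TeX source does for mathematical formulas.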
