Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Preparing for the migration from the Wikimedia Toolserver to Tool Labs

Last weekend, the Wikimedia Hackathon took place in Amsterdam, where we notably worked on the migration from the Wikimedia Toolserver to Tool Labs. According to the Roadmap, Tool Labs will have all the necessary features by the end of June 2013. From then on, tool maintainers will have one year to migrate their software. By the end of 2014, the Toolserver will be decommissioned.

What is working?

First of all: Tool maintainers do not have to manage virtual instances or other background infrastructure. You will develop on servers similar to the Toolserver, e.g. with the infrastructure for web services already in place. The servers run Ubuntu Precise. At the moment, there are replicas of six (out of seven) database clusters; the last one (with CentralAuth) is due in June. You can already work with all the big and many small Wikipedias, with Commons and with Wikidata. You have access to all the data that is visible to registered users without special privileges on a wiki. You can create your own user databases. Early experience suggests that Tool Labs is fast.
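For example, querying the English Wikipedia replica from a tool account could look something like this (a sketch only; the host name and credentials file are assumptions based on common Tool Labs conventions, so check the help page for the current details):

    # Query the English Wikipedia replica with the auto-generated credentials file.
    # (Host and database names are assumptions; adjust them for your tool.)
    mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p \
        -e "SELECT page_title FROM page WHERE page_namespace = 0 LIMIT 5;"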

In addition to home directories, you also have shared project storage. Tool Labs wants to make it as easy as possible to develop software together, which is why you can add others to your projects via the web interface. There is also a time travel feature! You can restore your files to their state at any of the last three hours, the last three days and the last two Sundays. The job system used in Tool Labs is OpenGridEngine; you can find an intro on the Tool Labs help page. Bugs can be reported in Wikimedia’s Bugzilla: please use the product “Wikimedia Labs” and one of the components “Tools” or “Bots”. If software that could be of interest to others is missing from Tool Labs, please file a bug!
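For instance, submitting a tool’s script as a batch job with the standard Grid Engine commands might look like this (a minimal sketch; Tool Labs may provide its own wrapper scripts, so see the help page first):

    # Submit a script as a named batch job, writing output and error logs.
    # (Job and file names here are placeholders.)
    qsub -N mytool-update -o update.out -e update.err update.sh

    # List your queued and running jobs.
    qstat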

What about “Tools” versus “Bots”?

These are the names of the two projects that Tool Labs consists of. The larger environment (Wikimedia Labs) is organized in projects, two of which form Tool Labs. Together they are an environment inside Labs that is customized for Toolserver users. The naming might be a bit misleading: the difference between “Tools” and “Bots” is not about what kind of software you run in which project, but about which of two environments you run your tools in. The “Tools” project is a stable environment maintained by four admins (one of them a volunteer); there are no experiments with software versions there. In contrast, the “Bots” project is a more flexible environment in which you can also play with changes to the environment itself. Here, it will be easier to get root access. (If you are interested, ask on the mailing list.)

Open tasks?

Apart from the open tasks on the roadmap, the documentation needs improvement. The pioneers among you can help others a lot by documenting their experiences. Magnus Manske and Russell Blau have started to lead by example by adding a lot of documentation, and you can help as well! We are also thinking about the easiest way to redirect deprecated links to migrated tools. The Tool Labs user interface needs some love, too; feel free to come to us if you want to help here!

If you run into problems or have questions when migrating tools, be bold and ask! The best places are the labs-l mailing list and the IRC channel #wikimedia-labs. The admins’ nicks are Coren and petan. There is also a list of Frequently Asked Questions that you can expand. And finally: if you find that your tool needs more adaptation than you think you can manage on your own, talk to Johannes Kroll or me at Wikimedia Deutschland for support!

Silke Meyer
Project manager for the Toolserver, Wikimedia Deutschland

Opening our operations with Wikimedia Labs

For the past year and a half we’ve been working on a project named Wikimedia Labs, which enables us to invite our community to contribute to how our sites are run. Labs is a cloud computing environment built on OpenStack for the development, testing and deployment of Wikimedia’s infrastructure as a whole, enabling us to treat our infrastructure as an open source software project.

The problems we’re solving

When Wikipedia and its sister projects started, volunteers had root level access on our infrastructure. They were the only roots and most of the infrastructure they built is still in use today. Our lenient access policy made us flexible, so changes could happen quickly. Also, the sites were smaller, had far fewer users, and large, fundamental changes could be made in production.

Growth has made us less willing to give out root access to volunteers. Because of the size of our sites, downtime is less acceptable. But having fewer volunteers means we have fewer ideas, and our ability to make changes quickly is therefore decreased. We haven’t had a new volunteer root in years. We haven’t even had a new volunteer with shell access. Engaging volunteers and enabling them to easily contribute is a wider problem as well.

Our software development community scales with volunteers. Unfortunately, operations doesn’t currently scale in a similar way: we’re limited to the staff operations engineers we have. The staff is great, but because operations can’t scale to meet the needs of a growing developer community, operations becomes a bottleneck. Furthermore, our access policy prevents volunteer developers from learning how our infrastructure works.

This leads to a situation where our staff developers and volunteer developers can’t easily collaborate. Our volunteers also have no way of appropriately testing their changes, since our infrastructure is complex and difficult to replicate. This means it’s harder to take contributions, which further slows the pace of changes on our sites.

A profile in free collaboration

Wikimedia Foundation operations engineer Ryan Lane. Photo by Victor Grigas, CC-BY-SA 3.0.

Most top websites have thousands of software developers on staff, creating new features and keeping the site running securely. The Wikimedia Foundation has about forty. That’s pretty amazing, considering Wikipedia is the fifth most popular web property in the world. So, what’s our secret?

Well, we don’t have any secrets.

We make everything free, in every sense of the word. The technology we operate has been built by thousands of people around the world who collaborate freely and build upon each other’s contributions. Every article, every picture, every piece of code is free for anyone to use, reuse, copy, distribute and improve.

“Other tech companies wouldn’t share their installation, configuration or system documentation,” said Ryan Lane, an operations engineer at the Wikimedia Foundation. Such proprietary information is the competitive advantage most websites hold over their peers, and they guard it dearly. “Wikipedia documents and shares all of that.”

“No other organization of our scope would dream of being this open,” he added. “It is our fundamental organizing principle.”

Lane understands what it means to operate in secret. Before coming to the Foundation, he spent six years working on classified projects for the U.S. government at the Naval Oceanographic Office (NAVO). He was forbidden from speaking about his work.

“In the government, I wouldn’t be allowed to talk about any of it. Not being able to talk about anything I do is really painful,” he said of working in a closed environment. “The ability to share everything is very freeing.”

Lane hails from New Orleans, where he studied computer science at the University of New Orleans. He has been with the Foundation for nearly two years, managing web infrastructure to ensure that Wikimedia projects become more reliable and efficient. For Lane, working in an open-source and transparent environment is what makes his work meaningful.

“In computer science, it’s very difficult not to be able to share your knowledge with other people. The way I learned most of the things I know is because people shared their expertise with me,” he noted.

Because the Wikimedia sites are so open, according to Lane, it’s much easier to collaborate with the community. In addition to the roughly 40 software developers on staff at the Foundation, there are more than 200 regular volunteer developers improving MediaWiki software, the backbone of Wikipedia and thousands of other wikis.

Lane manages Wikimedia Labs, a project that was created to allow volunteers to make contributions to MediaWiki development, tools and analytics. Working in an open environment means Lane can not only talk about a problem, he can give a total stranger a replica of our configuration system so they can help change and improve our operations infrastructure.

At a recent hackathon in San Francisco, Lane said, a programmer who had never previously worked within Wikimedia’s environment fixed a bug in the logging infrastructure behind our HTTPS site. His code was good, and Lane pushed it to production. “It was running live within a few hours,” he said.

According to Lane, in a closed environment everyone has to do everything themselves, which requires more people overall. Or you have to pay “a lot of money to get support to come and help you, and the support is generally subpar in comparison.”

When asked whether being so transparent was a security liability, Lane argued that the value of being open source outweighed the risk of someone hacking the projects.

“It might be a little crazy to share our server configuration,” Lane admitted. “To a point, it does make us more vulnerable, but I think there’s enough benefit in it to outweigh the worries about the vulnerabilities.”

(For more information about Lane’s work with the Wikimedia projects, read his blog.)

Reporting and story by Elaine Mao and Jordan Hu
Communications Department Interns

Techies learn, make, win at Foundation’s first San Francisco hackathon

Participants at the San Francisco hackathon in January 2012

In January, 92 participants gathered in San Francisco to learn about Wikimedia technology and to build things in our first Bay Area hackathon.

After a kickoff speech by Foundation VP of Engineering Erik Möller (video), we led tutorials on the MediaWiki web API, customizing wikis with JavaScript user scripts and Gadgets, and building the Wikipedia Android app.  (We recorded each training; click those links for how-to guides and videos.)  We asked the participants to self-organize into teams and work on projects.  After their demonstration showcase, judges awarded a few prizes to the best demos.
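To give a flavor of what the web API tutorial covered, a single HTTP request is enough to pull structured data out of a wiki. Here is a minimal sketch against English Wikipedia’s endpoint:

    # Ask the MediaWiki web API for basic information about a page, returned as JSON.
    curl "https://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Main%20Page&format=json"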

Beta cluster allows Wikimedians to test upcoming software on Labs before deployment

Over the last few weeks, we’ve set up a test environment on Wikimedia Labs to replicate our production cluster and test new software before it’s deployed to Wikimedia sites. This will notably allow us to identify issues with the upcoming version of MediaWiki (1.19) before its deployment — but we need your help.

In case you haven’t heard yet, Wikimedia Labs is a platform aimed at making it easier for developers and system administrators to try out improvements to Wikimedia infrastructure, including MediaWiki, and to do analytics and bot work.

In the past, we’ve used prototype wikis to set up testing environments for upcoming releases of MediaWiki or to test new features. This has been helpful, but has suffered from lack of ongoing maintenance.

Over the holidays, with the upcoming 1.19 release and the Labs servers newly online and available to non-WMF staff, I had the idea of using Wikimedia Labs to duplicate the production cluster’s configuration in the Labs environment, and of working with volunteers to help maintain it.

I particularly want to thank the following people for their work on this project:

  • Petr Bena has been driving this almost all the way. He started setting up this environment, including the servers and the Apache configuration, and has been helping to keep it going on a pretty consistent basis.
  • John du Hart came along after Petr had already begun and lent his experience with setting up wiki farms. With his help, we put together a really great configuration that more closely duplicates what is in production.
  • Oren Bochman has stepped in to get search working on our micro-cluster. On Wikimedia sites, search has always relied on the help of volunteers. While we don’t yet have search working, Oren has helped us document the search back-end — which will help others set up search like we have on the cluster — and has already started to help us build the next generation of search.

Join in now to identify issues before they reach your wiki

We’ve recently opened this up for real testing, so now is the time to jump in. Please look at the cluster’s SiteMatrix and find wikis to test. Try reading, editing, using your favorite gadgets, and so on as you normally would; treat it as a giant sandbox. If you find a problem, please report it on the problem reports page.

With your help, we can make the upcoming upgrade smoother.

Mark A. Hershberger, Bugmeister

Tech meetup moves Wikimedia infrastructure forward

Earlier this month, about thirty MediaWiki developers and interested technologists gathered in New Orleans to learn and to work on Wikimedia’s technical infrastructure.  We made broad progress on the infrastructure of innovation at Wikimedia (notes).  Specifically:

Tim Starling and DJ Bauch driving towards greater media file storage system independence and robustness

  • We are now much closer to officially opening the doors to Wikimedia Labs and giving far more people the ability to contribute to MediaWiki without having to set up and maintain their own development environments at home.  Wikimedia Labs will provide hosted, virtualized test and development sandboxes for new and experienced programmers and systems administrators.  Many developers got beta Labs accounts, we tested at a larger scale, and we fixed several bugs.
  • Developers agreed to create a file backend abstraction layer to enable large-scale MediaWiki installations to use one of several storage systems for large collections of media files.  (Wikimedia plans on using Swift, which is open source.)  Microsoft’s Ben Lobaugh and SAIC’s DJ Bauch also collaborated on improving MediaWiki’s performance on Microsoft technologies.  Developers made architectural decisions, refactored some existing code, and improved documentation and tests for the SwiftMedia extension to MediaWiki.
  • Chad Horohoe teaching developers unit testing

    We now have a continuous integration server up and running.  This will continuously run tests against the latest features and bugfixes that developers write, resulting in fewer bugs and faster development.  Developers will need to write tests to reap the benefits, so Chad Horohoe taught a test-writing workshop.

  • Max Semenik finished and demonstrated the first version of his API Query Sandbox.  This allows software developers anywhere to experiment with ways to automatically get data from Wikipedia or other sites that run MediaWiki, thus enabling wider and deeper reuse of Wikimedia content.
  • Operations folks continued the Puppetization of our infrastructure: they completely reworked Varnish management in Puppet, and worked on Puppet configurations for SwiftMedia testing. This configuration management work will ensure that ops can move faster and more confidently in building and maintaining Wikimedia infrastructure. And Canonical’s Mark Mims and Kapil Thangavelu worked on improving methods for Wikimedia developers “to spin up stacks of services within the labs environment” using Juju (more details).
  • Brion Vibber leading developers into the "glorious Git future"

    Since the engineering department is planning a switch from Subversion to Git in the next few months, Brion taught nearly everyone there how Git works (slides, audio), and how we’ll be using Git in the future (a sketch of the everyday commands follows this list).  This change in our source code repository and workflow will, we hope, enable more speed and flexibility in development, both for WMF developers and community contributors.
  • We prioritized and addressed several open requests for the operations team and defect reports about the latest version of MediaWiki, 1.18, which had just been deployed across WMF sites.
  • WMF developer Roan Kattouw found and fixed an issue that was spouting symbolic link errors into our Apache logs, so now it’ll be easier for us to see more dangerous errors in those logs.
  • Google Summer of Code students Salvatore Ingala and Kevin Brown made progress on integrating their summers’ work into MediaWiki as used and deployed by others; Salvatore and Roan have a plan for getting his user-script improvements reviewed and deployed, so they can benefit Wikimedia readers and editors.
  • A volunteer came in on Friday night knowing nothing about developing for MediaWiki, and by the end of the weekend had a working development environment on her laptop and had some ideas about how to contribute.
  • We had substantive conversations about the summer internship program and about third-party collaboration that will affect how we work in the future.
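For the curious, the everyday commands in the kind of Git workflow Brion described look roughly like this (a sketch only; the repository URL and branch names are assumptions, since the migration details were still being settled):

    # Clone a repository and create a topic branch for a change.
    # (The URL below is an example, not the final repository location.)
    git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git
    cd core
    git checkout -b fix-typo

    # Commit the change locally, then push the branch so it can be reviewed.
    git commit -a -m "Fix typo in installer message"
    git push origin fix-typo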

Launch Pad New Orleans, a great venue

We also ate dinner together, walked Bourbon Street, and generally got to know colleagues we’d never met before.  I expect these relationships will bear fruit for years to come.

Thanks to Ryan Lane and Dana Isokawa for organizing the event with me, and thanks to Launch Pad New Orleans for providing the venue!

Our next developers’ event is a hackathon in Mumbai November 18-20 concentrating on internationalization, localization, and mobile work.  To find out about other upcoming Wikimedia technical events, check the meetings wiki page, and follow @MediaWikiMeet on Identi.ca or Twitter.

Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

Ever wondered how the Wikimedia servers are configured?

Well, wonder no longer! To configure the Wikimedia servers, we use Puppet, a configuration management system, which lets us write code that manages all of our servers like a single large application. Of course, to really know how our servers are configured, you’d need to see our Puppet configuration.

Good news: we’ve just released our Puppet configuration in a public Git repository.

What is and isn’t included

Basically everything is included in the repository. We did, however, spend a few weeks removing private and sensitive data; that material now lives in a private repository that is only available to Wikimedia staff and volunteers with root access.

This, of course, means that the Puppet configuration, as released, won’t completely work. The public repository makes references to files and manifests in the private repository, so to make the repository work, you’ll need to fill in the missing information. There isn’t very much in the private repository, though, so that task should be fairly easy.
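As a rough sketch of how you might find those gaps after cloning (the search term and directory name are assumptions about how the private pieces are referenced):

    # List references to the private repository so you know what to stub out.
    # ("private" is a guess at the naming; adjust the pattern as needed.)
    grep -rn "private" manifests/ | less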

The point of making this repository public

We have a couple reasons for making this repository public:

  1. It shares knowledge with the world
  2. It lets us treat operations like a software development project

Both reasons align with our mission, but we were already sharing most of this knowledge via wikitech. The second reason aligns more closely with our mission, as it allows the world to be directly involved in our operations efforts.

Labs and community-oriented operations

The release of this Puppet repository is the first step in the Wikimedia Test/Dev Labs project. We’ll be going further than just making the repository readable by the world. Part of the Test/Dev Labs project is to create a clone of our production cluster. This clone will run a branch of the puppet repository.

Staff and community developers, and staff and community operations engineers will be able to push changes to the test branch of the Puppet repository, which will manage the cloned cluster. They’ll then be able to push these changes for review to the production branch of the Puppet repository. The staff operations engineers can then code-review the changes and push the changes out to the production systems.

Just as community members can edit Wikimedia content, the site interface, and the site’s software (MediaWiki), they will now be able to edit the site’s architecture as well.

Accessing the repository

Since this is a public Git repository, you can do an anonymous git clone like so:

git clone https://gerrit.wikimedia.org/r/p/operations/puppet

You can browse the repository through the gitweb interface. You can see the code review activity via Gerrit.

Ryan Lane
Operations Engineer

Video Labs: Universal Subtitles on Commons

The Universal Subtitles synchronisation interface gives subtitle authors fine-grained control over subtitle timing.

For the past six months, the Participatory Culture Foundation has been hard at work on its latest open web video mission: to make captioning, subtitling and translating video publicly accessible in a way that’s free and open. Part of the Mozilla Drumbeat campaign for a better web, Universal Subtitles is a tool and platform that helps bring an open solution to subtitling web video. Commons has supported timed text via the mwEmbed gadget for some time, but up until today it has been very difficult to create the initial subtitle track. I have been watching the development of the Universal Subtitles effort, and at the Subtitle Summit and the Open Video Conference we were finally able to hack on bringing the Universal Subtitles widget to Wikimedia Commons.

Today, I am happy to share our first pass at integrating our open subtitle efforts. Please keep in mind that this integration is still very early in development, but the basic milestone of being able to use the tool on Commons to create and sync up subtitle tracks is an important first step. Even without helpful tools in place, the Wikimedia community has been creating subtitles and translations. We hope these new subtitle editing tools will broaden the number of participants and enable the Wikimedia community to set a new standard for high-quality multilingual accessibility in online video content.

If you have a moment, feel free to check out the widget and provide some feedback. If you are looking for a video to subtitle, check out the recently created needs subtitling category.

Michael Dale, Open Source Video Collaboration Technology

Video Labs: P2P Next Community CDN for Video Distribution

As Wikimedia and the community embark on campaigns and programs to increase video contribution and usage on the site, we are starting to see video usage on Wikimedia sites grow, and we hope for it to grow a great deal more. One potential problem with increased video usage is that video is many times more costly to distribute than the text and images that make up Wikipedia articles today. Eventually, bandwidth costs could saturate the Foundation’s budget or leave fewer resources for other projects and programs. For this reason it is important to start exploring and experimenting with future content distribution platforms and partnerships.

The P2P-Next consortium is an EU-funded project exploring the future of Internet video distribution. Its aim is to dramatically reduce the costs of video distribution through community CDNs and P2P technology. The consortium recently presented at Wikimania 2010 in Gdansk, and today I am happy to invite the Wikimedia community to try out its latest experimental effort to greatly reduce video distribution costs. Swarmplayer v2.0 is being released today for Firefox (an Internet Explorer plugin is in testing). Swarmplayer enables visitors to easily share their upload bandwidth to help distribute video. The add-on works with the Kaltura HTML5 library (aka mwEmbed) and url2torrent.net to let visitors help offset the distribution costs of any Ogg Theora video embedded in any web page.

Swarmplayer design overview; learn more at swarmplayer.p2p-next.org

We have enabled this for Wikimedia video via the multimedia beta. Once you have installed the add-on, any video you view on Wikimedia sites with the multimedia beta enabled will be transparently streamed via BitTorrent. The add-on includes simple tools to configure how much bandwidth you use for uploading. Even if you upload nothing, using the add-on helps distribute load by playing the video from the P2P network and from the local cache on subsequent views. Swarmplayer has clever performance tuning that downloads high-priority pieces over HTTP while getting low-priority bits of the video from the BitTorrent swarm. This ensures a smooth playback experience while maximizing use of the P2P network. You can learn more about the technology on the Swarmplayer add-on site.

The P2P Next Team from Delft University of Technology will be presenting the P2P-Next project at the Open Video Conference on October 2nd.

Michael Dale, Open Source Video Collaboration Technology

Video Labs: Kaltura HTML5 Sequencer available on Wikimedia Commons

Screenshot showing a search for cats and dragging an image into the sequence

I am happy to invite the Wikimedia community to try out the latest Kaltura HTML5 video sequencer, part of a Wikimedia/Kaltura Video Labs project, which can now be used on Wikimedia Commons with the resulting sequences visible on any Wikimedia project. For those who have been following the effort, it has been a long road to deliver this sequence editing experience within the open web platform and within the MediaWiki platform. This blog post will highlight the foundational technologies used by the sequencer in its present state and outline some upcoming features in Firefox 4, as well as enhancements to the sequencer itself that are set to improve the editing experience.

If you want to jump straight into editing, please check out the Commons documentation page, play around with the editor and let us know what you think. This project is early in its development; your bug reports, ideas, feedback and participation will help drive future features and how these tools are used within Wikimedia projects.

If you’re interested in video on Wikipedia in general, please consider joining the Wikivideo mailing list, which will cover a wide range of topics, including the sequencer, collaborative subtitles, timed text, video uploading, video distribution, format guidelines, and campaigns to increase video contributions to the site.

And finally, if you are in the New York area, consider checking out the Open Video Conference coming up October 1st to the 3rd, which will be a great space to hack on open video and work on ideas for the future of video on Wikimedia projects.
