API:Query

The action=query module allows for retrieving all sorts of data, and is loosely based on the now obsolete Query API. action=query is also used to retrieve tokens needed for editing and the like.

The query module has many submodules (called query modules), each with a different function. There are three types of query modules:

  • Meta information about the wiki and the logged-in user
  • Properties of pages, including page revisions and content
  • Lists of pages that match certain criteria

Query modules can be combined freely, so you can use e.g. prop=info|revisions&list=backlinks|embeddedin|imagelinks&meta=userinfo to call six modules in one request.

Apart from query modules, action=query also has some features of its own.

Sample query

Before we get into the nitty-gritty, here's a useful sample query that simply gets the content of a page:

api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Main%20Page

This means: fetch (action=query) the content (prop=revisions with rvprop=content) of the most recent revision of Main Page (titles=Main%20Page) in XML format (format=xml).

Alternatively, you can use action=raw as a parameter to index.php to get the content of a page: index.php?title=Main%20Page&action=raw
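As a rough illustration, the sample query above can be assembled programmatically. The sketch below only builds the URL and does not contact any wiki; the helper name build_query_url and the relative base "api.php" are assumptions made for this example, not part of the API.

```python
from urllib.parse import quote, urlencode


def build_query_url(base, titles, **params):
    """Return an action=query URL for the given list of titles."""
    query = {"action": "query", "titles": "|".join(titles)}
    query.update(params)
    # quote_via=quote encodes spaces as %20 (not "+");
    # safe="|" leaves the title separator unescaped for readability.
    return base + "?" + urlencode(query, quote_via=quote, safe="|")


url = build_query_url(
    "api.php", ["Main Page"],
    prop="revisions", rvprop="content", format="xml",
)
print(url)
```

Several query modules can be combined in the same way, for example by passing prop="info|revisions" and meta="userinfo".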

Specifying titles

You can specify titles in the following ways:

  • Using the titles parameter, e.g. titles=Foo|Bar|Main_Page
  • Using the pageids parameter, e.g. pageids=123|456|75915
  • Using the revids parameter, e.g. revids=478198|54872|54894545
    • Most query modules will use the page the revision ID belongs to. Only prop=revisions actually uses the revision ID itself
  • Using a generator

Specifying titles through the query string is limited to 50 titles per query (or 500 for those with the apihighlimits right, usually bots and sysops).
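A client holding more titles than the limit allows has to split them over several requests. The following is a minimal sketch of that batching, assuming the default limit of 50; the function name batch_titles and the placeholder titles are made up for the example.

```python
def batch_titles(titles, limit=50):
    """Yield pipe-joined title strings with at most `limit` titles each."""
    for start in range(0, len(titles), limit):
        yield "|".join(titles[start:start + limit])


# Placeholder titles, used only to show how the batches come out.
titles = [f"Page {i}" for i in range(120)]
batches = list(batch_titles(titles))
for batch in batches:
    print(batch.count("|") + 1, "titles in this request")
```

Clients with the apihighlimits right could pass limit=500 instead.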

Title normalization

Title normalization converts page titles to their canonical form. This means capitalizing the first character, replacing underscores with spaces, etc. Title normalization is done automatically, regardless of which query modules are used. However, any trailing line breaks in page titles (\n) will cause odd behavior and they should be stripped out first.

Capitalization, localization, "_" => " ", "Project" => "Wikipedia", ...

<api>
  <query>
    <normalized>
      <n from="Project:articleA" to="Wikipedia:ArticleA" />
      <n from="article_B" to="Article B" />
    </normalized>
    <pages>
      <page ns="4" title="Wikipedia:ArticleA" missing="" />
      <page ns="0" title="Article B" missing="" />
    </pages>
  </query>
</api>
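Because the titles in the result may differ from the ones that were requested, a client usually needs the mapping from the <normalized> element to match results back to its input. A small sketch using Python's standard XML parser on the response above; the function name normalized_titles is an assumption of this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<api>
  <query>
    <normalized>
      <n from="Project:articleA" to="Wikipedia:ArticleA" />
      <n from="article_B" to="Article B" />
    </normalized>
    <pages>
      <page ns="4" title="Wikipedia:ArticleA" missing="" />
      <page ns="0" title="Article B" missing="" />
    </pages>
  </query>
</api>"""


def normalized_titles(xml_text):
    """Map each requested title to its normalized form."""
    root = ET.fromstring(xml_text)
    return {
        n.get("from"): n.get("to")
        for n in root.findall("./query/normalized/n")
    }


mapping = normalized_titles(SAMPLE)
print(mapping)
```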

Missing and invalid titles

Titles that don't exist or are invalid still appear in the <pages> section, but they have the missing="" or invalid="" attribute set. In output formats that support numeric array keys (such as JSON and PHP serialized), missing and invalid titles will have unique, negative page IDs. Query modules will just ignore missing or invalid titles, as they can't do anything useful with them.

A missing title, an invalid one and an existing one in JSON

{
        "query": {
                "pages": {
                        "-2": {
                                "ns": 0,
                                "title": "Doesntexist",
                                "missing": ""
                        },
                        "-1": {
                                "title": "Talk:",
                                "invalid": ""
                        },
                        "54": {
                                "pageid": 54,
                                "ns": 0,
                                "title": "Main Page"
                        }
                }
        }
}
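A client can tell the three cases apart by checking for the missing and invalid keys. The sketch below sorts the pages of the JSON response above into three groups; classify_pages is a hypothetical helper name, not something the API provides.

```python
import json

SAMPLE = """{
  "query": {
    "pages": {
      "-2": {"ns": 0, "title": "Doesntexist", "missing": ""},
      "-1": {"title": "Talk:", "invalid": ""},
      "54": {"pageid": 54, "ns": 0, "title": "Main Page"}
    }
  }
}"""


def classify_pages(response):
    """Group page titles by whether they are missing, invalid or existing."""
    groups = {"missing": [], "invalid": [], "existing": []}
    for page in response["query"]["pages"].values():
        # The attributes carry an empty string, so test for the key itself.
        if "missing" in page:
            groups["missing"].append(page["title"])
        elif "invalid" in page:
            groups["invalid"].append(page["title"])
        else:
            groups["existing"].append(page["title"])
    return groups


groups = classify_pages(json.loads(SAMPLE))
print(groups)
```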

Titles in the Special: and Media: namespaces

Currently, titles in the Special: and Media: namespaces cannot be queried. If any such titles are found in the titles= parameter or passed to a module by a generator, a warning will be issued.

Resolving redirects

Redirects can be resolved automatically, so that the target of a redirect is returned instead of the given title. The example below isn't really useful because it doesn't use any query modules, but it shows how the redirects parameter works. Both normalization and redirection may take place. In case of double redirects, all redirects will be resolved, and in case of a circular redirect, there might not be a page in the 'pages' section (see also below). Redirect resolution cannot be used in combination with the revids= parameter or with a generator generating revids; doing that will produce a warning and will not resolve redirects for the specified revids.

Using "redirects" parameter. "Main page" is a redirect to "Main Page"

<api>
  <query>
    <redirects>
      <r from="Main page" to="Main Page" />
    </redirects>
    <pages>
      <page pageid="11105676" ns="0" title="Main Page" />
    </pages>
  </query>
</api>

Same request but without the "redirects" parameter.

<api>
  <query>
    <pages>
      <page pageid="217225" ns="0" title="Main page" />
    </pages>
  </query>
</api>

Without "redirects" you may want to use prop=info to obtain redirect status.

<api>
  <query>
    <pages>
      <page pageid="217225" ns="0" title="Main page" touched="2007-06-29T11:22:39Z" lastrevid="78280008" counter="0" length="56" redirect="" />
    </pages>
  </query>
</api>

Circular redirects

Assume Page1 → Page2 → Page3 → Page1 (a circular redirect). In this example, the non-normalized name 'page1' is also used.

Circular redirect behavior

<?xml version="1.0" encoding="utf-8"?>
<api>
  <query>
    <normalized>
      <n from="page1" to="Page1" />
    </normalized>
    <redirects>
      <r from="Page1" to="Page2" />
      <r from="Page2" to="Page3" />
      <r from="Page3" to="Page1" />
    </redirects>
  </query>
</api>
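To find out where a requested title ended up, a client can apply the <normalized> mapping first and then follow the <redirects> entries. The sketch below does this for the two responses shown in this section and returns None when the chain loops back on itself; the function name final_title and the choice of None for the circular case are assumptions of this example.

```python
import xml.etree.ElementTree as ET

SIMPLE = """<api><query>
  <redirects><r from="Main page" to="Main Page" /></redirects>
  <pages><page pageid="11105676" ns="0" title="Main Page" /></pages>
</query></api>"""

CIRCULAR = """<api><query>
  <normalized><n from="page1" to="Page1" /></normalized>
  <redirects>
    <r from="Page1" to="Page2" />
    <r from="Page2" to="Page3" />
    <r from="Page3" to="Page1" />
  </redirects>
</query></api>"""


def final_title(xml_text, requested):
    """Return the title `requested` resolves to, or None if circular."""
    query = ET.fromstring(xml_text).find("query")
    normalized = {n.get("from"): n.get("to")
                  for n in query.findall("normalized/n")}
    redirects = {r.get("from"): r.get("to")
                 for r in query.findall("redirects/r")}
    title = normalized.get(requested, requested)
    seen = {title}
    while title in redirects:
        title = redirects[title]
        if title in seen:  # we have been here before: circular redirect
            return None
        seen.add(title)
    return title


print(final_title(SIMPLE, "Main page"))
print(final_title(CIRCULAR, "page1"))
```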

Limits

See here for more information on limits.

Continuing queries

Very often, you will not get all the data you want in one request. To continue the request, you can use the provided query-continue value.

Using the query-continue value

<?xml version="1.0" encoding="utf-8"?>
<api>
  <query-continue>
    <allcategories acfrom="List of Baptist sub-denominations" />
  </query-continue>
  <query>
    <allcategories>
      <c>List of &quot;M&quot; series military vehicles</c>
      <c>List of Alternative Rock Groups</c>
      <c>List of Alumni of Philippine Science High School</c>
      <c>List of American artists</c>
      <c>List of Anglicans and Episcopalians</c>
      <c>List of Arizona Reptiles</c>
      <c>List of Artists by record label</c>
      <c>List of Australian Anglicans</c>
      <c>List of Bahá'ís</c>
      <c>List of Balliol College people</c>
    </allcategories>
  </query>
</api>

You can now use acfrom=List%20of%20Baptist%20sub-denominations to get the next ten categories.

The query-continue node will contain a subnode for each module used in the query that needs continuation, and these subnodes will contain properties to be used when making the follow-up "continuation" query. Note that clients should not depend on the particular property names given for continuation of any module, or on the format of the values returned for continuation, as these may change.

When using a generator, you might get multiple query-continue values, one for the generator and one or more for the 'regular' prop modules used. In this case you need to continue the 'regular' modules first (with the old values of the generator's continuation properties) until they run out, and only then continue the generator. The generator's continuation properties may be identified because they belong to the query module being used as a generator and will begin with a 'g'.

Continuation example

Consider an initial query [1], which as a contrived example uses the categories module as both the generator and as a prop module. This will return a query-continue something like this:

  <query-continue>
    <categories gclcontinue="14588|MediaWiki_API_Overview" clcontinue="6418|MediaWiki_for_site_admins" />
    <links plcontinue="6418|14|Manual/cs" />
  </query-continue>

Note that continuation values are returned for the links module and for the categories module in both of its roles, as the generator and as a prop module. You would add plcontinue=6418|14|Manual/cs (for links) and clcontinue=6418|MediaWiki_for_site_admins (for categories as a prop module) to your query, while ignoring gclcontinue=14588|MediaWiki_API_Overview for now, because it belongs to the generator module and begins with a 'g'.

The followup query [2] would then return something like this:

  <query-continue>
    <categories gclcontinue="14588|MediaWiki_API_Overview" clcontinue="19269|Top_level" />
    <links plcontinue="6418|14|Manual/gl" />
  </query-continue>

Continuing with continuation, you might eventually get a response like this:

  <query-continue>
    <categories gclcontinue="14588|MediaWiki_API_Overview" />
    <links plcontinue="6418|14|Manual/km" />
  </query-continue>

At this point you may remove categories from the prop list since there is no longer any continuation property being returned for it, or you may leave it in there while continuing to use the last value received for clcontinue. Do not retain categories in prop while removing clcontinue, as then you will start receiving all the prop=categories results all over again.

Eventually a query will return only the generator continuation:

  <query-continue>
    <categories gclcontinue="14588|MediaWiki_API_Overview" />
  </query-continue>

Now, finally, you go back to your original query and add gclcontinue=14588|MediaWiki_API_Overview to get the next set of results from the generator, and repeat the process of continuation as necessary until no query-continue is returned at all.
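The procedure walked through above can be sketched as a small function that picks the parameters for the next request. This is only an illustration under stated assumptions: query-continue is taken to be already parsed into a dictionary of module name to properties (as it would appear in JSON output), the names split_continue and next_request are invented here, and the parameters in `original` are hypothetical because the initial query is not reproduced in this text.

```python
def split_continue(query_continue, generator):
    """Separate prop-module continuation values from the generator's."""
    prop_params, gen_params = {}, {}
    for module, values in query_continue.items():
        for name, value in values.items():
            # Rule from the text: generator values belong to the
            # generator module and begin with a 'g'.
            if module == generator and name.startswith("g"):
                gen_params[name] = value
            else:
                prop_params[name] = value
    return prop_params, gen_params


def next_request(original, current, query_continue, generator):
    """Return the parameters of the follow-up request, or None if done."""
    if not query_continue:
        return None
    prop_params, gen_params = split_continue(query_continue, generator)
    if prop_params:
        # Continue the 'regular' modules first. `current` still carries the
        # old generator position and the last values of finished modules.
        return {**current, **prop_params}
    # Only the generator is left: go back to the original query.
    return {**original, **gen_params}


# Hypothetical stand-in for the initial query.
original = {"action": "query", "generator": "categories",
            "prop": "categories|links"}
first = {
    "categories": {"gclcontinue": "14588|MediaWiki_API_Overview",
                   "clcontinue": "6418|MediaWiki_for_site_admins"},
    "links": {"plcontinue": "6418|14|Manual/cs"},
}
print(next_request(original, original, first, "categories"))
```

Keeping the last received clcontinue in `current` corresponds to the second option described above (leaving categories in prop while reusing the last value), which avoids restarting the prop=categories results.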

Getting a list of page IDs

With the indexpageids parameter, the result includes a <pageids> element that lists all page IDs. This is particularly useful for formats like JSON in which the pages array has numeric indexes.

Getting a list of all page IDs

{
        "query": {
                "pageids": [
                        "-2",
                        "-1",
                        "15580374"
                ],
                "pages": {
                        "-2": {
                                "ns": 0,
                                "title": "Fksdlfsdss",
                                "missing": ""
                        },
                        "-1": {
                                "title": "Talk:",
                                "invalid": ""
                        },
                        "15580374": {
                                "pageid": 15580374,
                                "ns": 0,
                                "title": "Main Page"
                        }
                }
        }
}
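One way to use this list is to walk the pages in the order given by pageids instead of relying on the key order of the pages object. A brief sketch using the response above; pages_in_order is a hypothetical helper name. Note that the entries in pageids are strings, matching the keys of pages, while the pageid field inside each page is a number.

```python
import json

SAMPLE = """{
  "query": {
    "pageids": ["-2", "-1", "15580374"],
    "pages": {
      "-2": {"ns": 0, "title": "Fksdlfsdss", "missing": ""},
      "-1": {"title": "Talk:", "invalid": ""},
      "15580374": {"pageid": 15580374, "ns": 0, "title": "Main Page"}
    }
  }
}"""


def pages_in_order(response):
    """Return the page objects in the order listed under pageids."""
    query = response["query"]
    return [query["pages"][page_id] for page_id in query["pageids"]]


ordered = pages_in_order(json.loads(SAMPLE))
print([page["title"] for page in ordered])
```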

Exporting pages

You can export pages through the API with the export parameter. If the export parameter is set, an XML dump of all pages in the <pages> element will be added to the result. The export parameter only gives a result when pages are specified (through a generator, titles, pageids or revids). Note that the XML dump will be wrapped in the requested format; if that format is XML, characters like < and > will be encoded as entities (&lt; and &gt;). If the exportnowrap parameter is also set, only the XML dump (not wrapped in an API result) will be returned.


Exporting the contents of API

<!-- TODO -->

Exporting all templates used in API

<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="16385" ns="10" title="Template:API Intro" />
      <page pageid="6458" ns="10" title="Template:Languages" />
      <page pageid="9631" ns="10" title="Template:Languages/Lang" />
    </pages>
    <export>
      <!-- XML dump here -->
    </export>
  </query>
</api>

See also: Importing pages

Generators

With generators, you can use the output of a list instead of the titles parameter. The output of the list must be a list of pages, whose titles are automatically used instead of the titles, pageids or revids parameter. Other query modules will treat those pages as if they were provided by the user through the titles parameter. Only one generator is allowed. Some prop modules can also be used as a generator.

Parameters passed to a generator must be prefixed with a g. For instance, when using generator=backlinks, use gbltitle instead of bltitle.

It should also be noted that generators only pass page titles to the 'real' query, and do not output any information themselves. Setting parameters like gcmprop will therefore have no effect.
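The prefixing rule is mechanical, so a client can derive generator parameters from the ordinary parameters of a list module. A minimal sketch; the helper name as_generator is an assumption of this example.

```python
def as_generator(module, params):
    """Turn a list module's parameters into generator parameters."""
    result = {"generator": module}
    # Every parameter passed to the generator gets a leading 'g'.
    result.update({"g" + name: value for name, value in params.items()})
    return result


print(as_generator("backlinks", {"bltitle": "Main Page"}))
print(as_generator("allpages", {"aplimit": 4, "apfrom": "T"}))
```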

Using list=allpages as generator

Get links and categories for the first three pages in the main namespace starting with "Ba"

<?xml version="1.0" encoding="utf-8"?>
<api>
  <query-continue>
    <allpages gapfrom="Ba&#039;ad Sneen (Song)" />
  </query-continue>
  <query>
    <pages>
      <page pageid="98178" ns="0" title="Ba">
        <links>
          <pl ns="0" title="BA" />
          <pl ns="4" title="Wikipedia:Redirect" />
          <pl ns="4" title="Wikipedia:Template messages/Redirect pages" />
          <pl ns="10" title="Template:R from alternative name" />
          <pl ns="10" title="Template:R from alternative spelling" />
          <pl ns="14" title="Category:Redirects from other capitalisations" />
        </links>
        <categories>
          <cl ns="14" title="Category:Redirects from other capitalisations" />
          <cl ns="14" title="Category:Unprintworthy redirects" />
        </categories>
      </page>
      <page pageid="14977970" ns="0" title="Ba&#039;">
        <links>
          <pl ns="0" title="Kirkwall Ba game" />
        </links>
      </page>
      <page pageid="10463369" ns="0" title="Ba&#039;Gamnan">
        <links>
          <pl ns="0" title="Characters of Final Fantasy XII" />
        </links>
      </page>
    </pages>
  </query>
</api>

Generators and redirects

Here, we use prop=links as a generator. This query will get all the links from all the pages that are linked from Title. For this example, assume that Title has links to TitleA and TitleB. TitleB is a redirect to TitleC. TitleA links to TitleA1, TitleA2 and TitleA3; TitleC links to TitleC1 and TitleC2. Redirects are resolved because the redirects parameter is set.

The query will execute the following steps:

  1. Resolve redirects for titles in the titles parameter
  2. For all the titles in the titles parameter, get the list of pages they link to
  3. Resolve redirects in that list
  4. Run the prop=links query on that list of titles

Using redirect resolution with generators

<?xml version="1.0" encoding="utf-8"?>
<api>
  <query>
    <pages>
      <page pageid="32" ns="0" title="TitleA">
        <links>
          <pl ns="0" title="TitleA1" />
          <pl ns="0" title="TitleA2" />
          <pl ns="0" title="TitleA3" />
        </links>
      </page>
      <page pageid="54" ns="0" title="TitleC">
        <links>
          <pl ns="0" title="TitleC1" />
          <pl ns="0" title="TitleC2" />
        </links>
      </page>
    </pages>
    <redirects>
      <r from="TitleB" to="TitleC" />
    </redirects>
  </query>
</api>
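The four steps can be mimicked locally to see why the result contains TitleA and TitleC but not TitleB. The sketch below is a toy simulation over in-memory dictionaries that encode the assumptions of this example; it does not call the API, and the names LINKS, REDIRECTS and links_via_generator are invented for the illustration.

```python
# Toy data matching the example: which pages link where, and which redirect.
LINKS = {
    "Title": ["TitleA", "TitleB"],
    "TitleA": ["TitleA1", "TitleA2", "TitleA3"],
    "TitleC": ["TitleC1", "TitleC2"],
}
REDIRECTS = {"TitleB": "TitleC"}


def resolve(title):
    """Follow redirects to the final target (no cycle handling here)."""
    while title in REDIRECTS:
        title = REDIRECTS[title]
    return title


def links_via_generator(titles):
    start = [resolve(t) for t in titles]                        # step 1
    generated = [l for t in start for l in LINKS.get(t, [])]    # step 2
    targets = [resolve(t) for t in generated]                   # step 3
    return {t: LINKS.get(t, []) for t in targets}               # step 4


result = links_via_generator(["Title"])
print(result)
```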

More generator examples

Show info about 4 pages starting at the letter "T"
http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=4&gapfrom=T&prop=info
Show content of first 2 non-redirect pages beginning at "Re"
http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content

Page types

Page type             | Example              | Used in the given page(s) | Which pages have it  | List all in the wiki
Page link             | [[Page]]             | prop=links                | list=backlinks       | list=alllinks
Template transclusion | {{Template}}         | prop=templates            | list=embeddedin      | list=alltransclusions
Categories            | [[category:Cat]]     | prop=categories           | list=categorymembers | list=allcategories
Images                | [[file:image.png]]   | prop=images               | list=imageusage      | list=allimages
Language links        | [[ru:Page]]          | prop=langlinks            | list=langbacklinks   |
Interwiki links       | [[meta:Page]]        | prop=iwlinks              | list=iwbacklinks     |
URLs                  | http://mediawiki.org | prop=extlinks             | list=exturlusage     |

Possible warnings

  • No support for special pages has been implemented
    • Thrown if a title in the Special: or Media: namespace is given
  • Redirect resolution cannot be used together with the revids= parameter. Any redirects the revids= point to have not been resolved.
    • Note that this can also be caused by a generator that generates revids