Commons:Batch uploading
Commons Batch Uploading is a project to centralize the uploading of collections of files whose owners have released them as PD or under any Commons-compatible license. Each request is assigned to a bot operator, who decides how the request will be fulfilled. (To upload batches from Flickr, please make requests at Commons:Flickr batch uploading.)
See w:Wikipedia:Public domain image resources for potential future batch uploads.
Create your Upload request:
Add your Upload request under one of the following sections:
Scripters
- Dcoetzee (talk · contribs)
- Multichill (talk · contribs)
- Duesentrieb (talk · contribs)
- TheDJ (talk · contribs)
- Aude (talk · contribs) - including batch audio & video uploads
- Jarekt (talk · contribs)
- Slick (talk · contribs) - no audio/video
Tools
- See Commons:Upload tools. The Python Wikipedia Bot framework supports image uploads and is particularly versatile.
- User:Dcoetzee is working on a tool for placing images in articles. See Commons:Placing images#Tools.
- Upload Script by Erik Möller
- Flickrripper allows batch uploading from a set, group or a user id on flickr.
- Panoramiopicker allows batch uploading from a user id on Panoramio.
- We need tools to facilitate rapid, accurate categorization of many images at once.
- Commonist
Scripts, Examples and Information
- The scripts I am using on jobs: here and here
- A bash script to extract the VIRINs from (U.S. military) pictures on Commons; can be very useful for finding duplicates before upload
- Details about 'Zoomify' images and how to get them (in German)
--Unimog404 (talk) 11:36, 16 December 2012 (UTC)
New requests
Defence Imagery
- Source to upload from: http://www.defenceimagery.mod.uk/
- Did you observe a URL pattern: Yes, of a sort. Every file has an ID, but it's extremely long, and the process is even more complicated because only some of the files on the site are OGL-licenced - the rest are copyrighted. This is because the OGL licence is 'opt-in' for the MoD.
- Do you know whether the site has an API: Not as far as I am aware, no. It may have one available for members of the press, but not projects like ours...
- What else can ease uploading (is the site valid XHTML, WCM they use…)? I don't understand the question, I'm sorry. If it helps, the images have extremely detailed metadata.
- Did you contact the site owner? No, but I can do the legwork if it would make it easier.
- Describe the works to be uploaded in detail (audio files, images by …): All of the images at http://www.defenceimagery.mod.uk/fotoweb/Grid.fwx?archiveId=5042 - Archive 5042 is, I believe, the archive of OGL-licenced images. There are a variety of authors, but they are all high-quality JPG images. Unfortunately, the archive also updates every few days with new pictures as they're uploaded. Ideally, a bot would need to scrape this site maybe once a week.
- Which license tag(s) should be applied? The {{OGL}} licence.
- Is there a template that could be used on the file description pages? Do you think a special template should be created? I would be happy to create a special template.
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Rijksmuseum
The renovated en:Rijksmuseum in Amsterdam has made its digital collection of 111,000+ objects available under a CC-0 license (https://www.rijksmuseum.nl/en/api/terms-and-conditions-of-use). An API key is needed for downloading (https://www.rijksmuseum.nl/en/api). According to the museum:
- "All object descriptions available via this API are covered by a Creative Commons 0 licence. The images are in the Public Domain, according to which the data and the images are free of rights and may be copied, changed, distributed or exported without the Rijksmuseum’s permission."
Sandstein (talk) 20:40, 7 April 2013 (UTC)
- Describe the works to be uploaded in detail (audio files, images by …): Presumably the entire collection is of use. According to https://www.rijksmuseum.nl/en/api/instructions-for-use: "The Rijksmuseum API Collection is a set of more than 110,000 descriptions of objects (metadata) and digital images from the Rijksmuseum collection. The works of art and implements in the set date from ancient times through to the late 19th century and provide an excellent overview of the richness, diversity and beauty of the Dutch and international heritage. Unfortunately, copyright restrictions mean that we are not yet able to include any works from the 20th or 21st centuries. The set includes paintings and prints (ranging from the great masters of the Golden Age through to anonymous biblical paintings and other painted objects from the Middle Ages), 19th-century photographs, ceramics, furniture, silverware, doll’s houses, miniatures, etc. Digital photographs were taken of all of the objects in this set."
- Which license tag(s) should be applied? Template:Cc-zero
- Is there a template that could be used on the file description pages? Do you think a special template should be created? Museum:Rijksmuseum. Also, the following should probably be taken into account even though we are not an app: "In all apps to be built in which images belonging to the Rijksmuseum are used, app designers will credit these as having been built with the API of the Rijksmuseum, including images and documentation. The credit must be placed where it can be seen easily by users. App-builders will credit all images with the words ‘Rijksmuseum collection’."
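As a rough illustration of the licence tag and credit line requested above, here is a minimal sketch that turns one metadata record into description-page wikitext. The record keys (`title`, `maker`, `date`, `object_url`) are hypothetical and are not taken from the Rijksmuseum API; the sketch uses the generic {{Information}} template rather than any museum-specific one.

```python
def build_description(record):
    """Return wikitext for one file description page.

    `record` is a hypothetical dict; the real API field names may differ.
    """
    lines = [
        "{{Information",
        "|description={{en|1=%s}}" % record["title"],
        "|date=%s" % record["date"],
        # The museum's terms ask for the words 'Rijksmuseum collection' as credit.
        "|source=%s (Rijksmuseum collection)" % record["object_url"],
        "|author=%s" % record["maker"],
        "}}",
        "",
        "{{Cc-zero}}",
    ]
    return "\n".join(lines)


# Made-up record, for illustration only.
example = build_description({
    "title": "Example title",
    "maker": "Example maker",
    "date": "1642",
    "object_url": "https://example.org/object/1",
})
```

A real upload would probably swap {{Information}} for {{Artwork}} or a dedicated ingestion template once the field mapping is agreed.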
Opinions
- I'm quite aware of this awesome collection. Haven't uploaded it yet because we're planning to use it as a pilot for the Commons:GLAMToolset project. Not sure when this will happen exactly, probably in the next few months. Multichill (talk) 10:42, 13 April 2013 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Los Angeles County Museum of Art
- Source to upload from:
- Describe the works to be uploaded in detail (audio files, images by …):
« The Los Angeles County Museum of Art has released some 20,000 PD images of their collection ([1], example: [2]). » Jean-Fred (talk) 14:00, 13 March 2013
- Which license tag(s) should be applied?
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
- {{LACMA online}}
- {{PD-LACMA}}
- {{From LACMA}} − Needs to be rewritten.
Opinions
- Unless someone wants to pick this up early, I would be happy to look at it in a few weeks, it seems right up my alley. --Fæ (talk) 17:00, 16 March 2013 (UTC)
- How will it be done? Not all the works are PD. I think LACMA could create an XML file for us. Dominikmatus (talk) 06:20, 20 March 2013 (UTC)
- An easy test seems to be whether the image is marked as "Image not zoomable due to copyright restrictions", has a copyright note (looking like <div class="field-name-field-copyright-text">© John Baldessari</div>[3]) or whether it has a download link. The text does not quite match the Terms of Use[4] and for that reason I would email LACMA just to explain what Wikimedia Commons is and which photographs are going to be uploaded. I doubt that LACMA could offer much more than is already in the online gallery, as curator and conservation notes and so forth are likely to have unclear copyright. (BTW some images have extensive curator notes available.[5] I would check whether the full text is intended to be reusable, off-limits as "Protected Content" as defined in the Terms of Use, or whether a limited extract might be okay, such as the first 50 words as I have done with other batch uploads.) --Fæ (talk) 07:52, 20 March 2013 (UTC)
- An easy filter in LACMA's website search is "has_unrestricted_image", so this might either be better than the above checks or be run in addition to them. See this example search http://collections.lacma.org/search/site/?f[0]=bm_field_has_image%3Atrue&f[1]=im_field_chronology%3A14337&f[]=bm_field_has_unrestricted_image%3Atrue for unrestricted images of ancient artefacts (1,816 images); in practice the upload might usefully be staged by chronology as by default this starts with the least possibly contentious in terms of copyright. --Fæ (talk) 09:09, 21 March 2013 (UTC)
- Question: I am close to finishing a nice mapping using BeautifulSoup to generate image description pages, but I have a problem with the way LACMA appear to have "updated" their website. We have a prior upload of an 18th C. waistcoat at a very high resolution of 4,000x6,000+ px. The original source is at [6] but I cannot see a way of getting from the current catalogue entry [7] to that old version. The new system shows 5 images; the first duplicates the old upload but is half the resolution (when the expanded button is selected), whilst the other 4 are good detail shots that appear clipped from the high resolution one we already have. Unfortunately id=159291 is the only relevant old reference and there is no mention of that number anywhere on the new catalogue entry, or in the ids for its images. --Fæ (talk) 22:37, 21 March 2013 (UTC)
- I think we should email LACMA about this problem. It is not good (for SEO) to change URLs without redirection. Dominikmatus (talk) 09:42, 22 March 2013 (UTC)
- Yes, I was coming to the conclusion that should be my next step. It might not be solvable technically, so if LACMA cannot, or don't have the time, to help, then the solution might be to go ahead with the batch upload even if a few files will be scaled down (but still high quality) duplicates of some high resolution photos we already have. I'll do my best not to be left in that position and I'll start drafting up an email - no hurry as I don't expect a same day answer from the museum on a Friday.
--Fæ (talk) 10:29, 22 March 2013 (UTC)
- I have written today to the web contact at LACMA and asked about how to use the d/b id to track down the large resolution image and whether it is okay to scrape the text from the catalogue entry (such as curator notes). --Fæ (talk) 10:35, 26 March 2013 (UTC)
- I did a bunch of those high-res downloads (by hand). I'll be interested to see what LACMA says. - PKM (talk) 01:12, 6 April 2013 (UTC)
- No response yet. I might get on with an initial batch for testing as soon as the mobile upload problem is resolved, rather than expecting a reply. --Fæ (talk) 01:56, 6 April 2013 (UTC)
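The copyright checks discussed in this thread could be sketched roughly as follows. The two marker strings are quoted from the comments above; everything else (the function names, and treating a page as unrestricted when neither marker is present) is an assumption for illustration and not the code actually used for the upload. The download-link check is left out because its markup is not given here.

```python
RESTRICTED_NOTE = "Image not zoomable due to copyright restrictions"
COPYRIGHT_FIELD = "field-name-field-copyright-text"


def looks_unrestricted(page_html):
    """Return True when neither copyright marker appears in the page HTML."""
    if RESTRICTED_NOTE in page_html:
        return False
    if COPYRIGHT_FIELD in page_html:
        return False
    return True


def first_words(text, limit=50):
    """Return at most `limit` words of `text`, marking any truncation."""
    words = text.split()
    extract = " ".join(words[:limit])
    if len(words) > limit:
        extract += " ..."
    return extract
```

A check like this would only pre-filter candidates; running it in addition to the site's "has_unrestricted_image" search filter, as suggested above, gives two independent signals.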
Job | Assigned to | Progress | Links |
---|---|---|---|
Code and initial batch (some ancient artefacts) | Fæ (talk) | Status: in progress | |
Resolve multiple view artefacts | Fæ (talk) | Status: In progress | |
Inform LACMA | Fæ (talk) | Status: Done | |
Create digestion template | Fæ (talk) | Status: pending | |
Complete upload | Fæ (talk) | Status: pending | |
Promote to community | Fæ (talk) | Status: pending |
Fonds Ancely
This upload is part of a partnership between Wikimédia France and the Library of Toulouse. It consists of 2085 public domain files. You may see general notes and work in progress on User:Jean-Frédéric/Ancely.
The metadata is held in an OAI-PMH repository. The code explores it and retrieves records; then, where applicable, the various fields are matched against a manual, community-curated alignment of Commons categories and tags. This is then fed to a data ingestion template which translates the metadata to {{Artwork}}. The actual upload is made with Pywikipedia-rewrite by User:AncelyBot.
In its current state, the categorisation system with the alignment outputs 31,801 categories (1,694 distinct) − the drawback is that many are high-level categories (“Shawls”, “men”, etc.)
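The alignment step described above might look something like the sketch below. It assumes the repository serves plain Dublin Core (`oai_dc`) records and that subjects are aligned by exact string match; the sample record and the `ALIGNMENT` table are invented for illustration and are not taken from the project's GitHub code.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical alignment table: source subject -> Commons categories.
ALIGNMENT = {
    "Châle": ["Shawls"],
}

# Invented record in oai_dc form, for illustration only.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Marchande de beurre aux Eaux-Bonnes</dc:title>
  <dc:subject>Châle</dc:subject>
  <dc:subject>Beurre</dc:subject>
</oai_dc:dc>"""


def subjects(record_xml):
    """Return the text of every dc:subject element in one record."""
    root = ET.fromstring(record_xml)
    return [el.text.strip() for el in root.iter(DC + "subject") if el.text]


def categorise(record_xml, alignment=ALIGNMENT):
    """Split a record's subjects into aligned categories and unmatched terms."""
    matched, unmatched = [], []
    for subject in subjects(record_xml):
        if subject in alignment:
            matched.extend(alignment[subject])
        else:
            unmatched.append(subject)
    return sorted(set(matched)), unmatched


categories, missing = categorise(SAMPLE)
```

Keeping the unmatched terms separate gives the community a worklist for extending the alignment.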
- Ingestion template: User:Jean-Frédéric/Ancely/Ingestion
- Source code: GitHub
- Test file: File:Pyrénées - Jasque Esterlet, guide aux Eaux-Bonnes - Marchande de beurre aux Eaux-Bonnes - Fonds Ancely - B315556101 A PINGRET 014.jpeg
Looking forward to your thoughts, Jean-Fred (talk) 22:49, 6 March 2013 (UTC)
Opinions
- Uploaded five more − see Special:ListFiles/AncelyBot Jean-Fred (talk) 01:14, 16 March 2013 (UTC)
- Uploaded fifteen more − and I will continue uploading files until my demands are met! Jean-Fred (talk) 00:23, 19 March 2013 (UTC)
Support: everything looks fine to me (maybe a bit overcategorised). --PierreSelim (talk) 14:24, 20 March 2013 (UTC)
- Ok, uploading 100 right now. Jean-Fred (talk) 21:06, 11 April 2013 (UTC)
- Looks very good. The only thing that worries me a bit is the number of categories per image. That might become a problem. Please upload more! Multichill (talk) 10:39, 13 April 2013 (UTC)
Oppose now, we have forgotten to finish the Creator mapping User:Jean-Frédéric/Ancely/Creator --PierreSelim (talk) 12:02, 25 April 2013 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
User:Jean-Frédéric | | User:AncelyBot | |
South African churches
User af:Gebruiker:Morne has uploaded hundreds of perfect images of buildings in South Africa (mostly churches) to the Afrikaans Wikipedia, all under the same licence "you are free to use, copy, modify, if you properly credit the author" (see an example). I consider it important, as there are unfortunately relatively few images of South African cities, towns and villages in Wikipedia. --Dmitri Lytov (talk) 03:21, 3 March 2013 (UTC)
- Describe the works to be uploaded in detail (audio files, images by …):
It's a collection of several hundred images of churches in South Africa.
- Which license tag(s) should be applied?
"you are free to use, copy, modify, if you properly credit the author" (see an example).
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
Sorry, no idea.
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
LSH
- Livrustkammaren och Skoklosters slott med Stiftelsen Hallwylska museet (COM:LSH):
- Each image carries a unique identifier which may be linked to a URL (although these aren't live yet)
- No API
- They are donating the images and together with these the associated metadata. They've also done some preliminary matching between keywords and Commons categories as well as between artist/events/depicted people and (Swedish) Wikipedia pages.
- It's a collaboration so yes!
- Describe the works to be uploaded in detail (audio files, images by …):
This is a collection of approx. 20,000 high-resolution photographs in TIFF format of the objects held in the collections of these three museums. The files are all less than 500MB but may be larger than 100MB; resolution is less than 25 megapixels (relevant with respect to Commons:Maximum file size).
- Which license tag(s) should be applied?
All of the depicted objects are owned by the museum and PD-old. The photographs themselves are either old enough to be PD-Sweden-photo or released as either CC0 or CC-BY-SA (don't know which version yet). {{LSH license}} has been prepared for this purpose.
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
{{LSH_artwork}}
Opinions
Having looked around a bit, it looks as though chunked uploads might not be integrated into the pywikipediabot framework. Does anyone know anything more about this, or whether there is a practical workaround? I can ask them to downsample the images, but that seems a waste when they've offered us high-res. /André Costa (WMSE) (talk) 09:32, 7 February 2013 (UTC)
- Seems it is not integrated indeed :-/ source. Jean-Fred (talk) 12:23, 7 February 2013 (UTC)
- Yes I saw that one. Since it was close to a year ago though I was hoping that the situation had changed since then and that I had somehow missed the follow-up e-mail. /André Costa (WMSE) (talk) 08:17, 8 February 2013 (UTC)
If you want to start at the low level, you can construct your XMLHTTP requests yourself. A sample of how a chunked upload works, and how the XHR should look, is at mw:API:Upload. On Windows, I used Fiddler2 to check that everything worked as it should. If you like, I can supply my VB(A) classes, but I guess you are on Linux. -- Rillke(q?) 18:18, 22 March 2013 (UTC)
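For anyone building the requests by hand, the chunking itself can be sketched without any framework. The field names below (`action=upload`, `stash`, `filesize`, `offset`, `filekey`) are the ones documented at mw:API:Upload; the helper names and the chunk size are assumptions for illustration, and the sketch deliberately stops short of sending anything over the network.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield (offset, chunk) pairs read sequentially from a binary stream."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield offset, chunk
        offset += len(chunk)


def chunk_fields(filename, filesize, offset, filekey=None):
    """Form fields to send with one chunk (the bytes go in the 'chunk' part)."""
    fields = {
        "action": "upload",
        "format": "json",
        "stash": "1",
        "filename": filename,
        "filesize": str(filesize),
        "offset": str(offset),
    }
    if filekey is not None:
        # Returned by the previous response; it ties the chunks together.
        fields["filekey"] = filekey
    return fields


# In-memory stand-in for a large TIFF, split into 4-byte chunks.
demo = io.BytesIO(b"abcdefghij")
plan = [(offset, len(chunk)) for offset, chunk in iter_chunks(demo, 4)]
```

Each request also needs an edit (CSRF) token, and after the last chunk a final `action=upload` call with the `filekey` and `filename` (and no `stash`) publishes the file. Reading from the stream chunk by chunk avoids holding a 100MB+ file in memory.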
Assigned to | Progress | Bot name | Category |
---|---|---|---|
André Costa (WMSE) | | LSHuploadBot | |
National Gallery of Art
Jean-Fred (talk) 14:19, 18 January 2013 (UTC)
- Source to upload from: National Gallery of Art online database, per their open access policy
- Did you observe a URL pattern
- Do you know whether the site has an API
- What else can ease uploading (is the site valid XHTML, WCM they use…)?
- Did you contact the site owner?
- See here, they welcome the idea
- Describe the works to be uploaded in detail (audio files, images by …):
Artwork digitisations
- Which license tag(s) should be applied?
Existing uploads seem to rely on {{PD-author|National Gallery of Art}} or {{PD-art|PD-old-100}}.
I guess a custom wrapper for {{Licensed-PD-Art|PD-old-whatever|{{PD-author|National Gallery of Art}}}} would do the trick (in the spirit of {{Walters Art Museum license/2D}})
- Went ahead and created {{PD-Art-National Gallery of Art}}. Jean-Fred (talk) 14:47, 18 January 2013 (UTC)
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
- {{NGA Images}} (on that note, {{NGA online}} should be merged with it)
- Category:Files from the National Gallery of Art
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
US Army Research Laboratory ENIAC
- Source to upload from:
- URL pattern http://ftp.arl.army.mil/ftp/historic-computers/png/, http://ftp.arl.army.mil/ftp/historic-computers/jpeg/
- No idea whether the site has an API
- Site owner states (at http://ftp.arl.mil/ftp/historic-computers/) 'All photos marked "U. S. Army Photo" are in the public domain, and may be used without fee, provided that each use is marked "U. S. Army Photo". All diagrams marked "U. S. Army Diagram" are in the public domain, and may be used without fee, provided that each use is marked "U. S. Army Diagram".'
- Describe the works to be uploaded in detail (audio files, images by …):
- Images (PNGs are high-res; there are also lo-res GIFs and JPGs). Those which are photos should ideally be converted to JPG.
- I only count around 20 such images, so please state if that's too few for a batch upload to be considered.
- There are a few duplicates within Category:ENIAC, but I gather that the batch proposal equivalents are generally of better quality.
- Which license tag(s) should be applied?
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
- Not that I know of... possibly something about ENIAC
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
11k Aerial Photos
As part of the German Wikimedia aerial photo project (de:Wikipedia:Projekt Fotoflüge), I wrote an article for a pilots' magazine. After that I got in contact with a pilot who wants to share his collection of aerial photos, which he has created over the past 24 years. It seems that all photos are already geo-referenced and classified (by type, such as solar power plant or church, as well as by region, such as Europe, Andalucia, Sanlucar). Both the classification and the geo-reference are stored in the EXIF data of the images. During a manual upload the geo-reference was recognised correctly by Commons. Because of the large number of pictures, it would be good to automate the upload and, if possible, to match the classification of the pictures to Commons categories. I have no idea whether or how this is possible, and it would be great to get some information or help with this request. The classification is sometimes in German and does not match the Commons categories. The pilot has already created a Wikipedia/Commons user account and uploaded one example file where you can see how the data is stored in the EXIF data.
- Source to upload from:
The files are on a computer of the pilot / photographer.
- Did you observe a URL pattern
- Do you know whether the site has an API
- What else can ease uploading (is the site valid XHTML, WCM they use…)?
- Did you contact the site owner?
Not the site owner but the photographer User:Graf-flugplatz
- Describe the works to be uploaded in detail (audio files, images by …):
About 11,000 digital aerial photos should be uploaded.
- Which license tag(s) should be applied?
Has to be clarified with Author, but expect "CC BY-SA 3.0" like the example.
Update 18.12.2012: License "CC BY-SA 3.0" is approved by Author User:Graf-flugplatz.
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
Opinions
Nice sample images. I'm from Germany too and would like to help, but not before the end of April 2013 because I am away and busy. If that is OK, just wait ... --Slick (talk) 17:26, 9 January 2013 (UTC)
OK, how can I get the images to upload? I would like to have them here, so I can check the tags they have and try to find the best categories for them. Possible solutions are that I download them all from a source, or that you send them to me on CD/DVD (I am from Germany too). You can contact me (in German please) here about this. Additionally, I suggest that the pilot (or you) add some minimal content to his user page for others who are interested in the source/creator (i.e. the same information as in this request). --Slick (talk) 08:51, 6 February 2013 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Slick | Waiting for user response... | | |
Garden of the Victory in Chelyabinsk
- Source to upload from:
User Ain92 asked me to upload some photos with Panoramio Picker, but I have never done it and found it too complicated to learn in the near future. So I ask for all photos from this page and the 2nd to 9th photos from this page (they are cc-by) to be uploaded to category:Garden of the Victory in Chelyabinsk. Анастасия Львоваru (ru-n, en-2) 07:03, 11 December 2012 (UTC)
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Land Air Sea Warfare
Icons of available units of the game/wikipedia page Land Air Sea Warfare (LASW). Upload to wikipedia falls under the minimalist criteria. In addition, I have received permission of the author of the game.
- I do not understand what to upload. Please refer to a valid source. If there is an OTRS ticket about the permissions, please add it. --Slick (talk) 15:40, 26 October 2012 (UTC)
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
AELG
- Describe the works to be uploaded in detail (audio files, images by …):
We would like to upload free images from the AELG website, because it has galleries of Galician writers. Some photos in the authors' galleries, by the photographers Eduardo Castro Bal and Santos-Díez, are under a CC-BY-SA license.
There is an index of authors here, and this is an example of the gallery of a writer. The individual photos have a URL like this.
- Which license tag(s) should be applied?
The images are CC-BY-SA, some from Eduardo Castro photographer and others from Santos-Díez photographer.
- Is there a template that could be used on the file description pages? Do you think a special template should be created?
There is a template to use with the photos: {{AELG}}. There is a category too. Bye, --Elisardojm (talk) 00:14, 28 September 2012 (UTC)
Opinions
Comment: Could somebody review whether this work can be carried out, or whether more information is needed? Thanks, --Elisardojm (talk) 22:14, 9 November 2012 (UTC)
- Looks good and no more information is needed yet. But it can usually take some time to carry out. Just waiting ... --Slick (talk) 16:25, 9 December 2012 (UTC)
- OK, if somebody needs more details or is going to attempt this task, I would appreciate being notified on my talk page. Thanks!, --Elisardojm (talk) 09:54, 11 December 2012 (UTC)
- I'll do the upload tmrw. Smallman12q (talk) 03:19, 22 January 2013 (UTC)
- If you need more information or details about this task, you can ask me. Thanks!, --Elisardojm (talk) 14:16, 22 January 2013 (UTC)
Done. I've completed the upload: ~800 files uploaded. Some, such as File:Valentín_Arias_(AELG)-1.jpg, aren't thumbnailing, but they work fine in Firefox and show metadata, so it's a bug on the wiki side. Cheers. Smallman12q (talk) 21:22, 22 January 2013 (UTC)
- The image rendering bug is being looked into at w:Wikipedia:Village_pump_(technical)#Images_not_rendering. Smallman12q (talk) 23:49, 23 January 2013 (UTC)
- Image rendering bug was resolved. Fixed issue with spacing in direct links brought up at User_talk:Smallman12q#AELG_photo.27s_upload.
Source:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Python 2 script; Site2 and p are imported from local modules.
    from Site2 import Site2
    from p import p
    import sys

    print "Encoding is: " + sys.getdefaultencoding()
    print "UTF8 check: ☠"

    # Log in to Commons and fetch an edit token.
    commons = Site2("https://commons.wikimedia.org/w/api.php")
    commons.login("smallbot",p.bP)
    commons.settoken('edit')

    # Collect the page text of every file in each subcategory
    # (namespace 14 = Category, namespace 6 = File).
    files = {}
    subcats = commons.getcategorymembers(u'Category:Images from AELG', 14)
    for cat in subcats:
        print cat.encode('utf-8', 'ignore')
        catmembers = commons.getcategorymembertexts(cat, 6)
        for member in catmembers:
            files.setdefault(member, catmembers[member])
    print 'Done loading cat pages'

    # Save only the pages whose text changes after the replacement.
    for file in files:
        oldfiletext=files[file]
        newfiletext=oldfiletext.replace(u'Direct link',u' Direct link')
        if (oldfiletext != newfiletext):
            print 'fixing' + file.encode('utf-8', 'ignore')
            commons.edittext(file,newfiletext,'[[Commons:Batch uploading/AELG]]: Fixing direct link (Add space).')
    print 'done'
Smallman12q (talk) 04:39, 26 January 2013 (UTC)
Per User_talk:Smallman12q#AELG_photo.27s_upload, added author link.
Smallman12q (talk) 03:08, 7 February 2013 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
User:Smallman12q | Done | User:Smallbot | Category:Images from AELG |
Gerald R. Ford Presidential Library and Museum
The Ford Presidential Lib/Museum is a federal archives, part of NARA. We'd like to create a partnership with Wikimedia:Commons and get all of our digitized material up. All materials are in the public domain. Agency management is on board, and we have a team already working on this! I've been uploading materials one-by-one, I've gotten about 170 images uploaded - see Commons:Gerald R. Ford Presidential Library and Museum - I figure it should take me til oh, 2215 to get everything up! We're looking for an administrator to work with and develop a plan. Bdcousineau (talk) 18:50, 5 September 2012 (UTC)
- See Commons:Gerald R. Ford Presidential Library and Museum for current progress.Smallman12q (talk) 23:26, 17 September 2012 (UTC)
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Rudolf Steiner Gesamtausgabe
The following page offers all works of Rudolf Steiner's complete edition (public domain) as scans in citable editions. A transfer to Wikimedia Commons was discussed and requested here.
http://bdn-steiner.ru/modules.php?name=Ga
I am downloading the files and preparing them for upload. Which is the correct licence template in this case? I guess PD-old. Is that enough, or is a second one needed? --Slick (talk) 21:14, 11 August 2012 (UTC)
Downloads finished. --Slick (talk) 08:47, 13 August 2012 (UTC)
A discussion in German about the licence can be found here. It looks like there is a problem with scans from sources newer than 1923. --Slick (talk) 13:35, 15 August 2012 (UTC)
I am withdrawing my support for this batch job and removing all local work already done, because help/support was missing although requested more than once. Reverting the job to the request list. --Slick (talk) 20:30, 23 August 2012 (UTC)
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Detroit Publishing Company at LoC
"This collection of photographs from the Detroit Publishing Company Collection includes over 25,000 glass negatives and transparencies as well as about 300 color photolithograph prints, mostly of the eastern United States. The collection includes the work of a number of photographers, one of whom was the well known photographer William Henry Jackson. A small group within the larger collection includes about 900 Mammoth Plate Photographs taken by William Henry Jackson along several railroad lines in the United States and Mexico in the 1880s and 1890s. The group also includes views of California, Wyoming and the Canadian Rockies." Subject index; geographical index. cmadler (talk) 17:17, 20 March 2012 (UTC)
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Cesare Brizio
Photographer Cesare Brizio has agreed to donate 1300+ images here. Images may be taken from the web page OR originals can be sent to anyone on a DVD if required. He also suggested some sound files - but they are in the wrong format (mp3).
Data from OTRS ticket 2012021810002796 follows (permission obtained to copy this OTRS message here)
++++++++++++++
Dear Ron Jones: yes, I confirm that I am actually glad to release all the images located via the "View Media" link at http://tolweb.org/onlinecontributors/app?service=external/ImageContributorDetailPage&sp=1810 as "Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)". Furthermore, I can provide upon request higher resolution versions (1024x768 or more) of almost all the same images.
By the way, I would gladly release as "Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)" all the audio samples (recordings of animal sounds) available at the web pages listed here:
- http://cebrizio.xoom.it/cebrizio/MapAudio.htm
- http://cebrizio.xoom.it/cebrizio/BIOAC_ORTH/OrthAudioSamples.htm
- http://cebrizio.xoom.it/cebrizio/BIOAC_HEMI/HomoAudioSamples.htm
- http://cebrizio.xoom.it/cebrizio/BIOAC_HEMI/HeteAudioSamples.htm
- http://cebrizio.xoom.it/cebrizio/BIOAC_AMPH/AmphAudioSamples.htm
best regards,
Cesare Brizio
+++++++++++++
Opinions
Support: Sounds good, especially since we don't have pictures for all biological articles. --Sanandros (talk) 14:49, 21 August 2012 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Works of Maurice Ravel
All files from http://imslp.org/wiki/Category:Ravel,_Maurice can be uploaded to Commons (57 files).
Maurice Ravel's works are in the public domain in France since a decision by the Cour de cassation in 2007 (French Supreme Court). See Wikipedia articles for details. There are about 35 published before 1923, for which there is no URAA issue. Yann (talk) 12:12, 15 September 2012 (UTC)
- Category:Compositions by Maurice Ravel
- License {{PD-old}}
Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
- Waiting for the backlog on this page to clear may take longer than manually uploading the 57 images using Special:UploadWizard. Bennylin (yes?) 12:49, 26 February 2012 (UTC)
- If you had just had a look at the page, or had at least a bit of music knowledge...; but today I am generous and will not respond with other unhelpful comments. I just ask myself how you could become a steward with such hasty comments. If you want to help, you could take upload requests or analyze them carefully. Or are you even paid by WMF to advertise UpWiz?
- Please make some suggestions on how to get good descriptions from the page. (Including a custom template, categories, ...)
- Page structure:
- Wiki (MW with API)
- Scores themselves are uncategorized
- Files are linked from a template, matching the patterns File Name \d{1,3}=(.+) and |File Description \d{1,3}=(.+)
-- RE rillke questions? 16:25, 28 March 2012 (UTC)
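The two patterns quoted above could be applied to a page's wikitext roughly as follows. This is only a sketch: the sample wikitext and file names are made up, the optional leading pipe and the extra capture group for the number are assumptions, and the real IMSLP template layout should be checked before relying on it.

```python
import re

# Patterns adapted from the ones quoted above; the leading "|" is treated as
# optional, and the number gets its own group so names and descriptions can
# be paired (both are assumptions, not part of the original patterns).
NAME_RE = re.compile(r"^\|?[ \t]*File Name (\d{1,3})[ \t]*=[ \t]*(.+)$", re.MULTILINE)
DESC_RE = re.compile(r"^\|?[ \t]*File Description (\d{1,3})[ \t]*=[ \t]*(.+)$", re.MULTILINE)


def pair_files(wikitext):
    """Return (file name, description) pairs ordered by template number."""
    names = {int(n): v.strip() for n, v in NAME_RE.findall(wikitext)}
    descs = {int(n): v.strip() for n, v in DESC_RE.findall(wikitext)}
    # A file without a matching description gets an empty string.
    return [(names[i], descs.get(i, "")) for i in sorted(names)]


# Hypothetical template fragment, for illustration only.
SAMPLE = """{{#fte:imslpfile
|File Name 1=Example_score.pdf
|File Description 1=Complete score
|File Name 2=Example_parts.pdf
}}"""

for name, description in pair_files(SAMPLE):
    print(name, "->", description or "(no description)")
```

The descriptions collected this way would still need to be mapped by hand onto a Commons information template and categories.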
- {{Not-PD-US-URAA}} is not a valid license template (says that right on it!). Works should be verified as being PD or otherwise free in the US before uploading. Otherwise you're just adding to the Commons:WikiProject Public Domain/URAA review workload. cmadler (talk) 12:17, 27 April 2012 (UTC)
- Pre-1923 works should be tagged with {{PD-1923}} to cover US copyright status. Post-1923 works are probably still copyrighted in the US, and should not be uploaded without investigation into the status. cmadler (talk) 13:35, 17 September 2012 (UTC)
- Is this tag necessary even for non-US works? Yann (talk) 15:29, 17 September 2012 (UTC)
- Yes, because works on Commons must be free in both the country of origin and the US. (Right on {{PD-old}}, it says, "You must also include a United States public domain tag to indicate why this work is in the public domain in the United States.") Alternatively, {{PD-old-70-1923}} is a single template covering both the US and French copyright. cmadler (talk) 12:39, 18 September 2012 (UTC)
Oppose Actually, now that I look at it, I don't think any of his works are in the public domain in France, the country of origin. The Cour de cassation ruling found that the prorogations de guerre (extensions for the two World Wars) were superseded by later copyright laws, but only for non-musical works. Since we're discussing musical works, the prorogations still need to be taken into account. Works published through 1920 get an additional 14 years, 272 days, while works published from 1920 through 1947 (since Ravel died in 1937, this covers all the rest of his works) get an additional 8 years, 120 days. So Ravel's works through 1920 are copyrighted in France until late 2022 (272 days gets you almost to the end of September), while his post-1920 works are copyrighted in France until 2016 (120 days goes to late April). cmadler (talk) 12:49, 18 September 2012 (UTC)
- The Cour de cassation did not mention the type of works to which its ruling applies. Yann (talk) 13:43, 18 September 2012 (UTC)
- If I understand correctly, the 2007 Cour de cassation ruling related primarily to the 1997 law, which had extended the normal duration for non-musical works from 50 years to 70 years (but was not cumulative with the war extensions), and dealt specifically with the works of two painters, Monet and Boldini. But musical works had already been extended to 70 years pma in 1985, by the "Lang" law, and in the 2007 ruling, the court found that this law was cumulative with the war extensions ("la loi du 3 juillet 1985 avait porté à 70 ans la durée de protection normale, de sorte que les bénéficiaires des prorogations de guerre applicables à cette date pouvaient prétendre à une durée de protection excédant 70 ans"), but only in the case of composers who had already "acquired" the right (already died, starting the copyright clock) prior to July 1992. Have I misunderstood an aspect of this? cmadler (talk) 16:34, 18 September 2012 (UTC)
- After its two rulings, the Cour de Cassation summarized the situation in its annual report for 2007. It mentions the particular situation of musical works, in the terms quoted above by cmadler. However, as the 2007 rulings were not about music or Ravel, there are apparently still some arguments about how to interpret and apply the principles, how the term of protection should be computed in the specific case of Ravel and, depending on the result, whether his works are still under copyright in France or in the public domain there. This 2008 article concluded that, at that time, the question was still uncertain but that commentators seemed to lean more toward the theory of the longer term of protection. Anyway, it seems that the SACEM still collects money relating to the author's rights of Ravel's works for the uses of those works "à l'étranger" (outside of France, in some countries where the works are still under copyright).[8] I didn't find anything stating clearly whether they still collected fees from the uses of Ravel's works in France after 2008. If the works are still under copyright in France, and given the sums of money that would represent, it is somewhat surprising that no litigation is found. It may not help clarify the situation that the money collected from the copyright used to be claimed by a mysterious offshore company, although I suppose that does not affect the term of protection. -- Asclepias (talk) 19:33, 19 September 2012 (UTC)
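The date arithmetic in cmadler's Oppose comment above can be reproduced with a short sketch. It assumes that the base term runs through 31 December of the 70th year after the author's death and that the war extension is simply added on top; whether the extensions apply at all is exactly what the thread disputes, so this only checks the numbers.

```python
from datetime import date, timedelta


def extended_expiry(death_year, extra_years, extra_days, base_term=70):
    """Last day of protection under the assumptions stated above."""
    # Base term: through 31 December of the base_term-th year after death.
    base_end = date(death_year + base_term, 12, 31)
    # Add the whole years first, then the remaining days of the extension.
    shifted = base_end.replace(year=base_end.year + extra_years)
    return shifted + timedelta(days=extra_days)


# Ravel died in 1937; the extension lengths are the ones quoted in the comment.
print(extended_expiry(1937, 14, 272))  # works published through 1920
print(extended_expiry(1937, 8, 120))   # works published after 1920
```

Under these assumptions the results fall in late September 2022 and late April 2016, matching the estimates given in the comment.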
[edit] HABS
While working on the English Wikipedia I stumbled upon the Historic American Buildings Survey/Historic American Engineering Record/Historic American collection several times. This is a huge (350,000+) collection of photographs and drawings of historic buildings in the US. The collection is in the public domain, although it contains some exceptions (I haven't been able to find one). The collection has good metadata like title, author, date and location (awesome for categorization). Every item has a high-resolution TIFF file. I'm using User:Multichill/HABS as a layout template right now. I did some tests. Once the template is all tweaked I will substitute it. After that the template will be substituted on upload. The images are high-resolution TIFFs; that's of course very nice, but also problematic because TIFF images are not rendered at the moment. The WMF has plans to change that, so I'd rather not upload JPEGs too. Any opinions on this? Multichill (talk) 12:06, 14 January 2012 (UTC)
Decided I'd upload both jpg and tiffs. Did some more tweaking:
- Every image gets {{HABS-source}} for source and {{PD-USGov}}. This sets Category:Files from the Historic American Buildings Survey & Category:PD US Government
- Jpg and tiff files are linked in the other_versions field
- The bot adds a county category (for example Category:Haines Borough, Alaska). If the category does not exist, User:Multichill/US county category won't be substituted making it easy to track this
- The bot adds a "Historic buildings in <state>" category (for example Category:Historic buildings in Alaska). This might seem a bit redundant to Category:Historic American Buildings Survey by state and it is. I'm going to remodel that category structure. The source (HABS) shouldn't be intersected with the location. See related talk at Template talk:PD-USGov-Interior-HABS#Template needs to be split.
Multichill (talk) 16:36, 22 January 2012 (UTC)
- See also, discussions at: Category talk:Historic buildings in the United States, and Multichill talk.—Look2See1 (talk) 23:29, 1 February 2012 (UTC)
[edit] Restart
So I started this a couple of months ago. Ran into some technical problems and a lot of negative feedback so I decided to waste my time on something else.
- In the meantime the file size limit was raised to 25 megapixels so I no longer need to upload two images. I'm just going to upload one high-res tiff image.
- Categorization is probably going to be category:buildings in <county> with fallback to category:<county>
- Naming of the files is quite rough, need to improve that (too long, too many weird characters)
- Need to use the json and see what kind of useful information is in there that I'm not using (like coordinates)
- I need to have a very conservative copyright check to not upload dozens of unfree files.
- I should probably add a template like {{Maybe US heritage}} that explains that this photo might be on the NRHP or some local registry and asks to replace this template with the right one. If that picks up, it would combine really nicely with all the images we took ourselves (for example in Wiki Loves Monuments)
Multichill (talk) 19:38, 24 October 2012 (UTC)
- Everything is about ready. Waiting on en:Wikipedia talk:WikiProject National Register of Historic Places#HABS upload. Multichill (talk) 21:47, 9 January 2013 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Multichill | Did some first tests | BotMultichillT | None |
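Two of the points from the restart list, the category fallback and the file name cleanup, could look roughly like this. It is a sketch, not the bot's actual code: the "Buildings in <county>" capitalization, the set of allowed characters and the 120-character limit are all assumptions.

```python
import re


def pick_category(county, existing_categories):
    """Prefer 'Buildings in <county>', fall back to '<county>', else None."""
    preferred = f"Buildings in {county}"
    if preferred in existing_categories:
        return preferred
    if county in existing_categories:
        return county
    return None  # nothing suitable exists; leave the file for manual tracking


def clean_title(title, max_len=120):
    """Drop unusual characters, collapse whitespace, cut at a word boundary."""
    title = re.sub(r"[^\w ,.()\-]", " ", title)   # keep letters, digits, basic punctuation
    title = re.sub(r"\s+", " ", title).strip()
    if len(title) > max_len:
        title = title[:max_len].rsplit(" ", 1)[0]  # avoid cutting mid-word
    return title


existing = {"Haines Borough, Alaska"}
print(pick_category("Haines Borough, Alaska", existing))
print(clean_title("Fort  Seward: Bldg #5 / view [north]"))
```

Returning None when neither category exists keeps the missing-category case visible instead of silently creating or guessing a category.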
[edit] Chris's Acorns
I accept that this is a premature request, so please accept my apologies if that's undesirable.
- Chris's Acorns contains a number of images suitable for RISC OS hardware and subcategories of Computer motherboards by manufacturer. A batch upload has been suggested, with a subsequent positive licensing response from the owner, Chris Whytehead.
- The licensing amendment is still to be finalised and currently states: "I have taken all the photographs on this site (with a few exceptions, such as Acorn publicity pictures of Phoebe) and they remain my copyright."
- I will have the opportunity to talk to Chris in person on Sat 29 Oct 2011. It would be helpful if he could be advised what actions will be necessary in order for the batch upload to proceed.
- Some points (and I'm sure that experienced people here will have others) for consideration are:
- All the (approx. 3000) photos arguably have educational value, so should that be the target? If not, under what criteria should decisions be made?
- Some pages (e.g. Acorn Phoebe 2100) contain non-free publicity photographs. How are such photos best tagged for omission?
- In order to collate them as a collection, would it be appropriate for them to be put in Photographs by Chris Whytehead (within Photographs by author)? Or should they go elsewhere, as he's apparently a registered Wikipedian?
- What are the options for proceeding with the upload and what will be required of Chris?
All advice gratefully received. Thanks for reading. --Trevj (talk) 12:06, 4 October 2011 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Smallman12q | Bot Request Filed | Smallbot (talk · contribs) |
Request filed. Smallman12q (talk) 02:20, 17 November 2011 (UTC)
[edit] Maritime photo collection
Category:Frederic Logghe Maritime photo collection includes only part of the collection available at the website listed there. The collection itself didn't seem to have grown recently and Commons might be a good place to maintain it in the long term. --07:09, 28. Sep. 2011 Docu
- Somebody should check the licence before a mass import. I am not sure they are all free. I found a lot of pictures with copyright information, e.g. [9] [10] [11] --Slick (talk) 16:30, 4 August 2012 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Images from Caelum Observatory & The Mount Lemmon SkyCenter
Adam Block from The Mount Lemmon SkyCenter has kindly agreed to release a large number of his images under a CC BY-SA 3.0 license. He has done this specifically so they can be used on wiki projects. A .zip containing all of the released images can be found here. I would like to be able to upload them all into a category called 'Images from Caelum Observatory & The Mount Lemmon SkyCenter' or something in that vein. Many of them will be very useful and have high EV. A link to one of his galleries showing the relevant copyright statements can be found here. As there are 200+ files in the .zip file, uploading them all would be very tedious. I would be very grateful if someone could assist me with this matter. Thanks, Originalwana (talk) 13:16, 10 September 2011 (UTC)
- It looks difficult to upload the files from the zip with a batch job because information is missing, e.g. the description. IMHO it makes more sense to parse the website for images under CC, because the descriptions there are very useful. (Example) --Slick (talk) 11:02, 11 August 2012 (UTC)
- That would be great, but I have no idea how to go about it. Do you know how this could be done? Thanks Originalwana (talk) 10:22, 13 August 2012 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Dokpro
The site [12] has a great public domain collection of Norwegian manuscripts, for example the complete manuscript works of Henrik Ibsen (part of the UNESCO heritage).
The objective is to download all pictures, convert them into DjVu files (one per book) and upload the DjVu files to Commons.
Is it possible? Thanks! --M0tty (talk) 19:36, 3 October 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] UMich
All the images / videos from UMich listed in these two directories [13]. If they could all be added to a single category, I will then incorporate them into Wikipedia. --James Heilman, MD (talk) 23:13, 19 July 2011 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] ECGPedia
The owners of ECGpedia have agreed to allow release of their images under a Creative Commons 3.0 license: http://en.ecgpedia.org/wiki/Main_Page http://www.echopedia.org http://www.pcipedia.org This applies to all images except http://en.ecgpedia.org/wiki/Rhythm_Puzzles, which they are unable to release due to a continuing non-commercial requirement. There are about 2000 images in all. A list can be found here: http://en.ecgpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=6 --James Heilman, MD (talk) 18:51, 13 July 2011 (UTC)
- All images are licensed as "Creative Commons Attribution Noncommercial Share-Alike". Wikimedia Commons does not allow "Noncommercial" licenses, so unless ECGpedia re-licenses their images we are not going to be able to use them. If they re-license, that will need to be marked on the individual images themselves or through OTRS, which will list which images are covered. --Jarekt (talk) 19:36, 13 July 2011 (UTC)
- Yes they have agreed to re-release the images under a license that allows commercial use. So the images will need to be marked as such.--James Heilman, MD (talk) 20:23, 13 July 2011 (UTC)
- Here is the OTRS Ticket#2011102310008874 There are about 3000 ECGs and 700 echo images. --James Heilman, MD (talk) 13:55, 23 November 2011 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Smallman12q | Smallbot |
[edit] ian.umces.edu
http://ian.umces.edu offers 3251 free high resolution images and 2544 free vector symbols licensed under CC BY 3.0. --Leyo 06:59, 21 June 2011 (UTC)
- A worthy set of images. I was able to download 2546 SVG files in a single ZIP file, but matching it with metadata is more challenging. --Jarekt (talk) 03:37, 29 June 2011 (UTC)
- Are the file names in the ZIP file self-explanatory or rather meaningless? --Leyo 09:07, 29 June 2011 (UTC)
- Filenames identify the source and have a few words about the content, see for example here. For SVG files, I think we need to write some scraping software to create a spreadsheet with:
- "Author" and "Author Company"
- Title and description
- "Date created"
- URL (to be used to link back to the source image)
- "Album name" and "Keywords" can be useful for choosing categories
- "Filename" (to match it with the downloaded file)
- I am at the moment rather busy with Commons:Batch uploading/Web Gallery of Art but if someone can gather the metadata I can upload the files. --Jarekt (talk) 15:05, 29 June 2011 (UTC)
- According to our discussion here (in German), these files additionally need to be fixed so that numbers that omit the leading zero (like .12345) include this zero ( ---> 0.12345), else Wikipedia's renderer doesn't parse them correctly. (The substitution can also be of the type -.12345 ---> -0.12345.) This is just in case someone very suddenly rushes in to upload these :) Iridos (talk) 23:24, 4 July 2011 (UTC)
- All of the SVGs in this library were originally created with Illustrator, although most were run through SCOUR, which I now see strips leading zeros. Does anyone know of any other SVG parsers that have a problem without leading zeros? —Preceding unsigned comment added by Adrianbj (talk • contribs)
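The leading-zero substitution described by Iridos above can be sketched with a single regular expression. This is a minimal version that assumes it is applied to attribute values such as path data, not to the whole file; it deliberately leaves alone a dot that directly follows a digit or another dot, so compact forms like "1.5.5" are not handled.

```python
import re

# A "." that starts a number: not preceded by a digit or another dot,
# and followed by a digit. Covers ".12345" and "-.12345".
BARE_DECIMAL = re.compile(r"(?<![0-9.])\.(?=[0-9])")


def add_leading_zeros(value):
    """Rewrite .5 as 0.5 and -.5 as -0.5 inside an SVG attribute value."""
    return BARE_DECIMAL.sub("0.", value)


print(add_leading_zeros(".12345"))       # 0.12345
print(add_leading_zeros("-.12345"))      # -0.12345
print(add_leading_zeros("M.5,-.25 L1.5,2"))
```

Running it over whole files would also touch any text content that happens to contain a dot followed by a digit, which is why the sketch is limited to attribute values.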
- All the SVG files already contain DC metadata. There is also an online spreadsheet and excel version of metadata available.
- Links to searchable database of all images/symbols and custom download builder for all the symbols in SVG, AI and PNG in a zip archive.
- Just read through a translated version of the german discussion. Not sure why that virus didn't rasterize well. The PNG previews and downloadable versions on the IAN website were all created automatically with iMagick and rSVG, although problems like you are seeing did occur with various older versions of iMagick and rSVG. —Preceding unsigned comment added by Adrianbj (talk • contribs)
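Since the SVG files are said to carry DC metadata, the fields wanted for the spreadsheet could be read straight from the files. The sketch below assumes the metadata uses the standard Dublin Core element namespace; the sample SVG is made up, and the actual nesting inside the IAN files (for example RDF wrappers) has not been checked.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core elements namespace


def dc_fields(svg_text):
    """Collect the first non-empty value of each dc:* element in an SVG."""
    root = ET.fromstring(svg_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            text = "".join(element.itertext()).strip()
            if text:
                # setdefault keeps the first occurrence if a field repeats
                fields.setdefault(element.tag[len(prefix):], text)
    return fields


# Hypothetical file content, for illustration only.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:title>Example symbol</dc:title>
    <dc:date>2011-06-21</dc:date>
  </metadata>
</svg>"""

print(dc_fields(SAMPLE_SVG))
```

The resulting dictionary could be written out one row per file and matched against the downloaded file names.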
It seems they changed their licensing terms; the new license doesn't allow redistribution or sales, which makes it unacceptable for Commons. I guess the upload could still happen since CC licenses are irrevocable, but I imagine they wouldn't appreciate it much. The best solution would be for someone to contact them and ask them to change it back to CC BY. InverseHypercube 07:51, 18 February 2012 (UTC)
- Sorry about the licensing change - we do rely on this resource to bring traffic to our website, so we would really appreciate honoring of our new license. Thanks. —Preceding unsigned comment added by Adrianbj (talk • contribs)
- Thanks for commenting! However, if the images are licensed under CC-BY, we would be required to attribute (and link back to) your website, so no traffic loss would occur. In fact, since CC-BY would allow us to transfer images from your website, having your images on high-traffic sites such as Wikipedia would increase hits to your site, since they would all link to it. InverseHypercube 04:37, 8 March 2012 (UTC)
- A custom license tag such as in Category:Custom license tags might be used. It might contain a link to the website and/or a direct link to the respective image (example). --Leyo 09:32, 8 March 2012 (UTC)
- I'd like to make the preview-sized versions of all our images (photos and vector illustrations) available on Commons with a custom license tag (and attribution) and a direct link to the respective image on our site, where users can register (free) and download the full resolution / vector (SVG) versions. Almost all the photos (JPG) have metadata embedded. All the SVG files also have metadata, but the preview PNGs do not because of the metadata issues with the PNG format. As I mentioned above, there is an automated spreadsheet available from our site with all the metadata. We are constantly adding images to the library. Is there any possibility to automatically update Commons if I create a web service (XML/JSON) of all the images and metadata?
- That sounds great, and it can definitely be done. However, as I understand copyright law, by licensing the previews under CC-BY, for example, you would also be licensing the SVG files under the same license, since they do not meet the threshold of originality over the previews. While we might only upload the previews, I don't think you could stop others from distributing the SVG files. InverseHypercube 17:35, 8 March 2012 (UTC)
- I guess I was thinking of a custom license, rather than CC-BY, as suggested by Leyo. Would that work?
- I still think you would be effectively licensing the SVG files under the same license. InverseHypercube 17:49, 8 March 2012 (UTC)
- I understand what you are saying, but if the custom license says that users cannot redistribute or sell and that they must provide attribution even for the preview PNGs, would that work? Maybe this is too problematic for posting to Commons? We are actually also wanting to add the option for users to purchase the right to use our images without attribution, because at the moment, there are many cases when they can't use them due to the attribution requirement. We think this dual licensing model will make them more useful for more people. I'd be curious if anyone has any further suggestions.
- No, that wouldn't be allowed on Commons. See Commons:Licensing#Acceptable licenses; non-commercial licenses are not permitted. However, if you licensed the SVGs under a license that required attribution for redistribution, it would apply to the PNGs too. InverseHypercube 22:14, 9 March 2012 (UTC)
- Looks like I was wrong about not being able to license the preview images and the vector versions under separate licenses; the community consensus seems to be that you can. See Commons:Village_pump/Copyright/Archive/2012/01#CC_BY-SA_3.0_and_the_original_image_quality. InverseHypercube 18:38, 14 March 2012 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Yale
As discussed at Village Pump and announced here Yale released 250k images in its database under {{Cc-by-3.0}} license, see here for details.
We should start looking into moving them here while retaining all available metadata. --Jarekt (talk) 14:43, 3 June 2011 (UTC)
[edit] Opinions
My preliminary evaluation:
- 47343 images of paintings are available in high resolution at the present time. (Go here, fill in no fields, and click "Find.")
- Images are made available as TIFF files, max resolution appears to be 2400 x 3000 px, 8-bit color, often smaller (they're crops of a single photo, but not bad). We should upload original TIFFs as well as JPEG versions, and cross-link them.
- Image downloads via the website are protected by a re-CAPTCHA system. This needs to be either defeated, circumvented, or we need special permission to bypass it.
- Download speed appears to be throttled to about 80 KB/s. At this rate it will take roughly 93 days just to download them all. This is expected and should not be circumvented, since bandwidth hogging costs money and draws ire.
- We will require a special license tag for these, because the situation is not simple. Yale has released their digitizations under CC-BY, which will be important in nations where digitizations may be protected by copyright or by a publisher's right, or in case of a hypothetical reversal of Bridgeman v. Corel. On the other hand, PD-Art indicates that attributing the source is not a legal requirement in the United States or other nations where reproductions carry no copyright, and we should not make reusers think that it is required. We need a special tag that combines these, while referring to the original entry in Yale's collection.
- I don't know if the URL suffix is a stable reference number. We should instead link to a search for the Accession Number, like this.
- Extracting metadata from HTML should be straightforward. Their metadata fields match our {{Artwork}} template rather well.
I can write a tool to get started on this, but have other obligations this week. Other opinions are welcome. Dcoetzee (talk) 07:29, 5 June 2011 (UTC)
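The download estimate in the list above can be checked with a few lines. The average file size is not stated in the evaluation; the sketch derives the value implied by the quoted figures (47343 files, about 80 KB/s, roughly 93 days), taking 1 MB as 1000 KB.

```python
SECONDS_PER_DAY = 86_400


def download_days(n_files, avg_size_kb, rate_kb_per_s):
    """Days needed to fetch n_files of a given average size at a fixed rate."""
    return n_files * avg_size_kb / rate_kb_per_s / SECONDS_PER_DAY


def implied_avg_size_kb(n_files, days, rate_kb_per_s):
    """Average file size implied by a download time, file count and rate."""
    return rate_kb_per_s * SECONDS_PER_DAY * days / n_files


avg_kb = implied_avg_size_kb(47343, 93, 80)
print(f"implied average size: {avg_kb / 1000:.1f} MB per file")
print(f"check: {download_days(47343, avg_kb, 80):.0f} days")
```

So the 93-day figure corresponds to an average of roughly 13.6 MB per TIFF; uploading JPEG versions as well would add to the transfer on the Commons side but not to the download from Yale.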
- We already have good contacts at Yale. Meg Bellinger from Yale gave the keynote speech at GLAMcamp_NYC (see notes and slides). We can ask en:User:Witty lama, who I think interacted with them, to check what would be the way to get the data with the least interruptions. We can also check if and how they would prefer that we link to their system. I can start on the license templates, institution templates, etc. --Jarekt (talk) 20:35, 5 June 2011 (UTC)
[edit] License
I created {{PD-Art-Yale}} for 2D artworks. Please verify & correct/improve. I think we should add an attribution text parameter and possibly put parts of it in an info box with the Yale logo so the credit is not lost in the text.
It is uncertain to me whether the CC license extends to the "digitization" of 3D objects which are otherwise in the PD. --Jarekt (talk) 14:05, 9 June 2011 (UTC)
- Looks good so far. I don't know if this collection includes three-dimensional works, or paintings with three-dimensional frames, but if it does it's worth noting that they must be used under the terms of the CC license in all nations (as the photograph would not be a mere copy). Dcoetzee (talk) 23:27, 9 June 2011 (UTC)
- Yes, if the CC license extends to photography of the 3D objects, then we would need a separate license: Artwork - PD-old, Photography - CC
--Jarekt (talk) 02:09, 10 June 2011 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Geheugen van Nederland
Initial request from Commons:Picture requests/Requests/Europe:
"There is a collection of photographs of historic maps, originating as far as I understand from the "Nederlands Scheepvaartmuseum Amsterdam". I have seen it here [14]. At the moment I am especially interested in [15]. The maps would be interesting for a great number of articles, the latter one for some articles about "Noord-Friesland". Maybe somebody can make it possible to upload the whole collection. Thanks in advance and with best regards --92.230.245.120 03:35, 24 May 2010 (UTC)"
There are several collections that might be of use:
- Atlases from the Maritime Museum
- Atlases
- Romanticism in 70 paintings
- Alexander Ver Huell - Category:Alexander Ver Huell
- Nineteenth century school books
- Nynke van Hichtum - Category:Nynke van Hichtum
- Paintings from the Frans Hals Museum
- Paintings from the Mauritshuis
- Paintings of the Rijksmuseum
- Protestant portraits
- Rotterdam theater playbills (1791-1887)
- Digital Historical Atlas
- Engravings and drawings from the eighteenth century
- The poet Charles Beltjens - Category:Charles Beltjens
- Anna Louisa Geertruida Bosboom-Toussaint - Category:Anna Louisa Geertruida Bosboom-Toussaint
- Vincent van Gogh: letters, art, and context
- Music manuscripts of Alphons Diepenbrock (1862-1921) - Category:Alphons Diepenbrock
- ...
820 files from geheugenvannederland.nl are already available at Commons. -- Common Good 19:10, 29 April 2011 (UTC)
- I am not sure about the licence. I only found this. Are we sure we can import the images? --Slick (talk) 19:50, 13 August 2012 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Africa Centre
Africa Centre is a non-profit organisation in Cape Town that supports arts and culture projects across Africa. Since 2007 they have commissioned thousands of arts and culture images that are related to their projects. The images give an insight into performance art, public art, site-specific art, poetry, visual art, social innovation, architecture, public space, etc. in Africa. They have applied the Creative Commons Attribution-ShareAlike 3.0 license, and have given me permission to upload their files. The photos for each of the Africa Centre projects would be uploaded under the categories Performance Art, Visual Art, Public Art, Poetry, Culture, Arts, and City of origin. Riannedac (talk) 08:54, 15 April 2011 (UTC)
- I guess a lot of these photographs are actually derivative works of modern art. Does the Africa Centre own the copyright to the works? Permission should be arranged with Commons:OTRS.
- For the actual uploading part we're writing Commons:Guide to batch uploading.
- Are you already in touch with Wikimedia South Africa? I'll send them an email about this project. Multichill (talk) 14:30, 16 May 2011 (UTC)
[edit] de.wikipedia.org
Everything in wikipedia:de:Kategorie:Datei:Commonsfähig that does not have wikipedia:de:Vorlage:NoCommons attached. Matt (talk) 12:06, 2 January 2011 (UTC)
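As a starting point, the candidate list described above (files in the category that do not carry the NoCommons template) could be obtained from the MediaWiki API. The sketch below only builds the query URL and filters a response; it makes no network call, does not handle continuation, and the sample response is made up. It narrows down candidates and does not replace a licensing check.

```python
from urllib.parse import urlencode

API = "https://de.wikipedia.org/w/api.php"


def candidate_query(category="Kategorie:Datei:Commonsfähig",
                    blocker="Vorlage:NoCommons", limit=50):
    """URL listing files in the category, reporting only the blocker template."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmtype": "file",
        "gcmlimit": limit,
        "prop": "templates",
        "tltemplates": blocker,  # only report this template if it is present
        "tllimit": "max",
    }
    return API + "?" + urlencode(params)


def transferable(response):
    """Titles of pages in an API response that do not carry the blocker template."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(p["title"] for p in pages.values() if not p.get("templates"))


# Made-up response in the shape returned by format=json, for illustration only.
FAKE_RESPONSE = {"query": {"pages": {
    "1": {"title": "Datei:A.jpg"},
    "2": {"title": "Datei:B.jpg",
          "templates": [{"ns": 10, "title": "Vorlage:NoCommons"}]},
}}}

print(candidate_query())
print(transferable(FAKE_RESPONSE))
```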
[edit] Opinions
Some guys including myself at wikipedia:de:Wikipedia:WikiProjekt Commons-Transfer are currently doing this half-manually using Commons:Tools#Commons Helper, which is a quite old and unmaintained script. As there are >100,000 images ready for transfer, this is going to take too long. It is also a very repetitive task, as CommonsHelper makes the same conversion errors over and over again, and its successor is also not working or actively developed. There is User:Boteas, but it needs an extra template to start working, which complicates everything even more. A more automated solution would be great. Matt (talk) 12:06, 2 January 2011 (UTC)
Oppose I am astonished... why do we have de:WP:NCF? Keep in mind that not all pictures which should have the NoCommons template actually have it attached, e.g. photographs of protected buildings in France. → Automated transferring is not possible. At some step the licensing needs to be checked. Cheers --Saibo (Δ) 16:07, 2 January 2011 (UTC)
Oppose It should be clear that the category Datei:Commonsfähig isn't set manually after review, but only by license templates. For example, we have some pictures by Paul Klee that are tagged with PD-old at de.wp but aren't free in the US and so are not Commons-compatible. This couldn't be sorted out by a bot. At this time there are some local projects for transferring to Commons, so the number of files on de.wp decreases by about 100 per day. Every picture that is transferred without a manual check and turns out not to be ready for Commons has to be undeleted on de.wp and deleted here. There are 459 files transferred to Commons that aren't checked - why work on this first? Additionally, all files in the sub-categories of de:Wikipedia:Dateiüberprüfung have to be excluded. --Quedel (talk) 23:42, 2 January 2011 (UTC)
- Perhaps it should be possible to find some categories to work on, for example PD-old, NASA images etc. If the license is wrong then the file should be deleted on de-wiki anyway. Admins could do the check when they delete the file on de-wiki. If the file is OK, they delete the local file. If not, they nominate the local and Commons files for deletion. Files could also be tagged with a special template on Commons to show that the file was moved without a review. If the template has two links, "OK" and "Not OK", that users can click, it only takes a moment to review. If OK, the template is removed, and if not OK, the file is nominated for deletion. --MGA73 (talk) 16:46, 7 January 2011 (UTC)
- Then we need to disable revobot so that the images do not all end up in their "raw state" (unchecked) directly in the NowCommons category at dewp. There needs to be a detailed plan of which actions need to be done when, who does them, and approximately how long it will take. And of course a calculation of where the savings (if there are any at all) are. Your first request here was a bit rushed, Matt. And: to be honest, I would prefer to discuss this in German. Cheers --Saibo (Δ) 21:48, 8 January 2011 (UTC)
- w:de:Vorlage:Bild-PD-US is maybe a difficult example, as it would be best to use more specific templates here at Commons like Template:PD-USGov-NASA, Template:PD-USGov-CIA, Template:PD-USGov-NOAA... I think this has to be done manually or at least corrected afterwards by hand. We deleted those at de.wikipedia.org and replaced them with the generic template. Matt (talk) 15:25, 9 January 2011 (UTC)
- To MGA73's point: there is a working system for files with missing information, which results in properly licensed files for about half of them. With this new approach those files would be lost, because automatically asking the uploader would no longer be possible. Another problem is: who will do the work? The list of "NowCommons" files not yet deleted on de.wikipedia keeps growing; there aren't enough admins to check them. And there are more than 500 files transferred to Commons and not checked. Additionally, there are a number of files not tagged with FoP or similar, or files that are PD-old in Germany but not on Commons (for example, pictures by Mr. Klee). Such a file would be transferred, get the "NowCommons" template locally, and then have to be deleted on Commons and untagged locally. How many files are in Kategorie:Commonsfähig? Approximately 80,000 files (only half of all files). Testing the category GFDL-Bild shows that only 55% are Commons-ready as currently tagged (for CC-by-sa, the most frequently chosen license, I cannot check; CatScan can't handle that many files). To Saibo: discussing in German would be nice. --Quedel (talk) 23:11, 8 January 2011 (UTC)
- The plan could be like this. We choose a category where we expect perhaps > 95 % of the files to be good. Then some users scan the category for junk and possible copyvios and tag those few files for deletion or with a "don't move". Then a bot moves the rest of the files - perhaps except those without a description. Then admins on de-wiki delete the local file if the transfer is OK. If not, they change the "NowCommons" to a "NoCommons" (or whatever template is used on de-wiki) and mark the file for deletion on Commons. That way we only have to check files once.
- If the bot works correctly there is no need to check that information is transferred correctly. Then we "just" need to check whether the license is valid, and the categories. If license is ok and the only outstanding issue then we could still delete the file on Commons.
- Big categories are not a problem. We just run the queries we need on toolserver to find the files to work on. So just bring up all ideas and things we should know, do or not do. --MGA73 (talk) 18:14, 10 January 2011 (UTC)
I'll start a list (it can surely be condensed more efficiently)
- include
- exclude
- de:Kategorie:NowCommons, or files that transclude the templates de:Vorlage:NowCommons or de:Vorlage:NowCommons/Mängel
- de:Kategorie:Datei:geprüfte Datei
- de:Kategorie:Datei:Kategorisieren nach Zweitprüfung
- de:Kategorie:Datei:NoCommons, or files that transclude the template de:Vorlage:NoCommons
- de:Kategorie:Datei:Zweitprüfung ausstehend
- de:Kategorie:Wikipedia:Dateiüberprüfung (daily categories, current)
- de:Kategorie:Wikipedia:Dateiüberprüfung/Gültige Problemangabe
- de:Kategorie:Wikipedia:Dateiüberprüfung/Informationsmängel
- de:Kategorie:Wikipedia:Dateiüberprüfung/Ohne Problemangabe
- de:Kategorie:Datei:Beschreibung fehlt
- de:Kategorie:Datei:Panoramafreiheit, or files that transclude the template
- de:Vorlage:NoCommons (Benutzerbild)
- all templates whose names begin with "Dateiüberprüfung", including de:Vorlage:DÜP.
- To be considered and sorted into the categories above
- w:de:Vorlage:Kennzeichen verfassungswidriger Organisationen (CommonsHelper does not translate this)
--Quedel (talk) 23:56, 8 January 2011 (UTC)
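The exclusion rules in the list above can be read as a simple set filter. The following is a minimal sketch only, assuming each file is described by the sets of categories and templates it uses; the `FileInfo` type, the `is_transfer_candidate` function and the small subset of rules shown are hypothetical illustrations, not part of any existing bot.

```python
from dataclasses import dataclass, field

# Illustrative subset of the exclusion list; a real run would need the full list.
EXCLUDED_CATEGORIES = {
    "Kategorie:NowCommons",
    "Kategorie:Datei:NoCommons",
}
EXCLUDED_TEMPLATES = {
    "Vorlage:NowCommons",
    "Vorlage:NoCommons",
    "Vorlage:DÜP",
}
# "All templates whose names begin with Dateiüberprüfung"
EXCLUDED_TEMPLATE_PREFIX = "Vorlage:Dateiüberprüfung"


@dataclass
class FileInfo:
    """Hypothetical record for one de.wikipedia file page."""
    name: str
    categories: set = field(default_factory=set)
    templates: set = field(default_factory=set)


def is_transfer_candidate(info):
    """Return True if no exclusion rule matches the file."""
    if info.categories & EXCLUDED_CATEGORIES:
        return False
    if info.templates & EXCLUDED_TEMPLATES:
        return False
    return not any(
        t.startswith(EXCLUDED_TEMPLATE_PREFIX) for t in info.templates
    )
```

A filter like this only narrows the candidate set; as the opposing comments point out, the license itself still needs a human check.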
- I opened a bug report for that at https://jira.toolserver.org/browse/MAGNUS-228
- needs manual tweaking or intelligent bot
- w:de:Vorlage:Bild-PD-US → Template:PD-USGov-$Acronym-of-the-federal-institution
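The "intelligent bot" idea for mapping the generic de.wikipedia template to a specific Commons one could look roughly like this. It is a sketch under assumptions: the function name `suggest_commons_template` is hypothetical, only the three agencies named in this thread are recognised, and anything ambiguous is handed back for manual tweaking.

```python
import re

# Agencies named in the discussion above; extend as needed.
KNOWN_AGENCIES = ("NASA", "CIA", "NOAA")


def suggest_commons_template(source_field):
    """Guess a PD-USGov-<agency> template from a file's source/author text.

    Returns None when zero or several agencies match, so that a human
    can decide instead of the bot guessing.
    """
    found = [
        agency for agency in KNOWN_AGENCIES
        if re.search(rf"\b{agency}\b", source_field)
    ]
    if len(found) == 1:
        return f"PD-USGov-{found[0]}"
    return None
```

Returning None rather than a generic fallback keeps the uncertain cases visible for manual review.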
Info Announcement of a new bot: COM:VP#Easy bulk transfer from English Wikipedia to Commons. --Leyo 16:51, 22 January 2011 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Canada Line
From the English Wikipedia I stumbled upon http://canadalinephotos.blogspot.com/. "Here you will find photography of the Vancouver, B.C. Canada Line, which opened to the public on August 17th, 2009. <...> The photography presented in this blog (780 posts containing 22,000 photos) will be kept online as a historical archive of the construction of the Canada Line." All files are licensed {{cc-by-2.5-ca}} and you have to attribute "Tafyrn & Seamora" and link back to the blog. You can find the actual images in http://www.seataf.com/blogs/canadaline/ . Based on the page it's used on (for example http://canadalinephotos.blogspot.com/2009/04/2009-04-16-waterfront-station.html) you should be able to decide on the title and give it a category (or Category:Canada Line if you can't find anything else).
- Great collection, but all images are 640×480. Are we sure we want to upload such a small resolution? Can anybody with better English than mine contact the author and request higher resolution for Commons? --Slick (talk) 11:13, 11 August 2012 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Codex Gigas
The Swedish National Library has made available the Codex Gigas, a 13th century bible manuscript which is also the largest medieval manuscript in existence, in its entirety. It's available in high resolution through FSI Viewer and in medium resolution as jpegs. The whole file structure is available at the National Library's website here. The jpegs seem quite simple to download, but it would be even more interesting to extract the high-resolution pictures out of the viewer.
As a reproduction of a medieval volume, there are no copyright issues to worry about (except for perhaps the pictures of the highly ornate cover).
Peter Isotalo 13:30, 29 November 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Right Livelihood Award
After some discussion with the Right Livelihood Award Foundation I got a clarification on the usage conditions of the photos provided on their website. The details of the discussion can be found in the OTRS, ticket 2010103110002401.
Basically, pictures (mainly portraits of the laureates) from download.rightlivelihood.org which are marked with a copyright by the Right Livelihood Award Foundation in the respective license files can be used freely with attribution of the photographer and the Foundation, i.e. Template:Attribution. All pictures with other copyrights are in general incompatible with Wikimedia Commons, since the Foundation does not own all the rights and they are "free to use as long as they are used in the context of the Foundation's and its Laureates' work." In this respect, the information on http://rightlivelihood.org/press_room.html is not formulated well.
So I wonder whether some kind of batch upload of these pictures makes any sense, or whether it would be faster to sort and upload them completely manually. --Prolineserver (talk) 22:37, 26 November 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Pictures of Tom Ruen
I request an upload of astronomical images of Tom Ruen from English Wikipedia. As far as I can see, all of them have a free license, so they can be uploaded to Commons. --Emaus (talk) 19:34, 21 October 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Old city maps
Please have a look at this website. It is a digitization project of old maps. It is done by the Hebrew University of Jerusalem and other institutions. They have a sizable database of old maps that are mostly in the public domain. I searched Commons for a sample of their files to find out if they have already been uploaded, and couldn't find any. These are very rare centuries-old maps and they could be invaluable for many Wikimedia projects.
The maps contain copyright watermarks which obviously don't represent the true copyright status. However, if the university can be contacted and asked whether they would collaborate with us and give us access to the un-watermarked maps, and in return we offer a customized tag (as has been done for other mass uploads), it would save us incredible amounts of time and effort removing the watermarks at the Graphic lab, especially since we have enough work on our hands (just look at the Category:Images for cleanup backlog).
I hope you can start this upload soon. Regards, -- Orionist ★ talk 23:41, 5 October 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] IUCN red list
As I mentioned on the talk page, we have established a partnership with the International Union for the Conservation of Nature (IUCN) to produce range maps for many species of animals. See Commons:IUCN red list for a few more details. The GIS manager at IUCN has kindly produced around 6000 maps (in .gif) for all the amphibian species they currently have data on. They have placed the zip file on a password-protected ftp site (I can send someone the file; it is only 60-something megabytes). You can see the samples I have uploaded at Commons talk:IUCN red list#New developments. I also have a .dbf file with information about the source of the spatial data, and I will shortly get a file relating species names to the identification number for the species on the red list website (for example, 56054 is the ID for Acanthixalus sonjae). This can be used to extract the Assessor information required to complete the description file. I know that Polbot's sixth task used information retrieved from the IUCN red list website, so it should be possible to use part of its code to retrieve this information. There should be a few other batches later on. GoEThe (talk) 12:41, 1 October 2010 (UTC)
Update: The IUCN is going to send me a total of 30,000 images to be uploaded. They would like it to be done in time for their next website update, which will be on November 11th. Can anybody help me with this? GoEThe (talk) 16:06, 21 October 2011 (UTC)
[edit] Opinions
Support I support this, as plenty of articles are missing range maps. It is nice that the IUCN would like to support us by releasing range maps to us, even if they are in .gif format. --Clarkcj12 (talk) 22:27, 24 January 2012 (UTC)
- Maybe we could also have a bot convert them to PNG, or is that not possible? --Sanandros (talk) 14:31, 21 August 2012 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
- I think I might be able to do this if I get enough information. Werieth (talk) 01:24, 28 January 2013 (UTC)
- I would be interested in helping this move forward. -- Daniel Mietchen - WiR/OS (talk) 22:04, 8 February 2013 (UTC)
[edit] KROK2009
Please upload photos from "Festival of world animation "KROK2009"". License: CC-BY 3.0.
[edit] Opinions
What kind of festival is that, and who are the people? Right now I don't see how it is in scope. --Sanandros (talk) 14:32, 21 August 2012 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] VOA pronunciation sound files
Voice of America has a great pronunciation guide with sound files for 2200 hard-to-pronounce names, places, etc. The sound files are PD as US govt works. These would be great additions to many Wikipedia articles. The pronunciation guide is here. These would need to be downloaded, converted to OGG, and uploaded. Thoughts? Calliopejen1 (talk) 18:41, 12 September 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
It looks like there are ~6500 entries under "list lookup". The sounds seem to be good. The conversion could easily be done with ffmpeg. The MP3s don't seem to be more than 15 kB each, so the total upload would be around a hundred MB. In addition to the MP3, there is information on name, country, pronunciation, and notes which could be used to categorize them. Provided the entries are indeed PD, they could easily be batch uploaded. For naming, you could use VOA-name. Smallman12q (talk) 23:44, 27 December 2010 (UTC)
- How can you really say they are PD-VOA? Who are the authors? I'd upload them with PD-Threshold of Originality.
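The download-convert-upload idea above could be sketched as follows. This is an illustration only: the helper names are hypothetical, "15kb" is read as 15 kilobytes, and the Vorbis quality setting is an arbitrary choice. The ffmpeg options used (`-i`, `-c:a libvorbis`, `-q:a`) are standard, but an ffmpeg build with libvorbis support is assumed.

```python
import subprocess
from pathlib import Path


def estimated_total_mb(entries=6500, kb_per_file=15):
    """Upper-bound size estimate from the figures quoted in the thread."""
    return entries * kb_per_file / 1000


def target_name(entry_name):
    """File name following the 'VOA-name' scheme suggested above."""
    return f"VOA-{entry_name}.ogg"


def ffmpeg_command(mp3_path, ogg_path):
    """Build the ffmpeg call that re-encodes one MP3 as Ogg Vorbis."""
    return [
        "ffmpeg", "-i", str(mp3_path),
        "-c:a", "libvorbis", "-q:a", "4",
        str(ogg_path),
    ]


def convert(mp3_path, out_dir, entry_name):
    """Run ffmpeg for one file and return the path of the new .ogg."""
    ogg_path = Path(out_dir) / target_name(entry_name)
    subprocess.run(ffmpeg_command(mp3_path, ogg_path), check=True)
    return ogg_path
```

With the quoted figures, `estimated_total_mb()` gives 97.5, which matches the "around a hundred MB" estimate.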
[edit] Population distributions of Japan
I would like to upload images from this category. The images in question are population distributions of various Japanese cities, towns and villages. They are used, for example, in this article or this one. I've uploaded a bunch of samples: 1, 2, 3, the full list. Claymore (talk) 14:50, 6 August 2010 (UTC)
[edit] Opinions
Looks very good! Nothing comes to mind to change here. At jawp however you should add {{NowCommons}} to the images and replace all usage so the admins at jawp can easily delete the files. Multichill (talk) 17:58, 6 August 2010 (UTC)
- They depend on a template system that requires the file names to be "Demography(xxxxx).svg". I'll see if I can convince them to move to the template implementation I created for ruwp. Claymore (talk) 07:07, 7 August 2010 (UTC)
- A template system which is based on the names of files is sooooo broken. Multichill (talk) 09:03, 7 August 2010 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Claymore | | ClaymoreBot | Population distribution of Japan |
[edit] The Tansey Collection of Miniatures
Hi. The Tansey Collection of Miniatures has a large collection of 17th, 18th and 19th century miniature portrait paintings in high resolution. The paintings are definitely within our scope and would be a great addition to Commons. I have therefore uploaded some of them here, but since there are so many and the frames need to be cropped to make them eligible for PD-Art, some help would be appreciated. Cheers --P. S. Burton (talk) 17:25, 29 July 2010 (UTC)
[edit] Opinions
- This sounds like it could end up being a situation similar to that which developed with the UK National Portrait Gallery. Have you, as a courtesy, considered contacting the curators of the collection before doing a systematic process such as this? I also have objections to this based on the cropping, but that is a different issue from batch uploading, so I will raise it back at the Village pump (though any need for cropping will make automation difficult or impossible here, especially for the circular and oval miniatures, which is most of them). Carcharoth (Commons) (talk) 06:29, 31 July 2010 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Piqs.de
We could have a bot upload images from http://www.piqs.de/. It is a site like Flickr, but all images are licensed under http://creativecommons.org/licenses/by/2.0/de/deed.de and are therefore all OK for Commons.
I created a category for the images and a template to use {{Piqs}}. It needs a better picture but the biggest problem is which images we should upload. We could upload ALL images or have users SELECT images. Perhaps we could make a bot like the one we use to upload images from Flickr. Suggestions? Opinions? --MGA73 (talk) 20:39, 24 July 2010 (UTC)
- Nice page. For the initial import, my suggestion is to parse the subpages of the top pictures or here or here or here. If possible, watch for new files there in the future (e.g. via the RSS feeds provided). That way we will get only the best and not all the others. But this only makes sense when it is done at intervals, not just once. Another hint: the bot should log in to get the highest resolution (or maybe there is a workaround to download the original?). --Slick (talk) 14:09, 11 August 2012 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Pearson Scott Foresman SVG files
Users at the Open Clip Art Library have created many SVG versions of line drawing files by Pearson Scott Foresman here. They should be uploaded with the DerivativeFX tool, and the raster version tagged with Template:SupersededSVG. File:Catfish (PSF).svg is one file I have uploaded so far; use it as a basis for formatting new SVG upload filepages. --Siddharth Patil (talk) 15:13, 28 June 2010 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Weather maps
The Hydrometeorological Prediction Center provides daily weather maps of the United States from 2003 to the present. These are high-quality and educational, and are able to be used in galleries, articles and other content pages on several projects. All strictly {{PD-USGov}}. This is just a proposal for now, rather than an actual request, to see what folks think. –Juliancolton | Talk 16:54, 19 March 2010 (UTC)
[edit] Opinions
- I don't bite! :) –Juliancolton | Talk 00:22, 24 May 2010 (UTC)
- If you feel like the images would indeed be useful, then I am happy to upload them to Commons. There are exactly 18490 images to be downloaded, counting from September 1, 2002 until today, October 15, 2012. I'll let you know as soon as I have them on my HDD. odder (talk) 14:20, 15 October 2012 (UTC)
- Looks like the Hydrometeorological Prediction Center doesn't publish its maps in advance, so I only managed to get the maps until October 14, 2012. There are 14,841 .gif files to be uploaded, and they're around 650 MiB in total. odder (talk) 07:50, 19 October 2012 (UTC)
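The count of 18490 quoted above is consistent with five maps per day over the stated date range. The per-day figure is an inference from the numbers, not something stated in this thread; a short check of the arithmetic:

```python
from datetime import date


def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both ends."""
    return (end - start).days + 1


days = days_inclusive(date(2002, 9, 1), date(2012, 10, 15))
maps_per_day = 18490 / days  # inferred, not stated in the thread

print(days, maps_per_day)
```

The smaller number of files actually retrieved (14,841) would then mean that some maps in that range were not available for download.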
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Old Book Art
http://www.oldbookart.com/ This site has tons of old public domain book illustrations. If a bot can upload them, I'll happily categorize them. Rocket000 (talk) 14:31, 21 January 2010 (UTC)
- After reading http://www.oldbookart.com/about/, I think it's best to contact him and see if we can do some sort of partnership. Are you willing to contact him? Multichill (talk) 22:15, 23 May 2010 (UTC)
- The images are either released as CC-by-sa or public domain. I think he specifies that he would like as a courtesy a link back to his website, so I think that would suffice for this upload.--Diaa abdelmoneim (talk) 07:22, 14 September 2010 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] US Coast Guard
Like the other US federal government sites, this site contains lots of nice images. gallery. Multichill (talk) 21:45, 16 January 2010 (UTC)
[edit] Opinions
- I think these would be great to upload. Definitely public domain. As I'm getting into batch uploads, I'm willing to work on these. -Aude (talk | contribs) 21:25, 9 March 2011 (UTC)
Support Like the Navy pics they are also fine for us.--Sanandros (talk) 14:40, 21 August 2012 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] University of Washington Digital Collections
The same algorithm applied to Commons:Batch uploading/Freshwater and Marine Image Bank can be used on multiple UW collections. I'll list some here with the reason why the images would be PD.
- Albert Henry Barnes collection 302 files. It's {{PD-old-70}} since the author died in 1920 according to this
- Alaska Yukon Pacific 1311 files; might contain some of Category:Alaska-Yukon-Pacific Exposition {{PD-US}}
- William F. Boyd All images are before 1923 according to this so {{PD-US}}
- Boyd and Braas photographs all before the 20th century
- Children's books most are PD since they were released before the 20th century.
- John N. Cobb died in 1930 so {{PD-old-70}}.
There are many more that could be checked.--Diaa abdelmoneim (talk) 17:54, 17 October 2009 (UTC)
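The reasoning used in the list above (author died more than 70 years ago, or published before 1923) can be written down as a rough first-pass check. This is a deliberately simplified sketch of the two rules as applied in this post, not a full statement of copyright law; the function name and the `as_of_year` default (the year of the post) are assumptions, and every suggestion would still need human review.

```python
def suggest_pd_template(death_year=None, publication_year=None, as_of_year=2009):
    """Suggest a PD template using the two rules cited in the list above.

    Returns None when neither rule applies, so the collection can be
    set aside for manual checking.
    """
    # Rule 1: author died more than 70 years before the reference year.
    if death_year is not None and as_of_year - death_year > 70:
        return "PD-old-70"
    # Rule 2: published before 1923 (the cut-off used in this post).
    if publication_year is not None and publication_year < 1923:
        return "PD-US"
    return None
```

For example, the Barnes collection (author died 1920) and the Cobb collection (died 1930) both come out as PD-old-70 under these assumptions.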
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] NOAA Photo Library
The FEMA request got me started. NOAA has a nice set of images at http://www.photolib.noaa.gov/ . Not sure how many images we're talking about, but at least a couple of thousand. Multichill (talk) 20:09, 14 October 2009 (UTC)
See the catalog of images.
[edit] Opinions
If possible, go ahead with it since there haven't been any objections. –Juliancolton | Talk 16:55, 19 March 2010 (UTC)
It does sound good. -- User:Docu at 19:42, 2 May 2010 (UTC)
Some or all of these images don't have metadata, including the dates of when they were taken. --O (谈 • висчвын) 20:04, 07 August 2010 (GMT)
Hmm, are they really free? Some of them have an author who does not work for NOAA directly but for a university that takes part in the project. --Sanandros (talk) 14:44, 21 August 2012 (UTC)
Assigned to | Progress | Bot name |
---|---|---|
[edit] Images from Beinecke's collections
One more wonderful collection with lots of PD images - http://beinecke.library.yale.edu/digitallibrary/ 200,000 digitized images of photographs, illuminated manuscripts, maps, works of art, and books from the Beinecke's collections --Butko (talk) 08:50, 16 April 2009 (UTC)
- Did you contact them? Did you get a release? Or is this merely a suggestion. That shouldn't go here imho. Nice collection though, we should contact them to get some nice images. Multichill (talk) 14:05, 7 June 2009 (UTC)
- I would like to help out on the acquisition of images of this library. I wanted to send an e-mail but thought it would be best if we work together on a draft. --Diaa abdelmoneim (talk) 14:59, 7 June 2009 (UTC)
- Ok. As discussed on irc: You'll contact the library. Please keep me posted. Multichill (talk) 15:07, 7 June 2009 (UTC)
- Any update on this one? Multichill (talk) 23:14, 4 September 2009 (UTC)
- I sent them a mail multiple times but they didn't reply....--Diaa abdelmoneim (talk) 23:18, 4 September 2009 (UTC)
- User:JovanCormac seems to have started uploading the Detroit Company images. Maybe the batch should be split into many parts then each uploaded on its own.--Diaa abdelmoneim (talk) 17:06, 16 October 2009 (UTC)
Can this be removed from the list (Commons:Batch uploading)? -- RE rillke questions? 18:27, 4 June 2012 (UTC)
[edit] Images from World Digital Library
New site with PD images - http://www.wdl.org. Contains 1,170 items --Butko (talk) 06:52, 22 April 2009 (UTC)
- User:Sj has shown interest in working on this upload. Looks like a very nice collection. Some points:
- The items have an id (http://www.wdl.org/en/item/100/), so easy to loop over
- The description of the items is available in a lot of languages, we should use that
- Lots of metadata is available, this should make categorization easier
- One item can contain multiple files. We should be aware of that
- Files are available in the tiff file format. We should either have tiff thumbnails or upload tiff and a jpg version (transcoding!)
- Experience and code gained with the usgov uploads should be (re)used
- Some items have curator videos, might be fun to upload too
- Multichill (talk) 14:13, 8 November 2009 (UTC)
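The first two points above (numeric item ids, descriptions in several languages) suggest a simple URL generator. A minimal sketch: only the English URL pattern is taken from the post; that other languages are reached by swapping the `en` path segment is an assumption, and the function names are hypothetical.

```python
def item_url(item_id, lang="en"):
    """URL of one item page, following the pattern quoted above."""
    return f"http://www.wdl.org/{lang}/item/{item_id}/"


def iter_item_urls(first_id, last_id, langs=("en",)):
    """Yield (item_id, lang, url) for every id and language requested."""
    for item_id in range(first_id, last_id + 1):
        for lang in langs:
            yield item_id, lang, item_url(item_id, lang)
```

Since one item can contain multiple files, whatever fetches these pages would have to return a list of files per item rather than a single one.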
Any progress? -- RE rillke questions? 18:29, 4 June 2012 (UTC)
- Thanks for the reminder. They've done a batch of updates recently; I'll see if I can get a dump next week before finding a suitable scraper. --SJ+ 06:52, 21 June 2012 (UTC)
[edit] Maps from Ryhiner Collection
Available from www.stub.unibe.ch/stub/ryhiner/. I've been dealing with this collection for some time (see this file for an example). The collection consists of "over 16000 high resolution images: maps, town plans and topographical views from the 16th to the early 19th century". If this declaration can be taken at face value, there is no copyright problem, because these maps are already in the public domain and, being 2D works, their digital copies are also PD. So if these statements are correct, the whole collection could be uploaded to Commons by a bot. The maps are available in high resolution using Zoomify (see the example map on their site). Tm (talk) 13:20, 22 April 2009 (UTC)
[edit] Opinions
- Looks like a great collection. Is it possible to access the source files? Did you try contacting them? Multichill (talk) 14:03, 7 June 2009 (UTC)
Sorry for the delayed answer. To answer your first question, I don't know if it's possible to have online access to their source files, and I am not very tech-savvy. Also, I didn't try to contact them. In your opinion, what are the next steps to take? Tm (talk) 01:25, 15 June 2009 (UTC)
- I sent an email today asking for their permission to make this batch upload. I thought that asking at this stage whether their source files are available online would be too soon. Tm (talk) 15:10, 2 July 2009 (UTC)
- Sorry about not responding sooner, looks like i forgot to watchlist this page. We're in the non tech phase. Try to contact them, see if they like it. If that turns out alright we can start the actual data retrieval and uploading part. Writing a general story about this is still on my list. I'll see if I can make a first version. Multichill (talk) 16:59, 2 July 2009 (UTC)
Just a quick update: I received an automatic reply about the absence of the person I had emailed, and I forwarded my message to an address given in that reply. When and if I receive an answer I'll update this page. Tm (talk) 00:48, 3 July 2009 (UTC)
I received an answer and have already replied to it, but I am waiting for permission to republish the email or the contents of the answer that I received. Tm (talk) 04:10, 12 July 2009 (UTC)
- You can always use OTRS if you want to keep it private. Multichill (talk) 10:56, 12 July 2009 (UTC)
The question isn't exactly about privacy, but more about building trust between the parties after the NPG case (I fully support Dcoetzee), which these people might have heard about and which might have given them a bad impression of Wikimedia Commons and its users. I can say, without breaking the secrecy of correspondence, that the answer I received was slightly positive about the possibility of cooperation, but the person who answered raised some questions, doubts and remarks about this possible cooperation that need to be addressed (I gave my opinion, but asked that the answer be published so that more people can give their input). Despite this, I received an automatic reply to my second email telling me that I might not receive a second email until 10 August. Tm (talk) 07:39, 19 July 2009 (UTC)
- Any update on this one? Multichill (talk) 23:13, 4 September 2009 (UTC)
Not much. I received an email on 11 August saying that, due to the holidays of the person I had emailed, the answer would be delayed, but I have not received anything since. Tm (talk) 23:43, 4 September 2009 (UTC)
- I sent an email today, as I had only received an email on 15 September telling me that the person I contacted had contacted the library but was still waiting for an answer. In this email I asked whether there is an answer yet. When I receive an answer I'll update this page. Tm (talk) 04:05, 21 November 2009 (UTC)
- I have to report that the library that keeps this collection has, unfortunately, decided to reject the request made some months ago because, according to the person I exchanged emails with, this request "lacks a formal application and there is no treatment needed because the maps are already available online for the public." Tm (talk) 23:34, 14 January 2010 (UTC)
- Ok. Looks like we're going to scrape their site after all. I'll have a look at it. Multichill (talk) 23:51, 14 January 2010 (UTC)
- These images are easily scrapable with a bit of regex and looping. The various galleries are listed here; each gallery has about 40 images of the same subject, probably from different periods. Next to each gallery the name of the place is listed, so the category could simply be something like Category:Scotland maps. We've done uploads from Zoomify before, so the experience is there.--Diaa abdelmoneim (talk) 10:07, 17 October 2009 (UTC)
- I had a look at {{PD-art}}. Seems to work in Switzerland so no NPG issues ;-)
- The plan:
- Loop over the galleries at http://www.zb.unibe.ch/maps/ryhiner/sammlung/?group=volume (does that contain all maps?)
- Loop over all images in a gallery
- For each image pull the metadata. Several sources. Have to see what information is useful
- Pull the image with some dezoomify tool
- Generate filename, description and categories
- Upload to Commons
- What metadata to use exactly is somewhat tricky. Also, the dezoomify step is a bit of extra work. Multichill (talk) 15:43, 15 January 2010 (UTC)
- Multichill, might I volunteer my dezoomify.py script, which will take in a web page holding a zoomify Flash object, regex for the location of the image tiles automatically and download and recompose the highest zoom level available. Have a look at: this page, which has a full code listing. Example of its work can be seen here. I hope it's useful. Inductiveload (talk) 02:58, 19 February 2010 (UTC)
- Sure. Looks nice at first glance, but you should split it up into functions and use objects so it can be used in other programs (like pywikipedia). Probably best to make a lib part and a command-line part (which uses the lib part). What license is your code under? Do you need some help restructuring it? Did you take a look at this script when you wrote your code? Multichill (talk) 09:19, 19 February 2010 (UTC)
Any progress? -- RE rillke questions? 18:25, 4 June 2012 (UTC)
Assigned to | Progress | Bot name |
---|---|---|
[edit] Freshwater and Marine Image Bank
The Freshwater and Marine Image Bank from the Digital Collections at the University of Washington states:
- "Materials in the Freshwater and Marine Image Bank are in the public domain. No copyright permissions are needed. Acknowledgement of the Freshwater and Marine Image Bank as a source for borrowed images is requested."
The entire library can be browsed here: [16]
These photos would be useful in the many marine and freshwater life articles of the Wikipedias. The images are encyclopedic and are very high quality.
The digital collection has been "closed" since June, but the site is still accessible. My guess is the site will shut down within a few days (whenever their webspace subscription ends).
Any way someone could set up a batch for this? Thanks, Bob the Wikipedian (talk) 19:46, 13 July 2009 (UTC)
[edit] Opinions
Um...no responses yet? Perhaps I should revisit the fact this database isn't supposed to be up much longer. Either we take the images now or they might not be there a few months from now. Bob the Wikipedian (talk) 01:23, 28 July 2009 (UTC)
- I would like to echo the great potential utility of the UW image database! In many of the subjects of particular interest to me (e.g. North Pacific marine ecology, marine mammals, Pacific salmon, sturgeon species, indigenous people) the collection is a real goldmine. Somebody, whoever is out here making such magical batch uploads possible, please respond! Best, Eliezg (talk) 21:13, 9 August 2009 (UTC)
- Simply looping from http://content.lib.washington.edu/cdm4/item_viewer.php?CISOROOT=/fishimages&CISOPTR=32550 to http://content.lib.washington.edu/cdm4/item_viewer.php?CISOROOT=/fishimages&CISOPTR=53764 gives you all the required data for the collections. I don't know however how to extract the images themselves.
- Ok I found out a way to do so, using getimage.exe ... just use this code: http://content.lib.washington.edu/cgi-bin/getimage.exe?CISOROOT=/fishimages&CISOPTR=52164&DMSCALE=100&DMWIDTH=MAX&DMHEIGHT=MAX&DMX=0&DMY=0&DMTEXT=&REC=1&DMTHUMB=0&DMROTATE=0 while looping "CISOPTR" number. So the images are from http://content.lib.washington.edu/cgi-bin/getimage.exe?CISOROOT=/fishimages&CISOPTR=32550&DMSCALE=100&DMWIDTH=MAX&DMHEIGHT=MAX&DMX=0&DMY=0&DMTEXT=&REC=1&DMTHUMB=0&DMROTATE=0 to http://content.lib.washington.edu/cgi-bin/getimage.exe?CISOROOT=/fishimages&CISOPTR=53764&DMSCALE=100&DMWIDTH=MAX&DMHEIGHT=MAX&DMX=0&DMY=0&DMTEXT=&REC=1&DMTHUMB=0&DMROTATE=0 with the different metadata grabbable as described above...--Diaa abdelmoneim (talk) 16:45, 17 October 2009 (UTC)
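The looping described above can be sketched roughly as follows. This only builds the two URLs per record and does no downloading; the function names are made up for illustration, and the CISOPTR range is simply the one quoted in the comments above (some numbers in that range may not correspond to an image).

```python
from urllib.parse import urlencode

BASE = "http://content.lib.washington.edu"
FIRST_PTR, LAST_PTR = 32550, 53764  # range quoted in the discussion above


def item_url(ptr):
    """Metadata page (item viewer) for one CISOPTR number."""
    return "%s/cdm4/item_viewer.php?CISOROOT=/fishimages&CISOPTR=%d" % (BASE, ptr)


def image_url(ptr):
    """Full-size image for one CISOPTR number via getimage.exe."""
    params = [
        ("CISOROOT", "/fishimages"), ("CISOPTR", ptr), ("DMSCALE", 100),
        ("DMWIDTH", "MAX"), ("DMHEIGHT", "MAX"), ("DMX", 0), ("DMY", 0),
        ("DMTEXT", ""), ("REC", 1), ("DMTHUMB", 0), ("DMROTATE", 0),
    ]
    # safe="/" keeps the slash in /fishimages unescaped, as in the quoted URL
    return BASE + "/cgi-bin/getimage.exe?" + urlencode(params, safe="/")


def all_pointers():
    """Every CISOPTR number in the quoted range, inclusive."""
    return range(FIRST_PTR, LAST_PTR + 1)
```

A real run would fetch `item_url(ptr)` for the metadata and `image_url(ptr)` for the file, with a pause between requests.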
Assigned to | Progress | Bot name |
---|---|---|
[edit] Zorger
Message below was posted on the Commons:Village pump --Jarekt (talk) 19:23, 18 September 2009 (UTC)
- Looks like a batch upload could be useful here: public-domain.zorger.com. Tekstman (talk) 18:03, 18 September 2009 (UTC)
I browsed the site and they seem to have a few hundred images scanned from old books, with clear sources and their own PD justification. Some of those images might be useful, like those. Someone should match them to our PD licenses. --Jarekt (talk) 19:23, 18 September 2009 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name |
---|---|---|
[edit] Mollusca by Jan Delsing
Photos of shells of Mollusca (143 bivalves, 1469 gastropods) by Jan Delsing from http://www.biolib.cz/en/galleryuser/?uid=3973
The only uploaded example is: http://commons.wikimedia.org/wiki/File:Pythia_scarabaeus_shell.jpg
The best names of files would be: BINOMIAL NAME shell.jpg
Example of filenames:
- File:Pythia scarabaeus shell.jpg
- File:Pythia scarabaeus shell 2.jpg
- File:Pythia scarabaeus shell 3.jpg
- File:Pythia scarabaeus shell 4.jpg
- and so on.
Thanks. --Snek01 (talk) 18:09, 6 October 2009 (UTC)
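The naming scheme requested above could be generated with something like the sketch below. The function name is hypothetical; it only reproduces the pattern from the example filenames (no number for the first file, then " 2", " 3", and so on).

```python
def shell_filenames(binomial, count):
    """Return 'File:<binomial> shell[ N].jpg' names for `count` photos."""
    names = []
    for n in range(1, count + 1):
        # The first photo gets no number; later ones get " 2", " 3", ...
        suffix = "" if n == 1 else " %d" % n
        names.append("File:%s shell%s.jpg" % (binomial, suffix))
    return names
```

A bot would still need to check each name against existing files on Commons before uploading.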
- If this information could help, then EOL has cooperation with biolib.cz and EOL takes public domain images and Creative Commons images from this source automatically. --Snek01 (talk) 10:18, 12 November 2009 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name |
---|---|---|
[edit] Nasa Technical Reports Server (NTRS)
NASA's NTRS contains over 1 million records, including over 40,000 PDFs, tens of thousands of images, and tens of thousands of videos. Most of these items are in the public domain, as they are works of NASA.
For an example of PDFs containing useful schematics, images, and diagrams see...
In addition to pdfs...they also have tens of thousands of images and videos.
Perhaps NASA could be contacted requesting a dump of sorts of their images...but at a million records, many of which are copyright free...there's a trove of free media.
- So does anyone have any opinions?Smallman12q (talk) 23:31, 13 October 2009 (UTC)
- Anyone...Smallman12q (talk) 22:44, 19 October 2009 (UTC)
- So I take it that tens of thousands of high quality diagrams and photographs of spacecraft simply aren't of interest?Smallman12q (talk) 14:30, 31 October 2009 (UTC)
- I couldn't find any videos or images. And it's difficult to extract images from the pdf.--Diaa abdelmoneim (talk) 12:16, 11 December 2009 (UTC)
- I looked at the pdf and now know more about cleaning telescope mirrors than I am likely to need. Most images need rotation at least. What is worse is that they are interspersed with manufacturers copyvios. Rich Farmbrough, 12:06 24 March 2012 (GMT).
- You couldn't find the videos/images=(. Look
- here for virtually all of their files pdfs, etc..
- http://ntrs.nasa.gov/search.jsp?Ne=26&N=265+276&Ns=HarvestDate%7C1&as=false here for their online videos...example
- http://ntrs.nasa.gov/search.jsp?Ne=25&N=269+266&Ns=HarvestDate%7C1&as=false for photos/images...here they have 516504 records example, example2
A lot of the images are related to nasa activities, while others are more general photos and diagrams.Smallman12q (talk) 22:28, 26 January 2010 (UTC)
- http://ntrs.nasa.gov/search.jsp?Ne=26&N=265+276&Ns=HarvestDate%7C1&as=false No results containing your search criteria were found.
Rich Farmbrough, 11:53 24 March 2012 (GMT).
- Same result for me. -- RE rillke questions? 18:22, 4 June 2012 (UTC)
- I think you can assess, whether the files you intend to upload are in scope and whether they are suitable for Commons. Simply start uploading, if you want them here on Commons. -- RE rillke questions? 18:22, 4 June 2012 (UTC)
-
- NTRS has been temporarily suspended(InformationWeek). Smallman12q (talk) 02:01, 26 March 2013 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name |
---|---|---|
[edit] Virtual Manuscript Library of Switzerland
Scans of manuscripts from the Virtual Manuscript Library of Switzerland. At this date, there are 482 manuscripts from 20 different libraries: http://www.e-codices.unifr.ch/en
Usual copyfraud restrictions included... :(
[edit] Opinions
http://www.e-codices.unifr.ch/en/list/all/ is a nice list. If they have some logic on their website, automatic scraping of their site will be noticed. So don't go too fast ;-) Multichill (talk) 21:10, 15 January 2010 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Mineral pictures of Leon Hupperichs on mineralienatlas
Hello, I need help uploading all pictures of Leon Hupperichs on Mineralienatlas. His user page on Mineralienatlas is here (435 pictures on 49 pages).
The picture description page should be the same as in the example File:Ravatite-MA1296598364.jpg. The user category is Category:Files by Leon Hupperichs. Greetings -- Ra'ike T C 12:16, 4 March 2011 (UTC) P.S.: The other approved pictures of Leon Hupperichs on mindat will be uploaded by User:Reinhard Kraasch, because he already downloaded all pictures from mindat when he worked on that old request, and he has already been informed about the new one.
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Dilma Rousseff
All pictures coming from Dilma Rousseff--Euroman3 (talk) 15:42, 6 October 2011 (UTC)
[edit] Opinions
Assigned to | Progress | Bot name | Category |
---|---|---|---|
[edit] Batch uploads in progress
[edit] Rijksdienst voor het Cultureel Erfgoed
- Source to upload from: image bank RCE in Europeana
- Description: 550,000 images of monuments (buildings) in the Netherlands (of which 3,000 are in other countries). 50-80% are probably a Rijksmonument. Around 30% are identified. Another part could be identified based on address information
- license: CC-BY-SA-3.0-NL, see here. 1200x1200px release in OTRS 2012121010014322.
- Templates: There is a template {{RCE-license}} and a template for linking to the database {{RCE-source}}
- More information:
- User:Basvb/Test some thoughts on how the images could be processed after uploading (we need to find their Rijksmonument identifiers).
- Commons:Rijksdienst Cultureel Erfgoed
[edit] Opinions
Question: Did I understand correctly that only images up to 800x800 px are under CC-BY-SA-3.0-NL, so we can only use images up to these dimensions? --Slick (talk) 11:48, 1 December 2012 (UTC)
- We are still figuring that out because it's unclear. It seems that all sizes are free, but if you want to download images over 800x800 pixels there is a problem with downloading costs. The site states that images up to 800x800 are available under a free license and that all images are free (it states both, one after the other). For 800x800 it states that they can be downloaded freely; for other sizes it does not state this. Basvb (talk) 11:01, 2 December 2012 (UTC)
If you would like to download the full size, just look at the HTML code and analyse the requests made by the Flash viewer:
- Step 1) Find the numeric picture ID, e.g. in the URL: http://beeldbank.cultureelerfgoed.nl/20312817 -> ID=20312817
- Step 2) Get the URL: http://beeldbank.cultureelerfgoed.nl/index.php?option=com_memorixbeeld&view=record&format=topviewxml&tstart=0&id=<ID> You will get an XML output [17]. The values of interest are filepath and the layer with scalefactor=1:
... <filepath>39abc504-df68-c0ad-0c2b-b33296769b30.tjp</filepath> ... <layer no="5" starttile="45" cols="9" rows="12" scalefactor="1" width="2075" height="2880"/> ...
- Step 3) Read the layer line. Now you know that the picture is split into 9 columns and 12 rows and that the starttile is 45. Download all tiles from 45 up to, but not including, 45+(cols*rows) and join them together by columns and rows. To get a single tile use: http://images.memorix.nl/rce/getpic?<FILEPATH>&<TILENUMBER>, e.g. http://images.memorix.nl/rce/getpic?39abc504-df68-c0ad-0c2b-b33296769b30.tjp&102
All of this should be very easy to do with a script. To check that the joined image is fine, compare its size with the given width and height.
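As a rough sketch of step 3, the tile numbers and their grid positions could be enumerated like this. The function name is made up, and it assumes the tiles are numbered row by row (left to right, then top to bottom), which the description above does not state explicitly. Fetching the tiles and pasting them together with an image library is left out.

```python
def tile_plan(filepath, starttile, cols, rows):
    """Yield (url, col, row) for every tile of one layer.

    Assumption: tile numbers run row by row, starting at `starttile`.
    """
    for index in range(cols * rows):
        row, col = divmod(index, cols)
        url = "http://images.memorix.nl/rce/getpic?%s&%d" % (
            filepath, starttile + index)
        yield url, col, row
```

For the layer quoted above (`starttile="45" cols="9" rows="12"`) this gives 108 tiles, numbered 45 to 152.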
- We didn't know the Tile/col procedure but we did know easy ways to download 1600x1600px files (just change the links). Permissions are the problem there, we are trying to clear that up but as it seems now we will only get permission to download the 800x800 (or maybe 1200x1200px) files. Basvb (talk) 12:54, 6 December 2012 (UTC)
Some notes from me:
- It's pretty straightforward to query the api. We have priref 20000000 - larger number [18]
- We have json output: http://cultureelerfgoed.adlibsoft.com/harvest/wwwopac.ashx?database=images&search=priref=20310001&output=json
- Fields: made a start at Template:RCE data ingestion layout. Far from complete
Some json used to play around:
{"adlibJSON": {"recordList": {"record": [ {"@attributes": {"priref":"80000109","created":"2011-04-05T19:07:04","modification":"2011-04-05T21:47:18","selected":"False"}, "Description": [ {"description": ["Schildering op de schoorsteenboezem in de Renzumaborg in Uithuizermeeden.\u000d\u000a- begane grond, linker achterkamer: landschap."] } ], "Monument": [ {"monument.complex_number":["515612"], "monument.geographical_keyword":[""], "monument.house_number":["3"], "monument.name":["Rensumaborg"], "monument.number":["21320"], "monument.number.x_coordinates":["6.71402490110"], "monument.number.y_coordinates":["53.41522222070"], "monument.place":["Uithuizermeeden"], "monument.province":["Groningen"], "monument.record_number":["279499"], "monument.street":["Rensumalaan"], "monument.type":[""], "monument.zipcode":["9982 BH"] } ], "object_number":["100109"], "priref":["80000109"], "Reproduction": [ {"reproduction.reference": ["d6071e44-eb0a-4bb0-f345-d0a311489ae6"] } ] } ] }, "diagnostic":{"hits":"1","xmltype":"Grouped","first_item":"1","search":"priref Equals 80000109","sort":"","limit":"1","hits_on_display":"1","response_time":"0","xml_creation_time":"15,6229","link_resolve_time":"15,6229","dbname":"collect","dsname":"","cgistring":"images"}}} {"adlibJSON": {"recordList": {"record": [{"@attributes":{"priref":"20000001","created":"2009-04-19T11:05:45","modification":"2012-10-12T15:25:58","selected":"False"}, "collection":["Fotocollectie"], "Content_subject":[{"content.subject":["Grachtenpand"]}], "creative_commons":[{"value":["RCE","CC-BY-SA","CC-BY-SA"]}], "Description":[{"description":["Exterieur, overzicht voorgevel pand Vrouwenverband"]}], "Monument": [ {"monument.complex_number":["518301"], "monument.geographical_keyword":[""], "monument.house_number":["15"], "monument.name":["Vrouwenverband"], "monument.number":["518303"], "monument.number.x_coordinates":["4.89397111487"], "monument.number.y_coordinates":["52.36897955310"], "monument.place":["Amsterdam"], 
"monument.province":["Noord-Holland"], "monument.record_number":["417272"], "monument.street":["Turfdraagsterpad"], "monument.type":[""], "monument.zipcode":["1012 XT"] } ], "object_number":["321.954"], "priref":["20000001"], "Production": [ {"creator": [ {"value":["Dukker, G.J."]} ], "creator.role":["Fotograaf"] } ], "Production_date":[{"production.date.start":["1998-07"]}], "Reproduction":[{"reproduction.reference":["d99c8594-4a8c-acf9-b498-6f3a0a4e5f4b"]}], "Rights":[{"rights.notes":["http:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/"]}], "Technique":[{"technique":["zwart wit negatief"]}]}]},
Multichill (talk) 23:10, 16 December 2012 (UTC)
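For anyone who wants to play with the API as well, here is a minimal sketch of building the query URL and pulling the record list out of a response. The function names are invented for illustration; the structure (`adlibJSON` / `recordList` / `record`) is taken from the sample JSON above, and returning an empty list when those keys are missing is an assumption about what a no-hit response looks like.

```python
import json
from urllib.parse import urlencode

API = "http://cultureelerfgoed.adlibsoft.com/harvest/wwwopac.ashx"


def query_url(priref):
    """URL for one record, in the same form as the example URL above."""
    params = [("database", "images"),
              ("search", "priref=%d" % priref),
              ("output", "json")]
    # safe="=" keeps the inner "priref=..." readable, as in the example
    return API + "?" + urlencode(params, safe="=")


def records(response_text):
    """Return the list of record dicts from a JSON response body."""
    data = json.loads(response_text)
    # Assumption: missing keys mean "no records" rather than an error
    return data.get("adlibJSON", {}).get("recordList", {}).get("record", [])
```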
- User:Basvb/Current RCE images - List of images from the database not uploaded by the bot (afterwards have to be watched for duplicates.)
[edit] First test running
Created a lot of templates:
- {{RCE data ingestion layout}} does all the hard work
- {{Netherlands location Dutch}} to convert Dutch location names to the category names here: in use for provinces and ca 30 unique cities
- {{Possible Rijksmonument}} this might be a Rijksmonument
- {{Object location RD}} - We got a lot of coordinates, but in a different system.
- {{RCE-author}} - to get pretty creator templates
- {{RCE-subject}} - to convert Dutch topics into categories here
Looping over the images from 20000000. Only uploading images which have Rights_rights.notes==http://creativecommons.org/licenses/by-sa/3.0/ , data gets flattened into key values from json. Some fields occur more than once (see {{RCE data ingestion layout}}). Some of the fields available for normal users are not available in the api (for example municipality). What to do:
- Add more cases to {{Netherlands location Dutch}} based on discovered errors, or refine it to make use of the province information to determine the right location
- Make a good system to convert {{Possible Rijksmonument}} into {{Rijksmonument}} (maybe like {{Check categories}}?)
- Convert {{Object location RD}} to {{Object location}}
- Expand {{RCE-author}} and create/update creator templates
- Expand {{RCE-subject}} based on [19]
- Replace {{RCE-author}} and {{RCE-subject}} when ready (~/bin/rce-to-subst.sh)
- Manage the whole categorization effort
- Work on possibilities to identify images (semi-)automatically based on address or postal code
Multichill (talk) 22:17, 23 December 2012 (UTC)
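The flattening and licence check mentioned above might look roughly like the sketch below. This is not the bot's actual code: the only key name confirmed above is `Rights_rights.notes`, so joining nested names with an underscore is a guess that happens to reproduce it, and both function names are hypothetical. Values are kept as lists because some fields occur more than once.

```python
CC_BY_SA = "http://creativecommons.org/licenses/by-sa/3.0/"


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into {key: [values]}.

    Nested names are joined with "_" (assumed scheme), so
    {"Rights": [{"rights.notes": [...]}]} becomes "Rights_rights.notes".
    """
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            name = prefix + "_" + key if prefix else key
            for k, v in flatten(value, name).items():
                flat.setdefault(k, []).extend(v)
    elif isinstance(node, list):
        # Lists do not add to the key; repeated fields collect in one list
        for item in node:
            for k, v in flatten(item, prefix).items():
                flat.setdefault(k, []).extend(v)
    else:
        flat.setdefault(prefix, []).append(node)
    return flat


def is_free(record):
    """True if the record carries the CC-BY-SA 3.0 rights note."""
    return CC_BY_SA in flatten(record).get("Rights_rights.notes", [])
```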
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Multichill, Basvb | Testing/Uploading | BotMultichill |
[edit] Archives of American Art - Federal Art Project
285 images are being uploaded as part of the Archives of American Art partnership with Wikimedia. All of these images are federal works, as part of the Federal Art Project. Missvain (talk) 04:25, 26 September 2011 (UTC)
Details about this collection are available at: [20]
One test upload is at File:Archives of American Art - Job Goodman - 2126.jpg and we welcome feedback on how the templates and categories are done. (most of the templates modified from the NARA templates) The upload is being done with a pywikipedia custom script, and is small and slow enough that it probably doesn't need a bot flag at this time. If we do more, larger batches, I (User:Aude) can apply for a bot flag. AAA uploader (talk) 04:35, 26 September 2011 (UTC)
[edit] Opinions
- Looks good. Please use {{Size}}, {{Other date}} and {{Technique}}, and wrap the notes in an {{lang|en}} or {{en}}.
- Maybe we could create an external link template to build the source link? URLs have a nasty tendency to change over time.
- We also need to extend {{Original caption}} (did not know about that one, thanks! :-) to provide the language of the caption.
- (I have some Python code lying around for parsing Size and Date that I can send you if you want − I really need to put those on the SVN >_<).
- Jean-Fred (talk) 12:04, 26 September 2011 (UTC)
-
- Thank you for these suggestions! I've updated the code to include the size and other date / isodate templates. I need to see what Sarah says about technique. I'm not sure how to classify "photographic print". I've also wrapped things in the {{en}} template.
- For external links, these are ugly with no simple id parameter that I know of, but rather include some combination of the title + id. Anyway, I've linked the url to the id, like how NARA does it. Not perfect.
- Please let me know if you or anyone has additional feedback. Cheers. Aude (talk) 03:30, 1 October 2011 (UTC)
- I'm proceeding with the uploads now. I've tweaked how the templates are done, labelling the photographer as "Photographer" and not "Artist" since the subject of the photographs are artists and it may cause confusion, and adjusted how the description and sources are done, per feedback from Archives of American Art staff. It's a smallish number of images, so if anything needs to be tweaked post-upload, that's definitely doable. Cheers. Aude (talk) 03:27, 6 October 2011 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Aude | trial | AAA uploader |
[edit] US National Archives
I plan to use a bot to upload images from the US National Archives' digital files. I currently have access to a cache of over 120,000 TIFF master files which are ready for upload. The bot is a custom pywikipediabot script written by Multichill (code) and it relies on slakr's toolserver tool to translate NARA metadata into Commons upload code. It will upload images using the custom {{NARA-image-full}}. Each page will be uploaded with that template filled out with the imported NARA metadata, plus {{Uncategorized-NARA}} to facilitate the categorization of these files. Dominic (talk) 19:15, 20 July 2011 (UTC)
[edit] Opinions
Moved from Commons:Bots/Requests/US National Archives bot I wrote a bot to do the uploads. I added the link to the source. Multichill (talk) 19:54, 19 July 2011 (UTC)
Dates in titles |
---|
|
Comment For photographs, like 3 example uploads, I would suggest to look into a way to add more categories:
- Author category
- Date category
- Subject category
- Medium category (photographs, paintings, handwritten documents, etc.)
- etc.
- For other types of records other category types might be suitable. It is easier to add some of those categories before the upload. --Jarekt (talk) 01:43, 20 July 2011 (UTC)
- I'm not sure how we could do any of these in an automated way. Not all documents have subjects, and the ones that do do not map onto Commons categories anyway. The same is true of the medium and author fields. The dates also seem difficult. Some of the dates are ranges, some are exact days, just months, or just years. Dates can represent dates of creation, copyright, publication, or broadcast. I am hoping we will be able to organize a major community effort for categorizing these, as it will take humans. The one thing that we can do is categorize them hierarchically according to the National Archives catalog structure. For example, each of the Ansel Adams items would go in a category for the "Ansel Adams Photographs of National Parks and Monuments, compiled 1941 - 1942, documenting the period ca. 1933 - 1942" series. Of course, some of the series are less descriptive than others, but it's a start. Dominic (talk) 03:00, 20 July 2011 (UTC)
- I think we should try two approaches. First, add categories based on the NARA catalog structure. We could make them hidden categories and encourage people to move images out of them, but this way we can group similar images together. Second, I still think that we should try to match NARA authors with Commons creators and add appropriate categories. In my WGA upload all images have a Creator template and a matching author category. Maybe a way to accomplish that would be to create a translation table where each NARA author is matched with a creator and a category. Then your bot would read this table and use it to add the proper templates and categories. The table can easily be added to the bot if it is implemented as an external CSV file. We probably do not need to match every NARA author, since some might be quite obscure, but we should at least match all authors that already have a creator template and authors with a large number of records. Dominic, do you think it would be possible to put a list of all authors of the files you are planning to upload, and how many records are associated with them, somewhere? I can try to see how many I can match. --Jarekt (talk) 16:01, 22 July 2011 (UTC)
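To make the CSV idea concrete, here is a small sketch of such a translation table and how a bot might read it. The column names (`nara_author`, `creator`, `category`), the sample row and both function names are invented for illustration; the real table layout would be whatever the bot operator and the matchers agree on.

```python
import csv
import io

# Hypothetical table contents; a real bot would open an external .csv file
SAMPLE = '''nara_author,creator,category
"Adams, Ansel",Ansel Adams,Ansel Adams
'''


def load_author_table(csv_file):
    """Map each NARA author string to a (creator, category) pair."""
    table = {}
    for row in csv.DictReader(csv_file):
        table[row["nara_author"]] = (row["creator"], row["category"])
    return table


def author_wikitext(table, nara_author):
    """Return (author field, category line) for one upload.

    Unmatched authors fall back to the plain NARA string and no category.
    """
    if nara_author not in table:
        return nara_author, ""
    creator, category = table[nara_author]
    return "{{Creator:%s}}" % creator, "[[Category:%s]]" % category


table = load_author_table(io.StringIO(SAMPLE))
```

Keeping the table outside the bot code means people can extend the matches without touching the script.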
- This is what I did. I made {{NARA-Author}} for all of the authors. Every author (or person listed as a "contributor", whether it's a photographer, artist, director, etc.) has an ID and a page in the catalog that links to the records they are associated with. That template creates a URL to these author records in the catalog. I am not sure if that helps or hinders the attempt to make categories for them, but maybe we can use the template in some way to add categories based on those unique IDs? I will note, though, that it's actually uncommon for authors to be listed at all. Most documents are created by uncredited federal workers, and others are grouped into series based on the author, but the author field in the record isn't actually used (cf. this series). The full list of author records could actually be extracted from the dataset, if anyone is brave enough to try. Dominic (talk) 16:18, 22 July 2011 (UTC)
- I did not notice {{NARA-Author}} before. If it is added to all the images that have an author, then we can easily add creator templates and categories later. BTW, I did not see author records in the NARA dataset or its description. --Jarekt (talk) 17:00, 22 July 2011 (UTC)
- I do not know if there are separate XML files for the person authority records, like there are for items. However, if an item has a contributor mentioned in its record, the contributor's ID is also there in a field in the item's data file. This is how I am able to upload the files with that information. Dominic (talk) 19:04, 22 July 2011 (UTC)
Change extensions to ".tif" |
---|
Remove "Item from " |
---|
|
As a test run, I have gone ahead and finished the Ansel Adams batch (220 files). [21] Dominic (talk) 04:45, 20 July 2011 (UTC)
MediaWiki:Stockphoto.js bug |
---|
|
End of move. Multichill (talk) 19:26, 20 July 2011 (UTC) I moved the discussion to here from Commons:Bots/Requests/US National Archives bot. We have two pages:
- Commons:Bots/Requests/US National Archives bot to discuss if the operator is able to run a bot.
- This to discuss the actual batch upload.
Why did I make this split? Because bot requests take ages when we start discussing batch requests, and a request gets closed when we actually want to provide more feedback. Can everyone please respect this? Multichill (talk) 19:26, 20 July 2011 (UTC)
Batch uploading |
---|
Overall this looks pretty good. There just a few things I would tweak a bit: (a) Personally I'd use the format "NARA number - <title>" instead of "<title> - NARA - number".
(b) The following file names could use further normalization:
(c) In file descriptions, such as NARA 512467, the date seems to get repeated, once in the title field and once in the date field. This is possibly due to the way the source presents it and gets parsed. For comparison, check: NARA 530898. (d) It would be helpful if something could be done about the categorization. Currently, e.g. NARA 530898 gets added into three NARA categories, but no topical one. Even just adding Category:Indians of North America (already present in the source) would be an improvement. Hope this helps. -- Docu at 06:14, 23 July 2011 (UTC)
|
More date comments |
---|
|
Comments on categorization |
---|
|
More date comments |
---|
import re

def getDate(description):
    # Return the text of the |Date= field, or "" if there is none.
    dateRe = re.compile(r'^\|Date=(.+)$', re.MULTILINE)
    dateMatch = dateRe.search(description)
    if dateMatch:
        dateText = dateMatch.group(1)
    else:
        dateText = ""
    return dateText.strip()

def fixDescription(description, dateText):
    description = description.replace(u"{{int:license}}", u"{{int:license-header}}")
    titleRe = re.compile(r'^\|Title=(.+)$', re.MULTILINE)
    titleMatch = titleRe.search(description)
    titleText = titleMatch.group(1)
    # If the title ends with the date, cut the date and its 2-character separator.
    if dateText and titleText.endswith(dateText):
        description = description.replace(titleText, titleText[:(-len(dateText)-2)])
    return description

def getTitle(fileId, description, dateText):
    titleRe = re.compile(r'^\|Title=(.+)$', re.MULTILINE)
    titleMatch = titleRe.search(description)
    titleText = titleMatch.group(1)
    titleText = cleanUpTitle(titleText)
    # Only short dates (under 11 characters) are appended to the file name.
    suffix = ""
    if len(dateText)<11 and len(dateText)>0:
        suffix = " ("+dateText+")"
    # Keep title plus suffix within 120 characters.
    if len(titleText+suffix)>120:
        titleText = titleText[0 : 120-len(suffix)]
        # Truncation may leave an unbalanced quote; close it again.
        if titleText.count('"')%2 != 0:
            titleText = titleText[:-3]+'.."'
    title = u'NARA %s: %s.tif' % (fileId, titleText+suffix)
    return title.replace(u" ", u"_")

def cleanUpTitle(title):
    '''
    Clean up the title of a potential mediawiki page. Otherwise the title of
    the page might not be allowed by the software.
    '''
    title = title.strip()
    title = re.sub(u"[<{\\[]", u"(", title)
    title = re.sub(u"[>}\\]]", u")", title)
    title = re.sub(u"[ _]?\\(!\\)", u"", title)
    title = re.sub(u",:[ _]", u", ", title)
    title = re.sub(u"[;:][ _]", u", ", title)
    title = re.sub(u"[\t\n ]+", u" ", title)
    title = re.sub(u"[\r\n ]+", u" ", title)
    title = re.sub(u"[\n]+", u"", title)
    title = re.sub(u"[?!]([.\"]|$)", u"\\1", title)
    title = re.sub(u"[&#%?!]", u"^", title)
    title = re.sub(u"[;]", u",", title)
    title = re.sub(u"[/+\\\\:]", u"-", title)
    title = re.sub(u"--+", u"-", title)
    title = re.sub(u",,+", u",", title)
    title = re.sub(u"[-,^]([.]|$)", u"\\1", title)
    return title

# Call sequence in the upload loop (fileId and getDescription come from the bot):
description = getDescription(fileId)
dateText = getDate(description)
description = fixDescription(description, dateText)
title = getTitle(fileId, description, dateText)
|
i18n |
---|
|
Teofilo's block request |
---|
|
Records in TIFF format |
---|
There are a series of textual records included in the trial upload. Commons is a multimedia database and as such doesn't primarily host text documents. There is a sample on the right. For more see: Category:US National Archives series: Enrollment Cards, compiled 1898 - 1914. What percentage of the 120,000 TIFFs do they represent? Is there a planned use for them on a Wikimedia project? -- Docu at 06:24, 26 July 2011 (UTC) (edited)
|
[edit] ARC number
Another solution for ARC could be: store them all in a separate template page so that series ARC=408
would give "Record group 79: Records of the National Park Service, 1785 - 2006 (ARC identifier: 408)". This would make page description more concise and would also allow to add translations of record group and series names by editing only one page.--Zolo (talk) 01:51, 28 July 2011 (UTC)
- We could even store more data than that in the template so that we would only need to provide the document ARC in the file description. This would not be as efficient, but it would minimize duplicate info and would provide cleaner, potentially reusable data. Additionally, this would make file descriptions even easier by hiding away info that in most cases should not be changed by users. I have created a toy template in {{ARC/sandbox2}}.
{{ARC/sandbox2|306514}}
gives
This media is available in the holdings of the National Archives and Records Administration, cataloged under the ARC Identifier (National Archives Identifier) 306514.
This tag does not indicate the copyright status of the attached work. A normal copyright tag is still required. See Commons:Licensing for more information.
- Record group: Committee Papers, compiled 1806 - 2000 (ARC identifier: 306513)
- Series: 128: Records of Joint Committees of Congress, 1789 - 2004 (ARC identifier: 457)
This means that {{ARC/data}} will need to be quite large. To make it smaller, it could also be used for record groups and series only, and not for individual documents. But it would make it less useful.--Zolo (talk) 03:29, 29 July 2011 (UTC)
- ParserFunctions are a bit beyond me, but could that possibly work with tens of thousands of records? Dominic (talk) 12:54, 29 July 2011 (UTC)
[edit] Template
Batch uploading |
---|
|
I think it would be useful for our users to have on each page a link to the relevant "Scope & Content" page of the photographic series the picture belongs to on the ARC website. These "Scope & Contents" pages contain valuable information on the origins of the pictures. As they are 2 clicks away from Wikimedia Commons, among a number of not-so-useful links, I think most users won't find them if we don't provide a direct link (We might also copy them to wikisource and link to the corresponding wikisource pages. We might copy them here on Commons if we get community approval for using gallery pages for that purpose). So for example, on this file it would be good to have the following: "Series: Signal Corps Photographs of American Military Activity, compiled 1754 - 1954 (Scope & Content)". I think "Scope & Content" is more important, for a first reading, than "Details". The "record ID" and "Source" fields should be merged and called "Source". Teofilo (talk) 23:21, 1 August 2011 (UTC).
- I changed my mind. I feel more like removing all the Record group, Series, NAIL Control Number information. The {{NARA-image}} template with its single arcweb link is enough. The users who want to know more can click on that single link which is an entrance to all the extra information. The "Record ID" field is not useful save the Nara-image template. Teofilo (talk) 08:44, 2 August 2011 (UTC)
- I doubt you'll find anyone agreeing with that point of view. And note that the ARC ID is more just the identifier that refers to the catalog record and allows us to make predictable URLs. The series and record group are actually descriptive metadata assigned by the archives that relate to the document creator and/or subject. Dominic (talk) 04:50, 5 August 2011 (UTC)
- I am afraid you are swapping the parts. Until now hardly any upload from the NARA was made by including those extravagant and noisy data which are not useful to a majority of users. You will find hardly anyone among those who uploaded contents from NARA in the past who agrees with you. For example File:USS Intrepid (CV-11) - Nov 44 a.jpg. That these extra data are not useful is common sense. For example, let's see how the Bundesarchiv pictures are documented. In the case of File:Bundesarchiv Bild 101I-731-0388-38, Frankreich, nach der Invasion, Infanteristen.jpg, all the extra information such as
- Inventory: Bild 101 I - Propagandakompanien der Wehrmacht - Heer und Luftwaffe
- Classification: Sachklassifikation/E {Zweiter Weltkrieg 1939-1945}/Ee {Kriegsschauplätze und Feldzüge}/Ee 300 {Westfeldzug}/Ee 350 / 360 / 370 / 380 {Frankreich*}/Ee 380 {Frankreich nach der Invasion (ab 6.6.1944)}/Ee 381 {Infanterie} Sachklassifikation/E {Zweiter Weltkrieg 1939-1945}/Ed {Truppen- und Formationsgeschichte*}/Ed 100 / 200 {Heer*}/Ed 110 {Infanterie}
- was removed. Removing is the right thing to do. Please note also that the creator template was made collapsible because a lot of people found it too noisy. There is wide support for the idea of keeping description pages streamlined and simple. Teofilo (talk) 09:24, 5 August 2011 (UTC)
- Each page contains 2 links to en:U.S. National Archives and Records Administration. I think this is one too many (or two too many if you count commons:National Archives and Records Administration). Couldn't we just get rid of the "Current location" field altogether? Isn't the {{NARA-image}} template sufficient to mean that the pictures are located there ? Teofilo (talk) 23:02, 1 August 2011 (UTC)
- NARA is a major US government agency with more than two dozen facilities. It's not a location. The location field is the record of where the physical document digitized on Commons is located. That the institution's name is linked more than once is because there are three separate templates used on the pages that are complete; it seems pretty trivial. Dominic (talk) 20:12, 2 August 2011 (UTC)
- Brainwashing the user by repeating three times the same message is an advertising technique amounting to using Wikimedia for a promotional campaign at the expense of usability. It overcrowds the template and makes the other information such as the author, date, or description fields proportionately less visible. The reason why the Artwork template contains both a "location" field and a "source" field is that we are dealing with photographs of paintings and photographs of sculptures. The "location" field is for the location of the painting/sculpture, while the source field is for the source of the photograph. For this reason, NARA uploads of paintings such as File:"Crocodile and Snake Fighting" - NARA - 558928.tif are wrong. The "location" field should be filled with "unknown", or with the name of the museum or of the private owner who owns the painting. Writing "National Archives and Records Administration, Still Picture Records Section, Special Media Archives Services Division (NWCS-S)" in the "current location" field of this painting is a mistake (for example compare with File:Serapis Louvre AO1027 profil.jpg, and count the number of occurrences of the word "Louvre" there). For works that are just photographs, not photographs of paintings or photographs of sculptures, the "location" field should be removed. Teofilo (talk) 09:24, 5 August 2011 (UTC)
- These files are the records of a government agency, and the location field is the listing of the repository in which the records are held. That is not extraneous or unusual information. Your accusations of brainwashing and advertising are getting tiresome. The institution you are talking about is a public agency that holds public records; it is graciously making its high-res scans available to Commons with no strings attached. The "advertising" you are talking about is metadata added and maintained by Wikimedians because it is useful. Nothing of the sort has been demanded or even asked by the institution you are maligning. Dominic (talk) 16:13, 11 August 2011 (UTC)
line spacing bug |
---|
Author line a little cumbersome. This is yet another reason, even if a weak one, to remove the |
[edit] File name maximum length and file name cutting format
The following is copied from Commons:Administrators' noticeboard/Blocks and protections#User:US National Archives bot
I think the bot should be blocked until the file-name issue is solved. See the "File:Combat memorable..." entry in Commons:National Archives and Records Administration/Error reporting or compare this NARA upload (name cut after "Gene") with previously uploaded picture with full name. Look at this list of 50 uploaded files where most of the file names are cut. It is not realistic to correct all these file name errors afterwards one by one, tagging each picture with {{Rename}}. The upload software bug must be solved so that the files are uploaded with the full name, without cut. Cut names not only give users an impression of bad quality, they also create a lot of potential wrong keyword matches in search engines. Someone looking for a "gene" (a biological system) should not find the "Alphonse Juin, Commanding Gene" picture in his search results. Teofilo (talk) 22:23, 30 July 2011 (UTC)
- Er, you want it blocked? I can just turn it off, you know. I'm not exactly sure what the issue is, though. The titles get cut off when they reach the length limit. "The upload software bug must be solved so that the files are uploaded with the full name, without cut" is an impossible solution. This doesn't seem like a huge problem, certainly not one that's more important than getting the content on Commons. Most end users are going to be viewing the images on the projects, so the idea that these titles somehow negatively affect users because they are stylistically displeasing is a little baffling to me. Dominic (talk) 23:29, 30 July 2011 (UTC)
- Oh? How come there is no polite enquiry from Teofilo on either Commons:Batch uploading/US National Archives or User talk:Dominic? Oh wait... Jean-Fred (talk) 23:37, 30 July 2011 (UTC)
- For Jean-Frédéric, here is the Commons:National Archives and Records Administration/Error reporting link again, where the problem was debated between Dominic and me below the "File:Combat memorable..." entry. Teofilo (talk) 11:11, 31 July 2011 (UTC)
- Actually, I posted a fix a couple of days ago for the problem Teofilo mentions. Oddly it hasn't been applied yet. -- Docu at 05:36, 31 July 2011 (UTC)
- Thank you for doing so. I was not aware that you had prepared a fix. Teofilo (talk) 11:11, 31 July 2011 (UTC)
- I thought that that was about the dates appended to the end of titles. I don't see where you mentioned the issue Teofilo is concerned about anywhere on the page. Dominic (talk) 19:10, 31 July 2011 (UTC)
- Who is(are) the person(s) in charge of the upload software ? According to en:Wikipedia:Naming_conventions_(technical_restrictions)#Title_length, "Titles must be less than 256 bytes long when encoded in UTF-8.". Measured with http://bytesizematters.com/ , File:US Navy 050419-N-5313A-049 A U.S. Marine Corps AV-8B Harrier launches from the flight deck of the amphibious assault ship USS Kearsarge (LHD 3) during flight operations in the Mediterranean Sea.jpg is 202 bytes long and File:Combat memorable donne le 22, 7re 1779, entre le Captaine Pearson commandant le Serapis et Paul Jones commandant le Bonh - NARA - 532895.tif is only 145 bytes long. So it looks possible to add 256-145=111 more characters into NARA uploads' file names. The full title "Combat memorable donne le 22, 7re 1779, entre le Captaine Pearson commandant le Serapis et Paul Jones commandant le Bonhomme Richard et son escadre, 07/22/1779" being 159 characters long, it should be OK. With 249 characters, "Pvt. Jonathan Hoag,...of a chemical battalion, is awarded the Croix de Guerre by General Alphonse Juin, Commanding General of the F.E.C., for courage shown in treatingwounded, even though he, himself, was wounded. Pozzuoli area, Italy.", 03/21/1944" is perhaps only one or two characters longer than the 256 limit after adding "File:" and ".tif". Also it could be decided to cut whole words instead of cutting in the middle of the words, and to use (…) at the location where the cut is performed, like I did for this upload of mine. Perhaps it would be best to always keep the date at the end of the title, and to cut the words located before the date. Teofilo (talk) 12:40, 1 August 2011 (UTC)
- I am running a script that was written by Multichill; he's not in charge of the bot's actions, but I am not a programmer, so I can't easily make changes without him. I was not originally aware that the character limit was that high. I had thought that the limit was being imposed by the upload form, not by the bot's script, which is why I was saying it wasn't fixable. I see now that we can allow even longer titles, but I am not sure if we should. This should be discussed at Commons:Batch uploading/US National Archives, as the names already seem rather long and unwieldy to me. Your suggestion to not have it cut off titles mid-word, though, is a good one, I agree. In any case, I don't think this is a dealbreaker. The full titles are all contained in the template's "title" parameter, so we wouldn't have to go back and rename anything manually anyway, since a bot can extend the names using that data. I think it is more important to get the files actually uploaded at this point. Dominic (talk) 14:27, 1 August 2011 (UTC)
End of copy from Commons:Administrators' noticeboard/Blocks and protections#User:US National Archives bot
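The length arithmetic in the copied thread depends on UTF-8 bytes rather than characters, so a quick check is easy to script. The following is a minimal sketch, not part of the bot: `utf8_length` and `bytes_to_spare` are hypothetical helper names, and the limit of 255 bytes is an assumption taken from the "less than 256 bytes" wording quoted above. The sketch measures whatever string it is given; whether the "File:" prefix should be counted is not settled in this discussion.

```python
# Assumption: "Titles must be less than 256 bytes long when encoded in UTF-8",
# as quoted in the thread, read here as a maximum of 255 bytes.
TITLE_BYTE_LIMIT = 255


def utf8_length(name):
    """Return the size of a title in bytes once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def bytes_to_spare(name, limit=TITLE_BYTE_LIMIT):
    """Return how many more bytes fit under the limit (negative if too long)."""
    return limit - utf8_length(name)


if __name__ == "__main__":
    # Non-ASCII characters take more than one byte each in UTF-8.
    for sample in ("Bonhomme Richard", "Jean-Frédéric", "(…)"):
        print(sample, len(sample), "characters,", utf8_length(sample), "bytes")
```

Accented letters and the "(…)" marker suggested above cost more than one byte each, which is why counting characters alone can overestimate the room left in a name.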
- Do you have a deadline after which the files won't be available any longer ? File renaming is an activity which consumes a lot of resources and which is generally frowned upon unless there is a good reason to do so. I am afraid the massive file renaming operation will be refused. When there is a problem in a car factory you stop the production line until the problem is solved. You don't sell the cars first and recall them a year later to change the defective part. The latter is more expensive. I think we need more opinions from people with bot software writing experience and help from people who would be willing to actually modify the script or write the file renaming bot's script. I am going to copy the present talk on Commons:Batch uploading/US National Archives. Teofilo (talk) 17:00, 1 August 2011 (UTC)
- Well, I am only here for a couple more weeks. The files are not available on the Internet, but on hard drives here in the office. So it wouldn't be wrong to say there is a deadline of sorts. I am not sure the analogy to the factory is appropriate, as we're not recalling anything, just changing a name on a wiki. I'm not even sure if this is important enough that we would want to go back and change past uploads, even if we do change the convention going forward. They are not erroneous, just truncated. Dominic (talk) 17:21, 1 August 2011 (UTC)
For those who don't want to read all that text, the question is whether we want to make use of the full 250 characters we are allowed for the file names, which can be quite long, or whether we want to truncate it at a shorter length. The script is currently truncating at 120 characters, which isn't exactly short either, but does cause a lot of titles to get cut off. Dominic (talk) 17:21, 1 August 2011 (UTC)
- I agree the file name issue should be fixed before the next batch of uploads and I think we should keep titles short. Let's concentrate on the issue of how to do it. Dominic, is this still the code you are running? If so, then I assume that the issue is with the "if len(titleText)>120: titleText = titleText[0 : 120]" line. Docu, did you say you posted a fix somewhere? If so, then where? I think we can solve this issue in a timely manner so as not to slow down Dominic too much. --Jarekt (talk) 17:43, 1 August 2011 (UTC)
- Yes, that is the code. It seems easy enough to change, except this is more a question of style than a bug in the code, so I'm not sure what change, if any, to apply. (I think Docu is referring to the date issue, not this one, but I am not sure.) Dominic (talk) 17:59, 1 August 2011 (UTC)
- The date issue appears on the NARA website too. It is not a simple upload bot script problem, although a script could help remove the extra date. I don't think there might be so many files with the date duplicate issue, so I guess it won't be so bad if we leave that issue unsolved. Teofilo (talk) 18:19, 1 August 2011 (UTC)
- I have inquired, and these are actually not errors so much as limitations in the NARA catalog software. That "coverage dates" field, which is used to refer to the dates depicted in the document's subject rather than the document's creation, can only take ranges. When you put in a single day, it still makes it into a range. This isn't something they are going to fix. Dominic (talk) 18:35, 1 August 2011 (UTC)
- A few more ideas:
- 1) Unwieldy ? Of course they are but we are in a situation where we must choose between the less unwieldy of two unwieldy possibilities. The possibility with extra-long names, and the possibility with names cut in an automatic fashion which creates wordings that are at times perfectly meaningless. It should not be forgotten that for a number of users English is a foreign language and it is less obvious when you don't master the language to understand that a sentence was cut and you should not even try to read a meaning. Also we should try as much as possible not to misrepresent the quality of the NARA's work. The NARA's work might have a number of shortcomings, but in any case the NARA does not produce botched file names.
- 2) While the files with a cut name are, in my opinion, a problem, there is no reason to prevent the bot from uploading all the other files with a short name. One possibility would be to quickly modify the bot script so that the files with long names are avoided for the time being, and to upload them later, after we have decided what to do with them.
- 3) One option would be to decide the new shorter names manually, on a case by case basis. We would have a bot write all the long file names in the left column of a table, and then we would request Wikimedians to write the shorter names with (…) in the right column. Then when all shorter names are available, the upload bot would be able to pick up the shortened file names from the table. Teofilo (talk) 17:48, 1 August 2011 (UTC)
- Ensuring that we don't cut off names mid-word will help, as would adding "..." to the end when cut off. Note that even at 250 characters, some titles will be cut off. I am not sure (especially judging by Jarekt's reply) that there is agreement to do that, though. Dominic (talk) 17:59, 1 August 2011 (UTC)
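Ideas 2) and 3) above amount to splitting the batch in two and producing a table for volunteers. A minimal sketch of that workflow follows, assuming each record is an (ARC ID, title) pair; `split_by_title_length`, `write_worklist` and the CSV column names are hypothetical and not taken from the actual bot script. The 120-character threshold is the cut-off mentioned in this discussion.

```python
import csv
import io

# Cut-off the discussion says the script currently applies to titles.
MAX_TITLE_CHARS = 120


def split_by_title_length(records, limit=MAX_TITLE_CHARS):
    """Split (arc_id, title) pairs into those that fit and those to defer."""
    ready, deferred = [], []
    for arc_id, title in records:
        if len(title) <= limit:
            ready.append((arc_id, title))
        else:
            deferred.append((arc_id, title))
    return ready, deferred


def write_worklist(deferred, handle):
    """Write deferred titles as CSV; short_title is left blank for humans."""
    writer = csv.writer(handle)
    writer.writerow(["arc_id", "full_title", "short_title"])
    for arc_id, title in deferred:
        writer.writerow([arc_id, title, ""])


if __name__ == "__main__":
    sample = [("1", "A short title"), ("2", "A very long title " * 10)]
    ready, deferred = split_by_title_length(sample)
    buffer = io.StringIO()
    write_worklist(deferred, buffer)
    print(len(ready), "ready;", len(deferred), "deferred")
    print(buffer.getvalue())
```

The files in `ready` could be uploaded straight away, while the CSV could be posted for editing and read back later once the right-hand column has been filled in.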
- I see 2 possible solutions:
- Automatic: if the filename is longer than 120 characters, then look for periods, semicolons or commas and trim there. If the string is still longer than 120, then trim at the end of a word. Add ... in the last case and maybe also when trimming at a comma.
- Manual: if the filename is longer than 120 characters, then (as Teofilo suggested) skip it for the time being, while writing its ID and title to some log file. Then from time to time read the log file in Excel (or some other spreadsheet) and manually trim the title. Or post the file somewhere, so others can help (Teofilo?). Then alter your bot to allow upload of those specific files with the provided filenames. I should be able to help with this part, if you need help.
- The first solution is much less work. So that would be my preference. --Jarekt (talk) 18:49, 1 August 2011 (UTC)
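The automatic option could look roughly like the sketch below. It is one possible reading of the proposal, not the code the bot runs: `shorten_title` is a hypothetical function, and cutting at the last period, semicolon or comma inside the limit is an assumption about what "trim there" means.

```python
MAX_LEN = 120  # current cut-off mentioned in the discussion


def shorten_title(title, max_len=MAX_LEN, ellipsis="..."):
    """Shorten a title at punctuation if possible, else at a word boundary."""
    if len(title) <= max_len:
        return title

    # Step 1: prefer the last period, semicolon or comma within the limit.
    head = title[:max_len]
    cut = max(head.rfind(mark) for mark in ".;,")
    if cut > 0:
        return head[:cut].rstrip()

    # Step 2: otherwise cut after the last complete word and mark the cut.
    room = max_len - len(ellipsis)
    head = title[:room + 1]
    cut = head.rfind(" ")
    if cut <= 0:
        # No space at all: fall back to a hard cut.
        return title[:room] + ellipsis
    return head[:cut].rstrip() + ellipsis


if __name__ == "__main__":
    print(shorten_title("alpha beta, " + "word " * 40))
    print(shorten_title("word " * 40))
```

A known weakness of step 1 is that it also stops at abbreviation periods such as the one in "Pvt.", so a real script would probably want a minimum length before accepting a punctuation cut.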
- 1) If you are patient enough to read 120 characters, why aren't you patient enough to read 256 ? Both the NARA website designers and the Library of Congress website designers have felt it normal to require their users to read titles longer than that. For example the html <title> attribute of http://www.loc.gov/pictures/item/2004670247/ is 330 characters long. What is wrong with that ? If the Library of Congress asked you for advice, what advice would you give ? Also, the fact that a title is displayed on your browser page does not mean you have to read the whole of it. If you are tired of reading, you can stop reading and look at some other area of the page.
- 2) I tagged one of the NARA uploads with {{rename}} diff. The file was renamed today. Here is the result and I think it is much better (although I forgot to include the date). And I don't feel it is too long. If you remove the last part, the dramatic - tragedy - effect meant by the creator is lost. Sometimes titles are pieces of literature, meant to create emotions. Many of these pictures were used for propaganda. The caption was perhaps as important as the scene represented. Teofilo (talk) 22:18, 1 August 2011 (UTC)
- 3) For people who are unhappy with file names longer than 140 characters (while being shorter than 256 characters) it may be possible to create a Javascript (or gadget, or full-fledged mediawiki extension) which automatically cuts the name that is displayed onscreen (with the possibility to read the longer version in a mouseover). Teofilo (talk) 23:41, 1 August 2011 (UTC)
I think you are looking at this entirely the wrong way. Relatively few people are looking at the images on Commons itself, and the ones that are are usually the editors that are maintaining them, not the people using the images. No one is really concerned about a long title looking a little unsightly at the top of a description page. We do, however, have to think about how this is going to be used on the projects, and huge file names make article text hard to read in the edit view and make Wikisource index pages incredibly odd-looking. And for what? You're writing as if the file name, which is clearly marked off with a "File:" and a ".tif" and has other data in it, is the title itself. It may be true that titles are pieces of literature and that they are important, but no one wants to remove the title. There is a title field in the metadata for that, quite apart from the file name. Dominic (talk) 00:15, 2 August 2011 (UTC)
- The view that Commons is for Wikipedia is not very popular here. A lot of people insist that Commons should be viewed as a media repository independently of its value for Wikipedia. The file name is also important as being the caption you read when your mouse hovers on a file name below a thumbnail in a category page. Teofilo (talk) 00:39, 2 August 2011 (UTC)
┌─────────────────────────────────┘
For me a filename needs to meet 2 requirements: be meaningful and be unique. The second part (<20 characters) provides uniqueness, and the first part is trying to be meaningful, and I think 100 characters is plenty to accomplish that. I find long names to be distracting and awkward, and wikitext using them hard to read. However raising the maximum length of the filename would be by far the simplest way to "fix" the issue. --Jarekt (talk) 03:38, 2 August 2011 (UTC)
- In my view, filenames need to be authentic. If Shakespeare called his play "Romeo & Juliet" you can't rename it "Richard & Julia" because you have a personal liking for these names. If some obscure Office of War Information bureaucrat during World War II decided to call a picture "Members of the 6888th Central Postal Directory Battalion take part in a parade ceremony in honor of Joan d'Arc at the marketplace where she was burned at the stake" you cannot change it. The only alternative would be to use a totally cryptic name, like 43-0194a.gif. I don't think there is a middle unauthentic term between a totally cryptic name and the full authentic name. The argument that the full name is written in the "title" field of the template anyway, fails to convince me, because putting an unauthentic name in a more prominent place than the authentic name remains an aggression of authenticity. The choosing of a long caption or name in association with a picture by some administration during World War II is a historical fact. Even if you find that fact distracting or ugly, you can't change it. By the same token, some pictures happen to be ugly. But for authenticity's sake one should not retouch an ugly historical picture to make it look nicer. If a picture has an ugly title, you can't change it either. You can't retouch "Romeo & Juliet". Teofilo (talk) 09:17, 2 August 2011 (UTC)
- For this file, and this one key information, location and year, are cut. Teofilo (talk) 16:04, 2 August 2011 (UTC)
- It is quite clear by now what your opinion is, Teofilo. What we are looking for is other opinions to see if anyone actually agrees with you. Dominic (talk) 20:17, 2 August 2011 (UTC)
- The only absolute criteria for filenames are 1) uniqueness (easily done with the ARC) and 2) length is under the technical limit (easily done by truncation). All other considerations are cosmetic, as the full metadata is listed in the info template. The filename is just a key for the file database: it doesn't have to contain a perfect description of the image, most files at Commons don't. To be honest, we could call all images "NARA image - ARC 123456.tiff" and be done with it. So I don't think it matters where we chop the description. I'd lean towards shorter, as long filenames can be a pain at Wikisource (we have the full name in the Page: namespace, for example), but that is a minor gripe. The metadata will always be in the info area, and only the ARC is required to uniquely identify the image. So, I'd say truncate at whatever is most convenient. Inductiveload (talk) 23:29, 2 August 2011 (UTC)
This file name cut removed the most important part: Captain Harry Truman. Teofilo (talk) 21:40, 3 August 2011 (UTC)
- Teofilo, you provided dozens of examples of trimmed filenames. However, to me the only issue with those is that they are too long. I agree with Inductiveload that "filename is just a key for the file database" and that descriptions can be found inside file descriptions. --Jarekt (talk) 02:39, 4 August 2011 (UTC)
- You wrote "I agree the file name issue should be fixed" above on this page on 1 August (diff). If you agree with Inductiveload that "filename is just a key for the file database", what is the issue which you want to fix ? Or have you changed your mind since 1 August ? Teofilo (talk) 12:58, 4 August 2011 (UTC)
- Note that truncated names now only terminate at the end of complete words and include a "..." when there is any truncation. Dominic (talk) 04:31, 4 August 2011 (UTC)
[edit] File matching tool
I think we need a developer for the development of a file matching tool. That tool would use an interface similar to that of Cat-a-lot, with the possibility to select two files from a gallery page. Then the tool would
- add the |Other version field in both files
- pick up the categories from the older file and add them into the newer file (and vice-versa) Teofilo (talk) 12:08, 5 August 2011 (UTC)
- This does not make sense to me. What gallery page? How will non-identical versions be detected by a bot? The eventual plan is to add JPG/DjVu versions of all these files by bot, so they will all have linked file in "Other versions" that will be usable on the projects at some point. Dominic (talk) 16:13, 11 August 2011 (UTC)
[edit] Author information retrieving bot
We need a bot to explore systematically all www.archives.gov pages similar to http://www.archives.gov/research/military/ww2/photos/ in order to retrieve author information. At present such author information is not provided by the upload bot. Perhaps it is simpler to do this separately with another bot. I think I am personally getting tired of adding this information manually (for example, see this diff). Teofilo (talk) 12:08, 5 August 2011 (UTC)
- Those are not structured pages and I see no way for a bot to extract author information from them. There are some tasks that simply require a human. Dominic (talk) 15:43, 5 August 2011 (UTC)
- All captions from http://www.archives.gov/research/military/ww2/photos/ (example : "Danny Kaye, well known stage and screen star, entertains 4,000 5th Marine Div. occupation troops at Sasebo, Japan. The crude sign across the front of the stage says: `Officers keep out! Enlisted men's country.'" Pfc. H. J. Grimm, October 25, 1945. 127-N-138204) and similar pages should be extracted (by a bot or human) and put into the left column of a table. Then a bot should say if the file was uploaded on Commons or not, and if so, provide a link to the file uploaded on Commons, and say if the |author= is still void. Then humans could pickup the author name from the full caption. This would ensure that this is done in a systematic way, and that no chance was missed to find author names. Teofilo (talk) 15:30, 6 August 2011 (UTC)
- Actually a bot could compare the string of characters in the full caption at http://www.archives.gov/research/military/ww2/photos/ and the string of characters in the |title= field on Commons. For example, comparing ["Danny Kaye, well known stage and screen star, entertains 4,000 5th Marine Div. occupation troops at Sasebo, Japan. The crude sign across the front of the stage says: `Officers keep out! Enlisted men's country.'" Pfc. H. J. Grimm, October 25, 1945. 127-N-138204] with [|Title=Danny Kaye, well known stage and screen star, entertains 4,000 5th Marine Division occupation troops at Sasebo, Japan. The crude sign across the front of the stage says: "Officers keep out! Enlisted men's country."] would reveal that "Pfc. H. J. Grimm, October 25, 1945. 127-N-138204" was left out. After all left out parts are neatly listed in a table by a bot, humans could try to figure out what they can do with them. Teofilo (talk) 15:44, 6 August 2011 (UTC)
- I think you missed the point. How do you know what to compare? You have a Commons image file, and then you have a string of characters on a random webpage. If a human has to find http://www.archives.gov/research/military/ww2/photos/ and point the script to the line on the page that has the information, it kind of defeats the purpose. Dominic (talk) 17:10, 8 August 2011 (UTC)
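The comparison proposed above can be sketched in a few lines. This is only an illustration under assumptions: the function name `leftover_credit` is made up here, and the sketch presumes that the caption and the Commons title have already been paired up, which is exactly the step questioned in the reply above. It uses Python's standard `difflib` to find the last stretch of text the two strings share and returns whatever follows it in the caption.

```python
from difflib import SequenceMatcher


def leftover_credit(caption: str, title: str) -> str:
    """Return the tail of `caption` that comes after the last text shared with `title`.

    Hypothetical helper: small differences such as "Div." vs "Division"
    are tolerated because matching is done block by block.
    """
    # autojunk=False keeps difflib from ignoring frequent characters in long strings.
    matcher = SequenceMatcher(None, caption, title, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return caption.strip()
    last = blocks[-1]
    tail = caption[last.a + last.size:]
    # Drop surrounding spaces and stray quotation marks left over from the caption.
    return tail.strip(" \"'`")


if __name__ == "__main__":
    caption = '"Troops of the 5th Marine Div. rest." Pfc. X, 1945. 127-N-1'
    title = "Troops of the 5th Marine Division rest."
    print(leftover_credit(caption, title))
```

A human would still have to decide what the leftover part means (author, date, identifier); the sketch only isolates it.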
[edit] Categorizing progress statistics software
[concerning Commons:National Archives and Records Administration/Categorize/Progress ]
Hello,
Would it be possible for BernsteinBot to compile more data? At present the "categorized" column on, for example, this page only provides a boolean "categorized" YES/NO parameter. Would it be possible to retrieve the number of added categories and to calculate the percentage of files with 2 or more categories, with 3 or more categories, etc.? Especially if the number of categories is only one, I consider that the job is not finished. Files should have at least 2 or 3 categories, in most cases. It would be good to have a way to find the files with only one category, so that people can quickly go to those files to finish the job. Teofilo (talk) 12:58, 1 August 2011 (UTC)
The above is a copy of a message I left on Bernsteinbot's owner talk page Teofilo (talk) 12:12, 5 August 2011 (UTC)
I think we need also statistics to control whether the |author field has been completed or is still left blank. Teofilo (talk) 12:19, 5 August 2011 (UTC)
- It operates based on normal Commons procedure. Files are either uncategorized or they're not. I don't see much evidence for your opinion that files with only one category are "unfinished". It would be nice to collect some of these statistics for measuring outcomes, but I'm not convinced it would be very useful (or very much used) by people categorizing. It's certainly not a pressing need. Dominic (talk) 17:05, 8 August 2011 (UTC)
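For reference, the statistics requested in this thread are cheap to compute once the topical categories per file are known. The following is a minimal sketch, not BernsteinBot's code: the function name `category_stats` and the input shape (a dict from file name to a list of topical category names, with maintenance categories already filtered out) are assumptions made for the example.

```python
def category_stats(files):
    """Summarise how many topical categories each file has.

    `files` maps a file name to a list of category names (assumed input shape).
    """
    counts = {name: len(set(cats)) for name, cats in files.items()}
    total = len(counts)

    def share(minimum):
        # Percentage of files that have at least `minimum` categories.
        if total == 0:
            return 0.0
        return 100.0 * sum(1 for n in counts.values() if n >= minimum) / total

    return {
        "at_least_2": share(2),
        "at_least_3": share(3),
        # Files that would still need attention under the "2 or 3 categories" view.
        "single_category": sorted(name for name, n in counts.items() if n == 1),
    }


if __name__ == "__main__":
    sample = {
        "A.jpg": ["x"],
        "B.jpg": ["x", "y"],
        "C.jpg": ["x", "y", "z"],
        "D.jpg": [],
    }
    print(category_stats(sample))
```

Whether a single category means "unfinished" stays a matter of opinion, as discussed above; the sketch only produces the numbers.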
[edit] Using en language templates
Dunno if there'll be any further bot edits to the already uploaded images, but I guess there will. So if there is a chance, could someone please add {{en|…}} around the descriptions (title and general notes)? I'm a bit surprised that this (apparently) didn't happen already on upload. Using the template would make future translations a bit easier, and is generally recommended here on Commons for internationalization issues (even if it's only regarded as helpful for users who don't speak English, to allow quick and easy identification of the language used). Many thanks in advance --:bdk: 14:32, 20 August 2011 (UTC)
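A bot run for this could look roughly like the sketch below. It is a hedged illustration, not the NARA bot's code: `wrap_en` is a made-up name, and the sketch assumes the description text has already been extracted from the file page. It uses the explicit `1=` form because an equals sign inside an unnamed template parameter would otherwise be read as a parameter name.

```python
import re

# Matches a description that already starts with the {{en|...}} template.
_ALREADY_EN = re.compile(r"^\{\{\s*en\s*\|", re.IGNORECASE)


def wrap_en(description: str) -> str:
    """Wrap a description in {{en|1=...}} unless it is empty or already wrapped."""
    text = description.strip()
    if not text or _ALREADY_EN.match(text):
        return text
    return "{{en|1=" + text + "}}"


if __name__ == "__main__":
    print(wrap_en("Danny Kaye entertains occupation troops at Sasebo, Japan."))
    print(wrap_en("{{en|Already tagged}}"))
```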
[edit] Resolutions
This page is getting very unwieldy. I am going to be marking and collapsing threads that seem to be resolved so that it is easier to navigate the page and see what needs to be addressed. If anyone feels that I have erroneously marked something as resolved, please feel free to uncollapse it and say so. Dominic (talk) 17:45, 11 August 2011 (UTC)
- I marked general questions of categorization as resolved, as we have developed a process for assisting editors in categorizing. Every image uploaded is given {{Uncategorized-NARA}}, which places it in Category:Media contributed by the National Archives and Records Administration. Each file is also automatically placed in a category for its NARA series. We have an automatically updated project page at Commons:National Archives and Records Administration/Categorize/Progress where Commons editors can see the progress of per-series categorizing and navigate down to a list of individual images that need categorizing. In this way, hopefully adding topical categories for all files will be manageable. Dominic (talk) 18:43, 11 August 2011 (UTC)
- Open issues
I am trying to summarize the issues that are in any way open, so we can bring some closure to this and the uploading can be completely above board.
- Can we automatically match NARA author data with Creator: templates and categories on Commons? — I'd like to work on this, but it can be done within the template, so it doesn't need to block uploads.
- Do we want to move the "NARA - <ID> - " part of file names to the front? — It will stay as is unless we hear from more people that they want this.
- Storing metadata on a separate template. — I wasn't entirely sure how useful or even possible this is, so I have left it alone in case others have thoughts.
- Teofilo's requests:
- ...wants to remove record ID and location fields.
- ...wants to remove the imposed character limit on file names (i.e., names up to 250+ characters). — I think this is controversial, as it will make very long names. There is not much support so far.
- ...wants to develop a file-matching tool.
- ...wants to automatically add author information from other web pages where possible.
- ...wants more progress statistics from the bot — Certainly possible, if useful.
It seems to me that all of these fall into the category of things that can be worked on during/after the actual upload of files, with the possible exception of the file name lengths. However, that and several others either do not seem very well supported or thought out. New comments, even if it's just simple agreement or disagreement, would help clarify the level of support. Dominic (talk) 19:09, 11 August 2011 (UTC)
Uploaded | Progress | Recent uploads | Category |
---|---|---|---|
123,633 | 50 % | Gallery | Category:Media contributed by the National Archives and Records Administration |
50.1% completed (estimate) |
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Dominic | 50 % | US National Archives bot | Category:Media contributed by the National Archives and Records Administration |
[edit] United States Fish and Wildlife Service
Moved from commons-l to here:
Here's the latest output from my upload script: http://commons.wikimedia.org/wiki/File:FWS_1_Local_resident_working_on_dog_sled_harness.jpg
This includes all of the changes that I was planning to make. What do you think? Shall I go ahead and upload all of the images that I have downloaded at http://images.freeculture.org?
FWIW, the code is here: https://github.com/gameguy43/usable_image_scraper In particular: https://github.com/gameguy43/usable_image_scraper/blob/master/wikiuploader.py
[edit] Opinions
I moved it here. Feel free to change this page whatever way you like. I love the fact that you're working on this! Some input from my side:
- Title: Please don't prepend "FWS" to the title; this messes up the sorting of categories. Please append it (<title> - FWS - <id>.jpg, like that)
- Date: Please use the original date (not the scan date)
- Original metadata: Please don't add that. You just have to integrate it with the page. An easy way to do that is to create a template you can substitute at upload, see for example user:Multichill/WGA. If people ever wonder about the original fields they can just check the upload log. Started one at User:BotPyrak/FWS
- Categories: This image is uncategorized. The subjects seem to be very suitable for assigning categories. You can either put the images in these categories directly or work with temporary categories so users have to move it.
Multichill (talk) 14:02, 5 June 2011 (UTC) Ok. I started an upload template at User:BotPyrak/FWS.
- At upload do {{subst:User:BotPyrak/FWS|subst=subst|
- Loop over the key value pairs and output them like |<key>=<value> (code for that is already in pywikipedia). The individual subject fields have to be available at subject_1, subject_2 etc. I still have to add the code for that to Pywikipedia.
- Close with }}
Multichill (talk) 14:18, 5 June 2011 (UTC)
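The three steps above, together with the suggested title pattern, can be sketched as follows. This is an illustration only: the function names, the example keys and the escaping of pipes with {{!}} are assumptions, not code taken from wikiuploader.py or pywikipedia. The sketch omits the trailing pipe after subst=subst, since every following line starts with its own pipe.

```python
def fws_title(title: str, image_id: str) -> str:
    """Build a file name with the source tag appended rather than prepended."""
    return "%s - FWS - %s.jpg" % (title.strip(), image_id)


def build_upload_text(metadata, subjects, template="User:BotPyrak/FWS"):
    """Render a substituted template call from key/value metadata.

    `metadata` is a dict of field name -> value; `subjects` is a list that is
    written out as subject_1, subject_2, ... as described above.
    """
    def clean(value):
        # A raw pipe would end the parameter early, so escape it (assumed policy).
        return str(value).replace("|", "{{!}}")

    lines = ["{{subst:" + template + "|subst=subst"]
    for key, value in metadata.items():
        lines.append("|%s=%s" % (key, clean(value)))
    for number, subject in enumerate(subjects, start=1):
        lines.append("|subject_%d=%s" % (number, clean(subject)))
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(fws_title("Local resident working on dog sled harness", "1"))
    print(build_upload_text({"title": "Dog sled"}, ["dogs", "sleds"]))
```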
Assigned to | Progress | Bot name | Category |
---|---|---|---|
User:BotPyrak |
[edit] Adams
At the National Archives, there should be about another 100 photographs we can host. Category:Photographs by Ansel Adams has currently c. 100.
LOC includes about 200 images. Commons c. 25. -- Docu at 13:27, 13 February 2011 (UTC)
[edit] Opinions
- Ansel Adams put no restrictions on the use, when giving these to LOC. [23] I'm not sure if that equates to them being in the public domain?
- For the NARA photographs, he was commissioned by the National Park Service and thus I think they are definitely public domain. The only issue with these is that they are not such high resolution scans, and I'm quite sure NARA has high res TIFFs somewhere. We're starting up collaboration w/ NARA, we can try asking for the high res photos. -Aude (talk | contribs) 17:27, 9 March 2011 (UTC)
- I will ask them about this collection! Dominic (talk) 14:24, 26 May 2011 (UTC)
- Hypothetically speaking (:-)), how would you like this group of photographs? If images are reuploaded on Flickr with the highest resolution, is there an easy way of grabbing them? (Noting, of course, that the medium-resolution photos are all on the National Archives' Flickr stream [24] with metadata.) Dominic (talk) 16:19, 26 May 2011 (UTC)
- Actually, this isn't hypothetical anymore. We are starting to re-upload the high-resolution images over the old ones in the Flickr album right now. The first few are done already. (!) Dominic (talk) 18:04, 26 May 2011 (UTC)
- Or alternatively, I am also going to be replacing the images in the Archival Research catalog (ARC) with the high-res ones, and those catalog entries are also where the full metadata lives (the metadata on Flickr has been reduced to the most important fields). They could be taken from there. I do have the original files, which are TIFFs being converted in Photoshop, but they are not associated with the records with metadata, or even named in any useful way, which is why the Flickr album seemed more useful. Dominic (talk) 18:31, 26 May 2011 (UTC)
- I'd be happy to transfer the full-res images from Flickr once they are all uploaded. Kaldari (talk) 02:03, 27 May 2011 (UTC)
- Category:Photographs_by_Ansel_Adams
- Category:Black_and_white_photographs_of_the_United_States
- Category:Images_from_the_National_Archives_and_Records_Administration
- Category:Images_from_the_National_Archives_and_Records_Administration/Photographs_by_Ansel_Adams I'd go for this cat instead of Category:Photographs by Ansel Adams from the U.S. National Archives. It helps people to find these images at once and subordinates to the already existing category. regards, PETER WEIS TALK 23:29, 27 May 2011 (UTC)
- I've come up with a slightly more radical idea that I just posted to the cultural-partners list - donation specific categories. What do you think of this? Kaldari (talk) 23:37, 27 May 2011 (UTC)
- Great! Let's do both I'd say. We don't have a GLAM cat so far. But then again I think it might be useful for some people to find the images of Ansel Adams within the images from the National Archives and Records Administration. Dunno if this is already over-categorisation?? regards, PETER WEIS TALK 23:49, 27 May 2011 (UTC)
- Where would photos or video fit that we actively go to NARA and digitize ourselves? These wouldn't be a "donation"? Yet, I'd consider en:WP:FedFlix as part of or related to the GLAM project. (we could also go digitize selected photos from Ansel Adams at higher resolution, should we want to do restoration) -Aude (talk | contribs) 23:59, 27 May 2011 (UTC)
- I think sets created as part of special digitization events (rather than donations) should probably go under Category:GLAM events. For example, we have Category:Wikipedia Loves Art events under there currently. If it's just a personal effort, rather than an event or donation, I would suggest just creating a personal user category for them or just using the regular categories exclusively. Anyone else have thoughts on this? Kaldari (talk) 00:06, 28 May 2011 (UTC)
FYI - I'm working on uploading the LOC Manzanar photos. -Aude (talk | contribs) 23:54, 27 May 2011 (UTC)
I've just created {{NARA-cooperation}}. Please improve if you think it's necessary. regards, PETER WEIS TALK 00:18, 28 May 2011 (UTC)
- Nice. I just added it to the test file and the upload script. Kaldari (talk) 00:28, 28 May 2011 (UTC)
- I've seen you added the Creator:Ansel Adams to the image. I assume this will be added to every image? regards, PETER WEIS TALK 00:43, 28 May 2011 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Kaldari | Completed | File Upload Bot (Kaldari) | Category:2011 Ansel Adams donation from U.S. National Archives |
[edit] Geographicus Rare Antique Maps
Geographicus Rare Antique Maps is a specialist dealer in fine and rare antiquarian cartography and historic maps of the 15th through 19th centuries. A large portion of their inventory of authentic antique maps is online at their website. The owner sent an email to OTRS and Pharos brought him in contact with me. It's a collection of about 2000 old maps. The owner was perfectly happy with me using dezoomify to get the high resolution images. Stuff used to get this batch upload going:
- Commons:Geographicus - project page
- Commons:Geographicus/sample - page for samples
- {{Geographicus-link}} - deeplink template
- {{Geographicus-source}} - source template
- Category:Images from Geographicus - category for all images
- Category:Maps from Geographicus to be categorized - category for images still needing to be categorized
- User:Multichill/Geographicus - template used at upload
- User:Multichill/Geographicus/cartographers - template to match cartographers
- Source code of the bot (pywikipedia based)
I have a comma separated file with all the metadata. I loop over all images:
- Get the metadata and do a bit of cleaning up
- Construct the title: <product_name> - Geographicus - <id>.jpg
- Construct the description: Basically just put all fields in User:Multichill/Geographicus
- Check if the title doesn't already exist (probably the same file, dupe checking might be a bit problematic, have to find out)
- Download the high resolution image using dezoomify
- Do a dupe check
- Upload the image
I could use some help to get User:Multichill/Geographicus/cartographers more complete. When the upload starts it would of course be nice if people can help with Category:Maps from Geographicus to be categorized. Multichill (talk) 11:42, 12 March 2011 (UTC)
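The first steps of the loop described above (read the metadata, construct the title, skip titles that already exist) can be sketched like this. It is not the bot's actual source: the column names `id` and `product_name`, the helper names and the set of replaced characters are assumptions, and the download, description and duplicate-check steps are left out.

```python
import csv
import io
import re

# Characters that are unsafe in wiki file names (cautious, assumed selection).
FORBIDDEN = re.compile(r"[#<>\[\]|{}:/]")


def make_title(product_name: str, map_id: str) -> str:
    """Construct '<product_name> - Geographicus - <id>.jpg' with light cleaning."""
    name = FORBIDDEN.sub("-", product_name)
    name = re.sub(r"\s+", " ", name).strip()
    return "%s - Geographicus - %s.jpg" % (name, map_id)


def plan_uploads(csv_text: str, existing_titles):
    """Return the titles that still need uploading, in file order."""
    planned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = make_title(row["product_name"], row["id"].strip())
        if title in existing_titles:
            continue  # probably the same file; a hash check would confirm it
        planned.append(title)
    return planned


if __name__ == "__main__":
    sample = "id,product_name\nNewYorkCity-colton-1857,\"1857 Colton Map of New York City, New York\"\n"
    print(plan_uploads(sample, set()))
```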
[edit] Unknown artist
- Very nice batch Multichill.
- One image I checked had the artist field linking to http://www.geographicus.com/mm5/cartographers/unknown.txt − maybe a bot could replace that by {{unknown|author}} ?
- Jean-Fred (talk) 18:22, 25 March 2011 (UTC)
- Changed. When User:Multichill/Geographicus/cartographers is complete I will do a bot run to replace it. Multichill (talk) 18:43, 25 March 2011 (UTC)
[edit] Geographicus link
Nice project and good file descriptions. I am just wondering on a small point: {{Geographicus-link}} does not really provide an accession number, shouldn't it go in the "references" field ?--Zolo (talk) 06:59, 25 March 2011 (UTC)
- From {{Artwork}}: accession number: Museum's accession number or some other inventory or identification number. Provide also link to museum database if available.
- It does provide an accession number. Take for example this image. The id is Hempstead-uscs-1925, this gives Geographicus code: Hempstead-uscs-1925 (a link to the source in the Geographicus database). I don't think this should go to the references section. Multichill (talk) 18:48, 25 March 2011 (UTC)
- I think the passage from the artwork documentation you quote made more sense in older versions of the template, when accession number was called ID. It is true that the Geographicus link may be somewhat akin to an accession number, but it does not look like a number, so I think it sounds a bit odd. Since "Hempstead-uscs-1925" is a "code" and is "Necessary for phone orders", maybe we could simply change the layout of {{Geographicus-link}} to something like:
- Geographicus code : Hempstead-uscs-1925--Zolo (talk) 09:42, 26 March 2011 (UTC)
[edit] Broken uploads
Just a section to add broken files. Maybe we can get them later on:
- File:1857 Colton Map of New York City, New York - Geographicus - NewYorkCity-colton-1857.jpg, was all black. Uploaded the low res version. Multichill (talk) 10:49, 26 March 2011 (UTC).
- File:1870 Andriveau-Goujon Map of Palestine - Israel - Geographicus - Palestine-andriveau-1870.jpg appears to have not uploaded correctly. Rmhermen (talk) 00:06, 26 March 2011 (UTC).
- File:1794 Laurie and Whittle Map of Belgium - Geographicus - Belgium-lauriewhittle-1794.jpg – low res. Robot Monk (talk) 19:58, 8 June 2011 (UTC)
- File:1794 Boulton and Anville Wall Map of Africa (most important 18th cntry map of Africa) - Geographicus - Africa2-boulton-1794.jpg – the map on Geographicus was replaced with another, more colorful version – we should upload it too. Robot Monk (talk) 20:03, 8 June 2011 (UTC)
- File:1851 Tallis Map of Mexico, Texas ^ California - Geographicus - MexicoTexas-tlls-1851.jpg This file appears as a black rectangle. Tm (talk) 03:53, 14 June 2011 (UTC)
- File:1865 Chapman Sectional Map of Minnesota ( Pocket Map ) - Geographicus - MN-chapman-1865.jpg This file appears as a black rectangle. Tm (talk) 03:53, 14 June 2011 (UTC)
[edit] Ordnance Survey OpenData
I spotted the OS map File:Whitehaven area 1 in 250 000 scale.png being used on the high traffic w:Cumbria shootings article and thought it was a copyright violation. I then looked closer, and realised that w:Ordnance Survey has released several datasets under a free (and Wikimedia Commons compatible) license, as part of their OpenData Initiative.
This is a fantastic resource, and of great use to almost any UK geography article. This could greatly standardise mapping across the UK, and replace many custom one off maps with the recognisable and accessible OS standard. There have already been a few OS OpenData uploads to Commons, and they can be found at Category:Maps from Ordnance Survey. A list of all OpenData products can be found at https://www.ordnancesurvey.co.uk/opendatadownload/products.html, the license can be viewed at http://www.ordnancesurvey.co.uk/oswebsite/opendata/licence/docs/licence.pdf
I've not looked into the datasets themselves, but a lot of them are in TIFF format, which would be an issue. And there's also the issue of how we split the maps into separate files. Even if the source data is in separate files, they may be arranged in arbitrary grid squares, which may not be helpful if you'd like to show a map of a town. Is there an easy way to display a matrix of multiple maps on Wikipedia?
Still, I don't think these are big issues. These maps are definitely within Commons' scope, and can make an immediate impact across Wikimedia projects. Suitcivil (talk) 22:41, 4 June 2010 (UTC)
[edit] Opinions
Great sets. License looks ok. I'll have a shot at it. I wanted to look into these datasets anyway for the Commons:Batch uploading/Geograph upload. Multichill (talk) 08:04, 5 June 2010 (UTC)
- I've downloaded the tiff files. It's about 20G. I'm going to upload the tiff files and jpg versions of the files. I'm going to make a new license template to reflect the OS license. That will probably all be fine, I'm just wondering what the best way is to get these files categorized. All files should end up under Category:Maps of the United Kingdom. Based on the grid square it's probably possible to find the right subcategory.
- Each map should link to the other file version (tiff -> jpg, jpg -> tiff) and it would be very nice if every map links to the squares next to it for easy navigation. Multichill (talk) 11:34, 5 June 2010 (UTC)
- At http://toolserver.org/~multichill/temp/OS_OpenData/ I stored the files. I'm currently converting the tif files to jpg files, this will probably take a while to complete. Multichill (talk) 14:04, 5 June 2010 (UTC)
After a short IRC conversation, I'm currently experimenting what the options are regarding SVG. There are vector files available. I'll convert a few to SVG and see how our SVG parser is doing. –Krinkletalk 16:07, 5 June 2010 (UTC)
- SVG will probably be quite hard to create. I'm sticking to the tiff and jpeg files. I created Commons:Batch uploading/Ordnance Survey/Template to be substituted, {{Map tile navigation}} to navigate the tiles and {{OS OpenData}} to be used as license tag. Multichill (talk) 21:47, 5 June 2010 (UTC)
- I'm about to upload the first batch jpg part & tif part. Multichill (talk) 15:43, 6 June 2010 (UTC)
- The first (small) batch is online. Ordnance Survey 1:250 000 Scale Colour Raster map gives a nice overview. In this batch I have 1 map per grid square. I have two more batches, one with 100 images per grid square and one with 400 images per grid square. Feedback would be nice. Multichill (talk) 20:07, 6 June 2010 (UTC)
- I should probably add a category to each batch, what about:
- Should I timestamp the files in the filenames? Multichill (talk) 19:00, 8 June 2010 (UTC)
- I'll just keep on talking to myself. New batches at http://toolserver.org/~multichill/temp/OS_OpenData/OS_Street_View_outputjpg/ and http://toolserver.org/~multichill/temp/OS_OpenData/OS_Street_View_outputtif/ . Probably need some fine tuning. Multichill (talk) 20:10, 8 June 2010 (UTC)
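For the batch with one map per grid square, the links to the squares next to each map can be derived from the two-letter National Grid reference alone. The sketch below is an illustration under assumptions: the function names are made up, it only handles whole 100 km squares (not the finer tiles of the later batches), and it says nothing about the parameters of {{Map tile navigation}}. It relies on the standard lettering scheme, a 5 by 5 block of letters that skips "I", with square SV at the false origin.

```python
LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXYZ"  # the grid lettering omits "I"


def square_to_index(square: str):
    """Convert e.g. 'NY' to (easting, northing) counted in 100 km squares from SV."""
    l1 = LETTERS.index(square[0].upper())
    l2 = LETTERS.index(square[1].upper())
    easting = ((l1 - 2) % 5) * 5 + (l2 % 5)
    northing = (19 - (l1 // 5) * 5) - (l2 // 5)
    return easting, northing


def index_to_square(easting: int, northing: int) -> str:
    """Inverse of square_to_index for squares inside the lettered grid."""
    l1 = (19 - northing) - (19 - northing) % 5 + (easting + 10) // 5
    l2 = ((19 - northing) * 5) % 25 + easting % 5
    return LETTERS[l1] + LETTERS[l2]


def neighbours(square: str):
    """Return the adjacent squares, dropping any outside the extent of Great Britain."""
    easting, northing = square_to_index(square)
    steps = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}
    result = {}
    for name, (de, dn) in steps.items():
        e2, n2 = easting + de, northing + dn
        if 0 <= e2 <= 6 and 0 <= n2 <= 12:  # assumed bounds: 700 km by 1300 km
            result[name] = index_to_square(e2, n2)
    return result


if __name__ == "__main__":
    print(neighbours("NY"))
```

Some of the returned squares contain only sea and have no map tile, so a real run would still need to check which neighbour files exist.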
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Multichill | Uploaded the first batch | OrdnanceSurveyBot | Category:Maps of the United Kingdom (and subcats) |
[edit] Boundary-Line data
I've uploaded PNG and SVG versions of the parish_region shapefile in the bd_line.zip. The png works ok, but lacks detail; whilst the svg is huge: Inkscape runs out of memory for me when I try to do anything with it, and Mediawiki struggles.
IMO, one significant flaw in the data is that the boundaries for Bristol, Liverpool and Torbay (amongst others) include significant area of sea; distorting the coastline. The one data file in the set that doesn't have that issue is the high_water file, so combining the two would produce more useful maps again. My SVG skills aren't up to that job, and in any case I'd have memory issues.
I'm not totally sure but I believe the contents of the various shapefiles in bd_line.zip are:
- parish_region - Civil parishes (and equivalent, uploaded)
- district_borough_unitary_ward_region - Electoral wards in England and Scotland; excluding counties with a unitary council (eg Cornwall and Wiltshire)
- high_water_polyline - Coastline of England, Wales and Scotland (and outlying islands)
- westminster_const_region - Parliamentary constituencies for all of England, Wales and Scotland
- district_borough_unitary_region - Admin districts for England, Wales and Scotland
- county_electoral_division_region - Electoral divisions for certain counties in England (non-Unitary Authorities)
- unitary_electoral_division_region - Electoral divisions for Wales, and those parts of England excluded in previous file.
- scotland_and_wales_const_region - Unsure - Welsh assembly and Scottish parliament constituencies I think.
- scotland_and_wales_region_region - Electoral regions for Scotland and Wales (for the devolved governments)
- county_region - English counties, which are not Unitary authorities.
- greater_london_const_region - The london assembly constituencies.
I'm planning to do location/locator maps for at least:
- Districts, parishes, parliamentary constituencies and wards within counties
- Wards within Unitary Authorities.
- Scottish Parliament, Welsh Assembly and London Assembly constituencies.
I'll do these as png only; mainly as a result of the coastline issue I mentioned.--Nilfanion (talk) 12:47, 8 June 2010 (UTC)
- Nice that data is used!....but isn't this a bit out of the scope of this page? Multichill (talk) 19:00, 8 June 2010 (UTC)
- Fair enough. Did you see the http://openspace.ordnancesurvey.co.uk/openspace/ on how to reuse their data? Multichill (talk) 19:43, 8 June 2010 (UTC)
Curious as to what you plan to do with the Vector Map District data. The OS provides raster (tif and jpg) data, but it's primarily a vector product which unfortunately is in shapefile format only...--~~
[edit] Tropenmuseum
The Tropenmuseum donated about 2100 images related to Suriname and will donate a lot more images in the future (see Commons:Tropenmuseum). GerardM did the communication part, Multichill did the uploading/technical part.
[edit] Suriname
The first batch I got was 2100 images related to Suriname and the Maroons. I received a DVD containing the images and a Microsoft Access database containing the metadata. I created a user ODBC connection in Windows and used pyodbc to make a connection from Python. The code is a combination of custom code, pywikipedia and functions I copied from previous projects (Deutsche Fotothek & WLANL). The filenames were already in the right form and contained a unique identifier so I had my bot loop over the files and for each file:
- Extract the unique id
- Using the identifier pull all relevant info from the database
- Generate a description
- Generate temp categories
- Generate a Sha1 hash and check for duplicates
- If the file doesn't exist yet, upload the file using KITbot
Of course you can find the source in my svn.
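Two of the steps above, extracting the identifier and the SHA-1 duplicate check, can be sketched without any wiki access. This is not the code from the svn repository: the file name pattern `TMnr <digits>`, the helper names and the in-memory set of known hashes are assumptions made for the example. In a real run the known hashes would come from Commons rather than from a local set.

```python
import hashlib
import re

# Assumed file name pattern; the identifier format in the real batch may differ.
ID_PATTERN = re.compile(r"TMnr[ _](\d+[a-z]?)")


def extract_id(filename: str):
    """Return the unique identifier embedded in the file name, or None."""
    match = ID_PATTERN.search(filename)
    return match.group(1) if match else None


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of the file contents."""
    return hashlib.sha1(data).hexdigest()


def find_new_files(files, known_hashes):
    """Return names of files whose content hash has not been seen before.

    `files` maps file name -> raw bytes; `known_hashes` is a set of hex digests
    and is updated as files are accepted, so repeats within the batch are caught too.
    """
    new_names = []
    for name, data in files.items():
        digest = sha1_hex(data)
        if digest in known_hashes:
            continue  # duplicate: skip the upload
        known_hashes.add(digest)
        new_names.append(name)
    return new_names


if __name__ == "__main__":
    batch = {"Example TMnr 10012345.jpg": b"abc", "Copy TMnr 10012346.jpg": b"abc"}
    print(find_new_files(batch, set()))
    print(extract_id("Example TMnr 10012345.jpg"))
```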
The provided metadata was excellent. It contains descriptions in one (Dutch) or more (English) languages and was very useful for generating temp categories. All the images are placed in Category:Images from the Tropenmuseum and a bunch of temp categories. Images have to be copied from these temp categories to real categories. Turned out we don't have a lot of Suriname related images so I pretty much had to build a category tree from the ground up. This is a lot of work, but images end up in very good topic categories. It also improves the chance of images ending up in multiple relevant topic categories (in previous batch uploads, images got stuck at only one category). This is a lot of moving around, but that's just a job for a bot. This mapping causes a lot of over-categorized images, but this can easily be fixed with the recategorization bot (imagerecat.py -cat:Images_from_the_Tropenmuseum -onlyfilter). For the next part we have to figure out how to get people to categorize the images because I don't feel like doing this all alone. Users only have to map temp cats onto topic categories, the actual moving is done by a bot. Not sure how to make this easy for other users. Multichill (talk) 11:41, 16 September 2009 (UTC)
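The division of labour described above (people supply a mapping, a bot does the moving) can be illustrated with a small text transformation. This is a sketch under assumptions, not the bot that was used: `apply_mapping` is a made-up name, the mapping is a plain dict from temp category name to topic category name, and only the category tags in the wikitext are touched.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sort key]].
CATEGORY = re.compile(r"\[\[Category:([^\]|]+)(\|[^\]]*)?\]\]")


def apply_mapping(wikitext: str, mapping) -> str:
    """Replace temp categories by their mapped topic categories, dropping repeats."""
    seen = set()

    def replace(match):
        name = match.group(1).strip().replace("_", " ")
        target = mapping.get(name, name)  # unmapped categories stay as they are
        if target in seen:
            return ""  # two temp categories mapped onto the same topic category
        seen.add(target)
        return "[[Category:" + target + (match.group(2) or "") + "]]"

    return CATEGORY.sub(replace, wikitext)


if __name__ == "__main__":
    text = "[[Category:Temp A]]\n[[Category:Images from the Tropenmuseum]]"
    print(apply_mapping(text, {"Temp A": "Suriname"}))
```

Removing parent categories when a more specific child is present is a separate step, which is what the imagerecat.py filter mentioned above is for.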
[edit] Indonesia
Yesterday Gerard and I visited the Tropenmuseum. We got 35.000 images and a database with all the metadata. I slightly modified the program I used for Suriname and fired up the bot. Modifications:
- Other database name and other table names
- Changed the regular expression to find the id of the file
- Removed some encoding bugs
- Filtering the temporary categories to get rid of the completely useless categories right away
- Added <!--{{id|1=To be translated}}--> so Indonesian translations can be added later.
The upload will probably be finished tomorrow. Then comes the hard part: Categorization. I added temporary categories again, but this time I got some data from the Tropenmuseum describing the structure of these categories so I can build a tree. I will first do this for the geography tree. Multichill (talk) 22:19, 26 November 2009 (UTC)
[edit] Categorization
Moved to Commons:Tropenmuseum#Categorization to avoid redundancy
[edit] Opinions first part
- Making categorization easier: How about doing something like with the Fotothek upload? Like creating the temporary categories with a commons delinker link and a suggested category, waiting for a user to review it. And where are all these categories stored? I mean where can I find a list of all the temporary categories with how many files they contain so I could check for a better category name also for the delinker? Automatic Dutch to English translation would also make it a lot easier, instead of going to Google and translating...BTW, the upload is already finished right?--Diaa abdelmoneim (talk) 00:05, 18 September 2009 (UTC)
- You can find lists at User:Multichill/KIT/categories and User:Multichill/KIT/categories2. See the history for the progress. I already worked on most categories. Multichill (talk) 08:48, 18 September 2009 (UTC)
[edit] 3rd batch
A third batch is expected somewhere in February 2010, but this might be much later. Until then we have plenty of images to keep us all busy. Multichill (talk) 22:57, 20 December 2009 (UTC)
[edit] Objects due to arrive soon
Just got an email. The next batch is in the (snail) mail now. The next batch is 6000 photos of objects in the collection of the Tropenmuseum. Probably going to upload these objects in the next couple of days. Multichill (talk) 15:58, 16 June 2010 (UTC)
- We had some problems, but now I'm uploading new images. Multichill (talk) 19:39, 27 July 2010 (UTC)
[edit] Festivals uploads
Since this was suggested here
I sometimes cover festivals (like many other photographers) for the French chapter (Wikimédia France).
This mainly involves two kinds of events:
- book or comic strip festivals Comédie du Livre of Montpellier, O Tour de la Bulle of Montpellier, Festival of Sollies Ville, Festival of Roquebrune sur Argens, Festival of La Seyne sur Mer, Festival of Luminy etc.
- manga, anime and Japanese culture festivals and conventions: Mang'Azur - Japan Matsuri, Japanîmes (next week), JapanExpo (in one month)...
For the first kind, the photographs usually show authors, preferably during interviews or autograph sessions.
For the second kind, there may be notable people to photograph, usually people performing on stage like 'HITT'. There may also be ambiance photographs depicting those festivals.
Those photographs are also tagged with {{Supported by Wikimedia France}}, with a category name formed by concatenating the event name with the Wikimédia France suffix (Supported by or Covered by), depending on whether an accreditation was needed to take the photographs.
Right now, I am uploading with Commonist a batch of photographs taken last weekend. I intend to put each photograph in the category of the person shown, usually named category:'surname name'. I also intend to fill in the depicted parameter for each photograph of an author, so that it can be displayed by the special template used for all those images.
Esby (talk) 22:04, 3 June 2010 (UTC)
[edit] Upload done
- Category:Comédie du Livre 2010 - 534 photographs of ~ 205 book / comic strip authors.
- Category:JapaNîmes 2010 - Anime convention of Japanîmes in Nîmes. (less than 200 photographs)
- Category:Bulles en Seyne 2010 - Comic Strip Festival - 78 photographs of authors. (might upload a few ambiance picture & panorama)
- Category:Festival Montpellier In Game 2010 - Game Festival of 'Montpellier' 2010 edition. 75 photographs - some ambiance photographs + a few photographs of Japanese game creators.
- Category:Japan Expo 2010 - The biggest anime and Japanese culture convention in France. About 704 photos (mainly by user:okki & myself). Concert - Personalities - Ambiance.
- Category:Festival BD O Tour de la Bulle - Montpellier 2010 Comic strip festival of Montpellier (O Tour de la Bulle) - 56 photographs of authors.
- Unrelated with the festivals upload: Category:L'Illustration_1858 - Most of the files are online.
Upload in progress or to be done
Files that need to be uploaded
- Category:Toulouse Game Show 2010 - Game - Anime related Convention of Toulouse - about 130 photographs - concert / personality / ambiance, upload in progress.
- Comic strip festival of Luminy (Photographs taken, but not yet uploaded on Commons).
Event cover planned
Photographs not taken yet.
Opinions
Hi, just wondering if something should/could be done about the categorizing in both Category:Comédie du Livre 2010 and Category:Comédie du Livre 2010 - Supported by Wikimédia France. Should the description-template categorize the file, and/or should the category name include its supporter? Just wondering. –Krinkletalk 16:55, 5 June 2010 (UTC)
- This image was taken by someone external to Wikimedia France, so the file is in the normal category while not being in the one supported by Wikimédia France. This is done in order to limit the files directly in Category:Supported_by_Wikimedia_France. Esby (talk) 20:34, 6 June 2010 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Esby (talk) 22:11, 3 June 2010 (UTC) | Upload done. | irrelevant (categorisation or edits might still be performed by user:esby-mw-bot) | All images are categorized in Category:Comédie_du_Livre_2010_-_Supported_by_Wikimédia_France |
US Air Force
And yet another branch of the military to pick clean. The US Air Force has a set of photos at their site. Not sure how many photos we're talking about. The same logic as for FEMA, the Navy and the Army can be used. Crawl all the galleries and extract the id (simple regex). The ids can be used like http://www.af.mil/photos/media_view.asp?id=314289 to get the image and metadata (beautifulsoup). The gallery structure can be used to make a temporary category structure. The name of the files should be like "US Air Force <id> <title>.jpg". Of course duplicate checking should be enabled like in all the other bots.
The source will be available here. Multichill (talk) 18:11, 23 October 2009 (UTC)
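As a rough illustration of the approach described above (not the bot's actual source), the id extraction and file naming could be sketched as follows. The gallery HTML snippet is made up, the helper names are hypothetical, and the set of characters stripped from titles is an assumption.

```python
import re

# Links of the form media_view.asp?id=314289 (URL shape quoted in the request).
ID_PATTERN = re.compile(r"media_view\.asp\?id=(\d+)")


def extract_photo_ids(gallery_html):
    """Return the unique photo ids found in a gallery page, in order of appearance."""
    seen = []
    for photo_id in ID_PATTERN.findall(gallery_html):
        if photo_id not in seen:
            seen.append(photo_id)
    return seen


def media_url(photo_id):
    """Build the per-photo page URL from an id."""
    return "http://www.af.mil/photos/media_view.asp?id=%s" % photo_id


def commons_filename(photo_id, title):
    """Build a name of the form 'US Air Force <id> <title>.jpg'."""
    # Assumption: drop characters that are problematic in wiki file titles.
    clean = re.sub(r"[#<>\[\]|{}:/]", "", title)
    # Collapse runs of whitespace into single spaces.
    clean = re.sub(r"\s+", " ", clean).strip()
    return "US Air Force %s %s.jpg" % (photo_id, clean)


# Hypothetical gallery snippet, made up for illustration only.
sample = (
    '<a href="media_view.asp?id=314289">A</a> '
    '<a href="media_view.asp?id=314290">B</a> '
    '<a href="media_view.asp?id=314289">A again</a>'
)
ids = extract_photo_ids(sample)
print(ids)
print(media_url(ids[0]))
print(commons_filename(ids[0], "A-10  Thunderbolt II"))
```

Fetching the pages and parsing the metadata (the beautifulsoup part) is left out here, since the page layout is not described in this request.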
I felt like building something so I built two things:
- A category generator. I used it to generate a tree under Category:Images from the US Air Force based on the galleries at the website. {{Air Force header}} does all the magic. You can view the full list at Special:PrefixIndex/Category:Images from the US Air Force.
- An upload bot
The upload bot can work on these categories and fill them. If subject is set, it will upload to the subject category right away. Some example images can be found in:
- Category:Fairchild A-10 Thunderbolt II
- Category:F-15 Eagle
- Category:Images from the US Air Force, Airlift
Would be nice to set the subject on a lot of these categories before actually uploading the images. What do you think? Multichill (talk) 17:48, 16 January 2010 (UTC)
Assigned to | Progress | Bot name |
---|---|---|
Multichill | On hold (Commons is short on disk space) | BotMultichillT |
US Army
The FEMA request got me started. The US Army has a nice set of images at http://search.ahp.us.army.mil/search/images/?per=10&page=1&search= . Judging from the latest id it's around 50.000 images. The bot should probably consist of two parts:
- Loop over the search pages and find the location of all images like http://www.army.mil/-images/2009/10/14/53021/ . All pages seem to be in the form http://www.army.mil/-images/YYYY/MM/DD/photo_id/
- Work on all these images
Shouldn't be too hard with some regular expressions for the first part and screen scraping with beautifulsoup for the second part. Multichill (talk) 22:07, 14 October 2009 (UTC)
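A minimal sketch of the first part (finding image pages in the search results), assuming the URL pattern given above. The function names and the sample HTML are hypothetical, and this is not the bot's actual source.

```python
import re

# Image pages follow http://www.army.mil/-images/YYYY/MM/DD/photo_id/
IMAGE_PAGE = re.compile(
    r"http://www\.army\.mil/-images/(\d{4})/(\d{2})/(\d{2})/(\d+)/"
)


def search_page_url(page, per=10):
    """Build the URL of one page of search results."""
    return (
        "http://search.ahp.us.army.mil/search/images/?per=%d&page=%d&search="
        % (per, page)
    )


def find_image_pages(search_html):
    """Return one record per image page URL found in a search results page."""
    results = []
    for match in IMAGE_PAGE.finditer(search_html):
        year, month, day, photo_id = match.groups()
        results.append(
            {
                "url": match.group(0),
                "date": "%s-%s-%s" % (year, month, day),
                "id": photo_id,
            }
        )
    return results


# Hypothetical search result snippet; only the URL shape comes from the request.
sample = '<a href="http://www.army.mil/-images/2009/10/14/53021/">photo</a>'
print(search_page_url(1))
print(find_image_pages(sample))
```

The second part (scraping each image page with beautifulsoup) depends on the page markup, which is not described here, so it is not sketched.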
I wrote a bot for this (source). It basically works the same as the other USgov bots. The main difference is that I'm unable to extract category information. The title is based on the title field, and as a fallback, the description. The first images can be found in Category:Images from the US Army needing categories as of 23 October 2009. Multichill (talk) 14:01, 23 October 2009 (UTC)
No response so I slowly fired up the upload. Multichill (talk) 11:31, 25 October 2009 (UTC)
Opinions
Assigned to | Progress | Bot name |
---|---|---|
Multichill | On hold (Commons is short on disk space). | BotMultichillT |
US Navy
The FEMA request got me started. The US Navy has about 75.000(!) images available at http://www.navy.mil/view_photos_top.asp just waiting to be copied to Commons. I wrote a bot based on the FEMA upload.
- The bot loops over all the images.
- From the META fields I get the url, long description and short description
- A regex extracts the date from the long description
- A regex extracts the author from the long description
- A regex extracts the location from the long description
- The title is constructed based on the url and the short description
- Image is uploaded and ends up in one of these categories
This is just a general overview. The source is available here. Multichill (talk) 16:48, 16 October 2009 (UTC)
Opinions
- There is a template for the US Navy images {{ID-USMil}} you could use this or create one only for the US navy and add it in the source.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- Looks nice. I'll probably use it for the next files. Multichill (talk) 17:30, 19 October 2009 (UTC)
- I'm not sure if the ID should be stated first. Like on File:000629-N-5686B-001 Sailor Returns Home.jpg I think US Navy should be before the numbering.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- Sure, so it would be File:US Navy 000629-N-5686B-001 Sailor Returns Home.jpg in this case. Multichill (talk) 17:30, 19 October 2009 (UTC)
- Some images like File:020121-N-5563S-003 .50-Caliber Machine Gun.jpg don't have date and location. This is because the date isn't in brackets. It is however between ")" and "--" or ")" and "–". I also don't know why the location isn't grabbed...--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- Looks like I have to improve the regex to catch these cases. Both date and location use the same regex for matching. Multichill (talk) 17:30, 19 October 2009 (UTC)
- You don't need the ID in the description. Create or use a source template for the upload where the ID is stated and a link to the site is given.--Diaa abdelmoneim (talk) 17:49, 16 October 2009 (UTC)
- I do to prevent naming collisions. Multichill (talk) 17:30, 19 October 2009 (UTC)
Ok. Bot is changed to include the suggestions. Now it's running again. Multichill (talk) 19:50, 21 October 2009 (UTC)
- One small problem: it doesn't seem to like image descriptions with quotation marks in them, and so cuts off partway through, e.g. File:US Navy 071227-N-4014G-037 An MH-60S Seahawk assigned to the .jpg; File:US Navy 071227-N-6125G-184 ailors attached to the Nimitz-class aircraft carrier USS Harry S. Truman (CVN 75) enjoy a USO concert preformed by the band .jpg. Shimgray (talk) 21:18, 23 October 2009 (UTC)
- Ah, an escaping problem. This probably only happens to a couple of images. We can always move them to a better name if the current name is not clear. Multichill (talk) 21:58, 23 October 2009 (UTC)
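One possible way to avoid this kind of escaping problem, sketched under assumptions: decode HTML entities in the description before building the title, and map double quotes to apostrophes so they survive in a file name. The sample description and the function name are hypothetical; this is not the bot's actual code.

```python
import html
import re

# Map straight and curly quotes to a plain apostrophe (a design choice, not a rule).
QUOTES = str.maketrans(
    {'"': "'", "\u201c": "'", "\u201d": "'", "\u2018": "'", "\u2019": "'"}
)


def navy_filename(photo_id, short_description):
    """Build 'US Navy <id> <description>.jpg' from an entity-encoded description."""
    # Turn entities such as &ldquo; and &rdquo; back into real characters first.
    text = html.unescape(short_description)
    # Then replace every kind of quote with an apostrophe.
    text = text.translate(QUOTES)
    # Collapse whitespace so the name has no double spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return "US Navy %s %s.jpg" % (photo_id, text)


# Hypothetical description, made up to show the quoting case.
print(navy_filename("000000-N-0000X-000", "A &ldquo;Sea Sparrow&rdquo; missile launch"))
```

Decoding before any splitting or truncation means the title no longer stops at the first quotation mark.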
I worked my way through some of the aircraft carrier categories. Interesting! Still a lot of additional categorization to do though.
- For one carrier, generally several temporary "aboard USS .." categories could be combined into one.
- The ship based temporary categories seem more helpful than the ones for stable locations, e.g. "Arabian Gulf".
- For the captions, maybe {{original caption}} could have been used.
- Minor point: given the small size of the license tag, it could have been included directly into {{information}}.
- It might be worth going through the descriptions by bot to wikify names of units, ships, etc., linking them to the corresponding articles at en.wp
-- User:Docu at 06:43, 25 October 2009 (UTC), edited 06:59, 25 October 2009 (UTC), 08:10, 25 October 2009 (UTC)
- You should probably move it to topic categories right away. Maybe you could use a bot.
- Stable locations only seem to be useful for photos on land.
- That could have been used, but I didn't.
- That could have been done.
- Nice to see people working on this! Multichill (talk) 11:18, 25 October 2009 (UTC)
- I mentioned 3 and 4 mainly for future uploads. BTW I made a bot request at Commons:Bots/Requests/vertrepbot. -- User:Docu at 16:19, 26 October 2009 (UTC)
Hey Multichill, thanks for uploading all those Navy pics, I'm sorting through them now, looking for possible FP candidates.
A few things I found while I was looking through them:
- 1. Some images seem to have had something go wrong with their title during the upload; For example this one and this one. I'm assuming you meant to have 's around some words ('Sea Sparrow') but something's gone awry. You might want to fix it before you move on to the Army upload.
- 2. There also appears to be at least one categorisation error. If you take a look at Category:General views of USS Kearsarge (CV-33) and Category:Aboard USS Kearsarge (CV-33), you'll see a number of pictures of another ship of the same name: Category:USS Kearsarge (LHD-3). They are 2 different ships (the original Kearsarge was an Essex class carrier, scrapped in 1974), and while I'm happy to re-categorise them (I'll be helping with the whole batch), is there something you can do to prevent this sort of thing happening in future?
- 3. Though you stated at the Village Pump that you'd built in a duplicate checker, I've found quite a few duplicates as I've browsed the batch. For example, your file duplicates File:USS Port Royal (CG 73) aground.jpg and File:USS Port Royal grounded.jpg. Also, this one and File:AAV Embarking.jpg. There are a few others I've spotted as well, though I didn't note them down.
Hope this helps.
Sarcastic ShockwaveLover (talk) 22:09, 26 October 2009 (UTC)
- Hi Sarcastic ShockwaveLover,
- 1.: "^ldquo" in the title seems to come from "&ldquo;" in the description.
- 2.: Thanks for noticing. It should be fixed now. It was correct when Multichill uploaded it. ;)
- 3.: If you look at the file size of this file, you will notice that File:USS Port Royal (CG 73) aground.jpg isn't a duplicate, but a scaled-down version. File:USS Port Royal (CG 73) aground.jpg should be tagged with {{duplicate}} for deletion. The new file is an improvement over the old one. I found a few too and tagged the old ones for deletion.
- -- User:Docu at 14:41, 27 October 2009 (UTC)
- I'd rather you didn't delete this one, I rotated it and cropped it to correct the tilt, I'm planning on nominating it for FP status. Sarcastic ShockwaveLover (talk) 08:57, 28 October 2009 (UTC)
- 3. yes, I listed it under "other versions" instead. Looking closer at it, it doesn't appear to be an exact duplicate or scaled down version. The few images that slip through the bot's check are some where the file was edited (and not even scaled down), e.g. this one and File:AAV Embarking.jpg. -- User:Docu at 12:33, 28 October 2009 (UTC), edited 18:22, 28 October 2009 (UTC)
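To illustrate why edited files can slip through, here is a sketch that assumes the duplicate check compares content hashes. The bot's actual check is not shown in this discussion, so this is only an assumption. A hash matches byte-identical files and nothing else, so a cropped, rotated or re-saved copy looks like a new file.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 digest of the file contents as a hex string."""
    return hashlib.sha1(data).hexdigest()


def is_exact_duplicate(candidate, existing_hashes):
    """True only if a byte-identical file is already known."""
    return sha1_hex(candidate) in existing_hashes


# Made-up byte strings standing in for image files.
original = b"\xff\xd8 hypothetical image bytes"
edited = original + b"\x00"  # any change at all, e.g. a crop or a re-save

known = {sha1_hex(original)}
print(is_exact_duplicate(original, known))  # the same bytes are found
print(is_exact_duplicate(edited, known))    # the edited copy is not
```

Catching edited or scaled versions would need some form of visual comparison, which is a different technique and is not sketched here.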
- The maximum length of file names that are being used seems to be 231 chars. While sometime in the distant future all filenames may have to be that long to be unique, I wonder if we couldn't [have] kept them shorter in the meantime. -- User:Docu at 17:34, 28 October 2009 (UTC) (inserted "have" on 04:42, 30 October 2009 (UTC))
- That would mean a mammoth renaming effort. It's already going to be huge just categorising them. That said, I think files like this one should be renamed. Also, perhaps we could put the categorisation/cleanup effort on the front page, much like that large German upload a few months back? It might help get some more people involved. I've categorised about 150 images so far using HotCat (thank God for that tool), but that's just a drop in the proverbial bucket. Sarcastic ShockwaveLover (talk) 11:58, 29 October 2009 (UTC)
- No, I don't think we should move them. The advantages of the current file names are that they are generally descriptive titles and it's the title the Navy published it with.
- The ^ldquo,/^rdquo,/^rsquo, could be fairly easy to fix (by an adminbot), there are approx. 3500 (Special:Search/^ldquo, OR ^rdquo, OR ^rsquo, prefix:File:US Navy 0). As we generally don't do cosmetics on file names, we could leave them that way though.
- The categorization part should be easier once my bot has created additional categories (see here). I probably should get to work on that.
- Besides these hardware based categories, there is still much to be done to create categories for specific events/operations etc. (e.g. Category:Vertical replenishment). It's fairly easy to build temporary categories from search results. One just needs to go through the category afterwards and remove a few false positives, most categories of FEMA officials were done that way. What generally threw it off were images of "A. on the phone with B." or "A. B. and C. (not pictured) attending Z.)", but they were easy to sort. If you want me to prepare you some temporary categories to review, I'd be glad to do so. -- User:Docu at 04:42, 30 October 2009 (UTC)
- Please and thank you! Sarcastic ShockwaveLover (talk) 12:10, 30 October 2009 (UTC)
- I found two incomplete uploads (the only so far):
- -- User:Docu at 18:43, 31 October 2009 (UTC)
- It might be a bug in preview/thumb, looks ok in full resolution. -- User:Docu at 18:46, 31 October 2009 (UTC)
- I can confirm that they really are incomplete (also in full resolution). But it's the same at the source; only the preview at the source is fine. I can't fix it. Cropping would be the only solution. --Slick (talk) 07:56, 15 September 2012 (UTC)
- BTW These days, Emijrpbot is fixing the date format on this batch (sample: [25]) -- User:Docu at 14:00, 9 January 2010 (UTC)
- That's very nice. Multichill (talk) 14:16, 9 January 2010 (UTC)
- Can I archive this?--Diaa abdelmoneim (talk) 08:30, 25 April 2010 (UTC)
- The question is if the categorization has to be cleaned up before archiving or not.
Both the Starr batch and the 1st Geograph upload still have quite a few things to clean up, but the initial upload is done and further files could be in a new request. The first of these two had been archived, the second one not.
The Navy news one still has some 4000 location categories, some of which should be merged and others removed (I merged approx. 100 of these into 30 one or two weeks ago). Avron is doing quite a lot of categorization on these, but personally I had lost interest sometime last year. -- User:Docu at 08:40, 25 April 2010 (UTC)
- Looks like most images get categorized by ship. I'm now adding temporary ship categories to help in this process. Multichill (talk) 08:53, 1 May 2010 (UTC)
- Some of the location categories were already in the form "Images from US Navy, Location Aboard <ship name>", with tons of spelling variations. Many of these were merged into "Aboard <ship name>" categories. Would be great if you'd help with that too. -- User:Docu at 09:14, 1 May 2010 (UTC)
- I spent some time on categorization. I first added a lot of temp ship categories and then moved images to real ship categories. I have now changed the upload bot to first try to find a real ship category, fall back to a temp ship category, or add a location category if no ship is found. I also nuked a lot of not so useful location categories (mainly seas). The aboard categories still have to be done. The same strategy could probably be applied. So the next step in big categorization is to either match a temp category with a real category (if it makes sense) or empty it out and nuke it (if the category doesn't make sense). It probably makes sense to start with the biggest temp categories. Who wants to help? Multichill (talk) 11:36, 23 May 2010 (UTC)
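The fallback order described in the comment above could be sketched like this. The temporary category name patterns are hypothetical (the real names used by the bot are not given here), and the set of existing categories stands in for a lookup on the wiki.

```python
def choose_category(ship, location, existing_categories):
    """Pick a category: real ship category, temp ship category, then location."""
    if ship:
        if ship in existing_categories:
            # A real ship category exists, so use it directly.
            return ship
        # Hypothetical name pattern for a temporary ship category.
        return "Images from US Navy, ship %s" % ship
    if location:
        # Hypothetical name pattern for a temporary location category.
        return "Images from US Navy, location %s" % location
    # Nothing recognised: leave the file for manual categorization.
    return None


existing = {"USS Kearsarge (LHD-3)"}
print(choose_category("USS Kearsarge (LHD-3)", "Atlantic Ocean", existing))
print(choose_category("USS Port Royal (CG 73)", "Pearl Harbor", existing))
print(choose_category(None, "Arabian Gulf", existing))
```

Checking the ship before the location matches the observation in this thread that ship based categories are more helpful than categories for seas.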
Assigned to | Progress | Bot name |
---|---|---|
Multichill | Finished the initial upload, now resyncing and categorization | BotMultichillT |
Metropolitan Museum of Art
This is one I've been working on for a while. The Metropolitan Museum of Art has a large collection of about 60,000 images of works in their online collection database, at a variety of resolutions. These have to be filtered carefully by hand because they have many photographs of 3D works and many non-PD works.
For most images that have a high-res version, it is easy to extract it by simply taking the URL of the thumbnail or regular image and changing "thumb" or "regular" to "zoom". This trick works for all images except those in the "The Libraries" collection (which only contains 50 images). Many images contain a color guide and false copyright statement that will need to be cropped at some point.
- Did u try to contact them? Maybe they'd like to help.--Diaa abdelmoneim (talk) 08:51, 3 July 2009 (UTC)
- User:unforth of Flickr also has an extensive collection (several hundreds) of MET pictures like this one. Teofilo (talk) 08:17, 12 September 2009 (UTC)
- Any updates? Multichill (talk) 22:55, 20 December 2009 (UTC)
- I forgot about this one... for this number of images license sorting by hand is infeasible, so I need to come up with some kind of automated scheme for this. I may also want to repeat the rip from the beginning, since there may be new images since I started this. Dcoetzee (talk) 12:30, 31 January 2010 (UTC)
Okay, on reflection I think the best way to handle this is to avoid trying to handle all the images in the database at one time - instead it's best to start with the "low-hanging fruit" of categories and/or searches that are known to be all PD. I'll take another look at this, and I'll also post here about how to extract high-resolution images efficiently - I don't believe it's necessary to stitch for the MET. Dcoetzee (talk) 19:23, 4 March 2010 (UTC)
- Okay, here's the skinny on how to download these images at "zoom" resolution:
- Visit the objectview page for the image, e.g. [26], from search results or browsing.
- Save the image URL of the preview thumbnail, for example http://www.metmuseum.org/Imageshare/ep/regular/ep71.11.R.jpg
- Click the "zoom" icon to go to the zoom view. If there is no such button then only the preview is available.
- View source. Search for "EQZoom(" in the page text. Take the first parameter to "new EQZoom", and substitute it for the filename at the end of the preview thumbnail URL above (it may be the same). Also, change "regular" to "zoom". This will grab the zoom version of the image, which may be either a TIFF or a JPEG. Zoom resolution varies widely between images and is high but not really high. In this example the URL is http://www.metmuseum.org/Imageshare/ep/zoom/ep71.11.R.jpg
- Dcoetzee (talk) 20:06, 4 March 2010 (UTC)
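The steps above can be sketched as a small helper. It assumes the first parameter of "new EQZoom(" is a quoted file name including its extension; the exact page markup is not shown in this discussion, so the regular expression and the sample page source are guesses.

```python
import re

# Assumption: the zoom page contains something like new EQZoom("ep71.11.R.jpg", ...)
EQZOOM = re.compile(r"new\s+EQZoom\(\s*[\"']([^\"']+)[\"']")


def zoom_url(preview_url, zoom_page_source):
    """Derive the zoom image URL from the preview URL and the zoom page source."""
    match = EQZOOM.search(zoom_page_source)
    if match is None:
        # No zoom viewer on the page: only the preview is available.
        return None
    # Split .../Imageshare/ep/regular/ep71.11.R.jpg into its directory parts.
    directory, _, _preview_name = preview_url.rpartition("/")
    parent, _, size = directory.rpartition("/")
    if size not in ("regular", "thumb"):
        return None
    # Swap the size folder for "zoom" and the file name for the EQZoom parameter.
    return "%s/zoom/%s" % (parent, match.group(1))


preview = "http://www.metmuseum.org/Imageshare/ep/regular/ep71.11.R.jpg"
page = 'var viewer = new EQZoom("ep71.11.R.jpg", 512);'  # hypothetical source
print(zoom_url(preview, page))
```

Returning None when the pattern is missing mirrors the note that some objects only have a preview.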
- I would rather do a partnership project with the MET instead of stripping clean their site. I met a lot of MET people (:P) in the US. I'll contact them and see what's possible. Multichill (talk) 16:50, 30 April 2010 (UTC)
- Bit busy and doing this from Europe isn't that practical. Maybe WMNY can step in here? Multichill (talk) 20:11, 5 January 2011 (UTC)
Assigned to | Progress | Bot name |
---|---|---|
User:Dcoetzee | License sorting | User:Dcoetzee |
WLANL
Description
Batch upload of all suitable images in http://www.flickr.com/groups/wikilovesart/. These images were created for http://www.wikilovesart.nl/
I'm using BotMultichillT (talk · contributions · deleted user contributions · recent activity · logs · block log · global contribs · SULinfo) for the uploads.
How the bot works
The source. The bot works like this:
- The bot loops over all the images in the Flickr group
- The bot checks if a suitable license is on the image
- The bot checks if an allowed tag is on the image and not a disallowed tag
- If it's a suitable image the bot will pull the description from Flinfo
- The description is improved based on the added tags by using a template trick and User:Multichill/WLANL/descriptions
- Categories are added using the same trick and User:Multichill/WLANL/museums
- The image is marked as reviewed by Multichill (talk)
- The categories are filtered using the functions in imagerecat.py
- The filename is derived from the username and the title assigned by the user
For more details see the source code.
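As a rough sketch of the filtering and naming steps listed above (not the actual source): the license labels, the tag names and the photo dictionary layout are all hypothetical, while the "WLANL - <username> - <title>.jpg" pattern is inferred from the uploaded file names.

```python
def is_suitable(photo, allowed_licenses, allowed_tags, disallowed_tags):
    """Apply the license check and the allowed/disallowed tag checks."""
    tags = set(photo["tags"])
    if photo["license"] not in allowed_licenses:
        return False
    if not tags & allowed_tags:
        # None of the allowed tags is present.
        return False
    if tags & disallowed_tags:
        # At least one disallowed tag is present.
        return False
    return True


def wlanl_filename(username, title):
    """Derive the file name from the Flickr username and the user's title."""
    # Stripping the title is a small addition to avoid names like 'Zielenprauw .jpg'.
    return "WLANL - %s - %s.jpg" % (username, title.strip())


# Hypothetical values, for illustration only.
licenses = {"cc-by", "cc-by-sa"}
allowed = {"museum-a", "museum-b"}
disallowed = {"no-upload"}

photo = {"license": "cc-by-sa", "tags": ["museum-a"], "user": "wendier", "title": "de Jonge"}
if is_suitable(photo, licenses, allowed, disallowed):
    print(wlanl_filename(photo["user"], photo["title"]))
```

Fetching the description from Flinfo and filtering categories with imagerecat.py are separate steps and are not covered by this sketch.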
Update
I received a lot of permissions, I used a modified (hacked) version of flickrripper to upload these images. Also received permission from the remaining museums. I'm uploading them now. I will do a flickrripper run over the whole pool at the end to catch images not tagged correctly. Multichill (talk) 12:48, 29 October 2009 (UTC)
Opinions
- Since the description is in Dutch I suggest using {{nl|}} for the descriptions.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Also {{WLANL}} should have a link to Wiki Loves Art project page or Flickr group.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- I suggest moving {{WLANL}} to the source parameter in the information template and adding |url= as a parameter in {{WLANL}} with the Flickr source link.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Flinfo doesn't add a description using the image's name so for http://www.flickr.com/photos/tainab/3675827235/ it doesn't add "Amandelbloesem, Vincent van Gogh (1890)" which would be a good description.--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Generally, I think uploading to Flickr isn't the best choice since it reduces image quality for non pro members like here http://www.flickr.com/photos/petertf/3638745721/ . Maybe offering Flickr Pro accounts to participants would dissolve this issue :-)--Diaa abdelmoneim (talk) 20:50, 19 August 2009 (UTC)
- Thanks for your input.
- Good point. I will change this
- {{WLANL}} sure needs improvement. This is just a quick hacked up version I created because otherwise I would have a red link. It's on my list
- I like it at the bottom because it doesn't clutter up {{Information}}
- This should be changed in Flinfo. I'll do a request at User talk:Flominator#Flinfo request
- Yes. Not having the originals sucks. I'll leave a note asking users to upload the original version if possible. It would be nice if these kind of projects would upload to Commons directly, but with the current tools that's kind of hard.
- Multichill (talk) 07:10, 20 August 2009 (UTC)
- All images use {{nl}} now. Flominator changed Flinfo to also pull the description from the title.
- I fired up the bot to upload. Issues I'm currently aware of:
- Some images get tagged as uncategorized, but these images are categorized.
- Some names are not that good
- I have some name collisions.
- Issues are not that serious and can be fixed later on. Multichill (talk) 14:01, 22 August 2009 (UTC)
There seems to be a problem with getting descriptions from titles. It adds the title in the description, but does so twice. This happened to multiple images:
- File:WLANL - wendier - de Jonge.jpg
- File:WLANL - wendier - Zielenprauw .jpg
- File:WLANL - wendier - Verenvitrine Suriname.jpg
- File:WLANL - wendier - Danseres Nias.jpg
- First part is from User:Multichill/WLANL/descriptions, second part is from the title. Multichill (talk) 16:53, 22 August 2009 (UTC)
Last update?
I received a couple of permissions. Don't expect to receive more of them. I just have to upload the images for which I received permission and do one final run to see if I missed anything. When I've done these two things, this batch upload is (finally) finished. Multichill (talk) 22:59, 20 December 2009 (UTC)
- Is the upload still in progress?--Diaa abdelmoneim (talk) 12:43, 26 March 2010 (UTC)
- I should probably do some last checking. Multichill (talk) 17:40, 26 March 2010 (UTC)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Multichill | Waiting for permission in OTRS | BotMultichillT |
Images from NYPL Digital Gallery
Assigned to | Progress | Bot name |
---|---|---|
Dcoetzee | Uploading | Dcoetzee |
It would be great if we batch uploaded PD images from NYPL Digital Gallery - http://digitalgallery.nypl.org/nypldigital/index.cfm NYPL Digital Gallery provides free and open access to over 685,000 images digitized from The New York Public Library's vast collections, including illuminated manuscripts, historical maps, vintage posters, rare prints, photographs and more. --Butko (talk) 14:45, 14 April 2009 (UTC)
- This collection turned out to be more promising than I supposed. They use LizardTech ContentServer to serve up their images, whose API is described here. Here's how you extract original TIFFs at full size: first use a "browse" query to obtain some XML including the image dimensions, like this one [27]. The folder name and image name can be obtained from URL of the zoom view. Then, use a getimage query like this one [28] to get the full size TIFF, specifying the dimensions from the previous query. Tada. Close examination shows no artifacts in the TIFF - these are original scans (internally, they are SID images). The first one I extracted was 3845 × 4947, about 60 MB as a TIFF, and 27 MB as a PNG (which you can preview here). They throttle you at 80 KB/s per transfer, but they do allow simultaneous transfers; any way you look at it though it would take a long time to fetch all the images we need. In light of the long download time per image, we're going to want to license filter before downloading. Dcoetzee (talk) 07:00, 15 April 2009 (UTC)
- Update: their complete collection of high-resolution images is browsable here. This can be used to easily obtain a list of folder-name pairs. I'll presently begin downloading. Dcoetzee (talk) 06:23, 16 April 2009 (UTC)
- Update: a better way to download these is to use the "getfile" function to get the raw .sid files, which are highly compressed (as in [29]) and then use LizardTech's command-line decoder to convert to TIFF ([30]). This is a quicker download and doesn't even require the dimensions. Dcoetzee (talk) 22:12, 16 April 2009 (UTC)
- I'm still in the middle of grabbing these. Enumerating IDs turned out to be trickier than I thought, because the folders are so large the browse interface times out on them. I ended up enumerating them instead using wildcard searches on single letters. Even just looking at the high res images, it's a lot of data. All told we're talking at least 100 GB in PNGs, and I'm pretty sure all of the high-resolution images are public domain works, although that will require further confirmation. It's an excellent source. Dcoetzee (talk) 06:59, 22 April 2009 (UTC)
- Update: I've enumerated about 65000 high-res images, and am in the process of downloading and converting them to PNGs, slow enough to not overwhelm their bandwidth. So far I've retrieved about 17250, occupying 323 GB. I'm also in the process of generating image descriptions of them based on NYPL metadata. I've created Category:New York Public Library Digital Gallery and plan to start uploading some of them soon. Dcoetzee (talk) 13:18, 16 May 2009 (UTC)
- Update: I've had contact from a representative of the NYPL, who has been very helpful in furnishing IDs and sanctioning the sharing of their public domain images. He gave me a list of about 40,000 stereographs which I can begin uploading immediately as soon as I put together a suitable fully-automated upload tool for the task. Dcoetzee (talk) 21:43, 25 June 2009 (UTC)
- Great work. I think this is good news and I'm very happy that someone over there is nice enough to help out.--Diaa abdelmoneim (talk) 20:46, 27 June 2009 (UTC)
- I have just begun automated uploading of this collection of 40,000 images, which are being placed along with existing images in Category:Images from the New York Public Library. Each image and its metadata is being downloaded from NYPL on-the-fly. Dcoetzee (talk) 03:11, 28 June 2009 (UTC)
- Update: I've estimated that at my present rate of upload, the current collection being uploaded (which actually contains 84000 images) will require about 7 weeks to upload, and will occupy about 500 GB. Dcoetzee (talk) 10:38, 28 June 2009 (UTC)
Nice upload, but I have a couple of points you should address:
- I don't like the two versions (png & jpg). Who cares about thumbnail size? Are you sure you want to upload two versions of every image? And why not upload the original tiffs for our restoration people?
- The files are uncategorized, please tag them with {{subst:unc}} right away.
- How are you going to get these files categorized? The images should probably all be in a subcategory of Category:Stereo cards and in one or more topic categories.
Other versions field seems to be broken: that was an easy fix. Multichill (talk) 11:30, 28 June 2009 (UTC)
Multichill (talk) 11:20, 28 June 2009 (UTC)
More questions:
- Do u mean by 84000 images, 42000 png and 42000 jpg?
- Why don't u merge the source template into the source field in the {{NYPL-image-full}} template?
- Does the bot auto categorize?
- What's the license of these images? why are they pd? I mean why is the original file before the scan pd?--Diaa abdelmoneim (talk) 12:08, 28 June 2009 (UTC)
- They're all PD due to age ({{PD-1923}}), according to the NYPL, although some of them don't list a specific date on their page (for many of them, you have to click through to the original source description to verify the age). There was one date field that I was not grabbing, which I am currently modifying it to grab. The bot does not do autocategories (I don't have that functionality, and I don't trust autocategories anyway), but I am now automatically marking them as uncategorized. Uploading the TIFFs doesn't make any sense, because they are derived from MrSID files and contain exactly the same data as the PNG files (there is no metadata).
- I also prefer not to have two versions, but thumbnail size is a very real concern, and unfortunately the software does not support JPEG thumbnails for PNG files. For example, a typical image of width 300 would be about 30 KB in size, which is prohibitive for modem users when many such images are used on a page. When the software adds a proper feature for this, they can all be deleted. Oh, and no, I mean 84000 PNG and 84000 JPEG.
- Should I be putting these all in the root category Category:Stereo cards? Dcoetzee (talk) 17:25, 28 June 2009 (UTC)
Categorizing
- I'm currently categorizing to the "Category:Robert N. Dennis collection of stereoscopic views"--Diaa abdelmoneim (talk) 17:22, 28 June 2009 (UTC)
- I can take care of categorizing by source collection automatically if you wish - please don't go to unnecessary manual effort. :-) Dcoetzee (talk) 17:26, 28 June 2009 (UTC)
- I started a bot that does this for the first 1600 images. It would be good if u do this with all your upcoming uploads. And you said 84000 images as a first batch. How many more batches are there? If it is possible for me to assist in the upload I would be glad to do so. Multichill also has a university connection or a very high speed connection; I'm sure if we ask him kindly he would help in the upload. If we work together we can upload this in a week. And please don't add the images in the stereo card root category. Just in the Category:Robert N. Dennis collection of stereoscopic views.--Diaa abdelmoneim (talk) 17:49, 28 June 2009 (UTC)
- Unfortunately that may not be an option, depending on how fast the NYPL wants their servers hit. I can inquire about it. I can deal at least with the Robert N. Dennis collection right now, but other subcollections will have to wait until I see how many collections there are and how meaningful they are. Dcoetzee (talk) 17:53, 28 June 2009 (UTC)
- So should I keep categorizing the first 1600 images of the batch? I don't want there to be a double category or something. How many images do u upload daily? And how big of a PD collection do they have?--Diaa abdelmoneim (talk) 18:00, 28 June 2009 (UTC)
- No, I'll go back for them a bit later this week, don't worry. :-) And I'll check for any existing category so double categories will not occur. I upload roughly one image every 50 seconds or 1728 per day (this includes both the PNG and JPEG). I have no idea how large their complete PD collection is, and I don't think they do yet either. Dcoetzee (talk) 18:08, 28 June 2009 (UTC)
- Could the bot also categorize to location? Like in File:Camping_out,_from_Robert_N._Dennis_collection_of_stereoscopic_views.jpg the location being Michigan? --Diaa abdelmoneim (talk) 18:31, 28 June 2009 (UTC)
- The past couple of files have been very low res. Is this a mistake by the bot or are these really low res?--Diaa abdelmoneim (talk) 18:34, 28 June 2009 (UTC)
- Some files do not have SID files available from the NYPL - for these I upload the highest available resolution, which is about 700px wide. And yes, I may be able to extract the rough location from the Original Source field. For now I must go away, but I'll be back later. :-) Dcoetzee (talk) 18:44, 28 June 2009 (UTC)
- So far there are about 230 files concerning the Union Pacific Railroad. Could u automatically add the category to them?--Diaa abdelmoneim (talk) 20:52, 28 June 2009 (UTC)
Looks like all images are now tagged with Category:Robert N. Dennis collection of stereoscopic views and {{Uncategorized}}. This seems like a good starting point to me, but I'd rather have a dedicated uncategorized template, just like with Barch and Fotothek. Could you please tag the images with {{Uncategorized-NYPL}}? I'll create the remaining structure later this week. This will prevent your uploads from flooding the regular tree and messages like this one. Multichill (talk) 20:07, 29 June 2009 (UTC)
- Ok. The basics are there. If everyone agrees, we only need to run a bot to change the old uploads (replace.py -lang:commons -family:commons -transcludes:NYPL-image-full -regex -nocase "\{\{Uncategorized\|" "{{Uncategorized-NYPL|"). Multichill (talk) 20:21, 29 June 2009 (UTC)
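As a rough illustration of what the replace.py call above does to the text of each page, the same substitution can be sketched in plain Python. This only mirrors the regular expression from the command; the `retag` helper is a made-up name, and none of the pywikipediabot machinery that fetches and saves pages is shown.

```python
import re

# Same pattern as in the replace.py call; -nocase corresponds to re.IGNORECASE.
OLD_TAG = re.compile(r"\{\{Uncategorized\|", re.IGNORECASE)
NEW_TAG = "{{Uncategorized-NYPL|"


def retag(wikitext: str) -> str:
    """Swap the generic uncategorized tag for the NYPL-specific one."""
    return OLD_TAG.sub(NEW_TAG, wikitext)


if __name__ == "__main__":
    # Hypothetical page text, for illustration only.
    page = "{{NYPL-image-full}}\n{{uncategorized|year=2009|month=June|day=29}}"
    print(retag(page))
```

Because the pattern ends in an escaped pipe, it only matches the tag when it carries parameters; a bare {{Uncategorized}} would be left untouched.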
Subject categories
Could u or Multichill create a bot that automatically adds a temporary subject category to each file, which would then be checked and, if correct, moved into a permanent category, like what has been done with Fotothek or BArchive? I'm not sure we should wait till the first 80,000 images are up and then start categorizing. BTW the NYPL has started receiving funds again from the city of New York so they might stop throttling downloads. It would be beneficial if u would inquire about that.--Diaa abdelmoneim (talk) 20:22, 30 June 2009 (UTC)
- I'd be happy to do this but haven't seen this type of thing before - is there an example or description of this process somewhere? Many of these can (if nothing else) be automatically categorized into the category for the city where they were taken. Dcoetzee (talk) 22:15, 30 June 2009 (UTC)
- Commons:Fotothek has categories assigned to their files based on the description. In "Original source: " the subject, or the place where the photo was taken, is mostly written at the end. Dividing the images into such categories would make further categorization easier. So for example File:Camping_out,_from_Robert_N._Dennis_collection_of_stereoscopic_views.jpg has "Original source: Robert N. Dennis collection of stereoscopic views. / United States. / States / Michigan / Stereoscopic views of Lake Superior Scenery." You could grab from there "Stereoscopic views of Lake Superior Scenery" cause it's after a slash and before a bracket. The category would later be reviewed and approved by a user. The temp category would be "NYPL_Stereoscopic views of Lake Superior Scenery". These would serve as preliminary categories.--Diaa abdelmoneim (talk) 22:23, 30 June 2009 (UTC)
- That makes sense - incidentally, is there an easy way to merge a category into a different existing category? Will CommonsDelinker do this? For many of these the corresponding existing category is obvious, and automated merging would be desirable. Dcoetzee (talk) 22:40, 30 June 2009 (UTC)
- I'm currently automatically subcategorizing the images and placing the categories in Category:Temporary categories for images from the New York Public Library. I'm also updating the uncategorized tags and Robert N. Dennis category on my initial uploads. Dcoetzee (talk) 01:55, 1 July 2009 (UTC)
- See User:CommonsDelinker/commands/documentation#Categorize uncategorized images. Multichill (talk) 19:37, 1 July 2009 (UTC)
- Is it possible to have a template like the one found on http://commons.wikimedia.org/wiki/Category:Images_from_the_Deutsche_Fotothek,_location_Dresden ? so that it makes categorizing easier?--Diaa abdelmoneim (talk) 09:37, 2 July 2009 (UTC)
- That sounds like a good idea. However, I'd want to be sure first that CommonsDelinker recognizes the new Uncategorized-NYPL... Dcoetzee (talk) 10:55, 2 July 2009 (UTC)
- Dcoetzee, you should probably only add Uncategorized-NYPL if you don't have a proper temp category. This way we can just use the normal category move bots to move images from a temp cat to a proper topic category. Multichill (talk) 11:01, 2 July 2009 (UTC)
- Dcoetzee, can we delete a temp category once it's cleaned out or do you expect more images to go into these categories? Multichill (talk) 17:03, 2 July 2009 (UTC)
- For stereoscopic view #9466, I made a gallery with all 80 versions, i.e. 10 (files) * 2 (file types) * 2 (stereoscopic) * 2 (it's "Mirror Lake"). I'm wondering if I should also put them into a specific category with 9466 in its name. -- User:Docu at 11:12, 25 April 2010 (UTC)
- Ideally we would also have at least one (non-stereoscopic) image selected from them. -- User:Docu at 11:18, 25 April 2010 (UTC)
- To make it possible to sort the files into topical categories without overwhelming them, I set the sortkey in the template. They now appear after other images, e.g. in Category:Mirror Lake (California). -- User:Docu at 11:12, 25 April 2010 (UTC)
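The extraction described in this thread (take the last slash-separated part of the "Original source" field and prefix it with "NYPL_") can be sketched as follows. This is a minimal sketch under stated assumptions, not the code the bot actually ran: the function names are hypothetical, and reading the second-to-last segment as a location is only a guess that happens to fit the Michigan example.

```python
def split_source(original_source: str) -> list:
    """Split an 'Original source' value on slashes, dropping empty parts."""
    parts = [part.strip() for part in original_source.split("/")]
    return [part for part in parts if part]


def temp_category(original_source: str, prefix: str = "NYPL_") -> str:
    """Build a temporary category name from the last path segment."""
    parts = split_source(original_source)
    if not parts:
        raise ValueError("empty 'Original source' field")
    # Keep only the text before any bracketed remark, then trim the final period.
    subject = parts[-1].split("(")[0].split("[")[0].strip().rstrip(".")
    return prefix + subject


def location_guess(original_source: str):
    """Assumption: the second-to-last segment names the location, if present."""
    parts = split_source(original_source)
    return parts[-2].rstrip(".") if len(parts) >= 2 else None


if __name__ == "__main__":
    source = ("Robert N. Dennis collection of stereoscopic views. / United States. "
              "/ States / Michigan / Stereoscopic views of Lake Superior Scenery.")
    print(temp_category(source))
    print(location_guess(source))
```

Categories produced this way are only preliminary; as noted above, they would still need to be reviewed by a user before being mapped to permanent categories.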
NYPL and PD-Scan
Dcoetzee, I'm a little unhappy with the way our images are tagged as PD-Scan only. Many of the images don't have their original publication date, and someone who looks at the picture can't be sure if it's PD as there is no clear sign of it. For example File:Arch_on_St._George_Avenue,_from_Robert_N._Dennis_collection_of_stereoscopic_views.png has only "Digital item published 5-5-2005; updated 2-12-2009." which doesn't assert PD-old. There is an NYPL page about the collection which may hold clues about why the collection is PD. I think after we clear up why the collection is PD we should create a template stating why it is PD, which goes alongside PD-Scan. --Diaa abdelmoneim (talk) 10:58, 4 July 2009 (UTC)
- I agree, the NYPL image metadata does not generally contain sufficient information to clearly establish their copyright status. I have only the word of the NYPL that these are public domain, and they may not be as conservative in evaluating copyright status as we are. I don't really want to filter them before upload though, because I'm fairly confident most of these actually are PD and are just missing the metadata to prove it. There are two things I can do here: I can fetch the "Imprint" date from the collection, and I can tag any images that do not have a clear indicator of copyright status for human review with Category:PD files for review. This could prove to be rather difficult though, because dates are specified in a variety of strange formats that are difficult to parse. Dcoetzee (talk) 22:15, 4 July 2009 (UTC)
- Or just an OTRS confirmation, or a rights information page on their site saying "no known restrictions". Don't tag anything please. I'm sure all images are PD but only need a legal confirmation.--Diaa abdelmoneim (talk) 22:19, 4 July 2009 (UTC)
- As far as I know OTRS is inappropriate for public domain images - that's for the copyright holder confirming that they've released a work, and NYPL is not the copyright holder. Their copyright status will need to be confirmed based on the available information, and PD review has already agreed to help me with this kind of thing in the past. As for "no known restrictions", every one of these image description pages says that in its HTML metadata - their evaluation can't be trusted. Dcoetzee (talk) 23:30, 4 July 2009 (UTC)
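One conservative way to cope with the irregular date formats mentioned above is to avoid full parsing: pull out every four-digit year and send anything doubtful to human review. The sketch below assumes its input is the text of the "Imprint" field and uses the 1923 cutoff implied by {{PD-1923}}; the function name and the rule itself are illustrative assumptions, not what the bot actually did.

```python
import re

# Four-digit years from 1500 to 2099, not embedded in a longer run of digits.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def needs_review(imprint: str, cutoff: int = 1923) -> bool:
    """Return True when the text does not clearly show a pre-cutoff date."""
    years = [int(match) for match in YEAR.findall(imprint or "")]
    if not years:
        return True  # no recognisable year at all
    # Use the latest year mentioned, so ranges are judged by their end.
    return max(years) >= cutoff


if __name__ == "__main__":
    for text in ("[ca. 1870-1885]", "c1922", "n.d.",
                 "Digital item published 5-5-2005; updated 2-12-2009."):
        print(repr(text), needs_review(text))
```

Anything flagged this way would go to Category:PD files for review rather than being filtered out before upload.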
Status
What's the status of this upload? Multichill (talk) 12:29, 17 September 2009 (UTC)
- Sorry for the delay. I'm working on getting a Toolserver account so I can continue the upload with my existing tools and Mono, or with a rewrite of the tools. It should be able to pick up right where I left off. I don't have enough bandwidth at home to do the upload. Dcoetzee (talk) 08:48, 25 September 2009 (UTC)
- Any update ?--Diaa abdelmoneim (talk) 12:42, 11 December 2009 (UTC)
- The NYPL upgraded their software and it's no longer possible with the new default settings to download the images in the same manner in which I originally did, so I've been forced to suspend progress on this. I asked Josh from NYPL about this, and on Jan 15 he said: "No progress on that front, but I might actually be able to open another door within the next month or so (might just be able to get you direct access to a batch of jpg full-res derivatives)...will follow up with details soon..." Dcoetzee (talk) 12:32, 31 January 2010 (UTC)
- I just checked and at some point in the last few months the NYPL listened and re-enabled the SID interface, allowing this upload to continue, so I'm starting it back up. Dcoetzee (talk) 00:02, 16 April 2010 (UTC)
- Finally!!! Please change the status to uploading when u do. =) Congrats.--Diaa abdelmoneim (talk) 07:54, 16 April 2010 (UTC)
- Done :-) I can only upload at a fast rate when I'm at school since my upload bandwidth at home sucks - but I'm there pretty often and my updated tool uploads at a rate of about 5-6 image pairs per minute there. Dcoetzee (talk) 10:12, 16 April 2010 (UTC)
- Another small update on this - it turns out I've only been uploading the fronts of these cards, and not the back. This is probably a good thing, since the backs are usually just blank with a bit of writing, and not nearly as useful for educational purposes. Because of this, there are actually only 42,000 images, not 84,000, in the stereographic collection. Dcoetzee (talk) 00:42, 17 April 2010 (UTC)
- Still working on this upload. I'm bandwidth-limited at the moment so it's taking quite a while. It's probably more than half done. Dcoetzee (talk) 02:07, 9 November 2010 (UTC)
- I've now finished all of the stereographic views from the New York Public Library that were supplied by Josh Greenberg. I will contact Josh to see what other images he has to offer. I'm open at this point to feedback about how I can improve the process (besides obviously uploading images more quickly - I think this is a good time to port the tool to Toolserver). I'm also considering uploading only high-quality JPEGs, instead of both a JPEG and a PNG version. Let me know what you think. Dcoetzee (talk) 07:19, 14 November 2010 (UTC)
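The upload rates quoted in this thread can be turned into a rough duration estimate. The sketch below uses only figures from the discussion (one file every 50 seconds, 84,000 PNG plus 84,000 JPEG files as first planned, later 42,000 image pairs at about 5-6 pairs per minute) and assumes uninterrupted, round-the-clock uploading, which the thread makes clear was not the case in practice.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_per_day(seconds_per_file: float) -> float:
    """Number of uploads per day at a constant rate."""
    return SECONDS_PER_DAY / seconds_per_file


def days_needed(total_files: int, seconds_per_file: float) -> float:
    """Days of continuous uploading needed for the whole batch."""
    return total_files / files_per_day(seconds_per_file)


if __name__ == "__main__":
    # Early rate: one file every 50 seconds.
    print(files_per_day(50))                           # 1728 files per day
    print(round(days_needed(84000 + 84000, 50), 1))    # about 97 days
    # Later rate: 5.5 image pairs per minute (midpoint of "5-6"), 42,000 pairs.
    print(round(days_needed(42000, 60 / 5.5), 1))      # about 5 days
```

The gap between these idealised figures and the roughly seven months the second phase actually took reflects the bandwidth limits described above.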
Hello. What is the status? InverseHypercube (talk) 05:51, 2 April 2011 (UTC)
This project has probably seen its better days, but I found this through searching for NYPL images. I tried all the parameters given in the LizardTech Express 8 manual with this image http://digitalgallery.nypl.org/nypldigital/id?1527362, but I simply am not able to download it. Even doing what the manual says on downloading the file itself (getitem?cat=*&item=*.sid) or with the parameters of width and height, I only get "Invalid dimensions".
I'm especially interested in all the New York real estate maps, including the famous Sanborn Maps, a collection with unprecedented detail of buildings throughout the years.
The system NYPL uses is almost misanthropic. If the data is free and open, then it shouldn't be behind artificially restrictive systems. And then they put a fee on their own file-acquisition service.
I'd be glad to help, but unfortunately I have no idea how to code a bot, and I have very few coding skills. As to file formats, I'm partial to retaining the highest possible quality. Barring TIFFs, a lossless PNG would be the next choice. ~ Nelg (talk) 23:25, 31 March 2013 (UTC)
Minerals from various sources on mindat.org
Besides Rob Lavinsky, other uploaders to mindat.org have either published their work under a free and usable license or have granted OTRS permission for their images.
This upload will mostly use the same procedures, categories and file names as the upload of Rob Lavinsky's pictures from mindat.org.
Progress of the request (failed, uploading, coding, done)
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Reinhard Kraasch | in progress | RKBot | Category:Files by Leon Hupperichs Category:Files by Christian Rewitzer from mindat |
Quantity structure
- 200 files by Christian Rewitzer
- 702 files by Leon Hupperichs
(several of them have already been uploaded)
Problems
None so far.
Test upload
Comments
- Hi Reinhard, wonderful pictures ;-), but I think it's better to have the OTRS permission in the "Permission" field, and the mineral category (here: Category:Sonoraite) is missing. -- Ra'ike T C 22:31, 26 March 2011 (UTC)
- I used the same script as with the Lavinsky upload - so the description page looks almost the same as with these images, e.g.: File:Spodumene-18945.jpg. The missing mineral categories can be fixed by the bot (as with the Lavinsky upload), but I guess it's easier to do it by hand, since it will probably be only very few categories. --Reinhard Kraasch (talk) 20:36, 6 April 2011 (UTC)
Details
Assigned to | Job | Status | Comments |
---|---|---|---|
Reinhard Kraasch | Image (and description) download from mindat.org | Status: Done 12:24, 15 March 2010 (UTC) | All mindat.org images have been downloaded |
Reinhard Kraasch | Generate image descriptions, autotranslate locality info | Status: Done 19:45, 21 March 2010 (UTC) | |
Reinhard Kraasch | Generate and autotranslate category info | Status: Done 19:45, 21 March 2010 (UTC) | |
Reinhard Kraasch | Test upload | Status: Done 21:38, 26 March 2011 (UTC) | (2 images) |
Various | Discussion of test upload | Status: Done 19:43, 10 April 2011 (UTC) | |
Reinhard Kraasch | Actual image upload | Status: Done 21:35, 10 April 2011 (UTC) | |
Reinhard Kraasch | Identify duplicates | Status: Done 19:39, 12 April 2011 (UTC) | |
Reinhard Kraasch | Generate missing categories | Status: Done 21:17, 12 April 2011 (UTC) |
Results
- Image duplicates: User:Reinhard Kraasch/Duplicate minerals
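The page does not say how the duplicates were identified. One common approach, shown here purely as an assumption and not as RKBot's method, is to group the downloaded files by a hash of their contents, so that byte-identical images end up in the same group. The function names and the folder layout are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: str) -> list:
    """Return lists of file names in `folder` that share identical contents."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[sha1_of(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]
```

This only catches exact copies; re-encoded or resized versions of the same photograph would need a different comparison.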
Anefo
A set of 140.000 Dutch press photos from the period 1959-1989, made available under a CC-BY-SA license by the Nationaal Archief, at [31]. The upload contains a (Dutch) description, date and licensing information, and some suggested categories, which are however in Dutch and often not very useful anyway. All images are put in Category:Images from Anefo and Category:Uncategorized images from Anefo.
Opinions
- Please read Commons:Guide to batch uploading
- Change the naming of the files to <title> Anefo <id> and don't make the titles so short. Current names mess up sorting (everything at A)
- Use {{nl}} (Update: you should use it like {{nl|1=}})
- Include deep links to the originals (id should resolve to a location)
- Use the attribution field of {{cc-by-sa-3.0-nl}} to include the correct attribution
- You should probably make a Partnership template
- Some statistics on the subjects would be nice for easy mapping to real categories. You should probably not mix subject and coverage. Doing 140.000 files by hand is too much. You do that by making a list of top subjects and a list of top coverage. You ask users to map these to real categories (where possible) and use that at upload. That's a lot less work than doing everything by hand.
- Please apply these corrections to the already uploaded files Multichill (talk) 12:32, 17 June 2012 (UTC)
- I don't think file naming and sorting is much of an issue. Other batch uploads are done in a similar way.
- Please use {{nl|1=}} rather than {{nl}}.
- I mapped some of the suggested topics at Category:Anefo temporary redirects. The redirect bot would eventually move them to the correct categories. Once the upload is finished, these can be deleted. -- Docu at 13:17, 17 June 2012 (UTC)
- Taking File:Anefo 910-6881 Blauw-Wit tegen.jpg as an example.
- The identifier field contains "naa:2f7593fe-b9a1-11df-ba8e-03c82bd9ba46:a99dfe84-d0b4-102d-bcf8-003048976d84". If you chop off the first part you get the UUID and you use {{Nationaal Archief-source}} to make a deeplink like http://proxy.handle.net/10648/a99dfe84-d0b4-102d-bcf8-003048976d84
- The author (Pot, Harry / Anefo) is missing. Multichill (talk) 16:48, 17 June 2012 (UTC)
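The deep-link construction described above (drop the leading part of the identifier, keep the UUID, append it to the handle URL) can be sketched like this. The handle prefix 10648 is taken from the example URL in the comment; the function name is hypothetical, and on Commons the link would be produced through {{Nationaal Archief-source}} rather than assembled by hand.

```python
import uuid

HANDLE_BASE = "http://proxy.handle.net/10648/"


def anefo_deeplink(identifier: str) -> str:
    """Turn an identifier like 'naa:<uuid>:<uuid>' into a handle URL."""
    last_part = identifier.rsplit(":", 1)[-1]
    # uuid.UUID raises ValueError if the last part is not a well-formed UUID.
    item_uuid = uuid.UUID(last_part)
    return HANDLE_BASE + str(item_uuid)


if __name__ == "__main__":
    ident = ("naa:2f7593fe-b9a1-11df-ba8e-03c82bd9ba46:"
             "a99dfe84-d0b4-102d-bcf8-003048976d84")
    print(anefo_deeplink(ident))
```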
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Andre Engels | running, about 1500 done (17-06-2012) | Robbot | Category:Uncategorized images from Anefo |
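The subject statistics suggested in the opinions above could be gathered with a simple frequency count, keeping subject and coverage separate so that each list can be mapped to real categories on its own. This is a minimal sketch: the record layout (a dict with "subject" and "coverage" lists) and the sample keywords are made up for illustration and do not reflect the actual Nationaal Archief metadata format.

```python
from collections import Counter


def top_values(records, field, n=10):
    """Return the n most frequent values of `field` across all records."""
    counts = Counter()
    for record in records:
        for value in record.get(field, []):
            counts[value.strip()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    records = [
        {"subject": ["voetbal", "sport"], "coverage": ["Amsterdam"]},
        {"subject": ["sport"], "coverage": ["Rotterdam"]},
        {"subject": ["politiek"], "coverage": ["Amsterdam"]},
    ]
    print(top_values(records, "subject", 2))
    print(top_values(records, "coverage", 2))
```

The resulting lists are what users would be asked to map to existing categories before the bulk of the upload runs.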
Batch uploads on hold
Past batch uploads
Date | Name (Subpage) | Description | Images | Scripter | Uploader | Script | Category | File naming |
---|---|---|---|---|---|---|---|---|
10,000 paintings from Directmedia | 10,000 public domain images digitized by the Yorck project and contributed to commons | 10,000 | Eloquence | File Upload Bot (Eloquence) | PD-Art (Yorck Project) | |||
Picswiss project | Roland Zumbühl agreed to release his images, depicting various areas and subjects in Switzerland, under the GFDL. | 5,000 of 13,000 | Dake | Dake | Images from Picswiss | |||
Bundesarchiv | From the German Federal Archive, the images depict Germany between the 19th and 20th century including valuable photographs of the Nazi era and World War II. | 100,000 | Duesentrieb | BArchBot | Information fetch | Images from the German Federal Archive | Bundesarchiv <id>, <desc> | |
Starr images | Images of plants of Hawaii | 60,000 | Multichill | Multichill | Images from Forest & Kim Starr | Starr <date>-number <taxon/desc> | ||
Wenceslas Hollar Digital Collection | A collection of 2700 high resolution images of engravings of Wenceslas Hollar, about 90% of his life works | 2,700 | Dcoetzee | Dcoetzee | University of Toronto Wenceslas Hollar Digital Collection | |||
National Portrait Gallery | Various portraits of famous people between the 16th and 19th century. | 3,000 | Dcoetzee | Dcoetzee | National Portrait Gallery, London | |||
Deutsche Fotothek | Images from Deutsche Fotothek mainly about east Germany between the 19th and 20th century including the Bombardment of Dresden and other events. Only 25% of the images have been uploaded till now. | 62,011 of 250,000 | Multichill | FotothekBot | Tools used | Images from the Deutsche Fotothek | Fotothek <id> <desc> | |
Berger Collection | A collection of high resolution images of paintings and other works from the Berger Collection, depicting British art, culture and people. | 140 | Dcoetzee | Dcoetzee | Berger Collection | |||
Great Images in NASA | Images from Great Images in NASA | 1,400 | TheDJ | Multichill | Great Images in NASA | |||
Alaska-Yukon-Pacific Exposition of 1909 | High-resolution scans of documents from the Alaska-Yukon-Pacific Exposition found here. | 700 | Dcoetzee | Dcoetzee | Alaska-Yukon-Pacific Exposition | |||
Commanster | Pictures of plants, animals, birds and insects of Commanster, Belgium by James Lindsey | 6,000 | Sarefo | Sarefo | Pictures by James Lindsey | |||
WLANL | Images from Wiki Loves Art Netherlands imported from the Flickr group pool, depicting the Netherlands and its different museums. | 4,000 | Multichill | BotMultichillT | Images from Wiki Loves Art Netherlands | WLANL - <team> - <desc> | ||
FEMA site | All the images found in the US Federal Emergency Management Agency Disaster Photo Library were copied to Commons, depicting US environmental disasters and emergency actions. | 20,000 | Multichill | BotMultichillT | script | PD US FEMA | FEMA - <id> - Photograph by <photographer> taken on <date> in <location> | |
AntWeb images | All the images found on http://www.antweb.org/ depicting different species of ants. | 32,000 | Dave Thau | File Upload Bot (AntWeb) | Images from AntWeb | <desc> <specimenID> profile <viewnumber> | ||
Images of erosion | All the images found on http://picasaweb.google.com/VolkerPrasuhn depicting erosion. | 700 | Leyo | manual | Images by Volker Prasuhn | |||
livepict.com | All the images found on http://livepict.com/ depicting bands. | 1000 | Justass | Justass | Images from LivePict | |||
Tropenmuseum | A partnership with Tropenmuseum | 40,000 | Multichill | KITbot | svn | Images from the Tropenmuseum | COLLECTIE TROPENMUSEUM <desc> TMnr <id> | |
Randolph Caldecott | All pages in The complete collection of pictures & songs / by Randolph Caldecott | 510 | Diaa abdelmoneim | Dudubot | upload.py | The complete collection of pictures & songs by Randolph Caldecott | Randolph Caldecott collection-page <page> | |
Rob Lavinsky | Mineral images from Rob Lavinsky on mindat.org | 34,917 | Reinhard Kraasch | RKBot | upload.py + pyodbc | Images by Rob Lavinsky | <mineral1>[-<mineral2>[<mineral3>]]-<mindatID> | |
Rob Lavinsky | Mineral images from Rob Lavinsky on irocks.com | 20,582 | Reinhard Kraasch | RKBot | upload.py + pyodbc | Images by Rob Lavinsky | <mineral1>[-<mineral2>[<mineral3>]]-<irocks file name> | |
Bibliothèque Nationale de France | Books provided by the Bibliothèque Nationale de France (French National Library) as part of a partnership with Wikimédia France | 1,413 | Seb35 (with help from Plyd and Jean-Fred) | BnF import, operated by Tim Starling | svn | Books provided by the BNF | <Author> - <Title>.djvu | |
Erling Mandelmann | Portraits of notable people donated from Erling Mandelmann | 581 | Diaa abdelmoneim | Dudubot | Photographs by Erling Mandelmann | <Title> - <Author> | ||
Travelers in the Middle East Archive | Historical images from books about the Middle East from Travelers in the Middle East Archive, provided by Rice University | 2,277 | Diaa abdelmoneim | Dudubot | Images from the Travelers in the Middle East Archive | "<Title>" (<Year>) - TIMEA | ||
Fonds Eugène Trutat | Photographs by famous French photographer Eugène Trutat, donated by the City Archives of Toulouse as part of a partnership with Wikimédia France | 200 | Jean-Frédéric | TrutatBot | GitHub | Fonds Trutat - Archives municipales de Toulouse | <Title> (<Year>) - <Id> - Fonds Trutat | |
Nordiska Museet | A collection of early photographs, donated by Nordiska Museet as part of a collaboration with Wikimedia Sverige. | 1,000 | Prolineserver | NordiskaMuseetBot | Toolserver | Images from Nordiska museet | <Title> - Nordiska Museet - <Id>.jpg | |
Web Gallery of Art | Large collection of well documented artworks. Uploaded ~15k new files and synchronization metadata for ~6k already uploaded files | 21,700 | Jarekt | JarektUploadBot | UploadWGA.py FixWGAMetadataInfo.py FixWGAMetadataArt.py | Images from Web Gallery of Art | <Author> - <Title> - WGA<ID>.jpg | |
Commons:Chris's Acorns | Large collection of Acorn computer hardware and peripherals from Chris's Acorns | 1700 | Smallman12q | Smallbot | C#4 w/ LINQ and MSHTML interop | Chris's Acorns | just filename...no format | |
Walters Art Museum | Collection of 3D and 2D artworks from around the world | 19,000 | Kaldari | File Upload Bot (Kaldari) | modified botclasses.php | Media contributed by the Walters Art Museum | <Author> - <Title> - Walters <ID> - <View>.jpg | |
Commons:Bible Illustrations | Bible illustrations | 2993 | Smallman12q | OrophinBot | VBScript, XHR, XMLDOM, MSHTML, COM | Media contributed by the Sweet Publishing | <name> <chapter>-<section> (Bible Illustrations by Sweet Media).jpg | |
Flora Batava | Illustrations of all plants in the Netherlands | 1582 | Rillke | FloraUploadR | own implementation using VB6/COM/C++ | Files uploaded from Flora Batava by FloraUploadR | <latin plant name> — Flora Batava — Volume v<number>.jpg | |
Commons:Bots/Requests/Smallbot 2 | Oregon Historical County Records Guide | 4273 | Smallman12q | Smallbot | VBScript, XHR, XMLDOM, MSHTML, COM | Category:Images_from_Oregon_Historical_County_Records_Guide | <name> (<Countyname> County, Oregon scenic images) (<id>).jpg | |
The World's Columbian Exposition | PD-Photos of the The World's Columbian Exposition | 115 | Rillke | RillkeBot | own implementation using VB6/COM/C++ | World Columbian Exposition taken by Press Chicago Photo-Gravure Co. | <caption> — Official Views Of The World's Columbian Exposition — <file number>.jpg | |
Defenselink | Defense.gov News Photos | 14572 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Defense.gov News Photos to check | Defense.gov News Photo <VRIN>[ - description].jpg | |
U.S. Army Map Service | Maps of India and Pakistan from the U.S. Army Map Service | 304 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | India maps by U.S. Army Map Service | Map India and Pakistan 1-250,000 Tile <tile name>.jpg | |
Defense.gov Photo Essays | Defense.gov Photo Essays | 23106 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:Defense.gov photo essays to check | Defense.gov photo essay <VRIN>.jpg | |
Navy SEAL pics and vids | Navy SEAL pics and vids | 682 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:United States Navy SEALs Images to check | United States Navy SEALs <NUMBER>.jpg | |
Beaverton, Oregon Historical Photo Gallery | Beaverton, Oregon Historical Photo Gallery | 305 | Smallman12q | Smallbot | VBScript, XHR, XMLDOM, MSHTML, COM | Category:Beaverton, Oregon Historical Photo Gallery | <name> (Beaverton, Oregon Historical Photo Gallery) (<number>).jpg | |
ForestWander | Mostly nature photos from West Virginia | 2600 | Rillke | Forestwander Nature Photography upload bot | own implementation using VB6/COM/C++ | Category:Bot-uploaded files from Forestwander Nature Photography | <name> - [West Virginia|Virginia] - ForestWander.jpg | |
Navy SEAL pics and vids | U.S. Navy SEALs pictures and videos | 681 pics, 56 vids | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:United States Navy SEALs Images to check Category:United States Navy SEALs Videos to check | images: United States Navy SEALs <number>.jpg, videos: different | |
Umair Zafar fashion shoot | Umair Zafar fashion shoot | 91 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:Images from Umair Zafar fashion shoot to check | different | |
New Orleans Bee | New Orleans Bee | 136667 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:The_New_Orleans_Bee_by_year | The New Orleans Bee <year> <month> <number>.pdf | |
Brooklyn Museum | Brooklyn Museum | 3629 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:African art in the Brooklyn Museum | Brooklyn Museum <ID> <SHORT DESC>.jpg | |
U.S. Marines Corps | U.S. Marines Corps | 77288 | Slick | Slick-o-bot | pywikipediabot and some bash scripts | Category:Marines.mil_images_to_check | USMC-<NUMBER>.jpg or USMC-<VRIN>.jpg | |
Photographic History of the Civil War | Photographic History of the Civil War | 3668 | Mattwj2002, Slick | Mattwj2002, Slick-o-bot | pywikipediabot and some bash scripts | Category:The_Photographic_History_of_The_Civil_War | The Photographic History of The Civil War Volume <VOLUME> Page <NUMBER>.jpg |
- Commons:Batch uploading/Web Gallery of Art
- Commons:Batch uploading/Monument lists
- Commons:Batch uploading/Flickr Fotostream of NOAA Photo Library
Failed
- Commons:Batch uploading/Ekta Media, not done
- Flickr Imre Solt collection (April 2009) has been denied because the UAE has no FOP (freedom of panorama) provision, which makes most of the images copyvios.
- Commons:Batch uploading/Modern Egypt Digital Archive (April 2009) Egyptian copyright law has no special term limit for photographs, only that a work becomes PD 50 years after the author's death. Not enough images for a batch.
- Commons:Batch uploading/Images from LIFE (June 2009) Most of the images didn't have a clear copyright label.
- Commons:Batch uploading/Gathering the Jewels (September 2009) Images don't appear to be free.
- Commons:Batch uploading/Staffordshire Gold Hoard (en.Wikipedia front page news) (September 2009) the images were quickly changed from Share Alike to Non-commercial on the same day.
- Commons:Batch uploading/World War II in Africa from Flickr user gbaku (September 2009) User wasn't author of the album, only purchased the images.
- Commons:Batch uploading/Kartrummet (February 2010) Website did not show interest for partnership, license verifications not possible.
- Commons:Batch uploading/Dermnet (March 2010) Owner of the website doesn't own the images.
- EVDeportes (July 2010) Already uploaded on commons.
- Commons:Batch uploading/Sir William MacArthur Botanical Images
- Commons:Batch uploading/Spanking Art Wiki
- Commons:Batch uploading/Media of "banco de imágenes" of Ministry of Education of Spain cc-by-nc
- Commons:Batch uploading/WWII unclear situation of authorship
- Commons:Batch uploading/beeldengeluidwiki unclear situation of authorship