r107985 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r107985
Date:02:13, 4 January 2012
Author:diederik
Status:deferred (Comments)
Tags:
Comment:
UDP filter for the GLAM NARA project.
Modified paths:
  • /trunk/udplog/filters/glam.py (added)

Diff

Index: trunk/udplog/filters/glam.py
@@ -0,0 +1,33 @@
+import re
+import sys
+nara = re.compile('_(NARA|National_Archives)_.*\.(jpg|tif)')
+
+'''
+This is a simple GLAM UDP filter that will filter images that are part of
+the National Archives and Records Administration project.
+For questions, email Diederik van Liere at dvanliere@wikimedia.org
+
+'''
+
+
+def test():
+    t1 = 'http://commons.wikimedia.org/wiki/File:German_prisoners_in_a_French_prison_camp._French_Pictorial_Service.,_1917_-_1919_-_NARA_-_533724.tif'
+    t2 = 'http://commons.wikimedia.org/wiki/File:Aerial_view_of_ruins_of_Vaux,_France,_1918,_ca._03-1918_-_ca._11-1918_-_NARA_-_512862.tif'
+    t3 = 'http://commons.wikimedia.org/wiki/File:The_patient%27s_skin_is_burned_in_a_pattern_corresponding_to_the_dark_portions_of_a_kimono_-_NARA_-_519686.jpg'
+    t4 = 'http://commons.wikimedia.org/wiki/File:Ansel_Adams_-_National_Archives_79-AAB-02.jpg'
+    t5 = 'http://commons.wikimedia.org/wiki/File:THE_CENTRAL_POLICE_CONTROL_STATION,_MANNED_24_HOURS_A_DAY_CONTROLS_ALL_TRAFFIC_LIGHTS,_RECEIVES_REMOTE_TV_INPUTS_FROM..._-_NARA_-_551905.tif'
+
+    pics = [t1,t2,t3,t4,t5]
+    for pic in pics:
+        match = re.search(nara, pic)
+
+def main():
+    while True:
+        line = sys.stdin.readline()
+        match = re.search(nara, pic)
+        if match:
+            sys.stdout.write(line)
+
+if __name__ == '__main__':
+    #test()
+    main()
\ No newline at end of file
Property changes on: trunk/udplog/filters/glam.py
___________________________________________________________________
Added: svn:eol-style
   + native
Added: svn:mime-type
   + text/plain

Comments

Comment by Multichill   20:33, 4 January 2012

That's fast! The regular expression can be a bit more precise if you like: '_-_NARA_-_\d+\.(jpg|tif)'
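To make the trade-off concrete, here is a small sketch that runs the committed pattern and the stricter one suggested above against two of the URLs from test(). The names broad, strict, samples and matches are made up for this illustration. Note that the stricter pattern no longer matches the "National_Archives" file name (t4), which may or may not be what the filter wants.

```python
import re

# Pattern from the committed filter (written here as a raw string).
broad = re.compile(r'_(NARA|National_Archives)_.*\.(jpg|tif)')
# Stricter pattern suggested in the review comment.
strict = re.compile(r'_-_NARA_-_\d+\.(jpg|tif)')

# Two of the URLs from test() in the diff (t2 and t4).
samples = [
    'http://commons.wikimedia.org/wiki/File:Aerial_view_of_ruins_of_Vaux,_France,_1918,_ca._03-1918_-_ca._11-1918_-_NARA_-_512862.tif',
    'http://commons.wikimedia.org/wiki/File:Ansel_Adams_-_National_Archives_79-AAB-02.jpg',
]


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None


# One (broad, strict) pair per sample URL.
results = [(matches(broad, s), matches(strict, s)) for s in samples]
for url, (b, s) in zip(samples, results):
    print(b, s, url.rsplit('/', 1)[-1])
```

Both patterns accept the "_-_NARA_-_512862.tif" name; only the committed pattern accepts "Ansel_Adams_-_National_Archives_79-AAB-02.jpg".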

The links in def test() are not actual images, but image pages. Is this intentional? I would expect something like 'http://upload.wikimedia.org/wikipedia/commons/f/f8/German_prisoners_in_a_French_prison_camp._French_Pictorial_Service.%2C_1917_-_1919_-_NARA_-_533724.tif'
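As a quick check of whether the distinction matters for the pattern itself, the sketch below applies the committed regular expression to both URL forms of the same file. It only assumes the two URLs quoted in this review; which form actually appears in the UDP log lines is not stated here. The helper file_name is hypothetical and is used only to show that the two URLs name the same file once percent-encoding is decoded.

```python
import re
from urllib.parse import unquote

# Pattern from the committed filter (written here as a raw string).
nara = re.compile(r'_(NARA|National_Archives)_.*\.(jpg|tif)')

# Image page URL from test() and the upload URL quoted in the comment.
page_url = 'http://commons.wikimedia.org/wiki/File:German_prisoners_in_a_French_prison_camp._French_Pictorial_Service.,_1917_-_1919_-_NARA_-_533724.tif'
upload_url = 'http://upload.wikimedia.org/wikipedia/commons/f/f8/German_prisoners_in_a_French_prison_camp._French_Pictorial_Service.%2C_1917_-_1919_-_NARA_-_533724.tif'


def file_name(url):
    """Return the percent-decoded last path segment without a 'File:' prefix."""
    name = unquote(url.rsplit('/', 1)[-1])
    if name.startswith('File:'):
        name = name[len('File:'):]
    return name


# The pattern only looks at the tail of the file name, so both forms match.
both_match = all(nara.search(u) is not None for u in (page_url, upload_url))
# After decoding %2C to a comma, both URLs refer to the same file name.
same_file = file_name(page_url) == file_name(upload_url)
print(both_match, same_file)
```

So the page URLs in test() still exercise the pattern, but upload URLs would be the more realistic test input if the filter is meant to count image requests.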

In main(), match = re.search(nara, pic) should probably be match = re.search(nara, line); pic is only defined inside test().
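A minimal sketch of the corrected loop, not the committed code: it searches the line that was just read, and it iterates over the input instead of looping on readline(), because readline() returns an empty string at end of input and a bare "while True" would then never exit. The function name filter_lines and the demo input are assumptions made for illustration; the second demo line is just an arbitrary non-matching URL.

```python
import io
import re

# Pattern from the committed filter (written here as a raw string).
nara = re.compile(r'_(NARA|National_Archives)_.*\.(jpg|tif)')


def filter_lines(infile, outfile):
    """Copy every line that matches the NARA pattern from infile to outfile."""
    for line in infile:            # iteration stops cleanly at end of input
        if nara.search(line):      # search the current line, not `pic`
            outfile.write(line)


# Demo with in-memory streams standing in for stdin and stdout.
demo_in = io.StringIO(
    'http://upload.wikimedia.org/wikipedia/commons/f/f8/German_prisoners_in_a_French_prison_camp._French_Pictorial_Service.%2C_1917_-_1919_-_NARA_-_533724.tif\n'
    'http://commons.wikimedia.org/wiki/Main_Page\n'
)
demo_out = io.StringIO()
filter_lines(demo_in, demo_out)
kept = demo_out.getvalue()
print(kept, end='')
```

In the filter itself this would be called as filter_lines(sys.stdin, sys.stdout) from main().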
