r70860 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r70860
Date:08:30, 11 August 2010
Author:bawolff
Status:deferred (Comments)
Tags:
Comment:
Read XMP data in PNG files

* Also make XMP recognize tiff: and aux: namespace properties.
* Some cleanup of BitmapMetadataHandler.
Modified paths:
  • /branches/img_metadata/phase3/includes/AutoLoader.php (modified)
  • /branches/img_metadata/phase3/includes/DefaultSettings.php (modified)
  • /branches/img_metadata/phase3/includes/Exif.php (modified)
  • /branches/img_metadata/phase3/includes/media/Bitmap.php (modified)
  • /branches/img_metadata/phase3/includes/media/BitmapMetadataHandler.php (modified)
  • /branches/img_metadata/phase3/includes/media/Generic.php (modified)
  • /branches/img_metadata/phase3/includes/media/Jpeg.php (modified)
  • /branches/img_metadata/phase3/includes/media/JpegMetadataExtractor.php (added)
  • /branches/img_metadata/phase3/includes/media/PNG.php (modified)
  • /branches/img_metadata/phase3/includes/media/PNGMetadataExtractor.php (modified)
  • /branches/img_metadata/phase3/includes/media/XMPInfo.php (modified)
  • /branches/img_metadata/phase3/includes/media/XMPValidate.php (modified)
  • /branches/img_metadata/phase3/languages/messages/MessagesEn.php (modified)
  • /branches/img_metadata/phase3/maintenance/language/messageTypes.inc (modified)
  • /branches/img_metadata/phase3/maintenance/language/messages.inc (modified)

Diff

Index: branches/img_metadata/phase3/maintenance/language/messages.inc
@@ -2604,7 +2604,6 @@
26052605 'exif-stripbytecounts',
26062606 'exif-jpeginterchangeformat',
26072607 'exif-jpeginterchangeformatlength',
2608 - 'exif-transferfunction',
26092608 'exif-whitepoint',
26102609 'exif-primarychromaticities',
26112610 'exif-ycbcrcoefficients',
@@ -2734,6 +2733,9 @@
27352734 'exif-datetimereleased',
27362735 'exif-originaltransmissionref',
27372736 'exif-identifier',
 2737+ 'exif-lens',
 2738+ 'exif-serialnumber',
 2739+
27382740 ),
27392741 'exif-values' => array(
27402742 'exif-make-value',
Index: branches/img_metadata/phase3/maintenance/language/messageTypes.inc
@@ -386,7 +386,6 @@
387387 'exif-stripbytecounts',
388388 'exif-jpeginterchangeformat',
389389 'exif-jpeginterchangeformatlength',
390 - 'exif-transferfunction',
391390 'exif-whitepoint',
392391 'exif-primarychromaticities',
393392 'exif-ycbcrcoefficients',
@@ -647,4 +646,7 @@
648647 'exif-dc-rights',
649648 'exif-dc-source',
650649 'exif-dc-type',
 650+ 'exif-lens',
 651+ 'exif-serialnumber',
 652+
651653 );
Index: branches/img_metadata/phase3/includes/Exif.php
@@ -139,7 +139,7 @@
140140 'JPEGInterchangeFormatLength' => Exif::SHORT.','.Exif::LONG, # Bytes of JPEG data
141141
142142 # Tags relating to image data characteristics
143 - 'TransferFunction' => Exif::SHORT, # Transfer function
 143+ 'TransferFunction' => Exif::IGNORE, # Transfer function
144144 'WhitePoint' => array( Exif::RATIONAL, 2), # White point chromaticity
145145 'PrimaryChromaticities' => array( Exif::RATIONAL, 6), # Chromaticities of primarities
146146 'YCbCrCoefficients' => array( Exif::RATIONAL, 3), # Color space transformation matrix coefficients #p27
@@ -1382,6 +1382,8 @@
13831383 case 'dc-rights':
13841384 case 'dc-source':
13851385 case 'dc-type':
 1386+ case 'Lens':
 1387+ case 'SerialNumber':
13861388
13871389 $val = htmlspecialchars( $val );
13881390 break;
Index: branches/img_metadata/phase3/includes/media/BitmapMetadataHandler.php
@@ -3,8 +3,9 @@
44 Class to deal with reconciling and extracting metadata from bitmap images.
55 This is meant to comply with http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
66
7 -@todo finish IPTC
8 -@todo xmp
 7+This acts as an intermediary between MediaHandler::getMetadata
 8+and the various metadata extractors.
 9+
910 @todo other image formats.
1011 */
1112 class BitmapMetadataHandler {
@@ -12,118 +13,21 @@
1314 // the max segment is a sanity check.
1415 // A jpeg file should never even remotely have
1516 // that many segments. Your average file has about 10.
16 - private $filetype;
1717 private $filename;
1818 private $metadata = Array();
1919 private $metaPriority = Array(
2020 20 => Array( 'other' ),
2121 40 => Array( 'file-comment' ),
22 - 60 => Array( 'iptc-bad-hash' ),
 22+ 60 => Array( 'iptc-good-hash', 'iptc-no-hash' ),
2323 70 => Array( 'xmp-deprected' ),
2424 80 => Array( 'xmp-general' ),
2525 90 => Array( 'xmp-exif' ),
26 - 100 => Array( 'iptc-good-hash', 'iptc-no-hash' ),
 26+ 100 => Array( 'iptc-bad-hash' ),
2727 120 => Array( 'exif' ),
2828 );
2929 private $iptcType = 'iptc-no-hash';
3030
31 - /** Function to extract metadata segmants of interest from jpeg files
32 - * based on GIFMetadataExtractor.
33 - *
34 - * we can almost use getimagesize to do this
35 - * but gis doesn't support having multiple app1 segments
36 - * and those can't extract xmp on files containing both exif and xmp data
37 - *
38 - * I'm not sure if this should be in a class of its own, like GIFMetadataExtractor
39 - *
40 - * @param String $filename name of jpeg file
41 - * @return Array of interesting segments.
42 - * Can throw an exception if given invalid file.
43 - */
44 - function jpegSegmentSplitter () {
45 - $filename = $this->filename;
46 - $segmentCount = 0;
47 - if ( $this->filetype !== 'image/jpeg' ) throw new MWException( "jpegSegmentSplitter called on non-jpeg" );
48 -
49 - $segments = Array( 'XMP_ext' => array(), 'COM' => array() );
50 -
51 - if ( !$filename ) throw new MWException( "No filename specified for BitmapMetadataHandler" );
52 - if ( !file_exists( $filename ) || is_dir( $filename ) ) throw new MWException( "Invalid file $filename passed to BitmapMetadataHandler" );
53 -
54 - $fh = fopen( $filename, "rb" );
55 -
56 - if ( !$fh ) throw new MWException( "Could not open file $filename" );
57 -
58 - $buffer = fread( $fh, 2 );
59 - if ( $buffer !== "\xFF\xD8" ) throw new MWException( "Not a jpeg, no SOI" );
60 - while ( !feof( $fh ) ) {
61 - $buffer = fread( $fh, 1 );
62 - $segmentCount++;
63 - if ( $segmentCount > self::MAX_JPEG_SEGMENTS ) {
64 - // this is just a sanity check
65 - throw new MWException('Too many jpeg segments. Aborting');
66 - }
67 - if ( $buffer !== "\xFF" ) {
68 - throw new MWException( "Error reading jpeg file marker" );
69 - }
70 -
71 - $buffer = fread( $fh, 1 );
72 - if ( $buffer === "\xFE" ) {
73 - // COM section -- file comment
74 - // First see if valid utf-8,
75 - // if not try to convert it to windows-1252.
76 - $com = $oldCom = trim( self::jpegExtractMarker( $fh ) );
77 - UtfNormal::quickIsNFCVerify( $com );
78 - //turns $com to valid utf-8.
79 - //thus if no change, its utf-8, otherwise its something else.
80 - if ( $com !== $oldCom ) {
81 - $oldCom = iconv( 'windows-1252', 'UTF-8//IGNORE', $oldCom );
82 - }
83 - $segments["COM"][] = $oldCom;
84 -
85 - } elseif ( $buffer === "\xE1" ) {
86 - // APP1 section (Exif, XMP, and XMP extended)
87 - $temp = self::jpegExtractMarker( $fh );
88 -
89 - // check what type of app segment this is.
90 - if ( substr( $temp, 0, 29 ) === "http://ns.adobe.com/xap/1.0/\x00" ) {
91 - $segments["XMP"] = substr( $temp, 29 );
92 - } elseif ( substr( $temp, 0, 35 ) === "http://ns.adobe.com/xmp/extension/\x00" ) {
93 - $segments["XMP_ext"][] = substr( $temp, 35 );
94 - }
95 - } elseif ( $buffer === "\xED" ) {
96 - // APP13 - PSIR. IPTC and some photoshop stuff
97 - $temp = self::jpegExtractMarker( $fh );
98 - if ( substr( $temp, 0, 14 ) === "Photoshop 3.0\x00" ) {
99 - $segments["PSIR"] = $temp;
100 - }
101 - } elseif ( $buffer === "\xD9" || $buffer === "\xDA" ) {
102 - // EOI - end of image or SOS - start of scan. either way we're past any interesting segments
103 - return $segments;
104 - } else {
105 - // segment we don't care about, so skip
106 - $size = unpack( "nint", fread( $fh, 2 ) );
107 - if ( $size['int'] <= 2 ) throw new MWException( "invalid marker size in jpeg" );
108 - fseek( $fh, $size['int'] - 2, SEEK_CUR );
109 - }
110 -
111 - }
112 - // shouldn't get here.
113 - throw new MWException( "Reached end of jpeg file unexpectedly" );
114 -
115 - }
11631 /**
117 - * Helper function for jpegSegmentSplitter
118 - * @param &$fh FileHandle for jpeg file
119 - * @return data content of segment.
120 - */
121 - private function jpegExtractMarker( &$fh ) {
122 - $size = unpack( "nint", fread( $fh, 2 ) );
123 - if ( $size['int'] <= 2 ) throw new MWException( "invalid marker size in jpeg" );
124 - return fread( $fh, $size['int'] - 2 );
125 - }
126 -
127 - /**
12832 * This does the photoshop image resource app13 block
12933 * of interest, IPTC-IIM metadata is stored here.
13034 *
@@ -132,107 +36,13 @@
13337 * @param String $app13 String containing app13 block from jpeg file
13438 */
13539 private function doApp13 ( $app13 ) {
136 - $this->doPSIR( $app13 );
 40+ $this->iptcType = JpegMetadataExtractor::doPSIR( $app13 );
13741
13842 $iptc = IPTC::parse( $app13 );
13943 $this->addMetadata( $iptc, $this->iptcType );
14044 }
14145
142 - /**
143 - * This reads the photoshop image resource.
144 - * Currently it only compares the iptc/iim hash
145 - * with the stored hash, which is used to determine the precedence
146 - * of the iptc data. In future it may extract some other info, like
147 - * url of copyright license.
148 - *
149 - * This should generally be called by doApp13()
150 - *
151 - * @param String $app13 photoshop psir app13 block from jpg.
152 - */
153 - private function doPSIR ( $app13 ) {
154 - if ( !$app13 ) return;
155 - // First compare hash with real thing
156 - // 0x404 contains IPTC, 0x425 has hash
157 - // This is used to determine if the iptc is newer than
158 - // the xmp data, as xmp programs update the hash,
159 - // where non-xmp programs don't.
16046
161 - $offset = 14; // skip past PHOTOSHOP 3.0 identifier. should already be checked.
162 - $appLen = strlen( $app13 );
163 - $realHash = "";
164 - $recordedHash = "";
165 -
166 - // the +12 is the length of an empty item.
167 - while ( $offset + 12 <= $appLen ) {
168 - $valid = true;
169 - $id = false;
170 - $lenName = false;
171 - $lenData = false;
172 -
173 - if ( substr( $app13, $offset, 4 ) !== '8BIM' ) {
174 - // its supposed to be 8BIM
175 - // but apperently sometimes isn't esp. in
176 - // really old jpg's
177 - $valid = false;
178 - }
179 - $offset += 4;
180 - $id = substr( $app13, $offset, 2 );
181 - // id is a 2 byte id number which identifies
182 - // the piece of info this record contains.
183 -
184 - $offset += 2;
185 -
186 - // some record types can contain a name, which
187 - // is a pascal string 0-padded to be an even
188 - // number of bytes. Most times (and any time
189 - // we care) this is empty, making it two null bytes.
190 -
191 - $lenName = ord( substr( $app13, $offset, 1 ) ) + 1;
192 - // we never use the name so skip it. +1 for length byte
193 - if ( $lenName % 2 == 1 ) $lenName++; // pad to even.
194 - $offset += $lenName;
195 -
196 - // now length of data (unsigned long big endian)
197 - $lenData = unpack( 'Nlen', substr( $app13, $offset, 4 ) );
198 - $offset += 4; // 4bytes length field;
199 -
200 - // this should not happen, but check.
201 - if ( $lenData['len'] + $offset > $appLen ) {
202 - wfDebug( __METHOD__ . ' PSIR data too long.' );
203 - return false;
204 - }
205 -
206 - if ( $valid ) {
207 - switch ( $id ) {
208 - case "\x04\x04":
209 - // IPTC block
210 - $realHash = md5( substr( $app13, $offset, $lenData['len'] ), true );
211 - break;
212 - case "\x04\x25":
213 - $recordedHash = substr( $app13, $offset, $lenData['len'] );
214 - break;
215 - }
216 - }
217 -
218 - // if odd, add 1 to length to account for
219 - // null pad byte.
220 - if ( $lenData['len'] % 2 == 1 ) $lenData['len']++;
221 - $offset += $lenData['len'];
222 -
223 - }
224 -
225 - if ( !$realHash ) return false; // no iptc data
226 -
227 - if ( !$recordedHash ) {
228 - $this->iptcType = 'iptc-no-hash';
229 - } elseif ( $realHash === $recordedHash ) {
230 - $this->iptcType = 'iptc-good-hash';
231 - } else { /*$realHash !== $recordedHash */
232 - $this->iptcType = 'iptc-bad-hash';
233 - }
234 -
235 - }
236 -
23747 /** get exif info using exif class.
23848 * Basically what used to be in BitmapHandler::getMetadata().
23949 * Just calls stuff in the Exif class.
@@ -300,34 +110,32 @@
301111
302112 /** constructor.
303113 * This generally shouldn't be called directly
304 - * instead BitmapMetadataHandler::newForJpeg should be used.
 114+ * instead one of the static methods should be used
305115 *
306116 * @param string $file - full path to file
307 - * @param string $type - mime type of file
308117 */
309 - function __construct ( $file, $type ) {
 118+ function __construct ( $file ) {
310119 $this->filename = $file;
311 - $this->filetype = $type;
312120 }
313 - /** factory function special for jpg's.
314 - * This is how new BitmapMetadataHandler's should be made.
315 - * at some point there should be a newForPNG, etc.
 121+ /** Main entry point for JPEG files.
316122 *
317123 * @param string $file filename (with full path)
318 - * @return BitmapMetadataHandler
 124+ * @return metadata result array.
 125+ * @throws MWException on invalid file.
319126 */
320 - static function newForJpeg ( $file ) {
321 - $meta = new self( $file, 'image/jpeg' );
 127+ static function Jpeg ( $file ) {
 128+ global $wgShowXMP;
 129+ $meta = new self( $file );
322130 $meta->getExif();
323131 $seg = Array();
324 - $seg = $meta->JpegSegmentSplitter();
 132+ $seg = JpegMetadataExtractor::segmentSplitter( $file );
325133 if ( isset( $seg['COM'] ) && isset( $seg['COM'][0] ) ) {
326134 $meta->addMetadata( Array( 'JPEGFileComment' => $seg['COM'] ), 'file-comment' );
327135 }
328136 if ( isset( $seg['PSIR'] ) ) {
329137 $meta->doApp13( $seg['PSIR'] );
330138 }
331 - if ( isset( $seg['XMP'] ) ) {
 139+ if ( isset( $seg['XMP'] ) && $wgShowXMP ) {
332140 $xmp = new XMPReader();
333141 $xmp->parse( $seg['XMP'] );
334142 foreach( $seg['XMP_ext'] as $xmpExt ) {
@@ -342,7 +150,33 @@
343151 $meta->addMetadata( $array, $type );
344152 }
345153 }
346 - return $meta;
 154+ return $meta->getMetadataArray();
347155 }
 156+ /** Entry point for png
 157+ * At some point in the future this might
 158+ * merge the various interesting PNG tEXt chunks,
 159+ * but for now it only does XMP.
 160+ *
 161+ * @param $filename String full path to file
 162+ * @return Array Array for storage in img_metadata.
 163+ */
 164+ static public function PNG( $filename ) {
 165+ global $wgShowXMP;
348166
 167+ $meta = new self( $filename );
 168+ $array = PNGMetadataExtractor::getMetadata( $filename );
 169+ if ( isset( $array['xmp'] ) && $array['xmp'] !== '' && $wgShowXMP) {
 170+ $xmp = new XMPReader();
 171+ $xmp->parse($array['xmp']);
 172+ $xmpRes = $xmp->getResults();
 173+ foreach( $xmpRes as $type => $xmpSection ) {
 174+ $meta->addMetadata( $xmpSection, $type );
 175+ }
 176+ }
 177+ unset( $array['xmp'] );
 178+ $array['metadata'] = $meta->getMetadataArray();
 179+ $array['metadata']['_MW_PNG_VERSION'] = '1';
 180+ return $array;
 181+ }
 182+
349183 }
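The reshuffled `$metaPriority` table decides which source wins when several of them supply the same property. The sketch below assumes that a higher number means higher precedence, which is consistent with moving `iptc-bad-hash` above the XMP entries (a mismatched hash means the IPTC was edited by a tool that did not update the XMP). `mergeByPriority()` and the `ObjectName` property are illustrative only and are not part of BitmapMetadataHandler.

```php
<?php
// Hypothetical priority merge, assuming higher numbers override lower ones.
// The table mirrors $metaPriority from the diff above.
$metaPriority = [
    20  => ['other'],
    40  => ['file-comment'],
    60  => ['iptc-good-hash', 'iptc-no-hash'],
    70  => ['xmp-deprected'],
    80  => ['xmp-general'],
    90  => ['xmp-exif'],
    100 => ['iptc-bad-hash'],
    120 => ['exif'],
];

/**
 * @param array $priorities priority => list of metadata types
 * @param array $byType     type => (property => value)
 * @return array property => value, with the highest-priority source winning
 */
function mergeByPriority(array $priorities, array $byType): array {
    ksort($priorities); // lowest priority first, so later writes win
    $result = [];
    foreach ($priorities as $types) {
        foreach ($types as $type) {
            if (!isset($byType[$type])) {
                continue;
            }
            foreach ($byType[$type] as $prop => $value) {
                $result[$prop] = $value;
            }
        }
    }
    return $result;
}

// Hash matches: an XMP-aware tool wrote both, so XMP takes precedence.
$good = mergeByPriority($metaPriority, [
    'iptc-good-hash' => ['ObjectName' => 'from IPTC'],
    'xmp-general'    => ['ObjectName' => 'from XMP'],
]);
// Hash mismatch: IPTC is newer than the XMP, so IPTC takes precedence.
$bad = mergeByPriority($metaPriority, [
    'iptc-bad-hash' => ['ObjectName' => 'from IPTC'],
    'xmp-general'   => ['ObjectName' => 'from XMP'],
]);
echo $good['ObjectName'], "\n"; // from XMP
echo $bad['ObjectName'], "\n";  // from IPTC
```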
Index: branches/img_metadata/phase3/includes/media/Bitmap.php
@@ -413,10 +413,6 @@
414414 }
415415
416416 function formatMetadata( $image ) {
417 - $result = array(
418 - 'visible' => array(),
419 - 'collapsed' => array()
420 - );
421417 $metadata = $image->getMetadata();
422418 if ( !$metadata ) {
423419 return false;
@@ -426,20 +422,7 @@
427423 return false;
428424 }
429425 unset( $exif['MEDIAWIKI_EXIF_VERSION'] );
430 - $format = new FormatExif( $exif );
431 -
432 - $formatted = $format->getFormattedData();
433 - // Sort fields into visible and collapsed
434 - $visibleFields = $this->visibleMetadataFields();
435 - foreach ( $formatted as $name => $value ) {
436 - $tag = strtolower( $name );
437 - self::addMeta( $result,
438 - in_array( $tag, $visibleFields ) ? 'visible' : 'collapsed',
439 - 'exif',
440 - $tag,
441 - $value
442 - );
443 - }
444 - return $result;
 426+ return $this->formatMetadataHelper( $exif );
445427 }
 428+
446429 }
Index: branches/img_metadata/phase3/includes/media/Generic.php
@@ -266,6 +266,37 @@
267267 return false;
268268 }
269269
 270+ /** Sorts metadata fields into visible and collapsed groups.
 271+ * Split off from ImageHandler::formatMetadata, as used by more than
 272+ * one type of handler.
 273+ *
 274+ * This is used by the media handlers that use the FormatExif class
 275+ *
 276+ * @param $metadataArray Array metadata array
 277+ * @return array for use displaying metadata.
 278+ */
 279+ function formatMetadataHelper( $metadataArray ) {
 280+ $result = array(
 281+ 'visible' => array(),
 282+ 'collapsed' => array()
 283+ );
 284+ $format = new FormatExif( $metadataArray );
 285+
 286+ $formatted = $format->getFormattedData();
 287+ // Sort fields into visible and collapsed
 288+ $visibleFields = $this->visibleMetadataFields();
 289+ foreach ( $formatted as $name => $value ) {
 290+ $tag = strtolower( $name );
 291+ self::addMeta( $result,
 292+ in_array( $tag, $visibleFields ) ? 'visible' : 'collapsed',
 293+ 'exif',
 294+ $tag,
 295+ $value
 296+ );
 297+ }
 298+ return $result;
 299+ }
 300+
270301 /**
271302 * @todo Fixme: document this!
272303 * 'value' thingy goes into a wikitext table; it used to be escaped but
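formatMetadataHelper() reduces to a case-insensitive membership test against visibleMetadataFields(). A hypothetical standalone version, with a made-up visible-field list and a flat result instead of the structure addMeta() builds:

```php
<?php
// Hypothetical simplification of the visible/collapsed split; the real
// method formats values through FormatExif and stores them via addMeta().
function splitVisible(array $formatted, array $visibleFields): array {
    $result = ['visible' => [], 'collapsed' => []];
    foreach ($formatted as $name => $value) {
        $tag = strtolower($name); // names are compared in lowercase, as in the diff
        $group = in_array($tag, $visibleFields, true) ? 'visible' : 'collapsed';
        $result[$group][$tag] = $value;
    }
    return $result;
}

$split = splitVisible(
    ['Make' => 'Canon', 'Lens' => 'EF50mm f/1.8', 'SerialNumber' => '1234'],
    ['make', 'model'] // hypothetical visible-field list
);
print_r($split);
```

With this list, the newly supported Lens and SerialNumber fields end up in the collapsed group.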
Index: branches/img_metadata/phase3/includes/media/XMPValidate.php
@@ -49,13 +49,31 @@
5050 // this only validates standalone properties, not arrays, etc
5151 return;
5252 }
53 - if ( !preg_match( '/^(-?\d+)\/(\d+[1-9]|[1-9]\d*)$/', $val ) ) {
 53+ if ( !preg_match( '/^(?:-?\d+)\/(?:\d+[1-9]|[1-9]\d*)$/D', $val ) ) {
5454 wfDebugLog( 'XMP', __METHOD__ . " Expected rational but got $val" );
5555 $val = null;
5656 }
5757
5858 }
5959 /**
 60+ * function to validate integers
 61+ *
 62+ * @param $info Array information about current property
 63+ * @param &$val Mixed current value to validate
 64+ * @param $standalone Boolean if this is a simple property or array
 65+ */
 66+ public static function validateInteger( $info, &$val, $standalone ) {
 67+ if ( !$standalone ) {
 68+ // this only validates standalone properties, not arrays, etc
 69+ return;
 70+ }
 71+ if ( !preg_match( '/^[-+]?\d+$/D', $val ) ) {
 72+ wfDebugLog( 'XMP', __METHOD__ . " Expected integer but got $val" );
 73+ $val = null;
 74+ }
 75+
 76+ }
 77+ /**
6078 * function to validate properties with a fixed number of allowed
6179 * choices. (closed choice)
6280 *
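The rational regex gains the `D` (dollar end-only) modifier, so `$` no longer matches before a trailing newline, and its groups become non-capturing. The hypothetical wrappers below exercise the two patterns from the diff directly. The denominator alternation rejects `0`, and, as a side effect, also zero-padded denominators ending in 0 such as `010`.

```php
<?php
// Hypothetical wrappers around the regexes used by validateRational() and
// validateInteger(); the real validators null out $val instead of returning.
function looksRational(string $val): bool {
    return preg_match('/^(?:-?\d+)\/(?:\d+[1-9]|[1-9]\d*)$/D', $val) === 1;
}

function looksInteger(string $val): bool {
    return preg_match('/^[-+]?\d+$/D', $val) === 1;
}

foreach (['72/1', '-3/2', '5/0', "72/1\n", '1.5'] as $v) {
    // addcslashes() shows the trailing newline as a visible \n
    printf("%-10s rational? %s\n", addcslashes($v, "\n"), looksRational($v) ? 'yes' : 'no');
}
foreach (['8', '+8', '-8', '8.0'] as $v) {
    printf("%-10s integer?  %s\n", $v, looksInteger($v) ? 'yes' : 'no');
}
```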
Index: branches/img_metadata/phase3/includes/media/XMPInfo.php
@@ -188,6 +188,142 @@
189189 'mode' => XMPReader::MODE_SEQ,
190190 ),
191191 ),
 192+ 'http://ns.adobe.com/tiff/1.0/' => array(
 193+ 'Artist' => array(
 194+ 'map_group' => 'exif',
 195+ 'mode' => XMPReader::MODE_SIMPLE,
 196+ ),
 197+ 'BitsPerSample' => array(
 198+ 'map_group' => 'exif',
 199+ 'mode' => XMPReader::MODE_SEQ,
 200+ 'validate' => 'validateInteger',
 201+ ),
 202+ 'Compression' => array(
 203+ 'map_group' => 'exif',
 204+ 'mode' => XMPReader::MODE_SIMPLE,
 205+ 'validate' => 'validateClosed',
 206+ 'choices' => array( '1' => true, '6' => true ),
 207+ ),
 208+ /* this prop should not be used in XMP. dc:rights is the correct prop */
 209+ 'Copyright' => array(
 210+ 'map_group' => 'exif',
 211+ 'mode' => XMPReader::MODE_LANG,
 212+ ),
 213+ 'DateTime' => array( /* proper prop is xmp:ModifyDate */
 214+ 'map_group' => 'exif',
 215+ 'mode' => XMPReader::MODE_SIMPLE,
 216+ 'validate' => 'validateDate',
 217+ ),
 218+ 'ImageDescription' => array( /* proper one is dc:description */
 219+ 'map_group' => 'exif',
 220+ 'mode' => XMPReader::MODE_LANG,
 221+ ),
 222+ 'ImageLength' => array(
 223+ 'map_group' => 'exif',
 224+ 'mode' => XMPReader::MODE_SIMPLE,
 225+ 'validate' => 'validateInteger',
 226+ ),
 227+ 'ImageWidth' => array(
 228+ 'map_group' => 'exif',
 229+ 'mode' => XMPReader::MODE_SIMPLE,
 230+ 'validate' => 'validateInteger',
 231+ ),
 232+ 'Make' => array(
 233+ 'map_group' => 'exif',
 234+ 'mode' => XMPReader::MODE_SIMPLE,
 235+ ),
 236+ 'Model' => array(
 237+ 'map_group' => 'exif',
 238+ 'mode' => XMPReader::MODE_SIMPLE,
 239+ ),
 240+ 'Orientation' => array(
 241+ 'map_group' => 'exif',
 242+ 'mode' => XMPReader::MODE_SIMPLE,
 243+ 'validate' => 'validateClosed',
 244+ 'choices' => array( '1' => true, '2' => true, '3' => true, '4' => true, 5 => true,
 245+ '6' => true, '7' => true, '8' => true ),
 246+ ),
 247+ 'PhotometricInterpretation' => array(
 248+ 'map_group' => 'exif',
 249+ 'mode' => XMPReader::MODE_SIMPLE,
 250+ 'validate' => 'validateClosed',
 251+ 'choices' => array( '2' => true, '6' => true ),
 252+ ),
 253+ 'PlanarConfiguration' => array(
 254+ 'map_group' => 'exif',
 255+ 'mode' => XMPReader::MODE_SIMPLE,
 256+ 'validate' => 'validateClosed',
 257+ 'choices' => array( '1' => true, '2' => true ),
 258+ ),
 259+ 'PrimaryChromaticities' => array(
 260+ 'map_group' => 'exif',
 261+ 'mode' => XMPReader::MODE_SEQ,
 262+ 'validate' => 'validateRational',
 263+ ),
 264+ 'ReferenceBlackWhite' => array(
 265+ 'map_group' => 'exif',
 266+ 'mode' => XMPReader::MODE_SEQ,
 267+ 'validate' => 'validateRational',
 268+ ),
 269+ 'ResolutionUnit' => array(
 270+ 'map_group' => 'exif',
 271+ 'mode' => XMPReader::MODE_SIMPLE,
 272+ 'validate' => 'validateClosed',
 273+ 'choices' => array( '2' => true, '3' => true ),
 274+ ),
 275+ 'SamplesPerPixel' => array(
 276+ 'map_group' => 'exif',
 277+ 'mode' => XMPReader::MODE_SIMPLE,
 278+ 'validate' => 'validateInteger',
 279+ ),
 280+ 'Software' => array( /* see xmp:CreatorTool */
 281+ 'map_group' => 'exif',
 282+ 'mode' => XMPReader::MODE_SIMPLE,
 283+ ),
 284+ /* ignore TransferFunction */
 285+ 'WhitePoint' => array(
 286+ 'map_group' => 'exif',
 287+ 'mode' => XMPReader::MODE_SEQ,
 288+ 'validate' => 'validateRational',
 289+ ),
 290+ 'XResolution' => array(
 291+ 'map_group' => 'exif',
 292+ 'mode' => XMPReader::MODE_SIMPLE,
 293+ 'validate' => 'validateRational',
 294+ ),
 295+ 'YResolution' => array(
 296+ 'map_group' => 'exif',
 297+ 'mode' => XMPReader::MODE_SIMPLE,
 298+ 'validate' => 'validateRational',
 299+ ),
 300+ 'YCbCrCoefficients' => array(
 301+ 'map_group' => 'exif',
 302+ 'mode' => XMPReader::MODE_SEQ,
 303+ 'validate' => 'validateRational',
 304+ ),
 305+ 'YCbCrPositioning' => array(
 306+ 'map_group' => 'exif',
 307+ 'mode' => XMPReader::MODE_SIMPLE,
 308+ 'validate' => 'validateClosed',
 309+ 'choices' => array( '1' => true, '2' => true ),
 310+ ),
 311+ 'YCbCrSubSampling' => array(
 312+ 'map_group' => 'exif',
 313+ 'mode' => XMPReader::MODE_SEQ,
 314+ 'validate' => 'validateClosed',
 315+ 'choices' => array( '1' => true, '2' => true ),
 316+ ),
 317+ ),
 318+ 'http://ns.adobe.com/exif/1.0/aux/' => array(
 319+ 'Lens' => array(
 320+ 'map_group' => 'exif',
 321+ 'mode' => XMPReader::MODE_SIMPLE,
 322+ ),
 323+ 'SerialNumber' => array(
 324+ 'map_group' => 'exif',
 325+ 'mode' => XMPReader::MODE_SIMPLE,
 326+ ),
 327+ ),
192328 'http://purl.org/dc/elements/1.1/' => array(
193329 'title' => array(
194330 'map_group' => 'general',
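Many of the new tiff: entries rely on `validateClosed` with a `choices` map. That validator is not shown in this diff, so the following is only a sketch, assuming it follows the same convention as validateRational() and validateInteger(): skip non-standalone values and set rejected ones to null. `validateClosedSketch()` is hypothetical.

```php
<?php
// Hypothetical closed-choice check driven by an XMPInfo-style 'choices' map.
function validateClosedSketch(array $info, ?string &$val, bool $standalone): void {
    if (!$standalone) {
        return; // like the other validators, only simple properties are checked
    }
    if (!isset($info['choices'][$val])) {
        $val = null; // not one of the allowed values
    }
}

$orientation = ['choices' => ['1' => true, '2' => true, '3' => true, '4' => true,
    '5' => true, '6' => true, '7' => true, '8' => true]];

$ok = '6';
validateClosedSketch($orientation, $ok, true);
$bad = '9';
validateClosedSketch($orientation, $bad, true);
var_dump($ok, $bad); // string(1) "6" then NULL
```

Because PHP casts numeric-string array keys such as '6' to integers, a string lookup and an integer key like `5 => true` hit the same entry.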
Index: branches/img_metadata/phase3/includes/media/PNGMetadataExtractor.php
@@ -17,6 +17,8 @@
1818 $frameCount = 0;
1919 $loopCount = 1;
2020 $duration = 0.0;
 21+ $xmp = '';
 22+ $meta = array();
2123
2224 if (!$filename)
2325 throw new Exception( __METHOD__ . ": No file name specified" );
@@ -61,9 +63,23 @@
6264 if( $fctldur['delay_num'] ) {
6365 $duration += $fctldur['delay_num'] / $fctldur['delay_den'];
6466 }
65 - } elseif ( ( $chunk_type == "IDAT" || $chunk_type == "IEND" ) && $frameCount == 0 ) {
66 - // Not a valid animated image. No point in continuing.
67 - break;
 67+ } elseif ( $chunk_type == "iTXt" ) {
 68+ // At the moment this only does XMP iTXt chunks,
 69+ // but in the future might extract other metadata chunks.
 70+ if( $chunk_size <= 22 ) {
 71+ // something weird, so skip
 72+ fseek( $fh, $chunk_size, SEEK_CUR );
 73+ continue;
 74+ }
 75+ $itxtHeader = fread( $fh, 22 );
 76+ if( !$itxtHeader ) { throw new Exception( __METHOD__ . ": Read error" ); return; }
 77+ if( $itxtHeader !== "XML:com.adobe.xmp\x00\x00\x00\x00\x00" ) {
 78+ // some other iTXt chunk.
 79+ fseek( $fh, $chunk_size - 22, SEEK_CUR );
 80+ continue;
 81+ }
 82+ $xmp = fread( $fh, $chunk_size - 22 );
 83+ if( !$xmp ) { throw new Exception( __METHOD__ . ": Read error" ); return; }
6884 } elseif ( $chunk_type == "IEND" ) {
6985 break;
7086 } else {
@@ -80,7 +96,8 @@
8197 return array(
8298 'frameCount' => $frameCount,
8399 'loopCount' => $loopCount,
84 - 'duration' => $duration
 100+ 'duration' => $duration,
 101+ 'xmp' => $xmp,
85102 );
86103
87104 }
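The 22-byte comparison in the new iTXt branch corresponds to the PNG iTXt field layout with the keyword `XML:com.adobe.xmp`, no compression, and an empty language tag and translated keyword. Assembled field by field, together with a hypothetical `xmpFromItxtData()` that applies the same check to a chunk's data:

```php
<?php
// iTXt data = keyword, NUL, compression flag, compression method,
// language tag, NUL, translated keyword, NUL, text.
$header = 'XML:com.adobe.xmp' // keyword (17 bytes)
    . "\x00"   // keyword terminator
    . "\x00"   // compression flag: 0 = uncompressed
    . "\x00"   // compression method
    . "\x00"   // terminator of the empty language tag
    . "\x00";  // terminator of the empty translated keyword

var_dump(strlen($header));                                     // int(22)
var_dump($header === "XML:com.adobe.xmp\x00\x00\x00\x00\x00"); // bool(true)

// Hypothetical helper: return the XMP text, or null for any other iTXt chunk.
function xmpFromItxtData(string $data): ?string {
    $prefix = "XML:com.adobe.xmp\x00\x00\x00\x00\x00";
    if (strlen($data) <= strlen($prefix) || strncmp($data, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return substr($data, strlen($prefix));
}
```

Because the match is exact, an XMP iTXt chunk that is compressed or carries a language tag would be skipped as "some other iTXt chunk".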
Index: branches/img_metadata/phase3/includes/media/JpegMetadataExtractor.php
@@ -0,0 +1,205 @@
 2+<?php
 3+/**
 4+* Class for reading jpegs and extracting metadata.
 5+* see also BitmapMetadataHandler.
 6+*
 7+* Based somewhat on GIFMetadataExtractor.
 8+*/
 9+class JpegMetadataExtractor {
 10+ const MAX_JPEG_SEGMENTS = 200;
 11+ // the max segment is a sanity check.
 12+ // A jpeg file should never even remotely have
 13+ // that many segments. Your average file has about 10.
 14+
 15+ /** Function to extract metadata segments of interest from jpeg files
 16+ * based on GIFMetadataExtractor.
 17+ *
 18+ * We can almost use getimagesize() to do this,
 19+ * but it doesn't support multiple APP1 segments,
 20+ * so it can't extract XMP from files containing both Exif and XMP data.
 21+ *
 22+ * @param String $filename name of jpeg file
 23+ * @return Array of interesting segments.
 24+ * @throws MWException if given invalid file.
 25+ */
 26+ static function segmentSplitter ($filename) {
 27+ global $wgShowXMP;
 28+
 29+ $segmentCount = 0;
 30+
 31+ $segments = Array( 'XMP_ext' => array(), 'COM' => array() );
 32+
 33+ if ( !$filename ) throw new MWException( "No filename specified for " . __METHOD__ );
 34+ if ( !file_exists( $filename ) || is_dir( $filename ) ) throw new MWException( "Invalid file $filename passed to " . __METHOD__ );
 35+
 36+ $fh = fopen( $filename, "rb" );
 37+
 38+ if ( !$fh ) throw new MWException( "Could not open file $filename" );
 39+
 40+ $buffer = fread( $fh, 2 );
 41+ if ( $buffer !== "\xFF\xD8" ) throw new MWException( "Not a jpeg, no SOI" );
 42+ while ( !feof( $fh ) ) {
 43+ $buffer = fread( $fh, 1 );
 44+ $segmentCount++;
 45+ if ( $segmentCount > self::MAX_JPEG_SEGMENTS ) {
 46+ // this is just a sanity check
 47+ throw new MWException('Too many jpeg segments. Aborting');
 48+ }
 49+ if ( $buffer !== "\xFF" ) {
 50+ throw new MWException( "Error reading jpeg file marker" );
 51+ }
 52+
 53+ $buffer = fread( $fh, 1 );
 54+ if ( $buffer === "\xFE" ) {
 55+
 56+ // COM section -- file comment
 57+ // First see if valid utf-8,
 58+ // if not try to convert it to windows-1252.
 59+ $com = $oldCom = trim( self::jpegExtractMarker( $fh ) );
 60+
 61+ UtfNormal::quickIsNFCVerify( $com );
 62+ //turns $com to valid utf-8.
 63+ //thus if no change, its utf-8, otherwise its something else.
 64+ if ( $com !== $oldCom ) {
 65+ $oldCom = iconv( 'windows-1252', 'UTF-8//IGNORE', $oldCom );
 66+ }
 67+ $segments["COM"][] = $oldCom;
 68+
 69+ } elseif ( $buffer === "\xE1" && $wgShowXMP ) {
 70+ // APP1 section (Exif, XMP, and XMP extended)
 71+ // only extract if XMP is enabled.
 72+ $temp = self::jpegExtractMarker( $fh );
 73+
 74+ // check what type of app segment this is.
 75+ if ( substr( $temp, 0, 29 ) === "http://ns.adobe.com/xap/1.0/\x00" ) {
 76+ $segments["XMP"] = substr( $temp, 29 );
 77+ } elseif ( substr( $temp, 0, 35 ) === "http://ns.adobe.com/xmp/extension/\x00" ) {
 78+ $segments["XMP_ext"][] = substr( $temp, 35 );
 79+ }
 80+ } elseif ( $buffer === "\xED" ) {
 81+ // APP13 - PSIR. IPTC and some photoshop stuff
 82+ $temp = self::jpegExtractMarker( $fh );
 83+ if ( substr( $temp, 0, 14 ) === "Photoshop 3.0\x00" ) {
 84+ $segments["PSIR"] = $temp;
 85+ }
 86+ } elseif ( $buffer === "\xD9" || $buffer === "\xDA" ) {
 87+ // EOI - end of image or SOS - start of scan. either way we're past any interesting segments
 88+ return $segments;
 89+ } else {
 90+ // segment we don't care about, so skip
 91+ $size = unpack( "nint", fread( $fh, 2 ) );
 92+ if ( $size['int'] <= 2 ) throw new MWException( "invalid marker size in jpeg" );
 93+ fseek( $fh, $size['int'] - 2, SEEK_CUR );
 94+ }
 95+
 96+ }
 97+ // shouldn't get here.
 98+ throw new MWException( "Reached end of jpeg file unexpectedly" );
 99+
 100+ }
 101+ /**
 102+ * Helper function for segmentSplitter
 103+ * @param &$fh FileHandle for jpeg file
 104+ * @return data content of segment.
 105+ */
 106+ private static function jpegExtractMarker( &$fh ) {
 107+ $size = unpack( "nint", fread( $fh, 2 ) );
 108+ if ( $size['int'] <= 2 ) throw new MWException( "invalid marker size in jpeg" );
 109+ return fread( $fh, $size['int'] - 2 );
 110+ }
 111+
 112+ /**
 113+ * This reads the photoshop image resource.
 114+ * Currently it only compares the iptc/iim hash
 115+ * with the stored hash, which is used to determine the precedence
 116+ * of the iptc data. In future it may extract some other info, like
 117+ * url of copyright license.
 118+ *
 119+ * This should generally be called by BitmapMetadataHandler::doApp13()
 120+ *
 121+ * @param String $app13 photoshop psir app13 block from jpg.
 122+ * @return String 'iptc-good-hash', 'iptc-bad-hash' or 'iptc-no-hash'.
 123+ */
 124+ public static function doPSIR ( $app13 ) {
 125+ if ( !$app13 ) return 'iptc-no-hash';
 126+ // First compare hash with real thing
 127+ // 0x404 contains IPTC, 0x425 has hash
 128+ // This is used to determine if the iptc is newer than
 129+ // the xmp data, as xmp programs update the hash,
 130+ // where non-xmp programs don't.
 131+
 132+ $offset = 14; // skip past PHOTOSHOP 3.0 identifier. should already be checked.
 133+ $appLen = strlen( $app13 );
 134+ $realHash = "";
 135+ $recordedHash = "";
 136+
 137+ // the +12 is the length of an empty item.
 138+ while ( $offset + 12 <= $appLen ) {
 139+ $valid = true;
 140+ $id = false;
 141+ $lenName = false;
 142+ $lenData = false;
 143+
 144+ if ( substr( $app13, $offset, 4 ) !== '8BIM' ) {
 145+ // its supposed to be 8BIM
 146+ // but apparently sometimes isn't esp. in
 147+ // really old jpg's
 148+ $valid = false;
 149+ }
 150+ $offset += 4;
 151+ $id = substr( $app13, $offset, 2 );
 152+ // id is a 2 byte id number which identifies
 153+ // the piece of info this record contains.
 154+
 155+ $offset += 2;
 156+
 157+ // some record types can contain a name, which
 158+ // is a pascal string 0-padded to be an even
 159+ // number of bytes. Most times (and any time
 160+ // we care) this is empty, making it two null bytes.
 161+
 162+ $lenName = ord( substr( $app13, $offset, 1 ) ) + 1;
 163+ // we never use the name so skip it. +1 for length byte
 164+ if ( $lenName % 2 == 1 ) $lenName++; // pad to even.
 165+ $offset += $lenName;
 166+
 167+ // now length of data (unsigned long big endian)
 168+ $lenData = unpack( 'Nlen', substr( $app13, $offset, 4 ) );
 169+ $offset += 4; // 4bytes length field;
 170+
 171+ // this should not happen, but check.
 172+ if ( $lenData['len'] + $offset > $appLen ) {
 173+ wfDebug( __METHOD__ . ' PSIR data too long.' );
 174+ return 'iptc-no-hash';
 175+ }
 176+
 177+ if ( $valid ) {
 178+ switch ( $id ) {
 179+ case "\x04\x04":
 180+ // IPTC block
 181+ $realHash = md5( substr( $app13, $offset, $lenData['len'] ), true );
 182+ break;
 183+ case "\x04\x25":
 184+ $recordedHash = substr( $app13, $offset, $lenData['len'] );
 185+ break;
 186+ }
 187+ }
 188+
 189+ // if odd, add 1 to length to account for
 190+ // null pad byte.
 191+ if ( $lenData['len'] % 2 == 1 ) $lenData['len']++;
 192+ $offset += $lenData['len'];
 193+
 194+ }
 195+
 196+ if ( !$realHash || !$recordedHash ) {
 197+ return 'iptc-no-hash';
 198+ } elseif ( $realHash === $recordedHash ) {
 199+ return 'iptc-good-hash';
 200+ } else { /*$realHash !== $recordedHash */
 201+ return 'iptc-bad-hash';
 202+ }
 203+
 204+ }
 205+
 206+}
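segmentSplitter() walks the markers from SOI until SOS or EOI, reading for each segment a 2-byte big-endian length that counts itself but not the marker. The sketch below does the same over an in-memory string rather than a file handle and only looks for the XMP APP1 signature. `findXmpInJpegBytes()` is hypothetical and omits the COM, APP13 and extended-XMP handling.

```php
<?php
// Hypothetical in-memory version of the marker walk in segmentSplitter().
function findXmpInJpegBytes(string $jpeg): ?string {
    if (substr($jpeg, 0, 2) !== "\xFF\xD8") {
        throw new RuntimeException('Not a jpeg, no SOI');
    }
    $xmpSig = "http://ns.adobe.com/xap/1.0/\x00"; // 29-byte APP1 signature
    $len = strlen($jpeg);
    $pos = 2;
    while ($pos + 2 <= $len) {
        if ($jpeg[$pos] !== "\xFF") {
            throw new RuntimeException('Error reading jpeg file marker');
        }
        $marker = $jpeg[$pos + 1];
        if ($marker === "\xD9" || $marker === "\xDA") {
            return null; // EOI or SOS: no metadata segments follow
        }
        if ($pos + 4 > $len) {
            break; // truncated before the length field
        }
        $size = unpack('n', substr($jpeg, $pos + 2, 2))[1];
        if ($size <= 2) {
            throw new RuntimeException('invalid marker size in jpeg');
        }
        $data = substr($jpeg, $pos + 4, $size - 2);
        if ($marker === "\xE1" && strncmp($data, $xmpSig, strlen($xmpSig)) === 0) {
            return substr($data, strlen($xmpSig));
        }
        $pos += 2 + $size; // marker bytes plus segment length
    }
    return null;
}

// Usage: SOI, a JFIF APP0 segment, an XMP APP1 segment, EOI.
$payload = "http://ns.adobe.com/xap/1.0/\x00" . '<x:xmpmeta/>';
$jpeg = "\xFF\xD8"
      . "\xFF\xE0" . pack('n', 2 + 5) . "JFIF\x00"
      . "\xFF\xE1" . pack('n', 2 + strlen($payload)) . $payload
      . "\xFF\xD9";
var_dump(findXmpInJpegBytes($jpeg)); // string(12) "<x:xmpmeta/>"
```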
Property changes on: branches/img_metadata/phase3/includes/media/JpegMetadataExtractor.php
___________________________________________________________________
Added: svn:eol-style
1207 + native
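The PSIR loop in JpegMetadataExtractor.php can be sketched as a standalone PHP function to show the block layout it walks: a 4-byte 8BIM signature, a 2-byte id, a Pascal name padded to an even length, a 4-byte big-endian data length, and data padded to an even length. This is a minimal sketch, not the code from r70860. psirIptcHashStatus() and psirBlock() are hypothetical names. The input is assumed to start at the first 8BIM signature, with the "Photoshop 3.0" header already stripped. Each block's signature is checked independently. The return strings mirror the message keys used in the diff.

```php
<?php
/**
 * Sketch: walk the 8BIM resource blocks of a Photoshop (PSIR) payload and
 * compare the MD5 of the IPTC block (id 0x0404) with the digest stored in
 * block 0x0425. Assumes $psir begins at the first "8BIM" signature.
 */
function psirIptcHashStatus( string $psir ): string {
	$len = strlen( $psir );
	$offset = 0;
	$realHash = null;
	$recordedHash = null;

	// Smallest block: 4 (signature) + 2 (id) + 2 (empty name) + 4 (length).
	while ( $offset + 12 <= $len ) {
		// Check each block's signature on its own; ignore non-8BIM payloads.
		$valid = substr( $psir, $offset, 4 ) === '8BIM';
		$offset += 4;
		$id = substr( $psir, $offset, 2 );
		$offset += 2;

		// Pascal name: length byte plus characters, padded to an even size.
		$lenName = ord( $psir[$offset] ) + 1;
		if ( $lenName % 2 === 1 ) {
			$lenName++;
		}
		$offset += $lenName;

		if ( $offset + 4 > $len ) {
			return 'iptc-no-hash'; // truncated before the length field
		}
		// Data length: unsigned 32-bit big-endian.
		$lenData = unpack( 'Nlen', substr( $psir, $offset, 4 ) )['len'];
		$offset += 4;
		if ( $offset + $lenData > $len ) {
			return 'iptc-no-hash'; // block claims more data than exists
		}

		if ( $valid ) {
			$data = substr( $psir, $offset, $lenData );
			if ( $id === "\x04\x04" ) {
				$realHash = md5( $data, true ); // raw 16-byte digest
			} elseif ( $id === "\x04\x25" ) {
				$recordedHash = $data;
			}
		}
		// Data is null-padded to an even number of bytes.
		$offset += $lenData + ( $lenData % 2 );
	}

	if ( $realHash === null || $recordedHash === null ) {
		return 'iptc-no-hash';
	}
	return $realHash === $recordedHash ? 'iptc-good-hash' : 'iptc-bad-hash';
}

/** Hypothetical helper: build one 8BIM block with an empty name. */
function psirBlock( string $id, string $data ): string {
	$pad = strlen( $data ) % 2 === 1 ? "\0" : '';
	return '8BIM' . $id . "\0\0" . pack( 'N', strlen( $data ) ) . $data . $pad;
}

$iptc = "\x1c\x02\x05\x00\x04Test"; // 9 bytes, so the block needs a pad byte
$good = psirBlock( "\x04\x04", $iptc ) . psirBlock( "\x04\x25", md5( $iptc, true ) );
$bad = psirBlock( "\x04\x04", $iptc ) . psirBlock( "\x04\x25", str_repeat( "\0", 16 ) );

echo psirIptcHashStatus( $good ), "\n"; // iptc-good-hash
echo psirIptcHashStatus( $bad ), "\n";  // iptc-bad-hash
```

The sample IPTC payload is a single IIM dataset (record 2, dataset 5) with an odd length. This exercises the even-length padding that the diff handles with `$lenData['len']++`.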
Index: branches/img_metadata/phase3/includes/media/PNG.php
@@ -12,22 +12,32 @@
1313 class PNGHandler extends BitmapHandler {
1414
1515 function getMetadata( $image, $filename ) {
16 - if ( !isset($image->parsedPNGMetadata) ) {
17 - try {
18 - $image->parsedPNGMetadata = PNGMetadataExtractor::getMetadata( $filename );
19 - } catch( Exception $e ) {
20 - // Broken file?
21 - wfDebug( __METHOD__ . ': ' . $e->getMessage() . "\n" );
22 - return '0';
23 - }
 16+ try {
 17+ $metadata = BitmapMetadataHandler::PNG( $filename );
 18+ } catch( Exception $e ) {
 19+ // Broken file?
 20+ wfDebug( __METHOD__ . ': ' . $e->getMessage() . "\n" );
 21+ return '0';
2422 }
2523
26 - return serialize($image->parsedPNGMetadata);
27 -
 24+ return serialize($metadata);
2825 }
2926
3027 function formatMetadata( $image ) {
31 - return false;
 28+ $meta = $image->getMetadata();
 29+
 30+ if ( !$meta ) {
 31+ return false;
 32+ }
 33+ $meta = unserialize( $meta );
 34+ if ( !isset( $meta['metadata'] ) || count( $meta['metadata'] ) <= 1 ) {
 35+ return false;
 36+ }
 37+
 38+ if ( isset( $meta['metadata']['_MW_PNG_VERSION'] ) ) {
 39+ unset( $meta['metadata']['_MW_PNG_VERSION'] );
 40+ }
 41+ return $this->formatMetadataHelper( $meta['metadata'] );
3242 }
3343
3444 function isAnimatedImage( $image ) {
@@ -47,7 +57,20 @@
4858 wfSuppressWarnings();
4959 $data = unserialize( $metadata );
5060 wfRestoreWarnings();
51 - return (boolean) $data;
 61+ if ( $data === '0' ) {
 62+ // Do not repetitively regenerate metadata on a broken file.
 63+ return self::METADATA_GOOD;
 64+ }
 65+ if ( !$data || !is_array( $data ) ) {
 66+ wfDebug(__METHOD__ . ' invalid png metadata' );
 67+ return self::METADATA_BAD;
 68+ }
 69+
 70+ if ( !isset( $data['metadata']['_MW_PNG_VERSION'] ) ) {
 71+ wfDebug(__METHOD__ . ' old but compatible png metadata' );
 72+ return self::METADATA_COMPATIBLE;
 73+ }
 74+ return self::METADATA_GOOD;
5275 }
5376 function getLongDesc( $image ) {
5477 global $wgUser, $wgLang;
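The reworked isMetadataValid() in PNG.php separates three outcomes: good, bad (regenerate), and old but compatible. The following is a minimal sketch of that classification under stated assumptions. PngMetadataCheck and its string constants are hypothetical stand-ins for the handler's METADATA_GOOD, METADATA_BAD and METADATA_COMPATIBLE. One deliberate difference from the hunk: the sketch compares the raw string against the '0' marker before calling unserialize(). This is necessary because unserialize('0') returns false rather than '0', so a check on the unserialized value would not match the bare '0' that getMetadata() returns for broken files.

```php
<?php
/**
 * Sketch of the three-way check in the PNG.php hunk. The class and its
 * string constants are hypothetical stand-ins for the handler's
 * METADATA_GOOD / METADATA_BAD / METADATA_COMPATIBLE.
 */
final class PngMetadataCheck {
	const GOOD = 'good';
	const BAD = 'bad';
	const COMPATIBLE = 'compatible';

	public static function classify( string $metadata ): string {
		// getMetadata() stores the bare string '0' for a broken file.
		// Test it before unserialize(), which would turn it into false.
		if ( $metadata === '0' ) {
			return self::GOOD; // avoid regenerating on every request
		}
		// @: malformed input makes unserialize() emit a notice or warning.
		$data = @unserialize( $metadata );
		if ( !is_array( $data ) ) {
			return self::BAD;
		}
		// Arrays without the version stamp predate it but are still usable.
		if ( !isset( $data['metadata']['_MW_PNG_VERSION'] ) ) {
			return self::COMPATIBLE;
		}
		return self::GOOD;
	}
}

$current = serialize( [ 'metadata' => [ '_MW_PNG_VERSION' => 1 ] ] );
$older = serialize( [ 'frameCount' => 0 ] ); // hypothetical pre-version layout

echo PngMetadataCheck::classify( $current ), "\n";  // good
echo PngMetadataCheck::classify( $older ), "\n";    // compatible
echo PngMetadataCheck::classify( 'garbage' ), "\n"; // bad
```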
Index: branches/img_metadata/phase3/includes/media/Jpeg.php
@@ -12,12 +12,13 @@
1313
1414 function getMetadata ( $image, $filename ) {
1515 try {
16 - $meta = BitmapMetadataHandler::newForJpeg( $filename );
17 - $temp = $meta->getMetadataArray();
18 - if ( $temp ) {
19 - $temp['MEDIAWIKI_EXIF_VERSION'] = Exif::version();
20 - return serialize( $temp );
 16+ $meta = BitmapMetadataHandler::Jpeg( $filename );
 17+ if ( $meta ) {
 18+ $meta['MEDIAWIKI_EXIF_VERSION'] = Exif::version();
 19+ return serialize( $meta );
2120 } else {
 21+ /* FIXME, this should probably be something else to do versioning
 22+ with older files that say have no exif, but have xmp */
2223 return '0';
2324 }
2425 }
Index: branches/img_metadata/phase3/includes/AutoLoader.php
@@ -455,6 +455,7 @@
456456 'PNGMetadataExtractor' => 'includes/media/PNGMetadataExtractor.php',
457457 'SvgHandler' => 'includes/media/SVG.php',
458458 'JpegHandler' => 'includes/media/Jpeg.php',
 459+ 'JpegMetadataExtractor' => 'includes/media/JpegMetadataExtractor.php',
459460 'BitmapMetadataHandler' => 'includes/media/BitmapMetadataHandler.php',
460461 'IPTC' => 'includes/media/IPTC.php',
461462 'ThumbnailImage' => 'includes/media/MediaTransformOutput.php',
Index: branches/img_metadata/phase3/includes/DefaultSettings.php
@@ -388,6 +388,11 @@
389389 $wgShowEXIF = function_exists( 'exif_read_data' );
390390
391391 /**
 392+ * Show/extract XMP metadata (similar to above exif setting)
 393+ */
 394+$wgShowXMP = function_exists( 'xml_parser_create_ns' );
 395+
 396+/**
392397 * If to automatically update the img_metadata field
393398 * if the metadata field is outdated but compatible with the current version.
394399 * Defaults to false.
Index: branches/img_metadata/phase3/languages/messages/MessagesEn.php
@@ -3674,7 +3674,6 @@
36753675 'exif-stripbytecounts' => 'Bytes per compressed strip',
36763676 'exif-jpeginterchangeformat' => 'Offset to JPEG SOI',
36773677 'exif-jpeginterchangeformatlength' => 'Bytes of JPEG data',
3678 -'exif-transferfunction' => 'Transfer function',
36793678 'exif-whitepoint' => 'White point chromaticity',
36803679 'exif-primarychromaticities' => 'Chromaticities of primarities',
36813680 'exif-ycbcrcoefficients' => 'Color space transformation matrix coefficients',
@@ -3804,6 +3803,8 @@
38053804 'exif-datetimeexpires' => 'Do not use after',
38063805 'exif-datetimereleased' => 'Released on',
38073806 'exif-originaltransmissionref' => 'Original transmission location code',
 3807+'exif-lens' => 'Lens used',
 3808+'exif-serialnumber' => 'Serial number of camera',
38083809
38093810 # Make & model, can be wikified in order to link to the camera and model name
38103811 'exif-make-value' => '$1', # do not translate or duplicate this message to other languages

Follow-up revisions

Revision | Commit summary | Author | Date
r70922 | Follow up to r70860. Fix some stylistic info, and don't... | bawolff | 22:37, 11 August 2010

Comments

#Comment by Nikerabbit   11:17, 11 August 2010
+	static function Jpeg ( $file ) {
+	static public function PNG( $filename ) {

Space after method name. Should both of the parameters be called $filename?

+		} catch( Exception $e ) {

Can you control what exceptions are thrown? Perhaps could specialize them so that other kind of exceptions are not caught accidentally.

+$wgShowXMP = function_exists( 'xml_parser_create_ns' );

Are these ever supposed to be changed by admins? If not, why put them into DefaultSettings? Could be checked in-place, or perhaps refactored into a static helper method if necessary?

#Comment by Bawolff   20:37, 11 August 2010
>Space after method name. Should both of the parameters be called $filename?

Yep. I'll fix that in the next commit.


>Can you control what exceptions are thrown? Perhaps could specialize them so that other kind of exceptions are not caught accidentally.


The PNGMetadataExtractor class uses generic Exceptions because, according to its source comment, it is "Deliberately not using MWExceptions to avoid external dependencies, encouraging redistribution." I'm not sure how valid a reason that is, but since it was like that before I touched it, I thought it best to leave it as is.
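One way to reconcile the reviewer's request with the no-MediaWiki-dependency goal is a dedicated subclass of PHP's built-in Exception. The following is a hypothetical sketch. PNGMetadataException, extractPngMetadata() and getMetadataOrZero() are illustrative names, not existing MediaWiki code. The extractor stub only checks the 8-byte PNG signature.

```php
<?php
/**
 * Hypothetical exception type. It extends PHP's built-in Exception, so it
 * adds no MediaWiki dependency, yet callers can catch it specifically.
 */
class PNGMetadataException extends Exception {
}

/** Hypothetical extractor stub: only checks the 8-byte PNG signature. */
function extractPngMetadata( string $bytes ): array {
	if ( strncmp( $bytes, "\x89PNG\r\n\x1a\n", 8 ) !== 0 ) {
		throw new PNGMetadataException( 'Not a PNG file' );
	}
	return [ 'metadata' => [] ];
}

function getMetadataOrZero( string $bytes ): string {
	try {
		return serialize( extractPngMetadata( $bytes ) );
	} catch ( PNGMetadataException $e ) {
		// Only extraction failures map to '0'; any other exception
		// (for example a programming error) still propagates.
		return '0';
	}
}

echo getMetadataOrZero( 'not a png' ), "\n";         // 0
echo getMetadataOrZero( "\x89PNG\r\n\x1a\n" ), "\n"; // a:1:{s:8:"metadata";a:0:{}}
```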


>Are these ever supposed to be changed by admins? If not, why put them into DefaultSettings? Could be checked in-place, or perhaps refactored into a static helper method if necessary?

I actually wasn't sure whether to include that or not. I modelled it after how Exif was done. I suppose there is a slight chance someone might want to disable showing XMP data, but it does seem unlikely.
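Nikerabbit's alternative, a static helper checked in place instead of a DefaultSettings global, might look like the following sketch. XMPSupport, isAvailable() and shouldExtractXMP() are hypothetical names. The only real call is PHP's function_exists() on xml_parser_create_ns, which PHP's xml extension provides. The optional $adminEnabled flag models the opt-out mentioned above.

```php
<?php
/**
 * Hypothetical static helper replacing a $wgShowXMP-style global: the
 * capability is checked in place rather than stored in DefaultSettings.
 */
final class XMPSupport {
	/** True when PHP's xml extension (xml_parser_create_ns) is loaded. */
	public static function isAvailable(): bool {
		return function_exists( 'xml_parser_create_ns' );
	}
}

/**
 * Hypothetical call site. $adminEnabled models an optional opt-out;
 * it can only narrow what PHP itself supports, never widen it.
 */
function shouldExtractXMP( bool $adminEnabled = true ): bool {
	return $adminEnabled && XMPSupport::isAvailable();
}

var_dump( shouldExtractXMP() );        // depends on whether ext/xml is loaded
var_dump( shouldExtractXMP( false ) ); // bool(false)
```

With this shape, a site setting (if one is kept at all) cannot enable XMP extraction on a server that lacks the xml extension.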

Status & tagging log