r83374 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r83373‎ | r83374 | r83375 >
Date:08:15, 6 March 2011
Author:bawolff
Status:ok (Comments)
Tags:
Comment:
(bug 27508) SVGMetadataExtractor takes too much resources on huge svg's. Change it so it only looks at begining of file.

Add a new config variable $wgSVGMetadataCutoff (currently set to 256kb, chosen rather arbitrarily)
and only read that much of the svg file when finding metadata. In general:
*Most (non-crazy huge map) svgs aren't that big, so there'd be no change in general
*Almost all files have any relevent metadata (well except for when we look for animation tags) is at the begining of the file
before actual image data.
*At the end of the day, even if this does miss metadata in some files (which I really doubt it would), I'd consider that a better
situation then the current situation where it can take 10 minutes or have OOM to parse the likes of [[:File:Puerto_Rico_ecosystems_map-fr.svg]]

Also has parts of/parts are based on Hartman's patch from bugzilla in it.

Also changes how it recurses into child elements looking for animation, to do so only when neccesary.

Trims the results of reading values, because i was getting extra leading spaces when testing this.

Last of all, add a comment to the MediaHandler class about how the first parameter of MediaHandler::getMetadata is kind of useless.
(it confused me when I was doing this)
Modified paths:
  • /trunk/phase3/RELEASE-NOTES (modified) (history)
  • /trunk/phase3/includes/DefaultSettings.php (modified) (history)
  • /trunk/phase3/includes/media/Generic.php (modified) (history)
  • /trunk/phase3/includes/media/SVGMetadataExtractor.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/media/Generic.php
@@ -83,7 +83,8 @@
8484 /**
8585 * Get handler-specific metadata which will be saved in the img_metadata field.
8686 *
87 - * @param $image File: the image object, or false if there isn't one
 87+ * @param $image File: the image object, or false if there isn't one.
 88+ * Warning, File::getPropsFromPath might pass an (object)array() instead (!)
8889 * @param $path String: the filename
8990 * @return String
9091 */
Index: trunk/phase3/includes/media/SVGMetadataExtractor.php
@@ -47,13 +47,40 @@
4848 * @param $source String: URI from which to read
4949 */
5050 function __construct( $source ) {
 51+ global $wgSVGMetadataCutoff;
5152 $this->reader = new XMLReader();
52 - $this->reader->open( $source, null, LIBXML_NOERROR | LIBXML_NOWARNING );
5353
 54+ // Don't use $file->getSize() since file object passed to SVGHandler::getMetadata is bogus.
 55+ $size = filesize( $source );
 56+ if ( $size === false ) {
 57+ throw new MWException( "Error getting filesize of SVG." );
 58+ }
 59+
 60+ if ( $size > $wgSVGMetadataCutoff ) {
 61+ $this->debug( "SVG is $size bytes, which is bigger than $wgSVGMetadataCutoff. Truncating." );
 62+ $contents = file_get_contents( $source, false, null, -1, $wgSVGMetadataCutoff );
 63+ if ($contents === false) {
 64+ throw new MWException( 'Error reading SVG file.' );
 65+ }
 66+ $this->reader->XML( $contents, null, LIBXML_NOERROR | LIBXML_NOWARNING );
 67+ } else {
 68+ $this->reader->open( $source, null, LIBXML_NOERROR | LIBXML_NOWARNING );
 69+ }
 70+
5471 $this->metadata['width'] = self::DEFAULT_WIDTH;
5572 $this->metadata['height'] = self::DEFAULT_HEIGHT;
5673
57 - $this->read();
 74+ // Because we cut off the end of the svg making an invalid one. Complicated
 75+ // try catch thing to make sure warnings get restored. Seems like there should
 76+ // be a better way.
 77+ wfSuppressWarnings();
 78+ try {
 79+ $this->read();
 80+ } catch( Exception $e ) {
 81+ wfRestoreWarnings();
 82+ throw $e;
 83+ }
 84+ wfRestoreWarnings();
5885 }
5986
6087 /*
@@ -83,7 +110,6 @@
84111
85112 $exitDepth = $this->reader->depth;
86113 $keepReading = $this->reader->read();
87 - $skip = false;
88114 while ( $keepReading ) {
89115 $tag = $this->reader->name;
90116 $type = $this->reader->nodeType;
@@ -100,17 +126,15 @@
101127 $this->readXml( $tag, 'metadata' );
102128 } elseif ( $tag !== '#text' ) {
103129 $this->debug( "Unhandled top-level XML tag $tag" );
104 - $this->animateFilter( $tag );
105 - //$skip = true;
 130+
 131+ if ( !isset( $this->metadata['animated'] ) ) {
 132+ // Recurse into children of current tag, looking for animation.
 133+ $this->animateFilter( $tag );
 134+ }
106135 }
107136
108 - if ($skip) {
109 - $keepReading = $this->reader->next();
110 - $skip = false;
111 - $this->debug( "Skip" );
112 - } else {
113 - $keepReading = $this->reader->read();
114 - }
 137+ // Goto next element, which is sibling of current (Skip children).
 138+ $keepReading = $this->reader->next();
115139 }
116140
117141 $this->reader->close();
@@ -134,7 +158,7 @@
135159 if( $this->reader->name == $name && $this->reader->nodeType == XmlReader::END_ELEMENT ) {
136160 break;
137161 } elseif( $this->reader->nodeType == XmlReader::TEXT ){
138 - $this->metadata[$metafield] = $this->reader->value;
 162+ $this->metadata[$metafield] = trim( $this->reader->value );
139163 }
140164 $keepReading = $this->reader->read();
141165 }
Index: trunk/phase3/includes/DefaultSettings.php
@@ -675,6 +675,9 @@
676676 $wgSVGConverterPath = '';
677677 /** Don't scale a SVG larger than this */
678678 $wgSVGMaxSize = 2048;
 679+/** Don't read SVG metadata beyond this point.
 680+ * Default is 1024*256 bytes */
 681+$wgSVGMetadataCutoff = 262144;
679682
680683 /**
681684 * MediaWiki will reject HTMLesque tags in uploaded files due to idiotic browsers which can't
Index: trunk/phase3/RELEASE-NOTES
@@ -49,6 +49,8 @@
5050 * The transliteration for passwords in case they were migrated from an old Latin-1
5151 install (previous to MediaWiki 1.5) is now only done for wikis with
5252 $wgLegacyEncoding set.
 53+* (bug 27508) Add $wgSVGMetadataCutoff to limit the maximum amount of an svg we
 54+ look at when finding metadata to prevent excessive resource usage.
5355
5456 === New features in 1.18 ===
5557 * Added a special page, disabled by default, that allows users with the

Follow-up revisions

RevisionCommit summaryAuthorDate
r834871.17wmf1: MFT r82696, r83270, r83284, r83374, r83390, r83392, r83402, r83403,...catrope21:44, 7 March 2011
r85354MFT r82518, r82530, r82538, r82547, r82550, r82565, r82572, r82608, r82696, r...demon18:25, 4 April 2011

Comments

#Comment by Bawolff (talk | contribs)   08:21, 6 March 2011

One thing I don't like about this is the entire thing is in a wfSuppressErrors block, but I don't see anyway around that since it creates invalid XML when truncating.

Status & tagging log