r72023 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r72022‎ | r72023 | r72024 >
Date:13:47, 31 August 2010
Author:daniel
Status:ok (Comments)
Tags:
Comment:
Follow-up to r71944: Interoducing MimeMagic::improveTypeFromExtension() for two reasons:
a) avoid redundant inspection of file contents when validating uploads, caused by multiple calls to guessMimeType
b) deprecated obscure use of the file extension when guessing mime types, using an explicit call to improveTypeFromExtension() instead

Note that File::getPropsFromPath() will now return an additional field: $props['file-mime'] contains the mime type as determined solely from the file's content, $props['mime'] contains the type that was derived considering the file extension too.
Modified paths:
  • /trunk/phase3/includes/MimeMagic.php (modified) (history)
  • /trunk/phase3/includes/filerepo/File.php (modified) (history)
  • /trunk/phase3/includes/upload/UploadBase.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/upload/UploadBase.php
@@ -302,7 +302,7 @@
303303 * @param $mime string representing the mime
304304 * @return mixed true if the file is verified, an array otherwise
305305 */
306 - protected function verifyMimeType( $magic, $mime ) {
 306+ protected function verifyMimeType( $mime ) {
307307 global $wgVerifyMimeType;
308308 if ( $wgVerifyMimeType ) {
309309 wfDebug ( "\n\nmime: <$mime> extension: <{$this->mFinalExtension}>\n\n");
@@ -319,6 +319,8 @@
320320 $fp = fopen( $this->mTempPath, 'rb' );
321321 $chunk = fread( $fp, 256 );
322322 fclose( $fp );
 323+
 324+ $magic = MimeMagic::singleton();
323325 $extMime = $magic->guessTypesForExtension( $this->mFinalExtension );
324326 $ieTypes = $magic->getIEMimeTypes( $this->mTempPath, $chunk, $extMime );
325327 foreach ( $ieTypes as $ieType ) {
@@ -344,12 +346,9 @@
345347 $this->mFileProps = File::getPropsFromPath( $this->mTempPath, $this->mFinalExtension );
346348 $this->checkMacBinary();
347349
348 - # magically determine mime type
349 - $magic = MimeMagic::singleton();
350 - $mime = $magic->guessMimeType( $this->mTempPath, false );
351 -
352350 # check mime type, if desired
353 - $status = $this->verifyMimeType( $magic, $mime );
 351+ $mime = $this->mFileProps[ 'file-mime' ];
 352+ $status = $this->verifyMimeType( $mime );
354353 if ( $status !== true ) {
355354 return $status;
356355 }
Index: trunk/phase3/includes/MimeMagic.php
@@ -409,18 +409,70 @@
410410 return in_array( strtolower( $extension ), $types );
411411 }
412412
 413+ /** improves a mime type using the file extension. Some file formats are very generic,
 414+ * so their mime type is not very meaningful. A more useful mime type can be derived
 415+ * by looking at the file extension. Typically, this method would be called on the
 416+ * result of guessMimeType().
 417+ *
 418+ * Currently, this method does the following:
 419+ *
 420+ * If $mime is "unknown/unknown" and isRecognizableExtension( $ext ) returns false,
 421+ * return the result of guessTypesForExtension($ext).
 422+ *
 423+ * If $mime is "application/x-opc+zip" and isMatchingExtension( $ext, $mime )
 424+ * gives true, return the result of guessTypesForExtension($ext).
 425+ *
 426+ * @param $mime String: the mime type, typically guessed from a file's content.
 427+ * @param $ext String: the file extension, as taken from the file name
 428+ *
 429+ * @return string the mime type
 430+ */
 431+ function improveTypeFromExtension( $mime, $ext ) {
 432+ if ( $mime === "unknown/unknown" ) {
 433+ if( $this->isRecognizableExtension( $ext ) ) {
 434+ wfDebug( __METHOD__. ": refusing to guess mime type for .$ext file, we should have recognized it\n" );
 435+ } else {
 436+ /* Not something we can detect, so simply
 437+ * trust the file extension */
 438+ $mime = $this->guessTypesForExtension( $ext );
 439+ }
 440+ }
 441+ else if ( $mime === "application/x-opc+zip" ) {
 442+ if ( $this->isMatchingExtension( $ext, $mime ) ) {
 443+ /* A known file extension for an OPC file,
 444+ * find the proper mime type for that file extension */
 445+ $mime = $this->guessTypesForExtension( $ext );
 446+ } else {
 447+ wfDebug( __METHOD__. ": refusing to guess better type for $mime file, .$ext is not a known OPC extension.\n" );
 448+ $mime = "application/zip";
 449+ }
 450+ }
413451
 452+ if ( isset( $this->mMimeTypeAliases[$mime] ) ) {
 453+ $mime = $this->mMimeTypeAliases[$mime];
 454+ }
 455+
 456+ wfDebug(__METHOD__.": improved mime type for .$ext: $mime\n");
 457+ return $mime;
 458+ }
 459+
414460 /** mime type detection. This uses detectMimeType to detect the mime type of the file,
415461 * but applies additional checks to determine some well known file formats that may be missed
416 - * or misinterpreter by the default mime detection (namely xml based formats like XHTML or SVG).
 462+ * or misinterpreter by the default mime detection (namely XML based formats like XHTML or SVG,
 463+ * as well as ZIP based formats like OPC/ODF files).
417464 *
418465 * @param $file String: the file to check
419 - * @param $ext Mixed: the file extension, or true to extract it from the filename.
420 - * Set it to false to ignore the extension.
 466+ * @param $ext Mixed: the file extension, or true (default) to extract it from the filename.
 467+ * Set it to false to ignore the extension. DEPRECATED! Set to false, use
 468+ * improveTypeFromExtension($mime, $ext) later to improve mime type.
421469 *
422470 * @return string the mime type of $file
423471 */
424472 function guessMimeType( $file, $ext = true ) {
 473+ if( $ext ) { # TODO: make $ext default to false. Or better, remove it.
 474+ wfDebug( __METHOD__.": WARNING: use of the \$ext parameter is deprecated. Use improveTypeFromExtension(\$mime, \$ext) instead.\n" );
 475+ }
 476+
425477 $mime = $this->doGuessMimeType( $file, $ext );
426478
427479 if( !$mime ) {
@@ -432,11 +484,11 @@
433485 $mime = $this->mMimeTypeAliases[$mime];
434486 }
435487
436 - wfDebug(__METHOD__.": final mime type of $file: $mime\n");
 488+ wfDebug(__METHOD__.": guessed mime type of $file: $mime\n");
437489 return $mime;
438490 }
439491
440 - function doGuessMimeType( $file, $ext = true ) {
 492+ private function doGuessMimeType( $file, $ext ) { # TODO: remove $ext param
441493 // Read a chunk of the file
442494 wfSuppressWarnings();
443495 $f = fopen( $file, "rt" );
@@ -447,6 +499,8 @@
448500 $tail = fread( $f, 65558 ); // 65558 = maximum size of a zip EOCDR
449501 fclose( $f );
450502
 503+ wfDebug( __METHOD__ . ": analyzing head and tail of $file for magic numbers.\n" );
 504+
451505 // Hardcode a few magic number checks...
452506 $headers = array(
453507 // Multimedia...
@@ -602,11 +656,16 @@
603657 * @param $header String: some reasonably-sized chunk of file header
604658 * @param $tail String: the tail of the file
605659 * @param $ext Mixed: the file extension, or true to extract it from the filename.
606 - * Set it to false to ignore the extension.
 660+ * Set it to false (default) to ignore the extension. DEPRECATED! Set to false,
 661+ * use improveTypeFromExtension($mime, $ext) later to improve mime type.
607662 *
608663 * @return string
609664 */
610665 function detectZipType( $header, $tail = null, $ext = false ) {
 666+ if( $ext ) { # TODO: remove $ext param
 667+ wfDebug( __METHOD__.": WARNING: use of the \$ext parameter is deprecated. Use improveTypeFromExtension(\$mime, \$ext) instead.\n" );
 668+ }
 669+
611670 $mime = 'application/zip';
612671 $opendocTypes = array(
613672 'chart-template',
@@ -637,7 +696,8 @@
638697 wfDebug( __METHOD__.": detected $mime from ZIP archive\n" );
639698 } elseif( preg_match( $openxmlRegex, substr( $header, 30 ) ) ) {
640699 $mime = "application/x-opc+zip";
641 - if( $ext !== true && $ext !== false ) {
 700+ # TODO: remove the block below, as soon as improveTypeFromExtension is used everywhere
 701+ if( $ext !== true && $ext !== false ) {
642702 /** This is the mode used by getPropsFromPath
643703 * These mime's are stored in the database, where we don't really want
644704 * x-opc+zip, because we use it only for internal purposes
@@ -695,15 +755,20 @@
696756 * If no mime type can be determined, this function returns "unknown/unknown".
697757 *
698758 * @param $file String: the file to check
699 - * @param $ext Mixed: the file extension, or true to extract it from the filename.
700 - * Set it to false to ignore the extension.
 759+ * @param $ext Mixed: the file extension, or true (default) to extract it from the filename.
 760+ * Set it to false to ignore the extension. DEPRECATED! Set to false, use
 761+ * improveTypeFromExtension($mime, $ext) later to improve mime type.
701762 *
702763 * @return string the mime type of $file
703764 * @access private
704765 */
705 - function detectMimeType( $file, $ext = true ) {
 766+ private function detectMimeType( $file, $ext = true ) {
706767 global $wgMimeDetectorCommand;
707768
 769+ if( $ext ) { # TODO: make $ext default to false. Or better, remove it.
 770+ wfDebug( __METHOD__.": WARNING: use of the \$ext parameter is deprecated. Use improveTypeFromExtension(\$mime, \$ext) instead.\n" );
 771+ }
 772+
708773 $m = null;
709774 if ( $wgMimeDetectorCommand ) {
710775 $fn = wfEscapeShellArg( $file );
Index: trunk/phase3/includes/filerepo/File.php
@@ -1186,7 +1186,14 @@
11871187 if ( $info['fileExists'] ) {
11881188 $magic = MimeMagic::singleton();
11891189
1190 - $info['mime'] = $magic->guessMimeType( $path, $ext );
 1190+ if ( $ext === true ) {
 1191+ $i = strrpos( $path, '.' );
 1192+ $ext = strtolower( $i ? substr( $path, $i + 1 ) : '' );
 1193+ }
 1194+
 1195+ $info['file-mime'] = $magic->guessMimeType( $path, false ); # mime type according to file contents
 1196+ $info['mime'] = $magic->improveTypeFromExtension( $info['file-mime'], $ext ); # logical mime type
 1197+
11911198 list( $info['major_mime'], $info['minor_mime'] ) = self::splitMime( $info['mime'] );
11921199 $info['media_type'] = $magic->getMediaType( $path, $info['mime'] );
11931200

Follow-up revisions

RevisionCommit summaryAuthorDate
r72889Fixed some overly-long lines, mostly from r72023.tstarling03:10, 13 September 2010

Past revisions this follows-up on

RevisionCommit summaryAuthorDate
r71944as per r71789: don't guess mime type again, that's expensive. Re-use the valu...daniel12:51, 30 August 2010

Comments

#Comment by Duesentrieb (talk | contribs)   13:48, 31 August 2010

recommending merge to the live site, since this patch will significantly reduce the number of file system operations performed on every upload.

#Comment by TheDJ (talk | contribs)   14:36, 31 August 2010

File.php should use this->getExtension() I presume. Wondering wether it is safe to use file-mime for all the other options in verifyFile. It's good for verifyMimeType(), and probably harmless for the rest of the function, but it requires checking.

#Comment by Duesentrieb (talk | contribs)   12:16, 3 September 2010

using this->getExtension() would be nice, but getPropsFromPath is a static method that takes the path as a parameter. I don't see how it could be done. Unless I move the actual logic to a static getExtensionFromPath().

#Comment by MaxSem (talk | contribs)   18:10, 31 August 2010

Probably, deduceTypeFromExtension() is a more clear name?

#Comment by TheDJ (talk | contribs)   11:58, 1 September 2010

well we already have guessTypesForExtension(). This function really does some unrelated things. take care of some corner cases, resolve aliases etc.

#Comment by Tim Starling (talk | contribs)   07:00, 13 September 2010

This can't be trivially backported to 1.16wmf4 since it depends on r65152, r68279 and r68873. If you write and test an equivalent test with a 1.16wmf4 base, then maybe I can apply that instead.

#Comment by Tim Starling (talk | contribs)   07:04, 13 September 2010

s/equivalent test/equivalent patch/

Status & tagging log