r100535 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r100535
Date: 09:36, 23 October 2011
Author: aaron
Status: resolved
Tags:
Comment:
Added a basic thumb-handler.php file, configured via thumb.config.php. The code is based on the wmf thumb handler, but simplified. It is disabled by default.
* The thumb.php parameter extraction can also be overridden by the config to handle more complex setups and things like OggHandler and PagedTiffHandler.
* A simple 404 error page is also included. It can be overridden by the config.
* Additional HTTP headers can be passed through cURL via the config.
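
The header hook mentioned in the last bullet is called in the diff as `wfCustomThumbRequestHeaders( $headers )`, so a custom implementation only takes effect if it declares its parameter by reference. A minimal sketch of such a function, as it might be placed in thumb.config.php; the choice of `X-Forwarded-For` and the use of `REMOTE_ADDR` are assumptions for illustration, not part of the commit:

```php
<?php
/**
 * Hypothetical header hook for thumb.config.php.
 * The parameter must be taken by reference, otherwise the
 * additions are lost when the function returns.
 */
function wfCustomThumbRequestHeaders( array &$headers ) {
	if ( !empty( $_SERVER['REMOTE_ADDR'] ) ) {
		// Strip newlines, as thumb-handler.php does for its own headers.
		$headers[] = 'X-Forwarded-For: ' .
			str_replace( "\n", '', $_SERVER['REMOTE_ADDR'] );
	}
}
```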
Modified paths:
  • /trunk/phase3/404.php (added)
  • /trunk/phase3/thumb-handler.php (added)
  • /trunk/phase3/thumb.config.sample (added)

Diff

Index: trunk/phase3/404.php
@@ -0,0 +1,29 @@
 2+<?php
 3+
 4+header( 'HTTP/1.1 404 Not Found' );
 5+header( 'Content-Type: text/html;charset=utf-8' );
 6+
 7+# $_SERVER['REQUEST_URI'] has two different definitions depending on PHP version
 8+if ( preg_match( '!^([a-z]*://)([a-z.]*)(/.*)$!', $_SERVER['REQUEST_URI'], $matches ) ) {
 9+ $prot = $matches[1];
 10+ $serv = $matches[2];
 11+ $loc = $matches[3];
 12+} else {
 13+ $prot = "http://";
 14+ $serv = strlen( $_SERVER['HTTP_HOST'] ) ? $_SERVER['HTTP_HOST'] : $_SERVER['SERVER_NAME'];
 15+ $loc = $_SERVER["REQUEST_URI"];
 16+}
 17+$encUrl = htmlspecialchars( $prot . $serv . $loc );
 18+
 19+// Looks like a typical apache2 error
 20+$standard_404 = <<<ENDTEXT
 21+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
 22+<html><head>
 23+<title>404 Not Found</title>
 24+</head><body>
 25+<h1>Not Found</h1>
 26+<p>The requested URL $encUrl was not found on this server.</p>
 27+</body></html>
 28+ENDTEXT;
 29+
 30+echo $standard_404;
Property changes on: trunk/phase3/404.php
___________________________________________________________________
Added: svn:eol-style
   + native
Index: trunk/phase3/thumb.config.sample
@@ -0,0 +1,39 @@
 2+<?php
 3+/**
 4+ * @cond file_level_code
 5+ * This is not a valid entry point, perform no further processing unless THUMB_HANDLER is defined
 6+ */
 7+if ( !defined( 'THUMB_HANDLER' ) ) {
 8+ echo "This file is part of MediaWiki and is not a valid entry point\n";
 9+ die( 1 );
 10+}
 11+
 12+/**
 13+ * Sample configuration file for thumb-handler.php.
 14+ * In order to use thumb-handler.php:
 15+ * 1) Copy this file to thumb.config.php and modify the settings.
 16+ * 2) The webserver must be set up to have thumb-handler.php as a 404 handler.
 17+ * This can be done in apache by editing .htaccess in the /thumb directory by adding:
 18+ * ErrorDocument 404 /path/to/thumb-handler.php
 19+ */
 20+
 21+# URL name of the server (e.g. "upload.wikipedia.org").
 22+$thgThumbServer = "http://localhost";
 23+# URL fragment after the server name to the thumb directory
 24+$thgThumbFragment = "MW_trunk/images/thumb";
 25+# URL regex fragment corresponding to the directory hashing of thumbnails.
 26+# This must correspond to $wgLocalFileRepo['hashLevels'].
 27+$thgThumbHashFragment = '\w/\w\w/'; // 2-level directory hashing
 28+
 29+# The URL to thumb.php, accessible from the web server.
 30+$thgThumbScriptPath = "http://localhost/MW_trunk/thumb.php";
 31+
 32+# Timeout to use for cURL request to thumb.php.
 33+# Leave it long enough to generate a ulimit timeout in ordinary
 34+# cases, but short enough to avoid a local PHP timeout.
 35+$thgThumbCurlTimeout = 53;
 36+# Optional proxy server to use to access thumb.php
 37+$thgThumbCurlProxy = null; // proxy to thumb.php
 38+
 39+# File path to a PHP file that gives a 404 error message
 40+$thgThumb404File = "404.php";
Index: trunk/phase3/thumb-handler.php
@@ -0,0 +1,230 @@
 2+<?php
 3+
 4+# Valid web server entry point
 5+define( 'THUMB_HANDLER', true );
 6+
 7+# Load thumb-handler configuration. We don't want to use
 8+# WebStart.php or the like as it would kill performance.
 9+$configPath = dirname( __FILE__ ) . "/thumb.config.php";
 10+if ( !file_exists( $configPath ) ) {
 11+ die( "Thumb-handler.php is not enabled for this wiki.\n" );
 12+}
 13+require( $configPath );
 14+
 15+function wfHandleThumb404() {
 16+ global $thgThumb404File;
 17+
 18+ # lighttpd puts the original request in REQUEST_URI, while
 19+ # sjs sets that to the 404 handler, and puts the original
 20+ # request in REDIRECT_URL.
 21+ if ( isset( $_SERVER['REDIRECT_URL'] ) ) {
 22+ # The URL is un-encoded, so put it back how it was.
 23+ $uri = str_replace( "%2F", "/", urlencode( $_SERVER['REDIRECT_URL'] ) );
 24+ } else {
 25+ $uri = $_SERVER['REQUEST_URI'];
 26+ }
 27+
 28+ # Extract thumb.php params from the URI.
 29+ if ( function_exists( 'wfCustomExtractThumbParams' ) ) {
 30+ $params = wfCustomExtractThumbParams( $uri ); // overridden by configuration
 31+ } else {
 32+ $params = wfExtractThumbParams( $uri ); // basic wiki URL param extracting
 33+ }
 34+ if ( $params === null ) { // not a valid thumb request
 35+ header( 'X-Debug: no regex match' ); // useful for debugging
 36+ require_once( $thgThumb404File ); // standard 404 message
 37+ return;
 38+ }
 39+
 40+ # Do some basic checks on the filename...
 41+ if ( preg_match( '/[\x80-\xff]/', $uri ) ) {
 42+ header( 'HTTP/1.0 400 Bad request' );
 43+ header( 'Content-Type: text/html' );
 44+ echo "<html><head><title>Bad request</title></head><body>" .
 45+ "The URI contained bytes with the high bit set, this is not allowed." .
 46+ "</body></html>";
 47+ return;
 48+ } elseif ( strpos( $params['f'], '%20' ) !== false ) {
 49+ header( 'HTTP/1.0 404 Not found' );
 50+ header( 'Content-Type: text/html' );
 51+ header( 'X-Debug: filename contains a space' ); // useful for debugging
 52+ echo "<html><head><title>Not found</title></head><body>" .
 53+ "The URL contained spaces, we don't have any thumbnail files with spaces." .
 54+ "</body></html>";
 55+ return;
 56+ }
 57+
 58+ wfStreamThumbViaCurl( $params, $uri );
 59+}
 60+
 61+/**
 62+ * Extract the required params for thumb.php from the thumbnail request URI.
 63+ * At least 'width' and 'f' should be set if the result is an array.
 64+ *
 65+ * @param $uri String Thumbnail request URI
 66+ * @return Array|null associative params array or null
 67+ */
 68+function wfExtractThumbParams( $uri ) {
 69+ global $thgThumbServer, $thgThumbFragment, $thgThumbHashFragment;
 70+
 71+ $thumbRegex = '!^(?:' . preg_quote( $thgThumbServer ) . ')?/' .
 72+ preg_quote( $thgThumbFragment ) . '(/archive|/temp|)/' .
 73+ $thgThumbHashFragment . '([^/]*)/' . '(page(\d*)-)*(\d*)px-([^/]*)$!';
 74+
 75+ # Is this a thumbnail?
 76+ if ( preg_match( $thumbRegex, $uri, $matches ) ) {
 77+ list( $all, $archOrTemp, $filename, $pagefull, $pagenum, $size, $fn2 ) = $matches;
 78+ $params = array( 'f' => $filename, 'width' => $size );
 79+ if ( $pagenum ) {
 80+ $params['page'] = $pagenum;
 81+ }
 82+ if ( $archOrTemp == '/archive' ) {
 83+ $params['archived'] = 1;
 84+ } elseif ( $archOrTemp == '/temp' ) {
 85+ $params['temp'] = 1;
 86+ }
 87+ } else {
 88+ $params = null;
 89+ }
 90+
 91+ return $params;
 92+}
 93+
 94+/**
 95+ * cURL to thumb.php and stream back the resulting file or give an error message.
 96+ *
 97+ * @param $params Array Parameters to thumb.php
 98+ * @param $uri String Thumbnail request URI
 99+ * @return void
 100+ */
 101+function wfStreamThumbViaCurl( array $params, $uri ) {
 102+ global $thgThumbScriptPath, $thgThumbCurlProxy, $thgThumbCurlTimeout;
 103+
 104+ if ( !function_exists( 'curl_init' ) ) {
 105+ header( 'HTTP/1.0 404 Not found' );
 106+ header( 'Content-Type: text/html' );
 107+ header( 'X-Debug: cURL is not enabled' ); // useful for debugging
 108+ echo "<html><head><title>Not found</title></head><body>" .
 109+ "cURL is not enabled for PHP on this wiki. Unable to send request to thumb.php." .
 110+ "</body></html>";
 111+ return;
 112+ }
 113+
 114+ # Build up the request URL to use with CURL...
 115+ $reqURL = "{$thgThumbScriptPath}?";
 116+ $first = true;
 117+ foreach ( $params as $name => $value ) {
 118+ if ( $first ) {
 119+ $first = false;
 120+ } else {
 121+ $reqURL .= '&';
 122+ }
 123+ // Note: value is already urlencoded
 124+ $reqURL .= "$name=$value";
 125+ }
 126+
 127+ $ch = curl_init( $reqURL );
 128+ if ( $thgThumbCurlProxy ) {
 129+ curl_setopt( $ch, CURLOPT_PROXY, $thgThumbCurlProxy );
 130+ }
 131+
 132+ $headers = array(); // HTTP headers
 133+ # Set certain headers...
 134+ $headers[] = "X-Original-URI: " . str_replace( "\n", '', $uri );
 135+ if ( function_exists( 'wfCustomThumbRequestHeaders' ) ) {
 136+ wfCustomThumbRequestHeaders( $headers ); // add on any custom headers (like XFF)
 137+ }
 138+ # Pass through some other headers...
 139+ $passthrough = array( 'If-Modified-Since', 'Referer', 'User-Agent' );
 140+ foreach ( $passthrough as $headerName ) {
 141+ $serverVarName = 'HTTP_' . str_replace( '-', '_', strtoupper( $headerName ) );
 142+ if ( !empty( $_SERVER[$serverVarName] ) ) {
 143+ $headers[] = $headerName . ': ' .
 144+ str_replace( "\n", '', $_SERVER[$serverVarName] );
 145+ }
 146+ }
 147+
 148+ curl_setopt( $ch, CURLOPT_HTTPHEADER, $headers );
 149+ curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
 150+ curl_setopt( $ch, CURLOPT_TIMEOUT, $thgThumbCurlTimeout );
 151+
 152+ # Actually make the request
 153+ $text = curl_exec( $ch );
 154+
 155+ # Send it on to the client
 156+ $errno = curl_errno( $ch );
 157+ $contentType = curl_getinfo( $ch, CURLINFO_CONTENT_TYPE );
 158+ $httpCode = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
 159+ if ( $errno ) {
 160+ header( 'HTTP/1.1 500 Internal server error' );
 161+ header( 'Cache-Control: no-cache' );
 162+ list( $text, $contentType ) = wfCurlErrorText( $ch );
 163+ } elseif ( $httpCode == 304 ) {
 164+ header( 'HTTP/1.1 304 Not modified' );
 165+ $contentType = '';
 166+ $text = '';
 167+ } elseif ( strval( $text ) == '' ) {
 168+ header( 'HTTP/1.1 500 Internal server error' );
 169+ header( 'Cache-Control: no-cache' );
 170+ list( $text, $contentType ) = wfCurlEmptyText( $ch );
 171+ } elseif ( $httpCode == 404 ) {
 172+ header( 'HTTP/1.1 404 Not found' );
 173+ header( 'Cache-Control: s-maxage=300, must-revalidate, max-age=0' );
 174+ } elseif ( $httpCode != 200
 175+ || substr( $contentType, 0, 9 ) == 'text/html'
 176+ || substr( $text, 0, 5 ) == '<html' )
 177+ {
 178+ # Error message, suppress cache
 179+ header( 'HTTP/1.1 500 Internal server error' );
 180+ header( 'Cache-Control: no-cache' );
 181+ }
 182+
 183+ if ( !$contentType ) {
 184+ header( 'Content-Type:' );
 185+ } else {
 186+ header( "Content-Type: $contentType" );
 187+ }
 188+
 189+ print $text; // thumb data or error text
 190+
 191+ curl_close( $ch );
 192+}
 193+
 194+/**
 195+ * Get error message and content type for when the cURL request fails with an error.
 196+ *
 197+ * @param $ch cURL handle
 198+ * @return Array (error html, content type)
 199+ */
 200+function wfCurlErrorText( $ch ) {
 201+ $contentType = 'text/html';
 202+ $error = htmlspecialchars( curl_error( $ch ) );
 203+ $text = <<<EOT
 204+<html>
 205+<head><title>Thumbnail error</title></head>
 206+<body>Error retrieving thumbnail from scaling server: $error</body>
 207+</html>
 208+EOT;
 209+ return array( $text, $contentType );
 210+}
 211+
 212+/**
 213+ * Get error message and content type for when the cURL response is empty.
 214+ *
 215+ * @param $ch cURL handle
 216+ * @return Array (error html, content type)
 217+ */
 218+function wfCurlEmptyText( $ch ) {
 219+ $contentType = 'text/html';
 220+ $error = htmlspecialchars( curl_error( $ch ) );
 221+ $text = <<<EOT
 222+<html>
 223+<head><title>Thumbnail error</title></head>
 224+<body>Error retrieving thumbnail from scaling server: empty response</body>
 225+</html>
 226+EOT;
 227+ return array( $text, $contentType );
 228+}
 229+
 230+# Entry point
 231+wfHandleThumb404();
Property changes on: trunk/phase3/thumb-handler.php
___________________________________________________________________
Added: svn:eol-style
   + native
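
To illustrate how the URI matching in wfExtractThumbParams() behaves, here is a standalone sketch using the values from thumb.config.sample. It is not the committed code: the function name `demoExtractThumbParams` is hypothetical, the settings are passed as arguments rather than read from globals, and the `!` delimiter is passed to `preg_quote()`, which the original does not do.

```php
<?php
/**
 * Standalone copy of the matching logic, for experimentation only.
 * Returns the thumb.php parameters, or null if the URI is not a thumbnail.
 */
function demoExtractThumbParams( $uri, $server, $fragment, $hashFragment ) {
	$thumbRegex = '!^(?:' . preg_quote( $server, '!' ) . ')?/' .
		preg_quote( $fragment, '!' ) . '(/archive|/temp|)/' .
		$hashFragment . '([^/]*)/(page(\d*)-)*(\d*)px-([^/]*)$!';

	if ( !preg_match( $thumbRegex, $uri, $m ) ) {
		return null; // not a thumbnail URI
	}
	// $m[1] = archive/temp prefix, $m[2] = file name,
	// $m[4] = page number, $m[5] = width
	$params = array( 'f' => $m[2], 'width' => $m[5] );
	if ( $m[4] ) {
		$params['page'] = $m[4];
	}
	if ( $m[1] == '/archive' ) {
		$params['archived'] = 1;
	} elseif ( $m[1] == '/temp' ) {
		$params['temp'] = 1;
	}
	return $params;
}

// Settings as in thumb.config.sample.
var_export( demoExtractThumbParams(
	'/MW_trunk/images/thumb/a/ab/Example.jpg/120px-Example.jpg',
	'http://localhost', 'MW_trunk/images/thumb', '\w/\w\w/'
) );
```

With these settings a plain thumbnail URI yields only `f` and `width`; a `pageN-` prefix adds `page`, and an `/archive` or `/temp` path segment adds the corresponding flag.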

Sign-offs

User | Flag | Date
RussNelson | inspected | 19:31, 28 October 2011

Follow-up revisions

Revision | Commit summary | Author | Date
r100577 | FU r100535:... | aaron | 03:08, 24 October 2011
r100782 | FU r100535:... | aaron | 04:13, 26 October 2011
r101077 | FU r100535: renamed thumb-handler.php to match the other files here | aaron | 22:42, 27 October 2011
r101117 | FU r100535:... | aaron | 05:12, 28 October 2011

Comments

#Comment by MaxSem (talk | contribs)   17:29, 25 October 2011
  1. Lacks .php5 support.
  2. Why mimic Apache errors with funny HTML 2.0?
#Comment by Platonides (talk | contribs)   15:17, 27 October 2011

Why preg_match and not parse_url?

MWHttpRequest should be preferable to curl.
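
As an illustration of the parse_url question above, the URL reconstruction in 404.php could be sketched roughly as follows. This is an assumption about what such a variant might look like, not code from any revision; `demoBuildRequestUrl` is a hypothetical name, and the server variables are passed in as an array so the sketch can be run outside a web server.

```php
<?php
/**
 * Hypothetical helper: rebuild the absolute request URL with parse_url()
 * instead of a hand-written regex.
 *
 * @param $server Array with REQUEST_URI, HTTP_HOST and SERVER_NAME keys
 * @return String absolute URL (not HTML-escaped)
 */
function demoBuildRequestUrl( array $server ) {
	$parts = parse_url( $server['REQUEST_URI'] );
	if ( $parts !== false && isset( $parts['scheme'], $parts['host'] ) ) {
		// REQUEST_URI already holds an absolute URL.
		return $server['REQUEST_URI'];
	}
	// Otherwise fall back to the host header, as 404.php does.
	$host = !empty( $server['HTTP_HOST'] )
		? $server['HTTP_HOST']
		: $server['SERVER_NAME'];
	return 'http://' . $host . $server['REQUEST_URI'];
}

// Escape before embedding in the error page, as 404.php does.
$encUrl = htmlspecialchars( demoBuildRequestUrl( array(
	'REQUEST_URI' => '/images/thumb/a/ab/Example.jpg/120px-Example.jpg',
	'HTTP_HOST' => 'example.org',
	'SERVER_NAME' => 'example.org',
) ) );
```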

#Comment by Aaron Schulz (talk | contribs)   16:08, 27 October 2011

MWHttpRequest requires loading MW, which is too slow.

#Comment by Platonides (talk | contribs)   19:44, 27 October 2011

Not so much, it only seems to need Status, CookieJar, MWException and a few globals. I think the cost would be the configuration loading.

#Comment by Aaron Schulz (talk | contribs)   19:46, 27 October 2011

That's what I mean, going through WebStart.php and all that stuff.

#Comment by Platonides (talk | contribs)   19:56, 27 October 2011

WebStart? That isn't needed.

#Comment by Aaron Schulz (talk | contribs)   20:04, 27 October 2011

MWHttpRequest and the file it belongs to use globals like $wgHTTPProxy. You have to load autoloader, defaultsettings, localsettings, and everything else...pretty much what doMaintenance.php and most of WebStart.php do.

#Comment by Platonides (talk | contribs)   20:37, 27 October 2011

It uses the globals $wgCommandLineMode, $wgConf (this one may be problematic for a big site like WMF), $wgVersion, $wgHTTPTimeout, $wgHTTPProxy, $wgTitle (can be set to null)

#Comment by Platonides (talk | contribs)   20:58, 27 October 2011

Most files use CamelCase, with a few being lowercase in phase3. And a couple of odd ones have an underscore (img_auth.php and opensearch_desc.php).

Yet the files added here use two completely new conventions: thumb.config.sample and thumb-handler.php.

#Comment by Aaron Schulz (talk | contribs)   21:14, 27 October 2011

Would you prefer thumb_handler.php? I'd be OK with renaming it to that.

#Comment by Platonides (talk | contribs)   22:17, 27 October 2011

Yes, it would be more consistent. I'm not convinced of the usefulness of this for the end user, though. You have thumb.php and thumb-handler.php files, yet they are not needed for anything by default.

#Comment by Aaron Schulz (talk | contribs)   22:38, 27 October 2011

Of course neither can be used by default, as a few lines of server config are needed. In any case, I'd prefer that we push thumbnail handling more than we currently do.

It's better to use 'transformVia404' to avoid a server having to make thumbnails while already saddled with parsing a page. In functions like prepareTextForEdit() or things that parse just to do links updates it's even more wasteful.

Using a 404 handler also makes thumbnails more robust, such as when a thumbnail wasn't made on parse for some reason due to intermittent failure.

Also, we should move towards standardizing thumb names more, as things like TiffHandler and OggHandler tack their own little bits onto the thumb name to specify parameters. We can do the [thumb name] -> [thumb.php parameter] handling in PHP to deal with this (given a way to register extensions in the thumbnail config). It also means that site admins don't have to keep messing with server config to make things work.
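
A rough sketch of the [thumb name] -> [thumb.php parameter] handling described above, written as the `wfCustomExtractThumbParams()` override that thumb-handler.php looks for. The thumbnail naming scheme, the `/images/thumb` path, and the `lossy` parameter are assumptions made up for illustration; they are not taken from any particular media handler.

```php
<?php
/**
 * Hypothetical override for thumb.config.php.
 * Assumes names of the form "[lossy-|lossless-][pageN-]<width>px-<name>"
 * under /images/thumb/<x>/<xy>/<file>/ (illustrative layout only).
 *
 * @param $uri String Thumbnail request URI
 * @return Array|null thumb.php params, or null if not a thumbnail
 */
function wfCustomExtractThumbParams( $uri ) {
	$regex = '!^/images/thumb/\w/\w\w/([^/]+)/' .
		'(?:(lossy|lossless)-)?(?:page(\d+)-)?(\d+)px-[^/]+$!';
	if ( !preg_match( $regex, $uri, $m ) ) {
		return null; // thumb-handler.php then serves the 404 page
	}
	// Values are appended to the thumb.php URL as-is, so they must
	// already be URL-encoded (they come straight from the request URI).
	$params = array( 'f' => $m[1], 'width' => $m[4] );
	if ( $m[2] !== '' ) {
		$params['lossy'] = ( $m[2] === 'lossy' ) ? '1' : '0';
	}
	if ( $m[3] !== '' ) {
		$params['page'] = $m[3];
	}
	return $params;
}
```

Because the override lives in the PHP config, supporting another naming scheme means editing this function rather than the web server's rewrite rules.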

#Comment by Platonides (talk | contribs)   16:39, 28 October 2011

Sure, but simple sites would be served by just using thumb.php. I am not convinced it's appropriate to also have thumb-handler here (of course, we can link to it when documenting this).

#Comment by RussNelson (talk | contribs)   19:36, 28 October 2011

Even simple sites might like to have a 404 handler. It's kinda useful to be able to just link to whatever size thumbnail you want.

#Comment by Platonides (talk | contribs)   21:13, 28 October 2011

Sure, but I don't think they would need to pass that to another server.

#Comment by Platonides (talk | contribs)   21:12, 27 October 2011

Why does it use curl instead of calling thumb.php itself? Is it for having a different pool of scalers? Something like an internal redirect looks more appropriate.

#Comment by Aaron Schulz (talk | contribs)   21:18, 27 October 2011

See http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/upload-scripts/thumb-handler.php?revision=100362&view=markup

Yes, it's used to forward requests to an LVS for dedicated thumb scalers (with proper image tools and RAM amounts installed).

#Comment by Platonides (talk | contribs)   22:14, 27 October 2011

Why didn't you preserve history when adding the files?

Still, a url rewrite could have been more suited to the case.

#Comment by Aaron Schulz (talk | contribs)   03:43, 8 December 2011

This was basically totally redone in r105512.
