r105512 MediaWiki - Code Review archive

Repository: MediaWiki
Revision: r105512 (previous: r105511, next: r105513)
Date: 03:43, 8 December 2011
Author: aaron
Status: resolved
Tags:
Comment:
FU r101117: removed cURL thumb handler code and made thumb_handler.php a thin wrapper around thumb.php
* Moved original URL fetching code and parameter extraction code to thumb.php
* Made use of local repo URL and hash settings to avoid extra config code
* This makes it easy to add hooks for extensions/config to alter behavior (ExtractThumbParameters hook added)
* Added FileRepo::getHashLevels()
Modified paths:
  • /trunk/phase3/docs/hooks.txt (modified)
  • /trunk/phase3/includes/filerepo/FileRepo.php (modified)
  • /trunk/phase3/thumb.config.sample (deleted)
  • /trunk/phase3/thumb.php (modified)
  • /trunk/phase3/thumb_handler.php (modified)

Diff

Index: trunk/phase3/thumb.config.sample
@@ -1,49 +0,0 @@
-<?php
-/**
- * @cond file_level_code
- * This is not a valid entry point, perform no further processing unless THUMB_HANDLER is defined
- */
-if ( !defined( 'THUMB_HANDLER' ) ) {
-    echo "This file is part of MediaWiki and is not a valid entry point\n";
-    die( 1 );
-}
-
-/**
- * Sample configuration file for thumb-handler.php.
- * In order to use thumb-handler.php:
- * 1) Copy this file to thumb.config.php and modify the settings.
- * 2) The webserver must be setup to have thumb-handler.php as a 404 handler.
- * This can be done in apache by editing .htaccess in the /thumb directory by adding:
- * <IfModule rewrite_module>
- * RewriteEngine on
- * RewriteCond %{REQUEST_FILENAME} !-f
- * RewriteCond %{REQUEST_FILENAME} !-d
- * RewriteRule ^([^/]+/)?[0-9a-f]/[0-9a-f][0-9a-f]/[^/]+/[^/]+$ /path/to/thumb_handler.php [L]
- * </IfModule>
- */
-
-$thgThumbUrlMatch = array(
-    # URL name of the server (e.g. "upload.wikipedia.org").
-    'server' => 'http://localhost',
-    # URL fragment to the thumb/ directory
-    'dirFragment' => 'MW_trunk/images/thumb',
-    # URL regex fragment correspond to the directory hashing of thumbnails.
-    # This must correspond to $wgLocalFileRepo['hashLevels'].
-    'hashFragment' => '[0-9a-f]/[0-9a-f][0-9a-f]/' // 2-level directory hashing
-);
-
-$thgThumbCurlConfig = array(
-    # Optionally cURL to thumb.php instead of using it directly
-    'enabled' => false,
-    # The URL to thumb.php, accessible from the web server.
-    'url' => 'http://localhost/MW_trunk/thumb.php',
-    # Optional proxy server to use to access thumb.php
-    'proxy' => null,
-    # Timeout to use for cURL request to thumb.php.
-    # Leave it long enough to generate a ulimit timeout in ordinary
-    # cases, but short enough to avoid a local PHP timeout.
-    'timeout' => 53
-);
-
-# Custom functions for overriding aspects of thumb handling
-$thgThumbCallbacks = array();
Index: trunk/phase3/docs/hooks.txt
@@ -867,6 +867,10 @@
 'ExtensionTypes': called when generating the extensions credits, use this to change the tables headers
 &$extTypes: associative array of extensions types
 
+'ExtractThumbParameters': called when extracting thumbnail parameters from a thumbnail file name
+$thumbname: the base name of the thumbnail file
+&$params: the currently extracted params (has source name, temp or archived zone)
+
 'FetchChangesList': When fetching the ChangesList derivative for
 a particular user
 $user: User the list is being fetched for
Index: trunk/phase3/thumb_handler.php
@@ -3,242 +3,21 @@
 # Valid web server entry point
 define( 'THUMB_HANDLER', true );
 
-# Load thumb-handler configuration. Avoids WebStart.php for performance.
-if ( !file_exists( dirname( __FILE__ ) . "/thumb.config.php" ) ) {
-    die( "thumb_handler.php is not enabled for this wiki.\n" );
-}
-require( dirname( __FILE__ ) . "/thumb.config.php" );
-
-# Execute thumb.php if not handled via cURL
-if ( wfHandleThumb404Main() === 'wfThumbMain' ) {
+if ( $_SERVER['REQUEST_URI'] === $_SERVER['SCRIPT_NAME'] ) {
+    # Directly requesting this script is not a use case.
+    # Instead of giving a thumbnail error, give a generic 404.
+    wfDisplay404Error(); // go away, nothing to see here
+} else {
+    # Execute thumb.php, having set THUMB_HANDLER so that
+    # it knows to extract params from a thumbnail file URL.
     require( dirname( __FILE__ ) . '/thumb.php' );
 }
 
-function wfHandleThumb404Main() {
-    global $thgThumbCallbacks, $thgThumbCurlConfig;
-
-    # lighttpd puts the original request in REQUEST_URI, while
-    # sjs sets that to the 404 handler, and puts the original
-    # request in REDIRECT_URL.
-    if ( isset( $_SERVER['REDIRECT_URL'] ) ) {
-        # The URL is un-encoded, so put it back how it was.
-        $uri = str_replace( "%2F", "/", urlencode( $_SERVER['REDIRECT_URL'] ) );
-    } else {
-        $uri = $_SERVER['REQUEST_URI'];
-    }
-
-    # Extract thumb.php params from the URI...
-    if ( isset( $thgThumbCallbacks['extractParams'] )
-        && is_callable( $thgThumbCallbacks['extractParams'] ) ) // overridden by configuration?
-    {
-        $params = call_user_func_array( $thgThumbCallbacks['extractParams'], array( $uri ) );
-    } else {
-        $params = wfExtractThumbParams( $uri ); // basic wiki URL param extracting
-    }
-
-    # Show 404 error if this is not a valid thumb request...
-    if ( !is_array( $params ) ) {
-        header( 'X-Debug: no regex match' ); // useful for debugging
-        if ( isset( $thgThumbCallbacks['error404'] )
-            && is_callable( $thgThumbCallbacks['error404'] ) ) // overridden by configuration?
-        {
-            call_user_func( $thgThumbCallbacks['error404'] );
-        } else {
-            wfDisplay404Error(); // standard 404 message
-        }
-        return;
-    }
-
-    # Obtain and stream the thumbnail or setup for wfThumbMain() call...
-    if ( $thgThumbCurlConfig['enabled'] ) {
-        wfStreamThumbViaCurl( $params, $uri );
-        return true; // done
-    } else {
-        $_REQUEST = $params; // pass params to thumb.php
-        return 'wfThumbMain';
-    }
-}
-
 /**
- * Extract the required params for thumb.php from the thumbnail request URI.
- * At least 'width' and 'f' should be set if the result is an array.
+ * Print out a generic 404 error message
  *
- * @param $uri String Thumbnail request URI
- * @return Array|null associative params array or null
- */
-function wfExtractThumbParams( $uri ) {
-    global $thgThumbUrlMatch;
-
-    $thumbRegex = '!^(?:' . preg_quote( $thgThumbUrlMatch['server'] ) . ')?/' .
-        preg_quote( $thgThumbUrlMatch['dirFragment'] ) . '(/archive|/temp|)/' .
-        $thgThumbUrlMatch['hashFragment'] . '([^/]*)/(page(\d*)-)*(\d*)px-[^/]*$!';
-
-    if ( preg_match( $thumbRegex, $uri, $matches ) ) {
-        list( $all, $archOrTemp, $filename, $pagefull, $pagenum, $size ) = $matches;
-        $params = array( 'f' => $filename, 'width' => $size );
-        if ( $pagenum ) {
-            $params['page'] = $pagenum;
-        }
-        if ( $archOrTemp == '/archive' ) {
-            $params['archived'] = 1;
-        } elseif ( $archOrTemp == '/temp' ) {
-            $params['temp'] = 1;
-        }
-    } else {
-        $params = null; // not a valid thumbnail URL
-    }
-
-    return $params;
-}
-
-/**
- * cURL to thumb.php and stream back the resulting file or give an error message.
- *
- * @param $params Array Parameters to thumb.php
- * @param $uri String Thumbnail request URI
  * @return void
  */
-function wfStreamThumbViaCurl( array $params, $uri ) {
-    global $thgThumbCallbacks, $thgThumbCurlConfig;
-
-    # Check any backend caches for the thumbnail...
-    if ( isset( $thgThumbCallbacks['checkCache'] )
-        && is_callable( $thgThumbCallbacks['checkCache'] ) )
-    {
-        if ( call_user_func_array( $thgThumbCallbacks['checkCache'], array( $uri, $params ) ) ) {
-            return; // file streamed from backend thumb cache
-        }
-    }
-
-    if ( !extension_loaded( 'curl' ) ) {
-        die( "cURL is not enabled for PHP on this wiki.\n" ); // sanity
-    }
-
-    # Build up the request URL to use with CURL...
-    $reqURL = $thgThumbCurlConfig['url'] . '?';
-    $first = true;
-    foreach ( $params as $name => $value ) {
-        if ( $first ) {
-            $first = false;
-        } else {
-            $reqURL .= '&';
-        }
-        $reqURL .= "$name=$value"; // Note: value is already urlencoded
-    }
-
-    # Set relevant HTTP headers...
-    $headers = array();
-    $headers[] = "X-Original-URI: " . str_replace( "\n", '', $uri );
-    if ( isset( $thgThumbCallbacks['curlHeaders'] )
-        && is_callable( $thgThumbCallbacks['curlHeaders'] ) )
-    {
-        # Add on any custom headers (like XFF)
-        call_user_func_array( $thgThumbCallbacks['curlHeaders'], array( &$headers ) );
-    }
-
-    # Pass through some other headers...
-    $passThrough = array( 'If-Modified-Since', 'Referer', 'User-Agent' );
-    foreach ( $passThrough as $headerName ) {
-        $serverVarName = 'HTTP_' . str_replace( '-', '_', strtoupper( $headerName ) );
-        if ( !empty( $_SERVER[$serverVarName] ) ) {
-            $headers[] = $headerName . ': ' .
-                str_replace( "\n", '', $_SERVER[$serverVarName] );
-        }
-    }
-
-    $ch = curl_init( $reqURL );
-    if ( $thgThumbCurlConfig['proxy'] ) {
-        curl_setopt( $ch, CURLOPT_PROXY, $thgThumbCurlConfig['proxy'] );
-    }
-
-    curl_setopt( $ch, CURLOPT_HTTPHEADER, $headers );
-    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
-    curl_setopt( $ch, CURLOPT_TIMEOUT, $thgThumbCurlConfig['timeout'] );
-
-    # Actually make the request
-    $text = curl_exec( $ch );
-
-    # Send it on to the client...
-    $errno = curl_errno( $ch );
-    $contentType = curl_getinfo( $ch, CURLINFO_CONTENT_TYPE );
-    $httpCode = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
-    if ( $errno ) {
-        header( 'HTTP/1.1 500 Internal server error' );
-        header( 'Cache-Control: no-cache' );
-        $contentType = 'text/html';
-        $text = wfCurlErrorText( $ch );
-    } elseif ( $httpCode == 304 ) { // OK
-        header( 'HTTP/1.1 304 Not modified' );
-        $contentType = '';
-        $text = '';
-    } elseif ( strval( $text ) == '' ) {
-        header( 'HTTP/1.1 500 Internal server error' );
-        header( 'Cache-Control: no-cache' );
-        $contentType = 'text/html';
-        $text = wfCurlEmptyText( $ch );
-    } elseif ( $httpCode == 404 ) {
-        header( 'HTTP/1.1 404 Not found' );
-        header( 'Cache-Control: s-maxage=300, must-revalidate, max-age=0' );
-    } elseif ( $httpCode != 200 || substr( $contentType, 0, 9 ) == 'text/html' ) {
-        # Error message, suppress cache
-        header( 'HTTP/1.1 500 Internal server error' );
-        header( 'Cache-Control: no-cache' );
-    } else {
-        # OK thumbnail; save to any backend caches...
-        if ( isset( $thgThumbCallbacks['fillCache'] )
-            && is_callable( $thgThumbCallbacks['fillCache'] ) )
-        {
-            call_user_func_array( $thgThumbCallbacks['fillCache'], array( $uri, $text ) );
-        }
-    }
-
-    if ( !$contentType ) {
-        header( 'Content-Type:' );
-    } else {
-        header( "Content-Type: $contentType" );
-    }
-
-    print $text; // thumb data or error text
-
-    curl_close( $ch );
-}
-
-/**
- * Get error message HTML for when the cURL response is an error.
- *
- * @param $ch cURL handle
- * @return string
- */
-function wfCurlErrorText( $ch ) {
-    $error = htmlspecialchars( curl_error( $ch ) );
-    return <<<EOT
-<html>
-<head><title>Thumbnail error</title></head>
-<body>Error retrieving thumbnail from scaling server: $error</body>
-</html>
-EOT;
-}
-
-/**
- * Get error message HTML for when the cURL response is empty.
- *
- * @param $ch cURL handle
- * @return string
- */
-function wfCurlEmptyText( $ch ) {
-    return <<<EOT
-<html>
-<head><title>Thumbnail error</title></head>
-<body>Error retrieving thumbnail from scaling server: empty response</body>
-</html>
-EOT;
-}
-
-/**
- * Print out a generic 404 error message.
- *
- * @return void
- */
 function wfDisplay404Error() {
     header( 'HTTP/1.1 404 Not Found' );
     header( 'Content-Type: text/html;charset=utf-8' );
Index: trunk/phase3/includes/filerepo/FileRepo.php
@@ -277,6 +277,15 @@
     }
 
     /**
+     * Get the number of hash directory levels
+     *
+     * @return integer
+     */
+    function getHashLevels() {
+        return $this->hashLevels;
+    }
+
+    /**
      * Get a relative path including trailing slash, e.g. f/fa/
      * If the repo is not hashed, returns an empty string
      *
Index: trunk/phase3/thumb.php
@@ -13,27 +13,69 @@
     require ( dirname( __FILE__ ) . '/includes/WebStart.php' );
 }
 
-$wgTrivialMimeDetection = true; //don't use fancy mime detection, just check the file extension for jpg/gif/png.
+// Don't use fancy mime detection, just check the file extension for jpg/gif/png
+$wgTrivialMimeDetection = true;
 
-wfThumbMain();
+if ( defined( 'THUMB_HANDLER' ) ) {
+    // Called from thumb_handler.php via 404; extract params from the URI...
+    wfThumbHandle404();
+} else {
+    // Called directly, use $_REQUEST params
+    wfThumbHandleRequest();
+}
 wfLogProfilingData();
 
 //--------------------------------------------------------------------------
 
-function wfThumbMain() {
-    wfProfileIn( __METHOD__ );
+/**
+ * Handle a thumbnail request via query parameters
+ *
+ * @return void
+ */
+function wfThumbHandleRequest() {
+    $params = get_magic_quotes_gpc()
+        ? array_map( 'stripslashes', $_REQUEST )
+        : $_REQUEST;
 
-    $headers = array();
+    wfStreamThumb( $params ); // stream the thumbnail
+}
 
-    // Get input parameters
-    if ( defined( 'THUMB_HANDLER' ) ) {
-        $params = $_REQUEST; // called from thumb_handler.php
+/**
+ * Handle a thumbnail request via thumbnail file URL
+ *
+ * @return void
+ */
+function wfThumbHandle404() {
+    # lighttpd puts the original request in REQUEST_URI, while
+    # sjs sets that to the 404 handler, and puts the original
+    # request in REDIRECT_URL.
+    if ( isset( $_SERVER['REDIRECT_URL'] ) ) {
+        # The URL is un-encoded, so put it back how it was.
+        $uri = str_replace( "%2F", "/", urlencode( $_SERVER['REDIRECT_URL'] ) );
     } else {
-        $params = get_magic_quotes_gpc()
-            ? array_map( 'stripslashes', $_REQUEST )
-            : $_REQUEST;
+        $uri = $_SERVER['REQUEST_URI'];
     }
 
+    $params = wfExtractThumbParams( $uri ); // basic wiki URL param extracting
+    if ( $params == null ) {
+        wfThumbError( 404, 'The source file for the specified thumbnail does not exist.' );
+        return;
+    }
+
+    wfStreamThumb( $params ); // stream the thumbnail
+}
+
+/**
+ * Stream a thumbnail specified by parameters
+ *
+ * @param $params Array
+ * @return void
+ */
+function wfStreamThumb( array $params ) {
+    wfProfileIn( __METHOD__ );
+
+    $headers = array(); // HTTP headers to send
+
     $fileName = isset( $params['f'] ) ? $params['f'] : '';
     unset( $params['f'] );
 
@@ -64,7 +106,7 @@
         return;
     }
     $title = Title::makeTitleSafe( NS_FILE, $bits[1] );
-    if ( is_null( $title ) ) {
+    if ( !$title ) {
         wfThumbError( 404, wfMsg( 'badtitletext' ) );
         wfProfileOut( __METHOD__ );
         return;
@@ -169,11 +211,63 @@
 }
 
 /**
- * @param $status
- * @param $msg
+ * Extract the required params for thumb.php from the thumbnail request URI.
+ * At least 'width' and 'f' should be set if the result is an array.
+ *
+ * @param $uri String Thumbnail request URI
+ * @return Array|null associative params array or null
  */
+function wfExtractThumbParams( $uri ) {
+    $repo = RepoGroup::singleton()->getLocalRepo();
+
+    $hashDirRegex = $subdirRegex = '';
+    for ( $i = 0; $i < $repo->getHashLevels(); $i++ ) {
+        $subdirRegex .= '[0-9a-f]';
+        $hashDirRegex .= "$subdirRegex/";
+    }
+    $zoneUrlRegex = preg_quote( $repo->getZoneUrl( 'thumb' ) );
+
+    $thumbUrlRegex = "!^$zoneUrlRegex(/archive|/temp|)/$hashDirRegex([^/]*)/([^/]*)$!";
+
+    // Check if this is a valid looking thumbnail request...
+    if ( preg_match( $thumbUrlRegex, $uri, $matches ) ) {
+        list( /* all */, $archOrTemp, $filename, $thumbname ) = $matches;
+
+        $params = array( 'f' => $filename );
+        if ( $archOrTemp == '/archive' ) {
+            $params['archived'] = 1;
+        } elseif ( $archOrTemp == '/temp' ) {
+            $params['temp'] = 1;
+        }
+
+        // Check if the parameters can be extracted from the thumbnail name...
+        // @TODO: remove 'page' stuff and make ProofreadPage handle it via hook.
+        if ( preg_match( '!^(page(\d*)-)*(\d*)px-[^/]*$!', $thumbname, $matches ) ) {
+            list( /* all */, $pagefull, $pagenum, $size ) = $matches;
+            $params['width'] = $size;
+            if ( $pagenum ) {
+                $params['page'] = $pagenum;
+            }
+            return $params; // valid thumbnail URL
+        // Hooks return false if they manage to *resolve* the parameters
+        } elseif ( !wfRunHooks( 'ExtractThumbParameters', array( $thumbname, &$params ) ) ) {
+            return $params; // valid thumbnail URL (via extension or config)
+        }
+    }
+
+    return null; // not a valid thumbnail URL
+}
+
+/**
+ * Output a thumbnail generation error message
+ *
+ * @param $status integer
+ * @param $msg string
+ * @return void
+ */
 function wfThumbError( $status, $msg ) {
     global $wgShowHostnames;
+
     header( 'Cache-Control: no-cache' );
     header( 'Content-Type: text/html; charset=utf-8' );
     if ( $status == 404 ) {
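
For readers who want to see how the new 'ExtractThumbParameters' hook might be used, here is a minimal sketch of a handler. Only the hook name, its two arguments ($thumbname and &$params) and the "return false when the parameters were resolved" convention come from the diff above. The function name exampleExtractThumbParams, the "lossy-<width>px-<name>" naming scheme and the 'lossy' parameter are hypothetical and exist only for illustration.

```php
<?php
/**
 * Hypothetical hook handler (not part of this revision).
 * Assumes thumbnails named "lossy-<width>px-<name>", a made-up scheme.
 *
 * @param $thumbname String base name of the thumbnail file
 * @param $params Array params extracted so far (passed by reference)
 * @return bool false if the params were resolved here, true otherwise
 */
function exampleExtractThumbParams( $thumbname, &$params ) {
    if ( preg_match( '!^lossy-(\d+)px-[^/]*$!', $thumbname, $m ) ) {
        $params['width'] = $m[1];
        $params['lossy'] = 1; // hypothetical parameter, for illustration only
        return false; // resolved: wfExtractThumbParams() will return $params
    }
    return true; // not our naming scheme; let other handlers try
}

// Registration, as it would typically appear in LocalSettings.php:
$wgHooks['ExtractThumbParameters'][] = 'exampleExtractThumbParams';
```

Note that in the diff the hook only runs when the built-in "<width>px-" pattern does not match, so a handler like this would only see thumbnail names the core code could not parse.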

Sign-offs

User | Flag | Date
Hashar | inspected | 21:17, 11 January 2012

Follow-up revisions

Revision | Commit summary | Author | Date
r105516 | FU r105512: just always use thumb.php style errors | aaron | 04:58, 8 December 2011
r107020 | FU r105512: urldecode() the file and thumb name in wfExtractThumbParams() for... | aaron | 00:43, 22 December 2011
r108937 | r105512: Handle REDIRECT_URL discrepancies and always work with URI paths for... | aaron | 19:17, 14 January 2012

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r101117 | FU r100535:... | aaron | 05:12, 28 October 2011

Comments

Comment by Aaron Schulz, 04:16, 8 December 2011

The $_SERVER['SCRIPT_NAME'] check doesn't actually catch URLs with query strings (they give regular thumbnail errors).
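
The limitation described in this comment follows from REQUEST_URI including the query string while SCRIPT_NAME does not. One possible way to handle it, sketched here as an assumption and not as what any follow-up revision actually did, is to compare only the path portion of the URI. The helper name exampleIsDirectScriptRequest is hypothetical.

```php
<?php
/**
 * Hypothetical helper: true when the request targets the script itself,
 * with or without a query string.
 *
 * @param $requestUri String e.g. $_SERVER['REQUEST_URI']
 * @param $scriptName String e.g. $_SERVER['SCRIPT_NAME']
 * @return bool
 */
function exampleIsDirectScriptRequest( $requestUri, $scriptName ) {
    // Drop everything from the first "?" onwards, if present
    $pos = strpos( $requestUri, '?' );
    $path = ( $pos === false ) ? $requestUri : substr( $requestUri, 0, $pos );
    return $path === $scriptName;
}
```

With this, both "/w/thumb_handler.php" and "/w/thumb_handler.php?f=x" would be treated as direct requests, while a thumbnail file URL would not.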

Comment by Platonides, 14:47, 10 December 2011

Why not kill thumb_handler.php outright and have thumb.php extract the name when it is given through e.g. PATH_INFO?

Comment by Aaron Schulz, 17:11, 10 December 2011

Do you have a patch in mind?

Comment by Platonides, 23:31, 10 December 2011

I had an idea of replacing the old thumb_handler.php with a mod_rewrite rule. It should be easier with this revision, since the extraction is now done by the script, but I'd need to test the behavior.
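
To make "the extraction would be done by the script" concrete, here is a standalone sketch of how the URL regex in wfExtractThumbParams() is assembled from the hash level count. In the diff the count comes from FileRepo::getHashLevels() and the prefix from getZoneUrl( 'thumb' ); here both are plain arguments so the snippet runs without MediaWiki. The function name exampleThumbUrlRegex and the '/images/thumb' prefix are assumptions.

```php
<?php
/**
 * Hypothetical standalone version of the regex construction in the diff.
 *
 * @param $zoneUrl String URL path prefix of the thumb zone (assumed value)
 * @param $hashLevels integer number of hash directory levels
 * @return string
 */
function exampleThumbUrlRegex( $zoneUrl, $hashLevels ) {
    $hashDirRegex = $subdirRegex = '';
    for ( $i = 0; $i < $hashLevels; $i++ ) {
        // Each level is one hex digit longer: "[0-9a-f]/", "[0-9a-f][0-9a-f]/", ...
        $subdirRegex .= '[0-9a-f]';
        $hashDirRegex .= "$subdirRegex/";
    }
    $zoneUrlRegex = preg_quote( $zoneUrl, '!' );
    return "!^$zoneUrlRegex(/archive|/temp|)/$hashDirRegex([^/]*)/([^/]*)$!";
}

$regex = exampleThumbUrlRegex( '/images/thumb', 2 );
if ( preg_match( $regex, '/images/thumb/a/ab/Foo.png/120px-Foo.png', $m ) ) {
    list( , $archOrTemp, $filename, $thumbname ) = $m;
    echo "source: $filename, thumb: $thumbname\n";
}
```

With two hash levels the directory part of the pattern is the same '[0-9a-f]/[0-9a-f][0-9a-f]/' fragment that the deleted thumb.config.sample asked administrators to write by hand.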
