r94881 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r94880‎ | r94881 | r94882 >
Date:10:33, 18 August 2011
Author:marooned
Status:resolved (Comments)
Tags:
Comment:
Relative values of the "Location" header are incorrect as stated in RFC, however they do happen and modern browsers support them.
This function loops backwards through all locations in order to build the proper absolute URI - Marooned at wikia-inc.com
Modified paths:
  • /trunk/phase3/includes/HttpFunctions.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/HttpFunctions.php
@@ -569,13 +569,44 @@
570570 /**
571571 * Returns the final URL after all redirections.
572572 *
573 - * @return String
 573+ * Relative values of the "Location" header are incorrect as stated in RFC, however they do happen and modern browsers support them.
 574+ * This function loops backwards through all locations in order to build the proper absolute URI - Marooned at wikia-inc.com
 575+ *
 576+ * @returns string
574577 */
575578 public function getFinalUrl() {
576 - $location = $this->getResponseHeader( "Location" );
 579+ $headers = $this->getResponseHeaders();
577580
578 - if ( $location ) {
579 - return $location;
 581+ //return full url (fix for incorrect but handled relative location)
 582+ if ( isset( $headers[ 'location' ] ) ) {
 583+ $locations = $headers[ 'location' ];
 584+ $domain = '';
 585+ $foundRelativeURI = false;
 586+ $countLocations = count($locations);
 587+
 588+ for ( $i = $countLocations - 1; $i >= 0; $i-- ) {
 589+ $url = parse_url( $locations[ $i ] );
 590+
 591+ if ( isset($url[ 'host' ]) ) {
 592+ $domain = $url[ 'scheme' ] . '://' . $url[ 'host' ];
 593+ break; //found correct URI (with host)
 594+ } else {
 595+ $foundRelativeURI = true;
 596+ }
 597+ }
 598+
 599+ if ( $foundRelativeURI ) {
 600+ if ( $domain ) {
 601+ return $domain . $locations[ $countLocations - 1 ];
 602+ } else {
 603+ $url = parse_url( $this->url );
 604+ if ( isset($url[ 'host' ]) ) {
 605+ return $url[ 'scheme' ] . '://' . $url[ 'host' ] . $locations[ $countLocations - 1 ];
 606+ }
 607+ }
 608+ } else {
 609+ return $locations[ $countLocations - 1 ];
 610+ }
580611 }
581612
582613 return $this->url;

Follow-up revisions

RevisionCommit summaryAuthorDate
r106948tests for r94881 which interprets relative Location: headershashar15:31, 21 December 2011
r107121Add comments explaining that the funky multiple Location headers stuff is a C...brion23:12, 22 December 2011

Comments

#Comment by Marooned~mediawikiwiki (talk | contribs)   10:36, 18 August 2011

http://borntoolatevintage.com/ is a real life example, it has 2 redirects:

a) first: http://www.artfire.com/
b) second: /ext/shop/studio/BornTooLateVintage

so final URL is http://www.artfire.com/ext/shop/studio/BornTooLateVintage

#Comment by Catrope (talk | contribs)   14:55, 18 August 2011

This code uses parse_url(), which is subtly broken in a few ways and throws PHP warnings for invalid input. You might be better off using wfParseUrl() instead, see its documentation in GlobalFunctions.php

#Comment by Marooned MB (talk | contribs)   16:05, 18 August 2011

Good to know. However, one function above (parseCookies) uses getFinalUrl and parse_url so this function was already used in the flow.

#Comment by MZMcBride (talk | contribs)   18:10, 18 August 2011

Does this have or need unit tests?

#Comment by Marooned~mediawikiwiki (talk | contribs)   18:45, 18 August 2011

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/tests/phpunit/includes/HttpTest.php Can't see related unit tests but perhaps it's a good idea to create them.

#Comment by Marooned~mediawikiwiki (talk | contribs)   16:08, 22 August 2011

Catrope: using wfParseUrl rise a notice: "Undefined index: scheme" when passed URL has only relative part of the path (eg. "/some/folder/foo") so I'll stick with parse_url() - "Location" should not have protocols like "news:" so it should be safe to leave PHP native function.

#Comment by Hashar (talk | contribs)   15:31, 21 December 2011

Looks good to me. PHP Tests added by r106948.

#Comment by Brion VIBBER (talk | contribs)   19:28, 21 December 2011

This feels wrong; inline documentation seems to indicate that $this->respHeaders in a MWHttpRequest will hold response headers from a single response, using headers in case some values exist multiple times.

After following a redirect then, we should only have one 'Location' header from the latest request. Multiple Location headers are not allowed as far as I know, so looping over the single Location header shouldn't give you much information.

Is something merging the returned headers from every followed request when doing redirects or something? This could end up giving some weird bogus results in general...

#Comment by Brion VIBBER (talk | contribs)   19:30, 21 December 2011

s/using headers/using sub-arrays/

#Comment by Marooned~mediawikiwiki (talk | contribs)   09:58, 22 December 2011

I can't remember the site causing troubles but I've fixed that functions because of a real case. I got multiple Location headers from getResponseHeaders() and some of them were relative.

#Comment by Marooned~mediawikiwiki (talk | contribs)   12:14, 22 December 2011

Found it - the site that at that time caused multiple redirections was: http://borntoolatevintage.com/

I can see it still does 2 redirects http://borntoolatevintage.com/ => http://www.artfire.com/users/BornTooLateVintage => http://www.artfire.com/ext/shop/studio/BornTooLateVintage though haven't checked if the latter is still relative or was changed into absolute.

#Comment by Hashar (talk | contribs)   12:51, 22 December 2011

By setting CURLOPT_HEADERFUNCTION we can pass any header to a callback:

       $this->curlOptions[CURLOPT_HEADERFUNCTION] = array( $this, "readHeader" );

The callback just append received header one after the other in a huge string. It is then exploded by \r\n to an array.

   protected function readHeader( $fh, $content ) {
       $this->headerText .= $content;
                        ^^^^
       return strlen( $content );
   }

So internally, the Location: header content is really an array of any Location: lines curl might have encountered while following redirections.


I have setup a testing env on my computer using existing virtual hosts and just creating dummy PHP redirection files.

http://trunk.dev/redir.php redirects to http://mw18.dev/redir.php http://mw18.dev/redir.php redirects to /landing.php

h = MWHttpRequest::factory( 'http://trunk.dev/redir.php', array( 'followRedirects'=>true) ); $h->execute(); print $h->getFinalUrl();

The header representation for 'location' is:

   [location] => Array
       (
           [0] => http://mw18.dev/redir.php
           [1] => /landing.php
       )

So previously we would receive /landing.php as a Location and one could assume it is relative to the requested URL host (trunk.dev) whereas it is relative to the host of the previous URL having a host.

#Comment by Brion VIBBER (talk | contribs)   23:05, 22 December 2011

Ugh, that looks like a bug/misfeature of the CURL stuff!

bug 29232 (rewriting the redirect handling at a higher level) would avoid this. I'll add a comment about the case that this rev is for...

#Comment by Hashar (talk | contribs)   23:15, 22 December 2011

CURL just give us whatever new headers he received after following a location and pass that to our callback. I guess the issue is in our callback that should not just blindly append the headers but should update them instead. i.e. we probably have to rewrite all that stuff to better support Location :-)

#Comment by Marooned~mediawikiwiki (talk | contribs)   23:18, 22 December 2011

"we probably have to rewrite all that stuff to better support Location" => sounds like my commit was a part of that "rewrite" process ;)

#Comment by Hashar (talk | contribs)   01:14, 23 December 2011

Yes totally :-) I just think it should be made sooner in the workflow though, but your code is fine nonetheless. Thanks!

#Comment by Brion VIBBER (talk | contribs)   23:13, 22 December 2011

Added an explanatory comment in r107121 -- the multiple headers are a CURL artifact.

Status & tagging log