r76876 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r76875‎ | r76876 | r76877 >
Date:09:12, 17 November 2010
Author:aaron
Status:ok (Comments)
Tags:needs-php-test 
Comment:
* Reduced some pointless regex capture overhead
* Made preg_replace calls easier to verify w.r.t. captures
* (bug 25920) Made RE_IPV6_ADD avoid conditions on whether a named group matched anything, which requires PCRE 6.7+ (not bundled with PHP 5.1.0)
Modified paths:
  • /trunk/phase3/includes/IP.php (modified) (history)

Diff [purge]

Index: trunk/phase3/includes/IP.php
@@ -25,7 +25,7 @@
2626
2727 // An IPv4 address is made of 4 bytes from x00 to xFF which is d0 to d255
2828 define( 'RE_IP_BYTE', '(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|0?[0-9]?[0-9])' );
29 -define( 'RE_IP_ADD' , RE_IP_BYTE . '\.' . RE_IP_BYTE . '\.' . RE_IP_BYTE . '\.' . RE_IP_BYTE );
 29+define( 'RE_IP_ADD', RE_IP_BYTE . '\.' . RE_IP_BYTE . '\.' . RE_IP_BYTE . '\.' . RE_IP_BYTE );
3030 // An IPv4 block is an IP address and a prefix (d1 to d32)
3131 define( 'RE_IP_PREFIX', '(3[0-2]|[12]?\d)' );
3232 define( 'RE_IP_BLOCK', RE_IP_ADD . '\/' . RE_IP_PREFIX );
@@ -35,14 +35,18 @@
3636 define( 'RE_IPV6_WORD', '([0-9A-Fa-f]{1,4})' );
3737 define( 'RE_IPV6_PREFIX', '(12[0-8]|1[01][0-9]|[1-9]?\d)');
3838 define( 'RE_IPV6_ADD',
39 - '(' . // starts with "::" (includes the address "::")
40 - '(::|:(:' . RE_IPV6_WORD . '){1,7})' .
 39+ '(?:' . // starts with "::" (includes the address "::")
 40+ '::|:(?::' . RE_IPV6_WORD . '){1,7}' .
4141 '|' . // ends with "::" (not including the address "::")
42 - RE_IPV6_WORD . '(:' . RE_IPV6_WORD . '){0,6}::' .
 42+ RE_IPV6_WORD . '(?::' . RE_IPV6_WORD . '){0,6}::' .
4343 '|' . // has no "::"
44 - RE_IPV6_WORD . '(:' . RE_IPV6_WORD . '){7}' .
45 - '|' . // contains one "::" in the middle ("^" check always fails if no "::" found)
46 - RE_IPV6_WORD . '(:(?P<abbr>(?(abbr)|:))?' . RE_IPV6_WORD . '){1,6}(?(abbr)|^)' .
 44+ RE_IPV6_WORD . '(?::' . RE_IPV6_WORD . '){7}' .
 45+ '|' . // contains one "::" in the middle (awkward regex for PCRE 4.0+ compatibility)
 46+ RE_IPV6_WORD . '(?::(?!(?P=abn))(?P<abn>:(?P<iabn>))?' . RE_IPV6_WORD . '){1,6}(?P=iabn)' .
 47+ // NOTE: (?!(?P=abn)) fails iff "::" used twice; (?P=iabn) passes iff a "::" was found.
 48+
 49+ // Better regexp (PCRE 7.2+ only), allows intuitive regex concatenation
 50+ #RE_IPV6_WORD . '(?::((?(-1)|:))?' . RE_IPV6_WORD . '){1,6}(?(-2)|^)' .
4751 ')'
4852 );
4953 // An IPv6 block is an IP address and a prefix (d1 to d128)
@@ -54,9 +58,9 @@
5559 // This might be useful for regexps used elsewhere, matches any IPv6 or IPv6 address or network
5660 define( 'IP_ADDRESS_STRING',
5761 '(?:' .
58 - RE_IP_ADD . '(\/' . RE_IP_PREFIX . '|)' . // IPv4
 62+ RE_IP_ADD . '(?:\/' . RE_IP_PREFIX . ')?' . // IPv4
5963 '|' .
60 - RE_IPV6_ADD . '(\/' . RE_IPV6_PREFIX . '|)' . // IPv6
 64+ RE_IPV6_ADD . '(?:\/' . RE_IPV6_PREFIX . ')?' . // IPv6
6165 ')'
6266 );
6367
@@ -85,7 +89,7 @@
8690 * @return Boolean
8791 */
8892 public static function isIPv6( $ip ) {
89 - return (bool)preg_match( '/^' . RE_IPV6_ADD . '(\/' . RE_IPV6_PREFIX . '|)$/', $ip );
 93+ return (bool)preg_match( '/^' . RE_IPV6_ADD . '(?:\/' . RE_IPV6_PREFIX . ')?$/', $ip );
9094 }
9195
9296 /**
@@ -96,7 +100,7 @@
97101 * @return Boolean
98102 */
99103 public static function isIPv4( $ip ) {
100 - return (bool)preg_match( '/^' . RE_IP_ADD . '(\/' . RE_IP_PREFIX . '|)$/', $ip );
 104+ return (bool)preg_match( '/^' . RE_IP_ADD . '(?:\/' . RE_IP_PREFIX . ')?$/', $ip );
101105 }
102106
103107 /**
@@ -174,7 +178,7 @@
175179 );
176180 }
177181 // Remove leading zereos from each bloc as needed
178 - $ip = preg_replace( '/(^|:)0+' . RE_IPV6_WORD . '/', '$1$2', $ip );
 182+ $ip = preg_replace( '/(^|:)0+(' . RE_IPV6_WORD . ')/', '$1$2', $ip );
179183 return $ip;
180184 }
181185
@@ -217,7 +221,7 @@
218222 $ip_oct .= ':' . substr( $ip_hex, 4 * $n, 4 );
219223 }
220224 // NO leading zeroes
221 - $ip_oct = preg_replace( '/(^|:)0+' . RE_IPV6_WORD . '/', '$1$2', $ip_oct );
 225+ $ip_oct = preg_replace( '/(^|:)0+(' . RE_IPV6_WORD . ')/', '$1$2', $ip_oct );
222226 return $ip_oct;
223227 }
224228

Follow-up revisions

RevisionCommit summaryAuthorDate
r76915Merged r76876:76878 from trunkpdhanda23:10, 17 November 2010
r76928(bug 25920) Moved forward ref to a nested ref to really get v6 regex to compi...aaron00:25, 18 November 2010
r77067*(bug 25920) Moved forward ref to a back ref to really get v6 regex to compil...aaron10:49, 21 November 2010

Comments

#Comment by Happy-melon (talk | contribs)   14:43, 17 December 2010

Adding the (?: syntax to a regex already horribly full of colons doesn't help with comprehension. Are the performance improvements worthwhile?

#Comment by Hashar (talk | contribs)   20:33, 12 January 2011

You want to avoid capturing groups when you are not going to use them. The  ?: syntax is well known to regexp gurus. We might want to add /x to insert comments.

Note the worth performances in regexp comes from backreferences.

#Comment by Aaron Schulz (talk | contribs)   22:26, 17 December 2010

Note that the hardest part of the regex was redone in later revs.

Status & tagging log