r67994 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r67994
Date:13:29, 14 June 2010
Author:tstarling
Status:deferred
Tags:
Comment:
* Added Thai word break support. Adds a libthai dependency.
* Converted to a native PHP extension, ditched Swig, removed standalone mode.
* Use PHP's allocation functions instead of new/free, to allow for memory limiting. This means lots of typedefs to get the standard library containers using a custom allocator.
* Converted _DiffEngine::seq from a map to a vector, for a 2x speedup in certain test cases
* Removed outdated RedHat stuff. Debian directory not yet updated.
* Refactored global functions in wikidiff2.cpp into a class. Renamed some things.
* Updated test files for r22205 and r22229
Modified paths:
  • /trunk/extensions/wikidiff2/CREDITS (added)
  • /trunk/extensions/wikidiff2/DiffEngine.h (modified)
  • /trunk/extensions/wikidiff2/Makefile (deleted)
  • /trunk/extensions/wikidiff2/README (modified)
  • /trunk/extensions/wikidiff2/Word.h (modified)
  • /trunk/extensions/wikidiff2/compile.sh (deleted)
  • /trunk/extensions/wikidiff2/config.m4 (added)
  • /trunk/extensions/wikidiff2/php_cpp_allocator.h (added)
  • /trunk/extensions/wikidiff2/php_wikidiff2.cpp (added)
  • /trunk/extensions/wikidiff2/php_wikidiff2.h (added)
  • /trunk/extensions/wikidiff2/release.sh (deleted)
  • /trunk/extensions/wikidiff2/standalone.cpp (deleted)
  • /trunk/extensions/wikidiff2/test/test-a.diff (modified)
  • /trunk/extensions/wikidiff2/test/test-b.diff (modified)
  • /trunk/extensions/wikidiff2/tests (added)
  • /trunk/extensions/wikidiff2/tests/001.phpt (added)
  • /trunk/extensions/wikidiff2/wikidiff2.cpp (modified)
  • /trunk/extensions/wikidiff2/wikidiff2.h (modified)
  • /trunk/extensions/wikidiff2/wikidiff2.i (deleted)
  • /trunk/extensions/wikidiff2/wikidiff2.spec (deleted)
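
The commit message credits the conversion of _DiffEngine::seq from a map to a vector with a 2x speedup in certain test cases. The sketch below is not the wikidiff2 code. It only illustrates, under the assumption that the keys are small, dense, non-negative integers, why an index-addressed std::vector can stand in for a std::map<int, int>. The names SeqMap, SeqVector, fillSquaresMap and fillSquaresVector are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for the old and new container types.
typedef std::map<int, int> SeqMap;
typedef std::vector<int> SeqVector;

// Store i*i under key i for i in [0, n), using a tree-based map.
// Each insertion or lookup costs O(log n) and allocates a node.
SeqMap fillSquaresMap(int n)
{
    SeqMap seq;
    for (int i = 0; i < n; ++i) {
        seq[i] = i * i;
    }
    return seq;
}

// Same data in a vector: keys become indexes, so access is O(1)
// and the storage is one contiguous block.
SeqVector fillSquaresVector(int n)
{
    assert(n >= 0);
    SeqVector seq(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        seq[static_cast<std::size_t>(i)] = i * i;
    }
    return seq;
}
```

Both functions hold the same key/value pairs. The vector form only works when every key in the range is valid as an index, which is the assumption stated above.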

Diff

Index: trunk/extensions/wikidiff2/compile.sh
@@ -1,4 +0,0 @@
-#! /bin/sh
-swig -php4 -c++ wikidiff2.i
-g++ -O2 `php-config --includes` -shared -o php_wikidiff2.so wikidiff2.cpp wikidiff2_wrap.cpp
-
Index: trunk/extensions/wikidiff2/wikidiff2.i
@@ -1,15 +0,0 @@
-%module wikidiff2
-
-// Need to free the string to prevent memory leak
-%typemap(out) char * %{
- if(!$1) {
- ZVAL_NULL(return_value);
- } else {
- ZVAL_STRING(return_value,$1, 1);
- free($1);
- }
-%}
-
-%inline {
- const char *wikidiff2_do_diff(const char *text1, const char *text2, int num_lines_context);
-}
Index: trunk/extensions/wikidiff2/wikidiff2.spec
@@ -1,32 +0,0 @@
2 -Summary: PHP extension and standalone application to do fast word-level diffs
3 -Name: wikidiff2
4 -Version: VERSION
5 -Release: 2
6 -License: GPL
7 -Group: Applications/Internet
8 -Source: wikidiff2-VERSION.tar.gz
9 -BuildRoot: /var/tmp/%{name}-buildroot
10 -
11 -%description
12 -PHP extension and standalone application to do fast word-level diffs.
13 -(Packaged for Wikimedia local use!)
14 -
15 -%prep
16 -%setup -q
17 -
18 -%build
19 -make
20 -
21 -%install
22 -rm -rf $RPM_BUILD_ROOT
23 -INSTALL_TARGET="$RPM_BUILD_ROOT" make install
24 -
25 -%clean
26 -rm -rf $RPM_BUILD_ROOT
27 -
28 -%files
29 -%defattr(-,root,root)
30 -%dir /usr/local/lib/php/extensions/no-debug-non-zts-20060613
31 -
32 -/usr/local/lib/php/extensions/no-debug-non-zts-20060613/php_wikidiff2.so
33 -
Index: trunk/extensions/wikidiff2/release.sh
@@ -1,13 +0,0 @@
2 -#!/bin/bash
3 -
4 -if [ X$1 == X ];then
5 - echo "Usage: release.sh <version>"
6 -fi
7 -
8 -sed "s/VERSION/$1/" wikidiff2.spec > /usr/src/redhat/SPECS/wikidiff2.spec
9 -mkdir -p /usr/src/redhat/SOURCES/wikidiff2-$1
10 -cp Makefile standalone.cpp wikidiff2.cpp wikidiff2.h wikidiff2.i wikidiff2.spec DiffEngine.h Word.h JudyHS.h /usr/src/redhat/SOURCES/wikidiff2-$1
11 -cd /usr/src/redhat/SOURCES
12 -tar -czf wikidiff2-$1.tar.gz wikidiff2-$1
13 -rpmbuild -ba ../SPECS/wikidiff2.spec
14 -
Index: trunk/extensions/wikidiff2/standalone.cpp
@@ -1,53 +0,0 @@
2 -#include <stdio.h>
3 -#include <sys/stat.h>
4 -#include <cstring>
5 -#include "wikidiff2.h"
6 -
7 -/**
8 - * Standalone (i.e. PHP-free) application to produce HTML-formatted word-level diffs from two files
9 - */
10 -
11 -
12 -void report_file_error(char* filename)
13 -{
14 - char errorFormat[] = "Error opening file \"%s\"";
15 - char * error = new char[strlen(filename) + sizeof(errorFormat)];
16 - sprintf(error, errorFormat, filename);
17 - perror(error);
18 - delete[] error;
19 - exit(1);
20 -}
21 -
22 -char* file_get_contents(char* filename)
23 -{
24 - struct stat s;
25 - char* buffer;
26 - if (stat(filename, &s)) {
27 - report_file_error(filename);
28 - }
29 - FILE * file = fopen(filename, "rb");
30 - if (!file) {
31 - report_file_error(filename);
32 - }
33 - buffer = new char[s.st_size + 1];
34 - size_t bytes_read = fread(buffer, 1, s.st_size, file);
35 - buffer[bytes_read] = '\0';
36 - fclose(file);
37 - return buffer;
38 -}
39 -
40 -int main(int argc, char** argv)
41 -{
42 - if (argc != 3) {
43 - printf("Usage: wikidiff2 <file1> <file2>\n");
44 - exit(1);
45 - }
46 -
47 - char *buffer1 = file_get_contents(argv[1]);
48 - char *buffer2 = file_get_contents(argv[2]);
49 - const char *diff = wikidiff2_do_diff(buffer1, buffer2, 2);
50 - fputs(diff, stdout);
51 - return 0;
52 -}
53 -
54 -
Index: trunk/extensions/wikidiff2/Makefile
@@ -1,84 +0,0 @@
2 -PHP_EXT_DIR=`php-config --extension-dir`
3 -PRODUCT=wikidiff2
4 -VERSION=1.0.3
5 -CXX?=g++
6 -
7 -# For Linux
8 -SHARED = -shared -fPIC
9 -
10 -# For Mac OS X
11 -# SHARED = -bundle
12 -
13 -OUTPUT=php_$(PRODUCT).so
14 -STANDALONE=$(PRODUCT)
15 -#LIBDIRS=-L/usr/local/lib
16 -#LIBS=-lJudy
17 -LIBS=
18 -LIBDIRS=
19 -
20 -TMPDIST=$(PRODUCT)-$(VERSION)
21 -DISTDIRS=test debian
22 -DISTFILES=Makefile \
23 - $(PRODUCT).spec compile.sh release.sh \
24 - DiffEngine.h JudyHS.h Word.h wikidiff2.h \
25 - judy_test.cpp standalone.cpp \
26 - $(PRODUCT).cpp $(PRODUCT).i \
27 - $(PRODUCT)_wrap.cpp php_$(PRODUCT).h wikidiff2.php \
28 - test/chinese-reverse.zip \
29 - test/test-a.diff \
30 - test/test-a1 \
31 - test/test-a2 \
32 - test/test-b.diff \
33 - test/test-b1 \
34 - test/test-b2 \
35 - debian/control \
36 - debian/compat \
37 - debian/changelog \
38 - debian/copyright \
39 - debian/rules
40 -
41 -$(OUTPUT) : $(PRODUCT).cpp $(PRODUCT)_wrap.cpp
42 - $(CXX) -O2 `php-config --includes` $(SHARED) -o $@ $(PRODUCT).cpp $(PRODUCT)_wrap.cpp
43 -
44 -.PHONY: standalone
45 -standalone:
46 - $(CXX) -o $(STANDALONE) -O3 $(PRODUCT).cpp standalone.cpp $(LIBS) $(LIBDIRS)
47 -
48 -# The below _almost_ works. It gets unresolved symbol errors on load looking for _compiler_globals.
49 -# MACOSX_DEPLOYMENT_TARGET=10.3 g++ -O2 `php-config --includes` $(SHARED) -o php_wikidiff2.so wikidiff2.cpp wikidiff2_wrap.cpp -undefined dynamic_lookup
50 -
51 -test.php : $(PRODUCT)_wrap.cpp
52 -
53 -$(PRODUCT)_wrap.cpp : $(PRODUCT).i
54 - swig -php4 -c++ $(PRODUCT).i
55 -
56 -install : $(OUTPUT)
57 - install -d "$(INSTALL_TARGET)$(PHP_EXT_DIR)"
58 - install -m 0755 $(OUTPUT) "$(INSTALL_TARGET)$(PHP_EXT_DIR)"
59 -
60 -uninstall :
61 - rm -f "$(INSTALL_TARGET)$(PHP_EXT_DIR)"/$(OUTPUT)
62 -
63 -clean :
64 - rm -f $(OUTPUT)
65 - rm -f $(PRODUCT)_wrap.cpp
66 - rm -f $(PRODUCT).php
67 -
68 -test : $(OUTPUT)
69 - php test.php
70 -
71 -distclean : clean
72 - rm -rf $(TMPDIST)
73 - rm -f $(TMPDIST).tar.gz
74 -
75 -dist : $(DISTFILES) Makefile
76 - rm -rf $(TMPDIST)
77 - mkdir $(TMPDIST)
78 - for x in $(DISTDIRS); do mkdir $(TMPDIST)/$$x; done
79 - for x in $(DISTFILES); do cp -p $$x $(TMPDIST)/$$x; done
80 - tar zcvf $(TMPDIST).tar.gz $(TMPDIST)
81 -
82 -rpm : dist
83 - cp $(TMPDIST).tar.gz /usr/src/redhat/SOURCES
84 - cp $(PRODUCT).spec /usr/src/redhat/SPECS/$(PRODUCT)-$(VERSION).spec
85 - cd /usr/src/redhat/SPECS && rpmbuild -ba $(PRODUCT)-$(VERSION).spec
Index: trunk/extensions/wikidiff2/php_cpp_allocator.h
@@ -0,0 +1,39 @@
+#ifndef PHP_CPP_ALLOCATOR_H
+#define PHP_CPP_ALLOCATOR_H
+
+#include <memory>
+#include "php.h"
+
+/**
+ * Allocation class which allows various C++ standard library functions
+ * to allocate and free memory using PHP's emalloc/efree facilities.
+ */
+template <class T>
+class PhpAllocator : public std::allocator<T>
+{
+ public:
+ // Make some typedefs to avoid having to use "typename" everywhere
+ typedef typename std::allocator<T>::pointer pointer;
+ typedef typename std::allocator<T>::size_type size_type;
+
+ // The rebind member allows callers to get allocators for other types,
+ // given a specialised allocator
+ template <class U> struct rebind { typedef PhpAllocator<U> other; };
+
+ // Various constructors that do nothing
+ PhpAllocator() throw() {}
+ PhpAllocator(const PhpAllocator& other) throw() {}
+ template <class U> PhpAllocator(const PhpAllocator<U>&) throw() {}
+
+ // Allocate some memory from the PHP request pool
+ pointer allocate(size_type size, typename std::allocator<void>::const_pointer hint = 0) {
+ return (pointer)safe_emalloc(size, sizeof(T), 0);
+ }
+
+ // Free memory
+ void deallocate(pointer p, size_type n) {
+ return efree(p);
+ }
+};
+
+#endif
Property changes on: trunk/extensions/wikidiff2/php_cpp_allocator.h
___________________________________________________________________
Name: svn:eol-style
   + native
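
The commit message notes that routing allocation through PHP "means lots of typedefs to get the standard library containers using a custom allocator". The following is a minimal standalone sketch of that pattern, not the code from this revision. It assumes C++11 or later, and because emalloc/efree are only available inside PHP, std::malloc and std::free stand in for them. CountingAllocator, g_liveBlocks and makeLines are hypothetical names; String and StringVector mirror the type names that appear in the wikidiff2.cpp changes, but their real definitions are not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Counts live allocations so the effect of the allocator is observable.
static long g_liveBlocks = 0;

// Minimal allocator in the C++11 style. std::malloc/std::free stand in
// for PHP's emalloc/efree, which are not available outside PHP.
template <class T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() {}
    template <class U> CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        // Reject requests whose byte count would overflow size_t.
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        std::size_t bytes = n * sizeof(T);
        void* p = std::malloc(bytes ? bytes : 1);
        if (!p) throw std::bad_alloc();
        ++g_liveBlocks;
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) {
        --g_liveBlocks;
        std::free(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// The typedefs: each container type names the allocator once.
typedef std::basic_string<char, std::char_traits<char>, CountingAllocator<char> > String;
typedef std::vector<String, CountingAllocator<String> > StringVector;

// Build a vector of strings long enough to defeat the small-string
// optimisation, so that heap allocation is forced.
StringVector makeLines(std::size_t count)
{
    StringVector lines;
    for (std::size_t i = 0; i < count; ++i) {
        lines.push_back(String(64, 'x'));
    }
    return lines;
}
```

Once the typedefs exist, the rest of the code can use String and StringVector without mentioning the allocator again, which is why the pattern tends to produce many typedefs and few other changes.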
Index: trunk/extensions/wikidiff2/test/test-a.diff
@@ -1,280 +1,280 @@
22 <tr>
3 - <td colspan="2" align="left"><strong><!--LINE 1--></strong></td>
4 - <td colspan="2" align="left"><strong><!--LINE 1--></strong></td>
 3+ <td colspan="2" class="diff-lineno"><!--LINE 1--></td>
 4+ <td colspan="2" class="diff-lineno"><!--LINE 1--></td>
55 </tr>
66 <tr>
7 - <td> </td>
8 - <td class="diff-context">== Added line ==</td>
9 - <td> </td>
10 - <td class="diff-context">== Added line ==</td>
 7+ <td class="diff-marker"> </td>
 8+ <td class="diff-context"><div>== Added line ==</div></td>
 9+ <td class="diff-marker"> </td>
 10+ <td class="diff-context"><div>== Added line ==</div></td>
1111 </tr>
1212 <tr>
13 - <td> </td>
 13+ <td class="diff-marker"> </td>
1414 <td class="diff-context"></td>
15 - <td> </td>
 15+ <td class="diff-marker"> </td>
1616 <td class="diff-context"></td>
1717 </tr>
1818 <tr>
1919 <td colspan="2">&nbsp;</td>
20 - <td>+</td>
21 - <td class="diff-addedline">sjgfkdjfgb</td>
 20+ <td class="diff-marker">+</td>
 21+ <td class="diff-addedline"><div>sjgfkdjfgb</div></td>
2222 </tr>
2323 <tr>
24 - <td> </td>
25 - <td class="diff-context">== Removed line ==</td>
26 - <td> </td>
27 - <td class="diff-context">== Removed line ==</td>
 24+ <td class="diff-marker"> </td>
 25+ <td class="diff-context"><div>== Removed line ==</div></td>
 26+ <td class="diff-marker"> </td>
 27+ <td class="diff-context"><div>== Removed line ==</div></td>
2828 </tr>
2929 <tr>
30 - <td> </td>
 30+ <td class="diff-marker"> </td>
3131 <td class="diff-context"></td>
32 - <td> </td>
 32+ <td class="diff-marker"> </td>
3333 <td class="diff-context"></td>
3434 </tr>
3535 <tr>
36 - <td>-</td>
37 - <td class="diff-deletedline">kjahegwnygw</td>
 36+ <td class="diff-marker">-</td>
 37+ <td class="diff-deletedline"><div>kjahegwnygw</div></td>
3838 <td colspan="2">&nbsp;</td>
3939 </tr>
4040 <tr>
41 - <td> </td>
42 - <td class="diff-context">== Moved text ==</td>
43 - <td> </td>
44 - <td class="diff-context">== Moved text ==</td>
 41+ <td class="diff-marker"> </td>
 42+ <td class="diff-context"><div>== Moved text ==</div></td>
 43+ <td class="diff-marker"> </td>
 44+ <td class="diff-context"><div>== Moved text ==</div></td>
4545 </tr>
4646 <tr>
47 - <td> </td>
48 - <td class="diff-context">a</td>
49 - <td> </td>
50 - <td class="diff-context">a</td>
 47+ <td class="diff-marker"> </td>
 48+ <td class="diff-context"><div>a</div></td>
 49+ <td class="diff-marker"> </td>
 50+ <td class="diff-context"><div>a</div></td>
5151 </tr>
5252 <tr>
53 - <td>-</td>
54 - <td class="diff-deletedline">---line---</td>
 53+ <td class="diff-marker">-</td>
 54+ <td class="diff-deletedline"><div>---line---</div></td>
5555 <td colspan="2">&nbsp;</td>
5656 </tr>
5757 <tr>
58 - <td> </td>
59 - <td class="diff-context">a</td>
60 - <td> </td>
61 - <td class="diff-context">a</td>
 58+ <td class="diff-marker"> </td>
 59+ <td class="diff-context"><div>a</div></td>
 60+ <td class="diff-marker"> </td>
 61+ <td class="diff-context"><div>a</div></td>
6262 </tr>
6363 <tr>
64 - <td> </td>
65 - <td class="diff-context">a</td>
66 - <td> </td>
67 - <td class="diff-context">a</td>
 64+ <td class="diff-marker"> </td>
 65+ <td class="diff-context"><div>a</div></td>
 66+ <td class="diff-marker"> </td>
 67+ <td class="diff-context"><div>a</div></td>
6868 </tr>
6969 <tr>
70 - <td colspan="2" align="left"><strong><!--LINE 13--></strong></td>
71 - <td colspan="2" align="left"><strong><!--LINE 12--></strong></td>
 70+ <td colspan="2" class="diff-lineno"><!--LINE 13--></td>
 71+ <td colspan="2" class="diff-lineno"><!--LINE 12--></td>
7272 </tr>
7373 <tr>
74 - <td> </td>
75 - <td class="diff-context">a</td>
76 - <td> </td>
77 - <td class="diff-context">a</td>
 74+ <td class="diff-marker"> </td>
 75+ <td class="diff-context"><div>a</div></td>
 76+ <td class="diff-marker"> </td>
 77+ <td class="diff-context"><div>a</div></td>
7878 </tr>
7979 <tr>
80 - <td> </td>
81 - <td class="diff-context">a</td>
82 - <td> </td>
83 - <td class="diff-context">a</td>
 80+ <td class="diff-marker"> </td>
 81+ <td class="diff-context"><div>a</div></td>
 82+ <td class="diff-marker"> </td>
 83+ <td class="diff-context"><div>a</div></td>
8484 </tr>
8585 <tr>
8686 <td colspan="2">&nbsp;</td>
87 - <td>+</td>
88 - <td class="diff-addedline">---line---</td>
 87+ <td class="diff-marker">+</td>
 88+ <td class="diff-addedline"><div>---line---</div></td>
8989 </tr>
9090 <tr>
91 - <td> </td>
92 - <td class="diff-context">a</td>
93 - <td> </td>
94 - <td class="diff-context">a</td>
 91+ <td class="diff-marker"> </td>
 92+ <td class="diff-context"><div>a</div></td>
 93+ <td class="diff-marker"> </td>
 94+ <td class="diff-context"><div>a</div></td>
9595 </tr>
9696 <tr>
97 - <td> </td>
98 - <td class="diff-context">a</td>
99 - <td> </td>
100 - <td class="diff-context">a</td>
 97+ <td class="diff-marker"> </td>
 98+ <td class="diff-context"><div>a</div></td>
 99+ <td class="diff-marker"> </td>
 100+ <td class="diff-context"><div>a</div></td>
101101 </tr>
102102 <tr>
103 - <td colspan="2" align="left"><strong><!--LINE 19--></strong></td>
104 - <td colspan="2" align="left"><strong><!--LINE 19--></strong></td>
 103+ <td colspan="2" class="diff-lineno"><!--LINE 19--></td>
 104+ <td colspan="2" class="diff-lineno"><!--LINE 19--></td>
105105 </tr>
106106 <tr>
107 - <td> </td>
108 - <td class="diff-context">a</td>
109 - <td> </td>
110 - <td class="diff-context">a</td>
 107+ <td class="diff-marker"> </td>
 108+ <td class="diff-context"><div>a</div></td>
 109+ <td class="diff-marker"> </td>
 110+ <td class="diff-context"><div>a</div></td>
111111 </tr>
112112 <tr>
113 - <td> </td>
114 - <td class="diff-context">a</td>
115 - <td> </td>
116 - <td class="diff-context">a</td>
 113+ <td class="diff-marker"> </td>
 114+ <td class="diff-context"><div>a</div></td>
 115+ <td class="diff-marker"> </td>
 116+ <td class="diff-context"><div>a</div></td>
117117 </tr>
118118 <tr>
119 - <td>-</td>
120 - <td class="diff-deletedline">--line1--</td>
 119+ <td class="diff-marker">-</td>
 120+ <td class="diff-deletedline"><div>--line1--</div></td>
121121 <td colspan="2">&nbsp;</td>
122122 </tr>
123123 <tr>
124 - <td>-</td>
125 - <td class="diff-deletedline">--line2--</td>
 124+ <td class="diff-marker">-</td>
 125+ <td class="diff-deletedline"><div>--line2--</div></td>
126126 <td colspan="2">&nbsp;</td>
127127 </tr>
128128 <tr>
129 - <td> </td>
130 - <td class="diff-context">a</td>
131 - <td> </td>
132 - <td class="diff-context">a</td>
 129+ <td class="diff-marker"> </td>
 130+ <td class="diff-context"><div>a</div></td>
 131+ <td class="diff-marker"> </td>
 132+ <td class="diff-context"><div>a</div></td>
133133 </tr>
134134 <tr>
135 - <td> </td>
136 - <td class="diff-context">a</td>
137 - <td> </td>
138 - <td class="diff-context">a</td>
 135+ <td class="diff-marker"> </td>
 136+ <td class="diff-context"><div>a</div></td>
 137+ <td class="diff-marker"> </td>
 138+ <td class="diff-context"><div>a</div></td>
139139 </tr>
140140 <tr>
141 - <td colspan="2" align="left"><strong><!--LINE 29--></strong></td>
142 - <td colspan="2" align="left"><strong><!--LINE 27--></strong></td>
 141+ <td colspan="2" class="diff-lineno"><!--LINE 29--></td>
 142+ <td colspan="2" class="diff-lineno"><!--LINE 27--></td>
143143 </tr>
144144 <tr>
145 - <td> </td>
146 - <td class="diff-context">a</td>
147 - <td> </td>
148 - <td class="diff-context">a</td>
 145+ <td class="diff-marker"> </td>
 146+ <td class="diff-context"><div>a</div></td>
 147+ <td class="diff-marker"> </td>
 148+ <td class="diff-context"><div>a</div></td>
149149 </tr>
150150 <tr>
151 - <td> </td>
152 - <td class="diff-context">a</td>
153 - <td> </td>
154 - <td class="diff-context">a</td>
 151+ <td class="diff-marker"> </td>
 152+ <td class="diff-context"><div>a</div></td>
 153+ <td class="diff-marker"> </td>
 154+ <td class="diff-context"><div>a</div></td>
155155 </tr>
156156 <tr>
157157 <td colspan="2">&nbsp;</td>
158 - <td>+</td>
159 - <td class="diff-addedline">--line1--</td>
 158+ <td class="diff-marker">+</td>
 159+ <td class="diff-addedline"><div>--line1--</div></td>
160160 </tr>
161161 <tr>
162162 <td colspan="2">&nbsp;</td>
163 - <td>+</td>
164 - <td class="diff-addedline">--line2--</td>
 163+ <td class="diff-marker">+</td>
 164+ <td class="diff-addedline"><div>--line2--</div></td>
165165 </tr>
166166 <tr>
167 - <td> </td>
168 - <td class="diff-context">a</td>
169 - <td> </td>
170 - <td class="diff-context">a</td>
 167+ <td class="diff-marker"> </td>
 168+ <td class="diff-context"><div>a</div></td>
 169+ <td class="diff-marker"> </td>
 170+ <td class="diff-context"><div>a</div></td>
171171 </tr>
172172 <tr>
173 - <td> </td>
174 - <td class="diff-context">a</td>
175 - <td> </td>
176 - <td class="diff-context">a</td>
 173+ <td class="diff-marker"> </td>
 174+ <td class="diff-context"><div>a</div></td>
 175+ <td class="diff-marker"> </td>
 176+ <td class="diff-context"><div>a</div></td>
177177 </tr>
178178 <tr>
179 - <td colspan="2" align="left"><strong><!--LINE 35--></strong></td>
180 - <td colspan="2" align="left"><strong><!--LINE 35--></strong></td>
 179+ <td colspan="2" class="diff-lineno"><!--LINE 35--></td>
 180+ <td colspan="2" class="diff-lineno"><!--LINE 35--></td>
181181 </tr>
182182 <tr>
183 - <td> </td>
184 - <td class="diff-context">a</td>
185 - <td> </td>
186 - <td class="diff-context">a</td>
 183+ <td class="diff-marker"> </td>
 184+ <td class="diff-context"><div>a</div></td>
 185+ <td class="diff-marker"> </td>
 186+ <td class="diff-context"><div>a</div></td>
187187 </tr>
188188 <tr>
189 - <td> </td>
190 - <td class="diff-context">== Shortest sequence in Y ==</td>
191 - <td> </td>
192 - <td class="diff-context">== Shortest sequence in Y ==</td>
 189+ <td class="diff-marker"> </td>
 190+ <td class="diff-context"><div>== Shortest sequence in Y ==</div></td>
 191+ <td class="diff-marker"> </td>
 192+ <td class="diff-context"><div>== Shortest sequence in Y ==</div></td>
193193 </tr>
194194 <tr>
195 - <td>-</td>
196 - <td class="diff-deletedline">x1</td>
 195+ <td class="diff-marker">-</td>
 196+ <td class="diff-deletedline"><div>x1</div></td>
197197 <td colspan="2">&nbsp;</td>
198198 </tr>
199199 <tr>
200 - <td> </td>
201 - <td class="diff-context">x2</td>
202 - <td> </td>
203 - <td class="diff-context">x2</td>
 200+ <td class="diff-marker"> </td>
 201+ <td class="diff-context"><div>x2</div></td>
 202+ <td class="diff-marker"> </td>
 203+ <td class="diff-context"><div>x2</div></td>
204204 </tr>
205205 <tr>
206 - <td> </td>
207 - <td class="diff-context">x1</td>
208 - <td> </td>
209 - <td class="diff-context">x1</td>
 206+ <td class="diff-marker"> </td>
 207+ <td class="diff-context"><div>x1</div></td>
 208+ <td class="diff-marker"> </td>
 209+ <td class="diff-context"><div>x1</div></td>
210210 </tr>
211211 <tr>
212 - <td> </td>
213 - <td class="diff-context">x2</td>
214 - <td> </td>
215 - <td class="diff-context">x2</td>
 212+ <td class="diff-marker"> </td>
 213+ <td class="diff-context"><div>x2</div></td>
 214+ <td class="diff-marker"> </td>
 215+ <td class="diff-context"><div>x2</div></td>
216216 </tr>
217217 <tr>
218 - <td> </td>
219 - <td class="diff-context">x1</td>
220 - <td> </td>
221 - <td class="diff-context">x1</td>
 218+ <td class="diff-marker"> </td>
 219+ <td class="diff-context"><div>x1</div></td>
 220+ <td class="diff-marker"> </td>
 221+ <td class="diff-context"><div>x1</div></td>
222222 </tr>
223223 <tr>
224 - <td>-</td>
225 - <td class="diff-deletedline">x2</td>
 224+ <td class="diff-marker">-</td>
 225+ <td class="diff-deletedline"><div>x2</div></td>
226226 <td colspan="2">&nbsp;</td>
227227 </tr>
228228 <tr>
229 - <td>-</td>
230 - <td class="diff-deletedline">x1</td>
 229+ <td class="diff-marker">-</td>
 230+ <td class="diff-deletedline"><div>x1</div></td>
231231 <td colspan="2">&nbsp;</td>
232232 </tr>
233233 <tr>
234 - <td>-</td>
235 - <td class="diff-deletedline">x2</td>
 234+ <td class="diff-marker">-</td>
 235+ <td class="diff-deletedline"><div>x2</div></td>
236236 <td colspan="2">&nbsp;</td>
237237 </tr>
238238 <tr>
239 - <td> </td>
240 - <td class="diff-context">context</td>
241 - <td> </td>
242 - <td class="diff-context">context</td>
 239+ <td class="diff-marker"> </td>
 240+ <td class="diff-context"><div>context</div></td>
 241+ <td class="diff-marker"> </td>
 242+ <td class="diff-context"><div>context</div></td>
243243 </tr>
244244 <tr>
245 - <td> </td>
246 - <td class="diff-context">context</td>
247 - <td> </td>
248 - <td class="diff-context">context</td>
 245+ <td class="diff-marker"> </td>
 246+ <td class="diff-context"><div>context</div></td>
 247+ <td class="diff-marker"> </td>
 248+ <td class="diff-context"><div>context</div></td>
249249 </tr>
250250 <tr>
251 - <td colspan="2" align="left"><strong><!--LINE 49--></strong></td>
252 - <td colspan="2" align="left"><strong><!--LINE 45--></strong></td>
 251+ <td colspan="2" class="diff-lineno"><!--LINE 49--></td>
 252+ <td colspan="2" class="diff-lineno"><!--LINE 45--></td>
253253 </tr>
254254 <tr>
255 - <td> </td>
256 - <td class="diff-context">context</td>
257 - <td> </td>
258 - <td class="diff-context">context</td>
 255+ <td class="diff-marker"> </td>
 256+ <td class="diff-context"><div>context</div></td>
 257+ <td class="diff-marker"> </td>
 258+ <td class="diff-context"><div>context</div></td>
259259 </tr>
260260 <tr>
261 - <td> </td>
262 - <td class="diff-context">== Changed line ==</td>
263 - <td> </td>
264 - <td class="diff-context">== Changed line ==</td>
 261+ <td class="diff-marker"> </td>
 262+ <td class="diff-context"><div>== Changed line ==</div></td>
 263+ <td class="diff-marker"> </td>
 264+ <td class="diff-context"><div>== Changed line ==</div></td>
265265 </tr>
266266 <tr>
267 - <td>-</td>
268 - <td class="diff-deletedline">
269 -blah blah blah <span class="diffchange">1</span>
270 - </td>
271 - <td>+</td>
272 - <td class="diff-addedline">
273 -blah blah blah <span class="diffchange">2</span>
274 - </td>
 267+ <td class="diff-marker">-</td>
 268+ <td class="diff-deletedline"><div>
 269+blah blah blah <span class="diffchange diffchange-inline">1</span>
 270+ </div></td>
 271+ <td class="diff-marker">+</td>
 272+ <td class="diff-addedline"><div>
 273+blah blah blah <span class="diffchange diffchange-inline">2</span>
 274+ </div></td>
275275 </tr>
276276 <tr>
277 - <td> </td>
 277+ <td class="diff-marker"> </td>
278278 <td class="diff-context"></td>
279 - <td> </td>
 279+ <td class="diff-marker"> </td>
280280 <td class="diff-context"></td>
281281 </tr>
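
The <!--LINE n--> heading rows in this expected output carry one counter per side. The wikidiff2.cpp change in this revision formats them with snprintf into a fixed 256-byte buffer instead of sprintf. A minimal standalone sketch of that pattern follows. formatLineHeading is a hypothetical helper that returns a std::string, whereas the revision appends to a class member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: build the heading row for a hunk that starts at
// fromIndex on the left side and toIndex on the right side.
std::string formatLineHeading(unsigned fromIndex, unsigned toIndex)
{
    char buf[256];
    // snprintf never writes more than sizeof(buf) bytes, including the
    // terminating NUL, so oversized output is truncated, not overflowed.
    int written = std::snprintf(buf, sizeof(buf),
        "<tr>\n"
        "  <td colspan=\"2\" class=\"diff-lineno\"><!--LINE %u--></td>\n"
        "  <td colspan=\"2\" class=\"diff-lineno\"><!--LINE %u--></td>\n"
        "</tr>\n",
        fromIndex, toIndex);
    // A return value >= sizeof(buf) would mean the output was truncated.
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buf));
    return std::string(buf);
}
```

Checking the return value is optional for a format string this short, but it is the only way to notice truncation when it does occur.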
Index: trunk/extensions/wikidiff2/test/test-b.diff
@@ -1,66 +1,66 @@
22 <tr>
3 - <td colspan="2" align="left"><strong><!--LINE 1--></strong></td>
4 - <td colspan="2" align="left"><strong><!--LINE 1--></strong></td>
 3+ <td colspan="2" class="diff-lineno"><!--LINE 1--></td>
 4+ <td colspan="2" class="diff-lineno"><!--LINE 1--></td>
55 </tr>
66 <tr>
7 - <td> </td>
8 - <td class="diff-context">== Shortest sequence in X ==</td>
9 - <td> </td>
10 - <td class="diff-context">== Shortest sequence in X ==</td>
 7+ <td class="diff-marker"> </td>
 8+ <td class="diff-context"><div>== Shortest sequence in X ==</div></td>
 9+ <td class="diff-marker"> </td>
 10+ <td class="diff-context"><div>== Shortest sequence in X ==</div></td>
1111 </tr>
1212 <tr>
1313 <td colspan="2">&nbsp;</td>
14 - <td>+</td>
15 - <td class="diff-addedline">x1</td>
 14+ <td class="diff-marker">+</td>
 15+ <td class="diff-addedline"><div>x1</div></td>
1616 </tr>
1717 <tr>
18 - <td> </td>
19 - <td class="diff-context">x2</td>
20 - <td> </td>
21 - <td class="diff-context">x2</td>
 18+ <td class="diff-marker"> </td>
 19+ <td class="diff-context"><div>x2</div></td>
 20+ <td class="diff-marker"> </td>
 21+ <td class="diff-context"><div>x2</div></td>
2222 </tr>
2323 <tr>
24 - <td> </td>
25 - <td class="diff-context">x1</td>
26 - <td> </td>
27 - <td class="diff-context">x1</td>
 24+ <td class="diff-marker"> </td>
 25+ <td class="diff-context"><div>x1</div></td>
 26+ <td class="diff-marker"> </td>
 27+ <td class="diff-context"><div>x1</div></td>
2828 </tr>
2929 <tr>
30 - <td> </td>
31 - <td class="diff-context">x2</td>
32 - <td> </td>
33 - <td class="diff-context">x2</td>
 30+ <td class="diff-marker"> </td>
 31+ <td class="diff-context"><div>x2</div></td>
 32+ <td class="diff-marker"> </td>
 33+ <td class="diff-context"><div>x2</div></td>
3434 </tr>
3535 <tr>
36 - <td> </td>
37 - <td class="diff-context">x1</td>
38 - <td> </td>
39 - <td class="diff-context">x1</td>
 36+ <td class="diff-marker"> </td>
 37+ <td class="diff-context"><div>x1</div></td>
 38+ <td class="diff-marker"> </td>
 39+ <td class="diff-context"><div>x1</div></td>
4040 </tr>
4141 <tr>
4242 <td colspan="2">&nbsp;</td>
43 - <td>+</td>
44 - <td class="diff-addedline">x2</td>
 43+ <td class="diff-marker">+</td>
 44+ <td class="diff-addedline"><div>x2</div></td>
4545 </tr>
4646 <tr>
4747 <td colspan="2">&nbsp;</td>
48 - <td>+</td>
49 - <td class="diff-addedline">x1</td>
 48+ <td class="diff-marker">+</td>
 49+ <td class="diff-addedline"><div>x1</div></td>
5050 </tr>
5151 <tr>
5252 <td colspan="2">&nbsp;</td>
53 - <td>+</td>
54 - <td class="diff-addedline">x2</td>
 53+ <td class="diff-marker">+</td>
 54+ <td class="diff-addedline"><div>x2</div></td>
5555 </tr>
5656 <tr>
57 - <td> </td>
58 - <td class="diff-context">context</td>
59 - <td> </td>
60 - <td class="diff-context">context</td>
 57+ <td class="diff-marker"> </td>
 58+ <td class="diff-context"><div>context</div></td>
 59+ <td class="diff-marker"> </td>
 60+ <td class="diff-context"><div>context</div></td>
6161 </tr>
6262 <tr>
63 - <td> </td>
64 - <td class="diff-context">context</td>
65 - <td> </td>
66 - <td class="diff-context">context</td>
 63+ <td class="diff-marker"> </td>
 64+ <td class="diff-context"><div>context</div></td>
 65+ <td class="diff-marker"> </td>
 66+ <td class="diff-context"><div>context</div></td>
6767 </tr>
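
The updated expected output in test-a.diff and test-b.diff tags marker cells with class="diff-marker" and wraps non-empty line text in a <div>, while empty lines keep a bare <td>. A rough sketch of how one added-line row of that shape could be produced is shown below. It is modelled on the printAdd function in the wikidiff2.cpp changes, but escapeHtml and addedLineRow are hypothetical names, and the exact set of characters that wikidiff2 escapes is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: escape the characters that are special in HTML text.
std::string escapeHtml(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        switch (text[i]) {
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '&': out += "&amp;"; break;
            default:  out += text[i];
        }
    }
    // Escaping can only lengthen the text, never shorten it.
    assert(out.size() >= text.size());
    return out;
}

// Hypothetical helper: build the table row for one inserted line.
std::string addedLineRow(const std::string& line)
{
    std::string row =
        "<tr>\n"
        "  <td colspan=\"2\">&nbsp;</td>\n"
        "  <td class=\"diff-marker\">+</td>\n"
        "  <td class=\"diff-addedline\">";
    // Empty lines get an empty cell, matching the expected output.
    if (!line.empty()) {
        row += "<div>" + escapeHtml(line) + "</div>";
    }
    row += "</td>\n</tr>\n";
    return row;
}
```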
Index: trunk/extensions/wikidiff2/wikidiff2.cpp
@@ -8,25 +8,28 @@
 #include <stdio.h>
 #include <string.h>
 #include "wikidiff2.h"
+#include <thai/thailib.h>
+#include <thai/thwchar.h>
+#include <thai/thbrk.h>

-void print_diff(std::vector<std::string> &text1, std::vector<std::string> &text2, int num_lines_context, std::string &ret)
+void Wikidiff2::diffLines(const StringVector & lines1, const StringVector & lines2,
+ int numContextLines)
 {
 // first do line-level diff
- Diff<std::string> linediff(text1, text2);
+ StringDiff linediff(lines1, lines2);

 int ctx = 0;
- int from_ind = 1, to_ind = 1;
- int num_ops = linediff.size();
+ int from_index = 1, to_index = 1;

 // Should a line number be printed before the next context line?
 // Set to true initially so we get a line number on line 1
- bool showLineNumber = true;
+ bool showLineNumber = true;

- for (int i = 0; i < num_ops; ++i) {
+ for (int i = 0; i < linediff.size(); ++i) {
 int n, j, n1, n2;
 // Line 1 changed, show heading with no leading context
- if (linediff[i].op != DiffOp<std::string>::copy && i == 0) {
- ret +=
+ if (linediff[i].op != DiffOp<String>::copy && i == 0) {
+ result +=
 "<tr>\n"
 " <td colspan=\"2\" class=\"diff-lineno\"><!--LINE 1--></td>\n"
 " <td colspan=\"2\" class=\"diff-lineno\"><!--LINE 1--></td>\n"
@@ -34,76 +37,76 @@
 }

 switch (linediff[i].op) {
- case DiffOp<std::string>::add:
+ case DiffOp<String>::add:
 // inserted lines
 n = linediff[i].to.size();
 for (j=0; j<n; j++) {
- print_add(*linediff[i].to[j], ret);
+ printAdd(*linediff[i].to[j]);
 }
- to_ind += n;
+ to_index += n;
 break;
- case DiffOp<std::string>::del:
+ case DiffOp<String>::del:
 // deleted lines
 n = linediff[i].from.size();
 for (j=0; j<n; j++) {
- print_del(*linediff[i].from[j], ret);
+ printDelete(*linediff[i].from[j]);
 }
- from_ind += n;
+ from_index += n;
 break;
- case DiffOp<std::string>::copy:
+ case DiffOp<String>::copy:
 // copy/context
 n = linediff[i].from.size();
 for (j=0; j<n; j++) {
- if ((i != 0 && j < num_lines_context) /*trailing*/
- || (i != num_ops - 1 && j >= n - num_lines_context)) /*leading*/ {
+ if ((i != 0 && j < numContextLines) /*trailing*/
+ || (i != linediff.size() - 1 && j >= n - numContextLines)) /*leading*/ {
 if (showLineNumber) {
 // Print Line: heading
 char buf[256]; // should be plenty
- sprintf(buf,
+ snprintf(buf, 256,
 "<tr>\n"
 " <td colspan=\"2\" class=\"diff-lineno\"><!--LINE %u--></td>\n"
 " <td colspan=\"2\" class=\"diff-lineno\"><!--LINE %u--></td>\n"
 "</tr>\n",
- from_ind, to_ind);
- ret += buf;
+ from_index, to_index);
+ result += buf;
 showLineNumber = false;
 }
 // Print context
- ret +=
+ result +=
 "<tr>\n"
 " <td class=\"diff-marker\"> </td>\n"
 " <td class=\"diff-context\">";
- print_div_htmlspecialchars(*linediff[i].from[j], ret);
- ret +=
+ printTextWithDiv(*linediff[i].from[j]);
+ result +=
 "</td>\n"
 " <td class=\"diff-marker\"> </td>\n"
 " <td class=\"diff-context\">";
- print_div_htmlspecialchars(*linediff[i].from[j], ret);
- ret += "</td>\n</tr>\n";
+ printTextWithDiv(*linediff[i].from[j]);
+ result += "</td>\n</tr>\n";
 } else {
 showLineNumber = true;
 }
- from_ind++;
- to_ind++;
+ from_index++;
+ to_index++;
 }
 break;
- case DiffOp<std::string>::change:
- // replace, ie. we do a word diff between the two sets of lines
+ case DiffOp<String>::change:
+ // replace, i.e. we do a word diff between the two sets of lines
 n1 = linediff[i].from.size();
 n2 = linediff[i].to.size();
 n = std::min(n1, n2);
 for (j=0; j<n; j++) {
- print_worddiff(*linediff[i].from[j], *linediff[i].to[j], ret);
+ printWordDiff(*linediff[i].from[j], *linediff[i].to[j]);
 }
- from_ind += n;
- to_ind += n;
+ from_index += n;
+ to_index += n;
 if (n1 > n2) {
 for (j=n2; j<n1; j++) {
- print_del(*linediff[i].from[j], ret);
+ printDelete(*linediff[i].from[j]);
 }
 } else {
 for (j=n1; j<n2; j++) {
- print_add(*linediff[i].to[j], ret);
+ printAdd(*linediff[i].to[j]);
 }
 }
 break;
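
The copy case in the hunk above prints only the lines of an unchanged block that sit within numContextLines of a neighbouring change: trailing context after an earlier change and leading context before a later one. The sketch below isolates that condition so it can be checked on its own. contextMask and its parameter names are hypothetical; the boolean expression follows the one in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Decide which lines of an unchanged ("copy") block are printed as context.
// opIndex/opCount locate the block among the diff operations; blockSize is
// its number of lines. All names here are hypothetical.
std::vector<bool> contextMask(int opIndex, int opCount, int blockSize, int numContextLines)
{
    assert(opIndex >= 0 && opIndex < opCount && blockSize >= 0);
    std::vector<bool> shown(static_cast<std::size_t>(blockSize), false);
    for (int j = 0; j < blockSize; ++j) {
        // Trailing context: follows an earlier change, so not for the first op.
        bool trailing = opIndex != 0 && j < numContextLines;
        // Leading context: precedes a later change, so not for the last op.
        bool leading = opIndex != opCount - 1 && j >= blockSize - numContextLines;
        shown[static_cast<std::size_t>(j)] = trailing || leading;
    }
    return shown;
}
```

For a six-line unchanged block between two changes and two lines of context, the first two and last two lines are shown and the middle two are skipped. In the diff, a skipped line is also what sets showLineNumber, so that a new LINE heading is printed before the next context line.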
@@ -113,98 +116,98 @@
 }
 }

-void print_add(const std::string & line, std::string & ret)
+void Wikidiff2::printAdd(const String & line)
 {
- ret += "<tr>\n"
+ result += "<tr>\n"
 " <td colspan=\"2\">&nbsp;</td>\n"
 " <td class=\"diff-marker\">+</td>\n"
 " <td class=\"diff-addedline\">";
- print_div_htmlspecialchars(line, ret);
- ret += "</td>\n</tr>\n";
+ printTextWithDiv(line);
+ result += "</td>\n</tr>\n";
 }

-void print_del(const std::string & line, std::string & ret)
+void Wikidiff2::printDelete(const String & line)
 {
- ret += "<tr>\n"
+ result += "<tr>\n"
 " <td class=\"diff-marker\">-</td>\n"
 " <td class=\"diff-deletedline\">";
- print_div_htmlspecialchars(line, ret);
- ret += "</td>\n"
+ printTextWithDiv(line);
+ result += "</td>\n"
 " <td colspan=\"2\">&nbsp;</td>\n"
 "</tr>\n";
 }

-void print_worddiff(const std::string & text1, const std::string & text2, std::string &ret)
+void Wikidiff2::printWordDiff(const String & text1, const String & text2)
 {
- std::vector<Word> text1_words, text2_words;
+ WordVector words1, words2;

- split_tokens(text1, text1_words);
- split_tokens(text2, text2_words);
- Diff<Word> worddiff(text1_words, text2_words);
+ explodeWords(text1, words1);
+ explodeWords(text2, words2);
+ WordDiff worddiff(words1, words2);

- //debug_print_worddiff(worddiff, ret);
+ //debugPrintWordDiff(worddiff);

 // print twice; first for left side, then for right side
- ret += "<tr>\n"
+ result += "<tr>\n"
 " <td class=\"diff-marker\">-</td>\n"
 " <td class=\"diff-deletedline\"><div>\n";
- print_worddiff_side(worddiff, false, ret);
- ret += "\n </div></td>\n"
+ printWordDiffSide(worddiff, false);
+ result += "\n </div></td>\n"
 " <td class=\"diff-marker\">+</td>\n"
 " <td class=\"diff-addedline\"><div>\n";
- print_worddiff_side(worddiff, true, ret);
- ret += "\n </div></td>\n"
+ printWordDiffSide(worddiff, true);
+ result += "\n </div></td>\n"
158161 "</tr>\n";
159162 }
160163
161 -void debug_print_worddiff(Diff<Word> &worddiff, std::string &ret)
 164+void Wikidiff2::debugPrintWordDiff(WordDiff & worddiff)
162165 {
163166 for (unsigned i = 0; i < worddiff.size(); ++i) {
164167 DiffOp<Word> & op = worddiff[i];
165168 switch (op.op) {
166169 case DiffOp<Word>::copy:
167 - ret += "Copy\n";
 170+ result += "Copy\n";
168171 break;
169172 case DiffOp<Word>::del:
170 - ret += "Delete\n";
 173+ result += "Delete\n";
171174 break;
172175 case DiffOp<Word>::add:
173 - ret += "Add\n";
 176+ result += "Add\n";
174177 break;
175178 case DiffOp<Word>::change:
176 - ret += "Change\n";
 179+ result += "Change\n";
177180 break;
178181 }
179 - ret += "From: ";
 182+ result += "From: ";
180183 bool first = true;
181184 for (int j=0; j<op.from.size(); j++) {
182185 if (first) {
183186 first = false;
184187 } else {
185 - ret += ", ";
 188+ result += ", ";
186189 }
187 - ret += "(";
188 - ret += op.from[j]->whole() + ")";
 190+ result += "(";
 191+ result += op.from[j]->whole() + ")";
189192 }
190 - ret += "\n";
191 - ret += "To: ";
 193+ result += "\n";
 194+ result += "To: ";
192195 first = true;
193196 for (int j=0; j<op.to.size(); j++) {
194197 if (first) {
195198 first = false;
196199 } else {
197 - ret += ", ";
 200+ result += ", ";
198201 }
199 - ret += "(";
200 - ret += op.to[j]->whole() + ")";
 202+ result += "(";
 203+ result += op.to[j]->whole() + ")";
201204 }
202 - ret += "\n\n";
 205+ result += "\n\n";
203206 }
204207 }
205208
206 -void print_worddiff_side(Diff<Word> &worddiff, bool added, std::string &ret)
 209+void Wikidiff2::printWordDiffSide(WordDiff &worddiff, bool added)
207210 {
208 - std::string word;
 211+ String word;
209212 for (unsigned i = 0; i < worddiff.size(); ++i) {
210213 DiffOp<Word> & op = worddiff[i];
211214 int n, j;
@@ -213,106 +216,79 @@
214217 if (added) {
215218 for (j=0; j<n; j++) {
216219 op.to[j]->get_whole(word);
217 - print_htmlspecialchars(word, ret);
 220+ printText(word);
218221 }
219222 } else {
220223 for (j=0; j<n; j++) {
221224 op.from[j]->get_whole(word);
222 - print_htmlspecialchars(word, ret);
 225+ printText(word);
223226 }
224227 }
225228 } else if (!added && (op.op == DiffOp<Word>::del || op.op == DiffOp<Word>::change)) {
226229 n = op.from.size();
227 - ret += "<span class=\"diffchange diffchange-inline\">";
 230+ result += "<span class=\"diffchange diffchange-inline\">";
228231 for (j=0; j<n; j++) {
229232 op.from[j]->get_whole(word);
230 - print_htmlspecialchars(word, ret);
 233+ printText(word);
231234 }
232 - ret += "</span>";
 235+ result += "</span>";
233236 } else if (added && (op.op == DiffOp<Word>::add || op.op == DiffOp<Word>::change)) {
234237 n = op.to.size();
235 - ret += "<span class=\"diffchange diffchange-inline\">";
 238+ result += "<span class=\"diffchange diffchange-inline\">";
236239 for (j=0; j<n; j++) {
237240 op.to[j]->get_whole(word);
238 - print_htmlspecialchars(word, ret);
 241+ printText(word);
239242 }
240 - ret += "</span>";
 243+ result += "</span>";
241244 }
242245 }
243246 }
244247
245 -void print_div_htmlspecialchars(const std::string & input, std::string & ret)
 248+void Wikidiff2::printTextWithDiv(const String & input)
246249 {
247250 // Wrap string in a <div> if it's not empty
248251 if (input.size() > 0) {
249 - ret.append("<div>");
250 - print_htmlspecialchars(input, ret);
251 - ret.append("</div>");
 252+ result.append("<div>");
 253+ printText(input);
 254+ result.append("</div>");
252255 }
253256 }
254257
255 -void print_htmlspecialchars(const std::string & input, std::string & ret)
 258+void Wikidiff2::printText(const String & input)
256259 {
257260 size_t start = 0;
258261 size_t end = input.find_first_of("<>&");
259 - while (end != std::string::npos) {
 262+ while (end != String::npos) {
260263 if (end > start) {
261 - ret.append(input, start, end - start);
 264+ result.append(input, start, end - start);
262265 }
263266 switch (input[end]) {
264267 case '<':
265 - ret.append("&lt;");
 268+ result.append("&lt;");
266269 break;
267270 case '>':
268 - ret.append("&gt;");
 271+ result.append("&gt;");
269272 break;
270273 default /*case '&'*/:
271 - ret.append("&amp;");
 274+ result.append("&amp;");
272275 }
273276 start = end + 1;
274277 end = input.find_first_of("<>&", start);
275278 }
276279 // Append the rest of the string after the last special character
277280 if (start < input.size()) {
278 - ret.append(input, start, input.size() - start);
 281+ result.append(input, start, input.size() - start);
279282 }
280283 }
281284
282 -
283 -inline bool my_istext(int ch)
284 -{
285 - // Standard alphanumeric
286 - if ((ch >= '0' && ch <= '9') ||
287 - (ch == '_') ||
288 - (ch >= 'A' && ch <= 'Z') ||
289 - (ch >= 'a' && ch <= 'z'))
290 - {
291 - return true;
292 - }
293 - // Punctuation and control characters
294 - if (ch < 0xc0) return false;
295 - // Thai, return false so it gets split up
296 - if (ch >= 0xe00 && ch <= 0xee7) return false;
297 - // Chinese/Japanese, same
298 - if (ch >= 0x3000 && ch <= 0x9fff) return false;
299 - if (ch >= 0x20000 && ch <= 0x2a000) return false;
300 - // Otherwise assume it's from a language that uses spaces
301 - return true;
302 -}
303 -
304 -inline bool my_isspace(int ch)
305 -{
306 - return ch == ' ' || ch == '\t';
307 -}
308 -
309285 // Weak UTF-8 decoder
310286 // Will return garbage on invalid input (overshort sequences, overlong sequences, etc.)
311 -int next_utf8_char(std::string::const_iterator & p, std::string::const_iterator & charStart,
312 - std::string::const_iterator end)
 287+int Wikidiff2::nextUtf8Char(String::const_iterator & p, String::const_iterator & charStart,
 288+ String::const_iterator end)
313289 {
314290 int c;
315291 unsigned char byte;
316 - int bytes = 0;
 292+ int seqLength = 0;
317293 charStart = p;
318294 if (p == end) {
319295 return 0;
@@ -321,116 +297,155 @@
322298 byte = (unsigned char)*p;
323299 if (byte < 0x80) {
324300 c = byte;
325 - bytes = 0;
 301+ seqLength = 0;
326302 } else if (byte >= 0xc0) {
327303 // Start of UTF-8 character
328304 // If this is unexpected, due to an overshort sequence, we ignore the invalid
329305 // sequence and resynchronise here
330 - if (byte < 0xe0) {
331 - bytes = 1;
 306+ if (byte < 0xe0) {
 307+ seqLength = 1;
332308 c = byte & 0x1f;
333309 } else if (byte < 0xf0) {
334 - bytes = 2;
 310+ seqLength = 2;
335311 c = byte & 0x0f;
336312 } else {
337 - bytes = 3;
 313+ seqLength = 3;
338314 c = byte & 7;
339315 }
340 - } else if (bytes) {
 316+ } else if (seqLength) {
341317 c <<= 6;
342318 c |= byte & 0x3f;
343 - --bytes;
 319+ --seqLength;
344320 } else {
345321 // Unexpected continuation, ignore
346322 }
347323 ++p;
348 - } while (bytes && p != end);
 324+ } while (seqLength && p != end);
349325 return c;
350326 }
351327
352 -// split a string into multiple tokens
353 -void split_tokens(const std::string & text, std::vector<Word> &tokens)
 328+// Split a string into words
 329+void Wikidiff2::explodeWords(const String & text, WordVector &words)
354330 {
355331 // Don't try to do a word-level diff on very long lines
356332 if (text.size() > MAX_DIFF_LINE) {
357 - tokens.push_back(Word(text.begin(), text.end(), text.end()));
 333+ words.push_back(Word(text.begin(), text.end(), text.end()));
358334 return;
359335 }
360 -
361 - std::string body, suffix;
362 - std::string::const_iterator bodyStart, bodyEnd, suffixEnd, charStart, p;
363 - int ch;
 336+
 337+ // Decode the UTF-8 in the string.
 338+ // * Save the character sizes (in bytes)
 339+ // * Convert the string to TIS-620, which is the internal character set of libthai.
 340+ // * Save the character offsets of any break positions (same format as libthai).
 341+
 342+ String tisText, charSizes;
 343+ String::const_iterator suffixEnd, charStart, p;
 344+ IntSet breaks;
 345+
 346+ tisText.reserve(text.size());
 347+ charSizes.reserve(text.size());
 348+ wchar_t ch, lastChar;
 349+ thchar_t thaiChar;
 350+ bool hasThaiChars = false;
 351+
364352 p = text.begin();
365 - ch = next_utf8_char(p, charStart, text.end());
 353+ ch = nextUtf8Char(p, charStart, text.end());
 354+ lastChar = 0;
 355+ int charIndex = 0;
366356 while (ch) {
367 - // first group has three different opportunities:
368 - if (my_isspace(ch)) {
369 - // one or more whitespace characters
370 - bodyStart = charStart;
371 - while (my_isspace(ch)) {
372 - ch = next_utf8_char(p, charStart, text.end());
373 - }
374 - bodyEnd = charStart;
375 - } else if (my_istext(ch)) {
376 - // one or more text characters
377 - bodyStart = charStart;
378 - while (my_istext(ch)) {
379 - ch = next_utf8_char(p, charStart, text.end());
380 - }
381 - bodyEnd = charStart;
382 - } else {
383 - // one character, no matter what it is
384 - bodyStart = charStart;
385 - bodyEnd = p;
386 - ch = next_utf8_char(p, charStart, text.end());
 357+ thaiChar = th_uni2tis(ch);
 358+ if (thaiChar >= 0x80 && thaiChar != THCHAR_ERR) {
 359+ hasThaiChars = true;
387360 }
388 -
389 - // second group: any whitespace character
390 - while (my_isspace(ch)) {
391 - ch = next_utf8_char(p, charStart, text.end());
 361+ tisText += (char)thaiChar;
 362+ charSizes += (char)(p - charStart);
 363+
 364+ if (!isSpace(ch) && lastChar && isSpace(lastChar)) {
 365+ breaks.insert(charIndex);
 366+ } else if (isChineseJapanese(ch)) {
 367+ breaks.insert(charIndex);
392368 }
393 - suffixEnd = charStart;
394 - tokens.push_back(Word(bodyStart, bodyEnd, suffixEnd));
 369+ charIndex++;
 370+ lastChar = ch;
 371+ ch = nextUtf8Char(p, charStart, text.end());
395372 }
 373+
 374+ // If there were any Thai characters in the string, run th_brk on it and add
 375+ // the resulting break positions
 376+ if (hasThaiChars) {
 377+ IntVector thaiBreakPositions;
 378+ tisText += '\0';
 379+ thaiBreakPositions.resize(tisText.size());
 380+ int numBreaks = th_brk((const thchar_t*)(tisText.data()),
 381+ &thaiBreakPositions[0], thaiBreakPositions.size());
 382+ thaiBreakPositions.resize(numBreaks);
 383+ breaks.insert(thaiBreakPositions.begin(), thaiBreakPositions.end());
 384+ }
 385+
 386+ // Now make the word array by traversing the breaks set
 387+ p = text.begin();
 388+ IntSet::iterator pBrk = breaks.begin();
 389+ String::const_iterator wordStart = text.begin();
 390+ String::const_iterator suffixStart = text.end();
 391+
 392+ // If there's a break at the start of the string, skip it
 393+ if (pBrk != breaks.end() && *pBrk == 0) {
 394+ pBrk++;
 395+ }
 396+
 397+ // Add a fake end-of-string character and have a break on it, so that the
 398+ // last word gets added without special handling
 399+ breaks.insert(charSizes.size());
 400+ charSizes += (char)0;
 401+
 402+ for (charIndex = 0; charIndex < charSizes.size(); p += charSizes[charIndex++]) {
 403+ // Assume all spaces are ASCII
 404+ if (isSpace(*p)) {
 405+ suffixStart = p;
 406+ }
 407+ if (pBrk != breaks.end() && charIndex == *pBrk) {
 408+ if (suffixStart == text.end()) {
 409+ words.push_back(Word(wordStart, p, p));
 410+ } else {
 411+ words.push_back(Word(wordStart, suffixStart, p));
 412+ }
 413+ pBrk++;
 414+ suffixStart = text.end();
 415+ wordStart = p;
 416+ }
 417+ }
396418 }
397419
398 -void line_explode(const char *text, std::vector<std::string> &lines)
 420+void Wikidiff2::explodeLines(const String & text, StringVector &lines)
399421 {
400 - const char *ptr = text;
401 - while (*ptr) {
402 - const char *ptr2 = strchr(ptr, '\n');
403 - if (ptr2 == NULL)
404 - ptr2 = ptr + strlen(ptr);
405 -
406 - lines.push_back(std::string(ptr, ptr2));
407 -
 422+ String::const_iterator ptr = text.begin();
 423+ while (ptr != text.end()) {
 424+ String::const_iterator ptr2 = std::find(ptr, text.end(), '\n');
 425+ lines.push_back(String(ptr, ptr2));
 426+
408427 ptr = ptr2;
409 - if (*ptr)
 428+ if (ptr != text.end()) {
410429 ++ptr;
 430+ }
411431 }
412432 }
413433
414 -// Finally, the entry point for the PHP code.
415 -const char *wikidiff2_do_diff(const char *text1, const char *text2, int num_lines_context)
 434+const Wikidiff2::String & Wikidiff2::execute(const String & text1, const String & text2, int numContextLines)
416435 {
417 - try {
418 - std::vector<std::string> lines1;
419 - std::vector<std::string> lines2;
420 - std::string ret;
421 -
422 - // constant reallocation is bad for performance (note: we might want to reduce this
423 - // later, it might be too much)
424 - ret.reserve(strlen(text1) + strlen(text2) + 10000);
425 -
426 - line_explode(text1, lines1);
427 - line_explode(text2, lines2);
428 - print_diff(lines1, lines2, num_lines_context, ret);
429 -
430 - return strdup(ret.c_str());
431 - } catch (std::bad_alloc &e) {
432 - return strdup("Out of memory in diff.");
433 - } catch (...) {
434 - return strdup("Unknown exception in diff.");
435 - }
 436+ // Allocate some result space to avoid excessive copying
 437+ result.clear();
 438+ result.reserve(text1.size() + text2.size() + 10000);
 439+
 440+ // Split input strings into lines
 441+ StringVector lines1;
 442+ StringVector lines2;
 443+ explodeLines(text1, lines1);
 444+ explodeLines(text2, lines2);
 445+
 446+ // Do the diff
 447+ diffLines(lines1, lines2, numContextLines);
 448+
 449+ // Return a reference to the result buffer
 450+ return result;
436451 }
437452
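
Reviewer's sketch (not part of the revision): the break-set approach used by explodeWords() above can be illustrated without libthai. The version below is a minimal sketch that assumes ASCII-only input, so every character is one byte and no charSizes table is needed. The names SimpleWord, isSpaceChar and splitWords are hypothetical and do not appear in the commit. It inserts the end-of-string break before walking the set, so a line with no other breaks still yields one word, and it treats the whole trailing whitespace run as the suffix.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified word type: (body, trailing whitespace suffix).
typedef std::pair<std::string, std::string> SimpleWord;

static bool isSpaceChar(char ch)
{
    return ch == ' ' || ch == '\t';
}

// Split ASCII text into words by first collecting break positions,
// then cutting the text at each break.
std::vector<SimpleWord> splitWords(const std::string &text)
{
    std::vector<SimpleWord> words;
    if (text.empty()) {
        return words;
    }

    // Pass 1: a word starts wherever a non-space follows a space.
    std::set<std::size_t> breaks;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (!isSpaceChar(text[i]) && isSpaceChar(text[i - 1])) {
            breaks.insert(i);
        }
    }
    // Sentinel break at end of string, so the last word needs no special case.
    breaks.insert(text.size());

    // Pass 2: walk the sorted breaks and emit (body, suffix) pairs.
    std::size_t wordStart = 0;
    for (std::set<std::size_t>::const_iterator it = breaks.begin();
         it != breaks.end(); ++it) {
        std::size_t wordEnd = *it;
        // The body ends where the trailing whitespace run begins.
        std::size_t bodyEnd = wordEnd;
        while (bodyEnd > wordStart && isSpaceChar(text[bodyEnd - 1])) {
            --bodyEnd;
        }
        words.push_back(SimpleWord(text.substr(wordStart, bodyEnd - wordStart),
                                   text.substr(bodyEnd, wordEnd - bodyEnd)));
        wordStart = wordEnd;
    }
    return words;
}
```

Because the breaks live in an ordered set, extra break sources (such as the positions th_brk returns for Thai text in the revision) could be merged in with a plain insert before the second pass.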
Index: trunk/extensions/wikidiff2/Word.h
@@ -3,6 +3,7 @@
44
55 #include <string>
66 #include <algorithm>
 7+#include "wikidiff2.h"
78
89 // a small class to accommodate word-level diffs; basically, a body and an
910 // optional suffix (the latter consisting of a single whitespace), where
@@ -13,7 +14,8 @@
1415 // not be changed or destroyed.
1516 class Word {
1617 public:
17 - typedef std::string::const_iterator Iterator;
 18+ typedef std::basic_string<char, std::char_traits<char>, WD2_ALLOCATOR<char> > String;
 19+ typedef String::const_iterator Iterator;
1820
1921 Iterator bodyStart;
2022 Iterator bodyEnd;
@@ -39,22 +41,22 @@
4042 }
4143
4244 // Get the whole word as a string
43 - std::string whole() const {
44 - std::string w;
 45+ String whole() const {
 46+ String w;
4547 get_whole(w);
4648 return w;
4749 }
4850
4951 // Assign the whole word to a string
50 - void get_whole(std::string & w) const {
 52+ void get_whole(String & w) const {
5153 // Do it with swap() to avoid a second copy
52 - std::string temp(bodyStart, suffixEnd);
 54+ String temp(bodyStart, suffixEnd);
5355 temp.swap(w);
5456 }
5557
5658 // Get the body as a string
57 - operator std::string() const {
58 - return std::string(bodyStart, bodyEnd);
 59+ operator String() const {
 60+ return String(bodyStart, bodyEnd);
5961 }
6062 };
6163
Index: trunk/extensions/wikidiff2/tests/001.phpt
@@ -0,0 +1,21 @@
 2+--TEST--
 3+Check for wikidiff2 presence
 4+--SKIPIF--
 5+<?php if (!extension_loaded("wikidiff2")) print "skip"; ?>
 6+--FILE--
 7+<?php
 8+echo "wikidiff2 extension is available";
 9+/*
 10+ you can add regression tests for your extension here
 11+
 12+ the output of your test code has to be equal to the
 13+ text in the --EXPECT-- section below for the tests
 14+ to pass, differences between the output and the
 15+ expected text are interpreted as failure
 16+
 17+ see php5/README.TESTING for further information on
 18+ writing regression tests
 19+*/
 20+?>
 21+--EXPECT--
 22+wikidiff2 extension is available
Index: trunk/extensions/wikidiff2/DiffEngine.h
@@ -18,6 +18,8 @@
1919 #include "JudyHS.h"
2020 #endif
2121
 22+#include "wikidiff2.h"
 23+
2224 /**
2325 * Diff operation
2426 *
@@ -32,27 +34,31 @@
3335 * not be the same length.
3436 */
3537 template<typename T>
36 -class DiffOp
 38+class DiffOp
3739 {
3840 public:
39 - DiffOp(int op_, const std::vector<const T*> & from_, const std::vector<const T*> & to_)
 41+ typedef std::vector<const T*, WD2_ALLOCATOR<const T*> > PointerVector;
 42+ DiffOp(int op_, const PointerVector & from_, const PointerVector & to_)
4043 : op(op_), from(from_), to(to_) {}
4144
4245 enum {copy, del, add, change};
4346 int op;
44 - std::vector<const T*> from;
45 - std::vector<const T*> to;
 47+ PointerVector from;
 48+ PointerVector to;
4649 };
4750
4851 /**
49 - * Basic diff template class. After construction, edits will contain a vector of DiffOp
 52+ * Basic diff template class. After construction, edits will contain a vector of DiffOpTemplate
5053 * objects representing the diff
5154 */
5255 template<typename T>
5356 class Diff
5457 {
5558 public:
56 - Diff(const std::vector<T> & from_lines, const std::vector<T> & to_lines);
 59+ typedef std::vector<T, WD2_ALLOCATOR<T> > ValueVector;
 60+ typedef std::vector<DiffOp<T>, WD2_ALLOCATOR<DiffOp<T> > > DiffOpVector;
 61+
 62+ Diff(const ValueVector & from_lines, const ValueVector & to_lines);
5763
5864 virtual void add_edit(const DiffOp<T> & edit) {
5965 edits.push_back(edit);
@@ -60,7 +66,7 @@
6167 unsigned size() { return edits.size(); }
6268 DiffOp<T> & operator[](int i) {return edits[i];}
6369
64 - std::vector<DiffOp<T> > edits;
 70+ DiffOpVector edits;
6571 };
6672 /**
6773 * Class used internally by Diff to actually compute the diffs.
@@ -88,23 +94,45 @@
8995 class _DiffEngine
9096 {
9197 public:
 98+ // Vectors
 99+ typedef std::vector<bool> BoolVector; // skip the allocator here to get the specialisation
 100+ typedef std::vector<const T*, WD2_ALLOCATOR<const T*> > PointerVector;
 101+ typedef std::vector<T, WD2_ALLOCATOR<T> > ValueVector;
 102+ typedef std::vector<int, WD2_ALLOCATOR<int> > IntVector;
 103+ typedef std::vector<std::pair<int, int>, WD2_ALLOCATOR<std::pair<int, int> > > IntPairVector;
 104+
 105+ // Maps
 106+#ifdef USE_JUDY
 107+ typedef JudyHS<IntVector> MatchesMap;
 108+#else
 109+ typedef std::map<T, IntVector, std::less<T>, WD2_ALLOCATOR<std::pair<const T, IntVector> > > MatchesMap;
 110+#endif
 111+
 112+ // Sets
 113+ typedef std::set<int, std::less<int>, WD2_ALLOCATOR<int> > IntSet;
 114+#ifdef USE_JUDY
 115+ typedef JudySet ValueSet;
 116+#else
 117+ typedef std::set<T, std::less<T>, WD2_ALLOCATOR<T> > ValueSet;
 118+#endif
 119+
92120 _DiffEngine() : done(false) {}
93121 void clear();
94 - void diff (const std::vector<T> & from_lines,
95 - const std::vector<T> & to_lines, Diff<T> & diff);
 122+ void diff (const ValueVector & from_lines,
 123+ const ValueVector & to_lines, Diff<T> & diff);
96124 int _lcs_pos (int ypos);
97125 void _compareseq (int xoff, int xlim, int yoff, int ylim);
98 - void _shift_boundaries (const std::vector<T> & lines, std::vector<bool> & changed,
99 - const std::vector<bool> & other_changed);
 126+ void _shift_boundaries (const ValueVector & lines, BoolVector & changed,
 127+ const BoolVector & other_changed);
100128 protected:
101129 int _diag (int xoff, int xlim, int yoff, int ylim, int nchunks,
102 - std::vector<std::pair<int, int> > & seps);
 130+ IntPairVector & seps);
103131
104 - std::vector<bool> xchanged, ychanged;
105 - std::vector<const T*> xv, yv;
106 - std::vector<int> xind, yind;
107 - std::map<int, int> seq;
108 - std::set<int> in_seq;
 132+ BoolVector xchanged, ychanged;
 133+ PointerVector xv, yv;
 134+ IntVector xind, yind;
 135+ IntVector seq;
 136+ IntSet in_seq;
109137 int lcs;
110138 bool done;
111139 enum {MAX_CHUNKS=8};
@@ -128,8 +156,8 @@
129157 }
130158
131159 template<typename T>
132 -void _DiffEngine<T>::diff (const std::vector<T> & from_lines,
133 - const std::vector<T> & to_lines, Diff<T> & diff)
 160+void _DiffEngine<T>::diff (const ValueVector & from_lines,
 161+ const ValueVector & to_lines, Diff<T> & diff)
134162 {
135163 int n_from = (int)from_lines.size();
136164 int n_to = (int)to_lines.size();
@@ -140,6 +168,7 @@
141169 }
142170 xchanged.resize(n_from);
143171 ychanged.resize(n_to);
 172+ seq.resize(std::max(n_from, n_to) + 1);
144173
145174 // Skip leading common lines.
146175 int skip, endskip;
@@ -157,11 +186,7 @@
158187 }
159188
160189 // Ignore lines which do not exist in both files.
161 -#ifdef USE_JUDY
162 - JudySet xhash, yhash;
163 -#else
164 - std::set<T> xhash, yhash;
165 -#endif
 190+ ValueSet xhash, yhash;
166191 for (xi = skip; xi < n_from - endskip; xi++) {
167192 xhash.insert(from_lines[xi]);
168193 }
@@ -196,9 +221,9 @@
197222 assert(xi < n_from || ychanged[yi]);
198223
199224 // Skip matching "snake".
200 - std::vector<const T*> del;
201 - std::vector<const T*> add;
202 - std::vector<const T*> empty;
 225+ PointerVector del;
 226+ PointerVector add;
 227+ PointerVector empty;
203228 while (xi < n_from && yi < n_to && !xchanged[xi] && !ychanged[yi]) {
204229 del.push_back(&from_lines[xi]);
205230 add.push_back(&to_lines[yi]);
@@ -247,19 +272,13 @@
248273 */
249274 template <typename T>
250275 int _DiffEngine<T>::_diag (int xoff, int xlim, int yoff, int ylim, int nchunks,
251 - std::vector<std::pair<int, int> > & seps)
 276+ IntPairVector & seps)
252277 {
253 - using std::vector;
254278 using std::swap;
255279 using std::make_pair;
256 - using std::map;
257280 using std::copy;
258281 bool flip = false;
259 -#ifdef USE_JUDY
260 - JudyHS<vector<int> > ymatches;
261 -#else
262 - map<T, vector<int> > ymatches;
263 -#endif
 282+ MatchesMap ymatches;
264283
265284 if (xlim - xoff > ylim - yoff) {
266285 // Things seems faster (I'm not sure I understand why)
@@ -282,7 +301,7 @@
283302 in_seq.clear();
284303
285304 // 2-d array, line major, chunk minor
286 - vector<int> ymids(nlines * nchunks);
 305+ IntVector ymids(nlines * nchunks);
287306
288307 int numer = xlim - xoff + nchunks - 1;
289308 int x = xoff, x1, y1;
@@ -295,16 +314,16 @@
296315 for ( ; x < x1; x++) {
297316 const T & line = flip ? *yv[x] : *xv[x];
298317 #ifdef USE_JUDY
299 - vector<int> * pMatches = ymatches.Get(line);
 318+ IntVector * pMatches = ymatches.Get(line);
300319 if (!pMatches)
301320 continue;
302321 #else
303 - typename map<T, vector<int> >::iterator iter = ymatches.find(line);
 322+ typename MatchesMap::iterator iter = ymatches.find(line);
304323 if (iter == ymatches.end())
305324 continue;
306 - vector<int> * pMatches = &(iter->second);
 325+ IntVector * pMatches = &(iter->second);
307326 #endif
308 - vector<int>::iterator y;
 327+ IntVector::iterator y;
309328 int k = 0;
310329
311330 for (y = pMatches->begin(); y != pMatches->end(); ++y) {
@@ -339,7 +358,7 @@
340359 seps.resize(nchunks + 1);
341360
342361 seps[0] = flip ? make_pair(yoff, xoff) : make_pair(xoff, yoff);
343 - vector<int>::iterator ymid = ymids.begin() + lcs * nchunks;
 362+ IntVector::iterator ymid = ymids.begin() + lcs * nchunks;
344363 for (int n = 0; n < nchunks - 1; n++) {
345364 x1 = xoff + (numer + (xlim - xoff) * n) / nchunks;
346365 y1 = ymid[n] + 1;
@@ -388,10 +407,9 @@
389408 */
390409 template <typename T>
391410 void _DiffEngine<T>::_compareseq (int xoff, int xlim, int yoff, int ylim) {
392 - using std::vector;
393411 using std::pair;
394412
395 - vector<pair<int, int> > seps;
 413+ IntPairVector seps;
396414 int lcs;
397415
398416 // Slide down the bottom initial diagonal.
@@ -425,7 +443,7 @@
426444 xchanged[xind[xoff++]] = true;
427445 } else {
428446 // Use the partitions to split this problem into subproblems.
429 - vector<pair<int, int> >::iterator pt1, pt2;
 447+ IntPairVector::iterator pt1, pt2;
430448 pt1 = pt2 = seps.begin();
431449 while (++pt2 != seps.end()) {
432450 _compareseq (pt1->first, pt2->first, pt1->second, pt2->second);
@@ -447,8 +465,8 @@
448466 * This is extracted verbatim from analyze.c (GNU diffutils-2.7).
449467 */
450468 template <typename T>
451 -void _DiffEngine<T>::_shift_boundaries (const std::vector<T> & lines, std::vector<bool> & changed,
452 - const std::vector<bool> & other_changed)
 469+void _DiffEngine<T>::_shift_boundaries (const ValueVector & lines, BoolVector & changed,
 470+ const BoolVector & other_changed)
453471 {
454472 int i = 0;
455473 int j = 0;
@@ -553,7 +571,7 @@
554572 //-----------------------------------------------------------------------------
555573
556574 template<typename T>
557 -Diff<T>::Diff(const std::vector<T> & from_lines, const std::vector<T> & to_lines)
 575+Diff<T>::Diff(const ValueVector & from_lines, const ValueVector & to_lines)
558576 {
559577 _DiffEngine<T> engine;
560578 engine.diff(from_lines, to_lines, *this);
Index: trunk/extensions/wikidiff2/config.m4
@@ -0,0 +1,40 @@
 2+dnl $Id$
 3+
 4+dnl if test -z "$CXX"; then
 5+dnl AC_MSG_ERROR([PHP is bugged. Set \$CXX to a C++ compiler.])
 6+dnl fi
 7+
 8+PHP_ARG_ENABLE(wikidiff2, whether to enable wikidiff2 support,
 9+[ --enable-wikidiff2 Enable wikidiff2 support])
 10+
 11+if test "$PHP_WIKIDIFF2" != "no"; then
 12+ PHP_REQUIRE_CXX
 13+ AC_LANG_CPLUSPLUS
 14+ PHP_ADD_LIBRARY(stdc++,,WIKIDIFF2_SHARED_LIBADD)
 15+
 16+ if test -z "$PKG_CONFIG"
 17+ then
 18+ AC_PATH_PROG(PKG_CONFIG, pkg-config, no)
 19+ fi
 20+ if test "$PKG_CONFIG" = "no"
 21+ then
 22+ AC_MSG_ERROR([required utility 'pkg-config' not found])
 23+ fi
 24+
 25+ if ! $PKG_CONFIG --exists libthai
 26+ then
 27+ AC_MSG_ERROR(['libthai' not known to pkg-config])
 28+ fi
 29+
 30+ PHP_EVAL_INCLINE(`$PKG_CONFIG --cflags-only-I libthai`)
 31+ PHP_EVAL_LIBLINE(`$PKG_CONFIG --libs libthai`, WIKIDIFF2_SHARED_LIBADD)
 32+
 33+ export OLD_CPPFLAGS="$CPPFLAGS"
 34+ export CPPFLAGS="$CPPFLAGS $INCLUDES -DHAVE_WIKIDIFF2"
 35+ AC_CHECK_HEADER([thai/thailib.h], [], AC_MSG_ERROR(['thai/thailib.h' header not found]))
 36+ export CPPFLAGS="$OLD_CPPFLAGS"
 37+
 38+ PHP_SUBST(WIKIDIFF2_SHARED_LIBADD)
 39+ AC_DEFINE(HAVE_WIKIDIFF2, 1, [ ])
 40+ PHP_NEW_EXTENSION(wikidiff2, php_wikidiff2.cpp wikidiff2.cpp, $ext_shared)
 41+fi
Index: trunk/extensions/wikidiff2/wikidiff2.h
@@ -3,37 +3,73 @@
44
55 #define MAX_DIFF_LINE 10000
66
 7+/** Set WD2_USE_STD_ALLOCATOR to compile for standalone (non-PHP) operation */
 8+#ifdef WD2_USE_STD_ALLOCATOR
 9+#define WD2_ALLOCATOR std::allocator
 10+#else
 11+#define WD2_ALLOCATOR PhpAllocator
 12+#include "php_cpp_allocator.h"
 13+#endif
 14+
715 #include "DiffEngine.h"
816 #include "Word.h"
917 #include <string>
 18+#include <vector>
 19+#include <set>
1020
 21+class Wikidiff2 {
 22+ public:
 23+ typedef std::basic_string<char, std::char_traits<char>, WD2_ALLOCATOR<char> > String;
 24+ typedef std::vector<String, WD2_ALLOCATOR<String> > StringVector;
 25+ typedef std::vector<Word, WD2_ALLOCATOR<Word> > WordVector;
 26+ typedef std::vector<int, WD2_ALLOCATOR<int> > IntVector;
 27+ typedef std::set<int, std::less<int>, WD2_ALLOCATOR<int> > IntSet;
1128
12 -// operations for the diff, as returned by do_diff
13 -template<class T>
14 -struct diff_op
15 -{
16 - unsigned char op;
17 - const T *from, *to;
18 - unsigned from_ind, to_ind;
 29+ typedef Diff<String> StringDiff;
 30+ typedef Diff<Word> WordDiff;
1931
20 - diff_op<T> () {}
 32+ const String & execute(const String & text1, const String & text2, int numContextLines);
 33+
 34+ inline const String & getResult() const;
 35+
 36+ protected:
 37+ String result;
 38+
 39+ void diffLines(const StringVector & lines1, const StringVector & lines2,
 40+ int numContextLines);
 41+ void printAdd(const String & line);
 42+ void printDelete(const String & line);
 43+ void printWordDiff(const String & text1, const String & text2);
 44+ void printWordDiffSide(WordDiff &worddiff, bool added);
 45+ void printTextWithDiv(const String & input);
 46+ void printText(const String & input);
 47+ inline bool isChineseJapanese(int ch);
 48+ inline bool isSpace(int ch);
 49+ void debugPrintWordDiff(WordDiff & worddiff);
 50+
 51+ int nextUtf8Char(String::const_iterator & p, String::const_iterator & charStart,
 52+ String::const_iterator end);
 53+
 54+ void explodeWords(const String & text, WordVector &tokens);
 55+ void explodeLines(const String & text, StringVector &lines);
2156 };
2257
23 -template<class T>
24 -std::vector<diff_op<T> > do_diff(const std::vector<T> &text1, const std::vector<T> &text2);
 58+bool Wikidiff2::isChineseJapanese(int ch)
 59+{
 60+ if (ch >= 0x3000 && ch <= 0x9fff) return true;
 61+ if (ch >= 0x20000 && ch <= 0x2a000) return true;
 62+ return false;
 63+}
2564
26 -void print_diff(std::vector<std::string> &text1, std::vector<std::string> &text2, int num_lines_context, std::string &ret);
27 -void print_worddiff(const std::string & text1, const std::string & text2, std::string &ret);
28 -void print_worddiff_side(Diff<Word> &worddiff, bool added, std::string &ret);
29 -void split_tokens(const std::string & text, std::vector<Word> &tokens);
30 -void print_add(const std::string & line, std::string & ret);
31 -void print_del(const std::string & line, std::string & ret);
32 -void print_div_htmlspecialchars(const std::string & input, std::string & ret);
33 -void print_htmlspecialchars(const std::string & input, std::string & ret);
34 -void debug_print_worddiff(Diff<Word> &worddiff, std::string &ret);
35 -const char *wikidiff2_do_diff(const char *text1, const char *text2, int num_lines_context);
36 -int next_utf8_char(std::string::const_iterator & p, std::string::const_iterator & charStart,
37 - std::string::const_iterator end);
 65+bool Wikidiff2::isSpace(int ch)
 66+{
 67+ return ch == ' ' || ch == '\t';
 68+}
3869
 70+const Wikidiff2::String & Wikidiff2::getResult() const
 71+{
 72+ return result;
 73+}
 74+
3975 #endif
4076
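
Reviewer's sketch (not part of the revision): the WD2_ALLOCATOR typedefs above rely on the standard containers accepting an allocator template argument. The block below is a minimal sketch of that pattern using a hypothetical MallocAllocator that forwards to malloc/free. It stands in for PhpAllocator, whose definition (php_cpp_allocator.h) is not shown in this section. SKETCH_ALLOCATOR is likewise a hypothetical name, and the allocator is written in the minimal C++11 form, which assumes a compiler with std::allocator_traits.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-in for a PHP-backed allocator: forwards to malloc/free.
template <typename T>
struct MallocAllocator {
    typedef T value_type;

    MallocAllocator() {}
    // Converting constructor, needed when containers rebind the allocator.
    template <typename U>
    MallocAllocator(const MallocAllocator<U> &) {}

    T *allocate(std::size_t n)
    {
        // Request at least one byte so a null return always means failure.
        void *p = std::malloc(n ? n * sizeof(T) : 1);
        if (!p) {
            throw std::bad_alloc();
        }
        return static_cast<T *>(p);
    }

    void deallocate(T *p, std::size_t)
    {
        std::free(p);
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const MallocAllocator<T> &, const MallocAllocator<U> &)
{
    return true;
}

template <typename T, typename U>
bool operator!=(const MallocAllocator<T> &, const MallocAllocator<U> &)
{
    return false;
}

// Same shape as the typedefs in the revision, with the sketch allocator.
#define SKETCH_ALLOCATOR MallocAllocator

typedef std::basic_string<char, std::char_traits<char>, SKETCH_ALLOCATOR<char> > String;
typedef std::vector<String, SKETCH_ALLOCATOR<String> > StringVector;
```

Routing the choice through one macro, as the revision does, lets a single compile-time switch select the allocator for every container typedef at once.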
Index: trunk/extensions/wikidiff2/php_wikidiff2.cpp
@@ -0,0 +1,103 @@
 2+/* $Id$ */
 3+
 4+#ifdef HAVE_CONFIG_H
 5+#include "config.h"
 6+#endif
 7+
 8+#include "php.h"
 9+#include "php_ini.h"
 10+#include "ext/standard/info.h"
 11+#include "php_wikidiff2.h"
 12+#include "wikidiff2.h"
 13+
 14+static int le_wikidiff2;
 15+
 16+const zend_function_entry wikidiff2_functions[] = {
 17+ PHP_FE(wikidiff2_do_diff, NULL)
 18+ {NULL, NULL, NULL}
 19+};
 20+
 21+
 22+zend_module_entry wikidiff2_module_entry = {
 23+#if ZEND_MODULE_API_NO >= 20010901
 24+ STANDARD_MODULE_HEADER,
 25+#endif
 26+ "wikidiff2",
 27+ wikidiff2_functions,
 28+ PHP_MINIT(wikidiff2),
 29+ PHP_MSHUTDOWN(wikidiff2),
 30+ PHP_RINIT(wikidiff2),
 31+ PHP_RSHUTDOWN(wikidiff2),
 32+ PHP_MINFO(wikidiff2),
 33+#if ZEND_MODULE_API_NO >= 20010901
 34+ "0.1",
 35+#endif
 36+ STANDARD_MODULE_PROPERTIES
 37+};
 38+
 39+
 40+#ifdef COMPILE_DL_WIKIDIFF2
 41+ZEND_GET_MODULE(wikidiff2)
 42+#endif
 43+
 44+PHP_MINIT_FUNCTION(wikidiff2)
 45+{
 46+ return SUCCESS;
 47+}
 48+
 49+PHP_MSHUTDOWN_FUNCTION(wikidiff2)
 50+{
 51+ return SUCCESS;
 52+}
 53+
 54+PHP_RINIT_FUNCTION(wikidiff2)
 55+{
 56+ return SUCCESS;
 57+}
 58+
 59+PHP_RSHUTDOWN_FUNCTION(wikidiff2)
 60+{
 61+ return SUCCESS;
 62+}
 63+
 64+PHP_MINFO_FUNCTION(wikidiff2)
 65+{
 66+ php_info_print_table_start();
 67+ php_info_print_table_header(2, "wikidiff2 support", "enabled");
 68+ php_info_print_table_end();
 69+
 70+}
 71+
 72+/* {{{ proto string wikidiff2_do_diff(string text1, string text2, int numContextLines)
 73+ */
 74+PHP_FUNCTION(wikidiff2_do_diff)
 75+{
 76+ char *text1 = NULL;
 77+ char *text2 = NULL;
 78+ int argc = ZEND_NUM_ARGS();
 79+ int text1_len;
 80+ int text2_len;
 81+ long numContextLines;
 82+
 83+ if (zend_parse_parameters(argc TSRMLS_CC, "ssl", &text1, &text1_len, &text2,
 84+ &text2_len, &numContextLines) == FAILURE)
 85+ {
 86+ return;
 87+ }
 88+
 89+
 90+ try {
 91+ Wikidiff2 wikidiff2;
 92+ Wikidiff2::String text1String(text1, text1_len);
 93+ Wikidiff2::String text2String(text2, text2_len);
 94+ const Wikidiff2::String & ret = wikidiff2.execute(text1String, text2String, numContextLines);
 95+ RETURN_STRINGL(ret.data(), ret.size(), 1);
 96+ } catch (std::bad_alloc &e) {
 97+ zend_error(E_WARNING, "Out of memory in wikidiff2_do_diff().");
 98+ } catch (...) {
 99+ zend_error(E_WARNING, "Unknown exception in wikidiff2_do_diff().");
 100+ }
 101+}
 102+/* }}} */
 103+
 104+
Property changes on: trunk/extensions/wikidiff2/php_wikidiff2.cpp
___________________________________________________________________
Name: svn:eol-style
1105 + native
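
Note on the exception handling in wikidiff2_do_diff() above: the function wraps the whole diff call in try/catch so that no C++ exception escapes into the PHP engine, which is C code and cannot unwind one. The same pattern is sketched below in isolation, as a minimal illustration rather than the extension's actual code. DiffStatus and runGuarded are hypothetical names, and a status code stands in for the zend_error() calls so that the sketch compiles without the PHP headers.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical status codes. The real extension reports failures through
// zend_error() instead of returning a code.
enum class DiffStatus { Ok, OutOfMemory, UnknownError };

// Runs `work` and converts any C++ exception into a status code, so that
// nothing propagates past this function into a C caller.
// `work` is assumed to be a callable returning std::string.
template <typename Work>
DiffStatus runGuarded(Work work, std::string &out)
{
    try {
        out = work();
        return DiffStatus::Ok;
    } catch (const std::bad_alloc &) {
        // Allocation failure, e.g. a memory limit enforced by the allocator.
        return DiffStatus::OutOfMemory;
    } catch (...) {
        // Anything else is reported generically rather than rethrown.
        return DiffStatus::UnknownError;
    }
}
```

Catching std::bad_alloc separately matters here because the commit switches the containers to PHP's allocation functions, so hitting the memory limit is an expected failure mode rather than an exotic one.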
Index: trunk/extensions/wikidiff2/php_wikidiff2.h
@@ -0,0 +1,65 @@
 2+/*
 3+ +----------------------------------------------------------------------+
 4+ | PHP Version 5 |
 5+ +----------------------------------------------------------------------+
 6+ | Copyright (c) 1997-2008 The PHP Group |
 7+ +----------------------------------------------------------------------+
 8+ | This source file is subject to version 3.01 of the PHP license, |
 9+ | that is bundled with this package in the file LICENSE, and is |
 10+ | available through the world-wide-web at the following url: |
 11+ | http://www.php.net/license/3_01.txt |
 12+ | If you did not receive a copy of the PHP license and are unable to |
 13+ | obtain it through the world-wide-web, please send a note to |
 14+ | license@php.net so we can mail you a copy immediately. |
 15+ +----------------------------------------------------------------------+
 16+ | Author: |
 17+ +----------------------------------------------------------------------+
 18+*/
 19+
 20+/* $Id: header,v 1.16.2.1.2.1.2.1 2008/02/07 19:39:50 iliaa Exp $ */
 21+
 22+#ifndef PHP_WIKIDIFF2_H
 23+#define PHP_WIKIDIFF2_H
 24+
 25+extern zend_module_entry wikidiff2_module_entry;
 26+#define phpext_wikidiff2_ptr &wikidiff2_module_entry
 27+
 28+#ifdef PHP_WIN32
 29+# define PHP_WIKIDIFF2_API __declspec(dllexport)
 30+#elif defined(__GNUC__) && __GNUC__ >= 4
 31+# define PHP_WIKIDIFF2_API __attribute__ ((visibility("default")))
 32+#else
 33+# define PHP_WIKIDIFF2_API
 34+#endif
 35+
 36+#ifdef ZTS
 37+#include "TSRM.h"
 38+#endif
 39+
 40+PHP_MINIT_FUNCTION(wikidiff2);
 41+PHP_MSHUTDOWN_FUNCTION(wikidiff2);
 42+PHP_RINIT_FUNCTION(wikidiff2);
 43+PHP_RSHUTDOWN_FUNCTION(wikidiff2);
 44+PHP_MINFO_FUNCTION(wikidiff2);
 45+
 46+PHP_FUNCTION(wikidiff2_do_diff);
 47+
 48+
 49+
 50+#ifdef ZTS
 51+#define WIKIDIFF2_G(v) TSRMG(wikidiff2_globals_id, zend_wikidiff2_globals *, v)
 52+#else
 53+#define WIKIDIFF2_G(v) (wikidiff2_globals.v)
 54+#endif
 55+
 56+#endif
 57+
 58+
 59+/*
 60+ * Local variables:
 61+ * tab-width: 4
 62+ * c-basic-offset: 4
 63+ * End:
 64+ * vim600: noet sw=4 ts=4 fdm=marker
 65+ * vim<600: noet sw=4 ts=4
 66+ */
Property changes on: trunk/extensions/wikidiff2/php_wikidiff2.h
___________________________________________________________________
Name: svn:eol-style
167 + native
Index: trunk/extensions/wikidiff2/CREDITS
@@ -0,0 +1 @@
 2+wikidiff2
\ No newline at end of file
Index: trunk/extensions/wikidiff2/README
@@ -13,10 +13,18 @@
1414
1515 These files are 2.3MB each, and give a worst-case performance test. Performance in the worst case is sensitive to the performance of the associative array class used to cross-reference the strings. I tried using an STL map and a Judy array. The Judy array gave an 11% improvement in execution time over the map, which could probably be increased to 15% with further optimisation work. I don't consider that to be a sufficient improvement to warrant adding a library dependency, but the code has been left in for the benefit of Judy fans and performance perfectionists. It can be enabled by compiling with -DUSE_JUDY. The C++ wrapper for JudyHS might be of use to someone.
1616
17 -Wikidiff2 can be compiled as either a PHP extension (with the help of swig), or as a standalone executable. The standalone is mainly intended for testing and development purposes.
 17+Wikidiff2 is a PHP extension. To compile and install it:
1818
19 -Tim Starling
20 -February 2006
 19+$ phpize
 20+$ CXXFLAGS=-Wno-write-strings ./configure
 21+$ make
 22+$ sudo make install
2123
 24+== Changelog ==
2225
 26+2010-06-14
 27+* Converted the extension from Swig to a native PHP extension.
 28+* Added Thai word break support.
 29+* Added PHP allocator support.
 30+
2331 vim: wrap
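
Note on the README's performance paragraph: it says the worst case is dominated by the associative array used to cross-reference the strings. The sketch below shows what such a cross-reference might look like with an STL map. It assumes that "cross-reference" means mapping each distinct line to the positions where it occurs in the other text. LineIndex, buildLineIndex and crossReference are hypothetical names and are not taken from DiffEngine.h.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical index type: each distinct line maps to the positions
// at which it occurs.
typedef std::map<std::string, std::vector<std::size_t> > LineIndex;

// Builds the index for one text, one map lookup per line.
LineIndex buildLineIndex(const std::vector<std::string> &lines)
{
    LineIndex index;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        index[lines[i]].push_back(i);
    }
    return index;
}

// For each line of `from`, returns the positions in `to` that hold an
// identical line (empty if the line does not occur in `to`).
std::vector<std::vector<std::size_t> > crossReference(
    const std::vector<std::string> &from,
    const std::vector<std::string> &to)
{
    const LineIndex index = buildLineIndex(to);
    std::vector<std::vector<std::size_t> > matches(from.size());
    for (std::size_t i = 0; i < from.size(); ++i) {
        LineIndex::const_iterator it = index.find(from[i]);
        if (it != index.end()) {
            matches[i] = it->second;
        }
    }
    return matches;
}
```

Every line costs one map operation keyed on the full string, which is why swapping the container (for a Judy array, as the README describes) changes the total running time measurably on large inputs.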

Follow-up revisions

Revision | Commit summary | Author | Date
r107135 | * Fix for bug 33331, r67994: be much more aggressive in splitting punctuation... | tstarling | 05:56, 23 December 2011

Past revisions this follows-up on

Revision | Commit summary | Author | Date
r22205 | Update for MW 1.11 diff formatting.... | brion | 18:34, 16 May 2007
r22229 | Update with class for line number cells per update in trunk. Fixes RTL alignm... | brion | 18:37, 17 May 2007

Status & tagging log