r48139 MediaWiki - Code Review archive

Repository:MediaWiki
Revision:r48139
Date:18:40, 7 March 2009
Author:rainman
Status:deferred
Tags:
Comment:
Fix the bug reported on Village Pump with some articles missing in search index.

Not sure what *exactly* happens when a page is moved over a redirect, but there seems to
be a missing OAI entry for the update/deletion of the redirect over which the page is moved.
As a consequence, the Lucene "links" index gets out of date, thinks the new page is a redirect,
and thus doesn't index it. It kind of sucks to have two sets of keys there, but both are needed
since we might not always have ns:title (e.g. on page deletions) and we usually want to
retrieve information for ns:title (e.g. when doing link analysis) without worrying about page_id.

Fixed by explicitly cleaning up all "links" entries for both ns:title and page_id.
The next time someone edits one of the missing pages, it will be re-added to the search index.
Modified paths:
  • /branches/lucene-search-2.1/src/org/wikimedia/lsearch/ranks/Links.java (modified)

Diff

Index: branches/lucene-search-2.1/src/org/wikimedia/lsearch/ranks/Links.java
@@ -255,14 +255,19 @@
 for(IndexUpdateRecord rec : records){
     if(rec.doDelete()){
         Article a = rec.getArticle();
+        String articleKey = null;
         if(a.getTitle()==null || a.getTitle().equals("")){
             // try to fetch ns:title so we can have nicer debug info
             String key = getKeyFromPageId(rec.getIndexKey());
             if(key != null)
                 a.setNsTitleKey(key);
-        }
+        } else
+            articleKey = a.getNsTitleKey();
         log.debug(iid+": Deleting "+a);
         reader.deleteDocuments(new Term("article_pageid",rec.getIndexKey()));
+
+        if( articleKey != null ) // if not a deletion be sure to cleanup funky stuff when moving over redirects, etc..
+            reader.deleteDocuments(new Term("article_key", articleKey));
     }
 }
 flush();
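
The failure mode described in the commit comment can be replayed with a toy in-memory index. The following is only a minimal sketch in plain Java, not the Lucene-backed Links class: LinksCleanupSketch, Entry, the page ids 10 and 20, and the titles 0:Foo and 0:Bar are all hypothetical, and it assumes that a lookup by ns:title returns the first entry stored under that key.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the "links" index; hypothetical, not the real Links class. */
class LinksCleanupSketch {
    /** One stored entry, addressable by page_id or by ns:title key. */
    static final class Entry {
        final String pageId;
        final String nsTitleKey;
        final boolean redirect;

        Entry(String pageId, String nsTitleKey, boolean redirect) {
            this.pageId = pageId;
            this.nsTitleKey = nsTitleKey;
            this.redirect = redirect;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String pageId, String nsTitleKey, boolean redirect) {
        entries.add(new Entry(pageId, nsTitleKey, redirect));
    }

    /** Deletes by page_id, and additionally by ns:title when a key is known. */
    void delete(String pageId, String nsTitleKey) {
        entries.removeIf(e -> e.pageId.equals(pageId));
        if (nsTitleKey != null) {
            entries.removeIf(e -> e.nsTitleKey.equals(nsTitleKey));
        }
    }

    /** Returns the first entry stored under the key (assumed lookup behaviour), or null. */
    Entry lookup(String nsTitleKey) {
        for (Entry e : entries) {
            if (e.nsTitleKey.equals(nsTitleKey)) {
                return e;
            }
        }
        return null;
    }

    /** Replays a move of page 20 over redirect 10 whose deletion was never reported. */
    static boolean targetLooksLikeRedirect(boolean cleanUpByKey) {
        LinksCleanupSketch index = new LinksCleanupSketch();
        index.add("10", "0:Foo", true);   // redirect that gets moved over
        index.add("20", "0:Bar", false);  // page about to be moved
        // Update record for the moved page: delete old entries, then re-add under the new title.
        index.delete("20", cleanUpByKey ? "0:Foo" : null);
        index.add("20", "0:Foo", false);
        return index.lookup("0:Foo").redirect;
    }

    public static void main(String[] args) {
        System.out.println("page_id only:       " + targetLooksLikeRedirect(false));
        System.out.println("page_id + ns:title: " + targetLooksLikeRedirect(true));
    }
}
```

In this sketch, deleting by page_id alone leaves the stale redirect entry in place, so the lookup for 0:Foo still reports a redirect; deleting by both keys removes it and the moved page is seen as a regular article.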

Status & tagging log