Francois PIETTE | 1 Mar 15:08 2009

Re: Extension:FileIndexer has issue with accented characters

Hi there!

I'm still looking for a solution to the problem explained below.
Any advice is very welcome.

--
francois.piette <at> overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be

----- Original Message ----- 
From: "Francois Piette" <francois.piette <at> overbyte.be>
To: <mediawiki-l <at> lists.wikimedia.org>
Sent: Tuesday, February 24, 2009 2:25 PM
Subject: [Mediawiki-l] Extension:FileIndexer has issue with accented characters

Hi!

I have installed the new variant of Extension:FileIndexer
 (http://www.mediawiki.org/wiki/Extension_talk:FileIndexer#New_Variant) by
Ramon Dohle (raZe) on my MediaWiki 1.12 wiki, and it works well for English
text. When I upload a PDF file containing French accented characters such as
e-acute ("é"), those are indexed incorrectly, and the wrong characters show on
the file upload page.

I've looked inside the wiki database (table wikiprefix_searchindex, column
si_text) and found that an e-acute is stored as the string "u8c3a9" for
any standard page, while it is stored as "u8efbfbd" for the uploaded PDF
entry. In fact, every accented character is stored as "u8efbfbd"! Of
[...]
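
The two stored values quoted above can be reproduced with a short Python sketch. It is only an illustration, not the extension's code: it assumes the index token is "u8" followed by the hex of the character's UTF-8 bytes (which is what the quoted strings look like), and it assumes, as one possible cause, that the text extracted from the PDF arrives in Latin-1 and is then decoded as UTF-8 with invalid bytes replaced. The helper name index_token is hypothetical.

```python
def index_token(ch: str) -> str:
    """Hypothetical helper: 'u8' plus the hex of the character's UTF-8 bytes."""
    return "u8" + ch.encode("utf-8").hex()


# e-acute handled as proper UTF-8, as on a standard wiki page.
good = index_token("\u00e9")

# Assumed failure mode: the same character arrives as a single Latin-1 byte
# and is decoded as UTF-8, with the invalid byte replaced by U+FFFD.
latin1_bytes = "\u00e9".encode("latin-1")                  # b'\xe9'
mangled = latin1_bytes.decode("utf-8", errors="replace")   # '\ufffd'
bad = index_token(mangled)

print(good)  # u8c3a9
print(bad)   # u8efbfbd
```

Under these assumptions, "efbfbd" is the UTF-8 encoding of U+FFFD, the Unicode replacement character. That would explain why every accented character ends up with the same value: the original byte is already lost before indexing.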

