David Smiley (JIRA | 30 Apr 08:26 2012
Picon

[jira] [Created] (SOLR-3424) PhoneticFilterFactory threadsafety bug

David Smiley created SOLR-3424:
----------------------------------

             Summary: PhoneticFilterFactory threadsafety bug
                 Key: SOLR-3424
                 URL: https://issues.apache.org/jira/browse/SOLR-3424
             Project: Solr
          Issue Type: Bug
          Components: Schema and Analysis
    Affects Versions: 3.6, 4.0
            Reporter: David Smiley
            Assignee: David Smiley
            Priority: Minor
             Fix For: 4.0

PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation. There
is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). 
However, this map, which can be accessed by multiple indexing threads, isn't guarded on any of the reads,
which isn't just the common path but also the error messages which dump the registry into the error message.

I realize the likelihood of a problem is extremely slim, but a bug's a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
David Smiley (JIRA | 30 Apr 08:32 2012
Picon

[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-3424:
-------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

The attached patch fixes the problem by basically "upgrading" the code to use Guava's MapMaker which is
excellent for caches. I keep 2 distinct registries, the constant builtin one, and the custom class one,
because there is a distinction between the two which is the capitalization of the keys.  I couldn't have
just one MapMaker map because the key lookup of the computed value should be the original class name uncapitalized.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
(Continue reading)

David Smiley (JIRA | 30 Apr 08:32 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264688#comment-13264688
] 

David Smiley commented on SOLR-3424:
------------------------------------

And I also noticed that Commons-Codec's Caverphone.class was deprecated for Caverphone2.class so I made
that simple change.  The docs say the deprecated one simply forwards calls, so there should be no
side-effects of this change.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation.
(Continue reading)

Robert Muir (JIRA | 30 Apr 10:15 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264720#comment-13264720
] 

Robert Muir commented on SOLR-3424:
-----------------------------------

where is the test for the bug?

why does this factory have a cache at all?! 

                
> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
(Continue reading)

Uwe Schindler (JIRA | 30 Apr 10:35 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264730#comment-13264730
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

The whole design of this cache is wrong:
- it should not be static, the resolved class lookups should only be cached per instance
- the cache only caches "resolves" not instantiations of encoders, the JVM caches Class.forName()
lookups, so why cache it again?
- To fix the orginal issue, a ConcurrentHashMap would also be fine
- Solr/Lucene use Analyzers which are now enforced to reuse tokenstreams, so the lookup is only done once
when IndexWriter starts and a new indexing thread is created

The last point tells me: "remove this cache"

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
(Continue reading)

Uwe Schindler (JIRA | 30 Apr 11:01 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264741#comment-13264741
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

One thing to add:

bq. the cache only caches "resolves" not instantiations of encoders, the JVM caches Class.forName()
lookups, so why cache it again?

You may argue, that the lookup may be more expensive, as it uses a try..catch block (first try in expected
commons package, later try a full class name). I think the chain of Try...Catch with ClassNoFound should
be changed to do it more smart: If the name of encoder contains a period (indexOf('.')>=0), look it up as
class name, otherwise prefix it with the commons package name. This way, the JVM cache for loaded classes
can be used and the cache is completely useless.

I like in your fix, that it also changed the broken uppercasing: It should only do that for the builtins,
class names itsself are case sensitive.

+1 to remove the cache and only keep the static builtins (and please: as
Collections.unmodifiableMap!!!), lookup non-builtins without try...catch(ClassNotFound). The
error message should only list the builtins and mention that otherwise the name should be a full class name.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
(Continue reading)

David Smiley (JIRA | 30 Apr 17:31 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264976#comment-13264976
] 

David Smiley commented on SOLR-3424:
------------------------------------

Rob: This is a (minor) thread-safety bug and consequently it's not really feasible to test it without a lot
of work and without a guarantee at the end that there is no problem, and so it isn't worth it.  Of course if you
want to; go for it ;-)

Uwe: Thanks very much for your comments.  I wasn't sure what the lifecycle of this class was or if/how it is
shared; I looked at the javadocs of TokenFilterFactory just now and I think its clearer.  Given that the
Factory's init() method is not going to be called frequently, it seems to me that the class name based
lookups need not cache at all.  And I also like your suggestion of improving the fallback lookup by looking
for a '.'.  BTW I considered wrapping with unmodifiableMap but didn't bother because it's not exposed
outside of this class, which is fairly small too, but I will since it seems to be a big deal to you (three
exclamation points).  I'll propose another patch later

~ David

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
(Continue reading)

Uwe Schindler (JIRA | 30 Apr 17:39 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264986#comment-13264986
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Thanks! Thaks for taking care, as I have no time to provide a patch at the moment :(

bq. but I will since it seems to be a big deal to you (three exclamation points)

:-) It's not so important, if it's private to the class! I just don't want public code to expose Collections
not intended to be modified in a modifiable way, so i am fine with or without unmodifiable. There are also
places in Lucene violating this (or use public arrays, which cannot be protected - like CompositeReader.getSequentialSubReaders()).

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
(Continue reading)

David Smiley (JIRA | 1 May 06:20 2012
Picon

[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-3424:
-------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Here is an updated patch. I enhanced the javadocs too, which have become out of date.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
> PhoneticFilterFactory has a static HashMap registry mapping an encoder name to an implementation.
There is a ReentrantLock used when the map is modified (when the encoder config specifies a class name). 
(Continue reading)

Uwe Schindler (JIRA | 1 May 09:29 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265696#comment-13265696
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,

looks good, but there is a bug:
The refactoring of the REGISTRY static map was an anonymous inner class before (new HashMap() {{ hashmap
ctor code }}; (please note double parens). Now it is assigned to a static field, but the initializer code is
an inline ctor of the factory, means the initialization runs on every instantiation.

- add *static* before the anonymous ctor:
{code:java}
private static final Map<String, Class<? extends Encoder>> registry = new HashMap<String, Class<?
extends Encoder>>(6);
static {
 registry.put(...
{code}

- or leave the initializer as it was before (anonymous HashMap subclass with overridden ctor).

in resolve encoder maybe remove the clazz variable and directly return the result of forName(). I do't like
non-final variables initialized with null because that prevents compilation failures on missing
initialization and null could be returned - especially with lots of try-catch blocks.

> PhoneticFilterFactory threadsafety bug
(Continue reading)

Uwe Schindler (JIRA | 1 May 09:45 2012
Picon

[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-3424:
--------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Attached is a patch that fixes the initialization bug and improves the reflection code.

*One thing that came into my mind when I looked at the code of PhoneticFilter:* I did not find out if Encoders
are threadsafe or not. If they aren't or we are not sure (the documentation states nothing), we should
create the Encoder class instance in the TokenStream create() method. TokenStreams itsself are only
used per thread (iterator pattern), and the Analyzer reuse ensures that we have a separate instance for
each thread.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
(Continue reading)

Uwe Schindler (JIRA | 1 May 10:03 2012
Picon

[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-3424:
--------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Updated patch, I added a testcase for the reflection without package name:
- I use "Caverphone2" as package less name, which is not in the REGISTRY. The returned class should be Caverphone2
- I cross-check with the REGISTRY name "Caverphone", which should also return the new Caverphone2
encoder, but without reflection

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
(Continue reading)

David Smiley (JIRA | 1 May 17:29 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265862#comment-13265862
] 

David Smiley commented on SOLR-3424:
------------------------------------

Uwe: Excellent catch RE my "static" bug, and I agree on your improvement RE clazz & null.

I tried to ascertain the thread-safety status of Commons-Codec and unfortunately I don't think we can
assume it's thread-safe.  I sent an email to their list about this just now: http://apache-commons.680414.n4.nabble.com/Thread-safety-of-Commons-Codec-Encoder-tt4600956.html

So I agree that we'll have to insatiate one of these in our create() method instead of caching it.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
(Continue reading)

David Smiley (JIRA | 2 May 06:27 2012
Picon

[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-3424:
-------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

My new patch creates & initializes the encoder per call to create(). The testSpeed() tests seem about the
same. I also added a test for the maxCodeLen arg, and i documented it too.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
(Continue reading)

David Smiley (JIRA | 2 May 17:02 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266617#comment-13266617
] 

David Smiley commented on SOLR-3424:
------------------------------------

I'll commit it in ~24 hours.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
>
(Continue reading)

Uwe Schindler (JIRA | 3 May 08:05 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267229#comment-13267229
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

+1

I wanted to review the stock Lucene Analyzer about thread safety, too.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
(Continue reading)

Uwe Schindler (JIRA | 3 May 08:09 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267233#comment-13267233
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

There seems to be no Analyzer in Lucene that uses this class. The phonetic package only contains the plain
TokenFilters and those are single-thread only.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
>
(Continue reading)

Uwe Schindler (JIRA | 3 May 08:36 2012
Picon

[jira] [Updated] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-3424:
--------------------------------

    Attachment: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch

Hi, I looked again at the code and found the following improvement, attached as new patch:

The reflection based-method lookup should be done in the initialization. The pointer to the method is then
saved in a instance field, so the getEncoder() method only needs to invoke. This removes more than half of
the reflection cost.
The pointer to the method is null, if no maxLength is set.
I also did some minor cleanups in Exception handling.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
(Continue reading)

David Smiley (JIRA | 3 May 09:05 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267264#comment-13267264
] 

David Smiley commented on SOLR-3424:
------------------------------------

So it appears the Commons-Codec Encoders are all thread-safe after all:
http://apache-commons.680414.n4.nabble.com/Thread-safety-of-Commons-Codec-Encoder-tt4600956.html

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch, SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch
(Continue reading)

Uwe Schindler (JIRA | 3 May 09:27 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267272#comment-13267272
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of
TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also
improved the reflective method call (which has the side effect of being able to print a useful message when
the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).

In general, the commons encoders might be thread safe, but we use reflection and use any class the user
provides and other classes implementing the abstract Encoder interface may not be thread safe.

As it costs us nothing (Class.newInstance() is cheap and only called once a new thread is used and the
SolrAnalyzer class has to create new TokenStreamComponents), we should stay with the current code. The
thread-safety is then ensured by the Analyzer class (it uses thread-locals to reuse TokenStreams).

What do you think about my latest patch?

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
(Continue reading)

Uwe Schindler (JIRA | 3 May 09:27 2012
Picon

[jira] [Issue Comment Edited] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267272#comment-13267272
] 

Uwe Schindler edited comment on SOLR-3424 at 5/3/12 7:27 AM:
-------------------------------------------------------------

Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of
TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also
improved the reflective method call (which has the side effect of being able to print a useful message when
the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).

In general, the commons encoders might be thread safe, but we use reflection and use any class the user
provides and other classes implementing the abstract Encoder interface may not be thread safe (unless
this is explicitely specified in their spec / javadocs - which isn't)

As it costs us nothing (Class.newInstance() is cheap and only called once a new thread is used and the
SolrAnalyzer class has to create new TokenStreamComponents), we should stay with the current code. The
thread-safety is then ensured by the Analyzer class (it uses thread-locals to reuse TokenStreams).

What do you think about my latest patch?

      was (Author: thetaphi):
    Hi David,
I don't really trust that thread so I would really go with the current solution. In Lucene 4.0, reuse of
TokenStreams is *enforced* so there is no cost at all by the reflection and the createInstance. I also
improved the reflective method call (which has the side effect of being able to print a useful message when
the encoder does not support setMaxLen, the old code throwed unspecified and unhelpful Exception).
(Continue reading)

David Smiley (JIRA | 4 May 04:46 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268053#comment-13268053
] 

David Smiley commented on SOLR-3424:
------------------------------------

Looks good Uwe.
One minor nit is that the URL in the class Javadoc to Commons-Codec's Language package should be this
URL:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/package-summary.html
  which is the one to 1.6; the existing link is older with fewer classes.  We've got this link in both the
FilterFactory & Filter.

Feel free to commit if you want or just leave it to me.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
(Continue reading)

Uwe Schindler (JIRA | 5 May 23:29 2012
Picon

[jira] [Commented] (SOLR-3424) PhoneticFilterFactory threadsafety bug


    [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269064#comment-13269064
] 

Uwe Schindler commented on SOLR-3424:
-------------------------------------

Hi David,
just commit it, I am about to leave for business trips (including Lucene Rev) so have not much time.
Uwe

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
(Continue reading)

David Smiley (JIRA | 6 May 04:27 2012
Picon

[jira] [Closed] (SOLR-3424) PhoneticFilterFactory threadsafety bug


     [
https://issues.apache.org/jira/browse/SOLR-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley closed SOLR-3424.
------------------------------

    Resolution: Fixed

Committed to trunk in 1334544.

I left the commons-coded javadoc url as-is since it was finally updated to the latest version.

> PhoneticFilterFactory threadsafety bug
> --------------------------------------
>
>                 Key: SOLR-3424
>                 URL: https://issues.apache.org/jira/browse/SOLR-3424
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
SOLR-3424_PhoneticFilterFactory_threadsafety_bug.patch,
(Continue reading)


Gmane