Klemens Friedl | 12 Jun 2010 21:18

clucene - cl_demo stops with error

clucene - cl_demo stops with error while indexing reuters corpus

clucene version: current git master
platform: WinXP SP3
build system: VS 2008 SP1
cmake: 2.8.1
cmake settings: see cmakecache.txt file (attached to email)

cl_demo app stops with error:
(only one line of code changed, to point to the reuters-21578 corpus directory)

F:\Home\Search\clucene\build\bin\Debug>cl_demo.exe
adding file 1: src\test\data\reuters-21578/all-exchanges-strings.lc.txt
adding file 2: src\test\data\reuters-21578/all-orgs-strings.lc.txt
adding file 3: src\test\data\reuters-21578/all-people-strings.lc.txt
adding file 4: src\test\data\reuters-21578/all-places-strings.lc.txt
adding file 5: src\test\data\reuters-21578/all-topics-strings.lc.txt
adding file 6: src\test\data\reuters-21578/cat-descriptions_120396.txt
adding file 7: src\test\data\reuters-21578/feldman-cia-worldfactbook-data.txt
adding file 8: src\test\data\reuters-21578/LEWIS.DTD
adding file 9: src\test\data\reuters-21578/README.TXT
adding file 10: src\test\data\reuters-21578/reut2-000.sgm
adding file 11: src\test\data\reuters-21578/reut2-001.sgm

=> VS 2008 SP1 debugger:
Unhandled exception at 0x10099e4f (clucene-cored.dll) in cl_demo.exe:
0xC0000005: Access violation writing location 0x01034f74.

file:
clucene\src\core\CLucene\index\DocumentsWriterThreadState.cpp (line 642)

Itamar Syn-Hershko | 12 Jun 2010 21:33

Re: clucene - cl_demo stops with error

I'm running cl_test on a similar environment without any problem (using the
default CMake config). One of the tests there indexes the reuters corpus
too. Can you try running that?

The actual error looks like something we fixed in the atomicthreads branch,
and wasn't merged into master yet due to lack of feedback. Can you try
running demo from that branch (after trying cl_test too)?

Itamar.


Klemens Friedl | 12 Jun 2010 23:27

Re: clucene - cl_demo stops with error

I forgot to mention that I ran the cl_test app earlier today; it
stopped with a failure at test 97.
(Although I may have used slightly different cmake settings.)

I will try out that branch tomorrow, as it's already late here.

Klemens


Klemens Friedl | 13 Jun 2010 10:15

Re: clucene - cl_demo stops with error

I tried cl_test and cl_demo with the atomicthreads branch and
default cmake settings (except for added zlib path vars).
(see attached log files)

cl_test runs through 102 tests, but fails on the first of two UTF8 tests.
cl_demo indexes all files of the reuters corpus, though it deadlocks
right after that :/

Kind regards,
Klemens Friedl

F:\Home\Search\clucene\atomicthreads\build\bin\Debug>cl_test.exe
Key: .= pass N=not implemented F=fail
All CLucene Tests:
    CLucene Atomic Updates Test:     ..              - 6203ms
    CLucene IndexReader Test:        ..              - 766ms
    CLucene Reuters Test:            ...             - 8547ms
    CLucene Analysis Test:           .               - 0ms
    CLucene Analyzers Test:          .........       - 234ms
    CLucene Document Test:           ......          - 4563ms
    CLucene Number Tools Test:       ...             - 422ms
    CLucene Debug Test:              .               - 0ms
    CLucene IndexWriter Test:        ......          - 4281ms
    CLucene IndexModifier Test:      .               - 56047ms
    CLucene High Frequencies Test:   .               - 16ms
    CLucene Priority Queue Test:     .               - 62ms
    CLucene DateTools Test:          ..              - 0ms
    CLucene Query Parser Test:       ............... - 63ms
    CLucene Multi-Field QP Test:     ..              - 0ms
    CLucene Boolean Tests:           ....            - 15ms

Itamar Syn-Hershko | 13 Jun 2010 10:59

Re: clucene - cl_demo stops with error

Can you please test the master branch (cl_test and cl_demo) with default
cmake settings as well?

Also, can you send the stacktrace for this deadlock? If you get this on
master, then for master, otherwise for atomicthreads.

Itamar.


Klemens Friedl | 13 Jun 2010 16:36

Re: clucene - cl_demo stops with error

I built and ran cl_test and cl_demo again on master and the
atomicthreads branch with default cmake settings; see attached files.
I included a stack trace for the cl_demo app in both cases.

Klemens


Itamar Syn-Hershko | 13 Jun 2010 16:40

Re: clucene - cl_demo stops with error

Just to confirm: for both branches, cl_test works fine but cl_demo crashes?


Klemens Friedl | 13 Jun 2010 17:03

Re: clucene - cl_demo stops with error

The cl_test apps run, but one test fails in both, and the two versions
run a different number of tests.

cl_test (master) runs 97 tests; one of the two UTF8 tests fails, see the
"cl_test.txt" file (attached to my last email).
cl_test (atomicthreads) runs 102 tests; one of the two UTF8 tests also
fails, see the "branch_cl_test.txt" file.

cl_demo crashes in both. Yesterday I tried cl_demo with only about half
of the documents in the reuters test directory, and it ran through fine.
I played around a bit, and it seems that cl_demo crashes while indexing
text files of a few kilobytes (files that are a bit larger than the
smallest text files in the directory). The index merging and optimizing
process also takes an unusually long time (in my opinion), given that
the index files combined take up maybe a megabyte of disk space.
Weird.
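
The "try it with half of the documents" experiment above can be automated as a bisection over the file list. What follows is only a minimal sketch, not part of CLucene or cl_demo: the `IndexProbe` callback and `shortestFailingPrefix` are hypothetical names, and the probe is assumed to index exactly the given files (for example in a child process) and report whether that finished cleanly. It also assumes the crash is deterministic and that once a prefix of the list fails, every longer prefix fails too.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: returns true if indexing exactly these files
// completes without crashing (e.g. by running the indexer in a child process).
using IndexProbe = std::function<bool(const std::vector<std::string>&)>;

// Returns the length of the shortest prefix of `files` for which the probe
// fails, or 0 if the full list indexes cleanly.
// Assumes the failure is deterministic and monotone in the prefix length.
std::size_t shortestFailingPrefix(const std::vector<std::string>& files,
                                  const IndexProbe& indexesCleanly) {
    if (indexesCleanly(files)) {
        return 0;  // nothing to bisect
    }
    std::size_t lo = 0;             // a prefix of length lo is assumed to pass
    std::size_t hi = files.size();  // a prefix of length hi is known to fail
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const std::vector<std::string> prefix(
            files.begin(), files.begin() + static_cast<std::ptrdiff_t>(mid));
        if (indexesCleanly(prefix)) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return hi;
}
```

If the result is n, then `files[n - 1]` is the first file whose addition triggers the failure, which narrows the search to one document plus whatever state the earlier ones left behind. If the crash turns out not to be reproducible from run to run, the monotonicity assumption does not hold and the result is not meaningful.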


Itamar Syn-Hershko | 13 Jun 2010 17:14

Re: clucene - cl_demo stops with error

Master comments out a few tests that fail on several platforms. These
tests pass for atomicthreads on every platform we could check, so they
were re-enabled there. That accounts for the different number of tests.

The UTF8 tests fail on Windows because the UTF8 test files were tampered
with during the conversion from SVN to git. I fixed this once, but for some
reason it needs fixing again. Will do soon. But this is not a code issue.

The problem seems to be with cl_demo itself. I will have to investigate this
a bit, but it's going to take a while until I get to that. In the meantime,
try doing some of your own work with clucene - it shouldn't fail you. Use
the code from cl_test as guidance if needed.

HTH

Itamar.


Klemens Friedl | 13 Jun 2010 17:20

Re: clucene - cl_demo stops with error

I tried to run cl_demo in both versions with a reduced
reuters corpus; see the difference in the files:
files-reuters_*.txt

The branch version of cl_demo worked fine with fewer files, weird:
branch_cl_demo_short.txt

The master version works only with very few files; with a few more
it shows another error:
cl_demo_short_*.txt

Klemens


Itamar Syn-Hershko | 16 Jun 2010 17:45

Re: clucene - cl_demo stops with error

I had a quick look. Seems like cl_demo is indeed broken (following some
work Ben did on it a year ago), and we are working on it. As I said before,
try using CLucene for your needs as-is, and let us know if you hit any
walls.

The UTF8 test fails because of src\test\data\utf8text\french_utf8.txt. I
can't seem to commit it with -crlf -diff, so an extra LF is added and that
breaks the UTF8 code. This can be a code issue as well, but is less likely.
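
One way to check whether a checkout altered the line endings of a test file is to scan its raw bytes. The following is a minimal sketch under stated assumptions, not code from CLucene or its test suite: `LineEndingStats` and `scanLineEndings` are hypothetical names, and the caller is assumed to have read the file in binary mode so that the Windows runtime does not translate CRLF on the way in.

```cpp
#include <cstddef>
#include <string>

// Summary of the line endings found in a byte buffer.
struct LineEndingStats {
    std::size_t crlf = 0;          // "\r\n" pairs
    std::size_t bareLf = 0;        // "\n" not preceded by "\r"
    bool endsWithNewline = false;  // last byte is "\n"
};

// Scans raw file contents (read in binary mode) and counts line endings.
LineEndingStats scanLineEndings(const std::string& bytes) {
    LineEndingStats stats;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] != '\n') {
            continue;
        }
        if (i > 0 && bytes[i - 1] == '\r') {
            ++stats.crlf;
        } else {
            ++stats.bareLf;
        }
    }
    stats.endsWithNewline = !bytes.empty() && bytes.back() == '\n';
    return stats;
}
```

Comparing these counts between the file in the working tree and a known-good copy would show whether an extra LF or a CRLF conversion crept in. A byte-level scan like this is safe on UTF-8 input because the bytes 0x0A and 0x0D never occur inside a multi-byte UTF-8 sequence.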

Itamar. 


Klemens Friedl | 18 Jun 2010 19:03

Re: clucene - cl_demo stops with error

Thanks for the clarification about cl_demo. I saw that the demo code
was slightly altered yesterday.
However, the demo app still crashes with the same error (today's master
branch; 02ddf6).

For my application, I need to index a bunch of text files. The cl_demo
code seemed to be a good starting point, and it has some similarities
to the JavaLucene example app.

The cl_test app contains various unit tests, but it mainly uses generic
data for indexing, and the code is spread across various files, which
makes it hard to grasp the overall concept. Plus, I still can't be sure
whether clucene is behaving fine and just the demo app has a bug, as the
unit tests may not cover a real-world scenario (they mainly use ramdir,
generic data, etc.).

From the VS debugger:
clucene-cored.dll!lucene::index::DocumentsWriter::balanceRAM()  Line 1326 + 0x21 bytes	C++
...looks more like a clucene issue than a demo app bug.

I see that some work is being done on the demo code; that's nice, and I
am looking forward to working real-world indexing sample code :)

Klemens


Itamar Syn-Hershko | 19 Jun 2010 21:11

Re: clucene - cl_demo stops with error

Hi,

cl_demo is still the best place to start. Also remember every Java Lucene
code snippet will work with CLucene, except for cases where functionality
wasn't ported yet (minus syntax and client memory management tasks).

cl_test can answer many "how do I do this" questions in more advanced
situations (multi-threading, for example).

We are investigating this internally, but this might take long to resolve
since we are focusing on other aspects of the core at the moment and our
resources are slim. It indeed looks more like an internal issue, but only
one that occurs while indexing very large streams. This also seems to
relate to:
http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446

What confuses me is the sporadic nature of this issue. It is failing on my
end on a different file than on yours, for example.

Bottom line, clucene is generally fine - you can verify this by indexing any
other corpora you have. Something that isn't easy to pinpoint is obviously
wrong, and is only visible while indexing the reuters corpus. IMO the best
way to attack this is to port relevant tests from Java Lucene 2.3.2, until
one of them points at a culprit (I'd start with
index/TestDocumentWriter.java). If you can help us with this that would be
great, as I can't see myself doing that in the coming weeks.

Itamar. 
