Peter Dietz | 1 Feb 2012 18:29

Re: search can't sort by date issued

Hi All,


I've just started digging into this as well. It's really unfortunate to only get "relevance" results for searches.

While digging in, I printed out the stack trace, and it raises a couple of questions.
1) Do we have "bad" metadata for dc.date.issued? 
-- (I've already harassed my content folks to have them review all our metadata) ;)

2) Are we doing the comparison of dates incorrectly? The error below asks whether the value of "dateissued" is really an INT.
-- I've been reading this thread, which is very similar: http://www.gossamer-threads.com/lists/lucene/java-user/109530


2012-01-31 17:47:02,475 ERROR org.dspace.search.DSQuery @ Unable to use speficied sort option: dateissued
2012-01-31 17:47:02,475 ERROR org.dspace.search.DSQuery @ Invalid shift value in prefixCoded string (is encoded value really an INT?)
2012-01-31 17:47:02,476 ERROR org.dspace.search.DSQuery @ java.lang.NumberFormatException: Invalid shift value in prefixCoded string (is encoded value really an INT?)
at org.apache.lucene.util.NumericUtils.prefixCodedToInt(NumericUtils.java:233)
at org.apache.lucene.search.FieldCache$7.parseInt(FieldCache.java:237)
at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:457)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:447)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:240)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:181)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:113)
at org.apache.lucene.search.Hits.<init>(Hits.java:90)
at org.apache.lucene.search.Searcher.search(Searcher.java:63)
at org.dspace.search.DSQuery.doQuery(DSQuery.java:151)
at org.dspace.search.DSQuery.doQuery(DSQuery.java:309)
at org.dspace.app.xmlui.aspect.artifactbrowser.AbstractSearch.performSearch(AbstractSearch.java:438)


Just for fun, I enabled Discovery on our development machines, and sorting by date issued works perfectly in a search. So a quick fix would be to switch to using Discovery. Nonetheless, I look forward to getting a resolution to this issue.


Peter Dietz



On Wed, Feb 1, 2012 at 7:15 AM, Cristian Romanescu <cristian.romanescu-TkmMeo7H0CqhKNWrAYCRhA@public.gmane.org> wrote:
Greetings,

Have you tried looking into the Lucene indexes with the Luke tool
(http://www.getopt.org/luke/)?
We are using:
     search.index.13 = dc_date:dc.date.issued:date
to filter by time interval and it works.

But first, we had to remove the old indexes and re-create them to get
correct indexing (i.e. rm -rf $builddir/search, then run
./$builddir/bin/dspace index-init). It only worked once the data inside
the index looked like 201201010000 ... when viewed with the Luke tool.

HTH,
Cristian
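
As a hedged illustration of the index format mentioned above: the value 201201010000 looks like a year-month-day-hour-minute key, though that reading is an assumption based only on the digits. A minimal Java sketch (the class and method names are hypothetical, not DSpace or Lucene API) that produces such a key and checks whether a term seen in Luke has the same 12-digit shape:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class IndexDateKey {
    // Assumed layout: 4-digit year, month, day, hour, minute (12 digits in total).
    private static final DateTimeFormatter KEY =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Formats a date-time as a fixed-width, digits-only key.
    public static String toKey(LocalDateTime value) {
        return value.format(KEY);
    }

    // True when a term consists of exactly 12 digits.
    public static boolean looksLikeKey(String term) {
        return term.matches("\\d{12}");
    }

    public static void main(String[] args) {
        System.out.println(toKey(LocalDateTime.of(2012, 1, 1, 0, 0))); // 201201010000
        System.out.println(looksLikeKey("201201010000"));              // true
        System.out.println(looksLikeKey("2012-01"));                   // false
    }
}
```

Fixed-width keys of this kind order the same way whether they are compared as strings or as numbers, which may be why the re-created index behaved.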


On 02/01/2012 12:46 PM, Päivi Rosenström wrote:
> Has any solution for this been found yet?
>
>
> Thanks!
>
> Päivi
>
>
>> Re: [Dspace-tech] search can't sort by date issued
>> From: James Bardin<jbardin <at> bu...>  - 2011-10-27 19:23
>> On Thu, Oct 27, 2011 at 1:52 PM, Blanco, Jose<blancoj <at> ...>  wrote:
>>> # Browse indexes
>>> webui.browse.index.1 = title:item:title
>>> webui.browse.index.2 = author:metadata:dc.contributor.author:text
>>> webui.browse.index.3 = subject:metadata:dc.subject.*:text
>>> webui.browse.index.4 = dateissued:item:dateissued
>>>
>>> # Sorting options
>>> webui.itemlist.sort-option.1 = title:dc.title:title
>>> webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
>>> webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
>>>
>> Yeah, I have dateissued in both the browse.index and sort-option, like above.
>> Sorting by dateissued *does* work in browsing, but not for search
>> results (I think search result ordering is done by lucene, and not the
>> webui). I took a guess and added another search index for
>> dateissued:dc.date.issued:date, but that doesn't seem to have any
>> effect.
>
>> -jim
>
>
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> https://lists.sourceforge.net/lists/listinfo/dspace-tech



Re: search can't sort by date issued

Thanks Peter.  I've spent several hours researching this issue, especially why we have it in one DSpace instance and not another (both running the same version).  Although I'm not 100% sure, I suspect the issue is caused by invalid data in the dc.date.issued field(s) in the repository.  The solution, of course, would be to clean up the bad dates we have and then add edit checks to the date fields that end up in DSpace, so that bad dates cannot get into our repository.  But again, I'm not 100% sure of this, and I won't be able to get back to looking into it for a while.
Best regards,
Sue
 
Sue Walker-Thornton
Software Developer|Database Administrator
NASA Langley Research Center
SGT, Inc.|LITES Contract
130 Research Drive
Hampton, VA  23666
Office: (757) 864-2368|Fax: (757) 224-4001|Mobile: (757) 506-9903
Email:  susan.m.thornton <at> nasa.gov
Peter Dietz | 3 Feb 2012 20:51

Re: search can't sort by date issued

Hi Sue,


I've been diagnosing this issue in Luke (a Java GUI that allows you to browse your Lucene index). While digging around, it looked like the sort_dateissued field is having trouble with certain date metadata.

In our repository, our date metadata values come in several different precisions:
1981-12-07T16:56:12Z
1981-12-07
1981-12
1981

Each one of them is a valid ISO 8601 date. However, that doesn't mean each of them is a valid date in Lucene (your search and browse index). A metadata person might see 1981-12 as meaning some type of range or approximation. However, when you are searching and sorting, Lucene must be able to order the values precisely. So is 1981-12 before or after 1981-12-07? Is 1981-12 before, after, or equal to 1981-12-01?
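
To make the ordering question concrete, here is a minimal sketch, assuming the values are compared as plain strings (which may or may not be what the index does). The class name is hypothetical and no Lucene or DSpace code is involved. Under plain string comparison a shorter value that is a prefix of a longer one sorts first, so 1981-12 lands before both 1981-12-01 and 1981-12-07:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MixedPrecisionSort {
    // Returns a sorted copy using natural (lexicographic) String order.
    public static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "1981-12-07T16:56:12Z", "1981-12", "1981", "1981-12-07", "1981-12-01");
        // prints [1981, 1981-12, 1981-12-01, 1981-12-07, 1981-12-07T16:56:12Z]
        System.out.println(sortAsStrings(dates));
    }
}
```

That order is well defined, but it silently treats "December 1981" as earlier than every specific day in December 1981, which is a policy decision rather than a fact about the dates.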

I'll ask my metadata people if we can flatten our date metadata and pad the values so they always include a day (the first of the month).

I'll also dig further into the DSpace reindexing code to see whether, when we process metadata dates that are valid ISO 8601, we convert them to an appropriate Lucene date.
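
A minimal sketch of the padding idea, assuming the only formats present are the four listed above and that missing parts default to the first month and the first day. The class and method names are hypothetical; this uses java.time (so Java 8 or later) and is not DSpace's indexing code:

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.Year;
import java.time.YearMonth;

public class IssuedDateNormalizer {
    // Pads a partial ISO 8601 value to a full calendar date.
    // Unparseable input throws java.time.format.DateTimeParseException.
    public static LocalDate toFullDate(String value) {
        String v = value.trim();
        switch (v.length()) {
            case 4:  // yyyy -> first day of that year
                return Year.parse(v).atDay(1);
            case 7:  // yyyy-MM -> first day of that month
                return YearMonth.parse(v).atDay(1);
            case 10: // yyyy-MM-dd
                return LocalDate.parse(v);
            default: // full timestamp with offset, e.g. 1981-12-07T16:56:12Z
                return OffsetDateTime.parse(v).toLocalDate();
        }
    }

    public static void main(String[] args) {
        System.out.println(toFullDate("1981-12-07T16:56:12Z")); // 1981-12-07
        System.out.println(toFullDate("1981-12-07"));           // 1981-12-07
        System.out.println(toFullDate("1981-12"));              // 1981-12-01
        System.out.println(toFullDate("1981"));                 // 1981-01-01
    }
}
```

Padding at index time rather than in the stored metadata would keep the original precision visible to users, at the cost of making 1981-12 and 1981-12-01 indistinguishable when sorting.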


Peter Dietz



Peter Dietz | 3 Feb 2012 21:01

Re: search can't sort by date issued

I'll also add that we have two repositories. In one of them, search results sort properly; the other has the problems identified in this thread. I'm sure both repositories have equally screwy date metadata, so I'm not entirely sure this is going to be the end of it.


Peter Dietz



Peter Warrington | 23 Feb 2012 15:33

Re: search can't sort by date issued

Hi Peter,

Do you know if this has been fixed in the latest version?  We are currently running 1.7.2 and have this problem
with date issued metadata values in the multiple formats you mentioned.  In the latest version
(1.8.1) it looks like, as part of the Lucene upgrade (DS-980), the sort fields are now explicitly specified
as STRING types:

http://scm.dspace.org/svn/repo/dspace/tags/dspace-1.8.1/dspace-api/src/main/java/org/dspace/search/DSQuery.java
(~line 225)

new SortField("sort_" + args.getSortOption().getName(), SortField.STRING,
SortOption.DESCENDING.equals(args.getSortOrder())), SortField.FIELD_SCORE

The second parameter (SortField.STRING) is not passed in 1.7.2, but it could be, and editing this line to add
it seems to resolve the issue.

I'm not that familiar with the indexing - could this be a temporary fix until we upgrade or have I missed something?
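
As a rough illustration of why forcing a string comparison might sidestep the error, here is a sketch in plain Java (no Lucene involved). It assumes the sort field holds the date text as written in the metadata, which may not be true of the real sort_ field, and the class and method names are made up for the example. The stack trace in this thread shows an int comparator being applied to the field; Integer.parseInt stands in for that decoding step only as an analogy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: mimics "compare as strings" versus "decode as ints".
// SortTypeDemo and its methods are hypothetical names, not DSpace or Lucene API.
class SortTypeDemo {

    // Orders the values lexicographically, the way a string-typed sort compares them.
    static List<String> sortAsStrings(List<String> values, boolean descending) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        if (descending) {
            Collections.reverse(sorted);
        }
        return sorted;
    }

    // Returns true only if every value can be read as an int.
    static boolean allParseAsInt(List<String> values) {
        for (String v : values) {
            try {
                Integer.parseInt(v);
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "1981-12-07", "1981", "1981-12-07T16:56:12Z", "1981-12");
        // prints [1981, 1981-12, 1981-12-07, 1981-12-07T16:56:12Z]
        System.out.println(sortAsStrings(dates, false));
        // prints false: the hyphenated values are not ints
        System.out.println(allParseAsInt(dates));
    }
}
```

With mixed-precision ISO 8601 values, the shorter value sorts before any longer value it is a prefix of, so a string sort at least gives a consistent order even if the metadata is not cleaned up.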

Regards Pete


From: Peter Dietz [mailto:pdietz84 <at> gmail.com] 
Sent: 03 February 2012 20:01
To: Thornton, Susan M. (LARC-B702)[LITES]
Cc: dspace-tech <at> lists.sourceforge.net
Subject: Re: [Dspace-tech] search can't sort by date issued

I'll also add that we have two repositories. In one of them you can search and the results sort properly;
the other has the problems identified in this thread. I'm sure both repositories have equally screwy
metadata for dates, so I'm not entirely sure this is going to be the end of it.

Peter Dietz


On Fri, Feb 3, 2012 at 2:51 PM, Peter Dietz <pdietz84 <at> gmail.com> wrote:
Hi Sue,

I've been diagnosing this issue in Luke (a Java GUI that lets you browse your Lucene index). While
digging around, it looked like the sort_dateissued field is having trouble with certain date metadata.

In our repository, the date metadata values come in several different precisions:
1981-12-07T16:56:12Z
1981-12-07
1981-12
1981

Each one of them is a valid ISO 8601 date. However, that doesn't mean each of them is a valid date in Lucene
(your search and browse index). A metadata person might see 1981-12 as meaning some type of range or
approximation. However, when you are searching and sorting, Lucene must be able to order the values precisely.
So is 1981-12 before or after 1981-12-07? Is 1981-12 before, after, or equal to 1981-12-01?

I'll ask my metadata people if we can flatten our date metadata and pad each value so that it has a day (the
first of the month).

I'll also dig further into the DSpace reindexing code to see whether, when we process DSpace metadata dates
that are valid ISO 8601, we can convert them to an appropriate Lucene date.
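
The padding idea can be sketched as a small Java helper that turns any of the four precisions listed above into a fixed-width key. This is only one possible policy, not what DSpace does: missing month and day default to 01 and missing time to 00:00, so 1981-12 is treated as equal to 1981-12-01. The twelve-digit yyyyMMddHHmm shape is borrowed from the index values reported elsewhere in this thread (201201010000); the class and method names are hypothetical, and seconds and time zone are simply ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: normalises partial ISO 8601 dates to a yyyyMMddHHmm key.
class DateSortKey {

    // Year, then optional month, optional day, optional "THH:mm" plus any trailing text.
    private static final Pattern ISO_PREFIX = Pattern.compile(
            "^(\\d{4})(?:-(\\d{2})(?:-(\\d{2})(?:T(\\d{2}):(\\d{2}).*)?)?)?$");

    static String toSortKey(String value) {
        Matcher m = ISO_PREFIX.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a recognised date: " + value);
        }
        // Assumed policy: missing parts fall back to the earliest possible value.
        String year = m.group(1);
        String month = m.group(2) != null ? m.group(2) : "01";
        String day = m.group(3) != null ? m.group(3) : "01";
        String hour = m.group(4) != null ? m.group(4) : "00";
        String minute = m.group(5) != null ? m.group(5) : "00";
        return year + month + day + hour + minute;
    }

    public static void main(String[] args) {
        System.out.println(toSortKey("1981-12-07T16:56:12Z")); // prints 198112071656
        System.out.println(toSortKey("1981-12-07"));           // prints 198112070000
        System.out.println(toSortKey("1981-12"));              // prints 198112010000
        System.out.println(toSortKey("1981"));                 // prints 198101010000
    }
}
```

Because every key has the same width and contains only digits, comparing the keys as strings and comparing them as numbers give the same order.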


Peter Dietz



On Thu, Feb 2, 2012 at 11:55 PM, Thornton, Susan M. (LARC-B702)[LITES] <susan.m.thornton <at> nasa.gov> wrote:
Thanks Peter.  I've spent several hours researching this issue, especially why we have it in one DSpace
instance and not another (running the same versions).  Although I'm not 100% sure, I suspect the issue is
caused by invalid data in the date.issued field(s) in the repository.  The solution for this, of
course, would be to clean up the bad dates we have and then put some edits on the date fields that end up in
DSpace so we do not allow bad dates to get into our repository.  But again, I'm not 100% sure of this and I
won't be able to get back to looking into this for a while.
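
One way to picture the "edits on the date fields" is a check that runs before a value is stored. The sketch below is an assumption about what such a check could look like, using only the standard java.time parsers (Java 8 or later); it is not part of DSpace, and the class and method names are invented. It accepts the four shapes discussed in this thread (year, year-month, full date, UTC timestamp) and rejects anything else, including impossible calendar dates.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;

// Hypothetical pre-ingest check for dc.date.issued values.
class IssuedDateCheck {

    static boolean isAcceptable(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        try {
            // Choose the parser by length: yyyy, yyyy-MM, yyyy-MM-dd, yyyy-MM-ddTHH:mm:ssZ.
            switch (v.length()) {
                case 4:
                    Year.parse(v);
                    return true;
                case 7:
                    YearMonth.parse(v);
                    return true;
                case 10:
                    LocalDate.parse(v);
                    return true;
                case 20:
                    Instant.parse(v);
                    return true;
                default:
                    return false;
            }
        } catch (DateTimeException e) {
            // Covers malformed text as well as out-of-range fields such as month 13.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("1981-12-07")); // prints true
        System.out.println(isAcceptable("1981-02-30")); // prints false
    }
}
```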
Best regards,
Sue
 
Sue Walker-Thornton
Software Developer|Database Administrator
NASA Langley Research Center
SGT, Inc.|LITES Contract
130 Research Drive
Hampton, VA  23666
Office: (757) 864-2368|Fax: (757) 224-4001|Mobile: (757) 506-9903
Email:  susan.m.thornton <at> nasa.gov
________________________________________
From: Peter Dietz [pdietz84 <at> gmail.com]
Sent: Wednesday, February 01, 2012 12:29 PM
To: Cristian Romanescu
Cc: dspace-tech <at> lists.sourceforge.net
Subject: Re: [Dspace-tech] search can't sort by date issued
Hi All, 

I've just started digging into this as well. It's really unfortunate to only get "relevance" results for searches.

In digging in, I've spit out the stack trace, and it's telling me a few things.
1) Do we have "bad" metadata for dc.date.issued?
-- (I've already harassed my content folks to have them review all our metadata) ;)

2) Are we doing the comparison of dates incorrectly? The error below asks whether the value of "dateissued" is really an INT.
-- I've been reading this thread, which is very similar: http://www.gossamer-threads.com/lists/lucene/java-user/109530



2012-01-31 17:47:02,475 ERROR org.dspace.search.DSQuery <at> Unable to use speficied sort option: dateissued
2012-01-31 17:47:02,475 ERROR org.dspace.search.DSQuery <at> Invalid shift value in prefixCoded string (is encoded value really an INT?)
2012-01-31 17:47:02,476 ERROR org.dspace.search.DSQuery <at> java.lang.NumberFormatException: Invalid shift value in prefixCoded string (is encoded value really an INT?)
at org.apache.lucene.util.NumericUtils.prefixCodedToInt(NumericUtils.java:233)
at org.apache.lucene.search.FieldCache$7.parseInt(FieldCache.java:237)
at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:457)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:447)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:240)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:181)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:113)
at org.apache.lucene.search.Hits.<init>(Hits.java:90)
at org.apache.lucene.search.Searcher.search(Searcher.java:63)
at org.dspace.search.DSQuery.doQuery(DSQuery.java:151)
at org.dspace.search.DSQuery.doQuery(DSQuery.java:309)
at org.dspace.app.xmlui.aspect.artifactbrowser.AbstractSearch.performSearch(AbstractSearch.java:438)


Just for fun, I enabled Discovery on our development machines, and sorting by date issued works perfectly
in a search. So a quick fix would be to switch to using Discovery. Nonetheless, I look forward to
getting a resolution to this issue.


Peter Dietz


On Wed, Feb 1, 2012 at 7:15 AM, Cristian Romanescu <cristian.romanescu <at> eaudeweb.ro> wrote:
Greetings,

Have you tried looking into the Lucene indexes with the Luke tool
(http://www.getopt.org/luke/)?

We are using:
     search.index.13 = dc_date:dc.date.issued:date
to filter by time interval and it works.

But first, we had to remove the old indexes and re-create them to get
correct indexing (i.e. rm -rf $builddir/search and run
./$builddir/bin/dspace index-init). It only worked once the data inside the
index looked like 201201010000 ... when viewed with the Luke tool.

HTH,
Cristian


On 02/01/2012 12:46 PM, Päivi Rosenström wrote:
> Any solution for this found yet ?
>
>
> Thanks!
>
> Päivi
>
>
>> Re: [Dspace-tech] search can't sort by date issued
>> From: James Bardin<jbardin <at> bu...>  - 2011-10-27 19:23
>> On Thu, Oct 27, 2011 at 1:52 PM, Blanco, Jose<blancoj <at> ...>  wrote:
>>> # Browse indexes
>>> webui.browse.index.1 = title:item:title
>>> webui.browse.index.2 = author:metadata:dc.contributor.author:text
>>> webui.browse.index.3 = subject:metadata:dc.subject.*:text
>>> webui.browse.index.4 = dateissued:item:dateissued
>>>
>>> # Sorting options
>>> webui.itemlist.sort-option.1 = title:dc.title:title
>>> webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
>>> webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
>>>
>> Yeah, I have dateissued in both the browse.index and sort-option, like above.
>> Sorting by dateissued *does* work in browsing, but not for search
>> results (I think search result ordering is done by lucene, and not the
>> webui). I took a guess and added another search index for
>> dateissued:dc.date.issued:date, but that doesn't seem to have any
>> effect.
>
>> -jim
>
>


