Stephen Bayliss | 27 Jan 11:20

[fcrepo-dev] Fedora validation enhancements - FCREPO-1026

I've made some validation enhancements as per https://jira.duraspace.org/browse/FCREPO-1026; these are currently in the fcrepo-1026 branch on GitHub.  Some documentation is in the Fedora 3.6 documentation space at https://wiki.duraspace.org/display/FEDORA36/Validation
 
I've some questions on how far to take this, so feedback is welcomed.
 
The current implementation:
 
* allows configuration of the XML ingest validation via a new DOManager fedora.fcfg parameter (with a suitable warning in the documentation about decreasing the level of validation)
* allows all objects to be validated when they are modified, with the API operation failing if the resulting object would be invalid
 
Object validation is configured via Spring (see doobjectvalidator.xml in the server/config/spring directory). By default it is turned off, so out of the box there's no performance hit.  This feature enables, for instance, ECM validation to be turned on for every object modification, to enforce repository content conformance with the CModel specification via ECM.  Certainly this isn't for everyone, but there are use cases.  Custom validators can be written and added that validate the Java Fedora object (rather than the XML).  Any number of validators can be added; these will execute in turn until one fails (if any does).
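
As a rough illustration of the "execute in turn until one fails" behaviour described above, here is a minimal sketch. All type names in it (DigitalObjectStub, SketchValidityException, SketchObjectValidator, ValidatorChain) are hypothetical and are not taken from the fcrepo-1026 branch; the real interface and exception may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the in-memory (Java) Fedora digital object.
final class DigitalObjectStub {
    final String pid;
    final List<String> datastreamIds;

    DigitalObjectStub(String pid, List<String> datastreamIds) {
        this.pid = pid;
        this.datastreamIds = datastreamIds;
    }
}

// Hypothetical unchecked exception used only for this sketch.
final class SketchValidityException extends RuntimeException {
    SketchValidityException(String message) {
        super(message);
    }
}

// Hypothetical validator contract: throw to reject the object.
interface SketchObjectValidator {
    void validate(DigitalObjectStub object);
}

// Runs the configured validators in order. The first validator that
// throws ends the chain, so later validators are not invoked.
final class ValidatorChain implements SketchObjectValidator {
    private final List<SketchObjectValidator> validators;

    ValidatorChain(List<SketchObjectValidator> validators) {
        this.validators = new ArrayList<>(validators);
    }

    @Override
    public void validate(DigitalObjectStub object) {
        for (SketchObjectValidator validator : validators) {
            validator.validate(object);
        }
    }
}
```

In this sketch an empty validator list corresponds to validation being switched off: the chain then does nothing and adds essentially no cost.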
 
Questions and thoughts:
 
* HTTP response code for REST API operations:  Currently if an ingest fails XML validation this is reported via HTTP status code 500 (Server Error).  To maintain consistency with the existing behaviour, object validation failures will also result in this code, with the text of the exception containing details of the validation failure.  I'd suggest that 400 - Bad Request [1] might be more appropriate for both of these, but this would essentially represent a REST API change - would that be acceptable for a Fedora 3.6 release?  If this change were made, I'd suggest implementing it by catching ObjectValidityException at the API level, and extending this exception to contain details of the validation failure for the response body (rather than the 500 exception reporting that occurs currently).
 
* Validate API method.  Currently this performs the ECM validation as it did in previous releases.  This could be modified to perform object validation as specified in the Spring config - would this make sense?  It should of course be configurable, so that custom validation can be plugged into the validate API method *without* enforcing validation on object committal.
 
* Comments on the implementation and code in that branch are most welcome
 
Thanks
Steve
 
 
------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@...
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
Asger Askov Blekinge | 27 Jan 16:25

Re: [fcrepo-dev] Fedora validation enhancements - FCREPO-1026

Hi Steve

Well, you could do this with decorators at the moment. Having both decorators and special validation decorators in the Spring config file is somewhat messy, I think. Have you removed the decorators?
Do you hook the data change itself, or the API method? If the API method, how does this work with the REST methods that invoke multiple API methods? If you hook the data change, then how do you do so?

Besides, how do you expect to validate the object without making the changes? The way I see it, you will have to commit the changes, do the validation, and roll back the changes if the validation fails.
I would like to know more about how you have managed to work around this. Or do you just roll back, and leave the mess in the audit stream?
Yes, validating the Java object should work for most things, but you have to be really careful about managed datastreams and the like, which may or may not exist before the change is committed.

Remember the curious case of the interdependent objects:
A depends on B. B depends on A. Neither is valid until both exist. How will you ever ingest them?

We have solved this by only requiring validity from Active objects. This is implemented with a decorator, doing validation when the object is modified to Active. Are your new hooks as fine-grained? I.e., can I hook a method to do validation if the parameters have special values? Basically, do you work from the "One set of rules for the entire repository" mindset, or from the "Several heterogeneous collections in the repository" mindset?
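
The "only Active objects must be valid" rule can be sketched as a wrapper that applies a validator conditionally. This is an illustration only, not the decorator mentioned above: RepoObject, ObjectCheck and ConditionalCheck are hypothetical names, and the partnerExists flag is a stand-in for "the object this one depends on is present".

```java
import java.util.function.Predicate;

// Fedora object states: Active, Inactive, Deleted.
enum ObjectState { ACTIVE, INACTIVE, DELETED }

// Hypothetical, heavily simplified object for this sketch.
final class RepoObject {
    final String pid;
    final ObjectState state;
    final boolean partnerExists; // stand-in for "the object I depend on exists"

    RepoObject(String pid, ObjectState state, boolean partnerExists) {
        this.pid = pid;
        this.state = state;
        this.partnerExists = partnerExists;
    }
}

// Hypothetical check contract: throw IllegalStateException to reject.
interface ObjectCheck {
    void validate(RepoObject object);
}

// Applies the wrapped check only to objects matching the condition,
// e.g. only to objects whose state is ACTIVE.
final class ConditionalCheck implements ObjectCheck {
    private final Predicate<RepoObject> applies;
    private final ObjectCheck delegate;

    ConditionalCheck(Predicate<RepoObject> applies, ObjectCheck delegate) {
        this.applies = applies;
        this.delegate = delegate;
    }

    @Override
    public void validate(RepoObject object) {
        if (applies.test(object)) {
            delegate.validate(object);
        }
    }
}
```

With such a wrapper, A and B could both be ingested as Inactive (no check applies), and each would only be checked when it is set to Active, by which time its partner exists.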

And of course, validator user rights. Since we do support an advanced rights model in Fedora, validation can fail because you do not have the rights to view the necessary data in the objects or from their relations. Should the validator use the invoking user's rights, or root rights? If the validation crashes, the change should be refused, I guess.

I will look at your code later, when I can find the time.

Regards

Stephen Bayliss | 27 Jan 16:49

Re: [fcrepo-dev] Fedora validation enhancements - FCREPO-1026

Hi Asger
 
Thanks for your feedback, some good comments.
 
Re decorators, it is an approach I explored - the difficulty is that before the API operation is completed, one doesn't have an object to validate - and after it has completed, one has the complexity of undoing the operation if it fails validation.
 
So this is instead a new interface and module hooked into DOManager (which is where the existing XSD and Schematron validation is hooked in).  One then has the Digital Object pre-commit, and it was simple enough to wrap that in a reader and hook that into the ECM validator.
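
To illustrate why a pre-commit hook avoids the rollback problem, here is a minimal sketch. It is not the DOManager code: PendingObject, PreCommitCheck and SketchObjectManager are hypothetical names, and a HashMap stands in for the repository's storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory object that has been modified but not yet stored.
final class PendingObject {
    final String pid;
    final String content;

    PendingObject(String pid, String content) {
        this.pid = pid;
        this.content = content;
    }
}

// Hypothetical pre-commit check: throw IllegalArgumentException to reject.
interface PreCommitCheck {
    void validate(PendingObject object);
}

// Validates the pending object before anything is written, so a
// validation failure leaves the store untouched and needs no rollback.
final class SketchObjectManager {
    private final Map<String, String> store = new HashMap<>();
    private final PreCommitCheck check;

    SketchObjectManager(PreCommitCheck check) {
        this.check = check;
    }

    void commit(PendingObject object) {
        check.validate(object);                 // may throw; nothing written yet
        store.put(object.pid, object.content);  // reached only if valid
    }

    boolean contains(String pid) {
        return store.containsKey(pid);
    }
}
```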
 
I'm sure there are cases that can't be dealt with; so yes, a relationship with a cardinality restriction of 1 in both directions would cause an issue here.  (Though in fact only validating active objects could be a solution here.)
 
Overall it isn't just about hooking in ECM validation, but about providing an extensibility point where any validation based on the digital object can be performed prior to a commit.  And of course it doesn't preclude validating using other patterns as an alternative, or in addition, depending on the use case.
 
Regards
Steve
 
 

Re: [fcrepo-dev] Fedora validation enhancements - FCREPO-1026

Just a thought about rights vs. validation:

It seems to me that validation should operate with the rights of the user (not the rights of the system) because to do otherwise would make it difficult to provide good, useful reporting of failed validation without potentially breaking policy by exposing information to which the user lacks access rights.

If the user's request cannot be fulfilled because he or she lacks access to resources required for validation, that seems to me to be a problem in the design of site policies that should be corrected in those policies. It seems unfair to users to expect them to conform their actions to ontologies or other restrictions they can't see! {grin}

---
A. Soroka
Software and Systems Engineering
Online Library Environment
the University of Virginia Library


Re: [fcrepo-dev] Fedora validation enhancements - FCREPO-1026

Whether or not we can (or want to) do this for 3.6, I agree that a 400-series error is more appropriate here, since the system is able to respond correctly and the problem is at a different semantic level than the HTTP protocol.

Perhaps a 422 (Unprocessable Entity)? Unfortunately, this is a WebDAV extension, but the semantics are very good for this case. See: http://tools.ietf.org/html/rfc4918.

The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.
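
The distinction drawn in that definition could be sketched as a simple mapping from the kind of failure to a status code. This reflects only the proposal under discussion, not current behaviour (which reports 500 for validation failures); IngestProblem and ProposedStatusCodes are hypothetical names.

```java
// Hypothetical classification of what went wrong with a request.
enum IngestProblem {
    MALFORMED_REQUEST,  // request entity is not syntactically correct
    INVALID_OBJECT,     // well-formed, but fails semantic/object validation
    SERVER_FAULT        // genuine internal error
}

final class ProposedStatusCodes {
    private ProposedStatusCodes() {
    }

    // Maps a problem to the HTTP status code suggested in this thread.
    static int statusFor(IngestProblem problem) {
        switch (problem) {
            case MALFORMED_REQUEST:
                return 400; // Bad Request
            case INVALID_OBJECT:
                return 422; // Unprocessable Entity (RFC 4918)
            case SERVER_FAULT:
                return 500; // Internal Server Error
            default:
                throw new IllegalArgumentException("unknown problem: " + problem);
        }
    }
}
```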

---
A. Soroka
Software and Systems Engineering
Online Library Environment
the University of Virginia Library

Stephen Bayliss | 1 Feb 09:28

Re: [fcrepo-dev] Fedora validation enhancements - FCREPO-1026

That status code does seem appropriate. I guess there could be a question as to whether we should provide a code outside RFC 2616, but it certainly would be more informative than 400 (and some of the other RFC 4918 codes could be useful in the future if we supported full transactions and any locking capabilities beyond the current optimistic concurrency support).
 
The REST API documentation doesn't, in general, specify error codes (apart from setDatastreamVersionable), and I believe most of the integration tests just test for a failure.
 
So would it in fact constitute an API change to move away from 500 to e.g. 422 for invalid FOXML (and for the new object validation failure)?  My preference would be to go for this in 3.6, clearly documenting it in the release notes.
 
Steve
 
 