Issue Number | 2441 |
---|---|
Summary | Schema validation packages |
Created | 2008-01-08 19:27:43 |
Issue Type | Improvement |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | alan |
Status | Closed |
Resolved | 2013-07-11 19:04:19 |
Resolution | Won't Fix |
Path | /home/bkline/backups/jira/ocecdr/issue.106769 |
BZISSUE::3818
BZDATETIME::2008-01-08 19:27:43
BZCREATOR::Bob Kline
BZASSIGNEE::Alan Meyer
BZQACONTACT::Lakshmi Grama
Please take a look at the available options for replacing our home-grown schema validation software with an off-the-shelf solution.
BZDATETIME::2008-01-11 00:50:54
BZCOMMENTOR::Alan Meyer
BZCOMMENT::1
The first thing I did, before looking at any schema validation
packages, was to take a quick look at alternative methods of
validation.
Some alternatives to schema that have been proposed are:
Schematron.
Trex (now part of Relax NG).
Relax NG (REgular LAnguage for XML Next Generation).
Examplotron.
I saw no indication that any of these are going anywhere. A
couple of them had some sort of attempted commercial
implementation, but it's not clear if these are actively marketed
or supported or that anyone is using them.
It doesn't look to me like there would be any benefit in pursuing
them. I'll confine my further investigation to schema validation
only.
BZDATETIME::2008-01-11 00:52:46
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2
The next thing I think I need to do is to develop a list of
requirements and desirable features in a schema validator. We
need to know what we're looking for before we can say that one
package is better than another.
BZDATETIME::2008-01-18 01:00:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3
The most promising off-the-shelf validator I have found so
far is sold by Intel. It looks superficially similar to the
validator in the Java SDK. It costs $5,000 per server plus
$1,250 per year. The cost doesn't seem justified to me unless
it has significant advantages over our home-grown program -
which remains to be seen.
I've downloaded an evaluation copy, good for 30 days, and will
try to work with it.
Bob suggested checking the Apache / Jakarta / Xerces projects
for validators - which I will do.
BZDATETIME::2008-01-23 00:23:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4
I spent a half hour or so today with Bryan Pizzillo looking at
the schema validator in C#. It looks flexible and powerful.
It does not appear to be based on the MSXML ActiveX control
distributed with Internet Explorer - which also has schema
validation built in.
It looks like there would be a steep learning and experimentation
curve to climb if we want to invoke the C# validator from our
server written in native C++.
To begin with, it's not easy to get information on how to do
this. I have pursued a number of leads on the net only to
discover that the "unmanaged C++" in the author's example was
still written to generate .NET common language runtime
"intermediate language", not native Windows executable code.
Getting past that, we'll have to master "Interop Pinvoke",
"marshaling" (serializing data for transmission between
programs, there are about 8 ways to do this), management of
object lifetimes, prevention of inappropriate garbage collection,
dealing with multi-threading in our own application with possible
resource pooling, and of course, avoiding the evils of double
thunking. My inner geek is reeling from all of the possibilities
that this presents.
We could take a shot at it, though if we really wanted to do this
we should reconsider rewriting the CDR server in C# - especially
since it would also allow us to eliminate separate packages for
XSLT processing, regular expressions, DOM and SAX parsing, and
maybe some other things. For better or worse, we could put all
our eggs in the Microsoft basket.
BZDATETIME::2008-02-14 15:17:15
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5
I looked at the problem of implementing a new schema validator. In
order to find out what was involved, and how good it would be, I
thought I might write a test program that uses a different validator
in order to see what the difficulties and benefits might be. The one
I chose was the open source Xerces-c 2.8 XML parser/validator.
I spent a couple of hours looking at this package in order to
understand it. It turns out that even writing a test program will
require some effort. We'll have to write the program and modify some
schemas to conform to the W3C Schema recommendations that emerged
after Bob wrote our original schema validator, and which handle
namespaces and required/optional characteristics differently. So,
while not extremely difficult, there is some work involved in writing
a test program.
I have suspended that effort. See the next comment for more reasons
why.
BZDATETIME::2008-02-14 15:35:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6
My original idea for program packaging of our schema error reporting
was to produce an error handler object that could be passed in to the
schema validator. I planned to modify our existing schema validation
program to accept such an error handler and, if we switched to a new
schema validator, to re-use it as-is.
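To illustrate the kind of packaging described in that original idea, here is a minimal sketch in Python rather than the server's C++. Every name in it (ErrorHandler, CollectingHandler, validate) is hypothetical and invented for the example, and the validate function is only a toy stand-in that checks for required child elements. It is not our validator and not any off-the-shelf package's interface.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Protocol


class ErrorHandler(Protocol):
    """Hypothetical interface: anything with report() can receive errors."""

    def report(self, message: str) -> None: ...


@dataclass
class CollectingHandler:
    """Hypothetical handler that stores messages for later display."""

    messages: List[str] = field(default_factory=list)

    def report(self, message: str) -> None:
        self.messages.append(message)


def validate(root: ET.Element, required_children: List[str],
             handler: ErrorHandler) -> bool:
    """Toy stand-in for a validator: checks only that named children exist.

    Errors go to whatever handler the caller passes in, so the same
    handler could in principle be handed to a different validator.
    """
    ok = True
    for name in required_children:
        if root.find(name) is None:
            handler.report(
                f"<{root.tag}> is missing required child <{name}>")
            ok = False
    return ok
```

A caller would create one CollectingHandler, pass it to validate along with the parsed document, and read handler.messages afterwards. The point of the shape is that the validator knows nothing about how errors are stored or displayed.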
Deeper investigation shows that this isn't as good an idea as I
originally thought.
In the first place, our error handling plan will require information
to be communicated from the schema validator to the error handler for
which no provision is made in the default interfaces for off-the-shelf
schema validator software. So it's not clear that the plan we have now
will be that useful later on.
In the second place, the current schema validation software does not
have a validation object into which we can pass an error handler.
Modifying it to contain our validation in such an object requires more
changes than we would like - both in terms of the time involved and in
terms of the reliability risk involved in modifying working code. This
risk would be incurred for no particular benefit except to make our
software more compatible with a package that we might never use.
Finally, it now appears to me that the server side software needed for
better error reporting is not trivial, but it does not require a major
effort. The effort required is small enough that it no longer looks
appealing to add significant extra work to it to make it re-usable. If
we switch to another validator and throw away the new changes we plan
to implement in our current validator, the loss will not be that
great.
Furthermore, the concepts embedded in our new software (attaching an
error ID attribute to every element) may not translate well to a new
validator, which may not be compatible with that approach, so the
software may not be re-usable anyway.
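As a rough picture of what "attaching an error ID attribute to every element" could look like, here is a small Python sketch using the standard library's ElementTree. The attribute name "error-id", the "_N" ID format, and both function names are assumptions made up for the illustration; they are not taken from the CDR code.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ERROR_ID_ATTR = "error-id"  # hypothetical attribute name, for illustration


def add_error_ids(root: ET.Element) -> int:
    """Give every element a unique ID attribute, in document order.

    Returns the number of elements tagged.
    """
    count = 0
    for count, elem in enumerate(root.iter(), start=1):
        elem.set(ERROR_ID_ATTR, f"_{count}")
    return count


def find_by_error_id(root: ET.Element,
                     error_id: str) -> Optional[ET.Element]:
    """Locate the element an error report refers to, or None if absent."""
    for elem in root.iter():
        if elem.get(ERROR_ID_ATTR) == error_id:
            return elem
    return None
```

With IDs in place, an error message only needs to carry the ID for a client to locate the offending element. The compatibility concern is visible here too: a validator that strictly enforces the schema would likely flag the extra attribute unless the schema declares it.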
For all those reasons I have removed the block that said issue #3637
depended on this one.
BZDATETIME::2008-04-10 14:17:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
Lowering priority at Alan's suggestion.
As discussed at the weekly status meeting this issue won't be addressed in the foreseeable future.