Issue Number | 4255 |
---|---|
Summary | Spike Story: Prototype Python version of CDR Server Search Module |
Created | 2017-04-06 06:15:17 |
Issue Type | Task |
Submitted By | Kline, Bob (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2017-04-13 12:17:41 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.206163 |
The Problem:
The CDR search API is currently implemented in C++ with a dependency on
the lex and yacc tools. We would like to simplify the software tool
chain, replacing the C++ CDR Server with a Python implementation if
possible.
The Question:
Can the functionality provided by the CDR Server Search Module be
implemented in Python?
Objective of this Spike Story:
Investigate whether there are Python packages available which could be
used to parse the CDR Search API queries, and if not, whether the subset
of the currently supported search query syntax which is actually used
can be supported without any third-party parsing libraries.
The existing search module implements an XQL parser built with low-level
lexical and grammar processors which are not directly available to
Python. While it would be possible to create a compiled extension to
replicate the existing XQL parser functionality, that would involve a
non-trivial level of effort, and would compromise the goal of reducing
dependencies on programming expertise in C and C++.

An analysis of the uses of the search module shows that only a subset of
the flexibility of the supported XQL syntax is ever used, and it would
be possible to provide the required functionality without using XQL
syntax. A replacement API was implemented using assertion test strings
which can be easily parsed by the built-in string support in Python.
Each valid test assertion string contains exactly three tokens:
- a path, which can be one of
  - CdrCtl/Title
  - the xpath (starting with a single forward slash) for an element or
    attribute, with /value or /int_val appended to indicate which column
    of the query_term table should be used for the test
- an operator (the same operators supported by the current XQL parser;
  e.g. =, contains, begins, gt, etc.)
- a value to be used in the test; wildcards are added as appropriate if
  the operator is "contains" or "begins"
The three tokens are separated by whitespace. The first two tokens
cannot contain whitespace, but there are no whitespace restrictions on
the value component of the test.
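
The three-token rule described above can be sketched in a few lines of
plain Python. This is only an illustration, not the code that was
written for this ticket: the names Assertion, parse_assertion,
pattern_for and KNOWN_OPERATORS are hypothetical, the operator set
holds only the four operators named in this issue, and the use of "%"
as the wildcard assumes the test ends up in a SQL LIKE comparison.

```python
from typing import NamedTuple

# Assumption: only the operators named in the issue text; the real set
# supported by the XQL parser is larger.
KNOWN_OPERATORS = {"=", "contains", "begins", "gt"}


class Assertion(NamedTuple):
    path: str
    operator: str
    value: str


def parse_assertion(test: str) -> Assertion:
    """Split a test string into its path, operator, and value tokens."""
    # maxsplit=2 leaves any whitespace inside the value untouched.
    tokens = test.strip().split(None, 2)
    if len(tokens) != 3:
        raise ValueError(f"expected three tokens in {test!r}")
    path, operator, value = tokens
    if path != "CdrCtl/Title":
        # Otherwise require a single leading slash and a column suffix.
        single_slash = path.startswith("/") and not path.startswith("//")
        if not single_slash or not path.endswith(("/value", "/int_val")):
            raise ValueError(f"unsupported path {path!r}")
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"unsupported operator {operator!r}")
    return Assertion(path, operator, value)


def pattern_for(assertion: Assertion) -> str:
    """Add wildcards for "contains" and "begins" (assumes SQL LIKE)."""
    if assertion.operator == "contains":
        return f"%{assertion.value}%"
    if assertion.operator == "begins":
        return f"{assertion.value}%"
    return assertion.value
```

With this sketch, parse_assertion("CdrCtl/Title contains breast cancer")
yields the value "breast cancer" with its internal space preserved, and
pattern_for() turns that value into "%breast cancer%".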
The places which currently use the search module (and which will
therefore need to be modified) are significantly fewer than originally
anticipated:
DevTools/Utilities/DiffSchemas.py
XMetaL/DLL/SearchDialog.cpp
lib/Python/RtfWriter.py
lib/Python/cdrpub.py (just remove the calls to cdr.search(); XQL queries have never been used in the publishing control documents)
Bin/UpdateSchemas.py
Build/AnthillPro/deploy-all.py