CDR Tickets

Issue Number 4255
Summary Spike Story: Prototype Python version of CDR Server Search Module
Created 2017-04-06 06:15:17
Issue Type Task
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2017-04-13 12:17:41
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.206163
Description

The Problem:
The CDR search API is currently implemented in C++ and depends on the lex and yacc tools. We would like to simplify the software tool chain by replacing the CDR Server with Python if possible.

The Question:
Can the functionality provided by the CDR Server Search Module be implemented in Python?

Objective of this Spike Story:
Investigate whether there are Python packages available which could be used to parse the CDR Search API queries, and if not, whether the subset of the currently supported search query syntax which is actually used can be supported without any third-party parsing libraries.

Comment entered 2017-04-13 12:17:35 by Kline, Bob (NIH/NCI) [C]

The existing search module implements an XQL parser built with low-level lexical and grammar processors (lex and yacc) which are not directly available to Python. While it would be possible to write a compiled extension replicating the existing XQL parser, doing so would involve a non-trivial level of effort and would compromise the goal of reducing dependencies on C and C++ programming expertise.

An analysis of the uses of the search module shows that only a subset of the flexibility of the supported XQL syntax is ever used, and that the required functionality can be provided without XQL syntax. A replacement API was implemented using assertion test strings which can be easily parsed with Python's built-in string support. Each valid test assertion string contains exactly three tokens:

  1. a path, which can be one of

    • CdrCtl/Title

    • the XPath (starting with a single forward slash) of an element or attribute, with /value or /int_val appended to indicate which column of the query_term table is used for the test

  2. an operator (the same operators supported by the current XQL parser; e.g. =, contains, begins, gt, etc.)

  3. a value to be used in the test; wildcards are added as appropriate if the operator is "contains" or "begins"

The three tokens are separated by whitespace. The first two tokens cannot contain whitespace, but there are no whitespace restrictions on the value component of the test.
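A minimal sketch of how such an assertion string might be parsed and turned into a SQL condition follows. It is an illustration, not the actual replacement implementation. It assumes a query_term table with a path column alongside the value and int_val columns named above, uses hypothetical table aliases q (query_term) and d (a document table holding the title for CdrCtl/Title), and includes the operators eq, gte, lt and lte purely as assumptions; only =, contains, begins and gt are named in this ticket.

```python
from typing import List, NamedTuple, Tuple

# "=", "contains", "begins" and "gt" are named in the ticket; the other
# operators are assumptions added for illustration.
SQL_OPERATORS = {
    "=": "=",
    "eq": "=",
    "gt": ">",
    "gte": ">=",
    "lt": "<",
    "lte": "<=",
    "contains": "LIKE",
    "begins": "LIKE",
}

# Columns of the query_term table which a path may select.
VALUE_COLUMNS = {"value", "int_val"}
TITLE_PATH = "CdrCtl/Title"


class Assertion(NamedTuple):
    path: str
    operator: str
    value: str


def parse_assertion(test: str) -> Assertion:
    """Split a test string into exactly three whitespace-separated tokens.

    split(None, 2) splits on runs of whitespace at most twice, so the path
    and operator cannot contain whitespace, while the value keeps any
    whitespace inside it.
    """
    tokens = test.split(None, 2)
    if len(tokens) != 3:
        raise ValueError(f"expected path, operator, and value: {test!r}")
    path, operator, value = tokens
    if operator not in SQL_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if path != TITLE_PATH:
        if not path.startswith("/") or path.startswith("//"):
            raise ValueError(f"path must start with a single '/': {path!r}")
        xpath, _, column = path.rpartition("/")
        if not xpath or column not in VALUE_COLUMNS:
            raise ValueError(f"path must end in /value or /int_val: {path!r}")
    return Assertion(path, operator, value)


def add_wildcards(assertion: Assertion) -> str:
    """Wrap the value in SQL LIKE wildcards for "contains" and "begins"."""
    if assertion.operator == "contains":
        return f"%{assertion.value}%"
    if assertion.operator == "begins":
        return f"{assertion.value}%"
    return assertion.value


def to_sql_condition(assertion: Assertion) -> Tuple[str, List[object]]:
    """Build a parameterized WHERE-clause fragment.

    The aliases d (document table) and q (query_term) and the q.path
    column are hypothetical, assumed for illustration only.
    """
    op = SQL_OPERATORS[assertion.operator]
    value = add_wildcards(assertion)
    if assertion.path == TITLE_PATH:
        return f"d.title {op} ?", [value]
    xpath, _, column = assertion.path.rpartition("/")
    # Compare int_val numerically unless a LIKE pattern is being used.
    param: object = int(value) if column == "int_val" and op != "LIKE" else value
    return f"q.path = ? AND q.{column} {op} ?", [xpath, param]
```

For example, parse_assertion("CdrCtl/Title begins Breast") yields a condition on the title using the pattern "Breast%". Using str.split(None, 2) enforces the rule that the first two tokens contain no whitespace without any third-party parsing library, and the ? placeholders keep user-supplied values out of the SQL text (the placeholder style actually required depends on the database driver).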
The places which currently use the search module (and which will therefore need to be modified) are significantly fewer than originally anticipated:

  • DevTools/Utilities/DiffSchemas.py

  • XMetaL/DLL/SearchDialog.cpp

  • lib/Python/RtfWriter.py

  • lib/Python/cdrpub.py (just remove the calls to cdr.search(); XQL queries have never been used in the publishing control documents)

  • Bin/UpdateSchemas.py

  • Build/AnthillPro/deploy-all.py
