CDR Tickets

Issue Number 4915
Summary [Glossary] Consider joining the CDR and Cancer.gov Glossifiers
Created 2020-10-22 13:31:17
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Open
Resolved
Resolution
Path /home/bkline/backups/jira/ocecdr/issue.277130
Description

There are currently two separate glossifiers, one for CDR users and one for Drupal/CMS users. Bob outlined several differences between them (below) in OCECDR-4625. This issue is to examine whether there need to be two separate glossifiers and, if both are kept, whether there need to be as many differences between the two.

_____

Here are the deltas between the two glossifiers ("CDR" refers to the glossifier used inside XMetaL; "CMS" refers to the glossifier information exported by the CDR for use in the CMS):

 

Feature    CDR    CMS

Includes unpublished terms
Normalizes RIGHT SINGLE QUOTATION MARK (U+2019) to APOSTROPHE (U+0027)
Strips punctuation
Preserves the original term for display in the user interface
Uses the query_term table
Includes dictionary information
Parses the publishable document versions
Replaces hyphens with spaces
Normalizes whitespace
Preserves diacritics
Collects and stores the glossary information nightly
Assembles the glossary information on demand
Includes terms without definitions
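
Several of the features listed above are string-normalization choices (quote folding, punctuation stripping, hyphen replacement, whitespace normalization, diacritic handling, keeping the original term). As a rough illustration only, here is a minimal Python sketch of how those choices could be expressed as switches on one shared function. The names NormalizationOptions, normalize_term, and build_lookup are hypothetical and do not come from the CDR code base. The order of the steps is also an assumption. The sketch does not say which glossifier uses which setting.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizationOptions:
    """Hypothetical switches, one per normalization feature in the list."""
    fold_right_single_quote: bool = False  # U+2019 -> U+0027
    strip_punctuation: bool = False
    hyphens_to_spaces: bool = False
    normalize_whitespace: bool = False
    preserve_diacritics: bool = True


def normalize_term(term: str, opts: NormalizationOptions) -> str:
    """Return a matching key for a term name (assumed step order)."""
    key = term
    if opts.fold_right_single_quote:
        key = key.replace("\u2019", "'")
    if opts.hyphens_to_spaces:
        # Done before punctuation stripping so hyphens become spaces
        # instead of being removed outright.
        key = key.replace("-", " ")
    if not opts.preserve_diacritics:
        # Decompose characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", key)
        key = "".join(c for c in decomposed if not unicodedata.combining(c))
    if opts.strip_punctuation:
        # Drop every character in a Unicode punctuation category (P*).
        # Note that this also removes apostrophes.
        key = "".join(c for c in key
                      if not unicodedata.category(c).startswith("P"))
    if opts.normalize_whitespace:
        # Collapse runs of whitespace and trim both ends.
        key = " ".join(key.split())
    return key


def build_lookup(terms, opts):
    """Map each normalized key to the original term strings,
    so the original spelling stays available for display."""
    lookup = {}
    for term in terms:
        lookup.setdefault(normalize_term(term, opts), []).append(term)
    return lookup
```

With a structure like this, each glossifier would pass its own NormalizationOptions instance, which makes the behavioral differences explicit in one place without forcing the two sets of requirements to agree.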

 

It's possible that I have overlooked some other differences, but I'm pretty confident I've found them all. As far as I know, the requirements for these two glossifiers were established completely independently of each other, so it is not surprising that their behavior diverges so much. Given that divergence, I don't think it makes sense to refactor the common functionality into a single library in the CDR unless the requirements for the two glossifiers are brought more in line with each other.
