CDR Tickets

Issue Number 4625
Summary [Glossary] Modify Glossifier to allow for Dictionary Selection
Created 2019-05-23 11:39:26
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2020-10-23 14:57:58
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.244502
Description

We would like to expand the use of the glossifier for the genetics dictionary (Eng and Spanish [coming soon]). Let's discuss what would be involved.

Comment entered 2019-05-29 18:00:47 by Osei-Poku, William (NIH/NCI) [C]

CIAT discussed this on Tuesday. The biggest concern was that using the glossifier creates additional work for users, which is why they don't use it as much as they should. Overall, users find the glossifier helpful when glossifying a completely new section or a new summary. I have summarized our suggestions below for your consideration.

  1. We agreed that it is necessary to modify the current Glossifier to identify only genetics terms when glossifying a Genetics summary. This would be a timesaver as the user would not have to skip several terms that would normally be glossified in a Genetics summary. 

  2. It is also important to create a separate mapping table for Genetics summaries to at least keep track of new phrases. 

  3. One improvement that would be helpful applies to summaries that have been glossified previously. If additional changes are made to a section that has already been glossified and it needs to be re-glossified, it would really help if the glossifier did not identify terms that have already been glossified in the same section. For example, if the first occurrence of "cancer" has been glossified in the first paragraph of a section, then when the glossifier is run again it should not offer the term "cancer" in the second paragraph of the same section, so the user does not have to skip it.

  4. There are also cases where the glossifier includes punctuation when glossifying terms. This creates additional work, as the user needs to go through all the glossified terms in a section to make sure that no punctuation was included. (I have not been able to reproduce this behavior yet.)

  5. Levels of Evidence links do not use the GlossaryTermRef tags (They use the LOERef tags) so it looks like it falls outside the scope of the glossifier at this point.
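
A minimal sketch of the behavior requested in suggestion 3, under stated assumptions. The function name, its arguments, and the plain-text matching are hypothetical illustrations, not the glossifier's actual code: it assumes the caller already knows which terms carry markup in the section, and it uses a simple word-boundary match that would not handle term names ending in punctuation.

```python
import re


def candidates_to_offer(section_text, term_names, already_glossified):
    """Return (term, offset) pairs the user would still need to review.

    Hypothetical helper: skips any term already marked up in this
    section and offers only the first occurrence of each remaining term.
    """
    # Terms already glossified in this section are never offered again.
    seen = {name.lower() for name in already_glossified}
    offers = []
    for name in term_names:
        key = name.lower()
        if key in seen:
            continue
        pattern = r"\b" + re.escape(name) + r"\b"
        match = re.search(pattern, section_text, re.IGNORECASE)
        if match:
            # Record the first occurrence only, then suppress repeats.
            offers.append((name, match.start()))
            seen.add(key)
    # Present the candidates in document order.
    return sorted(offers, key=lambda pair: pair[1])
```

With "cancer" already glossified, a section mentioning both "cancer" and "tumor" would yield a single candidate for "tumor", at its first occurrence.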

Comment entered 2019-06-20 11:35:37 by Osei-Poku, William (NIH/NCI) [C]

I am attaching 3 examples of cases where the glossifier picks up parentheses. I have not yet come across a case where a semicolon has been picked up.

 

Comment entered 2019-12-03 08:57:22 by Kline, Bob (NIH/NCI) [C]

I am investigating the report of the glossifier picking up extra punctuation. Please open a separate ticket for that issue, which is unrelated to the enhancement request for this ticket.

Comment entered 2020-01-29 20:31:52 by Osei-Poku, William (NIH/NCI) [C]

Closing this ticket in favor of a new ticket (to be created) for programmatic handling of glossifying summaries.

Comment entered 2020-09-17 10:08:07 by Osei-Poku, William (NIH/NCI) [C]

I would like to re-open this ticket for discussion this afternoon. While the glossifier works for all HP summaries, including genetics summaries, there is a continued demand for the glossifier to identify only terms that are in the genetics dictionary instead of terms from both dictionaries. If it would be possible to modify the glossifier to allow the user to choose which dictionary to use to glossify a summary document, that would be great.

Comment entered 2020-09-18 08:39:26 by Osei-Poku, William (NIH/NCI) [C]

We decided not to implement the original requirements as stated in this ticket but to repurpose this ticket to handle the most current request above (modifying the glossifier to allow users to choose the dictionary). I have modified the title of the ticket to reflect this new requirement.

Comment entered 2020-10-09 12:31:58 by Kline, Bob (NIH/NCI) [C]

Just a heads-up that I've marked this as 20 story points, though it's likely to be higher than that. I don't think it will be as much as a 40, but the estimating system doesn't have anything in between.

We have two glossifier systems in the CDR, one for the CDR/XMetaL and one for the CMS. The CMS version categorizes every glossary term name by language and dictionary (though I'm not sure the current front end code makes use of all of this information). The other glossifier system is the one used by our XMetaL DLL (through the CDR client-server API), and that only distinguishes term names by language, and doesn't have any knowledge of the dictionaries for which definitions are available.
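
As a rough illustration of the difference described above, here is a sketch of term names tagged with both language and dictionary, plus a filter on either. The record layout, the `select_names` function, and the dictionary labels "Genetics" and "Cancer.gov" are assumptions made for the example, not the actual CDR or CMS data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermName:
    doc_id: int              # glossary term document ID (illustrative)
    name: str                # term name string to match in the summary
    language: str            # "en" or "es"
    dictionaries: frozenset  # labels of dictionaries with a definition


def select_names(term_names, language, dictionary=None):
    """Filter term names by language and, optionally, by dictionary.

    Passing dictionary=None mimics a glossifier which only
    distinguishes names by language.
    """
    return [
        term for term in term_names
        if term.language == language
        and (dictionary is None or dictionary in term.dictionaries)
    ]
```

A dictionary-specific context menu item could then call something like `select_names(names, "en", "Genetics")`, while the existing command would leave `dictionary` unset.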

So I'll need to

  1. do the analysis to determine whether the two glossifier systems should be refactored to extract out common functionality, now that the requirements for the two are moving closer toward each other (and do that refactoring if it makes sense) ✔

  2. rewrite the glossifier API code and the Python wrappers ✔

  3. rewrite the API unit tests and verify that tests pass ✔

  4. rewrite the glossifier code in the DLL ✔

  5. come up with a user interface for choosing a dictionary (probably just by having multiple context menu items) ✔

  6. test the changes from inside XMetaL ✔

I'll proceed with this work some time next week, after you've had a chance to read this over and provide any feedback.

Comment entered 2020-10-14 18:09:26 by Kline, Bob (NIH/NCI) [C]

Here are the deltas between the two glossifiers ("CDR" refers to the glossifier used inside XMetaL; "CMS" refers to the glossifier information exported by the CDR for use in the CMS):

 

Feature | CDR | CMS
Includes unpublished terms | |
Normalizes RIGHT SINGLE QUOTATION MARK (U+2019) to APOSTROPHE (U+0027) | |
Strips punctuation | |
Preserves the original term for display in the user interface | |
Uses the query_term table | |
Includes dictionary information | |
Parses the publishable document versions | |
Replaces hyphens with spaces | |
Normalizes whitespace | |
Preserves diacritics | |
Collects and stores the glossary information nightly | |
Assembles the glossary information on demand | |
Includes terms without definitions | |
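
The normalization rows in the comparison above can be sketched as a single function with one switch per behavior. This is only an illustration: the function name and keyword arguments are hypothetical, the sketch does not say which glossifier applies which rule, and the choice to keep apostrophes when stripping punctuation is an assumption.

```python
import re
import unicodedata


def normalize_term(name, *, fold_apostrophe=True, hyphens_to_spaces=True,
                   strip_punctuation=True, keep_diacritics=True):
    """Normalize a glossary term name for matching (illustrative only)."""
    if fold_apostrophe:
        # RIGHT SINGLE QUOTATION MARK (U+2019) -> APOSTROPHE (U+0027)
        name = name.replace("\u2019", "'")
    if hyphens_to_spaces:
        name = name.replace("-", " ")
    if not keep_diacritics:
        # Decompose, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", name)
        name = "".join(c for c in decomposed if not unicodedata.combining(c))
    if strip_punctuation:
        # Drop Unicode punctuation, but keep apostrophes (an assumption).
        name = "".join(
            c for c in name
            if c == "'" or not unicodedata.category(c).startswith("P")
        )
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", name).strip()
```

Keeping each rule behind its own flag makes it easy to see how two callers with different settings would produce different lookup keys from the same term name.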

 

It's possible that I have overlooked some other differences, but I'm pretty confident I've found them all. As far as I know, the requirements for these two glossifiers were established completely independently from each other, so it doesn't seem surprising that there is so much divergence in their behavior. Because of this, it would not seem to make sense, in my judgment, to refactor the common functionality into a single library in the CDR unless the requirements for the two glossifiers were brought more in line with each other.

So the next question would be: is there a compelling business reason to maintain two separate glossifiers which don't behave the same way, or should those requirements be driven by logic which is closer to being on the same page, so to speak? This would be a question directed to  (for the CDR) and  (who knows more about the needs of the CMS editing interface than I do). Perhaps we can discuss this question at tomorrow afternoon's meeting.

Comment entered 2020-10-15 12:10:56 by Kline, Bob (NIH/NCI) [C]

Implemented on DEV. As noted above, the new commands have been added to the Summary context (right-click) menu.

Comment entered 2020-10-23 13:33:27 by Osei-Poku, William (NIH/NCI) [C]

This appears to be working as expected. Would it be possible to add icons for the two macros next to the existing glossifier icons in XMetaL?

Comment entered 2020-10-23 13:57:53 by Kline, Bob (NIH/NCI) [C]

Put in a Newton ticket for that.

Comment entered 2020-10-27 09:44:28 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Thanks!

Comment entered 2020-11-06 12:57:01 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2020-12-16 14:25:00 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Attachments
File Name Posted User
Glossfier paren 668479 - b.JPG 2019-06-20 11:35:08 Osei-Poku, William (NIH/NCI) [C]
glossifier paren markup 668479.JPG 2019-06-20 11:35:08 Osei-Poku, William (NIH/NCI) [C]
glossifier paren markup 668479 - c.JPG 2019-06-20 11:35:08 Osei-Poku, William (NIH/NCI) [C]
