CDR Tickets

Issue Number 4780
Summary [Glossary] Health Professional Glossary Terms Report
Created 2020-02-13 09:35:15
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2020-05-07 17:27:05
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.256679
Description

We would like to create a new report to view health professional dictionary terms with or without their definitions. I'm attaching specifications for the new report.

Comment entered 2020-04-13 08:48:17 by Kline, Bob (NIH/NCI) [C]

Dictionary: Genetics OR None

Just to confirm: None means only display terms which have no dictionary, right?

Comment entered 2020-04-13 16:19:21 by Juthe, Robin (NIH/NCI) [E]

That's right. No dictionary. Thanks.
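
For the development notes, the confirmed rule can be sketched as a simple filter. This is only an illustration, assuming each term is available as a (name, dictionary) pair; the `TERMS` list, its sample values, and the `select_terms` helper are hypothetical and do not come from the CDR code.

```python
# Hypothetical records: (term name, dictionary or None when no dictionary is assigned).
TERMS = [
    ("term A", "Genetics"),
    ("term B", "Some other dictionary"),
    ("term C", None),
]


def select_terms(terms, dictionary):
    """Return the names whose dictionary equals the requested one.

    Passing None selects only the terms which have no dictionary at all,
    which is the meaning of "None" confirmed in this ticket.
    """
    return [name for name, term_dict in terms if term_dict == dictionary]
```

Calling `select_terms(TERMS, None)` here returns only the term with no dictionary, not every term.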

Comment entered 2020-04-25 15:44:19 by Kline, Bob (NIH/NCI) [C]

If "could look like the GTC by type report" from the specification means we need to implement rich text mixed formatting in the definition cells AND we need to support an Excel version of the report (unlike the GTC by Type report) then this will be at least a 20. Can you confirm?

Comment entered 2020-04-28 17:01:23 by Juthe, Robin (NIH/NCI) [E]

What do you mean by rich text? Filling in the placeholders? Please clarify.

I'm fine with the report only being in HTML for now and we could add Excel down the road if it's needed. You make a good point that we don't have this for the GTC by Type report so I don't think it's necessary. Not sure how much that saves in terms of LOE.

Comment entered 2020-04-29 08:15:56 by Kline, Bob (NIH/NCI) [C]

What do you mean by rich text?

I mean more than one font style (color, size, weight, font family, etc.) applied individually to parts of a cell.

That's not supported by the common framework we have built for Excel reports (because it's not supported by the underlying openpyxl library), so we'd have to pull in another, more specialized Excel library and build the report by hand. We can do it (two other reports were implemented that way), but it's definitely more work.

Comment entered 2020-04-29 09:37:26 by Kline, Bob (NIH/NCI) [C]

I'm adding some more general notes for the development team on Excel support in the CDR, as questions about this topic have come up in the past, and I don't always remember the details myself.

  • Home-grown libraries for reading and writing Excel workbooks have been completely retired. Those libraries were developed back in the days when there were no suitable third-party libraries available.

  • We still have some scripts which use the older xlrd and xlwt libraries. The xlwt library is only able to generate spreadsheet files compatible with Microsoft Excel versions 95 to 2003 (Excel 97/2000/XP/2003 XLS), and cannot create modern Excel workbooks. The xlrd package is a companion to xlwt and is able to read both .xls and the newer .xlsx files.

  • The openpyxl package is currently our primary tool for reading and writing modern Excel files. Its two limitations are (a) it does not support old Excel .xls files; and (b) it does not yet support rich text in cells (though there is a pull request in the pipeline for such support). Because of the first limitation, we should keep the xlrd package installed, even after all use of it in our existing scripts has been replaced with openpyxl. The openpyxl package is well supported, and it is currently the most widely-used package for working with modern Excel files. Our own report framework provides a wrapper around it to facilitate creating Excel versions of the reports, and all new software (and extensive rewrites of existing scripts and libraries) should use this package (with our report framework, if feasible).

  • The xlsxwriter package does not support reading Excel workbooks, but it does supply the rich text support which openpyxl currently lacks. There are currently two reports (Media Caption and Content Report and Summary Standard Wording) in the CDR which use this package to meet the requirement for rich text in Excel report cells.
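
As a rough illustration of the rich text support mentioned above, the following sketch uses xlsxwriter's `write_rich_string()` to apply more than one style inside a single cell. It is not taken from either of the two existing reports. The `segments` structure, the `"placeholder"` style name, and both helper functions are assumptions made for this example.

```python
def rich_string_args(segments, formats):
    """Flatten (text, style) pairs into arguments for write_rich_string().

    A style of None means the default cell format. Empty text is skipped
    because xlsxwriter rejects empty string fragments.
    """
    args = []
    for text, style in segments:
        if not text:
            continue
        if style is not None:
            args.append(formats[style])  # a format applies to the fragment after it
        args.append(text)
    return args


def write_definition(path, segments):
    """Write one definition with mixed formatting into cell A1 of a new workbook."""
    # Third-party package, imported here so the helper above has no dependency on it.
    import xlsxwriter

    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet()
    formats = {
        # Hypothetical style for text which should stand out within the definition.
        "placeholder": workbook.add_format({"bold": True, "font_color": "red"}),
    }
    args = rich_string_args(segments, formats)
    if len(args) > 2:
        sheet.write_rich_string("A1", *args)
    else:
        # Too few fragments for a rich string, so fall back to plain text.
        sheet.write_string("A1", "".join(text for text, _ in segments))
    workbook.close()
```

A call such as `write_definition("definition.xlsx", [("A change in a ", None), ("gene", "placeholder"), (" sequence.", None)])` would produce a cell in which only the middle fragment is bold and red.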

Comment entered 2020-05-01 09:18:17 by Kline, Bob (NIH/NCI) [C]

Need some more clarification. I assume "terms" in the phrase "blocked terms" (in the requirements attachment) refers to the GlossaryTermName documents and not the GlossaryTermConcept documents. I base this assumption on two things:

  1. In the GTC By Type report it's the GTN documents which are labeled BLOCKED rather than the GTC document.

  2. The word "terms" is used elsewhere (without any qualification) to refer to the term names, not the term concepts (e.g., "List of Terms").

(It's ironic, given that dictionaries are all about eliminating ambiguities in the meaning of words, that the words we use to identify things for the glossary documents are packed with so much confusing overloading, don't you think? 😛)

So here's my question: if the user chooses to exclude blocked terms and a concept has some blocked and some unblocked name documents, I would guess that for the Concept flavor of the report we'll show the concept but only show the unblocked names. What if all of the concept's name documents are blocked? Do we skip the concept altogether for the Concept version of the report (as we will for the List of Terms flavor)? Or do we show the concept but use a blank cell for the second column?
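
To make the two alternatives in that question concrete, here is a small sketch of the Concept flavor. The `TermName` and `Concept` classes and the `keep_empty_concepts` switch are hypothetical stand-ins for illustration, not the report's real data model.

```python
from dataclasses import dataclass, field


@dataclass
class TermName:
    name: str
    blocked: bool = False


@dataclass
class Concept:
    concept_id: int
    names: list = field(default_factory=list)


def concept_rows(concepts, include_blocked, keep_empty_concepts):
    """Build (concept id, visible names) rows for the Concept flavor.

    When blocked names are excluded, a concept may end up with no visible
    names. keep_empty_concepts chooses between the two options raised above:
    True keeps the concept with an empty names cell, False skips it.
    """
    rows = []
    for concept in concepts:
        names = [n.name for n in concept.names if include_blocked or not n.blocked]
        if names or keep_empty_concepts:
            rows.append((concept.concept_id, names))
    return rows
```

With one concept having a mix of blocked and unblocked names and another having only blocked names, `keep_empty_concepts=False` drops the second concept, while `True` keeps it with an empty list for the second column.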

Comment entered 2020-05-01 09:49:51 by Kline, Bob (NIH/NCI) [C]

Hmm. I posted a longish comment to this ticket late in the day yesterday, to give you an FYI that in anticipation of creating this new report based on the GTC By Type report, I had reworked the latter report so I wouldn't be replicating the old techniques of manually assembling strings for the report's HTML into the new report. I even had screen shots to show how much faster the rewritten GTC By Type report was, and oddly enough, the screen shots I had pasted into the comment appear to have survived, but Jira lost the comment itself. So here's a less long-winded version of that comment. 🙂

Comment entered 2020-05-01 10:25:05 by Kline, Bob (NIH/NCI) [C]

While I'm soliciting clarification of the requirements: I see that TermType is unbounded for the GTC docs. So if the user says exclude LOE terms, and a GTC has a term type of Level of evidence as well as one or more other term types, we still exclude the concept, right?
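
The reading proposed in that question can be written as a one-line predicate. This is a sketch under the assumption that the proposed reading is the intended one (the ticket does not record an explicit answer); the `keep_concept` function and its parameters are hypothetical names.

```python
LOE_TYPE = "Level of evidence"


def keep_concept(term_types, exclude_loe):
    """Decide whether a concept stays in the report.

    TermType can occur more than once per concept, so when LOE terms are
    excluded, a single "Level of evidence" value is enough to drop the
    concept, whatever other term types it also has.
    """
    return not (exclude_loe and LOE_TYPE in term_types)
```

Under this rule a concept typed as both "Level of evidence" and something else is still excluded when the user asks to exclude LOE terms.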

Comment entered 2020-05-01 15:40:58 by Kline, Bob (NIH/NCI) [C]

Installed on the Glossary Terms Reports menu page on DEV. Please report any bugs you find here on this ticket. Please submit any modifications to the original requirements as new tickets in Maxwell.

Comment entered 2020-05-07 15:48:34 by Osei-Poku, William (NIH/NCI) [C]

Looks good on DEV. (Amy reviewed this on DEV and said it worked as expected).

Comment entered 2020-05-07 17:00:19 by Juthe, Robin (NIH/NCI) [E]

Yes, by terms I meant term names. 🙂 I also like how you've handled the display of blocked terms - it appears you are still displaying blocked term names on the concept version of the report (even if all terms for the concept are blocked), including the blocked (or not used anyway) definition. It looks good to me as you have it. Thanks.

Comment entered 2020-05-07 17:01:45 by Juthe, Robin (NIH/NCI) [E]

This looks good to me too, although I noticed one small thing:

At the top of the Genetics Dictionary version of the report, it says "Glossary Dictionary". I think you meant to say "Genetics Dictionary". Thanks!

Comment entered 2020-05-07 17:27:05 by Kline, Bob (NIH/NCI) [C]

Indeed. Fixed. (All those G words!)

Comment entered 2020-05-28 14:31:43 by Juthe, Robin (NIH/NCI) [E]

Verified on DEV. Thanks!

Comment entered 2020-06-11 18:00:35 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2020-07-09 13:00:54 by Juthe, Robin (NIH/NCI) [E]

Verified on PROD.

Attachments
File Name Posted User
Health Professional Glossary Terms Report.docx 2020-02-13 09:36:34 Juthe, Robin (NIH/NCI) [E]
image-2020-04-29-08-10-04-271.png 2020-04-29 08:10:04 Kline, Bob (NIH/NCI) [C]
image-2020-04-30-18-45-18-133.png 2020-04-30 18:45:18 Kline, Bob (NIH/NCI) [C]
image-2020-04-30-18-45-53-018.png 2020-04-30 18:45:53 Kline, Bob (NIH/NCI) [C]
