CDR Tickets

Issue Number 4695
Summary [Glossary] keywords search report
Created 2019-10-31 12:42:34
Issue Type New Feature
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2021-04-22 13:24:11
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.251892
Description

This is a request for a new report that works like the Standard Wording report. It will primarily be used by the Spanish team to get a good sense of how terms have been translated across several documents to ensure consistency. I am attaching a draft requirements document.

Comment entered 2020-09-03 08:23:13 by Kline, Bob (NIH/NCI) [C]

Let me know when the requirements have been finalized.

Comment entered 2020-09-17 11:24:01 by Osei-Poku, William (NIH/NCI) [C]

The original requirements as attached are final now. Please use them for the implementation.

Comment entered 2020-09-23 10:22:55 by Osei-Poku, William (NIH/NCI) [C]

I have made slight modifications to the requirements. 

It is okay to have two separate reports, one each for Glossary and Media. Please let me know if you want to create 2 separate tickets. It will also be helpful to be able to search definition documents and not just the glossary documents. 

 

Glossary and Media Keyword search report_updated.docx

Comment entered 2020-09-23 12:28:04 by Kline, Bob (NIH/NCI) [C]

Please let me know if you want to create 2 separate tickets.

Yes, please.

Comment entered 2020-09-24 09:54:55 by Osei-Poku, William (NIH/NCI) [C]

Glossary Keyword search report.docx

I have modified the title to reflect that the focus will be on the Glossary keyword search report. I have also updated the requirements and removed the requirements for the media doc from it.

Comment entered 2020-10-07 09:53:32 by Kline, Bob (NIH/NCI) [C]

Ability to search the definitions of glossary term name and concept documents

I don't think term name documents have definitions. You list the term name string elements under this heading (alongside the definition elements). Don't we already have reports which find glossary terms by name (for example, the advanced glossary term search page)?

Comment entered 2020-10-07 16:06:04 by Osei-Poku, William (NIH/NCI) [C]

Glossary Keyword search report_Corrections.docx

I have made the correction in the attached updated requirements document. I believe there are reports for searching glossary terms, but I am not sure that there is one that will accomplish the same purpose as this request.

Comment entered 2020-10-07 16:42:45 by Kline, Bob (NIH/NCI) [C]

Well, as long as you have presented your business case for this report to Margaret and gotten it approved. In this case, you'll probably want to eliminate the sentence "We want to be able to search for specific words within a definition document." from the requirements document.

Which of the fields (language, audience, etc.) are required? For which can the user specify more than one value?

Please be specific as to which fields are used for selection, and which fields are used for determining where the keywords to be displayed should be drawn from. When you say "... specify the audience of the document" you imply that we are to find glossary terms which have a definition for the designated audience, but are to look in all definitions for the keywords. Is this right? Please say exactly what is meant by "specify the language."

Are the definitions to be searched as they are stored in the concept documents? Or are the placeholders to be resolved and the generated version of the definition searched for each of the term's names?

No mention is made of highlighting/rich text, so I assume that's not a requirement, right?

Comment entered 2020-10-08 05:35:49 by Kline, Bob (NIH/NCI) [C]

Perhaps it would be better if you were to explain at today's status meeting the underlying problem you would like to solve with this new report (including any helpful background information). Then it would be easier to determine what the specific requirements should be. Thanks.

Comment entered 2020-10-08 10:49:00 by Osei-Poku, William (NIH/NCI) [C]

Sure. 

I am attaching a revised document providing information on search criteria. I also updated the requirements, removing some of the desired features. Glossary Keyword search report_Corrections_Selection_Criteria.docx

Comment entered 2020-10-14 10:30:12 by Kline, Bob (NIH/NCI) [C]

Since this is one of the larger tickets we've undertaken in a while, could I get your approval before starting in on the design and implementation?

To summarize, this will involve

  • parsing each glossary term (possibly thousands of them)

  • generating separate versions of each definition in each concept with placeholders resolved for each of the concept's term name documents

  • normalizing the generated definitions and each of the term names (including alternate names) for searching for each of the phrases specified by the requestor of the report

  • custom code to create rich-text markup inside table cells (both HTML and Excel)

  • benchmarking the report to determine whether it exceeds the time limits for the web server and needs to be converted to run as a batch job (and, if so, performing that conversion)
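
As a rough illustration of the normalization step listed above, here is a minimal Python sketch. It is not the report's actual code: the function names `normalize` and `contains_phrase` are hypothetical, and matching only on word boundaries is an assumption rather than something the requirements state.

```python
import re
import unicodedata


def normalize(text):
    """Apply Unicode NFC, collapse runs of whitespace, and case-fold."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip().casefold()


def contains_phrase(text, phrase):
    """Report whether the phrase occurs in the text after normalizing both.

    Assumption: a match must start and end on a word boundary, so that
    "toenail" does not match inside "toenails".
    """
    pattern = r"(?<!\w)" + re.escape(normalize(phrase)) + r"(?!\w)"
    return re.search(pattern, normalize(text)) is not None
```

Normalizing both the stored text and the requested phrase in the same way keeps differences in case, line breaks, or accent encoding (which matter for the Spanish content) from hiding a match.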

Thanks!

Comment entered 2021-03-16 09:09:46 by Kline, Bob (NIH/NCI) [C]

Do we have the green light for this request?

Comment entered 2021-04-08 14:02:32 by Kline, Bob (NIH/NCI) [C]

We agreed in the 2021-04-08 status meeting that we will proceed with implementing this report. We decided we will use the same approach for highlighting the target terms using added characters instead of rich text. We also agreed that we will do the searching of the exported GlossaryTerm documents instead of the original GlossaryTermName and GlossaryTermConcept CDR documents, in order to be working with definitions which have already had placeholders resolved. This will mean that documents which have not yet been published will not be included in the report (this satisfies another of the requirements, that blocked documents be excluded from the report). It also means that only term names which have been published (exported) will be searched.
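
To illustrate highlighting with added characters rather than rich text, here is a minimal Python sketch. The `[[` and `]]` markers and the function name `highlight` are hypothetical choices for this example; the thread does not say which characters the report actually adds.

```python
import re


def highlight(text, phrases, opener="[[", closer="]]"):
    """Wrap each case-insensitive occurrence of any phrase in plain-text markers.

    The marker strings are illustrative assumptions, not the report's own.
    """
    # Try longer phrases first so "toenail cancer" wins over "cancer".
    candidates = sorted((p for p in phrases if p), key=len, reverse=True)
    if not candidates:
        return text
    pattern = re.compile("|".join(re.escape(p) for p in candidates), re.IGNORECASE)
    return pattern.sub(lambda m: f"{opener}{m.group(0)}{closer}", text)
```

Because the markers are ordinary characters, the same string can be written unchanged to both an HTML table cell and an Excel cell, which avoids custom rich-text markup for each output format.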

Comment entered 2021-04-21 15:47:29 by Kline, Bob (NIH/NCI) [C]

Same question about specifying wildcards in search terms as with the media keywords report.

Comment entered 2021-04-22 08:52:56 by Kline, Bob (NIH/NCI) [C]

 Can you clarify an ambiguity in the requirements, which talk about using Audience to "indicate whether to search Patient or HP definitions Or Patient or HP Terms." As far as I can tell, definitions are assigned a specific audience, but the terms themselves are not. A term document can have

  1. Patient definitions

  2. HP definitions

  3. Patient and HP definitions

  4. No definitions

... but the terms themselves are not HP or Patient terms.

So let's take a specific case. A glossary document has a name which contains "toenail cancer" and a single definition, with the audience set to "Health professional." The user gives "toenail cancer" as the phrase to be searched, and "Patient" as the audience. Should the report ignore this document or include it?

Comment entered 2021-04-22 13:24:11 by Kline, Bob (NIH/NCI) [C]

Implemented on CDR DEV.

Comment entered 2021-04-26 09:38:21 by Osei-Poku, William (NIH/NCI) [C]

Please ignore the requirements document references to the audience of the term (instead of the definition). By the way, we've done some testing on DEV and the report appears to work as expected. Thanks!

Comment entered 2021-04-29 10:50:57 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Thanks!

Comment entered 2021-05-25 12:01:47 by Osei-Poku, William (NIH/NCI) [C]

When a search term that is in both dictionaries (Patient and HP) is supplied, the definitions for both dictionaries are displayed even when the user selects one dictionary (in this case Health Professional). Please see attached screenshots.

Comment entered 2021-05-25 14:54:02 by Kline, Bob (NIH/NCI) [C]

The requirements say

Also, providing the CDR IDs of both the GTNs and GTCs, Term Names and Definitions, highlighting the retrieved terms and showing surrounding text if applicable

and

Elements to Search:

  1. Term Name string

  2. Definition Text

So the definitions are given both as elements to be displayed and as elements to be searched. The audience is used for filtering, that is, for determining which glossary terms are to be included on the report, not for determining what will be displayed for those terms. I assume that if we found the target term in the term name string but not in the definitions for that term's concept you wouldn't want us to ignore the instructions to display the definitions, right?

Comment entered 2021-05-26 13:18:04 by Osei-Poku, William (NIH/NCI) [C]


 I assume that if we found the target term in the term name string but not in the definitions for that term's concept you wouldn't want us to ignore the instructions to display the definitions, right?

That is right. Displaying the definitions is helpful and they should not be ignored. Is it the same case for the issue reported above?

Comment entered 2021-05-26 13:55:54 by Kline, Bob (NIH/NCI) [C]

Right. The way I read the requirements, the audience is used for deciding which documents are selected for the report, not which definitions should be displayed for those selected documents.
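
The following Python sketch shows one possible reading of "audience selects documents, not definitions." The thread does not spell out the exact selection rule, so the rule coded here is an assumption: a term qualifies if it has at least one definition for the chosen audience. The dictionary layout and the function name `select_terms` are also hypothetical.

```python
def select_terms(terms, phrase, audience=None):
    """Return the terms which qualify for the report.

    Assumed rule: a term qualifies when the phrase occurs in its name or in
    any of its definitions and, if an audience is given, the term has at
    least one definition for that audience. Selected terms keep all of
    their definitions for display.
    """
    needle = phrase.casefold()
    selected = []
    for term in terms:
        definitions = term["definitions"]
        texts = [term["name"]] + [d["text"] for d in definitions]
        found = any(needle in text.casefold() for text in texts)
        audience_ok = audience is None or any(
            d["audience"] == audience for d in definitions
        )
        if found and audience_ok:
            selected.append(term)
    return selected
```

Under this reading, a term with both a Patient and a Health professional definition is selected for either audience and shows both definitions, because the audience never filters what is displayed.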

Comment entered 2021-05-27 09:30:13 by Osei-Poku, William (NIH/NCI) [C]

For the same search term in the screenshot above ("autosomal dominant inheritance"), when the audience is changed from "Health Professional" to "Patient", only one definition is displayed. Shouldn't two definitions be displayed, just as when I selected "Health Professional"?

Comment entered 2021-05-27 10:18:37 by Kline, Bob (NIH/NCI) [C]

Tell you what. Modify the requirements to say explicitly which parameters control document selection (and how) and which parameters control data display (and how), and I'll make the report match those requirements.

Comment entered 2021-05-27 15:26:16 by Osei-Poku, William (NIH/NCI) [C]

The report works very well when there is only one definition block for a term. It also seems to display exactly as expected from other terms or definitions that are not directly linked to the search term. The inconsistencies seem to arise when the search term has multiple definition blocks in its own linked GTC document and you select "Health Professional" for the Audience option.

This is our preference for the display:

1. Display the definition only if the search term is in the definition ("Allele" is an example). I believe this is happening as expected when you select "Patient" for the Audience. However, when you select "Health Professional", the "Patient" definition is also displayed even though the term "Allele" is not in that definition.

2. When a search term is in both dictionaries (for example, "autosomal dominant inheritance"), the definition to display should match the Audience selected by the user. So, if a term is in both dictionaries and a user selects "Patient" as the Audience, please display only the "Patient" definition, and vice versa.

3. If a search term is in both dictionaries (for example, "autosomal dominant inheritance") and a user selects "Any" for the Audience option, please display all the definitions from the GTC that is directly linked to it.

Comment entered 2021-06-01 08:33:24 by Kline, Bob (NIH/NCI) [C]

Does the shift in requirements (don't show a definition at all if it doesn't contain one of the search terms) also apply to the name column (don't show the name at all if it doesn't contain one of the search terms)?

Comment entered 2021-06-01 09:28:41 by Kline, Bob (NIH/NCI) [C]

While waiting for the answer to my last question, I have

  1. applied the new rule for leaving out display of definitions which don't contain any of the search terms; and

  2. corrected "Health Professional" (which is what the requirements have) in the picklist to "Health professional" (which is what's in the documents); that should eliminate some of the anomalies you ran into.
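
A minimal Python sketch of the two changes just listed follows. The function names and the dictionary layout are hypothetical, and the exact, case-sensitive audience comparison is an assumption about why the picklist spelling mattered.

```python
def definitions_with_matches(definitions, phrases):
    """Keep only the definitions which contain at least one search phrase."""
    needles = [p.casefold() for p in phrases if p]
    return [
        d for d in definitions
        if any(n in d["text"].casefold() for n in needles)
    ]


def has_audience(definitions, audience):
    """Exact, case-sensitive comparison against the stored audience value.

    With this kind of comparison "Health Professional" never matches
    documents which store "Health professional".
    """
    return any(d["audience"] == audience for d in definitions)
```

If the comparison is exact, a picklist value spelled differently from the stored value matches nothing, which would be consistent with the anomalies reported for the Health Professional option.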

These changes are only on DEV right now.

Comment entered 2021-06-01 12:14:38 by Osei-Poku, William (NIH/NCI) [C]


Does the shift in requirements (don't show a definition at all if it doesn't contain one of the search terms) also apply to the name column (don't show the name at all if it doesn't contain one of the search terms)?

Please show the name. It does not apply to the name column.

Comment entered 2021-06-01 15:46:53 by Kline, Bob (NIH/NCI) [C]

OK. When I get the word that the changes I have installed on DEV produce the results you now want, I'll install those changes on QA.

Comment entered 2021-06-02 17:41:24 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Please install on QA. Thanks!

Comment entered 2021-06-02 17:56:18 by Kline, Bob (NIH/NCI) [C]

Modifications installed on QA.

Comment entered 2021-06-08 16:58:36 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2021-06-21 12:30:02 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Attachments
File Name Posted User
Glossary and Media Keyword search report_updated.docx 2020-09-23 10:22:35 Osei-Poku, William (NIH/NCI) [C]
Glossary and Media Keyword search report.docx 2019-10-31 12:41:35 Osei-Poku, William (NIH/NCI) [C]
Glossary Keyword search report_Corrections_Selection_Criteria.docx 2020-10-08 10:48:56 Osei-Poku, William (NIH/NCI) [C]
Glossary Keyword search report_Corrections.docx 2020-10-07 15:59:32 Osei-Poku, William (NIH/NCI) [C]
Glossary Keyword Search Report_results.PNG 2021-05-25 12:01:08 Osei-Poku, William (NIH/NCI) [C]
Glossary Keyword search report.docx 2020-09-24 09:53:47 Osei-Poku, William (NIH/NCI) [C]
Glossary Keyword Search Report.PNG 2021-05-25 12:01:08 Osei-Poku, William (NIH/NCI) [C]
