CDR Tickets

Issue Number 4072
Summary PCIB Statistics Report - Genetics Dictionary Terms Counted Twice
Created 2016-04-01 09:56:17
Issue Type Bug
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2016-11-10 13:44:08
Resolution Won't Fix
Path /home/bkline/backups/jira/ocecdr/issue.181624
Description

I think the Genetics Dictionary New & Revised terms are being counted twice on this report - both as new/revised dictionary terms and as new/revised Genetics Dictionary terms. They should only be counted for the Genetics Dictionary totals (unless the same term was also added to the Dictionary of Cancer Terms in the same time frame).

Comment entered 2016-06-21 13:10:31 by Englisch, Volker (NIH/NCI) [C]

I looked at the report for May 2016 and there are 4 new genetics terms listed:

  • X-linked dominant

  • antioncogene

  • insertion

  • tumor suppressor gene

Only two of these are also counted in the section for new glossary terms:

  • X-linked dominant

  • insertion

My first question: Is the list of new genetics terms correct - should the list include 2 or 4 terms?

Comment entered 2016-08-16 16:57:43 by Juthe, Robin (NIH/NCI) [E]

The list of 4 genetics terms is correct, but the terms X-linked dominant and insertion were not added to the dictionary of cancer terms, so they should not show up there.

Tumor suppressor gene and antioncogene were already in the dictionary of cancer terms, so that's why only two of them showed up as "new" in the patient dictionary section. It's tough to tell if they are also being counted as revised terms in the patient dictionary since we don't show the list of terms, but I suspect they might be.
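
The counting rule requested in this ticket can be illustrated with the May 2016 terms listed above. This is only a minimal sketch, not the report's actual code: the function name `new_terms_by_dictionary`, the short dictionary label, and the input mapping are all hypothetical. It assumes we already know which dictionaries each term was newly added to during the reporting period.

```python
from collections import defaultdict


def new_terms_by_dictionary(added_to):
    """Group term names by the dictionaries they were newly added to.

    `added_to` maps a term name to the set of dictionaries the term was
    newly added to during the reporting period (hypothetical input).
    """
    by_dictionary = defaultdict(set)
    for term, dictionaries in added_to.items():
        for dictionary in dictionaries:
            by_dictionary[dictionary].add(term)
    return {name: sorted(terms) for name, terms in by_dictionary.items()}


# May 2016 example from this ticket: all four terms are new to the
# Genetics Dictionary, and none was newly added to the Dictionary of
# Cancer Terms in that period.
may_2016 = {
    "X-linked dominant": {"Genetics"},
    "antioncogene": {"Genetics"},
    "insertion": {"Genetics"},
    "tumor suppressor gene": {"Genetics"},
}
grouped = new_terms_by_dictionary(may_2016)
counts = {name: len(terms) for name, terms in grouped.items()}
print(counts)  # {'Genetics': 4}
```

Under this rule a term contributes to a dictionary's total only if it was added to that dictionary, so a term added to both dictionaries in the same period would legitimately be counted once in each.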

Comment entered 2016-08-23 17:39:10 by Englisch, Volker (NIH/NCI) [C]

Is this ticket still relevant after Bob's rewrite of the PCIB report?
I'm just starting to look at the email thread about the PCIB report changes, but it sounded as though he fixed a few bugs along the way.

Comment entered 2016-08-25 13:48:25 by Juthe, Robin (NIH/NCI) [E]

This ticket is still relevant, as in it hasn't been fixed, although I think we're going to live with things the way they are because it would be a lot of work to get this exactly right.

I'm pasting Bob's email comment below, as this helps explain why this would be so difficult. That way we'll have it the next time we go hunting this down.

"Modifying the report to determine when a term was first published in a particular dictionary would be a massive change, and I would be inclined to say it would be out of scope even for the extensive rewriting I'm doing for this task. I can think of a couple of approaches which could be taken to implement this change.

In the first approach, the report software would have to potentially retrieve and parse every published version of every term name document as well as figure out which version of the concept document was current at the time of the publication and parse that, too (imagine, in an extreme example, a document with many versions going back ten years or more, published in dictionary X at the beginning of that time, then removed from that dictionary, then put back in the dictionary in the last version). To do it really right, we'd even have to continue going backward through the GlossaryTerm document versions prior to the term-name/concept split. Even if we take a shortcut and say we can limit how far back we have to look by treating a term as "first published" in a dictionary a second time in the edge case I just described, we'd still be in the business of looking through at least some older versions of the documents.

A second approach would involve modifying the schema to add tracking of the first publication date for each dictionary. We'd have to decide how this information gets maintained (both manual maintenance and automated maintenance have potential pitfalls and challenges), as well as whether we'd want some kind of global change to seed the new elements. If we went the automated maintenance route, we might want to consider a new database table instead of modifying the documents themselves. Modifying the already complicated publishing software to keep such a table populated would not be for the faint of heart. :-)"
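
For future reference, the first approach and its shortcut can be sketched roughly as below. This is a hedged illustration only, not CDR code: the function `first_published`, the `(publication_date, dictionaries)` tuples, and the example dates are all hypothetical, and the sketch assumes the hard part (retrieving and parsing each published version to learn which dictionaries it appeared in) has already been done.

```python
from datetime import date


def first_published(versions, dictionary, shortcut=False):
    """Return the date a term was first published in `dictionary`, or None.

    `versions` is a list of (publication_date, dictionaries) tuples, one per
    published version, sorted oldest first (hypothetical structure).

    With shortcut=True, scan backward from the newest version and stop at
    the first version lacking the dictionary, so a term that was removed and
    later restored counts as first published at the restoration. In that
    mode the result is None if the newest version is not in the dictionary.
    """
    if not shortcut:
        # Full scan: the earliest version that included the dictionary wins.
        for published, dictionaries in versions:
            if dictionary in dictionaries:
                return published
        return None
    first = None
    for published, dictionaries in reversed(versions):
        if dictionary not in dictionaries:
            break
        first = published
    return first


# Edge case from the email: published in dictionary X, removed, then restored.
# The dates are made up for illustration.
versions = [
    (date(2006, 1, 15), {"X"}),
    (date(2010, 6, 1), set()),
    (date(2016, 5, 2), {"X"}),
]
print(first_published(versions, "X"))                 # 2006-01-15
print(first_published(versions, "X", shortcut=True))  # 2016-05-02
```

The two calls disagree on exactly the edge case described in the email, which is why the shortcut would need to be an explicit decision and not just an optimization.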

Comment entered 2016-11-10 13:43:32 by Osei-Poku, William (NIH/NCI) [C]

Closed in the status meeting.
