CDR Tickets

Issue Number 5105
Summary Possible report to generate counts of various PDQ content
Created 2022-03-18 12:45:58
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2023-03-14 15:41:05
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.313461
Description

I often pull together counts of published PDQ content of various types (see attached PPT slide as an example). We have various lists reports already, and the number of DoCT terms is on Cancer.gov, but it would be helpful to have a single report that could be generated so I don't have to look in a bunch of different places. Let's discuss.

Comment entered 2022-03-31 13:51:50 by Kline, Bob (NIH/NCI) [C]

Robin will add specific criteria for any rows on the report which aren't obvious (particularly the media counts). Bob will then come up with an ad-hoc report and we'll take it from there.

Comment entered 2023-03-10 17:06:26 by Juthe, Robin (NIH/NCI) [E]

I think we discussed that this would likely be an ad-hoc report, but here are the numbers I'm interested in getting and where I think they could come from:

  • PDQ English HP summaries

    • Summaries Lists report - please use Summaries & Modules selection

  • PDQ Spanish HP summaries

    • Summaries Lists report - please use Summaries & Modules selection

  • PDQ English patient summaries

    • Summaries Lists report - please use Summaries & Modules selection

  • PDQ Spanish patient summaries

    • Summaries Lists report - please use Summaries & Modules selection

  • PDQ English SVPC summaries

    • Summaries Lists report

  • PDQ Spanish SVPC summaries

    • Summaries Lists report

  • Drug information summaries (only available in English)

    • Drug Info Summaries Lists report - please combine total of single agent summaries & combination summaries

  • NCI Dictionary of Cancer Terms in English

    • I currently get this from Cancer.gov - I'd like the number of published glossary terms rather than the number of definitions

  • NCI Dictionary of Cancer Terms in Spanish

    • I currently get this from Cancer.gov - I'd like the number of published glossary terms rather than the number of definitions

  • NCI Dictionary of Genetics Terms in English

    • I currently get this from Cancer.gov - I'd like the number of published glossary terms rather than the number of definitions

  • NCI Dictionary of Genetics Terms in Spanish

    • I currently get this from Cancer.gov - I'd like the number of published glossary terms rather than the number of definitions

  • NCI Drug Dictionary Terms (only available in English)

    • I currently get this from Cancer.gov by manually counting the number of terms for each letter. I don't think we have an existing report to generate this statistic but I think we could get this number by identifying publishable Term documents that have a definition block and a semantic type of drug/agent.

  • Biomedical Images and Animations

    • Media Lists report - this should combine the number of images + videos (unless we start adding a bunch of other types of videos, this seems to make the most sense - we have just 3 animations) BUT I'd like to exclude images that we're reusing from journals or something like that. To do that I think you could exclude any images that have a Permission Information block (see CDR788250 as an example)
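
The selection rules proposed for the last two rows (drug dictionary terms and biomedical images/animations) could be sketched roughly as follows. This is only an illustration of the counting logic: the record classes, field names, and the "drug/agent" label are hypothetical stand-ins for whatever the real Term and Media documents contain, and the sample data is made up.

```python
from dataclasses import dataclass

# Hypothetical, simplified records. The real CDR documents are XML; these
# field names are illustrative stand-ins, not the actual schema.
DRUG_AGENT = "drug/agent"  # assumed label for the semantic type


@dataclass
class TermDoc:
    publishable: bool
    has_definition: bool
    semantic_types: tuple = ()


@dataclass
class MediaDoc:
    kind: str  # e.g. "image" or "video"
    has_permission_info: bool = False


def count_drug_terms(terms):
    """Count publishable terms with a definition and a drug/agent semantic type."""
    return sum(
        1
        for t in terms
        if t.publishable
        and t.has_definition
        and any(s.lower() == DRUG_AGENT for s in t.semantic_types)
    )


def count_original_media(media):
    """Count videos plus images, skipping images that carry a permission block."""
    return sum(
        1
        for m in media
        if m.kind == "video" or (m.kind == "image" and not m.has_permission_info)
    )


# Made-up sample data, purely to exercise the two rules.
sample_terms = [
    TermDoc(True, True, ("Drug/agent",)),   # counted
    TermDoc(True, False, ("Drug/agent",)),  # skipped: no definition
    TermDoc(False, True, ("Drug/agent",)),  # skipped: not publishable
    TermDoc(True, True, ("Gene",)),         # skipped: other semantic type
]
sample_media = [
    MediaDoc("image"),                            # counted
    MediaDoc("image", has_permission_info=True),  # skipped: reused image
    MediaDoc("video"),                            # counted
    MediaDoc("audio"),                            # skipped: not image/video
]

print(count_drug_terms(sample_terms), count_original_media(sample_media))
```

Note that the permission check is applied to images only, since that is how the exclusion is worded above; whether it should also apply to videos is an open question.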

I'm sure there will be questions, so just let me know as those come up. Thank you!!

Comment entered 2023-03-14 09:08:50 by Kline, Bob (NIH/NCI) [C]

Do you want the counts restricted to documents which have actually been published?

Edit: answering my own question: I guess not, since you've asked for modules to be included, and they won't show up in the pub_proc_cg table.

Second edit: I guess it's not as simple as that, as I see that for the glossary terms you want the published documents.

Comment entered 2023-03-14 10:48:55 by Kline, Bob (NIH/NCI) [C]

Are we going back to using "PDQ" when referring to the SVPC summaries, then?

Comment entered 2023-03-14 15:41:05 by Kline, Bob (NIH/NCI) [C]

Report attached.

Comment entered 2023-03-14 16:10:32 by Kline, Bob (NIH/NCI) [C]
Comment entered 2023-03-15 22:00:48 by Juthe, Robin (NIH/NCI) [E]

I already love this report. 🙂 It will be a big time saver for me.

In response to your first question, I'd like this to reflect publishable documents for all categories. I see your point about summaries and modules, but if possible I'd like to include only the publishable modules (those that are published as both a module and a standalone summary – in other words, they have a "yes" value for the AvailableAsModule attribute but no value for the ModuleOnly attribute).
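
The module rule described above might be expressed along these lines. It is a minimal sketch that assumes both attributes sit on the root element of the summary document and that "yes" is compared case-insensitively; the sample XML strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_publishable_module(xml_text):
    """True if the document is usable as a module AND as a standalone summary.

    Assumption: AvailableAsModule and ModuleOnly are attributes of the root
    element. A missing or empty ModuleOnly counts as "no value".
    """
    root = ET.fromstring(xml_text)
    available = (root.get("AvailableAsModule") or "").strip().lower() == "yes"
    module_only = (root.get("ModuleOnly") or "").strip()
    return available and not module_only


# Invented sample documents covering the three cases discussed in this ticket.
both = '<Summary AvailableAsModule="Yes"/>'
only_module = '<Summary AvailableAsModule="Yes" ModuleOnly="Yes"/>'
plain_summary = "<Summary/>"

for doc in (both, only_module, plain_summary):
    print(is_publishable_module(doc))
```

Only the first sample satisfies the rule; the second is the "module only" case the Summaries Lists report includes, and the third is an ordinary summary that is not a module at all.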

In response to your second question, yes, that was a mistake to include "PDQ" at the beginning of the SVPC summary items. Please remove PDQ from those rows. Thanks!

One additional request: for biomedical images and animations, is it possible to break that down by language?

Comment entered 2023-03-16 09:21:40 by Kline, Bob (NIH/NCI) [C]

A while ago I tried to explain that we were inviting confusion by our use of the unqualified word "module" to refer to different things at different times. I don't think I explained the problem very well, as the response I got was mostly along the lines of "we always know what we mean." This ticket provides a pretty good example of the confusion I was ineptly trying to warn about. For the Summaries Lists report, I was explicitly told that for the context of that report the word "module" referred to a document which could not be published separately. So when your original requirements for this ticket asked for using the Summaries and Modules selection logic of the Summaries Lists report, I created queries for this new report which also include documents which can only be used as modules, just as the Summaries Lists report does. However, from your most recent comment, I can see that this was not what you really wanted, and you don't want the "modules" (as defined for that older report). I'll rewrite the queries. 😉

As for splitting the media by language, can I assume that I should use the presence of a TranslationOf element to indicate Spanish, and the absence of that element to mean English? (The @language attributes sprinkled around in those documents are kind of a muddle, and therefore unreliable, as they contradict each other; see, for example, the discussion in OCECDR-5095.) I'll wait for your answer to this question before proceeding with making the requested changes.
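
To make the proposed rule concrete, here is a rough sketch of the split: any document containing a TranslationOf element is counted as Spanish, everything else as English. It assumes the element can be found anywhere below the root without a namespace prefix, and the sample documents (including the ID) are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def media_language(xml_text):
    """Classify a media document by the presence of a TranslationOf element."""
    root = ET.fromstring(xml_text)
    # Assumption: TranslationOf appears somewhere below the root element.
    return "Spanish" if root.find(".//TranslationOf") is not None else "English"


def count_by_language(docs):
    """Tally documents per language using the rule above."""
    return Counter(media_language(doc) for doc in docs)


# Invented sample documents; the ID is a placeholder, not a real CDR ID.
sample_docs = [
    "<Media><MediaTitle>Original</MediaTitle></Media>",
    "<Media><TranslationOf>CDR0000000001</TranslationOf></Media>",
    "<Media><MediaTitle>Another original</MediaTitle></Media>",
]

counts = count_by_language(sample_docs)
print(counts["English"], counts["Spanish"])
```

The @language attributes are deliberately ignored here, for the reason given above.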

If the Media documents really are all language-specific, as William says (and as your request implies), I have to wonder why we don't have a required Language element or attribute at the top level of the document, as we do for Summary documents. 😛

Comment entered 2023-03-16 09:33:37 by Kline, Bob (NIH/NCI) [C]

One more clarification. The original requirements asked that the report include only "published" glossary terms, but your most recent comment says "I'd like this to reflect publishable documents for all categories." I just want to confirm that you want me to remove the restriction to glossary documents which show up in the pub_proc_cg table (that is, they're actually available to the web site and the data partners) and only make sure that a publishable version exists and the document isn't blocked. Won't make a huge difference most of the time, but there can be a gap.
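
The gap between the two definitions can be illustrated with plain sets. In this sketch the dictionary keys and the `published_ids` collection are hypothetical; `published_ids` merely stands in for the document IDs that appear in the pub_proc_cg table, and the sample values are made up.

```python
def publishable_ids(docs):
    """IDs of documents with a publishable version which are not blocked."""
    return {
        d["id"]
        for d in docs
        if d["has_publishable_version"] and not d["blocked"]
    }


def publication_gap(docs, published_ids):
    """IDs which are publishable but have not (yet) been published."""
    return publishable_ids(docs) - set(published_ids)


# Made-up sample: four glossary documents, two of them already published.
sample_docs = [
    {"id": 1, "has_publishable_version": True, "blocked": False},
    {"id": 2, "has_publishable_version": True, "blocked": False},
    {"id": 3, "has_publishable_version": True, "blocked": True},
    {"id": 4, "has_publishable_version": False, "blocked": False},
]
published_ids = [1]  # stand-in for the IDs found in pub_proc_cg

print(sorted(publishable_ids(sample_docs)))
print(sorted(publication_gap(sample_docs, published_ids)))
```

With the "publishable" rule the count is the size of the first set; with the "published" rule it is the number of IDs in `published_ids`. The difference between the two is the gap mentioned above.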

Comment entered 2023-03-16 13:12:30 by Juthe, Robin (NIH/NCI) [E]

Let's discuss these questions/clarifications in our status meeting shortly.

Comment entered 2023-03-16 13:45:22 by Juthe, Robin (NIH/NCI) [E]

We discussed adding this report to the menus. Please add it to the OCC Board Managers page, under PCIB Management Reports. It could be #3: PDQ Content Counts.

Thank you!

Comment entered 2023-03-17 04:34:36 by Kline, Bob (NIH/NCI) [C]

All the enhancements have been implemented, and the report is installed (on DEV, though the data comes from PROD) on the admin menu as requested. I've got menu entries for both HTML and Excel. If you'd prefer to have just one format on the menu, let me know which one and I'll remove the other.

Comment entered 2023-03-21 13:53:42 by Juthe, Robin (NIH/NCI) [E]

Fine to have both options. Looks great on DEV!

Comment entered 2023-05-11 14:31:20 by Juthe, Robin (NIH/NCI) [E]

The HTML version of this report looks good on QA, but I'm getting a "Failed - Disk Full" message when I try to run the Excel version. Is it just me?

Comment entered 2023-05-11 14:55:36 by Kline, Bob (NIH/NCI) [C]

I just tried it without any problems. And looking at the QA server, all the disks seem to have plenty of free space. Want to do a screen share?

Comment entered 2023-05-12 20:56:50 by Juthe, Robin (NIH/NCI) [E]

Turned out to be a problem with disk space on the virtual machine. Once that was resolved, I was able to verify this on QA. Thanks!

Comment entered 2023-07-05 08:57:32 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Attachments
File Name                            Posted                 User
PDQ Content Counts.xlsx              2023-03-14 15:39:42    Kline, Bob (NIH/NCI) [C]
What is PDQ Now - Statistics.pptx    2022-03-18 12:45:55    Juthe, Robin (NIH/NCI) [E]
