CDR Tickets

Issue Number 5020
Summary [GovDelivery] Modify GovDelivery report to include only published documents
Created 2021-08-23 14:48:07
Issue Type New Feature
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2023-02-14 18:33:04
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.297110
Description

The linked summary documents in the GovDelivery email could refer to a summary that has yet to be published to the live site, making the report misleading. Other document types may be affected as well, depending on when the user makes a document publishable. Please review and modify the report as appropriate. I am including the email exchange with Volker below:

 

From: Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>
Sent: Monday, August 23, 2021 12:29 PM
To: Osei-Poku, William (NIH/NCI) [C] <oseipokuw@mail.nih.gov>
Subject: Re: [PROD] GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)

 

Hi William,

 

  1. It seems that the links for the HP titles are pointing to the “Changes” section which only exists for HP summaries.

  2. You are correct, that is misleading.  The problem here is the result of a timing issue.
    From what I can see, version 187 of the summary was part of the publishing summary set on Friday, but the summary was not pushed to Cancer.gov because there were no changes.  The publishing job starts at 4pm, which is our cut-off, but at 4:55 a new, publishable version of the summary was created (V-189).  When the summary is saved as a publishable version, the table ‘query_term_pub’ gets updated.
    On Sunday morning at 0:30am the scheduled GovDelivery report runs to check for summary updates for the previous week (Sun – Sat), using the DLM that’s stored in the table ‘query_term_pub’ as the inclusion criterion.  Therefore, because the summary was made publishable after the publishing job ran but before the GovDelivery report started, it appeared on this week’s report.

 

Please file a ticket if you feel that this edge case needs to be addressed.  It seems to me that it would be simple to drop this summary from the report by double-checking if it was part of the push to Cancer.gov.  However, we would need to think about how to deal with the fact that you would want this document to be included in the report that will run in the following week even though its DLM does not match the specified time frame.

 

Thanks,

 

        Volker

Volker Englisch
NCI OCPL – Office of Communications & Public Liaison

Contractor: publicis sapient
NCI: 240-276-6583

 

 

From: "Osei-Poku, William (NIH/NCI) [C]" <oseipokuw@mail.nih.gov>
Date: Monday, August 23, 2021 at 08:07
To: Volker Englisch <volker@mail.nih.gov>
Subject: FW: [PROD] GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)

 

Good morning Volker,

I have two questions regarding this report –

  1. Is there a reason why there are links for the health professional titles but no links for patient summaries or other document types?

  2. Are the summaries included based on the publishable state or the published state? If you look at 454520, the DLM is in April, while the report heading says “GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)”. It seems misleading.

 

Thanks,

William

 

 

From: NCI PDQ Operator <NCIPDQOperator@mail.nih.gov>
Sent: Sunday, August 22, 2021 12:32 AM
To: Shields, Victoria (NIH/NCI) [E] <vshields@mail.nih.gov>; Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>; Beckwith, Margaret (NIH/NCI) [E] <mbeckwit@icic.nci.nih.gov>; Baldwin, Robin (NIH/NCI) [E] <robin@mail.nih.gov>; Osei-Poku, William (NIH/NCI) [C] <oseipokuw@mail.nih.gov>; Broun, Kevin (NIH/NCI) [E] <brounk@mail.nih.gov>; Ferguson, Bonnie (NIH/NCI) [C] <bonnie.ferguson@nih.gov>; Norwood, Christina (NIH/NCI) [E] <christina.norwood@nih.gov>; Reyes, Kimberly (NIH/NCI) [E] <reyesk@mail.nih.gov>; Hansen, Leshia (NIH/NCI) [E] <leshia.hansen@nih.gov>
Subject: [PROD] GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)

 

GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)

Report date: 2021-08-22

New Health Professional Summaries

None

 

Revised Health Professional Summaries

CDR ID  Title
62906   Adult Primary Liver Cancer Treatment
62787   Breast Cancer Treatment (Adult)
62726   Rectal Cancer Treatment

 

New Patient Summaries

None

 

Revised Patient Summaries

CDR ID  Title
258011  Bile Duct Cancer (Cholangiocarcinoma) Treatment
454520  Cancer Prevention Overview
258000  Childhood Acute Myeloid Leukemia/Other Myeloid Malignancies Treatment
62811   Fatigue
415235  Wilms Tumor and Other Childhood Kidney Tumors Treatment

 

New Drug Information Summaries

CDR ID  Title
805797  Asparaginase Erwinia Chrysanthemi (Recombinant)-rywn

 

Revised Drug Information Summaries

CDR ID  Title
717920  Asparaginase Erwinia Chrysanthemi
804890  Dostarlimab-gxly

Comment entered 2021-08-23 18:40:50 by Englisch, Volker (NIH/NCI) [C]

This report was created with OCECDR-3989 in 2015.  Looking at the original requirements, the report was designed to look at the information for publishable versions.  Before we make changes and compare the output to the information displayed on Cancer.gov, I would like others to weigh in as well.

A solution to this "misleading" display of summaries on the report might be as simple as changing the report title to make clear what we're looking at.

Comment entered 2021-08-26 14:01:15 by Kline, Bob (NIH/NCI) [C]

Volker will capture the current logic and report job timing and post them to this ticket. Then we might want to talk to Kim and clarify what is really needed for this report. Volker has copies of the older reports, in case we need to look at them to find out what's included. These discussions might evolve into a broader investigation into expanding the rôle of the GovDelivery "newsletter" to do more PDQ marketing/branding.

Comment entered 2021-08-26 23:00:25 by Englisch, Volker (NIH/NCI) [C]

The GovDelivery report is actually two reports that are run via our scheduler.  The "English" version runs on Sundays at 0:30h while the "Spanish" version runs on Sundays at 2:30h.  Originally these two reports were run at the same time, but the Spanish version started failing when run together with the English version, so we separated them.

The report is delivered as an email message that includes 6 sections (4 sections in Spanish):

  • New HP Summaries

  • Revised HP Summaries

  • New Patient Summaries

  • Revised Patient Summaries

  • New DIS (English only)

  • Revised DIS (English only)

When started with default options, the report includes the documents with versions made publishable between the previous Sunday and the previous day (Saturday). Note that this is the default date range for the scheduled job because it is scheduled to run on a Sunday.  Running the job on a Thursday, for instance, would cover the date range from the previous Thursday through Wednesday.
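
The default date range described above can be sketched as follows. This is only an illustration of the arithmetic, not the code of the actual report script; the function name default_date_range is hypothetical.

```python
from datetime import date, timedelta


def default_date_range(run_date):
    """Return (start, end) for the seven days ending the day before run_date.

    Hypothetical helper: illustrates the default range described in the
    ticket, not the actual report implementation.
    """
    end = run_date - timedelta(days=1)    # the previous day
    start = run_date - timedelta(days=7)  # same weekday, one week earlier
    return start, end


# Scheduled Sunday run (2021-08-22): previous Sunday through Saturday.
sunday_start, sunday_end = default_date_range(date(2021, 8, 22))
print(sunday_start.isoformat(), "to", sunday_end.isoformat())

# Manual Thursday run (2021-08-26): previous Thursday through Wednesday.
thursday_start, thursday_end = default_date_range(date(2021, 8, 26))
print(thursday_start.isoformat(), "to", thursday_end.isoformat())
```

For the Sunday run this reproduces the range shown in the report heading quoted earlier in this ticket (2021-08-15 to 2021-08-21).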

The selection of new summaries looks for the "first_pub" value of the document table to fall within the date range.
The selection of revised summaries looks for the value "DateLastModified" to fall within the date range.
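
As a rough sketch of the two selection rules, assuming the values have already been fetched into plain Python records: the record layout, the sample documents, and the function names below are made up for illustration, and the real report queries the CDR database instead. Whether a document may appear in both lists is not stated in this ticket, so the two filters are kept independent here.

```python
from datetime import date

# Hypothetical in-memory records; IDs and dates are invented for the example.
docs = [
    {"id": 1, "first_pub": date(2021, 8, 17), "dlm": None},
    {"id": 2, "first_pub": date(2015, 3, 1), "dlm": date(2021, 8, 19)},
    {"id": 3, "first_pub": date(2015, 3, 1), "dlm": date(2021, 4, 7)},
]


def in_range(value, start, end):
    """True if value is set and falls within start..end (inclusive)."""
    return value is not None and start <= value <= end


def select_new(records, start, end):
    """New summaries: first_pub falls within the date range."""
    return [r["id"] for r in records if in_range(r["first_pub"], start, end)]


def select_revised(records, start, end):
    """Revised summaries: DateLastModified falls within the date range."""
    return [r["id"] for r in records if in_range(r["dlm"], start, end)]


start, end = date(2021, 8, 15), date(2021, 8, 21)
print("new:", select_new(docs, start, end))
print("revised:", select_revised(docs, start, end))
```

Document 3 in the sample is picked up by neither rule because its DLM lies outside the range.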

Comment entered 2021-08-26 23:13:38 by Englisch, Volker (NIH/NCI) [C]

Before making any changes to this report, please note that we have an issue with the precision of the dates we're using.  We can identify the documents that are published between two publishing events, say the documents published between Friday, 4pm of last week and Friday, 4pm of this week.  However, the precision of the DateLastModified is a day, e.g. 2021-08-26.  If two documents are revised on the day the publishing job runs, one made publishable before the job started and the other after, the DLM element alone is not sufficient to assign these documents to this week's report or next week's report.
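
A small illustration of the precision problem, using made-up timestamps on a Friday with an assumed 4pm job start: full timestamps can tell the two documents apart, while day-precision dates cannot.

```python
from datetime import datetime

# Assumed values for illustration only.
job_start = datetime(2021, 8, 27, 16, 0)           # Friday, 4pm publishing job
made_publishable_a = datetime(2021, 8, 27, 10, 15)  # before the job started
made_publishable_b = datetime(2021, 8, 27, 16, 55)  # after the job started

# With full timestamps the two documents fall on opposite sides of the job.
a_before_job = made_publishable_a < job_start
b_before_job = made_publishable_b < job_start
print(a_before_job, b_before_job)

# With day precision (as in DateLastModified) both collapse to the same value,
# so the DLM alone cannot say which report each document belongs to.
same_day = made_publishable_a.date() == made_publishable_b.date()
print(same_day)
```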

Comment entered 2021-08-27 11:52:26 by Englisch, Volker (NIH/NCI) [C]

Attaching screenshots of most recent GovDelivery reports (English version):

 

 

 

Comment entered 2021-09-02 18:05:06 by Osei-Poku, William (NIH/NCI) [C]

This is another example Ning/Carolyn just brought to my attention. This is from the official GovDelivery email that goes to the public. The Gastrointestinal Complications link goes to a summary that has not yet been updated on Cancer.gov because it was made publishable late last Friday, 8/27/21.

 

You are subscribed to PDQ Updates for Health Professionals from the National Cancer Institute. NCI newly published or updated these health professional summaries within the last week.

 

Prostate Cancer Prevention

Comment entered 2021-09-02 19:31:58 by Osei-Poku, William (NIH/NCI) [C]

Adding a screenshot of the changes section since this will be updated after this week's publishing. 

Comment entered 2022-01-06 14:03:49 by Kline, Bob (NIH/NCI) [C]

Robin will look at this ticket and decide whether we should pursue this now.

Comment entered 2022-01-06 15:28:02 by Juthe, Robin (NIH/NCI) [E]

Victoria and I reviewed this ticket and we'd like to move ahead with making this change.

Comment entered 2022-10-31 19:30:38 by Englisch, Volker (NIH/NCI) [C]

The GovDelivery report specifies a date range from (the previous) Sunday up to (and excluding) Sunday.  With the requested changes we'll have to expand this date range to include those outlier documents up to the previous Export job running prior to the previous Sunday, therefore actually showing a date range from Friday to Sunday (or possibly Saturday to Sunday if the Friday job failed and was rerun over the weekend).

Currently, those date ranges shown are, for instance, 
2022-09-25 to 2022-10-01
2022-10-02 to 2022-10-08
2022-10-09 to 2022-10-15, etc.

With the new date ranges consecutive weeks would show overlapping date ranges:
2022-09-23 to 2022-10-01
2022-09-30 to 2022-10-08
2022-10-07 to 2022-10-15, etc.

Would you prefer to keep the one-week headers or the actual date ranges?
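
To make the two header options concrete, here is a sketch that reproduces both lists of date ranges above. It assumes the export job ran on the Friday two days before the previous Sunday; the function names are hypothetical.

```python
from datetime import date, timedelta


def one_week_range(report_sunday):
    """Current header: previous Sunday through Saturday."""
    return report_sunday - timedelta(days=7), report_sunday - timedelta(days=1)


def expanded_range(report_sunday):
    """Proposed header: start at the Friday export job before the previous Sunday.

    Assumes the job ran on Friday; a rerun on Saturday would shift the start.
    """
    start, end = one_week_range(report_sunday)
    return start - timedelta(days=2), end


report_days = [date(2022, 10, 2), date(2022, 10, 9), date(2022, 10, 16)]
for day in report_days:
    old_start, old_end = one_week_range(day)
    new_start, new_end = expanded_range(day)
    print(f"{old_start} to {old_end}   |   {new_start} to {new_end}")

# Consecutive expanded ranges overlap: the next start is not after the previous end.
first, second = expanded_range(report_days[0]), expanded_range(report_days[1])
overlaps = second[0] <= first[1]
print("overlap:", overlaps)
```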

Comment entered 2022-12-12 20:25:39 by Englisch, Volker (NIH/NCI) [C]

 

The GovDelivery report picks up documents for which the DLM (Date Last Modified) is within a specific date range - in the image above, between the Start and End dates.  Only Doc 2 satisfies this condition and will be picked up by our report.  However, if a document is made publishable after the weekly publishing job started (after End), as Doc 3 was, it will also be picked up by our report even though it has not yet been published to Cancer.gov.  This ticket is going to fix that issue.

The change to the report will be to identify whether a document version has been made publishable after the weekly publishing job started and, if so, to exclude that document (Doc 3).  The excluded document will then be included when the next GovDelivery report gets created.  The End point for this week's GD report will be the Start date for the next GD report.
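
The intended behavior can be sketched with a half-open time window, so that the End of one report is the Start of the next and no document is counted twice or dropped. The timestamps, the placement of Doc 1 before Start, and the function name are assumptions for illustration; this is not the code of the report script.

```python
from datetime import datetime

# Assumed publishing job start times (Fridays, 4pm).
START = datetime(2022, 12, 2, 16, 0)
END = datetime(2022, 12, 9, 16, 0)
NEXT_END = datetime(2022, 12, 16, 16, 0)

# Hypothetical times at which each document's version was made publishable.
versions = {
    "Doc 1": datetime(2022, 11, 30, 11, 0),  # before Start
    "Doc 2": datetime(2022, 12, 6, 9, 30),   # between Start and End
    "Doc 3": datetime(2022, 12, 9, 16, 55),  # after the publishing job started
}


def docs_for_report(made_publishable, start, end):
    """Names of documents made publishable in the half-open window [start, end)."""
    return sorted(
        name for name, stamp in made_publishable.items() if start <= stamp < end
    )


this_week = docs_for_report(versions, START, END)
next_week = docs_for_report(versions, END, NEXT_END)
print("this week:", this_week)
print("next week:", next_week)
```

Doc 3 is left out of this week's list and shows up in the following week's list, because this week's End is reused as next week's Start.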

A document version like the one represented by Doc 3 will not be included on the GD report, but this could be a problem.

Let's look now at the same document but with publishable versions created at different times.  Let's say when the report runs, Version 3 of a document has been created as a publishable version after the End date.  Therefore the document is excluded from the GD report.  But it is possible that that same document was also made publishable within the correct date range between Start and End as Version 2.  We can't easily find out if the DLM fits within the date range because we only store 1 DLM in the query_term_pub table - the date of the last publishable version, which is Version 3.  We would need an additional safeguard to load the older version document, inspect the DLM and decide if the date does fit within the correct date range and then make the decision if the particular document should be excluded or not.

This can be done but will of course make the code more complicated.  In my opinion, we are updating the report to prevent the picking up of a document not yet published.  It is an edge case that a publishable version of a document is created between the time the publishing job runs and the report is created and to my knowledge this issue occurred just once in more than a year that the GD report has run.  Having a publishable version of a document created during our date range and also after the publishing job started, seems to me to be an edge case of an edge case.  I'm therefore suggesting not to include additional code to prevent the second level edge case unless or feel this additional step is necessary.

Comment entered 2022-12-13 12:40:10 by Osei-Poku, William (NIH/NCI) [C]

  This sounds good to me. We can also advise editors to be careful when creating multiple publishable versions (which we do from time to time) of the same document, especially before and after the publishing deadline. That is, it would be better to wait until the next publishing "cycle" before making another publishable version unless it is a critical change.

Comment entered 2023-02-14 18:46:55 by Englisch, Volker (NIH/NCI) [C]

The report has been modified to only include documents between the last two successful weekly publishing events.  This means that instead of setting the default start and end dates from Sunday through Saturday, the start and end dates are now set to be around Friday, 4pm, when the weekly publishing job starts.  
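
A minimal sketch of how such a window could be derived, assuming a list of publishing job records with a start time and a status. The record layout, the status strings, and the function name are hypothetical and do not reflect the actual CDR tables or the updated program.

```python
from datetime import datetime

# Hypothetical job history: the Friday job on 2023-02-10 failed and was
# rerun successfully on Saturday.
jobs = [
    {"started": datetime(2023, 2, 3, 16, 0), "status": "Success"},
    {"started": datetime(2023, 2, 10, 16, 0), "status": "Failure"},
    {"started": datetime(2023, 2, 11, 9, 0), "status": "Success"},
]


def report_window(job_history):
    """Return (start, end) as the start times of the last two successful jobs."""
    successful = sorted(
        job["started"] for job in job_history if job["status"] == "Success"
    )
    if len(successful) < 2:
        raise ValueError("need at least two successful publishing jobs")
    return successful[-2], successful[-1]


window_start, window_end = report_window(jobs)
print(window_start, "to", window_end)
```

Skipping failed jobs means a weekend rerun moves the End of the window to the rerun's start time rather than to Friday, 4pm.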

The following program has been updated:

This is ready for review on DEV.

Please note:
Since the PROD version of the GovDelivery report is currently running on DEV, I created a temporary copy of the modified script that would need to be run manually in order to create the report for testing.

Comment entered 2023-05-16 11:15:02 by Osei-Poku, William (NIH/NCI) [C]

Hi , Do you need to run this manually on QA for me to check?

Comment entered 2023-05-16 16:23:50 by Englisch, Volker (NIH/NCI) [C]

, I'm trying to prep QA for a publishing job.  I will first need to run a publishing job, then you will need to make a document version publishable, and then I will run the GovDelivery report which should not include the latest publishable version of that document you prepared.

I will let you know once the full publishing job finished successfully.

Comment entered 2023-05-16 22:29:32 by Englisch, Volker (NIH/NCI) [C]

A full publishing job finished successfully now.   , you can go ahead and prepare a summary document and I will follow that up and run the GovDelivery report.

Comment entered 2023-05-17 09:15:36 by Osei-Poku, William (NIH/NCI) [C]

Hi   I have made a few summary documents publishable on QA. Thanks!

Comment entered 2023-05-17 12:55:24 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2023-07-05 08:04:36 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Attachments
File Name Posted User
Gov Delivery.PNG 2021-09-02 19:30:59 Osei-Poku, William (NIH/NCI) [C]
Gov Delivery-1.PNG 2021-09-02 19:31:55 Osei-Poku, William (NIH/NCI) [C]
Screen Shot 2021-08-27 at 11.42.06.png 2021-08-27 11:44:48 Englisch, Volker (NIH/NCI) [C]
Screen Shot 2021-08-27 at 11.47.22.png 2021-08-27 11:49:01 Englisch, Volker (NIH/NCI) [C]
Screen Shot 2021-08-27 at 11.47.55.png 2021-08-27 11:49:01 Englisch, Volker (NIH/NCI) [C]
Screen Shot 2021-08-27 at 11.48.16.png 2021-08-27 11:49:01 Englisch, Volker (NIH/NCI) [C]
Screen Shot 2022-12-12 at 12.30.06.png 2022-12-12 19:49:46 Englisch, Volker (NIH/NCI) [C]
Screen Shot 2022-12-12 at 12.35.00.png 2022-12-12 20:04:15 Englisch, Volker (NIH/NCI) [C]
