Issue Number | 5020 |
---|---|
Summary | [GovDelivery] Modify GovDelivery report to include only published documents |
Created | 2021-08-23 14:48:07 |
Issue Type | New Feature |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-02-14 18:33:04 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.297110 |
The linked summary documents in the GovDelivery email could refer to a summary that is yet to be published to the live site, making the report misleading. Other document types may be affected as well, depending on when the user makes a document publishable. Please review and modify the report as appropriate. I am including the email exchanges with Volker below:
From: Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>
Sent: Monday, August 23, 2021 12:29 PM
To: Osei-Poku, William (NIH/NCI) [C] <oseipokuw@mail.nih.gov>
Subject: Re: [PROD] GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)
Hi William,
It seems that the links for the HP titles are pointing to the “Changes” section which only exists for HP summaries.
You are correct, that is misleading. The problem here is the result of a timing issue.
From what I can see, version 187 of the summary was part of the publishing summary set on Friday, but the summary was not pushed to Cancer.gov because there were no changes. The publishing job starts at 4pm, which is our cut-off, but at 4:55 a new, publishable version of the summary had been created (V-189). When the summary is saved as a publishable version, the table ‘query_term_pub’ gets updated.
On Sunday morning at 0:30am the scheduled GovDelivery report runs to check for summary updates for the previous week (Sun – Sat) using the DLM that’s stored in the table ‘query_term_pub’ as the inclusion criteria. Therefore, the fact that the summary had been made publishable after the publishing job ran but before the GovDelivery report started made this summary appear on this week’s report.
Please file a ticket if you feel that this edge case needs to be addressed. It seems to me that it would be simple to drop this summary from the report by double-checking if it was part of the push to Cancer.gov. However, we would need to think about how to deal with the fact that you would want this document to be included in the report that will run in the following week even though its DLM does not match the specified time frame.
Thanks,
Volker
Volker Englisch
NCI OCPL – Office of Communications & Public Liaison
Contractor: publicis sapient
NCI: 240-276-6583
From: "Osei-Poku, William (NIH/NCI) [C]" <oseipokuw@mail.nih.gov>
Date: Monday, August 23, 2021 at 08:07
To: Volker Englisch <volker@mail.nih.gov>
Subject: FW: [PROD] GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)
Good morning Volker,
I have two questions regarding this report –
Is there a reason why there are links for the health professional titles but no links for patient summaries or other document types?
Are the summaries included based on the publishable state or published state? If you look at 454520, the DLM is in April while the report heading says “GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)”. It seems misleading.
Thanks,
William
From: NCI PDQ Operator <NCIPDQOperator@mail.nih.gov>
Sent: Sunday, August 22, 2021 12:32 AM
To: Shields, Victoria (NIH/NCI) [E] <vshields@mail.nih.gov>;
Englisch, Volker (NIH/NCI) [C] <volker@mail.nih.gov>; Beckwith,
Margaret (NIH/NCI) [E] <mbeckwit@icic.nci.nih.gov>;
Baldwin, Robin (NIH/NCI) [E] <robin@mail.nih.gov>; Osei-Poku,
William (NIH/NCI) [C] <oseipokuw@mail.nih.gov>;
Broun, Kevin (NIH/NCI) [E] <brounk@mail.nih.gov>; Ferguson,
Bonnie (NIH/NCI) [C] <bonnie.ferguson@nih.gov>;
Norwood, Christina (NIH/NCI) [E] <christina.norwood@nih.gov>;
Reyes, Kimberly (NIH/NCI) [E] <reyesk@mail.nih.gov>; Hansen,
Leshia (NIH/NCI) [E] <leshia.hansen@nih.gov>
Subject: [PROD] GovDelivery New/Changed English Summaries Report (2021-08-15 to 2021-08-21)
Report date: 2021-08-22
New Health Professional Summaries |
None |
Revised Health Professional Summaries |
|
CDR ID |
Title |
New Patient Summaries |
None |
Revised Patient Summaries |
|
CDR ID |
Title |
Bile Duct Cancer (Cholangiocarcinoma) Treatment |
|
Cancer Prevention Overview |
|
Childhood Acute Myeloid Leukemia/Other Myeloid Malignancies Treatment |
|
Fatigue |
|
Wilms Tumor and Other Childhood Kidney Tumors Treatment |
New Drug Information Summaries |
|
CDR ID |
Title |
Asparaginase Erwinia Chrysanthemi (Recombinant)-rywn |
Revised Drug Information Summaries |
|
CDR ID |
Title |
Asparaginase Erwinia Chrysanthemi |
|
Dostarlimab-gxly |
This report was created with OCECDR-3989 in 2015. According to the original requirements, the report was designed to look at the information for publishable versions. Before we make changes and compare the output to the information displayed on Cancer.gov, I would like others to weigh in as well.
A solution to this "misleading" display of summaries on the report might be as simple as changing the report title to make clear what we're looking at.
Volker will capture the current logic and report job timing and post them to this ticket. Then we might want to talk to Kim and clarify what is really needed for this report. Volker has copies of the older reports, in case we need to look at them to find out what's included. These discussions might evolve into a broader investigation into expanding the rôle of the GovDelivery "newsletter" to do more PDQ marketing/branding.
The GovDelivery report is actually two reports that are run via our scheduler. The "English" version runs on Sundays at 0:30h while the "Spanish" version runs on Sundays at 2:30h. Originally these two reports were run at the same time, but the Spanish version started failing when run together with the English version, so we separated them.
The report is delivered as an email message that includes 6 sections (4 sections in Spanish):
New HP Summaries
Revised HP Summaries
New Patient Summaries
Revised Patient Summaries
New DIS (English only)
Revised DIS (English only)
The report when started with default options includes the documents with versions made publishable between the previous Sunday and the previous day (Saturday). Note, this is the default date range for the scheduled job because it is scheduled to run on a Sunday. Running the job on a Thursday, for instance, would include the date range from the previous Thursday through Wednesday.
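The default date range described above can be sketched as follows. This is a minimal illustration only; the function name `default_date_range` is hypothetical and is not taken from gov_delivery_reports.py.

```python
from datetime import date, timedelta


def default_date_range(run_date: date) -> tuple[date, date]:
    """Return the inclusive (start, end) dates for the seven days before run_date.

    Hypothetical helper: for a Sunday run this is the previous Sunday
    through the previous Saturday; for a Thursday run it is the previous
    Thursday through Wednesday.
    """
    start = run_date - timedelta(days=7)
    end = run_date - timedelta(days=1)
    return start, end


# The scheduled run on Sunday 2021-08-22 covers 2021-08-15 to 2021-08-21,
# matching the heading of the report quoted in this ticket.
sunday_range = default_date_range(date(2021, 8, 22))
# A manual run on Thursday 2021-08-26 covers 2021-08-19 to 2021-08-25.
thursday_range = default_date_range(date(2021, 8, 26))
```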
The selection of new summaries looks for the "first_pub" value of the document table to fall within the date range.
The selection of revised summaries looks for the value "DateLastModified" to fall within the date range.
When making any changes to this report, please note that we have an issue with the precision of the dates we're using. We can identify the documents that are published between two publishing events, say the documents published between Friday, 4pm of last week and Friday, 4pm of this week. However, the precision of the DateLastModified is a day, e.g. 2021-08-26. If two documents are revised on the same day that the publishing job runs, one made publishable before the job started and the other after, the DLM element alone is not sufficient to assign these documents to this week's report or next week's report.
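The precision problem can be illustrated with a small sketch. The 4pm job start and the 4:55pm save come from the email above; the morning save time and all names are made up for illustration.

```python
from datetime import date, datetime

# Friday 2021-08-20, 4pm: start of the weekly publishing job (the cut-off).
JOB_START = datetime(2021, 8, 20, 16, 0)

# Two hypothetical saves on the same day, one on each side of the cut-off.
saved_before_job = datetime(2021, 8, 20, 10, 15)
saved_after_job = datetime(2021, 8, 20, 16, 55)


def dlm(saved: datetime) -> date:
    """DateLastModified keeps only the day, so the time of day is lost."""
    return saved.date()


def in_this_weeks_push(saved: datetime) -> bool:
    """Only versions saved before the job started can be part of the push."""
    return saved < JOB_START


# Both documents carry the same DLM ...
same_dlm = dlm(saved_before_job) == dlm(saved_after_job)
# ... although only the first one was published by this week's job.
push_flags = (in_this_weeks_push(saved_before_job), in_this_weeks_push(saved_after_job))
```

A comparison based on the DLM alone therefore cannot tell these two documents apart; a full timestamp (or the publishing job's own record) is needed.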
Attaching screenshots of most recent GovDelivery reports (English version):
This is another example Ning/Carolyn just brought to my attention. This is from the official GovDelivery email that goes to the public. The Gastrointestinal Complications link goes to a summary that has not yet been updated on Cancer.gov because it was published late last Friday, 8/27/21.
You are subscribed to PDQ Updates for Health Professionals from the National Cancer Institute. NCI newly published or updated these health professional summaries within the last week.
Adding a screenshot of the changes section since this will be updated after this week's publishing.
Robin will look at this ticket and decide whether we should pursue this now.
Victoria and I reviewed this ticket and we'd like to move ahead with making this change.
The GovDelivery report specifies a date range from (the previous) Sunday up to (and excluding) Sunday. With the requested changes we'll have to expand this date range to include those outlier documents up to the previous Export job running prior to the previous Sunday, therefore actually showing a date range from Friday to Sunday (or possibly Saturday to Sunday if the Friday job failed and was rerun over the weekend).
Currently, those date ranges shown are, for instance,
2022-09-25 to 2022-10-01
2022-10-02 to 2022-10-08
2022-10-09 to 2022-10-15, etc.
With the new date ranges, consecutive weeks would show overlapping date ranges:
2022-09-23 to 2022-10-01
2022-09-30 to 2022-10-08
2022-10-07 to 2022-10-15, etc.
Would you prefer to keep the one-week headers or the actual date ranges?
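To make the two header options concrete, here is a rough sketch that reproduces both series of date ranges listed above. The function names and the list of export job dates are assumptions for illustration; the real job dates would come from the publishing history.

```python
from datetime import date, timedelta


def one_week_header(run_date: date) -> tuple[date, date]:
    """Current header: previous Sunday through Saturday."""
    return run_date - timedelta(days=7), run_date - timedelta(days=1)


def actual_range(run_date: date, export_job_dates: list[date]) -> tuple[date, date]:
    """Expanded range: starts at the last export job before the previous Sunday."""
    previous_sunday = run_date - timedelta(days=7)
    start = max(d for d in export_job_dates if d < previous_sunday)
    return start, run_date - timedelta(days=1)


# Hypothetical Friday export jobs, all assumed to have succeeded.
fridays = [date(2022, 9, 16), date(2022, 9, 23), date(2022, 9, 30), date(2022, 10, 7)]

for run in (date(2022, 10, 2), date(2022, 10, 9), date(2022, 10, 16)):
    print(one_week_header(run), actual_range(run, fridays))
```

If a Friday job failed and was rerun on Saturday, the Saturday date would appear in the job list instead and the range would start there.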
The GovDelivery report picks up documents for which the DLM (Date Last Modified) is within a specific date range - in the image above, between the Start and End date. Only the document Doc 2 satisfies this condition and will be picked up by our report. However, if a document is made publishable after the weekly publishing job started (after End), like Doc 3, it will also be picked up by our report although that document has not yet been published to Cancer.gov. This ticket is going to fix this issue.
The change to the report will be to identify if a document version has been made publishable after the weekly publishing job started and then exclude that document (Doc 3). The excluded document will then be included when the next GovDelivery report gets created. The End point for this week's GD report will be the Start date for the next GD report.
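A minimal sketch of this selection rule, assuming each document carries the full timestamp at which its latest version was made publishable. The `Doc` type, the function name, and all timestamps are hypothetical and only illustrate the Start/End logic, not the actual query in the report.

```python
from datetime import datetime
from typing import NamedTuple


class Doc(NamedTuple):
    name: str
    made_publishable: datetime  # assumed full timestamp, not the day-only DLM


def select_for_report(docs: list[Doc], start: datetime, end: datetime) -> list[str]:
    """Keep documents made publishable at or after start and before end."""
    return [d.name for d in docs if start <= d.made_publishable < end]


# Hypothetical weekly publishing job start times (Fridays, 4pm).
start = datetime(2022, 12, 2, 16, 0)
end = datetime(2022, 12, 9, 16, 0)
next_end = datetime(2022, 12, 16, 16, 0)

docs = [
    Doc("Doc 1", datetime(2022, 12, 1, 9, 0)),    # before Start
    Doc("Doc 2", datetime(2022, 12, 6, 11, 30)),  # between Start and End
    Doc("Doc 3", datetime(2022, 12, 9, 16, 55)),  # after End
]

this_week = select_for_report(docs, start, end)
# This week's End becomes next week's Start, so Doc 3 is reported then.
next_week = select_for_report(docs, end, next_end)
```

Using a half-open interval means a document can never fall into two consecutive reports or be dropped between them.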
A document version that is represented by Doc 3 will not be included on the GD report but this could be a problem.
Let's look now at the same document but with publishable versions created at different times. Let's say when the report runs, Version 3 of a document has been created as a publishable version after the End date. Therefore the document is excluded from the GD report. But it is possible that that same document was also made publishable within the correct date range between Start and End as Version 2. We can't easily find out if the DLM fits within the date range because we only store 1 DLM in the query_term_pub table - the date of the last publishable version, which is Version 3. We would need an additional safeguard to load the older version document, inspect the DLM and decide if the date does fit within the correct date range and then make the decision if the particular document should be excluded or not.
This can be done but will of course make the code more complicated. In my opinion, we are updating the report to prevent the picking up of a document not yet published. It is an edge case that a publishable version of a document is created between the time the publishing job runs and the report is created and to my knowledge this issue occurred just once in more than a year that the GD report has run. Having a publishable version of a document created during our date range and also after the publishing job started, seems to me to be an edge case of an edge case. I'm therefore suggesting not to include additional code to prevent the second level edge case unless ~juther or ~oseipokuw feel this additional step is necessary.
~volker This sounds good to me. We can also advise editors to be careful when creating multiple publishable versions (which we do from time to time) of the same document, especially before and after the publishing deadline. That is, it would be better to wait until the next publishing "cycle" before making another publishable version unless it is a critical change.
The report has been modified to only include documents between the last two successful weekly publishing events. This means that instead of setting the default start and end dates from Sunday through Saturday, the start and end dates are now set to be around Friday, 4pm, when the weekly publishing job starts.
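The idea of "the last two successful weekly publishing events" could look roughly like the sketch below. The job records, the status strings, and the function name are assumptions for illustration; see the linked commit for how the report actually determines its window.

```python
from datetime import datetime


def report_window(jobs: list[tuple[datetime, str]]) -> tuple[datetime, datetime]:
    """Return the start times of the last two successful jobs as (start, end).

    jobs is a list of (started, status) tuples; the status value
    "Success" is a hypothetical label for a completed publishing job.
    """
    successes = sorted(started for started, status in jobs if status == "Success")
    if len(successes) < 2:
        raise ValueError("need at least two successful publishing jobs")
    return successes[-2], successes[-1]


# Hypothetical history: the second Friday job failed and was rerun on Saturday.
jobs = [
    (datetime(2022, 12, 2, 16, 0), "Success"),
    (datetime(2022, 12, 9, 16, 0), "Failure"),
    (datetime(2022, 12, 10, 11, 0), "Success"),
]

window = report_window(jobs)
```

In this example the window runs from Friday 4pm to the Saturday rerun, which is the "Saturday to Sunday" case mentioned earlier in this ticket.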
The following program has been updated:
gov_delivery_reports.py
https://github.com/NCIOCPL/cdr-scheduler/commit/6d761ff
This is ready for review on DEV.
Please note:
Since the PROD version of the GovDelivery report is currently running on DEV, I created a temporary copy of the modified script that would need to be run manually in order to create the report for testing.
Hi ~volker , Do you need to run this manually on QA for me to check?
~oseipokuw , I'm trying to prep QA for a publishing job. I will first need to run a publishing job, then you will need to make a document version publishable, and then I will run the GovDelivery report which should not include the latest publishable version of that document you prepared.
I will let you know once the full publishing job has finished successfully.
A full publishing job has now finished successfully. ~oseipokuw , you can go ahead and prepare a summary document and I will follow that up and run the GovDelivery report.
Hi ~volker I have made a few summary documents publishable on QA. Thanks!
Verified on QA. Thanks!
Verified on PROD. Thanks!
File Name | Posted | User |
---|---|---|
Gov Delivery.PNG | 2021-09-02 19:30:59 | Osei-Poku, William (NIH/NCI) [C] |
Gov Delivery-1.PNG | 2021-09-02 19:31:55 | Osei-Poku, William (NIH/NCI) [C] |
Screen Shot 2021-08-27 at 11.42.06.png | 2021-08-27 11:44:48 | Englisch, Volker (NIH/NCI) [C] |
Screen Shot 2021-08-27 at 11.47.22.png | 2021-08-27 11:49:01 | Englisch, Volker (NIH/NCI) [C] |
Screen Shot 2021-08-27 at 11.47.55.png | 2021-08-27 11:49:01 | Englisch, Volker (NIH/NCI) [C] |
Screen Shot 2021-08-27 at 11.48.16.png | 2021-08-27 11:49:01 | Englisch, Volker (NIH/NCI) [C] |
Screen Shot 2022-12-12 at 12.30.06.png | 2022-12-12 19:49:46 | Englisch, Volker (NIH/NCI) [C] |
Screen Shot 2022-12-12 at 12.35.00.png | 2022-12-12 20:04:15 | Englisch, Volker (NIH/NCI) [C] |