Issue Number | 5139 |
---|---|
Summary | [LOE Adult/Peds] Global replace of LOEs in adult and pediatric summaries |
Created | 2022-09-15 12:56:53 |
Issue Type | Task |
Submitted By | Shields, Victoria (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2022-09-23 12:51:39 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.327524 |
The Levels of Evidence system used by the Adult and Pediatric Treatment Boards has been revised. The current Levels of Evidence included in the summaries, tagged as LOERefs in the CDR, need to be updated with the new Levels.
An example of the current format used in the text is:
[Level of evidence: 1iiA]
The corresponding term is:
Level of evidence 1iiA
An example of the new format used in the text is:
[Level of evidence A1]
The corresponding term is the same:
Level of evidence A1
The terms have been created in the CDR and published, in both English and Spanish.
Attached is a Word table that shows the mapping between the current and new LOEs.
Levels of Evidence_Adult_Peds_Mapping Old to New_09_15_2022.docx
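As a rough illustration of the text change described above, the following sketch rewrites bracketed LOE labels from the old format to the new one. The `OLD_TO_NEW` dictionary, the regular expression, and the `replace_loe` function are hypothetical. The single pair in the dictionary simply reuses the two example labels quoted in this ticket and is not taken from the attached mapping table.

```python
import re

# Placeholder mapping (assumption): NOT the real table from the attachment.
OLD_TO_NEW = {"1iiA": "A1"}

# Matches the old in-text format, e.g. "[Level of evidence: 1iiA]".
OLD_FORMAT = re.compile(r"\[Level of evidence: ([^\]]+)\]")


def replace_loe(text, mapping=OLD_TO_NEW):
    """Rewrite old-format LOE labels using the supplied mapping."""

    def swap(match):
        new = mapping.get(match.group(1))
        # Leave unmapped levels untouched so they can be reviewed by hand.
        return f"[Level of evidence {new}]" if new else match.group(0)

    return OLD_FORMAT.sub(swap, text)


print(replace_loe("Survival improved [Level of evidence: 1iiA]."))
```

Leaving unmapped labels unchanged, rather than guessing, keeps any row with a missing or questionable new value visible for review.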
Software can't reliably pull values from a table in a Word document. I have pasted the table into the attached Excel workbook.
What should we do with the row with a question mark for the new LOE?
I assume we're going straight to production (in test mode at first) since that's where the new values are loaded, right?
EEK! That question mark shouldn't be there. Sorry about that!
Yes, the terms are on PROD.
New workbook attached. Had to do a bunch of tweaking to the name values to get them to match.
Thanks, Bob! Hi ~vshields , if it is OK I will ask Stacy and team to review the new spreadsheet and proceed with the global on QA once they are done with the review.
Yes, ~oseipokuw , please talk to Stacy and proceed with the global replace on QA. Thanks.
Stacy has finished reviewing the terms in the spreadsheet and confirmed that they are good to go. So we can proceed with running in test mode on QA. Thanks!
What should we do with the row with a question mark for the new LOE?
I assume this is no longer an issue since you're using the CDR ID instead of the term names?
I got my answer to that question both in the response from Victoria below, as well as in last Thursday's meeting. Reflected in the latest spreadsheet.
Just so you know, I ran across some summaries which conflicted with my picture of how the Spanish summaries were supposed to work. I thought I had been told that the real boards weren't linked directly in the Spanish summaries, but instead the PDQBoard links were to a fake board, and that the only way to find out what the real editorial board for a Spanish summary is was to follow the TranslationOf link and pull the editorial board out of the English summary of which this is a translation. However, in assembling the logic to identify the summaries which should be processed for this global change, I came across six summaries which had direct links to the real PDQ Editorial boards AS WELL AS a TranslationOf link to the English summary.
SELECT DISTINCT t.doc_id AS "Doc ID"
  FROM query_term t
  JOIN query_term b
    ON b.doc_id = t.doc_id
 WHERE t.path = '/Summary/TranslationOf/@cdr:ref'
   AND b.path = '/Summary/SummaryMetaData/PDQBoard/Board/@cdr:ref'
   AND b.int_val IN (28327, 28557)
   AND t.int_val IN (
       SELECT DISTINCT doc_id
         FROM query_term
        WHERE path = '/Summary/SummaryMetaData/PDQBoard/Board/@cdr:ref'
          AND int_val IN (28327, 28557)
   )
Doc ID |
---|
611985 |
772163 |
800324 |
800326 |
800370 |
800372 |
Not going to impede my progress on the global change, but I thought it possible that someone might want to be aware of these anomalies.
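To make the logic of the query above easier to follow, here is a minimal sketch that runs the same SQL against a toy in-memory SQLite table. The three-column table layout, the document IDs 100, 200 and 300, and the board ID 99999 are invented for illustration; only the paths and the board IDs 28327 and 28557 come from the query in this ticket, and the real `query_term` table may well have a different shape.

```python
import sqlite3

BOARD = "/Summary/SummaryMetaData/PDQBoard/Board/@cdr:ref"
TRANSLATION = "/Summary/TranslationOf/@cdr:ref"

QUERY = """
SELECT DISTINCT t.doc_id AS "Doc ID"
  FROM query_term t
  JOIN query_term b
    ON b.doc_id = t.doc_id
 WHERE t.path = '/Summary/TranslationOf/@cdr:ref'
   AND b.path = '/Summary/SummaryMetaData/PDQBoard/Board/@cdr:ref'
   AND b.int_val IN (28327, 28557)
   AND t.int_val IN (
       SELECT DISTINCT doc_id
         FROM query_term
        WHERE path = '/Summary/SummaryMetaData/PDQBoard/Board/@cdr:ref'
          AND int_val IN (28327, 28557)
   )
"""


def build_toy_db():
    """Create a simplified stand-in for query_term (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE query_term (doc_id INTEGER, path TEXT, int_val INTEGER)"
    )
    rows = [
        (100, BOARD, 28327),      # English summary linked to a real board
        (200, TRANSLATION, 100),  # translation of 100 ...
        (200, BOARD, 28327),      # ... that ALSO links the real board
        (300, TRANSLATION, 100),  # translation of 100 ...
        (300, BOARD, 99999),      # ... linked only to a made-up other board
    ]
    conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", rows)
    return conn


def find_direct_board_links(conn):
    """Return IDs of translations that link a real board directly."""
    return [row[0] for row in conn.execute(QUERY)]


print(find_direct_board_links(build_toy_db()))
```

In this toy data only document 200 is reported, because it is both a translation of a summary owned by one of the two boards and directly linked to one of those boards itself.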
Thanks, Bob! They have now been fixed on PROD. 772163 is an English summary so it should have a link to a real board. However it is a Temp Doc that has been abandoned and can be deleted from the CDR.
JIRA appears to have discarded my previous comment. Test mode on QA has completed.
https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2022-09-23_12-08-56
One thing you can do to make it a little easier to find the changes is to search for the caret character (^) which is used to mark the differences.
Please run the global in live mode on QA. Thanks!
Done.
2022-09-27 14:04:17.020 [INFO] Run completed.
Docs examined = 334
Docs changed = 334
Versions changed = 726
Could not lock = 0
Errors = 0
Time = 0:54:34.082336
Specific versions saved:
new cwd = 69
new pub = 242
new ver = 150
old cwd = 130
The text within the LOERef elements in the Spanish summaries should read "Nivel de evidencia ..." as in the glossary terms (CDR0000810025, example). They are currently displaying the English text "Level of evidence .." (CDR0000256668, example).
I didn't see anything in the ticket with that requirement.
The live mode did exactly what the test mode did.
I think I should have included a document that mapped the Spanish terms (old to new) like I did for the English. If I provide that, would you be able to update the Spanish summaries? And should I open a new ticket for this part of the task? Sorry I missed that when I created this ticket.
What you would need to do, I think, is provide a seventh column to the latest spreadsheet (ocecdr-5139-names-and-ids.xlsx) with the Spanish names, unless they can all be reliably derived from the English names by mechanically replacing "Level of evidence " with "Nivel de evidencia " in the Spanish summaries.
We can't just perform the live run a second time, because the document IDs for the terms which the script is looking for aren't there any more. That's why it's unfortunate that the requirement didn't make it into the original ticket nor was it caught in the review of the test-mode run. William would have to put in another ticket for Volker to refresh QA again in order to do another live-mode run. And if you go that route I would strongly recommend that another test run be performed and carefully reviewed before running the job in live mode again.
The alternative to refreshing QA again is to create another global change job to replace the text content of the elements in the Spanish summaries. For that we'd need the map of new IDs to Spanish strings. If we go this route, we'd want to avoid fixing the original script for this ticket. For what we test on QA to be of any use in verifying that what we will do on PROD is correct, we would have to run the unaltered first script on production, creating the wrong term names for the Spanish summaries, and then run the second global change job to fix that problem.
Make sense?
I will ask Linda to update the spreadsheet if the text cannot be replaced with "Nivel de evidencia " in all cases. I will also create another ticket for QA to be refreshed so we go through another test run and careful review again before a live run on QA.
Linda confirmed that the text will read "Nivel de evidencia" in all cases. Does that mean there is no need to update the spreadsheet? Will all the different level values display correctly even without the updated spreadsheet?
Yes, assuming all the software needs to do is exactly what it did during the previous two runs, except replace "Level of evidence " with "Nivel de evidencia " in the text value for the Spanish summaries.
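The mechanical derivation discussed here can be sketched as follows. The function name `spanish_term_name` is hypothetical, and the sketch assumes, as confirmed in the comments above, that every Spanish name differs from the English one only in its prefix.

```python
EN_PREFIX = "Level of evidence "
ES_PREFIX = "Nivel de evidencia "


def spanish_term_name(english_name):
    """Derive the Spanish LOE term name by swapping the prefix."""
    if not english_name.startswith(EN_PREFIX):
        # Fail loudly rather than emit a half-translated name.
        raise ValueError(f"unexpected term name: {english_name!r}")
    return ES_PREFIX + english_name[len(EN_PREFIX):]


print(spanish_term_name("Level of evidence A1"))
```

Raising an error on an unexpected name is a deliberate choice in this sketch: a name that does not start with the English prefix would signal that the assumption no longer holds.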
Another test mode done on QA:
2022-09-29 16:46:58.168 [INFO] Run completed.
Docs examined = 334
Docs changed = 0
Versions changed = 910
Could not lock = 0
Errors = 0
Time = 0:37:05.709539
I see that they used to call it "Grado de comprobación."
https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2022-09-29_16-09-52
Review of test results is complete. Please run in live mode on QA. Thanks!
Done.
2022-10-04 16:00:55.104 [INFO] Run completed.
Docs examined = 334
Docs changed = 334
Versions changed = 726
Could not lock = 0
Errors = 0
Time = 0:56:17.399866
Specific versions saved:
new cwd = 69
new pub = 242
new ver = 150
old cwd = 134
As a side note, I noticed that we have a LOT of blocked summary documents.
Please run the global in test mode on PROD. Thanks!
Done.
2022-10-07 14:47:04.547 [INFO] Run completed.
Docs examined = 334
Docs changed = 0
Versions changed = 909
Could not lock = 0
Errors = 0
Time = 0:53:48.420192
https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2022-10-07_13-53-16
Please run in live mode on PROD.
Done.
2022-10-12 12:12:10.168 [INFO] Run completed.
Docs examined = 318
Docs changed = 317
Versions changed = 682
Could not lock = 0
Errors = 0
Time = 1:25:18.408458
Specific versions saved:
new cwd = 69
new pub = 232
new ver = 133
old cwd = 133
Hi ~bkline
Do you know why these two summaries appear to have not been updated on PROD?
779396 and 779398
That was a bug in the script's query. Unfortunately, we didn't run the job in test mode on PROD, so we didn't catch it.
We ran it in test mode on PROD. Please see comments below. Would you be able to run an ad hoc query to identify only the ones that were not updated? I ran a simple query that identified these docs below, but I am not sure if that is the complete list.
779396
779398
778295
777844
780682
781009
781609
780118
801593
Unfortunately, we didn't run the job in test mode on PROD, ...
I think JIRA's comment-suppression bug was just doing its thing. 😛
25 summaries still need to be processed
777844
778295
779396
779398
780118
780682
781009
781609
805701
805704
805868
806272
806827
807006
808521
808522
809267
810147
810237
810726
810727
810728
810743
810760
810761
Thanks, Bob! Please proceed to run the global for these documents. I should have specified that it be run in test mode for these documents.
Before you run the global for the identified documents, could you also check why document CDR62941 is not on the list? It is not a Module Only document and I expected to see it on the list.
Don't think ModuleOnly would make a difference for this global. There are no LOERef rows in the query_term table for that document on PROD. I can think of no reason why that would be, as I can see that there are four LOERef elements in the current working document. Will keep digging.
We talked in this afternoon's meeting about proceeding with the followup global on prod, but we still haven't unraveled the mystery for why 62941 doesn't have any rows in the query_term table for LOERef links, even though the document has those elements. Do you want me to proceed anyway?
OK, I finally tracked down why this document doesn't have any rows in the query_term table for LOERef elements. All four of the LOERef elements are deeply nested inside Insertion elements whose RevisionLevel causes the revisions to be backed out for the resolved version of the document on which the query_term indexing is performed. So the query_term table is as it should be. I will proceed with the followup test job.
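The effect described here can be illustrated with a small sketch. It is not the CDR's actual revision-resolution filter: the sample document, the `resolve` function, the `"proposed"` and `"publish"` values, and the rule that only accepted levels survive are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the only LOERef sits inside an Insertion.
SAMPLE = """<Summary>
  <Para>Kept text
    <Insertion RevisionLevel="proposed">
      <Para>New text <LOERef>Level of evidence A1</LOERef></Para>
    </Insertion>
  </Para>
</Summary>"""


def resolve(root, accepted=("publish",)):
    """Back out Insertion elements whose RevisionLevel is not accepted.

    Assumed rule for illustration only. Accepted insertions are simply
    left in place rather than unwrapped.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            level = child.get("RevisionLevel")
            if child.tag == "Insertion" and level not in accepted:
                parent.remove(child)
    return root


def count_loerefs(root):
    """Count LOERef elements anywhere below the root."""
    return len(root.findall(".//LOERef"))


working = ET.fromstring(SAMPLE)
resolved = resolve(ET.fromstring(SAMPLE))
print(count_loerefs(working), count_loerefs(resolved))
```

The working copy still contains the LOERef element while the resolved copy does not, which is why an index built from the resolved version would have no rows for it.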
Yes, proceed to run it in test-mode on PROD. Would you be able to include this one too CDR62941 ?
CDR62941 manually included.
2022-10-28 09:14:00.857 [INFO] Run completed.
Docs examined = 26
Docs changed = 0
Versions changed = 58
Could not lock = 0
Errors = 0
Time = 0:03:33.543087
Please run in live mode on PROD. Thanks!
Done.
2022-10-31 09:54:27.212 [INFO] Run completed.
Docs examined = 26
Docs changed = 23
Versions changed = 52
Could not lock = 0
Errors = 0
Time = 0:05:58.964680
Specific versions saved:
new cwd = 6
new pub = 13
new ver = 16
old cwd = 8
Closing this ticket as all docs appear to be OK. Thank you!!
File Name | Posted | User |
---|---|---|
Levels of Evidence_Adult_Peds_Mapping Old to New_09_15_2022.docx | 2022-09-15 12:56:24 | Shields, Victoria (NIH/NCI) [E] |
ocecdr-5139.xlsx | 2022-09-15 13:23:06 | Kline, Bob (NIH/NCI) [C] |
ocecdr-5139-names-and-ids.xlsx | 2022-09-15 16:15:23 | Kline, Bob (NIH/NCI) [C] |