CDR Tickets

Issue Number 4954
Summary Global change to remove AltTitle elements that have Navlabel values
Created 2021-03-08 15:27:29
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2021-03-09 19:08:55
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.286481
Description

Please run a global change to remove all AltTitle elements (including data) that have been marked with the Navlabel value for the TitleType attribute.
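
The requested transformation can be sketched as follows. This is a minimal illustration using the standard library, not the Summary_AltTitle.py script mentioned in the comments. The element name AltTitle, the attribute TitleType, and the value Navlabel come from this ticket; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_navlabel_alt_titles(xml_text):
    """Return (serialized document, number of AltTitle elements removed)."""
    root = ET.fromstring(xml_text)
    removed = 0
    # ElementTree has no parent pointers, so treat every element as a
    # potential parent. The list() copies keep iteration safe while removing.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "AltTitle" and child.get("TitleType") == "Navlabel":
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical sample document, for illustration only.
SAMPLE = (
    "<Summary>"
    "<SummaryTitle>Example Summary</SummaryTitle>"
    '<AltTitle TitleType="Short">Example</AltTitle>'
    '<AltTitle TitleType="Navlabel">Ex</AltTitle>'
    "</Summary>"
)

cleaned, count = strip_navlabel_alt_titles(SAMPLE)
print(count)
print(cleaned)
```

AltTitle elements with any other TitleType value are left in place.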

Comment entered 2021-03-09 19:08:46 by Englisch, Volker (NIH/NCI) [C]

I created the global change script Summary_AltTitle.py and started the first test run on DEV.

Comment entered 2021-03-09 21:10:59 by Englisch, Volker (NIH/NCI) [C]

The results of the test run are available on DEV

Comment entered 2021-03-10 13:26:46 by Englisch, Volker (NIH/NCI) [C]

Here are the stats for the global change run:

2021-03-09 20:36:01.844 [INFO] Run completed.
  Docs examined    = 556
  Docs changed     = 0
  Versions changed = 1576
  Could not lock   = 0
  Errors           = 0
  Time             = 1:36:22.311558

Comment entered 2021-03-11 11:56:01 by Osei-Poku, William (NIH/NCI) [C]

I have started looking at the results but the display of the elements for the Diff makes it difficult to review. There are two long lines of text with all the elements on the same line. I have to scroll all the way to the right in order to see the changes. If this could be formatted so that I don't have to scroll to the right, that would be great.

Comment entered 2021-03-11 13:01:23 by Englisch, Volker (NIH/NCI) [C]

I understand the issue but implementing an XML diff tool for this global change is probably out of scope.  That is not a simple tool to build.

Also, if you sort the report by document diff size, you will see that only the first 10-15% of the documents show the behavior you're describing.  For the majority of the diffs, the output fits on a single page without horizontal scrolling (of course, that depends a little on the monitor size and resolution).
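
Short of a real XML diff tool, one inexpensive way to avoid very long diff lines is to break the serialized document at tag boundaries before diffing. The sketch below is only an illustration of that idea and is not part of the global change harness; the function names and sample strings are hypothetical. It changes whitespace between tags, so it is suitable for review display only.

```python
import difflib
import re


def one_element_per_line(xml_text):
    """Insert a newline between adjacent tags so each diff line stays short."""
    return re.sub(r">\s*<", ">\n<", xml_text.strip()).splitlines()


def short_line_diff(old_xml, new_xml):
    """Unified diff of two serialized documents, split at tag boundaries."""
    return list(difflib.unified_diff(
        one_element_per_line(old_xml),
        one_element_per_line(new_xml),
        fromfile="before", tofile="after", lineterm="", n=1,
    ))


# Hypothetical before/after documents, each serialized on a single line.
old = ('<Summary><SummaryTitle>Example</SummaryTitle>'
       '<AltTitle TitleType="Navlabel">Ex</AltTitle></Summary>')
new = '<Summary><SummaryTitle>Example</SummaryTitle></Summary>'

diff = short_line_diff(old, new)
print("\n".join(diff))
```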

Comment entered 2021-03-18 11:12:04 by Osei-Poku, William (NIH/NCI) [C]

Test results look good on DEV. Please run in live mode on DEV.

Comment entered 2021-03-19 11:48:48 by Englisch, Volker (NIH/NCI) [C]

2021-03-18 20:20:22.300 [INFO] Run completed.
  Docs examined    = 552
  Docs changed     = 549
  Versions changed = 1161
  Could not lock   = 3
  Errors           = 0
  Time             = 2:35:33.052141
Specific versions saved:
  new cwd = 41
  new pub = 461
  new ver = 151
  old cwd = 123

Comment entered 2021-03-19 11:52:36 by Englisch, Volker (NIH/NCI) [C]

The live mode on DEV finished (see the job summary in the comment above).

I will attach the log file in case you would like to see the blocked documents and those with warnings.  I was surprised to see so many validation warnings (since the data came from PROD) but I guess that's OK.

GlobalChangeLog.txt

Comment entered 2021-03-24 11:33:01 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Thanks!  Please run in test mode on QA. 

The errors appear to stem from the one glossary term with a rejected status - CDR0000302456 (There may be additional terms). (OCECDR-4950 is what is taking care of this issue during publishing). The global is likely to invalidate a lot of summaries especially on PROD. We would certainly prefer to fix this before we run the global in live mode on PROD.

Comment entered 2021-03-24 12:16:13 by Englisch, Volker (NIH/NCI) [C]

Why would the global invalidate documents?  The Navlabel version of the AltTitle is not mandatory and we're not making any changes to the schema.

Do you have an example of a document that became invalid?

Comment entered 2021-03-24 12:36:10 by Osei-Poku, William (NIH/NCI) [C]

This is one example from the logs which made me think that some of the documents would be invalidated, and there are several of them. Also, I did confirm from the live run that the CWD is invalid after the global. 

2021-03-18 20:10:43.993 [WARNING] CDR0000800326: Failed link target rule: /GlossaryTermName/TermNameStatus != "Rejected"
2021-03-18 20:10:43.993 [WARNING] CDR0000800326: Non-publishable version will be created.
2021-03-18 20:10:43.996 [WARNING] CDR0000800326: b'Failed link target rule: /GlossaryTermName/TermNameStatus != "Rejected"

Comment entered 2021-03-24 15:10:08 by Englisch, Volker (NIH/NCI) [C]

I see what you mean.  I thought you were referring to documents becoming invalid because of the AltTitle being removed.  This change, as I mentioned, won't have any effect on whether a document is valid or not.

If a linked document isn't valid or doesn't exist anymore, then yes, this will create an invalid document, which will need to be corrected prior to publishing an updated version of the document. But you still have the existing last publishable version sitting around until that time comes.

Comment entered 2021-03-24 15:40:00 by Englisch, Volker (NIH/NCI) [C]

2021-03-24 15:01:58.636 [INFO] Run completed.
  Docs examined    = 554
  Docs changed     = 0
  Versions changed = 1573
  Could not lock   = 0
  Errors           = 0
  Time             = 1:36:00.753793

Comment entered 2021-03-24 16:36:23 by Englisch, Volker (NIH/NCI) [C]

The test results for the Global Change are available on QA.

Here is the log file. GlobalChange_QA.txt

Comment entered 2021-03-24 16:39:49 by Osei-Poku, William (NIH/NCI) [C]

We may need to talk about this a bit more to find a solution since there are several documents that fall into this category. The problem is that the warning being reported in the logs is not really a problem we need to fix in the CDR. I think it is pointing to the fact that the glossary term has a definition that is rejected. There are no plans to fix it in XMetal. That is, the definition will remain rejected in the CDR unless we want to "fix it" before running the global in live mode and then "unfix it" after that to prevent invalidating the summary documents on PROD. We would like to avoid having several invalid documents on PROD after the global.

Comment entered 2021-03-24 18:14:13 by Englisch, Volker (NIH/NCI) [C]

We could extract the CDR-IDs for documents with warnings from the log file and exclude these from the Global Change.  That would result in 86 documents being excluded from the Global Change.  That's about 10% of summaries. However, not all of the warnings are a result of a non-publishable link target.  Someone would need to decide whether only documents with a specific warning should be excluded or all of them.
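
A minimal sketch of that extraction, grouping CDR-IDs by warning text so that a single warning type can be excluded on its own. The regular expression is modeled on the [WARNING] lines quoted earlier in this ticket and is an assumption about the log format as a whole; the function name is hypothetical. The sample lines are copied from this ticket.

```python
import re
from collections import defaultdict

# Assumed line layout: "<date> <time> [WARNING] CDR<digits>: <message>"
WARNING_LINE = re.compile(
    r"^\S+ \S+ \[WARNING\] (?P<cdr_id>CDR\d+): (?P<message>.*)$"
)


def ids_by_warning(lines):
    """Map each distinct warning message to the set of CDR-IDs reporting it."""
    grouped = defaultdict(set)
    for line in lines:
        match = WARNING_LINE.match(line.strip())
        if match:
            grouped[match["message"]].add(match["cdr_id"])
    return grouped


SAMPLE_LOG = [
    '2021-03-18 20:10:43.993 [WARNING] CDR0000800326: Failed link target rule: '
    '/GlossaryTermName/TermNameStatus != "Rejected"',
    "2021-03-18 20:10:43.993 [WARNING] CDR0000800326: "
    "Non-publishable version will be created.",
    "2021-03-18 20:20:22.300 [INFO] Run completed.",
]

grouped = ids_by_warning(SAMPLE_LOG)
for message, cdr_ids in sorted(grouped.items()):
    print(f"{len(cdr_ids):4d}  {message}")
```

To process a real log, pass the open file object instead of SAMPLE_LOG. The union of the ID sets for the chosen warning types would then form the exclusion list.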

Comment entered 2021-03-25 10:17:25 by Osei-Poku, William (NIH/NCI) [C]

Excluding the affected documents from the global run should be fine but that will depend on how many of the warnings are the result of the non-publishable link target. Could you please provide a list of the documents that have the different types of warning?

Please exclude blocked summaries as we won't fix any errors in those documents.

Comment entered 2021-03-25 15:53:35 by Englisch, Volker (NIH/NCI) [C]

"Please exclude blocked summaries as we won't fix any errors in those documents."

Are you asking to exclude blocked summaries from being processed by the global change or to process blocked summaries and exclude them from being reported because of the warnings?

Looking at the log file I see 9 documents with validation warnings that are not blocked.

Comment entered 2021-03-30 17:19:31 by Osei-Poku, William (NIH/NCI) [C]
Comment entered 2021-03-30 17:35:04 by Osei-Poku, William (NIH/NCI) [C]

Please run in live mode on QA.

Comment entered 2021-03-31 15:58:01 by Englisch, Volker (NIH/NCI) [C]

The live run on QA completed.  

I identified the following documents that aren't blocked but include warnings:

62890, 256677, 256685, 587224, 772163, 784073, 792723, 797908, 802226

I'm attaching the log file AltTitle_QA_live.log.

Comment entered 2021-03-31 15:58:56 by Englisch, Volker (NIH/NCI) [C]

  Docs examined    = 554
  Docs changed     = 554
  Versions changed = 1169
  Could not lock   = 0
  Errors           = 0
  Time             = 2:46:19.686110
Specific versions saved:
  new cwd = 39
  new pub = 465
  new ver = 150
  old cwd = 128

Comment entered 2021-04-06 12:14:06 by Osei-Poku, William (NIH/NCI) [C]

Looks good on QA. Please run in test mode on PROD.

Comment entered 2021-04-07 13:27:36 by Englisch, Volker (NIH/NCI) [C]

  Docs examined    = 553
  Docs changed     = 0
  Versions changed = 1570
  Could not lock   = 0
  Errors           = 0
  Time             = 1:21:51.319073

Comment entered 2021-04-07 14:04:30 by Englisch, Volker (NIH/NCI) [C]

The diff files for the test run on PROD are now available on DEV.

I'm attaching the log file for the run.  AltTitle_PROD_test.log

I see the following documents with warnings that are not blocked:

256685, 587224, 772163, 784073, 792723, 797908, 802226
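
For comparison with the list from the QA live run, the two sets of IDs can be diffed directly. This is a small sketch using only the IDs quoted in this ticket; the variable names are hypothetical.

```python
# Unblocked documents with warnings, as listed in this ticket.
qa_live = {62890, 256677, 256685, 587224, 772163, 784073, 792723, 797908, 802226}
prod_test = {256685, 587224, 772163, 784073, 792723, 797908, 802226}

only_on_qa = sorted(qa_live - prod_test)
only_on_prod = sorted(prod_test - qa_live)

print(only_on_qa)    # [62890, 256677]
print(only_on_prod)  # []
```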

Comment entered 2021-04-08 11:54:09 by Osei-Poku, William (NIH/NCI) [C]

Looks good from test results. Please proceed to run in live mode on PROD.   Thanks!

Comment entered 2021-04-13 09:57:22 by Osei-Poku, William (NIH/NCI) [C]

It looks like the live run was completed last Thursday. I can see the changes on PROD.

Comment entered 2021-04-27 18:22:30 by Englisch, Volker (NIH/NCI) [C]

I had forgotten to include the statistics for the live run on PROD.  Here they are:

2021-04-08 22:34:22.941 [INFO] Run completed.

  Docs examined    = 554
  Docs changed     = 553
  Versions changed = 1165
  Could not lock   = 1
  Errors           = 0
  Time             = 2:20:39.029035
Specific versions saved:
  new cwd = 40
  new pub = 464
  new ver = 148
  old cwd = 123

Attachments
File Name Posted User
AltTitle_PROD_test.log 2021-04-07 14:03:12 Englisch, Volker (NIH/NCI) [C]
AltTitle_QA_live.log 2021-03-31 15:56:47 Englisch, Volker (NIH/NCI) [C]
GlobalChange_QA.txt 2021-03-24 16:36:18 Englisch, Volker (NIH/NCI) [C]
GlobalChangeLog.txt 2021-03-19 11:52:21 Englisch, Volker (NIH/NCI) [C]
