CDR Tickets

Issue Number 4538
Summary [ORG] Remove links to deleted orgs and run global
Created 2018-10-22 19:57:50
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2019-04-17 13:53:28
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.234725
Description

Please remove organization links to these two elements and run a global to delete the data from the affected Orgs. because the Org. QC report fails with these links.

1. ResearchBaseFor (child of OrganizationAffiliations) Example 35883 (29310 U CCOP - Upstate Carolina has been deleted but still remains linked to 35883)

2. ParentOrganization - Example 34983 (29106 has been deleted ).

Comment entered 2018-10-23 08:35:14 by Kline, Bob (NIH/NCI) [C]

Just to be clear, we're removing the elements contain the links (not removing "links to these elements").

Comment entered 2018-10-23 11:47:41 by Osei-Poku, William (NIH/NCI) [C]

That is right. Sorry for the confusion.

Comment entered 2018-10-23 18:34:18 by Kline, Bob (NIH/NCI) [C]

When I ran the live global for OCECDR-4518 I noticed that 153 of the saved documents were invalid, so non-publishable versions were created. There were 235 errors for invalid ZIP codes (caused by the bug reported in OCECDR-4529), and another 114 errors "No document version found for link target." Presumably some of the latter errors would have been corrected by the global change for this ticket, but not all of them, I believe. The first of those errors I looked at was in CDR27662, which has a MemberOfCCOP/CCOP/cdr:ref link pointed to a document which has been removed. Should I proceed anyway, knowing that we'll wind up with more unpublishable versions than we'd end up with if all of these anomalies were addressed at once? If not, we'll need to either more this ticket as well as OCECDR-4518 and OCECDR-4529 into Joule and process everything in a single global change job.

Comment entered 2018-10-25 10:40:22 by Osei-Poku, William (NIH/NCI) [C]

I'd prefer we run all of them in a single global change. It is okay to move the tickets into Joule for now.

Comment entered 2018-11-01 13:28:32 by Kline, Bob (NIH/NCI) [C]

On reflection, I can't think of any reason this task (or the related similar task(s) which should be performed along with this) would need to be in a release, as there isn't any part which requires CBIIT assistance (that I can think of). Here are the results of the query I ran to see if there were other elements which need to be dropped because they link to documents which are no longer in the system (omitting schema and mailer documents).

18  /GlossaryTermName/GlossaryTermConcept/@cdr:ref
29  /Organization/OrganizationAffiliations/MemberOfAdHocGroup/@cdr:ref
44  /Organization/OrganizationAffiliations/MemberOfCCOP/CCOP/@cdr:ref
1   /Organization/OrganizationAffiliations/MemberOfCooperativeGroups/AffiliateMemberOf/MainMember/@cdr:ref
20  /Organization/OrganizationAffiliations/MemberOfCooperativeGroups/MainMemberOf/CooperativeGroup/@cdr:ref
170 /Organization/OrganizationAffiliations/ResearchBaseFor/@cdr:ref
378 /Organization/OrganizationDetails/CIPSContactPerson/@cdr:ref
12  /Organization/OrganizationParent/ParentOrganization/@cdr:ref
4   /Organization/PreferredProtocolOrganization/@cdr:ref
1   /PDQBoardMemberInfo/BoardMemberName/@cdr:ref
1   /Person/PersonLocations/OtherPracticeLocation/OrganizationLocation/@cdr:ref
8   /Person/PersonLocations/OtherPracticeLocation/SpecificPostalAddress/PoliticalSubUnit_State/@cdr:ref
2   /Summary/SummaryMetaData/PDQBoard/BoardMember/@cdr:ref
2   /Summary/SummarySection/Para/ProtocolLink/@cdr:ref
1   /Summary/SummarySection/SummarySection/ItemizedList/ListItem/StandardWording/GlossaryTermRef/@cdr:href
2   /Summary/SummarySection/SummarySection/Para/CitationLink/@cdr:ref
1   /Summary/SummarySection/SummarySection/Para/GlossaryTermRef/@cdr:href
Comment entered 2018-11-08 11:19:05 by Osei-Poku, William (NIH/NCI) [C]

These are the elements I propose we should drop:

29 /Organization/OrganizationAffiliations/MemberOfAdHocGroup/@cdr:ref
44 /Organization/OrganizationAffiliations/MemberOfCCOP/CCOP/@cdr:ref
1 /Organization/OrganizationAffiliations/MemberOfCooperativeGroups/AffiliateMemberOf/MainMember/@cdr:ref
20 /Organization/OrganizationAffiliations/MemberOfCooperativeGroups/MainMemberOf/CooperativeGroup/@cdr:ref
170 /Organization/OrganizationAffiliations/ResearchBaseFor/@cdr:ref
378 /Organization/OrganizationDetails/CIPSContactPerson/@cdr:ref
12 /Organization/OrganizationParent/ParentOrganization/@cdr:ref
4 /Organization/PreferredProtocolOrganization/@cdr:ref

Comment entered 2019-04-08 11:06:05 by Englisch, Volker (NIH/NCI) [C]

, I thought you said I could take over this ticket but reading through the comments I'm not certain anymore if we were talking about this one or the related ticket OCECDR-4518? Also, I'm not clear if both of the tickets need to be handled simultaneously.  There is a comment in the other ticket indicating they should be handled at the same time.

Comment entered 2019-04-08 11:28:12 by Kline, Bob (NIH/NCI) [C]

Yes, they should be tackled together. If you feel uncomfortable doing them, save them for me.

Comment entered 2019-04-12 16:23:58 by Englisch, Volker (NIH/NCI) [C]

Please take a look at CDR27272. When I remove the elements CooperativeGroup and MainMember the elements MainMemberOf and AffiliateMemberOf are left with a single child element ProtocolParticipation.  
What would you like to have happen with this element?

Follow-up question: If your answer to the above question is to delete this element what would you like to happen to the - after the removal of it's child - empty parent elements MainMemberOf and AffiliateMemberOf?

Comment entered 2019-04-12 16:26:52 by Kline, Bob (NIH/NCI) [C]

I assume your questions are directed to , right?

Comment entered 2019-04-12 16:40:48 by Englisch, Volker (NIH/NCI) [C]

If you know the answer I would take it from you as well. 🙂

I'm guessing that William will want to think about this for a moment but I can't imagine CIAT would want to keep that element ProtocolParticipation without an organization listed.

Comment entered 2019-04-12 17:18:58 by Osei-Poku, William (NIH/NCI) [C]

We don't need to keep that information for anything. So, please delete ProtocolParticipation, and the empty child elements as well. Thanks.

Comment entered 2019-04-12 17:32:30 by Englisch, Volker (NIH/NCI) [C]
Comment entered 2019-04-12 17:34:30 by Osei-Poku, William (NIH/NCI) [C]

That is right. Thanks!

Comment entered 2019-04-16 16:47:21 by Osei-Poku, William (NIH/NCI) [C]

Per our offline communication, I have reviewed the test results for several documents and they all look good. There was at least one case 37040 where the element MemberOfProfessionalOrganization would be left in the Affiliations block after the global. This is not a linking element. It is just free text so it shouldn't be a problem to leave them in the record. However, I would like to know if there are a lot of documents with this element (above). If there are a lot of documents with the MemberOfProfessionalOrganization element, it would be good to delete the element as part of this global.

Comment entered 2019-04-16 16:56:18 by Osei-Poku, William (NIH/NCI) [C]

For CDR0000029943, it appears to show that the Date Last Modified element and the PdqKey element are being deleted. Am I interpreting this correctly?

  • <DateLastModified>2004-12-29</DateLastModified><PdqKey>186</PdqKey>

Comment entered 2019-04-16 17:05:25 by Englisch, Volker (NIH/NCI) [C]

Are you suggesting to remove the entire block OrganizationAffiliations instead of only removing the org link elements?

It looks like there are around 190 documents with an element MemberOfProfessionalOrganization.

Comment entered 2019-04-16 17:11:40 by Englisch, Volker (NIH/NCI) [C]

I agree, it's easy to think that's what's happening when looking at the diff.  However, when comparing the old and new documents you can see that the DLM is listed because the previous sibling element, the PreferredProtocolOrganization has been deleted and the date and PdqKey are still there.

Comment entered 2019-04-16 17:34:06 by Osei-Poku, William (NIH/NCI) [C]

Yes, let's remove the entire OrganizationAffiliations block since we are deleting nearly everything in it.

Comment entered 2019-04-16 17:34:40 by Osei-Poku, William (NIH/NCI) [C]

Okay. Thanks! I will take a look when you run the global in live mode.

Comment entered 2019-04-16 17:44:46 by Englisch, Volker (NIH/NCI) [C]

I double-checked the schema.  The MemberOfProfessionalOrganziation is the only element left within the OrganizationAffiliations block since we're already removing the element ProtocolParticipation along with the org link.

I'll go ahead and remove the entire block with the next global run.

Comment entered 2019-04-16 18:57:08 by Englisch, Volker (NIH/NCI) [C]

I ran a new TEST global on DEV with the change log output at 2019-04-16 18:18:06.

This version should also have removed all empty OrganizationAffiliations or those containing only MemberOfProfessionalOrganization elements.

Comment entered 2019-04-16 23:30:34 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Please run in live mode on DEV.

Comment entered 2019-04-17 14:09:30 by Englisch, Volker (NIH/NCI) [C]

The global change for org docs ran in LIVE mode on DEV.

There were no errors but a couple of warnings.  There were 192 records for which a non-publishable version was created because of an address validation issue.  All of these warnings are complaining about an invalid ZIP code.  This is related to the Joule ticket OCECDR-4528 (Zip code extension causes documents to fail schema validation).

There are 2 warnings (for the same document: CDR256088) with a message:

 PDQBoardInformation required for PDQ editorial boards

The document is blocked on the DEV server.

Comment entered 2019-04-17 14:19:39 by Englisch, Volker (NIH/NCI) [C]

These are the stats for the LIVE run on DEV:

2019-04-17 12:48:20.545 [INFO] Run completed.
  Docs examined = 388
  Docs changed = 388
  Versions changed = 902
  Could not lock = 0
  Errors = 0
  Time = 0:38:26.442339
 Specific versions saved:
  new cwd = 2
  new pub = 388
  new ver = 126
  old cwd = 6
Comment entered 2019-04-17 16:50:56 by Englisch, Volker (NIH/NCI) [C]

The global has been run in TEST mode on QA.  

Please review the logs on QA at 2019-04-17 16:27:17.

Comment entered 2019-04-22 14:51:22 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Thanks!

Comment entered 2019-04-22 14:52:58 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Please run in live mode on QA.

Comment entered 2019-04-22 14:56:18 by Osei-Poku, William (NIH/NCI) [C]


There were no errors but a couple of warnings.  There were 192 records for which a non-publishable version was created because of an address validation issue.  All of these warnings are complaining about an invalid ZIP code.  This is related to the Joule ticket OCECDR-4528 (Zip code extension causes documents to fail schema validation).

Because of this, we should not run the global in live mode on PROD until OCECDR-4528 is resolved.

Comment entered 2019-04-22 17:46:03 by Englisch, Volker (NIH/NCI) [C]

The job finished running on QA in Live mode:

2019-04-22 16:42:18.711 [INFO] Run completed.
   Docs examined    = 539
   Docs changed     = 539
   Versions changed = 1084
   Could not lock   = 0
   Errors           = 0
   Time             = 0:31:07.019257
Specific versions saved:
  new pub = 539
  new ver = 6
  old cwd = 8
 
Comment entered 2019-04-22 17:58:10 by Englisch, Volker (NIH/NCI) [C]

I'm seeing the same warnings on QA as I had already reported for the DEV run.  Warnings due to the ZIP code issue and two warnings for CDR256088.

Comment entered 2019-04-22 18:22:54 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks! Ready for test mode on PROD and then we would wait until OCECDR-4528  is completed on PROD before proceeding to run this in live mode on PROD.

Comment entered 2019-04-23 16:58:25 by Englisch, Volker (NIH/NCI) [C]

The job in Test mode finished on PROD:

2019-04-22 20:49:41.629 [INFO] Run completed.
   Docs examined    = 3454
   Docs changed     = 0
   Versions changed = 10361
   Could not lock   = 0
   Errors           = 0
   Time             = 2:23:53.092216
 
Comment entered 2019-04-23 16:59:23 by Englisch, Volker (NIH/NCI) [C]

The log files have been copied to the directory:

2019-04-22_18-25-48 

on QA.

Comment entered 2019-04-23 17:33:40 by Englisch, Volker (NIH/NCI) [C]

For my information (so I'll remember when we start with Joule):

The program is located on my DEV-VM at

c:\cygwin64\home\volker\temp\OrgLinks.py 
Comment entered 2019-04-25 13:54:29 by Kline, Bob (NIH/NCI) [C]

We won't proceed any further until OCECDR-4528 is completed.

Comment entered 2019-05-02 12:29:14 by Osei-Poku, William (NIH/NCI) [C]

Verified test results. Thank you!

Comment entered 2019-07-05 06:22:50 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV. Thanks!

Comment entered 2019-08-06 05:15:56 by Osei-Poku, William (NIH/NCI) [C]

It looks like it is time to run the global on QA. This is dependent on OCECDR-4528 and OCECDR-4518 . OCECDR-4528 has been tested and it looks good. But it doesn't look like OCECDR-4528 has been completed on QA yet.

Comment entered 2019-08-06 11:03:40 by Englisch, Volker (NIH/NCI) [C]

It looks like it is time to run the global on QA

According to the comments above the global already ran on DEV and QA and in test mode on PROD.

Comment entered 2019-08-06 12:18:23 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2019-09-09 10:02:35 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Elapsed: 0:00:00.001421