Issue Number | 4538 |
---|---|
Summary | [ORG] Remove links to deleted orgs and run global |
Created | 2018-10-22 19:57:50 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2019-04-17 13:53:28 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.234725 |
Please remove organization links to these two elements and run a global to delete the data from the affected Orgs. because the Org. QC report fails with these links.
1. ResearchBaseFor (child of OrganizationAffiliations) Example 35883 (29310 U CCOP - Upstate Carolina has been deleted but still remains linked to 35883)
2. ParentOrganization - Example 34983 (29106 has been deleted ).
Just to be clear, we're removing the elements contain the links (not removing "links to these elements").
That is right. Sorry for the confusion.
When I ran the live global for OCECDR-4518 I noticed that 153 of the saved documents were invalid, so non-publishable versions were created. There were 235 errors for invalid ZIP codes (caused by the bug reported in OCECDR-4529), and another 114 errors "No document version found for link target." Presumably some of the latter errors would have been corrected by the global change for this ticket, but not all of them, I believe. The first of those errors I looked at was in CDR27662, which has a MemberOfCCOP/CCOP/cdr:ref link pointed to a document which has been removed. Should I proceed anyway, knowing that we'll wind up with more unpublishable versions than we'd end up with if all of these anomalies were addressed at once? If not, we'll need to either more this ticket as well as OCECDR-4518 and OCECDR-4529 into Joule and process everything in a single global change job.
I'd prefer we run all of them in a single global change. It is okay to move the tickets into Joule for now.
On reflection, I can't think of any reason this task (or the related similar task(s) which should be performed along with this) would need to be in a release, as there isn't any part which requires CBIIT assistance (that I can think of). Here are the results of the query I ran to see if there were other elements which need to be dropped because they link to documents which are no longer in the system (omitting schema and mailer documents).
18 /GlossaryTermName/GlossaryTermConcept/@cdr:ref
29 /Organization/OrganizationAffiliations/MemberOfAdHocGroup/@cdr:ref
44 /Organization/OrganizationAffiliations/MemberOfCCOP/CCOP/@cdr:ref
1 /Organization/OrganizationAffiliations/MemberOfCooperativeGroups/AffiliateMemberOf/MainMember/@cdr:ref
20 /Organization/OrganizationAffiliations/MemberOfCooperativeGroups/MainMemberOf/CooperativeGroup/@cdr:ref
170 /Organization/OrganizationAffiliations/ResearchBaseFor/@cdr:ref
378 /Organization/OrganizationDetails/CIPSContactPerson/@cdr:ref
12 /Organization/OrganizationParent/ParentOrganization/@cdr:ref
4 /Organization/PreferredProtocolOrganization/@cdr:ref
1 /PDQBoardMemberInfo/BoardMemberName/@cdr:ref
1 /Person/PersonLocations/OtherPracticeLocation/OrganizationLocation/@cdr:ref
8 /Person/PersonLocations/OtherPracticeLocation/SpecificPostalAddress/PoliticalSubUnit_State/@cdr:ref
2 /Summary/SummaryMetaData/PDQBoard/BoardMember/@cdr:ref
2 /Summary/SummarySection/Para/ProtocolLink/@cdr:ref
1 /Summary/SummarySection/SummarySection/ItemizedList/ListItem/StandardWording/GlossaryTermRef/@cdr:href
2 /Summary/SummarySection/SummarySection/Para/CitationLink/@cdr:ref
1 /Summary/SummarySection/SummarySection/Para/GlossaryTermRef/@cdr:href
These are the elements I propose we should drop:
29
/Organization/OrganizationAffiliations/MemberOfAdHocGroup/@cdr:ref
44
/Organization/OrganizationAffiliations/MemberOfCCOP/CCOP/@cdr:ref
1
/Organization/OrganizationAffiliations/MemberOfCooperativeGroups/AffiliateMemberOf/MainMember/@cdr:ref
20
/Organization/OrganizationAffiliations/MemberOfCooperativeGroups/MainMemberOf/CooperativeGroup/@cdr:ref
170
/Organization/OrganizationAffiliations/ResearchBaseFor/@cdr:ref
378 /Organization/OrganizationDetails/CIPSContactPerson/@cdr:ref
12 /Organization/OrganizationParent/ParentOrganization/@cdr:ref
4 /Organization/PreferredProtocolOrganization/@cdr:ref
~bkline, I thought you said I could take over this ticket but reading through the comments I'm not certain anymore if we were talking about this one or the related ticket OCECDR-4518? Also, I'm not clear if both of the tickets need to be handled simultaneously. There is a comment in the other ticket indicating they should be handled at the same time.
Yes, they should be tackled together. If you feel uncomfortable doing them, save them for me.
Please take a look at CDR27272. When I remove the elements
CooperativeGroup and MainMember the elements
MainMemberOf and AffiliateMemberOf are left with a
single child element ProtocolParticipation.
What would you like to have happen with this element?
Follow-up question: If your answer to the above question is to delete this element what would you like to happen to the - after the removal of it's child - empty parent elements MainMemberOf and AffiliateMemberOf?
I assume your questions are directed to ~oseipokuw, right?
If you know the answer I would take it from you as well. 🙂
I'm guessing that William will want to think about this for a moment but I can't imagine CIAT would want to keep that element ProtocolParticipation without an organization listed.
We don't need to keep that information for anything. So, please delete ProtocolParticipation, and the empty child elements as well. Thanks.
That is right. Thanks!
Per our offline communication, I have reviewed the test results for several documents and they all look good. There was at least one case 37040 where the element MemberOfProfessionalOrganization would be left in the Affiliations block after the global. This is not a linking element. It is just free text so it shouldn't be a problem to leave them in the record. However, I would like to know if there are a lot of documents with this element (above). If there are a lot of documents with the MemberOfProfessionalOrganization element, it would be good to delete the element as part of this global.
For CDR0000029943, it appears to show that the Date Last Modified element and the PdqKey element are being deleted. Am I interpreting this correctly?
<DateLastModified>2004-12-29</DateLastModified><PdqKey>186</PdqKey>
Are you suggesting to remove the entire block OrganizationAffiliations instead of only removing the org link elements?
It looks like there are around 190 documents with an element MemberOfProfessionalOrganization.
I agree, it's easy to think that's what's happening when looking at the diff. However, when comparing the old and new documents you can see that the DLM is listed because the previous sibling element, the PreferredProtocolOrganization has been deleted and the date and PdqKey are still there.
Yes, let's remove the entire OrganizationAffiliations block since we are deleting nearly everything in it.
Okay. Thanks! I will take a look when you run the global in live mode.
I double-checked the schema. The MemberOfProfessionalOrganziation is the only element left within the OrganizationAffiliations block since we're already removing the element ProtocolParticipation along with the org link.
I'll go ahead and remove the entire block with the next global run.
I ran a new TEST global on DEV with the change log output at 2019-04-16 18:18:06.
This version should also have removed all empty OrganizationAffiliations or those containing only MemberOfProfessionalOrganization elements.
Verified on DEV. Please run in live mode on DEV.
The global change for org docs ran in LIVE mode on DEV.
There were no errors but a couple of warnings. There were 192 records for which a non-publishable version was created because of an address validation issue. All of these warnings are complaining about an invalid ZIP code. This is related to the Joule ticket OCECDR-4528 (Zip code extension causes documents to fail schema validation).
There are 2 warnings (for the same document: CDR256088) with a message:
for PDQ editorial boards PDQBoardInformation required
The document is blocked on the DEV server.
These are the stats for the LIVE run on DEV:
2019-04-17 12:48:20.545 [INFO] Run completed.
Docs examined = 388
= 388
Docs changed = 902
Versions changed = 0
Could not lock = 0
Errors Time = 0:38:26.442339
:
Specific versions savednew cwd = 2
new pub = 388
new ver = 126
= 6 old cwd
The global has been run in TEST mode on QA.
Please review the logs on QA at 2019-04-17 16:27:17.
Verified on DEV. Thanks!
Verified on QA. Please run in live mode on QA.
There were no errors but a couple of warnings. There were 192 records for which a non-publishable version was created because of an address validation issue. All of these warnings are complaining about an invalid ZIP code. This is related to the Joule ticket OCECDR-4528 (Zip code extension causes documents to fail schema validation).
Because of this, we should not run the global in live mode on PROD until OCECDR-4528 is resolved.
The job finished running on QA in Live mode:
2019-04-22 16:42:18.711 [INFO] Run completed.
Docs examined = 539
= 539
Docs changed = 1084
Versions changed = 0
Could not lock = 0
Errors Time = 0:31:07.019257
:
Specific versions savednew pub = 539
new ver = 6
= 8
old cwd
I'm seeing the same warnings on QA as I had already reported for the DEV run. Warnings due to the ZIP code issue and two warnings for CDR256088.
Verified on QA. Thanks! Ready for test mode on PROD and then we would wait until OCECDR-4528 is completed on PROD before proceeding to run this in live mode on PROD.
The job in Test mode finished on PROD:
2019-04-22 20:49:41.629 [INFO] Run completed.
Docs examined = 3454
= 0
Docs changed = 10361
Versions changed = 0
Could not lock = 0
Errors Time = 2:23:53.092216
The log files have been copied to the directory:
2019-04-22_18-25-48
on QA.
For my information (so I'll remember when we start with Joule):
The program is located on my DEV-VM at
:\cygwin64\home\volker\temp\OrgLinks.py c
We won't proceed any further until OCECDR-4528 is completed.
Verified test results. Thank you!
Verified on DEV. Thanks!
It looks like it is time to run the global on QA. This is dependent on OCECDR-4528 and OCECDR-4518 . OCECDR-4528 has been tested and it looks good. But it doesn't look like OCECDR-4528 has been completed on QA yet.
It looks like it is time to run the global on QA
According to the comments above the global already ran on DEV and QA and in test mode on PROD.
Verified on QA. Thanks!
Verified on PROD. Thanks!
Elapsed: 0:00:00.001421