PDQ Issues

Issue Number	5184
Summary	Global to add demographic information
Created	2023-01-12 17:35:51
Issue Type	Task
Submitted By	Osei-Poku, William (NIH/NCI) [C]
Assigned To	Kline, Bob (NIH/NCI) [C]
Status	Closed
Resolved	2023-03-31 10:43:47
Resolution	Fixed
Path	/home/bkline/backups/jira/ocecdr/issue.336268

Description

We'd like to run a global change on media documents to add demographic information using the attached spreadsheet. Because this global depends on OCECDR-5160 and OCECDR-5183, we will likely have to wait until those tickets have been installed on PROD before running it there.

Image Demographic Info Spreadsheet_Global_Finalized.xlsx

I will provide you a date to add to all the documents as the Date Last Reviewed date.

Besides the CDR ID and Title, each column corresponds to one of the elements of the demographic information. The exceptions are the new elements (Comment and Date Last Modified), which have not been implemented on PROD yet. We will probably not have any data entered into the Comment field from this global, but we may use the current date (of the global run) as the Date Last Modified.

Comment entered 2023-01-31 11:42:07 by Osei-Poku, William (NIH/NCI) [C]

I am providing a new spreadsheet that includes a date column with data to be populated into the new DateLastReviewed element. As we get closer to running the global on PROD, I will provide a more current spreadsheet.

Image Demographic Info Spreadsheet_Global_01-26-23.xlsx

Comment entered 2023-02-01 07:52:56 by Kline, Bob (NIH/NCI) [C]

https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-02-01_07-12-25

Comment entered 2023-02-03 12:06:41 by Osei-Poku, William (NIH/NCI) [C]

What do these errors mean?

"Element 'SkinTone': [facet 'enumeration'] The value 'Type III[NON-BREAKING SPACE]–[NON-BREAKING SPACE]Darker white skin' is not an element of the set {'Type I......"

Comment entered 2023-02-03 14:10:54 by Kline, Bob (NIH/NCI) [C]

It means that the values in the spreadsheet don't match the valid values in the schema. I suspect this is caused by pasting values from Microsoft Word. I would strongly encourage the use of a plain text editor for working with values which you expect a machine to recognize (as opposed to text intended for human readers). On Windows a good choice is the free Notepad++, but there are lots of suitable choices. If nothing else, you could use Notepad, which is installed in Windows by default (though that program also has its quirks). But using Microsoft Word for this purpose is almost guaranteed to garble your data, as Microsoft is notorious for thinking it knows better than the user what the user really wants. 😛
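
For readers who want to see the mismatch directly, here is a minimal Python sketch of how a cell value could be inspected. The function name `describe_odd_characters` is made up for illustration, and the schema value shown (ordinary spaces around an en dash) is an assumption based on the error message quoted above, not a copy of the actual schema.

```python
import unicodedata


def describe_odd_characters(value):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(value)
        if ord(ch) > 127
    ]


# Value as reported in the error: non-breaking spaces around the dash.
from_spreadsheet = "Type III\u00a0\u2013\u00a0Darker white skin"
# Assumed schema value: ordinary spaces around an en dash.
from_schema = "Type III \u2013 Darker white skin"

# The two strings look identical on screen but do not compare equal.
print(from_spreadsheet == from_schema)
for entry in describe_odd_characters(from_spreadsheet):
    print(entry)
```

The two values render the same way in most fonts, which is why the problem is hard to spot by eye in a spreadsheet cell.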

Comment entered 2023-02-06 11:36:21 by Osei-Poku, William (NIH/NCI) [C]

Image Demographic Info Spreadsheet_Global_02-06-23.xlsx

I copied the data to notepad and copied it back into the spreadsheet. Hopefully, this should resolve the copy and paste issue.

Comment entered 2023-02-06 11:43:54 by Osei-Poku, William (NIH/NCI) [C]

1. It looks like in cases where there are multiple rows for the same document, the program is adding only one block to the CDR document, when we expect two blocks. Please see CDR435997.

2. In cases where there is no entry in the spreadsheet for the Date Last Reviewed element, please do not add the element. Please see CDR790804.
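
The two requests above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual global change script: the wrapper element name `DemographicInformation`, the row keys (`cdr_id`, `skin_tone`, `date_last_reviewed`), and the sample values are all hypothetical. Only `SkinTone` and `DateLastReviewed` are element names taken from this ticket.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_blocks(rows):
    """Group spreadsheet rows by CDR ID, building one block per row."""
    blocks = defaultdict(list)
    for row in rows:
        block = ET.Element("DemographicInformation")  # hypothetical wrapper name
        ET.SubElement(block, "SkinTone").text = row["skin_tone"]
        reviewed = (row.get("date_last_reviewed") or "").strip()
        if reviewed:  # leave the element out entirely when the cell is empty
            ET.SubElement(block, "DateLastReviewed").text = reviewed
        blocks[row["cdr_id"]].append(block)
    return blocks


# Illustrative rows only; the values are placeholders, not spreadsheet data.
rows = [
    {"cdr_id": 435997, "skin_tone": "placeholder A", "date_last_reviewed": "2023-01-26"},
    {"cdr_id": 435997, "skin_tone": "placeholder B", "date_last_reviewed": "2023-01-26"},
    {"cdr_id": 790804, "skin_tone": "placeholder A", "date_last_reviewed": ""},
]
blocks = build_blocks(rows)
print(len(blocks[435997]))  # one block per spreadsheet row
print(blocks[790804][0].find("DateLastReviewed"))  # None: element omitted
```

Keeping a list of blocks per CDR ID, rather than a single block keyed by ID, is what prevents a second row for the same document from overwriting the first.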

Comment entered 2023-02-06 12:21:05 by Kline, Bob (NIH/NCI) [C]

That might not be sufficient. If you did it exactly the way you described, you probably preserved the garbling introduced by Word. (It's not that Notepad—or text editors in general—can't handle Unicode characters. They just won't change what you type behind your back the way Word does.) What I recommend is that you type in exactly what each unique value should be once in the text editor, and then copy and paste the values into the cells of the spreadsheet. Be careful to enter the values EXACTLY the same way you gave them to me for the schema change ticket (with the Unicode EN-DASH characters, not plain ASCII hyphens).

Comment entered 2023-02-06 13:50:49 by Kline, Bob (NIH/NCI) [C]

Tell you what: if you promise that you'll install Notepad++ and NEVER use Word for preparing values which you expect a machine to recognize, I'll write some software to clean up the mess Word made here. What do you say? Does that sound like a fair exchange? 👍👎😃
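
As a rough idea of what such cleanup might involve (a sketch under assumptions, not the software actually written for this ticket), the Python function below replaces Unicode space separators such as the non-breaking space with ordinary spaces and collapses runs of spaces, while leaving the en dash untouched. The name `clean_cell` is hypothetical.

```python
import re
import unicodedata


def clean_cell(value):
    """Normalize spacing in a cell value without touching other characters."""
    # Category "Zs" covers space separators, including U+00A0 NO-BREAK SPACE.
    chars = [" " if unicodedata.category(ch) == "Zs" else ch for ch in value]
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" +", " ", "".join(chars)).strip()


garbled = "Type III\u00a0\u2013\u00a0Darker white skin"
print(clean_cell(garbled) == "Type III \u2013 Darker white skin")
```

A cleanup step like this only repairs spacing. It cannot fix a value that is wrong in some other way, so validating the cleaned values against the schema is still needed.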

Comment entered 2023-02-09 12:04:43 by Osei-Poku, William (NIH/NCI) [C]

I have created a ServiceNow ticket to have Notepad++ installed on my machine.

Comment entered 2023-02-09 12:15:19 by Osei-Poku, William (NIH/NCI) [C]

I was able to install it using elevated permissions.

Comment entered 2023-02-09 12:20:55 by Kline, Bob (NIH/NCI) [C]

Good. I was just composing a comment to say that I was surprised you weren't able to install it yourself, since as part of the preparations for the XMetaL upgrade I had asked one of the senior CBIIT engineers what percentage of the CDR users had the ability to run installation programs using the "Run with elevated permissions" option, and his reply was "all of their machines are set up that way."

Comment entered 2023-02-21 10:12:39 by Osei-Poku, William (NIH/NCI) [C]

The data in the spreadsheet should have been en dashes. Do you want me to correct that and provide another spreadsheet or, as you said, will you write code to take care of it?

Comment entered 2023-02-21 10:36:59 by Kline, Bob (NIH/NCI) [C]

The values already are en dashes. So if that is what they were supposed to be (and not ASCII hyphens), then no correction is necessary.

Comment entered 2023-02-21 13:14:22 by Osei-Poku, William (NIH/NCI) [C]

OK. Got it. It was not clear to me what the next steps were.

Comment entered 2023-02-21 14:01:44 by Kline, Bob (NIH/NCI) [C]

In order to make sure there are no surprises down the road, can you tell me why you want the values to use the Unicode character for an en dash instead of an ASCII hyphen character?

Comment entered 2023-02-21 14:42:31 by Osei-Poku, William (NIH/NCI) [C]

There is no particular preference for one symbol over the other. As long as we are following the schema, there should be no issues. I think the inconsistencies were mostly caused by copying and pasting.

Comment entered 2023-03-02 13:23:06 by Kline, Bob (NIH/NCI) [C]

https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-03-02_12-50-55

Updating CDR806255 failed because the document has no MediaContent block, so the software doesn't know where to put the new blocks.
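
To illustrate why a missing MediaContent block stops the update, here is a minimal Python sketch. It assumes the new blocks are placed as siblings immediately after `MediaContent`, which this ticket does not confirm. The function name, the `DemographicInformation` element, and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def insert_after_media_content(root, new_blocks):
    """Insert new_blocks right after the MediaContent child, or fail loudly."""
    for position, child in enumerate(list(root)):
        if child.tag == "MediaContent":
            for offset, block in enumerate(new_blocks, start=1):
                root.insert(position + offset, block)
            return root
    # Without an anchor element there is no defined place for the new blocks.
    raise ValueError("no MediaContent block; cannot place new blocks")


media = ET.fromstring("<Media><MediaTitle/><MediaContent/><Other/></Media>")
new_blocks = [ET.Element("DemographicInformation") for _ in range(2)]
insert_after_media_content(media, new_blocks)
print([child.tag for child in media])

term = ET.fromstring("<Term><PreferredName/></Term>")
try:
    insert_after_media_content(term, [ET.Element("DemographicInformation")])
except ValueError as error:
    print(error)
```

Raising an error instead of appending the blocks at the end of the document keeps a wrong-type document from being modified silently.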

CDR466552 failed validation with a skin tone value which was a hybrid mutant caused by mixing together parts of two valid values (problem not caused by Microsoft Word).
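
A pre-check along the following lines could catch this kind of value before a test run. It is a sketch only: the row keys, the second entry in `VALID`, and the hybrid value are invented for illustration, and the real set of valid values lives in the schema.

```python
def find_invalid_values(rows, valid_values):
    """Return (row number, CDR ID, value) for each value not in the valid set."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is assumed to be the header
        value = row["skin_tone"]
        if value not in valid_values:
            problems.append((number, row["cdr_id"], value))
    return problems


# The second entry is a hypothetical stand-in, not a confirmed schema value.
VALID = {"Type III \u2013 Darker white skin", "Type IV \u2013 Light brown skin"}
rows = [
    {"cdr_id": 435997, "skin_tone": "Type III \u2013 Darker white skin"},
    # Invented hybrid: the label of one value joined to the text of another.
    {"cdr_id": 466552, "skin_tone": "Type III \u2013 Light brown skin"},
]
problems = find_invalid_values(rows, VALID)
print(problems)
```

Reporting the spreadsheet row number alongside the CDR ID makes it easier to find the cell that needs correcting.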

Comment entered 2023-03-29 17:20:47 by Osei-Poku, William (NIH/NCI) [C]

"Updating CDR806255 failed because the document has no MediaContent block, so the software doesn't know where to put the new blocks."

It looks like CDR806255 is a Term document on DEV but a media document on the upper tiers, which is why it does not have a MediaContent block on DEV.

Comment entered 2023-03-29 18:13:43 by Osei-Poku, William (NIH/NCI) [C]

"CDR466552 failed validation with a skin tone value which was a hybrid mutant caused by mixing together parts of two valid values (problem not caused by Microsoft Word)."

This is noted. It will be corrected before the live runs on QA and PROD.

Comment entered 2023-03-29 18:14:36 by Osei-Poku, William (NIH/NCI) [C]

Please run global in live mode on DEV.

Comment entered 2023-03-30 15:37:51 by Kline, Bob (NIH/NCI) [C]

Live run on DEV complete.

Comment entered 2023-03-30 17:03:41 by Osei-Poku, William (NIH/NCI) [C]

The live run looks good. But I forgot that we needed to add the same data to Spanish documents using the TranslationOf element. Is this something we can include in this global, or would a new ticket for the Spanish global be better?

Comment entered 2023-03-31 10:43:39 by Kline, Bob (NIH/NCI) [C]

Another ticket. Next release.

Comment entered 2023-04-18 10:07:52 by Osei-Poku, William (NIH/NCI) [C]

Looks good on DEV. Please run in test mode on QA. Thanks!

Comment entered 2023-04-19 07:50:18 by Kline, Bob (NIH/NCI) [C]

https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-04-19_07-36-42

Comment entered 2023-04-19 16:24:12 by Osei-Poku, William (NIH/NCI) [C]

Image Demographic Info Spreadsheet_Global_04192023.xlsx

This is the latest file. I am wondering whether you would want to do a test run with this file to make sure no extraneous data is introduced, as happened with the original file.

Comment entered 2023-04-20 08:09:13 by Kline, Bob (NIH/NCI) [C]

Test job run again on QA.

https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-04-20_07-55-13

You do understand, I assume, that the more frequent the requests to repeat jobs with changed requirements, the more incentive there is to delay handling the original requests, right? 😛

Comment entered 2023-04-20 10:34:05 by Osei-Poku, William (NIH/NCI) [C]

Test results look good. Please run in live mode on QA. Thanks!

Comment entered 2023-04-20 15:08:57 by Kline, Bob (NIH/NCI) [C]

QA live run done.

Comment entered 2023-04-26 09:35:29 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2023-05-23 13:06:47 by Osei-Poku, William (NIH/NCI) [C]

Please use this newest file to run the global in test mode on PROD. Thanks!

Image Demographic Info Spreadsheet_Global_Final.xlsx

Comment entered 2023-05-24 07:56:41 by Kline, Bob (NIH/NCI) [C]

https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-05-24_07-29-32

Comment entered 2023-05-25 14:48:16 by Osei-Poku, William (NIH/NCI) [C]

Looks good. Please run in live mode on PROD. Thanks!

Comment entered 2023-05-25 15:36:06 by Kline, Bob (NIH/NCI) [C]

Live mode run on PROD completed.

Comment entered 2023-06-01 11:39:21 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Attachments

File Name	Posted	User
Image Demographic Info Spreadsheet_Global_01-26-23.xlsx	2023-01-31 11:40:12	Osei-Poku, William (NIH/NCI) [C]
Image Demographic Info Spreadsheet_Global_02-06-23.xlsx	2023-02-06 11:34:27	Osei-Poku, William (NIH/NCI) [C]
Image Demographic Info Spreadsheet_Global_04192023.xlsx	2023-04-19 16:23:00	Osei-Poku, William (NIH/NCI) [C]
Image Demographic Info Spreadsheet_Global_Final.xlsx	2023-05-23 13:06:29	Osei-Poku, William (NIH/NCI) [C]
Image Demographic Info Spreadsheet_Global_Finalized.xlsx	2023-01-12 17:33:12	Osei-Poku, William (NIH/NCI) [C]
