Issue Number | 5184 |
---|---|
Summary | Global to add demographic information |
Created | 2023-01-12 17:35:51 |
Issue Type | Task |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-03-31 10:43:47 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.336268 |
We'd like to run a global change on media documents to add demographic information using the attached spreadsheet. We will likely have to wait until OCECDR-5160 and OCECDR-5183 have been installed on PROD before running this global there, because it depends on both tickets.
Image Demographic Info Spreadsheet_Global_Finalized.xlsx
I will provide you with a date to add to all the documents as the Date Last Reviewed value.
Besides the CDR ID and Title provided, each of the columns corresponds to one of the elements of the demographic information, with the exception of the new ones (Comment and Date Last Modified), which have not been implemented on PROD yet. We will probably not have any data entered into the Comment field from this global, but we may use the current date (of the global run) as the Date Last Modified.
I am providing a new spreadsheet that includes a date column with data to be populated into the new DateLastReviewed element. As we get closer to running the global on PROD, I will provide a more current spreadsheet.
What do these errors mean?
"Element 'SkinTone': [facet 'enumeration'] The value 'Type III[NON-BREAKING SPACE]–[NON-BREAKING SPACE]Darker white skin' is not an element of the set {'Type I......"
It means that the values in the spreadsheet don't match the valid values in the schema. I suspect this is caused by pasting values from Microsoft Word. I would strongly encourage the use of a plain text editor for working with values which you expect a machine to recognize (as opposed to text intended for human readers). On Windows a good choice is the free Notepad++, but there are lots of suitable choices. If nothing else, you could use Notepad, which is installed in Windows by default (though that program also has its quirks). But using Microsoft Word for this purpose is almost guaranteed to garble your data, as Microsoft is notorious for thinking it knows better than the user what the user really wants. 😛
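For anyone who wants to see exactly which characters are hiding in a rejected value, here is a minimal sketch using only the Python standard library. The function name `describe_chars` is made up for illustration; the sample string is the value quoted in the error message above, with the non-breaking spaces written as escapes.

```python
import unicodedata


def describe_chars(value):
    """List (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in value
        if ord(ch) > 127
    ]


# The rejected value from the validation error, with invisible characters
# spelled out as escapes so they can be seen in the source.
rejected = "Type III\u00a0\u2013\u00a0Darker white skin"

for _, code_point, name in describe_chars(rejected):
    print(code_point, name)
```

Running a check like this on a cell value shows at a glance whether a space is a plain ASCII space or a no-break space, which is impossible to tell by eye in the spreadsheet.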
Image Demographic Info Spreadsheet_Global_02-06-23.xlsx
I copied the data to notepad and copied it back into the spreadsheet. Hopefully, this should resolve the copy and paste issue.
1. It looks like in cases where there are multiple rows for the same document, the program is adding only one row to the CDR, when we expect two blocks. Please see CDR435997.
2. In cases where there is no entry in the spreadsheet for the Date Last Reviewed element, please do not add the element. Please see CDR790804.
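The two requirements above could be sketched roughly as follows. This is not the actual global change script; the function name `group_rows`, the `cdr_id` key, and the cell values are placeholders for illustration. Only the element names `SkinTone` and `DateLastReviewed` and the two CDR IDs come from this ticket.

```python
from collections import defaultdict


def group_rows(rows):
    """Group spreadsheet rows by CDR ID, keeping one entry per row.

    Empty cells are dropped so that no element is created for them.
    """
    grouped = defaultdict(list)
    for row in rows:
        fields = {
            name: value
            for name, value in row.items()
            if name != "cdr_id" and value is not None and str(value).strip()
        }
        grouped[row["cdr_id"]].append(fields)
    return dict(grouped)


# Placeholder rows: two for one document, one with no review date.
rows = [
    {"cdr_id": 435997, "SkinTone": "value A", "DateLastReviewed": "2023-02-06"},
    {"cdr_id": 435997, "SkinTone": "value B", "DateLastReviewed": "2023-02-06"},
    {"cdr_id": 790804, "SkinTone": "value C", "DateLastReviewed": ""},
]

grouped = group_rows(rows)
```

With this shape, a document that appears on two rows yields two blocks, and a blank Date Last Reviewed cell simply never becomes a key in that block's fields.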
That might not be sufficient. If you did it exactly the way you described, you probably preserved the garbling introduced by Word. (It's not that Notepad—or text editors in general—can't handle Unicode characters. They just won't change what you type behind your back the way Word does.) What I recommend is that you type in exactly what each unique value should be once in the text editor, and then copy and paste the values into the cells of the spreadsheet. Be careful to enter the values EXACTLY the same way you gave them to me for the schema change ticket (with the Unicode EN-DASH characters, not plain ASCII hyphens).
Tell you what: if you promise that you'll install Notepad++ and NEVER use Word for preparing values which you expect a machine to recognize, I'll write some software to clean up the mess Word made here. What do you say? Does that sound like a fair exchange? 👍👎😃
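A cleanup of that kind might look something like the sketch below. It is only an illustration, not the software actually written for this ticket. The `VALID_VALUES` list is a hypothetical stand-in for the schema's enumeration, and the assumption that the valid form uses plain spaces around the en dash is mine; the names `canonical_key` and `clean` are invented.

```python
import unicodedata

# Hypothetical stand-in for the schema's valid values (assumed to use
# ordinary spaces around an EN DASH).
VALID_VALUES = ["Type III \u2013 Darker white skin"]


def canonical_key(value):
    """Build a comparison key that ignores whitespace and dash variants."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the no-break space, so this collapses all of it.
    collapsed = " ".join(value.split())
    # Treat every dash-like character (Unicode category Pd) as the same.
    return "".join(
        "-" if unicodedata.category(ch) == "Pd" else ch for ch in collapsed
    )


LOOKUP = {canonical_key(valid): valid for valid in VALID_VALUES}


def clean(value):
    """Return the matching valid value, or None if nothing matches."""
    return LOOKUP.get(canonical_key(value))
```

Returning `None` for anything that does not map to a valid value keeps the cleanup from silently guessing, so a genuinely wrong value still surfaces as a validation failure.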
I have created a ServiceNow ticket to have Notepad++ installed on my machine.
I was able to install it using elevated permissions.
Good. I was just composing a comment to say that I was surprised you weren't able to install it yourself, since as part of the preparations for the XMetaL upgrade I had asked one of the senior CBIIT engineers what percentage of the CDR users had the ability to run installation programs using the "Run with elevated permissions" option, and his reply was "all of their machines are set up that way."
The data in the spreadsheet should have been en dashes. Do you want me to correct that and provide another spreadsheet, or, as you said below, will you write code to take care of it?
The values already are en dashes. So if that's what they were supposed to be (and not ASCII hyphens) then no correction is necessary.
OK. Got it. It was not clear to me what the next steps were.
In order to make sure there are no surprises down the road, can you tell me why you want the values to use the Unicode character for an en dash instead of an ASCII hyphen character?
There is no particular preference of one symbol over the other. As long as we are following the schema, there should be no issues. I think inconsistencies were mostly caused by copy and pastes.
https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-03-02_12-50-55
Updating CDR806255 failed because the document has no MediaContent block, so the software doesn't know where to put the new blocks.
CDR466552 failed validation with a skin tone value which was a hybrid mutant caused by mixing together parts of two valid values (problem not caused by Microsoft Word).
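To illustrate the first failure, here is a rough sketch of an insertion step that refuses to proceed when the anchor block is missing. It is not the real global change code. It assumes that `MediaContent` is a direct child of the document root and that the new blocks go immediately after it; the block name `DemographicInformation` and the function name `add_blocks` are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_blocks(root, blocks, block_name="DemographicInformation"):
    """Insert one new block per entry right after the MediaContent block."""
    anchor = root.find("MediaContent")
    if anchor is None:
        # Nothing to position the new blocks against, so report a failure
        # instead of guessing a location.
        raise ValueError("document has no MediaContent block")
    position = list(root).index(anchor) + 1
    for offset, fields in enumerate(blocks):
        block = ET.Element(block_name)
        for name, value in fields.items():
            ET.SubElement(block, name).text = value
        root.insert(position + offset, block)
    return root
```

Failing loudly in this case is what lets a test run flag a document such as a non-media document that ended up in the spreadsheet, rather than writing blocks into an arbitrary place.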
Updating CDR806255 failed because the document has no MediaContent block, so the software doesn't know where to put the new blocks.
It looks like CDR806255 is a Term document on DEV but a media document on the upper tiers, which is why it does not have a MediaContent block on DEV.
CDR466552 failed validation with a skin tone value which was a hybrid mutant caused by mixing together parts of two valid values (problem not caused by Microsoft Word).
This is noted. Will be corrected before live run on QA and PROD.
Please run global in live mode on DEV.
Live run on DEV complete.
The live run looks good. But I forgot that we needed to add the same data to Spanish documents using the TranslationOf element. Is this something we can include in this global, or would a new ticket for the Spanish global be better?
Another ticket. Next release.
Looks good on DEV. Please run in test mode on QA. Thanks!
Image Demographic Info Spreadsheet_Global_04192023.xlsx
This is the latest file. I am wondering if you would want to do a test run with this file to confirm that no extraneous data is introduced, as happened with the original file.
Test job run again on QA.
https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-04-20_07-55-13
You do understand, I assume, that the more frequent the requests to repeat jobs with changed requirements, the more incentive there is to delay handling the original requests, right? 😛
Test results look good. Please run in live mode on QA. Thanks!
QA live run done.
Verified on QA. Thanks!
Please use this newest file to run the global in test mode on PROD. Thanks!
Looks good. Please run in live mode on PROD. Thanks!
Live mode run on PROD completed.
Verified on PROD. Thanks
File Name | Posted | User |
---|---|---|
Image Demographic Info Spreadsheet_Global_01-26-23.xlsx | 2023-01-31 11:40:12 | Osei-Poku, William (NIH/NCI) [C] |
Image Demographic Info Spreadsheet_Global_02-06-23.xlsx | 2023-02-06 11:34:27 | Osei-Poku, William (NIH/NCI) [C] |
Image Demographic Info Spreadsheet_Global_04192023.xlsx | 2023-04-19 16:23:00 | Osei-Poku, William (NIH/NCI) [C] |
Image Demographic Info Spreadsheet_Global_Final.xlsx | 2023-05-23 13:06:29 | Osei-Poku, William (NIH/NCI) [C] |
Image Demographic Info Spreadsheet_Global_Finalized.xlsx | 2023-01-12 17:33:12 | Osei-Poku, William (NIH/NCI) [C] |