Issue Number | 3591 |
---|---|
Summary | Insert Purpose text element and text in patient summaries |
Created | 2013-03-14 10:45:39 |
Issue Type | Improvement |
Submitted By | Beckwith, Margaret (NIH/NCI) [E] |
Assigned To | Beckwith, Margaret (NIH/NCI) [E] |
Status | Closed |
Resolved | 2013-07-11 21:35:48 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107919 |
BZISSUE::5290
BZDATETIME::2013-03-14 10:45:39
BZCREATOR::Margaret Beckwith
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku
We are getting ready to add an About This PDQ Summary section to the patient summaries, and the first paragraph has text inserted into it from the Purpose Text element in the metadata section. We would like to programmatically have the element inserted into the Patient summaries, and copy the text from the HP metadata to the Patient metadata.
BZDATETIME::2013-05-09 17:55:39
BZCOMMENTOR::Robin Juthe
BZCOMMENT::1
Revising priority after the discussion in today's CDR meeting.
BZDATETIME::2013-05-20 15:10:26
BZCOMMENTOR::Alan Meyer
BZCOMMENT::2
I've attached a list of 329 documents on Bach that are Patient Summaries and have Active status.
I plan to process all of these, doing the following:
  Look for a PatientVersionOf element.
  If it's not found
    Log the information and ignore the document
  Else
    Look for a PurposeText element
    If it's found:
      Log the information and ignore the document
    Else
      Locate the Health Professional Summary linked by the cdr:ref
      If it's not found
        Log the information and ignore the document
      Else
        Copy the PurposeText from the HP to the Patient Summary.
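The steps above can be sketched in Python roughly as follows. This is only an illustration and not the ModifyDocs code itself: the namespace URI for the cdr: prefix, the fetch_hp_xml callable, and the choice to append PurposeText to SummaryMetaData are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the cdr: prefix; verify against real documents.
CDR_NS = "cips.nci.nih.gov/cdr"
REF = f"{{{CDR_NS}}}ref"
ET.register_namespace("cdr", CDR_NS)


def copy_purpose_text(patient_xml, fetch_hp_xml, log):
    """Return the modified patient XML, or None if the document is skipped.

    fetch_hp_xml is a hypothetical callable mapping a cdr:ref value to the
    HP Summary XML (or None when that document cannot be found).
    """
    patient = ET.fromstring(patient_xml)
    link = patient.find(".//PatientVersionOf")
    if link is None:
        log("no PatientVersionOf element")
        return None
    if patient.find(".//PurposeText") is not None:
        log("PurposeText already present")
        return None
    ref = link.get(REF)
    hp_xml = fetch_hp_xml(ref) if ref else None
    if hp_xml is None:
        log("HP Summary not found")
        return None
    purpose = ET.fromstring(hp_xml).find(".//PurposeText")
    if purpose is None:
        log("HP Summary has no PurposeText")
        return None
    # Placement is an assumption: append to SummaryMetaData when present.
    target = patient.find("SummaryMetaData")
    if target is None:
        target = patient
    target.append(purpose)
    return ET.tostring(patient, encoding="unicode")
```

Because a document that already has a PurposeText is skipped, running the function a second time on its own output changes nothing.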
I'll use the standard ModifyDocs global change software to store the results so that all of the versions that need to be updated (current, last, and last publishable) are updated.
There are a few questionable items in the attached list, including one (CDR740913) with the word "BLOCKED" in the title, though the status is still 'A'=Active. Looking at the document history on that one I wonder if there was a typo somewhere and the wrong document was blocked, or something equally undesirable happened. However I don't think any harm should be done if we process this or any of the unfinished docs.
Please let me know if you see any red flags in the list or if you need me to process the blocked documents too.
Attachment titles.txt has been added with description: Patient Summaries that will be examined and processed
BZDATETIME::2013-05-20 15:16:08
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::3
I think this proposal looks good Alan. I would like William to take a look at it and make sure there isn't anything that I missed from the CIAT point of view. Also, I have a question about publishing--I assume that this change means that we will have to publish all of the patient summaries. Is that a problem?
BZDATETIME::2013-05-20 16:06:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::4
(In reply to comment #3)
> ... Also, I have a question about publishing--I assume that this change
> means that we will have to publish all of the patient summaries. Is that a
> problem?
Yes, all 329 Patient Summaries could be republished. The only ones that would not be republished are those that have an error of some kind, or for which a PurposeText is missing in the HP Summary.
I checked with Blair and Volker about this. The impact on cancer.gov will be especially significant because there are two versions of each Patient Summary, one regular and one mobile. They would therefore propose to use the same technique that was used a few weeks ago when the address change occurred in every summary, namely:
Do the weekly publishing.
Then publish small batches nightly and, if necessary, during the day until all are done.
If necessary, we could also accomplish something similar by limiting the scope of the global change and re-running it each day to process another batch of documents. I think my design can accommodate that if I just put in a command line parameter to limit the number of published docs, e.g., 50 one day, then 100 the next (50 of which will be skipped because they already have PurposeText), then 150, and so on. It won't be hard to add that.
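The growing-limit scheme described above can be illustrated with a small simulation. The function names and the in-memory set of finished documents are hypothetical; in the real job the "already done" test would be the presence of a PurposeText element in the document.

```python
def run_batch(doc_ids, already_done, limit):
    """Examine the first `limit` IDs and return those still needing the change."""
    return [doc_id for doc_id in doc_ids[:limit] if doc_id not in already_done]


def simulate(doc_ids, limits):
    """Re-run the job once per limit and report how many docs each run changes."""
    done = set()
    changed_per_run = []
    for limit in limits:
        batch = run_batch(doc_ids, done, limit)
        done.update(batch)  # these now have PurposeText and will be skipped
        changed_per_run.append(len(batch))
    return changed_per_run
```

With 329 documents and limits of 50, 100, and 150, each run changes 50 documents, because the documents handled on earlier days are skipped.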
BZDATETIME::2013-05-20 17:39:47
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5
I have put a parameter on the command line to limit the number of documents to be processed at once.
I plan to take the PurposeText from the current working version of the HP Summary. I expect that these are all identical to the PurposeText in the last publishable version, but in the unlikely event that they aren't, this will give us the most up to date PurposeText.
As I say, I don't think it makes a difference, but if it does, someone let me know.
BZDATETIME::2013-05-21 13:30:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6
(In reply to comment #2)
> There are a few questionable items in the attached list, including one
> (CDR740913) with the word "BLOCKED" in the title, though the status is still
> 'A'=Active. Looking at the document history on that one I wonder if there was
The above document has now been blocked. Also, we've reviewed the list of summaries and the steps you're using to run the global and didn't see any problems with it.
BZDATETIME::2013-05-30 22:50:24
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7
The global change is finished and tested. A log file of a run on OCE Mahler is attached.
It turns out that between the English and Spanish, and the various versions (current, last, last publishable) 877 documents would have had to be edited plus we'd have had to examine up to 912 versions and several hundred corresponding HP Summaries, so it really would have been a chore to do it manually.
I think that each of the errors I tested for actually occurred. There were some patient Summaries with no PatientVersionOf element, some that had the element but no cdr:ref to the HP Summary, some that already had a PurposeText, some HP Summaries without one, and so on.
The info on each of these is in the log file. For some of these cases, if they occur on Bach as well, I assume they'll need to be edited by hand.
Please examine the log file and see the global change test results at:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2013-05-30_22-16-03
Attachment Request5290.log has been added with description: Log file from test run on Mahler
BZDATETIME::2013-06-03 20:57:02
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8
I'll be on vacation from June 12 - June 30.
It might be best for us to test this soon so that I can run it on Bach before I leave. If we wait until too late CBIIT will freeze it out at least until July 15, if not later.
BZDATETIME::2013-06-04 10:44:18
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::9
William, are you going to take a look at the log file? I can also look at a few records. I would like to get this completed before Alan goes on vacation, and I will also be out between June 14 and June 17.
BZDATETIME::2013-06-04 12:59:48
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10
(In reply to comment #9)
> William, are you going to take a look at the log file? I can also look at a
> few records. I would like to get this completed before Alan goes on vacation,
> and I will also be out between June 14-June 17.
Yes. I will take a look and post my comments by the end of the day.
BZDATETIME::2013-06-04 13:18:56
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::11
Thanks, William. I've (very briefly) looked at a few and they seem to look okay. I also looked at a few that had the "error messages", and they seem to make sense (e.g. the templates). After you have done your review it might make sense to run it on Franck, but let me know what you think.
BZDATETIME::2013-06-04 14:46:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
I've reviewed several of them and agree with Margaret's findings. The errors are coming from the templates and one CAM summary which doesn't have an HP version. Could you please run it in test mode on Franck first? After we've reviewed it, you can run it in live mode on Franck, before we do the same on Bach.
BZDATETIME::2013-06-05 00:34:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13
I've started the job on Franck in test mode. I'll go home now but I'll check it tomorrow and post the logfile here in Bugzilla with info on what happened.
BZDATETIME::2013-06-05 10:31:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::14
The test is complete on Franck. The log file is attached.
Attachment Request5290.log has been added with description: Log file from test mode run on Franck
BZDATETIME::2013-06-05 11:59:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15
(In reply to comment #14)
> Created attachment 2325 [details]
> Log file from test mode run on Franck
>
> The test is complete on Franck. The log file is attached.
I have looked at the log file and reviewed all errors. I didn't see anything that was not expected. Please run in live mode on Franck.
BZDATETIME::2013-06-05 16:00:30
BZCOMMENTOR::Alan Meyer
BZCOMMENT::16
The live run is complete. Log file is attached.
Attachment Request5290live.log has been added with description: Log file for live run on Franck
BZDATETIME::2013-06-05 17:36:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17
(In reply to comment #16)
> Created attachment 2326 [details]
> Log file for live run on Franck
>
> The live run is complete. Log file is attached.
I've reviewed the log file and I didn't find any new problems. We will review some of the documents in XMetal tomorrow morning before we proceed with a test run on Bach.
BZDATETIME::2013-06-06 11:47:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::18
(In reply to comment #17)
> (In reply to comment #16)
> > Created attachment 2326 [details]
> > Log file for live run on Franck
> >
> > The live run is complete. Log file is attached.
>
> I've reviewed the log file and I didn't find any new problems. We will review
> some of the documents in XMetal tomorrow morning before we proceed with a test
> run on Bach.
The Global appears to be working fine. But I found a problem with at least one of the Spanish summaries where the text is copied from the English HP instead of the Spanish HP. For example: 470863. It appears to be a user error. Users are selecting the English HP in the PatientVersionOf element instead of the Spanish HP version (fixed in the above referenced summary now). I assume you're using these elements to determine which summary you copy the text from? Is it possible for you to check for this error and report such cases?
BZDATETIME::2013-06-06 11:59:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::19
(In reply to comment #18)
> ... Users are
> selecting the English HP in the PatientVersionOf element instead of the Spanish
> HP version (fixed in the above referenced summary now). I assume you're using
> these elements to determine which summary you copy the text from? Is it
> possible for your check for this error and report them?
Yes, I can do that and report it in the log file without updating the Patient Summary. It shouldn't be hard to do. I'll do it before we try the next test run. I'll look at the /Summary/SummaryLanguage element and make sure that they match in the Patient and HP summaries.
Will you want another test run on Franck or should we just do this as a test run on Bach?
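The language check mentioned above might look something like this sketch. The thread names the /Summary/SummaryLanguage element; the example searches for SummaryLanguage anywhere under the root so that it does not depend on the exact path, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def summary_language(xml_text):
    """Return the stripped SummaryLanguage text, or None if it is absent."""
    node = ET.fromstring(xml_text).find(".//SummaryLanguage")
    if node is None or node.text is None:
        return None
    return node.text.strip()


def languages_match(patient_xml, hp_xml):
    """True only when both summaries declare the same language."""
    patient_lang = summary_language(patient_xml)
    hp_lang = summary_language(hp_xml)
    return patient_lang is not None and patient_lang == hp_lang
```

A mismatch (or a missing language on either side) would be logged and the Patient Summary left unchanged.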
BZDATETIME::2013-06-06 12:01:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::20
(In reply to comment #19)
> Will you want another test run on Franck or should we just do this as a test
> run on Bach?
Thanks.
Test run on Bach should be fine.
BZDATETIME::2013-06-06 23:26:49
BZCOMMENTOR::Alan Meyer
BZCOMMENT::21
I made the requested change to check for language mismatches and ran the program in test mode on Bach. I figured that Franck wouldn't work for testing because we've already run in live mode there and changed most of the Patient Summaries, and running in test mode on Bach wouldn't hurt anything. So that's what I did.
There were three records that had the language mismatch. For one of them the error is reported twice, which is also the case for a different error. I don't know why that happens. I started to try to figure it out but decided that there were more important things to do.
The log file is attached.
The document outputs can be found at:
http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2013-06-06_22-50-52
Attachment Request5290test.log has been added with description: Log file for test run on Bach
BZDATETIME::2013-06-07 10:06:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::22
I have reviewed the log file and all of the errors have been fixed on Bach. Please proceed with the live run on Bach.
The global identified 6 summaries that had user errors. The errors have the potential of messing up the toggle on Cancer.gov. So, I am wondering if we can reuse your selection criteria by turning it into an ad hoc query that I can run periodically to identify such errors. You don't have to do this right away and we can talk about this later. Another option could be to have the publishing filters identify these errors and report them after publishing.
BZDATETIME::2013-06-07 14:55:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::23
I'll coordinate with Volker on the live run so we don't dump all the modified summaries at once in tonight's publishing job.
The way that the indexes are currently defined, it would not be possible to do the tests that I did using a SQL query. I did them by loading all of the patient summaries, parsing them, finding the corresponding hp summaries, parsing them, and then comparing values. That would run for about 10-15 minutes.
If we index the PatientVersionOf element, which would have no negative effects that I can think of, we can do the language comparison as a query, or possibly even as a part of the validation for Patient Summaries. Some of the others might be doable too.
However, since it wouldn't just be a simple extract of something from this program, I recommend we create a new issue and list all of the new checks we want, either as validations or as a query report.
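As a rough idea of what the language comparison could look like as a query once PatientVersionOf is indexed, here is a sketch run through Python's sqlite3 module. Everything about the schema is an assumption for illustration: a query_term table with doc_id, path, value, and int_val columns, int_val holding the numeric ID of the linked document, and the two path strings.

```python
import sqlite3

# Hypothetical index paths and table layout, for illustration only.
LINK_PATH = "/Summary/SummaryMetaData/PatientVersionOf/@cdr:ref"
LANG_PATH = "/Summary/SummaryLanguage"

# Join each patient summary's link to its own language row and to the
# language row of the HP summary it points at; keep only mismatches.
MISMATCH_SQL = """
SELECT link.doc_id, link.int_val
  FROM query_term AS link
  JOIN query_term AS pat_lang
    ON pat_lang.doc_id = link.doc_id AND pat_lang.path = ?
  JOIN query_term AS hp_lang
    ON hp_lang.doc_id = link.int_val AND hp_lang.path = ?
 WHERE link.path = ?
   AND pat_lang.value <> hp_lang.value
 ORDER BY link.doc_id
"""


def find_language_mismatches(conn):
    """Return (patient_doc_id, hp_doc_id) pairs whose languages differ."""
    params = (LANG_PATH, LANG_PATH, LINK_PATH)
    return conn.execute(MISMATCH_SQL, params).fetchall()
```

The same join could in principle back a validation rule or an ad hoc report; which checks belong where is a question for the new issue.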
BZDATETIME::2013-06-07 14:57:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::24
Volker,
Should I run the global change on Bach this weekend, doing the whole 360 or so Patient Summaries? Or would it be easier to just do 50 at a time for a while starting today before the publishing run?
BZDATETIME::2013-06-07 15:12:26
BZCOMMENTOR::Volker Englisch
BZCOMMENT::25
I would prefer if you'd run the full set of summaries at once and then give me the log file or the list of CDR-IDs (either way is fine with me). If you're running the entire set I could start hot-fixing summaries in small batches over the weekend without having to request from you to run additional batches.
Publishing starts at 4pm. Depending on the load that your process adds to the system you could do the conversion any time after 4:01pm. If you'd rather wait until publishing has completed you should be able to run the conversion at any time after midnight tonight.
BZDATETIME::2013-06-07 15:16:01
BZCOMMENTOR::Alan Meyer
BZCOMMENT::26
I'll run the program after publishing starts.
On an empty machine, it ran in 18 minutes on Bach in test mode, say 25 in live mode. So at worst it should add 25 minutes to the publishing run, but it probably won't be that bad.
BZDATETIME::2013-06-07 15:41:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::27
Volker, unless I am getting confused, I thought there were supposed to be changes to the vendor filters with regard to the purpose text. But we haven't done any changes or tests yet.
BZDATETIME::2013-06-07 15:43:43
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::28
That's true. This text will be used in the About This PDQ Summary section when we get that finished and test publishing it. Right now, the element and text is just being added to the summary but it won't show up anywhere on Cancer.gov.
BZDATETIME::2013-06-07 15:44:59
BZCOMMENTOR::Volker Englisch
BZCOMMENT::29
That's right. If the purpose text isn't used for anything except for those future changes we won't have to worry about publishing until the filter changes are in place.
In that case we won't have to worry about a load of summaries being updated.
BZDATETIME::2013-06-09 12:24:59
BZCOMMENTOR::Alan Meyer
BZCOMMENT::30
I've made the live run on Bach and attached the log file.
There were a couple of validation warnings and 22 errors in which the Patient Summary did not have a PatientVersionOf element. I assume those were expected. If and when those are all fixed it should be safe to re-run the global if desired in order to populate the PurposeText for those documents. Summaries already modified by the global won't be modified again.
Now the trick will be to remember that this has been done some time in the future when the new PurposeText is actually used in publishing so that we hotfix them into cancer.gov in small numbers rather than publish them all at once. However, it's likely that we will remember since the publishing filter will be changed - which would likely cause the same problem whether or not we had made this particular fix.
Attachment Request5290live.log has been added with description: Log file for live run on Bach
I haven't heard any more about this in a while. Can we close the issue?
Work is complete.
File Name | Posted | User |
---|---|---|
Request5290.log | 2013-06-05 10:31:11 | |
Request5290.log | 2013-05-30 22:50:24 | |
Request5290live.log | 2013-06-09 12:24:59 | |
Request5290live.log | 2013-06-05 16:00:30 | |
Request5290test.log | 2013-06-06 23:26:49 | |
titles.txt | 2013-05-20 15:10:26 | |