Issue Number | 3885 |
---|---|
Summary | [Summary] Enhancements for NLM/PubMed |
Created | 2015-03-23 16:30:07 |
Issue Type | New Feature |
Submitted By | Beckwith, Margaret (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2015-06-17 16:17:16 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.157384 |
We are working with NLM to make the PDQ summaries available through
PubMed. In order to do that we agreed that we will add two elements to
the Summary Metadata that we export.
1. Editorial Board Name ("PDQBoard" already exists but we need to add it
to the vendor output).
2. PubMed Information (wrapper) with PubMed Abstract and PubMed Key
Words.
We will populate the Abstract information with text from a miscellaneous document, probably using what we have in the About This PDQ Summary section/Purpose and Reviewers/Updates (first paragraph) to begin with.
Obviously we need to see how this fits in with the various CDR releases in the works.
Decided in status meeting 2015-04-16:
A new abstract block and multiply-occurring keyword elements will be added to the schema and populated by a global change which pulls in text from a miscellaneous document (for the abstract) and from the main and secondary topics in the summary metadata (for the keywords).
The new element names will be SummaryKeyWords (with SummaryKeyWord children) and SummaryAbstract. Both of these will be contained in the SummaryMetaData block (at the end).
The SummaryAbstract will contain inline markup, and possibly multiple paragraphs. Details to be decided.
After the new elements have been seeded by the global change the data will be maintained by hand.
The new keyword elements will not use a controlled value list.
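A rough sketch of the keyword half of the seeding step described above, for discussion only. It uses the standard library's ElementTree rather than whatever the real global change script uses, and the `MainTopics/Term` and `SecondaryTopics/Term` paths are assumptions about where the topic names live in the metadata; only `SummaryMetaData`, `SummaryKeyWords` and `SummaryKeyWord` come from this ticket.

```python
import xml.etree.ElementTree as ET

# Assumed locations of the topic names inside SummaryMetaData (hypothetical).
TOPIC_PATHS = ("MainTopics/Term", "SecondaryTopics/Term")

def add_keywords(summary_xml, topic_paths=TOPIC_PATHS):
    """Append a SummaryKeyWords block to the end of SummaryMetaData."""
    root = ET.fromstring(summary_xml)
    meta = root.find("SummaryMetaData")
    if meta is None or meta.find("SummaryKeyWords") is not None:
        return summary_xml  # no metadata block, or already seeded
    keywords = []
    for path in topic_paths:
        for node in meta.findall(path):
            text = "".join(node.itertext()).strip()
            if text and text not in keywords:
                keywords.append(text)  # keep first occurrence, drop repeats
    if not keywords:
        return summary_xml  # nothing to seed from
    # SubElement appends, so the new block lands at the end of the metadata.
    block = ET.SubElement(meta, "SummaryKeyWords")
    for word in keywords:
        ET.SubElement(block, "SummaryKeyWord").text = word
    return ET.tostring(root, encoding="unicode")
```

Because the function returns the document unchanged when the block is already present, rerunning it should not clobber keywords that have since been edited by hand.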
Two questions:
How do I know which miscellaneous document should be used for the abstract?
Have the details been decided for what markup is to be allowed in the SummaryAbstract block?
/Summary/SummaryMetaData/SummaryKeyWords/SummaryKeyWord
has been added to the summary schema (on DEV), as well as to the query term definition table. Holding off on work on the abstract part while waiting for guidance on the questions in the previous comment.
I have written the half of the global change script which will populate the new SummaryKeyWords block, and it's running on DEV in test mode. Even though it hasn't finished yet, you can see the results for the documents that have already been processed:
https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?Session=guest
Will the information "added to the vendor output" (see original request description) also be sent to cancer.gov? If so, we may need to spread the global change out over a block of time long enough that the number of re-published summaries that go out for any one job doesn't overwhelm GateKeeper and Percussion. If not, we'll probably want Volker to test that the resulting documents after stripping the new parts out won't cause the publishing system to think the summary documents have been modified.
Hi Bob,
Margaret and I met and have decided to use the following text to initially populate the abstract field. The About This PDQ Summary miscellaneous document (CDR684055) is already pulling this text together. It is the first two paragraphs of this section.
This PDQ cancer information summary for health professionals provides comprehensive, peer-reviewed, evidence-based information about @@PURPOSE TEXT@@. It is intended as a resource to inform and assist clinicians who care for cancer patients. It does not provide formal guidelines or recommendations for making health care decisions.
This summary is reviewed regularly and updated as necessary by the @@BOARD NAME@@, which is editorially independent of the National Cancer Institute (NCI). The summary reflects an independent review of the literature and does not represent a policy statement of NCI or the National Institutes of Health (NIH).
We are also fine with using the main and secondary topics for the keywords, as we discussed earlier.
We have a question about modules. Since this will be going to NLM after the documents have been assembled in the vendor filter, is it correct to assume that NLM will NOT be receiving individual summary modules that are NOT published as separate summaries on Cancer.gov (i.e., they have the ModuleOnly=Yes attribute populated)?
Thanks!
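For illustration, the placeholder text quoted above could be filled in along these lines. This is only a sketch, not the logic of the existing filter: the function name, the dictionary keys and the sample values are all made up for the example. The pattern ignores spaces inside the marker, since it may be written with or without them (`@@PURPOSE TEXT@@`, `@@PURPOSETEXT@@`).

```python
import re

# Matches markers such as @@PURPOSE TEXT@@ or @@BOARDNAME@@.
PLACEHOLDER = re.compile(r"@@([A-Z ]+?)@@")

def fill_placeholders(template, values):
    """Replace @@NAME@@ markers using a dict keyed by the name without spaces."""
    def substitute(match):
        key = match.group(1).replace(" ", "")
        if key not in values:
            # Fail loudly rather than leave a raw marker in the abstract.
            raise KeyError("no value for placeholder " + match.group(0))
        return values[key]
    return PLACEHOLDER.sub(substitute, template)

# Hypothetical sample values, not taken from any real summary.
sample = "information about @@PURPOSE TEXT@@, updated by the @@BOARD NAME@@."
print(fill_placeholders(sample, {
    "PURPOSETEXT": "the treatment of example cancer",
    "BOARDNAME": "Example Editorial Board",
}))
```

Raising on an unknown marker is a deliberate choice here, so a summary with missing purpose text shows up as a failure in the test run instead of producing a broken abstract.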
Are we using this English text for the Spanish summaries as well?
As for the ModuleOnly summaries, I wouldn't think we'd want to give those to NLM. We can make the software do whatever you want for this. Has it been decided how the documents will be delivered to or retrieved by NLM?
My last job blew up when it hit a summary without the required Language element. I'll fix the script so it skips such documents, but meanwhile you might want to review the results (it got most of the way through the test run):
https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2015-06-08_11-19-12
We should probably nail down answers to at least the first question in my previous comment before I run the next test. For this one, I used the English text given above for all the summaries, but used the filter's logic for filling in the placeholders (so, the Spanish version of the board name for Spanish summaries). We'll surely want to change one or the other of those choices so the abstract text is all in the same language.
I spot-checked several summaries on the diff report and they all looked good to me.
Margaret and I discussed the Spanish summaries and I think we will use the same abstract as the English (including the English Board name), but she is checking with NLM about how they plan to handle the Spanish summaries. We might need to add a sentence to say that it is in Spanish if they don't plan to designate in another way that it is in Spanish, but hopefully they'll be able to do that with the data they're getting. We'll keep you posted.
NLM will also be getting the patient summaries, so we'd like to use the following abstract for those:
This PDQ cancer information summary has current information about @@PURPOSETEXT@@. It is meant to inform and help patients, families, and caregivers. It does not give formal guidelines or recommendations for making decisions about health care.
Editorial Boards write the PDQ cancer information summaries and keep
them up to date. These Boards are made up of experts in cancer treatment
and other specialties related to cancer. The summaries are reviewed
regularly and changes are made when there is new information. The date
on each summary ("Date Last Modified") is the date of the most recent
change.
The information in this patient summary was taken from the health
professional version, which is reviewed regularly and updated as needed,
by the @@BOARDNAME@@.
We decided in the status meeting to use the general Para element rather than a specific SummaryAbstractPara for this new information.
I have been looking a bit at the results that Bob generated using the
link above, and noticed a couple of things I thought I would just
mention. It may be that things are being worked on based on our
conversation in the meeting today, and these will be taken care of by
that.
1. 62749 is an old, blocked summary so it doesn't have Purpose text.
This is probably not a problem but we don't really need to put in these
elements for Blocked documents.
2. For the Spanish summaries, the Abstract is in English except that the
Purpose text is being pulled in from the Spanish summary and so the text
switches to Spanish mid-sentence. We want the entire abstract to be in
English. As I said, Bob, you may still be working on this, but just
wanted to let you know.
I will be in the office for a short time tomorrow morning so could probably take a quick look at this, but since Robin is also out tomorrow, we may need to just check this more thoroughly on QA.
Picking up work on the new requirements for this ticket. Don't want this question (from earlier this month) to fall through the cracks:
Has it been decided how the documents will be delivered to or retrieved by NLM?
Margaret mentioned the following in an email to NLM last week: "As I mentioned before, we are hoping that these changes to the XML will be completed and available in the data you receive sometime in early August."
Based on this comment it sounds to me like they will be receiving these data the same way they are receiving our data for PubMed Health and the NCBI bookshelf, but I'm not sure about that. Volker, do you know?
Here's another question (above) for which I don't see an answer in the ticket's comments:
Will the information "added to the vendor output" (see original request description) also be sent to cancer.gov?
I don't think Cancer.gov will have any use for this new information, but would it cause problems if it were sent to them or would it be ignored?
Has it been decided how the documents will be delivered to or retrieved by NLM?
The National Center for Biotechnology Information (NCBI) is already a content partner. I would think they will continue picking up the data via the usual process.
Thanks, Volker. So we just need to add these to the regular vendor filter?
Robin: can you add a ticket for Volker to modify the vendor filter (if there isn't one already)? Thanks!
...would it cause problems if it were sent to them or would it be ignored?
Do we need to check with Blair and Aarti about this, Volker?
So we just need to add these to the regular vendor filter?
Yes.
We also need a ticket for the DTD change.
Do we need to check with Blair and Aarti about this
Yes, we do.
Global change implementation complete and ready for user testing on DEV.
I have folded in the latest requirements:
modified the schema to use Para elements of type PhraseLevel for the abstracts
replaced Spanish board name with English version for the Spanish summaries
no longer adding the elements to ModuleOnly summaries
excluding blocked documents from the global change
using separate text for the patient summaries
I am running a fresh test global change job on DEV, and you can look at the output for the ones that are done even before the job completes. If we've finished nailing down all the requirements, and you confirm that I've implemented them correctly, I'll check this into the new Subversion branch in preparation for deployment to QA. If we haven't, I'll need to let Amy know that we may need more time before deploying to QA (I've told her I'm shooting for tomorrow).
http://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2015-06-17_16-10-34
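The selection rules listed above (skip blocked documents, skip ModuleOnly summaries, use separate text for patient summaries) might be sketched as follows. This is not the code checked in for this ticket: the blocked status is passed in by the caller, and both the placement of the `ModuleOnly` attribute on the root element and the `SummaryMetaData/SummaryAudience` path are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def choose_abstract_kind(summary_xml, is_blocked):
    """Return "patient", "hp", or None when the summary should be skipped."""
    if is_blocked:
        return None  # blocked documents are excluded from the global change
    root = ET.fromstring(summary_xml)
    if root.get("ModuleOnly") == "Yes":
        return None  # module-only summaries do not get the new elements
    # Assumed location of the audience value (hypothetical path).
    audience = root.findtext("SummaryMetaData/SummaryAudience", default="")
    if audience.strip().lower().startswith("patient"):
        return "patient"
    return "hp"
```

Returning None for the skipped cases keeps the decision in one place, so the caller only has to pick the abstract text for the two remaining kinds.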
Two questions so far:
1. In the patient summary abstract, will ("Date Last Modified") be filled in with actual quotation marks? I'm guessing this is fine but I am reporting just in case.
2. Should I be seeing any of these elements in the summary docs or was this just a test run (with the report) for now?
Thanks.
Of course, in my comment above the ascii code for a quotation mark (as I'm seeing in the diff report) filled itself in, so my comment is probably pretty confusing. But, I'm asking whether it's ok that I'm seeing the ascii code as opposed to the symbol itself.
Yes, those will be real quotation marks (that's how they're represented in XML). And yes, that's a test run (which is why you get to use the ShowGlobalChangeTestResults interface to view the results). I'll do a live run on DEV when you're satisfied we've got it right.
Thanks for answering my questions, Bob. Sounds good.
For Spanish summaries, the PurposeText for the corresponding English summary should be used to populate the first sentence of the abstract rather than the Spanish PurposeText.
Every Spanish summary I've looked at so far has "Inactive" as the Board name (same on prod). William, do you know if that is intentional? If it is, we may need to tweak the wording of the abstract or draw that from the corresponding English summary.
I believe that is intentional but I will confirm with Linda tomorrow morning.
I have incorporated the latest changes ...
use PurposeText from the English summary, not the Spanish summary
get the board name from the English summary
... and I'm running another job:
https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2015-06-17_18-05-57
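The two changes above amount to reading the purpose text and board name from the English original whenever the summary is a translation. A minimal sketch of that lookup, assuming each summary has already been reduced to a plain dict; the keys `translation_of`, `purpose_text` and `board_name` and the `load_summary` callback are hypothetical names for this example, not anything in the real script.

```python
def abstract_sources(summary, load_summary):
    """Pick the purpose text and board name that feed the abstract.

    For a translated summary both values come from the English original,
    so the abstract stays in English and avoids the "Inactive" board name.
    """
    english_id = summary.get("translation_of")
    source = load_summary(english_id) if english_id else summary
    return source["purpose_text"], source["board_name"]
```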
In version control for this ticket:
R13195 /branches/Curie/DevTools/GlobalChange/ocecdr-3885.py
R13197 /branches/Curie/Schemas/SummarySchema.xml
This looks good to me. Ready for the live run.
Live job has started on DEV. Will update the ticket when it has completed.
I believe that is intentional but I will confirm with Linda tomorrow morning.
I confirmed with Linda. Since there are no corresponding Spanish editorial boards and the SummaryBoard element is required in the metadata section, a 'fake' Spanish board needed to be created and linked to each of the Spanish summaries. It looks like the fake Spanish board had to be blocked and inactivated to prevent it from being accidentally published or added to other documents. That is why the board names appear as "Inactive" in the metadata. It looks like only one Spanish organization document is used for all Spanish summaries: CDR0000256088 - PDQ Spanish Editorial Board.
The few docs I've looked at so far on DEV look good - they have the right information. However, the formatting of the elements is not good. Is it possible to line up the key words? I'll attach a screenshot of an example that is especially difficult to read.
I've updated the CSS file for the normal view but not for the structure view. Please add another ticket for Darwin to adjust the structure view as well.
R13223: Summary.css
This looks MUCH better. Thank you, Volker!!!
And thanks Bob for all of the hard work on this one! Verified on DEV.
As I just reported in an email to Bob, Publish Preview is no longer working for summaries on DEV. This appears to be related to this issue. Here's the error message I'm receiving:
CDRPreview web service error: Xml data validation error,The 'SummaryKeyWord' element is not declared.Validation error occurred when validating the instance document.,1285,2
Could you give it another try? Apparently the DTD files aren't handled by the build/deploy scripts.
Still getting the same error. I'll try closing XMetaL and logging back in.
I'm now getting an error message about the DTD when I open a summary document in XMetaL. I'll attach a screenshot.
Ah! I bet GateKeeper needs a new DTD. Right, Volker? What's the SOP for handling DTD updates for GateKeeper in a release? Do I need to reach out to CBIIT?
The DTD failure in XMetaL is a separate problem. Can you close XMetaL, log back in, and try it again? I'll need to wait for an answer from Volker before we can address the first (pub preview) problem. If I don't hear from him before too long I'll reach out to Aarti or Blair.
Yes, that's correct. The error message is coming from GK. GK is
testing the documents against the DTD when they come in.
This relates back to your question from Wednesday:
Do we need to check with Blair and Aarti about this, Volker?
I'll send a message to Aarti to have the DTD uploaded to DEV and QA.
I logged out and back in and I'm now able to open summary documents without error but I'm still getting the Keyword error when I try to run Pub Preview.
Oh right. I see now that PP is what you are reaching out to Aarti/Blair about.
Correct! PP won't work until we've been able to have the DTD updated
on GK.
This actually reminds me of something else not directly related to this
story - there is currently no report that includes these new elements
and I'm not sure if our current reports should include the data. That's
probably another ticket for Darwin.
Yes, good idea. I think we'll definitely want a report to view these. Adding it to my list of issues to add 🙂
I neglected to mention in all the flurry of comments about the DTDs on DEV that the global change on QA finished a while ago this morning.
Great! So are all non-Publish Preview items ready for testing on QA now, or just this one?
I believe a note from Amy is winging its way to your inbox as we
speak type. :-)
OK, I'll be patient. 🙂 Thanks!
The DTD has been updated on QA. PublishPreview reports are working again.
We've noticed a few summaries on QA that do not have the new elements. I suspect they were all checked out at the time the global was run, but could you please confirm? Here are the CDR IDs.
CDR0000062972 - 714-X HP
CDR0000446580 - 714-X Patient
CDR0000445441 - Acupuncture HP
CDR0000458088 - Acupuncture Patient
CDR0000062687 - Colon Cancer Tx HP
CDR0000062759 - Cervical Cancer Tx HP
Right. They were all locked by William, except for CDR0000062687, which Margaret has locked.
We noticed a few other summaries that were missing the elements but I confirmed that each was locked.
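A quick way to spot-check a document for the new elements, sketched with the standard library only. It looks at a single XML string, so fetching the documents and checking who has them locked would still have to be done separately; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

NEW_ELEMENTS = ("SummaryAbstract", "SummaryKeyWords")

def missing_new_elements(summary_xml):
    """Return the names of the new metadata elements absent from a summary."""
    meta = ET.fromstring(summary_xml).find("SummaryMetaData")
    if meta is None:
        return list(NEW_ELEMENTS)
    return [name for name in NEW_ELEMENTS if meta.find(name) is None]
```

An empty list means both blocks are present. A summary skipped because it was checked out during the run would be expected to report both names.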
Verified on QA.
Commented on the wrong issue. Sorry.
The global change to include the SummaryAbstract and
SummaryKeyWords elements ran on PROD and this can be confirmed
by looking at the CDR document.
The modified filters are available as well and the PDQ partner output
does now include the additional element SummaryEditorialBoard
as well as the abstract and keywords.
You can verify this by running CDR documents through the vendor filter output in the Admin system
CIAT/OCCM Staff
--> Reports
--> General Reports
--> Filter Document
Enter set:Vendor Summary Set in the field for Filter 1
Please close this ticket when done.
Attached is the log file listing all of the validation errors that came up during the global change.
Please note that some messages appear to be duplicated. This would be because the validation errors were identical for the publishable version, last versioned version or current working copy.
As discussed at the status meeting I attached the log file from the global change run and I will close this ticket.
File Name | Posted | User |
---|---|---|
ModifyDocs.log | 2015-09-11 12:04:47 | Englisch, Volker (NIH/NCI) [C] |
screenshot-1.png | 2015-06-18 16:33:45 | Juthe, Robin (NIH/NCI) [E] |
screenshot-2.png | 2015-06-18 16:36:05 | Juthe, Robin (NIH/NCI) [E] |
screenshot-3.png | 2015-06-19 12:14:44 | Juthe, Robin (NIH/NCI) [E] |