CDR Tickets

Issue Number 3885
Summary [Summary] Enhancements for NLM/PubMed
Created 2015-03-23 16:30:07
Issue Type New Feature
Submitted By Beckwith, Margaret (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2015-06-17 16:17:16
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.157384
Description

We are working with NLM to make the PDQ summaries available through PubMed. In order to do that we agreed that we will add two elements to the Summary Metadata that we export.
1. Editorial Board Name ("PDQBoard" already exists but we need to add it to the vendor output).
2. PubMed Information (wrapper) with PubMed Abstract and PubMed Key Words.

We will populate the Abstract information with text from a miscellaneous document, probably using what we have in the About This PDQ Summary section/Purpose and Reviewers/Updates (first paragraph) to begin with.

Obviously we need to see how this fits in with the various CDR releases in the works.

Comment entered 2015-04-16 11:37:18 by Kline, Bob (NIH/NCI) [C]

Decided in status meeting 2015-04-16:

  • A new abstract block and multiply-occurring keyword elements will be added to the schema and populated by a global change which pulls in text from a miscellaneous document (for the abstract) and from the main and secondary topics in the summary metadata (for the keywords).

  • The new element names will be SummaryKeyWords (with SummaryKeyWord children) and SummaryAbstract. Both of these will be contained in the SummaryMetaData block (at the end).

  • The SummaryAbstract will contain inline markup, and possibly multiple paragraphs. Details to be decided.

  • After the new elements have been seeded by the global change the data will be maintained by hand.

  • The new keyword elements will not use a controlled value list.
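
The decisions above can be sketched in Python as follows. This is only an illustration, not the global change script itself. The function name and the `ExistingChild` placeholder are hypothetical, the relative order of the two new blocks is an assumption (the ticket only says both go at the end of SummaryMetaData), and the use of Para for abstract paragraphs follows a decision recorded later in this ticket.

```python
import xml.etree.ElementTree as ET


def add_pubmed_elements(summary, keywords, abstract_paragraphs):
    """Append SummaryAbstract and SummaryKeyWords to SummaryMetaData.

    Hypothetical helper: the keyword strings and paragraph texts are
    assumed to have been collected already by the caller.
    """
    metadata = summary.find("SummaryMetaData")
    if metadata is None:
        raise ValueError("summary has no SummaryMetaData block")

    # SubElement appends, so both blocks land at the end of the metadata.
    abstract = ET.SubElement(metadata, "SummaryAbstract")
    for text in abstract_paragraphs:
        ET.SubElement(abstract, "Para").text = text

    # One SummaryKeyWord child per keyword (free text, no controlled list).
    wrapper = ET.SubElement(metadata, "SummaryKeyWords")
    for keyword in keywords:
        ET.SubElement(wrapper, "SummaryKeyWord").text = keyword
    return summary


# ExistingChild stands in for whatever the metadata block already holds.
doc = ET.fromstring(
    "<Summary><SummaryMetaData><ExistingChild/></SummaryMetaData></Summary>"
)
add_pubmed_elements(doc, ["keyword one", "keyword two"], ["Abstract text."])
```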

Comment entered 2015-05-27 18:22:55 by Kline, Bob (NIH/NCI) [C]

Two questions:

  1. How do I know which miscellaneous document should be used for the abstract?

  2. Have the details been decided for what markup is to be allowed in the SummaryAbstract block?

Comment entered 2015-06-01 14:35:23 by Kline, Bob (NIH/NCI) [C]
/Summary/SummaryMetaData/SummaryKeyWords/SummaryKeyWord

has been added to the summary schema (on DEV), as well as to the query term definition table. Holding off on work on the abstract part while waiting for guidance on the questions in the previous comment.

Comment entered 2015-06-01 15:51:20 by Kline, Bob (NIH/NCI) [C]

I have written the half of the global change script which will populate the new SummaryKeyWords block, and it's running on DEV in test mode. Even though it hasn't finished yet, you can see the results for the documents that have already been processed:

https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?Session=guest

Will the information "added to the vendor output" (see original request description) also be sent to cancer.gov? If so, we may need to spread the global change out over a block of time long enough that the number of re-published summaries that go out for any one job doesn't overwhelm GateKeeper and Percussion. If not, we'll probably want Volker to test that the resulting documents after stripping the new parts out won't cause the publishing system to think the summary documents have been modified.

Comment entered 2015-06-05 12:04:28 by Juthe, Robin (NIH/NCI) [E]

Hi Bob,

Margaret and I met and have decided to use the following text to initially populate the abstract field. The About This PDQ Summary miscellaneous document (CDR684055) is already pulling this text together. It is the first two paragraphs of this section.

This PDQ cancer information summary for health professionals provides comprehensive, peer-reviewed, evidence-based information about @@PURPOSE TEXT@@. It is intended as a resource to inform and assist clinicians who care for cancer patients. It does not provide formal guidelines or recommendations for making health care decisions.

This summary is reviewed regularly and updated as necessary by the @@BOARD NAME@@, which is editorially independent of the National Cancer Institute (NCI). The summary reflects an independent review of the literature and does not represent a policy statement of NCI or the National Institutes of Health (NIH).

We are also fine with using the main and secondary topics for the keywords, as we discussed earlier.

We have a question about modules. Since this will be going to NLM after the documents have been assembled in the vendor filter, is it correct to assume that NLM will NOT be receiving individual summary modules that are NOT published as separate summaries on Cancer.gov (i.e., they have the ModuleOnly=Yes attribute populated)?

Thanks!
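
Filling the @@...@@ placeholders in the abstract text quoted above could look roughly like the sketch below. It assumes the placeholder names are upper-case words with optional spaces, as in the quoted text; the function name and the sample values are hypothetical, and this is not the logic of the existing filter.

```python
import re

# Shortened excerpt of the health professional text quoted above.
HP_TEMPLATE = (
    "This PDQ cancer information summary for health professionals provides "
    "comprehensive, peer-reviewed, evidence-based information about "
    "@@PURPOSE TEXT@@. This summary is reviewed regularly and updated as "
    "necessary by the @@BOARD NAME@@."
)


def fill_placeholders(template, values):
    """Replace each @@NAME@@ marker, failing loudly on an unknown name."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]

    return re.sub(r"@@([A-Z ]+)@@", substitute, template)


abstract = fill_placeholders(
    HP_TEMPLATE,
    {"PURPOSE TEXT": "an example topic", "BOARD NAME": "Example Editorial Board"},
)
```

Raising on an unknown placeholder, rather than leaving the marker in place, keeps a stray @@...@@ from reaching the exported abstract unnoticed.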

Comment entered 2015-06-08 09:24:38 by Kline, Bob (NIH/NCI) [C]

Are we using this English text for the Spanish summaries as well?

As for the ModuleOnly summaries, I wouldn't think we'd want to give those to NLM. We can make the software do whatever you want for this. Has it been decided how the documents will be delivered to or retrieved by NLM?

Comment entered 2015-06-09 08:32:15 by Kline, Bob (NIH/NCI) [C]

My last job blew up when it hit a summary without the required Language element. I'll fix the script so it skips such documents, but meanwhile you might want to review the results (it got most of the way through the test run):

https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2015-06-08_11-19-12

We should probably nail down answers to at least the first question in my previous comment before I run the next test. For this one, I used the English text given above for all the summaries, but used the filter's logic for filling in the placeholders (so, the Spanish version of the board name for Spanish summaries). We'll surely want to change one or the other of those choices so the abstract text is all in the same language.

Comment entered 2015-06-09 18:48:52 by Juthe, Robin (NIH/NCI) [E]

I spot-checked several summaries on the diff report and they all looked good to me.

Margaret and I discussed the Spanish summaries and I think we will use the same abstract as the English (including the English Board name), but she is checking with NLM about how they plan to handle the Spanish summaries. We might need to add a sentence to say that it is in Spanish if they don't plan to designate in another way that it is in Spanish, but hopefully they'll be able to do that with the data they're getting. We'll keep you posted.

NLM will also be getting the patient summaries, so we'd like to use the following abstract for those:

This PDQ cancer information summary has current information about @@PURPOSETEXT@@. It is meant to inform and help patients, families, and caregivers. It does not give formal guidelines or recommendations for making decisions about health care.

Editorial Boards write the PDQ cancer information summaries and keep them up to date. These Boards are made up of experts in cancer treatment and other specialties related to cancer. The summaries are reviewed regularly and changes are made when there is new information. The date on each summary ("Date Last Modified") is the date of the most recent change.
The information in this patient summary was taken from the health professional version, which is reviewed regularly and updated as needed, by the @@BOARDNAME@@.

Comment entered 2015-06-11 13:57:09 by Juthe, Robin (NIH/NCI) [E]

We decided in the status meeting to use the general Para element rather than a specific SummaryAbstractPara for this new information.

Comment entered 2015-06-11 16:50:54 by Beckwith, Margaret (NIH/NCI) [E]

I have been looking a bit at the results that Bob generated using the link above, and noticed a couple of things I thought I would just mention. It may be that things are being worked on based on our conversation in the meeting today, and these will be taken care of by that.
1. 62749 is an old, blocked summary so it doesn't have Purpose text. This is probably not a problem but we don't really need to put in these elements for Blocked documents.
2. For the Spanish summaries, the Abstract is in English except that the Purpose text is being pulled in from the Spanish summary and so the text switches to Spanish mid-sentence. We want the entire abstract to be in English. As I said, Bob, you may still be working on this, but just wanted to let you know.

I will be in the office for a short time tomorrow morning so could probably take a quick look at this, but since Robin is also out tomorrow, we may need to just check this more thoroughly on QA.

Comment entered 2015-06-17 15:00:08 by Kline, Bob (NIH/NCI) [C]

Picking up work on the new requirements for this ticket. Don't want this question (from earlier this month) to fall through the cracks:

Has it been decided how the documents will be delivered to or retrieved by NLM?

Comment entered 2015-06-17 15:06:52 by Juthe, Robin (NIH/NCI) [E]

Margaret mentioned the following in an email to NLM last week: "As I mentioned before, we are hoping that these changes to the XML will be completed and available in the data you receive sometime in early August."

Based on this comment it sounds to me like they will be receiving these data the same way they are receiving our data for PubMed Health and the NCBI bookshelf, but I'm not sure about that. Volker, do you know?

Comment entered 2015-06-17 15:32:56 by Kline, Bob (NIH/NCI) [C]

Here's another question (above) for which I don't see an answer in the ticket's comments:

Will the information "added to the vendor output" (see original request description) also be sent to cancer.gov?

Comment entered 2015-06-17 15:43:27 by Juthe, Robin (NIH/NCI) [E]

I don't think Cancer.gov will have any use for this new information, but would it cause problems if it were sent to them or would it be ignored?

Comment entered 2015-06-17 15:47:36 by Englisch, Volker (NIH/NCI) [C]

Has it been decided how the documents will be delivered to or retrieved by NLM?

The National Center for Biotechnology Information (NCBI) is already a content partner. I would think they will continue picking up the data via the usual process.

Comment entered 2015-06-17 15:55:26 by Kline, Bob (NIH/NCI) [C]

Thanks, Volker. So we just need to add these to the regular vendor filter?

Robin: can you add a ticket for Volker to modify the vendor filter (if there isn't one already)? Thanks!

Comment entered 2015-06-17 15:56:58 by Kline, Bob (NIH/NCI) [C]

...would it cause problems if it were sent to them or would it be ignored?

Do we need to check with Blair and Aarti about this, Volker?

Comment entered 2015-06-17 16:07:11 by Englisch, Volker (NIH/NCI) [C]

So we just need to add these to the regular vendor filter?

Yes.

We also need a ticket for the DTD change.

Do we need to check with Blair and Aarti about this

Yes, we do.

Comment entered 2015-06-17 16:17:16 by Kline, Bob (NIH/NCI) [C]

Global change implementation complete and ready for user testing on DEV.

Comment entered 2015-06-17 16:29:35 by Kline, Bob (NIH/NCI) [C]

I have folded in the latest requirements:

  • modified the schema to use Para elements of type PhraseLevel for the abstracts

  • replaced Spanish board name with English version for the Spanish summaries

  • no longer adding the elements to ModuleOnly summaries

  • excluding blocked documents from the global change

  • using separate text for the patient summaries

I am running a fresh test global change job on DEV, and you can look at the output for the ones that are done even before the job completes. If we've finished nailing down all the requirements, and you confirm that I've implemented them correctly, I'll check this into the new Subversion branch in preparation for deployment to QA. If we haven't, I'll need to let Amy know that we may need more time before deploying to QA (I've told her I'm shooting for tomorrow).

http://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2015-06-17_16-10-34
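
The selection rules listed above (plus the earlier fix for summaries lacking a Language element) can be summarized in a small sketch. The record type, its field names, and the audience strings are hypothetical stand-ins for values read from the summary documents; the real script in version control may be organized quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummaryInfo:
    """Hypothetical record of the facts the global change needs."""
    doc_label: str
    language: Optional[str]  # None when the Language element is missing
    audience: str            # assumed values: "Health professionals", "Patients"
    module_only: bool        # ModuleOnly=Yes attribute present
    blocked: bool


def skip_reason(info):
    """Return why a summary is left untouched, or None to process it."""
    if info.blocked:
        return "blocked document"
    if info.module_only:
        return "ModuleOnly summary"
    if info.language is None:
        return "missing Language element"
    return None


def template_key(info):
    """Pick which abstract text applies (separate text for patients)."""
    return "patient" if info.audience == "Patients" else "health professional"
```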

Comment entered 2015-06-17 17:26:39 by Juthe, Robin (NIH/NCI) [E]

Two questions so far:

1. In the patient summary abstract, will ("Date Last Modified") be filled in with actual quotation marks? I'm guessing this is fine but I am reporting just in case.

2. Should I be seeing any of these elements in the summary docs or was this just a test run (with the report) for now?

Thanks.

Comment entered 2015-06-17 17:28:35 by Juthe, Robin (NIH/NCI) [E]

Of course, in my comment above the ascii code for a quotation mark (as I'm seeing in the diff report) filled itself in, so my comment is probably pretty confusing. But, I'm asking whether it's ok that I'm seeing the ascii code as opposed to the symbol itself.

Comment entered 2015-06-17 17:30:11 by Kline, Bob (NIH/NCI) [C]

Yes, those will be real quotation marks (that's how they're represented in XML). And yes, that's a test run (which is why you get to use the ShowGlobalChangeTestResults interface to view the results). I'll do a live run on DEV when you're satisfied we've got it right.
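
A quick way to see that the `&quot;` form in the diff report and a literal quotation mark are the same character once parsed (a minimal sketch using Python's standard XML parser, with a made-up one-element document):

```python
import xml.etree.ElementTree as ET

# The entity reference as it might appear in serialized XML.
para = ET.fromstring("<Para>(&quot;Date Last Modified&quot;)</Para>")

# After parsing, the text holds ordinary quotation mark characters.
parsed_text = para.text  # ("Date Last Modified")
```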

Comment entered 2015-06-17 17:37:44 by Juthe, Robin (NIH/NCI) [E]

Thanks for answering my questions, Bob. Sounds good.

For Spanish summaries, the PurposeText for the corresponding English summary should be used to populate the first sentence of the abstract rather than the Spanish PurposeText.

Every Spanish summary I've looked at so far has "Inactive" as the Board name (same on prod). William, do you know if that is intentional? If it is, we may need to tweak the wording of the abstract or draw that from the corresponding English summary.

Comment entered 2015-06-17 17:53:34 by Osei-Poku, William (NIH/NCI) [C]

I believe that is intentional but I will confirm with Linda tomorrow morning.

Comment entered 2015-06-17 18:56:22 by Kline, Bob (NIH/NCI) [C]

I have incorporated the latest changes ...

  • use PurposeText from the English summary, not the Spanish summary

  • get the board name from the English summary

... and I'm running another job:

https://cdr.dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2015-06-17_18-05-57
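
The rule for Spanish summaries can be sketched as below. The dictionary keys (`language`, `translation_of`, `purpose_text`, `board_name`) and the lookup table are hypothetical; how the script actually finds the corresponding English summary is not described in this ticket.

```python
def abstract_fields(summary, english_by_id):
    """Return (purpose_text, board_name) used to fill the abstract.

    Spanish summaries borrow both values from the corresponding English
    summary so the whole abstract stays in English.
    """
    if summary["language"] == "Spanish":
        source = english_by_id[summary["translation_of"]]
    else:
        source = summary
    return source["purpose_text"], source["board_name"]


# Made-up sample data for illustration only.
english = {
    "language": "English",
    "purpose_text": "an example topic",
    "board_name": "Example Editorial Board",
}
spanish = {
    "language": "Spanish",
    "translation_of": "en-1",
    "purpose_text": "un tema de ejemplo",
    "board_name": "Inactive",
}
fields = abstract_fields(spanish, {"en-1": english})
```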

Comment entered 2015-06-18 08:53:10 by Kline, Bob (NIH/NCI) [C]

In version control for this ticket:

  • R13195 /branches/Curie/DevTools/GlobalChange/ocecdr-3885.py

  • R13197 /branches/Curie/Schemas/SummarySchema.xml

Comment entered 2015-06-18 10:39:02 by Juthe, Robin (NIH/NCI) [E]

This looks good to me. Ready for the live run.

Comment entered 2015-06-18 10:58:44 by Kline, Bob (NIH/NCI) [C]

Live job has started on DEV. Will update the ticket when it has completed.

Comment entered 2015-06-18 11:19:53 by Osei-Poku, William (NIH/NCI) [C]

I believe that is intentional but I will confirm with Linda tomorrow morning.

I confirmed with Linda. Since there are no corresponding Spanish editorial boards and the SummaryBoard element is required in the metadata section, a 'fake' Spanish board needed to be created and linked to each of the Spanish summaries. It looks like the fake Spanish board had to be blocked and inactivated to prevent it from being accidentally published or added to other documents. That is why they appear as "Inactive" in the metadata. It looks like only one Spanish organization document is used for all Spanish summaries: CDR0000256088 - PDQ Spanish Editorial Board

Comment entered 2015-06-18 16:33:14 by Juthe, Robin (NIH/NCI) [E]

The few docs I've looked at so far on DEV look good - they have the right information. However, the formatting of the elements is not good. Is it possible to line up the key words? I'll attach a screenshot of an example that is especially difficult to read.

Comment entered 2015-06-18 17:29:43 by Englisch, Volker (NIH/NCI) [C]

I've updated the CSS file for the normal view but not for the structure view. Please add another ticket for Darwin to adjust the structure view as well.

  • R13223: Summary.css

Comment entered 2015-06-18 17:33:48 by Juthe, Robin (NIH/NCI) [E]

This looks MUCH better. Thank you, Volker!!!

Comment entered 2015-06-18 17:34:12 by Juthe, Robin (NIH/NCI) [E]

And thanks Bob for all of the hard work on this one! Verified on DEV.

Comment entered 2015-06-19 11:52:58 by Juthe, Robin (NIH/NCI) [E]

As I just reported in an email to Bob, Publish Preview is no longer working for summaries on DEV. This appears to be related to this issue. Here's the error message I'm receiving:

CDRPreview web service error: Xml data validation error,The 'SummaryKeyWord' element is not declared.Validation error occurred when validating the instance document.,1285,2

Comment entered 2015-06-19 12:00:57 by Kline, Bob (NIH/NCI) [C]

Could you give it another try? Apparently the DTD files aren't handled by the build/deploy scripts.

Comment entered 2015-06-19 12:12:32 by Juthe, Robin (NIH/NCI) [E]

Still getting the same error. I'll try closing XMetaL and logging back in.

Comment entered 2015-06-19 12:14:31 by Juthe, Robin (NIH/NCI) [E]

I'm now getting an error message about the DTD when I open a summary document in XMetaL. I'll attach a screenshot.

Comment entered 2015-06-19 12:18:27 by Kline, Bob (NIH/NCI) [C]

Ah! I bet GateKeeper needs a new DTD. Right, Volker? What's the SOP for handling DTD updates for GateKeeper in a release? Do I need to reach out to CBIIT?

Comment entered 2015-06-19 12:30:58 by Kline, Bob (NIH/NCI) [C]

The DTD failure in XMetaL is a separate problem. Can you close XMetaL and log back in and try it? I'll need to wait for an answer from Volker before we can address the first (pub preview) problem. If I don't hear from him before too long I'll reach out to Aarti or Blair.

Comment entered 2015-06-19 12:33:39 by Englisch, Volker (NIH/NCI) [C]

Yes, that's correct. The error message is coming from GK. GK is testing the documents against the DTD when they come in.
This relates back to your question from Wednesday:

Do we need to check with Blair and Aarti about this, Volker?

I'll send a message to Aarti to have the DTD uploaded to DEV and QA.
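
To illustrate why a stale DTD produces the "element is not declared" error, here is a deliberately simplified check. It is not a DTD validator and not how GateKeeper works internally: it only compares element names used in a document against `<!ELEMENT ...>` declarations found by a regular expression. The function name and both sample strings are made up.

```python
import re
import xml.etree.ElementTree as ET


def undeclared_elements(dtd_text, xml_text):
    """List element names used in xml_text that dtd_text never declares."""
    declared = set(re.findall(r"<!ELEMENT\s+([\w.:-]+)", dtd_text))
    used = {element.tag for element in ET.fromstring(xml_text).iter()}
    return sorted(used - declared)


# A DTD that predates the new keyword element.
OLD_DTD = """
<!ELEMENT Summary ANY>
<!ELEMENT SummaryMetaData ANY>
"""

DOCUMENT = (
    "<Summary><SummaryMetaData>"
    "<SummaryKeyWord>example</SummaryKeyWord>"
    "</SummaryMetaData></Summary>"
)

missing = undeclared_elements(OLD_DTD, DOCUMENT)
```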

Comment entered 2015-06-19 12:52:55 by Juthe, Robin (NIH/NCI) [E]

I logged out and back in and I'm now able to open summary documents without error but I'm still getting the Keyword error when I try to run Pub Preview.

Comment entered 2015-06-19 12:53:54 by Juthe, Robin (NIH/NCI) [E]

Oh right. I see now that PP is what you are reaching out to Aarti/Blair about.

Comment entered 2015-06-19 13:04:36 by Englisch, Volker (NIH/NCI) [C]

Correct! PP won't work until we've been able to have the DTD updated on GK.
This actually reminds me of something else not directly related to this story - there is currently no report that includes these new elements and I'm not sure if our current reports should include the data. That's probably another ticket for Darwin.

Comment entered 2015-06-19 13:09:39 by Juthe, Robin (NIH/NCI) [E]

Yes, good idea. I think we'll definitely want a report to view these. Adding it to my list of issues to add 🙂

Comment entered 2015-06-19 13:21:41 by Kline, Bob (NIH/NCI) [C]

I neglected to mention in all the flurry of comments about the DTDs on DEV that the global change on QA finished a while ago this morning.

Comment entered 2015-06-19 13:23:04 by Juthe, Robin (NIH/NCI) [E]

Great! So are all non-Publish Preview items ready for testing on QA now, or just this one?

Comment entered 2015-06-19 13:35:52 by Kline, Bob (NIH/NCI) [C]

I believe a note from Amy is winging its way to your inbox as we speak type. :-)

Comment entered 2015-06-19 13:42:58 by Juthe, Robin (NIH/NCI) [E]

OK, I'll be patient. 🙂 Thanks!

Comment entered 2015-06-19 15:08:14 by Englisch, Volker (NIH/NCI) [C]

The DTD has been updated on QA. PublishPreview reports are working again.

Comment entered 2015-06-22 12:59:46 by Juthe, Robin (NIH/NCI) [E]

We've noticed a few summaries on QA that do not have the new elements. I suspect they were all checked out at the time the global was run, but could you please confirm? Here are the CDR IDs.

CDR0000062972 - 714-X HP
CDR0000446580 - 714-X Patient
CDR0000445441 - Acupuncture HP
CDR0000458088 - Acupuncture Patient
CDR0000062687 - Colon Cancer Tx HP
CDR0000062759 - Cervical Cancer Tx HP

Comment entered 2015-06-22 13:13:36 by Kline, Bob (NIH/NCI) [C]

Right. They were all locked by William, except for CDR0000062687, which Margaret has locked.

Comment entered 2015-06-25 09:44:09 by Juthe, Robin (NIH/NCI) [E]

We noticed a few other summaries that were missing the elements but I confirmed that each was locked.

Verified on QA.

Comment entered 2015-09-01 13:40:12 by Osei-Poku, William (NIH/NCI) [C]

Commented on the wrong issue. Sorry.

Comment entered 2015-09-04 12:19:07 by Englisch, Volker (NIH/NCI) [C]

The global change to include the SummaryAbstract and SummaryKeyWords elements ran on PROD and this can be confirmed by looking at the CDR document.
The modified filters are available as well and the PDQ partner output does now include the additional element SummaryEditorialBoard as well as the abstract and keywords.

You can verify this by running CDR documents through the vendor filter output in the Admin system:

  • CIAT/OCCM Staff

  • --> Reports

  • --> General Reports

  • --> Filter Document

  • Enter set:Vendor Summary Set in the field for Filter 1

Please close this ticket when done.

Comment entered 2015-09-11 12:04:47 by Englisch, Volker (NIH/NCI) [C]

Attached is the log file listing all of the validation errors that came up during the global change.

Please note that some messages appear to be duplicated. This would be because the validation errors were identical for the publishable version, last versioned version or current working copy.
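
When reading the log, repeated messages can be collapsed per document. The sketch below assumes each log entry has already been parsed into a (document, version label, message) tuple; that tuple layout and the sample entries are hypothetical, since the actual format of ModifyDocs.log is not shown in this ticket.

```python
def collapse_duplicates(entries):
    """Group identical messages for a document across its versions.

    Returns a dict mapping (doc_id, message) to the list of version
    labels that reported it, in first-seen order.
    """
    grouped = {}
    for doc_id, version_label, message in entries:
        grouped.setdefault((doc_id, message), []).append(version_label)
    return grouped


# Made-up entries for illustration only.
sample = [
    ("doc-1", "publishable version", "example validation error"),
    ("doc-1", "current working copy", "example validation error"),
    ("doc-2", "last versioned version", "another example error"),
]
collapsed = collapse_duplicates(sample)
```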

Comment entered 2015-09-11 12:06:45 by Englisch, Volker (NIH/NCI) [C]

As discussed at the status meeting I attached the log file from the global change run and I will close this ticket.

Attachments
File Name Posted User
ModifyDocs.log 2015-09-11 12:04:47 Englisch, Volker (NIH/NCI) [C]
screenshot-1.png 2015-06-18 16:33:45 Juthe, Robin (NIH/NCI) [E]
screenshot-2.png 2015-06-18 16:36:05 Juthe, Robin (NIH/NCI) [E]
screenshot-3.png 2015-06-19 12:14:44 Juthe, Robin (NIH/NCI) [E]
