Issue Number | 5245 |
---|---|
Summary | Importing SVPC summaries from Drupal into the CDR |
Created | 2023-06-02 10:06:46 |
Issue Type | New Feature |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-07-13 09:38:58 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.347667 |
Starting with Stomach Cancer SVPC summaries, summaries were created in Drupal instead of the CDR. With this approach, there is no efficient way to update the partner summaries without either creating new SVPC summaries in the CDR or recreating the partner summaries from scratch by manually copying data from Drupal. This approach would be too time-consuming. We would want to be able to get programming help in importing relevant data from Drupal into the CDR for these summaries and subsequent summaries that are created in Drupal. Attached to this ticket is the spreadsheet with mapping of the elements/fields/values. Please let me know if you have any questions. stomach-cancer-fields-2023061.xlsx
Some of the mappings you have requested are puzzling. For example:
you've asked for node/default_langcode/value (the flag indicating whether this page is in the default language for the node) to be copied to the Summary/SummaryMetaData/SummaryLanguage element's text content; why would you want a "1" or a "0" in that element?
you've asked for node/field_browser_title/value to be copied to Summary/AltTitle but you don't want the node/field_card_title/value (even though the CancerTypeHomePage AltTitle is required)
I'll hold off on any further work on the scripting to give you a chance to go over your mappings carefully to ensure that they're asking for what you really want.
I've posted the XML generated by your current mappings (working just from the mappings on the nodes page, skipping over the other two tabs on the spreadsheet, since there's no guidance yet provided for what to do with the HTML markup), in the hope that it might be helpful as you refine your mappings.
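The field-to-element mappings discussed above lend themselves to a small table-driven routine. The following is a minimal sketch only, not the script used for this ticket. It assumes the Drupal node has already been loaded as nested dictionaries, and the names MAPPINGS, lookup, and build_summary are hypothetical. Only the field_browser_title to AltTitle pairing is taken from the discussion; a real mapping table would come from the spreadsheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: Drupal field path -> element path under <Summary>.
# Only this one pairing is mentioned in the ticket; the rest would come from
# the spreadsheet.
MAPPINGS = {
    "field_browser_title/value": "AltTitle",
}


def lookup(node, path):
    """Follow a slash-separated path through nested dicts; None if absent."""
    current = node
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_summary(node, mappings=MAPPINGS):
    """Create a Summary element populated from the mapped node fields."""
    root = ET.Element("Summary")
    for source, target in mappings.items():
        value = lookup(node, source)
        if value is None:
            continue  # skip fields the node does not carry
        parent = root
        for name in target.split("/"):
            child = parent.find(name)
            if child is None:
                child = ET.SubElement(parent, name)
            parent = child
        parent.text = str(value)
    return root


node = {"field_browser_title": {"value": "Stomach Cancer Diagnosis"}}
print(ET.tostring(build_summary(node), encoding="unicode"))
# <Summary><AltTitle>Stomach Cancer Diagnosis</AltTitle></Summary>
```

Keeping the mappings in a table means a change to the spreadsheet becomes a change to data rather than to code, which matters when the mappings are still being revised.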
I went ahead and added the sections to give you a better picture of the work that needs to be done mapping what needs to happen with the HTML markup.
I have attached the updated spreadsheet and added information under the HTML Markup tab. I also made some minor changes in the Nodes table.
I used only one of the generated XML files to complete the spreadsheet because I thought it was representative of all of them. It is likely that I may have to review the other ones if I did not provide you with the needed information.
In all cases, please do not display the HTML tags and attributes in the XML if possible.
Some of the information is repeated in the spreadsheet so please let me know if you need any clarification.
Also, please let me know if this is what you expect or not and I will be happy to revise it. Thanks
stomach-cancer-fields-2023061_updated_06092023.xlsx Attaching the correct file.
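The request above to keep HTML tags and attributes out of the XML can be illustrated in its simplest form with Python's standard html.parser module. This is a sketch under stated assumptions, not the conversion used for this ticket: the names TextExtractor and strip_markup are hypothetical, and the actual work presumably mapped markup to CDR elements according to the HTML Markup tab rather than discarding it wholesale.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content only, ignoring all tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text of an HTML fragment with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


print(strip_markup('<p class="x">Stomach <em>cancer</em> &amp; risk</p>'))
# Stomach cancer & risk
```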
New XML documents have been generated and attached.
Thanks, Bob! This is looking really great.
I have attached an updated spreadsheet with the yellow highlights under the Node tab.
For 1245761.xml, it looks like part of the last Summary Section in the document, with the Section Title "Environmental and occupational exposures", is missing. There is no data under the Summary Title in the XML and I couldn't figure out why.
Could you please include in the next XMLs, https://www.cancer.gov/types/stomach/diagnosis. I think the node id is: 1245996. It contains a media document and I would like to know how that would come out in the XML.
Thanks
Attaching the right file.
Trying to attach the file again. stomach-cancer-fields-2023061_updated_06092023_11PM.xlsx
Fresh set added. Perhaps your copies of the zip files are corrupted, but I've been including 1245996 all along. The CdrDocCtl block is not stored in the CDR but is instead created on the fly when a request for the document is received from XMetaL.
Thanks Bob! This looks good. It looks like we've gotten everything we need. The images don't have CDR IDs in Drupal, so we wouldn't be able to match them like the glossary documents. I believe this is the only thing missing from the XML.
How can we get this into the CDR? I would like some of the editors to review them in the CDR. I tried copying one into QA but I had to make a lot of modifications for it to work. That was why I asked you to add the CdrDocCtl.
[JIRA ate the previous comment.]
Installed on CDR DEV as
CDR807335
CDR807336
CDR807337
CDR807338
I tried copying one into QA but I had to make a lot of modifications for it to work. That was why I asked you to add the CdrDocCtl.
Ah, I don't see how I could have possibly deduced that from the requirements implied by the ticket's description.
We would want to be able to get programming help in importing relevant data from Drupal into the CDR for these summaries
That sounded very much like you wanted the documents to be imported programmatically. I have replaced the XML set.
In reviewing the documents on DEV, it looks like the Summary Key Words (SummaryKeywords/SummaryKeyword) block has a capitalization issue with the "W" in keyword not capitalized, which prevents the document from validating. I was able to manually correct them. Other than that, everything else looks good and it looks like we get 99.9% of the summary in there. The only piece that needs to be created manually is the Media document. Thank you!!!
Please generate another set of XML files for me when the Summary Key Word element is fixed. Thanks!
Fresh set posted.
For the MainTopics block, please include the child element (Terms). Thanks!
Also, if possible, please add the new attribute created in OCECDR-5251
Do-Not-Push-To-Drupal = "Yes"
Providing us with a well-formed set of XML files that we can import into the CDR to create a summary document should be good. Thanks!
Fresh set posted.
Please correct the MainTopics child element to "Term" instead of "Terms", which I incorrectly provided earlier.
Also, since we are no longer going to need to make any filter changes, please remove the Do-Not-Push-To-Drupal = "Yes" attribute from the XML.
Please include the Module Only attribute in the XML as ModuleOnly = "Yes".
Please generate a new set of test XML documents after the changes are completed.
Fresh set posted.
Thanks! Could you please add the XML declaration and the DTC reference
<?xml version="1.0"?>
<!DOCTYPE Summary SYSTEM "Summary.dtd">
I am also attaching a spreadsheet with all the Node IDs for the English documents.
Please generate a fresh set of data when changes are completed.
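Adding the XML declaration and DOCTYPE requested above amounts to prepending two fixed lines to each serialized document. The following is a minimal sketch assuming the documents are built with Python's xml.etree.ElementTree, which the ticket does not confirm; the names HEADER and serialize are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two lines requested in the ticket, emitted ahead of the root element.
HEADER = '<?xml version="1.0"?>\n<!DOCTYPE Summary SYSTEM "Summary.dtd">\n'


def serialize(root):
    """Return the document as text, with the declaration and DOCTYPE first."""
    # encoding="unicode" returns a str and omits ElementTree's own declaration,
    # so the header is not duplicated.
    return HEADER + ET.tostring(root, encoding="unicode")


root = ET.Element("Summary")
ET.SubElement(root, "SummaryTitle").text = "Example"
print(serialize(root))
```

ElementTree has no direct way to emit a DOCTYPE, which is why the header is concatenated as a string here rather than generated by the serializer.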
Now that I have been given a larger set of node IDs to process, I see that the content authors aren't using the same type for each of the nodes. The ones I've been processing so far have been cgov article nodes. Now I'm seeing nodes which are mini landing pages, with different structures than the articles. So let's back up and see if we can nail down what the software should be able to expect. Have the content authors been given specific guidelines ("use only the following types in the following ways, honoring the following constraints")? If so, it would be helpful (by which I mean much less expensive) if such guidelines were provided to the developers, preferably as early in the project as possible. If instead the authors were told "here's Drupal; poke around and see what you can find; if you like it use it, in any way you see fit" then you should be aware that you're going to be facing a certain amount of frustration as the project evolves. It's going to be a little bit like trying to dance on quicksand.
The content types the authors use largely depend on the topic being worked on, and they are provided specific content types to use. They are usually not aware of which content type to use until they are ready to create the content in Drupal and they are told which content type to use. I can find out if we might use other content types, but I really doubt we would need any other content type besides cgov_article and cgov_mini_landing imported into the CDR. These two content types follow the CDR summary structure more than the other content types.
No idea what "DTC" means but I've generated and attached another set.
Sorry for the typo. It should be "DTD".
Please add AvailableAsModule = Yes to mapping and generate a new set for all the documents. Thanks!
Wouldn't that be redundant information? How could a summary marked as "module only" NOT be available as a module?
Yes, it seems redundant but I think that was the approach we took from the beginning. If you look at your comment in OCECDR-3644, that was one of the scenarios you gave, and it looks like that was what we decided to proceed with.
New set posted.
Thanks! The summaries have successfully been created on PROD. I will enter a new ticket for any future enhancements.
File Name | Posted | User |
---|---|---|
cgov_19035.csv | 2023-06-22 19:23:23 | Osei-Poku, William (NIH/NCI) [C] |
stomach-cancer-fields-2023061_updated_06092023_11PM.xlsx | 2023-06-09 23:29:59 | Osei-Poku, William (NIH/NCI) [C] |
stomach-cancer-fields-2023061_updated_06092023.xlsx | 2023-06-09 12:25:41 | Osei-Poku, William (NIH/NCI) [C] |
stomach-cancer-fields-2023061_updated_06092023-1.xlsx | 2023-06-09 23:27:41 | Osei-Poku, William (NIH/NCI) [C] |
stomach-cancer-fields-2023061.xlsx | 2023-06-02 10:06:38 | Osei-Poku, William (NIH/NCI) [C] |
stomach-cancer-xml-20230615.zip | 2023-06-15 13:19:24 | Kline, Bob (NIH/NCI) [C] |
stomach-cancer-xml-20230621.zip | 2023-06-21 15:49:19 | Kline, Bob (NIH/NCI) [C] |
stomach-cancer-xml-20230623.zip | 2023-06-23 08:43:01 | Kline, Bob (NIH/NCI) [C] |
stomach-cancer-xml-20230707.zip | 2023-07-07 11:22:01 | Kline, Bob (NIH/NCI) [C] |