CDR Tickets

Issue Number 5104
Summary Browser Title for PDQ Summaries - Can we edit this in the CDR?
Created 2022-03-17 16:53:24
Issue Type Inquiry
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2023-06-07 06:53:13
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.313385
Description

We discussed this question in today's status meeting and realized we don't know the answer. While there are three TitleType attributes on the AltTitle field in the CDR, it appears that only one of these ("Short") is being sent to Cancer.gov. The short title appears to be used on the CTHP. It's unclear how the browser title field in Drupal is being populated, and which field is used to provide the browser title on a PDQ summary.

Let's revisit this discussion with the Cancer.gov team. Just putting in this ticket as a reminder.

Comment entered 2022-09-01 15:46:17 by Kline, Bob (NIH/NCI) [C]
Comment entered 2022-09-01 19:50:19 by Englisch, Volker (NIH/NCI) [C]

As discussed, I'm including an earlier email (2022-03-21) here summarizing the data flow of the summary AltTitle from the CDR to Drupal:

 

Currently, we have three valid values for the TitleType attribute of the AltTitle element:

  • Short

  • Display (unused)

  • Navlabel (unused)

The Short alt title is a required element and the CDR validation rules restrict the Short alt title to 64 characters or fewer.
The Navlabel alt title is limited to 100 characters or fewer but the Drupal software no longer makes any use of this Navlabel alt title. What now becomes the left navigation label value in Drupal is instead derived from the Short alt title.
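
The length rules just described can be sketched as a small check. This is only an illustration, not the CDR's validation code: the function name is hypothetical, and the only assumption about the document is that AltTitle elements appear somewhere in the summary XML with a TitleType attribute.

```python
import xml.etree.ElementTree as ET

# Limits quoted in the email above; everything else here is illustrative.
MAX_LENGTH = {"Short": 64, "Navlabel": 100}


def alt_title_problems(summary_xml):
    """Return human-readable problems found in a summary's AltTitle elements."""
    root = ET.fromstring(summary_xml)
    problems = []
    short_count = 0
    for node in root.iter("AltTitle"):
        title_type = node.get("TitleType")
        text = "".join(node.itertext()).strip()
        if title_type == "Short":
            short_count += 1
        limit = MAX_LENGTH.get(title_type)
        if limit is not None and len(text) > limit:
            problems.append(
                f"{title_type} title has {len(text)} characters (limit {limit})"
            )
    # The Short alt title is required, so anything other than one is reported.
    if short_count != 1:
        problems.append(f"expected exactly one Short title, found {short_count}")
    return problems
```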

The short title is plugged into the "field_browser_title" for summary documents, which is in turn used for Drupal's left nav label as well as for the CTHP cards (Cancer Type Home Page). So we still have a nav label, but it’s not coming from the CDR AltTitle with the Navlabel attribute.

Originally, the browser tab was supposed to display the Short AltTitle, but that would cause the browser titles for the HP and Patient versions to be identical, e.g. “Breast Cancer Treatment”, and SEO doesn't like that. This was likely the reason why the browser tab now shows the full summary title, which is often cut off because the browser tab only displays about 66 characters.

The left nav label is typically identical to the short title (a.k.a. CTHP title) but it can be overwritten manually. Once the left nav label has been overwritten it won’t be changed anymore when the short title is updated. In short: A CDR document that has a modified short title will always update the CTHP title. It will only update the left nav title if that title is identical to the CTHP title and not set manually.
An example of a summary with differing document title, short title, and left nav label is “Wilms Tumor and Other Childhood Kidney Tumors” https://www.cancer.gov/types/kidney.

Doc title: Wilms Tumor and Other Childhood Kidney Tumors Treatment (PDQ®)–Patient Version
Short title: Wilms Tumor and Other Childhood Kidney Tumors Treatment

Nav title: Wilms & Other Childhood Kidney Tumors Treatment
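
The update rule described above can be modeled in a few lines. This is a toy model only, not Drupal's implementation: the class, its field names, and the `nav_label_overridden` flag are hypothetical stand-ins for whatever the CMS actually stores.

```python
from dataclasses import dataclass


@dataclass
class SummaryNode:
    """Hypothetical stand-in for the title fields of a Drupal summary node."""
    cthp_title: str
    nav_label: str
    nav_label_overridden: bool = False


def receive_short_title(node: SummaryNode, short_title: str) -> SummaryNode:
    """Apply a short title arriving from the CDR."""
    # The nav label follows the short title only while it has not been set
    # by hand and still matches the current CTHP title.
    follows = not node.nav_label_overridden and node.nav_label == node.cthp_title
    node.cthp_title = short_title  # the CTHP title is always updated
    if follows:
        node.nav_label = short_title
    return node
```

With the Wilms tumor example, where the nav label was set by hand, a new short title would change the CTHP title but leave "Wilms & Other Childhood Kidney Tumors Treatment" in place.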

 

To set the nav_title, go to www-cms-dev.cancer.gov and select the menu Structure -> Taxonomy -> SiteSection -> Home (click the children link) -> Cancer Types (click the children link) -> find the cancer type, e.g. Kidney Cancer (click the children link) -> Patient or HP (click the children link) -> click the Edit button for your summary.

Comment entered 2022-09-01 19:51:35 by Englisch, Volker (NIH/NCI) [C]

Hi everyone,

My last email described in detail where the SummaryTitle and AltTitle content ends up in Drupal.  The understanding of the data flow for these elements, however, was just a related question to the original question from Robin if it is possible to control the browser title from within the CDR.

As we have seen, there are two elements in Drupal storing title information.  These are the elements

  • Title
    This element holds the full document title (CDR SummaryTitle plus PDQ®-audience) and is also used to populate the HTML “<title/>” element which is created by using this field and concatenating the site name “ – National Cancer Institute”.  The content of this element, or at least the first approx. 60 characters, is used for the text displayed on the tab and the hover text of the tab.
    Example:
    Doc title: Childhood Midline Tract Carcinoma Involving the NUT Gene (NUT Midline Carcinoma) Treatment (PDQ®)–Health Professional Version
    Tab title: Childhood Midline Tract Carcinoma Involving the NUT Gene (NUT Midline Carcinoma) Treatment (PDQ®)–Health Professional Version - National Cancer Institute

  • Browser Title
    The description for this element indicates that it is used as the browser title (concatenated with the site name “National Cancer Institute”) but that is incorrect (see above).  The field contains the CDR AltTitle (type=Short).  As mentioned yesterday, the original idea was probably to use this information as the browser title but when that decision changed the element’s name and description didn’t get adjusted.  This text is used for the navigation links.
    Example:
    Short title: Childhood Midline Tract Carcinoma Treatment

 

The question for Lindsay is now:
If the CDR were to provide an additional AltTitle (type=Tab), for example, would Drupal have a place to store this information in order to control the browser title?

Comment entered 2022-09-01 19:57:09 by Englisch, Volker (NIH/NCI) [C]

Somehow Jira decided 2 comments are enough. It won't let me add a third one ... or will it?

Comment entered 2022-09-01 20:00:29 by Englisch, Volker (NIH/NCI) [C]

As far as the question about whether Drupal would have a way to store an additional AltTitle from CDR to control the Browser title on summaries, I think this is something for the larger Digital Platform product dev team and probably not really something Lindsay can answer. A preliminary discussion of what would be involved/how it would work could be a good first step, and then we’ll likely need a Cancer.gov Digital Platform ticket, assuming this isn’t something that can be done with no changes.

PS – Overriding nav labels in the Drupal CMS is obviously a closely held function for IAs and not something most folks are allowed to do. Just wanted to mention that in case you were wondering or worried about changes being made without approval.

Comment entered 2022-09-01 20:02:17 by Englisch, Volker (NIH/NCI) [C]

The previous comment was Amy's response to my last email but Jira refused to add it via copy/paste.  I hope you all did not receive the comment multiple times.  If you did, blame Jira. 🙂

Comment entered 2022-10-27 10:17:32 by Kline, Bob (NIH/NCI) [C]

The steps for implementing this will include:

  • add the new field to the pdq_cancer_information_summary type (Drupal/Bob)

  • modify the REST plugin code to accept and store the new field's value (Drupal/Bob)

  • modify the twig template(s) to adjust which field values are used where (Drupal/TBD)

  • modify the Summary schema and XMetaL CSS to support the new value (CDR/Bob)

  • populate the summary documents with the new value (CDR/PCIB+CIAT)

  • modify the publishing filters to pick up and export the new value (CDR/Volker)

We had spoken (in an earlier CDR/EBMS status meeting), I believe, about using type values for the AltTitle element reflecting the usage of titles on the web site. So, for example, we might run a global change renaming "Short" to "Browser" or perhaps cloning the "Short" AltTitle elements as "Browser" AltTitle elements, and adding "CTHP" as another valid type value. Let's discuss the options in this afternoon's status meeting.

Comment entered 2022-11-03 15:04:27 by Kline, Bob (NIH/NCI) [C]

The previous comment was as close as we have to a "recommendation" as to how to proceed (referencing comments from this afternoon's status meeting). Subsequent discussions (particularly comments from William) have us leaning in the direction of leaving what's in place now intact (so, continuing to allow the "Short" value, and cloning those elements with the new type name ("Browser") rather than modifying the existing AltTitle elements in place) which would allow us to make the schema changes and populate the new title elements in advance, anticipating what will be exported when the Drupal end of things is ready, instead of needing to coordinate the timing of these changes carefully with the web site. As I mentioned in the meeting, it sounds like the Digital Platform team will be ready to get the ball rolling on their end when we have given them clear indications that our own armies are on the move.

Comment entered 2022-11-11 14:29:22 by Juthe, Robin (NIH/NCI) [E]

Hi, I think we can use the following names for the AltTitle values: Browser and CancerTypeHomePage. (I prefer to spell it out for clarity, even though it's long.) As I mentioned, CTHPs are gradually being replaced, but that's a slow process and we can always revise the name of this attribute if/when we need to. 🙂

 

As for the globals, would it be possible to provide a spreadsheet with existing data for the AltTitle fields with Short and Navlabel attributes? While intuitively it seems to make sense to replace Short with Browser, I'd like to get a better sense of how the fields are currently populated before I say what should map to what. Thanks!

Comment entered 2022-11-11 17:58:05 by Kline, Bob (NIH/NCI) [C]

Spreadsheet attached. Shows all the titles for all the summaries.

Comment entered 2022-11-15 14:34:19 by Kline, Bob (NIH/NCI) [C]

The new AltTitleType valid values have been installed on DEV.

Comment entered 2023-01-11 09:04:39 by Kline, Bob (NIH/NCI) [C]

Any guidance on this? I've got my weekly Digital Platform status meeting in an hour, and I'd like to know what progress I can give them. Thanks! 😉

Comment entered 2023-01-12 13:30:26 by Juthe, Robin (NIH/NCI) [E]

Hi. I've reviewed the spreadsheet and I think we can just rename the title labeled as "Short" with "Browser". For the few summaries that have an AltTitle without an attribute, we can leave them as is (if allowable by the schema) or move them to the new "Browser" title type.

It doesn't really matter what we do with the blocked summaries. For simplicity's sake, maybe we just migrate those titles over as well? Or we could just leave them be.

Comment entered 2023-01-12 14:40:13 by Kline, Bob (NIH/NCI) [C]

For this ticket we will:

  • copy the value in the "Short" alt titles into the "Browser" and "CTHP" alt titles

  • remove the title with "Short" as the title type

  • remove the rule requiring exactly one Short title

  • add rules requiring exactly one Browser title and at most one CTHP title

  • leave the short title type as allowable so we can bring up older versions without hassles

  • do nothing at all with blocked documents
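
The per-document transformation in this plan might look roughly like the following. It is a sketch, not the actual global change script: the function name is hypothetical, the new type names are the ones proposed earlier in this ticket ("Browser" and "CancerTypeHomePage"), and the sketch assumes the Short title contains plain text only.

```python
import xml.etree.ElementTree as ET

NEW_TYPES = ("Browser", "CancerTypeHomePage")


def convert_alt_titles(summary_xml):
    """Replace each Short AltTitle with Browser and CancerTypeHomePage copies."""
    root = ET.fromstring(summary_xml)
    # Collect the targets first so the tree is not modified while it is walked.
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == "AltTitle" and child.get("TitleType") == "Short"
    ]
    for parent, short in targets:
        position = list(parent).index(short)
        parent.remove(short)
        for offset, new_type in enumerate(NEW_TYPES):
            clone = ET.Element("AltTitle", {"TitleType": new_type})
            clone.text = short.text  # assumption: plain text, no inline markup
            clone.tail = short.tail
            parent.insert(position + offset, clone)
    return ET.tostring(root, encoding="unicode")
```

A document with no Short title comes back unchanged, so running the function twice on the same document has no further effect. Selecting which documents to process (for example, leaving out blocked documents) would happen before this step.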

Comment entered 2023-01-12 17:23:46 by Osei-Poku, William (NIH/NCI) [C]

Should we create a ticket for the global, or is there already a ticket?

Comment entered 2023-01-12 18:27:29 by Kline, Bob (NIH/NCI) [C]

I'm doing it as part of this ticket.

Comment entered 2023-01-12 18:28:41 by Kline, Bob (NIH/NCI) [C]

Though a separate ticket would probably have been better. Next time. 😛

Comment entered 2023-01-12 21:02:33 by Kline, Bob (NIH/NCI) [C]

I created the global change job and started it in test mode on DEV. When I came back to check on the status of the job after dinner I found that the job had been killed, a victim of a recent change in the configuration of the servers. We used to be able to keep a login session alive on a CDR server as long as we didn't leave an idle RDP session connected to it. That was broken by a change CBIIT made on the 3rd of this month. I have put in a ticket to have CBIIT restore the access I used to have to the databases from my workstation so that I can execute long-running jobs directly from my laptop without having them killed. We'll have to write the global change scripts very carefully, so that if the VPN connection dies in the middle of a live job we don't leave corrupted data behind, and we can detect which documents have already been processed and skip them if we have to resume a job in such a situation. 🙁
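
One way to make such a job resumable is to have it recognize documents it has already converted. The sketch below is only an illustration under stated assumptions: `load`, `save`, and `transform` are hypothetical caller-supplied callables (not CDR API functions), `save` is assumed to store a document atomically, and the presence of a "Browser" AltTitle is used as the marker for "already processed".

```python
import xml.etree.ElementTree as ET


def already_converted(summary_xml):
    """True if the document already carries a Browser alt title."""
    root = ET.fromstring(summary_xml)
    return any(n.get("TitleType") == "Browser" for n in root.iter("AltTitle"))


def run_global_change(doc_ids, load, save, transform):
    """Convert each document once; safe to call again after an interruption."""
    converted, skipped = [], []
    for doc_id in doc_ids:
        xml = load(doc_id)
        if already_converted(xml):
            # Processed by an earlier, interrupted run: leave it alone.
            skipped.append(doc_id)
            continue
        save(doc_id, transform(xml))
        converted.append(doc_id)
    return converted, skipped
```

If the connection drops partway through, starting the same job again skips the documents saved by the first attempt and picks up with the rest.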

Comment entered 2023-01-13 09:40:20 by Osei-Poku, William (NIH/NCI) [C]

Sure. I wanted to mention that it would be good to run the global in the following groups:

  1. Language

  2. Then by Audience

  3. Then by Summary Type (not required - you can skip this one if it will get complicated)
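
The grouping requested above could be produced along these lines. This is a sketch: the dictionary keys and the sample values used to exercise it are assumptions, not the CDR's actual field names or query results.

```python
from itertools import groupby


def batches(summaries):
    """Yield ((language, audience, summary_type), [doc ids]) in run order."""
    def key(summary):
        return summary["language"], summary["audience"], summary["summary_type"]

    # groupby only merges adjacent items, so sort by the same key first.
    for group, members in groupby(sorted(summaries, key=key), key=key):
        yield group, [member["id"] for member in members]
```

Dropping `summary_type` from the key gives the simpler two-level grouping if the third level turns out to be too complicated.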

Comment entered 2023-01-17 15:09:06 by Kline, Bob (NIH/NCI) [C]

OK. Is there any reason you couldn't have told me this before I wrote the script? 😛

Comment entered 2023-01-18 10:21:46 by Osei-Poku, William (NIH/NCI) [C]

You were too fast for me 🙂. I mentioned it in the CDR meeting when we discussed the global.

Comment entered 2023-01-18 11:03:42 by Kline, Bob (NIH/NCI) [C]
Comment entered 2023-02-02 11:15:19 by Osei-Poku, William (NIH/NCI) [C]

Please run the global in live mode on DEV. Thanks!

Comment entered 2023-02-02 16:14:27 by Kline, Bob (NIH/NCI) [C]

Done.

Comment entered 2023-02-09 09:45:37 by Osei-Poku, William (NIH/NCI) [C]

We have verified the changes on DEV. However, we are concerned that every summary touched by the global will have a second Alt Title element (with attribute value of CTHP) even if the title is not used on any card on Cancer.gov, and even if currently, there is only one Alt Title element. The preference would have been to only include the CTHP title when it is used on a card on Cancer.gov so that the data in the CDR will match what is on Cancer.gov.  When I first raised this issue there was a suggestion to modify the filters to be able to use the Browser title for the CTHP cards if we don't have a CTHP Alt Title in a document. Having the data match what is on Cancer.gov would help inform users when making changes to the Alt Titles. Other than that, we are ready for a test run on QA.

Comment entered 2023-02-21 12:29:15 by Kline, Bob (NIH/NCI) [C]

As I got deeper into the work needed to make the Drupal-side modifications, it became clear that it's not sufficient to simply copy the short title to the browser title. We will need to come up with browser titles which satisfy the requirement that the browser titles be unique, and which are short enough that they will satisfy the 100-character limit imposed on that field by Drupal. Otherwise, we will run into the same issue which caused the web site to switch to using the node title for the browser title instead of the value in the browser title field. Tagging for awareness.

One approach which might go some way toward achieving uniqueness without exceeding the length limit would be to append "(Patient)" (or the Spanish equivalent) instead of " (PDQ®)–Patient Version" to the patient summary titles and not append anything to the HP summary titles. This won't be enough for all of the titles, but it will handle most of them, and we can come up with manually created titles for the handful which would otherwise still be too long. Other suggestions?

Comment entered 2023-02-23 13:13:17 by Kline, Bob (NIH/NCI) [C]

and I have attached a spreadsheet illustrating an approach to achieving the goal of ensuring uniqueness across the browser titles used for the cancer information summaries sent to Drupal. As noted in earlier comments, we have two constraints in tension with each other:

  1. the titles need to be unique

  2. the titles cannot exceed the length of 100 characters

Uniqueness can be achieved by taking the existing values formerly stored in the "Short" alternate title (still stored there in production) and appending qualifiers to at least some of those values to distinguish them from others without the appended qualifier. So, for example, if we have two English summaries, each with the same "short" title, one for patients, and the other for health professionals, we could add " (patient)" to the patient summary and the titles for the two summaries would be distinct from each other.

The more information we include in the appended qualifiers, the more titles we will push over the length limit, requiring us to manually construct a shorter version of the browser title. The approach represented by the spreadsheet appends " (patient)" or " (paciente)" to the patient summary titles, with the result that none of the resulting titles exceed the length limit.

This is just an example of how we can get to the goal, and I am not advocating that we necessarily use this exact approach. My intention is to get the discussion started for how to produce the titles we want. I realize that this particular approach does much less appending to the titles than is done for the main node titles for the summaries. Many of those titles, however, are too long to be stored in the browser title field. I am hoping that can help guide us with information on how the different choices we can make for achieving unique browser titles affect issues which are important (SEO, usability, accessibility, etc.).

Note that while the resulting titles are all short enough, some of the titles are still not unique. We will need to figure out why there are duplicates for the same title, and how we can write custom validation rules which ensure we do not publish duplicate titles, but do not get in the way of normal work to maintain the summaries.

The rows with MISSING or FAILURE in the title can be ignored for this purpose. Those are the ones which don't have any "Short" alternate title at all (or, in the case of the title with "FAILURE," no XML at all).

All of the summaries in the spreadsheet have an "Active" status in the production CDR.
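
For reference, the approach represented by the spreadsheet amounts to something like the following. It is a sketch of the idea only: the function names and the "English"/"Spanish"/"Patients" values are assumptions about how language and audience would be represented.

```python
MAX_BROWSER_TITLE = 100  # length limit imposed on the browser title field by Drupal
PATIENT_QUALIFIER = {"English": " (patient)", "Spanish": " (paciente)"}


def candidate_browser_title(short_title, language, audience):
    """Append a patient qualifier; HP titles are left unchanged."""
    if audience == "Patients":
        return short_title + PATIENT_QUALIFIER[language]
    return short_title


def too_long(titles):
    """Return the titles which would not fit in the Drupal field."""
    return [title for title in titles if len(title) > MAX_BROWSER_TITLE]
```

Longer qualifiers can be tried by changing `PATIENT_QUALIFIER` and checking how many titles `too_long` then reports.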

Comment entered 2023-02-24 07:35:38 by Kline, Bob (NIH/NCI) [C]

Following up on our discussion in yesterday's weekly status meeting: we decided that we will distinguish patient summaries from health professional summaries by appending " (PDQ®)" to the HP titles as they are copied from the short title to the browser title during the global change for this ticket. As promised, I have analyzed the duplicate titles which appear on the spreadsheet I posted yesterday to determine the reason(s) for those duplicates. Here's what I found.

  • in some cases, one of the summaries is marked as a partner merge set

  • in some cases, one of the summaries is marked as a future replacement for the other

  • in some cases, both summaries are marked as only usable as modules

  • in some cases (for example, Cancell/Cantrol/Protocel) both languages have the same title

  • in a couple of cases an English summary has a Spanish title (CDR810760, CDR811723)

  • in one pair, one of the summaries (CDR763238) has "*TEMP*" in the main summary title

For most of these cases, at most only one of the summaries in a pair will actually be published with the browser title in question. In the handful of cases where the English and Spanish summaries have the same language independent title (Cancell/Cantrol/Protocel, PC-SPES, Angiosarcoma, 714-X), we can add " (español)" to the browser title for the Spanish summary.

The conditions which can explain pairs of duplicate titles make it impossible for the current validation subsystem to detect which duplicate titles should be treated as validation errors and which are benign. To achieve this capability we would need to create an elaborate extension to that subsystem, introducing significant additional complexity as well as a possibly noticeable hit in performance at document save time. So my next question for you is whether the uniqueness of the browser titles is a desirable condition as opposed to an inflexible block to publication. If the former, what I would propose is a nightly report which is sent to a distribution list showing pairs of summaries which have been published with the same browser title, similar to the nightly job which reports duplicate glossary term names repeatedly until the duplicates are resolved.
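
The title rule decided in the status meeting and the core of the proposed nightly report could be sketched as follows. The names are hypothetical, the input to the report is assumed to be (document ID, browser title) pairs for published summaries, and the order in which the two qualifiers would be combined on a single title is an assumption, since the ticket does not say.

```python
from collections import defaultdict


def browser_title(short_title, audience, needs_language_qualifier=False):
    """Build a browser title from a short title."""
    title = short_title
    if audience == "Health professionals":
        title += " (PDQ®)"  # distinguishes HP titles from patient titles
    if needs_language_qualifier:
        # Only for Spanish summaries whose title is language independent.
        title += " (español)"
    return title


def duplicate_browser_titles(published):
    """Map each browser title used more than once to the IDs using it."""
    by_title = defaultdict(list)
    for doc_id, title in published:
        by_title[title].append(doc_id)
    return {title: sorted(ids) for title, ids in by_title.items() if len(ids) > 1}
```

An empty result from `duplicate_browser_titles` would mean there is nothing to report that night.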

Comment entered 2023-02-24 21:03:23 by Burack, Lindsay (NIH/NCI) [C]

I understand the duplicate title scenario we discussed yesterday, where both English and Spanish have the same title, and that is only a very small handful of summaries. I'm not a CDR product owner, but scenarios such as the last two bullets (an English summary has a Spanish title; one of the summaries has TEMP in the main summary title) seem more like errors that should be deleted than issues we need to worry about. Summaries that are marked as a future replacement should have distinct titles until they are published and the old version is deleted. This should all be verified with a CDR product owner or power user though, especially as I'm not familiar with scenarios/usage for the first and third bullets.

If my gut is correct though, we're only left to deal with the few summaries with duplicate titles in both English and Spanish.

  1. Easiest resolution is to add "(español)" as you've suggested.

  2. From an SEO standpoint, we want browser titles to be as descriptive and unique as possible. They should give users and search engines a clear idea of what the page content will be about. While these particular topics may not have high search volumes, we can always go the route of more descriptive browser titles. Here are some options/examples:

    • 714-X and Cancer Care

    • 714-X and Cancer Treatment

    • 714-X Alternative Cancer Treatment 

    • 714-X Cancer Treatment Review

    • 714-X - Complementary and Alternative Medicine 

    • 714-X Lacks Study Support for Cancer Treatment

Lastly, I would not advocate for any elaborate build as you've described. I believe we only have a few duplicate titles to resolve and after that, this is something content authors should be aware of and resolve before implementation of content into the CDR. 

Would you be able to (mostly) reuse the duplicate glossary term name report for the duplicate browser title report? Or would a duplicate browser title report require additional effort? Once the summaries are in Drupal, we can easily request a report to identify any duplicate titles. This wouldn't be automated, but it's a simple request. Again, I believe all content creators/authors should be aware of the duplicate title issue and identify and resolve such issues before the content is entered in the CDR. Because of that, I don't think extra development effort is necessary. But CDR owners/users should have the final say over me.

Comment entered 2023-02-27 16:18:21 by Osei-Poku, William (NIH/NCI) [C]

Thanks Bob! Please see my comments about the data issues below:

  • in some cases, one of the summaries is marked as a partner merge set  - I assume this is OK since the partner summary won't be published to Cancer.gov.

  • in some cases, one of the summaries is marked as a future replacement for the other - This is OK as the issue will be resolved when the replacement is completed. 

  • in some cases, both summaries are marked as only usable as modules - This should be OK as long as they are marked as Module Only. However, we will review them.

  • in some cases (for example, Cancell/Cantrol/Protocel) both languages have the same title

  • in a couple of cases an English summary has a Spanish title (CDR810760, CDR811723) - One of these is a duplicate and other was still in the process and not published yet. It will be corrected before publishing.

  • in one pair, one of the summaries (CDR763238) has "*TEMP*" in the main summary title - This is OK. They will be resolved eventually like the replacement one above. 

Comment entered 2023-02-28 14:16:25 by Kline, Bob (NIH/NCI) [C]

I have transformed the summary documents again on CDR DEV, using the approach we settled on for making the browser titles unique, and I set up an ODE and populated it with these summaries. Please review the documents in the CDR and on the ODE. In order to do test runs on QA we would need to install the software and schema changes on that tier. Would that disrupt any testing of changes for other tickets?

Comment entered 2023-02-28 14:25:25 by Englisch, Volker (NIH/NCI) [C]

Not if you stop finding bugs in the filter code. 🙂

I have filter changes on QA for the Special Considerations that   is looking at but that can easily be restored if needed.

Comment entered 2023-02-28 14:46:34 by Kline, Bob (NIH/NCI) [C]

I have a browser-title branch in the cdr-server repository (for the summary schema changes), in the cdr-lib repository (for the cdrpub.py changes), and in the cdr-tools repository (to update the tool which creates sample YAML content for Drupal). Are there any filter changes for this ticket which need to get checked into this branch in the cdr-server repository? Or any other changes you've made for this ticket?

Comment entered 2023-02-28 15:28:22 by Englisch, Volker (NIH/NCI) [C]

I didn't make any changes for this ticket and if you didn't have any filter changes there shouldn't be any overlap.

Comment entered 2023-03-01 09:35:23 by Kline, Bob (NIH/NCI) [C]

I have installed the schema changes on CDR QA and I'm running the revised global change job in test mode on that tier. As I'm monitoring the job I can see that there are some "Short" AltTitles which contain markup, including some with Insertion/Deletion markup going back years. Here's one of the more recent examples (reformatted for readability):

  <AltTitle xmlns:cdr="cips.nci.nih.gov/cdr" TitleType="Short">
    <GeneName>
      <Comment user="isaacsjs" audience="Internal" date="2022-05-02">
        This module is not linked to Gen of Skin anymore - JI
      </Comment>
      PTEN
    </GeneName>
    hamartoma tumor syndromes (including Cowden syndrome)-Intro
  </AltTitle>

It would be prohibitively expensive (and somewhat risky) to try and come up with logic which could reliably do the right thing with every possible combination of inline markup, so we have two feasible approaches that I can think of.

  1. Eliminate the markup in advance of running the global change.

  2. Have the global change skip documents with this problem, and fix them by hand.

Just as a reminder, in case this might affect your decision about which option to use, what we publish to the web site has all of that markup stripped out. I considered a third option, which was to have the global change software strip the markup, but when I realized that some of the markup was revision markup, it became clear that we would end up with garbled (or even empty) text content for the titles. I suppose we could adopt this approach of stripping the markup after having resolved the revision markup, but that would assume we know which revision markup should be applied and which backed out (presumably the old revision markup is still there—for example in CDR62936—because a decision hasn't yet been made which way to go).
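
Detecting the affected documents (needed for either option) is straightforward, since the problem titles are the ones with child elements. The sketch below uses hypothetical function names and is not the code used by the global change job.

```python
import xml.etree.ElementTree as ET


def short_titles_with_markup(summary_xml):
    """Return the Short AltTitle elements which contain inline markup."""
    root = ET.fromstring(summary_xml)
    return [
        node
        for node in root.iter("AltTitle")
        # len() of an element is its number of child elements.
        if node.get("TitleType") == "Short" and len(node) > 0
    ]


def naive_text(node):
    """Join all text below the element, normalizing whitespace."""
    return " ".join("".join(node.itertext()).split())
```

Applied to the example above, `short_titles_with_markup` flags the title, and `naive_text` shows why simply dropping the tags is not safe: the text of the internal Comment element ends up in front of "PTEN" in the result.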

Comment entered 2023-03-01 09:40:39 by Kline, Bob (NIH/NCI) [C]

Looks like there are 13 such documents on QA. The job just finished.

https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-03-01_08-01-11

Comment entered 2023-03-01 09:44:06 by Kline, Bob (NIH/NCI) [C]

I have attached a text file showing that markup.

Comment entered 2023-03-08 08:48:21 by Kline, Bob (NIH/NCI) [C]

Capturing the decision made in Thursday's status meeting: the next step is that the inline markup will be removed from the Short titles on QA, right?

Comment entered 2023-03-09 17:33:01 by Osei-Poku, William (NIH/NCI) [C]

It looks like both the browser title and CTHP title are required in the schema. I thought only the browser title was supposed to be required.

Comment entered 2023-03-10 07:04:09 by Kline, Bob (NIH/NCI) [C]

Drupal requires both values.

Comment entered 2023-03-10 09:37:44 by Osei-Poku, William (NIH/NCI) [C]

Markup removed from the affected 13 summaries on QA. Is the schema change on STAGE?

Comment entered 2023-03-10 10:49:48 by Kline, Bob (NIH/NCI) [C]

Not yet. I think it will be easier to edit the Short titles before we make that change, right?

Comment entered 2023-03-13 20:35:18 by Osei-Poku, William (NIH/NCI) [C]

Some of the diffs appear not to include the text within the AltTitle tags for the CancerTypeHomePage, although the New XML appears to be fine and include the data. 

Example: CDR0000790949

  • - <AltTitle TitleType="Short">

  • + <AltTitle TitleType="Browser">Childhood Breast TumorsCancer Treatment</AltTitle>

  • + <AltTitle TitleType="CancerTypeHomePage">

Comment entered 2023-03-14 07:52:50 by Kline, Bob (NIH/NCI) [C]

That's because the diff software tries to normalize the serialized XML by putting each element on a separate line, as part of the attempt to provide the shorter lines you have requested. In this case, the "before" version for the "Short" title had the same inline markup which the "after" version had in the "CancerTypeHomePage" title, so only the lines with the opening tag changed. The content and closing tags (moved to separate lines) were the same, so they don't show up on the "diff" page. This won't happen for the diff reports when the unwanted inline markup has been removed from the documents being transformed.

Or to put it more succinctly, reduced context is a price paid for shorter diff lines.
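
The effect can be reproduced with Python's difflib and a rough stand-in for the normalization step. The normalizer here is a simplification (it only breaks lines where one tag directly follows another), not the CDR's diff code, and the sample titles and their inline markup are invented for illustration.

```python
import difflib
import re


def one_element_per_line(xml):
    """Rough normalizer: start a new line wherever one tag directly follows another."""
    return re.sub(r">\s*<", ">\n<", xml).split("\n")


def changed_lines(before_xml, after_xml):
    """Return only the added and removed lines, with no surrounding context."""
    diff = list(difflib.unified_diff(
        one_element_per_line(before_xml),
        one_element_per_line(after_xml),
        lineterm="",
        n=0,
    ))
    # Drop the two file header lines and the @@ hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Invented sample: the same inline markup before and after the change.
MARKUP = "<Insertion>New</Insertion><Deletion>Old</Deletion> Title"
before = f'<AltTitle TitleType="Short">{MARKUP}</AltTitle>'
after = (
    '<AltTitle TitleType="Browser">New Title</AltTitle>'
    f'<AltTitle TitleType="CancerTypeHomePage">{MARKUP}</AltTitle>'
)
for line in changed_lines(before, after):
    print(line)
# -<AltTitle TitleType="Short">
# +<AltTitle TitleType="Browser">New Title</AltTitle>
# +<AltTitle TitleType="CancerTypeHomePage">
```

The lines carrying the markup and the closing tag are identical on both sides, so they drop out of the diff, leaving the bare opening tags seen in the example above.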

Comment entered 2023-03-14 09:32:51 by Osei-Poku, William (NIH/NCI) [C]

Please run in live mode on QA. Thanks!

Comment entered 2023-03-14 09:46:09 by Kline, Bob (NIH/NCI) [C]

It's actually running in test mode on QA right now, since we haven't done that since the titles got fixed.

Comment entered 2023-03-14 14:22:11 by Kline, Bob (NIH/NCI) [C]
Comment entered 2023-03-15 10:00:49 by Kline, Bob (NIH/NCI) [C]

As soon as you have reviewed the test results I'll run the global change in live mode on QA and we can move on to the next steps of pushing everything to STAGE.

Comment entered 2023-03-15 20:21:38 by Osei-Poku, William (NIH/NCI) [C]

Test results look good. Please run in live mode on QA. Thanks!

Comment entered 2023-03-17 10:12:47 by Kline, Bob (NIH/NCI) [C]

Global change has run in live mode on QA. Please review the results.

Comment entered 2023-03-17 10:24:13 by Kline, Bob (NIH/NCI) [C]

By the way—when you edit the existing Short titles to eliminate inline markup, don't add the new Browser or CancerTypeHomePage titles. Otherwise you'll end up with extra title elements after the global change has run.

Comment entered 2023-03-17 10:57:57 by Englisch, Volker (NIH/NCI) [C]

If you are referring to a single document then that was likely me who did that.  I thought the documents on QA had already been converted when I created a test summary to work on the Bold/Underline copy/paste issue.

Comment entered 2023-03-17 13:15:49 by Osei-Poku, William (NIH/NCI) [C]

I believe some of them may have been added when we were getting the Special Consideration documents ready for publishing. We wouldn't have been able to create publishable versions without adding the new alt titles. We will have to go back and review those summaries now that the global has completed.

Comment entered 2023-03-20 08:07:59 by Kline, Bob (NIH/NCI) [C]

Just a reminder that I'm holding off on the preparations of QA for user acceptance testing of Pauling (which is scheduled to start tomorrow) until you've finished with your review of the live global change on that server.

Comment entered 2023-03-20 10:02:42 by Osei-Poku, William (NIH/NCI) [C]

Yes, understood! However, we are starting to test on DEV while we complete the browser title testing ASAP. We will also want a refresh of QA so I will create a ticket for that.

Comment entered 2023-03-20 10:40:36 by Osei-Poku, William (NIH/NCI) [C]

I assume you will need to publish the summaries to the ODE before we can review them on there as pub preview appears to show the old display. 

Comment entered 2023-03-20 11:05:52 by Kline, Bob (NIH/NCI) [C]

I assumed you wouldn't want me to do that until you have reviewed the documents in the CDR.

Comment entered 2023-03-21 09:48:15 by Osei-Poku, William (NIH/NCI) [C]

They look good on QA. Please proceed to publish them to the ODE.

Comment entered 2023-03-21 10:40:02 by Kline, Bob (NIH/NCI) [C]

The DTD changes for this ticket don't seem to have made it into GitHub. Please create a browser-title branch in the cdr-publishing repository and check in those changes. I have temporarily made the necessary edits on QA in the file system so that I can publish from there.

Comment entered 2023-03-21 10:46:52 by Kline, Bob (NIH/NCI) [C]

I assume you're going to need to modify both DTDs, since the data partners will be getting the new attribute values, too, right?

Comment entered 2023-03-21 11:33:48 by Kline, Bob (NIH/NCI) [C]

The push job failed when it tried to push 62975. That's because the document had no browser title. Looking back at the global change job's logs, I see it couldn't save a new published version because

Failed link target rule: /GlossaryTermName/TermNameStatus != "Rejected" (2 times)

Looks like that error was also logged during the test run. (Always a good idea to investigate logged errors in the test results.)


So we have two paths we can take:

  1. fix the broken documents

  2. rewrite the publishing software to work around the broken documents
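
For the first path, one way to spot affected documents before a push is to check each summary for the new alternate titles. The following is only a minimal sketch: the AltTitle element and the TitleType values come from this ticket, while the function name, the set of required types, and the sample document are assumptions made for illustration and are not the CDR's actual code.

```python
import xml.etree.ElementTree as ET

# TitleType values named in this ticket; treating both as required is an assumption.
REQUIRED_TITLE_TYPES = {"Browser", "CancerTypeHomePage"}


def missing_title_types(summary_xml):
    """Return the required TitleType values which have no non-empty AltTitle."""
    root = ET.fromstring(summary_xml)
    found = set()
    for node in root.iter("AltTitle"):
        # Join nested text so that inline markup does not hide the title.
        text = "".join(node.itertext()).strip()
        if text:
            found.add(node.get("TitleType"))
    return REQUIRED_TITLE_TYPES - found


# Hypothetical, heavily simplified sample (not a real CDR document).
SAMPLE = """<Summary>
  <SummaryTitle>Breast Cancer Treatment</SummaryTitle>
  <AltTitle TitleType="CancerTypeHomePage">Breast Cancer Treatment</AltTitle>
</Summary>"""

if __name__ == "__main__":
    print(sorted(missing_title_types(SAMPLE)))
```

A document for which this returns a non-empty set would be a candidate for manual repair before the push job is started.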

Comment entered 2023-03-21 12:47:40 by Osei-Poku, William (NIH/NCI) [C]

I have fixed the problem and created a publishable version of 62975. I had to do the same for CDR0000410719 which also contained a rejected term. So, it is likely that there are more of these.  I investigated and fixed several of these before the live run.

Generally, we don't fix problems like these on QA unless they will block the live run of the global change, because that would be double work. We would rather fix them on PROD if they exist on PROD. In this case, we are also running a publishing job, so I understand why this is an issue on QA now.

Is there a reason why the error is displayed only in the LASTP row? It does not appear to be an issue in the CWD and LASTV which we look at more carefully. Most of the errors are displayed in all the different versions/rows.

Comment entered 2023-03-21 12:49:49 by Kline, Bob (NIH/NCI) [C]

Just to be clear: for the purpose of this exercise, it's only the documents which have errors preventing the creation of a new publishing version which would need to be fixed.

Comment entered 2023-03-21 13:11:12 by Osei-Poku, William (NIH/NCI) [C]

Sure. If there are more, let me know and I will fix them.

Comment entered 2023-03-21 13:32:36 by Kline, Bob (NIH/NCI) [C]

Did you look at the version history report? When Chanita fixed the glossary term links she didn't create a publishable version for some reason.

Comment entered 2023-03-21 13:34:36 by Kline, Bob (NIH/NCI) [C]

Did you fix all of the documents which had errors in the LASTP row of the test results for the global change?

Comment entered 2023-03-21 13:52:49 by Englisch, Volker (NIH/NCI) [C]

I don't see a ticket for DTD changes, that's why "those changes" don't exist yet.

As for the partner documents, I'm pretty sure we don't want to send the partners the "CancerTypeHomePage" AltTitle and we may want to continue giving them a "Short" title.  I'll have to look first what we're currently sending.  It wasn't on my radar but I hear it beeping now! 🙂

Comment entered 2023-03-21 18:23:54 by Osei-Poku, William (NIH/NCI) [C]

I suppose that explains why only the LASTP will show the error. I assume it is OK now to publish to the ODE.

Comment entered 2023-03-22 07:31:31 by Kline, Bob (NIH/NCI) [C]

$ ls CDR*.pub*ERROR*
CDR0000062808.pub.NEW_ERRORS.txt  CDR0000787346.pub.NEW_ERRORS.txt
CDR0000062841.pub.NEW_ERRORS.txt  CDR0000799416.pub.NEW_ERRORS.txt
CDR0000062872.pub.NEW_ERRORS.txt  CDR0000799716.pub.NEW_ERRORS.txt
CDR0000062975.pub.NEW_ERRORS.txt  CDR0000799767.pub.NEW_ERRORS.txt
CDR0000062978.pub.NEW_ERRORS.txt  CDR0000805475.pub.NEW_ERRORS.txt
CDR0000410719.pub.NEW_ERRORS.txt  CDR0000805686.pub.NEW_ERRORS.txt
CDR0000446177.pub.NEW_ERRORS.txt  CDR0000809230.pub.NEW_ERRORS.txt
CDR0000517309.pub.NEW_ERRORS.txt  CDR0000809329.pub.NEW_ERRORS.txt
CDR0000700000.pub.NEW_ERRORS.txt  CDR0000810015.pub.NEW_ERRORS.txt
CDR0000752413.pub.NEW_ERRORS.txt  CDR0000810042.pub.NEW_ERRORS.txt
CDR0000774255.pub.NEW_ERRORS.txt

Comment entered 2023-03-22 08:00:11 by Kline, Bob (NIH/NCI) [C]

"I assume it is OK now to publish to the ODE."

I tried (again). It failed (again).

It looks like JIRA is eating more comments (I posted a reply last night but it's gone). I will assume that this is why you didn't see my earlier question:

Did you fix all of the documents which had errors in the LASTP row of the test results for the global change?

To save you from having to scroll through the test results report I have listed all of those documents in my previous comment. I do not want to have you fix one document at a time, ask me to try publishing again, the publishing job fails again, then you fix one more document and ask me to try again, on and on.

Once you have fixed ALL the documents I will try publishing to the ODE again.
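
The file listing in the previous comment can be turned into a checklist of document IDs. This is a minimal sketch which assumes the naming pattern visible in that listing (CDR, ten digits, then .pub.NEW_ERRORS.txt); the function name is made up for illustration.

```python
import re

# Pattern inferred from the file names in the listing; an assumption, not a documented format.
ERROR_FILE = re.compile(r"^CDR(\d{10})\.pub\.NEW_ERRORS\.txt$")


def cdr_ids(filenames):
    """Return the sorted, de-duplicated integer CDR IDs found in the names."""
    ids = set()
    for name in filenames:
        match = ERROR_FILE.match(name)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    names = [
        "CDR0000062975.pub.NEW_ERRORS.txt",
        "CDR0000410719.pub.NEW_ERRORS.txt",
        "notes.txt",  # names which do not match the pattern are ignored
    ]
    print(cdr_ids(names))  # [62975, 410719]
```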

As for JIRA, my working theory is that trying to keep up with threading of replies may be causing (or at least contributing to) its failures to display all of the comments, so I'm going to avoid using the Reply feature (at least for a while) to see if that improves JIRA's behavior. I will instead create standalone comments, using quoting of relevant passages from earlier comments to provide any necessary context.

Comment entered 2023-03-22 10:33:58 by Osei-Poku, William (NIH/NCI) [C]

Please try again. I believe I fixed all the errors and created new publishable versions, and I double-checked by running pub preview for each of them. Some of the errors did not show up during validation, and those documents could even be saved as publishable versions, but pub preview failed for them until the errors were fixed.

Comment entered 2023-03-22 12:12:18 by Kline, Bob (NIH/NCI) [C]

The summaries have been pushed to the ODE.

Comment entered 2023-03-22 14:13:59 by Osei-Poku, William (NIH/NCI) [C]

What is a good way to get to the list of summaries on the ODE? Searches provide results that point to cancer.gov. It looks like you need to know the URL of the summary and replace the cancer.gov host name with the ODE host name before you get to a summary.

Comment entered 2023-03-22 14:46:32 by Kline, Bob (NIH/NCI) [C]

"What is a good way to get to the list of summaries on the ODE?"

I have created the query Summaries on Drupal at https://cdr-qa.cancer.gov/cgi-bin/cdr/CdrQueries.py.

Comment entered 2023-03-22 15:09:51 by Kline, Bob (NIH/NCI) [C]

I modified the query to add a URL column.

Comment entered 2023-03-22 20:53:34 by Englisch, Volker (NIH/NCI) [C]

I modified the DTD for Cancer.gov (pdqCG.dtd) and pushed it to the branch 'browser-title'. I also modified the filter CDR0000609947 to recreate the Short title for the PDQ partners. I also created a branch "browser-title" for this change but later learned there's already a branch with that name in the repository, and now GH won't let me push the change. I will figure out how to get GH to cooperate tomorrow, probably by removing my branch and pulling yours.

The updated filter is currently on DEV.

Comment entered 2023-03-23 06:39:19 by Kline, Bob (NIH/NCI) [C]

"there's already a branch with that name in the repository"

This is truly odd. I have no trace of such a branch in any of my clones of the cdr-publishing repository, nor did I see one when I looked on GitHub the other day. I see it on GitHub now, however, with your commit from last night. Furthermore, when I go to https://github.com/NCIOCPL/cdr-publishing/branches/yours I only see Pauling, so it would appear that GitHub is under the impression that you (or at least someone other than bkline) created that branch. Do you have more than one clone of this repository? Is it possible you created the branch in one and it confused the other? I have three clones: one on a network share which I can use from any machine on the NCI network, one on the government's laptop, and one on my own MacBook. None of them have that branch. I will create a fourth (temporary) clone so I can see if it gives me any clues about the history of the repositories.

Comment entered 2023-03-23 06:53:31 by Kline, Bob (NIH/NCI) [C]

Ah, never mind. You're not talking about the cdr-publishing repository, you're talking about the cdr-server repository, where I had indeed created the browser-title branch to store the schema changes.

I recommend using

git diff HEAD~ -p > /somewhere/browser-title.patch

Then:

  • remove the local branch

  • pull down the branch from GitHub

  • apply the patch

  • commit

  • push

"Apply the patch" would be

patch -p1 < /somewhere/browser-title.patch

Comment entered 2023-03-27 18:24:59 by Osei-Poku, William (NIH/NCI) [C]

We've reviewed a good sample of the pages on the ODE and did not find anything odd. They all looked good. So, I think we can proceed to move this to STAGE. Thanks!

Comment entered 2023-04-03 09:56:11 by Kline, Bob (NIH/NCI) [C]

Here's the report for the test run of the global change job on STAGE. Better than on the lower tiers, but still a couple of summaries whose latest publishing version failed to get updated.

https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-03-31_18-22-16

Comment entered 2023-04-03 11:16:49 by Kline, Bob (NIH/NCI) [C]

Ah, I just noticed that the job failed when the database went away partway through. So I'm running it again. That might account for the fact that there were fewer last published versions with problems.

Comment entered 2023-04-03 13:05:31 by Kline, Bob (NIH/NCI) [C]

The job ran for about an hour, then failed again. Trying again for a third time.

Comment entered 2023-04-04 08:11:17 by Kline, Bob (NIH/NCI) [C]

Third time worked. https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-04-03_13-03-13. As you can see, there are more LASTP errors than the two I saw for the first (failed) run.

Comment entered 2023-04-06 21:13:49 by Osei-Poku, William (NIH/NCI) [C]

I think I fixed all the LASTP errors on STAGE. Would you want to run in test mode again just to be sure?

Comment entered 2023-04-07 11:47:23 by Kline, Bob (NIH/NCI) [C]

https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2023-04-07_07-44-22

I ran it from QA this time to see if that would avoid the failures I got running from DEV, and it succeeded the first time. Just under 3 hours (I misspoke a day or two ago when I said it took 2 hours on DEV: it took about the same amount of time on DEV as it took today on QA). I'll run from the QA server when we get to the production rollout.

Please check these results.

Comment entered 2023-04-07 11:48:57 by Kline, Bob (NIH/NCI) [C]

Again, the data involved is on STAGE, even though the job was launched on QA.

Comment entered 2023-04-07 12:43:03 by Osei-Poku, William (NIH/NCI) [C]

I have fixed the two LASTP errors in this latest run on STAGE. Please run in test mode again.

Comment entered 2023-04-07 15:46:17 by Kline, Bob (NIH/NCI) [C]
Comment entered 2023-04-07 16:12:52 by Osei-Poku, William (NIH/NCI) [C]

It looks like there are no more LASTP errors, so you may proceed with the live run on STAGE.

Comment entered 2023-04-08 01:44:33 by Kline, Bob (NIH/NCI) [C]

Live run complete on STAGE. Logs attached. Let me know when you've checked the results and are ready for me to publish to the ODE.

Comment entered 2023-04-10 11:58:50 by Osei-Poku, William (NIH/NCI) [C]

I reviewed some of the documents on STAGE. They looked good. Thanks!

Comment entered 2023-04-10 14:26:15 by Kline, Bob (NIH/NCI) [C]

I have loaded the summaries on STAGE to the ODE.

https://ncigovcdode539.prod.acquia-sites.com/

Giving you a chance to check these over before I turn it over to & co.

Comment entered 2023-04-10 17:27:06 by Osei-Poku, William (NIH/NCI) [C]

Will the query "Summaries on Drupal" retrieve the right information from the ODE for STAGE? I ran the one on QA (as a test) but it is no longer giving me the ODE URLs. Rather I am getting URLs pointing to the live site. 

Comment entered 2023-04-10 17:42:11 by Kline, Bob (NIH/NCI) [C]

Sure. Just copy the query to STAGE and uncomment (remove the "--" delimiters) the three lines for the REPLACE calls (you can comment out the following line which gives you the original URL).
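
The stored query performs the substitution in SQL with REPLACE calls. The Python below is only an illustrative sketch of the same idea, swapping the host of a live-site URL for an ODE host. The ODE host name is taken from the link posted earlier in this ticket, while the function name and the example path are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Host taken from the ODE link posted in this ticket; other tiers may use a different host.
ODE_HOST = "ncigovcdode539.prod.acquia-sites.com"


def to_ode_url(live_url, ode_host=ODE_HOST):
    """Return live_url with its host replaced by the ODE host."""
    parts = urlsplit(live_url)
    # Only the network location changes; scheme, path, and query are preserved.
    return urlunsplit(parts._replace(netloc=ode_host))


if __name__ == "__main__":
    # Example path chosen for illustration only.
    print(to_ode_url("https://www.cancer.gov/types/breast/patient/breast-treatment-pdq"))
```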

Comment entered 2023-04-10 19:42:49 by Osei-Poku, William (NIH/NCI) [C]

Worked. Thanks!

Comment entered 2023-04-10 19:43:23 by Osei-Poku, William (NIH/NCI) [C]

I have reviewed several summaries on the ODE, and they all looked good. Thanks!

Comment entered 2023-04-12 15:54:01 by Englisch, Volker (NIH/NCI) [C]

I ran a weekly publishing job on STAGE and inspected the output to confirm that the AltTitle elements with the new TitleType values (Browser and CTHP) have been replaced in the partner output and appear as a single AltTitle with TitleType="Short".

Comment entered 2023-06-07 06:53:13 by Kline, Bob (NIH/NCI) [C]

Deployed to production.

Comment entered 2023-06-15 11:47:58 by Osei-Poku, William (NIH/NCI) [C]

We published new summaries this morning and I was able to confirm that the right Browser Titles display on Cancer.gov. Thanks!

Attachments
File Name Posted User
alt-titles-with-markup.log 2023-03-01 09:43:26 Kline, Bob (NIH/NCI) [C]
browser-title-errors.png 2023-03-21 11:34:36 Kline, Bob (NIH/NCI) [C]
BrowserTitlePP_QA.PNG 2023-03-20 10:39:58 Osei-Poku, William (NIH/NCI) [C]
browser-titles.xlsx 2023-02-23 12:51:21 Kline, Bob (NIH/NCI) [C]
cthp error.PNG 2023-03-09 17:31:08 Osei-Poku, William (NIH/NCI) [C]
fixed-without-creating-publishable-version.png 2023-03-21 13:29:02 Kline, Bob (NIH/NCI) [C]
image-2023-04-10-17-26-13-744.png 2023-04-10 17:26:14 Osei-Poku, William (NIH/NCI) [C]
ocecdr-5104-stage-live.log 2023-04-08 01:45:24 Kline, Bob (NIH/NCI) [C]
summary-alt-titles.xlsx 2022-11-11 17:56:24 Kline, Bob (NIH/NCI) [C]
