Issue Number | 5343 |
---|---|
Summary | [Storefront] Create report to identify which Drupal content needs to be updated in Storefront |
Created | 2024-11-07 14:31:19 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Open |
Resolved | |
Resolution | |
Path | /home/bkline/backups/jira/ocecdr/issue.475204 |
We would like to bring our content in Storefront up to date: remove pages that we have since taken down from Cancer.gov, and add pages we have added to Cancer.gov (such as the modernized patient-focused content).
There are 741 nodes in Cancer.gov Drupal with the syndication flag turned on. My understanding is that even though that field is stored only with the English version of each node, it applies to all language versions of the node, for a total of 1,348 pages marked for syndication. The HHS Storefront has 1,204 items with NCI as the source. I have attached an Excel workbook with three tables:
Missing From HHS Storefront (Drupal has it, HHS doesn't have it): 305 URLs
Not On Drupal (HHS has it, Drupal doesn't, or it is no longer flagged for syndication): 161 URLs
Dead URLs On Storefront (HHS has a link we gave them which no longer retrieves our content): 154 rows
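The first two tables are set differences between the Drupal list and the Storefront list. A minimal sketch of that comparison, assuming both lists have already been exported as plain URL strings; the function names `normalize` and `compare_url_sets` are hypothetical, and the normalization rules (case-insensitive scheme and host, trailing slash and fragment ignored) are assumptions, not necessarily what the attached report did.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal.

    Assumption: scheme/host case, a trailing slash, and any #fragment
    are not significant when matching Drupal pages to Storefront items.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def compare_url_sets(drupal_urls, storefront_urls):
    """Return the two difference tables as sorted lists of normalized URLs."""
    drupal = {normalize(u) for u in drupal_urls}
    storefront = {normalize(u) for u in storefront_urls}
    return {
        # Flagged for syndication in Drupal, but HHS doesn't list it.
        "missing_from_storefront": sorted(drupal - storefront),
        # HHS lists it, but Drupal doesn't (or no longer flags it); this
        # also catches off-site links such as smokefree.gov or YouTube.
        "not_on_drupal": sorted(storefront - drupal),
    }
```

The inputs would be the URLs of every language version of each flagged node on one side and the NCI-sourced Storefront item URLs on the other. Sets make each difference a single operation regardless of list size.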
In one case in the dead URLs table, the response came back with a 200 (OK) HTTP status code, but the payload was HTML saying the content was no longer there (in this case, a YouTube video).
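Because a 200 status alone doesn't prove the content is still there, a dead-link check has to look at the body as well. A minimal sketch, assuming a simple phrase match is good enough; the `SOFT_404_MARKERS` phrases and the helper names `is_dead` and `check_url` are hypothetical and illustrative only, not taken from the attached report.

```python
import urllib.error
import urllib.request

# Hypothetical phrases suggesting a "soft 404": the server answers 200 OK
# but the page says the content is gone. Illustrative only; tune per site.
SOFT_404_MARKERS = (
    "page not found",
    "video unavailable",
    "no longer available",
)


def is_dead(status: int, body: str) -> bool:
    """Treat any non-2xx status as dead; for 2xx, scan the body for markers."""
    if not 200 <= status < 300:
        return True
    text = body.lower()
    return any(marker in text for marker in SOFT_404_MARKERS)


def check_url(url: str, timeout: float = 15.0) -> bool:
    """Fetch a URL (following redirects) and return True if it looks dead."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read a bounded amount of the body; enough to spot the markers.
            body = resp.read(200_000).decode("utf-8", errors="replace")
            return is_dead(resp.status, body)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; classify by its status code.
        return is_dead(err.code, "")
    except OSError:
        # DNS failures, refused connections, timeouts: count as dead.
        return True
```

Keeping the classification in `is_dead` separate from the network call makes the soft-404 rule easy to test and adjust without re-fetching every Storefront URL.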
This is a starting point. I'm still working out how to generate the report without having to log into the production Drupal server. Also, you may well want to narrow the scope of the report (though I assume it would be useful for somebody to know that a URL we gave HHS is now dead, even if that URL points to something other than a page on cancer.gov, such as a YouTube video).
Let me know what changes, if any, would make this report more useful. There's a lot of overlap between the last two tables, but the second table also includes URLs that aren't in cancer.gov Drupal because they link somewhere else (like smokefree.gov or YouTube). You'd need to check the third table to find out which of those links no longer work.
Thanks, ~bkline! As discussed in the status meeting, please add a column on the far right with the Description for every row on the Missing From HHS Storefront tab.
After looking at a few samples, it appears that Bob's list of "Dead URLs On Storefront" is correct.
I can confirm that all of the documents listed on the Storefront's "Health Report" (which shows the unhealthy documents) are included in Bob's report. I also followed several documents that are not on the "Health Report" (Fatigue, Nutrition in Cancer Care, and Childhood Brain Stem Glioma Treatment, for instance) and confirmed that they lead to the Cancer.gov "Not Found" page.
It would be interesting to know why the Health Report isn't giving us the full picture, but since nobody would fix the report anyway, we should just go with what we have (and what we can confirm to be correct). It is also surprising to me that these missing documents are still served by the Storefront, which delivers the documents from its own data storage. It is possible that the "Health Report" and/or the process that updates documents hasn't run in a very long time, leaving the status quo in place.
What this exercise shows us is that the problem of missing documents in the Storefront is much worse than I expected.
So we won't be keeping the descriptions in sync for the ones they already have?
File Name | Posted | User |
---|---|---|
hhs-storefront-report-20241125.xlsx | 2024-11-25 10:17:31 | Kline, Bob (NIH/NCI) [C] |