Issue Number | 3804 |
---|---|
Summary | [NVCG] Create a "Clean-up Report" |
Created | 2014-09-10 10:22:03 |
Issue Type | Task |
Submitted By | henryec |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2014-09-11 12:24:51 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.137702 |
Create a Clean-up report (this is something that needs to be run on an ongoing basis for content owners, at least until the clean-up is complete). This report will help Margaret identify para-level elements not in section tags for content clean-up/restructuring in preparation for NVCG.
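The stated goal, finding para-level elements that are not inside section tags, could be checked roughly as below. This is a minimal sketch, not the CDR report's code. It assumes a simplified summary XML in which section content lives in SummarySection elements and treats Para, ItemizedList, and OrderedList as the "para-level" elements. The function name `orphan_para_elements` and the `ListItem` element in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the "para level" elements of interest
PARA_LEVEL = {"Para", "ItemizedList", "OrderedList"}


def orphan_para_elements(xml_text):
    """Return tags of para-level elements that have no SummarySection
    ancestor, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map
    parent = {child: node for node in root.iter() for child in node}
    orphans = []
    for elem in root.iter():
        if elem.tag not in PARA_LEVEL:
            continue
        # Walk up the ancestors until a SummarySection or the root is passed
        node = parent.get(elem)
        while node is not None and node.tag != "SummarySection":
            node = parent.get(node)
        if node is None:
            orphans.append(elem.tag)
    return orphans


SAMPLE = """<Summary>
  <Para>Loose paragraph</Para>
  <SummarySection><Title>T</Title><Para>Fine</Para></SummarySection>
  <ItemizedList><ListItem>Loose item</ListItem></ItemizedList>
</Summary>"""

print(orphan_para_elements(SAMPLE))  # ['Para', 'ItemizedList']
```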
A basic report has been created on DEV (SummarySectionCleanup.py) but it's not fully functional yet.
I finished the report SummarySectionCleanup.py on DEV. By default it lists every summary along with the CDR-ID, Summary Title and SummarySection information.
For every SummarySection found (in document order), the SummarySection information displays:
- The SummarySection title (if it exists)
- An indicator if the section is completely empty
- All children of the SummarySection (unless the section contains at least one Para element)
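The per-section output described above can be sketched as a walk over SummarySection elements in document order. This is an illustrative sketch under assumptions, not the report's implementation: it assumes the title is a direct `Title` child, treats "completely empty" as no child elements and no non-blank text, and checks only direct children for Para. The name `describe_sections` is hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_sections(xml_text):
    """Return (title, is_empty, shown_children) for each SummarySection,
    in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the tree depth-first, i.e. in document order,
    # so nested SummarySections are included too
    for section in root.iter("SummarySection"):
        title = section.findtext("Title")  # None when there is no Title
        child_tags = [child.tag for child in section]
        # "Completely empty": no child elements and no non-blank text
        is_empty = not child_tags and not (section.text or "").strip()
        # Children are listed only if the section has no Para child
        shown = [] if "Para" in child_tags else child_tags
        rows.append((title, is_empty, shown))
    return rows


SAMPLE = """<Summary>
  <SummarySection><Title>Overview</Title><Para>Text</Para></SummarySection>
  <SummarySection><Title>Odd</Title></SummarySection>
  <SummarySection/>
</Summary>"""

for row in describe_sections(SAMPLE):
    print(row)
```

The second sample section is the kind of "odd" case the report keeps visible: it is not empty, but its only child is a title.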
I did not want to exclude too much from this output, so that the users will be able to identify "odd" sections (e.g., a summary section containing nothing but a title) rather than just looking for empty sections.
The report can be found on DEV at
https://cdr.dev.cancer.gov/cgi-bin/cdr/SummarySectionCleanup.py
Adding Robin and William as watchers. I'm not sure who would be QC'ing this report.
I modified the report slightly to suppress the display of a section's title and SummarySection children whenever the section contains at least one Para, SummarySection, ItemizedList, or OrderedList element.
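The suppression rule amounts to a set-membership test on each section's children. A minimal sketch, assuming only direct children are checked and using a hypothetical `sections_to_report` helper; the `Table` element in the sample just stands in for any child outside the four listed.

```python
import xml.etree.ElementTree as ET

# Child elements that count as real content under the modified rule
CONTENT_TAGS = {"Para", "SummarySection", "ItemizedList", "OrderedList"}


def sections_to_report(xml_text):
    """Return titles of SummarySections that are still displayed
    (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    report = []
    for section in root.iter("SummarySection"):
        # Suppress the section if any direct child is a content element
        if any(child.tag in CONTENT_TAGS for child in section):
            continue
        report.append(section.findtext("Title", default="(no title)"))
    return report


SAMPLE = """<Summary>
  <SummarySection><Title>A</Title><Para>x</Para></SummarySection>
  <SummarySection><Title>B</Title>
    <SummarySection><Title>B.1</Title><ItemizedList/></SummarySection>
  </SummarySection>
  <SummarySection><Title>C</Title><Table/></SummarySection>
  <SummarySection/>
</Summary>"""

print(sections_to_report(SAMPLE))  # ['C', '(no title)']
```

Section B is suppressed because its nested SummarySection counts as content, even though B itself holds no paragraphs.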
I like the modification you made. The report no longer looks too big and daunting. You can quickly go to the reported sections with problems in the CDR right after looking at the report. However, it does have some limitations. When empty sections are reported, you basically have to start from the beginning of the summary and look at each SummarySection element to see if it is empty or not. It is not impossible, but it takes a little while to find the empty elements (examples 62967, 62968). Would it be possible to modify the report to display the Section title above it in cases where there are empty elements? Another limitation is that if there are multiple sections/subsections with the same title, it may be difficult to identify the problem section, but this will not be a major problem because I don’t think there are a lot of summaries like that.
Sometimes it takes a long time to load the page on DEV. It may be a problem with DEV; it has been very slow when bringing up pub preview and running other reports. But with this particular report, it takes at least one minute to load the interface.
I, too, have noticed that DEV has been a bit slower than usual over the past day or two.
In general, this report will take a long time because I need to load every selected document to retrieve the information. Running a SQL query on individual elements, which would be much faster, is not enough here.
As for your previous comment: one option could be to add a checkbox allowing you to also print all of the headings again (as in the original version). This would give you a general idea of where to search for the empty sections.
I'm suggesting this option because it's faster to implement. If you'd prefer your suggested option, that's fine, too.
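The checkbox option would boil down to one boolean that adds context headings around the flagged sections. A sketch under assumptions: `show_all_headings` is a hypothetical parameter standing in for the checkbox, the "needs clean-up" test reuses the Para/SummarySection/ItemizedList/OrderedList rule, and lines are returned as `(kind, title)` tuples rather than HTML.

```python
import xml.etree.ElementTree as ET

CONTENT_TAGS = {"Para", "SummarySection", "ItemizedList", "OrderedList"}


def report_lines(xml_text, show_all_headings=False):
    """Return (kind, title) tuples; show_all_headings mimics the proposed
    checkbox (hypothetical parameter name)."""
    root = ET.fromstring(xml_text)
    lines = []
    for section in root.iter("SummarySection"):
        title = section.findtext("Title", default="(no title)")
        if not any(child.tag in CONTENT_TAGS for child in section):
            lines.append(("check", title))  # section needs clean-up
        elif show_all_headings:
            lines.append(("heading", title))  # shown for context only
    return lines


SAMPLE = """<Summary>
  <SummarySection><Title>Intro</Title><Para>x</Para></SummarySection>
  <SummarySection/>
  <SummarySection><Title>Treatment</Title><Para>y</Para></SummarySection>
</Summary>"""

print(report_lines(SAMPLE))
print(report_lines(SAMPLE, show_all_headings=True))
```

With the flag on, the untitled empty section appears between "Intro" and "Treatment", which is the rough location hint the checkbox is meant to give.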
I tried to find your two sample documents on the report but couldn't. Have you changed the documents since your comment?
Would it be possible to modify the report to display the Section title above it in cases where there are empty elements?
I have modified the report to display the title of the section below the empty one. This makes finding the empty section faster because the title gets you close to it when you search for it in XMetaL.
If the empty section is the last section of the document, I display the text *** Last Section.
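Locating an empty section by the title of the section that follows it can be sketched as below. This assumes "the section below" means the next SummarySection in document order (which may be a nested one), uses the same "no children, no non-blank text" test for emptiness, and the helper name `empty_section_locators` is hypothetical.

```python
import xml.etree.ElementTree as ET


def empty_section_locators(xml_text):
    """For each empty SummarySection, return the title of the next section
    in document order, or '*** Last Section' (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    sections = list(root.iter("SummarySection"))  # document order
    locators = []
    for i, section in enumerate(sections):
        # Empty: no child elements and no non-blank text
        if len(section) or (section.text or "").strip():
            continue
        if i + 1 < len(sections):
            locators.append(sections[i + 1].findtext("Title", default="(no title)"))
        else:
            locators.append("*** Last Section")
    return locators


SAMPLE = """<Summary>
  <SummarySection><Title>Intro</Title><Para>x</Para></SummarySection>
  <SummarySection/>
  <SummarySection><Title>Treatment</Title><Para>y</Para></SummarySection>
  <SummarySection></SummarySection>
</Summary>"""

print(empty_section_locators(SAMPLE))  # ['Treatment', '*** Last Section']
```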
Based on a discussion with Robin, I am now also checking whether a section appears within Insertion tags. If it does, I do not display this section. This change reduces the report to 32 English and 33 Spanish summaries to be cleaned up.
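Skipping sections wrapped in Insertion markup needs an ancestor check, which ElementTree does not provide directly. One way, sketched here under assumptions, is to collect every SummarySection found beneath an `Insertion` element first. The helper name `sections_outside_insertions` is hypothetical, and an Insertion *inside* a section (as in "Mixed" below) does not exclude that section.

```python
import xml.etree.ElementTree as ET


def sections_outside_insertions(xml_text):
    """Return titles of SummarySections not wrapped in Insertion markup
    (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Every SummarySection that has an Insertion ancestor (iter() is recursive)
    inside = {id(s) for ins in root.iter("Insertion") for s in ins.iter("SummarySection")}
    return [s.findtext("Title", default="(no title)")
            for s in root.iter("SummarySection")
            if id(s) not in inside]


SAMPLE = """<Summary>
  <SummarySection><Title>Kept</Title></SummarySection>
  <Insertion><SummarySection><Title>New</Title></SummarySection></Insertion>
  <SummarySection><Title>Mixed</Title><Insertion><Para>p</Para></Insertion></SummarySection>
</Summary>"""

print(sections_outside_insertions(SAMPLE))  # ['Kept', 'Mixed']
```

Using `id()` is safe here because `root` keeps every element alive for the duration of the call.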
I've added a menu item for this report to the Summaries and Miscellaneous Documents menu.
R12918: SummaryAndMiscReports.py
R12918: SummarySectionCleanup.py
Verified on QA.
Verified on Stage.
Verified on PROD. We have started cleaning up the summaries.