Issue Number | 5110 |
---|---|
Summary | [General] URL List Report for GTCs |
Created | 2022-05-20 17:35:54 |
Issue Type | Bug |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-01-10 18:26:45 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.318731 |
As I noted in OCECDR-5109, I ran the URL List Report for GTC documents and the list was quite short and incomplete. I have not used this report before, but I think it would be worth taking a closer look to make sure it's working as it should be.
~oseipokuw , could you please take a look at this report to see if the results seem funny to you (particularly for GTCs, although I didn't try every doc type)? Thank you!
Hi ~juther It looks like the report is working as expected, even for the GTCs. The program appears to look only for external refs in GTCs, which is consistent with what the original ad-hoc query did. I also reviewed the original ticket for this report (OCECDR-4524), and the only modification we made to the original query was adding DrugReferenceLink to the list of link types to include, specifically for the drug information summaries. In case you're wondering, this report will generally not include other link types, such as the related external ref links or related summary ref links. The original use case for this report was to see a list of all external links within a summary and to search that list for, say, fact sheets that are being removed, instead of going into each summary document to check whether the fact sheet was included as an external ref. Eventually we expanded it to include external links in other doc types as well.
I do think, however, that it would be helpful for ~volker to confirm exactly what the report is doing. I will also take a look at the report with Amy K tomorrow, just to make sure the number of GTCs the report is picking up is accurate.
The query picks up all of the ExternalRef elements it can find within the document's XML. For the GTC documents that means every document is listed with an ExternalRef element contained in the DefinitionText or the translated DefinitionText.
Please note that the RelatedExternalRef elements are not included in this report!
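To illustrate the behavior described above, here is a minimal sketch of collecting ExternalRef elements from a document's XML. It is not the code in UrlListReport.py: the sample document, the `xref` attribute name, the `RelatedInformation` wrapper, and the `collect_urls` helper are all assumptions made for illustration, and real CDR documents are structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified GTC document (assumption: the URL is
# stored in an "xref" attribute; the real schema may differ).
SAMPLE = """\
<GlossaryTermConcept>
  <TermDefinition>
    <DefinitionText>See the <ExternalRef xref="https://example.org/a">fact sheet</ExternalRef>.</DefinitionText>
  </TermDefinition>
  <TranslatedTermDefinition>
    <DefinitionText>Vea la <ExternalRef xref="https://example.org/b">hoja</ExternalRef>.</DefinitionText>
  </TranslatedTermDefinition>
  <RelatedInformation>
    <RelatedExternalRef xref="https://example.org/c">Related</RelatedExternalRef>
  </RelatedInformation>
</GlossaryTermConcept>
"""


def collect_urls(xml_text, tags=("ExternalRef",)):
    """Return (element name, URL, link text) for each matching element."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks the whole tree in document order.
    for node in root.iter():
        url = node.get("xref")
        if node.tag in tags and url:
            text = "".join(node.itertext()).strip()
            found.append((node.tag, url, text))
    return found


# With the default tags, only ExternalRef elements are reported.
urls = collect_urls(SAMPLE)
# Widening the tag list also picks up RelatedExternalRef.
all_urls = collect_urls(SAMPLE, tags=("ExternalRef", "RelatedExternalRef"))
```

In this sketch, the default call skips the RelatedExternalRef element, which mirrors why such links would be missing from the report output.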
Thanks ~volker What we may want to do now is to include all elements that contain or generate URLs. This will be for all document types and not just GTCs. In the results, we may want to group the URLs by the element they are retrieved from.
Since you are saying the report works as expected I'm assuming this ticket shouldn't be categorized as a "Bug" anymore. I am wondering if the task now is to modify the existing report or to create a new report based on your new requirements (which would still need to be spelled out for each document type).
Is this a Pauling ticket or does it need to be pushed into Ohm?
I think for this ticket we haven't nailed down what exactly needs to be done. My first question, given ~oseipokuw's statement that "the report is working as expected", was whether we want to keep this report as is and create a new report with the new requirements, or modify the current report to match the new requirements. Obviously, it depends on what the report is used for and whether the modified version will still support that use appropriately.
My second question would be to specify the changes in more detail. When you say "include all elements that contain or generate URLs", it would probably make sense for you, ~oseipokuw, to give me the list of those elements rather than having me guess it. For instance, the elements SummaryURL and SummaryRef are related to creating links, but I couldn't tell you whether you'd like both of these elements to be part of the modified/new report for summaries.
As for the display ("group the URLs by the element they are retrieved from") could you possibly provide a sample for a GTC document of what you think this should look like? Currently, the report is sorted by CDR-ID. You probably want to have the sort by element happen within each document or do you want elements sorted first?
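One of the orderings asked about above (documents sorted by CDR-ID, with URLs grouped by element within each document) could be sketched as follows. The row layout, the sample values, and the `group_by_doc_then_element` helper are hypothetical and only illustrate the option; they do not reflect how the report is implemented.

```python
from collections import defaultdict

# Hypothetical report rows: (CDR-ID, element name, URL).
rows = [
    (1002, "ExternalRef", "https://example.org/b"),
    (1001, "RelatedExternalRef", "https://example.org/c"),
    (1001, "ExternalRef", "https://example.org/a"),
]


def group_by_doc_then_element(rows):
    """Sort by CDR-ID first, then group URLs by element within each doc."""
    grouped = defaultdict(lambda: defaultdict(list))
    for doc_id, element, url in rows:
        grouped[doc_id][element].append(url)
    return [
        (doc_id, element, sorted(grouped[doc_id][element]))
        for doc_id in sorted(grouped)
        for element in sorted(grouped[doc_id])
    ]


report = group_by_doc_then_element(rows)
```

Sorting by element first instead would only require swapping the two levels of the nested mapping.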
~oseipokuw, could you please take a look at the comments for this ticket and clarify what exactly needs to happen for this report? I don't have a good handle on the requirements.
Hi ~volker Sure. I am working to gather the elements to include in this report. The elements would be similar to what we have for the URL Check Report. To begin with, I think we should modify this report rather than create a new one, since the purpose of the report will be the same: to list all URLs in a document so that it is easy to search for a particular URL that needs to be updated. For summaries, we will not be including the SummaryURL, for example, but we might include an "external" Summary Fragment Ref. I will be providing more information soon.
~volker The only modification to make is to add the RelatedExternalRef element from the Glossary Term Concept schema.
I added the RelatedExternalRef element to the report output for the GTC document type. The following program has been updated:
UrlListReport.py
https://github.com/NCIOCPL/cdr-admin/commit/5425313
Verified on QA. Thanks!
Verified on PROD. Thanks!