CDR Tickets

Issue Number 4369
Summary [Glossary] GTC by Type Report - Slow, Timeout Error
Created 2018-01-05 14:39:34
Issue Type Bug
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2018-03-15 14:06:01
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.219396
Description

The Glossary Term Concept by Type report is very slow and it times out when running it with the following parameters:

Term type: Anatomy/physiology
Term name: %
Def Status: Approved
Audience: Patient

The report on PROD with these same parameters just came up in 5 min, 15 sec.

Comment entered 2018-01-06 07:13:42 by Kline, Bob (NIH/NCI) [C]

I was able to get the report to run in 14.25 minutes (you didn't say which display option you used, so I left it at the default of "English only"). I rewrote the document parsing for the concepts and terms and was able to shave about three minutes off. I'll be very interested to see how the new code performs on the production tier. I added a line at the bottom to show the number of concepts processed and the amount of time it took to generate the report.
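As an illustration only, a summary line of the kind described above (number of concepts processed plus elapsed time) could be produced along these lines. This is a minimal sketch, not the actual CDR report code: the names build_footer and run_report, the process callback, and the wording of the footer are all assumptions made for the example.

```python
import time
from datetime import timedelta


def build_footer(concept_count, elapsed_seconds):
    """Format a summary line (hypothetical wording) for the bottom of a report."""
    # Round to whole seconds so the line reads as H:MM:SS.
    elapsed = timedelta(seconds=round(elapsed_seconds))
    return f"Processed {concept_count} concepts in {elapsed}"


def run_report(concepts, process):
    """Apply `process` to each concept and return (rows, footer).

    `process` stands in for whatever parsing/filtering a real report does.
    """
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    rows = [process(concept) for concept in concepts]
    footer = build_footer(len(rows), time.monotonic() - start)
    return rows, footer
```

With this sketch, a run of 14.25 minutes (855 seconds) over three concepts would yield the line "Processed 3 concepts in 0:14:15".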

Comment entered 2018-01-06 07:25:58 by Kline, Bob (NIH/NCI) [C]

... you didn't say which display option you used, ...

It turns out not to make a significant difference in the processing time which display option is used, as the same amount of filtering and parsing is required for the concept document and its name documents.

Comment entered 2018-01-11 09:12:37 by Juthe, Robin (NIH/NCI) [E]

I just tried running this report again with the same specifications, and I'm getting a "page can't be displayed" error at 6 minutes in.

Comment entered 2018-01-11 10:41:09 by Kline, Bob (NIH/NCI) [C]

I am unable to get it to fail. I did add some more logging to try to capture more information about what's happening when it fails for you. Failure at 6 minutes doesn't sound like the web server timing out. On the other hand, reading the instructions on the form, it seems likely that the original requirements/design for this report did not anticipate entering nothing but a wildcard for the term name field. When you do that, the report has a lot of work to do, because it has to run many hundreds of concepts, as well as all of their term names, through the filter to resolve revision markup. If you're just doing that to test the limits of the software, that's OK. But if you do that on a regular basis in production, we should probably (a) remove the instructions that say you must enter something in either the name or definition text field, modifying the logic accordingly, and (b) consider making this an offline report (in Hawking).

Comment entered 2018-01-11 13:53:50 by Juthe, Robin (NIH/NCI) [E]

I will check with Amy to see how this report is commonly used. It may make sense to add an option to run the report as a batch job (in Hawking).

Comment entered 2018-01-16 15:02:46 by Osei-Poku, William (NIH/NCI) [C]

Amy doesn't use the report but said she recognizes that it would be useful. Linda uses it a lot and she is okay with it being a batch report.

Comment entered 2018-01-16 15:05:30 by Osei-Poku, William (NIH/NCI) [C]

Updated title of ticket and placed it in the Hawking queue.

Comment entered 2018-03-07 09:01:25 by Kline, Bob (NIH/NCI) [C]

I just ran this report on QA using the parameters given above and it came back in 1 minute and 17 seconds. It's possible that the work I've been doing on OCECDR-4418 (reducing memory usage for database queries) has had a significant impact on the report processing time. If you can confirm that you get comparable times, we probably won't need to rewrite this as a batch report. Please test on QA.

Comment entered 2018-03-15 13:09:42 by Osei-Poku, William (NIH/NCI) [C]

Yes, I am able to confirm that it is running much faster now than before. Thank you!

Comment entered 2018-03-15 14:06:01 by Kline, Bob (NIH/NCI) [C]

William has confirmed that this is no longer a problem on QA.

Comment entered 2018-05-31 13:10:47 by Osei-Poku, William (NIH/NCI) [C]

This is a bit slower on PROD than on QA (when I first tested), but it is still significantly faster and does not time out. The last run came up in 2 minutes, 14 seconds.

I am marking this ticket as verified. Thanks!
