Issue Number | 693 |
---|---|
Summary | Article Statistics Reports - Articles Imported, count errors |
Created | 2023-01-11 11:43:07 |
Issue Type | Bug |
Submitted By | Boggess, Cynthia (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-01-12 05:26:20 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.336112 |
From the Article Statistics Reports, the Articles Imported is generating incorrect counts for summary topics. I have provided data for two examples below.
In EBMS4 and Prod:
When Review cycle = Sept 2022, Board = adult, Topic = AIDS Lymphoma
Import report indicates 18 citations imported
Article Search with no other limits retrieves 18 citations
BUT in EBMS4 the Articles Imported report shows 35
In Prod the Articles Imported report shows 18
When Review cycle = Sept 2022, Board = adult, Topic = Adrenocortical Carcinoma
Import report indicates 12 citations imported
Article Search with no other limits retrieves 12 citations
BUT in EBMS4 the Articles Imported report shows 22
In Prod the Articles Imported report shows 12
Since this will be easier to troubleshoot with the larger dataset, I will be working on it on https://ebms.rksystems.com, instead of my private developer Docker container. Just letting you know in case you notice any hiccups with this report while I'm noodling at it.
Fixed on https://ebms.rksystems.com. Please give it another try. Thank you for testing so thoroughly! 😃
OK I am seeing the correct numbers for both of the topics in erc.
And the other adult topics are matching with prod as well. I also looked at peds for Sept 2022 and those numbers are matching with prod as well.
After looking at several of these Article Statistics Reports this morning, I am noticing something that I'll mention in this ticket because it changed after this fix to the Articles Imported report.
In erc if you go to the Article Statistics Report page and select the adult board and Sept 2022 review cycle, then toggle through the different reports, you will notice that some start with AIDS Lymphoma and others start with Adrenocortical Carcinoma. In EBMS4 they all start with AIDS and in PROD they all start with Adrenocortical. I first noticed the change in order yesterday in EBMS4 but because it was the same for all the reports, I assumed this was a change in treatment of AIDS as an abbreviation rather than a word. But now I am thinking something else is happening.
Assuming erc was lined up with ebms4 before these changes were made, the Articles Imported report started with AIDS yesterday and is now starting with Adrenocortical.
Right. That's because this report was failing if the retrieval set was large, so I switched to a lower-level approach for collecting the information. As a result I'm sorting the rows myself instead of having the database do it. The database by default ignores case when it compares strings and PHP does not. Both approaches are deterministic, by which I mean you can count on the ordering being the same from one run of the report to another. If you have a strong preference for the order which ignores case I'll do a bit more work to see if I can make that happen.
The only other summary topic this may also impact is PC-SPES, but this topic does not have many citations assigned to it. The IACT board in general will not have nearly the retrieval volume that we see with Adult. And currently in erc I am seeing PC-SPES listed in the same order as Prod.
As a librarian, I will always side with creating more consistency, but I think we may need to assess whether the added work to fix one topic in several reports that are not used more than a few times a month (with the exception of when testing) is worth it and of course what risk are we taking in customizing the code too much.
Currently we have accurate data being reported. This was my main objective. But I'll leave it to Victoria to decide if we should proceed with the decision to ignore case.
Actually, what I wrote in my previous comment was backwards. Because I was dropping from the higher-level entity query API (for which I had to sort the rows myself, because that API was giving me state entities when the report is organized around boards and topics) to querying the database directly (where I had the flexibility to tell the database to sort on the columns I needed for the ordering) what I SHOULD have written is:
Before the fix, I was sorting the table rows myself, using native PHP string comparison, which does NOT ignore case (hence "AIDS" before "Adrenocortical" because all the uppercase letters sort before all the lowercase letters, at least for ASCII and Unicode, and I think we can safely ignore EBCDIC 😉). That's why ebms4-dev differs from PROD.
After the fix I was able to let the database sort the rows, so case is ignored, giving you the order you see on PROD.
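To make the difference between the two orderings concrete, here is a minimal sketch. It is written in Python rather than PHP and is not the EBMS report code; the topic list is illustrative ("Pancreatic Cancer" is a hypothetical entry added only to show how PC-SPES is affected), and `str.casefold` stands in for the database's case-insensitive collation.

```python
# Illustrative topic names; "Pancreatic Cancer" is a hypothetical entry
# added only to show the effect on PC-SPES.
topics = [
    "Adrenocortical Carcinoma",
    "AIDS Lymphoma",
    "PC-SPES",
    "Pancreatic Cancer",
]

# Plain code-point comparison (what a native string sort does):
# every uppercase ASCII letter sorts before every lowercase one,
# so "AIDS" lands ahead of "Adrenocortical".
case_sensitive = sorted(topics)

# Case-insensitive comparison (an approximation of the database's
# default behavior): compare on a case-folded copy of each name.
case_insensitive = sorted(topics, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

With these inputs the first list starts with AIDS Lymphoma and puts PC-SPES ahead of Pancreatic Cancer, while the second starts with Adrenocortical Carcinoma and puts PC-SPES last.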
You're right that in general we want to avoid heavily customizing the code, and it is definitely true that when we're paging a report or a queue, modifying the sort we get from the database involves MASSIVE amounts of such customization. However, in this case, since we're not doing any pagination of the results, it would be relatively trivial for me to sort the rows if what you now have isn't what you want.
Hope I haven't made things way more confusing. 😛
I think I understand all of what you have explained 🙂 and if you think creating a consistent sort order for topics across these reports (where case is ignored and Adrenocortical would be first like in Prod, right?) is going to be "relatively trivial" and not cause other problems, then I think we should go for it.
Probably took less time to implement than it will take you to test it, but I think I've got what you want for the topic sorting in these reports.
Looks good on erc. Order of topics is now consistent across the collection of reports with Adrenocortical at the top of the list.
verified on ebms4