EBMS Tickets

Issue Number 693
Summary Article Statistics Reports - Articles Imported, count errors
Created 2023-01-11 11:43:07
Issue Type Bug
Submitted By Boggess, Cynthia (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2023-01-12 05:26:20
Resolution Fixed
Path /home/bkline/backups/jira/oceebms/issue.336112
Description

In the Article Statistics Reports, the Articles Imported report is generating incorrect counts for summary topics. I have provided data for two examples below.

In EBMS4 and Prod:

When Review cycle = Sept 2022, Board = adult, Topic = AIDS Lymphoma

Import report indicates 18 citations imported

Article Search with no other limits retrieves 18 citations

BUT in EBMS4 the Articles Imported report shows 35

In Prod the Articles Imported report shows 18

When Review cycle = Sept 2022, Board = adult, Topic = Adrenocortical Carcinoma

Import report indicates 12 citations imported

Article Search with no other limits retrieves 12 citations

BUT in EBMS4 the Articles Imported report shows 22

In Prod the Articles Imported report shows 12

Comment entered 2023-01-11 17:32:38 by Kline, Bob (NIH/NCI) [C]

Since this will be easier to troubleshoot with the larger dataset, I will be working on it on https://ebms.rksystems.com, instead of my private developer Docker container. Just letting you know in case you notice any hiccups with this report while I'm noodling at it.

Comment entered 2023-01-12 05:26:20 by Kline, Bob (NIH/NCI) [C]

Fixed on https://ebms.rksystems.com. Please give it another try. Thank you for testing so thoroughly! 😃

Comment entered 2023-01-12 10:58:12 by Boggess, Cynthia (NIH/NCI) [C]

OK I am seeing the correct numbers for both of the topics in erc.

And the other adult topics are matching with prod as well. I also looked at peds for Sept 2022 and those numbers are matching with prod as well.

Comment entered 2023-01-12 11:51:28 by Boggess, Cynthia (NIH/NCI) [C]

After looking at several of these Article Statistics Reports this morning, I am noticing something that I'll mention in this ticket because it changed after this fix to the Articles Imported report.

In erc, if you go to the Article Statistics Report page, select the adult board and the Sept 2022 review cycle, and then toggle through the different reports, you will notice that some start with AIDS Lymphoma and others start with Adrenocortical Carcinoma. In EBMS4 they all start with AIDS and in PROD they all start with Adrenocortical. I first noticed the change in order yesterday in EBMS4, but because it was the same for all the reports, I assumed this was a change in the treatment of AIDS as an abbreviation rather than a word. But now I am thinking something else is happening.

Assuming erc was lined up with ebms4 before these changes were made, the Articles Imported report started with AIDS yesterday and is now starting with Adrenocortical.

Comment entered 2023-01-12 12:39:49 by Kline, Bob (NIH/NCI) [C]

Right. That's because this report was failing if the retrieval set was large, so I switched to a lower-level approach for collecting the information. As a result I'm sorting the rows myself instead of having the database do it. The database by default ignores case when it compares strings and PHP does not. Both approaches are deterministic, by which I mean you can count on the ordering being the same from one run of the report to another. If you have a strong preference for the order which ignores case I'll do a bit more work to see if I can make that happen.

Comment entered 2023-01-12 13:20:21 by Boggess, Cynthia (NIH/NCI) [C]

The only other summary topic this may impact is PC-SPES, but this topic does not have many citations assigned to it. The IACT board in general will not have nearly as large a retrieval as what we see with Adult. And currently in erc I am seeing PC-SPES listed in the same order as Prod.

As a librarian, I will always side with creating more consistency, but I think we may need to assess whether the added work to fix one topic in several reports that are not used more than a few times a month (except when testing) is worth it, and of course what risk we are taking in customizing the code too much.

Currently we have accurate data being reported. This was my main objective. But I'll leave it to Victoria to decide if we should proceed with the decision to ignore case.

Comment entered 2023-01-12 15:07:50 by Kline, Bob (NIH/NCI) [C]

Actually, what I wrote in my previous comment was backwards. Because I was dropping from the higher-level entity query API (for which I had to sort the rows myself, because that API was giving me state entities when the report is organized around boards and topics) to querying the database directly (where I had the flexibility to tell the database to sort on the columns I needed for the ordering) what I SHOULD have written is:

Before the fix, I was sorting the table rows myself, using native PHP string comparison, which does NOT ignore case (hence "AIDS" before "Adrenocortical" because all the uppercase letters sort before all the lowercase letters, at least for ASCII and Unicode, and I think we can safely ignore EBCDIC 😉). That's why ebms4-dev differs from PROD.

After the fix I was able to let the database sort the rows, so case is ignored, giving you the order you see on PROD.
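To make the two orderings concrete, here is a minimal sketch in Python, used here only as a stand-in for PHP. It is not the EBMS code. It assumes only that the default string comparison goes by character code, which holds for Python's `sorted` and for PHP's native string comparison on ASCII text.

```python
# Illustration only: the two topic names come from this ticket.
topics = ["Adrenocortical Carcinoma", "AIDS Lymphoma"]

# Default comparison goes by code point: "I" (73) sorts before "d" (100),
# so "AIDS Lymphoma" lands ahead of "Adrenocortical Carcinoma".
case_sensitive = sorted(topics)

# Folding case first compares "ad..." with "ai...", so
# "Adrenocortical Carcinoma" comes first, matching the order seen on PROD.
case_insensitive = sorted(topics, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```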

You're right that in general we want to avoid heavily customizing the code, and it is definitely true that when we're paging a report or a queue, modifying the sort we get from the database involves MASSIVE amounts of such customization. However, in this case, since we're not doing any pagination of the results, it would be relatively trivial for me to sort the rows if what you now have isn't what you want.
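As a rough sketch of why this is simple without pagination: when every row of the report is already in memory, one sort over the complete list with a case-folded key is enough. The row layout and the `sort_report_rows` name below are hypothetical, chosen for illustration, and Python again stands in for PHP.

```python
def sort_report_rows(rows):
    """Return the rows ordered by board, then topic, ignoring case."""
    return sorted(
        rows,
        key=lambda row: (row["board"].casefold(), row["topic"].casefold()),
    )


# Hypothetical rows; the counts are the Prod figures quoted in this ticket.
rows = [
    {"board": "adult", "topic": "AIDS Lymphoma", "imported": 18},
    {"board": "adult", "topic": "Adrenocortical Carcinoma", "imported": 12},
]

sorted_rows = sort_report_rows(rows)
for row in sorted_rows:
    print(row["board"], row["topic"], row["imported"])
```

With a paged report this would not work as-is, because each page holds only a slice of the database's ordering, so re-sorting one page cannot reproduce a different global order.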

Hope I haven't made things way more confusing. 😛

Comment entered 2023-01-12 16:03:15 by Boggess, Cynthia (NIH/NCI) [C]

I think I understand all of what you have explained 🙂 and if you think creating a consistent sort order for topics across these reports (where case is ignored and Adrenocortical would be first like in Prod, right?) is going to be "relatively trivial" and not cause other problems, then I think we should go for it.

Comment entered 2023-01-12 16:56:30 by Kline, Bob (NIH/NCI) [C]

Probably took less time to implement than it will take you to test it, but I think I've got what you want for the topic sorting in these reports.

Comment entered 2023-01-13 11:38:20 by Boggess, Cynthia (NIH/NCI) [C]

Looks good on erc. Order of topics is now consistent across the collection of reports with Adrenocortical at the top of the list.

Comment entered 2023-01-18 12:01:27 by Boggess, Cynthia (NIH/NCI) [C]

Verified on ebms4.
