PDQ Issues

Issue Number	593
Summary	[Literature] Identify Article Type on Queue page
Created	2021-05-25 12:17:54
Issue Type	Improvement
Submitted By	Juthe, Robin (NIH/NCI) [E]
Assigned To	Kline, Bob (NIH/NCI) [C]
Status	Closed
Resolved	2021-08-27 11:31:33
Resolution	Fixed
Path	/home/bkline/backups/jira/oceebms/issue.291077

Description

From Victoria:

When there’s a citation in the EBMS that looks like it’s probably an editorial/comment/letter/etc. because it doesn’t have an abstract or it has a single author or the title is sort of snappy, I always have to click on the PMID to see what the publication type is. It would be nice if that was shown on the review page.

Article types that would be helpful to identify include reviews, comments, and editorials. We'll need to see what we're able to parse from the data from NLM though.

Comment entered 2021-05-27 13:26:05 by Boggess, Cynthia (NIH/NCI) [C]

I think this is a great idea! During my librarian review it would be helpful to see the publication type and maybe even be able to sort my queue by publication type to better prioritize my review process, not only for editorials, comments, and letters but also for clinical trials, meta-analyses, reviews, multicenter studies, etc. Below is a link to all the publication types that are identified in PubMed: Publication Characteristics (Publication Types) with Scope Notes (nih.gov)

This is the complete list so most of these publication types we will not want to use, but we can go through them and make a list of all the ones we would be interested in.

Comment entered 2021-05-27 16:36:19 by Boggess, Cynthia (NIH/NCI) [C]

Something to consider:

Many citations have multiple publication types. PubMed, for example, displays one of the publication types in a little light blue rectangle at the top near the title and then to see the other publication types you have to scroll down a bit below the abstract. I am not sure what criteria they use to select which publication type is displayed at the top. See the attached file for two examples.

We will have to decide which one to display, or whether to display all the publication types.

Examples_multiple_pubtypes.docx

Comment entered 2021-06-02 22:57:07 by Juthe, Robin (NIH/NCI) [E]

Adding an offline email from Cynthia in which she's proposed a set of article types that we might consider displaying. We plan to discuss this further with the other Board managers too.

-----

Hi Robin & Victoria,

Below is the list of publication types that I think could be helpful for us to display with regards to OCEEBMS 593.

Like MeSH terms, some publication types are hierarchical. This is the case for three of the publication types below: clinical trial, guideline, and review. We could display the broad publication type “Clinical Trial” for any citation that has any of the narrower clinical trial publication types OR we could display the narrower publication types when available. Either would be fine for my work.

Case Reports
Clinical Trial +
Adaptive Clinical Trial
Clinical Trial, Phase I
Clinical Trial, Phase II
Clinical Trial, Phase III
Clinical Trial, Phase IV
Controlled Clinical Trial +
Randomized Controlled Trial
Equivalence Trial
Pragmatic Clinical Trial
Comment
Comparative Study ?
Editorial
Evaluation Study ?
Guideline +
Practice Guideline
Letter
Meta-Analysis
Multicenter Study
Observational Study ?
Published Erratum
Review +
Consensus Development Conference +
Consensus Development Conference, NIH
Systematic Review
Twin Study ?
Validation Study ?

Let me know if you need any more info or have questions,

Cynthia Boggess

NCI OCPL – Office of Communications & Public Liaison

Contractor: Publicis Sapient
678-906-5609

Comment entered 2021-06-04 13:26:20 by Kline, Bob (NIH/NCI) [C]

Is this for all of the review pages, or just some of them?

Comment entered 2021-06-04 15:42:05 by Juthe, Robin (NIH/NCI) [E]

I think we'd want it on all of the review citations pages.

Comment entered 2021-07-09 16:01:15 by Kline, Bob (NIH/NCI) [C]

Holding off on doing anything with this ticket until all of the discussions have been had and all of the decisions made.

Comment entered 2021-07-30 10:26:34 by Juthe, Robin (NIH/NCI) [E]

Hi Bob, we've decided to include all of the bold article types from Cynthia's comment above (listed below); however, we'd like to also include the specific clinical trial designation as listed below if possible. We'd also like to display multiple types if applicable. Let us know if you have additional questions.

Case Reports
Clinical Trial +
Clinical Trial, Phase I
Clinical Trial, Phase II
Clinical Trial, Phase III
Clinical Trial, Phase IV
Controlled Clinical Trial +
Randomized Controlled Trial
Comment
Editorial
Guideline +
Letter
Meta-Analysis
Multicenter Study
Published Erratum
Review +
Systematic Review

Comment entered 2021-07-30 11:11:19 by Kline, Bob (NIH/NCI) [C]

Do we include the plus signs? If "Clinical Trial +" and "Clinical Trial, Phase III" are found for an article, do we include both?

Comment entered 2021-08-05 12:57:18 by Juthe, Robin (NIH/NCI) [E]

Plus signs aren't needed. Let's discuss how to handle multiple values in the status meeting.

Comment entered 2021-08-09 16:15:03 by Kline, Bob (NIH/NCI) [C]

I have parsed the XML for all of the articles which were imported since 2021-01-01. The attached pubtypes-2021.txt file has two sets of numbers. The first set has a line for each unique combination of publication type values found in an article, with a count at the front of the line for the number of articles containing that combination. The second set has a line for each unique publication type value found in the article XML, with a count at the front of the line for the number of times that value was found. Each set is sorted with the most frequent publication types (or combinations of publication types) first.

I am running the same script against the XML for all of the articles in the EBMS, and I will attach the comparable report when the job has completed.

I looked on NLM's web site, starting with the link provided by Cynthia, to see if there is an API for retrieving the hierarchy for the PublicationType MeSH concepts, but didn't have any luck, so I posted a request for assistance to the NLM Help Desk. I should receive a reply in the next few days.
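The tally described in this comment could be sketched roughly as follows. This is not the script that produced pubtypes-2021.txt; it assumes each article is available as a PubMed XML string containing PublicationType elements, and the function names (publication_types, tally, report) are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def publication_types(article_xml):
    """Return the sorted, de-duplicated publication type names in one article."""
    root = ET.fromstring(article_xml)
    names = {(node.text or "").strip() for node in root.iter("PublicationType")}
    names.discard("")
    return tuple(sorted(names))


def tally(articles):
    """Count each combination of types and each individual type."""
    combinations = Counter()
    individual = Counter()
    for article_xml in articles:
        types = publication_types(article_xml)
        combinations[types] += 1
        individual.update(types)
    return combinations, individual


def report(counter):
    """Format counts with the count at the front of each line, most frequent first."""
    lines = []
    for key, count in counter.most_common():
        label = key if isinstance(key, str) else "; ".join(key)
        lines.append(f"{count:7d} {label}")
    return lines
```

Sorting the names inside each article means that two articles listing the same types in a different order are counted as the same combination, which is an assumption about how the original report was built.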

Comment entered 2021-08-10 11:10:26 by Kline, Bob (NIH/NCI) [C]

The full parse of all the articles has finished, and the counts are attached as pubtypes.txt. Also, I got the information I needed from NLM (the descriptors which are publication types have a DescriptorClass attribute of "2"; how could I not have guessed something as obvious as that???). The publication types are listed (one per line, each line showing the type name and the MeSH tree numbers for the publication type) in the attachment mesh-publication-types.txt. The tree numbers show the hierarchical relationships. Basically, if a publication type has a tree number which is a prefix (leading substring) of a tree number for another publication type, the second publication type is a descendant of the first publication type, and if we adopt the approach we've been discussing, we could omit display of the first publication type for an article which identified itself as having both of those types.
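As a minimal sketch of the pruning rule just described (not the EBMS implementation), the following assumes a mapping from publication type name to its MeSH tree numbers. The function names are hypothetical, and the tree numbers in the example are the ones quoted later in this ticket for Journal Article and Review. Matching on the tree number plus a trailing dot is an assumption made here so that one number only counts as an ancestor at a level boundary.

```python
def is_ancestor(tree_number, other):
    """True if tree_number is a proper ancestor of other in the MeSH tree."""
    return other.startswith(tree_number + ".")


def prune_ancestors(article_types, tree_numbers):
    """Drop each type that is an ancestor of another type on the same article.

    article_types: publication type names found in one article.
    tree_numbers: mapping of type name -> list of MeSH tree numbers.
    Types missing from the mapping are kept (an assumption).
    """
    kept = []
    for name in article_types:
        own = tree_numbers.get(name, [])
        others = [
            number
            for other in article_types
            if other != name
            for number in tree_numbers.get(other, [])
        ]
        if any(is_ancestor(mine, theirs) for mine in own for theirs in others):
            continue
        kept.append(name)
    return kept


tree = {"Journal Article": ["V02.600"], "Review": ["V02.600.500", "V02.912"]}
print(prune_ancestors(["Journal Article", "Review"], tree))  # ['Review']
```

A type with several tree numbers is dropped as soon as any one of them is an ancestor of another type's tree number.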

Since a new version of the MeSH tree is only released once a year (as it was when I created the software used by NLM for maintaining MeSH records), it shouldn't be difficult to store a copy of the publication types with hierarchy and refresh that copy when a new release of MeSH comes out, and we could use that local copy for deciding what to display for this ticket.
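One way the local copy could be built, offered as a sketch rather than the code actually used: read the MeSH descriptor XML, keep the records whose DescriptorClass attribute is "2", and save the names with their tree numbers. The element names (DescriptorRecord, DescriptorName/String, TreeNumberList/TreeNumber) reflect the MeSH descriptor XML layout as understood here; the function names and the JSON storage format are assumptions.

```python
import gzip
import json
import xml.etree.ElementTree as ET


def load_publication_types(path):
    """Map publication type name -> tree numbers from a MeSH descriptor file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    types = {}
    with opener(path, "rb") as stream:
        for _, node in ET.iterparse(stream, events=("end",)):
            if node.tag != "DescriptorRecord":
                continue
            if node.get("DescriptorClass") == "2":
                name = node.findtext("DescriptorName/String")
                numbers = [
                    n.text.strip()
                    for n in node.findall("TreeNumberList/TreeNumber")
                    if n.text
                ]
                if name:
                    types[name.strip()] = numbers
            node.clear()  # free the record once processed; the file is large
    return types


def save_local_copy(types, path):
    """Write the name -> tree numbers mapping as JSON for later lookups."""
    with open(path, "w", encoding="utf-8") as stream:
        json.dump(types, stream, indent=2, sort_keys=True)
```

Rerunning these two steps against each new annual release would refresh the stored hierarchy.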

Let me know what you think.

Comment entered 2021-08-10 15:31:57 by Kline, Bob (NIH/NCI) [C]

Just to make it easier to find, I've teased out the top-level nodes in this tree.

Publication Components (V01)
Publication Formats (V02)
Study Characteristics (V03)
Support of Research (V04)

Comment entered 2021-08-11 09:27:09 by Kline, Bob (NIH/NCI) [C]

For my own reference, this is the URL for downloading MeSH: https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc2021.gz (they don't have a "current" or "latest" alias, so the year has to be changed as appropriate).
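Because the year is part of the file name, a refresh job would need to build the URL for the release it wants. A small sketch, assuming the pattern in the URL above holds for other years; the function names are hypothetical.

```python
import shutil
import urllib.request

MESH_URL_TEMPLATE = (
    "https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc{year}.gz"
)


def mesh_descriptor_url(year):
    """Return the download URL for the given year's MeSH descriptor file."""
    return MESH_URL_TEMPLATE.format(year=year)


def download_mesh_descriptors(year, destination):
    """Save the year's descriptor file to destination and return the path."""
    with urllib.request.urlopen(mesh_descriptor_url(year)) as response:
        with open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
    return destination
```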

Comment entered 2021-08-12 15:16:49 by Juthe, Robin (NIH/NCI) [E]

We've looked over your comments and attachments and decided to proceed with your proposed approach of storing the hierarchy tree in the EBMS and updating it once a year or as needed. In keeping with reporting only the types we care most about (bolded in my comment above + the clinical trial subtypes), there aren't likely to be many citations with several types (2-3 at most is more likely). Let us know if there are any outstanding questions.

Comment entered 2021-08-13 07:16:36 by Kline, Bob (NIH/NCI) [C]

Hi, Robin. I think the two approaches you've just described are at odds with each other. The first approach is programmatically data-driven, using an algorithm based on the hierarchy we capture from MeSH to prune types whose descendants are also present. The second relies on a hand-picked list of terms as they exist at the moment. If we need to keep the hand-picked list, there's no point in making the updated MeSH hierarchy (as retrieved and parsed directly from NLM) available to the software. Instead, we'd make the decisions made by the librarians about "which types we care about" available to the software (along with rules about which of the types on the hand-picked list are parents/ancestors of which other types), and eventually (when or after we do the Drupal 9 rewrite) create an interface for updating that hand-curated information as needed. Yes, we can use the MeSH hierarchy to inform the rules we capture for the hand-picked list about which are ancestors of which, but there would be no way to get around the fact that the composition of the list itself would be dependent on decisions which can't be automated. Of course, if there's something in the MeSH data which distinguishes the types in such a way that makes programmatic reproduction of the librarians' list possible (for example, all the nodes in a certain branch of the tree), then that would provide a way to reconcile the two approaches. But I haven't been able to find such a pattern. 🙁

Comment entered 2021-08-13 11:56:33 by Boggess, Cynthia (NIH/NCI) [C]

Robin, a few things to consider that may help:

Even though MeSH is updated each year, the publication types rarely change, and certainly not every year.
Our monthly searches and my citation review weed out all the unwanted publication types that have not been included in the hand-picked list. This is why only a selected set of publication types are represented in the EBMS to date. I don’t mind seeing all the publication types, as it could assist my weeding process, and by the time I am done reviewing, the published citations for BM review pretty much reflect only the publication types on the hand-picked list anyway. So if we want to keep everything automated, and if I am understanding the above-mentioned conflict correctly, could including all publication types make this easier, at least for now? If so:
1. utilize the MeSH hierarchy as Bob suggests, displaying only the narrowest publication type that applies in each MeSH tree
2. consider limiting the total number of publication types displayed to 2-3; this would apply to citations that have publication types from multiple MeSH trees

Comment entered 2021-08-13 12:50:46 by Juthe, Robin (NIH/NCI) [E]

Thank you both for your comments. I think I see what you mean. I'm fine with proceeding with this approach:

utilize the MeSH hierarchy as Bob suggests, displaying only the narrowest publication type that applies in each MeSH tree

I don't think we should limit the number of publication types at the outset, at least not until we see how cumbersome these lists may be. In reviewing the list of citations in the EBMS with multiple types, some of the lists are quite long, but I understand those would likely be shortened substantially if we assume the approach above whereby only the narrowest type is displayed.

Comment entered 2021-08-16 13:42:10 by Kline, Bob (NIH/NCI) [C]

Here's an edge case question, unlikely to arise, but the software needs to know what to do when it does happen. Can we assume that if the XML for an article hasn't been refreshed recently enough (or NLM hasn't updated the XML to reflect the current MeSH tree for publication types) and we find a publication type in an article which isn't found in the MeSH tree, we should display the type, rather than suppress it?

Comment entered 2021-08-16 13:50:34 by Kline, Bob (NIH/NCI) [C]

One of the things I need to decide is whether to

create a new table for the article publication types and populate that table for all articles; or
parse the XML for the articles on the fly when constructing the review pages.

My inclination is to use the second approach for now and see how the performance is, and if it doesn't introduce too much of a performance hit we stick with that approach until the Drupal 9 rewrite, at which point we'll be rebuilding all the tables anyway. Any objections to what I'm proposing? Going with the first approach now would complicate the Denali deployment and/or saddle it with pretty substantial downtime.

Comment entered 2021-08-16 18:29:49 by Kline, Bob (NIH/NCI) [C]

This is pretty much moot, as it turns out we're parsing each article's XML already. My plan for the rewrite is that we extract what we need from the XML when we import it, storing the parsed information instead of the raw XML so we don't have to keep parsing the same XML over and over.

Comment entered 2021-08-16 18:37:17 by Kline, Bob (NIH/NCI) [C]

I've got a preliminary implementation running on DEV. It has a temporary version of the publication types, which shows all of the types I find for each article, displaying the ones which will be suppressed ~~like this~~ just so you can see what the software's going to be doing. Please take a look and let me know when I can replace that temporary version with the one which actually does the suppression of the ones we don't want.

Comment entered 2021-08-17 09:42:35 by Boggess, Cynthia (NIH/NCI) [C]

Just looked at this on dev and noticed a possible issue with how the suppression is working. In the attached file there are two citations that I pulled from the med lib queue: the first has Journal Article suppressed because the citation is a Meta-analysis, Review; the second is a Clinical Trial but Journal Article is not suppressed.

pubtype_ex_JAnotsupprsd.docx

Comment entered 2021-08-17 10:16:36 by Kline, Bob (NIH/NCI) [C]

The MeSH tree number for Journal Article is V02.600. For PMID 32044160 we suppress Journal Article because it is an ancestor of Review (whose MeSH tree numbers are V02.600.500 and V02.912). Looking at the mesh-publication-types.txt I posted to the ticket, which of the other three publication types for PMID 30131387 has V02.600 in its MeSH tree number? (In other words, which of them is a descendant of Journal Article?)

Comment entered 2021-08-17 11:39:09 by Boggess, Cynthia (NIH/NCI) [C]

This is a link to PMID 30131387 as it is displayed on PubMed: Phase II Study of Iniparib with Concurrent Chemoradiation in Patients with Newly Diagnosed Glioblastoma - PubMed (nih.gov). Even though the publication type Journal Article is in fact included in the full list of publication types assigned to this citation, it is not displayed. Is there any way we could replicate this same suppression, where Journal Article is only displayed, for example, when no other publication type is listed (but of course only specifically V02.600 and not its narrower pub types, such as Review V02.600.500)?

Many of our citations are clinical trials or other pub types that will also have Journal Article assigned to them like this example, so we will be seeing this issue frequently.
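The rule requested in this comment (show Journal Article only when nothing else is listed) could look something like the sketch below. It matches on the type name, which is an assumption, and the function name is hypothetical. The example list is illustrative only and is not the actual set of types assigned to PMID 30131387.

```python
def hide_generic_type(types, generic="Journal Article"):
    """Drop the generic type unless it is the only type on the article."""
    specific = [name for name in types if name != generic]
    return specific or list(types)


# Illustrative input only.
print(hide_generic_type(["Clinical Trial, Phase II", "Journal Article"]))
# ['Clinical Trial, Phase II']
print(hide_generic_type(["Journal Article"]))
# ['Journal Article']
```

Narrower types such as Review are untouched by this rule because only the exact name is compared.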

Comment entered 2021-08-17 13:57:27 by Kline, Bob (NIH/NCI) [C]

Well, we could suppress all types whose position in the MeSH tree is less than three levels deep unless that would leave us with no types to display. Would you like me to implement that approach?
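For illustration, the depth-based option floated here (which the thread goes on to reject) might be sketched as follows. It assumes depth is the number of dot-separated segments in a tree number, and that a type with several tree numbers is judged by its deepest one; both are assumptions, as are the function names.

```python
def depth(tree_number):
    """Number of levels in a MeSH tree number, e.g. V02.600.500 -> 3."""
    return tree_number.count(".") + 1


def suppress_shallow(types, tree_numbers, min_depth=3):
    """Hide types shallower than min_depth unless nothing would remain."""

    def deepest(name):
        return max((depth(n) for n in tree_numbers.get(name, [])), default=0)

    deep = [name for name in types if deepest(name) >= min_depth]
    return deep or list(types)


tree = {"Journal Article": ["V02.600"], "Review": ["V02.600.500", "V02.912"]}
print(suppress_shallow(["Journal Article", "Review"], tree))  # ['Review']
print(suppress_shallow(["Journal Article"], tree))  # ['Journal Article']
```

Any type that sits only two levels deep would disappear whenever a deeper type is present, which is the side effect raised in the reply that follows.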

Or we could use the fragile approach of hard-coding special treatment for "Journal Article" (which would break if they ever renamed the type).

Or we could use the equally fragile approach of hard-coding special treatment for the node with tree number V02.600 (which would break when they modified the tree numbers).

(I should probably warn you in advance that programmers are trained — largely by experience — to push back against the "that will never happen" response. 😛)

Comment entered 2021-08-17 16:30:39 by Boggess, Cynthia (NIH/NCI) [C]

"Well, we could suppress all types whose position in the MeSH tree is less than three levels deep unless that would leave us with no types to display. Would you like me to implement that approach?"

No, you are right; this would eliminate pub types that we want to see.

I should probably warn you in advance that librarians (especially the ones at NLM) are trained to create standardizations that are statistically unlikely to change. For example, Journal Article as a publication type was created in 1991 and has not changed since. The Journal Article MeSH tree has been updated once, in 2008, when narrower pub types were added. No other changes have been made to date, not even when they recently migrated to the new version of PubMed. So I cannot help thinking that it is reasonable to consider hard-coding a solution at some point.

Comment entered 2021-08-18 09:38:40 by Kline, Bob (NIH/NCI) [C]

"I should probably warn you in advance that librarians (especially the ones at NLM) are trained to create standardizations that are statistically unlikely to change."

Right, like the structure of the PubMed documents? 😉

Comment entered 2021-08-19 12:46:40 by Kline, Bob (NIH/NCI) [C]

Implemented on DEV.

Comment entered 2021-08-26 14:36:19 by Kline, Bob (NIH/NCI) [C]

We have decided to hard-code exclusion of Journal Article as an article type to be displayed.

Comment entered 2021-08-27 11:31:33 by Kline, Bob (NIH/NCI) [C]

Hard-coded exclusion implemented on DEV. So now some of the articles have no type to display, in which case I leave out the PUB TYPE(S) label.
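Putting the final decisions together, the display logic might be sketched as below. This is an approximation, not the code deployed on DEV: it assumes the exclusion is by type name, that ancestors are pruned by tree-number prefix as discussed earlier in the ticket, and that types are joined with a semicolon after the PUB TYPE(S) label. The function names are hypothetical.

```python
EXCLUDED_TYPES = {"Journal Article"}  # hard-coded exclusion (matching by name is assumed)


def display_types(types, tree_numbers):
    """Return the types to show: excluded types and ancestors removed."""
    visible = [name for name in types if name not in EXCLUDED_TYPES]
    kept = []
    for name in visible:
        own = tree_numbers.get(name, [])
        has_descendant = any(
            theirs.startswith(mine + ".")
            for other in visible
            if other != name
            for theirs in tree_numbers.get(other, [])
            for mine in own
        )
        if not has_descendant:
            kept.append(name)
    return kept


def render_pub_types(types, tree_numbers):
    """Return the labelled line, or an empty string when nothing is left."""
    kept = display_types(types, tree_numbers)
    if not kept:
        return ""  # no label at all when there is no type to display
    return "PUB TYPE(S): " + "; ".join(kept)


tree = {"Journal Article": ["V02.600"], "Review": ["V02.600.500", "V02.912"]}
print(render_pub_types(["Journal Article", "Review"], tree))  # PUB TYPE(S): Review
print(repr(render_pub_types(["Journal Article"], tree)))  # ''
```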

Comment entered 2021-08-27 12:46:21 by Boggess, Cynthia (NIH/NCI) [C]

Nicely done. Looks good on dev.

Comment entered 2021-08-27 17:03:38 by Boggess, Cynthia (NIH/NCI) [C]

Verified on Dev

Comment entered 2021-09-10 17:11:29 by Boggess, Cynthia (NIH/NCI) [C]

I have tested this in QA for the librarian's queue and everything looks good.

Comment entered 2021-09-22 09:20:21 by Shields, Victoria (NIH/NCI) [E]

Verified on QA.

Comment entered 2021-10-18 14:06:26 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Seems to be working. It is OK to close ticket. Thanks!

Comment entered 2021-10-21 09:53:16 by Shields, Victoria (NIH/NCI) [E]

Also verified on PROD. Closing ticket.

Attachments

File Name	Posted	User
Examples_multiple_pubtypes.docx	2021-05-27 16:36:40	Boggess, Cynthia (NIH/NCI) [C]
mesh-publication-types.txt	2021-08-10 10:57:09	Kline, Bob (NIH/NCI) [C]
pubtype_ex_JAnotsupprsd.docx	2021-08-17 09:39:44	Boggess, Cynthia (NIH/NCI) [C]
pubtypes.txt	2021-08-10 10:57:09	Kline, Bob (NIH/NCI) [C]
pubtypes-2021.txt	2021-08-09 16:07:29	Kline, Bob (NIH/NCI) [C]
