Issue Number | 593 |
---|---|
Summary | [Literature] Identify Article Type on Queue page |
Created | 2021-05-25 12:17:54 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2021-08-27 11:31:33 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.291077 |
From Victoria:
When there’s a citation in the EBMS that looks like it’s probably an editorial/comment/letter/etc. because it doesn’t have an abstract or it has a single author or the title is sort of snappy, I always have to click on the PMID to see what the publication type is. It would be nice if that was shown on the review page.
Article types that would be helpful to identify include reviews, comments, and editorials. We'll need to see what we're able to parse from the data from NLM though.
I think this is a great idea! During my librarian review it would be helpful to see the publication type and maybe even be able to sort my queue by publication type to better prioritize my review process. Not only for editorial, comments, and letters but also for clinical trials, meta-analysis, reviews, multicenter studies, etc. Below is a link to all the publication types that are identified in PubMed. Publication Characteristics (Publication Types) with Scope Notes (nih.gov)
This is the complete list so most of these publication types we will not want to use, but we can go through them and make a list of all the ones we would be interested in.
Something to consider:
Many citations have multiple publication types. PubMed, for example, displays one of the publication types in a little light blue rectangle at the top near the title and then to see the other publication types you have to scroll down a bit below the abstract. I am not sure what criteria they use to select which publication type is displayed at the top. See the attached file for two examples.
We will have to decide which one to display, or whether to display all the publication types.
Examples_multiple_pubtypes.docx
Adding an offline email from Cynthia in which she's proposed a set of article types that we might consider displaying. We plan to discuss this further with the other Board managers too.
-----
Hi Robin & Victoria,
Below is the list of publication types that I think could be helpful for us to display with regards to OCEEBMS 593.
Like MeSH terms, some publication types are hierarchical. This is the case for three of the publication types below: Clinical Trial, Guideline, and Review. We could display the broad publication type “Clinical Trial” for any citation that has any of the narrower clinical trial publication types, OR we could display the narrower publication types when available. Either would be fine for my work.
Case Reports
Clinical Trial +
Adaptive Clinical Trial
Clinical Trial, Phase I
Clinical Trial, Phase II
Clinical Trial, Phase III
Clinical Trial, Phase IV
Controlled Clinical Trial +
Randomized Controlled Trial
Equivalence Trial
Pragmatic Clinical Trial
Comment
Comparative Study ?
Editorial
Evaluation Study ?
Guideline +
Practice Guideline
Letter
Meta-Analysis
Multicenter Study
Observational Study ?
Published Erratum
Review+
Consensus Development Conference+
Consensus Development Conference, NIH
Systematic Review
Twin Study ?
Validation Study ?
Let me know if you need any more info or have questions,
Cynthia Boggess
NCI OCPL – Office of Communications & Public Liaison
Contractor: Publicis Sapient
678-906-5609
Is this for all of the review pages, or just some of them?
I think we'd want it on all of the review citations pages.
Holding off on doing anything with this ticket until all of the discussions have been had and all of the decisions made.
Hi Bob, we've decided to include all of the bold article types from Cynthia's comment above (listed below); however, we'd like to also include the specific clinical trial designation as listed below if possible. We'd also like to display multiple types if applicable. Let us know if you have additional questions.
Case Reports
Clinical Trial +
Clinical Trial, Phase I
Clinical Trial, Phase II
Clinical Trial, Phase III
Clinical Trial, Phase IV
Controlled Clinical Trial +
Randomized Controlled Trial
Comment
Editorial
Guideline +
Letter
Meta-Analysis
Multicenter Study
Published Erratum
Review+
Systematic Review
Do we include the plus signs? If "Clinical Trial +" and "Clinical Trial, Phase III" are found for an article, do we include both?
Plus signs aren't needed. Let's discuss how to handle multiple values in the status meeting.
I have parsed the XML for all of the articles which were imported since 2021-01-01. The attached pubtypes-2021.txt file has two sets of numbers. The first set has a line for each unique combination of publication type values found in an article, with a count at the front of the line for the number of articles containing that combination. The second set has a line for each unique publication type value found in the article XML, with a count at the front of the line for the number of times that line's publication type value was found. Each set is sorted with the most frequent publication types (or combinations of publication types) first.
I am running the same script against the XML for all of the articles in the EBMS, and I will attach the comparable report when the job has completed.
I looked on NLM's web site, starting with the link provided by Cynthia, to see if there is an API for retrieving the hierarchy for the PublicationType MeSH concepts, but didn't have any luck, so I posted a request for assistance to the NLM Help Desk. I should receive a reply in the next few days.
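For illustration only, the two sets of counts described above could be produced along these lines. This is a minimal sketch, not the script that generated the attachments: it assumes each article is available as a PubMed XML string whose types appear in PublicationType elements, and the function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def publication_types(article_xml):
    """Return the sorted, de-duplicated publication type names in one article."""
    root = ET.fromstring(article_xml)
    names = {(node.text or "").strip() for node in root.iter("PublicationType")}
    return sorted(name for name in names if name)


def count_types(xml_documents):
    """Count articles per combination of types, and occurrences per type."""
    combos = Counter()   # key: tuple of all types found in one article
    singles = Counter()  # key: a single publication type name
    for doc in xml_documents:
        types = publication_types(doc)
        combos[tuple(types)] += 1
        singles.update(types)
    return combos, singles


def report(counter):
    """Format one set of counts, most frequent first, count at the front."""
    lines = []
    for key, count in counter.most_common():
        label = "; ".join(key) if isinstance(key, tuple) else key
        lines.append(f"{count:7d} {label}")
    return lines
```

Calling count_types() on the stored XML and passing each counter to report() would give one block of lines per set, each sorted with the most frequent values first.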
The full parse of all the articles has finished, and the counts are attached as pubtypes.txt. Also, I got the information I needed from NLM (the descriptors which are publication types have a DescriptorClass attribute of "2"; how could I not have guessed something as obvious as that???). The publication types are listed (one per line, each line showing the type name and the MeSH tree numbers for the publication type) in the attachment mesh-publication-types.txt. The tree numbers show the hierarchical relationships. Basically, if a publication type has a tree number which is a substring of a tree number for another publication type, the second publication type is a descendant of the first publication type, and if we adopt the approach we've been discussing, we could omit display of the first publication type for an article which identified itself as having both of those types.
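As a rough sketch of the pruning rule just described (not the code deployed for this ticket), the ancestor test can be written as a prefix check on tree numbers. The function names are hypothetical, and the sketch assumes that a type with no known tree numbers is never treated as an ancestor. The tree numbers used in the usage note are the ones quoted elsewhere in this ticket.

```python
def is_ancestor(ancestor_numbers, descendant_numbers):
    """True if any ancestor tree number is a dotted prefix of a descendant's.

    Matching on the number plus a trailing dot avoids false positives
    between sibling nodes whose numbers merely share leading characters.
    """
    return any(
        d.startswith(a + ".")
        for a in ancestor_numbers
        for d in descendant_numbers
    )


def prune_ancestors(article_types, tree_numbers):
    """Drop each type that is an ancestor of another type on the same article.

    tree_numbers maps a publication type name to its list of MeSH tree
    numbers; types missing from the map are kept as they are.
    """
    kept = []
    for candidate in article_types:
        candidate_numbers = tree_numbers.get(candidate, [])
        has_descendant = any(
            is_ancestor(candidate_numbers, tree_numbers.get(other, []))
            for other in article_types
            if other != candidate
        )
        if not has_descendant:
            kept.append(candidate)
    return kept
```

With a map containing Journal Article (V02.600) and Review (V02.600.500 and V02.912), an article carrying both types would display only Review, while an article carrying only Journal Article would still display it.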
Since a new version of the MeSH tree is only released once a year (as it was when I created the software used by NLM for maintaining MeSH records), it shouldn't be difficult to store a copy of the publication types with hierarchy and refresh that copy when a new release of MeSH comes out, and we could use that local copy for deciding what to display for this ticket.
Let me know what you think.
Just to make it easier to find, I've teased out the top-level nodes in this tree.
Publication Components (V01)
Publication Formats (V02)
Study Characteristics (V03)
Support of Research (V04)
For my own reference, this is the URL for downloading MeSH: https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc2021.gz (they don't have a "current" or "latest" alias, so the year has to be changed as appropriate).
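A possible way to refresh a local copy of the publication types from that download is sketched below. It assumes the file is gzip-compressed MeSH descriptor XML in which each DescriptorRecord carries a DescriptorClass attribute, a DescriptorName/String child, and a TreeNumberList; the function names and the local path argument are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

# URL pattern taken from the comment above; only the year changes.
MESH_URL = "https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc{year}.gz"


def mesh_url(year):
    """Build the download URL for a given MeSH release year."""
    return MESH_URL.format(year=year)


def publication_types_from_mesh(stream):
    """Map each publication type name to its list of MeSH tree numbers.

    Only descriptors with DescriptorClass="2" are publication types.
    The file is read incrementally because a full MeSH release is large.
    """
    types = {}
    for _, record in ET.iterparse(stream, events=("end",)):
        if record.tag != "DescriptorRecord":
            continue
        if record.get("DescriptorClass") == "2":
            name = record.findtext("DescriptorName/String", default="").strip()
            numbers = [
                node.text.strip()
                for node in record.findall("TreeNumberList/TreeNumber")
                if node.text
            ]
            if name:
                types[name] = numbers
        record.clear()  # release memory held by the finished record
    return types


def load_from_gzip(path):
    """Parse a downloaded desc<year>.gz file (path chosen by the caller)."""
    with gzip.open(path, "rb") as stream:
        return publication_types_from_mesh(stream)
```

The resulting dictionary has the same shape as the lines in mesh-publication-types.txt: one type name with its tree numbers.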
We've looked over your comments and attachments and decided to proceed with your proposed approach of storing the hierarchy tree in the EBMS and updating it once/year or as needed. Keeping with reporting only the types we care most about (bolded in my comment above + the clinical trial subtypes), there aren't likely to be many citations with several types (2-3 at most is more likely). Let us know if there are any outstanding questions.
Hi, ~juther. I think the two approaches you've just described are at odds with each other. The first approach is programmatically data-driven, using an algorithm based on the hierarchy we capture from MeSH to prune types whose descendants are also present. The second relies on a hand-picked list of terms as they exist at the moment. If we need to keep the hand-picked list, there's no point in making the updated MeSH hierarchy (as retrieved and parsed directly from NLM) available to the software. Instead, we'd make the decisions made by the librarians about "which types we care about" available to the software (along with rules about which of the types on the hand-picked list are parents/ancestors of which other types), and eventually (when or after we do the Drupal 9 rewrite) create an interface for updating that hand-curated information as needed. Yes, we can use the MeSH hierarchy to inform the rules we capture for the hand-picked list about which are ancestors of which, but there would be no way to get around the fact that the composition of the list itself would be dependent on decisions which can't be automated. Of course, if there's something in the MeSH data which distinguishes the types in such a way that makes programmatic reproduction of the librarians' list possible (for example, all the nodes in a certain branch of the tree), then that would provide a way to reconcile the two approaches. But I haven't been able to find such a pattern.🙁
Robin, a few things to consider that may help:
Even though MeSH is updated each year, the publication types are relatively unchanged; changes to them are certainly not an annual event.
Our monthly searches and my citation review weed out all the unwanted publication types that have not been included in the hand-picked list. This is why only a selected set of publication types are represented in the EBMS to date. I don’t mind seeing all the publication types as it could assist my weeding process and by the time I am done reviewing, the published citations for BM review pretty much reflect only the publication types on the hand-picked list anyway. So if we want to keep everything automated and if I am understanding the above mentioned conflict correctly, could including all publication types make this easier at least for now? If so:
utilize the mesh hierarchy as Bob suggests selecting to display only the narrowest publication type in each mesh tree that applies
consider a limit to the total number of publication types displayed to 2-3, this would apply to citations that have publication types from multiple mesh trees
Thank you both for your comments. I think I see what you mean. I'm fine with proceeding with this approach:
utilize the mesh hierarchy as Bob suggests selecting to display only the narrowest publication type in each mesh tree that applies
I don't think we should limit the number of publication types at the outset, at least not until we see how cumbersome these lists may be. In reviewing the list of citations in the EBMS with multiple types, some of the lists are quite long, but I understand those would likely be shortened substantially if we assume the approach above whereby only the narrowest type is displayed.
Here's an edge case question, unlikely to arise, but the software needs to know what to do when it does happen. Can we assume that if the XML for an article hasn't been refreshed recently enough (or NLM hasn't updated the XML to reflect the current MeSH tree for publication types) and we find a publication type in an article which isn't found in the MeSH tree, we should display the type, rather than suppress it?
One of the things I need to decide is whether to
create a new table for the article publication types and populate that table for all articles; or
parse the XML for the articles on the fly when constructing the review pages.
My inclination is to use the second approach for now and see how the performance is, and if it doesn't introduce too much of a performance hit we stick with that approach until the Drupal 9 rewrite, at which point we'll be rebuilding all the tables anyway. Any objections to what I'm proposing? Going with the first approach now would complicate the Denali deployment and/or saddle it with pretty substantial downtime.
This is pretty much moot, as it turns out we're parsing each article's XML already. My plan for the rewrite is that we extract what we need from the XML when we import it, storing the parsed information instead of the raw XML so we don't have to keep parsing the same XML over and over.
I've got a preliminary implementation running on DEV. It has a temporary version of the publication types, which shows all of the types I find for each article, displaying the ones which will be suppressed like this just so you can see what the software's going to be doing. Please take a look and let me know when I can replace that temporary version with the one which actually does the suppression of the ones we don't want.
Just looked at this on dev and noticed a possible issue with how the suppression is working. In the attached file there are two citations that I pulled from the med lib queue: first one with Journal Article suppressed because the citation is a Meta-analysis, Review; second one is a Clinical Trial but Journal Article is not suppressed. pubtype_ex_JAnotsupprsd.docx
The MeSH tree number for Journal Article is V02.600. For PMID 32044160 we suppress Journal Article because it is an ancestor of Review (whose MeSH tree numbers are V02.600.500 and V02.912). Looking at the mesh-publication-types.txt I posted to the ticket, which of the other three publication types for PMID 30131387 has V02.600 in its MeSH tree number? (In other words, which of them is a descendant of Journal Article?)
This is a link to PMID 30131387 as it is displayed on PubMed: Phase II Study of Iniparib with Concurrent Chemoradiation in Patients with Newly Diagnosed Glioblastoma - PubMed (nih.gov). Even though the publication type Journal Article is in fact included in the full list of publication types assigned to this citation, it is not displayed. Any way we could replicate this same suppression? Journal Article would be displayed only when, for example, no other publication type is listed (but of course this applies only to V02.600 itself and not to its narrower pub types, such as Review V02.600.500).
Many of our citations are clinical trials or other pub types that will also have Journal Article assigned to them like this example, so we will be seeing this issue frequently.
Well, we could suppress all types whose position in the MeSH tree is less than three levels deep unless that would leave us with no types to display. Would you like me to implement that approach?
Or we could use the fragile approach of hard-coding special treatment for "Journal Article" (which would break if they ever renamed the type).
Or we could use the equally fragile approach of hard-coding special treatment for the node with tree number V02.600 (which would break when they modified the tree numbers).
(I should probably warn you in advance that programmers are trained — largely by experience — to push back against the "that will never happen" response. 😛)
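To make the first option above concrete, here is a minimal sketch of depth-based suppression with the "unless that would leave us with no types" fallback. The ticket does not say how a type with several tree numbers would be measured, so this sketch assumes the deepest tree number counts and that a type missing from the map is kept. The function names and the "Top Level Type" entry in the usage note are hypothetical.

```python
def depth(tree_number):
    """Number of levels in a MeSH tree number, e.g. V02.600 has two."""
    return len(tree_number.split("."))


def suppress_shallow(article_types, tree_numbers, min_depth=3):
    """Hide types shallower than min_depth unless nothing would remain.

    tree_numbers maps a publication type name to its list of MeSH tree
    numbers. A type with several tree numbers is judged by its deepest
    one; a type missing from the map is kept (both are assumptions).
    """
    def deepest(name):
        numbers = tree_numbers.get(name, [])
        return max((depth(n) for n in numbers), default=min_depth)

    deep = [name for name in article_types if deepest(name) >= min_depth]
    # Fall back to the full list rather than display nothing at all.
    return deep if deep else list(article_types)
```

Given Journal Article (V02.600) and Review (V02.600.500, V02.912), an article with both would show only Review and an article with only Journal Article would still show it. A made-up two-level entry such as "Top Level Type" would be hidden whenever a deeper type is present, whether or not the reviewers want to see it.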
"Well, we could suppress all types whose position in the MeSH tree is less than three levels deep unless that would leave us with no types to display. Would you like me to implement that approach?"----No, you are right this would eliminate pub types that we want to see.
I should probably warn you in advance that librarians (especially the ones at NLM) are trained to create standardizations that are statistically unlikely to change. For example, Journal Article as a publication type was created in 1991 and has not changed since. The Journal Article MeSH tree has been updated once, in 2008, when narrower pub types were added. No other changes have been made to date, not even when they recently migrated to the new version of PubMed. So I cannot help thinking that it is reasonable to consider hard-coding a solution at some point.
"I should probably warn you in advance that librarians (especially the ones at NLM) are trained to create standardizations that are statistically unlikely to change."
Right, like the structure of the PubMed documents? 😉
Implemented on DEV.
We have decided to hard-code exclusion of Journal Article as an article type to be displayed.
Hard-coded exclusion implemented on DEV. So now some of the articles have no type to display, in which case I leave out the PUB TYPE(S) label.
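The behavior described in the last two comments might look roughly like this. It is an illustrative sketch rather than the EBMS code: the constant and function names are hypothetical, and the separator between types is an assumption. Only the PUB TYPE(S) label and the Journal Article exclusion come from the ticket.

```python
# Types never shown on the review pages. Matching on the name is the
# hard-coded approach chosen above; it would need updating if NLM ever
# renamed the type.
EXCLUDED_TYPES = {"Journal Article"}


def pub_type_line(article_types):
    """Return the display line for an article, or "" when nothing is left.

    article_types is the list of types remaining after any pruning of
    ancestor types has already been applied.
    """
    shown = [name for name in article_types if name not in EXCLUDED_TYPES]
    if not shown:
        return ""  # omit the label entirely rather than show it empty
    return "PUB TYPE(S): " + "; ".join(shown)
```

An article whose only type is Journal Article therefore gets no publication type line at all.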
Nicely done. Looks good on dev.
Verified on Dev
I have tested this in QA for the librarian's queue and everything looks good.
Verified on QA.
Verified on PROD. Seems to be working. It is OK to close ticket. Thanks!
Also verified on PROD. Closing ticket.
File Name | Posted | User |
---|---|---|
Examples_multiple_pubtypes.docx | 2021-05-27 16:36:40 | Boggess, Cynthia (NIH/NCI) [C] |
mesh-publication-types.txt | 2021-08-10 10:57:09 | Kline, Bob (NIH/NCI) [C] |
pubtype_ex_JAnotsupprsd.docx | 2021-08-17 09:39:44 | Boggess, Cynthia (NIH/NCI) [C] |
pubtypes.txt | 2021-08-10 10:57:09 | Kline, Bob (NIH/NCI) [C] |
pubtypes-2021.txt | 2021-08-09 16:07:29 | Kline, Bob (NIH/NCI) [C] |