Issue Number | 803 |
---|---|
Summary | Related Citations Automatically Linked Upon Import - Published Errata? |
Created | 2023-10-19 15:28:04 |
Issue Type | Improvement |
Submitted By | Boggess, Cynthia (NIH/NCI) [C] |
Assigned To | |
Status | Open |
Resolved | |
Resolution | |
Path | /home/bkline/backups/jira/oceebms/issue.364090 |
Question: Are published errata included as one of the publication types that are automatically linked upon import for core journals?
If not, can we include them?
I believe the code is looking for the CommentsCorrectionsList/CommentsCorrections/PMID element in the XML document we get from NLM. Do you have an example of an article which would be regarded as a "published erratum" which didn't get linked as a related article where we would have expected it should have been linked?
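For reference, here is a minimal sketch of pulling the related PMIDs out of that element path. The sample record and the function name `related_pmids` are made up for illustration; this is not the EBMS import code, just the lookup described above done with the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed record in the shape of the NLM XML.
SAMPLE = """\
<PubmedArticle>
  <MedlineCitation>
    <PMID Version="1">11111111</PMID>
    <CommentsCorrectionsList>
      <CommentsCorrections RefType="ErratumIn">
        <RefSource>Example J. 2023;1(2):3</RefSource>
        <PMID Version="1">22222222</PMID>
      </CommentsCorrections>
      <CommentsCorrections RefType="CommentIn">
        <RefSource>Example J. 2023;1(3):9</RefSource>
        <PMID Version="1">33333333</PMID>
      </CommentsCorrections>
      <CommentsCorrections RefType="Cites">
        <RefSource>Entry without a PMID</RefSource>
      </CommentsCorrections>
    </CommentsCorrectionsList>
  </MedlineCitation>
</PubmedArticle>
"""


def related_pmids(xml_text):
    """Return the PMIDs found under CommentsCorrectionsList/CommentsCorrections."""
    root = ET.fromstring(xml_text)
    path = ".//CommentsCorrectionsList/CommentsCorrections/PMID"
    # Entries without a PMID child simply produce no match.
    return [node.text.strip() for node in root.findall(path) if node.text]


print(related_pmids(SAMPLE))
```

The article's own PMID is not returned because it does not sit under CommentsCorrectionsList.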
EBMS ID: 933172
EBMS ID: 933114
EBMS ID: 933113
EBMS ID: 933110
EBMS ID: 928054
EBMS ID: 728216 has two errata listed in pubmed but only one was programmatically linked which seems odd.
To see more, go to Article Search and enter %erratum% in the title field.
Note: In several cases Bonnie and I have linked errata to their original articles manually. If the citation was linked programmatically, a note indicating such is included in the full record display just below Related Articles.
I went back to the original requirements for this functionality, as well as the code for the Drupal 7 site, and from what I can tell, the automatic linking happens (and happened in the Drupal 7 system) when a related article which we did not already have is automatically imported as a followup based on the presence of a CommentsCorrections element in an article imported as a result of that article being requested in the base import job.
So the short answer to your original question is yes, we automatically link errata, but only if we are automatically importing the erratum because we discovered its existence when importing the base article, and determined that we didn't already have it in the system.
Other restrictions apply, of course: the two articles must be published in the same journal, and that journal must have been designated as a "core" journal (but you know all about those restrictions).
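The conditions described above can be written down as a small predicate. This is only a sketch of the rule as stated in this comment; the `Article` class, the field names, and `should_auto_link` are hypothetical and do not come from the EBMS code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    pmid: str
    journal_id: str  # hypothetical journal identifier


def should_auto_link(base, related, core_journals, existing_pmids):
    """True when the related article would be imported as a followup and linked."""
    if related.pmid in existing_pmids:
        # Already in the system: no followup import, so no automatic link.
        return False
    if base.journal_id != related.journal_id:
        # Both articles must be published in the same journal.
        return False
    # That journal must be designated as a core journal.
    return base.journal_id in core_journals


base = Article("11111111", "J1")
erratum = Article("22222222", "J1")
print(should_auto_link(base, erratum, {"J1"}, {"11111111"}))
```

With this rule, an erratum picked up by a later search is already in `existing_pmids` when anything next looks at the base article, which matches the missed links described in this ticket.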
This analysis also explains the case you referred to above as "seems odd." That case was actually a useful clue.
If I understand this correctly, in order for a related article to get programmatically linked it must exist (aka have a record in pubmed) at the time the base article is imported.
So I think we have a timing issue...
We run searches approximately every 30 days, which is a short window for a base article to get published and then also be commented on or have a correction published. Therefore, any comment, correction, etc. made after that window is getting missed.
Many corrections take months to get identified after a base article is published. I have seen errata published up to a year after the base article.
Many of the errata that we have in the EBMS were caught by our searches and imported in a different review cycle than the base article and did not get linked. Some were manually linked by me or Bonnie.
The other issue is that not all errata will get caught by our searches. Errata have their own pubmed record and often have different and limited indexing (especially now that everything is automatically indexed with little to no human review) than the base article. Without good indexing these errata get missed which is why this related article feature has great value to link errata with the base article.
So currently when a base article from a core journal is imported, the system is checking for comments, editorials, errata, etc. also published in a core journal, prompting the user to import the related citation and then linking it to the base article. If the related citation is already in the database, no linking happens (right?). This is a one-time check and the reverse is not happening. Reverse being, for example, reply linked to base versus base linked to reply.
When a comment, editorial, errata, etc. is being imported, could the system link it to the base article even if that base article has been in the EBMS for months? ... with the same restriction of core journals only. It is possible that we catch an erratum in our searches and not the base article, so we would have to consider what to do in that case.
So I think we have a timing issue...
Right. If you go back and read through the original ticket (linked from my previous comment), you'll see some discussion of that issue.
When a comment, editorial, errata, etc. is being imported, could the system link it to the base article even if that base article has been in the EBMS for months?
We could modify the system to do that. Not trivial. I assume you'd want us to check to see if the articles were already linked and avoid adding a second link in that case.
It is possible that we catch an erratum in our searches and not the base article, so we would have to consider what to do in that case.
NLM provides CommentsCorrections elements in both directions. For errata, one of the articles will have the RefType attribute set to "ErratumFor" and the other will have it set to "ErratumIn" (we ignore those attributes). So we'd find it either way.
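A rough sketch of what the requested change might look like: whenever any article is imported, link it to every related PMID that is already in the system, and skip pairs that are already linked. Storing each link as an unordered pair makes the check independent of direction, in the same way that RefType is ignored. The function and variable names are hypothetical, not existing EBMS code.

```python
def link_related(imported_pmid, related_pmids, known_pmids, links):
    """Add missing links between an imported article and known related articles.

    links is a set of frozenset pairs, so (A, B) and (B, A) are the same link.
    Returns the PMIDs that were newly linked.
    """
    added = []
    for other in related_pmids:
        if other == imported_pmid or other not in known_pmids:
            continue  # nothing to link to (yet)
        pair = frozenset((imported_pmid, other))
        if pair in links:
            continue  # already linked: avoid adding a second link
        links.add(pair)
        added.append(other)
    return added


links = set()
known = {"BASE1", "ERR1"}
print(link_related("ERR1", ["BASE1", "UNKNOWN"], known, links))  # erratum imported later
print(link_related("BASE1", ["ERR1"], known, links))             # reverse direction, no duplicate
```

The core-journal restriction is left out here; it could be applied as an extra filter before the pair is added.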
Tagging ~juther and ~vshields for awareness, since this is another enhancement whose LOE would be several days.
I have some other ideas and concerns about errata so I think we may need to discuss and brainstorm.
I have some other ideas and concerns about errata so I think we may need to discuss and brainstorm.
Sure. This ticket? Or separate tickets for different issues?
This ticket.
Considerations for possibly handling errata in a separate/different way than other related citations:
Replies, comments, editorials, etc. discuss the content or impact of an article which may influence a board manager's or member's decision to use as a reference and, therefore, make sense being part of the review process.
Whereas errata report corrections that may or may not impact the article's content significantly or the content being referenced and require evaluation. They may only be relevant for citations that end up being cited in a summary. And articles from more than just core journals get cited so our current restriction to core journals only is not appropriate for errata.
So what specific changes are you proposing be made?
They may only be relevant for citations that end up being cited in a summary.
Are you thinking that errata should not be included in the automatically recorded relationships, but should be instead captured manually after the base article has been cited in a summary?
805 is the duplicate ticket and we will continue with 803
Copied from OCEEBMS-805
It has come to my attention that the Related Articles feature is not identifying and automatically linking errata to citations due to the timing of their publication. Unlike comments, very few errata are published at the same time as the base article. A mistake in an article must first be noticed, then the author/publisher must be contacted, and only then is the erratum published. I have seen errata that are published a year or more after the base article is published. The Related Articles feature is also limited to core journals only, which is not an appropriate limit when it comes to errata.
Historically, Minaxi and I did not worry too much about errata because pubmed, for well over a decade, has linked errata to the base citation record. Users of the EBMS, CDR and cancer.gov have access to the PMID link to pubmed where errata are available for review with most having a link to view the full text. However, with the increasing urgency to push articles to be available ahead of print, we are seeing more errata even in the core journals that pride themselves on their rigorous peer review and editorial processes. Also, due to NLM’s recent move to 100% automatic indexing, we are not catching errata in our monthly searches. Automatic indexing works from the bibliographic info, title, abstract, and author-supplied keywords only, limiting the amount of information available to properly index a citation. For errata, which often have no abstract or keywords provided and have titles like “Department of Error” or “Erratum”, little to no indexing is assigned, which means our searches will never retrieve them.
So moving forward, we need a way to automatically identify and link errata to EBMS citations, both new and existing, but maybe not all citations. Many citations in the EBMS get eliminated by NOT journal filters, or rejected early on in the review process, and errata for such citations may not need to be evaluated. We may want to consider limiting this new feature to designated citation states to reduce unnecessary work. EBMS users would also need a means to track errata evaluation, and errata information should be included in the citation's full record as well as searchable in the database. Also, as mentioned above, errata can be published any time after the base article is published, and citations can have more than one erratum published at different times, so timing will be an important aspect of handling errata.
Timing is also a factor in whether the full text pdf of the base citation in the EBMS reflects the current corrected state of the publication. If Bonnie uploads the pdf before an erratum for that base article has been published, then the pdf is not going to reflect the correction reported in the erratum and the pdf will need to be replaced.
Results of my research on errata:
An ad hoc report was run in the CDR (10/26/23) to generate a list of all the PMIDs for all citations currently linked to at least one summary. I then used these PMIDs to search pubmed in combination with the “haserratumin” limit and identified which of the linked citations had errata.
Adult – 405 citations with erratum
Peds – 239 citations with erratum
Genetics – 183 citations with erratum
Screening and Prevention – 128 citations with erratum
Supportive care – 97 citations with erratum
IACT – 35 citations with erratum
See attached spreadsheet. Note: the pmids listed for each board in their corresponding errata tabs are for the base citation not the errata itself. To see the erratum, go to the base citation record in pubmed and click on the erratum link.
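If this search ever needs to be repeated on a schedule, the same kind of query could be assembled programmatically. The sketch below only builds the search term and a request URL for the NCBI E-utilities ESearch endpoint; it does not send anything. The `[uid]` field tag, the bare `haserratumin` term, and the `retmax` value are assumptions based on how the search is described above and should be verified against pubmed before relying on the results. The function names are hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def erratum_query(pmids):
    """Combine a batch of PMIDs with the haserratumin limit (assumed syntax)."""
    ids = " OR ".join(f"{pmid}[uid]" for pmid in pmids)
    return f"({ids}) AND haserratumin"


def esearch_url(pmids, retmax=1000):
    """Build (but do not send) an ESearch request for the batch."""
    params = {"db": "pubmed", "term": erratum_query(pmids), "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(erratum_query(["11111111", "22222222"]))
print(esearch_url(["11111111", "22222222"]))
```

Long PMID lists would have to be split into batches; a suitable batch size is not something this ticket establishes.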
After searching the EBMS for errata, I can safely say that the majority of the errata identified do not have records in the EBMS. The errata records that we have in the EBMS were mostly added and manually linked to the base article by Bonnie or me at some point over the years. There are a few errata that did come through via the Related Articles feature because they were already published by the time the citation was imported.
Not all errata are going to be significant enough to impact the content in the summary. Errata report corrections to any part of the article. Mistakes made in the author affiliation or acknowledgements, for example, will probably have zero impact, whereas mistakes in the results section, tables or conclusions could have significant impact. But there is no way to know for sure without reviewing the errata individually, which may also require the user to review the content in the summary being referenced.
To get an idea of the LOE and types of corrections being reported in errata, we conducted a pilot study of 60 adult treatment citations. All 60 were citations currently linked to at least one adult treatment summary and had at least one erratum. Jeff reviewed the errata for each of the base citations and recorded the type and location of the correction, along with a guess as to whether or not the errata may have an impact on the summary. Ning and I then reviewed his results and highlighted 24 of the 60 that we thought were significant enough to warrant further review. This pilot project shows that it is likely that 50% or more of the errata can be evaluated and identified as not significant without needing to review the summary content being referenced. It also shows that there are plenty of errata that are possibly significant and should be evaluated. Jeff also noted whether the EBMS had the most recent corrected version of the pdf, and for several citations it did not. See attached spreadsheet for more details for the pilot study.
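As a quick check on the "50% or more" estimate, the pilot numbers work out as follows. This uses only the two counts given above (60 reviewed, 24 highlighted); the helper name `share` is just for illustration.

```python
def share(part, whole):
    """Fraction of the whole represented by part."""
    return part / whole


reviewed = 60                        # pilot citations, each with at least one erratum
highlighted = 24                     # flagged as warranting further review
not_flagged = reviewed - highlighted  # remaining citations

print(f"highlighted: {highlighted} of {reviewed} ({share(highlighted, reviewed):.0%})")
print(f"not flagged: {not_flagged} of {reviewed} ({share(not_flagged, reviewed):.0%})")
```

So 36 of the 60 pilot citations were not highlighted, which is consistent with the estimate, with the caveat that this is one small sample from a single board.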
Discussion in EBMS meeting:
All of the above was presented to Robin J, Victoria, William, Bonnie and other board managers in the Monthly EBMS meeting on November 21, 2023. It was agreed that this issue was substantial enough to require that some changes be made with the handling of errata in the future and that further discussion would be necessary. It was suggested in the meeting that perhaps we could have an errata report that could be run periodically to identify any new errata for citations in the EBMS with selected states. It was also suggested that maybe we could use existing features such as Tags to tag identified errata and use the Tag Comment to add notes regarding the errata evaluation. Having a flag (or equivalent) to temporarily identify new unevaluated errata that could be unflagged after evaluation was also suggested.
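One way to picture the suggested periodic errata report: restrict to citations in selected states, compare the errata currently known for each against what a fresh lookup returns, and list anything new as unevaluated. Everything in this sketch is hypothetical, including the state names, the field names, and the idea that a lookup hands back a mapping of base PMID to erratum PMIDs; it is meant as a discussion aid rather than a design.

```python
def new_errata_report(citations, errata_by_base, watched_states):
    """List citations in watched states that have errata not yet recorded.

    citations: iterable of dicts with 'pmid', 'state', and 'known_errata' (a set).
    errata_by_base: mapping of base PMID -> erratum PMIDs from a fresh lookup.
    """
    report = []
    for citation in citations:
        if citation["state"] not in watched_states:
            continue  # e.g. rejected early; errata need no evaluation
        found = set(errata_by_base.get(citation["pmid"], ()))
        fresh = sorted(found - citation["known_errata"])
        if fresh:
            # 'evaluated' plays the role of the temporary flag suggested above.
            report.append(
                {"pmid": citation["pmid"], "new_errata": fresh, "evaluated": False}
            )
    return report


citations = [
    {"pmid": "100", "state": "cited_in_summary", "known_errata": {"900"}},
    {"pmid": "200", "state": "rejected", "known_errata": set()},
]
lookup = {"100": ["900", "901"], "200": ["950"]}
print(new_errata_report(citations, lookup, {"cited_in_summary"}))
```

Because a citation can have more than one erratum published at different times, each run only reports the errata that are not already recorded for it.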
It was also decided in the meeting that we would focus efforts on moving forward with identifying new errata in the EBMS, because it has been determined that an evaluation of all the errata identified to date (specific numbers per board listed above) would take years. It was suggested that retrospective errata evaluation could be incorporated into the comprehensive review process and handled on a summary-by-summary basis, which could be facilitated by new or ad hoc CDR reports to limit the number of errata to a specific summary or summary section. It was also suggested that, if we were to consider evaluating the errata identified to date, this effort could be focused on maybe the past year or two only. Also, each board manager may want to make their own decision, as some boards had far fewer errata identified than others. No specific decisions were made in the meeting regarding the evaluation of errata identified to date; this will need to be discussed further at a later date.
Board managers decided to discuss the errata issue further and continue brainstorming ideas of how best to handle them in the EBMS.
File Name | Posted | User |
---|---|---|
Errata Project - Citations_linked_to_summaries-20231024165513.xlsx | 2023-12-11 15:43:59 | Boggess, Cynthia (NIH/NCI) [C] |
Errata Review_AdultRx_PilotSampleResults_Nov23.xlsx | 2023-12-11 15:43:58 | Boggess, Cynthia (NIH/NCI) [C] |