Issue Number | 3057 |
---|---|
Summary | [CiteMs] Citations import error |
Created | 2010-01-05 14:12:06 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | priced |
Status | Closed |
Resolved | 2010-01-06 17:01:44 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107385 |
BZISSUE::4733
BZDATETIME::2010-01-05 14:12:06
BZCREATOR::William Osei-Poku
BZASSIGNEE::Brent Carter
BZQACONTACT::William Osei-Poku
Problem as reported by Cynthia:
I tried to import a citation this morning and got an error message
something like…pmid not found. And the following errors made it seems as
though the citation was not imported but in fact a record was created
but it only contained the sumtop, pmid and cmsid. All the bibliographic
data was blank. Minaxi mentioned recent problems with importing
citations to the CDR due to recent NLM end of the year changes. I bet
this is also the reason the CMS is having problems importing citations
as well.
I am attaching the file I used. The record created is cmsid=196891,
pmid=20038717
We will need this fixed soon because we are due to start importing
citations for the Jan 2010 review cycle early next week.
Note:
I have attached all the relevant emails. I will be adding the current
email from Brent, Cynthia and Minaxi in the next comment.
Attachment RE CMS citation Importing problem.txt has been added with description: Email exchanges
BZDATETIME::2010-01-05 14:14:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::1
--Email from Brent
Hi All,
Pete got back to me and he was pretty confident that the plan we had to change the "S" to "M" is the correct action to prevent the error we're seeing, and that there shouldn't be any additional work that needs to be done (i.e. my worry about data structures or program logic that may need updating). It is great to have this confirmation from someone with experience with the system. If other NLM changes cause issues we may have to revisit this question.
He has also helped with our second issue regarding how to remove the partially imported records. Luckily, he was able to get in touch with another Lockheed person who also worked on CiteMS and she provided a re-usable script that can safely delete a single or range of records. This script has one required step and then one additional step if the citation has already gone through review. I assume the second step is not needed in this case but can someone confirm the citation(s) we want to remove have not yet gone through review?
Below are the next steps I think we should take and the sample delete script. We should store this script somewhere for future reference. Would it make sense to create a Bugzilla bug for this issue and put it in there?
Thanks,
Brent.
BZDATETIME::2010-01-05 14:15:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2
-Emails from Cynthia and Minaxi
Correction: CMSID 196891-196903 are to be deleted.
------Original Message
From: Trivedi, Minaxi
Sent: Tuesday, January 05, 2010 1:30 PM
To: Carter, Brent (NIH/NCI) [C]; Grama, Lakshmi (NIH/NCI) [E]; Boggess,
Cynthia; Osei-Poku, William
Cc: Kline, Robert (NCI); 'Alan Meyer'; Safarnejad, Reza (NIH/NCI)
[C]
Subject: RE: CMS citation Importing problem
Hi All,
CDR ID 196891 to 196903 are to be deleted. These citations have not yet gone through the review.
Thanks,
Minaxi
-----
BZDATETIME::2010-01-05 15:11:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::3
Here are two PMIDs for records which had the CommentsCorrections structure changes in the XML version of the documents:
19620164
19689789
It would be good to have the users import at least one of these to make sure that the CiteMS system wasn't affected.
BZDATETIME::2010-01-05 16:54:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4
(In reply to comment #3)
> Here are two PMIDs for records which had the CommentsCorrections
structure
> changes in the XML version of the documents:
>
> 19620164
> 19689789
>
> It would be good to have the users import at least one of these to
make sure
> that the CiteMS system wasn't affected.
From Cynthia:
As I expected, when I imported this citation the same pmid not found
error occurs because of the 2 IS fields . The IS field issue needs to be
fixed before we can test the comments correction block. The import
filter reads line by line from the top of the citation record to the
bottom. It is getting hung up on the multiple IS fields and is not
reading the rest of the record.
We will need to ask that this citation that I just imported be deleted
as well and then we can use it after the IS issue is fixed to test this
again.
This Cmsid=196904 needs to be deleted.
The second citation is already imported into the CMS.
BZDATETIME::2010-01-05 16:59:19
BZCOMMENTOR::Bob Kline
BZCOMMENT::5
(In reply to comment #4)
> (In reply to comment #3)
> > Here are two PMIDs for records which had the
CommentsCorrections structure
> > changes in the XML version of the documents:
> >
> > 19620164
> > 19689789
> >
> > It would be good to have the users import at least one of
these to make sure
> > that the CiteMS system wasn't affected.
>
> From Cynthia:
>
> As I expected, when I imported this citation the same pmid not
found error
> occurs because of the 2 IS fields . The IS field issue needs to be
fixed before
> we can test the comments correction block. The import filter reads
line by line
> from the top of the citation record to the bottom. It is getting
hung up on the
> multiple IS fields and is not reading the rest of the record.
> We will need to ask that this citation that I just imported be
deleted as well
> and then we can use it after the IS issue is fixed to test this
again.
>
> This Cmsid=196904 needs to be deleted.
>
> The second citation is already imported into the CMS.
Right. I should have said that I was hoping she'd do the testing in the test system. Please make sure she tests again on the system where Brent has addressed the IS field issue.
BZDATETIME::2010-01-06 08:11:06
BZCOMMENTOR::Brent Carter
BZCOMMENT::6
Previous developers on the CiteMS system confirmed there is no way to delete citations from within the software itself and provided the following script that can be used to delete citations (of course the ID values need to change):
delete from asc_ORIGINAL_SUMMARIES where ref_id between 183223 and 183229
delete from asc_CITE_SUMMARIES where ref_id between 183223 and 183229
delete from bib where ref_id between 183223 and 183229
Extra steps if the citation already go through review:
delete from asc_SEARCH_TOPICS where review_id in(select review_id from mt_review where ref_id in(169300,166208,164997,121949,177813,177919,160484))
delete from mt_review where ref_id in(169300,166208,164997,121949,177813,177919,160484)
BZDATETIME::2010-01-06 09:14:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
Brent:
Any idea how the IDs in the last two queries are derived? We might want to figure out (perhaps in consultation with the original developers) if it's possible to take a little bit more of the manual intervention out of the deletion process by plugging in the IDs from the first queries into a sub-sub-select. And maybe we could even wrap the whole script up into a stored procedure. I understand you didn't need to mess with the final two queries this time, since the citations we're dealing with right now hadn't made it to the review stage, but if I understand correctly what the queries are doing, and if it makes sense to add the sub-sub-selects to the last two queries, those two should be harmless no-ops if no review has taken place for the citation being deleted. Doesn't have to happen as part of this task (I realize you're jamming on other work), but I just wanted to see if you think this idea is worth pursuing at some point.
BZDATETIME::2010-01-06 09:48:33
BZCOMMENTOR::Brent Carter
BZCOMMENT::8
I didn't spend much time looking into it but my suspicion is that this is all just based on a list of input ref_id values. I'm betting the entire example script they gave us was never actually run as is. I think maybe they found more than one old script. The first part of the script was found from a time when that was all they had to do. Then, they remembered the optional second step and copied and pasted that part from some other old script that had a different set of ref_id inputs. Looking at some of the DDL used to create the DB seems to confirm this theory since there is a direct relationship from the tables in the second step to the "bib" table in the first based on ref_id:
ALTER TABLE [dbo].[mt_REVIEW] WITH NOCHECK ADD
CONSTRAINT [FK_mt_REVIEWref_i_00AA174D] FOREIGN
KEY([ref_id])
REFERENCES [dbo].[bib] ([Ref_ID])
ALTER TABLE [dbo].[asc_SEARCH_TOPICS] WITH NOCHECK ADD
CONSTRAINT [FK_asc_SEARCrevie_12C8C788] FOREIGN
KEY([review_id])
REFERENCES [dbo].[mt_REVIEW] ([review_id])
Given this, I'd think it would be pretty easy to wrap up in a stored procedure as you suggest. The only challenge might be making it general (e.g. it is nice to allow a range input which covers deleting a single or range of values). But we might also need to delete a set of random values (as they do in the second step by using an "in"). We wouldn't want to use "in" all the time since if we have a large range that would be ugly to input all the individual values... but I'm sure something could be done.
As for if I think it is a good idea, absolutely yes! Especially if we expect this to happen more in the future and we don't expect to re-write the app any time soon. Depending on those factors we may even consider adding a delete function to the software itself (based on the procedure) so admin users can do this themselves. Like you said, the more manual intervention we can remove the better.
BZDATETIME::2010-01-06 10:29:58
BZCOMMENTOR::Brent Carter
BZCOMMENT::9
Confirmed with Pete, the previous developer. From him:
I’m sure that the statements were just pieced together. The ref_id (or set of ref_ids) will be the same for all statements.
BZDATETIME::2010-01-06 11:48:38
BZCOMMENTOR::Brent Carter
BZCOMMENT::10
Here is the actual script run on prod CiteMS:
use NCI_CMS
go
Fix the issue where IS can now occur multiple times.
update mt_ImportDef set TagType = 'M' where ID = 551
Remove the bad citation records.
declare @FirstRefId int
declare @LastRefId int
set @FirstRefId = 196891
set @LastRefId = 196904
begin transaction
delete from asc_ORIGINAL_SUMMARIES where ref_id between @FirstRefId and
@LastRefId
delete from asc_CITE_SUMMARIES where ref_id between @FirstRefId and
@LastRefId
delete from bib where ref_id between @FirstRefId and @LastRefId
commit transaction
/*
Output:
(1 row(s) affected)
(15 row(s) affected)
(15 row(s) affected)
(14 row(s) affected)
*/
BZDATETIME::2010-01-06 16:59:55
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11
Added additional email exchanges.
Attachment RE CMS citation Importing problem1.txt has been added with description: Email exchanges1
BZDATETIME::2010-01-06 17:00:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
Marked issue as "RESOLVED" in preparation for closing.
BZDATETIME::2010-01-06 17:01:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13
(In reply to comment #12)
> Marked issue as "RESOLVED" in preparation for closing.
This issue has been resolved. It is now closed. Thank you!
File Name | Posted | User |
---|---|---|
RE CMS citation Importing problem.txt | 2010-01-05 14:12:06 | Osei-Poku, William (NIH/NCI) [C] |
RE CMS citation Importing problem1.txt | 2010-01-06 16:59:55 | Osei-Poku, William (NIH/NCI) [C] |
Elapsed: 0:00:00.001723