CDR Tickets

Issue Number 3456
Summary [Citation] status updates of premedline citations
Created 2011-11-21 12:02:32
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2011-12-01 11:21:12
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107784
Description

BZISSUE::5150
BZDATETIME::2011-11-21 12:02:32
BZCREATOR::William Osei-Poku
BZASSIGNEE::Bob Kline
BZQACONTACT::William Osei-Poku

I am creating this issue so that we can discuss the possibilities of getting some programmatic help in updating premedline citations that have had their statuses changed since they were last imported or updated.
There is an existing ad hoc query on the query menu called "PreMedline Status" that CIAT runs every month to find existing PreMedline citations with a status of "In-Process", "Publisher" or "In-data-review". The report returns approximately 200 citations per month. Someone opens each citation document in the CDR, copies the PMID, searches for the citation in PubMed, and checks whether the status has changed to "PubMed - indexed for MEDLINE". If it has, the next manual step is to use the import tool to update each citation whose status has changed to "PubMed - indexed for MEDLINE" or "PubMed". Similarly, if the status has changed from "Publisher" or "In-data-review" to "In-process", we use the import tool to update the citation so that it carries the new status.

1. Would it be possible to write a program that would retrieve all premedline citations in the CDR and compare their statuses with those in PubMed?

2. Could the program also be able to update the citation records with the new status and provide users with a report to review?

3. At the very least, would it be possible to have a query report only citations that have changed statuses?
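Requests 1 and 3 both come down to comparing the status stored in the CDR with the status PubMed currently reports. A minimal sketch, assuming both sets of statuses have already been gathered into dictionaries keyed by PMID (the function and argument names are hypothetical, not part of the CDR code):

```python
# Statuses that mark a citation as still pre-Medline, per the
# Citation-PreMedline Status query. Compared case-insensitively because
# NLM writes "In-Data-Review" while the query uses "In-data-review".
PENDING = {"in-process", "publisher", "in-data-review"}


def changed_citations(cdr_statuses, pubmed_statuses):
    """Return (pmid, old, new) for pre-Medline citations whose status changed.

    cdr_statuses: dict of PMID -> status stored in the CDR document.
    pubmed_statuses: dict of PMID -> status currently reported by PubMed.
    """
    changes = []
    for pmid, old in sorted(cdr_statuses.items()):
        if old.lower() not in PENDING:
            continue  # only pre-Medline citations need re-checking
        new = pubmed_statuses.get(pmid)
        if new is not None and new.lower() != old.lower():
            changes.append((pmid, old, new))
    return changes
```

Ignoring case differences means a citation is not reported as "changed" just because the two systems spell the same status differently.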

Comment entered 2011-11-21 13:30:33 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-21 13:30:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

(In reply to comment #0)

> There is an existing ad hoc query on the query menu called "PreMedline Status"

Which server? I don't see this query.

Comment entered 2011-11-21 14:08:52 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-21 14:08:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2

(In reply to comment #1)
> (In reply to comment #0)
>
> > There is an existing ad hoc query on the query menu called "PreMedline Status"
>
> Which server? I don't see this query.

It is on all three servers. The name is "Citation-PreMedline Status", not "PreMedline Status" as I stated above. Sorry about that. Here is the query:

SELECT query_term.doc_id, document.title, query_term.value
FROM query_term
INNER JOIN document
ON query_term.doc_id = document.id
WHERE query_term.path LIKE '/Citation/PubmedArticle/MedlineCitation/@Status'
AND query_term.value IN ('In-Process', 'Publisher', 'In-data-review')
ORDER BY query_term.value, query_term.doc_id

Comment entered 2011-11-21 14:50:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-21 14:50:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

I see it now, thanks for the correction.

Comment entered 2011-11-22 09:35:25 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-22 09:35:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::4

(In reply to comment #0)

> 3. At the very least, would it be possible to have a query report only
> citations that have changed statuses?

Can I safely assume, given the implications of "very least," that you don't really mean you want the report to ask NLM for the documents for every one of the 37,214 citations we have in the CDR? If you did mean that, the report would no longer be the "least" of the three requests, but would instead be the most expensive and risky ("risky" because it's generally a bad idea to submit massive requests for everything in the database to an outside service without explicit arrangements and permission having been established and obtained in advance; we don't want our access blocked for citation queries).

Comment entered 2011-11-22 12:03:51 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-22 12:03:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5

(In reply to comment #4)
> (In reply to comment #0)
>
> > 3. At the very least, would it be possible to have a query report only
> > citations that have changed statuses?
>
> Can I safely assume, given the implications of "very least," that you don't
> really mean you want the report to ask NLM for the documents for every one of
> the 37,214 citations we have in the CDR?
That is correct. We don't want actual documents. We only want to know the status of the citations. That is, the ones that have changed status.

Comment entered 2011-11-22 14:53:23 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-22 14:53:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::6

(In reply to comment #5)

> > Can I safely assume, given the implications of "very least," that you don't
> > really mean you want the report to ask NLM for the documents for every one of
> > the 37,214 citations we have in the CDR?
>
> That is correct. We don't want actual documents. We only want to know the
> status of the citations. That is, the ones that have changed status.

But in order to find out which status values have changed, I have to retrieve the entire documents. NLM doesn't have an interface through which I can ask for just one specific field for a set of documents, nor do they have an API for searching by status. They do have an interface for retrieving a summary set of information for each of the citations, but the status values retrieved by that interface don't match the status values we get when we retrieve the entire citation documents, and even if they did match I don't think NLM would be happy with a request for 37,214 summary documents at once. So in order to do what you're asking for I would have to retrieve complete documents for more than thirty-seven thousand citations. NLM states on their web site that they will block the IP addresses of any computers which engage in such abusive behavior. We probably don't want that to happen. :-)

So I'll repeat my question. Do you really need me to get the status value for every citation in the CDR every time this report is run? Or did you mean to say you wanted to find out the current status of citations which we have in the CDR with statuses of "In-Process," "In-Data-Review" (note the capitalization they use) or "Publisher"?

Let me ask in a different way. Since you begin the last request with "At the very least, would it be possible ..." can I infer that if I implement the first two requests but not the third you would be happy, since by giving CIAT the first two we would be providing more than the "very least"?

If you really meant that you have to have us get the status value for every single citation we have in the CDR, then I think you should consider having the report queued to run in the middle of the night instead of in real time, and I would still caution that there is risk involved in submitting such a large request to an outside service, even during off hours.
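Because full documents must be retrieved even for the narrowed set, one way to keep the load on NLM modest is to request the pre-Medline PMIDs in small batches with a pause between requests. A minimal sketch, where `fetch` is a hypothetical callable that retrieves the full citation documents for a list of PMIDs and returns their statuses; the batch size and pause are placeholder values, not NLM's published limits:

```python
import time


def batches(pmids, size):
    """Yield successive lists of at most `size` PMIDs."""
    for start in range(0, len(pmids), size):
        yield pmids[start:start + size]


def fetch_statuses(pmids, fetch, batch_size=100, pause=1.0, sleep=time.sleep):
    """Collect PMID -> status by calling `fetch` once per batch.

    fetch: hypothetical callable taking a list of PMIDs and returning a dict
    of PMID -> status parsed from the full citation documents.
    """
    statuses = {}
    for i, batch in enumerate(batches(list(pmids), batch_size)):
        if i:
            sleep(pause)  # wait between requests, not before the first one
        statuses.update(fetch(batch))
    return statuses
```

With roughly 200 pre-Medline citations a month, this means a handful of requests, rather than the thousands needed to cover all 37,214 citations.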

Comment entered 2011-11-22 15:05:35 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-22 15:05:35
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::7

(In reply to comment #6)
> (In reply to comment #5)

>
> So I'll repeat my question. Do you really need me to get the status value for
> every citation in the CDR every time this report is run? Or did you mean to
> say you wanted to find out the current status of citations which we have in the
> CDR with statuses of "In-Process," "In-Data-Review" (note the capitalization
> they use) or "Publisher"?

Yes. We are only interested in citations in the CDR that have a status of "In-Process", "In-Data-Review" or "Publisher", not in all the citations in the CDR. Sorry about the confusion.

> Let me ask in a different way. Since you begin the last request with "At the
> very least, would it be possible ..." can I infer that if I implement the first
> two requests but not the third you would be happy, since by giving CIAT the
> first two we would be providing more than the "very least"?
>
That is correct. Our preference is the first two. I included the third option just in case the first two could not be done.

Comment entered 2011-11-23 16:28:28 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-23 16:28:28
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

(In reply to comment #0)

> 2. Could the program also be able to update the citation records with the new
> status and provide users with a report to review?

Let's confirm what you're asking for here. You want the software to update only the status attribute, leaving the decision about whether to re-import the rest of the citation document to a user to make after reviewing the report, right?

Comment entered 2011-11-23 17:39:39 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-23 17:39:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9

(In reply to comment #8)
> (In reply to comment #0)
>
> > 2. Could the program also be able to update the citation records with the new
> > status and provide users with a report to review?
>
> Let's confirm what you're asking for here. You want the software to update
> only the status attribute, leaving the decision about whether to re-import
> the rest of the citation document to a user to make after reviewing the report,
> right?

Actually the software can also re-import the citation. The user only needs to know which citations have been re-imported.
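Since the software re-imports the citations itself, the user mainly needs a list of what was re-imported and what failed. A minimal sketch of such a loop, assuming `reimport` and `unlock` are hypothetical stand-ins for the CDR import and unlock calls (the real script's structure is only partly visible in the tracebacks below):

```python
def refresh_citations(changes, reimport, unlock):
    """Re-import each changed citation and collect rows for a review report.

    changes: iterable of (cdr_id, pmid, old_status, new_status) tuples.
    reimport(cdr_id, pmid), unlock(cdr_id): hypothetical CDR stand-ins.
    Returns (rows, updated): one (cdr_id, pmid, old, new, note) row per
    citation, and the number re-imported successfully.
    """
    rows, updated = [], 0
    for cdr_id, pmid, old, new in changes:
        try:
            reimport(cdr_id, pmid)
            note = "updated"
            updated += 1
        except Exception as e:
            note = "error: %s" % e  # record the failure and keep going
        finally:
            unlock(cdr_id)  # release the document whether or not it worked
        rows.append((cdr_id, pmid, old, new, note))
    return rows, updated
```

Catching each failure per citation means a single bad record does not stop the whole run, and the report still shows which citation failed.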

Comment entered 2011-11-25 12:06:06 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-25 12:06:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::10

This has been installed on the CIAT admin menu and is ready for testing on Mahler. You'll only get one shot at testing with the real data if everything goes well, since after a successful run there shouldn't be anything to update until NLM changes the statuses of some more of the pre-Medline citations. You can artificially create additional opportunities for testing by manually editing existing Citation documents on Mahler, changing the status values, and re-running the utility. After you're satisfied that it's working properly on Mahler, we can install and test on Franck.

I couldn't test much myself without damaging your ability to test yourself, so I'm not as confident you won't run into snags as I might normally be.

Comment entered 2011-11-28 09:38:40 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-28 09:38:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11

I got the following error when I clicked on the menu item "Update Pre-Medline Citations".

<type 'exceptions.AttributeError'> Python 2.7.1: D:\Python\python.exe
Mon Nov 28 09:36:26 2011

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

D:\Inetpub\wwwroot\cgi-bin\cdr\UpdatePreMedlineCitations.py in ()
152 updated += 1
153 except Exception, e:
=> 154 notes = '<span class="errors">%s</span>' % cgi.escape(e)
155 cdr.unlock(session, citation.cdrId)
156 html.append(u"""\
notes = 'updated', cgi = <module 'cgi' from 'D:\Python\lib\cgi.pyc'>, cgi.escape = <function escape>, e = Exception(u'PubmedArticle 19777539 dropped',)
D:\Python\lib\cgi.py in escape(s=Exception(u'PubmedArticle 19777539 dropped',), quote=None)
1033 If the optional flag quote is true, the quotation mark character (")
1034 is also translated.'''
=> 1035 s = s.replace("&", "&amp;") # Must be done first!
1036 s = s.replace("<", "&lt;")
1037 s = s.replace(">", "&gt;")
s = Exception(u'PubmedArticle 19777539 dropped',), s.replace undefined

<type 'exceptions.AttributeError'>: 'exceptions.Exception' object has no attribute 'replace'
args = ("'exceptions.Exception' object has no attribute 'replace'",)
message = "'exceptions.Exception' object has no attribute 'replace'"

d:\cdr\Log\tmph7an77.html contains the description of this error.

Comment entered 2011-11-28 10:43:48 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-28 10:43:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

(In reply to comment #11)

> I got the following error when I clicked on the menu item "Update Pre-Medline
> Citations".

I fixed the problem. Please try the program again.
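The traceback shows the cause: cgi.escape() calls .replace() on its argument, and an Exception object has no such method. A minimal sketch of one possible fix, assuming the exception's message text is what belongs in the report (this is not necessarily the change that was committed). It uses html.escape(), since cgi.escape() was removed in Python 3.8:

```python
import html


def error_note(exc):
    """Render an exception as the report's HTML error note.

    Converting the exception to a string first avoids the AttributeError
    raised when the exception object itself is passed to the escape function.
    """
    return '<span class="errors">%s</span>' % html.escape(str(exc))
```

Note that html.escape() also escapes quotation marks by default, whereas cgi.escape() did so only when quote was true. The only effect is a few extra entities in the generated HTML.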

Comment entered 2011-11-28 11:06:00 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-28 11:06:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13

(In reply to comment #12)
> (In reply to comment #11)
>
> > I got the following error when I clicked on the menu item "Update Pre-Medline
> > Citations".
>
> I fixed the problem. Please try the program again.

I still couldn't run the program. I got the following error (it seems to be the same one):

<type 'exceptions.AttributeError'> Python 2.7.1: D:\Python\python.exe
Mon Nov 28 11:04:21 2011

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

D:\Inetpub\wwwroot\cgi-bin\cdr\UpdatePreMedlineCitations.py in ()
152 updated += 1
153 except Exception, e:
=> 154 notes = '<span class="errors">%s</span>' % cgi.escape(e)
155 cdr.unlock(session, citation.cdrId)
156 html.append(u"""\
notes undefined, cgi = <module 'cgi' from 'D:\Python\lib\cgi.pyc'>, cgi.escape = <function escape>, e = Exception(u'PubmedArticle 19777539 dropped',)
D:\Python\lib\cgi.py in escape(s=Exception(u'PubmedArticle 19777539 dropped',), quote=None)
1033 If the optional flag quote is true, the quotation mark character (")
1034 is also translated.'''
=> 1035 s = s.replace("&", "&amp;") # Must be done first!
1036 s = s.replace("<", "&lt;")
1037 s = s.replace(">", "&gt;")
s = Exception(u'PubmedArticle 19777539 dropped',), s.replace undefined

<type 'exceptions.AttributeError'>: 'exceptions.Exception' object has no attribute 'replace'
args = ("'exceptions.Exception' object has no attribute 'replace'",)
message = "'exceptions.Exception' object has no attribute 'replace'"

d:\cdr\Log\tmpasnsuk.html contains the description of this error.

Comment entered 2011-11-28 11:12:18 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-28 11:12:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::14

Oops! Have to actually save the changes for them to work. :-) Please try once more.

Comment entered 2011-11-28 16:12:11 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-28 16:12:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15

Verified on Mahler. The updates were done correctly. This is very helpful. Thank you!

I have two minor requests:

1. Currently, the program runs as soon as a user clicks on the menu link. Could you make changes to the interface when you install it on Bach such that a user would need to click a submit button? Or make changes so that the program would run on the second page after the user clicks on the main menu item. It looks to me that the way it is right now, someone may run it accidentally.

2. Also, please assign a group to it when you install it on Bach.

Comment entered 2011-11-28 16:34:44 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-28 16:34:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::16

(In reply to comment #15)

> 1. Currently, the program runs as soon as a user clicks on the menu link. Could
> you make changes to the interface when you install it on Bach such that a user
> would need to click a submit button? Or make changes so that the program would
> run on the second page after the user clicks on the main menu item. It looks to
> me that the way it is right now, someone may run it accidentally.

It won't do anything if the user does not have permission to update Citation documents in the first place. Is there any drawback to having the pre-Medline citations refreshed by someone with that permission, even if it was unintentional?

> 2. Also, please assign a group to it when you install it on Bach.

It's not sufficient that the user must have permission to modify Citation documents? I would think that this permission, which is granted for members of the "Citation Maintainer" group, would be sufficient. Also, I think we should test on Franck before promoting to Bach, particularly if we're going to modify the behavior of the script.
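The argument here is that the existing permission check already serves as the gate, so no separate group or confirmation page is needed. A minimal sketch of that gate, where `can_modify_citations` is a hypothetical stand-in for the CDR's own authorization check and `refresh` for the update run:

```python
def run_if_permitted(session, can_modify_citations, refresh):
    """Run the refresh only for users allowed to modify Citation documents.

    can_modify_citations(session): hypothetical permission check (the ticket
    says the "Citation Maintainer" group grants this permission).
    refresh(session): hypothetical callable that performs the update run.
    """
    if not can_modify_citations(session):
        return "You are not authorized to update Citation documents."
    return refresh(session)
```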

Comment entered 2011-11-28 16:43:19 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-28 16:43:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17

(In reply to comment #16)

> Is there any drawback to having the pre-Medline
> citations refreshed by someone with that permission, even if it was
> unintentional?

No. I don't see any drawback if only users with that permission will be able to run the program.
>
> > 2. Also, please assign a group to it when you install it on Bach.
>
> It's not sufficient that the user must have permission to modify Citation
> documents? I would think that this permission, which is granted for members of
> the "Citation Maintainer" group, would be sufficient. Also, I think we should
> test on Franck before promoting to Bach, particularly if we're going to modify
> the behavior of the script.

That should be sufficient. Please promote to Bach.

Comment entered 2011-11-29 08:29:29 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-29 08:29:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::18

Promoted to Bach.

Comment entered 2011-11-29 11:40:07 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-11-29 11:40:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19

(In reply to comment #18)
> Promoted to Bach.

We're getting the following error:

A problem occurred in a Python script.

d:\cdr\Log\tmpbtwgwk.html contains the description of this error.
D:\Python\lib\cgitb.py:173: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
value = pydoc.html.repr(getattr(evalue, name))

Comment entered 2011-11-29 13:25:33 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-29 13:25:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::20

I knew testing on Franck was a good idea. :-) Try again.

Comment entered 2011-12-01 11:21:12 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-12-01 11:21:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::21

It is working correctly. Thank you!
I am closing this issue.
