CDR Tickets

Issue Number 2223
Summary [CTGOV Import] Modify the Import program to download new and updated data only
Created 2007-05-30 15:29:43
Issue Type Improvement
Submitted By Grama, Lakshmi (NIH/NCI) [E]
Assigned To
Status Closed
Resolved 2014-11-05 11:01:41
Resolution Won't Fix
Path /home/bkline/backups/jira/ocecdr/issue.106551
Description

BZISSUE::3279
BZDATETIME::2007-05-30 15:29:43
BZCREATOR::Lakshmi Grama
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama

We need to explore whether the daily download of data from CTGOV can be modified to download only those trials that have been updated since the last download, plus any new trials added since then. CTGOV has told us the syntax to use for incremental downloads.

Comment entered 2007-05-30 17:21:54 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2007-05-30 17:21:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

I have begun testing of the new syntax.

Comment entered 2007-06-07 12:55:31 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2007-06-07 12:55:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::2

We're going to need to coordinate this with the other issue (not yet filed, I believe) dealing with the undocumented ❓ limitation on the complexity of the queries we can submit to NLM.

Comment entered 2007-06-12 09:44:53 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2007-06-12 09:44:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

I recommend that we put off further work on this task until we've put to bed the problems with query limitations in CT.gov. Implementing this enhancement will make those problems worse.

Comment entered 2007-07-05 13:42:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2007-07-05 13:42:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::4

Lowered priority at Lakshmi's request.

Comment entered 2009-02-24 13:37:28 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2009-02-24 13:37:28
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

Removing Sheri from the CC list.

Comment entered 2009-06-30 09:40:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2009-06-30 09:40:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::6

Not an active task right now.

Comment entered 2010-11-01 16:04:05 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-11-01 16:04:05
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7

I'm guessing that this task won't be addressed any time soon.
Shouldn't it be canceled rather than kept around as a P10?

Comment entered 2010-11-01 16:45:39 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-11-01 16:45:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

Unlike some of the other tasks that you have correctly identified as obsolete (or very nearly so), such as those connected with electronic mailers which will soon be turned off (if they haven't been already), this task is for importing documents from NLM, which we don't anticipate will ever go away (please correct me if I'm wrong, Lakshmi). Also, although I'm usually the one arguing against aggressive measures to reduce disk space usage, this is one area in which we could profitably invest in work on the software that would dramatically cut down on what we need to store on the production CDR server, without any reduction in functionality or data safety. I would not want to eliminate the ability to submit a full query to NLM; rather, I would retain it as something we could use periodically to ensure that nothing has fallen through the cracks, while using the more efficient method on a daily basis.

Comment entered 2013-06-29 12:30:08 by Kline, Bob (NIH/NCI) [C]

In an email message sent in May 2007, Nick Ide provided the syntax for narrowing our queries to just new or changed trials, using the following patterns:

http://clinicaltrials.gov/ct/search?term=%22May+15%2C+2007%22+:+MAX+%5BFIRST-RECEIVED-DATE%5D

http://clinicaltrials.gov/ct/search?term=%22May+15%2C+2007%22+:+MAX+%5BLAST-CHANGED-DATE%5D

For some reason, that syntax had never been captured in this issue. If we ever actually come back and work on this (and it does seem like one of the issues on hold which might actually be worthwhile), we'll need this syntax information.
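The following is a minimal Python sketch of how a daily import job might generate the two incremental query URLs for a given cutoff date. The base URL, the field names, and the "May 15, 2007" date format are copied from the two patterns in Nick Ide's email. The function and constant names are hypothetical. The assumption that unpadded day numbers (e.g. "May 5, 2007") are accepted rests only on the single example given. Also note that the legacy /ct/search interface may no longer exist.

```python
import datetime
from urllib.parse import quote_plus

# Base URL taken from the 2007 patterns in this ticket (assumed; this
# legacy ClinicalTrials.gov search interface may since have been retired).
BASE_URL = "http://clinicaltrials.gov/ct/search"

# Field names exactly as they appear in the emailed patterns.
FIRST_RECEIVED = "FIRST-RECEIVED-DATE"
LAST_CHANGED = "LAST-CHANGED-DATE"

# Hard-coded English month names so the output does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_date(d: datetime.date) -> str:
    """Render a date like 'May 15, 2007' (assumed: no zero padding)."""
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


def incremental_url(since: datetime.date, field: str) -> str:
    """Build a query for trials whose `field` lies in the range since..MAX."""
    term = f'"{format_date(since)}" : MAX [{field}]'
    # quote_plus turns spaces into '+' and escapes the quotes, comma, and
    # brackets; ':' is left unescaped to match the published patterns.
    return f"{BASE_URL}?term={quote_plus(term, safe=':')}"


def incremental_urls(since: datetime.date) -> list[str]:
    """Queries for new trials (first received) and updated trials (last changed)."""
    return [incremental_url(since, FIRST_RECEIVED),
            incremental_url(since, LAST_CHANGED)]


if __name__ == "__main__":
    for url in incremental_urls(datetime.date(2007, 5, 15)):
        print(url)
```

For a cutoff of May 15, 2007, this reproduces the two URLs above character for character. A newly registered trial could presumably match both queries, so the importer would likely need to merge the two result sets (for example, by NCT ID) before processing them.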

Note to myself: the original email thread used the subject line "Refinement to the CT.gov search interface" and spanned the date range 2007-05-22 through 2007-06-22. All the messages are in my email archives in the cips-2007 folder.

Comment entered 2014-11-05 11:01:41 by Kline, Bob (NIH/NCI) [C]

... this task is for importing documents from NLM, which we don't anticipate will ever go away ...

Never say "never"! Looks like that assumption was wrong. Closing this ticket. 😃
