Issue Number | 3312 |
---|---|
Summary | [Terminology] Term data import from NCIt |
Created | 2011-03-01 16:11:54 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2011-04-14 10:08:09 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107640 |
BZISSUE::5004
BZDATETIME::2011-03-01 16:11:54
BZCREATOR::William Osei-Poku
BZASSIGNEE::Bob Kline
BZQACONTACT::William Osei-Poku
Users have reported terminology import errors within the last few
months (for example, OCECDR-3279, OCECDR-2980, and OCECDR-3228). The most
recent issue is that when importing C83721 (CDR696306), the full complement
of other names did not come across: only the chemical name and CAS registry
name were imported, not the brand name or common abbreviation, and the
definition was not imported either. All of the import problems have already
been reported in Bugzilla except the problem with incomplete data.
Margaret and Mary recently had a meeting with EVS staff, and below is the
email I got from Mary describing what needs to be done: contacting Larry or
Tracy Saffran for information about changes to the EVS/NCIt system, and
possibly updating the import utility in the CDR.
...............Beginning of Email............
We had a meeting this afternoon with EVS staff (Margaret Haber, Larry
Wright, Lori Whiteman, and the MD staff member, Mike someone). They seem
to think that the problem with the import function in the CDR is that it
needs to be updated in order to communicate effectively with LEX-EVS.
They are on version 6 of their DB and the importer was probably
developed for version 4.
Margaret asked me to have you put in a Bugzilla issue about this
describing how term updates don’t work at all and the intermittent
problems with importing terms new to the CDR (some elements being
imported but others not). Larry will be sent the Bugzilla issue and he
will be finding someone for Bob Kline to interface with in order to get
the import working again. Tracy Saffran was mentioned, but I’m not sure
this is the person Bob should deal with. We discussed which database at
EVS the CDR should pull from and agreed that it would be less
problematic to pull from the production database and not the
pre-production database.
.............End of email..............
BZDATETIME::2011-03-07 12:55:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::1
I examined what they're sending us and they appear to have changed the structure significantly. I reported my findings to Larry, who's going to let us know with whom we should work to resolve the problems.
BZDATETIME::2011-03-14 08:28:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::2
I am rewriting the import program to use the most recent version of the EVS service. Along the way, Larry and I are lobbying the owners of the service to introduce enough stability that we will not run into these periodic failures in the future.
I am also looking at some anomalies in Charlie's code, which are difficult to untangle because he didn't leave comments explaining what he was doing or why. For one thing, it appears that the import fails when updating an existing Term document if the term does not have a SemanticType of "Drug/agent", but no such restriction is imposed when a concept is imported for a new Term document. Furthermore, the import of a new document doesn't include a SemanticType element at all, so it's not possible to import a term and then update it with changes from the NCI thesaurus without manually editing the Term document to add the SemanticType element. Can you confirm that all of this behavior is correct? If so, I'll add some documentation to the code to explain why it works the way it does. If not, let's fix it so it works the way you want it to.
Another suspicious behavior is that the software does not create a publishable version when updating a drug term, regardless of whether a publishable version already exists. A branch of the code does create publishable versions for non-drug terms if a publishable version already exists (this branch appears to be invoked not from the Advanced Search import, but from a separate interface used for bulk updates of Term documents from the thesaurus). Is this correct? Should the software never create publishable Term documents for drugs?
A third problem I found was that the software doesn't always put OtherName elements in the right place (possibly he was confused by the ambiguity in where Comment elements can go). I have corrected this in the code and we'll want to find and fix the problems this bug introduced (see, for example, CDR42760, "aggressive, adult non-Hodgkin lymphoma" on Bach).
The modified software has been installed on Mahler, and is ready for user testing.
BZDATETIME::2011-03-23 12:32:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::3
I tried the import and update functions on Mahler with the following terms, but I am getting an error message:
CDR ID NCIt Code
690473 C95201
691443 C95207
692156 C95211
692284 C95212
Error:
Import cannot be completed since preferred names do not match....
BZDATETIME::2011-03-23 15:27:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::4
(In reply to comment #3)
> I tried the import and update functions on Mahler with the following terms but
> I am getting and error message:
>
> CDR ID NCIt Code
> 690473 C95201
> 691443 C95207
> 692156 C95211
> 692284 C95212
>
>
> Error:
>
> Import cannot be completed since preferred names do not match....
The missing part of the error message (represented by the ellipsis above) showed an empty preferred name string for the NCIt concept, which means they didn't give us back a document. This is the expected result when you try to import a concept which was only recently added to the thesaurus. Please go back and read issues #3565 and #4969. You'll need to test with concepts which aren't so new.
BZDATETIME::2011-03-23 15:38:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5
I was under the impression that this issue would be resolved by the changes you are making. According to Mary's email:
"We discussed which database at EVS the CDR should pull from and
agreed that it would be less problematic to pull from the production
database and not the pre-production database."
I was thinking that if we have the concept code, and assuming everything else is correct, we should have no problems importing. But it seems that is not the case. One of the major problems for users is knowing when to import a concept document, since EVS does not provide a schedule for when new concepts become available for CDR import. I think it would be good to discuss this with Larry.
BZDATETIME::2011-03-23 15:49:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::6
(In reply to comment #5)
> I was under the impression that this issue will be resolved with the changes
> you are making. According to Mary's email :
>
> "We discussed which database at EVS the CDR should pull from and agreed that
> it would be less problematic to pull from the production database and not
> the pre-production database."
My reading of that quote is consistent with what Lakshmi told us to do, as recorded in comment #5 of issue #3565:
"Lakshmi asked me to switch the software to pull the concepts from the older
copy of the thesaurus. I've done this on Bach and Mahler. I suggest you try a
couple on Mahler first and confirm that the results are problem-free before
using it on Bach. Let me know if this should be made permanent."
Similarly in issue #4969, my comment #4:
"I heard back from the caBIG team. They said the concepts you're not able to
import were added recently and there's a lag between when the concepts get
published and when they're available to the service. They didn't say how long
the lag typically is, so I guess you'll just have to keep trying until it
works."
... and your response in comment #5 of that issue:
"OK. Thanks!
Issue closed!"
BZDATETIME::2011-03-24 11:02:53
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::7
Here is my understanding of this. Back in 2007, when Lakshmi made the comment that Bob quoted from issue 3565, we decided that we would pull from the QA server instead of the production server, since it could take quite a while to get the new terms onto the production server and the data didn't change that much between the QA and production environments. The trade-off was that the QA server could be less reliable.
When we met with Larry recently, he recommended that we pull from the production environment, probably to overcome some of the instability issues we have been encountering. However, the problem of the data not being updated there for a while still exists. It looks like we need to talk to them again to find out when the terms and definitions get updated on the production server. We can discuss this further at today's meeting.
BZDATETIME::2011-03-24 11:30:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::8
I believe the sequence of events was different: the decision to use the newer QA database came earlier in the life of the CDR, and later, because of problems we were running into at the time of issue #3565, Lakshmi decided that we would have to switch to the production database, even though its data were older (out of date compared with the newer data in the QA database). That decision was made with the understanding that, in exchange for a more stable and reliable service, we would pay the price of a delay in being able to import newer terms. I could be completely confused, though, so I agree that a discussion at today's status meeting would be valuable.
BZDATETIME::2011-03-31 15:06:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::9
Install and test on Franck.
BZDATETIME::2011-04-01 11:57:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::10
(In reply to comment #2)
> I am also looking at some anomalies in Charlie's code, ....
In the CDR status meeting a couple of weeks ago William assured us that everything Charlie was doing (except the part where he was putting OtherName elements in the wrong place, invalidating the documents) was correct.
BZDATETIME::2011-04-01 12:00:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::11
The rewritten module has been installed on Franck, ready for user testing.
BZDATETIME::2011-04-08 15:35:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
(In reply to comment #11)
> The rewritten module has been installed on Franck, ready for user testing.
Verified on Franck and Mahler. Please promote to Bach.
BZDATETIME::2011-04-08 17:23:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::13
Promoted to Bach.
BZDATETIME::2011-04-14 10:08:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14
(In reply to comment #13)
> Promoted to Bach.
Verified on Bach. Issue closed. Thanks!