CDR Tickets

Issue Number 3102
Summary Terminology vendor filter changes
Created 2010-03-03 16:31:39
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2010-06-21 14:06:20
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107430
Description

BZISSUE::4778
BZDATETIME::2010-03-03 16:31:39
BZCREATOR::William Osei-Poku
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku

When multiple OtherNameType values are selected within a single OtherName block in a Term document, the second value (of the OtherNameType) does not display on cancer.gov. We work around this issue by adding another OtherName block and selecting the second OtherNameType value there. An example is 442270.

Please modify the vendor filter so that when multiple OtherNameType values are selected within a single OtherName block, they will all display.
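
A minimal sketch of the requested behavior, written in Python rather than in the filter's own language: each OtherName block yields one (name, type) pair per OtherNameType value, so a second value is no longer dropped. The element names OtherName, OtherTermName, and OtherNameType come from this ticket; the sample document, the drug name, and the function name are invented for illustration, and the actual vendor filter and its output format are not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical Term fragment: one OtherName block carrying two type values.
SAMPLE = """
<Term>
  <OtherName>
    <OtherTermName>Examplamab</OtherTermName>
    <OtherNameType>US brand name</OtherNameType>
    <OtherNameType>Foreign brand name</OtherNameType>
  </OtherName>
</Term>
"""

def expand_other_names(term_xml):
    """Return one (name, type) pair for every OtherNameType in every block."""
    root = ET.fromstring(term_xml)
    pairs = []
    for block in root.iter("OtherName"):
        name = (block.findtext("OtherTermName") or "").strip()
        # Loop over all type values instead of reading only one of them.
        for name_type in block.findall("OtherNameType"):
            pairs.append((name, (name_type.text or "").strip()))
    return pairs

rows = expand_other_names(SAMPLE)
for name, name_type in rows:
    print(name, "|", name_type)
```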

Comment entered 2010-03-11 18:35:55 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-11 18:35:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

The following filter has been modified:
CDR000134 - Vendor Filter: Term

This is ready for review on MAHLER.

Comment entered 2010-03-12 14:00:14 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-12 14:00:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2

(In reply to comment #1)
> The following filter has been modified:
> CDR000134 - Vendor Filter: Term
>
> This is ready for review on MAHLER.

How do you suggest that I test this one, since there is no publish preview for the term document? I filtered the document on Mahler and compared it with the filtered document on Bach, and I did not see any difference.

Comment entered 2010-03-12 14:07:37 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-12 14:07:37
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3

You could run the vendor filter for terms and look at the XML. The term documents are simple enough that comparing the XML output is straightforward.

Let me know if you need help with this.
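
One way to make the comparison mechanical is a line-based diff of the two filtered outputs. This is only a sketch using Python's standard difflib; the XML snippets are made-up stand-ins (the real vendor output format does not appear in this ticket), and the function name and server labels are illustrative.

```python
import difflib

def diff_filtered_output(old_xml, new_xml, old_label="BACH", new_label="MAHLER"):
    """Return unified-diff lines between two filtered versions of one document."""
    return list(difflib.unified_diff(
        old_xml.splitlines(), new_xml.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))

# Invented output snippets standing in for the filter results on each server.
old = "<Name>Examplamab</Name>\n<Type>US brand name</Type>"
new = old + "\n<Name>Examplamab</Name>\n<Type>Foreign brand name</Type>"

changes = diff_filtered_output(old, new)
print("\n".join(changes))
```

An empty result means the two outputs are identical line for line; lines starting with "+" exist only in the output from the modified filter.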

Comment entered 2010-03-12 14:17:39 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-12 14:17:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4

(In reply to comment #3)
> You could run the vendor filter for terms and look at the XML. The term
> documents aren't too complicated to compare the XML output.
>
> Let me know if you need help with this.

I did and saw that the US brand name and the foreign brand name appear on different rows in the output so I am assuming it is OK now.

Comment entered 2010-03-12 14:24:01 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-12 14:24:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5

(In reply to comment #4)
> (In reply to comment #3)
> > You could run the vendor filter for terms and look at the XML. The term
> > documents aren't too complicated to compare the XML output.
> >
> > Let me know if you need help with this.
>
> I did and saw that the US brand name and the foreign brand name appear on
> different rows in the output so I am assuming it is OK now.

Verified on Mahler. Please promote to Bach.

Comment entered 2010-03-12 15:10:22 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-12 15:10:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

The following filter has been modified:
CDR000134 - R9523: Vendor Filter: Term

I will need to run a diff report on FRANCK before promoting to production.

Comment entered 2010-03-12 16:00:47 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-12 16:00:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7

I ran the diff report and identified a change for 72 of the term documents to be published. For at least some of these terms, a workaround has already been used to make the second term type appear in the vendor output. I am guessing that these terms will be displayed multiple times on Cancer.gov if the documents don't get cleaned up before we move the changed vendor filter into production.
A sample of such a term that needs to be updated is CDR299061 - sunitinib malate.
The term name SU11248 will be displayed twice for the NameType of 'Code name'.

How would you like to proceed? Do we want to clean up the data before or after we publish?
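
The kind of document at risk can be illustrated with a short Python sketch: once every OtherNameType in a block is emitted, a block that was added as a workaround can produce the same (name, type) pair a second time. The sample below is an invented document with a placeholder name, not the actual content of CDR299061, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented document: the first block lists both types, and a leftover
# workaround block repeats the name with the second type.
SAMPLE = """
<Term>
  <OtherName>
    <OtherTermName>XY-123</OtherTermName>
    <OtherNameType>Abbreviation</OtherNameType>
    <OtherNameType>Code name</OtherNameType>
  </OtherName>
  <OtherName>
    <OtherTermName>XY-123</OtherTermName>
    <OtherNameType>Code name</OtherNameType>
  </OtherName>
</Term>
"""

def duplicated_pairs(term_xml):
    """Return (name, type) pairs that occur more than once across all blocks."""
    root = ET.fromstring(term_xml)
    counts = Counter(
        ((block.findtext("OtherTermName") or "").strip(), (t.text or "").strip())
        for block in root.iter("OtherName")
        for t in block.findall("OtherNameType"))
    return sorted(pair for pair, n in counts.items() if n > 1)

dupes = duplicated_pairs(SAMPLE)
print(dupes)
```

A non-empty result marks a document that would need cleanup before the changed filter goes live.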

Comment entered 2010-03-12 16:05:41 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-12 16:05:41
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8

Attachment term_ids.txt has been added with description: CDR-IDs for terms with multiple OtherNameType entries.

Comment entered 2010-03-12 16:58:59 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-12 16:58:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9

(In reply to comment #7)
> I ran the diff report and identified a change for 72 of the term documents to
> be published. At least for some of these terms a work-around has already been
> used to make the second term type appear in the vendor output. I am guessing
> that these terms will be displayed multiple times on Cancer.gov if the document
> doesn't get cleaned up before we're moving the changed vendor filter in
> production.
> A sample of such a term that needs to be updated is CDR299061 - sunitinib
> malate.
> The term name SU11248 will be displayed twice for the NameType of 'Code name'.
> How would you like to proceed? Do we want to clean-up the data after we
> published or before?

We will fix the terms before you move the vendor filter into production. I will let you know when it is completed. Thank you!

Comment entered 2010-04-01 11:54:41 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-04-01 11:54:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10

(In reply to comment #9)
> (In reply to comment #7)

>
> We will fix the terms before you move the vendor filter into production. I will
> let you know when it is completed. Thank you!

It will take a while to finish the cleanup. I will post a comment when the cleanup is finished.

Comment entered 2010-05-28 14:52:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-05-28 14:52:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11

(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #7)
>
> >
> > We will fix the terms before you move the vendor filter into production. I will
> > let you know when it is completed. Thank you!
>
>
> It will take a while to finish the cleanup. I will post a comment when the
> cleanup is finished.

The cleanup is done.

Please promote to Bach. Can you do another diff report after the change on Bach? I think that will be helpful.

Comment entered 2010-05-28 14:55:46 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-28 14:55:46
BZCOMMENTOR::Volker Englisch
BZCOMMENT::12

How about we run a diff on FRANCK before we move this to production?
We refreshed FRANCK on Wednesday, so I would guess that most of the updates were already included, unless the updates were done mostly during the last two days.

Comment entered 2010-05-28 14:59:19 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-05-28 14:59:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13

(In reply to comment #12)
> How about we run a diff on FRANCK before we move this to production?
> We refreshed FRANCK on Wednesday, so I would guess that possibly most of the
> updates were already included unless the updates were done mostly during the
> last two days.

A few of the changes were done yesterday and today.

Comment entered 2010-05-28 19:15:07 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-28 19:15:07
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14

I ran a report identifying those Term documents that are still listing two OtherName blocks for the same OtherTermName. There are still 189 of these.

Unfortunately, I'm not certain how I had selected the list earlier. I probably searched the vendor output but I'm not sure.

I am putting the SQL query in the query interface with the name
Terms with duplicate OtherName block
to identify these terms with multiple OtherName blocks.

Please let me know if you'd like to proceed with putting the filter in production or if you would first want to clean up these other terms as well.
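
For readers without access to the query interface, the general shape of such a check is a GROUP BY with a HAVING clause. The sketch below is not the saved query: it runs against an in-memory SQLite table whose name, columns, and rows are all hypothetical, and the real CDR database schema is not described in this ticket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per OtherName block in a Term document.
conn.execute(
    "CREATE TABLE other_name_block (doc_id INTEGER, block_pos INTEGER, term_name TEXT)")
conn.executemany("INSERT INTO other_name_block VALUES (?, ?, ?)", [
    (1, 1, "XY-123"),
    (1, 2, "XY-123"),      # same name in a second block of document 1
    (1, 3, "Examplamab"),
    (2, 1, "Otherdrug"),
])

# Report every document that has more than one block for the same name.
QUERY = """
SELECT doc_id, term_name, COUNT(*) AS blocks
  FROM other_name_block
 GROUP BY doc_id, term_name
HAVING COUNT(*) > 1
 ORDER BY doc_id, term_name
"""
duplicates = conn.execute(QUERY).fetchall()
print(duplicates)
```

Grouping on the name alone flags every repeated name, including repeats that are intentional; adding the name type to the GROUP BY would narrow the list to blocks that also share a type.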

Comment entered 2010-06-03 09:42:12 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-03 09:42:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15

(In reply to comment #14)
> I ran a report identifying those Term documents that are still listing two
> OtherName blocks for the same OtherTermName. There are still 189 of these.
>
> Unfortunately, I'm not certain how I had selected the list earlier. I probably
> searched the vendor output but I'm not sure.
>
> I am putting the SQL query in the query interface with the name
> Terms with duplicate OtherName block
> to identify these terms with multiple OtherName blocks.
>
> Please let me know if you'd like to proceed with putting the filter in
> production or if you would first want to clean up these other terms as well.

We are reviewing the terms. A lot of the terms have duplicate OtherName blocks because of the data imported from the NCI Thesaurus. These will continue to have duplicate OtherName blocks because we will always want to keep them separate. For example, in CDR38294 (entinostat) there are two blocks with the <OtherName> text value of 'MS 275'. One is from data imported from the NCI Thesaurus and the other is the Code name, possibly entered by users.

Would the changes you have made to the vendor filters cause duplicate OtherName blocks with the same <OtherName> value (as in the example I described above) to display as duplicate on cancer.gov? I checked on cancer.gov and it is currently not displaying any duplicate information.

Comment entered 2010-06-03 11:25:29 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-03 11:25:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::16

(In reply to comment #15)
> there are two blocks with the <OtherName> text value of 'MS 275'.

I believe you are referring to the <OtherName> blocks with the value 'MS-275'.

> Would the changes you have made to the vendor filters cause duplicate
> OtherName blocks with the same <OtherName> value (as in the example I
> described above) to display as duplicate on cancer.gov?

I cannot answer this question with certainty because not all of the data the vendor output contains for a term is displayed on Cancer.gov. It appears that lexical variants, for instance, are not listed at all. One of those 'MS-275' blocks you've mentioned for the example above is marked as lexical variant, so it probably wouldn't be displayed twice anyway.

... 10 minutes later ...

I tested on MAHLER: if two blocks have the same name and both are of the type 'Code name', they do cause duplicates on Cancer.gov.
As a sample please see
http://wwwgk.cancer.gov/drugdictionary/?CdrID=38294

Comment entered 2010-06-03 11:33:08 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-03 11:33:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17

(In reply to comment #16)
> (In reply to comment #15)
> > there are two blocks with the <OtherName> text value of 'MS 275'.
>
> I believe you are referring to the <OtherName> blocks with the value 'MS-275'.
>

Yes. Sorry about the confusion.

> > Would the changes you have made to the vendor filters cause duplicate
> > OtherName blocks with the same <OtherName> value (as in the example I
> > described above) to display as duplicate on cancer.gov?
>
> I can not answer this question with certainty because not all of the data the
> vendor output contains for a term is displayed on Cancer.gov. It appears that
> lexical variants, for instance, are not listed at all. One of those 'MS-275'
> blocks you've mentioned for the example above is marked as lexical variant, so
> it probably wouldn't be displayed twice anyway.
>
> ... 10 minutes later ...
>
> I tested on MAHLER: If you do have two blocks with the same name and they are
> both of the type of 'Code name' they do cause duplicates on Cancer.gov.
> As a sample please see
> http://wwwgk.cancer.gov/drugdictionary/?CdrID=38294

Thanks. This clarifies it. In the case of 'Code name' we will clean all that up.

(In reply to comment #14)

> Please let me know if you'd like to proceed with putting the filter in
> production or if you would first want to clean up these other terms as well.

I will let you know when these are done before you put the changes into production.

Comment entered 2010-06-03 11:42:58 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-03 11:42:58
BZCOMMENTOR::Volker Englisch
BZCOMMENT::18

(In reply to comment #17)
> In the case of 'Code name' we will clean all that up.

Just to be clear: I only tested the situation for a duplicate Code Name. I'm sure the same will also happen for a duplicate Synonym and ChemicalStructureName, although I believe it's unlikely to have two blocks listed with the same StructureName.

Let me know if you'd like me to run a test for the Synonyms as well.

Comment entered 2010-06-15 11:05:03 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-15 11:05:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19

We have completed the cleanup. Please put this change into production.

Comment entered 2010-06-16 12:35:37 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-16 12:35:37
BZCOMMENTOR::Volker Englisch
BZCOMMENT::20

The following filter has been copied to FRANCK and BACH:
CDR000134 - R9523: Vendor Filter: Term

Since we're publishing the terminology data nightly, we will be able to verify the data on Cancer.gov tomorrow.

Comment entered 2010-06-18 16:54:15 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-18 16:54:15
BZCOMMENTOR::Volker Englisch
BZCOMMENT::21

It appears to me that everything looks good on Cancer.gov.
William, if you agree please close this issue.

Comment entered 2010-06-21 14:06:20 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-06-21 14:06:20
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::22

(In reply to comment #21)
> It appears to me that everything looks good on Cancer.gov.
> William, if you agree please close this issue.

Yes. It looks good in the dictionary on cancer.gov. Thanks!
I have closed this issue.

Attachments
File Name Posted User
term_ids.txt 2010-03-12 16:05:41 Englisch, Volker (NIH/NCI) [C]
