Issue Number | 3233 |
---|---|
Summary | [Glossary] Populating Related Information Elements in Glossary Term Concept Documents |
Created | 2010-09-23 09:39:44 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2011-08-11 09:06:22 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107561 |
BZISSUE::4921
BZDATETIME::2010-09-23 09:39:44
BZCREATOR::Robin Juthe
BZASSIGNEE::Bob Kline
BZQACONTACT::Margaret Beckwith
We would like to use Wayne's completed spreadsheet of related information to populate the following elements in the Glossary Term Concept (GTC) documents:
Related Drug Summary Link
Related External Ref
Related Summary Ref
These three elements will be populated with data from the following columns in the spreadsheet (attached):
Drug Summary URL
PDQ Summary URL
Other URL & Page Title
However, for the drug summaries and PDQ summaries, rather than populating the element with the URL, I believe the element will contain a link to the corresponding CDR document. Perhaps the URL can be used to identify which document should be linked.
The Other URL and Page Title information will be used to populate the Related External Ref element. As with other external refs, the page title will populate the element and the URL will be added to the cdr:xref field in the attribute inspector.
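A minimal Python sketch of the Related External Ref mapping described above. It assumes the element tag is `RelatedExternalRef` (inferred from the prose name) and that the `cdr` prefix is bound to the namespace URI `cips.nci.nih.gov/cdr`; neither is stated in this issue. The page title becomes the element text and the URL goes into the `cdr:xref` attribute. The title below is a placeholder.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI bound to the "cdr" prefix in CDR documents.
CDR_NS = "cips.nci.nih.gov/cdr"
ET.register_namespace("cdr", CDR_NS)


def related_external_ref(page_title: str, url: str) -> ET.Element:
    """Build a RelatedExternalRef element (tag name assumed from the prose).

    The page title becomes the element text; the URL goes into cdr:xref.
    """
    elem = ET.Element("RelatedExternalRef")
    elem.set(f"{{{CDR_NS}}}xref", url)
    elem.text = page_title
    return elem


# Placeholder title; the URL is one quoted later in this issue.
ref = related_external_ref(
    "Example Page Title",
    "http://www.cancer.gov/cancertopics/understandingcancer/StemCells",
)
print(ET.tostring(ref, encoding="unicode"))
```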
Please note that this spreadsheet needs review before it is used to populate the GTC documents on BACH. However, we would like to begin discussing how this will be done and determine whether any changes to the spreadsheet will be needed.
Let's talk about this in our meeting today.
BZDATETIME::2010-09-23 09:42:40
BZCOMMENTOR::Robin Juthe
BZCOMMENT::1
Attachment DictionaryReportfromWayne.xls has been added with description: Dictionary Report From Wayne - needs review
BZDATETIME::2010-09-23 14:16:27
BZCOMMENTOR::Bob Kline
BZCOMMENT::2
Bob will add 4 columns (ID/title for summary and drug information summary) and post a new version of the spreadsheet here.
BZDATETIME::2010-09-23 14:17:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::3
Attachment lookup-urls.out has been added with description: The ones I couldn't find
BZDATETIME::2010-09-23 15:53:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::4
Attachment lookup-urls.xls has been added with description: Four extra columns added
BZDATETIME::2010-10-05 09:55:05
BZCOMMENTOR::Robin Juthe
BZCOMMENT::5
(In reply to comment #3)
> Created attachment 2004
> The ones I couldn't find
We have identified the problems with each of these 11 URLs. We think the most common reasons these were not recognized were that the URL was placed in the wrong column in the spreadsheet or that the URL had a trailing slash that was not present in the CDR. In any case, we are correcting these errors as we review the spreadsheet. We expect that the review will take about a month.
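One way to keep small differences, such as a stray character at the end of a URL, from defeating the lookup is to normalize URLs on both sides before matching. This is only a sketch: `normalize_url`, `find_cdr_id`, and the sample index are hypothetical, and it assumes the URLs stored in the CDR can be collected into a dictionary keyed by URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonical form for matching: trimmed, lowercase scheme and host,
    no trailing slash, no fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def find_cdr_id(url: str, url_index: dict) -> Optional[int]:
    """Return the CDR ID whose stored URL matches `url`, or None if none does."""
    return url_index.get(normalize_url(url))


# Hypothetical URL -> CDR ID pairs standing in for what the CDR actually stores.
stored = {"http://www.example.org/summaries/example-summary": 12345}
url_index = {normalize_url(u): cdr_id for u, cdr_id in stored.items()}

# The spreadsheet copy has a trailing slash but still matches.
print(find_cdr_id("http://www.example.org/summaries/example-summary/", url_index))
```

Normalizing both the spreadsheet value and the stored value means neither side has to be cleaned by hand before the comparison.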
BZDATETIME::2010-10-21 10:10:10
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6
Added my name to the cc list
BZDATETIME::2011-01-13 14:40:30
BZCOMMENTOR::Robin Juthe
BZCOMMENT::7
Added [Glossary] to the name of this issue.
Uploaded a spreadsheet containing the URLs and CDR IDs of related PDQ summaries and Drug Information summaries, plus other Cancer.gov URLs, for import into the Related Information section of the CDR Glossary Term Concept records.
Attachment GlossaryURLS for Upload to CDR.xls has been added with description: Glossary URL spreadsheet ready for import
BZDATETIME::2011-01-20 09:03:31
BZCOMMENTOR::Robin Juthe
BZCOMMENT::8
OK, NOW I added [Glossary] to the name of the issue.
BZDATETIME::2011-01-25 15:49:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::9
I've done a test run on Franck. Please review the results and let me know if I need to change anything. If not, I'll proceed with a live run on Franck.
http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-01-25_15-42-15
BZDATETIME::2011-01-26 14:21:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10
(In reply to comment #9)
> I've done a test run on Franck. Please review them and let me know if I need
> to change anything. If not, I'll proceed with a live run on Franck.
>
> http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-01-25_15-42-15
I have looked at some of the test results and compared them with the information in the spreadsheet. The program appears to be doing the right thing. I couldn’t figure out where the Spanish URLs were coming from since they are not in the spreadsheet :-)
I just have a few observations:
1. The page title text for the Spanish element is the same as the text for the English element. I believe this is because we don't have a separate document for the Spanish concepts. As long as the text is not published, I believe this should be fine but I just wanted to mention it.
2. There seem to be inconsistencies between the English and Spanish URLs.
For example: CDR 622260
The English URL goes directly to the "starting page":
http://www.cancer.gov/cancertopics/understandingcancer/StemCells
Whereas the Spanish URL goes to the "AllPages" location:
http://www.cancer.gov/cancertopics/understandingcancer/espanol/el-Trasplante-de-Celulas-Madre/AllPages
Quite a few URLs throughout the spreadsheet are like that.
3. There are also some URLs that produce blank pages. They appear not to be the correct URLs. Most of these URLs end with a page number.
Examples:
622088
http://www.cancer.gov/cancertopics/types/cancersbodylocation/page9
622140
http://www.cancer.gov/cancertopics/cancersbodylocation/page5
621003
http://www.cancer.gov/cancertopics/cancersbodylocation/page16
623627
http://www.cancer.gov/cancertopics/cancersbodylocation/page8
BZDATETIME::2011-01-27 12:16:48
BZCOMMENTOR::Robin Juthe
BZCOMMENT::11
(In reply to comment #10)
> 3. There are also some URLs that produce blank pages. They appear not to be the
> correct URLs. Most of these URLs are ones that end with a page number.
It looks like the cancersbodylocation pages have changed since our review: the URLs now end with the name of the specific location, for example:
http://www.cancer.gov/cancertopics/types/cancersbodylocation/head-and-neck
I will update these in the spreadsheet.
BZDATETIME::2011-01-27 12:19:53
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::12
I wonder whether this will be a problem for other entries after we switch to the new Web CMS. I don't think URLs are supposed to change, but we should check a few to make sure.
BZDATETIME::2011-01-27 12:59:59
BZCOMMENTOR::Robin Juthe
BZCOMMENT::13
I updated the CancersBodyLocation pages to reflect the new URLs, which no longer have page numbers.
Attachment GlossaryURLS for Upload to CDR - updated BodyLocation pages.xls has been added with description: Updated spreadsheet
BZDATETIME::2011-02-01 10:04:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::14
(In reply to comment #10)
> I couldn’t figure out where the Spanish URLs were coming from since
> they are not in the spreadsheet :-)
After discussion with Robin I had the Cancer.gov DB architect run a query for us to retrieve the Spanish URLs which corresponded to the English URLs in the spreadsheet.
> 1. The page title text for the Spanish element is the same as the text for the
> English element.
There are a couple of ways we could address this. One is to retrieve the page pointed to by the Spanish URL and extract the value of the <head><title/></head> element. The drawback to that approach is that at least in some cases the title is truncated (for example, "Entendiendo al Cáncer y Temas Relacionados: Entendiendo l - National Cancer Institute"). The other approach would be to retrieve one of the GlossaryTermName documents associated with the concept and pick one of the Spanish names. The drawback to this approach is that the name we'd end up with could be an arbitrary selection when there's more than one Spanish name for the concept. Of course, a third approach would be to have CIAT manually provide the Spanish string that should be used for the Spanish URL.
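The first approach (pulling the title from the Spanish page) could look roughly like this standard-library sketch. Fetching the page is left out, and `page_title` is a hypothetical helper. It can only return what the page provides, so a truncated title comes back truncated.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside the first <title> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the page title with internal whitespace collapsed."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split())


sample = (
    "<html><head><title>Entendiendo al Cáncer y Temas Relacionados: "
    "Entendiendo l - National Cancer Institute</title></head><body></body></html>"
)
print(page_title(sample))
```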
> 2. There seems to be inconsistencies between the English and Spanish URLs.
That's what's in the table on Cancer.gov. I assume that table is maintained by hand.
BZDATETIME::2011-02-01 10:06:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::15
(In reply to comment #13)
> I updated the CancersBodyLocation pages to reflect the new URLs, which no
> longer have page #s.
Would the changes you made mean the Spanish URLs I got from Cancer.gov are incorrect, or will they still be the right ones?
BZDATETIME::2011-02-03 13:47:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::16
CIAT will create a new spreadsheet with two more columns, one for an external Spanish URL, and the other for the Spanish title to be used with that URL.
BZDATETIME::2011-02-08 10:18:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::17
I've created the Media documents on Franck and run a test global on the glossary name documents. After CIAT has had a chance to check the test documents I'll run the job live on Franck.
http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-02-08_10-09-47
Vanessa does a very good job with the pronunciations. Odd, though, that "predominio" comes out sounding as if it were "predomino" in CDR694751.
BZDATETIME::2011-02-08 10:27:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::18
Oops! Ignore last comment - was attached to the wrong issue.
BZDATETIME::2011-02-10 12:06:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19
I am attaching the updated spreadsheet. I added two columns, "Spanish Page Title" and "Spanish URL", and, as Margaret suggested, sorted the spreadsheet by the "Other URL" column. I then clicked on each URL in the "Other URL" column, and for the URLs that have Spanish pages (toggle), I added the Spanish page URL and title to the spreadsheet. I also had the translators look at the titles to make sure they looked okay.
Attachment URLs for Spanish Pages.xls has been added with description: added Spanish URLs and page titles
BZDATETIME::2011-02-11 15:17:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::20
(In reply to comment #19)
> Created attachment 2079
> added Spanish URLs and page titles
>
> I am attaching the updated spreadsheet. I added two columns "Spanish Page
> Title" and "Spanish URL", and as Margaret suggested, I sorted the spreadsheet
> using the "Other URL" column as the criterion. I then clicked on all the URLs
> under the "Other URL" column. The URLs that have Spanish pages (toggle), I
> added the Spanish page URL and title to the spreadsheet. I also had the
> translators look at the titles to make sure they looked Okay.
It looks like I accidentally attached the wrong Excel file. I will attach the correct one shortly.
BZDATETIME::2011-02-11 16:50:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::21
> It looks like I accidentally attached the wrong excel file. I will attach the
> correct one shortly.
I am attaching the correct file. Sorry about the confusion.
Attachment GlossaryURLS for Upload to CDR - updated BodyLocation pages.xls has been added with description: Spanish URLs and Page titles
BZDATETIME::2011-02-18 13:06:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::22
I ran a fresh test-mode job on Franck:
http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-02-18_13-01-08
BZDATETIME::2011-02-18 13:11:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::23
(In reply to comment #22)
> I ran a fresh test-mode job on Franck:
>
> http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-02-18_13-01-08
I just realized that not all of the modifications to the script had been saved when I ran that test job, so I'm running it again:
http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-02-18_13-10-09
BZDATETIME::2011-02-23 16:32:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::24
I looked at several of the test results and they looked okay to me. Please proceed with the next steps.
BZDATETIME::2011-02-24 08:23:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::25
(In reply to comment #24)
> Please proceed with the next steps.
Live run completed on Franck.
BZDATETIME::2011-02-28 14:07:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::26
(In reply to comment #25)
> (In reply to comment #24)
> > Please proceed with the next steps.
>
> Live run completed on Franck.
I reviewed a lot of the documents and found no problems so I guess it can be promoted to Bach.
I was hoping to be able to test the links from the QC report but the QC report does not contain the links. If the link is a cdr:xref link, it shows only the title and if it is a cdr:ref link, it does not show anything. If it is okay, I will enter a new issue to modify the QC report to display the links.
BZDATETIME::2011-03-01 08:29:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::27
(In reply to comment #26)
> I reviewed a lot of the documents and found no problems so I guess it can be
> promoted to Bach.
Before I do that, just out of curiosity, why do these documents use cdr:href links for the summaries? Don't we normally use cdr:href links when marking up in-line text and cdr:ref links when creating standalone links, outside of running text?
BZDATETIME::2011-03-01 13:12:47
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::28
(In reply to comment #27)
> (In reply to comment #26)
>
> > I reviewed a lot of the documents and found no problems so I guess it can be
> > promoted to Bach.
>
> Before I do that, just out of curiosity, why do these documents use cdr:href
> links for the summaries? Don't we normally use cdr:href links when marking up
> in-line text and cdr:ref links when creating standalone links, outside of
> running text?
I believe this was at least mentioned in OCECDR-2825 and OCECDR-2849. It looks like it was not addressed because changing it would have meant doing a global.
BZDATETIME::2011-03-15 08:36:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::29
Run in test mode on Bach:
http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-03-15_08-27-18
BZDATETIME::2011-03-17 13:39:09
BZCOMMENTOR::Robin Juthe
BZCOMMENT::30
Next step is for Margaret to check with Lakshmi to find out whether the cdr:href links discussed in comment #27 should be changed to cdr:ref links.
BZDATETIME::2011-03-25 10:16:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::32
The decision was made at yesterday's status meeting that we would create a separate issue to fix the cdr:href links which should be cdr:ref links, apply that fix, and then proceed with the work for this task. When you create the new issue, please add a dependency for this task on that one.
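The cdr:href to cdr:ref fix amounts to moving one attribute on each link element. Below is a hedged ElementTree sketch. It assumes the `cdr` namespace URI is `cips.nci.nih.gov/cdr` and uses a simplified document shape. The root element, the `RelatedInformation` wrapper, and the CDR ID value are placeholders, not taken from the schema.

```python
import xml.etree.ElementTree as ET

CDR_NS = "cips.nci.nih.gov/cdr"  # assumed namespace URI for the cdr prefix
ET.register_namespace("cdr", CDR_NS)
HREF = f"{{{CDR_NS}}}href"
REF = f"{{{CDR_NS}}}ref"


def href_to_ref(root: ET.Element, tag: str = "RelatedSummaryRef") -> int:
    """Move cdr:href to cdr:ref on every `tag` element; return how many changed."""
    changed = 0
    for elem in root.iter(tag):
        target = elem.attrib.pop(HREF, None)
        if target is not None:
            elem.set(REF, target)
            changed += 1
    return changed


# Simplified placeholder document; real GTC documents have more structure.
doc = ET.fromstring(
    '<GlossaryTermConcept xmlns:cdr="cips.nci.nih.gov/cdr">'
    "<RelatedInformation>"
    '<RelatedSummaryRef cdr:href="CDR0000000000">Example summary</RelatedSummaryRef>'
    "</RelatedInformation>"
    "</GlossaryTermConcept>"
)
print(href_to_ref(doc))
print(ET.tostring(doc, encoding="unicode"))
```

Returning a count makes it easy to compare test-mode and live-mode runs.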
BZDATETIME::2011-03-25 12:18:10
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::33
(In reply to comment #32)
> The decision was made at yesterday's status meeting that we would create a
> separate issue to fix the cdr:href links which should be cdr:ref links, apply
> that fix, and then proceed with the work for this task. When you create the
> new issue, please add a dependency for this task on that one.
Done - created OCECDR-3333 and added dependency to this bug.
BZDATETIME::2011-03-25 12:34:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::34
(In reply to comment #32)
> The decision was made at yesterday's status meeting that we would create a
> separate issue to fix the cdr:href links which should be cdr:ref links, apply
> that fix, and then proceed with the work for this task. When you create the
> new issue, please add a dependency for this task on that one.
The first step in that new task was to do some research to determine how extensively the RelatedSummaryRef has already been used. I jumped ahead and did that investigation and was unable to find any instances of that element on Bach in the query_term table, though I thought someone was saying that users had already been putting the links in. Perhaps we don't need a new issue, and we can just slip in a schema change as part of this task. Let me know if I'm missing something.
BZDATETIME::2011-04-01 11:03:10
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::35
We agreed yesterday that, since we made an enhancement in OCECDR-3333 to change the Related Summary Ref links, we need to run another test on Mahler (or Franck).
BZDATETIME::2011-04-01 12:41:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::36
(In reply to comment #35)
> We agreed yesterday that since we made an enhancement in OCECDR-3333 to change
> the Related Summary Ref links we need to run another test on Mahler (or
> Franck).
Done:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-04-01_12-22-02
BZDATETIME::2011-04-05 11:46:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::37
(In reply to comment #36)
> (In reply to comment #35)
> > We agreed yesterday that since we made an enhancement in
OCECDR-3333 to change
> > the Related Summary Ref links we need to run another test on
Mahler (or
> > Franck).
>
> Done:
>
> http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-04-01_12-22-02
Verified. Please run in live mode on Mahler.
BZDATETIME::2011-04-11 13:36:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::38
(In reply to comment #37)
> Verified. Please run in live mode on Mahler.
Done.
BZDATETIME::2011-04-11 14:23:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::39
(In reply to comment #38)
> (In reply to comment #37)
>
> > Verified. Please run in live mode on Mahler.
>
> Done.
Verified. Please run in test mode on Bach.
BZDATETIME::2011-04-11 15:09:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::40
(In reply to comment #39)
> Please run in test mode on Bach.
http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-04-11_14-43-03
BZDATETIME::2011-04-11 16:27:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::41
(In reply to comment #40)
> (In reply to comment #39)
>
> > Please run in test mode on Bach.
>
> http://bach.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2011-04-11_14-43-03
Verified. Please run in live mode on Bach.
BZDATETIME::2011-04-12 14:27:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::42
(In reply to comment #41)
> Verified. Please run in live mode on Bach.
Done.
BZDATETIME::2011-04-12 15:53:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::43
(In reply to comment #42)
> (In reply to comment #41)
>
> > Verified. Please run in live mode on Bach.
>
> Done.
I have reviewed several of the changes and they all look good to me. Since I am not the QA contact for this issue, I will leave it open until instructed to close it.
BZDATETIME::2011-04-14 15:01:23
BZCOMMENTOR::Robin Juthe
BZCOMMENT::44
I also checked a few documents and they looked fine. Per the discussion in today's status meeting, this issue can be closed.
BZDATETIME::2011-07-14 17:43:56
BZCOMMENTOR::Robin Juthe
BZCOMMENT::45
I'm reopening this issue because we discovered a handful of terms that have only one related information link per language in the CDR, even though more than one related page per language was identified in the spreadsheet used to import this information. Here are a few examples:
alopecia (CDR620797)
cancer vaccine (CDR621015)
chemotherapy (CDR621208)
diarrhea (CDR622576)
eye cancer (CDR623180)
gastrointestinal tract (CDR621939)
Kaposi sarcoma (CDR621143)
I'm thinking it may make the most sense to manually add the additional links to resolve the issue, but it is probably worth investigating what went wrong.
Bob, it would also be helpful if you could generate a list of each term that has multiple links associated with it in the latest spreadsheet above (the one dated 2/11/11). An efficient way to do this might be to look for a CDR ID that appears in more than one row.
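Finding the concepts with more than one row is a counting problem. Here is a sketch that assumes the spreadsheet has been exported to CSV with a column named `CDR ID` (a hypothetical name). The URLs in the sample are placeholders.

```python
import csv
import io
from collections import Counter


def repeated_ids(rows, id_column="CDR ID"):
    """Return, sorted, the IDs that appear in more than one row."""
    ids = ((row.get(id_column) or "").strip() for row in rows)
    counts = Counter(cdr_id for cdr_id in ids if cdr_id)
    return sorted(cdr_id for cdr_id, n in counts.items() if n > 1)


# Tiny stand-in for the exported spreadsheet (placeholder URLs).
sample = """CDR ID,Other URL
620797,http://www.example.org/a
621015,http://www.example.org/b
620797,http://www.example.org/c
"""
print(repeated_ids(csv.DictReader(io.StringIO(sample))))
```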
BZDATETIME::2011-07-15 12:43:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::46
I did a quick check on which IDs appear multiple times on that spreadsheet.
Here is the list of the 30 CDR-IDs:
619364
619424
619510
619605
619800
619802
619861
619886
620053
620414
620797
620822
621015
621143
621208
621753
621939
622207
622260
622330
622476
622576
622915
622963
622963
623082
623180
623199
623640
674682
BZDATETIME::2011-07-15 13:10:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::47
I found the problem in the original script, which assumed there would be only one row in the spreadsheet for any given concept document. I agree that the best solution would be to populate the missing links by hand.
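The following is a hedged illustration of this failure mode and of a fix. Keying rows by concept ID in a plain dictionary silently keeps only the last row, while grouping keeps every row. The column names and URLs are hypothetical, and this is not the original script.

```python
from collections import defaultdict

# Hypothetical rows: one spreadsheet row per related page.
rows = [
    {"cdr_id": "620797", "url": "http://www.example.org/page-1"},
    {"cdr_id": "620797", "url": "http://www.example.org/page-2"},
    {"cdr_id": "621015", "url": "http://www.example.org/page-3"},
]

# One-row-per-concept assumption: later rows overwrite earlier ones.
one_per_concept = {row["cdr_id"]: row for row in rows}


def group_by_concept(rows):
    """Map each concept ID to all of its rows, preserving spreadsheet order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cdr_id"]].append(row)
    return dict(grouped)


all_per_concept = group_by_concept(rows)
print(one_per_concept["620797"]["url"])               # only page-2 survives
print([r["url"] for r in all_per_concept["620797"]])  # both pages kept
```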
BZDATETIME::2011-08-11 09:06:08
BZCOMMENTOR::Robin Juthe
BZCOMMENT::48
(In reply to comment #46)
> I did a quick check on which IDs appear multiple times on that spreadsheet.
> Here is the list of the 30 CDR-IDs:
Amy updated these GTC documents to add the additional related pages, so I think this issue can be closed.
BZDATETIME::2011-08-11 09:06:22
BZCOMMENTOR::Robin Juthe
BZCOMMENT::49
Closing issue.
File Name | Posted | User |
---|---|---|
DictionaryReportfromWayne.xls | 2010-09-23 09:42:40 | |
GlossaryURLS for Upload to CDR.xls | 2011-01-13 14:40:30 | |
GlossaryURLS for Upload to CDR - updated BodyLocation pages.xls | 2011-02-11 16:50:16 | Osei-Poku, William (NIH/NCI) [C] |
GlossaryURLS for Upload to CDR - updated BodyLocation pages.xls | 2011-01-27 12:59:59 | |
lookup-urls.out | 2010-09-23 14:17:22 | |
lookup-urls.xls | 2010-09-23 15:53:20 | |
URLs for Spanish Pages.xls | 2011-02-10 12:06:45 | Osei-Poku, William (NIH/NCI) [C] |