CDR Tickets

Issue Number 3360
Summary [Summaries] Pub Preview Error
Created 2011-05-16 11:47:06
Issue Type Improvement
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2011-06-21 10:19:57
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107688
Description

BZISSUE::5053
BZDATETIME::2011-05-16 11:47:06
BZCREATOR::Robin Juthe
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku

We noticed an error in Publish Preview of the Genetics of Prostate Cancer summary (CDR299612) in which text around a SummaryFragmentRef is missing, although the same text appears correctly on the live site and in the CDR. A chain of emails about this issue is pasted below.

(Volker, please revise the component if necessary--not sure if I selected the right one. Thanks.)

-----------------------------------------

> Also, Volker, can you verify that the text is missing when it comes back
> from the cdrpreviewws web service?

No, I can't.

This one is actually on me. It appears that the regular expression that converts the HTML coming back from Gatekeeper in order to make the SummaryFragmentRefs clickable is a little too greedy and strips that missing text.

Robin, could you please enter an issue in Bugzilla to fix the PublishPreview program?

Thank you,

Volker


Volker Englisch
Communications Technology Branch (CTB)
Contractor: Sapient Government Services
Email: volker@mail.nih.gov
Phone: (301) 496-0102 (CTB)

From: Learn, Blair (NIH/NCI) [C]
Sent: Monday, May 16, 2011 9:55 AM
To: Beckwith, Margaret (NIH/NCI) [E]; Englisch, Volker (NIH/NCI) [C]
Subject: RE: Pub Preview Example

Hi Margaret,

Yes, please enter a sharepoint ticket; that’s a new one on me.

A few things that would be very helpful: The document ID and a copy of the document (Volker should be able to provide this).

Also, Volker, can you verify that the text is missing when it comes back from the cdrpreviewws web service? I’d just like to be certain the problem is on our side of the wall.

Thanks!

- Blair

From: Beckwith, Margaret (NIH/NCI) [E]
Sent: Monday, May 16, 2011 9:33 AM
To: Englisch, Volker (NIH/NCI) [C]; Learn, Blair (NIH/NCI) [C]
Subject: FW: Pub Preview Example

Hi Volker and Blair,

Robin H noticed something very peculiar in one of her summaries that I wanted to ask you about. She noticed that there is actually text missing in the middle of a paragraph when she looks at it on Publish Preview, but the text is there in the CDR and also there on the live site. Weird huh? Anyway, should I put an issue into the Sharepoint site for this?

Thanks,

Margaret

From: Harrison, Robin (NIH/NCI) [E]
Sent: Friday, May 13, 2011 4:59 PM
To: Beckwith, Margaret (NIH/NCI) [E]
Subject: Pub Preview Example

Hi Margaret,

Here’s the pub preview example I mentioned:

The paragraph under the subheading “Genetic Testing” just before the Psychosocial section reads like this in Pub Preview (it’s missing text):

Genetic Testing
At this time, with the exception of prostate cancer in a family with evidence of hereditary breast/ovarian cancer (HBOC) syndrome, clinical genetic testing to detect inherited prostate cancer predisposition is not available. (Refer to the BRCA1 and BRCA2 subsection of the Prostate Cancer Susceptibility Loci section of this summary and the PDQ summary on Prostate Cancer Susceptibility Loci section of this summary for more information.) For families suspected of having an inherited susceptibility to prostate cancer, participation in ongoing research studies investigating the genetic basis of inherited prostate cancer susceptibility can be considered.

On the live site (and in the CDR), it reads fine:

Genetic Testing
At this time, with the exception of prostate cancer in a family with evidence of hereditary breast/ovarian cancer (HBOC) syndrome, clinical genetic testing to detect inherited prostate cancer predisposition is not available. (Refer to the BRCA1 and BRCA2 subsection of the Prostate Cancer Susceptibility Loci section of this summary and the PDQ summary on Genetics of Breast and Ovarian Cancer for more information about prostate cancer in HBOC.) None of the candidate susceptibility genes have been unequivocally associated with prostate cancer predisposition. (Refer to the Prostate Cancer Susceptibility Loci section of this summary for more information.) For families suspected of having an inherited susceptibility to prostate cancer, participation in ongoing research studies investigating the genetic basis of inherited prostate cancer susceptibility can be considered.

Comment entered 2011-05-18 13:50:13 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-05-18 13:50:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::1

Adding my name to the cc list.

Comment entered 2011-05-18 16:32:34 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-18 16:32:34
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

Blair didn't want to get Bugzilla emails, so I'm making William the QA.

Comment entered 2011-05-26 15:04:34 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-26 15:04:34
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3

The problem was the following:
The HTML text consists of four links, three of which are internal links (SummaryFragmentRef) and one of which is an external link (SummaryRef), similar to the example below.

text
<a href="/hostname...#Section_50">link text 1</a>
text1
<a href="/hostname...#Section_28">link text 2</a>
text2
<a href="/hostname...">link text 3</a>
text3
<a href="/hostname...#Section_28">link text 4</a>
text4

I'm using a regular expression on the href attribute to replace everything from the hostname up to the following '#Section'. For the SummaryRef, however, the '#Section' text doesn't exist, and therefore the expression removed everything up to and including the following link.
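The failure can be reproduced with a short sketch. The pattern below is an assumption for illustration, not the expression actually used in PublishPreview.py, and the paths after /hostname are made-up stand-ins for the elided URLs in the example above.

```python
import re

# Sample HTML modelled on the example above; the paths are hypothetical.
html = (
    'text '
    '<a href="/hostname/a#Section_50">link text 1</a> text1 '
    '<a href="/hostname/b#Section_28">link text 2</a> text2 '
    '<a href="/hostname/c">link text 3</a> text3 '
    '<a href="/hostname/d#Section_28">link text 4</a> text4'
)

# Hypothetical pattern: '.' may cross the closing quote of the href, so for
# the link without '#Section' the match runs on into the next link.
unbounded = re.compile(r'href="/hostname.*?#Section')
broken = unbounded.sub('href="#Section', html)

# Bounded variant: [^"] keeps the match inside a single attribute value.
bounded = re.compile(r'href="/hostname[^"]*?(?=#Section)')
fixed = bounded.sub('href="', html)

print(broken)
print(fixed)
```

In the first result, "link text 3" and "text3" are gone because the match for the third link only ends at the '#Section' of the fourth. In the bounded variant the third link has no match and is left unchanged.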

I've fixed this in the program PublishPreview.py by removing the regular expression and parsing the HTML document with the Python library lxml.html instead. This gives us greater flexibility in modifying the links we're receiving from Gatekeeper.
In addition to correctly modifying the internal links now, I have also made the GlossaryTerm links and the SummaryLinks to external documents inactive so that users won't land on an error page if they click these links.
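A minimal sketch of the parser-based approach follows, assuming the third-party lxml package is installed. The function name rewrite_links, the sample paths, and the choice to deactivate a link by dropping its href are assumptions for illustration; the actual code in PublishPreview.py is not shown in this ticket.

```python
import lxml.html


def rewrite_links(html_text):
    """Make '#Section' links page-local and deactivate all other links."""
    # Wrap the fragment in a <div> so leading text before the first tag is kept.
    root = lxml.html.fragment_fromstring(html_text, create_parent="div")
    for anchor in root.iter("a"):
        href = anchor.get("href", "")
        if "#Section" in href:
            # Internal link: keep only the fragment part of the URL.
            anchor.set("href", href[href.index("#Section"):])
        else:
            # External link: drop the href so it is no longer clickable
            # (one possible way of making a link inactive).
            anchor.attrib.pop("href", None)
    return lxml.html.tostring(root, encoding="unicode")


# Hypothetical sample modelled on the example in this comment.
sample = (
    'text <a href="/hostname/a#Section_50">link text 1</a> text1 '
    '<a href="/hostname/c">link text 3</a> text3 '
    '<a href="/hostname/d#Section_28">link text 4</a> text4'
)
result = rewrite_links(sample)
print(result)
```

Because each link is handled as its own element, a link without '#Section' cannot affect the text or links that follow it.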

This is ready for review on MAHLER.

Comment entered 2011-06-16 11:23:28 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-06-16 11:23:28
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4

(In reply to comment #3)
> This is ready for review on MAHLER.

I compared the text in pub preview on Mahler and the text from cancer.gov (Comment# 1) and I did not see any differences. If Robin agrees, this can be promoted to Bach.

Comment entered 2011-06-16 12:55:24 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2011-06-16 12:55:24
BZCOMMENTOR::Robin Juthe
BZCOMMENT::5

(In reply to comment #4)
> If Robin agrees, this can be
> promoted to Bach.

Agreed. Please promote to Bach.

Comment entered 2011-06-17 13:27:55 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-17 13:27:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

The following program has been copied to FRANCK and BACH:
PublishPreview.py - R10104

Please verify on BACH and close this bug.

Comment entered 2011-06-21 10:19:57 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2011-06-21 10:19:57
BZCOMMENTOR::Robin Juthe
BZCOMMENT::7

(In reply to comment #6)
> The following program has been copied to FRANCK and BACH:
> PublishPreview.py - R10104
> Please verify on BACH and close this bug.

Verified on Bach. Closing issue. Thanks!
