Issue Number | 3360 |
---|---|
Summary | [Summaries] Pub Preview Error |
Created | 2011-05-16 11:47:06 |
Issue Type | Improvement |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2011-06-21 10:19:57 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107688 |
BZISSUE::5053
BZDATETIME::2011-05-16 11:47:06
BZCREATOR::Robin Juthe
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku
We noticed an error in Publish Preview of the Genetics of Prostate Cancer summary (CDR299612): text around a SummaryFragmentRef is missing, although the same text displays correctly on the live site and in the CDR. A chain of emails about this issue is pasted below.
(Volker, please revise the component if necessary; I'm not sure I selected the right one. Thanks.)
-----------------------------------------
> Also, Volker, can you verify that the text is missing when it comes back from the cdrpreviewws web service?
No, I can't.
This one is actually on me. It appears that the regular expression that converts the HTML coming back from Gatekeeper in order to make the SummaryFragmentRefs clickable is a little too greedy and strips that missing text.
Robin, could you please enter an issue in Bugzilla to fix the PublishPreview program?
Thank you,
Volker
–
Volker Englisch
Communications Technology Branch (CTB)
Contractor: Sapient Government Services
Email: volker@mail.nih.gov
Phone: (301) 496-0102 (CTB)
From: Learn, Blair (NIH/NCI) [C]
Sent: Monday, May 16, 2011 9:55 AM
To: Beckwith, Margaret (NIH/NCI) [E]; Englisch, Volker (NIH/NCI) [C]
Subject: RE: Pub Preview Example
Hi Margaret,
Yes, please enter a sharepoint ticket; that’s a new one on me.
A few things that would be very helpful: The document ID and a copy of the document (Volker should be able to provide this).
Also, Volker, can you verify that the text is missing when it comes back from the cdrpreviewws web service? I’d just like to be certain the problem is on our side of the wall.
Thanks!
Blair
From: Beckwith, Margaret (NIH/NCI) [E]
Sent: Monday, May 16, 2011 9:33 AM
To: Englisch, Volker (NIH/NCI) [C]; Learn, Blair (NIH/NCI) [C]
Subject: FW: Pub Preview Example
Hi Volker and Blair,
Robin H noticed something very peculiar in one of her summaries that I wanted to ask you about. She noticed that there is actually text missing in the middle of a paragraph when she looks at it in Publish Preview, but the text is there in the CDR and also on the live site. Weird, huh? Anyway, should I put an issue into the SharePoint site for this?
Thanks,
Margaret
From: Harrison, Robin (NIH/NCI) [E]
Sent: Friday, May 13, 2011 4:59 PM
To: Beckwith, Margaret (NIH/NCI) [E]
Subject: Pub Preview Example
Hi Margaret,
Here’s the pub preview example I mentioned:
The paragraph under the subheading “Genetic Testing” just before the Psychosocial section reads like this in Pub Preview (it’s missing text):
Genetic Testing
At this time, with the exception of prostate cancer in a family with evidence of hereditary breast/ovarian cancer (HBOC) syndrome, clinical genetic testing to detect inherited prostate cancer predisposition is not available. (Refer to the BRCA1 and BRCA2 subsection of the Prostate Cancer Susceptibility Loci section of this summary and the PDQ summary on Prostate Cancer Susceptibility Loci section of this summary for more information.) For families suspected of having an inherited susceptibility to prostate cancer, participation in ongoing research studies investigating the genetic basis of inherited prostate cancer susceptibility can be considered.
On the live site (and in the CDR), it reads fine:
Genetic Testing
At this time, with the exception of prostate cancer in a family with evidence of hereditary breast/ovarian cancer (HBOC) syndrome, clinical genetic testing to detect inherited prostate cancer predisposition is not available. (Refer to the BRCA1 and BRCA2 subsection of the Prostate Cancer Susceptibility Loci section of this summary and the PDQ summary on Genetics of Breast and Ovarian Cancer for more information about prostate cancer in HBOC.) None of the candidate susceptibility genes have been unequivocally associated with prostate cancer predisposition. (Refer to the Prostate Cancer Susceptibility Loci section of this summary for more information.) For families suspected of having an inherited susceptibility to prostate cancer, participation in ongoing research studies investigating the genetic basis of inherited prostate cancer susceptibility can be considered.
BZDATETIME::2011-05-18 13:50:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::1
Adding my name to the cc list.
BZDATETIME::2011-05-18 16:32:34
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2
Blair didn't want to get Bugzilla emails, so I'm making William the QA contact.
BZDATETIME::2011-05-26 15:04:34
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3
The problem was the following:
The HTML text contains four links, three of which are internal links (SummaryFragmentRef) and one of which is an external link (SummaryRef), similar to the example below:
text
<a href="/hostname...#Section_50">link text 1</a>
text1
<a href="/hostname...#Section_28">link text 2</a>
text2
<a href="/hostname...">link text 3</a>
text3
<a href="/hostname...#Section_28">link text 4</a>
text4
I was using a regular expression on the href attribute to replace everything from the hostname up to the following '#Section'. The SummaryRef, however, has no '#Section' text, so the expression kept matching past it and removed everything from that href up to the '#Section' in the following link.
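The failure can be reproduced on the sample links above. The original pattern is not shown in this issue, so `LOOSE` below is a hypothetical reconstruction, and `strip_host` is a hypothetical helper; the `/hostname...` strings are the placeholders from the example, used verbatim. Any pattern whose wildcard can cross the closing quote of the href will run into the next link when an href has no '#Section'. `TIGHT` is included only to show why the match over-ran; the fix actually taken was to stop using a regular expression.

```python
import re

# Sample from the comment: links 1, 2 and 4 are SummaryFragmentRefs,
# link 3 is a SummaryRef whose href has no '#Section' part.
HTML = (
    'text <a href="/hostname...#Section_50">link text 1</a> text1 '
    '<a href="/hostname...#Section_28">link text 2</a> text2 '
    '<a href="/hostname...">link text 3</a> text3 '
    '<a href="/hostname...#Section_28">link text 4</a> text4'
)

# Hypothetical reconstruction of the faulty pattern: '.*?' may match quotes
# and tags, so for link 3 it keeps going until the '#Section' inside link 4.
LOOSE = re.compile(r'href="/.*?#Section')

# '[^"#]*' cannot leave the current attribute value, so an href without
# '#Section' simply fails to match and is left untouched.
TIGHT = re.compile(r'href="/[^"#]*#Section')


def strip_host(pattern: re.Pattern, html: str) -> str:
    """Replace each matched href prefix with a same-page anchor prefix."""
    return pattern.sub('href="#Section', html)


print(strip_host(LOOSE, HTML))
print(strip_host(TIGHT, HTML))
```

With `LOOSE`, "link text 3" and "text3" disappear from the output, which is the same kind of gap seen in the Pub Preview paragraph; with `TIGHT`, all four links and their surrounding text survive.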
I've fixed this in the program PublishPreview.py by removing the regular expression and parsing the HTML document with the Python library lxml.html instead. This gives us more flexibility in modifying the links we receive from Gatekeeper.
In addition to modifying the internal links correctly, I have also made the GlossaryTerm links and the SummaryLinks to external documents inactive, so that users won't land on an error page if they click these links.
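The parse-then-rewrite approach described above can be sketched as follows. This is not the PublishPreview.py code: `rewrite_links` is a hypothetical helper, and it uses the standard-library `xml.etree.ElementTree` in place of lxml.html (lxml implements the same element API: `iter`, `get`, `set`, `attrib`). It assumes the Gatekeeper fragment is well-formed XHTML, that an href containing '#Section' marks an internal link, and that every other link (SummaryRef, GlossaryTerm) should be made inactive.

```python
import xml.etree.ElementTree as ET


def rewrite_links(fragment: str) -> str:
    """Hypothetical sketch: fix internal links, deactivate all others.

    Assumes the fragment is well-formed XHTML so ElementTree can parse it.
    """
    # Wrap in one root element so a fragment with sibling nodes parses.
    root = ET.fromstring(f"<div>{fragment}</div>")
    for a in root.iter("a"):
        href = a.get("href")
        if href is None:
            continue
        if "#Section" in href:
            # Internal link (SummaryFragmentRef): keep only the fragment so
            # it jumps to the section anchor within the preview page.
            a.set("href", "#" + href.split("#", 1)[1])
        else:
            # Any other link: drop the href so it is no longer clickable,
            # while the link text itself stays in the document.
            del a.attrib["href"]
    # Serialize the wrapper's leading text, then each child; tostring()
    # includes each child's tail text, so nothing between links is lost.
    parts = [root.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in root)
    return "".join(parts)


sample = ('text <a href="/hostname...#Section_50">link text 1</a> text1 '
          '<a href="/hostname...">link text 3</a> text3')
print(rewrite_links(sample))
```

Because each link is handled as a parsed element, a link without '#Section' can no longer affect its neighbors. With lxml.html, the parsing step would instead use `lxml.html.fragment_fromstring(fragment, create_parent='div')`, which also tolerates HTML that is not well-formed XML.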
This is ready for review on MAHLER.
BZDATETIME::2011-06-16 11:23:28
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4
(In reply to comment #3)
> This is ready for review on MAHLER.
I compared the text in Pub Preview on Mahler with the text from cancer.gov (comment #1) and did not see any differences. If Robin agrees, this can be promoted to Bach.
BZDATETIME::2011-06-16 12:55:24
BZCOMMENTOR::Robin Juthe
BZCOMMENT::5
(In reply to comment #4)
> If Robin agrees, this can be promoted to Bach.
Agreed. Please promote to Bach.
BZDATETIME::2011-06-17 13:27:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6
The following program has been copied to FRANCK and BACH:
PublishPreview.py - R10104
Please verify on BACH and close this bug.
BZDATETIME::2011-06-21 10:19:57
BZCOMMENTOR::Robin Juthe
BZCOMMENT::7
(In reply to comment #6)
> The following program has been copied to FRANCK and BACH:
> PublishPreview.py - R10104
> Please verify on BACH and close this bug.
Verified on Bach. Closing issue. Thanks!