Issue Number | 3547 |
---|---|
Summary | URL Check report not working |
Created | 2012-10-01 14:37:04 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2012-12-13 11:57:26 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107875 |
BZISSUE::5244
BZDATETIME::2012-10-01 14:37:04
BZCREATOR::William Osei-Poku
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku
The URL Check report under CIAT/OCCM Staff > Reports > General Reports is not working as expected. I tested this on Mahler and Bach and in both cases, it did not send out the email it's supposed to send after it has completed running.
BZDATETIME::2012-10-01 14:57:03
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1
It appears that this report last ran successfully in Feb. 2009.
Back then there were around 12,000 URLs to be checked. The next
time this report was started was in Oct. 2010 with 22,000 URLs to be
checked.
It appears that all reports that have been started since 2009
failed after checking around 13,000 URLs.
BZDATETIME::2012-10-04 18:09:09
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2
Attached is a sample of the URL Check report (limited to 500 entries).
As discussed at our status meeting please take a look at this and identify what kind of changes you'd like to have implemented for the report and/or the user interface. We had talked about running this report by document type only. Currently, the report runs against all document types.
Attachment UrlCheck-4456.html has been added with description: URL Check Report (Mahler)
BZDATETIME::2012-10-17 12:39:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::3
Question:
Could you modify the report to also check for changes in the titles of
the pages? We normally copy the title of the page and enter it in the URL
element in the CDR exactly as it appears on the page. The URL itself
goes into the attribute inspector.
Here are the changes we would like to make:
1. Run the report by document type
2. For summaries and Glossaries, we want to be able to run it by the
Audience and Language
3. Instead of displaying the results of the report in the browser, we
want it to be emailed to the user.
BZDATETIME::2012-10-23 09:43:07
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
As discussed at last week's status meeting the document title - as specified in the Title element of the Header block and displayed in the browser toolbar - could be checked but the title of the text, because it could be displayed within H1, H2, P, B tags or even be an image, cannot be checked automatically.
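A minimal sketch of the check that is feasible, assuming Python 3 and the standard-library `html.parser`. The names `TitleParser`, `extract_title` and `title_matches` are hypothetical and are not taken from CheckUrls.py. Only the `<title>` element is read, so a heading shown in H1, H2, P, B tags or in an image is not examined.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so line breaks in the markup do not matter.
        return " ".join("".join(self._parts).split())


def extract_title(html_text):
    """Return the page's <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title


def title_matches(html_text, stored_title):
    """Compare the page's <title> with the title stored in the CDR document."""
    return extract_title(html_text) == " ".join(stored_title.split())
```

A mismatch would only indicate that the document title changed; it says nothing about the visible heading on the page.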
BZDATETIME::2012-11-14 19:42:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5
(In reply to comment #3)
> 3. Instead of displaying the results of the report in the browser, we want it
> to be emailed to the user.
Are you saying instead of receiving the email with a link to the report you would like the content of the report included in the email body?
In what type of format would you like the report to be displayed? ASCII, CSV, HTML, other? I'm asking because I'm not sure that a tabular report within an email message will be more convenient than a link to the report.
BZDATETIME::2012-11-15 11:19:42
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6
(In reply to comment #5)
> (In reply to comment #3)
> > 3. Instead of displaying the results of the report in the browser, we want it
> > to be emailed to the user.
>
> Are you saying instead of receiving the email with a link to the report you
> would like the content of the report included in the email body?
>
An email with the link to the report should be fine.
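A minimal sketch of such a notification, assuming Python 3 and the standard-library `email.message` module. The function name, the sender address and the subject line are placeholders, not values used by the CDR; sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_report_email(recipient, report_url, sender="cdr@example.org"):
    """Build a plain-text message that links to the finished report.

    The sender address and subject are placeholders for illustration only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "URL Check report is ready"
    msg.set_content(
        "The URL Check report has finished.\n\n"
        f"View it here: {report_url}\n"
    )
    return msg
```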
BZDATETIME::2012-11-26 14:55:32
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7
The report has been modified as requested and is ready to be tested on MAHLER.
BZDATETIME::2012-11-26 17:41:55
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::8
(In reply to comment #7)
> The report has been modified as requested and is ready to be tested on MAHLER.
I tested this on Mahler and it is working pretty well and it is very fast also. Could you limit the doc types in the drop down menu to only the following?
1. Glossary Term Concept
2. Summary
3. InScopeProtocol
4. CTGovProtocol
5. Drug Information Summary
6. Person
7. Clinical Trials Search String
8. Citation
9. Miscellaneous Documents
10. Organization
However, you can let the All Types run against all document types. The list appears to be too long and there are some we know don't have URLs.
Also, in the user interface, could you make the default Language and Audience selections blank?
BZDATETIME::2012-11-27 09:37:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9
(In reply to comment #8)
> I tested this on Mahler and it is working pretty well and it is very fast
Unfortunately, it is only this fast because I limited the SQL query to a few hundred documents for testing. I've removed that limitation now.
> However, you can let the All Types run against all document types.
Sorry, but the 'All Types' was a left-over from some pasted code. For this report there is no 'All Types' since we've decided to run this by single document type.
The additional changes are ready for review on MAHLER.
BZDATETIME::2012-11-27 11:15:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10
(In reply to comment #9)
> (In reply to comment #8)
> The additional changes are ready for review on MAHLER.
The interface 'forces' me to choose the Language and Audience before it will run without an error even though some of the document types don't have Language and Audience attributes. Could you allow the program to run successfully for document types that don't have the attributes without requiring the Language and Audience?
Also, for those that need to use the Lang and Audience attributes, if they are not selected, a CGI error is displayed. Could you prompt users to make a selection instead of displaying the CGI script error?
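A minimal sketch of the validation being requested, in Python. The function name and the messages are hypothetical. The set of document types that need Language and Audience is an assumption based on comment #3 (summaries and glossaries), using the names from the drop-down list in comment #8.

```python
# Assumption: only these document types carry Language and Audience (comment #3).
NEEDS_LANG_AUDIENCE = {"Summary", "Glossary Term Concept"}


def validate_selection(doc_type, language="", audience=""):
    """Return messages to show the user; an empty list means the form is valid."""
    problems = []
    if not doc_type:
        problems.append("Please select a document type.")
        return problems
    if doc_type in NEEDS_LANG_AUDIENCE:
        # Language and Audience are required only where they apply.
        if not language:
            problems.append("Please select a language.")
        if not audience:
            problems.append("Please select an audience.")
    return problems
```

With a check like this, the form can be redisplayed with the returned messages instead of letting the script fail.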
BZDATETIME::2012-11-27 14:34:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::11
The additional changes are ready for review on MAHLER.
BZDATETIME::2012-11-27 15:03:27
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
(In reply to comment #11)
> The additional changes are ready for review on MAHLER.
Thanks! It is working as expected. I will do a few more tests and have it promoted to Bach tomorrow.
BZDATETIME::2012-11-28 10:59:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::13
(In reply to comment #12)
> (In reply to comment #11)
> > The additional changes are ready for review on MAHLER.
>
> Thanks! It is working as expected. I will do a few more tests and have it
> promoted to Bach tomorrow.
Verified on Mahler. Please promote to Bach.
BZDATETIME::2012-11-28 11:51:03
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14
The following programs have been copied to FRANCK and BACH:
CheckUrls.py - R10842
CdrLongReports.py - R10843
Please verify on BACH and close this bug.
BZDATETIME::2012-11-29 14:23:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15
As reported in the meeting this afternoon, some of the URLs are coming up as inactive but when copied from the report page and pasted in the address bar, they come up as active. Here are a few of them:
62959 http://m.cancer.gov/topics/treatment/bycancer/rectal/Patient
404: Not Found MobileURL
62960 http://m.cancer.gov/topics/treatment/bycancer/esophageal/Patient
404: Not Found MobileURL
62961 http://m.cancer.gov/topics/treatment/bycancer/cervical/Patient
404: Not Found MobileURL
62962 http://m.cancer.gov/topics/treatment/bycancer/child-brain-stem-glioma/Patient
404: Not Found MobileURL
446198 http://www.cancer.gov/cam/clinicaltrials_intro.html
404: Not Found ExternalRef
446574 http://www.cancer.gov/cam/clinicaltrials_pdq.html
404: Not Found ExternalRef
446574 http://www.cancer.gov/cam/bestcase_intro.html
404: Not Found ExternalRef
446574 http://nccam.nih.gov/research/clinicaltrials
404: Not Found ExternalRef
446574 http://www.cancer.gov/cam/clinicaltrials_intro.html
404: Not Found ExternalRef
446574 http://nccam.nih.gov/health/decisions/consideringcam.htm
404: Not Found ExternalRef
446574 http://nccam.nih.gov/health/decisions/practitioner.htm
404: Not Found ExternalRef
BZDATETIME::2012-12-03 16:29:27
BZCOMMENTOR::Volker Englisch
BZCOMMENT::16
(In reply to comment #15)
> As reported in the meeting this afternoon, some of the URLs are coming up as
> inactive but when copied from the report page and pasted in the address bar,
> they come up as active.
I have made some minor changes, namely switching to an updated class in Python's httplib module. This seemed to resolve the problem for one of the mobile URLs I used for testing.
Please rerun the report and let me know if this change solves the problem.
BZDATETIME::2012-12-05 10:56:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17
(In reply to comment #16)
> Please rerun the report and let me know if this change is solving the problem.
It is a lot better. Nearly all the mobile URLs which are live are gone from the report. However, some live links are still being reported as "Bad Request". For example:
http://www.cancer.gov/cancertopics/druginfo/colorectalcancer#dal1 400: Bad Request
Also, I am not sure why the ones designated as "302:Found" are being reported as inactive. For example:
62955 http://www.cancer.gov/cancertopics/factsheet/therapy/sentinel-node-biopsy 302: Found
BZDATETIME::2012-12-05 11:50:30
BZCOMMENTOR::Volker Englisch
BZCOMMENT::18
Please keep in mind that the report doesn't just list pages that are inaccessible (status code 404 - Not Found) but lists every page that returns anything other than OK (status code 200).
302 Moved Temporarily
The page has been redirected; that's why the URL appears to be
valid.
The URL you're following is
http://www.cancer.gov/cancertopics/factsheet/therapy/sentinel-node-biopsy
but the page presented is
http://www.cancer.gov/cancertopics/factsheet/detection/sentinel-node-biopsy
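A minimal sketch of this behaviour, assuming Python 3, where the httplib module is named `http.client`. The function names are hypothetical and this is not the code in CheckUrls.py. `http.client` does not follow redirects on its own, so a 302 is reported as such, together with the `Location` header if the server sends one.

```python
import http.client
from urllib.parse import urlsplit


def describe_status(status, reason, location=None):
    """Return None for 200 OK, otherwise the text a report row might show."""
    if status == 200:
        return None
    text = f"{status}: {reason}"
    if 300 <= status < 400 and location:
        text += f" (redirects to {location})"
    return text


def check_url(url, timeout=10):
    """Send one HEAD request without following redirects (needs network access)."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        # Only the path and query string are sent in the request line.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return describe_status(response.status, response.reason,
                               response.getheader("Location"))
    finally:
        conn.close()
```

A browser follows the redirect silently and shows the new page, which is why the same URL looks fine when pasted into the address bar.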
400 Bad Request: The request cannot be fulfilled due to bad
syntax.
It is possible that another client (a.k.a. browser) tries, and even
manages, to retrieve this page, but that doesn't change the fact that
there's a problem with the URL. In the case of the URL
http://www.cancer.gov/cancertopics/druginfo/colorectalcancer#dal1
the link target 'dal1' must be listed in the document as a unique ID
attribute value but it exists multiple times. Your browser probably
picks the first one which may or may not be the correct target.
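A minimal sketch of how one might confirm a duplicated link target, assuming Python 3 and the standard-library `html.parser`. The names `IdCollector` and `fragment_target_count` are hypothetical. Only `id` attributes are counted; older `<a name="...">` anchors are ignored.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Count how often each id attribute value occurs in a page."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags via the default handle_startendtag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def fragment_target_count(html_text, fragment):
    """Return how many elements carry the given fragment as their id."""
    collector = IdCollector()
    collector.feed(html_text)
    return collector.ids[fragment]
```

A count of 1 means the target is unique, 0 means it is missing, and anything higher means the fragment is ambiguous.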
BZDATETIME::2012-12-05 17:07:35
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19
Thanks for the clarification. I will close this bug after we've done a few more reviews.
BZDATETIME::2012-12-13 11:57:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::20
Verified on Bach. Thank You! Closing issue.
File Name | Posted | User |
---|---|---|
UrlCheck-4456.html | 2012-10-04 18:09:09 | Englisch, Volker (NIH/NCI) [C] |