Issue Number | 2979 |
---|---|
Summary | New Summary Document Not Published to Cancer.gov |
Created | 2009-10-05 15:12:47 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2010-05-17 11:26:22 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107307 |
BZISSUE::4655
BZDATETIME::2009-10-05 15:12:47
BZCREATOR::Volker Englisch
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku
On Friday, Margaret and Anne identified a Summary document (CDR614165) on BACH that had been published to the vendors but did not appear on Cancer.gov.
After some investigation we identified that this summary - a new
document - had been picked up for publishing on Aug 21st and got pushed
to Gatekeeper.
However, the load of the document failed due to a 'pretty-URL unique
value' failure. This failure was reported back to us.
At this point, the summary probably should have been corrected and
manually republished.
Once the document was corrected the users expected the publishing job to
pick up the summary for publishing because it had not been published
yet. However, due to the earlier publishing event this document already
existed in the table pub_proc_cg, but our queries of the publishing
document identify a document as new (to be picked up as part of the
nightly updates) if it doesn't exist in the table pub_proc_cg.
Therefore, this document was not published as part of a nightly job but
only as part of the weekly publishing job.
Obviously, this situation does not occur frequently but we could adjust the SQL queries for picking up new documents to adjust for this situation.
I've modified the original query (for summaries) from the
following:
SELECT TOP ?NumDocsPerDocType? d.id, max(v.num)
FROM doc_version v
JOIN document d
ON v.id = d.id
JOIN doc_type t
ON d.doc_type = t.id
WHERE t.name = 'Summary'
AND d.active_status = 'A'
AND v.publishable = 'Y'
AND v.val_status = 'V'
AND v.updated_dt <= '?MaxDocUpdatedDate?'
AND v.id NOT IN (SELECT id FROM pub_proc_cg)
AND d.id NOT IN (
SELECT doc_id
FROM query_term_pub
WHERE path = '/Summary/WillReplace/@cdr:ref'
)
GROUP BY d.id
ORDER BY d.id
to this query, which would also pick up documents that do exist in
the pub_proc_cg table but for which no successful push job to Cancer.gov
exists.
SELECT d.id, max(v.num)
FROM doc_version v
JOIN document d
ON v.id = d.id
JOIN doc_type t
ON d.doc_type = t.id
WHERE t.name = 'Summary'
AND d.active_status = 'A'
AND v.publishable = 'Y'
AND v.val_status = 'V'
AND v.updated_dt <= '?MaxDocUpdatedDate?'
AND d.id NOT IN (
SELECT doc_id
FROM query_term_pub
WHERE path = '/Summary/WillReplace/@cdr:ref'
)
AND (v.id NOT IN (SELECT id FROM pub_proc_cg)
OR
v.id IN (SELECT DISTINCT ppc.id
FROM pub_proc_cg ppc
JOIN pub_proc_doc ppd
ON ppc.id = ppd.doc_id
JOIN pub_proc pp
ON ppd.pub_proc = pp.id
AND pp.pub_subset LIKE 'Push_%'
WHERE ppd.failure = 'Y'
AND NOT EXISTS (SELECT 'x'
FROM pub_proc_doc i
JOIN pub_proc ipp
ON ipp.id = i.pub_proc
AND ipp.pub_subset LIKE 'Push_%'
WHERE i.doc_id = ppd.doc_id
AND i.failure IS NULL
)
)
)
GROUP BY d.id
ORDER BY d.id
The query adds about 2-5 minutes for each doc_type as it's written but it might be possible to speed things up a little more should we decide to modify the publishing queries.
I've tested this with the summaries only for now but more testing would be needed.
BZDATETIME::2009-10-12 10:53:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::1
I have added the email communication between Volker and me regarding the reported problem below. As we discussed in the CDR meeting last Thursday, we (CIAT) knew what the problem was and tried to fix it but failed to fix it correctly. I have taken note of this so that if another problem comes up similarly to this, we will adequately address it. Thank you.
------Original Message
From: Englisch, Volker (NIH/NCI) [C] volker@mail.nih.gov
Sent: Monday, August 24, 2009 10:19 AM
To: Osei-Poku, William
Subject: RE: Problems with loading of job 6426 to Cancer.gov
The document CDR614165 could not get loaded because another document with the identical SummaryURL already exists: CDR62781.
Here is the error message from Gatekeeper:
Database Error: Checking prettyURL duplication failed.
Document CDRID=614165.
Database Error: The prettyURL
/cancertopics/pdq/treatment/child-cerebellar-astrocytoma/healthprofessional
is not unique. It is used by another Summary/DrugInfoSummary
document. Reference document CDRID=62781.
–
Volker Englisch
Contractor - Lockheed Martin
phone: (301) 496-0102 (CTB)
mailto:volker@mail.nih.gov
------Original Message
From: Osei-Poku, William william.osei-poku@lmco.com
Sent: Monday, August 24, 2009 7:25 AM
To: Englisch, Volker (NIH/NCI) [C]
Subject: RE: Problems with loading of job 6426 to Cancer.gov
Hi Volker,
Do you know what the error means?
I filtered the document and there were no errors.
Validation results for CDR614165
Document is valid: no errors, no warnings!
Thanks,
William
------Original Message
From: cdr@bach.nci.nih.gov cdr@bach.nci.nih.gov
Sent: Friday, August 21, 2009 11:58 PM
To: bkline@rksystems.com; vrmeyer@comcast.net; volker@mail.nih.gov;
Osei-Poku, William
Subject: Problems with loading of job 6426 to Cancer.gov
1 failures and 0 warnings were encountered in the loading of
documents
for job 6426 to Cancer.gov.
Please visit the following link for further details:
BZDATETIME::2010-03-24 13:09:01
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2
I'm changing this to a P6.
This situation comes up very rarely (maybe once or twice a year) and is
fixed automatically by the weekly publishing job.
BZDATETIME::2010-04-29 11:20:28
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3
The original query runs in 3 seconds.
The new query runs in 230 seconds.
I found out that the new query will run in 10 sec if the 'OR'
statement is split into two queries joined by a UNION.
This is the version I am going to include into our publishing
document.
BZDATETIME::2010-05-10 17:33:18
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
I've updated the Summaries SQL query for the Interim-Export
publishing job.
The document CDR178 has been replaced on production (current CDR version
52).
I'll monitor during this week before closing the issue.
BZDATETIME::2010-05-17 11:26:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5
Publishing is now running for a week without any problems.
Closing issue.
Elapsed: 0:00:00.000756