Issue Number | 3129 |
---|---|
Summary | Publishing Organization with attached MP3 files |
Created | 2010-04-15 16:06:19 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2010-06-18 16:25:29 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107457 |
BZISSUE::4806
BZDATETIME::2010-04-15 16:06:19
BZCREATOR::William Osei-Poku
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku
Below is the initial email from Volker about the errors encountered while publishing Organization records that have Media documents attached:
........
We probably want to talk about this publishing error at our status
meeting.
It appears that the MP3 Media document has been made publishable because
it has been attached to the publishable organization document (the PDQ
Treatment Board).
Maybe it's time to talk about removing the organizations from the
publishing process?
–
Volker Englisch
Contractor - Lockheed Martin
phone: (301) 496-0102 (CTB)
mailto:volker@mail.nih.gov
There was also an email from Bob emphasizing the need to address this issue:
--------
There is one issue which needs to be discussed before too long: now that
Media documents are being linked by Organization documents (for the
board meeting recordings), those Media documents are being exported as
part of the publication. As Volker pointed out elsewhere, this might be
a good time to speed up the timing of our decision to discontinue
publishing of Organization documents.
--
Bob Kline
http://www.rksystems.com
mailto:bkline@rksystems.com
BZDATETIME::2010-04-29 18:20:28
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1
As discussed in our status meeting, we are ready to stop publishing organization documents for licensees and Cancer.gov. Once we've identified that Cancer.gov isn't using the organization information, we'll make the necessary changes at our end.
BZDATETIME::2010-05-07 11:33:00
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2
(In reply to comment #1)
> Once we've identified that Cancer.gov isn't using the organization
> information we'll make the necessary changes at our end.
Blair has looked at the GateKeeper code; the following emails summarize his findings:
------Original Message
From: Learn, Blair (NIH/NCI) [C]
Sent: Thursday, May 06, 2010 12:27 PM
To: Englisch, Volker (NIH/NCI) [C]
Cc: Gammell, Mini (NIH/NCI) [C]
Subject: RE: Organization Publishing
The organization documents are used in protocol search. Definitely lead organization, possibly institution names, possibly in displaying individual protocols as well. There are nine stored procedures which reference the organizationName table, so far I've only traced down the place where the list of lead organizations uses it.
... and another email ...
------Original Message
From: Learn, Blair (NIH/NCI) [C]
Sent: Thursday, May 06, 2010 12:45 PM
To: Englisch, Volker (NIH/NCI) [C]
Cc: Gammell, Mini (NIH/NCI) [C]
Subject: RE: Organization Publishing
Also used for Institution names. Haven't tracked down the other places yet, but I have a distinct impression that they're kinda important....
BZDATETIME::2010-05-13 19:10:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3
Since it appears that we will have to continue publishing Org documents for the moment, we want to move to Plan B and change the publishing process so that the MP3 files will not get published.
One option would be to modify our software and allow MP3 files to be
published.
However, I don't think that anybody would want to include these big
audio files in the publishing output.
Another option would be to modify our publishing document and update the SELECT statement so that MP3 (or sound) Media documents are excluded from processing. However, this would require updating the Organization vendor filter and dropping any MediaLink blocks pointing to MP3 files, or possibly all MediaLink blocks, from Organization documents.
Maybe someone can think of another option to address this problem?
BZDATETIME::2010-05-25 16:33:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
I've modified the following publishing documents to suppress publishing of the sound Media documents:
Primary - CDR178
QcFilterSets - CDR257983
I will now need to modify the organization filters to drop the MediaLink block if it points to one of the sound files.
BZDATETIME::2010-05-26 17:12:51
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5
(In reply to comment #3)
> Another option would be to modify our publishing document and update the
> SELECT statement so that MP3 (or sound) Media documents are excluded from
> processing. However, this would require updating the Organization vendor
> filter and dropping any MediaLink blocks pointing to MP3 files, or possibly
> all MediaLink blocks, from Organization documents.
This last sentence turned out to be incorrect. The MediaLink elements appear only within the PDQBoardInformation block, and since this block isn't exported to the vendor output, there is nothing to do with regard to the Organization vendor filters.
I successfully ran test jobs on FRANCK with both the old and the new publishing document. This change will actually reduce the run time of the publishing job by about an hour, because with the old document each of the large MP3 files is extracted from the CDR as part of publishing before it is determined that MP3 files are not supported in the vendor output.
I am not sure how William should QC the change, but here is the output of the filter failure report on FRANCK:
Job run with old publishing document
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7130&type=FilterFailure
Job run with new publishing document
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7131&type=FilterFailure
BZDATETIME::2010-05-26 17:20:05
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6
(In reply to comment #5)
> Job run with old publishing document
> http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7130&type=FilterFailure
> Job run with new publishing document
> http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7131&type=FilterFailure
A diff between the output for both jobs showed no differences.
BZDATETIME::2010-05-28 10:47:56
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7
Should I go ahead and update the publishing documents in time for tonight's weekly job?
BZDATETIME::2010-06-07 17:39:14
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8
If nobody objects, I would like to put this change into production, since the current setup unnecessarily extends the processing time of the publishing job.
Any objections?
BZDATETIME::2010-06-11 18:02:13
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9
The two publishing documents have been updated on BACH to exclude a Media document from being selected for publishing if it represents an audio file (path = /Media/PhysicalMedia/SoundData/SoundEncoding):
Primary - CDR178
QcFilterSets - CDR257983
This solution works as long as we're not trying to publish other audio files as part of our documents (e.g., pronunciation of terms).
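As a rough illustration of the selection change described above, here is a minimal Python sketch using an in-memory SQLite database. It assumes a simplified, hypothetical `document` table and a `query_term` table that indexes element paths per document; the actual SELECT statements in CDR178 and CDR257983 are not reproduced in this issue, and the real CDR schema may differ. Only the path `/Media/PhysicalMedia/SoundData/SoundEncoding` comes from the text above.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the CDR tables; the real schema
# and the real publishing-document SELECT statement may differ.
SCHEMA = """
CREATE TABLE document (id INTEGER PRIMARY KEY, doc_type TEXT);
CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT);
"""

# The path used to recognize audio Media documents (from the issue text).
SOUND_PATH = "/Media/PhysicalMedia/SoundData/SoundEncoding"

# Select Media documents, skipping any document that has a sound encoding.
SELECT_MEDIA = """
SELECT d.id
  FROM document d
 WHERE d.doc_type = 'Media'
   AND NOT EXISTS (SELECT 1
                     FROM query_term q
                    WHERE q.doc_id = d.id
                      AND q.path = ?)
 ORDER BY d.id
"""


def publishable_media_ids(conn):
    """Return the IDs of Media documents that are not audio files."""
    return [row[0] for row in conn.execute(SELECT_MEDIA, (SOUND_PATH,))]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO document VALUES (?, ?)", [
        (1, "Media"),         # hypothetical image document
        (2, "Media"),         # hypothetical MP3 board meeting recording
        (3, "Organization"),  # not a Media document at all
    ])
    conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", [
        # Hypothetical image path, used only as sample data.
        (1, "/Media/PhysicalMedia/ImageData/ImageEncoding", "JPEG"),
        (2, SOUND_PATH, "MP3"),
    ])
    return publishable_media_ids(conn)


if __name__ == "__main__":
    print(demo())  # [1]
```

Filtering at the selection step means the sound Media documents are never extracted at all, rather than being extracted and then rejected later in the job.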
Lakshmi suggested adding an attribute to the Media document schema that identifies documents as intended for internal or external use, and modifying our selection criteria accordingly.
Since this would require a schema change, we may want to create a new issue for it, or wait until there is an actual need to publish audio files.
I will close this issue once I've determined that the publishing job ran successfully.
BZDATETIME::2010-06-14 11:15:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::10
The weekly publishing job finished on Friday without the error
messages for publishing the MP3 files.
As a side-effect, publishing finished about one hour earlier today than
last week.
I will leave this issue open until our next status meeting so we can decide whether to make the changes suggested by Lakshmi now or wait until there is an actual need for them.
BZDATETIME::2010-06-18 16:25:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::11
Publishing runs fine.
Closing issue.