CDR Tickets

Issue Number 3129
Summary Publishing Organization with attached MP3 files
Created 2010-04-15 16:06:19
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2010-06-18 16:25:29
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107457
Description

BZISSUE::4806
BZDATETIME::2010-04-15 16:06:19
BZCREATOR::William Osei-Poku
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku

Below is the initial email from Volker about the errors encountered while publishing organization records that have Media documents attached:

........
We probably want to talk about this publishing error at our status meeting.
It appears that the MP3 Media document has been made publishable because it has been attached to the publishable organization document (the PDQ Treatment Board).
Maybe it's time to talk about removing the organizations from the publishing process?


Volker Englisch
Contractor - Lockheed Martin
phone: (301) 496-0102 (CTB)
mailto:volker@mail.nih.gov

Also, another email from Bob emphasizing the need to address this issue.

--------
There is one issue which needs to be discussed before too long: now that Media documents are being linked by Organization documents (for the board meeting recordings), those Media documents are being exported as part of the publication. As Volker pointed out elsewhere, this might be a good time to speed up the timing of our decision to discontinue publishing of Organization documents.

--
Bob Kline
http://www.rksystems.com
mailto:bkline@rksystems.com

Comment entered 2010-04-29 18:20:28 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-04-29 18:20:28
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

As discussed in our status meeting we are ready to stop publishing organization documents for licensees and Cancer.gov. Once we've identified that Cancer.gov isn't using the organization information we'll make the necessary changes at our end.

Comment entered 2010-05-07 11:33:00 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-07 11:33:00
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

(In reply to comment #1)
> Once we've identified that Cancer.gov isn't using the organization
> information we'll make the necessary changes at our end.

Blair has looked at the GateKeeper code, and the emails below summarize his findings:

---Original Message---
From: Learn, Blair (NIH/NCI) [C]
Sent: Thursday, May 06, 2010 12:27 PM
To: Englisch, Volker (NIH/NCI) [C]
Cc: Gammell, Mini (NIH/NCI) [C]
Subject: RE: Organization Publishing

The organization documents are used in protocol search. Definitely lead organization, possibly institution names, possibly in displaying individual protocols as well. There are nine stored procedures which reference the organizationName table; so far I've only traced down the place where the list of lead organizations uses it.

... and another email ...

---Original Message---
From: Learn, Blair (NIH/NCI) [C]
Sent: Thursday, May 06, 2010 12:45 PM
To: Englisch, Volker (NIH/NCI) [C]
Cc: Gammell, Mini (NIH/NCI) [C]
Subject: RE: Organization Publishing

Also used for Institution names. Haven't tracked down the other places yet, but I have a distinct impression that they're kinda important....

Comment entered 2010-05-13 19:10:08 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-13 19:10:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3

Since it appears that we will have to continue publishing Org documents for the moment we want to move to Plan B and change the publishing process such that the MP3 files will not get published.

One option would be to modify our software and allow MP3 files to be published.
However, I don't think that anybody would want to include these big audio files in the publishing output.

Another option would be to modify our publishing document and update the SELECT statement such that MP3 (or sound) Media documents will be excluded from being processed. However, this would require updating the Organization vendor filter to drop any MediaLink blocks pointing to MP3 files, or possibly to drop all MediaLink blocks from Organization documents.

Maybe someone can think of another option to address this problem?

Comment entered 2010-05-25 16:33:36 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-25 16:33:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4

I've modified the publishing documents
Primary - CDR178
QcFilterSets - CDR257983
to suppress publishing the sound Media documents.

I will now need to modify the organization filters to drop the MediaLink block if it points to one of the sound files.

Comment entered 2010-05-26 17:12:51 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-26 17:12:51
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

(In reply to comment #3)
> Another option would be to modify our publishing document and update the
> SELECT statement such that MP3 (or sound) Media documents will be excluded
> from being processed. However, this would require updating the
> Organization vendor filter and drop any MediaLink blocks pointing to
> MP3 files or possibly drop all MediaLink blocks from Organization documents.

This last sentence turned out to be incorrect. The MediaLink elements are only included within the PDQBoardInformation block, and since this block isn't exported to the vendor output, there is nothing to do with regard to the Organization vendor filters.
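
To illustrate the reasoning above, here is a minimal Python sketch, not the actual vendor filter (which is not shown in this ticket). It assumes a simplified Organization document in which MediaLink appears only inside PDQBoardInformation; the OrganizationName element, the ref attribute and the sample ID are hypothetical placeholders. Once the board block is dropped, no MediaLink elements remain to be handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified Organization document. Only the element names
# MediaLink and PDQBoardInformation come from the ticket; the rest is made up.
SAMPLE = """<Organization>
  <OrganizationName>Example Board</OrganizationName>
  <PDQBoardInformation>
    <MediaLink ref="CDR0000000001"/>
  </PDQBoardInformation>
</Organization>"""


def drop_board_info(xml_text):
    """Parse the document and remove top-level PDQBoardInformation blocks."""
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing children while looping is safe.
    for block in root.findall("PDQBoardInformation"):
        root.remove(block)
    return root


def count_media_links(root):
    """Count MediaLink elements anywhere below (or at) the given element."""
    return len(list(root.iter("MediaLink")))


if __name__ == "__main__":
    print(count_media_links(ET.fromstring(SAMPLE)))   # links before filtering
    print(count_media_links(drop_board_info(SAMPLE)))  # links after filtering
```

If MediaLink were ever allowed outside PDQBoardInformation, this assumption would no longer hold and the vendor filter would need a second look.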

I successfully ran a test job on FRANCK with the old and the new publishing document. This change will actually reduce the run time for the publishing job by about an hour. With the old publishing document, each of the large MP3 files is extracted from the CDR as part of publishing before it's decided that these MP3 files are not supported for the vendor output.

I am not sure how William should QC the change, but here is the output of the filter failure report on FRANCK:
Job run with old publishing document
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7130&type=FilterFailure
Job run with new publishing document
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7131&type=FilterFailure

Comment entered 2010-05-26 17:20:05 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-26 17:20:05
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

(In reply to comment #5)
> Job run with old publishing document
> http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7130&type=FilterFailure
> Job run with new publishing document
> http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=7131&type=FilterFailure

A diff between the output for both jobs showed no differences.
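
The ticket does not say which tool was used for the comparison. As one possible way to repeat such a check, here is a small Python sketch that compares two output directory trees file by file. The function name and the idea of passing in two job output directories are assumptions made for illustration only.

```python
import filecmp
from pathlib import Path


def differing_files(old_dir, new_dir):
    """Return sorted relative paths that are missing on one side or differ."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    old_files = {p.relative_to(old_dir) for p in old_dir.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new_dir) for p in new_dir.rglob("*") if p.is_file()}

    # Files that exist in only one of the two trees.
    differences = [str(p) for p in old_files ^ new_files]

    # Files present in both trees: compare contents byte by byte.
    for rel in old_files & new_files:
        if not filecmp.cmp(old_dir / rel, new_dir / rel, shallow=False):
            differences.append(str(rel))
    return sorted(differences)
```

An empty list corresponds to the "no differences" result reported above. Calling differing_files with the output directories of the two jobs (their locations are not given in this ticket) would list any file that changed.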

Comment entered 2010-05-28 10:47:56 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-05-28 10:47:56
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7

Should I go ahead and update the publishing document in time for tonight's weekly job?

Comment entered 2010-06-07 17:39:14 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-07 17:39:14
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8

If nobody objects, I would like to put this change in production, since the current publishing document extends the processing time for the publishing job unnecessarily.

Any objections?

Comment entered 2010-06-11 18:02:13 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-11 18:02:13
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9

The two publishing documents have been updated on BACH to exclude the Media document from being selected for publishing if the media document represents an audio file (path = /Media/PhysicalMedia/SoundData/SoundEncoding):
Primary - CDR178
QcFilterSets - CDR257983
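
The actual SELECT statements in the publishing documents are not reproduced in this ticket. The following is only a sketch of the kind of exclusion described above, run against an in-memory SQLite database. The table and column names (document, query_term, doc_type, doc_id, path, value) and the image path used in the demo data are assumptions for illustration; only the SoundEncoding path is taken from this comment.

```python
import sqlite3

# Path named in the ticket as the marker for audio Media documents.
SOUND_PATH = "/Media/PhysicalMedia/SoundData/SoundEncoding"


def select_publishable_media(conn):
    """Return IDs of Media documents that have no sound-encoding entry."""
    rows = conn.execute(
        """
        SELECT d.id
          FROM document d
         WHERE d.doc_type = 'Media'
           AND NOT EXISTS (SELECT 1
                             FROM query_term q
                            WHERE q.doc_id = d.id
                              AND q.path = ?)
         ORDER BY d.id
        """,
        (SOUND_PATH,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE document (id INTEGER PRIMARY KEY, doc_type TEXT);
        CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO document VALUES (?, ?)",
        [(1, "Media"), (2, "Media"), (3, "Organization")],
    )
    conn.executemany(
        "INSERT INTO query_term VALUES (?, ?, ?)",
        [
            # Hypothetical image path, used only to give document 1 an entry.
            (1, "/Media/PhysicalMedia/ImageData/ImageEncoding", "JPEG"),
            (2, SOUND_PATH, "MP3"),
        ],
    )
    return conn


if __name__ == "__main__":
    # Document 2 is the audio file, so only document 1 is selected.
    print(select_publishable_media(build_demo_db()))
```

Because the test is on the path rather than on a specific encoding value, any Media document with a SoundEncoding entry is excluded, which is the limitation noted in the next paragraph.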

This solution works as long as we're not trying to publish other audio files as part of our documents (e.g. pronunciation of terms).

Lakshmi suggested adding an attribute to the Media document schema that identifies these documents as being for internal or external use, and modifying our selection criteria accordingly.
We may want to create a new issue for this, since a schema change would be required, or hold off on the change until there is an actual need to publish audio files.

I will close this issue once I've determined that the publishing job ran successfully.

Comment entered 2010-06-14 11:15:29 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-14 11:15:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::10

The weekly publishing job finished on Friday without the error messages for publishing the MP3 files.
As a side-effect, publishing finished about one hour earlier today than last week.

I will leave this issue open until our next status meeting to decide if we want to make the changes suggested by Lakshmi now or if we'll wait until there is an actual need for it.

Comment entered 2010-06-18 16:25:29 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-06-18 16:25:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::11

Publishing runs fine.
Closing issue.
