CDR Tickets

Issue Number 3370
Summary [Media] Modify Vendor Filters to Process Audio Files
Created 2011-05-23 11:05:13
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2011-07-27 18:09:53
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107698
Description

BZISSUE::5063
BZDATETIME::2011-05-23 11:05:13
BZCREATOR::Volker Englisch
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku

We need to modify the vendor filters for GlossaryTerm and DIS to include the MediaLink element.

Also, due to our decision to modify the MediaLink element to include the (mime-)type attribute - in order to distinguish between image and audio files - vendor filters containing this MediaLink element will also need to be updated.

Comment entered 2011-05-25 15:29:43 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-25 15:29:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1

The following filters have been updated on MAHLER in order to produce the MediaLink for audio files:
CDR616048 - Vendor Filter: GlossaryTermName
CDR271370 - Module: Vendor Filter Templates

The DTD has been modified (OCECDR-3371) to include a "type" attribute.
a) Sample for Audio
<MediaLink ref="CDR0000696822" type="audio/mpeg"
alt="Alt-text" language="es" id="_4"/>

b) Sample for Image
<MediaLink ref="CDR0000428405" type="image/jpeg"
alt="Alt-text" language="en" thumb="Yes" id="_3">
<Caption language="en">...</Caption>
</MediaLink>
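
A minimal sketch of how a consumer of the vendor output might use the new "type" attribute to tell audio links from image links. This is an illustration only, not the filter or GateKeeper code: the `Links` wrapper element and the helper names `media_kind` and `classify` are assumptions made for the example, while the two MediaLink samples are the ones shown above.

```python
import xml.etree.ElementTree as ET

# The two samples from this comment, wrapped in a hypothetical <Links> root
# so they can be parsed as a single document.
SAMPLES = """<Links>
  <MediaLink ref="CDR0000696822" type="audio/mpeg"
             alt="Alt-text" language="es" id="_4"/>
  <MediaLink ref="CDR0000428405" type="image/jpeg"
             alt="Alt-text" language="en" thumb="Yes" id="_3">
    <Caption language="en">...</Caption>
  </MediaLink>
</Links>"""


def media_kind(link):
    """Return the top-level MIME category of a MediaLink ('audio', 'image', ...)."""
    mime = link.get("type", "")
    # A missing or malformed type attribute is reported as 'unknown'.
    return mime.split("/", 1)[0] if "/" in mime else "unknown"


def classify(xml_text):
    """Map each MediaLink's ref attribute to its MIME category."""
    root = ET.fromstring(xml_text)
    return {link.get("ref"): media_kind(link) for link in root.iter("MediaLink")}


if __name__ == "__main__":
    for ref, kind in classify(SAMPLES).items():
        print(ref, kind)
```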

Comment entered 2011-05-31 17:16:35 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-31 17:16:35
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2

Email from Blair regarding the implemented changes

----------------------------------------------------------------------

From: Learn, Blair (NIH/NCI) [C]
Sent: Wednesday, May 25, 2011 1:25 PM
To: Englisch, Volker (NIH/NCI) [C]; Prasad, Betnag (NIH/NCI) [C]; Kline, Robert (NCI)
Cc: Luke, Emile (NIH/NCI) [C]
Subject: RE: PDQ DTD R10093

I spoke with Volker earlier about how Media documents are referenced from DrugInformationSummary documents. What I understood from our conversation is that the plan is for GateKeeper to retrieve the MediaLink used by the GlossaryTerm document referenced from within the DrugInfoMetaData element.

This is a problem in that GateKeeper only loads one document at a time and therefore doesn't have access to the contents of other documents. (It's similar to the problem we had with the previous plan for MediaLink to not include the type attribute.) GateKeeper's use of references to other documents (e.g. the TerminologyLink) is presently limited to creating links to other pages and using the CDR id as an argument.

Volker suggested that it might be possible to resolve this on the CDR side by adding a filter in the post-processing to add a MediaLink to the DrugInfoMetaData. I believe that would solve it for GateKeeper.

Comment entered 2011-05-31 17:23:00 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-05-31 17:23:00
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3

Due to the difficulties that the proposed DTD would have presented to GateKeeper processing, we've decided to denormalize the MediaLink information for the vendors and create a new element named PronunciationInfo within the metadata block. This PronunciationInfo element contains the TermPronunciation and the audio MediaLinks.

These changes have been implemented and tested on MAHLER.
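
The denormalization described above can be sketched roughly as follows. This is a hedged illustration in Python, not the XSLT filter that was actually implemented: the function name `build_pronunciation_info`, the placeholder pronunciation string, and the exact child order inside PronunciationInfo are assumptions; only the element names PronunciationInfo, TermPronunciation, and MediaLink come from this ticket.

```python
import xml.etree.ElementTree as ET

# Sample links taken from comment 1; the image link is included only to
# show that non-audio links are left out of PronunciationInfo.
AUDIO_LINK = ('<MediaLink ref="CDR0000696822" type="audio/mpeg" '
              'alt="Alt-text" language="es" id="_4"/>')
IMAGE_LINK = ('<MediaLink ref="CDR0000428405" type="image/jpeg" '
              'alt="Alt-text" language="en" thumb="Yes" id="_3"/>')


def build_pronunciation_info(term_pronunciation, media_links):
    """Build a PronunciationInfo element from a pronunciation and MediaLinks.

    Only links whose type attribute starts with 'audio/' are copied in.
    The layout of the resulting element is an assumption for illustration.
    """
    info = ET.Element("PronunciationInfo")
    if term_pronunciation:
        ET.SubElement(info, "TermPronunciation").text = term_pronunciation
    for link in media_links:
        if link.get("type", "").startswith("audio/"):
            info.append(link)
    return info


if __name__ == "__main__":
    links = [ET.fromstring(AUDIO_LINK), ET.fromstring(IMAGE_LINK)]
    # "(placeholder pronunciation)" is a made-up value, not real CDR data.
    info = build_pronunciation_info("(placeholder pronunciation)", links)
    print(ET.tostring(info, encoding="unicode"))
```

Because the block is self-contained within the metadata, a loader that sees one document at a time would not need to open the GlossaryTerm document to find the audio file.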

Comment entered 2011-06-06 13:03:33 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2011-06-06 13:03:33
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::4

Volker, is there anything we need to do to test this before it can be promoted or are you doing all of the testing?

Comment entered 2011-06-06 13:12:41 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-06 13:12:41
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5

We won't be able to move these filters into production until Cancer.gov is able to process the output.
I've just refreshed FRANCK to run a publishing job with the old (current) filters to determine whether the audio files could be loaded into the CDR without affecting publishing. That way we could start loading audio files now rather than having to wait until Cancer.gov is ready for the new data.

Right now I am doing all of the testing, but I'm guessing that you will be able to preview the changes on the preview site once Cancer.gov gets closer to rolling out the changes.

Comment entered 2011-06-07 18:34:43 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-07 18:34:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6

We have refreshed the CDR database on FRANCK (using a backup from BACH).
I ran a publishing job for each of the following doc types:
DrugInfoSummary
GlossaryTerm
Media
Summary
Terminology

After the publishing jobs finished, Bob loaded all available audio files to the CDR on FRANCK and I ran the same publishing jobs in order to run a before/after diff.
The diffs between the publishing jobs for the individual doc types showed no changes, with one exception: 8 glossary documents were missing from the later run.
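
A before/after comparison of two publishing job output directories could be sketched as below. This is only an assumption about how such a check might be scripted, not the procedure that was actually used here; the function name `compare_jobs` and the directory and file names in the demo are hypothetical.

```python
import filecmp
import os
import tempfile


def compare_jobs(before_dir, after_dir):
    """Compare two job output directories (top level only).

    Returns the files present only in one directory and the common files
    whose contents differ (byte-for-byte comparison, shallow=False).
    """
    cmp = filecmp.dircmp(before_dir, after_dir)
    _, mismatch, errors = filecmp.cmpfiles(
        before_dir, after_dir, cmp.common_files, shallow=False
    )
    return {
        "only_before": sorted(cmp.left_only),
        "only_after": sorted(cmp.right_only),
        "changed": sorted(mismatch + errors),
    }


if __name__ == "__main__":
    # Demo with throwaway directories and made-up file names.
    with tempfile.TemporaryDirectory() as before, \
            tempfile.TemporaryDirectory() as after:
        for name in ("doc1.xml", "doc2.xml"):
            with open(os.path.join(before, name), "w") as fh:
                fh.write("<Doc/>")
        with open(os.path.join(after, "doc1.xml"), "w") as fh:
            fh.write("<Doc/>")
        result = compare_jobs(before, after)
        for name in result["only_before"]:
            print("Only in before:", name)
```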

Bob: Would this be related to the fact that you had reverted an earlier run?

Comment entered 2011-06-07 18:35:47 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-07 18:35:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7

I think I should add Bob to this issue so he can answer my question in the last comment.

Comment entered 2011-06-07 22:45:37 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-06-07 22:45:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::8

(In reply to comment #6)

> Bob: Would this be related to the fact that you had reverted an earlier run?

Don't think so. Which documents were missing?

Comment entered 2011-06-08 10:01:12 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-08 10:01:12
BZCOMMENTOR::Volker Englisch
BZCOMMENT::9

(In reply to comment #8)
> Don't think so. Which documents were missing?

Only in Job8760: CDR44220.xml
Only in Job8760: CDR44229.xml
Only in Job8760: CDR44244.xml
Only in Job8760: CDR44286.xml
Only in Job8760: CDR44301.xml
Only in Job8760: CDR44338.xml
Only in Job8760: CDR44428.xml
Only in Job8760: CDR44441.xml

You're probably right. I see that a newer publishable version exists for all of these documents, which should have been picked up for publishing but wasn't.
I'll have a look at the selection criteria.

Comment entered 2011-06-08 13:34:17 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-08 13:34:17
BZCOMMENTOR::Volker Englisch
BZCOMMENT::10

(In reply to comment #8)
> Don't think so. Which documents were missing?

Only in Job8760: CDR44220.xml
Only in Job8760: CDR44229.xml
Only in Job8760: CDR44244.xml
Only in Job8760: CDR44286.xml
Only in Job8760: CDR44301.xml
Only in Job8760: CDR44338.xml
Only in Job8760: CDR44428.xml
Only in Job8760: CDR44441.xml

You're probably right. I see that a newer publishable version exists for all of these documents, which should have been picked up for publishing but wasn't.
I'll have a look at the selection criteria.

I reran the publishing job for these and it turns out that they are linking to non-existing Media documents. That's why they haven't been created.
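
A check for this kind of failure could look roughly like the sketch below, which lists MediaLink references that do not resolve to a known Media document. It is an illustration under stated assumptions, not the CDR publishing logic: the function name `missing_media_refs`, the `GlossaryTerm` wrapper used in the demo, and the idea of passing in a set of known Media IDs are all hypothetical.

```python
import xml.etree.ElementTree as ET


def missing_media_refs(doc_xml, known_media_ids):
    """Return the sorted MediaLink refs in doc_xml that are not in known_media_ids."""
    root = ET.fromstring(doc_xml)
    refs = (el.get("ref") for el in root.iter("MediaLink"))
    return sorted({ref for ref in refs if ref and ref not in known_media_ids})


if __name__ == "__main__":
    # Demo document: the refs reuse the sample IDs from comment 1, and the
    # set of known IDs is made up so that one link is reported as broken.
    doc = """<GlossaryTerm>
      <MediaLink ref="CDR0000696822" type="audio/mpeg"/>
      <MediaLink ref="CDR0000428405" type="image/jpeg"/>
    </GlossaryTerm>"""
    known = {"CDR0000428405"}
    for ref in missing_media_refs(doc, known):
        print("Missing Media document:", ref)
```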

Comment entered 2011-06-08 13:35:58 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-06-08 13:35:58
BZCOMMENTOR::Volker Englisch
BZCOMMENT::11

Seems the new Bugzilla has a new way to deal with mid-air collisions and I clicked the button I thought would throw away the second-to-last comment but guess what? :-)

Comment entered 2011-07-17 23:35:09 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-07-17 23:35:09
BZCOMMENTOR::Volker Englisch
BZCOMMENT::12

The following filters have been copied to BACH:
CDR0000271370.xml - R10121: Module: Vendor Filter Templates
CDR0000505580.xml - R10121: Module: Vendor Filter: DrugInfoSummary
CDR0000415359.xml - R10118: DocTitle for Media
CDR0000616047.xml - R10121: Denormalization Filter: GlossaryTermName
CDR0000616048.xml - R10121: Vendor Filter: GlossaryTermName
CDR0000486313.xml - R10121: Denormalization Filter: DrugInfoSummary
CDR0000617324.xml - R10121: Denormalization Filter: GlossaryTermName - MediaLink

Comment entered 2011-07-20 18:41:15 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-07-20 18:41:15
BZCOMMENTOR::Volker Englisch
BZCOMMENT::13

All of the changes have been in production since Sunday night.

After we've gone through Friday's regular weekly publishing job without problems we should probably be able to close these issues.

Comment entered 2011-07-27 18:09:53 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2011-07-27 18:09:53
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14

We encountered no problems during last week's publishing jobs.

Closing issue.
