CDR Tickets

Issue Number 4890
Summary [Glossary/Media] Modify audio import program
Created 2020-09-15 17:31:11
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2020-10-29 17:51:20
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.274801
Description

We are now able to include the audio re-recording process (which was a manual process before Leibniz) in the standard audio recording process (OCECDR-4507). However, for the re-recorded documents, we still need to go into each document to capture administrative information. Please modify the import process so that this information is entered programmatically when the re-recorded documents are imported into the CDR. The specs are in the attached spreadsheet.

Audio recording program proposed updates.xlsx

Comment entered 2020-10-08 09:49:42 by Kline, Bob (NIH/NCI) [C]

I don't understand what "Update Media Doc Title [With English Media Doc title]" means. Also, what is "rr" in "GTN CDRID_en_rr.mp3"?

Comment entered 2020-10-09 12:49:28 by Kline, Bob (NIH/NCI) [C]

 will be posting screen shots/examples to illustrate how we are to derive the English media document title from the English media document title.

Comment entered 2020-10-09 14:04:57 by Osei-Poku, William (NIH/NCI) [C]

That was a mistake. Sorry. I thought this was talking about the Spanish document. I have updated the spreadsheet and posted the revised statement. Also, "rr" stands for "re-recorded". It only indicates that the file was re-recorded. In essence, entering the filename as is is fine.

 

Audio recording program proposed updates_Corrected.xlsx

Comment entered 2020-10-09 14:29:24 by Kline, Bob (NIH/NCI) [C]

Whew! I thought I had fallen into an M.C. Escher drawing. 😛

Comment entered 2020-10-13 17:55:19 by Kline, Bob (NIH/NCI) [C]

This is another ticket for which 20 story points is probably lowballing the estimate. Until now, we have been able to use the global change harness to apply the necessary document changes (basically adding a media link), and we only had to do that for the glossary term name documents. One of the most valuable features of that package is its carefully crafted logic for preserving the users' decisions about which changes to a document should be made publishable. To do what this ticket requests, however, we would have to update the documents for a re-recorded media file through a completely different path from the one we use for a new media document, bypassing the global change harness, and we would have to do so for both the glossary term name documents and the media documents. It is unusual for the software to decide to publish changes which a user has decided should not yet be published, and it's easy to imagine changes to a document other than the link to a different pronunciation getting swept up in this blind publishing of changes which were not previously marked as publishable. Are you confident that this is really what you want? If so, when the software is dealing with a re-recorded pronunciation, do you want it to grab the latest saved copy of the document or the latest version?

Tagging  so she is aware of the implications of this request.

Comment entered 2020-10-13 18:18:42 by Osei-Poku, William (NIH/NCI) [C]

Can you please remind me what the difference is between the latest saved copy and the latest version?

Comment entered 2020-10-13 20:31:32 by Kline, Bob (NIH/NCI) [C]

If you save changes to a document without checking the box to create a new version, that copy will be different from the latest version. Maybe I don't understand your question. Can you ask it a different way?

Comment entered 2020-10-13 20:56:23 by Osei-Poku, William (NIH/NCI) [C]

That actually answers my question. Volker has also explained it to me a few times 🙂. So, in answer to your question, please use the latest saved copy.

Comment entered 2020-10-14 10:08:42 by Kline, Bob (NIH/NCI) [C]

Well, that answers one of my two questions. Can you also explicitly respond to the other, more significant question?

 

Are you OK with this modification,  (see comment with analysis above from yesterday at 5:55)?

Comment entered 2020-10-14 13:15:42 by Beckwith, Margaret (NIH/NCI) [E]

I got more information from CIAT on the importance of this request and it seems like something that would save a great deal of time:

This is an effort to make our audio re-recording process more efficient. By modifying the audio import program, it will greatly reduce the number of manual steps involved in this process and save us a significant amount of time. The modified program would automate many of the steps involved in this process by updating the data elements in each of the existing audio media docs (media doc titles, content descriptions, dates, comments, and processing statuses) and the GTN docs (dates and comments) as the re-recorded audios are published. The program already links the re-recorded audios to the media docs and GTNs, but this proposed change would update the data elements in each of the docs. We’re hoping this will be possible.

We can discuss this at the CDR meeting tomorrow in terms of development time needed and whether it should be bumped to the next release.

Comment entered 2020-10-15 16:51:58 by Kline, Bob (NIH/NCI) [C]

We discussed this ticket in depth at this afternoon's weekly status meeting, and William agreed that the standard global change processing will provide him with what he wants. I am beginning the work for the request.

Comment entered 2020-10-15 17:15:49 by Kline, Bob (NIH/NCI) [C]

: Can you explain why we would set the DateLastModified in the GTN document for a re-recorded MP3 but not when adding a new one? Same question for the TranslatedNameStatusDate.

Comment entered 2020-10-16 11:44:37 by Osei-Poku, William (NIH/NCI) [C]

For Glossary Terms, we only assign Date Last Modified when an existing document is modified, so we do not add the element for new ones. I have seen other document types where the element is added for new ones as well, but our business process for Glossary Terms only requires us to add the element (or update the date) when an existing document is modified. The TranslatedNameStatusDate element is added when a new term is added and updated when a term is revised. We add/update this date when we change the TranslatedNameStatus element.

Comment entered 2020-10-16 12:10:44 by Kline, Bob (NIH/NCI) [C]

I can understand why you would not want to have the software set the DateLastModified value for a Media document it has just created. But I'm not asking about those. I'm asking about the GlossaryTermName documents, which in all cases (re-recordings or first recordings) by definition must have existed already before the spreadsheet was generated to ask Vanessa to record the pronunciations (otherwise there wouldn't be a CDR ID or the other data needed for populating the rows in that spreadsheet).

At the other end of the spectrum, you say you update TranslatedNameStatusDate when the TranslatedNameStatus changes, but you're not having the software change the latter value.

Finally, when you say "when a term is revised" are you referring just to cases in which the term name has been changed or also when the term name stays the same but the pronunciation is corrected in a re-recording of the same name?

Comment entered 2020-10-16 13:16:38 by Osei-Poku, William (NIH/NCI) [C]

At the other end of the spectrum, you say you update TranslatedNameStatusDate when the TranslatedNameStatus changes, but you're not having the software change the latter value.

 

Please do not update the TranslatedNameStatus either. So the changes should be:

 

Glossary Term Name

    Add new Comment after Translated Name Status Date: [Approved audio re-recording linked]. It should be the topmost comment if there are existing comments.

    Add or update DateLastModified with [System Date]

    Make document publishable with the following version comment:  [Spn Pub ver, system date, Audio re-recording re-linked]
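The Comment and DateLastModified changes in the list above can be sketched as follows. This is a minimal illustration using Python's standard xml.etree.ElementTree, not the CDR import program: the element layout (Comment elements following TranslatedNameStatusDate inside a TranslatedName block, DateLastModified as a direct child of the document element) and the function name apply_rerecording_changes are assumptions. Versioning is deliberately left out, because that is the global change harness's job.

```python
import datetime
import xml.etree.ElementTree as ET


def apply_rerecording_changes(name_block, root, today=None):
    """Hypothetical sketch of the GTN edits; element placement is assumed.

    name_block: the name element whose pronunciation was re-recorded.
    root: the GlossaryTermName document element.
    """
    today = today or datetime.date.today().isoformat()

    # Put the new Comment right after TranslatedNameStatusDate so it lands
    # above any existing comments; fall back to "before the first Comment".
    children = list(name_block)
    status_date = name_block.find("TranslatedNameStatusDate")
    first_comment = name_block.find("Comment")
    if status_date is not None:
        position = children.index(status_date) + 1
    elif first_comment is not None:
        position = children.index(first_comment)
    else:
        position = len(children)
    comment = ET.Element("Comment")
    comment.text = "Approved audio re-recording linked"
    name_block.insert(position, comment)

    # Add DateLastModified if it is missing; otherwise overwrite its value.
    dlm = root.find("DateLastModified")
    if dlm is None:
        dlm = ET.SubElement(root, "DateLastModified")
    dlm.text = today
    return root
```

Note that TranslatedNameStatus and TranslatedNameStatusDate are left untouched.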

I have removed the line that says "Change Translated Name Status Date to [System date]".

 

Finally, when you say "when a term is revised" are you referring just to cases in which the term name has been changed or also when the term name stays the same but the pronunciation is corrected in a re-recording of the same name?

 

  That refers to when the term name has been changed.

Comment entered 2020-10-16 15:33:54 by Kline, Bob (NIH/NCI) [C]

We agreed yesterday that the global change harness will handle version creation, so you'll want to drop the rows which say "Make document publishable with the following version comment: ..."

Also, you haven't responded to my comment about the DateLastModified for all GTN documents (since for this process all of those documents are pre-existing and all are being modified).

Comment entered 2020-10-16 15:56:52 by Osei-Poku, William (NIH/NCI) [C]

We agreed yesterday that the global change harness will handle version creation, so you'll want to drop the rows which say "Make document publishable with the following version comment: ..."

That is right.

Also, you haven't responded to my comment about the DateLastModified for all GTN documents (since for this process all of those documents are pre-existing and all are being modified).

I am not sure I understand your question about this one. Could you elaborate? I think the request is to update DateLastModified for both the English and the Spanish.

Comment entered 2020-10-16 16:12:50 by Osei-Poku, William (NIH/NCI) [C]

We agreed yesterday that the global change harness will handle version creation, so you'll want to drop the rows which say "Make document publishable with the following version comment: ..."

Would it be possible to add the comment to the publishable version that the global change harness will create?

Comment entered 2020-10-16 16:23:19 by Kline, Bob (NIH/NCI) [C]

OK, to recap the conversation:

  • I asked why you want the DateLastModified only set for the re-recorded names, instead of for all of the GTN documents processed by the script

  • You responded that you don't use that element for new documents but only for existing documents

  • I pointed out that while some of the Media documents are old and some are new for this processing, all of the GTN documents already exist, and they're all being modified

The logical conclusion would seem to be that all of the GTN documents should have the DateLastModified set, right?

Comment entered 2020-10-16 16:27:01 by Kline, Bob (NIH/NCI) [C]

[Jira isn't chaining the replies correctly]

Would it be possible to add the comment to the publishable version that the global change harness will create?

Not without rewriting the global change harness, which would push this ticket into a 40-pointer.

Comment entered 2020-10-16 16:31:25 by Kline, Bob (NIH/NCI) [C]

We can have any comment we want for the documents saved by the global change system. We've never had a requirement before, though, for having customized comments for each version of each document in the batch.

Comment entered 2020-10-16 16:40:16 by Osei-Poku, William (NIH/NCI) [C]

OK, to recap the conversation:
  • I asked why you want the DateLastModified only set for the re-recorded names, instead of for all of the GTN documents processed by the script

  • You responded that you don't use that element for new documents but only for existing documents

  • I pointed out that while some of the Media documents are old and some are new for this processing, all of the GTN documents already exist, and they're all being modified

Okay, got it. The GTNs that are not re-recordings are essentially new term names that are getting audio pronunciations for the first time. There may be some outliers, but the majority are new terms that were created weeks before the spreadsheet was generated. On the other hand, if it makes things simpler to have the DLM set for all documents processed by the script, I think we can consider that. I don't think it makes a huge difference to have the program set the DLM for all GTNs, but we will have to review some of our reports to make sure they don't depend on the date, so we know what to expect. So, please let me know.

Comment entered 2020-10-16 16:40:32 by Osei-Poku, William (NIH/NCI) [C]

Okay. No problem.

Comment entered 2020-10-16 16:43:28 by Osei-Poku, William (NIH/NCI) [C]

We can have any comment we want for the documents saved by the global change system. We've never had a requirement before, though, for having customized comments for each version of each document in the batch.

So the comment would be applied to all the saved documents rather than just to the specific publishable version? If that is the case, we can modify the comment so that it applies to all the saved documents.

Comment entered 2020-10-16 17:05:31 by Kline, Bob (NIH/NCI) [C]

So adding pronunciation audio for the term isn't a modification? We can do it either way. I'm just trying to make sure we're doing what makes sense the FIRST time we implement the ticket. 😉

Comment entered 2020-10-16 17:23:13 by Osei-Poku, William (NIH/NCI) [C]

It is a modification and I think we can add the DLM for all of them.

Comment entered 2020-10-19 12:08:27 by Kline, Bob (NIH/NCI) [C]

I would think the same logic would apply to the TranslatedNameStatusDate element. If it makes sense to set that date to the date the recording was linked from the GTN document for replacement recordings, it should make just as much sense to set that date to the date the recording was linked from the GTN document for new recordings. Do you agree?

Comment entered 2020-10-19 14:51:33 by Osei-Poku, William (NIH/NCI) [C]

We decided to drop this change (earlier comments)

I have removed the line that says "Change Translated Name Status Date to [System date] "

Comment entered 2020-10-19 15:48:17 by Kline, Bob (NIH/NCI) [C]

Ah, OK. Doesn't look as if that got removed from the latest version of the requirements specification. I will strip that logic from the code.

Comment entered 2020-10-19 16:18:18 by Kline, Bob (NIH/NCI) [C]

Do you really want "- Spanish" appended to the English name (with the space) instead of "-Spanish" the way the existing documents appear?

Comment entered 2020-10-19 16:25:33 by Kline, Bob (NIH/NCI) [C]

Can you confirm that we are dropping ", VR Voice" for the Creator element?

Comment entered 2020-10-19 16:27:45 by Kline, Bob (NIH/NCI) [C]

Please confirm that you really want me to drop the path name for the week from the SourceFilename element's value.

Comment entered 2020-10-19 16:36:21 by Kline, Bob (NIH/NCI) [C]

The ContentDescription element allows multiple occurrences, with at least one required. If I don't find one, I'll throw an exception. If I find more than one should I leave that block alone?

Comment entered 2020-10-19 16:39:22 by Kline, Bob (NIH/NCI) [C]

The requirements spec doesn't have anything to say about where to put the new ProcessingStatus elements. Does that mean you don't care where I put them, as long as they're inside the MediaProcessingStatuses block?

Comment entered 2020-10-19 17:09:52 by Osei-Poku, William (NIH/NCI) [C]

Do you really want "- Spanish" appended to the English name (with the space) instead of "-Spanish" the way the existing documents appear?

No. Please append "-Spanish" instead.

Comment entered 2020-10-19 17:15:32 by Osei-Poku, William (NIH/NCI) [C]

Can you confirm that we are dropping ", VR Voice" for the Creator element?

No. Please don't make any changes to the Creator element.

Comment entered 2020-10-19 17:28:03 by Osei-Poku, William (NIH/NCI) [C]

I will be surprised to see more than one content description element for the mp3 files. The images usually have more than one.

If I find more than one should I leave that block alone?

Yes, and report the problem but don't abort the program.

If I don't find one, I'll throw an exception.

Will that abort the program?

Comment entered 2020-10-19 17:30:01 by Kline, Bob (NIH/NCI) [C]

It's difficult to know what questions you're answering, because you've stopped using the Reply links.

Comment entered 2020-10-19 19:10:06 by Osei-Poku, William (NIH/NCI) [C]

Please confirm that you really want me to drop the path name for the week from the SourceFilename element's value.

No. Please don't drop it. We had some ideas before the new filename convention was implemented.

Comment entered 2020-10-19 19:25:49 by Osei-Poku, William (NIH/NCI) [C]

The requirements spec doesn't have anything to say about where to put the new ProcessingStatus elements. Does that mean you don't care where I put them, as long as they're inside the MediaProcessingStatuses block?

Please place the new ProcessingStatus elements on top so they become the topmost.
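Placing new statuses at the top amounts to inserting them at the start of the MediaProcessingStatuses block in the order they should finally appear. A minimal sketch with Python's xml.etree.ElementTree follows; the helper name is hypothetical, and for simplicity it stores each status as the element's text, whereas the real Media schema may nest the value in child elements (an assumption not checked here). Because the list order controls the final order, listing "Processing Complete" first keeps it above "Audio re-recording approved".

```python
import xml.etree.ElementTree as ET


def add_processing_statuses(statuses_block, new_statuses):
    """Insert ProcessingStatus elements so they become the topmost ones.

    new_statuses is ordered top first. Storing the value as element text is
    a simplification; the real schema's child structure is not modeled.
    """
    for offset, value in enumerate(new_statuses):
        status = ET.Element("ProcessingStatus")
        status.text = value
        # Inserting at 0, 1, 2, ... keeps the requested order above the
        # statuses that were already in the block.
        statuses_block.insert(offset, status)
    return statuses_block
```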

Comment entered 2020-10-21 09:38:10 by Kline, Bob (NIH/NCI) [C]

Will that abort the program?

No, but it will prevent the modified document from being saved. An error is logged and displayed with the job results.
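The ContentDescription rules settled in this exchange are: exactly one element, update it; more than one, leave the block alone and report the problem; none, raise an exception so that this document is not saved while the job keeps running. A minimal sketch, assuming plain ElementTree documents and a simple error list standing in for the job results display; the function and exception names are hypothetical, and the real job reports errors through the global change harness, which is not modeled here.

```python
import xml.etree.ElementTree as ET


class MissingContentDescription(Exception):
    """The schema requires at least one ContentDescription."""


def update_content_description(media_root, new_text, doc_id, errors):
    descriptions = media_root.findall(".//ContentDescription")
    if not descriptions:
        # Raising here means this document will not be saved.
        raise MissingContentDescription(f"{doc_id}: no ContentDescription found")
    if len(descriptions) > 1:
        # Leave the block alone and report it; the document's other
        # changes can still be saved.
        errors.append(f"{doc_id}: {len(descriptions)} ContentDescription "
                      "elements; left unchanged")
        return False
    descriptions[0].text = new_text
    return True


def transform_batch(docs, new_text):
    """Per-document failures are recorded; the batch keeps going."""
    saved, errors = [], []
    for doc_id, root in docs:
        try:
            update_content_description(root, new_text, doc_id, errors)
        except MissingContentDescription as exc:
            errors.append(str(exc))  # logged and skipped, not saved
            continue
        saved.append(doc_id)
    return saved, errors
```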

Comment entered 2020-10-21 09:42:53 by Kline, Bob (NIH/NCI) [C]

... we can modify the comment so it will be applied to all the saved documents.

Have you decided what wording to use for the common comment used for a global change job?

Comment entered 2020-10-21 09:46:00 by Kline, Bob (NIH/NCI) [C]

What should the software do if a single Media document is reused in the same job for two different GTN documents, or for two different name blocks within the same GTN document? Or is this another one of those "we'll never do that" cases?

Comment entered 2020-10-21 11:13:14 by Kline, Bob (NIH/NCI) [C]

You can have separate comment strings for the global change job used to transform the GTN documents and the global change job used to transform the recycled Media documents.

Comment entered 2020-10-21 11:44:55 by Osei-Poku, William (NIH/NCI) [C]


What should the software do if a single Media document is reused in the same job for two different GTN documents, or for two different name blocks within the same GTN document? Or is this another one of those "we'll never do that" cases?

So far we have no use cases for the scenarios above.

Comment entered 2020-10-21 12:24:20 by Osei-Poku, William (NIH/NCI) [C]
Comment entered 2020-10-22 08:01:22 by Kline, Bob (NIH/NCI) [C]

This is ready for some testing on DEV. The modifications are extensive, and I even had to make some changes to the global change harness, which we had never used before for documents with changed blobs. I have done some preliminary testing, but I have only scratched the surface. You will want to check this very thoroughly.

Comment entered 2020-10-26 15:52:58 by Osei-Poku, William (NIH/NCI) [C]

We've done some testing on DEV and so far no major issues. Essentially all the major processes are covered. However, here are a few observations:

  1. In the Media docs, the filenames of the newly created files (as opposed to the re-recordings) follow the new naming convention "Week_2020_43/800019_en.mp3". The re-recordings, however, do not appear to follow the same format; instead they show "798602_es_rr.mp3", for example. This is not a big deal, as it doesn't interfere with the process. It is just inconsistent, so if it can be changed to match the new filename format, that would be great. Examples: Media Doc ID CDR 802962, Media Doc ID CDR 800988.

  2. In the media docs for re-recordings, could you please reverse the order of the processing statuses so that “Processing Complete” is always above “Audio re-recording approved”? Example: CDR 800988.

Comment entered 2020-10-27 11:47:02 by Kline, Bob (NIH/NCI) [C]

But the re-recordings do not appear to follow the same filename format but instead showing "798602_es_rr.mp3", for example.

I was using the pattern you asked me to use in the latest version of the requirements document.

could you please reverse the order of processing statuses ...

Once again, if you have preferences for details of how a request is implemented, please let us know before we begin implementing.

These changes have been implemented on DEV.

Comment entered 2020-10-27 19:13:54 by Osei-Poku, William (NIH/NCI) [C]

I just finished importing the audios for Week 44. It doesn't look like the documents in the Media Reuse MediaID column were updated, at least judging from the document history report. The new ones appear to have been created and updated correctly. 

728116
706202
713153
715166
708931
733277

Comment entered 2020-10-28 11:05:17 by Kline, Bob (NIH/NCI) [C]

It looks like you left one of the MP3 files out of the zip archive.

KeyError: "There is no item named 'Week_2020_44/44959_es.mp3 ' in the archive"

I will look into modifying the import program to show that failure and stop the program when the global change job aborts.

Comment entered 2020-10-28 11:32:29 by Kline, Bob (NIH/NCI) [C]

I have added the code to catch the missing MP3 file and abort the program.

Comment entered 2020-10-28 11:55:48 by Osei-Poku, William (NIH/NCI) [C]

Can I generate a new spreadsheet for week_44 for testing again?

Comment entered 2020-10-28 12:10:36 by Kline, Bob (NIH/NCI) [C]

I think that should work. You might want to first test the change I just made with your existing set to confirm that the software now catches the error, reports it, and stops the job.

Comment entered 2020-10-28 12:12:10 by Kline, Bob (NIH/NCI) [C]

Let me know when you're done with that check, and I'll clear the old set from the server and the database.

Comment entered 2020-10-28 17:19:09 by Osei-Poku, William (NIH/NCI) [C]

I keep getting the KeyError: "There is no item named 'Week_2020_44/44959_es.mp3 ' in the archive" with the latest files, even after making sure that the MP3 file exists. I wonder whether it is a trailing space issue in the spreadsheet instead.
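The quoted name in the KeyError ends with a space ('Week_2020_44/44959_es.mp3 '), which is consistent with a trailing-space problem in the spreadsheet. One defensive approach, sketched here with Python's standard zipfile module, is to strip surrounding whitespace from the spreadsheet value before the lookup and to turn the bare KeyError into an explicit error that stops the import. The names read_mp3 and MissingAudioFile are hypothetical, not taken from the CDR code.

```python
import zipfile


class MissingAudioFile(Exception):
    """Raised to stop the import when a row names an MP3 not in the zip."""


def read_mp3(archive, filename):
    """Return the MP3 bytes named in a spreadsheet row.

    Leading/trailing whitespace copied into the spreadsheet cell is removed
    first; a genuinely absent file becomes a clear, job-stopping error.
    """
    name = filename.strip()
    try:
        return archive.read(name)
    except KeyError:
        raise MissingAudioFile(
            f"{name!r} is not in archive {archive.filename!r}") from None
```

Silently stripping whitespace is a design choice; an alternative is to report the stray space and let the user correct the spreadsheet.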

Comment entered 2020-10-28 18:05:58 by Kline, Bob (NIH/NCI) [C]

I think you must have missed my previous message. The sequence is:

  1. You test the change I just made with your existing set to confirm that the software now catches the error, reports it, and stops the job.

  2. I clear the old set from the server and the database.

  3. You test with a corrected set.

Comment entered 2020-10-28 18:30:08 by Osei-Poku, William (NIH/NCI) [C]

Yes, I missed it. I thought I would be able to test with a corrected set after testing the fix. The fix worked when I used the same set. So, please proceed to clear the old set so I can test with the corrected set.

Comment entered 2020-10-29 07:45:49 by Kline, Bob (NIH/NCI) [C]

OK, I have performed the surgery on the file system and the database. You should be able to test now with a correct set.

Comment entered 2020-10-29 16:39:52 by Osei-Poku, William (NIH/NCI) [C]

The current batch ran successfully. However, I am not sure why newly created media docs (for example, 803000) have a Date Created of "2020-10-27" instead of today's date. Also, I didn't see 803000 listed in the report I get after the run as having been created in the most recent batch.

Comment entered 2020-10-29 17:06:14 by Kline, Bob (NIH/NCI) [C]

What's the date on the MP3 file? That's where DateCreated comes from. Show me the report, please.

Comment entered 2020-10-29 17:11:01 by Kline, Bob (NIH/NCI) [C]

The DateCreated element is in the OriginalSource block, so it doesn't reflect when the CDR Media document was created (and never has).
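To illustrate why DateCreated can predate the Media document itself: the value comes from the MP3 file, not from the moment the CDR document is saved. The sketch below assumes, without confirmation from this ticket, that "the date on the MP3 file" means the timestamp stored in the zip entry, which Python's zipfile exposes as ZipInfo.date_time; the helper name mp3_date_created is hypothetical.

```python
import datetime
import zipfile


def mp3_date_created(archive, name):
    """Return the MP3's date (YYYY-MM-DD) as recorded in the zip entry.

    Assumption: the zip entry timestamp is the "date on the MP3 file".
    It is unrelated to when the CDR Media document is created.
    """
    year, month, day = archive.getinfo(name).date_time[:3]
    return datetime.date(year, month, day).isoformat()
```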

Comment entered 2020-10-29 17:16:36 by Kline, Bob (NIH/NCI) [C]

Comment entered 2020-10-29 17:27:29 by Osei-Poku, William (NIH/NCI) [C]


What's the date on the MP3 file? That's where DateCreated comes from. Show me the report, please.

This is the report I was referring to, and in fact the media doc ID is on there. Sorry for the false alarm. I didn't know the date created is taken from the MP3 file, so I was expecting today's date instead. That is good to know. It seems to me that all the issues have been addressed for this ticket.

CDR ID     Processing
CDR711295  updated Media doc for CDR373934 ('radical laparoscopic prostatectomy' [en]) from Week_2020_44.zip
CDR713443  updated Media doc for CDR413891 ('stage I distal bile duct cancer' [en]) from Week_2020_44.zip
CDR798064  updated Media doc for CDR796880 ('positivo para ALK' [es]) from Week_2020_44.zip
CDR798846  updated Media doc for CDR796946 ('positivo para ROS1' [es]) from Week_2020_44.zip
CDR802999  created Media doc for CDR799488 ('Rombo syndrome' [en]) from Week_2020_44.zip
CDR803000  created Media doc for CDR799488 ('síndrome de Rombo' [es]) from Week_2020_44.zip
CDR373934  Updating link from this document to Media document CDR711295
CDR413891  Updating link from this document to Media document CDR713443
CDR796880  Updating link from this document to Media document CDR798064
CDR796946  Updating link from this document to Media document CDR798846
CDR799488  Adding link from this document to Media document CDR802999
CDR799488  Adding link from this document to Media document CDR803000

Comment entered 2020-10-29 17:50:36 by Kline, Bob (NIH/NCI) [C]

So 803000 was on the report after all?

Comment entered 2020-10-30 10:58:30 by Osei-Poku, William (NIH/NCI) [C]

Yes, as stated above. 

 

Verified on DEV. Thanks!

Comment entered 2020-11-10 14:28:48 by Osei-Poku, William (NIH/NCI) [C]

The first batch was successfully imported, and everything appears to be fine apart from this media doc: 736926. Its Date Created is today's date (2020-11-10) instead of the date the MP3 file was created (other newly created media docs correctly retained the date the MP3 file was created). Could you please take a look to see why it is showing the date the media doc was updated?

Comment entered 2020-11-10 15:00:17 by Kline, Bob (NIH/NCI) [C]

Because that's what you asked us to do for the re-recorded docs in your requirements?

Comment entered 2020-11-10 15:28:32 by Osei-Poku, William (NIH/NCI) [C]

Please use the date the mp3 was created just like the others so there will be consistency.

Comment entered 2020-11-10 16:21:02 by Kline, Bob (NIH/NCI) [C]

Sure. Please create a ticket for that change and add it to Newton.

Comment entered 2020-11-12 12:24:38 by Osei-Poku, William (NIH/NCI) [C]

OK. Will do.

Comment entered 2020-11-19 10:33:51 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2020-12-29 11:40:25 by Osei-Poku, William (NIH/NCI) [C]

I am closing this ticket because it may take a while to verify all the changes on PROD. Some of the changes have already been verified, but we will have to wait to receive the recordings from Vanessa, and that may take a while. So far, all the changes up to the point of receiving the files from Vanessa have been verified.

Attachments
File Name Posted User
Audio recording program proposed updates_Corrected.xlsx 2020-10-09 14:04:55 Osei-Poku, William (NIH/NCI) [C]
Audio recording program proposed updates.xlsx 2020-09-15 17:31:02 Osei-Poku, William (NIH/NCI) [C]
image-2020-10-10-14-23-33-249.png 2020-10-10 14:23:32 Kline, Bob (NIH/NCI) [C]
image-2020-10-29-17-16-29-398.png 2020-10-29 17:16:30 Kline, Bob (NIH/NCI) [C]
