CDR Tickets

Issue Number 5131
Summary Update to Python 3rd party modules in Ohm broke audio file upload automation
Created 2022-08-17 12:24:04
Issue Type Bug
Submitted By Dugan, Amy (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2022-08-19 09:43:35
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.325368
Description

Automated audio file importing

As part of the Python upgrade for the Ohm release, the third-party modules were updated to their current versions. The package used for reading spreadsheets in the GlossaryTermAudioReview.py script was apparently modified recently to remove support for .xlsx workbooks, and it now reads only the legacy .xls format (this package's specialty is handling Microsoft's original format more completely than any other package, so that's the direction they're going in).

This issue has caused the glossary and Spanish teams to revert to a manual process for upload and review. The added work (for English) is about 10 hours for 30 terms (this assumes being able to work uninterrupted for 10 hours on just the 30 terms). The Spanish LOE appears to be higher. We expect to take up to a month or two for the current batch of 72 terms, as the team fits this manual work into other work priorities. Thus fixing this issue is a high priority.

Comment entered 2022-08-19 09:43:35 by Kline, Bob (NIH/NCI) [C]
Comment entered 2022-08-29 14:56:57 by Osei-Poku, William (NIH/NCI) [C]

Hi, was the above info intended for OCECDR-5132? If it was meant for this ticket (OCECDR-5131), then more clarification is needed to help with testing. Thanks!

Comment entered 2022-08-29 15:11:40 by Kline, Bob (NIH/NCI) [C]

Sorry, moved the comment to the right ticket.

Comment entered 2022-08-29 16:34:54 by Osei-Poku, William (NIH/NCI) [C]

Thanks, Bob! I assume the fix allows the program to continue to recognize .xlsx files as it did previously, right? I am trying to find out which file extensions we should be testing with on DEV.

Comment entered 2022-08-29 17:07:31 by Kline, Bob (NIH/NCI) [C]

Right, the audio software uses modern Excel files to track the pronunciations. The legacy Excel format is unsupported. Since the initial step in the process is the creation of a new Excel workbook by the software, someone would have to deliberately choose to wander from the default path and explicitly save the workbook as the older format to end up with a problem. I think we should be OK.
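For testers who want to confirm which format a given workbook is actually in, regardless of its file extension, a minimal sketch follows. It is not taken from the CDR code base: the function names workbook_format and require_xlsx are hypothetical, and the check relies only on the well-known leading bytes of the two container formats (.xlsx files are ZIP archives, legacy .xls files are OLE2 compound files).

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP container, used by .xlsx
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # OLE2 container, used by legacy .xls


def workbook_format(path):
    """Return "xlsx", "xls", or "unknown" based on the file's leading bytes.

    Hypothetical helper. Any ZIP archive is reported as "xlsx", so this
    is only a coarse check, not a validation of the workbook contents.
    """
    with open(path, "rb") as fp:
        head = fp.read(8)
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        return "xls"
    return "unknown"


def require_xlsx(path):
    """Raise ValueError unless the file looks like a modern Excel workbook."""
    kind = workbook_format(path)
    if kind != "xlsx":
        name = Path(path).name
        raise ValueError(f"{name}: expected a modern Excel workbook, found {kind}")
    return path
```

A check like this would catch a workbook that had been deliberately re-saved in the older format before it reaches the import step.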

Comment entered 2022-08-29 17:37:21 by Osei-Poku, William (NIH/NCI) [C]

Thanks for the clarification. Please upload the attached zip file to the DEV server for testing. It is the same file Amy sent you to upload to PROD. Week_2022_30.zip

Comment entered 2022-08-29 18:15:22 by Kline, Bob (NIH/NCI) [C]

Amy has already tested that batch on DEV.

Comment entered 2022-08-29 20:19:08 by Osei-Poku, William (NIH/NCI) [C]

OK. That's good to know. We will do more testing with this file and probably another file too. Thanks!

Comment entered 2022-08-29 21:27:20 by Kline, Bob (NIH/NCI) [C]

If you want to do end-to-end testing, you'll need to craft a custom subset of the batch. Otherwise, at a certain point in the pipeline, the job will fail when it hits a document which is on PROD, but not on the tier on which you're testing.
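One way to craft such a subset is to copy only selected members of the batch archive into a new zip file. The sketch below is an illustration under stated assumptions, not the procedure used by the CDR team: the function name subset_zip and the keep predicate are hypothetical, the internal layout of the batch archive is not described in this ticket, and the workbook inside the archive would presumably also need to be trimmed to match, which this sketch does not attempt.

```python
import zipfile


def subset_zip(src_path, dst_path, keep):
    """Copy the members of src_path for which keep(name) is true into dst_path.

    Hypothetical helper. `keep` is any callable taking a member name and
    returning True for members to retain. Returns the copied member names.
    """
    copied = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if keep(info.filename):
                # Reuse the original ZipInfo so names and timestamps carry over.
                dst.writestr(info, src.read(info.filename))
                copied.append(info.filename)
    return copied
```

A caller might pass a predicate built from the set of member names known to correspond to documents on the test tier, for example subset_zip("Week_2022_30.zip", "subset.zip", lambda name: name in names_on_tier), where names_on_tier is a set assembled by hand.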

Comment entered 2022-09-06 13:34:39 by Osei-Poku, William (NIH/NCI) [C]

Please upload this file to the QA SFTP:

Week_2022_36.zip

Comment entered 2022-09-06 14:01:52 by Kline, Bob (NIH/NCI) [C]

Done.

Comment entered 2022-09-06 14:31:43 by Osei-Poku, William (NIH/NCI) [C]

Getting "No zip files found to be transferred" on QA.

Comment entered 2022-09-06 15:12:54 by Kline, Bob (NIH/NCI) [C]

Oops, put it on the wrong tier. Please try again.

Comment entered 2022-09-06 15:27:35 by Osei-Poku, William (NIH/NCI) [C]

Thanks! It worked!

Comment entered 2022-09-06 15:27:54 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Attachments
File Name Posted User
Week_2022_30.zip 2022-08-29 17:37:01 Osei-Poku, William (NIH/NCI) [C]
Week_2022_36.zip 2022-09-06 13:34:36 Osei-Poku, William (NIH/NCI) [C]
