Software Support for Reviewing Term Pronunciation

Analysis
Draft 1

Alan Meyer
March 31, 2011


INTRODUCTION
------------

This document analyzes the requirements for software to support review
of the mp3 files containing term pronunciations in English and Spanish.
See Bugzilla issue 5020.

We assume the following:

    Zip files will periodically be sent to our ftp server by Vanessa.

    Each zip file will contain:
        A spreadsheet describing the contents.
        A collection of mp3 files, one file per term pronunciation.

    Users will review the mp3 files using the interface requested by
    William in his summary of issue 5020.

    At the end of a review of a batch of mp3 files, a spreadsheet will
    be created and sent back to Vanessa showing the terms for which
    problems were found during review, together with short descriptions
    of the problems.

    For any such problem list, Vanessa will send back to us a zip file
    like the original, with updated mp3 files for the corrected
    pronunciations.

    The cycle will repeat if review of that file reveals more problem
    pronunciations.

More information about the whole process can be found in Bugzilla
issues 4926 and 5013, and in any future issue with "[Glossary Audio]"
in the title.


HOW THE PROGRAM SHOULD WORK
---------------------------

The program should work as follows:

1. Prepare a menu of pronunciation zip files for review.

   The program would look in the location where all of the zip files
   are stored and present a list of them, in sequence order, to the
   user.

   The listing should show for each zip file:
       Name of the file.
       Date and time on the file.
       The status of the file:
           Completely reviewed (perhaps these should not be included
           in the listing?)
           Started but not yet completely reviewed.
           Not yet started.

   A user will be able to select one of the files for review.

   Some of the files will be original files containing the first
   recordings for a batch of terms.  Others may be batches of
   corrections.
   A batch of corrected terms will contain re-recordings of terms that
   were rejected in an earlier batch.

2. Display a list of all mp3 files in the zip file.

   The list will display:
       Language of the term.
       The term string.
       A link to the mp3 sound file:
           When clicked, the link will pass the mp3 to whatever
           program the user has configured in his browser to play
           the file.
       A collection of controls to manage the disposition of the
       term, probably radio buttons with the following meanings:
           Accepted as is.
           Rejected - needs re-recording.
           Unreviewed.

   In a file that no user has seen yet, all terms would have the
   initial value "unreviewed".  If the file has already been reviewed,
   some or all of the terms may have other status values.  The user can
   change the value as desired after listening to the recording.

   If a user rejects a term, an input box will appear in which to enter
   the reason the term should be re-recorded.

   After processing all of the terms that the user wishes to process,
   he or she can press one of two buttons:
       "Submit" - to send the information back to the server.
       "Cancel" - to end processing and discard any changes.

3. At the server end, when a submission is received, the server will
   process it as follows:

   If the Submit button was pressed, then:

       For every term in the file:
           Check whether the term is in the database table associated
           with this file (the term may also appear in the table
           associated with an earlier file).
           If the term is already there:
               Update the status and reason if required.
           Else:
               Insert a new row in the table for the term.

       The row contains columns for:
           Zip file ID
           Language
           Status
           Term
           Reason
           Userid of the user who set the status
           Datetime of the row creation

       If all of the terms in the file have been assigned an accepted
       or rejected status:
           If any of the terms have been rejected:
               Produce a spreadsheet to be sent to Vanessa with the
               information needed to re-record the rejected terms.
           Return a message to the user saying what was done:
               a. All terms were processed.
                  All were accepted.
               b. All terms were processed.
                  One or more were rejected.
                  A spreadsheet was produced.
                  Here is a link to the spreadsheet.

           Note: Each spreadsheet will include terms from only one
           original zip file.  If a batch of 500 terms appears in an
           original zip file and only one is rejected, a spreadsheet
           will be produced with just one row in it.  After Vanessa
           re-records the term, the zip file she sends back will be
           entered into the system and will appear as an item in the
           listing of zip files that can be selected for review.

       Else (there are still some unreviewed terms):
           Go back to step 2, reconstructing the input listing to allow
           the user to continue entering data.  The user can continue
           working, or Cancel and work on the zip file another time.


NEXT STEPS
----------

The next step is to review the above analysis and accept or amend it.
I won't do any more work on this task until the review is complete.

If it is approved as is, I agree with Bob that it will take a good 40
hours of work to implement.
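As an illustration only, the server-side processing in step 3, plus the file status shown in the step 1 menu, might look roughly like the following Python sketch. It assumes Python 3.9 or later. The names `Status`, `ReviewRow`, `process_submission`, `file_status`, and `rejection_report_csv` are hypothetical, an in-memory dict stands in for the database table, and CSV is assumed as the spreadsheet format because the analysis does not specify one.

```python
"""Hypothetical sketch of the step 3 server-side processing.

An in-memory dict stands in for the database table; every name here is
an illustrative assumption, not part of the design above.
"""
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Status(Enum):
    UNREVIEWED = "Unreviewed"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"


@dataclass
class ReviewRow:
    # Mirrors the columns listed in step 3.
    zip_id: int
    language: str
    status: Status
    term: str
    reason: str
    userid: str       # user who set the status
    created: datetime  # datetime of the row creation


# Stand-in for the database table, keyed by (zip_id, language, term).
Table = dict[tuple[int, str, str], ReviewRow]


def process_submission(table: Table, zip_id: int, userid: str,
                       decisions: list[tuple[str, str, Status, str]]) -> str:
    """Upsert one row per term, then report the outcome.

    `decisions` is assumed to hold every term in the zip file as
    (language, term, status, reason), not just the ones changed.
    """
    for language, term, status, reason in decisions:
        key = (zip_id, language, term)
        row = table.get(key)
        if row is None:
            # First time this term is seen for this file: insert a row.
            table[key] = ReviewRow(zip_id, language, status, term, reason,
                                   userid, datetime.now())
        elif row.status != status or row.reason != reason:
            # Term already recorded: update status and reason only if changed.
            row.status, row.reason, row.userid = status, reason, userid

    rows = [r for r in table.values() if r.zip_id == zip_id]
    if any(r.status is Status.UNREVIEWED for r in rows):
        return "incomplete"      # caller redisplays the step 2 listing
    if any(r.status is Status.REJECTED for r in rows):
        return "rejected"        # caller produces the spreadsheet
    return "all accepted"


def file_status(table: Table, zip_id: int) -> str:
    """Status of one zip file for the step 1 menu."""
    rows = [r for r in table.values() if r.zip_id == zip_id]
    if not rows or all(r.status is Status.UNREVIEWED for r in rows):
        return "Not yet started"
    if any(r.status is Status.UNREVIEWED for r in rows):
        return "Started but not yet completely reviewed"
    return "Completely reviewed"


def rejection_report_csv(table: Table, zip_id: int) -> str:
    """Rejected terms for one zip file, as CSV (format is an assumption)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Language", "Term", "Reason"])
    for r in table.values():
        if r.zip_id == zip_id and r.status is Status.REJECTED:
            writer.writerow([r.language, r.term, r.reason])
    return out.getvalue()
```

Passing every term on each Submit, rather than only the changed ones, lets the completeness check see the whole file. Keying rows by (zip file ID, language, term) keeps a term's row in a correction file separate from its row in an earlier file, which step 3 explicitly allows for.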