Software Support for Reviewing Term Pronunciation
Analysis
Draft 2

Alan Meyer
April 7, 2011

INTRODUCTION
------------

Draft 2 updates the analysis after it was discussed in our weekly CDR
status meeting. The update provides a record of what we said should be
done in this task. If we run into problems and need changes during the
implementation I'll try to update this document with another draft.

. . .

This document analyzes the requirements for software to support review
of the mp3 files containing term pronunciations in English and Spanish.
See Bugzilla issue 5020.

We assume the following:

    Zip files will periodically be sent to our ftp server by Vanessa.
    Each zip file will contain:

        A spreadsheet describing the contents.

        A collection of mp3 files, one file per term pronunciation.

    Users will review the mp3 files using the interface requested by
    William in his summary of Issue 5020.

    At the end of a review of a batch of mp3 files, a spreadsheet will
    be created and sent back to Vanessa showing the terms for which
    problems were found during review, together with short descriptions
    of the problems.

    For any such problem list, Vanessa will send back to us a zip file
    like the original with updated mp3 files for corrected
    pronunciations.

    The cycle will repeat if review of that file reveals more problem
    pronunciations.

More information about the whole process can be found in Bugzilla
issues 4926, 5013, and any future issue with "[Glossary Audio]" in the
title.

HOW THE PROGRAM SHOULD WORK
---------------------------

The program should work as follows:

1.  Prepare a menu of pronunciation zip files for review.

    The program would look somewhere where all of the zip files are
    stored and present a list of them to the user. The listing should
    show for each zip file:

        Name of the file.

        Date and time on the file.

        The status of the file:

            Completely reviewed (no longer selectable for editing).

            Started but not yet completely reviewed.

            Reviews have not yet started.

    The zip files will be displayed in reverse chronological order,
    newest at the top of the list, oldest at the bottom.

    A user will be able to select one of the files for review.

    Some of the files will be original files containing the first
    recordings for a batch of terms. Others may be batches of
    corrections. A batch of corrected terms will contain re-recordings
    of terms that were rejected in an earlier batch.

2.  Display a web page listing all information for each term in the zip
    file.

    The list will display everything that was in the spreadsheet with
    the exceptions below. Spreadsheet columns are defined in Bug #4926
    comment #54 and the attached spreadsheet, which also adds CDR ID to
    the list.

    The exceptions are:

        Columns G ("Approved") and H ("Notes (NCI)") will be handled
        specially. Neither column is used in original spreadsheets that
        Vanessa receives and then sends back to NCI.

        Those columns will be ignored and left off the page when
        constructing the display of data from a spreadsheet that has no
        column G data.

        When sending a spreadsheet of unapproved records back to
        Vanessa, Column G will contain the Notes from the NCI reviewer.
        When Vanessa returns it to NCI, column G will contain data and
        _will_ be displayed back to the NCI user in the web form.

    The web page will provide the following form controls:

        A link to the mp3 sound file:

            When clicked, the link will download the mp3 to whatever
            program the user has configured in his browser to play the
            file.

            It is conceivable that an error could occur. The link to
            the mp3 file might be missing in the spreadsheet. The mp3
            file might be corrupt and unplayable. The spreadsheet link
            could point to the pronunciation of something other than
            the listed term.

            The program won't know about some of these errors and, in
            any case, won't communicate errors to Vanessa. If any error
            occurs, the NCI user should mark the term as unapproved and
            make a note in column G for Vanessa to see.

        A collection of controls to manage the disposition of the term,
        probably radio buttons with the following meanings:

            Approved/Accepted as is.
            Unapproved/Rejected - needs re-recording.
            Unreviewed.

            In a file that no user has seen yet, all terms would have
            the initial value "unreviewed". If the file has already
            been reviewed, some or all of the terms may have other
            status values. The user can change the value as desired
            after listening to the recording.

            If a user rejects a term, an input box will appear to enter
            a reason why the term should be re-recorded.

    After processing all of the terms that the user wishes to process,
    he or she can press one of two buttons:

        "Submit" - to send the information back to the server.
        "Cancel" - to end processing and discard any changes.

3.  At the server end, when a submission is received the server will
    process it as follows:

    If the Submit button was pressed, then:

        For every term in the file:

            Check to see if the term is in a database table associated
            with this specific zip file (the term may also appear in
            the table in association with an earlier file).

            If the term is already there:
                Update the status and reason if required.
            Else:
                Insert a new row in the table for the term. The row
                contains all columns from the spreadsheet plus:

                    A code for the disposition of the review:
                        Approved
                        Unapproved
                        Not yet reviewed
                    Zip file ID - what zip file this came from.
                    Notes (reasons for rejecting a pronunciation) from
                    NCI. These may have been edited or replaced during
                    the session or could be old notes from a previous
                    review that were let stand.
                    Userid of the user who reviewed the term.
                    Datetime of the row creation.

        If all of the terms in the file have been assigned an accepted
        or rejected status, then:

            If any of the terms have been rejected:

                Produce a spreadsheet to be sent to Vanessa with
                information needed to re-record rejected terms.

                The spreadsheet only includes rejected terms, not
                accepted ones.

                The "Notes (NCI)" column (column G now) will contain
                any new info entered during the review.

            Return a message to the web page user saying what was done:

                a. All terms were processed. All were accepted.

                b. All terms were processed. One or more were rejected.
                   A spreadsheet was produced. Here is a link to the
                   spreadsheet.

                Note: One spreadsheet will only include terms from one
                original zip file. If a batch of 500 terms appears in
                an original zip file and only one is rejected, a
                spreadsheet will be produced with just one row in it.
                After Vanessa re-records the term, that spreadsheet
                will be entered into the system and appear as an item
                in the listing of zip files that can be selected for
                review.

            Close the source zip file. It is now no longer available
            for review.

        Else (there are still some unreviewed terms):

            Go back to step 2, reconstructing the input listing to
            allow the user to continue entering data.

            The user can continue working or Cancel and work on the zip
            file another time.

    If a user will be working on the page for a significant period of
    time, it would be wise for him or her to Submit periodically in
    order to save work done so far in case there is a crash or
    inadvertent closing of the browser.

NEXT STEPS
----------

The above analysis has been amended after review and is now assumed to
be correct. I am beginning the design and programming.
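
As a possible starting point for that design, here is a minimal Python
sketch of the status rules from step 1 and the end-of-submission rules
from step 3. It is illustrative only. The class TermReview, the
functions file_status and finish_submission, the single-letter
disposition codes, and the status strings are all assumptions made for
this sketch. None of them are taken from the CDR code or the database
design, and the sketch leaves out the database update and the actual
spreadsheet creation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Disposition codes. The single-letter values are assumptions.
APPROVED = "A"
REJECTED = "R"
UNREVIEWED = "U"


@dataclass
class TermReview:
    """One term from a zip file (hypothetical structure)."""
    term_name: str
    disposition: str = UNREVIEWED
    notes: Optional[str] = None  # reviewer's reason for a rejection


def file_status(terms: List[TermReview]) -> str:
    """Return the status shown for a zip file in the step 1 menu."""
    reviewed = sum(1 for t in terms if t.disposition != UNREVIEWED)
    if reviewed == 0:
        return "Unreviewed"
    if reviewed < len(terms):
        return "Started"
    return "Completed"


def finish_submission(
    terms: List[TermReview],
) -> Tuple[str, List[TermReview]]:
    """Apply the step 3 rules after the term rows have been saved.

    Returns the message for the user and the rejected terms that
    would become the rows of the spreadsheet for re-recording.
    """
    if file_status(terms) != "Completed":
        # Some terms are still unreviewed: the user goes back to step 2.
        return "Some terms are still unreviewed.", []
    rejected = [t for t in terms if t.disposition == REJECTED]
    if not rejected:
        return "All terms were processed. All were accepted.", []
    return ("All terms were processed. One or more were rejected. "
            "A spreadsheet was produced."), rejected
```

For example, a batch holding one approved and one rejected term is
reported as "Completed" by file_status, and finish_submission returns
the single rejected term as the only spreadsheet row. A batch with any
unreviewed term returns no rows and stays open for review.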