Issue Number | 3327 |
---|---|
Summary | [Glossary Audio] Web Interface for reviewing audio pronunciations |
Created | 2011-03-18 11:49:30 |
Issue Type | Improvement |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | alan |
Status | Closed |
Resolved | 2011-07-07 22:33:10 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107655 |
BZISSUE::5020
BZDATETIME::2011-03-18 11:49:30
BZCREATOR::William Osei-Poku
BZASSIGNEE::Alan Meyer
BZQACONTACT::William Osei-Poku
We discussed in yesterday's meeting that Bob will create a web
interface for reviewing completed audio pronunciations.
The web interface will:
1. Provide a link to the zip file on Bach. (The zip file will be
uploaded from the FTP site).
2. Provide CIAT access to a report that will contain the audio
files.
3. The report will also contain all the columns in the
spreadsheet.
4. CIAT will be able to click on an audio icon to listen to the
pronunciation in the browser.
5. CIAT will be able to approve and disapprove a pronunciation through
the web interface (desired).
6. CIAT will be able to generate a spreadsheet report listing the
disapproved pronunciations, to be sent to Vanessa for correction
(desired).
I hope I haven’t left out anything. If I did, please feel free to add or make corrections.
BZDATETIME::2011-03-18 12:51:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::1
Looks like we've picked up some substantial additional requirements for this task that weren't discussed in yesterday's meeting. I'll hold off on doing anything on this task until Robin or Margaret approve the additional work.
BZDATETIME::2011-04-01 10:54:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2
We agreed yesterday to re-assign this issue to Alan who will provide an analysis of the technical requirements for this enhancement. CIAT will also provide additional information to clarify what they want.
I have re-assigned the issue to Alan.
BZDATETIME::2011-04-05 12:02:55
BZCOMMENTOR::Alan Meyer
BZCOMMENT::3
I've attached a plain text document showing what we might want to do.
Attachment Bug5020Analysis.txt has been added with description: Analysis of what needs to be done.
BZDATETIME::2011-04-07 11:48:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4
(In reply to comment #3)
> Created attachment 2103 [details]
> Analysis of what needs to be done.
>
> I've attached a plain text document showing what we might want to do.
This looks good to me. I just have a few comments:
1. I think it is OK to keep the files that have been completely reviewed on the menu list.
2. The disposition controls should have a 'Y' and 'N' or 'Yes' and 'No' buttons since the disposition would be populated in the 'Approved?' column of the spreadsheet.
3. I think you should prevent a user from generating a spreadsheet if all of the terms in a file have not been reviewed. That is, they should all have dispositions set by a user before the spreadsheet can be generated. Two or more people will be doing the review and only one spreadsheet is to be generated so it will be good to allow the spreadsheet to be generated at the end of the review.
4. The spreadsheet has a column titled ‘Notes (Vanessa)’. The column should be displayed on the review page for reviewers to see when reviewing.
BZDATETIME::2011-04-07 23:06:25
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5
Here's a revision of the analysis based on the discussion in today's CDR status meeting.
I updated the document so that we (and especially I) have a record of just what I'm supposed to do.
Attachment Bug5020Analysis.txt has been added with description: Analysis of what needs to be done - draft 2.
BZDATETIME::2011-04-08 07:55:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::6
From the analysis document:
"If a user rejects a term, an input box will appear to enter a
reason why the term should be re-recorded."
You might want to get confirmation from the users that they will never want to enter a comment for a pronunciation marked as accepted (for example, "I know this sounds funny, but I called Felipe Calderón, and he confirmed that this is really how they say it").
BZDATETIME::2011-04-12 15:13:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::7
(In reply to comment #6)
> From the analysis document:
>
> "If a user rejects a term, an input box will appear to enter a
> reason why the term should be re-recorded."
>
> You might want to get confirmation from the users that they will never want to
> enter a comment for a pronunciation marked as accepted (for example, "I know
> this sounds funny, but I called Felipe Calderón, and he confirmed that this is
> really how they say it").
If I remember correctly, we decided that comments users enter in the spreadsheet should be comments meant for Vanessa. In view of that, if users want to enter comments as described above, it may be better to enter them in the CDR after the document is created, so the initial approach should be fine.
BZDATETIME::2011-04-22 00:26:36
BZCOMMENTOR::Alan Meyer
BZCOMMENT::8
Status report:
I've completed a lot of the code, mostly untested, but I think I'm on track to finish next week.
BZDATETIME::2011-04-27 00:06:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::9
I've completed everything in the program except sending out the output spreadsheet by email.
However it's only partially debugged. I've got more work to do.
The most recent bug I'm working on appears to be a failure to read a cell from the Excel spreadsheet that looks fine when viewed in Excel itself. It might be a bug in my code, or it might be a bug in our ExcelReader module. If Bob is available, I'd like to go over it with him on Thursday.
BZDATETIME::2011-05-04 00:17:59
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10
While Bugzilla was down during the last week we had a problem in
which the spreadsheet for Week_015 could not be loaded.
The cause appears to be an inability of ExcelReader to read one
of the data structures in the spreadsheet that varies from the
Excel documentation Bob used to develop ExcelReader.py.
Apparently, this variation is legal since other programs besides
Excel (Open Office and the xlrd Python module) can successfully
read the spreadsheet.
Rather than try to fix the problem (especially because we don't
have the required documentation for the Excel format), we have
decided to port the program from ExcelReader to xlrd. The parts
that depended on ExcelReader have now been re-written. It can
now read Week_015.
Concomitant with this change I made several other changes to the
program:
Sorted the file list.
The list of zip files was unsorted before - which doesn't do
any harm when the number of files is small, but as it grows,
it gets harder to find what needs to be done in the pile of
filenames. So I now sort them as follows:
Started files first, by filename.
Unreviewed files next, by filename.
Completed files last, by filename.
To make the filename sorting work I took the liberty of
renaming files to have 3 digit week numbers. So for example:
Week_09 is now Week_009.
I don't know if that will fit with Vanessa's workflow. If
not, we'll either change them back or I'll fix my program, or
do whatever we need to do.
Additional format checking.
I put in a check to be sure that the row of labels was
present as the top row of the spreadsheet. Unfortunately, at
least one spreadsheet (Week_016) is missing labels and failed
the check. I'm tempted to fix the spreadsheet rather than
remove the check from the program and accept various
different formats, but it's not hard to change the program
if that is wanted.
I also check that the first column (CDR ID) contains nothing
but numbers and that every row has a filename.
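The sort order and the three-digit renaming described above can be sketched as follows. This is only an illustration, not the code in GlossaryTermAudioReview.py: the status labels and the helper names `pad_week_number` and `sort_zip_list` are assumptions made for the example.

```python
import re

# Assumed status labels; the values the real program stores are not shown here.
STATUS_ORDER = {"Started": 0, "Unreviewed": 1, "Completed": 2}

def pad_week_number(filename, width=3):
    """Zero-pad the week number so names sort correctly as plain strings."""
    return re.sub(r"^(Week_)(\d+)",
                  lambda m: m.group(1) + m.group(2).zfill(width),
                  filename)

def sort_zip_list(entries):
    """Sort (filename, status) pairs: started first, then unreviewed,
    then completed, each group ordered by filename."""
    return sorted(entries, key=lambda e: (STATUS_ORDER[e[1]], e[0]))

files = [
    ("Week_016.zip", "Completed"),
    ("Week_017.zip", "Started"),
    ("Week_015_Rev1.zip", "Unreviewed"),
    (pad_week_number("Week_09.zip"), "Completed"),
]
for name, status in sort_zip_list(files):
    print(status, name)
```

Padding the number is what lets a plain string sort put Week_009 ahead of Week_016 without parsing the week out of each name.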
The changes are on Mahler only. I think it needs renewed testing
there because of the amount of change. I have copied the
Week_015 file that would not load on Bach to Mahler so that it
can be used for testing.
I have reset all of the test results to unreviewed, to enable
testing.
BZDATETIME::2011-05-04 10:33:10
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11
I have added the email exchanges for this issue as an attachment
Attachment Re Glossary term audio review is ready for testing.txt has been added with description: offline communication
BZDATETIME::2011-05-05 11:35:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
(In reply to comment #10)
> - Sorted the file list.
>
> The list of zip files was unsorted before - which doesn't do
> any harm when the number of files is small, but as it grows,
> it gets harder to find what needs to be done in the pile of
> filenames. So I now sort them as follows:
>
> Started files first, by filename.
> Unreviewed files next, by filename.
> Completed files last, by filename.
>
This is verified on Mahler.
> To make the filename sorting work I took the liberty of
> renaming files to have 3 digit week numbers. So for example:
>
> Week_09 is now Week_009.
>
> I don't know if that will fit with Vanessa's workflow. If
> not, we'll either change them back or I'll fix my program, or
> do whatever we need to do.
>
This is fine with me, but we need to check with Vanessa that it
will not disrupt her workflow, since she has started sending the
files over in the old format.
> - Additional format checking.
>
> I put in a check to be sure that the row of labels was
> present as the top row of the spreadsheet. Unfortunately, at
> least one spreadsheet (Week_016) is missing labels and failed
> the check. I'm tempted to fix the spreadsheet rather than
> remove the check from the program and accept various
> different formats, but it's not hard to change the program
> if that is wanted.
>
> I also check that the first column (CDR ID) contains nothing
> but numbers and that every row has a filename.
>
> The changes are on Mahler only. I think it needs renewed testing
> there because of the amount of change. I have copied the
> Week_015 file that would not load on Bach to Mahler so that it
> can be used for testing.
>
> I have reset all of the test results to unreviewed, to enable
> testing.
This is also good. Will all the error messages for missing labels read like the existing one: 'Expected first line to begin with "CDR ID", but got "44135":'?
I wanted to mention that we decided in last week's meeting you will populate the table with all the files prior to Week 17. CIAT will begin using the tool for Week 17.
Besides the questions/comments above, I think this is good to be promoted to Bach.
BZDATETIME::2011-05-05 14:45:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::13
Alan, I was wondering why some of the file names in the list of ZIP files are displayed with the apostrophe character prefixed (i.e. "'Week_017.zip" but "Week_016.zip")?
BZDATETIME::2011-05-05 15:25:06
BZCOMMENTOR::Alan Meyer
BZCOMMENT::14
(In reply to comment #13)
> Alan, I was wondering why some of the file names in the list of ZIP files are
> displayed with the apostrophe character prefixed (i.e. "'Week_017.zip" but
> "Week_016.zip")?
A stray quote in a display string. Now fixed on Mahler.
Thanks.
BZDATETIME::2011-05-05 23:16:18
BZCOMMENTOR::Alan Meyer
BZCOMMENT::15
I have implemented the two changes that we discussed today -
supporting spreadsheets with and without a row of column header
labels, and allowing missing file names (I actually put in the
word "MISSING!" in the cell) without failing.
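A minimal sketch of the two relaxations described above, assuming the rows have already been read into plain Python lists. The function name `normalize_rows` and the position of the file name column (`filename_col`) are hypothetical; only the "CDR ID" label and the "MISSING!" placeholder come from this thread.

```python
MISSING = "MISSING!"

def normalize_rows(rows, filename_col=1):
    """Drop an optional header row and fill in blank file names.

    rows is a list of lists of cell values with the CDR ID in column 0.
    filename_col is an assumed position, not the real sheet layout.
    """
    data = [list(r) for r in rows]
    # Treat the first row as a header only if it starts with the label.
    if data and data[0] and str(data[0][0]).strip().upper() == "CDR ID":
        data = data[1:]
    for row in data:
        while len(row) <= filename_col:   # short row: pad with empty cells
            row.append("")
        if not str(row[filename_col]).strip():
            row[filename_col] = MISSING
    return data

with_header = [["CDR ID", "Filename"],
               [44135, "Week_016/44135_en.mp3"],
               [44136, ""]]
without_header = with_header[1:]
print(normalize_rows(with_header) == normalize_rows(without_header))
```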
I spent a lot of time testing these changes and doing general
testing of some of the internals. I think everything is good.
I took the liberty of copying the latest program to Bach. What
we had there isn't working anyway because of the column header
problem.
I also reset the tables on Mahler to enable more testing there if
desired. I'm hoping that extensive testing is not required but
it might not be a bad idea for William to just try a few things
to satisfy himself that nothing obvious is broken.
I have not touched any of the data on Bach, though there isn't
any to speak of.
If and when requested, I can fill in all of the historical data
from Week_009 to whenever, marking everything Approved. I won't
touch anything until so directed.
BZDATETIME::2011-05-06 10:16:43
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::16
There is a problem with the report. When the save button is clicked it generates the following error message (and a bunch of other text that is not discernible)
"Error updating mp3 row for zipId=%d, cdrId=%d:
Exception Type:"
And no spreadsheet is generated when the review is complete.
BZDATETIME::2011-05-06 10:20:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::17
I received an error message, too, when clicking one row:
Expecting CDR ID integer on row=34, got "":
-> ->
<type 'exceptions.UnboundLocalError'> Python 2.7.1:
D:\Python\python.exe
Fri May 06 10:17:19 2011
A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they
occurred.
D:\Inetpub\wwwroot\cgi-bin\cdr\GlossaryTermAudioReview.py in ()
1186 # Specific zipfile display requested by name
1187 elif zipName is not None:
=> 1188 zipId = installZipFile(zipName)
1189 showZipfile(zipId, session)
1190 # By ID
zipId = None, installZipFile = <function installZipFile>, zipName
= 'Week_015_Rev1.zip'
D:\Inetpub\wwwroot\cgi-bin\cdr\GlossaryTermAudioReview.py in
installZipFile(zipName='Week_015_Rev1.zip')
499 finally:
500 # If we processed all rows successfully, commit both tables
=> 501 if done:
502 conn.commit()
503 else:
done undefined
<type 'exceptions.UnboundLocalError'>: local variable 'done'
referenced before assignment
args = ("local variable 'done' referenced before assignment",)
message = "local variable 'done' referenced before assignment"
I believe what was happening is that I tried to access a file that William had accessed at the same time and possibly finished reviewing.
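The traceback shows a general Python pitfall: a flag that is first assigned inside a `try` block is still unbound when the `finally` clause runs if an exception was raised earlier in the block, and the resulting UnboundLocalError hides the original error. The sketch below reproduces that pattern and one common remedy. The function names and the `actions` list are illustrative only; this is not the code of installZipFile.

```python
def install_buggy(rows, actions):
    """Reproduces the pattern: 'done' is only bound if the loop finishes."""
    try:
        for row in rows:
            int(row[0])            # raises ValueError on a blank CDR ID
        done = True
    finally:
        if done:                   # UnboundLocalError if the loop raised
            actions.append("commit")
        else:
            actions.append("rollback")

def install_fixed(rows, actions):
    """Same logic with the flag initialized before the try block."""
    done = False
    try:
        for row in rows:
            int(row[0])
        done = True
    finally:
        actions.append("commit" if done else "rollback")

actions = []
try:
    install_fixed([["44135"], [""]], actions)
except ValueError as err:
    print("original error is preserved:", err)
print(actions)
```

With the flag initialized up front, the caller sees the real data error and the rollback branch runs.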
BZDATETIME::2011-05-06 10:21:30
BZCOMMENTOR::Volker Englisch
BZCOMMENT::18
By the way, where is the 'Audio Review' menu option on MAHLER and FRANCK?
BZDATETIME::2011-05-06 10:23:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::19
(In reply to comment #18)
> By the way, where is the 'Audio Review' menu option on MAHLER and FRANCK?
On Mahler, it is in the list right after you click on "3.Developers/System Administrators"
I am not sure if it is on Franck yet.
BZDATETIME::2011-05-06 10:33:58
BZCOMMENTOR::Volker Englisch
BZCOMMENT::20
(In reply to comment #19)
> On Mahler, it is in the list right after you click on "3.Developers/System
> Administrators"
Is the plan to actually keep it in different places on different
systems?
This would be a first, wouldn't it?
BZDATETIME::2011-05-06 10:45:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::21
I believe Alan wrote (in an offline message) that he was putting it there temporarily until William decided where he wanted it permanently. He just hasn't posted that message in Bugzilla yet.
BZDATETIME::2011-05-06 11:10:14
BZCOMMENTOR::Alan Meyer
BZCOMMENT::22
(In reply to comment #16)
> There is a problem with the report. When the save button is clicked it
> generates the following error message (and a bunch of other text that is not
> discernible)
>
> "Error updating mp3 row for zipId=%d, cdrId=%d:
> Exception Type:"
>
> And no spreadsheet is generated when the review is complete.
I've fixed the error message display. Now I need to find and fix
the underlying error.
Can you tell me which server and which zipfile you were processing
when this occurred?
Thanks.
BZDATETIME::2011-05-06 11:11:55
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::23
(In reply to comment #22)
> Can you tell me which server and which zipfile you were processing
> when this occurred?
>
> Thanks.
On Bach and "Week_017.zip"
BZDATETIME::2011-05-06 12:00:25
BZCOMMENTOR::Alan Meyer
BZCOMMENT::24
(In reply to comment #17)
> I received an error message, too, when clicking one row:
>
> Expecting CDR ID integer on row=34, got "":
Can you tell me which spreadsheet that was?
> <type 'exceptions.UnboundLocalError'> Python 2.7.1: D:\Python\python.exe
Fixed on Mahler.
> I believe what was happening is that I tried to access a file that William had
> accessed at the same time and possibly completed reviewing.
I'm not sure what happens in such cases. I think a lot depends on the order of
events between the two users. I can implement a lock mechanism, but that has
dangers of its own.
BZDATETIME::2011-05-06 12:08:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::25
(In reply to comment #24)
> (In reply to comment #17)
> > I received an error message, too, when clicking one row:
> >
> > Expecting CDR ID integer on row=34, got "":
>
> Can you tell me which spreadsheet that was?
>
> > <type 'exceptions.UnboundLocalError'> Python 2.7.1: D:\Python\python.exe
>
> Fixed on Mahler.
>
Verified on Mahler. Please promote to Bach.
> > I believe what was happening is that I tried to access a file that William had
> > accessed at the same time and possibly completed reviewing.
>
> I'm not sure what happens in such cases. I think a lot depends on the order of
> events between the two users. I can implement a lock mechanism, but that has
> dangers of its own.
I am still getting the same error on Bach so it may not be because of the concurrent access to the same file.
BZDATETIME::2011-05-06 12:22:55
BZCOMMENTOR::Volker Englisch
BZCOMMENT::26
(In reply to comment #24)
> Can you tell me which spreadsheet that was?
It was Week_015_Rev1.zip
BZDATETIME::2011-05-06 12:59:36
BZCOMMENTOR::Alan Meyer
BZCOMMENT::27
(In reply to comment #26)
> (In reply to comment #24)
> > Can you tell me which spreadsheet that was?
>
> It was Week_015_Rev1.zip
I'm going to make some manipulations of the data there to try to find the bug. I'd like everyone to stay out of that particular spreadsheet for a while.
BZDATETIME::2011-05-06 13:44:49
BZCOMMENTOR::Alan Meyer
BZCOMMENT::28
Well, assumptions are falling like dominoes.
I assumed that the last line of a spreadsheet would have data, and
that when I got no more data, I was at the end. But Week_015_Rev1
has a blank line, with no CDR ID, as the last row of the
spreadsheet. So my program bailed out with what it thought was a
data error.
I fixed that in two ways:
1. Changed handling of blank CDR ID cells.
Instead of declaring an error, I log it to the debug log and
continue on, ignoring it in the program.
If we get a spreadsheet that has data in all the other cells
on the row except the CDR ID, that row will be logged but
ignored.
I think it should not happen that a term without a CDR ID
will appear. If it does, there isn't much we can do with it
anyway.
If all of the other alligators stop biting long enough, I may
do more with the warning.
2. I included the creation of the database entry for the zip
file and for all of the mp3 files within the zip file in a
single transaction.
If anything goes wrong loading the zip file, the database
should be completely unaffected.
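Both fixes can be illustrated together. The sketch below uses Python's standard `sqlite3` module as a stand-in for the production SQL Server connection, which is an assumption made only so the example runs anywhere. The table names come from this thread; the `mp3_name` column and the `load_zipfile` helper are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("audio_review_sketch")

def load_zipfile(conn, zip_name, rows):
    """Store one zip file and its mp3 rows in a single transaction.

    rows is an iterable of (cdr_id, mp3_name) pairs read from the sheet.
    Rows with a blank CDR ID are logged and skipped; any other failure
    rolls back everything, including the zipfile row itself.
    """
    with conn:  # sqlite3: commit on success, roll back if an exception escapes
        cur = conn.execute(
            "INSERT INTO term_audio_zipfile (filename) VALUES (?)", (zip_name,))
        zip_id = cur.lastrowid
        for line_no, (cdr_id, mp3_name) in enumerate(rows, start=1):
            if cdr_id is None or str(cdr_id).strip() == "":
                log.debug("%s row %d: blank CDR ID, row ignored",
                          zip_name, line_no)
                continue
            conn.execute(
                "INSERT INTO term_audio_mp3 (zipfile_id, cdr_id, mp3_name) "
                "VALUES (?, ?, ?)", (zip_id, int(cdr_id), mp3_name))
    return zip_id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term_audio_zipfile (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute(
    "CREATE TABLE term_audio_mp3 (zipfile_id INTEGER, cdr_id INTEGER, mp3_name TEXT)")

# A trailing blank row, as in Week_015_Rev1, is skipped rather than fatal.
load_zipfile(conn, "Week_015_Rev1.zip",
             [(44135, "Week_015_Rev1/44135_en.mp3"), ("", "")])
```

Because the zipfile row and the mp3 rows share one transaction, a failure part way through leaves neither table with partial data.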
Another assumption that fell was that there would be no data in
the "Approval" column of the spreadsheet. We had thought that
would always be empty. If that column is not empty, it's a
reviewer note. If it is empty then the reviewer note is in the
next column to the right.
In Week_015_Rev1 both the Approval column and the Notes (NCI)
column are filled in, e.g., "N" / "No accent over the final a".
I don't know if my fix is safe or not. What I did is this:
If there's any data in column H (Notes (NCI)), treat it as
the reviewer note.
If there's no data in column H but there is data in column G
(Approval in some spreadsheets, Notes (NCI) in others), use
that as the reviewer note.
Is that safe?
Probably not too safe. There are easy ways to break it. But I
can't think of anything better that doesn't involve making more
assumptions that can be broken, or using even trickier
heuristics. I decided to adopt this solution in spite of its
fragility because it will not lead to any permanent harm to our
CDR database. The worst it can do, I think, is to cause an
original reviewer note to disappear when the next reviewer looks
at the record. The rejection will still be there.
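The column G / column H fallback can be written as a small function. This is a sketch under the assumption that each spreadsheet row arrives as a list of cell values with column A at index 0; the name `pick_reviewer_note` is hypothetical.

```python
COL_G, COL_H = 6, 7   # zero-based indexes of spreadsheet columns G and H

def pick_reviewer_note(row):
    """Return the reviewer note for one spreadsheet row, or None.

    Prefer column H (Notes (NCI)); fall back to column G, which holds
    Approval in some sheets and Notes (NCI) in others.
    """
    def cell(index):
        if index >= len(row) or row[index] is None:
            return ""
        return str(row[index]).strip()

    return cell(COL_H) or cell(COL_G) or None

# Layout seen in Week_015_Rev1: approval flag in G, note in H.
both = ["44135", "", "", "", "", "", "N", "No accent over the final a"]
# Layout where G itself carries the note and H is empty.
g_only = ["44136", "", "", "", "", "", "Re-record, too fast", ""]
print(pick_reviewer_note(both))
print(pick_reviewer_note(g_only))
```

As noted above, the heuristic is fragile: a sheet that has only an approval flag in G and nothing in H would have that flag stored as the note.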
As we've all known for many years, writing software that parses
data where the input process is uncontrolled and unvalidated, is
always prone to these kinds of errors. I guess we just have to
do our best.
Fixes are still only on Mahler. I'll move them soon.
BZDATETIME::2011-05-06 16:07:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::29
(In reply to comment #23)
> (In reply to comment #22)
> > Can you tell me which server and which zipfile you were processing
> > when this occurred?
> >
> > Thanks.
>
> On Bach and "Week_017.zip"
I fixed the bug in the error reporting, copied the program to
Bach, and tried again. A full error message came out,
complaining of a Unicode error in the reviewer note string for
CDR ID 44270, Spanish. There was garbage in the
reviewer_note field.
I checked the spreadsheet and didn't see any garbage in the
field.
I checked the database and didn't see any garbage in the
corresponding database row/column.
I reset the row for that zip file in the database table to
indicate that it was not yet complete and called it up again.
This time everything was fine. I tried several more times but
couldn't reproduce the error.
This spreadsheet row is different from all of the others in that
it had a note from Vanessa saying that there was "No Spanish
Pronunciation*" for "Physician Data Query".
I searched the code looking for places that read or update the
notes from Vanessa, and the notes from the reviewer, but nothing
stood out as wrong. If there's nothing wrong in the data and
nothing wrong in the database, it stands to reason that something
is wrong in the program, but I didn't see anything.
At this point, I'm giving up looking unless and until it happens
again. I've reset the flag on Week_017 to not yet complete, and
reset the approval on that particular record to unreviewed, so
you can try it again.
BZDATETIME::2011-05-06 16:12:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::30
I have attached the file with the Unicode characters. I copied it from the comment boxes of 44270 and 44286. I was able to generate the spreadsheet after removing them. Sorry, I didn't know you were still working on this; I was going to attach the documents later. I will attach the other file shortly.
Attachment 44270.txt has been added with description: unicode characters
BZDATETIME::2011-05-06 16:15:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::31
Here is the second file. Linda has already left the office. I will ask her on Monday exactly what comments were in there (if she can remember). That might provide some clues as to why the Unicode characters got in there.
Attachment 44286.txt has been added with description: unicode characters1
BZDATETIME::2011-05-06 16:17:54
BZCOMMENTOR::Alan Meyer
BZCOMMENT::32
(In reply to comment #21)
> I believe Alan wrote (in an offline message) that he was putting it there
> temporarily until William decided where he wanted it permanently. He just
> hasn't posted that message in Bugzilla yet.
Right.
I've now updated Mahler and put the update in Subversion, and I've
removed the menu entry from the developer / sys admin menu.
BZDATETIME::2011-05-06 16:21:15
BZCOMMENTOR::Alan Meyer
BZCOMMENT::33
(In reply to comment #30)
> I have attached the file with the unicode characters. I copied it from the
> comments boxes of 44270 and 44286. I was able to generate the spreadsheet after
> removing them. Sorry, I didn't know you were still working on this, I was going
> to attach the documents later. I will attach the other file shortly.
I think the fault was mine. I think you told me you were doing some things
on Bach and I should have coordinated with you.
Sorry.
My program should be able to handle Unicode, but my sense of the error message was that what it was finding in that one cell wasn't correct Unicode.
Well, maybe it's straightened out now.
BZDATETIME::2011-05-11 10:35:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::34
We are unable to listen to the files in "Week_017Rev1.zip". When we click on the .mp3 file, we get the following message:
Failed to read mp3 file named 'Week_017/44272_en.mp3':
Exception Type:
Exception msg: "There is no item named u'Week_017/44272_en.mp3' in the archive"
BZDATETIME::2011-05-11 10:46:51
BZCOMMENTOR::Volker Englisch
BZCOMMENT::35
The error message is correct.
Vanessa has made a mistake and has not modified the entries in the
spreadsheet to match the file names. I believe she had used a file from
Linda instead of using the file that was created by the Alan-tool. The
entries in the file name are named
Week_017/1234_es.mp3
She has provided an Excel file name
Week_017_Rev1 LS.xls (which should have been Week_017_Rev1.xls)
and a directory name
Week_017_Rev1.
Therefore the file names listed in the spreadsheet should have been named
Week_017_Rev1/1234.mp3
I will notify Vanessa to resubmit a corrected Week_017_Rev1 file.
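A mismatch like this one could be caught when the zip file is first loaded, by comparing the names listed in the spreadsheet with the archive's own member list. The sketch below uses only the standard `zipfile` module; the helper name `missing_members` and the in-memory test archive are illustrative, not part of the review tool.

```python
import io
import zipfile

def missing_members(zip_bytes, listed_names):
    """Return the spreadsheet file names that are not in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        members = set(zf.namelist())
    return [name for name in listed_names if name not in members]

# Build a tiny archive whose directory is Week_017_Rev1 ...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Week_017_Rev1/44272_en.mp3", b"")

# ... while the spreadsheet still points at Week_017.
listed = ["Week_017/44272_en.mp3"]
print(missing_members(buf.getvalue(), listed))
```

Running such a check at load time would report every bad name at once, instead of one error per click on an mp3 link.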
BZDATETIME::2011-05-11 11:01:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::36
(In reply to comment #35)
...
> I will notify Vanessa to resubmit a corrected Week_017_Rev1 file.
I'm going to clear out the existing database record for that file
in expectation that we'll get a complete new zip file.
It's currently showing 8 unreviewed files.
BZDATETIME::2011-05-11 11:11:48
BZCOMMENTOR::Volker Englisch
BZCOMMENT::37
Yes, that's a good idea. That latest zip file contains 8 files all pointing to Week_017 instead of Week_017_Rev1.
BZDATETIME::2011-05-11 11:19:16
BZCOMMENTOR::Alan Meyer
BZCOMMENT::38
(In reply to comment #36)
> I'm going to clear out the existing database record for that file
> in expectation that we'll get a complete new zip file.
Done.
The way the program works is that it only reads a spreadsheet once, loading the info from the sheet into the database. From then on, it uses what's in the database (on the theory that any reviews that are stored were based on that original spreadsheet.)
If we have a lot of problems of this type then it might be desirable for me to make some greater provision for this (Bob, I think I've already exceeded the 40 hours you predicted.)
If this happens when I'm away and it's necessary to reset a spreadsheet, the instructions are:
Log in to the cdr database with a SQL Server client. Then:
SELECT id FROM term_audio_zipfile WHERE filename = '{name of the zipfile}'
DELETE FROM term_audio_mp3 WHERE zipfile_id = {id from last query}
DELETE FROM term_audio_zipfile WHERE id = {id from last query}
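For anyone who would rather script the reset than type the statements by hand, the three steps can be wrapped in one transaction. The sketch below uses the standard `sqlite3` module purely as a runnable stand-in; the production database is SQL Server, so the connection call and the `?` parameter style are assumptions that would need adapting. The helper name `reset_zipfile` is hypothetical.

```python
import sqlite3

def reset_zipfile(conn, zip_name):
    """Delete a zip file's rows so its spreadsheet is re-read next time.

    Returns True if a matching zipfile row was found and removed.
    """
    with conn:  # both deletes succeed together or not at all
        row = conn.execute(
            "SELECT id FROM term_audio_zipfile WHERE filename = ?",
            (zip_name,)).fetchone()
        if row is None:
            return False
        zip_id = row[0]
        # Child rows first, then the parent row.
        conn.execute("DELETE FROM term_audio_mp3 WHERE zipfile_id = ?", (zip_id,))
        conn.execute("DELETE FROM term_audio_zipfile WHERE id = ?", (zip_id,))
    return True

# Stand-in tables with just enough columns for the demonstration.
conn = sqlite3.connect(":memory:")
with conn:
    conn.execute(
        "CREATE TABLE term_audio_zipfile (id INTEGER PRIMARY KEY, filename TEXT)")
    conn.execute("CREATE TABLE term_audio_mp3 (zipfile_id INTEGER, cdr_id INTEGER)")
    conn.execute(
        "INSERT INTO term_audio_zipfile (id, filename) VALUES (1, 'Week_017_Rev1.zip')")
    conn.execute("INSERT INTO term_audio_mp3 VALUES (1, 44272)")

print(reset_zipfile(conn, "Week_017_Rev1.zip"))
```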
BZDATETIME::2011-05-16 15:05:05
BZCOMMENTOR::Alan Meyer
BZCOMMENT::39
As we have previously discussed, I have updated the term audio
tables on Bach as follows:
For each initial zip file in Weeks 012, 015, 016:
Set the initial values of all review_status to 'A'
(approved).
For each review file in each of the above series:
Set the initial values of all review_status to 'A'
(approved).
For each entry in the file matching an entry in the
previous file (same cdr_id, same language):
Set the review_status of each corresponding entry in
the preceding file to 'R' (rejected). This overrides
the 'A' that was put in in a previous step.
Set the reviewer_note of each corresponding entry in
the preceding file equal to the value found in the
review file. This overrides the NULL value that was
put in when the table was loaded from the zip file.
I have not done anything with the reviewer_id or the date
reviewed. They have whatever values they had when the tables
were first loaded. Some have my userid.
If anyone wants that changed, please post the rules I should
follow.
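The backfill rules above amount to a short algorithm. The sketch below applies them to in-memory dictionaries rather than to the real tables, so it only illustrates the logic. The keys `cdr_id`, `review_status` and `reviewer_note` follow the column names mentioned in this comment; `language` and `sheet_note` are hypothetical names for the language and for the note found in the review file.

```python
def backfill_series(series):
    """Apply the historical-data rules to one series of files.

    series is a list of files in order (initial file, Rev1, Rev2, ...);
    each file is a list of dicts describing one mp3 entry.
    """
    # Step 1: everything starts out approved.
    for entries in series:
        for entry in entries:
            entry["review_status"] = "A"
    # Step 2: an entry that reappears in the next file must have been
    # rejected in the preceding one, so mark it and copy the note back.
    for previous, current in zip(series, series[1:]):
        index = {(e["cdr_id"], e["language"]): e for e in previous}
        for entry in current:
            match = index.get((entry["cdr_id"], entry["language"]))
            if match is not None:
                match["review_status"] = "R"
                match["reviewer_note"] = entry.get("sheet_note")

initial = [
    {"cdr_id": 44135, "language": "en", "reviewer_note": None},
    {"cdr_id": 44136, "language": "es", "reviewer_note": None},
]
rev1 = [
    {"cdr_id": 44136, "language": "es", "reviewer_note": None,
     "sheet_note": "No accent over the final a"},
]
backfill_series([initial, rev1])
for entry in initial:
    print(entry["cdr_id"], entry["review_status"], entry["reviewer_note"])
```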
BZDATETIME::2011-05-16 16:16:23
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::40
I reviewed the files (on the initial review page) and they all look good. They are marked as "Completed". However, there is one, "Week_17_Rev1.zip" which has a review status of "Started". It looks like it is the one that had errors and for which Vanessa submitted a replacement file. Could you give it a different status? Or should it be completely removed?
Also, in one of our meetings we talked about allowing multiple users the ability to review the files. Is that still possible? I guess this may be difficult to implement since users don't want the files sorted by Language? At this point, I have told users that only one of them can review at a time and they coordinate who needs to review at a certain time or another but sometimes that becomes a challenge.
BZDATETIME::2011-05-16 16:43:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::41
(In reply to comment #40)
> I reviewed the files (on the initial review page) and they all look good. They
> are marked as "Completed". However, there is one, "Week_17_Rev1.zip" which has
> a review status of "Started". It looks like it is the one that had errors and
> for which Vanessa submitted a replacement file. Could you give it a different
> status? Or should it be completely removed?
I think the best thing to do would be to move that file out of the directory and delete it from the database. I can do that for you but I'll wait for you to say so before I do it. It will be as if it were never there.
If that's the right thing to do, let me know before tomorrow afternoon to be sure I do it before I leave on vacation on Wednesday.
> Also, in one of our meetings we talked about allowing multiple users the
> ability to review the files. Is that still possible? I guess this may be
> difficult to implement since users don't want the files sorted by Language? At
> this point, I have told users that only one of them can review at a time and
> they coordinate who needs to review at a certain time or another but sometimes
> that becomes a challenge.
Right now the software won't support two users reviewing at the same time. When one clicks the Save button, his view of the review status and notes will overwrite whatever is there, clobbering any Save by the other reviewer that happened after the first one opened the file. The behavior is like Bugzilla, but without the mid-air collision detection and resolution. In my software, the first airplane crashes and burns and the second one flies on as if nothing happened.
Changing that would be pretty hard, for the same reasons that we don't allow two users to edit one CDR document at the same time.
I could set up a locking mechanism to alert someone if another user had called up the interface. It would require that a user press a button when he finished working. But it would involve some trickiness. We might need a way to break locks in case someone goes home or exits the system without clicking the button.
Alternatively, I could set up a Bugzilla style collision detection. It wouldn't enable two users to work at once or even prevent them from doing it, but would at least alert someone that he is going to clobber changes that were made since he opened the files.
I'd prefer to leave it as is and have the reviewers coordinate their access with each other - unless this is important for productivity. I might also not be able to implement anything complicated before I leave on vacation on Wednesday.
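For reference, the Bugzilla style collision detection mentioned above is usually done with a version counter: each reviewer remembers the version seen when the file was opened, and a Save is refused if the stored version has moved on since then. The sketch below shows the idea with an in-memory object; the class and method names are hypothetical and nothing like this exists in the current program.

```python
class StaleSaveError(Exception):
    """Raised when someone else has saved since this reviewer opened the file."""

class ReviewStore:
    """In-memory stand-in for one zip file's review data."""

    def __init__(self):
        self.version = 0
        self.reviews = {}          # cdr_id -> review status

    def open(self):
        """Return the current version and a private copy of the reviews."""
        return self.version, dict(self.reviews)

    def save(self, opened_version, changes):
        """Apply changes only if nobody has saved since opened_version."""
        if opened_version != self.version:
            raise StaleSaveError(
                "file changed since it was opened; reload before saving")
        self.reviews.update(changes)
        self.version += 1

store = ReviewStore()
first_version, _ = store.open()
second_version, _ = store.open()     # a second reviewer opens the same file
store.save(first_version, {44270: "A"})
try:
    store.save(second_version, {44270: "R"})
except StaleSaveError as err:
    print("second save refused:", err)
```

This does not let two people work at once, but it turns a silent overwrite into a visible warning.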
BZDATETIME::2011-05-16 17:31:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::42
(In reply to comment #41)
> I think the best thing to do would be to move that file out of the directory
> and delete it from the database. I can do that for you but I'll wait for you
> to say so before I do it. It will be as if it were never there.
>
> If that's the right thing to do, let me know before tomorrow afternoon to be
> sure I do it before I leave on vacation on Wednesday.
>
I am OK with that. Please proceed as you suggested above.
> Right now the software won't support two users reviewing at the same time.
> I'd prefer to leave it as is and have the reviewers coordinate their access
> with each other - unless this is important for productivity. I might also not
> be able to implement anything complicated before I leave on vacation on
> Wednesday.
Sure. Let's keep things as they are now. Thanks!
BZDATETIME::2011-05-16 17:42:13
BZCOMMENTOR::Alan Meyer
BZCOMMENT::43
(In reply to comment #42)
> (In reply to comment #41)
> > I think the best thing to do would be to move that file out of the directory
> > and delete it from the database. I can do that for you but I'll wait for you
> > to say so before I do it. It will be as if it were never there.
> >
> > If that's the right thing to do, let me know before tomorrow afternoon to be
> > sure I do it before I leave on vacation on Wednesday.
> >
>
> I am OK with that. Please proceed as you suggested above.
Done.
I moved the file to the QuestionableNames directory and removed the data from the database.
BZDATETIME::2011-05-18 13:08:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::44
(In reply to comment #43)
> (In reply to comment #42)
> > (In reply to comment #41)
> > > I think the best thing to do would be to move that file out of the directory
> > > and delete it from the database. I can do that for you but I'll wait for you
> > > to say so before I do it. It will be as if it were never there.
> > >
> > > If that's the right thing to do, let me know before tomorrow afternoon to be
> > > sure I do it before I leave on vacation on Wednesday.
> > >
> >
> > I am OK with that. Please proceed as you suggested above.
>
> Done.
>
> I moved the file to the QuestionableNames directory and removed the data from
> the database.
Verified on Bach. Marked as resolved.
BZDATETIME::2011-05-18 13:09:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::45
Closing Issue. Thank you!
BZDATETIME::2011-06-09 14:41:31
BZCOMMENTOR::Alan Meyer
BZCOMMENT::46
I'm re-opening this task in light of the problems we've had with
Week 21.
Studying the database and the spreadsheets, I see two problems.
1. The directory name portion of the mp3 filename is generated
incorrectly by my program.
Where I should have generated:
"Week_21_Rev2/12345_en.mp3"
^^^
I actually generated:
"Week_21_Rev1/12345_en.mp3"
^^^
In other words, my program generated Rev1 as an internal name
even though it correctly generated Rev2 as the name of the
output spreadsheet.
This bug did NOT cause the error that users saw when they
tried to open the Rev2 zip file. It should be fixed (and I
think I've fixed it), but I'm not sure it did any harm since
the "directory" part of the mp3 filename is an insignificant
artifact on our system. No actual directory is created. The
program works fine no matter what directory name is
generated.
It might have caused problems for Vanessa however. I don't
know enough about her workflow to say.
I found the cause of the bug and have written a fix. I'm
going to do some more testing before I promote it to Bach.
2. There is an off-by-one row mismatch in the filename column of
both the Rev1 and Rev2 spreadsheets.
The row for CDR ID = 44645 has a filename referencing 44648.
We see the following:
44645 : English : Week_021_Rev1/44648_en.mp3
44648 : English : Week_021_Rev1/44648_es.mp3
44648 : Spanish : Week_021_Rev1/44657_es.mp3
44657 : Spanish : Week_021_Rev1/44670_es.mp3
...
44854 : English : Week_021_Rev1/44856_en.mp3
44856 : English : Week_021_Rev1/_es.mp3
It turns out that this is NOT caused by a bug in the audio
review program. The spreadsheets named "Week_021_Rev1.xls"
and "Week_021_Rev2.xls" were not generated by my program and
do not match the ones generated by my program. I save my
spreadsheets after generating them, and the ones I saved
don't have this problem.
The munged spreadsheets contain a macro in column E, the mp3
Filename column. That macro has a bug in it that caused the
filename to be generated from the CDR ID on the following row
instead of the same row.
Clearly, somebody edited the spreadsheets by hand and
introduced the errors. Due to the hand editing, the review
program ran into a filename that didn't exist in the zipfile,
displayed the error message, and quit.
I'm going to do some more work and I'll post again.
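A check along the following lines could catch this kind of row mismatch before a spreadsheet is loaded. It is a sketch only, not part of the review program. It assumes each row has already been read into a (CDR ID, language, filename) tuple and that filenames follow the "directory/CDRID_en.mp3" pattern shown above; the `LANG_CODES` table and the function names are made up for illustration.

```python
import posixpath

# Assumed mapping from the spreadsheet's language column to the filename suffix.
LANG_CODES = {"English": "en", "Spanish": "es"}


def expected_basename(cdr_id, language):
    """Build the mp3 name a row should carry, e.g. 12345_en.mp3."""
    return f"{cdr_id}_{LANG_CODES[language]}.mp3"


def find_mismatches(rows):
    """Return the rows whose filename does not match their own CDR ID and language."""
    bad = []
    for cdr_id, language, filename in rows:
        # The directory part is ignored; only the file's base name is compared.
        if posixpath.basename(filename) != expected_basename(cdr_id, language):
            bad.append((cdr_id, language, filename))
    return bad
```

Running `find_mismatches` on the rows listed above would flag the row for 44645, since its filename names 44648.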
BZDATETIME::2011-06-09 15:45:46
BZCOMMENTOR::Alan Meyer
BZCOMMENT::47
I installed my revised program on Bach to fix the Rev1 instead of
Rev2 directory name bug. I tested it on Mahler by taking part of
the original Week_021 file and generating a Rev1, Rev2, and Rev3
spreadsheet series from it. It looked okay. Since we know that
the older version was wrong, it shouldn't be worse.
As to how to proceed, I think we need to discuss it.
I've attached the Rev1 spreadsheet generated by the audio review
program. That's the one that should have been used. Since I
don't yet know why it wasn't used, I can't say for sure what we
should do next.
My inclination is the following:
1. Delete the Week_021_Rev1 and _Rev2 zip files, but NOT the
original Week_021 file.
2. Create a new Week 21 Rev1 file using the correct spreadsheet
that was generated by the program and just adding in the mp3
files into the zip file with that spreadsheet.
3. Delete all of the existing Week 21 Rev1 and Rev2 data from the
database table.
That will make the status of the Rev1 data "Unreviewed". I
can do the backing out by hand.
4. Re-review Rev1. There aren't that many terms and someone has
already reviewed them, so I would think the re-review will
only take a few minutes.
But that plan may need to be modified in light of the hand
construction of the Rev1 spreadsheet on Bach. If that was done
because of problems with the program generated one, then I'll
need to find out what the problem was and fix it.
Can someone explain why the Rev1 spreadsheet from the program was
edited, replacing the filenames with macros?
Attachment Week_021_Rev1.xls has been added with description: Week_021_Rev1 generated by the review program
BZDATETIME::2011-06-10 09:30:54
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::48
Alan, I think we should proceed with your plan outlined above and have CIAT review Rev1 again. I will let everyone involved in the review process know how important it is to use only the spreadsheet generated by the program.
BZDATETIME::2011-06-10 15:35:33
BZCOMMENTOR::Alan Meyer
BZCOMMENT::49
(In reply to comment #48)
> Alan, I think we should proceed with your plan outlined above and have CIAT
> review Rev1 again. I will let everyone involved in the review process know how
> important it is to use only the spreadsheet generated by the program.
Okay.
I'll get it done some time this weekend to have it ready for Monday.
I'll post a comment when it's done.
BZDATETIME::2011-06-13 00:09:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::50
I've backed out the Week_021_Rev1 and _Rev2 entries from the
database, as described in Comment #47. In case anyone wants to
refer to them, I've renamed the existing zip files to:
Old_Week_021_Rev1.zip
Old_Week_021_Rev2.zip
Because of the name changes, they will no longer show up in the
table of zipfiles to be reviewed.
The spreadsheet that should be used in place of them is the one
attached in Comment #47, which also shows up as the
"Week_021_Rev1 generated by the review program" in the list of
attachments.
BZDATETIME::2011-06-13 09:38:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::51
(In reply to comment #50)
> I've backed out the Week_021_Rev1 and _Rev2 entries from the
> database, as described in Comment #47. In case anyone wants to
> refer to them, I've renamed the existing zip files to:
>
> Old_Week_021_Rev1.zip
> Old_Week_021_Rev2.zip
>
> Because of the name changes, they will no longer show up in the
> table of zipfiles to be reviewed.
>
> The spreadsheet that should be used in place of them is the one
> attached in Comment #47, which also shows up as the
> "Week_021_Rev1 generated by the review program" in the list of
> attachments.
It looks like the next step is to create a new Week 21 Rev1 file for CIAT to review.
BZDATETIME::2011-06-13 10:36:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::52
(In reply to comment #51)
> (In reply to comment #50)
> > I've backed out the Week_021_Rev1 and _Rev2 entries from the
> > database, as described in Comment #47. In case anyone wants to
> > refer to them, I've renamed the existing zip files to:
> >
> > Old_Week_021_Rev1.zip
> > Old_Week_021_Rev2.zip
> >
> > Because of the name changes, they will no longer show up in the
> > table of zipfiles to be reviewed.
> >
> > The spreadsheet that should be used in place of them is the one
> > attached in Comment #47, which also shows up as the
> > "Week_021_Rev1 generated by the review program" in the list of
> > attachments.
>
> It looks like the next step is to create a new Week 21 Rev1 file for CIAT to
> review.
Yes. The files should be all ready to go. The Old_Week_021_Rev1.zip file should contain everything that's needed. The main thing is just to correct the filenames that are off by one. My preferred way to do that would be for someone to insert them into the program generated spreadsheet attached to comment #47. If someone isn't sure what to do, I can do it for you.
Then the reviewer(s) have to re-review the mp3 files. That has been done before and the reviewer comments are there from the original review, so I'm hoping that will just be a few minutes work.
BZDATETIME::2011-06-13 10:41:19
BZCOMMENTOR::Alan Meyer
BZCOMMENT::53
(In reply to comment #52)
...
> The main thing is just to correct the filenames that are off by one.
...
That's for the spreadsheet. We also need to create a new zip file containing the spreadsheet and the mp3 files - which I presume are also in the Old... file.
Someone should check to be sure those aren't off by one also. If so, they need to be renamed.
I can fix the whole thing if desired, or maybe Vanessa will get a better idea of what the problem was if she does it. Or maybe it will just confuse her. Let's do whatever you, William, think is best.
BZDATETIME::2011-06-13 11:11:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::54
(In reply to comment #52)
> > It looks like the next step is to create a new Week 21 Rev1 file for CIAT to
> > review.
>
> Yes. The files should be all ready to go. The Old_Week_021_Rev1.zip file
> should contain everything that's needed. The main thing is just to correct the
> filenames that are off by one. My preferred way to do that would be for
> someone to insert them into the program generated spreadsheet attached to
> comment #47. If someone isn't sure what to do, I can do it for you.
I checked the audio directory in the ftp site but did not find
Old_Week_021_Rev1.zip. My guess is that we don't have access to that
folder. If this is the case, you can proceed to rename the files. I will
let Vanessa know about this.
BZDATETIME::2011-06-13 11:21:26
BZCOMMENTOR::Volker Englisch
BZCOMMENT::55
(In reply to comment #54)
> I checked the audio directory in the ftp site
The files are not on the FTP site but on BACH.
BZDATETIME::2011-06-13 11:21:41
BZCOMMENTOR::Alan Meyer
BZCOMMENT::56
Sorry, I left the file in the directory where Volker's program puts it, but that's not accessible from outside.
I've attached it here.
Attachment Old_Week_021_Rev1.zip has been added with description: Old Week_21_Rev1 zipfile, containing munged spreadsheet
BZDATETIME::2011-06-13 11:37:54
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::57
(In reply to comment #56)
> Created attachment 2123 [details]
> Old Week_21_Rev1 zipfile, containing munged spreadsheet
>
> Sorry, I left the file in the directory where Volker's program puts it, but
> that's not accessible from outside.
>
> I've attached it here.
Could you also post the Old Week_21_Rev2 zipfile? The filename problem was presumably fixed before loading that folder.
BZDATETIME::2011-06-13 11:45:26
BZCOMMENTOR::Alan Meyer
BZCOMMENT::58
Here it is.
Attachment Old_Week_021_Rev2.zip has been added with description: Old Week_21_Rev2
BZDATETIME::2011-06-13 13:30:20
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::59
(In reply to comment #58)
> Created attachment 2124 [details]
> Old Week_21_Rev2
>
> Here it is.
Thank you! This one still had the errors in it, so Vanessa used the new program-generated spreadsheet, re-compressed the files, and uploaded the zip file to her ftp site.
The link to the folder is here:
www.vrvoice.com/clients/Week_021_Rev1.zip
We have reviewed the spreadsheet and listened to the audio and
everything appears to be okay. How do you want to proceed?
1. Should Vanessa reload it to the cips ftp site?
2. Should I attach the zip file, or will you access it from the
link?
3. Or should I just attach it to this issue?
BZDATETIME::2011-06-13 13:34:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::60
Ready for you to do the "after" publishing job, Volker.
Attachment Request4926-franck-201110613124219.log has been added with description: Log from import job on Franck
BZDATETIME::2011-06-13 13:36:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::61
(In reply to comment #60)
> Created attachment 2125 [details]
> Log from import job on Franck
>
> Ready for you to do the "after" publishing job, Volker.
Oops! Wrong issue.
BZDATETIME::2011-06-13 13:41:36
BZCOMMENTOR::Alan Meyer
BZCOMMENT::62
(In reply to comment #59)
> We have reviewed the spreadsheet and listened to the audio and everything
> appears to be okay. How do you want to proceed?
> 1. Should Vanessa reload it to the cips ftp site?
That would be my choice - just so everything goes through the regular cycle.
In order to get everything into the database we'll want to go through the audio review program and approve each item. If CIAT has listened to everything already it won't be necessary to listen to the mp3's again, unless someone wants to. Just check the approve radio button for each line. Should be a two minute job.
Thank you to you and Vanessa. I hope this wasn't painful.
BZDATETIME::2011-06-14 12:26:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::63
Vanessa loaded the new file and we have reviewed all of the terms without encountering any problems so I think the problem is fixed now. However, there is another problem with unicode characters.
It looks like the program doesn't like some of the Spanish special characters :-). Please take a look at Week_023 on Bach (CDR IDs - 45576 - 45585 - 45649). These are all cases where users had either accidentally used special characters or needed them for the Spanish comments. Upon saving the page, more Unicode characters are generated, making the comment unreadable.
BZDATETIME::2011-06-14 14:42:02
BZCOMMENTOR::Alan Meyer
BZCOMMENT::64
(In reply to comment #63)
...
> It looks like the program doesn't like some of the Spanish special characters
...
After consulting with Bob I learned what I was doing wrong. I was putting out Unicode but not accounting for it correctly when reading input. The problem was compounded every time the user clicked the Save button, so that what started as one wrong character got re-interpreted and expanded with each save until it became a long string of trash.
I have a fix that seems to work but I'd like to test a bit more before installing it. I'll also need to fix the data to remove the trash from the existing reviewer notes.
Unfortunately, I have to go to a meeting soon and won't be able to install the fix until after that. I plan to get it all working late this afternoon. I hope that won't inconvenience anyone too much.
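The way one wrong character grows with each save can be reproduced with a short sketch. This is an illustration under assumptions, not the actual code path: the thread does not say which encodings were involved, so the sketch assumes text written out as UTF-8 and read back as Latin-1. The function name is hypothetical.

```python
def buggy_round_trip(text):
    """Write the text as UTF-8, then misread those bytes as Latin-1 (the assumed bug)."""
    return text.encode("utf-8").decode("latin-1")


def lengths_after_saves(text, saves):
    """Return the text length before any save and after each buggy save."""
    lengths = [len(text)]
    for _ in range(saves):
        text = buggy_round_trip(text)
        lengths.append(len(text))
    return lengths
```

Under that assumption, an accented character doubles in length on every save, while plain ASCII text passes through unchanged, which matches the pattern of only the special characters turning into trash.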
BZDATETIME::2011-06-14 17:47:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::65
I think the character set problem is fixed on Bach. I modified
the program to accept input of Unicode characters and edited the
reviewer notes in the Week_023 file to delete the garbage and put
in reasonable characters (apostrophes in some cases, accented 'i'
in another.)
While testing I encountered a different error. Someone copied an
error message into a reviewer note and, for some reason, it included
a great number of blank lines, so many that it crashed the database
update. They may have been in the error message, or may have been
added by accident in copying it into the note field. I'll check
further on Mahler.
I edited that reviewer note to eliminate the extraneous data and
changed the program to test the length of input. Anything above
2040 characters is now chopped off at that point.
It's ready to resume use on Bach.
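The length test amounts to something like the sketch below. The 2040 figure comes from the comment above; the constant and function names are hypothetical and the real program may do the check differently.

```python
MAX_NOTE_CHARS = 2040  # limit stated in the comment above


def clip_note(note, limit=MAX_NOTE_CHARS):
    """Return the note unchanged if short enough, otherwise cut off at `limit` characters."""
    return note[:limit]
```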
BZDATETIME::2011-06-16 09:51:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::66
(In reply to comment #65)
> It's ready to resume use on Bach.
Everything seems to be working fine now. Let me know if I should close this issue.
BZDATETIME::2011-06-16 15:12:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::67
(In reply to comment #66)
> Everything seems to be working fine now. Let me know if I should close this
> issue.
We might have more to do but we can re-open if so. There are no outstanding issues at the moment so closing it is fine with me.
BZDATETIME::2011-06-22 17:04:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::68
The report has been working without any problem so I am going to close the issue.
BZDATETIME::2011-06-22 17:04:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::69
(In reply to comment #68)
> The report has been working without any problem so I am going to close the
> issue.
Issue closed. Thank you!
BZDATETIME::2011-07-07 22:33:10
BZCOMMENTOR::Alan Meyer
BZCOMMENT::70
William reported a problem in the Week_024_Rev1 processing. The problem was discovered during a period when Bugzilla was down, so this comment is being added after everything is complete.
I tracked the problem down to what appeared to be a newline translation problem where newlines embedded in the data get doubled each time the data is saved to the database. Two of the note fields had lots of embedded newlines which may have come from cutting and pasting an HTML error message into the note field of the spreadsheet, and they eventually inflated the size beyond what the table accommodated.
I fixed the problem in the program. The fix normalizes newlines in the midst of a note to a single newline, and deletes all whitespace from both ends of the text. Both William and I tested it on Mahler. I then promoted it to Bach.
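A minimal sketch of that kind of normalization follows. It is not the code that was promoted to Bach; the function name is hypothetical, and treating blank lines that contain only spaces or tabs as part of a newline run is an assumption.

```python
import re


def normalize_note(note):
    """Collapse runs of newlines to one newline and trim whitespace from both ends."""
    # Treat Windows (\r\n) and old Mac (\r) line endings as plain newlines.
    text = note.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse any run of newlines, including blank lines holding only
    # spaces or tabs, into a single newline.
    text = re.sub(r"\n(?:[ \t]*\n)+", "\n", text)
    return text.strip()
```

Because the result contains no doubled newlines, saving an already normalized note again leaves it unchanged, so the text can no longer grow from one save to the next.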
I also searched the stored data to see if there were other records that had extra whitespace. 15 of them did, so I ran a program to clean all of them up in the same way as if they came from the newly revised term audio review program.
I made a backup of the table on Bach, then ran the program, then compared the new and old tables. Everything looks good.
Since all of the work on this is done I'm leaving the issue closed rather than re-opening it.
File Name | Posted | User |
---|---|---|
44270.txt | 2011-05-06 16:12:53 | Osei-Poku, William (NIH/NCI) [C] |
44286.txt | 2011-05-06 16:15:11 | Osei-Poku, William (NIH/NCI) [C] |
Bug5020Analysis.txt | 2011-04-07 23:06:25 | |
Bug5020Analysis.txt | 2011-04-05 12:02:55 | |
Old_Week_021_Rev1.zip | 2011-06-13 11:21:41 | |
Old_Week_021_Rev2.zip | 2011-06-13 11:45:26 | |
Re Glossary term audio review is ready for testing.txt | 2011-05-04 10:33:10 | Osei-Poku, William (NIH/NCI) [C] |
Request4926-franck-201110613124219.log | 2011-06-13 13:34:52 | |
Week_021_Rev1.xls | 2011-06-09 15:45:46 | |