CDR Tickets

Issue Number 3091
Summary Modify server to allow larger media documents to be stored and retrieved
Created 2010-02-18 13:06:05
Issue Type Improvement
Submitted By Kline, Bob (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2010-03-26 10:57:25
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107419
Description

BZISSUE::4767
BZDATETIME::2010-02-18 13:06:05
BZCREATOR::Bob Kline
BZASSIGNEE::Bob Kline
BZQACONTACT::William Osei-Poku

Increase size of request allowed to 150MB.

Comment entered 2010-02-18 14:35:13 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-18 14:35:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::1

New build of CDR Server installed on Mahler; ready for user testing.

Comment entered 2010-02-23 16:05:17 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-02-23 16:05:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::2

I tried loading the audio file (.mp3) (84.1MB) but it does not attach to the media document. I will attach a screen shot of the error (attached). It really doesn’t tell you what is wrong. I am able to load and play smaller files without a problem.

Comment entered 2010-02-23 16:05:17 by Osei-Poku, William (NIH/NCI) [C]

Attachment Media error.doc has been added with description: Media Load error

Comment entered 2010-02-23 16:25:00 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-23 16:25:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::3

Can you tell me when the error occurred (time of day) and can you attach the CdrClient.log file from the workstation on which it happened (in the Author directory under XMetaL)?

Comment entered 2010-02-23 16:49:09 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-02-23 16:49:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4

I have attached the log file marked with today's date.

The error happened within 5 to 10 minutes of my posting comment #2 (that would be about 4:15 to 4:25), and it happened several times, as I tried it repeatedly with 2 files of nearly equal size.

Comment entered 2010-02-23 16:49:09 by Osei-Poku, William (NIH/NCI) [C]

Attachment CdrClient.log has been added with description: CDRClient log file

Comment entered 2010-02-23 17:05:06 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-23 17:05:06
BZCOMMENTOR::Bob Kline
BZCOMMENT::5

Are you sure you were testing on Mahler? From the log file you posted it would appear that you were logged into Bach during that time frame.

Comment entered 2010-02-23 17:17:57 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-02-23 17:17:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6

I tested in Mahler and not Bach. Please see a screen shot of my document activity report from Mahler

Comment entered 2010-02-23 17:17:57 by Osei-Poku, William (NIH/NCI) [C]

Attachment Document Activity Report.doc has been added with description: Document Activity Report

Comment entered 2010-02-23 17:34:50 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-23 17:34:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::7

(In reply to comment #6)
> Created an attachment (id=1867) [details]
> Document Activity Report
>
> I tested in Mahler and not Bach. Please see a screen shot of my document
> activity report from Mahler

OK, I was looking at the command logs between 4:00 and 4:30 based on the times you gave me in comment #2. Was the save command submitted at 3:25 one of the ones that failed?

Comment entered 2010-02-23 17:44:23 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-02-23 17:44:23
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::8

(In reply to comment #7)
> (In reply to comment #6)
> > Created an attachment (id=1867) [details] [details]
> > Document Activity Report
> >
> > I tested in Mahler and not Bach. Please see a screen shot of my document
> > activity report from Mahler
>
> OK, I was looking at the command logs between 4:00 and 4:30 based on the times
> you gave me in comment #2. Was the save command submitted at 3:25 one of the
> ones that failed?

It should be about that time. I am not completely sure about the time, but I can repeat the save to see whether it reproduces the same error.

Comment entered 2010-02-24 09:12:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-24 09:12:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::9

Can you put the file you're trying to store on the FTP server and let me know where you put it?

Comment entered 2010-02-24 10:45:05 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-02-24 10:45:05
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10

(In reply to comment #9)
> Can you put the file you're trying to store on the FTP server and let me know
> where you put it?

I have uploaded it to ftp://cipsftp.nci.nih.gov/
The name of the file is "cammeeting.MP3"

Comment entered 2010-02-24 17:11:52 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-02-24 17:11:52
BZCOMMENTOR::Bob Kline
BZCOMMENT::11

I've been pulling my hair out all day long trying to track down the failure users are running into trying to submit MP3 blobs to the CDR. Inside the DLL we use under XMetaL our code is assembling the command string to be sent to the CDR Server and when it tries to concatenate the part for the blob Windows sends back win32 error code 8 (out of memory). The total concatenated string would be about 90MB. As soon as I hit that failure I ask the win32 API to tell me how much memory the system has, and this is what I get back:

4025999360 total bytes of physical memory
1235927040 free bytes of physical memory
8158535680 total bytes of paging file
5338144768 free bytes of paging file
2147352576 total bytes of virtual memory
1778651136 free bytes of virtual memory

I've asked John Rehmert if Windows has the equivalent of the Unix ulimit feature, and if so, how we might be able to control it.
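
For readability, the figures above can be converted to GiB. This is only a small sketch. The labels are copied from the comment. The remark in the code that about 2 GiB matches the default user-mode address space of a 32-bit Windows process is an assumption about the client machine, not something stated in this ticket.

```python
# Byte counts exactly as reported in the comment above.
figures = {
    "total physical memory": 4025999360,
    "free physical memory": 1235927040,
    "total paging file": 8158535680,
    "free paging file": 5338144768,
    "total virtual memory": 2147352576,
    "free virtual memory": 1778651136,
}

GIB = 1024 ** 3


def to_gib(byte_count):
    """Convert a byte count to GiB."""
    return byte_count / GIB


for label, byte_count in figures.items():
    print(f"{label:24s} {to_gib(byte_count):6.2f} GiB")

# Assumption: a total of about 2 GiB of virtual memory is what a 32-bit
# Windows process sees by default, so a single ~90MB contiguous string
# can fail to allocate even when plenty of physical memory is free.
```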

Comment entered 2010-03-01 11:25:41 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-01 11:25:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::12

I was finally able to wrestle the Microsoft bug on the client to the floor with a workaround, but now I've run into a brick wall on the server end. The problem is that the client requests are transmitted as XML commands, and when we have a binary attachment for a CDR document being stored, the bytes for that file are wrapped into the command (in an encoding which tunnels the binary bytes safely). The problem is that the XML parser we are using is unable to parse a document as large as 100 megabytes, and an exception is thrown when we try. I have given some thought to the options available for dealing with the problem, and here's an outline of what I've come up with so far.

1. Debug the XML parser and determine whether it can be patched so
that it's able to parse very large documents.

This option is possible because the parser we're using is open
source, but we have no way of knowing how long such an effort
would take, and we don't know if we would succeed in eliminating
the failures without ugly and extensive modifications to the
internal workings of the parser, which could be difficult to
maintain and debug.

2. Investigate other DOM parsers to determine if another is available
which would be able to parse documents this large.

This, too, could be a protracted project without much confidence
that we'll succeed. In addition, we'd have to do some analysis
before we'd know how much of our server code would have to be
re-written to work with the new parser.

3. Use a SAX parser to pull out the CdrBlob portion of the document,
reassembling the rest to be parsed into a DOM tree, passing this
DOM tree and the attachment bytes to the command processing routines
separately.

This approach has the advantage that most of the solution can
be implemented at a single point in the server software (the top-
level module). The impact on the rest of the server code would
be the addition of an optional additional parameter which would be
ignored by everything except the commands for saving and retrieving
a document. We would need to identify the best choice for the
third-party SAX parser and incorporate it into the build process
for the server. A SAX parser is better equipped for dealing with
large XML documents because it does not build a tree for the
entire document in memory at once, but only works on a piece at
a time, allowing the user callback code to do what it wants with
what the parser finds. It is still possible that, depending on
the implementation decisions of the SAX parser, it would still
fail with very large documents.

4. Use string processing and/or regular expression methods to extract
the CdrBlob portion of the command, as in option #3.

This approach is a less expensive variation of the previous
option, slightly less clean (though we can be reasonably
confident that we can come up with an implementation which is
robust and reliable), and slightly less likely to fail because
of the size of the attachment.

5. Modify the CDR client/server command interface to have the client
send the binary attachment outside the XML command document.

This approach combines the advantages of the previous two options
with the additional benefit that if we adopt it we would probably
do so symmetrically (meaning that the server would send binary
attachments to the client separate from the XML command document).
This would address a problem we haven't encountered yet (namely
that the client might fail trying to unpack a document coming back
from the server with a very large attachment wrapped up in an
XML command document), but we shouldn't be greatly surprised if
we do run into it.

This option comes in a number of flavors. I'll try to identify
as many of them as I can think of here.

(a) The client sends two size integers to the server where it
currently sends only one. The first would represent the
XML command document's size, as it currently does, and the
second would represent the number of bytes being sent for
a binary attachment to follow the XML command document.
The second size would be zero when no binary attachment
is present.

(b) The client sets the high bit of the integer for the size
of the XML document only when a binary attachment is present,
in which case a second size integer is also sent (as is
the binary attachment). The high bit of the first integer
is stripped off (after being recorded as the flag indicating
the presence of the second integer) from the value of the
size of the XML command document. This variation has the
advantage that we would have less CDR client-server code
to modify (some, like the CdrLoader program, never deals
with binary document attachments).

(c) Include an indication within the XML command document of
the availability of a binary attachment (along with the
number of bytes in the attachment). The server would
then send the client a request for the bytes of the attach-
ment, and the client would reply with the bytes without
an XML command document.

(d) Decouple the transmission of the binary attachment from
the command to save the CDR document, and implement a
separate protocol for storing a binary object for an
existing CDR document, possibly storing only portions of
the attachment at a time.

In all of these options, I have used language describing what
the client would do when storing a CDR document, but a symmetrical
implementation would behave similarly when the client is
retrieving a document from the CDR server.

6. Avoid the problem by refusing to store very large binary files
with the CDR documents.

This is certainly the least expensive, as well as the least
satisfactory option.

7. Investigate other approaches to storing binary attachments
than directly in the CDR, such as a DAM, or an FTP server, or
the file system, with cdr:xref links to those attachments.

This option addresses issues which we are very likely to need
to deal with eventually anyway, but it would probably mean a
longer delay for the ability to store meeting minutes than
some of the other choices.
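
Option 3 above might look roughly like the following in Python. This is a sketch under stated assumptions, not the server's code: the element name CdrBlob is taken from the wording of the option, base64 is only a guess at the "encoding which tunnels the binary bytes safely", and the names BlobSplitter and split_command are hypothetical. As the option itself warns, a given SAX implementation could still fail on very large input.

```python
import base64
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class BlobSplitter(xml.sax.ContentHandler):
    """Re-emit every SAX event except those for the blob element."""

    def __init__(self, blob_tag="CdrBlob"):  # tag name is an assumption
        super().__init__()
        self.blob_tag = blob_tag
        self.in_blob = False
        self.blob_chunks = []
        self.out = io.StringIO()
        self.writer = XMLGenerator(self.out, encoding="utf-8")

    def startElement(self, name, attrs):
        if name == self.blob_tag:
            self.in_blob = True
        else:
            self.writer.startElement(name, attrs)

    def endElement(self, name):
        if name == self.blob_tag:
            self.in_blob = False
        else:
            self.writer.endElement(name)

    def characters(self, content):
        # Blob text is set aside; everything else is copied through.
        if self.in_blob:
            self.blob_chunks.append(content)
        else:
            self.writer.characters(content)


def split_command(command_bytes):
    """Return (xml_without_blob, blob_bytes) for one command document."""
    handler = BlobSplitter()
    xml.sax.parseString(command_bytes, handler)
    # Assumption: the blob is base64-encoded inside the element.
    blob = base64.b64decode("".join(handler.blob_chunks))
    return handler.out.getvalue(), blob
```

The reassembled XML string would then go to the existing DOM parser, with the blob bytes passed alongside it as the optional extra parameter the option describes.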

The option which I believe is the most likely to achieve what the users want in the least amount of time, assuming we don't run into the same problem on the client when we retrieve a document with a very large binary attachment, would be the fourth option. We're not as likely to hit the mirror problem on the client, because we're already doing something along the lines of option #4 when we retrieve responses from the server in our XMetaL DLL, but it's still a possibility, in which case one of the flavors of the fifth option would likely be the quickest path to success.

I'd like to discuss these options with the other members of the CDR team before proceeding any further with this issue. I'm changing the status of the issue to reflect reality. :-)

Comment entered 2010-03-01 11:28:09 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-01 11:28:09
BZCOMMENTOR::Bob Kline
BZCOMMENT::13

Another advantage of option #5 which I should have mentioned is that it provides a cleaner workaround to the Microsoft bug I ran into on the client than the one I implemented last week.
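
Flavor (b) of option #5 could be sketched along these lines. Everything concrete here is an assumption made for illustration: the 4-byte network-order size integers, the choice to send both sizes ahead of the XML, and the names frame and unframe. The ticket does not say how the existing size integer is encoded.

```python
import struct

BLOB_FLAG = 0x80000000  # high bit of a 32-bit size integer (assumed width)


def frame(xml_bytes, blob=None):
    """Prefix the command with its size; flag and append a blob if present."""
    if len(xml_bytes) >= BLOB_FLAG:
        raise ValueError("XML command too large for a 31-bit size")
    if blob is None:
        # No attachment: a single size integer, as clients send today.
        return struct.pack("!I", len(xml_bytes)) + xml_bytes
    # Attachment present: set the high bit and add a second size integer.
    header = struct.pack("!II", len(xml_bytes) | BLOB_FLAG, len(blob))
    return header + xml_bytes + blob


def unframe(data):
    """Inverse of frame(): return (xml_bytes, blob_or_None)."""
    (first,) = struct.unpack_from("!I", data, 0)
    offset = 4
    has_blob = bool(first & BLOB_FLAG)
    xml_len = first & (BLOB_FLAG - 1)  # strip the flag bit
    blob_len = 0
    if has_blob:
        (blob_len,) = struct.unpack_from("!I", data, offset)
        offset += 4
    xml_bytes = data[offset:offset + xml_len]
    offset += xml_len
    blob = data[offset:offset + blob_len] if has_blob else None
    return xml_bytes, blob
```

A client that never handles attachments would keep producing the single-integer form unchanged, which is the advantage claimed for this flavor.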

Comment entered 2010-03-01 11:42:28 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-03-01 11:42:28
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::14

Since it looks like this could take a while to fix, I am wondering what to tell the Board Managers about accessing the digital recordings? Is it okay for them to use the FTP server temporarily until we get this fixed? Thanks.

Comment entered 2010-03-01 11:52:32 by Grama, Lakshmi (NIH/NCI) [E]

BZDATETIME::2010-03-01 11:52:32
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::15

I think that should be OK

Comment entered 2010-03-02 11:34:10 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-02 11:34:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::16

Added Volker to CC list.

Comment entered 2010-03-02 13:48:44 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-02 13:48:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::17

(In reply to comment #12)

> The option which I believe is the most likely to achieve what the users want in
> the least amount of time, assuming we don't run into the same problem on the
> client when we retrieve a document with a very large binary attachment, would
> be the fourth option.

I discussed the options with Alan and Volker, and we decided to go with the fourth option (pre-process the command on the server to pull out the blob using string handling tools before handing the rest of the XML command to the DOM parser). So unless the users object, that's what I plan to do.
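
For illustration only, the pre-processing step could look something like this in Python. The tag name CdrBlob, the base64 encoding, and the function name extract_blob are assumptions, not taken from the server source. A real implementation would also have to guard against the tag text appearing inside document content.

```python
import base64

# Hypothetical delimiters; the real element name may differ.
OPEN_TAG = b"<CdrBlob"
CLOSE_TAG = b"</CdrBlob>"


def extract_blob(command):
    """Return (command_without_blob, blob_bytes_or_None)."""
    start = command.find(OPEN_TAG)
    if start == -1:
        return command, None  # nothing to strip
    # Skip past the end of the start tag, allowing for attributes.
    content_start = command.find(b">", start) + 1
    if content_start == 0:
        raise ValueError("unterminated blob start tag")
    end = command.find(CLOSE_TAG, content_start)
    if end == -1:
        raise ValueError("missing blob end tag")
    # Assumption: the bytes between the tags are base64 text.
    blob = base64.b64decode(command[content_start:end])
    stripped = command[:start] + command[end + len(CLOSE_TAG):]
    return stripped, blob
```

Only the stripped command, which stays small whatever the size of the attachment, would then be handed to the DOM parser.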

Comment entered 2010-03-05 14:19:46 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-05 14:19:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::18

Making progress. I now have storing of documents with large blobs working on Mahler (but please hold off on testing until I remove all of the debugging dialog windows from the code). So technically speaking I've done what the title of the issue asked me to do. However, that's not of much use until retrieving them works as well, and that's not the case right now. I'm hitting an "unexpected exception" when I try to do that, so I'll be digging in to find out where that's being triggered, and ponder the options for what to do about it. Changing the title of the issue to reflect the expanded scope.

Comment entered 2010-03-09 13:59:05 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-09 13:59:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::19

I came up with a solution for retrieving the larger media blobs by bypassing the CDR Server API and using a CGI script to pull the bits from the database directly and stream them out over HTTP. So users should be able to test storing and retrieving the meeting minutes on Mahler. If you are using an older version of Internet Explorer, you may see "Action Canceled" in the web browser's page when you launch the audio file from XMetaL. It's hard to say why Microsoft thought it was a good idea to tell you that your action was canceled when you didn't cancel it and in fact the launching of the audio file succeeds. It's hard to tell why Microsoft does a lot of the things it does. You can ignore the "Action Canceled" message.

We can revisit this solution when we have fewer other high-priority tasks on our plates. We still have to address the issue of storing the blobs outside the main CDR database, but that's another task.
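
The streaming idea can be sketched as below. This is not the actual CGI script: the database read is replaced by a generic file-like source, and the function name, chunk size, and header layout are assumptions chosen for the example.

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size for the sketch


def stream_blob(source, out, mime_type, total_bytes, chunk_size=CHUNK_SIZE):
    """Write CGI-style headers, then copy source to out chunk by chunk."""
    header = (f"Content-Type: {mime_type}\r\n"
              f"Content-Length: {total_bytes}\r\n\r\n")
    out.write(header.encode("ascii"))
    sent = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # Only one chunk is held in memory at a time.
        out.write(chunk)
        sent += len(chunk)
    out.flush()
    return sent


# Small in-memory demonstration standing in for the database and browser.
demo_out = io.BytesIO()
stream_blob(io.BytesIO(b"abc" * 10), demo_out, "audio/mpeg", 30)
```

Under CGI the out argument would typically be sys.stdout.buffer, and source would be whatever object hands back the blob bytes from the database.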

Comment entered 2010-03-10 11:43:30 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-10 11:43:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::20

I am testing this on Mahler but CDR freezes when I try to save the document. I had to use the task manager to close CDR each time. This is what happens: When I attach the mp3 file and click OK, I get the attached message. Then CDR freezes from that point on.

Comment entered 2010-03-10 11:43:30 by Osei-Poku, William (NIH/NCI) [C]

Attachment error_media.doc has been added with description: Media Load error1

Comment entered 2010-03-10 12:22:30 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 12:22:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::21

How long do you wait before shutting down XMetaL using the task manager?

Comment entered 2010-03-10 12:27:54 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-10 12:27:54
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::22

(In reply to comment #21)
> How long do you wait before shutting down XMetaL using the task manager?

At one time, I waited for about 5 minutes. In other instances, I waited for less than 5 minutes but more than 2 minutes.

Comment entered 2010-03-10 12:29:05 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 12:29:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::23

How much memory do you have on your system (real and virtual) (task manager will tell you)?

Comment entered 2010-03-10 12:41:46 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-10 12:41:46
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::24

(In reply to comment #23)
> How much memory do you have on your system (real and virtual) (task manager
> will tell you)?

From task manager:
Physical Memory = Total 3620644

Task Manager shows Kernel Memory but not Virtual Memory. When I go to My Computer > Right-click > Properties > Advanced > Settings > Advanced, I see a Virtual Memory of 2046

Comment entered 2010-03-10 15:33:45 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 15:33:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::25

Margaret:

I will switch back from the GP mailers to working on this task, unless you and/or William adjust the priorities in the issue tracker so that this task no longer has a priority higher than #4630.

Comment entered 2010-03-10 15:38:29 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 15:38:29
BZCOMMENTOR::Bob Kline
BZCOMMENT::26

William:

I'd like to do some further testing to see if we can learn a little more about where it's hanging on your machine. Please log out of XMetaL, then bring it back up again, and when you get the CDR login dialog box, click on the "Options" button and change 2019 to 2010 in the CDR port field for Mahler at the bottom of the window. Click OK, then finish entering your login credentials and log into Mahler. Try again to save the audio file with the Media document and tell me which messages you see.

Comment entered 2010-03-10 15:39:08 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-03-10 15:39:08
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::27

We can talk about this at tomorrow's meeting.

Comment entered 2010-03-10 15:57:18 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-10 15:57:18
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::28

I was not able to connect using port 2010 on the Z-Tech network. I had to connect using VPN (which we typically don't do while at work) before I was able to connect to Mahler using port 2010.

1. It returned the same message as before
2. It returned a second message after I clicked OK (please see attached file)
3. It returned a third message after I clicked OK (Please see attached file)
4. It froze after that..

Comment entered 2010-03-10 15:57:18 by Osei-Poku, William (NIH/NCI) [C]

Attachment error_media1.doc has been added with description: Media error attachment

Comment entered 2010-03-10 16:02:59 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:02:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::29

Please give it one more try. You don't need to bother to stop and restart XMetaL this time, nor send the screenshots for the messages. Don't kill XMetaL until I give you the word.

Comment entered 2010-03-10 16:07:24 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:07:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::30

(In reply to comment #29)
> Please give it one more try. You don't need to bother to stop and restart
> XMetaL this time, nor send the screenshots for the messages. Don't kill XMetaL
> until I give you the word.

Done! I am on the frozen page.

Comment entered 2010-03-10 16:09:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:09:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::31

It may look frozen, but the client is sending the bytes to the server as fast as it can, given the speed of the network connection. The server has received about a tenth of the save command so far. This will take some patience and/or a faster network connection. Please don't kill the client unless I tell you it's really hung.

Comment entered 2010-03-10 16:11:37 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:11:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::32

I've got 1/5 of the bytes now.

Comment entered 2010-03-10 16:19:27 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:19:27
BZCOMMENTOR::Bob Kline
BZCOMMENT::33

Halfway there.

Comment entered 2010-03-10 16:26:47 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:26:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::34

We're coming down the backstretch now: only 30 million bytes to go!

Comment entered 2010-03-10 16:29:49 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:29:49
BZCOMMENTOR::Volker Englisch
BZCOMMENT::35

This is sooo exciting to watch. :-)

Comment entered 2010-03-10 16:33:45 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:33:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::36

Done. And lest anyone think that the network connection speed is only part of the performance problem, I ran a save command using the same mp3 file from a local XMetaL client, and the entire command completed in under 18 seconds.

Comment entered 2010-03-10 16:38:40 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-10 16:38:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::37

So it takes a little under 25 minutes to save it from your workstation. It will be important, given the speed of the network connection you're dealing with, to make sure that whoever needs to launch the media document has the browser set up to stream the bytes (that is, feed them to the user agent which plays the audio file as they come in), rather than make you save the file completely to the client's disk and then open it locally. I'm not an expert in explaining how to do that with Internet Explorer on Windows, so you may need to enlist the advice of your desktop application support specialists.
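
A back-of-the-envelope comparison of the two save times, as a sketch. It assumes the file is the 88,757,060-byte board meeting recording reported elsewhere in this ticket, and it ignores whatever overhead the command encoding adds. Since both times were given as upper bounds, the rates are lower bounds.

```python
FILE_BYTES = 88_757_060   # assumed: size of the board meeting mp3
REMOTE_SECONDS = 25 * 60  # "a little under 25 minutes"
LOCAL_SECONDS = 18        # "under 18 seconds"


def mbit_per_second(byte_count, seconds):
    """Average throughput in megabits per second (1 Mbit = 10**6 bits)."""
    return byte_count * 8 / seconds / 1_000_000


remote_rate = mbit_per_second(FILE_BYTES, REMOTE_SECONDS)
local_rate = mbit_per_second(FILE_BYTES, LOCAL_SECONDS)

print(f"remote workstation: at least {remote_rate:.2f} Mbit/s")
print(f"local client:       at least {local_rate:.1f} Mbit/s")
print(f"ratio:              about {local_rate / remote_rate:.0f}x")
```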

Comment entered 2010-03-11 13:25:18 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-11 13:25:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::38

Need to expand the new CGI script for streaming the blobs so that it handles Supplementary Document mime types.

Comment entered 2010-03-11 13:30:18 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-11 13:30:18
BZCOMMENTOR::Bob Kline
BZCOMMENT::39

Alan will look into software that could be used to bump up the level of compression for the audio files.

Comment entered 2010-03-11 15:07:56 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-11 15:07:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::40

(In reply to comment #38)
> Need to expand the new CGI script for streaming the blobs so that it handles
> Supplementary Document mime types.

Done.

Comment entered 2010-03-11 22:23:43 by alan

BZDATETIME::2010-03-11 22:23:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::41

(In reply to comment #39)
> Alan will look into software that could be used to bump up the
> level of compression for the audio files.

There appear to be two basic approaches to compressing MP3 files.
One is to write a simple application using an audio API library
like gstreamer or Xine, some of which have Python bindings like
PyMedia for gstreamer. There seem to be quite a few APIs
available. The other is to get a packaged application with a
command line or a graphical user interface.

It seems to me that we only want to go the API route if we don't
find an acceptable packaged utility. After reading documentation
of the gstreamer system it seemed to me that the learning curve
was steep. So I spent more time evaluating packages.

I found and installed evaluation copies of a number of programs
on my machine. The one I like best so far is Virtuosa. See:

http://www.sonicspot.com/virtuosa/virtuosa.html

It was able to handle a big file, albeit very slowly, and it had
very flexible parameters that allowed compression at many
different user selectable levels. Most of the others I tried
would convert from mp3 to something else or back again, but not
mp3 to mp3 with a different compression level, and in another
case, it would handle mp3 to mp3 conversions, but I was only able
to get a compression ratio of about 4:3. Worse, it crashed on
the big board meeting file. I don't know if that was caused by
the file size or by corruption in the data (see below.)

I tried two test compressions using Virtuosa. One was a
compression of some music (Vivaldi's Gloria), composed of a dozen
tracks totaling around 35 MB that had been converted ("ripped")
from a CD with high quality variable bitrate compression
probably averaging about 175 Kbs (implied by the numbers below.)
The other was the board meeting file that I extracted from the
database on mahler. In both cases I converted to a fixed rate 64
Kbs file.

Here are the compression ratios:

Vivaldi:
Input: 35,753,009 bytes
Output: 13,291,008 bytes
Ratio: 2.7:1

Board meeting:
Input: 88,757,060 bytes
Output: 26,994,816 bytes
Ratio: 3.3:1

I listened to the Vivaldi and I thought it was surprisingly good.
I'll ask Bob, whose ears are better trained (and probably more
functional since I have some high frequency hearing loss) than
mine, to listen to the two versions and give his opinion.

The board meeting was unlistenable in both versions, before and
after processing. It sounded to me like the file is corrupt. I
don't know whether it got corrupted before, during, or after the
transfer to Mahler, or whether I did something wrong with the way
I extracted it from the database. I'll need a good copy that
sounds right to do further testing.

The Virtuosa evaluation copy that I installed will be functional
for 3 weeks, but the price is cheap. The company is advertising
it for $39.95.

The disadvantages of Virtuosa are:

It has a very peculiar user interface - this seems to be de
rigueur in music programs.

Documentation is missing in the version I got.

When I clicked "Help" it told me that my version of
Windows would not run the old Windows Help file format
needed for the help file.

The software is dated 2005. I don't know that it's still
being maintained.

Conversion is very slow. It spent 48 minutes on the board
meeting file running on my $6,000 3.2 GHz dual Xeon
workstation. On the Z-Tech laptops it might have to run
overnight. But the software seemed well behaved and did not
hamper my running other programs while the conversion was in
progress.

Despite the disadvantages, this is not mission critical software
and is not expected to be frequently used. It's a utility
program. There are others. If it fails in the future we'll just
look for a replacement. If we test it on a good copy of the
board meeting minutes, and if the compression ratio remains this
good and the sound quality is acceptable, I think we should get
at least two copies, one for Z-Tech and one for CTB.

Meanwhile, if anyone knows any techno-audiophiles, ask them if
they have a favorite program for doing this. I'll ask at the CTB
programmers' meeting next Tuesday.

Comment entered 2010-03-11 22:29:27 by alan

BZDATETIME::2010-03-11 22:29:27
BZCOMMENTOR::Alan Meyer
BZCOMMENT::42

We should also check the documentation for the mp3 recorder used
at the board meetings. It may be able to produce a 64 kbs
encoding directly. If it can, we should experiment with it. It
will be a lot less trouble to get what we want right out of the
meeting than to convert it later.

Comment entered 2010-03-12 10:18:29 by Englisch, Volker (NIH/NCI) [C]

BZDATETIME::2010-03-12 10:18:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::43

One widely used software package to convert audio files is Audacity.
http://audacity.sourceforge.net/

It's open software and is available for Windows as well as Linux.

Comment entered 2010-03-12 11:54:08 by alan

BZDATETIME::2010-03-12 11:54:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::44

(In reply to comment #43)
> One widely used software package to convert audio files is Audacity.
> http://audacity.sourceforge.net/
>
> It's open software and is available for Windows as well as Linux.

I'd forgotten about that one. I'll test it on Tuesday.

I also realize what I did wrong in extracting the mp3 file
from the database. I'll fix that and retest.

Comment entered 2010-03-12 12:32:41 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-12 12:32:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::45

(In reply to comment #42)
> We should also check the documentation for the mp3 recorder used
> at the board meetings. It may be able to produce a 64 kbs
> encoding directly. If it can, we should experiment with it. It
> will be a lot less trouble to get what we want right out of the
> meeting than to convert it later.

Here are the technical details of the recorder. It is a "Philips Voice Tracer 620". We have 4 of them.
Compression rate/sampling frequency
HQ mode: 64 kbps/22 kHZ
SP mode: 48 kbps/16 kHZ
LP mode: 32 kbps/16 kHZ

When I checked the recorder that was used for recording the meeting minutes we used for the testing on Mahler, it was set to HQ mode, and the user said that she had not made any changes to the settings, so there is a high probability that the recording was done in HQ mode. I set the recorder to LP mode and did a few tests and the recording appeared to be very good. We will do further testing to compare the quality of the other modes and see which mode will be good enough for the recordings and for storing in the CDR. I am assuming that if we record in either the SP mode or the LP mode it will remove the need for a converter?
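
As a rough guide to what the three modes mean for file size, here is a sketch that assumes constant bitrate and ignores MP3 header and tag overhead. The meeting length is not stated anywhere in this ticket. It is inferred here from the 88,757,060-byte test file on the assumption that it was recorded in HQ mode.

```python
MODES_KBPS = {"HQ": 64, "SP": 48, "LP": 32}  # from the recorder specs above

HQ_FILE_BYTES = 88_757_060  # board meeting test file, assumed HQ mode


def duration_seconds(size_in_bytes, kbps):
    """Playing time implied by a file size at a constant bitrate."""
    return size_in_bytes * 8 / (kbps * 1000)


def estimated_bytes(seconds, kbps):
    """File size implied by a playing time at a constant bitrate."""
    return seconds * kbps * 1000 / 8


meeting_seconds = duration_seconds(HQ_FILE_BYTES, MODES_KBPS["HQ"])
print(f"implied meeting length: {meeting_seconds / 3600:.2f} hours")

for mode, kbps in MODES_KBPS.items():
    size = estimated_bytes(meeting_seconds, kbps)
    print(f"{mode} ({kbps} kbps): about {size / 1_000_000:.1f} MB")
```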

Comment entered 2010-03-12 14:05:38 by alan

BZDATETIME::2010-03-12 14:05:38
BZCOMMENTOR::Alan Meyer
BZCOMMENT::46

(In reply to comment #45)
> ... I am assuming that
> if we record in either the SP mode or the LP mode it
> will remove the need for a converter?

I should think so.

But if we're going to store records of all the board meetings
from now on, let's do a little experimenting. Until we've
tried it, we won't know what combination of recording with
or without post-processing produces the best combination
of small size and high quality.

Comment entered 2010-03-16 11:57:40 by alan

BZDATETIME::2010-03-16 11:57:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::47

(In reply to comment #44)
> (In reply to comment #43)
> > One widely used software package to convert audio files is Audacity.
> > http://audacity.sourceforge.net/
> >
> > It's open software and is available for Windows as well as Linux.
>
> I'd forgotten about that one. I'll test it on Tuesday.
>
> I also realize what I did wrong in extracting the mp3 file
> from the database. I'll fix that and retest.

I fixed the extraction and it changed everything. The 64 Kbps
compression did nothing because the file was already at 64 Kbps.

Changing to 32 Kbps cut the file size almost in half. Conversion
time was dramatically faster - presumably because I was now
working with a correct mp3 file rather than a corrupted one.

I tried this in Virtuosa and in Audacity, as suggested by Volker.
Both worked but I liked Audacity better. Audacity was 50%
faster, had more options, had what I thought was a better user
interface (though it is very technical), appears to be better
supported and documented, and is free. Conversion time was a
little over two minutes on my very fast machine. Even on a
laptop I would think the conversion would be acceptably fast.

For the future, if 32 Kbps is acceptable to everyone, I think we
should just record in that format and get files that are only a
little more than half the size of the original 64 Kbps file. I
can provide the 32 Kbps file from the Audacity conversion. That
will save space in the database, reduce memory demands on the
user computers, and upload and download faster.

32 Kbps seemed fine to me, but I'm not a great judge of these
things. We need to wait for the outcome of William's tests.

If there is a detectable difference, we might want to record at
64 Kbps and save the files somewhere, but convert the data to 32
Kbps for storage in the CDR. This is like what we do with
Photoshop source files for images with compressed JPEG files
stored in the CDR.

I plan to uninstall the trial software from my machine and just
leave Audacity. William may wish to install Audacity on his
machine too. It is free. But he'll only need it if we record
in the future at 64 instead of 32 Kbps.

Comment entered 2010-03-16 14:49:35 by alan

BZDATETIME::2010-03-16 14:49:35
BZCOMMENTOR::Alan Meyer
BZCOMMENT::48

I have placed five versions of the Board meeting minutes on
Mahler so everyone can listen to determine what is acceptable
quality. The versions are:

http://mahler.nci.nih.gov/BoardMeeting-08kbps.mp3 (15.9 MB)
http://mahler.nci.nih.gov/BoardMeeting-16kbps.mp3 (25.4 MB)
http://mahler.nci.nih.gov/BoardMeeting-24kbps.mp3 (35.9 MB)
http://mahler.nci.nih.gov/BoardMeeting-32kbps.mp3 (47.0 MB)
http://mahler.nci.nih.gov/BoardMeeting-64kbps.mp3 (88.3 MB)

The last one, 64kbps, is the unconverted HQ file from the digital
recorder.

All of the recordings are difficult to listen to at the very
beginning. Presumably people were still walking into the
meeting, sitting down, getting organized, and so on. Skipping
the first few minutes may give a better idea of the sound.
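
For anyone comparing the versions, here is a rough sketch of the
arithmetic, using only the sizes listed above. It shows each file's
size as a fraction of the 64 kbps original, alongside the fraction
the nominal bitrate alone would suggest, and the nominal storage per
hour of audio at each bitrate. The function names are made up for
illustration, and treating 1 MB as 10**6 bytes is an assumption (the
ratios do not depend on it).

```python
# Sizes in MB as listed in this comment, keyed by nominal bitrate (kbps).
sizes_mb = {8: 15.9, 16: 25.4, 24: 35.9, 32: 47.0, 64: 88.3}


def size_ratio(kbps, reference_kbps=64):
    """Listed file size as a fraction of the reference file's size."""
    return sizes_mb[kbps] / sizes_mb[reference_kbps]


def nominal_mb_per_hour(kbps):
    """Storage for one hour of audio at a constant bitrate.

    Assumes 1 kbps = 1000 bits/s and 1 MB = 10**6 bytes.
    """
    return kbps * 1000 * 3600 / 8 / 1e6


if __name__ == "__main__":
    for kbps in sorted(sizes_mb):
        print(
            f"{kbps:2d} kbps: {sizes_mb[kbps]:5.1f} MB, "
            f"{size_ratio(kbps):.0%} of original "
            f"(bitrate alone suggests {kbps / 64:.0%}), "
            f"nominal {nominal_mb_per_hour(kbps):.1f} MB/hour"
        )
```

By these numbers the listed files shrink somewhat less than the
bitrate alone would suggest, so the listed sizes are the better guide
when estimating storage.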

Comment entered 2010-03-24 10:20:41 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-03-24 10:20:41
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::49

Did we decide at the last meeting that we were going to go with the 16 kbps version?

Comment entered 2010-03-24 10:30:32 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-24 10:30:32
BZCOMMENTOR::Bob Kline
BZCOMMENT::50

I believe so.

Comment entered 2010-03-24 11:50:11 by alan

BZDATETIME::2010-03-24 11:50:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::51

(In reply to comment #49)
> Did we decide at the last meeting that we were going to go with the 16 kbps
> version?

My recollection was that CIAT would record the meetings at 64 Kbps
and save the recording somewhere, just as our illustrator does
with the Photoshop versions of images. Then they'd make a
conversion to 16 Kbps that they'd upload to the CDR. I can
help William set that up.

We ought to have some provision for saving both the Photoshop and
the high quality recording files here at NCI. I assume something
like that exists already and we can just add the meeting recordings
to it. If not, we need to set something up.
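
The conversions in this issue were done by hand in Audacity. If the
conversion step ever needs to be scripted, one possible approach (an
assumption, not something tested in this issue) is to call ffmpeg
with the LAME encoder. The sketch below assumes ffmpeg is installed
with libmp3lame support; the function names and file names are
hypothetical.

```python
import subprocess


def build_convert_command(source, target, kbps=16):
    """Build an ffmpeg command line that re-encodes source as MP3 at kbps.

    Assumes an ffmpeg build with the libmp3lame encoder is on the PATH.
    """
    return [
        "ffmpeg",
        "-i", source,              # the archived high-quality recording
        "-codec:a", "libmp3lame",  # MP3 encoder
        "-b:a", f"{kbps}k",        # target audio bitrate
        target,                    # the smaller copy to upload to the CDR
    ]


def convert(source, target, kbps=16):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_convert_command(source, target, kbps), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_convert_command("meeting-64kbps.mp3",
                                         "meeting-16kbps.mp3")))
```

The original 64 Kbps file is left untouched, so it can still be
archived separately as described above.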

Comment entered 2010-03-24 11:58:21 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-03-24 11:58:21
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::52

Yes, I think that is what we decided. FYI, we haven't actually been getting the Photoshop versions of the illustrations that Terese does for us. We can always ask her for those, but my understanding was that there are many "layers" to the illustrations and that it isn't just getting one illustration. Once we decide what we want and where we want to store them then I can ask her to send them. We should still proceed with the audio files now. There are some the Board Managers are waiting to hear. Can we go ahead and promote this?

Comment entered 2010-03-24 13:31:31 by alan

BZDATETIME::2010-03-24 13:31:31
BZCOMMENTOR::Alan Meyer
BZCOMMENT::53

(In reply to comment #52)
> ... FYI, we haven't actually been getting
> the Photoshop versions of the illustrations that Terese does for us. We can
> always ask her for those, but my understanding was that there are many "layers"
> to the illustrations and that it isn't just getting one illustration. Once we
> decide what we want and where we want to store them then I can ask her to send
> them. ...

Photoshop files can be very large. A good illustrator will use layers to represent different elements of the image, for example one layer for one kind of tissue, one for another, one for text overlays, etc. That way she can edit, delete or replace one without damaging the others. I believe that Photoshop files contain their own history of edits in a built-in version control. That also makes them very large.

I'm in favor of getting our own copies of these. If anything happened to Therese or to her computer, we could lose a lot of the value built up in those images.

Comment entered 2010-03-24 15:21:15 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-24 15:21:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::54

(In reply to comment #52)

> ... Can we go ahead and promote this?

Wasn't sure if you were addressing this question to me or to CIAT. No technical barriers to promotion that I'm aware of (I'd do it at night in order to avoid disrupting work during business hours).

Comment entered 2010-03-24 15:26:34 by Beckwith, Margaret (NIH/NCI) [E]

BZDATETIME::2010-03-24 15:26:34
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::55

Okay, then let's promote it. William can work with Alan to save the files and convert them to the 16 kbps format. Are you okay with that William?

Comment entered 2010-03-24 15:48:03 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-24 15:48:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::56

(In reply to comment #55)
> Okay, then let's promote it. William can work with Alan to save the files and
> convert them to the 16 kbps format. Are you okay with that William?

Yes, that is fine with me. However, I installed Audacity, converted the file to 16 kbps, and uploaded it into CDR0000657331. I was able to successfully play the recording on my computer, but when I attempted to play it in the CDR, I got the following error message:

('Cursor.execute', (u'unexpected failure for query: Query: " SELECT xml\n FROM document\n WHERE id = ?" Params: ('CDR0000657331',)',))

Bob, the other file we uploaded on 3/10 during the test also generated a similar error message.

As long as this will not be a problem on Bach, it is okay with me to promote it.

Comment entered 2010-03-24 15:49:57 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-24 15:49:57
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::57

(In reply to comment #56)
> (In reply to comment #55)
> > Okay, then let's promote it. William can work with Alan to save the files and
> > convert them to the 16 kbps format. Are you okay with that William?
>
> Yes that is fine with me. However, I installed Audacity and converted the file
> to 16kps and uploaded it into CDR0000657331. I was able to successfully play
> the recording on my computer but when I attempted to play it in the CDR, I got
> the following error message:
>
> ('Cursor.execute', (u'unexpected failure for query: Query: " SELECT xml\n FROM
> document\n WHERE id = ?" Params: ('CDR0000657331',)',))
>
> Bob, the other file we uploaded on 3/10 during the test also generated a
> similar error message.
>
> As long as this will not be a problem on Bach, it is Okay with me to promote
> it.

I forgot to mention that it took less than 2 minutes to upload the file into the CDR.

Comment entered 2010-03-24 16:07:01 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-24 16:07:01
BZCOMMENTOR::Bob Kline
BZCOMMENT::58

(In reply to comment #56)

> ... when I attempted to play it in the CDR, I got the following error message:

Please give it another try.

Comment entered 2010-03-24 16:54:48 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-24 16:54:48
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::59

(In reply to comment #58)
> (In reply to comment #56)
>
> > ... when I attempted to play it in the CDR, I got the following error message:
>
> Please give it another try.

It worked. Please promote to Bach. Thanks!

Comment entered 2010-03-24 22:23:17 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2010-03-24 22:23:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::60

(In reply to comment #59)

> It worked. Please promote to Bach. Thanks!

Done. I believe I've installed all of the pieces:

  • new CDR server

  • new client DLL

  • new macro file

  • new CGI script for launching the blob

Please check carefully to make sure nothing is broken!

Comment entered 2010-03-26 10:57:25 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2010-03-26 10:57:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::61

(In reply to comment #60)
> (In reply to comment #59)
>
> > It worked. Please promote to Bach. Thanks!
>
> Done. I believe I've installed all of the pieces:
>
> * new CDR server
> * new client DLL
> * new macro file
> * new CGI script for launching the blob
>
> Please check carefully to make sure nothing is broken!

I tested on Bach and did not encounter any problems. I have created 3 Media documents using the Screening and Prevention and CAM Boards meeting minutes.

CDR0000669357 - Screening and Prevention Board Minute Meeting March 24, 2010
CDR0000669311 - CAM Meeting Minutes _February 19, 2010
CDR0000669343 - Screening and Prevention Meeting Minutes January 27, 2010

The titles may change in the long run because we are yet to meet to decide on naming conventions.

I will need to put in an issue for possible schema and Template changes.

I am closing this issue at this point. Thank you!

Attachments
File Name                     Posted               User
CdrClient.log                 2010-02-23 16:49:09  Osei-Poku, William (NIH/NCI) [C]
Document Activity Report.doc  2010-02-23 17:17:57  Osei-Poku, William (NIH/NCI) [C]
error_media.doc               2010-03-10 11:43:30  Osei-Poku, William (NIH/NCI) [C]
error_media1.doc              2010-03-10 15:57:18  Osei-Poku, William (NIH/NCI) [C]
Media error.doc               2010-02-23 16:05:17  Osei-Poku, William (NIH/NCI) [C]