CDR Tickets

Issue Number 4592
Summary [Media] Prevent download of files with wrong format
Created 2019-03-12 17:14:26
Issue Type Improvement
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Englisch, Volker (NIH/NCI) [C]
Status Closed
Resolved 2019-06-14 17:55:47
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.241349
Description

Files meant for the audio pronunciation review need to follow a specific naming format: Week_X.zip for new files and Week_X_RevY.zip for revised ones. However, if files are uploaded to the FTP site with the wrong format or file name and then downloaded to the CDR server, an error message is displayed when users try to review them, which prevents the review. Please investigate a way of preventing the download of files that have a wrong format or wrong file name.

Comment entered 2019-06-14 17:55:40 by Englisch, Volker (NIH/NCI) [C]

The following file has been updated to prevent files with the wrong name format from being downloaded:

  • FtpAudio.py

This change is ready for review on DEV.  

Currently, these files are stored on DEV in the audio download directory:
Week_124.zip
Week_132_Rev1N.zip
Week_132_Rev1.zip
Week_132_Rev2 1.zip
Week_132_Revx.zip
Week_132.zip
Week_999.zip
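
As a rough illustration of the naming rule from the description, the check below classifies the seven names listed above. It is a sketch, not the code in FtpAudio.py: the pattern assumes that X and Y are plain decimal integers, and NAME_PATTERN and is_valid_name are hypothetical names.

```python
import re

# Assumed rule: Week_<digits>.zip or Week_<digits>_Rev<digits>.zip.
# The real pattern used by FtpAudio.py is not shown in this ticket.
NAME_PATTERN = re.compile(r"Week_\d+(_Rev\d+)?\.zip")


def is_valid_name(name):
    """Return True if the whole file name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


names = [
    "Week_124.zip",
    "Week_132_Rev1N.zip",
    "Week_132_Rev1.zip",
    "Week_132_Rev2 1.zip",
    "Week_132_Revx.zip",
    "Week_132.zip",
    "Week_999.zip",
]

# Keep the original order so the two lists are easy to compare with the listing.
valid = [name for name in names if is_valid_name(name)]
rejected = [name for name in names if not is_valid_name(name)]
```

Under that assumption, valid holds the four names with purely numeric week and revision parts, and rejected holds Week_132_Rev1N.zip, Week_132_Rev2 1.zip, and Week_132_Revx.zip.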

Comment entered 2019-06-20 17:05:17 by Englisch, Volker (NIH/NCI) [C]

This ticket is ready for review on DEV.

Comment entered 2019-06-21 08:47:22 by Osei-Poku, William (NIH/NCI) [C]

I tried this in test mode and only the files in the correct format displayed. So, it worked on DEV as expected (will run in live mode later). However, can we have a notification about the files with the wrong format? Maybe some type of a warning message that does not prevent us from proceeding with downloading the files in the correct format.

Comment entered 2019-06-21 08:49:56 by Osei-Poku, William (NIH/NCI) [C]

Running in live mode produced this error message.

Comment entered 2019-06-21 13:34:50 by Englisch, Volker (NIH/NCI) [C]

Oh good!  This "error" is not a problem but a feature.  If a file had already been downloaded you're not allowed to run the download again and overwrite the file when you're in live mode.  Only in test mode are you allowed to rerun the job.

I will delete the file and let you know when you can try a live run again.

Comment entered 2019-06-21 13:47:17 by Englisch, Volker (NIH/NCI) [C]

 However, can we have a notification about the files with the wrong format?

This is a good idea.  However, I just want to mention that you will likely see all of the existing files with a wrong file name format during every download attempt until someone submits a ticket to have those files cleaned up.

Comment entered 2019-06-21 14:40:54 by Englisch, Volker (NIH/NCI) [C]

What file names would you like to see?  Only those with a specific name format or any files?

Comment entered 2019-06-21 15:46:56 by Englisch, Volker (NIH/NCI) [C]

The requested changes have been implemented.  I'm now displaying an additional block listing all files not downloaded.  I also removed all "live" files so that you'll be able to run the script in live mode at least once.
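
The additional block could be produced along the lines below. This is only a sketch under assumptions: the naming pattern, the wording of the two headings, and the function names partition and format_report are all hypothetical and are not taken from FtpAudio.py.

```python
import re

# Assumed naming rule (not taken from FtpAudio.py).
VALID = re.compile(r"Week_\d+(_Rev\d+)?\.zip")


def partition(names):
    """Split names into (to_download, not_downloaded), each sorted."""
    to_download = sorted(n for n in names if VALID.fullmatch(n))
    not_downloaded = sorted(n for n in names if not VALID.fullmatch(n))
    return to_download, not_downloaded


def format_report(to_download, not_downloaded):
    """Build a plain-text report with one block per group of files."""
    lines = ["Files downloaded:"]
    lines += [f"  {name}" for name in to_download] or ["  (none)"]
    lines.append("Files not downloaded (invalid file name):")
    lines += [f"  {name}" for name in not_downloaded] or ["  (none)"]
    return "\n".join(lines)


good, bad = partition(["Week_132.zip", "Week_132_Revx.zip", "notes.txt"])
report = format_report(good, bad)
```

Listing every rejected name, whatever it looks like, keeps the warning informative without blocking the download of the correctly named files.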

This change is ready for review on DEV.

Comment entered 2019-06-24 15:21:12 by Osei-Poku, William (NIH/NCI) [C]

Any filename that fails to be downloaded.

Comment entered 2019-06-24 15:22:13 by Osei-Poku, William (NIH/NCI) [C]

Sure. Understood. Would the program download the files that have the right format?

Comment entered 2019-06-24 15:33:05 by Osei-Poku, William (NIH/NCI) [C]

 I will delete the file and let you know when you can try a live run again.

 

I am running into the same error message when I attempt to run in live mode:

Comment entered 2019-06-24 15:34:07 by Osei-Poku, William (NIH/NCI) [C]

Verified part of the changes on DEV. I am not able to run in live mode, however.

Comment entered 2019-06-25 12:02:24 by Englisch, Volker (NIH/NCI) [C]

Yes, the files with the correct file name format will be downloaded.

Comment entered 2019-06-25 12:03:09 by Englisch, Volker (NIH/NCI) [C]

I don't understand your comment.  Could you please explain?

Comment entered 2019-06-25 12:15:58 by Englisch, Volker (NIH/NCI) [C]

The problem is that you are first running the script in test mode and then in live mode.

When you're running the script in test mode the files are being copied from the upload directory to the CIAT directory, overwriting anything that already exists in the CIAT directory.

When you're running the script in live mode, however, the files are being moved, after checking that the files don't already exist in the CIAT directory. 

Therefore, when you're testing you're running first in test mode which copies the files to the CIAT directory.  Then you're running the script in live mode and you see the error "Unable to copy because this file already exists in the CIAT directory".  If you run the script in live mode first, you will then be able to run it again as often as you like in test mode, but not the other way around.
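
The test-then-live sequence described here can be modeled with two local directories. The sketch below is an assumption-laden stand-in, not the actual script: it uses temporary local folders instead of the FTP server, and transfer and demo are hypothetical names. It models only the case of running in test mode first and live mode second.

```python
import shutil
import tempfile
from pathlib import Path


def transfer(name, upload_dir, ciat_dir, live):
    """Copy (test mode) or move (live mode) one file to the CIAT directory."""
    source = Path(upload_dir) / name
    target = Path(ciat_dir) / name
    if live:
        # Live mode refuses to overwrite a file that is already there.
        if target.exists():
            raise FileExistsError(f"{name} already exists in the CIAT directory")
        shutil.move(str(source), str(target))
    else:
        # Test mode copies and silently overwrites any existing target.
        shutil.copy2(source, target)
    return target


def demo():
    """Run test mode, then live mode, and return the resulting error text."""
    with tempfile.TemporaryDirectory() as tmp:
        upload = Path(tmp) / "upload"
        ciat = Path(tmp) / "ciat"
        upload.mkdir()
        ciat.mkdir()
        (upload / "Week_132.zip").write_bytes(b"placeholder")
        transfer("Week_132.zip", upload, ciat, live=False)
        try:
            transfer("Week_132.zip", upload, ciat, live=True)
        except FileExistsError as exc:
            return str(exc)
    return None


result = demo()
```

Here the live run fails only because the earlier test run left a copy behind, which matches the behavior reported in this thread.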

We could change this behavior, but a change might bring other challenges, e.g. overwriting a file that hasn't been imported yet.  We could possibly skip copying files in test mode, or copy and delete (instead of move) files in live mode, etc.

As is often the case, this isn't a problem on the PROD server.

I've removed the previously copied files again for you if you'd like to test one more time.

Comment entered 2019-06-25 12:29:45 by Osei-Poku, William (NIH/NCI) [C]

   

I've removed the previously copied files again for you if you'd like to test one more time.

 
I just ran the report in live mode (without first running in test mode) and still got the same error message. It appears that something else is causing this problem.

Comment entered 2019-06-25 12:43:36 by Englisch, Volker (NIH/NCI) [C]

I see, there is an additional check testing if the *.zip file has already been downloaded to the CDR server.  I had to remove the existing *.zip files from that directory as well to test in live mode.

Comment entered 2019-06-25 12:52:11 by Osei-Poku, William (NIH/NCI) [C]

It worked this time around but on the second page, I got the following error message "FTP Error: name 'sterr' is not defined".

Should this be expected?

Comment entered 2019-06-25 13:01:06 by Englisch, Volker (NIH/NCI) [C]

No, that was not expected and it was a good catch!  I've fixed this and restored the *.zip files again.

Comment entered 2019-06-25 13:06:06 by Osei-Poku, William (NIH/NCI) [C]

There is still an error, but it is almost blank, with just an opening quotation mark.

Comment entered 2019-06-25 13:09:07 by Osei-Poku, William (NIH/NCI) [C]

I was answering your question about the filename format but I guess I misunderstood your question. As far as I can tell, the files/filenames are displayed as expected.

Comment entered 2019-06-25 13:10:24 by Englisch, Volker (NIH/NCI) [C]

At which point do you see this error?  I ran the download without problems.  What link/button do you click/press once the files have been downloaded?

Comment entered 2019-06-25 13:15:02 by Osei-Poku, William (NIH/NCI) [C]

This is at the point when the files have been downloaded. That is, the second page. I click on the GetAudio button.

Comment entered 2019-06-25 13:37:50 by Englisch, Volker (NIH/NCI) [C]

You are downloading the audio files by clicking the GetAudio button.  Then the page appears listing the files downloaded and those not downloaded due to invalid file name format, and then you click the GetAudio button again?

If this was the issue then I have fixed it and added a message indicating that no files exist to download.

Comment entered 2019-06-25 13:47:31 by Osei-Poku, William (NIH/NCI) [C]

Yes, I believe that is fixed as I was able to see the downloaded files.

Comment entered 2019-06-25 13:49:08 by Osei-Poku, William (NIH/NCI) [C]

One more question. Besides checking the overall filename format, are you also checking the individual audio filenames within the folders to see whether they are formatted correctly?

Comment entered 2019-06-25 13:56:59 by Englisch, Volker (NIH/NCI) [C]

When you're saying "individual audio filenames within the folders" are you referring to the *.mp3 files - the members of the *.zip file?

No, the task was to prevent downloading incorrectly formatted *.zip files because we ran into a problem where those incorrectly formatted *.zip files blocked further processing (the import, I believe).  It's the "Audio Import" tool that actually opens the *.zip file to determine its content.

Comment entered 2019-06-25 15:54:10 by Osei-Poku, William (NIH/NCI) [C]

Okay. Thanks! I may have to create a new ticket for a future release to check for wrong formatting of the .mp3 files as they have also caused us a lot of trouble in the past. 

 

This ticket is Verified on DEV. Thanks!

Comment entered 2019-06-26 13:55:37 by Englisch, Volker (NIH/NCI) [C]
Comment entered 2019-08-07 07:56:16 by Osei-Poku, William (NIH/NCI) [C]

How do you suggest I test this on QA?

Comment entered 2019-08-09 14:19:04 by Englisch, Volker (NIH/NCI) [C]

I've placed a few incorrectly formatted files in the QA directory.

Comment entered 2019-08-09 16:27:56 by Osei-Poku, William (NIH/NCI) [C]

I am getting the message "FTP Error: name 'sterr' is not defined".

Comment entered 2019-08-09 17:28:57 by Englisch, Volker (NIH/NCI) [C]

Could you please try it again?  I did not try it myself because I don't want to download the files. There was a typo in the code that I have fixed.
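
For readers unfamiliar with this kind of message: Python raises a NameError when code refers to a name that was never assigned. The sketch below reproduces the reported text under the assumption that 'sterr' was a misspelling of a variable such as stderr; the function names and the error wrapper are hypothetical and do not come from FtpAudio.py.

```python
def report_failure():
    """Return the captured error text (with a deliberate typo)."""
    stderr = ""  # hypothetical variable holding the captured error output
    return sterr  # misspelled on purpose: 'sterr' was never assigned


def run():
    """Call the step and turn any exception into a display message."""
    try:
        return report_failure()
    except Exception as exc:
        return f"FTP Error: {exc}"


message = run()
```

Because the lookup fails only when that line actually executes, a typo like this can survive until a rarely used code path is reached.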

Comment entered 2019-08-09 17:54:10 by Osei-Poku, William (NIH/NCI) [C]

I still get an error message but with only one open quote " displayed.

Comment entered 2019-08-09 18:04:41 by Englisch, Volker (NIH/NCI) [C]

Interesting!  

The code says:  If there is no error, then do something.  If there is an error, print the error.

What you see is the printed error which doesn't look like an error to me.  I'll have to look at this more closely on Monday.

Comment entered 2019-08-12 17:36:52 by Englisch, Volker (NIH/NCI) [C]

I have made a couple more modifications which will make testing a bit easier in addition to fixing one bug.

  • On the confirmation page I'm not displaying the Get Audio button anymore. This was causing problems - especially during testing - because it tried to download files in live mode.

  • I've modified the program to copy files on the FTP server but rename those files during test mode.  This prevents the program from failing when it's first run in test mode and then in live mode.  The live mode had failed in this situation because the files had already been copied as part of the test mode.

  • I've modified the program to display an error message when there are no files to be downloaded.  This situation caused the last error reported (displaying an empty string).

  • I've also modified the code to skip downloading the files to the CDR server during test mode and clarified the error message if the file to be downloaded already exists on the CDR server.
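
One possible reading of the second and third items is sketched below. It is not the actual change: local temporary directories stand in for the FTP server, and the ".test" suffix, the message text, and the names transfer, summary, and demo are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

TEST_SUFFIX = ".test"  # hypothetical marker for copies made in test mode


def transfer(name, source_dir, target_dir, live):
    """Move in live mode; copy under a renamed target in test mode."""
    source = Path(source_dir) / name
    if live:
        target = Path(target_dir) / name
        if target.exists():
            raise FileExistsError(f"{name} has already been transferred")
        shutil.move(str(source), str(target))
    else:
        # The renamed copy cannot collide with a later live run.
        target = Path(target_dir) / (name + TEST_SUFFIX)
        shutil.copy2(source, target)
    return target


def summary(names):
    """Say so explicitly when there is nothing to download."""
    if not names:
        return "No files found to download"
    return "Downloaded: " + ", ".join(names)


def demo():
    """Run test mode, then live mode, and list the target directory."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source"
        target = Path(tmp) / "target"
        source.mkdir()
        target.mkdir()
        (source / "Week_132.zip").write_bytes(b"placeholder")
        transfer("Week_132.zip", source, target, live=False)
        transfer("Week_132.zip", source, target, live=True)
        return sorted(p.name for p in target.iterdir())


listing = demo()
empty_message = summary([])
```

With the test copy stored under a different name, the live run that follows finds no existing file and proceeds, and an empty file list yields a readable message rather than a blank error.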

I've tested these changes extensively on the DEV server and copied the file to the QA server.

These changes should allow you to run the program repeatedly in both test mode and live mode.

Comment entered 2019-08-12 23:19:56 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2019-09-10 15:38:36 by Osei-Poku, William (NIH/NCI) [C]

How do I test this on PROD? Should I just wait until we encounter an issue?

Comment entered 2019-09-10 15:53:11 by Englisch, Volker (NIH/NCI) [C]

In my opinion, this would be the best path of action.  Creating bad data on the production machine in order to identify that the bad data has been correctly identified seems wrong to me.

I would leave it as is and re-open in case we find a problem in the future.

Comment entered 2019-09-10 15:57:28 by Osei-Poku, William (NIH/NCI) [C]

Sounds good. I will proceed to close this ticket then. Thanks!

Attachments
File Name Posted User
audio download live mode error.JPG 2019-06-21 08:49:28 Osei-Poku, William (NIH/NCI) [C]
