CDR Tickets

Issue Number 3917
Summary FileSweeper Fails to remove File
Created 2015-05-28 16:17:59
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To alan
Status Closed
Resolved 2016-02-23 20:54:41
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.162109
Description

When FileSweeper cannot remove a file after archiving it, the next FileSweeper run fails. The cause is typically a permissions problem, and it makes every subsequent run fail prematurely. Because the logs are then no longer archived, the disk could eventually fill up.
We want FileSweeper to send out a message if such an error occurs.

Comment entered 2015-05-28 16:20:44 by Englisch, Volker (NIH/NCI) [C]

A very similar issue was addressed in OCECDR-3819.

Comment entered 2015-05-28 16:34:48 by Englisch, Volker (NIH/NCI) [C]

I submitted a ticket to remove the bad file: WEBTEAM-6411.

Comment entered 2015-07-09 14:21:54 by Kline, Bob (NIH/NCI) [C]

Modify the script so that it notifies us when this happens.

Comment entered 2015-08-06 22:54:41 by alan

How does this sound:

Add notification email addresses to the FileSweeper config file, as many as desired, for example:

<NotificationEmail>volker@mail.nih.gov</NotificationEmail>
<NotificationEmail>alan@mail.nih.gov</NotificationEmail>
<NotificationEmail>cdr.admin@mail.nih.gov</NotificationEmail>

In the FileSweeper.py script, modify the fatalError(msg) routine to log the error message, as it does now, then send email to all NotificationEmail addresses. I suggest that messages only be sent in the event of a fatal error, not upon successful completion.
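A minimal sketch of this proposal, not the actual FileSweeper.py code. The root element name FileSweeperConfig, the function names, the sender address, and the injectable send callback are all assumptions made for illustration; the real fatalError(msg) routine takes only the message.

```python
import logging
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# Hypothetical config fragment; only NotificationEmail comes from the proposal.
SAMPLE_CONFIG = """\
<FileSweeperConfig>
  <NotificationEmail>volker@mail.nih.gov</NotificationEmail>
  <NotificationEmail>alan@mail.nih.gov</NotificationEmail>
</FileSweeperConfig>
"""


def notification_emails(config_xml):
    """Return every non-empty NotificationEmail address in the config."""
    root = ET.fromstring(config_xml)
    return [
        elem.text.strip()
        for elem in root.iter("NotificationEmail")
        if elem.text and elem.text.strip()
    ]


def build_message(msg, recipients, sender="FileSweeper@localhost"):
    """Wrap the error text in an email message (sender is an assumption)."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = "FileSweeper fatal error"
    message.set_content(msg)
    return message


def fatal_error(msg, recipients, send=None):
    """Log the error, mail it if recipients are configured, then exit."""
    logging.error("FATAL: %s", msg)
    if recipients and send is not None:
        # send is any callable that accepts an EmailMessage.
        send(build_message(msg, recipients))
    raise SystemExit(1)
```

A caller could pass the send_message method of an smtplib.SMTP connection as the send argument. Nothing is mailed on successful completion, matching the suggestion above.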

Another possible change, tangentially related to this one, could be to store the config file in the database as a miscellaneous document. That would make it easy for us to make changes in emails and other things without needing to get help from CBIIT.

Comment entered 2015-08-07 07:27:12 by Kline, Bob (NIH/NCI) [C]

Why not use a CDR group for email notification, as we do elsewhere? As for storing the config file in the repository, wouldn't we need a separate document type in order to accommodate the elements this file uses?

Comment entered 2015-08-11 11:24:56 by alan

Yes, I think we would want a document type.

This could be a schema-controlled doc_type for FileSweeperConfig, or a more general schema for config files, for example:

<element name='Config' type='Config'/>
<complexType name='Config'>
 <choice>
  <element name='FileSweeperConfig' type='FileSweeperConfig'/>
  ...
 </choice>
 ...

Alternatively, we could make it one of the control types, like css, that are not schema validated. That has an advantage: we could add new config files as needed without creating more document types or editing a schema.

This frees us from requiring CBIIT support. Whether it's worth the trouble depends on how often we need to make changes to this and/or other config files.

Comment entered 2016-02-23 20:54:41 by alan

I have added the code to send email to one or more people in the event of a fatal error. The error message will be mailed and logged instead of just logged as it was. The list of addresses to which mail is sent is also logged.

To enable emails, create the user group "FileSweeper Error Notification" in the CDR (I've created it on DEV) and check the boxes for desired users. This can also be overridden on the command line - which may or may not be of any value.
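The recipient selection described here could look roughly like the sketch below. The --email option name and the lookup_group callable are hypothetical stand-ins, since the ticket names neither the actual command-line option nor the CDR call used to read group membership.

```python
import argparse
import logging

# Group name taken from the comment above.
GROUP_NAME = "FileSweeper Error Notification"


def parse_args(argv=None):
    """Parse a hypothetical --email option that may be repeated."""
    parser = argparse.ArgumentParser(description="FileSweeper (sketch)")
    parser.add_argument(
        "--email",
        action="append",
        help="address to notify on fatal error; overrides the CDR group",
    )
    return parser.parse_args(argv)


def resolve_recipients(cli_emails, lookup_group):
    """Prefer command-line addresses; otherwise use the CDR group members.

    lookup_group is a hypothetical callable mapping a group name to a
    list of email addresses.
    """
    if cli_emails:
        recipients = list(cli_emails)
    else:
        recipients = list(lookup_group(GROUP_NAME))
    # The list of addresses is logged as well as used for mailing.
    logging.info("Fatal error notifications go to: %s", recipients)
    return recipients
```

If the group is empty and no addresses are given on the command line, the returned list is empty and no mail would be sent.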

I made no changes to the actual FileSweeper operation and so did not test that. I only tested the email capability - by temporarily inserting an instruction in the code to abort with a fatal error.

The new code, including some minor formatting cleanups, is on DEV and in SVN.

I'm marking this as Resolved Fixed.

Comment entered 2016-05-13 15:00:13 by Englisch, Volker (NIH/NCI) [C]

I've created the new FileSweeper Error Notification group on PROD.
Since we have no way of deliberately creating the conditions that make the job fail in order to test on PROD, I'm going to close the ticket and reopen it later if it turns out that FileSweeper is not working properly.
