Issue Number | 4025 |
---|---|
Summary | Create Web Interface to Maintain PDQ Partner Info |
Created | 2016-02-09 14:27:18 |
Issue Type | Sub-task |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2016-04-18 07:54:39 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.178581 |
We need to create a web page allowing us to maintain the data stored in the PDQ Licensee tables.
I'm assigning this ticket to Bob.
The file to update the partner information exists in
http://cdr-dev.cancer.gov/cgi-bin/cdr/edit-pdq-contact.py?p=TEST
The variable p will either be passed at TEST or CDR. The webpage currently allows to update individual contact records. Several things need to be modified (only the first is absolutely necessary):
Ability to add user records
if possible, display content of Show all Contacts in table format: Organization Name, Contact Name, Email address
maybe ability to only display active/inactive contacts
Calendar widget for Activation/Termination date
Yesterday you mentioned batch functionality that you didn't mention in the previous comment. Have you decided against including that feature?
Not really but I had the feeling I'm asking for too much and wasn't
sure how to put in words what I'm doing manually.
The point is that an account is given to an organization, i.e. UBOB, and
the organization lists several users to be notified. That could be
2,3,4,5 users for UBOB. Some of the changes would be for individuals
only, like their name, phone number, email address, but others would be
for all of the users of UBOB like the organization name, termination
date or the renewal date.
Which values are invariant across all users for a given organization, and which vary per user? It's starting to sound as if we should really have a second table for the organizations, rather than storing our data in a way which violates normalization requirements.
Maybe that's the Einstein release.
The org-wide values would be product, org_name, org_status,
ftp_username, activation_date, termination_date, renewal_date.
All others are record specific and the product or ID would not
change.
Here's what we need, then:
CREATE TABLE data_partner_org
INTEGER NOT NULL IDENTITY PRIMARY KEY,
(org_id VARCHAR(255) NOT NULL UNIQUE,
org_name VARCHAR(64) NOT NULL,
product CHAR(1) NOT NULL,
org_status NOT NULL,
activated DATETIME NULL,
terminated DATETIME NULL)
renewal DATETIME
CREATE TABLE data_partner_contact
INTEGER NOT NULL IDENTITY PRIMARY KEY,
(contact_id INTEGER NOT NULL REFERENCES data_partner_org,
org_id VARCHAR(64) NOT NULL,
email_addr VARCHAR(255) NOT NULL,
person_name VARCHAR(64) NULL,
phone INTEGER NOT NULL,
notif_count CHAR(1) NOT NULL,
contact_type VARCHAR(64) NULL,
ftp_username CHAR(1) NOT NULL,
notified NULL) notif_date DATETIME
It's not clear why both notified
and
notif_date
are both needed. If we have the nullable date
column, that would seem to answer the question about whether
notification has taken place (in which case, I would call
that column notified
). On the other hand,
all of the columns for the date are NULL, so maybe these columns don't
mean what I think they do.
Using the normalized tables is the only sensible way to support bulk updates of the organization information. I can do that for this task, and it shouldn't be hard to alter the scripts which use the data to pull them from these tables.
Want me to alter the table definitions and load them?
In fact, how about if I come up with a view which presents the original denormalized representation of the data?
A view will work for pulling in the data from the FTP server but I will have to rewrite my loader program. In the interest of getting things done I was only concerned about mimicking the current process. I agree with you that it needs more thought but that will also need more time.
Try this:
SELECT * FROM pdq_data_partner
Shall I drop the pdq_contact table and rename pdq_data_partner to pdq_contact?
... I will have to rewrite my loader program.
Most (if not all) of the work for that is already done with the script I just wrote to pull the data from the existing pdq_contact table and populate the two new tables.
Shall I drop the pdq_contact table and rename pdq_data_partner to pdq_contact?
I guess there is no stopping you! :-)
Go ahead.
import cdrdb
class Contact:
= ("product", "email_address", "person_name", "org_name", "phone",
COLS "org_status", "notified_count", "contact_type", "ftp_username",
"activation_date", "termination_date", "renewal_date", "notified",
"notified_date")
def __init__(self, *cols):
for i, name in enumerate(self.COLS):
setattr(self, name, cols[i])
= {}
orgs = cdrdb.connect()
conn = conn.cursor()
cursor = cdrdb.Query("pdq_contact", *Contact.COLS)
query "product", "org_name", "person_name")
query.order(= query.execute(cursor).fetchall()
rows for row in rows:
= Contact(*row)
contact = orgs.get(contact.org_name.lower())
org_id if not org_id:
"""\
cursor.execute(INSERT INTO data_partner_org (org_name, product, org_status, activated,
terminated, renewal)
VALUES (?, ?, ?, ?, ?, ?)""",
(contact.org_name, contact.product, contact.org_status,
contact.activation_date, contact.termination_date,
contact.renewal_date))"SELECT @@IDENTITY")
cursor.execute(= cursor.fetchall()[0][0]
org_id print repr(org_id), "IS NEW ORG ID"
= org_id
orgs[contact.org_name.lower()] """\
cursor.execute(INSERT INTO data_partner_contact (org_id, email_addr, person_name, phone,
notif_count, contact_type, ftp_username,
notified, notif_date)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(org_id, contact.email_address, contact.person_name,
contact.phone, contact.notified_count,
contact.contact_type, contact.ftp_username,
contact.notified, contact.notified_date)) conn.commit()
Before I go much farther down this road, do you have any light to shed on the notified_count, notified, and notified_date columns? Are all three needed? Does notified provide information that can't be derived from one of the other two columns? And is it a temporary condition that the notified_date column is always NULL?
Before I go much farther down this road, do you have any light to shed on the notified_count, notified, and notified_date columns? Are all three needed?
Again, coming from the text document we're currently
maintaining:
I have the notified column that set to 'N' at the beginning for all rows
to be notified. When the email has been submitted the individual's
record gets set to notified='Y' (without a date). You came up with the
notified_date column and I hadn't thought through how to use the date
alone in situations when a job needs to be rerun after failing half-way
through.
I'm sure one or the other would be good enough.
I just noticed that in the original data, some of the people for the same organization have different termination and renewal dates (e.g., Gregory Wall and Eugene Fratkin have termination dates of January 26, 2015 and renewal dates of September 30, 2014, whereas James Shima terminated May 18, 2015 and renewed January 13, 2015; all three are with Silicon Valley Biosystems). It seems more plausible that partnership termination really does belong with the organization, so I'm regarding that as dirty data. We probably do need a flag or a date to indicate that an individual contact person is no longer actively associated with the organization (at least for the purposes of the PDQ data partnership). Your thoughts?
The Silicon Valley Biosystems is a difficult organization. We had two organization records for these because they were from different departments. One department quit and some of the people switched. Anyway, the activation does go with the organization and my thought was to use one of the status values to indicate a person record as inactive: P - primary contact, S - secondary, D - inactive.
Capturing modifications from the prototype review:
Fix contact type values (Primary, Secondary, Internal, Deleted) ✔
Make filtering apply to deleted contacts decided against this, after #6 below)
Move ftp_account column to partner table ✔
Re-create tables ✔
Re-populate tables ✔
Fix data entry forms ✔
Fix partner status values (Active, Test, Special) ✔
Make products a lookup table ✔
Re-create tables ✔
Re-populate tables ✔
Fix data entry forms ✔
Add interface to manage product table ✔
Add contact type to contact list display ✔
Drop notified column ✔
Rename "Notification Details" to "Notification History" ✔
Fix phone/name bug ✔
Sort inactive partners to bottom of landing page display ✔
Make CDR the default product ✔
Fix partner status values (Active, Test, Special)
I noticed yesterday that item 4. doesn't really apply yet. In the SendEmail.py program I had only active and test partners and I marked the special partners active. I'm adjusting the notification program so that we can use the flag S for special partners in the future in case you're wondering that you don't see any record with status=S.
Make CDR the default product (PROD only?)
Oh, that's an excellent idea to have CDR only be the default on PROD.
Add interface to manage product table
I think it would be OK to skip this. If there is a product change in the future we can update the table with a SQL insert stagement. The products have not changed in the past 10 years.
Make CDR the default product (PROD only?)
Go ahead and make CDR the default on all tiers. We're not sending out the emails through this interface, which is what we really want to protect us from.
I've been going on the assumption that all the date fields were just that (based on what I saw in the data), but now I'm seeing some times with the notification dates. Do we need the times? If we do, I'll need to remove the calendar widget for that field (and rewrite some of the other code).
I don't see a need for the activation, termination, and renewal dates. For the notification, however, I could see some usefulness in a time stamp being available.
Come to think of it, should we have a DateLastModified date in the tables?
✔ All five of the screens have been built.
✔ The interface has been plugged into the Developer/SA menu.
✔ I added last_mod columns to all the tables.
✔ The notification date now includes the time.
✔ The view column names now match those of the tables.
✔ I removed the need for clicking a Refresh
button to make
the filtering choices take effect.
✔ I also added a check for uniqueness of the product or partner name on
save (to avoid ugly SQL error messages).
Manual deployment tasks:
Add new action MANAGE PDQ DATA PARTNERS
Add new user group Data Partner Managers
Add Volker and Bob to the new group
Please test everything, ~volker.
R13904 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/DevSA.py
R13904 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/manage-pdq-data-products.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/manage-pdq-data-partners.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/edit-data-product.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/edit-data-partner-org.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/edit-data-partner-contact.py
R13919 /branches/Darwin/lib/Python/pdq_data_partner.py
R13909 /branches/Darwin/lib/Python/cdrcgi.py
I received the following error while trying to enter a new Person "notif_count field is required" to a special partner. The count for a new person should be 0 by default.
An error has been created in
D:\cdr\Log\tmpirhpnj.html
Those were two separate problems. I fixed both. I also corrected problems with the org/contact editing links.
I have not been able to break these pages again. It seems to be Volker-proof. :-)
Elapsed: 0:00:00.001357