CDR Tickets

Issue Number 4025
Summary Create Web Interface to Maintain PDQ Partner Info
Created 2016-02-09 14:27:18
Issue Type Sub-task
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2016-04-18 07:54:39
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.178581
Description

We need to create a web page allowing us to maintain the data stored in the PDQ Licensee tables.

Comment entered 2016-04-12 15:33:48 by Englisch, Volker (NIH/NCI) [C]

I'm assigning this ticket to Bob.

The file to update the partner information exists in
http://cdr-dev.cancer.gov/cgi-bin/cdr/edit-pdq-contact.py?p=TEST

The variable p will either be passed at TEST or CDR. The webpage currently allows to update individual contact records. Several things need to be modified (only the first is absolutely necessary):

  • Ability to add user records

  • if possible, display content of Show all Contacts in table format: Organization Name, Contact Name, Email address

  • maybe ability to only display active/inactive contacts

  • Calendar widget for Activation/Termination date

Comment entered 2016-04-13 10:57:28 by Kline, Bob (NIH/NCI) [C]

Yesterday you mentioned batch functionality that you didn't mention in the previous comment. Have you decided against including that feature?

Comment entered 2016-04-13 11:32:59 by Englisch, Volker (NIH/NCI) [C]

Not really but I had the feeling I'm asking for too much and wasn't sure how to put in words what I'm doing manually.
The point is that an account is given to an organization, i.e. UBOB, and the organization lists several users to be notified. That could be 2,3,4,5 users for UBOB. Some of the changes would be for individuals only, like their name, phone number, email address, but others would be for all of the users of UBOB like the organization name, termination date or the renewal date.

Comment entered 2016-04-13 11:40:28 by Kline, Bob (NIH/NCI) [C]

Which values are invariant across all users for a given organization, and which vary per user? It's starting to sound as if we should really have a second table for the organizations, rather than storing our data in a way which violates normalization requirements.

Comment entered 2016-04-13 11:47:08 by Englisch, Volker (NIH/NCI) [C]

Maybe that's the Einstein release.

The org-wide values would be product, org_name, org_status, ftp_username, activation_date, termination_date, renewal_date.
All others are record specific and the product or ID would not change.

Comment entered 2016-04-13 12:09:38 by Kline, Bob (NIH/NCI) [C]

Here's what we need, then:

CREATE TABLE data_partner_org
     (org_id INTEGER      NOT NULL IDENTITY PRIMARY KEY,
    org_name VARCHAR(255) NOT NULL UNIQUE,
     product VARCHAR(64)  NOT NULL,
  org_status CHAR(1)      NOT NULL,
   activated DATETIME     NOT NULL,
  terminated DATETIME         NULL,
     renewal DATETIME         NULL)

CREATE TABLE data_partner_contact
 (contact_id INTEGER      NOT NULL IDENTITY PRIMARY KEY,
      org_id INTEGER      NOT NULL REFERENCES data_partner_org,
  email_addr VARCHAR(64)  NOT NULL,
 person_name VARCHAR(255) NOT NULL,
       phone VARCHAR(64)      NULL,
 notif_count INTEGER      NOT NULL,
contact_type CHAR(1)      NOT NULL,
ftp_username VARCHAR(64)      NULL,
    notified CHAR(1)      NOT NULL,
  notif_date DATETIME         NULL)

It's not clear why both notified and notif_date are both needed. If we have the nullable date column, that would seem to answer the question about whether notification has taken place (in which case, I would call that column notified). On the other hand, all of the columns for the date are NULL, so maybe these columns don't mean what I think they do.

Using the normalized tables is the only sensible way to support bulk updates of the organization information. I can do that for this task, and it shouldn't be hard to alter the scripts which use the data to pull them from these tables.

Comment entered 2016-04-13 12:12:33 by Kline, Bob (NIH/NCI) [C]

Want me to alter the table definitions and load them?

Comment entered 2016-04-13 12:53:07 by Kline, Bob (NIH/NCI) [C]

In fact, how about if I come up with a view which presents the original denormalized representation of the data?

Comment entered 2016-04-13 13:27:07 by Englisch, Volker (NIH/NCI) [C]

A view will work for pulling in the data from the FTP server but I will have to rewrite my loader program. In the interest of getting things done I was only concerned about mimicking the current process. I agree with you that it needs more thought but that will also need more time.

Comment entered 2016-04-13 13:31:38 by Kline, Bob (NIH/NCI) [C]

Try this:

SELECT * FROM pdq_data_partner

Shall I drop the pdq_contact table and rename pdq_data_partner to pdq_contact?

Comment entered 2016-04-13 13:34:48 by Kline, Bob (NIH/NCI) [C]

... I will have to rewrite my loader program.

Most (if not all) of the work for that is already done with the script I just wrote to pull the data from the existing pdq_contact table and populate the two new tables.

Comment entered 2016-04-13 13:37:40 by Englisch, Volker (NIH/NCI) [C]

Shall I drop the pdq_contact table and rename pdq_data_partner to pdq_contact?

I guess there is no stopping you! :-)

Go ahead.

Comment entered 2016-04-13 13:39:14 by Kline, Bob (NIH/NCI) [C]
import cdrdb

class Contact:
    COLS = ("product", "email_address", "person_name", "org_name", "phone",
            "org_status", "notified_count", "contact_type", "ftp_username",
            "activation_date", "termination_date", "renewal_date", "notified",
            "notified_date")
    def __init__(self, *cols):
        for i, name in enumerate(self.COLS):
            setattr(self, name, cols[i])

orgs = {}
conn = cdrdb.connect()
cursor = conn.cursor()
query = cdrdb.Query("pdq_contact", *Contact.COLS)
query.order("product", "org_name", "person_name")
rows = query.execute(cursor).fetchall()
for row in rows:
    contact = Contact(*row)
    org_id = orgs.get(contact.org_name.lower())
    if not org_id:
        cursor.execute("""\
INSERT INTO data_partner_org (org_name, product, org_status, activated,
                              terminated, renewal)
     VALUES (?, ?, ?, ?, ?, ?)""",
                       (contact.org_name, contact.product, contact.org_status,
                        contact.activation_date, contact.termination_date,
                        contact.renewal_date))
        cursor.execute("SELECT @@IDENTITY")
        org_id = cursor.fetchall()[0][0]
        print repr(org_id), "IS NEW ORG ID"
        orgs[contact.org_name.lower()] = org_id
    cursor.execute("""\
INSERT INTO data_partner_contact (org_id, email_addr, person_name, phone,
                                  notif_count, contact_type, ftp_username,
                                  notified, notif_date)
     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                   (org_id, contact.email_address, contact.person_name,
                    contact.phone, contact.notified_count,
                    contact.contact_type, contact.ftp_username,
                    contact.notified, contact.notified_date))
    conn.commit()

Before I go much farther down this road, do you have any light to shed on the notified_count, notified, and notified_date columns? Are all three needed? Does notified provide information that can't be derived from one of the other two columns? And is it a temporary condition that the notified_date column is always NULL?

Comment entered 2016-04-13 13:58:08 by Englisch, Volker (NIH/NCI) [C]

Before I go much farther down this road, do you have any light to shed on the notified_count, notified, and notified_date columns? Are all three needed?

Again, coming from the text document we're currently maintaining:
I have the notified column that set to 'N' at the beginning for all rows to be notified. When the email has been submitted the individual's record gets set to notified='Y' (without a date). You came up with the notified_date column and I hadn't thought through how to use the date alone in situations when a job needs to be rerun after failing half-way through.
I'm sure one or the other would be good enough.

Comment entered 2016-04-13 14:19:46 by Kline, Bob (NIH/NCI) [C]

I just noticed that in the original data, some of the people for the same organization have different termination and renewal dates (e.g., Gregory Wall and Eugene Fratkin have termination dates of January 26, 2015 and renewal dates of September 30, 2014, whereas James Shima terminated May 18, 2015 and renewed January 13, 2015; all three are with Silicon Valley Biosystems). It seems more plausible that partnership termination really does belong with the organization, so I'm regarding that as dirty data. We probably do need a flag or a date to indicate that an individual contact person is no longer actively associated with the organization (at least for the purposes of the PDQ data partnership). Your thoughts?

Comment entered 2016-04-14 00:49:55 by Englisch, Volker (NIH/NCI) [C]

The Silicon Valley Biosystems is a difficult organization. We had two organization records for these because they were from different departments. One department quit and some of the people switched. Anyway, the activation does go with the organization and my thought was to use one of the status values to indicate a person record as inactive: P - primary contact, S - secondary, D - inactive.

Comment entered 2016-04-14 13:22:52 by Kline, Bob (NIH/NCI) [C]

Capturing modifications from the prototype review:

  1. Fix contact type values (Primary, Secondary, Internal, Deleted) ✔

  2. Make filtering apply to deleted contacts decided against this, after #6 below)

  3. Move ftp_account column to partner table ✔

    • Re-create tables ✔

    • Re-populate tables ✔

    • Fix data entry forms ✔

  4. Fix partner status values (Active, Test, Special) ✔

  5. Make products a lookup table ✔

    • Re-create tables ✔

    • Re-populate tables ✔

    • Fix data entry forms ✔

    • Add interface to manage product table ✔

  6. Add contact type to contact list display ✔

  7. Drop notified column ✔

  8. Rename "Notification Details" to "Notification History" ✔

  9. Fix phone/name bug ✔

  10. Sort inactive partners to bottom of landing page display ✔

  11. Make CDR the default product ✔

Comment entered 2016-04-15 13:13:29 by Englisch, Volker (NIH/NCI) [C]

Fix partner status values (Active, Test, Special)

I noticed yesterday that item 4. doesn't really apply yet. In the SendEmail.py program I had only active and test partners and I marked the special partners active. I'm adjusting the notification program so that we can use the flag S for special partners in the future in case you're wondering that you don't see any record with status=S.

Make CDR the default product (PROD only?)

Oh, that's an excellent idea to have CDR only be the default on PROD.

Comment entered 2016-04-15 15:34:45 by Englisch, Volker (NIH/NCI) [C]

Add interface to manage product table

I think it would be OK to skip this. If there is a product change in the future we can update the table with a SQL insert stagement. The products have not changed in the past 10 years.

Comment entered 2016-04-15 15:37:24 by Englisch, Volker (NIH/NCI) [C]

Make CDR the default product (PROD only?)

Go ahead and make CDR the default on all tiers. We're not sending out the emails through this interface, which is what we really want to protect us from.

Comment entered 2016-04-15 15:49:25 by Kline, Bob (NIH/NCI) [C]

I've been going on the assumption that all the date fields were just that (based on what I saw in the data), but now I'm seeing some times with the notification dates. Do we need the times? If we do, I'll need to remove the calendar widget for that field (and rewrite some of the other code).

Comment entered 2016-04-15 16:02:54 by Englisch, Volker (NIH/NCI) [C]

I don't see a need for the activation, termination, and renewal dates. For the notification, however, I could see some usefulness in a time stamp being available.

Come to think of it, should we have a DateLastModified date in the tables?

Comment entered 2016-04-18 07:54:39 by Kline, Bob (NIH/NCI) [C]

✔ All five of the screens have been built.
✔ The interface has been plugged into the Developer/SA menu.
✔ I added last_mod columns to all the tables.
✔ The notification date now includes the time.
✔ The view column names now match those of the tables.
✔ I removed the need for clicking a Refresh button to make the filtering choices take effect.
✔ I also added a check for uniqueness of the product or partner name on save (to avoid ugly SQL error messages).

Manual deployment tasks:

  • Add new action MANAGE PDQ DATA PARTNERS

  • Add new user group Data Partner Managers

  • Add Volker and Bob to the new group

Please test everything, .

R13904 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/DevSA.py
R13904 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/manage-pdq-data-products.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/manage-pdq-data-partners.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/edit-data-product.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/edit-data-partner-org.py
R13919 /branches/Darwin/Inetpub/wwwroot/cgi-bin/cdr/edit-data-partner-contact.py
R13919 /branches/Darwin/lib/Python/pdq_data_partner.py
R13909 /branches/Darwin/lib/Python/cdrcgi.py
Comment entered 2016-04-18 14:38:58 by Englisch, Volker (NIH/NCI) [C]

I received the following error while trying to enter a new Person "notif_count field is required" to a special partner. The count for a new person should be 0 by default.

An error has been created in

D:\cdr\Log\tmpirhpnj.html 
Comment entered 2016-04-18 15:58:42 by Kline, Bob (NIH/NCI) [C]

Those were two separate problems. I fixed both. I also corrected problems with the org/contact editing links.

Comment entered 2016-04-20 18:40:13 by Englisch, Volker (NIH/NCI) [C]

I have not been able to break these pages again. It seems to be Volker-proof. :-)

Elapsed: 0:00:00.001357