CDR Tickets

Issue Number 4100
Summary Combine Licensee and Content Partner Tables
Created 2016-05-16 16:31:26
Issue Type Improvement
Submitted By Englisch, Volker (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2018-03-29 12:08:20
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.184252
Description

We created a new set of DB tables to control in the CDR the email notification of our PDQ content partners. Additional organization information is also stored as the licensee document type.

We should combine the information (and modify the reports) to combine both sets of information into a single set.

Comment entered 2016-11-09 14:23:07 by Kline, Bob (NIH/NCI) [C]

LOE estimate is a wild guess, as the requirements are specified at a very high level and will need to be refined, in the process of which the story points may need to be adjusted.

Comment entered 2018-03-14 14:19:40 by Kline, Bob (NIH/NCI) [C]

Steps for this task:

✔ Catalog/analyze information stored in database tables and licensee documents (Bob)
✔ Identify obsolete information (Volker)
✔ Choose new approach for storing information (Bob and Volker)
✔ Adjust story points estimate for task (Bob)
✔ Adjust table and/or document structures (Bob)
✔ Update CSS and Schema for Licensee documents (possibly needed; Bob/Volker)
✔ Move data (Bob)
✔ Identify data maintenance scripts (Volker)
✔ Modify data maintenance scripts (Bob)
✔ Identify reports and ad-hoc queries (Volker)
✔ Modify reports and ad-hoc queries (Bob)
✔ Test (Volker)

Have I forgotten anything?

Comment entered 2018-03-14 14:28:52 by Englisch, Volker (NIH/NCI) [C]

Have I forgotten anything?

That depends on item (3) and may need to be added as new tickets:

  • Update CSS for Licensee templates (if we're still using XML documents to store information)

  • Update ad-hoc queries (unless that's included in Modify reports)

Comment entered 2018-03-14 15:15:18 by Kline, Bob (NIH/NCI) [C]

Here's what I have for step 1. I have tried to identify the corresponding location of the information in the other source for each of the two sources (database tables and CDR Licensee documents). The next step would be for you () to identify which pieces of information are obsolete.

Licensee Documents

Licensee
  LicenseeInformation
    LicenseeNameInformation
      OfficialName (data_partner_org.org_name)
        @cdr:id (data_partner_org.org_id)
      OtherNames? (not in tables)
        AlternateName* [name and ID] (not in tables)
        FormerName* [name and ID] (not in tables)
        Comment* [standard CDR Comment element] (not in tables)
      Comment* (not in tables)
    LicenseeDepartment* (not in tables)
    OrganizationType? [ForProfit|NotForProfit|Government|PublicationReprint]
                      (not in tables)
    ContactDetail? (standard CDR ContactDetail block; not in tables)
    LicenseeType [Regular|Internal|Special|Printing] (not in tables)
    LicenseeStatus [Production|Production-inactive|Test|Test-inactive]
                   (data_partner_org.org_status; need mapping with A|S|T vals)
    LicenseeStatusDates (mappings not exact with database tables)
      TestActivation [date] (data_partner_org.activated) (same thing?)
      TestInactivation? [date] (data_partner_org.terminated) (same thing?)
      TestExtension? [date] (data_partner_org.renewal) (same thing?)
      ProductionActivation? [date] (data_partner.activated) (same thing?)
    Comment* (not in tables)
  ContactPersons
    ContactPerson+
      ContactName (data_partner_contact.person_name)
      ContactType+ [Business|BusinessBackup|Technical|TechnicalBackup]
                   (data_partner_contact.contact_type - single occurrence;
                    need mapping with D|I|P|S values)
      ContactDetail [standard CDR ContactDetail block]
        PostalAddress? (not in tables)
        Phone? (data_partner_contact.phone - required)
        TollFreePhone? (not in tables)
        Fax? (not in tables)
        Email? (data_partner_contact.person_name - required)
        WebSite? (not in tables)
  ApplicationInformation? (none of these are in the database tables)
    DistributionMethods [CD-ROM|E-mail|World Wide Web|Print|Other]
      DistributionMethod+ (not in tables)
    Audiences (not in tables)
      Audience+ [controlled values, e.g.: Attorneys, Hospitals, Universities]
    SupplementaryInfoLink* (not in tables)
  FtpInformation
    UserName (data_partner_org.ftp_username)
    Password? (not in tables)
    AccountBlocked [Yes/No] (not in tables)
  DateLastModified? [date] (data_partner_org.last_mod)

Database tables

data_partner_contact
  contact_id [PK required integer] (not in Licensee docs)
  org_id [FK required integer] (not applicable to Licensee docs)
  person_name [required string] (ContactPersons/ContactPerson/ContactName)
  email_addr [required string] (.../ContactPerson/ContactDetail/Email)
  phone [required string] (.../ContactPerson/ContactDetail/Phone)
  contact_type [required character, one of D, I, P, S]
               (possible mapping with .../ContactPerson/ContactType?)
  notif_count [required integer] (not in Licensee docs)
  notif_date [optional datetime] (not in Licensee docs)
  last_mod [required datetime] (not in Licensee docs)

data_partner_org
  org_id [PK required integer] (LicenseeNameInformation/OfficialName@cdr:id)
  org_name [required string] (LicenseeNameInformation/OfficialName)
  prod_id [FK required integer]
       (reflected as "Test" and "Production" in Licensee docs?)
  org_status [required character, one of A, S, T]
     (LicenseeInformation/LicenseeStatus - need mapping with its 4 values)
  activated [required date] (LicenseeInformation/LicenseeStatusDates)
  terminated [optional date] (not in Licensee docs for Production;
           LicenseeInformation/LicenseeStatusDates/TestInactivation for TEST)
  renewal [optional date] (.../LicenseeStatusDates/TestExtension)
  ftp_username [optional string] (FtpInformation/UserName)
  last_mod [required datetime] (DateLastModified - date only, no time)

data_partner_product
  prod_id [PK required integer]
  prod_name [required string, "CDR" or "TEST"]
  prod_desc [optional string]
  inactivated [optional date]
  last_mod [required datetime]
Comment entered 2018-03-14 17:04:58 by Englisch, Volker (NIH/NCI) [C]

None of the elements in the DB tables are obsolete.

A few elements are obsolete in the XML doc:

  • ApplicationInformation including all child elements

  • FTPInformation/Password

Regarding the contact_type element:
The contact_type in the DB is used for the email notification: Deleted, Internal, Primary, Secondary
The ContactType in the XML could be better described as a "role". Not every person in the XML document is included in the DB because the Business or BusinessBackup aren't always receiving the notification emails and vice versa but we can make it mandatory to have everyone included.

org_status [required character, one of A, S, T]
     (LicenseeInformation/LicenseeStatus - need mapping with its 4 values)

A = Production
T = Test
S = Special (this value doesn't exist but it should)
The org_status in the DB doesn't indicate if an organization is active or inactive. That's what the date is used for if I remember correctly.

Comment entered 2018-03-15 09:57:29 by Kline, Bob (NIH/NCI) [C]

So the following information would need to be captured in the tables if the consolidation puts everything in the database (I'm ignoring elements which the schema allows, but isn't present in any of the documents):

  • LicenseeInformation/LicenseeNameInformation/OtherNames/AlternateName/Name

  • LicenseeInformation/LicenseeNameInformation/OtherNames/FormerName/Name

  • LicenseeInformation/LicenseeNameInformation/Comment

  • LicenseeInformation/LicenseeDepartment

  • LicenseeInformation/OrganizationType

  • LicenseeInformation/ContactDetail/Email

  • LicenseeInformation/ContactDetail/Fax

  • LicenseeInformation/ContactDetail/Phone

  • LicenseeInformation/ContactDetail/PostalAddress/City

  • LicenseeInformation/ContactDetail/PostalAddress/Country

  • LicenseeInformation/ContactDetail/PostalAddress/PoliticalSubUnit_State

  • LicenseeInformation/ContactDetail/PostalAddress/PostalCode_ZIP

  • LicenseeInformation/ContactDetail/PostalAddress/Street

  • LicenseeInformation/ContactDetail/TollFreePhone

  • LicenseeInformation/ContactDetail/WebSite

  • LicenseeInformation/LicenseeType

  • LicenseeInformation/LicenseeStatus (partially in the database)

  • LicenseeInformation/LicenseeStatusDate (partially in the database)

  • LicenseeInformation/Comment

  • ContactPersons/ContactType

  • ContactPersons/ContactDetail/PostalAddress

  • ContactPersons/ContactDetail/TollFreePhone

  • ContactPersons/ContactDetail/Fax

  • ContactPersons/ContactDetail/WebSite

  • FtpInformation/AccountBlocked

So we would need, at a minimum, four additional tables for this approach:

  • data_partner_other_name

  • data_partner_licensee_department

  • data_partner_comment

  • data_partner_contact_type

... as well as quite a few new columns added to the existing tables.

If, on the other hand, we decide to consolidate the information into Licensee documents, the following additional information would need to be captured in the XML documents:

  • data_partner_contact.contact_type

  • data_partner_contact.notif_count

  • data_partner_contact.notif_date

  • data_partner_contact.last_mod

  • data_partner_org.prod_id (represented in the documents in a different way)

  • data_partner_org.org_status (not quite a one-to-one mapping with document)

  • data_partner_org.terminated (not in Licensee docs for Production)

  • data_partner_product.prod_desc

  • data_partner_product.inactivated

  • data_partner_product.last_mod

Some of this information is awkward to capture in the XML documents, as it would be denormalization of data which by its nature should be normalized (for example, the product information).

So neither approach is obviously the compelling Right Choice. It seems to me that it would be more work, more cumbersome, and less intuitive to capture the information needed by the extra tables required for the first approach, but I'm guessing more of the existing software is built around the database tables than around the XML documents. I'm inclined to think we should go with the second approach (expand the XML documents to capture the additional information contained in the database tables and eliminate those tables and rewrite the software that relies on those tables). We might want to pull out the information which would be denormalized (i.e., product information) into a separate document type, or store it somewhere else in the system.

What are your thoughts?

Comment entered 2018-03-16 11:35:36 by Englisch, Volker (NIH/NCI) [C]

The problem is that we used to have one system for our email notification on the FTP server and another system for administrative work. The administrative portion (a.k.a. XML document) is very convenient. It's very nice to have all of the information in one place but the email system duplicates some of the information.

Another thought I have is this: Why not keep what we have (2 systems) but adjust it somewhat to eliminate duplication?
For instance, the FTP information is not needed in the XML document and can be added to the tables. In fact, we already have the username as part of the tables but we would need to add the Account Blocked which identifies when CBIIT has disabled the account.
The Application Information isn't used anymore and will get dropped.
Contact Persons: We don't really need to include the postal address. What we would need to add to the tables would be the Type (Business, Technical, Backups) and maybe name it Org Function
What's left is the Licensee Information of the XML document. That's the portion I use the most - updating Licensee Status and Type, updating Activation/Extension, entering Comment in the XML file.

Comment entered 2018-03-19 08:15:31 by Kline, Bob (NIH/NCI) [C]

I'm a little confused. You've said ...

Why not keep what we have (2 systems) but adjust it somewhat to eliminate duplication?

But then you wrote:

What we would need to add to the tables would be the Type ....

That seems to me to be heading in the direction of more duplication. What am I missing?

Comment entered 2018-03-19 10:08:55 by Kline, Bob (NIH/NCI) [C]

Maybe we should do the logical thing and start all the way down stream and work our way backwards from the end product(s). Can you identify all of the scripts which consume (as opposed to maintain) this data?

Comment entered 2018-03-19 13:41:14 by Englisch, Volker (NIH/NCI) [C]

If we want to eliminate the Contact Persons from the XML document we will need to add a person type or Org Function value. The type element in the DB tables is different from the type element in the XML document, so we would want to capture that information.

Comment entered 2018-03-19 15:39:40 by Englisch, Volker (NIH/NCI) [C]

There isn't too much.

  • data_partners.py (scheduled job running weekly)

  • licensee_list.py (scheduled job running monthly)

  • a job extracting the FTP log data that I'm just realizing is not running anymore.

  • XMetaL

  • CDRQueries Interface

  • Licensee QC Report Filter for QC reports

Comment entered 2018-03-20 09:52:49 by Kline, Bob (NIH/NCI) [C]

* XMetaL

That would fall in the pile of tools which maintain the data, rather than scripts which consume that data, right?

* a job extracting the FTP log data that I'm just realizing is not running anymore.

Should it be?

Comment entered 2018-03-21 11:51:38 by Kline, Bob (NIH/NCI) [C]

Looks like I did leave out a step. As reflected in the attached Excel report, we will need to reconcile some discrepancies between the two data sources.

  1. Do we need to back fill Licensee documents for terminated test partners appearing in the data partner table but not in the repository (e.g. B4CC, Inc. or AM Systems)?

  2. Which date do we keep when the Licensee document has one activation date and the data partner table has another (see, for example, American Association for Cancer Research, or City of Hope - Department of Information Sciences)?

  3. What do we do with Test accounts whose activation date is much further in the past than 90 days (e.g., Arctic Biosystems, LLC, activated March 28, 2014)?

  4. We have some Test accounts with a deactivation date (and no production activation), but whose status is "Test" instead of "Test-inactive" (for example, Asan Medical Center).

  5. What do we do with accounts which are marked as Test accounts in the repository but not in the database table (for example, Belong)?

  6. Some Licensee documents don't have a corresponding row in the database table (for example, BreastCancerTrials.org, activated as a production licensee in July 2008, or Damien Leri (s), activated in August 2011).

  7. Some active partners appear in the data partner table, but don't have a corresponding Licensee document in the repository (e.g., Cancer Trials (Coalition)).

  8. Some partners have a different name in their Licensee document than in the data partner table (e.g., CiLogic versus CiLogik, Inc.).

  9. Some of the name discrepancies are caused by an appended suffix of " (s)"; assuming this is intended to reflect the "Special" nature of the data partner relationship, this information should be stored in the document in a separate element or attribute (see, for example, "Cincinnati Children's Hospital Medical Center" versus "Cincinnati Children's Hospital Medical Center (s)").

Comment entered 2018-03-21 12:07:19 by Kline, Bob (NIH/NCI) [C]

As noted in the comments at the head of the script which sends out notifications to the data partners, there are problems with the "TEST" product.

  1. The records for partners are wiped out by a database refresh to the lower tiers from production.

  2. There are redundant ways to test partner notification, so the "TEST product" partners are superfluous.

  3. The use of "test" to mean completely different things in the data partner system is very confusing.

  4. This third (data_partner_product) table complicates the software unnecessarily, and complicates efforts to consolidate the partner information in the Licensee documents, as it would introduce unwanted denormalization.

I propose that as part of the reorganization for this task, we eliminate this artificial "product" concept, and do our testing on the lower tiers, where notification emails are sent to the members of the "Test Publishing Notification"
CDR group instead of the outside partners' email addresses anyway. We could even have an option flag which did the same thing on PROD if we want.

Comment entered 2018-03-21 12:41:28 by Englisch, Volker (NIH/NCI) [C]

Do we need to back fill Licensee documents for terminated test partners appearing in the data partner table but not in the repository (e.g. B4CC, Inc. or AM Systems)?

There are partners in the tables that are not available as XML documents??? I need to look into that.

Which date do we keep when the Licensee document has one activation date and the data partner table has another (see, for example, American Association for Cancer Research, or City of Hope - Department of Information Sciences)?

There were a few partners who were activated, stopped using the data, and came back again. This was treated as a new activation rather than a reactivation. I'll have to look at the comments for these partners. I'd say the activation date in the XML document should be used if there are differences.

What do we do with Test accounts whose activation date is much further in the past than 90 days (e.g., Arctic Biosystems, LLC, activated March 28, 2014)?

These should be terminated.

We have some Test accounts with a deactivation date (and no production activation), but whose status is "Test" instead of "Test-inactive" (for example, Asan Medical Center).

I probably just forgot to change the status. They should be "Test-inactive".

What do we do with accounts which are marked as Test accounts in the repository but not in the database table (for example, Belong)?

I'll have to check the comments. This is probably a partner who changed to a "Special" partner.

Some Licensee documents don't have a corresponding row in the database table (for example, BreastCancerTrials.org, activated as a production licensee in July 2008, or Damien Leri (s), activated in August 2011).

I believe these are "Special" partners who opted out of being notified by email. We need a way to suppress the email notification for special partners or business persons who don't receive an email now.

Some active partners appear in the data partner table, but don't have a corresponding Licensee document in the repository (e.g., Cancer Trials (Coalition)).

No? I have the Coalition listed as a partner.

Some partners have a different name in their Licensee document than in the data partner table (e.g., CiLogic versus CiLogik, Inc.).

That's a mistake.

Some of the name discrepancies are caused by an appended suffix of " (s)"; assuming this is intended to reflect the "Special" nature of the data partner relationship, this information should be stored in the document in a separate element or attribute (see, for example, "Cincinnati Children's Hospital Medical Center" versus "Cincinnati Children's Hospital Medical Center (s)").

Agreed. Since I only had "Active"/"Test" available I had to mark "Special" vendors as "Active" and added the (s) to their name to easily identify them.

I know, there are still a lot of open questions here. I need some time to look into the records to come up with all answers.

Comment entered 2018-03-21 12:53:20 by Englisch, Volker (NIH/NCI) [C]

I agree. The "Test" product was only used when we had to run both, the test and production notification on the same machine. If it simplifies the system removing the Product (we haven't used a true second product for 10 years or so) you should go for it.

Comment entered 2018-03-21 13:38:13 by Kline, Bob (NIH/NCI) [C]

It looks like there are quite a few Licensee documents with a bogus status:

Comment entered 2018-03-21 13:43:38 by Englisch, Volker (NIH/NCI) [C]

Where is the status coming from? I'm looking at the first one and see its status to be Test-inactive. I'm using the query Licensee Contacts from the Query interface as my go-to report.

Comment entered 2018-03-21 13:45:06 by Kline, Bob (NIH/NCI) [C]

There are partners in the tables that are not available as XML documents??? I need to look into that.

Comment entered 2018-03-21 13:51:14 by Kline, Bob (NIH/NCI) [C]

Here are some more incorrect statuses, this time "Production" where either the document or the table (but in none of the five cases both) have a deactivation date. I can see why you're eager to have this data consolidated. :-)

Comment entered 2018-03-21 13:51:45 by Englisch, Volker (NIH/NCI) [C]

Are you looking at PROD?
I understand why AM systems is not available because Alan was an internal user but I see B4CC available both in the XML documents and in the tables.

Comment entered 2018-03-21 13:55:37 by Englisch, Volker (NIH/NCI) [C]

Yes, double data entry is somewhat error prone. Even for someone like me.

Comment entered 2018-03-21 13:57:02 by Kline, Bob (NIH/NCI) [C]

Oops! Half of the analysis was looking at PROD (the tables part) but there's a bug in my script which I thought was pulling down the documents from PROD, but wasn't. Back to the drawing board. :-(

Comment entered 2018-03-21 14:13:47 by Kline, Bob (NIH/NCI) [C]

All better! Sorry about that. Still some anomalies, but on a much more manageable scale. Corrected spreadsheet attached. Sorted by org status and then by name, but it's a spreadsheet, so you can sort it any way you want. There's still a bunch of mismatched names, and "Test" Licensee documents whose corresponding row in the partner table has a termination date but no deactivation recorded in the document.

Comment entered 2018-03-21 14:46:53 by Englisch, Volker (NIH/NCI) [C]

Can I go in and add the "(s)" to the special partners in the tables as part of the cleanup? We can remove this once we have another indicator for the specials after the merge.

Comment entered 2018-03-21 15:03:30 by Kline, Bob (NIH/NCI) [C]

My recommendation for an approach is starting to take shape. It might look something like this:

  1. Eliminate discrepancies between the two sources.

  2. Drop the obsolete elements from the schema/docs (e.g. passwords and application information).

  3. Change ContactType in the documents to ContactRole and use ContactType to capture the Primary/Secondary (and whatever "I" and "D" stand for) from the data_partner_contact table.

  4. Replace the three existing data partner tables with single data_partner_notification table, where we will continue to record when we last sent an email notification message to a particular email address.

  5. Rewrite get-pdq-contacts.py and get-pdq-partners.py (and perhaps the stored queries in CdrQueries.py) to pull data from the query_term table instead of the partner tables.

I'm anticipating using a global change to do much of the data juggling.

I considered eliminating the database tables altogether, storing the last-notified dates in the ContactPerson blocks, but that would mean creating a new version for each email message sent out, and would mean I'd have to force-break any locks you had on the documents. Those wouldn't necessarily be showstoppers, and if you prefer to do a complete consolidation, that's what we'll do, but right now I'm leaning toward using a database table for this purpose. One approach to this table would be to have only two columns, one for the email address (possibly normalized to all lowercase), and one for the notification date/time. That way, we could keep a notification history, rather than just a record of the last one. Unless you're determined to retain the notif_count column's data, I'd be inclined to just drop it (or stuff it as an historical number somewhere in the Licensee document), and let the number of rows with the same email address take over it's role (sort of). Note that I'm not taking the trouble to link the rows in this table with the ContactPerson elements in the documents (though I gave that some thought, too, perhaps assigning the ContactPerson blocks unique IDs, but decided against it). If the contact person changes, or changes her email address, the next row in the notification table won't have any indication that it belongs with the rows containing the older email address. I'm viewing the sole required purpose of the database table as answering the question "if the notification job aborts in the middle and I have to pick up where I left off, how will I know which notifications have already been sent out for this job?".

Comment entered 2018-03-21 15:04:25 by Kline, Bob (NIH/NCI) [C]

Sure.

Comment entered 2018-03-21 15:16:18 by Englisch, Volker (NIH/NCI) [C]

Ok, unless you're telling me this will mess up the work you're doing right now I will correct/synchronize the org name either in the partner table or the XML document.

Comment entered 2018-03-21 15:23:59 by Englisch, Volker (NIH/NCI) [C]

Unless you're determined to retain the notif_count column's data, I'd be inclined to just drop it

That's OK. In the beginning of the email notifications we didn't use a date and used the notification count for identifying the time past. Using an actual date makes slightly more sense. :-)

Comment entered 2018-03-21 15:28:37 by Kline, Bob (NIH/NCI) [C]

Good.

Comment entered 2018-03-21 16:40:08 by Englisch, Volker (NIH/NCI) [C]

I updated all org names in the table. There is one name I'll have to update in the XML document but I don't have access to XMetaL at home.
All of the other mismatches are internal emails that we should merge into a single "organization" and for which I'll have to create a document in the CDR.

Comment entered 2018-03-21 17:03:09 by Kline, Bob (NIH/NCI) [C]

Good. So you're OK with my general approach?

Comment entered 2018-03-21 18:06:12 by Englisch, Volker (NIH/NCI) [C]

Note that I'm not taking the trouble to link the rows in this table with the ContactPerson elements in the documents

That's OK. We don't pay a lot of attention to the persons other that we need to have one per org for the notification. Any relationship between one person being replaced or changing phone/email is captured in the comments.

I'm viewing the sole required purpose of the database table as answering the question "if the notification job aborts in the middle and I have to pick up where I left off, how will I know which notifications have already been sent out for this job?".

I have a little problem with this. Currently, the SendEmail program also disables test accounts after their time is up and stops sending more messages. I was hoping we could update the status directly in the XML document but I'm OK continuing this as a manual process if I get a message to do so.
I do agree, creating new document versions for each notification email is a little overkill.

Comment entered 2018-03-21 18:13:41 by Kline, Bob (NIH/NCI) [C]

Well, we could compromise, and say that the notification script will update the document (with a new version) only when the account is disabled.

Comment entered 2018-03-21 18:50:45 by Englisch, Volker (NIH/NCI) [C]

I like that.

Comment entered 2018-03-26 07:50:47 by Kline, Bob (NIH/NCI) [C]

Drilling down into the contacts (see attachment) we've got some discrepancies there to take care of as well. One question I have is whether there is any semantic difference conveyed by having two separate ContactPerson blocks each with its own ContactType (as for example with Aqil Ahsan in the CiLogik, Inc. document) versus one ContactPerson block with two ContactType elements (e.g. Emily DeVoto's block in the document for the National Breast Cancer Coalition Fund (NBCCF)).

My plan is to mark all of the contacts in the documents which have no counterpart in the database partner table with a ContactType (see notes above for renaming of the existing ContactType element to ContactRole and the use of ContactType to capture the P|S|I|D values from the contact table) with "D" unless you think that approach would be a problem, in which case we can come up with a fifth value.

Comment entered 2018-03-26 08:26:52 by Kline, Bob (NIH/NCI) [C]

: Since you're out for the next couple of days, I'm going to go ahead and do some cleanup on the production Licensee documents myself.

Comment entered 2018-03-26 08:33:31 by Kline, Bob (NIH/NCI) [C]

Why are we using the standard CDR Comment elements but putting the user and date in the comment text instead of the proper attributes?

Comment entered 2018-03-26 08:36:46 by Kline, Bob (NIH/NCI) [C]

What does it mean for a Licensee document to have a status of "Production" but an AccountBlocked value of "Yes"? (See the document for the American Association for Cancer Research.)

Comment entered 2018-03-26 11:58:51 by Kline, Bob (NIH/NCI) [C]

: I believe you said the other day something like (I don't remember the exact wording) "some partners [contacts?] switch to a "Special" status so they don't get notifications." Are you aware that the current software sends notifications to all non-deactivated contacts for production "licensees" of all types (including "Special")? So, for example, we're sending notifications to Julie Hom (jhom@coh.org) and her colleagues at City of Hope - Department of Information Sciences even though they have a LicenseeType of "Special" (though Julie's notification didn't go out last week because the mail server broke the connection when we were trying to send it out – see the data-partner-notification.log). Perhaps I misunderstood what you were telling me.

Comment entered 2018-03-26 12:04:53 by Kline, Bob (NIH/NCI) [C]

Hmm. That was a little too optimistic. I see that the ContactType element (which will become ContactRole) is required, and I don't have any way of knowing what those values should be for the contacts who are in the database partner contact table but not in the Licensee document, so I'll have to wait until you're back from vacation.

Comment entered 2018-03-26 12:19:02 by Kline, Bob (NIH/NCI) [C]

Here's some more discrepancies you may want to address:

CDR776336 has u'Todd Mckee' where DB has u'Todd McKee'
CDR761991 has u'Nathan Mobley' where DB has u'Nathan  Mobley'
CDR714136 has u'Richard Z. Liu' where DB has u'Richard Liu'
CDR661243 has u'Haiqing Li ' where DB has u'Haiqing Li'
CDR661231 has u'Elly Cohen' where DB has u'Ellyn Cohen'
CDR783070 has u'Kacee Puzniak ' where DB has u'Kacee Puzniak'
CDR783070 has u'Matthew Coburn ' where DB has u'Matthew Coburn'
CDR751426 has u'Webmaster' where DB has u'HCI Webmaster'
CDR749385 has u'Douglas Wood' where DB has u'Douglas J. Wood'
CDR661251 has u'Gustav Quade' where DB has u'Dr. Gustav Quade'
CDR792347 has u'\u2022\tBrandon Welch' where DB has u'Brandon Welch'
CDR661227 has u'Webmaster' where DB has u'Web Master'
CDR787305 has u'Kirk Junker ' where DB has u'Kirk Junker'
CDR787305 has u'Alex Vilner ' where DB has u'Alex Vilner'
CDR682665 has u'Jessica  Begley' where DB has u'Jessica Begley'
Comment entered 2018-03-26 14:31:17 by Kline, Bob (NIH/NCI) [C]

: Do you want the DateLastModified element changed for all of the documents?

Comment entered 2018-03-26 17:36:57 by Kline, Bob (NIH/NCI) [C]
Comment entered 2018-03-27 08:05:10 by Kline, Bob (NIH/NCI) [C]

S = Special (this value doesn't exist but it should)

Actually, you're already storing that in the LicenseeType element.

Comment entered 2018-03-28 10:48:03 by Englisch, Volker (NIH/NCI) [C]

One question I have is whether there is any semantic difference conveyed by having two separate ContactPerson blocks each with its own ContactType (as for example with Aqil Ahsan in the CiLogik, Inc. document)

No, Agil Ahsan is a mistake, the correct data entry would be that of Emily DeVoto. It's likely a person was dropped and replaced with Agil Ahsan and I overlooked that person was already in the list of contacts.

Comment entered 2018-03-28 11:00:32 by Englisch, Volker (NIH/NCI) [C]

My plan is to mark all of the contacts in the documents which have no counterpart in the database partner table with a ContactType (see notes above for renaming of the existing ContactType element to ContactRole and the use of ContactType to capture the P|S|I|D values from the contact table) with "D" unless you think that approach would be a problem, in which case we can come up with a fifth value.

The meaning of the ContactType values P|S|I|D is as follows:

  • P - Primary
    This is the person who requested the account and is responsible for the username/password. It's typically the technical person but not always. There should always be exactly one primary person per account.

  • S - Secondary, anybody getting notified who is not the primary person.

  • I - Internal, not a licensee but a developer or department head like Lakshmi.

  • D - Deleted, a person not receiving emails anymore. This person is now inactive.

Although the D value indicates a person not receiving the email notification we would loose the information of who is currently a contact person. I would prefer to add another value like N (No Notification) in order to indicate this entry as active without sending emails.

Comment entered 2018-03-28 11:08:57 by Englisch, Volker (NIH/NCI) [C]

Why are we using the standard CDR Comment elements but putting the user and date in the comment test instead of the proper attributes?

I always insert elements from the Element List and those attributes are only populated when you're adding the Comment from the context menu.
The Insert Comment menu item is actually a useful feature and I should use it for future comments.

Comment entered 2018-03-28 11:12:48 by Kline, Bob (NIH/NCI) [C]

The @Type attribute is optional, and only contacts with a @Type value of "P" or "S" or "I" get notifications, which should do what you want (I decided not to bother applying any attribute value at all for ContactPerson blocks with no corresponding row in the data partner contact table, because it looked like what you had been doing was to delete such blocks when the contact should no longer receive notifications).

Comment entered 2018-03-28 11:15:15 by Englisch, Volker (NIH/NCI) [C]

What does it mean for a Licensee document to have a status of "Production" but an AccountBlocked value of "Yes"? (See the document for the American Association for Cancer Research.)

That's an error. This partner is still active and I fixed the entry.

Comment entered 2018-03-28 11:19:05 by Kline, Bob (NIH/NCI) [C]

I've got the software written/modified, the partner data on DEV refreshed from PROD (docs and tables), the schema modified, the global change run, the new table created, and I'm about to test the notification script on DEV.

Comment entered 2018-03-28 11:20:47 by Kline, Bob (NIH/NCI) [C]

https://cdr-dev.cancer.gov/cgi-bin/cdr/get-pdq-contacts.py
https://cdr-dev.cancer.gov/cgi-bin/cdr/get-pdq-partners.py

These now pull the information from the documents and the new notification table, instead of from the old data partner tables.

Comment entered 2018-03-28 11:22:03 by Englisch, Volker (NIH/NCI) [C]

Are you aware that the current software sends notifications to all non-deactivated contacts for production "licensees" of all types (including "Special")?

Yes, that's correct. Some special partners do opt out of receiving email notification in which case I would not add them to the partner tables but the default is that all active partner contacts with status Production and Special will receive an email.
It wouldn't be a problem to send emails to all active partners regardless if they're special or not.

Comment entered 2018-03-28 11:41:13 by Englisch, Volker (NIH/NCI) [C]

Here's some more discrepancies you may want to address:

Done.

Comment entered 2018-03-28 11:44:43 by Englisch, Volker (NIH/NCI) [C]

Do you want the DateLastModified element changed for all of the documents?

Changed the element or change the date as part of the merge?
I would prefer to keep the DLM as is but add a comment with a date ("Data Partner Tables merged" or something like that).

Comment entered 2018-03-28 11:57:15 by Kline, Bob (NIH/NCI) [C]

Changed the element or change the date as part of the merge?

The latter. I left the dates alone. I'll modify the global change to add the comment (probably tomorrow). When the job marks a partner ("Licensee") as deactivated it sets the DLM.

Comment entered 2018-03-28 12:07:26 by Englisch, Volker (NIH/NCI) [C]

Will I still need to look over the spreadsheet you posted on Monday morning?

Comment entered 2018-03-28 14:13:24 by Kline, Bob (NIH/NCI) [C]

Wouldn't hurt to at least do a quick pass over it.

Comment entered 2018-03-28 20:48:48 by Kline, Bob (NIH/NCI) [C]

: would you like to tweak the XMetaL CSS for the Licensee docs, or shall I do that?

Comment entered 2018-03-29 09:04:38 by Englisch, Volker (NIH/NCI) [C]

Yes, I'll do the tweaking.

Comment entered 2018-03-29 09:54:51 by Kline, Bob (NIH/NCI) [C]

I modified the global change to add the requested comment and re-ran the job on DEV. I have attached the validation warnings extracted from the log for your entertainment (many of the documents aren't valid).

Comment entered 2018-03-29 09:59:20 by Englisch, Volker (NIH/NCI) [C]

Does "not valid" mean those haven't been updated as part of the global or just they aren't publishable. I know a bunch of documents aren't publishable due to incomplete addresses.

Comment entered 2018-03-29 10:55:59 by Kline, Bob (NIH/NCI) [C]

Does "not valid" mean those haven't been updated as part of the global or just they aren't publishable.

Neither, really. All the documents were updated. We don't publish the Licensee documents, so it's irrelevant that they aren't publishable. It just means what it says, that they aren't valid. Specifically:

U.S. address must have valid ZIP code (49 occurrences)
Missing ContactDetail element (2 occurrences)
Invalid email addresses – trailing spaces (1 occurrence)
Element 'Country': The attribute 'cdr:ref' is required but missing (1 occurrence)
Element 'PoliticalSubUnit_State': The attribute 'cdr:ref' is required but missing (6 occurrences)
Missing 'AddressType' attribute (2 occurrences)
Invalid OrganizationType values (1 occurrence)
Missing PoliticalSubUnit_State element (3 occurrences)

Comment entered 2018-03-29 12:08:20 by Kline, Bob (NIH/NCI) [C]

I think the essential work on this ticket is complete. Components to be taken care of for deployment include:

  1. schema change

  2. global change

  3. CSS modifications for XMetaL (Volker's going to do this)

  4. creation/population of new partner notification table

  5. modifications to notification script for scheduler

  6. modifications to CGI services to fetch partners and contacts

Items 1, 3, 5, and 6 will be taken care of by the core build/deploy scripts. Item 4 will be handled by a one-off script which will be invoked by the wrapper deploy.bat. Item 2 will be run manually before the first weekend jobs affected by the change. Have I missed anything ?

Comment entered 2018-03-30 14:01:31 by Englisch, Volker (NIH/NCI) [C]

The following files have been updated to adjust for the schema changes:

  • Licensee.ctm

  • Licensee.xml

  • Licensee.css

  • Licensee_structure.css

This has been versions in github:
https://github.com/NCIOCPL/cdr-client/commit/d5006fc [ hawking ]

Comment entered 2018-03-30 14:18:36 by Englisch, Volker (NIH/NCI) [C]

I have attached the validation warnings extracted from the log for your entertainment (many of the documents aren't valid).

These warnings are for the run on DEV, right? It wouldn't make much sense for me to clean up the data if that's correct.

Comment entered 2018-03-30 14:36:36 by Kline, Bob (NIH/NCI) [C]

Well, the data (tables and docs) were copied from PROD, so if you do any data cleanup I would do it on PROD.

Comment entered 2018-04-10 19:00:03 by Englisch, Volker (NIH/NCI) [C]

Are we still using the Manage PDQ Data Partners forms? I'm not exactly sure what I would need to test here and what changed.

Comment entered 2018-04-11 09:18:38 by Kline, Bob (NIH/NCI) [C]

No, I'll remove them.

Comment entered 2018-04-11 09:40:28 by Kline, Bob (NIH/NCI) [C]
Comment entered 2018-04-11 17:03:36 by Englisch, Volker (NIH/NCI) [C]

It appears that all of the Licensee documents on QA are now invalid. Has the schema been updated?

Comment entered 2018-04-11 18:02:32 by Kline, Bob (NIH/NCI) [C]

Has the schema been updated?

I think so, but the global change job hasn't been run yet. Want me to do a test run first, or go straight for the live run?

Comment entered 2018-04-11 18:11:26 by Englisch, Volker (NIH/NCI) [C]

Ah, yes, that would be the other option. It changes the ContactType to ContactRole or does the global change anything else? Go ahead and run it in livemode.

Comment entered 2018-04-11 18:19:25 by Englisch, Volker (NIH/NCI) [C]

You didn't push these changes yet, right? The links are still going to Star Wars.

Should we remove the Manage Data Partners from the menu, too?

Comment entered 2018-04-11 18:22:54 by Kline, Bob (NIH/NCI) [C]

... does the global change anything else?

Here's what the global change job does:

  1. strips all ApplicationInformation blocks

  2. strips all Password elements

  3. changes ContactType to ContactRole

  4. sets the Type attribute for contacts it finds in the database tables

  5. adds a "Data partner tables merged" Comment

I suspect there will be a lot of contacts in the tables which aren't found in the documents because of mismatches in the org names or the email addresses. That problem will be washed away when we refresh QA from PROD.

Comment entered 2018-04-11 18:31:56 by Kline, Bob (NIH/NCI) [C]

Pushed now.

Should we remove ...

Done.

https://github.com/NCIOCPL/cdr-admin/commit/85aef791

Here are the global change test results:

https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2018-04-11_18-04-20

Live run starting in a minute.

Comment entered 2018-04-11 18:44:27 by Englisch, Volker (NIH/NCI) [C]

I looked at a couple of diffs and they all look good.

Could you remind me how to identify now who's receiving an email notification and who isn't? Those with a ContactPerson/@Type will receive an email?

Comment entered 2018-04-11 22:37:36 by Kline, Bob (NIH/NCI) [C]

You get an email notification if

  • the partner to which you're attached is active

  • you have an email address

  • you have a @Type of I or P or S

Comment entered 2018-04-12 07:03:46 by Kline, Bob (NIH/NCI) [C]

Live run completed on QA. Logs attached.

Comment entered 2018-04-12 10:15:52 by Englisch, Volker (NIH/NCI) [C]

... and "partner is active" meaning LicenseeStatus is Production or Test, right?

Comment entered 2018-04-12 11:11:17 by Kline, Bob (NIH/NCI) [C]

In this context it means that the value of the query_term row for the document with the path of /Licensee/LicenseeInformation/LicenseeStatus does not contain the substring "inactive" (so, right).

Comment entered 2018-04-12 15:35:52 by Englisch, Volker (NIH/NCI) [C]

Last ❓ minor change to display attribute for ContactRole and PostalAddress:

Comment entered 2018-04-13 13:05:28 by Englisch, Volker (NIH/NCI) [C]

I confirmed that all active partners received an email notification. Those that did not didn't get merged due to organization name mismatches (fixed on PROD). Most Test partners were disabled on QA. This is expected due to the "age" of the data available.

The only issue I noticed was that one partner received an email notification twice.

Comment entered 2018-04-13 13:46:04 by Kline, Bob (NIH/NCI) [C]

That problem was caused by a mistake in the permissions settings for the group of which the pdqcontent account is a member (explained in more detail in my reply to your earlier email message). Fixed.

Comment entered 2018-05-15 12:19:49 by Englisch, Volker (NIH/NCI) [C]

The new process is now updating the partner XML files directly eliminating the need for two parallel systems.
Closing ticket.

Attachments
File Name Posted User
contacts.xlsx 2018-03-26 07:26:51 Kline, Bob (NIH/NCI) [C]
ocecdr-4100.log 2018-04-12 07:04:23 Kline, Bob (NIH/NCI) [C]
ocecdr4100-warnings.log 2018-03-29 09:53:16 Kline, Bob (NIH/NCI) [C]
partners.xlsx 2018-03-21 14:09:00 Kline, Bob (NIH/NCI) [C]
screenshot-1.png 2018-03-21 13:36:58 Kline, Bob (NIH/NCI) [C]
screenshot-2.png 2018-03-21 13:44:14 Kline, Bob (NIH/NCI) [C]
screenshot-3.png 2018-03-21 13:51:10 Kline, Bob (NIH/NCI) [C]

Elapsed: 0:00:00.001583