Issue Number | 4100 |
---|---|
Summary | Combine Licensee and Content Partner Tables |
Created | 2016-05-16 16:31:26 |
Issue Type | Improvement |
Submitted By | Englisch, Volker (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2018-03-29 12:08:20 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.184252 |
We created a new set of DB tables to control in the CDR the email notification of our PDQ content partners. Additional organization information is also stored as the licensee document type.
We should combine the information (and modify the reports) to combine both sets of information into a single set.
LOE estimate is a wild guess, as the requirements are specified at a very high level and will need to be refined, in the process of which the story points may need to be adjusted.
Steps for this task:
✔ Catalog/analyze information stored in database tables and licensee
documents (Bob)
✔ Identify obsolete information (Volker)
✔ Choose new approach for storing information (Bob and Volker)
✔ Adjust story points estimate for task (Bob)
✔ Adjust table and/or document structures (Bob)
✔ Update CSS and Schema for Licensee documents (possibly needed;
Bob/Volker)
✔ Move data (Bob)
✔ Identify data maintenance scripts (Volker)
✔ Modify data maintenance scripts (Bob)
✔ Identify reports and ad-hoc queries (Volker)
✔ Modify reports and ad-hoc queries (Bob)
✔ Test (Volker)
Have I forgotten anything?
Have I forgotten anything?
That depends on item (3) and may need to be added as new tickets:
Update CSS for Licensee templates (if we're still using XML documents to store information)
Update ad-hoc queries (unless that's included in Modify reports)
Here's what I have for step 1. I have tried to identify the corresponding location of the information in the other source for each of the two sources (database tables and CDR Licensee documents). The next step would be for you (~volker) to identify which pieces of information are obsolete.
Licensee
LicenseeInformation
LicenseeNameInformation
OfficialName (data_partner_org.org_name)
@cdr:id (data_partner_org.org_id)
OtherNames? (not in tables)
AlternateName* [name and ID] (not in tables)
FormerName* [name and ID] (not in tables)
Comment* [standard CDR Comment element] (not in tables)
Comment* (not in tables)
LicenseeDepartment* (not in tables)
OrganizationType? [ForProfit|NotForProfit|Government|PublicationReprint]
(not in tables)
ContactDetail? (standard CDR ContactDetail block; not in tables)
LicenseeType [Regular|Internal|Special|Printing] (not in tables)
LicenseeStatus [Production|Production-inactive|Test|Test-inactive]
(data_partner_org.org_status; need mapping with A|S|T vals)
LicenseeStatusDates (mappings not exact with database tables)
TestActivation [date] (data_partner_org.activated) (same thing?)
TestInactivation? [date] (data_partner_org.terminated) (same thing?)
TestExtension? [date] (data_partner_org.renewal) (same thing?)
ProductionActivation? [date] (data_partner.activated) (same thing?)
Comment* (not in tables)
ContactPersons
ContactPerson+
ContactName (data_partner_contact.person_name)
ContactType+ [Business|BusinessBackup|Technical|TechnicalBackup]
(data_partner_contact.contact_type - single occurrence;
need mapping with D|I|P|S values)
ContactDetail [standard CDR ContactDetail block]
PostalAddress? (not in tables)
Phone? (data_partner_contact.phone - required)
TollFreePhone? (not in tables)
Fax? (not in tables)
Email? (data_partner_contact.person_name - required)
WebSite? (not in tables)
ApplicationInformation? (none of these are in the database tables)
DistributionMethods [CD-ROM|E-mail|World Wide Web|Print|Other]
DistributionMethod+ (not in tables)
Audiences (not in tables)
Audience+ [controlled values, e.g.: Attorneys, Hospitals, Universities]
SupplementaryInfoLink* (not in tables)
FtpInformation
UserName (data_partner_org.ftp_username)
Password? (not in tables)
AccountBlocked [Yes/No] (not in tables)
DateLastModified? [date] (data_partner_org.last_mod)
data_partner_contact
contact_id [PK required integer] (not in Licensee docs)
org_id [FK required integer] (not applicable to Licensee docs)
person_name [required string] (ContactPersons/ContactPerson/ContactName)
email_addr [required string] (.../ContactPerson/ContactDetail/Email)
phone [required string] (.../ContactPerson/ContactDetail/Phone)
contact_type [required character, one of D, I, P, S]
(possible mapping with .../ContactPerson/ContactType?)
notif_count [required integer] (not in Licensee docs)
notif_date [optional datetime] (not in Licensee docs)
last_mod [required datetime] (not in Licensee docs)
data_partner_org
org_id [PK required integer] (LicenseeNameInformation/OfficialName@cdr:id)
org_name [required string] (LicenseeNameInformation/OfficialName)
prod_id [FK required integer]
(reflected as "Test" and "Production" in Licensee docs?)
org_status [required character, one of A, S, T]
(LicenseeInformation/LicenseeStatus - need mapping with its 4 values)
activated [required date] (LicenseeInformation/LicenseeStatusDates)
terminated [optional date] (not in Licensee docs for Production;
LicenseeInformation/LicenseeStatusDates/TestInactivation for TEST)
renewal [optional date] (.../LicenseeStatusDates/TestExtension)
ftp_username [optional string] (FtpInformation/UserName)
last_mod [required datetime] (DateLastModified - date only, no time)
data_partner_product
prod_id [PK required integer]
prod_name [required string, "CDR" or "TEST"]
prod_desc [optional string]
inactivated [optional date]
last_mod [required datetime]
None of the elements in the DB tables are obsolete.
A few elements are obsolete in the XML doc:
ApplicationInformation including all child elements
FTPInformation/Password
Regarding the contact_type element:
The contact_type in the DB is used for the email notification: Deleted,
Internal, Primary, Secondary
The ContactType in the XML could be better described as a "role". Not
every person in the XML document is included in the DB because the
Business or BusinessBackup aren't always receiving the notification
emails and vice versa but we can make it mandatory to have everyone
included.
org_status [required character, one of A, S, T]
(LicenseeInformation/LicenseeStatus - need mapping with its 4 values)
A = Production
T = Test
S = Special (this value doesn't exist but it should)
The org_status in the DB doesn't indicate if an organization is active
or inactive. That's what the date is used for if I remember
correctly.
So the following information would need to be captured in the tables if the consolidation puts everything in the database (I'm ignoring elements which the schema allows, but isn't present in any of the documents):
LicenseeInformation/LicenseeNameInformation/OtherNames/AlternateName/Name
LicenseeInformation/LicenseeNameInformation/OtherNames/FormerName/Name
LicenseeInformation/LicenseeNameInformation/Comment
LicenseeInformation/LicenseeDepartment
LicenseeInformation/OrganizationType
LicenseeInformation/ContactDetail/Email
LicenseeInformation/ContactDetail/Fax
LicenseeInformation/ContactDetail/Phone
LicenseeInformation/ContactDetail/PostalAddress/City
LicenseeInformation/ContactDetail/PostalAddress/Country
LicenseeInformation/ContactDetail/PostalAddress/PoliticalSubUnit_State
LicenseeInformation/ContactDetail/PostalAddress/PostalCode_ZIP
LicenseeInformation/ContactDetail/PostalAddress/Street
LicenseeInformation/ContactDetail/TollFreePhone
LicenseeInformation/ContactDetail/WebSite
LicenseeInformation/LicenseeType
LicenseeInformation/LicenseeStatus (partially in the database)
LicenseeInformation/LicenseeStatusDate (partially in the database)
LicenseeInformation/Comment
ContactPersons/ContactType
ContactPersons/ContactDetail/PostalAddress
ContactPersons/ContactDetail/TollFreePhone
ContactPersons/ContactDetail/Fax
ContactPersons/ContactDetail/WebSite
FtpInformation/AccountBlocked
So we would need, at a minimum, four additional tables for this approach:
data_partner_other_name
data_partner_licensee_department
data_partner_comment
data_partner_contact_type
... as well as quite a few new columns added to the existing tables.
If, on the other hand, we decide to consolidate the information into Licensee documents, the following additional information would need to be captured in the XML documents:
data_partner_contact.contact_type
data_partner_contact.notif_count
data_partner_contact.notif_date
data_partner_contact.last_mod
data_partner_org.prod_id
(represented in the
documents in a different way)
data_partner_org.org_status
(not quite a one-to-one
mapping with document)
data_partner_org.terminated
(not in Licensee docs
for Production)
data_partner_product.prod_desc
data_partner_product.inactivated
data_partner_product.last_mod
Some of this information is awkward to capture in the XML documents, as it would be denormalization of data which by its nature should be normalized (for example, the product information).
So neither approach is obviously the compelling Right Choice. It seems to me that it would be more work, more cumbersome, and less intuitive to capture the information needed by the extra tables required for the first approach, but I'm guessing more of the existing software is built around the database tables than around the XML documents. I'm inclined to think we should go with the second approach (expand the XML documents to capture the additional information contained in the database tables and eliminate those tables and rewrite the software that relies on those tables). We might want to pull out the information which would be denormalized (i.e., product information) into a separate document type, or store it somewhere else in the system.
What are your thoughts?
The problem is that we used to have one system for our email notification on the FTP server and another system for administrative work. The administrative portion (a.k.a. XML document) is very convenient. It's very nice to have all of the information in one place but the email system duplicates some of the information.
Another thought I have is this: Why not keep what we have (2
systems) but adjust it somewhat to eliminate duplication?
For instance, the FTP information is not needed in the XML document and
can be added to the tables. In fact, we already have the
username as part of the tables but we would need to add the
Account Blocked which identifies when CBIIT has disabled the
account.
The Application Information isn't used anymore and will get
dropped.
Contact Persons: We don't really need to include the postal
address. What we would need to add to the tables would be the
Type (Business, Technical, Backups) and maybe name it Org
Function
What's left is the Licensee Information of the XML document.
That's the portion I use the most - updating Licensee Status and
Type, updating Activation/Extension, entering
Comment in the XML file.
I'm a little confused. You've said ...
Why not keep what we have (2 systems) but adjust it somewhat to eliminate duplication?
But then you wrote:
What we would need to add to the tables would be the Type ....
That seems to me to be heading in the direction of more duplication. What am I missing?
Maybe we should do the logical thing and start all the way down stream and work our way backwards from the end product(s). Can you identify all of the scripts which consume (as opposed to maintain) this data?
If we want to eliminate the Contact Persons from the XML document we will need to add a person type or Org Function value. The type element in the DB tables is different from the type element in the XML document, so we would want to capture that information.
There isn't too much.
data_partners.py (scheduled job running weekly)
licensee_list.py (scheduled job running monthly)
a job extracting the FTP log data that I'm just realizing is not running anymore.
XMetaL
CDRQueries Interface
Licensee QC Report Filter for QC reports
* XMetaL
That would fall in the pile of tools which maintain the data, rather than scripts which consume that data, right?
* a job extracting the FTP log data that I'm just realizing is not running anymore.
Should it be?
Looks like I did leave out a step. As reflected in the attached Excel report, we will need to reconcile some discrepancies between the two data sources.
Do we need to back fill Licensee
documents for
terminated test partners appearing in the data partner table but not in
the repository (e.g. B4CC, Inc. or AM Systems)?
Which date do we keep when the Licensee
document has
one activation date and the data partner table has another (see, for
example, American Association for Cancer Research, or City of Hope -
Department of Information Sciences)?
What do we do with Test
accounts whose activation
date is much further in the past than 90 days (e.g., Arctic Biosystems,
LLC, activated March 28, 2014)?
We have some Test
accounts with a deactivation date
(and no production activation), but whose status is "Test" instead of
"Test-inactive" (for example, Asan Medical Center).
What do we do with accounts which are marked as Test accounts in the repository but not in the database table (for example, Belong)?
Some Licensee documents don't have a corresponding row in the database table (for example, BreastCancerTrials.org, activated as a production licensee in July 2008, or Damien Leri (s), activated in August 2011).
Some active partners appear in the data partner table, but don't
have a corresponding Licensee
document in the repository
(e.g., Cancer Trials (Coalition)).
Some partners have a different name in their
Licensee
document than in the data partner table (e.g.,
CiLogic versus CiLogik, Inc.).
Some of the name discrepancies are caused by an appended suffix of " (s)"; assuming this is intended to reflect the "Special" nature of the data partner relationship, this information should be stored in the document in a separate element or attribute (see, for example, "Cincinnati Children's Hospital Medical Center" versus "Cincinnati Children's Hospital Medical Center (s)").
As noted in the comments at the head of the script which sends out notifications to the data partners, there are problems with the "TEST" product.
The records for partners are wiped out by a database refresh to the lower tiers from production.
There are redundant ways to test partner notification, so the "TEST product" partners are superfluous.
The use of "test" to mean completely different things in the data partner system is very confusing.
This third (data_partner_product
) table complicates
the software unnecessarily, and complicates efforts to consolidate the
partner information in the Licensee
documents, as it would
introduce unwanted denormalization.
I propose that as part of the reorganization for this task, we
eliminate this artificial "product" concept, and do our testing on the
lower tiers, where notification emails are sent to the members of the
"Test Publishing Notification"
CDR group instead of the outside partners' email addresses anyway. We
could even have an option flag which did the same thing on PROD if we
want.
Do we need to back fill Licensee documents for terminated test partners appearing in the data partner table but not in the repository (e.g. B4CC, Inc. or AM Systems)?
There are partners in the tables that are not available as XML documents??? I need to look into that.
Which date do we keep when the Licensee document has one activation date and the data partner table has another (see, for example, American Association for Cancer Research, or City of Hope - Department of Information Sciences)?
There were a few partners who were activated, stopped using the data, and came back again. This was treated as a new activation rather than a reactivation. I'll have to look at the comments for these partners. I'd say the activation date in the XML document should be used if there are differences.
What do we do with Test accounts whose activation date is much further in the past than 90 days (e.g., Arctic Biosystems, LLC, activated March 28, 2014)?
These should be terminated.
We have some Test accounts with a deactivation date (and no production activation), but whose status is "Test" instead of "Test-inactive" (for example, Asan Medical Center).
I probably just forgot to change the status. They should be "Test-inactive".
What do we do with accounts which are marked as Test accounts in the repository but not in the database table (for example, Belong)?
I'll have to check the comments. This is probably a partner who changed to a "Special" partner.
Some Licensee documents don't have a corresponding row in the database table (for example, BreastCancerTrials.org, activated as a production licensee in July 2008, or Damien Leri (s), activated in August 2011).
I believe these are "Special" partners who opted out of being notified by email. We need a way to suppress the email notification for special partners or business persons who don't receive an email now.
Some active partners appear in the data partner table, but don't have a corresponding Licensee document in the repository (e.g., Cancer Trials (Coalition)).
No? I have the Coalition listed as a partner.
Some partners have a different name in their Licensee document than in the data partner table (e.g., CiLogic versus CiLogik, Inc.).
That's a mistake.
Some of the name discrepancies are caused by an appended suffix of " (s)"; assuming this is intended to reflect the "Special" nature of the data partner relationship, this information should be stored in the document in a separate element or attribute (see, for example, "Cincinnati Children's Hospital Medical Center" versus "Cincinnati Children's Hospital Medical Center (s)").
Agreed. Since I only had "Active"/"Test" available I had to mark "Special" vendors as "Active" and added the (s) to their name to easily identify them.
I know, there are still a lot of open questions here. I need some time to look into the records to come up with all answers.
I agree. The "Test" product was only used when we had to run both, the test and production notification on the same machine. If it simplifies the system removing the Product (we haven't used a true second product for 10 years or so) you should go for it.
It looks like there are quite a few Licensee
documents
with a bogus status:
Where is the status coming from? I'm looking at the first one and see its status to be Test-inactive. I'm using the query Licensee Contacts from the Query interface as my go-to report.
There are partners in the tables that are not available as XML documents??? I need to look into that.
Here are some more incorrect statuses, this time "Production" where either the document or the table (but in none of the five cases both) have a deactivation date. I can see why you're eager to have this data consolidated. :-)
Are you looking at PROD?
I understand why AM systems is not available because Alan was an
internal user but I see B4CC available both in the XML documents and in
the tables.
Yes, double data entry is somewhat error prone. Even for someone like me.
Oops! Half of the analysis was looking at PROD (the tables part) but there's a bug in my script which I thought was pulling down the documents from PROD, but wasn't. Back to the drawing board. :-(
All better! Sorry about that. Still some anomalies, but on a much
more manageable scale. Corrected spreadsheet attached. Sorted by org
status and then by name, but it's a spreadsheet, so you can sort it any
way you want. There's still a bunch of mismatched names, and "Test"
Licensee
documents whose corresponding row in the partner
table has a termination date but no deactivation recorded in the
document.
Can I go in and add the "(s)" to the special partners in the tables as part of the cleanup? We can remove this once we have another indicator for the specials after the merge.
My recommendation for an approach is starting to take shape. It might look something like this:
Eliminate discrepancies between the two sources.
Drop the obsolete elements from the schema/docs (e.g. passwords and application information).
Change ContactType
in the documents to
ContactRole
and use ContactType
to capture the
Primary/Secondary (and whatever "I" and "D" stand for) from the
data_partner_contact
table.
Replace the three existing data partner tables with single
data_partner_notification
table, where we will continue to
record when we last sent an email notification message to a particular
email address.
Rewrite get-pdq-contacts.py
and
get-pdq-partners.py
(and perhaps the stored queries in
CdrQueries.py) to pull data from the query_term
table
instead of the partner tables.
I'm anticipating using a global change to do much of the data juggling.
I considered eliminating the database tables altogether, storing the
last-notified dates in the ContactPerson
blocks, but that
would mean creating a new version for each email message sent out, and
would mean I'd have to force-break any locks you had on the documents.
Those wouldn't necessarily be showstoppers, and if you prefer to do a
complete consolidation, that's what we'll do, but right now I'm leaning
toward using a database table for this purpose. One approach to this
table would be to have only two columns, one for the email address
(possibly normalized to all lowercase), and one for the notification
date/time. That way, we could keep a notification history, rather than
just a record of the last one. Unless you're determined to retain the
notif_count
column's data, I'd be inclined to just drop it
(or stuff it as an historical number somewhere in the
Licensee
document), and let the number of rows with the
same email address take over it's role (sort of). Note that I'm not
taking the trouble to link the rows in this table with the
ContactPerson
elements in the documents (though I gave that
some thought, too, perhaps assigning the ContactPerson
blocks unique IDs, but decided against it). If the contact person
changes, or changes her email address, the next row in the notification
table won't have any indication that it belongs with the rows containing
the older email address. I'm viewing the sole required purpose of the
database table as answering the question "if the notification job aborts
in the middle and I have to pick up where I left off, how will I know
which notifications have already been sent out for this job?".
Sure.
Ok, unless you're telling me this will mess up the work you're doing right now I will correct/synchronize the org name either in the partner table or the XML document.
Unless you're determined to retain the notif_count column's data, I'd be inclined to just drop it
That's OK. In the beginning of the email notifications we didn't use a date and used the notification count for identifying the time past. Using an actual date makes slightly more sense. :-)
Good.
I updated all org names in the table. There is one name I'll have to
update in the XML document but I don't have access to XMetaL at
home.
All of the other mismatches are internal emails that we should merge
into a single "organization" and for which I'll have to create a
document in the CDR.
Good. So you're OK with my general approach?
Note that I'm not taking the trouble to link the rows in this table with the ContactPerson elements in the documents
That's OK. We don't pay a lot of attention to the persons other that we need to have one per org for the notification. Any relationship between one person being replaced or changing phone/email is captured in the comments.
I'm viewing the sole required purpose of the database table as answering the question "if the notification job aborts in the middle and I have to pick up where I left off, how will I know which notifications have already been sent out for this job?".
I have a little problem with this. Currently, the SendEmail program
also disables test accounts after their time is up and stops sending
more messages. I was hoping we could update the status directly in the
XML document but I'm OK continuing this as a manual process if I get a
message to do so.
I do agree, creating new document versions for each notification email
is a little overkill.
Well, we could compromise, and say that the notification script will update the document (with a new version) only when the account is disabled.
I like that.
Drilling down into the contacts (see attachment) we've got some
discrepancies there to take care of as well. One question I have is
whether there is any semantic difference conveyed by having two separate
ContactPerson
blocks each with its own
ContactType
(as for example with Aqil Ahsan in the CiLogik,
Inc. document) versus one ContactPerson
block with
two ContactType
elements (e.g. Emily DeVoto's
block in the document for the National Breast Cancer Coalition Fund
(NBCCF)).
My plan is to mark all of the contacts in the documents which have no
counterpart in the database partner table with a
ContactType
(see notes above for renaming of the existing
ContactType
element to ContactRole
and the use
of ContactType
to capture the P|S|I|D values from the
contact table) with "D" unless you think that approach would be a
problem, in which case we can come up with a fifth value.
~volker: Since you're out
for the next couple of days, I'm going to go ahead and do some cleanup
on the production Licensee
documents myself.
Why are we using the standard CDR Comment
elements but
putting the user and date in the comment text instead of the proper
attributes?
What does it mean for a Licensee
document to have a
status of "Production" but an AccountBlocked
value of
"Yes"? (See the document for the American Association for Cancer
Research.)
~volker: I believe you
said the other day something like (I don't remember the exact wording)
"some partners [contacts?] switch to a "Special" status so they don't
get notifications." Are you aware that the current software sends
notifications to all non-deactivated contacts for production "licensees"
of all types (including "Special")? So, for example, we're sending
notifications to Julie Hom (jhom@coh.org) and her colleagues at City of
Hope - Department of Information Sciences even though they have a
LicenseeType
of "Special" (though Julie's notification
didn't go out last week because the mail server broke the connection
when we were trying to send it out – see the
data-partner-notification.log). Perhaps I misunderstood what you were
telling me.
Hmm. That was a little too optimistic. I see that the
ContactType
element (which will become
ContactRole
) is required, and I don't have any way of
knowing what those values should be for the contacts who are in the
database partner contact table but not in the Licensee
document, so I'll have to wait until you're back from vacation.
Here's some more discrepancies you may want to address:
CDR776336 has u'Todd Mckee' where DB has u'Todd McKee'
CDR761991 has u'Nathan Mobley' where DB has u'Nathan Mobley'
CDR714136 has u'Richard Z. Liu' where DB has u'Richard Liu'
CDR661243 has u'Haiqing Li ' where DB has u'Haiqing Li'
CDR661231 has u'Elly Cohen' where DB has u'Ellyn Cohen'
CDR783070 has u'Kacee Puzniak ' where DB has u'Kacee Puzniak'
CDR783070 has u'Matthew Coburn ' where DB has u'Matthew Coburn'
CDR751426 has u'Webmaster' where DB has u'HCI Webmaster'
CDR749385 has u'Douglas Wood' where DB has u'Douglas J. Wood'
CDR661251 has u'Gustav Quade' where DB has u'Dr. Gustav Quade'
CDR792347 has u'\u2022\tBrandon Welch' where DB has u'Brandon Welch'
CDR661227 has u'Webmaster' where DB has u'Web Master'
CDR787305 has u'Kirk Junker ' where DB has u'Kirk Junker'
CDR787305 has u'Alex Vilner ' where DB has u'Alex Vilner'
CDR682665 has u'Jessica Begley' where DB has u'Jessica Begley'
~volker: Do you want the
DateLastModified
element changed for all of the
documents?
Schema changes installed on DEV:
https://github.com/NCIOCPL/cdr-server/commit/ce056709
Global change test run on DEV:
https://cdr-dev.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2018-03-26_17-05-52
S = Special (this value doesn't exist but it should)
Actually, you're already storing that in the
LicenseeType
element.
One question I have is whether there is any semantic difference conveyed by having two separate ContactPerson blocks each with its own ContactType (as for example with Aqil Ahsan in the CiLogik, Inc. document)
No, Agil Ahsan is a mistake, the correct data entry would be that of Emily DeVoto. It's likely a person was dropped and replaced with Agil Ahsan and I overlooked that person was already in the list of contacts.
My plan is to mark all of the contacts in the documents which have no counterpart in the database partner table with a ContactType (see notes above for renaming of the existing ContactType element to ContactRole and the use of ContactType to capture the P|S|I|D values from the contact table) with "D" unless you think that approach would be a problem, in which case we can come up with a fifth value.
The meaning of the ContactType values P|S|I|D is as follows:
P - Primary
This is the person who requested the account and is responsible for the
username/password. It's typically the technical person but not always.
There should always be exactly one primary person per account.
S - Secondary, anybody getting notified who is not the primary person.
I - Internal, not a licensee but a developer or department head like Lakshmi.
D - Deleted, a person not receiving emails anymore. This person is now inactive.
Although the D value indicates a person not receiving the email notification we would loose the information of who is currently a contact person. I would prefer to add another value like N (No Notification) in order to indicate this entry as active without sending emails.
Why are we using the standard CDR Comment elements but putting the user and date in the comment test instead of the proper attributes?
I always insert elements from the Element List and those
attributes are only populated when you're adding the Comment
from the context menu.
The Insert Comment menu item is actually a useful feature and I
should use it for future comments.
The @Type attribute is optional, and only contacts with a @Type value
of "P" or "S" or "I" get notifications, which should do what you want (I
decided not to bother applying any attribute value at all for
ContactPerson
blocks with no corresponding row in the data
partner contact table, because it looked like what you had been doing
was to delete such blocks when the contact should no longer receive
notifications).
What does it mean for a Licensee document to have a status of "Production" but an AccountBlocked value of "Yes"? (See the document for the American Association for Cancer Research.)
That's an error. This partner is still active and I fixed the entry.
I've got the software written/modified, the partner data on DEV refreshed from PROD (docs and tables), the schema modified, the global change run, the new table created, and I'm about to test the notification script on DEV.
https://cdr-dev.cancer.gov/cgi-bin/cdr/get-pdq-contacts.py
https://cdr-dev.cancer.gov/cgi-bin/cdr/get-pdq-partners.py
These now pull the information from the documents and the new notification table, instead of from the old data partner tables.
Are you aware that the current software sends notifications to all non-deactivated contacts for production "licensees" of all types (including "Special")?
Yes, that's correct. Some special partners do opt out of
receiving email notification in which case I would not add them to the
partner tables but the default is that all active partner contacts with
status Production and Special will receive an
email.
It wouldn't be a problem to send emails to all active partners
regardless if they're special or not.
Here's some more discrepancies you may want to address:
Done.
Do you want the DateLastModified element changed for all of the documents?
Changed the element or change the date as part of the merge?
I would prefer to keep the DLM as is but add a comment with a date
("Data Partner Tables merged" or something like that).
Changed the element or change the date as part of the merge?
The latter. I left the dates alone. I'll modify the global change to add the comment (probably tomorrow). When the job marks a partner ("Licensee") as deactivated it sets the DLM.
Will I still need to look over the spreadsheet you posted on Monday morning?
Wouldn't hurt to at least do a quick pass over it.
~volker: would you like to
tweak the XMetaL CSS for the Licensee
docs, or shall I do
that?
Yes, I'll do the tweaking.
I modified the global change to add the requested comment and re-ran the job on DEV. I have attached the validation warnings extracted from the log for your entertainment (many of the documents aren't valid).
Does "not valid" mean those haven't been updated as part of the global or just they aren't publishable. I know a bunch of documents aren't publishable due to incomplete addresses.
Does "not valid" mean those haven't been updated as part of the global or just they aren't publishable.
Neither, really. All the documents were updated. We don't publish the Licensee documents, so it's irrelevant that they aren't publishable. It just means what it says, that they aren't valid. Specifically:
U.S. address must have valid ZIP code (49 occurrences)
Missing ContactDetail element (2 occurrences)
Invalid email addresses – trailing spaces (1 occurrence)
Element 'Country': The attribute 'cdr:ref' is required but missing (1
occurrence)
Element 'PoliticalSubUnit_State': The attribute 'cdr:ref' is required
but missing (6 occurrences)
Missing 'AddressType' attribute (2 occurrences)
Invalid OrganizationType values (1 occurrence)
Missing PoliticalSubUnit_State element (3 occurrences)
I think the essential work on this ticket is complete. Components to be taken care of for deployment include:
schema change
global change
CSS modifications for XMetaL (Volker's going to do this)
creation/population of new partner notification table
modifications to notification script for scheduler
modifications to CGI services to fetch partners and contacts
Items 1, 3, 5, and 6 will be taken care of by the core build/deploy scripts. Item 4 will be handled by a one-off script which will be invoked by the wrapper deploy.bat. Item 2 will be run manually before the first weekend jobs affected by the change. Have I missed anything ~volker?
The following files have been updated to adjust for the schema changes:
Licensee.ctm
Licensee.xml
Licensee.css
Licensee_structure.css
This has been versions in github:
https://github.com/NCIOCPL/cdr-client/commit/d5006fc
[ hawking ]
I have attached the validation warnings extracted from the log for your entertainment (many of the documents aren't valid).
These warnings are for the run on DEV, right? It wouldn't make much sense for me to clean up the data if that's correct.
Well, the data (tables and docs) were copied from PROD, so if you do any data cleanup I would do it on PROD.
Are we still using the Manage PDQ Data Partners forms? I'm not exactly sure what I would need to test here and what changed.
No, I'll remove them.
Gone.
https://github.com/NCIOCPL/cdr-admin/commit/7d8a55ba
https://github.com/NCIOCPL/cdr-lib/commit/de883fc
It appears that all of the Licensee documents on QA are now invalid. Has the schema been updated?
Has the schema been updated?
I think so, but the global change job hasn't been run yet. Want me to do a test run first, or go straight for the live run?
Ah, yes, that would be the other option. It changes the ContactType to ContactRole or does the global change anything else? Go ahead and run it in livemode.
You didn't push these changes yet, right? The links are still going to Star Wars.
Should we remove the Manage Data Partners from the menu, too?
... does the global change anything else?
Here's what the global change job does:
strips all ApplicationInformation
blocks
strips all Password
elements
changes ContactType
to
ContactRole
sets the Type
attribute for contacts it finds in the
database tables
adds a "Data partner tables merged" Comment
I suspect there will be a lot of contacts in the tables which aren't found in the documents because of mismatches in the org names or the email addresses. That problem will be washed away when we refresh QA from PROD.
Pushed now.
Should we remove ...
Done.
https://github.com/NCIOCPL/cdr-admin/commit/85aef791
Here are the global change test results:
https://cdr-qa.cancer.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2018-04-11_18-04-20
Live run starting in a minute.
I looked at a couple of diffs and they all look good.
Could you remind me how to identify now who's receiving an email notification and who isn't? Those with a ContactPerson/@Type will receive an email?
You get an email notification if
the partner to which you're attached is active
you have an email address
you have a @Type of I
or P
or
S
Live run completed on QA. Logs attached.
... and "partner is active" meaning LicenseeStatus is Production or Test, right?
In this context it means that the value of the
query_term
row for the document with the path
of /Licensee/LicenseeInformation/LicenseeStatus
does
not contain the substring "inactive" (so, right).
Last ❓ minor change to display attribute for ContactRole and PostalAddress:
Licensee.css
https://github.com/NCIOCPL/cdr-client/commit/b962062
[ hawking ]
I confirmed that all active partners received an email notification. Those that did not didn't get merged due to organization name mismatches (fixed on PROD). Most Test partners were disabled on QA. This is expected due to the "age" of the data available.
The only issue I noticed was that one partner received an email notification twice.
That problem was caused by a mistake in the permissions settings for the group of which the pdqcontent account is a member (explained in more detail in my reply to your earlier email message). Fixed.
The new process is now updating the partner XML files directly
eliminating the need for two parallel systems.
Closing ticket.
File Name | Posted | User |
---|---|---|
contacts.xlsx | 2018-03-26 07:26:51 | Kline, Bob (NIH/NCI) [C] |
ocecdr-4100.log | 2018-04-12 07:04:23 | Kline, Bob (NIH/NCI) [C] |
ocecdr4100-warnings.log | 2018-03-29 09:53:16 | Kline, Bob (NIH/NCI) [C] |
partners.xlsx | 2018-03-21 14:09:00 | Kline, Bob (NIH/NCI) [C] |
screenshot-1.png | 2018-03-21 13:36:58 | Kline, Bob (NIH/NCI) [C] |
screenshot-2.png | 2018-03-21 13:44:14 | Kline, Bob (NIH/NCI) [C] |
screenshot-3.png | 2018-03-21 13:51:10 | Kline, Bob (NIH/NCI) [C] |
Elapsed: 0:00:00.001583