Issue Number | 2954 |
---|---|
Summary | [Genetics Directory] Vendor filter changes for publication from the CDR |
Created | 2009-08-27 16:22:43 |
Issue Type | Improvement |
Submitted By | Beckwith, Margaret (NIH/NCI) [E] |
Assigned To | Englisch, Volker (NIH/NCI) [C] |
Status | Closed |
Resolved | 2010-04-08 15:58:05 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107282 |
BZISSUE::4629
BZDATETIME::2009-08-27 16:22:43
BZCREATOR::Margaret Beckwith
BZASSIGNEE::Volker Englisch
BZQACONTACT::William Osei-Poku
We need to make changes to the vendor filter in order to publish the Genetics Directory from the CDR after conversion of the data.
BZDATETIME::2009-08-31 12:12:39
BZCOMMENTOR::Volker Englisch
BZCOMMENT::1
Will there be any changes to the licensee DTD?
BZDATETIME::2009-09-04 14:18:04
BZCOMMENTOR::Volker Englisch
BZCOMMENT::2
According to Bob there will be data to work with on MAHLER sometime next week.
The licensee DTD will not change.
BZDATETIME::2009-09-23 13:12:10
BZCOMMENTOR::Volker Englisch
BZCOMMENT::3
I have a question about the output of the vendor filters.
Is our goal to deliver documents that are identical (or as close as
possible) to what we used to deliver from the point-of-view of the
source of the document or the structure of the document?
For example:
The data is currently stored in the CDR like this
<ID>
12
</ID>
<NAME>
<SNAME>
Englisch
</SNAME>
<FIRSTNAME>
Volker
</FIRSTNAME>
</NAME>
...
and this is how we deliver the data to our licensees, instead of something like this (which has the advantage that the text node doesn't include white space before and after the data):
<ID>12</ID>
<NAME><SNAME>Englisch</SNAME><FIRSTNAME>Volker</FIRSTNAME></NAME>
...
My suggestion would be to use the same format as all other CDR document types but we wouldn't be able to create diff reports between the old and the new data formats.
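To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not the vendor filter itself: the function name `strip_text_nodes` is made up for illustration, and it simply trims white space around every text node of a small sample document.

```python
import xml.etree.ElementTree as ET

def strip_text_nodes(xml_string):
    """Return the XML with white space trimmed around every text node."""
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        # elem.text precedes the first child; elem.tail follows the element.
        elem.text = (elem.text or "").strip() or None
        elem.tail = (elem.tail or "").strip() or None
    return ET.tostring(root, encoding="unicode")

# Sample in the old layout (one text node per line, as stored today).
old = "<NAME>\n<SNAME>\nEnglisch\n</SNAME>\n<FIRSTNAME>\nVolker\n</FIRSTNAME>\n</NAME>"
compact = strip_text_nodes(old)
```

A plain text diff between `old` and `compact` would flag every line, which is why diff reports between the old and new formats are not useful.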
BZDATETIME::2009-10-06 19:45:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::4
(In reply to comment #3)
> My suggestion would be to use the same format as all other CDR document types
Per discussion at the status meeting this is what we're going to do.
BZDATETIME::2009-10-06 19:47:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::5
I noticed that the degrees for the GenProf documents are displayed
with periods as in
M.D., Ph.D., M.S., etc.
while the LOV for the element StandardProfessionalSuffix doesn't display
the periods
MD, PhD, MS, etc.
Are we keeping the values as defined in the LOV or do we need to convert them in the filter?
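If the values did need converting, a lookup table would probably be safer than a mechanical rule, since "PhD" becomes "Ph.D." rather than "P.h.D.". The sketch below is an assumption about what such a conversion could look like; the table and the name `legacy_suffix` are hypothetical and cover only the three values quoted above.

```python
# Hypothetical lookup table; only the three values quoted in this comment.
SUFFIX_WITH_PERIODS = {"MD": "M.D.", "PhD": "Ph.D.", "MS": "M.S."}

def legacy_suffix(lov_value):
    """Map a StandardProfessionalSuffix LOV value to the old dotted form."""
    # Unknown values pass through unchanged rather than being guessed at.
    return SUFFIX_WITH_PERIODS.get(lov_value, lov_value)
```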
BZDATETIME::2009-10-07 15:19:42
BZCOMMENTOR::Volker Englisch
BZCOMMENT::6
Another Data question:
The country is listed as 'United States' in the current data which is
the CountryShortName in the country document.
We are typically using the CountryFullName for display, which is 'U.S.A.'
for the US.
Should we use United States or U.S.A. as the country name?
If we change the display it may be possible that Cancer.gov will display the country for US addresses as well as for foreign countries but that might happen anyway because we won't submit the extra spaces anymore (see Comment #3).
BZDATETIME::2009-10-07 18:33:50
BZCOMMENTOR::Volker Englisch
BZCOMMENT::7
I've finished the filter as much as I could without seeing the actual
data.
What's left to be done is the CancerType/CancerSite information and
finishing up the address information.
The filter currently used is
CDR650153 - [Test] Vendor GenProf
BZDATETIME::2009-10-09 15:30:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::8
(In reply to comment #6)
> If we change the display it may be possible that Cancer.gov will display the
> country for US addresses as well as for foreign countries but that might
> happen anyway because we won't submit the extra spaces anymore (see
> Comment #3).
I've talked with Blair about this issue. The Gatekeeper/Cancer.gov
software is checking for the string 'United States' in three places (as
far as we could see) and the publishing of the new GenProf. documents
would need to be coordinated with a change of the software.
In short, changes will be necessary on Cancer.gov if we change the
country display from 'United States' to 'U.S.A.' but those changes will
be minor.
BZDATETIME::2009-10-28 16:19:40
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::9
Currently in the Genetics Directory on Cancer.gov we leave the country off if it is the United States. Could we just do that? (Or maybe I am missing the problem here.)
BZDATETIME::2009-10-28 16:29:23
BZCOMMENTOR::Volker Englisch
BZCOMMENT::10
(In reply to comment #9)
> Currently in the Genetics Directory on Cancer.gov we leave the country off
That's correct but Cancer.gov has code that looks for the string
'United States'. When it sees this string it will suppress the display
of the country. After the conversion we will send the string 'U.S.A.' to
Cancer.gov and therefore the country would be displayed without a
code/string change.
As I said, it's a minor change but a change that will need to be
coordinated with the implementation of our filters to create the "new"
documents.
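The behaviour described here can be sketched as follows. This is not the Cancer.gov code, which is not shown in this issue; the function `country_line` and its `suppressed` parameter are hypothetical and only illustrate why an exact string match stops working when the value changes.

```python
def country_line(country, suppressed=("United States",)):
    """Return the country to display, or None when it should be hidden."""
    # Exact string comparison: 'U.S.A.' does not match 'United States'.
    return None if country in suppressed else country

before = country_line("United States")   # hidden today
after = country_line("U.S.A.")           # shown until the string list is updated
fixed = country_line("U.S.A.", suppressed=("United States", "U.S.A."))
```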
BZDATETIME::2009-10-28 16:33:07
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::11
Thanks for the explanation! This makes sense, and I figured it must be something along those lines.
BZDATETIME::2009-11-02 15:33:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::12
Here are a few observations from our test on Franck comparing with data from the cancer.gov web site. I am pretty sure some of these things may have been addressed already in the vendor filter but I just wanted to include them here just in case you are not aware of them.
1. Zip code
When zip codes have 4 digit extensions, the hyphen is not displayed. For
example
GP 362 (CDR658172 on Franck) - Erin R. Dola
Web site http://www.cancer.gov/search/view_geneticspro.aspx?personid=556226
GP 245 (CDR658198) Linda Robinson
Web site http://www.cancer.gov/search/view_geneticspro.aspx?personid=556055
2. City names without comma
The address information is displayed on the web site without the usual
comma that follows the city name: Examples, same as above.
SALT LAKE CITY UT 84112 5550
SALT LAKE CITY, UT 84112-5550
3. Email address
In the Gen Prof database, the email address of the professional is
entered in only one location but on the web site, it is added to all
locations (in the case of multiple locations). In the same way in the
CDR, the email address would usually be found at the CIPS contact
location. I guess CIAT need not do anything if this will continue to be
the case after the conversion.
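Items 1 and 2 can be illustrated with a short sketch of the expected city line. It assumes US addresses whose ZIP+4 value contains exactly nine digits; the helper names `format_zip` and `city_state_zip` are made up for illustration and are not part of the vendor filter.

```python
import re

def format_zip(raw):
    """Insert the hyphen into a nine-digit ZIP+4; leave other values alone."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return raw.strip()

def city_state_zip(city, state, zip_code):
    """Build the city line with the comma after the city name."""
    return f"{city}, {state} {format_zip(zip_code)}"

line = city_state_zip("SALT LAKE CITY", "UT", "84112 5550")
```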
BZDATETIME::2009-11-06 12:42:42
BZCOMMENTOR::Volker Englisch
BZCOMMENT::13
(In reply to comment #12)
> 1. Zip code
> When zip codes have 4 digit extensions, the hyphen is not displayed.
Good, we do want to display the ZIP+4 properly including the hyphen.
> 2. City names without comma
I believe this will require a change request to the Cancer.gov team.
I'll mention this in combination with the switch for the country from
United States to U.S.A.
> 3. Email address
> In the Gen Prof database, the email address of the professional is entered in
> only one location but on the web site, it is added to all locations (in the
> case of multiple locations). In the same way in the CDR, the email address
> would usually be found at the CIPS contact location. I guess CIAT need not do
> anything if this will continue to be the case after the conversion.
I'm not sure what you are trying to say here.
Looking at the sample of Linda Robinson, the data that we provide to
Cancer.gov does have the email address listed for each location.
Therefore, Cancer.gov correctly displays the email address submitted
with each of the location blocks.
In the CDR there is only one email address listed for one of the
location blocks.
What are you proposing for the vendor filters to do?
a) Create the email based on the SpecificEmail element if it exists
b) Create the same email for all of the location blocks if a single
SpecificEmail element exists for any of the location blocks
c) What would you want to do if two different emails exist?
BZDATETIME::2009-11-06 12:57:30
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14
In the GeneticsProfessional data (the current vendor output) there
are three children of the name element:
<NAME>
<SNAME>
<FIRSTNAME>
<LASTNAME>
</NAME>
Can anyone tell me how the SNAME element is being constructed and if this element is being used by Cancer.gov?
It appears that this element is something like a display name and it
combines the first character of the first name with the middle initial
and the last name like:
Volker Englisch --> V Englisch
James M. Karns --> JM Karns
but it also contains things like
Diana Moglia --> M Moglia
It doesn't appear that this element is being used at all on Cancer.gov (I'll double-check with Blair) but it would be nice to know how this element should get constructed.
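The pattern in the first two examples can be written down as a small sketch. This is only a guess at the rule, not a confirmed specification: the function `short_name` is hypothetical, and it does not explain the "Diana Moglia --> M Moglia" case.

```python
def short_name(given_name, surname, middle_initial=""):
    """Guess at SNAME: initials of given and middle names, then the surname."""
    initials = given_name[:1] + middle_initial.replace(".", "")[:1]
    return f"{initials.upper()} {surname}".strip()

first = short_name("Volker", "Englisch")
second = short_name("James", "Karns", "M.")
```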
BZDATETIME::2009-11-06 13:40:49
BZCOMMENTOR::Volker Englisch
BZCOMMENT::15
(In reply to comment #13)
> I believe this will require a change request to the Cancer.gov team.
> I'll mention this in combination with the switch for the country from
> United States to U.S.A.
I was wrong about this one.
Cancer.gov is displaying the address block as it's been presented to
them via the CADD elements and it is not constructing the address from
the City, State, Zip information.
I will modify the vendor filter accordingly.
BZDATETIME::2009-11-06 14:02:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::16
(In reply to comment #13)
> > 3. Email address
> > In the Gen Prof database, the email address of the professional is entered in
> > only one location but on the web site, it is added to all locations (in the
> > case of multiple locations). In the same way in the CDR, the email address
> > would usually be found at the CIPS contact location. I guess CIAT need not do
> > anything if this will continue to be the case after the conversion.
>
> I'm not sure what you are trying to say here.
> Looking at the sample of Linda Robinson, the data that we provide to Cancer.gov
> does have the email address listed for each location. Therefore,
> Cancer.gov correctly displays the email address submitted with each of the
> location blocks.
> In the CDR there is only one email address listed for one of the location
> blocks.
You're right and we recognize all the above.
> What are you proposing for the vendor filters to do?
> a) Create the email based on the SpecificEmail element if it exists
> b) Create the same email for all of the location blocks if a single
> SpecificEmail element exists for any of the location blocks
> c) What would you want to do if two different emails exist?
We are going to maintain the email address only at the SpecificEmail of the CIPSContact. So this was just a heads-up. Because we have already said that nothing was going to change on Cancer.gov, I wanted you to be aware of where the email address would be so that you can still include it at every location, in case of multiple locations. In other words, we will continue to have one email address but it needs to continue to be displayed at all locations.
BZDATETIME::2009-11-06 14:37:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::17
(In reply to comment #14)
> In the GeneticsProfessional data (the current vendor output) there are three
> children of the name element:
> <NAME>
> <SNAME>
> <FIRSTNAME>
> <LASTNAME>
> </NAME>
>
> Can anyone tell me how the SNAME element is being constructed and if this
> element is being used by Cancer.gov?
>
> It appears that this element is something like a display name and it combines
> the first character of the first name with the middle initial and the last name
> like:
> Volker Englisch --> V Englisch
> James M. Karns --> JM Karns
>
> but it also contains things like
> Diana Moglia --> M Moglia
>
> It doesn't appear that this element is being used at all on Cancer.gov (I'll
> double-check with Blair) but it would be nice to know how this element should
> get constructed.
I am adding Bob (and Margaret) to this issue to see if he can answer this question. I am not sure how it is being used. For the converted documents we have some of the combinations you mentioned above but for most of the records, they were converted as GivenName, MiddleInitial and SurName.
BZDATETIME::2009-11-06 14:40:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::18
(In reply to comment #16)
> We are going to maintain the email address only at the SpecificEmail of the
> CIPSContact.
I see. I wasn't aware of this.
I have updated the vendor filter to display the email address listed as
the SpecificEmail of the CIPSContact location for any location
displayed.
BZDATETIME::2009-11-06 14:53:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::19
(In reply to comment #17)
> (In reply to comment #14)
> > In the GeneticsProfessional data (the current vendor output) there are three
> > children of the name element:
> > <NAME>
> > <SNAME>
> > <FIRSTNAME>
> > <LASTNAME>
> > </NAME>
> >
> > Can anyone tell me how the SNAME element is being constructed and if this
> > element is being used by Cancer.gov?
No idea. We've always just passed through what we get from CIAT without touching (or needing to understand) any of it.
BZDATETIME::2009-11-09 08:30:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::20
If I had to guess I would say that SNAME probably stands for "short name" and would be constructed for matching the form of authors' names as they are cited in scholarly bibliographies, with everything reduced to initials except the surname.
BZDATETIME::2009-11-09 18:38:41
BZCOMMENTOR::Volker Englisch
BZCOMMENT::21
Is it correct that the CancerType/Typename within a FamilyCancerSyndrome is sorted as well as the CancerSite children within each CancerType?
I have copied the vendor and denormalization filter for the GeneticsProfessionals to FRANCK at this point. The filter creates valid XML output but I still have to sort and consolidate the individual CancerType sections.
BZDATETIME::2009-11-10 17:52:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::22
I was able to finish the display of the FamilyCancerSyndrome sections
on MAHLER.
It doesn't look close to what is sent to Gatekeeper but given the
limited MenuInformation on MAHLER that's probably expected.
I've used the document CDR19859 (Clark Robin) as my primary test case.
The following filter has been created:
CDR652508 - Copy XML for GeneticsProfessional
Please note that the new filters are not on FRANCK anymore since FRANCK had been refreshed today.
This is ready for review on MAHLER.
BZDATETIME::2009-11-12 15:35:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::23
Per request I'm attaching the XML output of the new GeneticsProfessional filter to this issue. This was created on MAHLER with data I prepared myself.
Attachment GenProf_19859_Vendor.xml has been added with description: Vendor Output for CDR19859
BZDATETIME::2009-11-12 15:59:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::24
(In reply to comment #16)
> We are going to maintain the email address only at the SpecificEmail of the
> CIPSContact.
Per our discussion at the status meeting the email address to be
picked up will not come from the address block with the CIPSContact
fragment but from the address block containing the
UsedFor = "Mailer"
block.
This will need to be modified in the vendor filter.
BZDATETIME::2009-11-12 16:04:37
BZCOMMENTOR::Bob Kline
BZCOMMENT::25
(In reply to comment #24)
> (In reply to comment #16)
> > We are going to maintain the email address only at the SpecificEmail of the
> > CIPSContact.
>
> Per our discussion at the status meeting the email address to be picked up will
> not come from the address block with the CIPSContact fragment but from the
> address block containing the
> UsedFor = "Mailer"
> block.
> This will need to be modified in the vendor filter.
To be more specific, you'll want to look in the block which has the NMTOKEN GPMailer in the NMTOKENS UsedFor attribute. The token 'GPMailer' might not be the only value in the attribute. It might, for example, be 'GP GPMailer' (or 'GPMailer GP') in the case where the information in the practice location block repeats information found in the tblMain table.
BZDATETIME::2009-11-16 17:48:54
BZCOMMENTOR::Volker Englisch
BZCOMMENT::26
(In reply to comment #25)
> To be more specific, you'll want to look in the block which has the NMTOKEN
> GPMailer in the NMTOKENS UsedFor attribute. The token 'GPMailer' might not be
> the only value in the attribute. It might, for example, be 'GP GPMailer' (or
> 'GPMailer GP') in the case where the information in the practice location
> block repeats information found in the tblMain table.
That's exactly what I meant to say. :-)
I've modified the vendor filter to pick up the email address from the address block marked with the UsedFor = "... GPMailer ..." attribute instead of the CIPSContact block.
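The token test matters because UsedFor is an NMTOKENS attribute, so the value has to be split on white space rather than compared as a whole string. The Python sketch below illustrates the lookup under stated assumptions: the `Location` element name and the sample document are invented for the example, and only the `UsedFor` attribute, the `GPMailer` token and the `SpecificEmail` element come from this thread.

```python
import xml.etree.ElementTree as ET

def mailer_email(person_xml):
    """Return the SpecificEmail of the block whose UsedFor lists GPMailer."""
    root = ET.fromstring(person_xml)
    for block in root.iter():
        # Split the NMTOKENS value so 'GP GPMailer' and 'GPMailer GP' both match.
        tokens = block.get("UsedFor", "").split()
        if "GPMailer" in tokens:
            email = block.find(".//SpecificEmail")
            if email is not None and email.text:
                return email.text.strip()
    return None

# Invented sample data; element names other than SpecificEmail are assumptions.
sample = """<Person>
  <Location UsedFor="GP"><SpecificEmail>a@example.org</SpecificEmail></Location>
  <Location UsedFor="GP GPMailer"><SpecificEmail>b@example.org</SpecificEmail></Location>
</Person>"""
found = mailer_email(sample)
```

Splitting into tokens also avoids false matches on longer tokens that merely contain the text "GPMailer".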
BZDATETIME::2009-11-20 17:21:02
BZCOMMENTOR::Volker Englisch
BZCOMMENT::27
I've copied the modified vendor filters to FRANCK for testing but I will first need to modify the SELECT statements of the publishing document to pick up the appropriate Person documents for publishing.
I am guessing that the criterion for picking up a Person document to
be processed as a GeneticsProfessional is the "Include in Directory"
element.
Please let me know if this is not correct.
BZDATETIME::2009-11-20 17:43:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::28
(In reply to comment #27)
> I've copied the modified vendor filters to FRANCK for testing but I will first
> need to modify the SELECT statements of the publishing document to pick up the
> appropriate Person documents for publishing.
>
> I am guessing that the criterion for picking up a Person document to be
> processed as a GeneticsProfessional is the "Include in Directory" element.
> Please let me know if this is not correct.
Right. There are two places in the schemas where the Include element can appear: be sure you're looking at the one in the GeneticsProfessionalDetails block (not the PhysicianDetails block).
BZDATETIME::2009-11-23 10:38:31
BZCOMMENTOR::Volker Englisch
BZCOMMENT::29
Bob, I am now picking up 499 person documents (on FRANCK) to be
published as GenProf documents. The last publishing job produced 535
documents.
I'm thinking you may have a few duplicates and possibly invalid
documents that could account for a lower number of person documents used
for GP.
What number of GP documents would you expect?
BZDATETIME::2009-11-23 11:13:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::30
Let me do some digging.
BZDATETIME::2009-11-23 13:08:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::31
I would have expected 497, not 499, because that's the number which have 'Include' for /AdministrativeInformation/Directory/Include under the GeneticsProfessionalDetails block in the query_term_pub table. The reason it's lower than 535 is that the conversion resulted in some invalid documents which CIAT plans to fix once the conversion is done on Bach (mostly non-US people, because of the funky way the genprof database shoehorned their addresses into a table structure which assumed US addresses).
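The selection rule can be illustrated on a single document. The real selection runs as a query against the query_term_pub table, which is not shown in this issue; the sketch below only checks the same condition in Python, and the sample documents and the name `is_genetics_professional` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Path taken from this comment, anchored under GeneticsProfessionalDetails.
GP_INCLUDE_PATH = (
    ".//GeneticsProfessionalDetails/AdministrativeInformation/Directory/Include"
)

def is_genetics_professional(person_xml):
    """True when the GP block (not the physician block) is flagged 'Include'."""
    root = ET.fromstring(person_xml)
    return any(
        (node.text or "").strip() == "Include"
        for node in root.findall(GP_INCLUDE_PATH)
    )

# Invented samples: the flag under the wrong block must not select the document.
flag = "<AdministrativeInformation><Directory><Include>Include</Include></Directory></AdministrativeInformation>"
physician_only = f"<Person><PhysicianDetails>{flag}</PhysicianDetails></Person>"
genetics = f"<Person><GeneticsProfessionalDetails>{flag}</GeneticsProfessionalDetails></Person>"
```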
BZDATETIME::2009-11-23 13:53:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::32
> I would have expected 497, not 499, because that's the number which have
> 'Include' for /AdministrativeInformation/Directory/Include under the
> GeneticsProfessionalDetails block in the query_term_pub table.
It looks like there are two documents that are blocked (CDR10700, CDR19862) and two documents for which the latest version in doc_version is not publishable and the latest publishable version doesn't include the 'Include' flag (CDR3766, CDR7295). Therefore, there will only be 495 documents created on FRANCK.
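The arithmetic behind the 495 figure, written out as a short sketch using only the numbers and document IDs given in this comment (the variable names are chosen for illustration):

```python
selected = 499                                   # Person documents picked up on FRANCK
blocked = {"CDR10700", "CDR19862"}               # blocked documents
not_publishable = {"CDR3766", "CDR7295"}         # latest publishable version lacks 'Include'

# Documents that will actually be created on FRANCK.
expected = selected - len(blocked) - len(not_publishable)
```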
BZDATETIME::2009-11-23 16:54:18
BZCOMMENTOR::Volker Englisch
BZCOMMENT::33
I've modified the publishing document
CDR000178.xml
and installed it on FRANCK to run a publishing job.
The publishing job selected 495 documents to be published, of which 14
failed (see the failure report) due to missing data.
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=6637&type=FilterFailure
The remaining 481 documents are valid but differ in many elements from the currently published documents.
I'll attach a sample for review.
BZDATETIME::2009-11-23 16:57:23
BZCOMMENTOR::Volker Englisch
BZCOMMENT::34
Attachment CDR556157.xml has been added with description: GP PJ LeMarbre - Old output
BZDATETIME::2009-11-23 16:58:40
BZCOMMENTOR::Volker Englisch
BZCOMMENT::35
Attachment CDR828.xml has been added with description: GP PJ LeMarbre - New output (FRANCK)
BZDATETIME::2009-11-24 09:22:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::36
For the TYPE element, I think you can have the filter strip everything from the first occurrence of " (" onward. For SPECIALTY/BDCT I think you're only supposed to include that if the value is "Yes." As for the syndromes, it looks as if there are so many changes (beyond capitalization) that it might be good to get CIAT to confirm that they're intentionally making extensive changes to the cancer types and sites associated with the various syndromes.
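The two rules suggested here are simple enough to sketch. The functions below are illustrative only: `clean_type` and `include_bdct` are hypothetical names, the sample TYPE value in the test is invented, and the assumption that BDCT holds the literal value "Yes" has not been confirmed against the data.

```python
def clean_type(value):
    """Drop everything from the first occurrence of ' (' onward."""
    head, _sep, _tail = value.partition(" (")
    return head

def include_bdct(value):
    """Keep SPECIALTY/BDCT only when its value is 'Yes' (assumed literal)."""
    return value.strip() == "Yes"
```

`str.partition` returns the whole string as the first item when the separator is absent, so values without a parenthesis pass through unchanged.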
BZDATETIME::2009-12-01 11:42:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::37
William:
How's progress coming on the review of Volker's publication results? We can't go any further with #4522 until this task is wrapped up.
BZDATETIME::2009-12-01 12:19:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::38
(In reply to comment #37)
> William:
>
> How's progress coming on the review of Volker's publication results? We can't
> go any further with #4522 until this task is wrapped up.
We will look at the results posted in comment #35 and post a comment this afternoon. I must have misunderstood what the next step was. I thought we were waiting for the publish preview to be completed before reviewing the results.
BZDATETIME::2009-12-01 13:26:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::39
(In reply to comment #35)
> Created an attachment (id=1824) [details]
> GP PJ LeMarbre - New output (FRANCK)
I have a few observations:
1. It looks like the Institution will display as "Regional Cancer Center". However, looking at the CDR document, I think it should display as "Waukesha Memorial Hospital Regional Cancer Center". "Regional Cancer Center" is the old data from the Gen Prof. database which currently appears on cancer.gov but it has been updated in the CDR to "Waukesha Memorial Hospital Regional Cancer Center". I am assuming you are using the CDR data in which case it should display what is in the CDR.
2. (This is just an FYI) The Postal code in the new output is "53188-". This is exactly how it is in the CDR and it was inherited from the Gen Prof Database. This will eventually be cleaned up. Currently on Cancer.gov, the zip code is displayed correctly without the hyphen. I am assuming this is a result of the changes you made in comment #13.
3. It appears the “Cowden syndrome” was not picked up in your output. The tag that was supposed to contain it appears to be empty.
Everything else looks good and appears to be consistent with what is currently displayed on cancer.gov
BZDATETIME::2009-12-01 13:50:32
BZCOMMENTOR::Volker Englisch
BZCOMMENT::40
(In reply to comment #39)
> 3. It appears the “Cowden syndrome” was not picked up in your output. The tag
> that was supposed to contain it appears to be empty.
The Menu information for this term doesn't list the DisplayName. That's why the element is empty. I'm guessing this is fixed on BACH.
> Everything else looks good and appears to be consistent with what
> is currently displayed on cancer.gov
This means it's OK that the terms displayed on Cancer.gov are different from what's picked up by the filter?
BZDATETIME::2009-12-01 14:33:08
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::41
There is a display name for Cowden syndrome menu information on Bach. But I don't completely understand your question Volker. The syndrome names are supposed to match exactly what is on Cancer.gov, but the associated cancer types and cancer sites have been updated a bit so may not match exactly. I asked you if that was okay, and you said it was. Am I confused here?
BZDATETIME::2009-12-01 14:55:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::42
(In reply to comment #41)
> I asked you if that was okay, and you said it was. Am I confused here?
It is OK from a processing point of view. I just want to make sure that the information that is picked up by the filters matches the information that is expected. I can see from the output created that it is different but I'm unable to tell if it's "different/good" or "different/bad".
BZDATETIME::2009-12-01 15:12:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::43
(In reply to comment #42)
> (In reply to comment #41)
> > I asked you if that was okay, and you said it was. Am I confused here?
>
> It is OK from a processing point of view. I just want to make sure that the
> information that is picked up by the filters matches the information that is
> expected. I can see from the output created that it is different but I'm
> unable to tell if it's "different/good" or "different/bad".
Yes. Mary and I looked at them together and she said they matched correctly.
BZDATETIME::2009-12-01 18:10:27
BZCOMMENTOR::Volker Englisch
BZCOMMENT::44
(In reply to comment #36)
> For SPECIALTY/BDCT I think you're only
> supposed to include that if the value is "Yes."
Margaret, I'm wondering if this is correct because I do see "Board
certified = NO"
entries on Cancer.gov.
BZDATETIME::2009-12-01 19:51:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::45
(In reply to comment #39)
> 2. (This is just an FYI) The Postal code in the new output is "53188-". This
> is exactly how it is in the CDR
Yes, that's what I'm doing. I merely display the PostalCode_ZIP as it's listed in the data.
I've updated the filters to address Bob's Comment #36 (a) and William's comment #39 (1).
The filters are on FRANCK.
BZDATETIME::2009-12-07 16:25:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::46
(In reply to comment #45)
> (In reply to comment #39)
> > 2. (This is just an FYI) The Postal code in the new output is "53188-". This
> > is exactly how it is in the CDR
>
> Yes, that's what I'm doing. I merely display the PostalCode_ZIP as it's listed
> in the data.
>
> I've updated the filters to address Bob's Comment #36 (a) and William's comment
> #39 (1).
>
> The filters are on FRANCK.
Can I use the Filter Document report to test other GP documents?
BZDATETIME::2009-12-07 16:27:52
BZCOMMENTOR::Volker Englisch
BZCOMMENT::47
(In reply to comment #46)
> Can I use the Filter Document report to test other GP documents?
Yes.
BZDATETIME::2009-12-08 15:17:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::48
(In reply to comment #47)
> (In reply to comment #46)
> > Can I use the Filter Document report to test other GP documents?
>
> Yes.
Is there a way to test with data from Bach without refreshing Franck? I was able to filter three documents and saw a few missing values that may have been fixed on Bach. It looks like either testing on Bach or with Bach data is the appropriate thing to do at this point.
BZDATETIME::2009-12-08 15:23:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::49
> Is there a way to test with data from Bach without refreshing Franck?
No, you can only filter documents that are on the local server with this interface.
BZDATETIME::2009-12-11 13:26:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::50
(In reply to comment #49)
> > Is there a way to test with data from Bach without refreshing Franck?
>
> No, you can only filter documents that are on the local server with this
> interface.
Mary and I QCed the filtered documents using the terminology report. She fixed a few things on Bach and everything appears to be fine at this point. We will QC again when all the Genetics Professional tasks are promoted to Bach. At this point everything else looks good.
BZDATETIME::2009-12-14 12:12:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::51
Next step on this issue is for Volker to draft, get approval for, and
send out a message to the licensees informing them that we will no longer be
including Person documents in the weekly export, and asking them to let us know
of any impact that might have on their operations.
BZDATETIME::2009-12-15 16:12:47
BZCOMMENTOR::Volker Englisch
BZCOMMENT::52
The notification to the licensees went out yesterday.
BZDATETIME::2009-12-21 12:16:33
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::53
The application submission form on cancer.gov is currently sent to the Lockheed/Aspensys email address. It should instead be sent to this new email address: GeneticsDirectory@cancer.gov. I am adding this comment so that this change will be included with the other cancer.gov changes for the Genetics Directory.
BZDATETIME::2009-12-21 12:22:22
BZCOMMENTOR::Volker Englisch
BZCOMMENT::54
William, did you mean to add this last comment to the vendor filter
issue?
There isn't any place in the Person data that uses this information,
right?
BZDATETIME::2009-12-21 12:45:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::55
(In reply to comment #54)
> William, did you mean to add this last comment to the vendor filter issue?
> There isn't any place in the Person data that uses this information, right?
Yes, and you're right: this does not affect the vendor filter changes you are making and I probably should not have included it here, but I already have the same comment in OCECDR-2847 where the email address also needs to be added to the mailer document. I may need to create another issue specifically for Genetics Directory cancer.gov issues. I will check with Margaret first.
BZDATETIME::2009-12-28 14:02:59
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::56
I don't think there is a need to create an issue for the Cancer.gov issues. I am setting up a meeting with Christine to talk about several changes that need to be made to the application form on Cancer.gov, including the email change.
BZDATETIME::2009-12-31 16:41:30
BZCOMMENTOR::Volker Englisch
BZCOMMENT::57
I've modified the publishing document and ran a small publishing
export job (10 documents per document type) that successfully loaded on
the Gatekeeper test machine.
I ran this job without previously removing the existing GenProf documents
which means that for these 10 documents the old and new output could now
be compared easily. You could do this on
http://wwwgk.cancer.gov/search/geneticsservices/
and search for the following names:
Joseph LeMarbre
John J. Mulvihill
David Ginsburg
David G. Mutch
Mary B. Daly
Robert D. Burk
Kathy J. Helzlsouer
David Smotkin
Kenneth Offit
For the new version you will see the Prof. Suffix displayed without
periods (MD) while the old version displays this data with a period
(M.D.)
The following documents have been updated on FRANCK:
CDR178.xml (Publishing document)
cdrpub.py - R9458
pdqCG.dtd - R9441
pdq.dtd - R9459
Next, we will refresh FRANCK and then run a before/after publishing job.
BZDATETIME::2010-01-04 13:14:47
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::58
My comments:
Data issues (I think)
1. On Kathy Helzlsouer record, email address looks like a personal email
instead of the general one on the old form. Is this what we want?
2. Some of the types of cancer are lower case and should have the menu
information fixed so that they match the others (e.g., hepatoblastoma,
islet cell, renal transitional).
3. Zip code on John Mulvihill record has dash after it.
4. Professional suffixes on John Mulvihill record show only MD, not the
BS, BMS--is this intentional?
5. Address for Mary Daly missing Family Risk Assessment Program. Also,
typo in street address needs to be fixed.
Other issues:
6. Syndrome names are missing from the table, even though there is an
empty cell, and the cancer types are present. Examples on John Mulvihill
record are Bloom syndrome; Carcinoid, familial; Cowden syndrome; Fanconi
anemia; Rothmund Thomson syndrome.
7. Two practice locations showing on new record; see Kenneth Offit
record, David Ginsburg record. It looks like part or all of the address
is being repeated.
This is what I found from comparing 5 records. I think it would be good for CIAT to take a look at some records like this to see if there are any other issues.
BZDATETIME::2010-01-04 13:22:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::59
I suggest for CIAT to double-check if these problems have already
been corrected on BACH.
Once I have refreshed FRANCK and rerun the publishing job I would expect
many of these issues to go away.
BZDATETIME::2010-01-04 13:28:17
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::60
When are we going to refresh Franck? It doesn't really make sense for them to spend a lot of time comparing if most of these have been fixed.
BZDATETIME::2010-01-04 13:35:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::61
(In reply to comment #60)
> When are we going to refresh Franck?
We're waiting for CIAT to finish the review of the test for issue #4725 on Franck (see comment #27 from that issue).
BZDATETIME::2010-01-04 16:32:39
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::62
(In reply to comment #61)
> (In reply to comment #60)
> > When are we going to refresh Franck?
>
> We're waiting for CIAT to finish the review of the test for issue
#4725 on
> Franck (see comment #27 from that issue).
I have looked at most of the records and found issues similar to those identified by Margaret above. We had also identified a lot of the data issues and are waiting for conversion on Bach to fix them. With regard to the Syndrome issues, we have done a lot of testing in other issues and have fixed all the problems we found on Bach, so I am pretty sure the errors we are finding here have been fixed on Bach. However, it will be good for us to do additional testing when conversion is completed on Bach.
Volker:
I believe it is OK to refresh Franck at this point. OCECDR-3049 is ready
to be promoted to Bach.
BZDATETIME::2010-01-05 17:14:38
BZCOMMENTOR::Volker Englisch
BZCOMMENT::63
CDR database on FRANCK has been refreshed.
BZDATETIME::2010-01-07 11:19:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::64
A fresh batch of converted GP documents is on Franck, ready for the next round of publication tests.
BZDATETIME::2010-01-11 14:58:17
BZCOMMENTOR::Volker Englisch
BZCOMMENT::65
I ran a publishing job on FRANCK using the updated data. The updated
documents are now available for review on
http://wwwgk.cancer.gov/search/search_geneticsservices.aspx
14 documents failed validation:
http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=6774&type=FilterFailure
The publish-preview is also working on FRANCK.
The data is ready for review on FRANCK.
BZDATETIME::2010-01-12 10:29:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::66
(In reply to comment #65)
> I ran a publishing job on FRANCK using the updated data. The
updated documents
> are now available for review on
> http://wwwgk.cancer.gov/search/search_geneticsservices.aspx
>
> 14 documents failed validation:
>
I looked at some of the records that failed validation. I will have some of them fixed so that you can run another publication job to see if the problems go away.
> http://franck.nci.nih.gov/cgi-bin/cdr/PubStatus.py?id=6774&type=FilterFailure
>
> The publish-preview is also working no FRANCK.
>
> The data is ready for review on FRANCK.
I have looked at some of the records and the problems I found were all data entry problems which we will eventually fix on Bach. I also found that the issue with Carcinoid Syndrome is still there. Mary and I are looking into fixing that problem both on Franck and Bach soon.
Volker:
I have two questions for you.
In what order are the addresses being displayed (in case of multiple locations)? Whichever comes first (In the CDR Record) as long as they are GP locations? The answer to this question will be helpful when creating a new record or when updating/replacing/adding it.
Also, I saw that for the Jessica Y. Adcock, MS record, the organization document (389491) is Inactive and blocked and yet it is displayed (published). Is that a problem, or are you already aware of this? In case you are wondering why we have it in the record, we intend to unblock some, if not all, of the blocked records that are currently linked to GP records. It is part of the cleanup we need to do.
BZDATETIME::2010-01-12 11:05:24
BZCOMMENTOR::Volker Englisch
BZCOMMENT::67
(In reply to comment #66)
> In what order are the addresses being displayed (in case of
multiple
> locations)? Whichever comes first (In the CDR Record) as long as
they are GP
> locations?
The filter doesn't specify any order; therefore, the addresses are displayed in document order.
> Also, I saw that for Jessica Y. Adcock, MS record, the
organization document
> (389491) is Inactive and blocked and yet it is displayed
(published).
> Is that a problem
As you can see it's not a problem in terms of publishing the
record.
If this record is meant to be excluded from publishing I don't think
I've seen a requirement for that type of restriction. The filters are
currently processing everything that's flagged with 'Directory =
Include'.
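The behavior described above (no explicit sort, only flagged entries pass through) can be sketched roughly as follows. This is an illustration only, not the actual XSLT vendor filter: the `Location` and `Name` element names and the modelling of the flag as a `Directory` attribute are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real CDR schema may differ.
SAMPLE = """
<Person>
  <Location Directory="Include"><Name>Clinic B</Name></Location>
  <Location Directory="Exclude"><Name>Clinic X</Name></Location>
  <Location Directory="Include"><Name>Clinic A</Name></Location>
</Person>
"""


def included_locations(xml_text):
    """Return names of locations flagged for inclusion, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree in document order, so no sort is applied here.
    return [
        loc.findtext("Name")
        for loc in root.iter("Location")
        if loc.get("Directory") == "Include"
    ]


print(included_locations(SAMPLE))
```

Note that the sketch does not look at whether a linked document is blocked or inactive, which mirrors the point that no such restriction is currently applied.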
BZDATETIME::2010-01-12 17:15:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::68
(In reply to comment #66)
> (In reply to comment #65)
> > I ran a publishing job on FRANCK using the updated data. The
updated documents
> > are now available for review on
> > http://wwwgk.cancer.gov/search/search_geneticsservices.aspx
> >
> > 14 documents failed validation:
> >
>
> I looked at some of the records that failed validation. I will have
some of
> them fixed so that you can run another publication job to see if
the problems
> go away.
>
Please run another publishing job for the failed documents only (if
that is possible).
Meanwhile, I also have a few questions:
1. For documents 663436 and 663103, they appear to have failed because
of the Generational Suffixes they have. Is this correct?
2. For documents 412089, 404148, 269800 etc, they appear to have failed because of the Home Address block in their records. Is this correct? It looks like we need the home address in their records for Board Members.
3. It also appears that professional suffix is required, right? That may have been the reason for the failure of 360777, 330334, 271212 etc.
BZDATETIME::2010-01-13 11:55:29
BZCOMMENTOR::Volker Englisch
BZCOMMENT::69
(In reply to comment #68)
> Please run another publishing job for the failed documents only
Done.
> 1. For documents 663436 and 663103, they appear to have failed
because of the
> Generational Suffixes they have. Is this correct?
Correct. This was a bug in the filter which has been corrected.
> 2. For documents 412089, 404148, 269800 etc, they appear to have
failed
> because of the Home Address block in their records. Is this
correct?
Correct. This was a bug in the filter which has been corrected.
> 3. It also appears that professional suffix is required, right?
Correct. The DTD defines the DEGREE element as 'Item can appear one or more times', so it is required.
Those formerly failed documents have been pushed to GatekeeperGK and are ready for review.
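A quick pre-check for the "one or more times" rule could look like the sketch below. It is not a DTD validation, only a test that at least one DEGREE element is present. The element names come from this issue, but the position of DEGREE inside the record is an assumption, so the search is done at any depth.

```python
import xml.etree.ElementTree as ET


def missing_required(xml_text, required=("DEGREE",)):
    """Return the required element names that do not occur in the record."""
    root = ET.fromstring(xml_text)
    # ".//NAME" searches at any depth because the exact nesting is assumed.
    return [name for name in required if root.find(f".//{name}") is None]


with_degree = (
    "<GENETICSPROFESSIONAL><NAME/><DEGREE>MD</DEGREE></GENETICSPROFESSIONAL>"
)
without_degree = "<GENETICSPROFESSIONAL><NAME/></GENETICSPROFESSIONAL>"

print(missing_required(with_degree))
print(missing_required(without_degree))
```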
BZDATETIME::2010-01-15 11:25:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::70
We have completed the QA for the Gen Prof. documents and everything
looks good now as far as Franck is concerned. It looks like Margaret
will take a final look before this is promoted.
Thank you!
BZDATETIME::2010-01-21 12:19:37
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::71
I checked the same 5 records I had looked at before in Publish Preview. The problem with the syndromes not showing up correctly has been fixed, but all other data issues are still there. I understand that CIAT is going to fix them after conversion. They will need to look at every record, I think. I do have a question about the dash after the 5-digit zip code. Is that really in the data?
BZDATETIME::2010-01-21 16:26:28
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::72
In the CDR Meeting today, we agreed to do the conversion of the Gen Prof documents next week. However, the publication scripts should not be installed on Bach until a later date when the entire data cleanup is completed.
BZDATETIME::2010-01-28 13:44:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::73
According to William, the vendor output filter is putting out LegacyData elements, which aren't permitted by the DTD. This is causing publish preview to fail (and of course, will cause real publication events to fail).
BZDATETIME::2010-01-28 17:22:43
BZCOMMENTOR::Volker Englisch
BZCOMMENT::74
(In reply to comment #73)
> According to William, the vendor output filter is putting out
LegacyData
> elements, which aren't permitted by the DTD.
Publish preview worked last night and William had closed the PP
issue.
Do we have an example document?
My training is over tomorrow around lunch time and I'll have a look at
the issue at that time.
BZDATETIME::2010-02-01 16:51:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::75
(In reply to comment #73)
> According to William, the vendor output filter is putting out
LegacyData
> elements, which aren't permitted by the DTD.
William, could you please give me additional information (i.e. a document with a problem)? Since publishing ran successfully on FRANCK with data from BACH I'm not sure where we're having a problem.
BZDATETIME::2010-02-01 17:03:40
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::76
(In reply to comment #75)
> (In reply to comment #73)
> > According to William, the vendor output filter is putting out
LegacyData
> > elements, which aren't permitted by the DTD.
>
> William, if you could please give me additional information (i.e. a
document
> with a problem). Since publishing ran successfully on FRANCK with
data from
> BACH I'm not sure were we're having a problem.
Example - CDR0000663575 on Franck.
The problem has to do with making the ID element of the Legacy data block required for new documents. The new documents do not have any legacy data. When their mailers are generated, we may include their username and password in the Legacy Data block. But until then we do not have any information for them to put in the Legacy Data block.
Also, I closed the publish preview issue because I wanted to open another issue for enhancement and other problems we find while fixing the errors on Bach. Should I re-open it instead?
BZDATETIME::2010-02-01 17:43:25
BZCOMMENTOR::Volker Englisch
BZCOMMENT::77
(In reply to comment #76)
> The problem has to do with making the ID element of the Legacy data
block
> required for new documents. The new documents do not have any
legacy data.
> When their mailers are generated, we may include their username
and
> password in the Legacy Data block. But until then we do not have
any
> information for them to put in the Legacy Data block.
This explanation of the vendor filter behavior is correct but is
opposite to what was reported in comment #73.
I'm assuming that the explanation from comment #76 describes the current
behavior, namely to output a "legacy" element because it is required by
the DTD.
It appears that we never tested publishing a new document using the updated filters. New documents are created without the mandatory ID element at this time and that will need to be fixed.
> Also, I closed the publish preview issue because I wanted to
open another
> issue for enhancement and other problems we find while fixing the
errors
> on Bach.
> Should I re-open it instead?
No, for enhancements please open a new issue.
BZDATETIME::2010-02-01 18:02:24
BZCOMMENTOR::Volker Englisch
BZCOMMENT::78
The vendor filter has been fixed on MAHLER and FRANCK:
CDR559215.xml - R9482: Vendor Filter: GeneticsProfessional
This is ready for review on FRANCK.
BZDATETIME::2010-02-08 21:10:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::79
(In reply to comment #78)
> The vendor filter has been fixed on MAHLER and FRANCK:
> CDR559215.xml - R9482: Vendor Filter: GeneticsProfessional
> This is ready for review on FRANCK.
It works without a problem on Mahler (that is, when no Legacy
information is present). However, on Franck, I am unable to get pub prev.
to work without the Legacy information. I get the following error:
"CDRPreview web service error: The element 'GENETICSPROFESSIONAL' has
invalid child element 'NAME'. List of possible elements expected:
'ID'.Validation error occurred when validating the instance
document.,44,2 "
When I add the Legacy Data, then I am able to get pub preview. Example – 665455 on Franck.
BZDATETIME::2010-02-09 01:37:21
BZCOMMENTOR::Volker Englisch
BZCOMMENT::80
(In reply to comment #79)
> (In reply to comment #78)
> It Works without a problem on Mahler (that is, when no Legacy
information
> present). However, on Franck, I am unable to get pub prev. to work
without the
> Legacy information.
It worked when the fix had been implemented on Feb. 1st. However, since then we had to have FRANCK refreshed to test OCECDR-3074 which reverted the filters back to the version on BACH.
BZDATETIME::2010-02-09 09:32:54
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::81
(In reply to comment #80)
> (In reply to comment #79)
> > (In reply to comment #78)
> > It Works without a problem on Mahler (that is, when no Legacy
information
> > present). However, on Franck, I am unable to get pub prev. to
work without the
> > Legacy information.
> It worked when the fix had been implemented on Feb. 1st. However,
since then
> we had to have FRANCK refreshed to test OCECDR-3074 which reverted
the filters
> back to the version on BACH.
You're right! I forgot about the recent Franck refresh. Please promote the fix on Mahler to Bach. Thanks!
BZDATETIME::2010-02-17 18:04:02
BZCOMMENTOR::Volker Englisch
BZCOMMENT::82
I've talked with Blair about this problem.
I had modified the filter to populate the ID field with the CDR-ID if a
LegacyID didn't exist, but Bob suggested checking with Blair whether we
could drop the ID element completely for those documents, which would
require a DTD change.
According to Blair the ID element is not used at all by Cancer.gov/Gatekeeper (but we don't know if this is true for the licensees). So we could possibly change the DTD and drop the ID element instead of artificially populating it.
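The two options under discussion (fall back to the CDR-ID, or drop the ID element altogether) can be compared with a small sketch. The function name and parameters are hypothetical and exist only for illustration; the real logic lives in the XSLT vendor filter.

```python
def vendor_id(legacy_id, cdr_id, drop_when_missing=False):
    """Pick the value for the ID element, or None to omit the element.

    legacy_id: the legacy ID if the record has one, else None or "".
    cdr_id: the numeric CDR-ID of the document.
    drop_when_missing: True models the "optional ID" DTD change.
    """
    if legacy_id:
        return legacy_id
    if drop_when_missing:
        return None  # element is omitted; requires ID to be optional
    return str(cdr_id)  # artificial fallback to the CDR-ID


print(vendor_id("12", 665455))
print(vendor_id(None, 665455))
print(vendor_id(None, 665455, drop_when_missing=True))
```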
BZDATETIME::2010-02-18 10:38:07
BZCOMMENTOR::Volker Englisch
BZCOMMENT::83
I have some notes from our last status meeting referring to missing elements from the vendor filter (TollFree Number, Service Limitation, Public=No).
William, could you please give me some more information on this?
BZDATETIME::2010-02-18 10:46:35
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::84
(In reply to comment #83)
> I have some notes from our last status meeting referring to missing
elements
> from the vendor filter (TollFree Number, Service Limitation,
Public=No).
>
> William, could you please give me some more information on
this?
Sure -
1. Toll free numbers don’t show up in publish preview
2. Phone numbers show up regardless of their attributes (Public or
not)
3. Publish preview displays emails regardless of attributes (Public or
not)
4. Service Limitation not displaying in publish preview - Example
664951
I know what your next question will be. Provide sample documents :-). I will provide one example for each of the above issues soon.
BZDATETIME::2010-02-18 10:51:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::85
(In reply to comment #84)
> 1. Toll free numbers don’t show up in publish preview
We don't display a TollFreeNumber per se in the vendor output, just a phone number so I'm guessing we need a rule of when to display a TollFreeNumber and when not to. Which phone number should be displayed if multiple ones exist?
> 2. Phone numbers show up regardless of their attributes (Public
or not)
> 3. Publish preview displays emails regardless of attributes (Public
or not)
> 4. Service Limitation not displaying in publish preview - Example
664951
>
> I know what your next question will be. Provide sample documents
:-). I will
> provide one example for each of the above issues soon.
No, I don't need an example for 1-3 and you gave me one for item (4).
BZDATETIME::2010-02-19 12:45:49
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::86
Another vendor filter change that we may have to make is for the use of PrivatePractice locations. Some of the addresses we are seeing are private practices and not organizations. However, it looks like you may need to make changes to the vendor filter to be able to publish these to cancer.gov.
BZDATETIME::2010-02-23 20:46:12
BZCOMMENTOR::Volker Englisch
BZCOMMENT::87
(In reply to comment #84)
> 4. Service Limitation not displaying in publish preview - Example
664951
Service Limitation doesn't appear to be an element in the DTD.
I'm not really sure what this field maps to?
I've fixed the Public=No problem for the SpecificPhone and SpecificEmail fields and I'm now including the TollFreePhone.
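The intended rule can be sketched as below: contact elements marked Public=No are suppressed, everything else passes through. This is a rough illustration, assuming `Public` is an attribute on the element and that a missing attribute means public. The sample values are made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; the wrapper element name is an assumption.
SAMPLE = """
<Location>
  <SpecificPhone Public="No">301-555-0100</SpecificPhone>
  <SpecificPhone>301-555-0101</SpecificPhone>
  <SpecificEmail Public="No">private@example.org</SpecificEmail>
  <TollFreePhone>800-555-0102</TollFreePhone>
</Location>
"""

CONTACT_TAGS = ("SpecificPhone", "TollFreePhone", "SpecificEmail")


def public_contacts(xml_text):
    """Return (tag, value) pairs for contacts not marked Public="No"."""
    root = ET.fromstring(xml_text)
    result = []
    for tag in CONTACT_TAGS:
        for elem in root.iter(tag):
            # A missing Public attribute is treated as public (assumption).
            if elem.get("Public") != "No":
                result.append((tag, (elem.text or "").strip()))
    return result


print(public_contacts(SAMPLE))
```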
BZDATETIME::2010-02-25 14:25:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::88
(In reply to comment #87)
> (In reply to comment #84)
> > 4. Service Limitation not displaying in publish preview -
Example 664951
>
> Service Limitation doesn't appear to be an element in the
DTD.
> I'm not really sure what this field maps to?
>
The Service Limitation element is referred to as "NOTES" in the DTD.
This matches the information in the record for CDR0000664951 and its
display on cancer.gov
http://www.cancer.gov/search/view_geneticspro.aspx?personid=556197
BZDATETIME::2010-02-25 15:09:19
BZCOMMENTOR::Volker Englisch
BZCOMMENT::89
(In reply to comment #84)
> 4. Service Limitation not displaying in publish preview - Example
664951
Fixed on MAHLER.
BZDATETIME::2010-02-25 17:42:08
BZCOMMENTOR::Volker Englisch
BZCOMMENT::90
(In reply to comment #86)
> Another vendor filter change that we may have to make is for the
use of
> PrivatePractice locations.
For the GP address the INSTITUTION field is mandatory but we don't
list an "institution" for the PrivatePractice.
How should we deal with this situation?
BZDATETIME::2010-02-26 13:01:01
BZCOMMENTOR::Volker Englisch
BZCOMMENT::91
I've modified the filter to allow PrivatePractice locations to be
displayed in the vendor output.
We still need to decide how to handle the mandatory Institution element
in the vendor output for those addresses.
BZDATETIME::2010-02-26 15:09:55
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::92
(In reply to comment #89)
> (In reply to comment #84)
> > 4. Service Limitation not displaying in publish preview -
Example 664951
>
> Fixed on MAHLER.
Please promote these changes to Bach as it is difficult to test with data on Mahler.
BZDATETIME::2010-02-26 18:42:24
BZCOMMENTOR::Volker Englisch
BZCOMMENT::93
(In reply to comment #92)
> Please promote these changes to Bach as it is difficult to test
with data on
> Mahler.
This is typically not a sufficient argument to test on a production
system.
However, the new changes only affect the GP documents which aren't being
published at the moment and the global module (CDR315588) had already
been promoted for OCECDR-2896 when it shouldn't have.
The following filter has been copied to BACH:
CDR559215 - R9505: Vendor Filter: GeneticsProfessional
BZDATETIME::2010-03-01 12:12:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::94
(In reply to comment #93)
> (In reply to comment #92)
> > Please promote these changes to Bach as it is difficult to
test with data on
> > Mahler.
>
> This is typically not a sufficient argument to test on a production
system.
> However, the new changes only affect the GP documents which aren't
being
> published at the moment and the global module (CDR315588) had
already been
> promoted for OCECDR-2896 when it shouldn't have.
>
> The following filter has been copied to BACH:
> CDR559215 - R9505: Vendor Filter: GeneticsProfessional
1. The emails issue has been solved
2. The Service Limitation issue has also been solved.
3. The private practice location is now displaying but phone numbers and
email addresses don’t display.
BZDATETIME::2010-03-02 15:47:31
BZCOMMENTOR::Volker Englisch
BZCOMMENT::95
(In reply to comment #94)
> 3. The private practice location is now displaying but phone
numbers and email
> addresses don’t display.
Could you give me a sample on BACH that I could look at? I tested it on MAHLER and the phone and email does display for the PrivatePractice location.
BZDATETIME::2010-03-02 15:50:23
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::96
(In reply to comment #95)
> (In reply to comment #94)
> > 3. The private practice location is now displaying but phone
numbers and email
> > addresses don’t display.
>
> Could you give me a sample on BACH that I could look at? I tested
it on MAHLER
> and the phone and email does display for the PrivatePractice
location.
Here is one - CDR0000664785
BZDATETIME::2010-03-02 17:36:11
BZCOMMENTOR::Volker Englisch
BZCOMMENT::97
The problem with the email and phone not showing for the
PrivatePractice locations has been fixed.
CDR559215 - R9505: Vendor Filter: GeneticsProfessional
This is ready for review on MAHLER.
BZDATETIME::2010-03-03 09:37:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::98
(In reply to comment #97)
> The problem with the email and phone not showing for the
PrivatePractice
> locations has been fixed.
> CDR559215 - R9505: Vendor Filter: GeneticsProfessional
>
> This is ready for review on MAHLER.
The phone number shows up but not the email address. Tested with 657322 on Mahler.
BZDATETIME::2010-03-03 09:42:32
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::99
(In reply to comment #98)
> (In reply to comment #97)
> > The problem with the email and phone not showing for the
PrivatePractice
> > locations has been fixed.
> > CDR559215 - R9505: Vendor Filter: GeneticsProfessional
> >
> > This is ready for review on MAHLER.
>
> The phone number shows up but not the email address. Tested with
657322 on
> Mahler.
I take it back. It shows up correctly. Sorry! The document had a validation error and once I fixed it, the email showed up. Please promote to Bach.
BZDATETIME::2010-03-04 15:11:07
BZCOMMENTOR::Volker Englisch
BZCOMMENT::100
Per discussion at today's status meeting, CIAT will enter a few new
GP Person documents on BACH and let me know when that has been
done.
The following day, after the nightly backup took a copy of the CDR, I
will refresh the CDR on FRANCK and run a publishing job to the
GatekeeperGK server.
The result of this publishing job should be very close to the final
documents to be published on BACH after the cleanup has been
completed.
BZDATETIME::2010-03-11 10:28:51
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::101
(In reply to comment #99)
> (In reply to comment #98)
> I take it back. It shows up correctly. Sorry! The document had a
validation
> error and once I fixed it, the email showed up. Please promote to
Bach.
Please promote the above changes to Bach.
Also, I just noticed that pub preview fails because of the presence of the SpecificWebSite element. Can you modify the software to ignore this element or modify it such that it does not cause pub preview to fail? I know we do not display the web site info on cancer.gov but when the conversion was done, all the records with web site data, were converted into the SpecificWebSite element and we need to keep the history.
BZDATETIME::2010-03-11 10:38:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::102
(In reply to comment #101)
> Also, I just noticed that pub preview fails because of the presence
of the
> SpecificWebSite element. Can you modify the software to ignore this
element or
> modify it such that it does not cause pub preview to fail?
I made the changes to include the SpecificWebSite element yesterday because Bob needed the information for the mailers but I haven't finished the change to remove the element again from the vendor output.
BZDATETIME::2010-03-12 13:50:36
BZCOMMENTOR::Volker Englisch
BZCOMMENT::103
I've made changes to remove the SpecificFax/WebSite from the vendor output and I'm in the process of running a publishing job on FRANCK to diff the output.
BZDATETIME::2010-03-15 09:58:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::104
1. For Canadian addresses (will be true for UK and Australia addresses also), the Provinces are shortened to two letters like US states, for example, ‘ON’ for Ontario. Because it is on the same line as the zip code it looks as if it is part of the zip code. For example:
Cambridge Memorial Hospital
700 Coronation Boulevard
Cambridge, ON N1R 3G2
Canada
e-mail:
I think it will be good to spell out the provinces completely.
2. Also, whenever a professional does not have an email, there is an empty 'e-mail' tag (like above). I think it would be good not to display the e-mail tag when an email address is not provided.
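The second request amounts to emitting the label only when there is a value to show. A minimal sketch, with a hypothetical `render_contact` helper standing in for the real display code:

```python
def render_contact(address_lines, email=None):
    """Join address lines, adding an e-mail line only if one is provided."""
    out = list(address_lines)
    if email and email.strip():
        out.append(f"e-mail: {email.strip()}")
    return "\n".join(out)


address = [
    "Cambridge Memorial Hospital",
    "700 Coronation Boulevard",
    "Cambridge, ON N1R 3G2",
    "Canada",
]

print(render_contact(address))
print(render_contact(address, "someone@example.org"))
```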
BZDATETIME::2010-03-15 12:09:16
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::105
We have finished the Gen. Prof. Cleanup so we can proceed with the next steps.
BZDATETIME::2010-03-15 12:10:09
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::106
(In reply to comment #105)
> We have finished the Gen. Prof. Cleanup so we can proceed with the
next steps.
I forgot to mention that I have entered 3 new applications/records as well.
BZDATETIME::2010-03-15 12:12:06
BZCOMMENTOR::Volker Englisch
BZCOMMENT::107
(In reply to comment #105)
> we can proceed with the next steps.
The next step was to wait until tomorrow so that we can use tonight's backup file from BACH to refresh the CDR database on FRANCK and then run a GP publishing job.
BZDATETIME::2010-03-15 13:25:25
BZCOMMENTOR::Volker Englisch
BZCOMMENT::108
(In reply to comment #104)
> 1.For Canadian addresses
[...]
>
> Cambridge Memorial Hospital
> 700 Coronation Boulevard
> Cambridge, ON N1R 3G2
> Canada
> e-mail:
>
> I think it will be good to spell out the provinces completely.
I've browsed the web and it appears there are three commonly used
versions of how to list the City/Province/Postal code for Canadian
addresses
a) The "American" version
Cambridge, ON N1R 3G2
b) The "technically correct" version
Cambridge
ON, N1R 3G2
c) The "I-don't-know-what-to-call-it" version
Cambridge, ON
N1R 3G2
None of these versions that I've found on the Internet have the province
spelled out.
In addition, all of our addresses are created using a global
template. This means that any change to the format of the address block
will be reflected on any other address throughout the CDR.
For a change of this magnitude I would prefer for Margaret or Lakshmi to
comment.
As an alternative I could create a PostalAddress template that's only used by GP documents.
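To make the three layouts easier to compare, here is a small sketch that produces each of them from the same parts. The function name and the `style` keys are made up for the example. In the CDR this formatting is done by the global address template, not by Python.

```python
def format_city_line(city, province, postal_code, style="a"):
    """Return the city/province/postal code as a list of output lines.

    style "a": City, PR POSTAL         (the "American" version)
    style "b": City / PR, POSTAL       (the "technically correct" version)
    style "c": City, PR / POSTAL
    """
    if style == "a":
        return [f"{city}, {province} {postal_code}"]
    if style == "b":
        return [city, f"{province}, {postal_code}"]
    if style == "c":
        return [f"{city}, {province}", postal_code]
    raise ValueError(f"unknown style: {style!r}")


for style in ("a", "b", "c"):
    print(format_city_line("Cambridge", "ON", "N1R 3G2", style))
```

A GP-only PostalAddress template would correspond to choosing one of these styles for GP documents while leaving the global template untouched.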
BZDATETIME::2010-03-15 13:57:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::109
I just wanted to add that we are currently spelling it out on cancer.gov
http://www.cancer.gov/search/view_geneticspro.aspx?personid=556013
But when we publish them anew, they won't be spelled out anymore.
BZDATETIME::2010-03-15 14:07:18
BZCOMMENTOR::Volker Englisch
BZCOMMENT::110
(In reply to comment #109)
> I just wanted to add that we are currently spelling it out on
cancer.gov
That's correct. We also spell the city in capital letters and we won't be doing that anymore.
As I said, I would like Lakshmi and Margaret to give guidance.
My personal preference is for consistency across all document types.
By the way, I checked with a Canadian native and she said she would
format the address this way:
Ms. John Doe
President
123 E. Kensington St
North Vancouver, BC
V17 1P2
CANADA
BZDATETIME::2010-03-15 14:08:40
BZCOMMENTOR::Volker Englisch
BZCOMMENT::111
(In reply to comment #104)
> 2.Also, whenever a professional does not have an email, there is an
empty
> 'e-mail ' tag (like above).
This has been fixed on MAHLER.
BZDATETIME::2010-03-15 14:52:55
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::112
(In reply to comment #111)
> (In reply to comment #104)
> > 2.Also, whenever a professional does not have an email, there
is an empty
> > 'e-mail ' tag (like above).
>
> This has been fixed on MAHLER.
Verified on Mahler. Please promote to Bach.
BZDATETIME::2010-03-15 15:41:15
BZCOMMENTOR::Volker Englisch
BZCOMMENT::113
The following filters have been copied to FRANCK and BACH:
CDR315588 - R9524: Module: Vendor Cleanup Templates
CDR559215 - R9524: Vendor Filter: GeneticsProfessional
I ran a diff report before on FRANCK and the only changes identified were the 72 term documents that were affected by the filter change in OCECDR-3102.
This is ready to be verified on BACH.
BZDATETIME::2010-03-16 16:22:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::114
(In reply to comment #113)
> The following filters have been copied to FRANCK and BACH:
> CDR315588 - R9524: Module: Vendor Cleanup Templates
> CDR559215 - R9524: Vendor Filter: GeneticsProfessional
>
> I ran a diff report before on FRANCK and the only changes
identified were the
> 72 term documents that were effected by the filter change in
OCECDR-3102.
>
> This is ready to be verified on BACH.
Do you mean, I should verify the email changes reported in comment 112?
BZDATETIME::2010-03-17 10:12:03
BZCOMMENTOR::Volker Englisch
BZCOMMENT::115
At this point you can probably just wait until the publishing to GatekeeperGK has finished at the end of the day and preview the result on the Cancer.gov test server.
BZDATETIME::2010-03-18 10:32:18
BZCOMMENTOR::Volker Englisch
BZCOMMENT::116
The latest GP data has been loaded to the GatekeeperGK test server
and is ready for review:
http://wwwgk.cancer.gov/search/geneticsservices/
I noticed that several GPs are listed double on the GK server. This
is due to the fact that we've submitted the GPs twice with different
CDR-IDs. Once from MAHLER (or old FRANCK) and now once from FRANCK
(refreshed data from BACH) with a different CDR-ID.
One can identify the latest one to be QC'ed as follows:
After searching for a name hover the mouse over the link to the person.
The link will display a 'personid'. The link with the higher personid
(a.k.a. CDR-ID) is the one most recently pushed to the GK server.
Example:
For Mary J. Ahrens there exist two links
http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=663221
http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=664784
The second one represents the document loaded last night and should
resemble the data of this Person's record in the CDR.
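For anyone comparing many duplicates, the "higher personid wins" rule can be applied mechanically. The sketch below only parses the `personid` query parameter from the links; the `latest_link` helper is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def person_id(url):
    """Extract the numeric personid query parameter from a GK link."""
    return int(parse_qs(urlparse(url).query)["personid"][0])


def latest_link(urls):
    """Return the link with the highest personid (most recently pushed)."""
    return max(urls, key=person_id)


links = [
    "http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=663221",
    "http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=664784",
]

print(latest_link(links))
```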
BZDATETIME::2010-03-19 14:59:05
BZCOMMENTOR::Volker Englisch
BZCOMMENT::117
I've submitted the latest Cancer.gov DTD to Blair. This DTD makes the ID element optional.
Attachment pdqCG.dtd has been added with description: Cancer.gov DTD
BZDATETIME::2010-03-23 11:07:38
BZCOMMENTOR::Volker Englisch
BZCOMMENT::118
FYI:
There were three new GP documents (not converted) for which we deleted
the LegacyGeneticsData block. Those documents were successfully
published to the GatekeeperGK test system.
BZDATETIME::2010-03-23 11:29:12
BZCOMMENTOR::Volker Englisch
BZCOMMENT::119
Please note that we haven't answered the question of comment #90 on what to display for the mandatory INSTITUTION element in the case of a PrivatePractice location.
BZDATETIME::2010-03-23 12:35:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::120
(In reply to comment #116)
> The latest GP data has been loaded to the GatekeeperGK test server
and is ready
> for review:
> http://wwwgk.cancer.gov/search/geneticsservices/
> the data of this Person's record in the CDR.
We have finished testing. We did not see any problems with the data so we are good to go.
There are two minor issues that we can address later:
1. A few records (probably no more than 2) have values in the <PersonTitle> element but this is not captured in the vendor filter. For example : 3766 & 664946.
2. There seems to be a slight discrepancy between pub preview and
what is on test site for at least one Canadian record- CDR0000664881.
The email and phone number are both public in the record. However, in
pub preview, the email does not display but the email and phone number
correctly display on the test site.
http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=664881
BZDATETIME::2010-03-24 10:25:25
BZCOMMENTOR::Volker Englisch
BZCOMMENT::121
(In reply to comment #120)
> 1. A few records (probably no more than 2) have values in the <PersonTitle>
> element but this is not captured in the vendor filter. For example : 3766 &
> 664946.
For CDR3766 I don't see the PersonTitle listed in the record on
Cancer.gov or wwwGK.cancer.gov.
For CDR664946 a PersonTitle does exist but it had been included as part
of the address block since the DTD doesn't have any element for a person
title.
Is this what we would want to do with the PersonTitle element: stuff
it into the address block?
> 2. There seems to be slight a discrepancy between pub preview and what is on
> test site for at least one Canadian record- CDR0000664881. The email and phone
> number are both public in the record. However, in pub preview, the email does
> not display but the email and phone number correctly display on the test site.
> http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=664881
Did you run PP on BACH or on FRANCK? The data on the wwwGK test site
has been created from FRANCK. Also, the vendor filters on FRANCK are not
up-to-date.
You should only compare the data on the test site with the data/reports
on FRANCK.
BZDATETIME::2010-03-24 11:14:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::122
(In reply to comment #121)
> (In reply to comment #120)
> > 1. A few records (probably no more than 2) have values in the <PersonTitle>
> > element but this is not captured in the vendor filter. For example : 3766 &
> > 664946.
>
> For CDR3766 I don't see the PersonTitle listed in the record on Cancer.gov or
> wwwGK.cancer.gov.
> For CDR664946 a PersonTitle does exist but it had been included as part of the
> address block since the DTD doesn't have any element for a person title.
> Is this what we would want to do with the PersonTitle element to stuff it into
> the address block?
>
Yes. We will want it displayed before the address information (just as
it is in the CDR). We can discuss this later; as I said, only about 2
records were affected, so I can include this in any vendor filter changes
we make in the future.
> > 2. There seems to be slight a discrepancy between pub preview and what is on
> > test site for at least one Canadian record- CDR0000664881. The email and phone
> > number are both public in the record. However, in pub preview, the email does
> > not display but the email and phone number correctly display on the test site.
> > http://wwwgk.cancer.gov/search/view_geneticspro.aspx?personid=664881
>
> Did you run PP on BACH or on FRANCK? The data on the wwwGK test site has been
> created from FRANCK. Also, the vendor filters on FRANCK are not up-to-date.
> You should only compare the data on the test site with the data/reports on
> FRANCK.
You're right. When I compared with pub preview on FRANCK, there was no discrepancy.
BZDATETIME::2010-03-24 13:47:10
BZCOMMENTOR::Volker Englisch
BZCOMMENT::123
It looks like everything is ready to go for the publishing of the GeneticsProfessional Persons on BACH.
We thought it might be good to go over the steps to be done during our status meeting and then publish the GP documents after the nightly publishing has finished on Thursday.
BZDATETIME::2010-03-25 19:12:21
BZCOMMENTOR::Volker Englisch
BZCOMMENT::124
FYI:
Regarding our discussion this afternoon about whether it's possible to
process the GP update and the GP removal of the old documents at the
same time: this is actually not possible. The publishing software
doesn't allow updates and removals to be processed at the same time
when processing a single document type.
Removals and updates are processed as part of the Export, where the
removals are indicated by documents that are blocked.
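A minimal sketch of the split described above, assuming each export candidate carries a blocked flag. The `Doc` class, its fields, the `split_export` function, and the document IDs are hypothetical illustrations and are not taken from cdrpub.py.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a CDR document selected for export."""
    cdr_id: int
    blocked: bool  # assumption: a blocked document signals a removal


def split_export(docs):
    """Partition export candidates into updates and removals.

    Blocked documents are treated as removals; all others are updates.
    """
    updates = [d.cdr_id for d in docs if not d.blocked]
    removals = [d.cdr_id for d in docs if d.blocked]
    return updates, removals


# Illustrative IDs only; these are not real CDR documents.
docs = [Doc(1001, False), Doc(1002, True), Doc(1003, False)]
updates, removals = split_export(docs)
```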
BZDATETIME::2010-03-25 21:08:45
BZCOMMENTOR::Volker Englisch
BZCOMMENT::125
The following filter has been copied to BACH:
CDR559215 - R9553: Vendor Filter: GeneticsProfessional
The modified publishing document has been stored in the CDR. This
document allows the Person document type to be published as
GeneticsProfessional documents:
CDR178 - V51
The following program has been copied to BACH to allow Person
documents (which were formerly suppressed) to be pushed to Cancer.gov as
GP documents:
cdrpub.py - R9458
I ran a Hotfix-Remove request for the old 535 documents and stopped
publishing on Gatekeeper at the Preview stage.
Then I ran the Export-GeneticsProfessional publishing job to create the
new 538 Person/GP documents and also stopped publishing on Gatekeeper at
the Preview stage. Once the results of both jobs were verified, I manually
pushed the remove request from the preview site to the live site and
then submitted the load of the new documents. There was a period of
about 5-8 minutes between these two jobs when no GP documents were
available on Cancer.gov.
Everything worked without a problem but I noticed that the GP names
are missing a space between the middle initial and the last name of a
person's name.
This had already been fixed on the Gatekeeper test server and I asked
Blair and Mini to have this change moved to production.
We should leave this issue open until the weekly publishing job has finished properly.
BZDATETIME::2010-03-26 14:14:11
BZCOMMENTOR::Volker Englisch
BZCOMMENT::126
(In reply to comment #125)
> Everything worked without a problem but I noticed that the GP names are missing
> a space between the middle initial and the last name of a person's name.
FYI: The spacing problem on Cancer.gov has been fixed.
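The Cancer.gov display code is not shown in this issue, so the following is only a sketch of how this class of spacing bug is usually avoided: join the name parts with a single separator instead of concatenating them. The `format_name` function and the sample name are hypothetical.

```python
def format_name(first, middle_initial, last):
    """Join name parts with single spaces, skipping missing or blank parts."""
    parts = [first, middle_initial, last]
    return " ".join(p.strip() for p in parts if p and p.strip())


# Made-up name for illustration.
full = format_name("Jane", "Q.", "Doe")
no_middle = format_name("Jane", None, "Doe")
```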
BZDATETIME::2010-04-01 10:06:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::127
I have looked at a lot of the published documents on cancer.gov and did not see any problems. However, when searching by the Family Cancer menu items only, some of them do not retrieve any documents, even though I am sure there are documents that have been assigned these syndromes.
Does this have to do with the way the term documents were set up? For example, on cancer.gov when you select Adenomatous polyposis, no documents are retrieved, while the name of the syndrome in the CDR is familial adenomatous polyposis (CDR0000042839) and the display name is Adenomatous polyposis, familial. The cancer type items appear to work fine.
BZDATETIME::2010-04-01 10:12:17
BZCOMMENTOR::Volker Englisch
BZCOMMENT::128
Is this something that was working with the test load on GatekeeperGK?
BZDATETIME::2010-04-01 10:18:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::129
(In reply to comment #128)
> Is this something that was working with the test load on GatekeeperGK?
I do not remember testing this on GatekeeperGK. I concentrated more on data problems on GK and only searched by the names of the professionals.
BZDATETIME::2010-04-01 16:55:57
BZCOMMENTOR::Volker Englisch
BZCOMMENT::130
The latest problem with the search by syndrome names (when the names
had changed) has been resolved on Cancer.gov.
There is a table built on Gatekeeper that includes the new names,
but that table did not update a similar table on Cancer.gov. The update
has been performed manually by Min.
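A rough sketch of a check that could flag this kind of drift, assuming the syndrome names from both tables can be read into plain lists. The `missing_names` function is hypothetical, and treating the two names from comment #127 as the table contents is an assumption made only for illustration.

```python
def missing_names(gatekeeper_names, cancergov_names):
    """Return names found in the Gatekeeper list but not in the Cancer.gov list."""
    return sorted(set(gatekeeper_names) - set(cancergov_names))


# Assumed contents, based on the example in comment #127.
gatekeeper = ["Adenomatous polyposis, familial"]
cancergov = ["Adenomatous polyposis"]
stale = missing_names(gatekeeper, cancergov)
```

A non-empty result would indicate that the Cancer.gov table still needs the update.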
BZDATETIME::2010-04-06 13:26:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::131
(In reply to comment #130)
> The latest problem with the search by syndrome names (when the names had
> changed) has been resolved on Cancer.gov.
> There was a table that is build on Gatekeeper including the new names but that
> table did not update a similar table on Cancer.gov. The update has been
> performed manually by Min.
I checked this on cancer.gov and everything seems to be working well. It looks like we can close this issue, right?
BZDATETIME::2010-04-06 13:47:51
BZCOMMENTOR::Volker Englisch
BZCOMMENT::132
Margaret noticed one more minor problem with the display on Cancer.gov:
After the Additional Information label at the end of some records there is no space between that label and the text (see Daly, Mary as an example).
I'll have to report this to Blair to get it fixed, but I don't know at
this point whether there should be a space or a newline after the
heading.
Do you know?
BZDATETIME::2010-04-06 14:19:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::133
(In reply to comment #132)
> Margaret noticed one more minor problem with the display on Cancer.gov:
>
> After the Additional Information at the end of some records there is no space
> between that label and the text. (see Daly, Mary as an example).
>
> I'll have to report this to Blair to get fixed but I don't know at this point
> if there should be a space or a newline after the heading.
> Do you know?
There's supposed to be a space. We have copies of some of the documents before conversion.
BZDATETIME::2010-04-08 15:58:05
BZCOMMENTOR::Volker Englisch
BZCOMMENT::134
All vendor filter changes have been taken care of.
The spacing issue will be addressed by the Cancer.gov team.
Closing issue.
File Name | Posted | User |
---|---|---|
CDR556157.xml | 2009-11-23 16:57:23 | Englisch, Volker (NIH/NCI) [C] |
CDR828.xml | 2009-11-23 16:58:40 | Englisch, Volker (NIH/NCI) [C] |
GenProf_19859_Vendor.xml | 2009-11-12 15:35:43 | Englisch, Volker (NIH/NCI) [C] |
pdqCG.dtd | 2010-03-19 14:59:05 | Englisch, Volker (NIH/NCI) [C] |