Issue Number | 2847 |
---|---|
Summary | [Genetics Directory] Bring Genetics Directory into the CDR |
Created | 2009-03-05 15:33:41 |
Issue Type | Improvement |
Submitted By | Beckwith, Margaret (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2010-02-03 21:29:06 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.107175 |
BZISSUE::4522
BZDATETIME::2009-03-05 15:33:41
BZCREATOR::Margaret Beckwith
BZASSIGNEE::Bob Kline
BZQACONTACT::Lakshmi Grama
The Genetics Professionals Directory is currently maintained on a system at Lockheed Martin. We have kept it there because the ability to send electronic mailers for updating the records is part of the system. Now that we have done several types of electronic mailers, and given the possible contract changes, we would like to bring the maintenance of those records into the CDR. This is a placeholder issue to get the discussion started.
BZDATETIME::2009-03-05 15:59:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::1
Can you post any documentation you have for the existing system (including source code, if that's available)?
BZDATETIME::2009-03-09 15:28:00
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::2
I believe that all got posted when we discussed this previously and decided to just pass the file from CIAT through the CDR for publishing on cancer.gov because it would have required major changes on the Cancer.gov end. So we should already have it.
BZDATETIME::2009-03-11 12:48:04
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::3
Margaret has requested the source code, system documentation, design documentation and user documentation from CIAT. CIAT will provide these documents as soon as possible.
BZDATETIME::2009-03-12 09:38:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::4
This document contains screen shots of the Genetics Professional Database.
Attachment Screenshots (2).doc has been added with description: Screenshots of Genetics Professional Database
BZDATETIME::2009-03-12 09:44:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::5
This document is labeled diagram. It has many MS Access files/documents which appear to be names of elements in the database. I could not open them on my computer so I don't know for sure what they are. I will ask the programmer who designed it for additional information if you need it.
Attachment Diagram.mdb has been added with description: Diagram
BZDATETIME::2009-03-12 09:46:47
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::6
This document provides a brief history of the Genetics Professional Database
Attachment CGDir History.doc has been added with description: History of the Genetics Professional Database
BZDATETIME::2009-03-16 11:30:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::7
A helpful next step would be to take a look at a sample mailer and to start the process of establishing the specifications for the electronic mailers which should be generated from the CDR for these documents.
BZDATETIME::2009-04-23 14:15:36
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::8
I have attached a sample paper mailer. The page numbers do not appear to be in order after page 3. We will look into this and fix it.
Attachment Paper_Mailer.pdf has been added with description: Sample Paper Mailer
BZDATETIME::2009-04-23 14:20:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::9
I have attached a sample web/electronic mailer form printed from the link we send to the professionals.
Attachment Web_Mailer_Form.pdf has been added with description: Web Mailer Form
BZDATETIME::2009-04-23 14:23:12
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::10
This is the email with link to the electronic form that is sent to the professionals.
Attachment Web_Mailer_Email.pdf has been added with description: Email Sent to professionals
BZDATETIME::2009-04-23 14:25:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::11
This attachment shows what is received at CIAT after the professional submits the form.
Attachment Web Mailer_Submitted_Output_Email.pdf has been added with description: Received information from electronic submission
BZDATETIME::2009-05-12 11:03:46
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::12
Upped priority to P4.
BZDATETIME::2009-05-26 13:55:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::13
Here is a high-level description of the next steps for this task:
1. Analyze the table structures currently used to store the GP data and determine what information needs to be captured which is not currently being represented in the XML documents exported from the CGSD to the CDR.
2. Determine what changes need to be made to the existing document type in the CDR, based on the analysis in step 1, as well as stylistic conventions (e.g., all of the element names in the GENETICSPROFESSIONAL document type are uppercase), and opportunities for closer integration with the other document types in the CDR (e.g., terminology, persons, organizations, geographic information, etc.). Part of this step will involve a decision about whether the Person document type should be enhanced so that it could be used to carry the GP information, or a separate document type be kept for the GP documents.
3. Create the new schema, along with CSS and other configuration files (e.g., templates, customization) for XMetaL.
4. Implement the software to import the GP documents into the CDR.
5. Decide whether to keep both mailer types for the GP documents (electronic and paper). If yes, design and implement the paper mailer software.
6. Decide which platform and technologies will be used to host the web interface for the electronic mailers. If this component is subject to the directive to build all new subsystems using Microsoft's ASP.Net, then we would probably have to host the electronic mailers on a Windows server. We may need to do an analysis which estimates what proportion of the code for these mailers would be common to the electronic S&P mailers and what part would be new code.
7. Design and implement the GP electronic mailer subsystem.
8. Review the screen shots attached by William to the issue, extracting requirements implied by the functionality illustrated there. Use the results of this step to come up with a set of reports and other features which need to be implemented.
What have I left out?
BZDATETIME::2009-05-26 15:54:44
BZCOMMENTOR::Volker Englisch
BZCOMMENT::14
(In reply to comment #13)
> What have I left out?
If we're modifying the schema (i.e., switching element names to mixed case), we'll need to modify the licensee DTD and make a few changes on the CIPSFTP server that processes the licensee data. And if we're including the GeneticsProfessionals data in the Person schema, Cancer.gov will have to start receiving and processing Person documents again.
BZDATETIME::2009-06-29 13:55:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::15
Attachment gp-conv.html has been added with description: Proposal for data conversion, mapping
BZDATETIME::2009-06-30 09:52:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::16
BZDATETIME::2009-07-20 12:09:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::17
(In reply to comment #15)
> Created an attachment (id=1734) [details]
> Proposal for data conversion, mapping
We will be meeting Thursday morning to go over the proposal. Did you say Jonathan was interested in participating, Lakshmi? Or did he just want to be kept in the loop with progress reports?
BZDATETIME::2009-07-21 10:43:20
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::18
I want to have some initial discussions so Margaret and I can understand and ask questions. Once we have a good handle on the proposal, we will include Jonathan and Reza to discuss this and the Citation Management System next steps.
BZDATETIME::2009-07-31 11:06:38
BZCOMMENTOR::Bob Kline
BZCOMMENT::19
Attachment UniqueGPLocations.txt has been added with description: Unique locations in genprof DB
BZDATETIME::2009-07-31 11:11:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::20
Attachment UniqueGPLocations.xls has been added with description: Unique location blocks in Excel sheet
BZDATETIME::2009-07-31 11:19:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::21
Margaret:
Please give me a call at 703.979.6216 so we can wrap up our walk-through of the analysis paper.
Bob
BZDATETIME::2009-08-06 11:32:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::22
In discussions with Margaret it was decided that the next step would be to extract all the unique location blocks from the genprof tables and match them up with org locations in the CDR. After examining the data I have come to the conclusion that the most effective way to convert the location information would be to have the postal address block and website stored in the organization document and the phone, fax, and email information stored in the Person document (in the Specific___ elements of the OtherPracticeLocation block). I have written software to extract the location information from the genprof tables (postal addresses and website urls, not the phone, fax, and email info) and create an Excel worksheet. Interspersed with the rows for the unique location blocks are matches I was able to find on Bach with organizations of the same name, with a row for each location block in those organizations. I've added a column at the end ("Map To"), which for the CDR location blocks I have populated with the CDR ID for the organization and fragment ID for the location within the organization document. Although Lakshmi thought that most of the organizations in the database would not be found in the CDR, my inspection of a small sampling seems to indicate that we will find most of the orgs in the GP database are already in the CDR.
I will need someone at CIAT to go through and identify these organizations by putting the CDR ID for the organization document in the "Map To" column. If the organization has a location block which matches the address and website for the row (or, more accurately, if the address and website in the location block would be correct for the GP location), then the fragment identifier should be appended to the CDR ID, separated by a forward slash. Otherwise, just enter the CDR ID and a new location block will be added to the Organization document with the information from the GP location block. If no CDR document exists for the GP location row, then a new document can be created by CIAT, or the string "New" can be put in the "Map To" column and the software will create a new organization document. For the Canadian rows, the address information is scrambled (it looks like the developers didn't anticipate non-US addresses), so it would be best to have CIAT create new Organization documents for those. It would also be best to have the organization documents created for the organizations which are represented by multiple rows, with different address information for the same organization. Otherwise we'd need to come up with some way to indicate in the spreadsheet that only a single Organization document should be created for multiple GP rows, and we'd have to come up with a way to map variants which should be represented by a single location block in the organization document.
The intention is to have the conversion software use the information stored in the edited worksheet, so it's important to preserve the existing information in the rows other than the "Map To" column, and to use the format described here for the information entered into the last column. Feel free to use the column after the "Map To" column for your own internal notes as you're working on this (the software will ignore that column). Also, color in the worksheet cells carries no meaning usable by the software (it's just there to make the sheet easier to work with).
I've done the first few rows as examples. For the first row, even though the organization name didn't match the preferred name for the organization document, I found that CDR27544 was the same organization, and the address matched exactly, so I put CDR27544/F1 in the "Map To" column. The next row had an Organization document which looked like the same organization, but with a different street address. For the third row I found some orgs in Lafayette, and one of them was even named "Acadiana ..." (and might be related to the org in the GP row) but none of the addresses matched, so I put "New" in the "Map To" column. For the fourth row I found an exact match with CDR31900/F1, so that's what I put in "Map To" (using a different color since the organization document was blocked).
Let me know if you have any questions. Keep in mind that we're working against a tight deadline for getting the conversion done, so this task has a higher priority.
I expect you'll find some anomalies as we go along. For example, I noticed that CDR585150/_2 has Baystate Medical Center in Florida. Florida isn't the Bay State. That's one case where the GP database has the correct information and the CDR the incorrect information. There will probably be instances where the roles are reversed.
Be sure to do any data cleanup and creation of new Organization documents in Bach; we'll clone Bach to Franck when we start testing.
If you think the genprof database is volatile enough, with frequent changes and additions to the data, let me know and we can talk about getting me a fresher copy of the DB and having me generate the worksheet again.
Attachment GPLocs.xls has been added with description: Orgs to be mapped
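The "Map To" conventions described in comment 22 (a CDR ID alone, a CDR ID plus fragment ID separated by a forward slash, the string "New", or an empty cell) can be illustrated with a minimal parsing sketch. This is not the conversion software itself; the names `MapTo` and `parse_map_to` are hypothetical, and the assumption that IDs are written as `CDR` followed by digits is taken only from the examples in the comment.

```python
import re
from typing import NamedTuple, Optional


class MapTo(NamedTuple):
    """Parsed content of one "Map To" cell (hypothetical structure)."""
    action: str                       # "link", "new", or "unmapped"
    cdr_id: Optional[int] = None      # numeric part of the CDR ID
    fragment_id: Optional[str] = None  # e.g. "F1" or "_2", if given


# Assumed cell format: "CDR27544" or "CDR27544/F1" (leading zeros tolerated).
_CELL = re.compile(r"^CDR0*(\d+)(?:/(\S+))?$", re.IGNORECASE)


def parse_map_to(value):
    """Interpret a "Map To" cell according to the rules in comment 22."""
    text = (value or "").strip()
    if not text:
        # Nothing entered yet: the row still needs a decision.
        return MapTo("unmapped")
    if text.lower() == "new":
        # The software is asked to create a new Organization document.
        return MapTo("new")
    match = _CELL.match(text)
    if not match:
        raise ValueError(f"unrecognized Map To value: {value!r}")
    # Fragment is None when only the organization was identified.
    return MapTo("link", int(match.group(1)), match.group(2))
```

With this sketch, `parse_map_to("CDR27544/F1")` yields a link to document 27544 with fragment `F1`, while `parse_map_to("CDR31900")` yields a link with no fragment, meaning a new location block would be added to that Organization document.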
BZDATETIME::2009-08-07 10:50:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::23
Bob,
We have started identifying the organizations and mapping them according
to the instructions you provided. I know this is a high priority task
but do you have a deadline for completing this task?
BZDATETIME::2009-08-07 11:30:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::24
We need to have the conversion completed by September 30.
BZDATETIME::2009-08-07 11:56:46
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::25
(In reply to comment #24)
> We need to have the conversion completed by September 30.
I am aware of that. I was asking when you need the spreadsheet back from CIAT to continue doing what you are doing.
Also, some of the addresses are not true organization addresses. They are for departments and faculties and are specific to the person. By converting the data this way, it looks like some of these department addresses may end up becoming the organization addresses. Would this be a problem ?
BZDATETIME::2009-08-07 14:53:10
BZCOMMENTOR::Bob Kline
BZCOMMENT::26
(In reply to comment #25)
> (In reply to comment #24)
> > We need to have the conversion completed by September 30.
>
> I am aware of that. I was asking when you need the spreadsheet back from CIAT to continue doing what you are doing.
The 30th is the only firm deadline. Obviously the later I get the results for the org mapping the more frantic scrambling I'll have to do on my end to get the rest done, but that's my problem. :-) The sooner we get the results of this pass the more opportunity we'll have for taking additional passes using fresher copies of the GP database. My plan is to use the information in the marked-up worksheet plus the fresher data (both in the GP database and in Bach) to produce a new sheet.
> Also, some of the addresses are not true organization addresses. They are for departments and faculties and are specific to the person. By converting the data this way, it looks like some of these department addresses may end up becoming the organization addresses. Would this be a problem ?
My first question is: would an entire department or faculty really be specific to only a single individual person?
Next question is: what are the business rules CIAT uses to determine whether an entity needs to have its own Organization document? Can you give me representative examples (preferably drawn from the GP data) of cases where that would and would not be true?
After we nail down the answers to those questions we can talk about the best way to make sure the mapping comes out the way you think it should. For the really tricky cases, the best solution is probably to get the org/location information into the CDR the way you want it, and to plug the resulting doc ID/fragment ID pair into the worksheet. Does this make sense?
BZDATETIME::2009-08-10 12:22:38
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::27
(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #24)
> My first question is: would an entire department or faculty really be specific to only a single individual person?
This is usually not the case in an institutional setting. But in the
database, this is a possibility and it is usually the case that when we
have a specific address (As against a general Organization address,
which usually would not include departments etc.), it is for an
individual in the database who wants mail sent to that specific
address.
> Next question is: what are the business rules CIAT uses to determine whether an entity needs to have its own Organization document? Can you give me representative examples (preferably drawn from the GP data) of cases where that would and would not be true?
I think the mapping takes care of most of the rules we have in creating
new organizations in the CDR. However, it is just the issue of including
what may be specific person address information in the organization
record that appears not to be consistent with the rules.
Here are the business rules with some examples:
1. When we have the same organization name in the same city but different zip codes [records 256 & 255 on the spreadsheet]. In this case, the practice has been to create a new organization record for each record. Assuming these records are already in the CDR, we would map them to separate organization records and that would be consistent with the way we create organization records. And if they are not in the CDR, we would enter 'New' for both and they would be created as new organization records.
2. When we have differences in organization names but the same address information [records 240 & 905 on the spreadsheet]. Just like #1, this would not create any problems in terms of the mapping.
3. When we have the same organizations but different location addresses. Same as above. This would not create any mapping problems.
Typically, organization addresses would not include suite numbers, room numbers, divisions, faculties and departments. However, because the genetic professional database did not have a separate organization document type, all this information is included in the spreadsheet as if it were organization address information, but some of it may be specific to the individual person record. Converting this data into the organization record will mean that, in the future, we will have to maintain the specific person information in the organization record.
BZDATETIME::2009-08-10 15:24:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::28
(In reply to comment #27)
> ... a general Organization address, which usually would not include departments etc. ....
Would it be safe to say that this rule is not applied consistently? I see plenty of examples in active Organization documents in the CDR in which 'Department' is part of the name of the organization or (even more common) one of its addresses. Here are some examples: CDR28003, CDR346510, CDR378030, CDR482216, CDR35250, CDR304650, CDR28422, CDR27276, CDR330185, CDR28888, CDR647448, CDR29700, CDR36562, etc.
> Here are the business rules with some examples [all calling for creation of separate Organization documents]: ....
Again, would I be right in my observation that these rules are sometimes followed, and sometimes ignored? See, for example, the spreadsheet at http://bach.nci.nih.gov/MultiLocationOrgs.xls showing many examples of variations in address or even departmental structure of the organization, which – according to these rules – should have required creation of separate Organization documents.
> Typically, organization addresses would not include suite numbers, room numbers, divisions, faculties and departments. However, because the genetic professional database did not have a separate organization document type, all this information are included in the spreadsheet as if they are organization addresses but some of them may be specific to the individual person record. Converting this data into the organization record will mean that, in the future, we will have to maintain the specific person information in the organization record.
All right, then how about if we use the SpecificPostalAddress element (and SpecificWebSite where needed) in every case for which CIAT doesn't give us a fragment ID for an existing organization location. Would there still be cases which would not be converted correctly?
Margaret:
Hope you had a very enjoyable vacation! Please feel free to weigh in on this discussion now that you're back.
BZDATETIME::2009-08-10 16:18:25
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::29
(In reply to comment #28)
> (In reply to comment #27)
> > ... a general Organization address, which usually would not include departments etc. ....
> Would it be safe to say that this rule is not applied consistently? I see plenty of examples in active Organization documents in the CDR in which 'Department' is part of the name of the organization or (even more common) one of its addresses. Here are some examples: CDR28003, CDR346510, CDR378030, CDR482216, CDR35250, CDR304650, CDR28422, CDR27276, CDR330185, CDR28888, CDR647448, CDR29700, CDR36562, etc.
For relatively older records in the CDR, there are likely to be more records that do not follow the rules. For relatively newer records, there should be few exceptions. It all depends on when CIAT received directions from OCCM about this. On the other hand some of the records may be due to errors on the part of users but I also saw at least one of the records above which is a duplicate record, in which case it has been inactivated.
> > Here are the business rules with some examples [all calling for creation of separate Organization documents]: ....
> Again, would I be right in my observation that these rules are sometimes followed, and sometimes ignored? See, for example, the spreadsheet at http://bach.nci.nih.gov/MultiLocationOrgs.xls showing many examples of variations in address or even departmental structure of the organization, which – according to these rules – should have required creation of separate Organization documents.
If I am right, the records in the link/document above are from the Genetics Professional Database. However, in the Genetics professional database, I don't think the rules are followed to the letter because there is no separate organization document type.
> All right, then how about if we use the SpecificPostalAddress element (and SpecificWebSite where needed) in every case for which CIAT doesn't give us a fragment ID for an existing organization location. Would there still be cases which would not be converted correctly?
I believe this should be fine.
BZDATETIME::2009-08-10 16:57:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::30
(In reply to comment #29)
> On the other hand some of the records may be due to errors on the part of users but I also saw at least one of the records above which is a duplicate record, in which case it has been inactivated.
All of the documents in the examples I gave you have 'A' in the active_status column of the all_docs table.
> If I am right, the records in the link/document above are from the Genetics Professional Database. However, in the Genetics professional database, I don't think the rules are followed to the letter because there is no separate organization document type.
All of the information in that spreadsheet is from Organization documents in the CDR, not rows in the GP database.
BZDATETIME::2009-08-10 17:36:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::31
(In reply to comment #30)
> (In reply to comment #29)
> > On the other hand some of the records may be due to errors on the part of users but I also saw at least one of the records above which is a duplicate record, in which case it has been inactivated.
> All of the documents in the examples I gave you have 'A' in the active_status column of the all_docs table.
It looks like the 'status' I am referring to is different from the one you mentioned above.
I was referring to CDR36562 above which has a status value of 'Inactive' with regards to the <Status> element in the org. record. I think it will show up as inactive in the table if it is blocked. In this case, the 'Inactive' status means we do not have to link it to any active protocol (unless we activate it).
BZDATETIME::2009-08-14 17:17:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::32
Here's another cut at the org mapping sheet, following the suggestions made in yesterday's status meeting. I'm listing all of the CDR org documents whose official name starts with the institution name from the GP database. For each GP location row, if exactly one of those matching CDR documents has at least one location with the same country, state, and city (normalizing the names of these geographic entities as well as I can to take into account the different conventions) I plug the CDR ID into the "Map To" column. If there are multiple CDR documents with org names starting the same and the same city, state, and country, and all but one of them are inactive (using either of the methods of indicating that an org is inactive) then I'll use that one. Otherwise, I refrain from putting anything in the "Map To" column. Please look through the list and let me know if (a) there are any inappropriate mappings generated this way and (b) this is close enough for CIAT to take it the rest of the way by hand. Perhaps CIAT would like to do some cleanup in the genprof database, give me a fresh snapshot, and have me generate the sheet again.
Attachment GPLocations-4.xls has been added with description: Orgs to be mapped, take 2
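The seeding rule described in comment 32 can be sketched roughly as follows. This is an illustration under stated assumptions, not the script that produced the sheet: `CdrOrg`, `normalize`, and `suggest_map_to` are hypothetical names, the `active` flag stands in for "either of the methods of indicating that an org is inactive", and the normalization shown (case folding, dropping periods, collapsing whitespace) is much simpler than whatever was actually needed to reconcile the two sets of geographic conventions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Place = Tuple[str, str, str]  # (country, state, city)


def normalize(name):
    """Crude normalization: drop periods, fold case, collapse whitespace."""
    return " ".join(name.replace(".", "").casefold().split())


@dataclass
class CdrOrg:
    """Hypothetical stand-in for a CDR Organization document."""
    cdr_id: int
    name: str
    active: bool
    places: List[Place] = field(default_factory=list)


def suggest_map_to(gp_name, gp_place, orgs):
    """Return the CDR ID to seed into "Map To", or None to leave it blank."""
    wanted = tuple(normalize(part) for part in gp_place)
    prefix = normalize(gp_name)
    # Candidates: org name starts with the GP institution name and at
    # least one location has the same country, state, and city.
    candidates = [
        org for org in orgs
        if normalize(org.name).startswith(prefix)
        and any(tuple(normalize(p) for p in place) == wanted
                for place in org.places)
    ]
    if len(candidates) == 1:
        return candidates[0].cdr_id
    # Several candidates: usable only if all but one are inactive.
    active = [org for org in candidates if org.active]
    if len(candidates) > 1 and len(active) == 1:
        return active[0].cdr_id
    return None  # ambiguous or no match: leave for CIAT to decide by hand
```

Returning `None` for every ambiguous case keeps the automated pass conservative, which matches the stated intent of leaving the remaining rows for manual review.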
BZDATETIME::2009-08-17 12:30:21
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::33
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > All right, then how about if we use the SpecificPostalAddress element (and SpecificWebSite where needed) in every case for which CIAT doesn't give us a fragment ID for an existing organization location. Would there still be cases which would not be converted correctly?
> I believe this should be fine.
Could you please clarify whether the above remains unchanged or not? We talked about dropping the fragments last Thursday but I am not sure how this will affect the mapping.
BZDATETIME::2009-08-17 12:40:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::34
(In reply to comment #33)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > (In reply to comment #27)
>
> > > All right, then how about if we use the SpecificPostalAddress element (and SpecificWebSite where needed) in every case for which CIAT doesn't give us a fragment ID for an existing organization location. Would there still be cases which would not be converted correctly?
> > I believe this should be fine.
>
> Could you please clarify whether the above remains unchanged or not? We talked about dropping the fragments last Thursday but I am not sure how this will affect the mapping.
We agreed that we will store all of the contact information in the person document, except that the identification of the institution will be a link to an actual Organization document (but not to a specific location fragment). You just need to make sure that appropriate cleanup of the genprof database is performed so that when that contact information is pulled into the person documents from the database you won't have incorrect addresses.
BZDATETIME::2009-08-17 12:56:00
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::35
(In reply to comment #34)
> (In reply to comment #33)
> > (In reply to comment #29)
> > > (In reply to comment #28)
> > > > (In reply to comment #27)
> We agreed that we will store all of the contact information in the person document, except that the identification of the institution will be a link to an actual Organization document (but not to a specific location fragment). You just need to make sure that appropriate cleanup of the genprof database is performed so that when that contact information is pulled into the person documents from the database you won't have incorrect addresses.
Thank You!
One more clarification needed. On the spreadsheet, would you be using the highlighted or the ones marked 'C' for anything? That is, we should not be changing the "Map To" field for those, right?
BZDATETIME::2009-08-17 13:13:44
BZCOMMENTOR::Bob Kline
BZCOMMENT::36
(In reply to comment #35)
> One more clarification needed. On the spreadsheet, would you be using the highlighted or the ones marked 'C' for anything? That is, we should not be changing the "Map To" field for those, right?
Correct. Those are just there to help you decide what (if anything) to put in the rows marked 'G'.
BZDATETIME::2009-08-19 19:35:15
BZCOMMENTOR::Bob Kline
BZCOMMENT::37
When Margaret and I discussed section 3.2.3.8 of the conversion paper, I believe she decided that:
1. While in the long term we want to eliminate the "Cancer Type" groupings, we will need to keep them until Cancer.gov can accommodate revisions in the GP documents.
2. We will use plain text content for the "Cancer Type" strings, and links to Term documents (the conversion paper had suggested the reverse of this approach).
3. We might want to use "xxx cancer" Term documents for at least some of the "sites" where the GP database has "xxx".
The attached sheet lists the cancer "sites" (grouped by "cancer type") with mappings to matching Term documents (3rd column) or Term documents which match if " cancer" is appended to the site string from the GP database.
Can I get mappings for the rows that don't have a CDR ID in either of the last two columns (and confirmation of #3 above)? If it's necessary to add missing Term documents to the CDR, please add them on Bach (that's where the sheet is getting the IDs it's seeded with).
Attachment GPCancerSites-1.xls has been added with description: Cancer 'sites' to be mapped
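The two kinds of match recorded in the last two columns of the sheet from comment 37 (a Term document whose name matches the site string, or one that matches once " cancer" is appended) can be illustrated with a short sketch. The function names, the case-insensitive comparison, and the `(cdr_id, name)` input shape are assumptions made for the example; the actual lookup against Bach is not shown in this issue.

```python
def build_term_index(terms):
    """Index (cdr_id, preferred_name) pairs by case-folded name."""
    return {name.strip().casefold(): cdr_id for cdr_id, name in terms}


def match_site(site, index):
    """Return (exact_match_id, with_cancer_suffix_id) for a GP site string.

    Either element is None when no Term document matches, which
    corresponds to an empty cell that still needs a manual mapping.
    """
    key = site.strip().casefold()
    return index.get(key), index.get(key + " cancer")
```

A row for which both returned values are `None` corresponds to the rows in the sheet that still need a CDR ID filled in by hand.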
BZDATETIME::2009-08-20 08:12:02
BZCOMMENTOR::Bob Kline
BZCOMMENT::38
(In reply to comment #37)
> Created an attachment (id=1765) [details]
> Cancer 'sites' to be mapped
There's a bug in Excel which this workbook hits. The apostrophe is supposed to be an acceptable character in sheet names, but at least some versions of Excel reject such names. When you open the file in such a version of Excel, you'll get a dialog box saying parts of the file can't be read. If you tell Excel to go ahead and open the file anyway it will rename the sheet. The rest of the data is intact. Works fine in Open Office.
BZDATETIME::2009-08-20 08:50:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::39
Here's the schema for the new HereditaryCancerSyndrome:
http://mahler.nci.nih.gov/cgi-bin/cdr/GetSchema.py?id=650192
Down the road, when Cancer.gov is ready to deal with GP documents without the unwanted SyndromeCancerType, we can replace the use of this document type with generic TermSet documents, which have everything we would need for the syndromes.
Please take a look at the schema and provide any feedback. Now is the time to decide whether you want to modify the data (and schema valid value list) for the cancer types to use a more consistent approach to capitalization.
BZDATETIME::2009-08-20 10:12:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::40
I've done a test generation of the new syndrome documents. I used CDR IDs for "sites" for which I have a mapped Term document, and left out the cdr:ref attribute for the rest (most of them). I'll do this again when I get back the "site" mapping spreadsheet with the gaps filled in.
Please take a look:
BZDATETIME::2009-08-20 14:39:57
BZCOMMENTOR::Bob Kline
BZCOMMENT::41
Please fill in the CDR IDs for the Term documents (on Bach) to be used for the genetics conditions used for mapping syndromes from the genprof DB. I put in the one we looked up in this afternoon's meeting.
Attachment gp-syndromes.xls has been added with description: Mapping sheet for GP syndromes
BZDATETIME::2009-08-20 14:58:32
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::42
I will do a first cut at #1, 2, and 4 and then pass things on to CIAT. Here is a quick summary of what we discussed at our meeting that needs to be done:
1. Add new menu types to the schema: Genetics Professionals-CancerType,
Genetics Professionals-CancerSite, Genetics Professionals-GeneticSyndrome.
(I will add a new issue for this.)
2. Map the list of cancer syndromes to the CDR records.
3. Add a term document for all syndromes not already in the CDR; use
Semantic Type of Genetic Condition; use menu information (with menu type
of Genetics Professional--GeneticSyndrome) to make sure the display name
matches what is on Cancer.gov.
4. Map list of cancer sites to term records in the CDR.
5. Add RelatedTerm to each cancer site term record with a link to the
appropriate genetic syndrome and a RelatedTermType of Associated Genetic
Condition.
6. Add menu information to each cancer site term record; menu type of
GP-CancerType with the display name of the larger group (e.g.
Digestive/Gastrointestinal) and menu type of GP-CancerSite with the
display name matching the site name on Cancer.gov (e.g.
colon/rectum).
BZDATETIME::2009-08-20 16:52:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::43
I talked with Blair and Min about the GP syndrome and cancer type information, and it appears that the data is maintained by the GateKeeper based on the information imported from the CDR, so there should be no problems created by changes to the strings. If that turns out to be wrong, it should be easy to tweak the menu information so that they go back to getting what they've always been getting.
BZDATETIME::2009-08-26 11:56:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::44
I have generated a spreadsheet showing Person documents with surname values matching those in the genprof database. As with the organization mapping, I need for CIAT to pick which (if any) of the CDR documents below each GP row should be used for importing the GP information, by entering the CDR ID for the matching document in the "Map To" column of the "G" row, as I have done (as a sample) for Kathleen Blazer (if I'm wrong and these aren't the same Kathleen Blazer, please remove the ID from the Map To column). The number of usable existing Person documents may be higher than the original estimate, as some (such as Kathleen R. Blazer) have minor variations in the person's name which caused the original sweep not to pick them up.
It's possible that additional matches might be found by some creative searching for variations in surname spellings, or even maiden-name/married-name/hyphenated-name variations, but I doubt that the effort would be justified by the very small number of mappings that would likely be added to the list.
As soon as CIAT has filled in the "Map To" column I will generate another spreadsheet which has fuller address information for the mapped documents as well as the GP locations, and I will ask CIAT which (if any) of the GP locations need to be added as OtherPracticeLocation blocks to the existing documents. I expect that this approach would result in a more accurate determination for this question, and that it would be more cost-effective than having me write software to try to implement an algorithm sophisticated enough to make a judgment which a human can make quickly.
Attachment gp-match-candidates.xls has been added with description: Person documents with matching surnames
BZDATETIME::2009-08-26 16:46:31
BZCOMMENTOR::Bob Kline
BZCOMMENT::45
The new schema with changes needed for the conversion has been installed on Mahler.
I have started some preliminary testing of the conversion software ("preliminary" because there are a few things I'm still waiting for). I made up a handful of mappings to existing person documents while I'm waiting for the spreadsheet I posted earlier today to come back with the real mappings, so I could test the path of the software which merges GP information into existing Person documents. You can see the results at:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2009-08-26_16-25-03
The remaining documents (the ones for which new documents were created) can be reviewed at http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPDocs.py .
Probably not a complete list, but some of the things I still need include:
[ ] org mappings (see comment #37 & attachment)
[ ] person mappings (see comment #44 & attachment)
[ ] locations to be dropped (see last paragraph of comment #44)
[ ] syndrome mappings (comment #41 and attachment)
[ ] answer from William on magic numbers for ContactBy
[ ] answer from Margaret on third value for CertificationStatus
For merging into existing documents, the analysis paper said we'd preserve the existing status. Can I get confirmation that this is the right thing to do? Also, we said we'd put the date from DateUpdated in the DateLastModified element, but for existing Person documents that sometimes results in an earlier date than the document had before conversion. What should the software do in this situation?
Finally, the only place I'm inserting the Public attribute is on the SpecificEmail element, when the PostEMailToWeb column in tblMain is 0. Are there any other elements which should get the attribute inserted by the conversion software?
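As an illustration of the Public attribute rule in the last paragraph, here is a small sketch. The attribute value "No" is an assumption (the comment does not state the value), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_specific_email(address, post_email_to_web):
    """Create a SpecificEmail element from one tblMain row.

    The Public attribute is added only when PostEMailToWeb is 0.
    The value "No" is an assumption, not taken from the schema.
    """
    node = ET.Element("SpecificEmail")
    node.text = address
    if post_email_to_web == 0:
        node.set("Public", "No")
    return node
```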
BZDATETIME::2009-08-26 17:02:58
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::46
I am attaching the Org. mapping just in case you need it for testing. We are still doing QA of the mapping. All the rows have either CDR IDs or 'New' for the ones that need to be created. I will attach a final mapping when we are done with the QA and make this file obsolete. (The rows that have been highlighted red are only for me to track what has been QAed).
Attachment GPLocs.xls has been added with description: GP Org. Mapping from CIAT
BZDATETIME::2009-08-26 18:36:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::47
(In reply to comment #46)
> Created an attachment (id=1769) [details]
> GP Org. Mapping from CIAT
>
> I am attaching the Org. mapping just in case you need it for testing. We are
> still doing QA of the mapping. All the rows have either CDR IDs or 'New' for
> the ones that need to be created. I will attach a final mapping when we are
> done with the QA and make this file obsolete. (The rows that have been
> highlighted red are only for me to track what has been QAed).
Thanks, this will be useful. I'm a little puzzled, though, by the fact that you appear to be working from the obsolete version of the spreadsheet. As you may recall, Lakshmi asked (at the meeting on the 13th) for a revision of the sheet which picked up CDR org documents whose names started off the same as the name of the institution for the GP data, not just exact matches, so I generated a second version of the sheet and posted it with comment #32, marking the earlier sheet as obsolete. The sheet you're working from just has the docs for exact matches.
Also, I'm inclined to assume that, since I've received no takers on my offer to generate the sheet again from a fresher copy of the data, there have been few or no changes to the GP data since the snapshot I was given, but I thought I'd ask once again.
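The difference between the two versions of the sheet (exact matches only versus names that start off the same) can be sketched as follows. This is one reading of "started off the same", assuming a case-insensitive prefix test on whitespace-normalized names; the function names and the sample data are hypothetical.

```python
def normalize(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def candidate_orgs(gp_name, cdr_orgs, exact_only=False):
    """Return CDR IDs of org documents matching a GP institution name.

    cdr_orgs is an iterable of (cdr_id, name) pairs. With exact_only=True
    only identical names match; otherwise any CDR name beginning with the
    GP institution name is also reported.
    """
    target = normalize(gp_name)
    hits = []
    for cdr_id, name in cdr_orgs:
        candidate = normalize(name)
        if candidate == target:
            hits.append(cdr_id)
        elif not exact_only and candidate.startswith(target):
            hits.append(cdr_id)
    return hits


# Hypothetical sample data for illustration only.
orgs = [
    ("ORG-1", "Example University"),
    ("ORG-2", "Example University Medical Center"),
    ("ORG-3", "Another Hospital"),
]
```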
BZDATETIME::2009-08-27 09:46:01
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::48
(In reply to comment #47)
> (In reply to comment #46)
> > Created an attachment (id=1769) [details] [details]
> > GP Org. Mapping from CIAT
> >
> > I am attaching the Org. mapping just in case you need it for testing. We are
> > still doing QA of the mapping. All the rows have either CDR IDs or 'New' for
> > the ones that need to be created. I will attach a final mapping when we are
> > done with the QA and make this file obsolete. (The rows that have been
> > highlighted red are only for me to track what has been QAed).
> Thanks, this will be useful. I'm a little puzzled, though, by the fact that
> you appear to be working from the obsolete version of the spreadsheet.
What happened was when the second spreadsheet was generated we had completed about a quarter of the mapping in the first spreadsheet already so we thought it would be better to continue with that instead of copying them over to the new spreadsheet or starting all over. I reference the new sheet from time to time when I need to.
> Also, I'm inclined to assume that, since I've received no takers on my offer to
> generate the sheet again from a fresher copy of the data, there have been few
> or no changes to the GP data since the snapshot I was given, but I thought I'd
> ask once again.
You're right. Minimal changes were made in the GP database so it is probably not worth generating another spreadsheet.
BZDATETIME::2009-08-27 10:26:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::49
(In reply to comment #22)
> The intention is to have the conversion software use the information
> stored in the edited worksheet, so it's important to preserve the
> existing information in the rows other than the "Map To" column, ....
The reason I wrote that "it's important to preserve the existing information in the rows other than the 'Map To' column" is that (because of design flaws in the genprof database) the only way I have of matching up the CDR ID you put in the "Map To" column is by matching the values in the other columns of the spreadsheet row with the values in the database columns. It doesn't look like these instructions were followed, as I can see that other columns of the spreadsheet have been modified. Let's talk at this afternoon's status meeting about what needs to happen.
In hindsight, I probably should have password protected the other columns to prevent tampering with the data. Live and learn. :-)
I will start creating the new Org document which will be needed.
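To make the problem concrete: with no stable key in the genprof tables, the only link between a spreadsheet row and a database row is the set of values in the other columns. A rough sketch, assuming hypothetical column names (institution, city, state, map_to):

```python
KEY_COLUMNS = ("institution", "city", "state")  # hypothetical column names


def row_key(row):
    """Build the lookup key from every column except "Map To"."""
    return tuple(row.get(column, "") for column in KEY_COLUMNS)


def apply_mappings(db_rows, sheet_rows):
    """Attach each sheet row's map_to value to the database row it came from.

    Returns (matched, unmatched): matched maps a row key to the "Map To"
    value, unmatched lists database rows whose key no longer appears in
    the sheet (for example because another column was edited).
    """
    mapped = {row_key(r): r["map_to"] for r in sheet_rows if r.get("map_to")}
    matched, unmatched = {}, []
    for row in db_rows:
        key = row_key(row)
        if key in mapped:
            matched[key] = mapped[key]
        else:
            unmatched.append(row)
    return matched, unmatched
```

Any change to a key column in the sheet, even a corrected spelling, lands the corresponding database row in the unmatched list.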
BZDATETIME::2009-08-27 11:12:49
BZCOMMENTOR::Bob Kline
BZCOMMENT::50
(In reply to comment #49)
> I will start creating the new Org document which will be needed.
You can review the preliminary results of this task here:
BZDATETIME::2009-08-27 14:08:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::51
(In reply to comment #45)
> For merging into existing documents, the analysis paper said we'd preserve the
> existing status. Can I get confirmation that this is the right thing to do?
MB: Go ahead and do this. We will report those merged documents where the status is 'Inactive' and CIAT will take a look at them.
> Also, we said we'd put the date from DateUpdated in the DateLastModified
> element, but for existing Person documents that sometimes results in an earlier
> date than the document had before conversion. What should the software do in
> this situation?
MB: go with the most recent date.
> Finally, the only place I'm inserting the Public attribute is on the
> SpecificEmail element, when the PostEMailToWeb column in tblMain is 0. Are
> there any other elements which should get the attribute inserted by the
> conversion software?
MB: No.
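The first two answers above amount to a small merge rule, sketched here under the assumption that both dates are available as datetime.date values (or None when missing). The function name is hypothetical.

```python
from datetime import date


def merged_status_and_date(existing_status, existing_modified, gp_date_updated):
    """Apply the merge decisions for an existing Person document.

    The existing status is preserved as-is, and DateLastModified becomes
    the more recent of the existing value and the GP DateUpdated value.
    """
    dates = [d for d in (existing_modified, gp_date_updated) if d is not None]
    return existing_status, (max(dates) if dates else None)


def needs_ciat_review(status):
    """Merged documents left with status 'Inactive' are reported to CIAT."""
    return status == "Inactive"
```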
BZDATETIME::2009-08-28 13:31:34
BZCOMMENTOR::Bob Kline
BZCOMMENT::52
Here is a revised sheet for the person mapping. As requested in yesterday's meeting, I have added rows for the unique location blocks in the genprof database. For the 'G' rows, put in the "Map To" column the CDR ID of the Person document into which the GP data should be merged (where such a document already exists). For the location blocks which should be skipped because they already exist in the matching Person document into which the GP data will be merged, enter the fragment ID for the location in the Person document into the "Map To" column (obviously it makes no sense to put fragment IDs in that column for GPs that don't already have an existing Person document).
Rows in red mean the given name also matches the First_Name column of the GP record (but doesn't necessarily mean it's the same person).
Attachment GPMatchCandidates.xls has been added with description: GP mapping spreadsheet (take 2)
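A sketch of how such a candidate list could be assembled: group Person documents by surname, then flag the candidates whose given name also matches First_Name (the rows shown in red). Apart from First_Name, the field names (Last_Name, surname, given_name, cdr_id) and the sample IDs are hypothetical.

```python
from collections import defaultdict


def find_candidates(gp_rows, persons):
    """List (surname, cdr_id, given_name_matches) for each surname match."""
    by_surname = defaultdict(list)
    for person in persons:
        by_surname[person["surname"].strip().lower()].append(person)
    report = []
    for gp in gp_rows:
        first = gp["First_Name"].strip().lower()
        surname = gp["Last_Name"].strip().lower()
        for person in by_surname.get(surname, []):
            given_matches = person["given_name"].strip().lower() == first
            report.append((gp["Last_Name"], person["cdr_id"], given_matches))
    return report


# Placeholder data; a True flag still does not prove it is the same person.
gp_rows = [{"Last_Name": "Blazer", "First_Name": "Kathleen"}]
persons = [
    {"cdr_id": "ID-A", "surname": "Blazer", "given_name": "Kathleen"},
    {"cdr_id": "ID-B", "surname": "Blazer", "given_name": "John"},
    {"cdr_id": "ID-C", "surname": "Smith", "given_name": "Kathleen"},
]
```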
BZDATETIME::2009-08-28 14:33:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::53
A new requirement was identified in yesterday's status meeting, to identify which location blocks are associated with the person's activity as genetics professional. Several approaches to meeting this requirement were discussed:
1. Add a repeating element to the GeneticsProfessionalDetails block
containing the fragment ID for a location block which is associated
with the person as a GP (similar to the CIPSContact element).
2. Same as #1, but using an attribute (with space-delimited fragment
ID values to identify the associated location blocks) instead of
a repeating element.
3. Add a repeating element to the ContactDetail block, each occurrence
of which would name a context (such as "GP") in which the contact
block is relevant.
4. Same as #3, but using an attribute (with space-delimited values)
to identify the contexts associated with the block.
I misspoke yesterday, referring to the type of the attribute for the fourth option as "IDREFS"; IDREFS would be the type of the attribute for the second option, but for the last option the attribute type would be NMTOKENS (since there would be no IDs elsewhere in the document to which the token values in the attribute would link).
Lakshmi expressed a preference for the fourth option, and as things stand now I plan to implement the software and change the schema to use that option. There are some aspects of this approach which introduce considerations not associated with the most closely-related other option (#3):
1. Using an attribute instead of a separate element for each value
means the users will not have a picklist from which to select
values; Alan pointed this out at the meeting yesterday, and
Lakshmi decided that this is not a serious deterrent to using
the attribute-based approach.
2. We do not have out-of-the-box support for validation of
NMTOKENS values (other than determining that the syntax is
correct). The choices we have for dealing with this include:
a. Implement NMTOKEN value validation support as part of our
schema package
Doing this before we convert the genprof database might
jeopardize our schedule, though we could use another
interim approach and implement this support later on.
b. Replace our custom schema validation package with a third
party schema validator which supports validation of
NMTOKEN values
See comment for (a) above.
c. Don't bother validating the values
Not my favorite choice; risks missing export of location
blocks which should be published.
d. Use a single-valued enumeration list, taking advantage of
the fact that we only have one use case for this attribute
right now, kicking the problem further down the road
Again, not my favorite, though we might consider this as
a stopgap measure in combination with (a) or (b) above (or
perhaps (e) or (f) below).
e. Use a custom validation rule
This could be done using our XSL/T-based custom validation
engine, though the code could get exponentially more complex
as more values are added.
f. Same as (e) above, but in conjunction with a new server
function which takes two arguments: the first argument is
the value of the attribute from the document being validated,
and the second argument is a space-delimited set of valid
tokens; the function determines whether all of the tokens
in the first argument are represented in the set of tokens
found in the second argument.
This is the approach I'm planning to take, as it's much more
straightforward to implement this kind of set membership in
C++ than in XSL/T (and more efficient, too).
g. Use an enumerated list to specify all of the possible
orders for all of the possible legal values
(treating the value as a string instead of as NMTOKENS)
Too ugly to even think about.
3. Using an NMTOKEN attribute makes the publishing filter code to
decide which location blocks to export somewhat trickier.
The approach I recommend Volker use to handle this problem is
to wrap spaces around the attribute value and test for the presence
of the string ' GP ' within that wrapped value; something like:
<xsl:if test='contains(concat(" ", @Contexts, " "), " GP ")'>
<xsl:apply-templates select='.'/>
</xsl:if>
Please review my description above and let me know if you find anything I've missed or mis-represented, or if you disagree with my recommendations. Also, let me know if you have strong preferences for names other than "Contexts" (for the name of the new attribute) or "GP" (for the name of the value we'll use to represent the current use case). I'll make the necessary change to the Person schema and start implementing the new server function after I've heard back from everyone.
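For reviewers who want to see the logic outside of C++ and XSL/T, here is a Python sketch of the two pieces discussed above: the set-membership check proposed in option (f), and the space-padding test recommended for the publishing filter. The function names are hypothetical, and "CT" is a made-up second token used only to exercise the code.

```python
def tokens_valid(value, allowed):
    """Option (f): every token in value must appear in allowed.

    Both arguments are space-delimited token strings. An empty value
    passes this check; NMTOKENS syntax checking is a separate step.
    """
    return set(value.split()) <= set(allowed.split())


def has_context(value, token):
    """Padded substring test, like contains(concat(" ", @Contexts, " "), " GP ").

    Padding both sides with spaces keeps "GP" from matching inside a
    longer token. Unlike the XSL/T expression, this version also
    collapses runs of whitespace before testing.
    """
    padded = " " + " ".join(value.split()) + " "
    return (" " + token + " ") in padded
```

The padding matters: a plain substring test for "GP" would also accept a value such as "GPX".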
BZDATETIME::2009-08-28 15:04:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::54
The previous comment (#53) was truncated and Bugzilla failed to send out the notification emails that the comment was posted. My best guess as to the culprit would be the intrusion detection tool (IDP), but I haven't had a chance to test this theory with John (whom I've added to the CC list for this issue).
I went into Bugzilla and did some surgery to replace the truncated comment with the full version. Please go into that comment and review my analysis and recommendations. I'll hold off doing any further work on this part until I hear back from folks.
Thanks!
[I created a separate issue (#4631) for the Bugzilla/IDP failure.]
BZDATETIME::2009-08-28 21:19:12
BZCOMMENTOR::Alan Meyer
BZCOMMENT::55
(In reply to comment #53)
I think your analysis summarizes the options and the consequences
very well.
My own personal choices for what to do are different.
I like option 3 above (Add a repeating element to the
ContactDetail...) if we want flexibility, option d if we want the
simplicity that Lakshmi favors.
Neither one requires any programming at all.
Alternatively, to make d even simpler, we could change it to an
attribute like:
IncludeInGenProfDirectory="Y"
or maybe:
GenDir="Y"
with a default value of "N".
This has no generality. If we have another inclusion to consider
we would add another attribute. However, Lakshmi is thinking
that we won't have any more requirements like this in the future.
However, I am not opposed to your proposal if we think there
could be other uses for validating NMTOKENS. Extra programming
is bad, but extending the generalized capabilities of the system
is good and may justify it.
BZDATETIME::2009-08-31 12:06:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::56
There were two questions last Thursday about address
information:
Margaret wanted to know which location address is used for the mailers.
The CONTACT ADDRESS, which is the first address in the display, is used
for the mailers. Additional addresses in the PRACTICE ADDRESS forms of
the display are also printed on the mailers for the professionals to
make corrections.
There was also a question about whether we export all addresses and I
think the answer is yes. If you look at the record for Kim Ranieri, M.S.
on Cancer.gov
http://www.cancer.gov/search/view_geneticspro.aspx?personid=653656
all her addresses are listed.
Please let me know if this is what you wanted to know or you need additional information.
BZDATETIME::2009-08-31 13:06:09
BZCOMMENTOR::Bob Kline
BZCOMMENT::57
(In reply to comment #56)
> Margaret wanted to know which location address is used for the mailers.
Are there cases in which it would be necessary to use a different contact block for the GP mailer than the CIPSContact block?
BZDATETIME::2009-08-31 14:37:07
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::58
(In reply to comment #52)
> Created an attachment (id=1770) [details]
> For the location blocks which should be skipped because they
> already exist in the matching Person document into which the GP data will be
> merged, enter the fragment ID for the location in the Person document into the
> "Map To" column (obviously it makes no sense to put fragment IDs in that column
> for GPs that don't already have an existing Person document).
For clarification of the above instruction:
i. If we are able to find a CDR person document (for a given GP person) and
ii. The CDR person document has a location that already exists in the GP
person record,
iii. Enter the Fragment ID in the "Map To" column.
Is the above breakdown accurate?
BZDATETIME::2009-08-31 14:41:16
BZCOMMENTOR::Bob Kline
BZCOMMENT::59
(In reply to comment #58)
> (In reply to comment #52)
> > Created an attachment (id=1770) [details] [details]
> > For the location blocks which should be skipped because they
> > already exist in the matching Person document into which the GP data will be
> > merged, enter the fragment ID for the location in the Person document into the
> > "Map To" column (obviously it makes no sense to put fragment IDs in that column
> > for GPs that don't already have an existing Person document).
>
> For clarrification of the above instruction:
>
> i. If we are able to find a CDR person document (for a given GP person) and
> ii. The CDR person document has a location that already exists in the GP person
> record,
> iii. Enter the Fragment ID in the 'Map To" column.
>
> Is the above breakdown accurate?
Yes (in addition to entering the CDR ID for the person in the "Map To" column for the GP row).
BZDATETIME::2009-08-31 15:04:02
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::60
(In reply to comment #57)
> (In reply to comment #56)
> > Margaret wanted to know which location address is used for the mailers.
> Are there cases in which it would be necessary to use a different contact block
> for the GP mailer than the CIPSContact block?
I don't think so. Using the CIPSContact block alone for the GP mailer should be OK.
BZDATETIME::2009-09-02 10:23:35
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::61
1. I have attached the mappings for the Gen. Prof. locations.
2. Please ignore the last column (J). We used it to enter comments
that will help us track CDR documents that need changes (Mostly
reactivations)
3. We could not verify each CDR record to determine if they were
Inactive or not so if it is possible to provide us with CurrentStatus of
each document or for only documents that are Inactive, that will be
helpful. We will review the documents to make sure they have the right
information and activate them.
4. There are a few rows without any mapping to a matching CDR document or any indication to create a new document. This is because we identified them to be records that were used for testing in the Gen. Prof. Database. These rows can be skipped.
5. We created new records for Canadian sites and also records with
multiple rows which needed to have new records created.
6. Also, this time I tried to preserve the other columns (apart from the
"Map To" column.)
If I am missing anything, please let me know.
Attachment GPLocs.xls has been added with description: Gen. Prof. Location Mapping
BZDATETIME::2009-09-02 11:17:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::62
OK, the next steps for location mapping are:
[ ] Get a fresh backup of the database from CIAT
[ ] Generate a new mapping spreadsheet
[ ] CIAT fills in gaps in the "Map To" column
For the second step, the sheet will use the fresh copy of the database and the mappings already provided in the sheet you just posted. I will add the indications as to which organizations are inactive as requested, and I will carry over the mapping which was made in the previous pass to the extent that this is possible. However, some of the mapping you did in the previous pass will have to be done again. This is partly because the location information in the database for existing locations may have changed (an unavoidable problem), and partly because (as discussed above) someone tampered with the location information in the other columns, breaking the software's link with the original location information (an avoidable problem).
William:
Let me know how you will get me the fresh backup of the genprof database. Possibilities include:
1. You provide an (s)ftp url from which I can pull the backup
2. You upload the backup image via FTP to NCI (working with
Volker)
3. You bring a CD or DVD to tomorrow's status meeting
If #1 or #2 I can get started on the regeneration of the sheet today, and have it ready to go over with you at the meeting tomorrow. Otherwise I'll get started on creating the new mapping sheet later on Thursday (or Friday morning).
BZDATETIME::2009-09-02 11:31:37
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::63
(In reply to comment #62)
> OK, the next steps for location mapping are:
> [ ] Get a fresh backup of the database from CIAT
Will you need another copy after this one? I am trying to determine how
we will handle ongoing updates in the database before we start updating
in the CDR. If you will need another copy after this request, we can
continue working in the database until you request it. If not, we may
either have to stop working in the database or have a way of keeping
track of any changes that happen between now and when the database is
fully incorporated in the CDR.
BZDATETIME::2009-09-02 11:45:13
BZCOMMENTOR::Bob Kline
BZCOMMENT::64
(In reply to comment #63)
> Will you need another copy after this one? I am trying to determine how we will
> handle ongoing updates in the database before we start updating in the CDR. If
> you will need another copy after this request, we can continue working in the
> database until you request it. If not, we may either have to stop working in
> the database or have a way of keeping track of any changes that happen between
> now and when the database if fully incorporated in the CDR.
I anticipate that we will get a fresh backup of the database right before we perform the conversion (and right after we turn off the ability to edit the data in the old system). If this presents a problem for some reason (I can't imagine that it would) we could talk about freezing the database or manually tracking changes, but those approaches shouldn't be necessary (and I really don't like the idea of trying to track changes manually and fold them in).
BZDATETIME::2009-09-02 12:44:08
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::65
(In reply to comment #64)
> (In reply to comment #63)
> I anticipate that we will get a fresh backup of the database right before we
> perform the conversion (and right after we turn off the ability to edit the
> data in the old system). If this presents a problem for some reason (I can't
> imagine that it would) we could talk about freezing the database or manually
> tracking changes, but those approaches shouldn't be necessary (and I really
> don't like the idea of trying to track changes manually and fold them in).
Thanks!
I uploaded a copy of the database to the ftp site cipsftp.nci.nih.gov in a folder named Genetics Professional Databate. It is a .bak file.
BZDATETIME::2009-09-02 15:59:57
BZCOMMENTOR::Bob Kline
BZCOMMENT::66
I left out a step in the list at the top of comment #62. I also need to create new org docs where you have "New" in the "Map To" column. I have created the XML for the new docs. Please take a look at at least a sampling of them (http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPDocs.py) and let me know if you see any problems. After I hear back from you I'll create the documents in the repository on Mahler and have you take another look. Then if things look OK I'll create the documents on Bach and move on to the other steps.
BZDATETIME::2009-09-02 16:16:56
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::67
(In reply to comment #59)
> (In reply to comment #58)
> > (In reply to comment #52)
> Yes (in addition to entering the CDR ID for the person in the "Map To" column
> for the GP row).
So you want to see:
MAP TO
G 34 19838/F1 BACHMAN, RONALD
Kaiser Permanente Medical Center - Oakland
OR
G 34 19838 BACHMAN, RONALD
FI Kaiser Permanente Medical Center - Oakland
BZDATETIME::2009-09-02 16:30:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::68
(In reply to comment #66)
> I left out a step in the list at the top of comment #62. I also need to create
> new org docs where you have "New" in the "Map To" column. I have created the
> XML for the new docs. Please take a look at at least a sampling of them
> (http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPDocs.py) and let me know if you
> see any problems. After I hear back from you I'll create the documents in the
> repository on Mahler and have you take another look. Then if things look OK
> I'll create the documents on Bach and move on to the other steps.
It looks like the link goes to the GP Person documents instead of the GP org documents.
BZDATETIME::2009-09-02 17:47:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::69
(In reply to comment #68)
> (In reply to comment #66)
> > I left out a step in the list at the top of comment #62. I also need to create
> > new org docs where you have "New" in the "Map To" column. I have created the
> > XML for the new docs. Please take a look at at least a sampling of them
> > (http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPDocs.py) and let me know if you
> > see any problems. After I hear back from you I'll create the documents in the
> > repository on Mahler and have you take another look. Then if things look OK
> > I'll create the documents on Bach and move on to the other steps.
>
> It looks like the link goes to the GP Person documents instead of the GP org
> documents.
Sorry, the correct link is http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py
BZDATETIME::2009-09-02 17:49:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::70
(In reply to comment #67)
> (In reply to comment #59)
> > (In reply to comment #58)
> > > (In reply to comment #52)
> > Yes (in addition to entering the CDR ID for the person in the "Map To" column
> > for the GP row).
>
> So you want to see:
>
> MAP TO
>
> G 34 19838/F1 BACHMAN, RONALD
> Kaiser Permanente Medical Center - Oakland
>
>
> OR
>
>
> G 34 19838 BACHMAN, RONALD
> FI Kaiser Permanente Medical Center - Oakland
The second form is correct (since a GP can have more than one location block, you have to be able to tell me which locations are already present in the Person document).
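A sketch of how the conversion software might read the sheet in this second form: the "G" row carries the Person CDR ID, and each following location row carries the fragment ID of a location that already exists in that Person document. The row layout (row type, GP ID, "Map To" value, label) and the "L" marker for location rows are assumptions made for illustration.

```python
def parse_map_sheet(rows):
    """Collect Person mappings and already-present locations per GP.

    rows is a sequence of (row_type, gp_id, map_to, label) tuples in sheet
    order. A "G" row starts a new GP; any other row is a location for the
    most recent GP. Returns {gp_id: {"person_cdr_id", "skip_fragments"}}.
    """
    result = {}
    current = None
    for row_type, gp_id, map_to, label in rows:
        map_to = (map_to or "").strip()
        if row_type == "G":
            current = {"person_cdr_id": map_to or None, "skip_fragments": []}
            result[gp_id] = current
        elif map_to:
            # A fragment ID is meaningless without a mapped Person document.
            if current is None or current["person_cdr_id"] is None:
                raise ValueError("fragment ID %r has no mapped Person" % map_to)
            current["skip_fragments"].append(map_to)
    return result


rows = [
    ("G", 34, "19838", "BACHMAN, RONALD"),
    ("L", None, "F1", "Kaiser Permanente Medical Center - Oakland"),
]
```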
BZDATETIME::2009-09-02 18:59:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::71
(In reply to comment #69)
> (In reply to comment #68)
> > (In reply to comment #66)
> > > I left out a step in the list at the top of comment #62. I also need to create
> > > new org docs where you have "New" in the "Map To" column. I have created the
> > > XML for the new docs. Please take a look at at least a sampling of them
> > > (http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPDocs.py) and let me know if you
> > > see any problems. After I hear back from you I'll create the documents in the
> > > repository on Mahler and have you take another look. Then if things look OK
> > > I'll create the documents on Bach and move on to the other steps.
> >
> > It looks like the link goes to the GP Person documents instead of the GP org
> > documents.
>
> Sorry, the correct link is
> http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py
Thanks!
I looked at several of them and they all looked good. The first record of the "Coastal Oncology, PL" has the suite number on the same line as the street address but I am guessing this is a data entry error since the second record correctly shows the suite number in a separate <Street> field. I will take a look at the record in the Gen. Prof. Database tomorrow when I am in the office.
BZDATETIME::2009-09-03 08:32:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::72
(In reply to comment #71)
> The first record of the "Coastal Oncology, PL" has the suite number on
> the same line as the street address but I am guessing this is data
> entry error ....
Yes, that's what's in the database.
BZDATETIME::2009-09-03 09:54:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::73
I have generated actual CDR Organization documents on Mahler. Once you have given the green light that these are OK I'll create them on Bach.
BZDATETIME::2009-09-03 10:01:30
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::74
> > > (In reply to comment #58)
> > > > (In reply to comment #52)
>
> The second form is correct (since a GP can have more than one location block,
> you have to be able to tell me which locations are already present in the
> Person document).
Thanks!
I am attaching the Person Document Mapping.
Attachment GPMatchCandidates(1).xls has been added with description: Gen. Prof. Person Mapping
BZDATETIME::2009-09-03 10:03:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::75
Here are the warnings from the job to create the org documents:
Emory University School of Medicine
('CDR0000650335', '<Errors>\n<Err>/Organization/OrganizationLocations[3]/OrganizationLocation[1]/Location[1]/PostalAddress[1]: U.S. address must have valid ZIP code</Err>\n<Err>Non-publishable version will be created.</Err>\n</Errors>')
University of Calgary
('CDR0000650400', '<Errors>\n<Err>Missing required attribute cdr:ref in element PoliticalSubUnit_State</Err>\n<Err>Non-publishable version will be created.</Err>\n</Errors>')
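A minimal sketch of how warnings like the ones above could be pulled apart for review. It assumes only the `<Errors>`/`<Err>` wrapper visible in the log; the function names are hypothetical and not part of the CDR code.

```python
import xml.etree.ElementTree as ET


def parse_warnings(response):
    """Return the text of each <Err> element in an <Errors> response."""
    root = ET.fromstring(response)
    # Collapse internal whitespace so wrapped messages compare cleanly.
    return [" ".join((err.text or "").split()) for err in root.iter("Err")]


def is_publishable(messages):
    """Hypothetical helper: a document is flagged when the log says so."""
    return not any("Non-publishable version" in msg for msg in messages)


sample = (
    "<Errors>\n<Err>Missing required attribute cdr:ref in element "
    "PoliticalSubUnit_State</Err>\n<Err>Non-publishable version "
    "will be created.</Err>\n</Errors>"
)
for message in parse_warnings(sample):
    print(message)
```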
BZDATETIME::2009-09-03 10:06:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::76
(In reply to comment #75)
> University of Calgary
> ('CDR0000650400', '<Errors>\n<Err>Missing required attribute cdr:ref in element
> PoliticalSubUnit_State</Err>\n<Err>Non-publishable version will be
> created.</Err>\n</Errors>')
Isn't Calgary in Canada? I thought you said (comment #61) that CIAT was handling creation of new Organization documents for the Canadian organizations.
BZDATETIME::2009-09-03 10:28:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::77
(In reply to comment #76)
> (In reply to comment #75)
> > University of Calgary
> > ('CDR0000650400', '<Errors>\n<Err>Missing required attribute cdr:ref in element
> > PoliticalSubUnit_State</Err>\n<Err>Non-publishable version will be
> > created.</Err>\n</Errors>')
> Isn't Calgary in Canada? I thought you said (comment #61) that CIAT was
> handling creation of new Organization documents for the Canadian organizations.
Sorry. It must have been an oversight. Here is the mapping for University of Calgary: CDR654285. The correct zip code for Emory University School of Medicine is 30033. I have updated both documents on Mahler.
BZDATETIME::2009-09-03 10:59:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::78
(In reply to comment #77)
> Sorry. It must have been an oversight. Here is the mapping for University of
> Calgary CDR654285 and the correct zip code for Emory University School of
> Medicine is 30033. I have updated both documents on Mahler.
I will modify the code to suppress generation of Organization documents for Canadian organizations, and if you can fix the zip code in the genprof database (in both tables where the error appears) I will manually fix the zip code in the spreadsheet you gave me so that when I generate the Organization document on Bach the document will be valid.
Let me know when you have finished reviewing the documents on Mahler.
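For illustration, the two checks discussed here (skipping Canadian organizations and catching a malformed U.S. ZIP code before the document is created) might look something like the sketch below. The row layout, key names, and country labels are assumptions made for the example; they are not taken from the actual spreadsheet or conversion script.

```python
import re

# Five digits, optionally followed by a four-digit extension.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def should_generate(row):
    """Create a new Organization document only for non-Canadian 'New' rows."""
    return row["map_to"] == "New" and row["country"] != "Canada"


def has_valid_us_zip(row):
    """Only U.S. rows are checked; other countries pass through."""
    if row["country"] != "U.S.A.":
        return True
    return bool(ZIP_PATTERN.match(row["postal_code"]))


# Hypothetical rows; only the org names and the 30033 ZIP come from the issue.
rows = [
    {"name": "Emory University School of Medicine", "country": "U.S.A.",
     "postal_code": "30033", "map_to": "New"},
    {"name": "University of Calgary", "country": "Canada",
     "postal_code": "", "map_to": "New"},
]
for row in rows:
    print(row["name"], should_generate(row), has_valid_us_zip(row))
```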
BZDATETIME::2009-09-03 16:01:28
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::79
(In reply to comment #78)
> (In reply to comment #77)
> > Sorry. It must have been an oversight. Here is the mapping for University of
> > Calgary CDR654285 and the correct zip code for Emory University School of
> > Medicine is 30033. I have updated both documents on Mahler.
> I will modify the code to suppress generation of Organization documents for
> Canadian organizations, and if you can fix the zip code in the genprof database
> (in both tables where the error appears)
I fixed the zip code error in the Gen Prof. Database.
BZDATETIME::2009-09-03 16:34:59
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::80
(In reply to comment #78)
> (In reply to comment #77)
> Let me know when you have finished reviewing the documents on Mahler.
When a web site is present, for example [650345 and 650347], it is entered in both the field and the attribute. This duplicates the web site in QC reports. Normally, when we enter a web site, we include the following standard wording "Web site for" and then the EXACT NAME OF INSTITUTION as it is in the official name. In the case of 650347, it will read "Web site for Genetic Medicine Central California" in the field and then "www.geneticscentralcal.org" will be added as the attribute value in the attribute inspector.
Also, can you include the DateLastModified element in each of the newly created organizations and add the date the document was created as the default date?
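A rough sketch of the two requests, assuming the elements are simply appended to the Organization root and that the URL is carried in an attribute. The attribute name `xref` and the element placement are placeholders for the example, not confirmed schema details.

```python
import datetime
import xml.etree.ElementTree as ET


def add_web_site(parent, official_name, url):
    """Add a WebSite element using the standard 'Web site for' wording."""
    site = ET.SubElement(parent, "WebSite")
    site.set("xref", url)  # attribute name is an assumption
    site.text = f"Web site for {official_name}"
    return site


def add_date_last_modified(parent, created):
    """Record the creation date as the initial DateLastModified value."""
    node = ET.SubElement(parent, "DateLastModified")
    node.text = created.isoformat()
    return node


org = ET.Element("Organization")
add_web_site(org, "Genetic Medicine Central California",
             "www.geneticscentralcal.org")
add_date_last_modified(org, datetime.date(2009, 9, 10))
print(ET.tostring(org, encoding="unicode"))
```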
BZDATETIME::2009-09-03 16:55:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::81
(In reply to comment #80)
> Also, Can you include the DateLastModified element in each of the newly created
> organizations and add the date the document was created as the default date?
I can do that, but doesn't the word "modified" imply a change from a previous version?
BZDATETIME::2009-09-08 10:08:15
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::82
(In reply to comment #81)
> (In reply to comment #80)
> > Also, Can you include the DateLastModified element in each of the newly created
> > organizations and add the date the document was created as the default date?
> I can do that, but doesn't the word "modified" imply a change from a previous
> version?
You're right. However, I understand we use it to track when the document was created and also, it has been used for accounting/reporting purposes in the past.
BZDATETIME::2009-09-08 10:21:05
BZCOMMENTOR::Bob Kline
BZCOMMENT::83
To capture what was decided at the most recent CDR status meeting:
CIAT will finish up any last modifications to the genprof database and generate a final mailer. They will then shut down access to the system and provide a fresh backup of the database. I will use this backup to create the next version of the organization mapping sheet.
BZDATETIME::2009-09-08 10:22:09
BZCOMMENTOR::Bob Kline
BZCOMMENT::84
I've removed John R. from the CC list, as the IDP problem has been resolved.
BZDATETIME::2009-09-09 17:06:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::85
(In reply to comment #83)
> To capture what was decided at the most recent CDR status meeting:
> CIAT will finish up any last modifications to the genprof database and generate
> a final mailer. They will then shut down access to the system and provide a
> fresh backup of the database. I will use this backup to create the next
> version of the organization mapping sheet.
I uploaded a fresh copy of the Gen Prof database to the ftp site cipsftp.nci.nih.gov in folder "Genetics Professional Database". The name of the file is NCICGDIR090909.BAK (slightly different from the existing one)
All work on the GenProf database stopped this morning. Mailers had already been generated last Tuesday so we did not generate another set of mailers this week.
BZDATETIME::2009-09-10 08:41:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::86
The thread for syndrome mapping wandered off into a discussion outside of the tracking system issue. I've attached the replacement mapping spreadsheet, and I have included Mary in the CC list for the issue. Here is an excerpt from the offline email discussion:
=============================== snip ===================================
*Please excuse the spreadsheet I sent yesterday – I’m not sure what version I attached, but this is the completed one.*
Hi everyone,
I’ve created the missing terms for the syndrome spreadsheet. I’ve attached the updated version – new terms are in red. We already had Peutz-Jeghers syndrome – it wasn’t the preferred name for the term.
Perhaps we should revisit the Cancer Site spreadsheet after the transition?
Thanks,
Mary
From: Beckwith, Margaret (NIH/NCI) [E] mbeckwit@icic.nci.nih.gov
Sent: Thursday, August 27, 2009 4:16 PM
To: Barnstead, Mary E
Cc: Grama, Lakshmi (NIH/NCI) [E]; Kline, Robert (NCI); Beebe, Deborah P;
Osei-Poku, William; Robinson, Douglas H
Subject: RE: diagnosis/condition mapping for Genetics Directory
Hi Mary,
I have inserted comments below. When I wrote this and sent the
spreadsheets I wasn’t thinking about how we had decided to handle the
Cancer Types that are needed by Cancer.gov. We are basically going to
handle them through menu information in the cancer site term record. We
will have three types of menu information: Genetics
Professionals-CancerType, Genetics Professionals-CancerSite,
Genetics Professionals-GeneticSyndrome.
Cancer site term records:
We will need to add two types of menu information to each cancer site term record: menu type of GP-CancerType with the display name of the larger group (e.g. Digestive/Gastrointestinal) and menu type of GP-CancerSite with the display name matching the site name on Cancer.gov (e.g. colon/rectum). What this means from the terminology effort below is that you don’t need to create term records for the “header” terms that were missing.
After mapping the list of cancer sites to term records in the CDR, you will also need to add RelatedTerm to each cancer site term record with a link to the appropriate genetic syndrome and a RelatedTermType of Associated Genetic Condition.
Cancer syndrome term records:
For the cancer syndromes, we need to map the list of cancer syndromes to the CDR records. Please add a term document for all syndromes not already in the CDR using the SemanticType of Genetic Condition. For all cancer syndromes that will be used in the Genetics Directory, also add menu information (with menu type of Genetics Professional--GeneticSyndrome) to make sure the display name matches what is on Cancer.gov.
Sorry I didn’t cover this before when I sent the spreadsheets, although most of the mapping still would have to be done. In terms of timing, I still need to put in a task to add the new menu information types before you can add them to the term records. At that point there isn’t any reason to not add them and the RelatedTerm links even before we convert the data. This will all need to be in place and tested before we can publish the data from the CDR after the transition at the end of September. Please let me know if you have any questions about this or about my comments below.
Thanks,
Margaret
From: Barnstead, Mary E mary.e.barnstead@lmco.com
Sent: Monday, August 24, 2009 3:43 PM
To: Beckwith, Margaret (NIH/NCI) [E]
Cc: Grama, Lakshmi (NIH/NCI) [E]; Kline, Robert (NCI); Beebe, Deborah P;
Osei-Poku, William; Robinson, Douglas H
Subject: RE: diagnosis/condition mapping for Genetics Directory
Hi Margaret,
Doug, Ning and I have gone over your spreadsheets and have the following questions/comments:
Questions:
· Does the cancer site “schwannoma” refer to malignant schwannoma or all schwannomas? MB: Probably malignant schwannomas.
· The syndrome list includes “Hodgkin’s lymphoma” and “Melanoma”; are we correct in assuming that this refers to the familial form of each disease? MB: Yes, I think that is a safe assumption since they are listed as syndromes.
Comments:
· The header term “gastrointestinal cancer” covers the cancer type “Digestive/Gastrointestinal”. MB: OK, see note above about not needing these header terms for this purpose.
· We will create the header term “genitourinary cancer” as a child of “Body system/site cancer” and make “kidney/urinary cancer”, “male reproductive cancer”, and “female reproductive cancer” its children. MB: No need to create this term; see note above about not needing these header terms for this purpose.
· “Female reproductive cancer” can cover “Gynecologic”. MB: OK, see note above about not needing these header terms for this purpose.
· “Hematopoietic/lymphoid cancer” can cover “Hematologic”. MB: OK, see note above about not needing these header terms for this purpose.
· “Germ cell tumor” can cover “Germ cell”. MB: OK, see note above about not needing these header terms for this purpose.
· “Nervous system cancer” can cover “Neurologic”. MB: OK, see note above about not needing these header terms for this purpose.
· We would add “pulmonary carcinoid tumor” to the terms mapped to “carcinoid”. MB: I guess this is okay (see my *note below about the cases where there are multiple terms).
· We would add “extragonadal germ cell tumor” to the terms mapped to “germ cell”. Perhaps the top terms under “germ cell tumor” (“childhood germ cell tumor”, “extragonadal germ cell tumor”, “ovarian germ cell tumor”, “teratoma”, “testicular germ cell tumor”) would be good for covering everything under “germ cell tumor”. MB: Yes, I think using the top level term is a good idea.
· “Colon (HNPCC)” maps to CDR ID 42841. MB: This is already on the syndrome mapping sheet.
· “Melanoma” maps to CDR ID 42847. MB: This is already on the syndrome mapping sheet. The syndrome listed as “Melanoma” maps to “hereditary melanoma (CDKN2, CDK4)”, CDR ID 42847. The cancer site term on the other spreadsheet is “melanoma”, CDR ID 38833. Am I missing something?
*Note: I think it will probably be okay to have more than one term map to each of the cancer sites but it will really expand the list of diagnoses listed next to each Cancer Type on the search form and in the table at the end. For the ones that have a lot of terms, it would be better to use a more top level term like we did for germ cell tumor. Glioma is a good example; there were 6 terms mapped to the word glioma. Also for a lot of them we have a childhood and an adult term and I don’t know any way around that except to just pick the adult term.
New terms:
· “tongue cancer” (child of “head and neck cancer”) MB: OK
· “acoustic schwannoma” (child of “central nervous system cancer”) MB: The missing term seems to be “acoustic neuroma”, not “acoustic schwannoma”—are they the same thing?
· “Bloom syndrome” MB: OK
· “familial carcinoid syndrome” MB: OK
· “Carney syndrome” MB: OK
· “tylosis with esophageal cancer” MB: OK
· “osteochondromatosis” MB: OK
· “Peutz-Jeghers syndrome” MB: OK
· “familial renal cell cancer” MB: OK
· “Rothmund-Thomson syndrome” MB: OK
· “familial testicular carcinoma” MB: OK
· “tuberous sclerosis” MB: OK
· “Werner syndrome” MB: OK
In looking at the spreadsheet it seems that we would also need terms for “familial pancreatic cancer”, “familial prostate cancer”, and “familial paraganglioma”.
I think that would fill in the blanks. Would you like me to work with the spreadsheets you’ve begun? Whatever is easier for you. Bob only needs the spreadsheet with the syndromes and the CDR ID.
Thanks,
Mary
From: Beckwith, Margaret (NIH/NCI) [E] mbeckwit@icic.nci.nih.gov
Sent: Monday, August 24, 2009 2:45 PM
To: Barnstead, Mary E; Robinson, Douglas H
Cc: Grama, Lakshmi (NIH/NCI) [E]; Kline, Robert (NCI); Beebe, Deborah P;
Osei-Poku, William
Subject: RE: diagnosis/condition mapping for Genetics Directory
Thanks!
From: Barnstead, Mary E mary.e.barnstead@lmco.com
Sent: Monday, August 24, 2009 1:59 PM
To: Beckwith, Margaret (NIH/NCI) [E]; Robinson, Douglas H
Cc: Grama, Lakshmi (NIH/NCI) [E]; Kline, Robert (NCI); Beebe, Deborah P;
Osei-Poku, William
Subject: RE: diagnosis/condition mapping for Genetics Directory
Hi Margaret,
Just wanted you to know that Doug, Ning, and I are meeting today to discuss an approach to completing the mapping for the directory. We’ll get back with you very soon with a plan.
Thanks,
Mary
From: Beckwith, Margaret (NIH/NCI) [E] mbeckwit@icic.nci.nih.gov
Sent: Friday, August 21, 2009 4:23 PM
To: Barnstead, Mary E; Robinson, Douglas H
Cc: Grama, Lakshmi (NIH/NCI) [E]; Kline, Robert (NCI); Beebe, Deborah P;
Osei-Poku, William
Subject: diagnosis/condition mapping for Genetics Directory
Hi Mary and Doug,
As you (probably) know, we are in the process of transitioning the maintenance of the Genetics Directory from a Lockheed database to the CDR. In reviewing the various data elements for mapping between the databases, there are three that relate to diagnosis and genetic condition. The three elements in the current LM database are cancer type, cancer site, and syndrome. Cancer type refers to the larger categories that cancer sites are assigned to (e.g. colon/rectal is a site in the digestive/gastrointestinal type). There are also hereditary cancer syndromes.
I have done a first cut at mapping these and am attaching two spreadsheets. The first spreadsheet has the cancer types and cancer sites on it. I matched as closely as I could to our terminology using the diagnosis hierarchy report, but I didn’t put the CDR IDs on the spreadsheet. I was hoping that you could review what I have done and fill in any blanks if possible. There are several cancer types that we don’t have and this may be a problem since our terminology isn’t really organized around these large categories in some cases. There are two cancer site terms we don’t have and we may need to create.
The second spreadsheet has the syndromes mapped. We had a lot of these in the CDR already, but will need to create records for all those that we don’t have. There are also a few of them that aren’t really syndromes and that we already have as cancer sites (e.g. Hodgkin lymphoma). We need to figure out how to handle these since we need to have a term record for every syndrome.
Neither of these lists is very long, so I hope this won’t take a huge amount of time. We are under a little bit of a time crunch since we have to have this completely converted before the end of Sept. If you could take a look at the spreadsheets and let me know what you think I would appreciate it. We will probably need a meeting to go over the terms where there are issues.
Please let me know if you have any questions (and have a great weekend!).
Thanks,
Margaret
=============================== snip ===================================
A few questions:
1. What (if anything) should I do with the "alternates" on the sheet?
2. Do I need to wait for the MEN1/MEN2 split?
3. Where is the Genetics Professionals--GeneticSyndrome menu info in
the term documents?
Attachment gp-syndromes082109MBMEB.xls has been added with description: Syndrome mappings
BZDATETIME::2009-09-10 09:37:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::87
(In reply to comment #80)
> Normally, when we enter a web site, we include the following standard wording
> "Web site for" and then the EXACT NAME OF INSTITUTION as it is in the official
> name. ....
>
> Also, Can you include the DateLastModified element in each of the newly created
> organizations and add the date the document was created as the default date?
I have deleted the previous set of Org documents and generated a new set with:
1. additional code to skip over Canadian organizations;
2. a fix in the spreadsheet for Emory's bad zip code;
3. the requested modification to the WebSite element's text content;
4. the addition of the DateLastModified element as requested.
This time there were no warnings in the logs. Please review the documents:
http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py
When you let me know that they are correct I will generate the Organization documents on Bach.
BZDATETIME::2009-09-10 10:06:40
BZCOMMENTOR::Bob Kline
BZCOMMENT::88
[More offline discussion of syndrome mapping:]
Hi Margaret,
· Sorry, I missed the part about MEN1 and MEN2.
· I think that during the meeting I had with Doug and Ning, we considered creating a “familial non-Hodgkin lymphoma term” for this mapping. The term would probably best be placed outside the adult and childhood subhierarchies.
· I’ll fix BRCA1 and BRCA2 in the table.
I’m not sure what you mean by changing them to genetic syndromes. Do you mean changing the preferred name of the term? Perhaps it would be better (as with the familial NHL term) to create familial terms for chordoma, ataxia telangiectasia, and carcinoid syndrome.
As far as the cancer site spreadsheet goes, I think we should probably have a teleconference or meeting so that we can go over the spreadsheet and agree on which changes are necessary and which are not. I will mark up the spreadsheet with our comments (the ones I sent in the email below and your responses to that) so that we’ll have somewhere to start. If this needs to be done ASAP, we certainly will get it done. Could you give me an idea of a deadline? I can reprioritize my work, but it means I can’t put my effort into getting the backlog of InScope trials cleared.
Thanks,
Mary
From: Beckwith, Margaret (NIH/NCI) [E] mbeckwit@icic.nci.nih.gov
Sent: Thursday, September 10, 2009 9:43 AM
To: Barnstead, Mary E
Cc: Grama, Lakshmi (NIH/NCI) [E]; Kline, Robert (NCI); Beebe, Deborah P;
Osei-Poku, William; Robinson, Douglas H
Subject: RE: diagnosis/condition mapping for Genetics Directory
This looks better Mary, but I still have a few comments/questions:
Multiple endocrine neoplasia needs to be split into MEN1 and MEN2
For cases where there are two CDR IDs we need to choose 1 (for example, child vs adult NHL, child vs adult Hodgkin, and the 2 choices for renal cancer, familial)
It looks like BRCA1 mutation carrier and BRCA2 mutation carrier might be switched
For all of the ones that aren’t marked as genetic syndromes, I believe we need to change them so that they are (for example, NHL, Hodgkin, ataxia telangiectasia)
In terms of the cancer site spreadsheet, I would like to have a discussion about how long you think it will take to make all of the changes and include the menu information. We will not be able to publish the genetics directory data from the CDR until this is all completed.
Thanks,
Margaret
BZDATETIME::2009-09-10 10:59:45
BZCOMMENTOR::Bob Kline
BZCOMMENT::89
(In reply to comment #45)
In preparation for this afternoon's status meeting, here's where we stand on the checklist below:
> Probably not a complete list, but some of the things I still need include:
>
> [ ] org mappings (see comment #37 & attachment)
I have the org mapping sheet which was posted on September 2 (with comment #61) and the frozen snapshot of the genprof database. I am waiting for William to review the latest batch of test Organization documents generated on Mahler, and for him to give me the green light to generate the documents on Bach. After that's done, I will use the final DB, the new org documents on Bach, and the September 2 mapping sheet to generate a new org mapping sheet. That sheet will have org mappings filled in from the September 2 sheet (and from the newly created Organization documents) for some of the locations. For locations which were added to the database more recently (and which were therefore not in the September 2 sheet), or for which the location information was changed (either in the genprof database or by tampering with columns of the sheet other than the "Map To" column) there will be gaps in the mapping which CIAT will need to fill in.
> [X] person mappings (see comment #44 & attachment)
William posted that on September 3.
> [X] locations to be dropped (see last paragraph of comment #44)
That information was included in the sheet for person mappings posted on September 3.
> [ ] syndrome mappings (comment #41 and attachment)
We received a mapping sheet from Mary this morning, but there are problems with it which still need to be resolved. See comments #86 and #88 above.
> [X] answer from William on magic numbers for ContactBy
William answered this off-line: '1' means "electronic mail"; '2' means "postal mail."
> [ ] answer from Margaret on third value for CertificationStatus
This one's still outstanding.
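The ContactBy answer above translates directly into a small lookup. This is only a sketch; the function name and the decision to reject unknown codes (rather than default them) are assumptions.

```python
# Codes as reported by William: '1' is electronic mail, '2' is postal mail.
CONTACT_BY = {"1": "electronic mail", "2": "postal mail"}


def decode_contact_by(raw):
    """Translate a genprof ContactBy code into its meaning."""
    code = str(raw).strip()
    if code not in CONTACT_BY:
        # Fail loudly so unexpected values surface during a test conversion.
        raise ValueError(f"unrecognized ContactBy code: {raw!r}")
    return CONTACT_BY[code]


print(decode_contact_by("1"))
print(decode_contact_by(2))
```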
BZDATETIME::2009-09-10 11:08:49
BZCOMMENTOR::Bob Kline
BZCOMMENT::90
[Sent offline by Mary with the following message:]
===============================================================================
Update: I’ve created the MEN terms and fixed the entries for BRCA1 and 2. I’m attaching the current version. The ones still in question are highlighted in yellow. Doug agrees that familial terms would be best to distinguish the genetic conditions from the body site cancers.
Thanks,
Mary
===============================================================================
Mary:
Any possibility we could keep the discussion in the tracking issue by posting comments here?
Thanks!
Bob Kline
Attachment gp-syndromes082109MBMEB.xls has been added with description: Replacement syndrome mapping sheet
BZDATETIME::2009-09-10 11:09:39
BZCOMMENTOR::Bob Kline
BZCOMMENT::91
[Another offline comment:]
Thanks Mary. Are you going to create a familial Hodgkin term as well? And we need to choose 1 for the renal cancer, familial term—right now there are two listed.
Margaret
BZDATETIME::2009-09-10 11:49:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::92
(In reply to comment #87)
> (In reply to comment #80)
> > Normally, when we enter a web site, we include the following standard wording
> > "Web site for" and then the EXACT NAME OF INSTITUTION as it is in the official
> > name. ....
> >
> > Also, Can you include the DateLastModified element in each of the newly created
> > organizations and add the date the document was created as the default date?
> I have deleted the previous set of Org documents and generated a new set with:
> 1. additional code to skip over Canadian organizations;
> 2. a fix in the spreadsheet for Emory's bad zip code
> 3. the requested modification to the WebSite element's text content
> 4. the addition of the DateLastModified element as requested.
> This time there were no warnings in the logs. Please review the documents:
> http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py
> When you let me know that they are correct I will generate the Organization
> documents on Bach.
They all look good. I did not see the text that precedes the name of the organization in the website field ("Web site for"). However, there are not too many web sites and we can fix these manually when we QA the documents on Bach and we need to standardize some of the addresses anyway.
BZDATETIME::2009-09-10 13:26:00
BZCOMMENTOR::Bob Kline
BZCOMMENT::93
(In reply to comment #89)
>
> > [ ] answer from Margaret on third value for CertificationStatus
>
> This one's still outstanding.
LG: Use "Not certified or eligible".
BZDATETIME::2009-09-10 15:13:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::94
The new Organization documents are on Bach:
BZDATETIME::2009-09-11 16:51:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::95
(In reply to comment #89)
> > [ ] org mappings (see comment #37 & attachment)
>
> I will use the final DB, the new org documents on Bach, and the
> September 2 mapping sheet to generate a new org mapping sheet.
> That sheet will have org mappings filled in from the September 2
> sheet (and from the newly created Organization documents) for some
> of the locations. For locations which were added to the database
> more recently (and which were therefore not in the September 2
> sheet), or for which the location information was changed (either
> in the genprof database or by tampering with columns of the sheet
> other than the "Map To" column) there will be gaps in the mapping
> which CIAT will need to fill in.
Here's the new sheet. Please fill in the "Map To" column for the rows in red. Do not change any of the data in the columns to the left of that column.
Attachment GPLocations.xls has been added with description: New org mapping spreadsheet
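As a rough illustration of how the new sheet is derived: mappings are carried forward from the September 2 sheet only when the location data is unchanged, "New" entries are replaced by the IDs of the freshly created Organization documents, and everything else is left blank for CIAT. The key names, the row layout, and the sample IDs below are hypothetical.

```python
def build_mapping_sheet(final_locations, old_sheet, new_org_ids):
    """Carry forward earlier mappings; flag the rows CIAT must fill in."""
    old_rows = {row["location_id"]: row for row in old_sheet}
    sheet = []
    for loc in final_locations:
        prior = old_rows.get(loc["location_id"])
        map_to = ""
        # Reuse a mapping only if the location data has not changed.
        if prior is not None and prior["address"] == loc["address"]:
            map_to = prior["map_to"]
            if map_to == "New":
                # Swap the placeholder for the newly created document ID.
                map_to = new_org_ids.get(loc["location_id"], "")
        sheet.append({**loc, "map_to": map_to, "needs_ciat": map_to == ""})
    return sheet


# Illustrative data only.
old = [
    {"location_id": 1, "address": "A", "map_to": "CDR0000000001"},
    {"location_id": 2, "address": "B", "map_to": "CDR0000000002"},
    {"location_id": 3, "address": "C", "map_to": "New"},
]
final = [
    {"location_id": 1, "address": "A"},
    {"location_id": 2, "address": "B (changed)"},
    {"location_id": 3, "address": "C"},
    {"location_id": 4, "address": "D"},
]
for row in build_mapping_sheet(final, old, {3: "CDR0000000003"}):
    print(row)
```

Rows with `needs_ciat` set correspond to the rows shown in red on the sheet.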
BZDATETIME::2009-09-16 12:40:17
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::96
I have attached the completed sheet. There were a few new ones that need to be created. We added 'New' in the 'Map To' Column for those. Please, let me know if you have any questions.
Attachment GPLocations(1).xls has been added with description: Gen. Prof. Location Mapping
BZDATETIME::2009-09-28 10:10:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::97
I have generated test documents for the new organizations based on the final locations mapping spreadsheet. Please review the documents and if you see no problems I'll load them onto Bach.
BZDATETIME::2009-09-28 11:31:11
BZCOMMENTOR::Bob Kline
BZCOMMENT::98
(In reply to comment #53)
> A new requirement was identified in yesterday's status meeting, to identify
> which location blocks are associated with the person's activity as genetics
> professional. Several approaches to meeting this requirement were discussed:
>
> 1. Add a repeating element to the GeneticsProfessionalDetails block
> containing the fragment ID for a location block which is associated
> with the person as a GP (similar to the CIPSContact element).
>
> 2. Same as #1, but using an attribute (with space-delimited fragment
> ID values to identify the associated location blocks) instead of
> a repeating element.
>
> 3. Add a repeating element to the ContactDetail block, each occurrence
> of which would name a context (such as "GP") in which the contact
> block is relevant.
>
> 4. Same as #3, but using an attribute (with space-delimited values)
> to identify the contexts associated with the block.
>
> ......
I don't see any other feedback to this analysis recorded in the issue other than Alan's (in comment #54). I went ahead and added a 'UsedFor' NMTOKENS attribute to the ContactDetail block in the common schema (since Lakshmi had earlier indicated a preference for the fourth option, and she gets more votes than the rest of us :-) ), but that's actually the wrong place for the attribute, since for OtherContactLocation elements, the ContactDetail block is not in the Person document but instead in the linked Organization document. So I think I need to move the attribute to the OtherPracticeLocation and PrivatePractice elements. The only question I'm asking for confirmation on at this point would be whether my decision to omit it from the Home element is correct. I would think so, assuming that the name of the element means what it says, in which case the question "what is this location used for?" is already answered (in other words, CIAT would never use a Home element for a location used for anything other than domicile). Right?
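To make the fourth option concrete, here is a small sketch of reading and updating a space-delimited 'UsedFor' attribute on the location elements. The surrounding document structure in the sample is an assumption, and "GP" is just the example context named in the earlier comment.

```python
import xml.etree.ElementTree as ET


def get_used_for(location):
    """Return the contexts listed in the NMTOKENS-style UsedFor attribute."""
    return location.get("UsedFor", "").split()


def add_used_for(location, context):
    """Add a context token without duplicating an existing one."""
    tokens = get_used_for(location)
    if context not in tokens:
        tokens.append(context)
    location.set("UsedFor", " ".join(tokens))


# Hypothetical fragment of a Person document.
person = ET.fromstring(
    "<Person><PersonLocations>"
    "<OtherPracticeLocation/><PrivatePractice UsedFor='X'/>"
    "</PersonLocations></Person>"
)
for tag in ("OtherPracticeLocation", "PrivatePractice"):
    for node in person.iter(tag):
        add_used_for(node, "GP")
print(ET.tostring(person, encoding="unicode"))
```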
BZDATETIME::2009-09-28 14:22:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::99
(In reply to comment #74)
> Created an attachment (id=1773) [details]
> Gen. Prof. Person Mapping
>
> I am attaching the Person Document Mapping.
William:
Could you go back through the mappings in this spreadsheet to make sure whoever was putting them in wasn't confused about which fragment to use? I tried running a test and it choked on the mapping for the main location of the GP with genprof ID of 64 to CDR20923 with fragment ID of 'F1': the software couldn't find any location with that fragment ID. I looked at the document and saw that there was only one location block, with a fragment ID of 'F2', but inside that block there was a fragment link to an Organization document with fragment ID of 'F1'. So it looks like the person doing the mapping picked up the fragment ID for a piece of the Organization document, not the Person document. I just wanted to make sure there weren't other mistakes like this (which might not be detected by the conversion software if the Person document has other location blocks and one of the other locations, but not the one that matches the GP location, has a fragment ID picked up from the Organization link).
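A sketch of the kind of check that catches this mistake: collect fragment IDs only from the Person document's own location blocks and ignore any fragment that appears inside an Organization link. The namespace URI, the `cdr:id` attribute name, and the sample document are assumptions for the example.

```python
import xml.etree.ElementTree as ET

CDR_NS = "cips.nci.nih.gov/cdr"  # assumed namespace URI
LOCATION_TAGS = ("Home", "PrivatePractice", "OtherPracticeLocation")


def location_fragment_ids(person_root):
    """Fragment IDs carried by the Person document's own location blocks."""
    ids = set()
    for tag in LOCATION_TAGS:
        for node in person_root.iter(tag):
            fragment = node.get(f"{{{CDR_NS}}}id")
            if fragment:
                ids.add(fragment)
    return ids


def mapping_is_valid(person_root, fragment_id):
    """True only if the mapped fragment names a location block."""
    return fragment_id in location_fragment_ids(person_root)


sample = f"""<Person xmlns:cdr="{CDR_NS}">
  <PersonLocations>
    <OtherPracticeLocation cdr:id="F2">
      <OrganizationLocation cdr:ref="CDR0000012345#F1"/>
    </OtherPracticeLocation>
  </PersonLocations>
</Person>"""
root = ET.fromstring(sample)
print(mapping_is_valid(root, "F1"), mapping_is_valid(root, "F2"))
```

Note that this check alone would miss the case described in the parenthetical above, where a wrongly chosen fragment ID happens to match some other location block in the same Person document.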
BZDATETIME::2009-09-28 17:50:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::100
(In reply to comment #34)
> We agreed that we will store all of the contact information in the person
> document, except that the identification of the institution will be a link to
> an actual Organization document (but not to a specific location fragment).
I just realized that the schema doesn't allow us to do this. It requires that a fragment ID must be entered for the OrganizationLocation link (which makes sense, since without the fragment ID calling the element "OrganizationLocation" is wrong).
What's your preference, Lakshmi?
1. Back out this decision and modify the conversion software to come up with
location blocks to link to in the Org docs?
2. Modify the Person schema to change <OrganizationLocation
cdr:ref='CDR0000012345#xxx'>... to <OrganizationLink
cdr:ref='CDR0000012345'>... (would require a global change)?
3. Change the schema to allow links without fragment IDs and ignore the
mismatch between the name of the element and its semantics?
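The difference between the options comes down to whether the fragment part of the link is mandatory. A minimal sketch, assuming link values have the form shown above (a CDR document ID with an optional '#fragment' suffix); the function names are hypothetical.

```python
import re

# 'CDR' plus ten digits, optionally followed by '#' and a fragment ID.
LINK_PATTERN = re.compile(r"^(CDR\d{10})(?:#(\S+))?$")


def split_link(ref):
    """Split a cdr:ref value into (document ID, fragment ID or None)."""
    match = LINK_PATTERN.match(ref)
    if not match:
        raise ValueError(f"malformed link: {ref!r}")
    return match.group(1), match.group(2)


def link_is_acceptable(ref, require_fragment):
    """Current rule: require_fragment=True. Option 3: require_fragment=False."""
    _, fragment = split_link(ref)
    return fragment is not None or not require_fragment


print(split_link("CDR0000012345#xxx"))
print(link_is_acceptable("CDR0000012345", require_fragment=True))
```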
BZDATETIME::2009-09-30 11:27:47
BZCOMMENTOR::Bob Kline
BZCOMMENT::101
I need feedback for comments 97, 98, 99, and 100 before I can proceed further with this task.
BZDATETIME::2009-09-30 14:33:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::102
Checklist of next steps for this task:
[ ] CIAT: Review new org docs on Mahler
[ ] BOB: Create new org docs on Bach
[ ] CIAT: Review new org docs on Bach
[ ] CIAT: Check GP mapping sheet for fragment ID mistakes (comment #99)
[ ] LAKSHMI, CIAT: Answer 'UsedFor' questions (comments #53, #54, #98)
[ ] LAKSHMI: Org link question(s) (comment #100)
[ ] BOB: Run test conversion on Mahler
[ ] CIAT: Review test conversion results on Mahler
[ ] VOLKER: Refresh Franck
[ ] BOB: Install schema changes on Franck
[ ] BOB: Run test conversion on Franck
[ ] CIAT: Review test conversion results on Franck
[ ] VOLKER: Install publication changes on Franck
[ ] VOLKER: Test publication changes on Franck
[ ] CIAT: Review results of publication test on Franck
[ ] BOB: Install schema changes on Bach
[ ] BOB: Run final conversion on Bach
[ ] CIAT, LAKSHMI: Review final conversion
[ ] VOLKER: Install publication changes on Bach
[ ] CIAT, LAKSHMI: Check results of publication of GP docs
Today was our deadline (though it was only an internal deadline, and was overtaken by issue #4364) for completing all of this! :-)
BZDATETIME::2009-09-30 16:01:11
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::103
I have attached the mapping. In all we made about 4 or 5 changes. Thank You!
Attachment GPMatchCandidates.xls has been added with description: GenProfPerson mapping
BZDATETIME::2009-10-01 08:34:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::104
(In reply to comment #103)
> I have attached the mapping. In all we made about 4 or 5 changes.
Thanks, the problem I found is gone. I can check off one of the steps:
[ ] CIAT: Review new org docs on Mahler
[ ] BOB: Create new org docs on Bach
[ ] CIAT: Review new org docs on Bach
[X] CIAT: Check GP mapping sheet for fragment ID mistakes (comment #99)
[ ] LAKSHMI, CIAT: Answer 'UsedFor' questions (comments #53, #54, #98)
[ ] LAKSHMI: Org link question(s) (comment #100)
[ ] BOB: Run test conversion on Mahler
[ ] CIAT: Review test conversion results on Mahler
[ ] VOLKER: Refresh Franck
[ ] BOB: Install schema changes on Franck
[ ] BOB: Run test conversion on Franck
[ ] CIAT: Review test conversion results on Franck
[ ] VOLKER: Install publication changes on Franck
[ ] VOLKER: Test publication changes on Franck
[ ] CIAT: Review results of publication test on Franck
[ ] BOB: Install schema changes on Bach
[ ] BOB: Run final conversion on Bach
[ ] CIAT, LAKSHMI: Review final conversion
[ ] VOLKER: Install publication changes on Bach
[ ] CIAT, LAKSHMI: Check results of publication of GP docs
BZDATETIME::2009-10-01 11:13:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::105
(In reply to comment #97)
> I have generated test documents for the new organizations based on the final
> locations mapping spreadsheet. Please review the documents and if you see no
> problems I'll load them onto Bach.
>
> http://mahler.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py
I reviewed the documents and they all look good. Thanks!
BZDATETIME::2009-10-01 16:15:32
BZCOMMENTOR::Volker Englisch
BZCOMMENT::106
[X] VOLKER: Refresh Franck
BZDATETIME::2009-10-01 17:22:48
BZCOMMENTOR::Bob Kline
BZCOMMENT::107
[X] BOB: Create new org docs on Bach
[ ] CIAT: Review new org docs on Bach
Ready for CIAT review (http://bach.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py). That interface lists all the org docs created (both sets). If that's cumbersome to deal with for this review step I can modify the report to just show the ones created by this step. The advantage of doing it this way is that if any orgs got added twice it would show up on the list in an obvious way.
BZDATETIME::2009-10-05 16:51:19
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::108
(In reply to comment #107)
> [X] BOB: Create new org docs on Bach
> [ ] CIAT: Review new org docs on Bach
>
> Ready for CIAT review (http://bach.nci.nih.gov/cgi-bin/cdr/ShowGPOrgDocs.py).
> That interface lists all the org docs created (both sets). If that's
> cumbersome to deal with for this review step I can modify the report to just
> show the ones created by this step. The advantage of doing it this way is that
> if any orgs got added twice it would show up on the list in an obvious way.
Reviewed orgs. They look good. Thanks!
BZDATETIME::2009-10-05 22:35:36
BZCOMMENTOR::Lakshmi Grama
BZCOMMENT::109
(In reply to comment #100)
> (In reply to comment #34)
>
> > We agreed that we will store all of the contact information in the person
> > document, except that the identification of the institution will be a link to
> > an actual Organization document (but not to a specific location fragment).
>
> I just realized that the schema doesn't allow us to do this. It requires that
> a fragment ID must be entered for the OrganizationLocation link (which makes
> sense, since without the fragment ID calling the element "OrganizationLocation"
> is wrong).
>
> What's your preference, Lakshmi?
>
> 1. Back out this decision and modify the conversion software to come up with
> location blocks to link to in the Org docs?
About this option, I recall that this involved a lot of data cleanup work for CIAT and it did not seem to be worth the effort.
> 2. Modify the Person schema to change <OrganizationLocation
> cdr:ref='CDR0000012345#xxx'>... to <OrganizationLink
> cdr:ref='CDR0000012345'>... (would require a global change)?
I don't want to change this for all persons; it would cause all kinds of problems for other records where we use this information.
> 3. Change the schema to allow links without fragment IDs and ignore the
> mismatch between the name of the element and its semantics?
This seems more attractive: have the schema allow either a fragment link or an organization link, more along the lines of what we had initially thought of (see the comment in the person schema that seems to indicate we did think about this). Can we add a custom rule that if there is an organization link (rather than a fragment link) then there must be a specific address block?
BZDATETIME::2009-10-06 09:05:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::110
(In reply to comment #109)
> > 3. Change the schema to allow links without fragment IDs and ignore the
> > mismatch between the name of the element and its semantics?
> This seems more attractive - to have the schema allow either a fragment link or
> an organization link - more along the lines of what we had initially thought of
> (see comment in the person schema that seems to indicate we did think about
> this).
Right. Of course, that comment was written at a time when the name of the Element in question was 'Organization' (not 'OrganizationLocation'); the hesitation at this point for the third option was not that flexibility for what's allowed in the attribute would be a bad thing (we do that elsewhere in the schemas), but rather that the change of the element name to 'OrganizationLocation' constrains us to link to a specific location, and not just to an Organization document, unless we're willing to ignore the confusion that the mismatch between the name of the element and the information it carries will create.
> Can we add a custom rule that if there is an organization link (rather
> than fragment) that there is a specific address block?
Yes, that's possible.
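For illustration, the custom rule discussed here could look roughly like the sketch below. It is not the CDR's actual validation code: the namespace URI and the `SpecificPostalAddress` element name are assumptions, and the function works on a single, already-parsed location block.

```python
import xml.etree.ElementTree as ET

CDR_NS = "cips.nci.nih.gov/cdr"   # assumed namespace URI for the cdr: prefix
CDR_REF = "{%s}ref" % CDR_NS


def split_ref(ref):
    """Split 'CDR0000012345#F1' into ('CDR0000012345', 'F1')."""
    doc_id, _, fragment = ref.partition("#")
    return doc_id, fragment or None


def location_errors(location, address_tag="SpecificPostalAddress"):
    """Apply the rule: a link without a fragment ID needs its own address."""
    errors = []
    link = location.find("OrganizationLocation")
    if link is None:
        return errors
    doc_id, fragment = split_ref(link.get(CDR_REF, ""))
    # address_tag is a hypothetical name for the "specific address block".
    if fragment is None and location.find(address_tag) is None:
        errors.append("link to %s has no fragment ID and no %s block"
                      % (doc_id or "organization", address_tag))
    return errors
```

Links that do carry a fragment ID pass unchanged, so existing Person documents would not be affected by a rule of this shape.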
BZDATETIME::2009-10-15 14:20:28
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::111
Update on terminology work: Mary has put in the menu cancer type and cancer site information into the term records and is in the process of adding the related links. These will be completed early next week, hopefully on Monday.
BZDATETIME::2009-10-20 14:32:33
BZCOMMENTOR::Bob Kline
BZCOMMENT::112
(In reply to comment #111)
> Update on terminology work: Mary has put in the menu cancer type and cancer
> site information into the term records and is in the process of adding the
> related links. These will be completed early next week, hopefully on Monday.
How's this coming?
BZDATETIME::2009-10-21 08:20:42
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::113
Mary has completed putting in the Related Term links so everything should be finished. I am going to put in a request for a report to show the linkages between Syndrome/Cancer Type/Cancer Site menu information to check everything, but we should go ahead with the next step toward conversion.
BZDATETIME::2009-10-21 08:34:41
BZCOMMENTOR::Bob Kline
BZCOMMENT::114
(In reply to comment #113)
> ... but we should go ahead with the next step toward conversion.
Volker:
Please run another refresh of Franck.
BZDATETIME::2009-10-21 10:24:20
BZCOMMENTOR::Volker Englisch
BZCOMMENT::115
(In reply to comment #114)
> Please run another refresh of Franck.
Completed using the backup file from BACH from last night.
BZDATETIME::2009-10-21 15:04:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::116
Am I right in assuming that since we don't publish Person documents anymore it doesn't make any difference whether the conversion program creates publishable versions of the documents?
BZDATETIME::2009-10-21 15:17:11
BZCOMMENTOR::Volker Englisch
BZCOMMENT::117
We do publish Person documents. We just don't send them to Cancer.gov.
BZDATETIME::2009-10-22 14:39:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::118
A conversion test has been run on Franck. The software created 598 new Person documents and selected another 97 existing Person documents for merge of GP information. Logs from the test are attached.
Attachment Request4522-FranckTest.log has been added with description: Logs from test conversion on Franck
BZDATETIME::2009-10-22 14:57:50
BZCOMMENTOR::Bob Kline
BZCOMMENT::119
I see that there is a problem with some of the links to newly created organizations, which appear to have the wrong document IDs. I'll investigate further.
BZDATETIME::2009-10-22 16:31:55
BZCOMMENTOR::Bob Kline
BZCOMMENT::120
(In reply to comment #119)
> I see that there is a problem with some of the links to newly created
> organizations, which appear to have the wrong document IDs. I'll investigate
> further.
I found and have corrected the problem. I'll have Volker do another refresh of Franck and then run the conversion job again.
BZDATETIME::2009-10-22 16:56:24
BZCOMMENTOR::Volker Englisch
BZCOMMENT::121
(In reply to comment #120)
> I found and have corrected the problem. I'll have Volker do another refresh of
> Franck and then run the conversion job again.
The CDR on FRANCK has been refreshed using last night's backup copy from BACH.
BZDATETIME::2009-10-22 17:50:07
BZCOMMENTOR::Bob Kline
BZCOMMENT::122
Here are the categories of problems I see in the log file:
1. No document ... OrganizationLocation=CDR0000000000 [21 occurrences]
Didn't find a mapped Org CDR ID for the location in the spreadsheet.
2. Link from Person.OrganizationLocation to document type is illegal [4]
Found a mapped CDR ID but it didn't link to an Org document
3. PoliticalSubunit_State is in different country ... [38]
4. No document ... PoliticalSubunit_State=CDR0000000000 [33]
5. No document ... Country=CDR0000000000 [5]
6. U.S. address must have valid ZIP code [2]
7. Invalid URI value 'www.northshore.org ...' [1]
8. Invalid integer value: '6/99' in element EligibilityYear [1]
9. Pattern constraints not matched for 'Jacob. A. Reiss@kp.org' in element SpecificEmail [1]
Attachment Request4522-FrankTest2.log has been added with description: Second set of test logs from Franck
BZDATETIME::2009-10-22 22:57:56
BZCOMMENTOR::Bob Kline
BZCOMMENT::123
(In reply to comment #122)
> Here are the categories of problems I see in the log file:
>
> 1. No document ... OrganizationLocation=CDR0000000000 [21 occurrences]
>
> Didn't find a mapped Org CDR ID for the location in the spreadsheet.
For example, CDR658044 (converted from GP686, Carrie Heuer), whose location gives "Baptist Women's Health Center" as the organization, with address information matching row 88 on the location mapping spreadsheet; the "Map To" column on that row is empty.
> 2. Link from Person.OrganizationLocation to document type is illegal [4x]
>
> Found a mapped CDR ID but it didn't link to an Org document
For example, CDR658070 (converted from GP450, Melinda W. Fawbush), with a location at Baptist Cancer Institute with address information matching row 81 on the location mapping spreadsheet; the "Map To" column in that row has "CDR287766" but that document is an S&P mailer tracking document, not an Organization document.
> 3. PoliticalSubunit_State is in different country ... [38x]
> 4. No document ... PoliticalSubunit_State=CDR0000000000 [33x]
> 5. No document ... Country=CDR0000000000 [5x]
For an example of these three error messages, see CDR658116 (GP436, Hakan L. Olsson); the GP record has "XX" for the state, and "SN" for the country (the CDR doesn't have any documents matching these values).
> 6. U.S. address must have valid ZIP code [2x]
For example, CDR658246 (GP252, Maureen E. Smith); the genprof database had a blank in the zip4 column, which caused the conversion software to put "38146-" in the PostalCode_ZIP element. Also, CDR658430 (GP74 for Martha A. McGuy) has '02402' which isn't in the table of valid zipcodes.
> 7. Invalid URI value 'www.northshore.org ...' [1x]
See CDR658400 (GP190, Anna C. Newlin); value is missing protocol portion of URL.
> 8. Invalid integer value: '6/99' in element EligibilityYear [1x]
See CDR658401 (GP70, Carin L. Beltz).
> 9. Pattern constraints not matched for 'Jacob. A. Reiss@kp.org' in
> element SpecificEmail [1x]
See CDR658514 (GP406, Jacob A. Reiss); can't have spaces in a valid SMTP address.
These can all be corrected by hand once the conversion has been done on Bach, but the errors will complicate Volker's testing of the publication scripts.
Ready for CIAT review. Please review at least a hefty sampling of the converted documents and the logs to verify that the converted documents match what you expect.
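Several of the data problems listed above (ZIP code, URI, EligibilityYear, email) could in principle be flagged before conversion with simple field checks. The following is a minimal sketch under stated assumptions, not the validation the CDR actually performs: the patterns are illustrative, the year is assumed to be four digits, and `known_zips` stands in for the table of valid ZIP codes.

```python
import re
from urllib.parse import urlsplit

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")            # ZIP or ZIP+4
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # no whitespace allowed
YEAR_RE = re.compile(r"\d{4}")                    # assumed four-digit year


def check_zip(value, known_zips):
    """Well-formed and present in the (stand-in) table of valid ZIP codes."""
    return bool(ZIP_RE.fullmatch(value)) and value[:5] in known_zips


def check_uri(value):
    """Require an explicit http/https scheme and a host."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_year(value):
    """Reject values such as '6/99' which are not plain integers."""
    return bool(YEAR_RE.fullmatch(value))


def check_email(value):
    """Reject addresses containing spaces or lacking a domain part."""
    return bool(EMAIL_RE.fullmatch(value))
```

Running checks like these against the genprof snapshot would give CIAT a list of values to correct by hand, independent of the conversion logs.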
BZDATETIME::2009-10-23 08:15:14
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::124
(In reply to comment #123)
> (In reply to comment #122)
>
> > Here are the categories of problems I see in the log file:
> >
For the errors that can be fixed on the spreadsheet, should I update the spreadsheet and post the updated one?
BZDATETIME::2009-10-23 08:48:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::125
(In reply to comment #124)
> (In reply to comment #123)
> > (In reply to comment #122)
> >
> > > Here are the categories of problems I see in the log file:
> > >
> For the errors that can be fixed on the spreadsheet, should I update the
> spreadsheet and post the updated one?
Possibly, though I'd be more inclined to just defer fixing the problems until the documents are on Bach. That way we'll avoid the possibility of introducing new problems into the spreadsheet while we're fixing old ones, and we'll keep progress toward completion by avoiding another round of testing. My preference at this stage would be for CIAT to invest its time and energy into reviewing the results, comparing the documents with what's in the old system, and looking for problems which haven't already been identified. I'll leave the decision up to Margaret (who filed the issue) or Lakshmi (the QA contact).
BZDATETIME::2009-10-23 08:52:20
BZCOMMENTOR::Margaret Beckwith
BZCOMMENT::126
How many and what types of problems are there? If not too many, I agree with Bob that it makes more sense to just fix them on Bach once we have converted. This is taking longer than we intended and we still have to do the mailers. Lakshmi, what do you think?
BZDATETIME::2009-10-23 08:57:41
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::127
(In reply to comment #125)
> (In reply to comment #124)
> > (In reply to comment #123)
> > > (In reply to comment #122)
> > >
> > > > Here are the categories of problems I see in the log file:
> > > >
> > For the errors that can be fixed on the spreadsheet, should I update the
> > spreadsheet and post the updated one?
>
> Possibly, though I'd be more inclined to just defer fixing the problems until
> the documents are on Bach. That way we'll avoid the possibility of introducing
> new problems into the spreadsheet while we're fixing old ones, and we'll keep
> progress toward completion by avoiding another round of testing. My preference
> at this stage would be for CIAT to invest its time and energy into reviewing
> the results, comparing the documents with what's in the old system, and looking
> for problems which haven't already been identified. I'll leave the decision up
> to Margaret (who filed the issue) or Lakshmi (the QA contact).
OK. Thanks!
By "comparing the documents with what's in the old system", I am assuming you mean to use the spreadsheet you generated, since we do not have the old system up and running?
BZDATETIME::2009-10-23 09:00:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::128
(In reply to comment #126)
> How many and what types of problems are there?
See comments #122 and #123 for descriptions, examples, and counts. The problems with the highest frequency are caused by bogus values in the database, and thus can't be fixed in the spreadsheet. The next-most-frequent problem, with 21 occurrences, was caused by the fact that CIAT left some blank cells in the spreadsheet's "Map To" column.
BZDATETIME::2009-10-23 09:05:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::129
(In reply to comment #127)
> By "comparing the documents with what's in the old system", I am assuming you
> mean to use the spreadsheet you generated, since we do not have the old system
> up and running?
If you don't have a read-only version of the system available, and you have no way of viewing the data directly from the database using SQL, then you would probably want to compare the data that's on Cancer.gov against the new and newly modified Person documents on Franck. You'll also have an opportunity to review carefully the results of the publishing test which Volker will run (once the review of the documents themselves has been completed).
If you need me to, I can create specific SQL queries in the ad-hoc query interface on Mahler which will give you read-only access to the values in the final snapshot of the genprof database from which the conversion was performed.
BZDATETIME::2009-10-23 09:14:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::130
I found some additional logging information in a separate log file (I wondered where it had gone). Doesn't contain any errors we didn't already know about. It has the report of the five MiddleInitials values we knew we'd have to correct manually after conversion, and additional information about the location values which couldn't be found in the spreadsheet for the errors already reported in the log file posted previously. I'll modify the conversion script so that all the logging information goes into a single log file.
Attachment Request4522-extra.log has been added with description: Extra logging information from test run
BZDATETIME::2009-10-27 11:27:48
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::131
I have extracted the errors from the logs and attached them here. Since we are not going to fix these errors until after conversion on Bach, this will help us fix all the conversion errors, including the ones we identify while we review the data on Franck.
Most of the errors in the logs were the result of the conversion of foreign sites. In the Gen Prof. database, the state field for all foreign sites was entered as 'XX' and all of these returned PoliticalSubUnit_State errors in the logs. There were also some mapping problems, including typos in CDR IDs, empty fields (for the Map Tos), and wrong CDR IDs. These should not be difficult to fix.
The other set of errors came from documents that we should have identified as “Not to convert”. For example, there were test documents in the Gen Prof. database, such as the records for Aspen Systems, Sheri Khanna, and Elaine Shaya. All these documents may need to be blocked and deleted from Bach after conversion.
Meanwhile, we have started reviewing the data on Franck. I will post any problems or errors we find.
Attachment ERROR LOGS FOR CIAT.txt has been added with description: ERROR LOGS FOR CIAT TO FIX
BZDATETIME::2009-10-27 11:40:17
BZCOMMENTOR::Bob Kline
BZCOMMENT::132
(In reply to comment #131)
> The other set of errors were documents that we should have identified as “Not
> to convert”. For example there were test documents in the Gen Prof. Database
> such as records for Aspen Systems, Sheri Khanna, and Elaine Shaya etc. which
> were all test documents in the Gen Prof database. All these documents may need
> to be blocked and deleted from Bach after conversion.
If you want to provide me with a list of the GP primary IDs for records which should not be converted, I can modify the conversion software to skip them.
BZDATETIME::2009-10-27 15:21:21
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::133
(In reply to comment #132)
> (In reply to comment #131)
>
> > The other set of errors were documents that we should have identified as “Not
> > to convert”. For example there were test documents in the Gen Prof. Database
> > such as records for Aspen Systems, Sheri Khanna, and Elaine Shaya etc. which
> > were all test documents in the Gen Prof database. All these documents may need
> > to be blocked and deleted from Bach after conversion.
>
> If you want to provide me with a list of the GP primary IDs for records which
> should not be converted, I can modify the conversion software to skip them.
Here is the list of the GP Person Documents
CDR0000658284
CDR0000658301
CDR0000658318
CDR0000658451
The following organization documents were in the records above although not linked because we did not provide a mapping for them and also we did not mark them as "New" to be created by the program.
Organization Documents
Practice Address 3
Practice Address 2
Practice Address 1
SLK
Aspen Systems
Aspen Systems Corp.
Also, we are going to need the Person QC report modified to display the new GP information added to the schema. We will also need a publish preview for the GP documents as well. When these are approved, I will enter them as new issues.
BZDATETIME::2009-10-30 10:28:26
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::134
(In reply to comment #133)
> (In reply to comment #132)
>
> Here is the list of the GP Person Documents
>
> CDR0000658284
> CDR0000658301
> CDR0000658318
> CDR0000658451
>
Here are the GP Primary IDs for the four records:
399
400
410
401
BZDATETIME::2009-10-30 14:06:45
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::135
Here are some of the issues we have come across so far, apart from all the data problems that we need to fix manually on Bach.
1. Professional suffixes
It looks like all the suffixes were converted into the <CustomProfessionalSuffix> element. However, some of them are standard professional suffixes and it would be good to have them converted into the <StandardProfessionalSuffix> element. For example, GP 436, 450, 436 have MSN, MD, PhD, Bsc as suffixes that have been converted into the <CustomProfessionalSuffix> element. They also have suffixes that should remain in the custom professional suffix elements.
2. URLs
The URLs for the locations were converted into the <SpecificWebSite> element and also entered in the cdr:xref attribute. Traditionally we only enter “Web site for [Name of Organization]” in the <SpecificWebSite> element and only enter the URL in the attribute. For example: GP 450 & 436
3. Foreign Phone numbers
It looks like all telephone numbers have been assigned the public = yes attribute. However, on the web site, we are seeing that foreign phone numbers are not displayed, so it would be good to have their attributes set to public = no. For example: GP 294 & 315
4. Private Locations
Some of the locations are private but they have been converted into the organization element. For example, GP 618 has “Office of Dr. Elizabeth A. Poynor” added as an organization location. Is this okay? It looks like there are not many like this, though, so we could fix this manually if we had a Private Location element in the GP schema.
BZDATETIME::2009-10-30 17:30:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::136
(In reply to comment #135)
> 1. Professional suffixes
> It looks like all the suffixes were converted into the
> <CustomProfessionalSuffix> element. ....
I have modified the conversion software to strip the periods from the values so that they'll match the valid values in the schema.
> 2. URLs ....
That's been fixed.
> 3. Foreign Phone numbers
> It looks like all telephone numbers have been assigned the public = yes
> attribute. However, on the web site, we are seeing that foreign phone numbers
> are not displayed so it would be good to have their attributes set to public =
> no. ....
There really isn't a Public='Yes' attribute allowed, so I assume you just meant that the Public='No' wasn't present. I have modified the software so that if the country in the incoming location information is not "US" then the Public='No' attribute is added to the SpecificPhone element.
> 4. Private Locations
> Some of the locations are private but they have been converted into the
> organization element for example GP 618 has “Office of Dr. Elizabeth A.
> Poynor” added as an organization location. Is this Okay? It looks like there
> are not many like this though so we can fix this manually if we had a Private
> Location element in the GP schema.
That's what you'll need to do. The input database puts that value in the institution column, with no distinction between private-practice locations and locations at institutions.
When you're ready for the next round of testing, please let Volker know so he can perform another refresh of Franck.
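The suffix and phone fixes described in this comment can be sketched as below. This is a hedged illustration rather than the conversion software itself: `STANDARD_SUFFIXES` is a small stand-in for the valid values list in the schema, and keeping the original text for custom suffixes is an assumption.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; the real list lives in the Person schema.
STANDARD_SUFFIXES = {"MD", "PhD", "MSN"}


def suffix_element(raw):
    """Strip periods so values such as 'M.D.' match the schema's list."""
    value = raw.replace(".", "").strip()
    if value in STANDARD_SUFFIXES:
        elem = ET.Element("StandardProfessionalSuffix")
        elem.text = value
    else:
        # Assumption: custom suffixes keep the text as originally entered.
        elem = ET.Element("CustomProfessionalSuffix")
        elem.text = raw.strip()
    return elem


def phone_element(number, country):
    """Add Public='No' when the incoming location's country is not 'US'."""
    elem = ET.Element("SpecificPhone")
    elem.text = number
    if country != "US":
        elem.set("Public", "No")
    return elem
```

Because no Public='Yes' value is allowed, the attribute is simply left off for US numbers.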
BZDATETIME::2009-11-03 14:41:44
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::137
We are ready for the next round of testing on Franck. I think Franck can be refreshed at this point.
BZDATETIME::2009-11-03 16:41:33
BZCOMMENTOR::Volker Englisch
BZCOMMENT::138
FRANCK has been refreshed.
BZDATETIME::2009-11-03 17:41:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::139
Ready for CIAT review of test results.
Note to myself: Schemas to install on Bach when the conversion is run in production:
CdrCommonBase.xml
PersonSchema.xml
GlossaryTermConcept.xml
Attachment gp-conv-20091103-franck.log has been added with description: Logs from third test run on Franck
BZDATETIME::2009-11-09 08:58:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::140
When I originally asked Sheri about why CIAT stored GP addresses in two different tables, and why there was often duplication between the addresses in those tables, she said nobody knew. After reviewing the printout from the old electronic mailer example I think I now know the answer. My guess is that the address stored in the tblMain table is used just for contacting the person annually for review of the information, and all of the other addresses (whether or not they duplicate the first address) are published in the directory. Assuming this is right, I would think we would need to modify the conversion logic so that we preserve this distinction. We already have an attribute ('UsedFor') which identifies which address blocks are to be picked up for publication in the GP directory. What we need is the GP equivalent of the CIPSContact element for identifying the address to which we should send the GP mailers. That address may or may not also be flagged for publication in the online directory with the UsedFor='GP' attribute. I'm further assuming that it's possible (even likely) that at least some GPs may want us to use a different address for contacting them as a genetics specialist than we should use for their role as (for example) a board member or manager (which would imply that we couldn't press CIPSContact into double duty here). Possible approaches:
1. Add an attribute to CIPSContact to distinguish uses
2. Add another value to the UsedFor valid values list
3. Add another element parallel to CIPSContact
4. Add another attribute parallel to UsedFor
The first option has the serious drawback that we'd have to modify a lot of software that uses CIPSContact. The second option is reasonable (using, for example, the value 'GPMailer'); it does speed up the timetable for implementing validation of NMTOKENS but we'd have to do that sooner or later. The other two are also reasonable, though not quite as clean as #2.
So here are my questions:
1. Does the analysis of my reverse engineering of the table semantics seem plausible enough to act on?
2. If so, am I right in thinking it would be unacceptable to force all mailers for a person with multiple roles to use the same address?
3. If so, which of the options above should we use?
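To make the second approach concrete, here is a rough sketch of how a multi-valued UsedFor attribute might be read and validated. It assumes the attribute holds whitespace-separated tokens (NMTOKENS), that 'GPMailer' is the new value, and that any element may carry the attribute; none of this is confirmed by the schema as it stands.

```python
import xml.etree.ElementTree as ET

# 'GP' is the existing value; 'GPMailer' is the example value proposed above.
VALID_USES = {"GP", "GPMailer"}


def uses(elem):
    """Return the set of tokens in an element's UsedFor attribute."""
    return set(elem.get("UsedFor", "").split())


def invalid_tokens(elem):
    """Tokens which are not in the valid values list."""
    return uses(elem) - VALID_USES


def blocks_for(root, use):
    """All elements flagged for the given use, in document order."""
    return [elem for elem in root.iter() if use in uses(elem)]
```

With this shape, one address block can be flagged for both publication and mailers (UsedFor='GP GPMailer'), while another is flagged for only one of the two.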
BZDATETIME::2009-11-09 11:11:08
BZCOMMENTOR::Bob Kline
BZCOMMENT::141
Here's a follow-on question to the previous comment. If my reverse engineering of the address semantics is correct, and we should have been separating out addresses which are supposed to be used internally for verification from those addresses which are supposed to be published in the directory, and no one has noticed the failure to make this distinction until now, might it be prudent to have the CIAT staff who work directly with the GP data participate in the review of the test conversion results? There may be other gaps which they would notice that no one has picked up so far.
BZDATETIME::2009-11-09 11:32:14
BZCOMMENTOR::Bob Kline
BZCOMMENT::142
While I was doing some preliminary work on converting the GP mailer documents I noticed that there were mailer records for two GPs which I couldn't find in the CDR after the test conversion of Franck. One was for GP6, which doesn't have a row in tblMain at all (apparently there are no referential integrity constraints in the DB), and the other is for GP312. It took me a long time to figure out why the conversion program showed no trace of GP312, but I finally found that in the attachment mapping GP records to existing Person documents both GP312 and GP169 were mapped to the same CDR document (CDR19862, Kent Hoskins). Assuming CIAT confirms that GP312 and GP169 really do represent the same person (and from the location information it would appear that they do), I propose making the LegacyID multiply occurring, putting both GP IDs in the converted document, and having CIAT fix the other information by hand, merging data from the two genprof records manually. To assist them, I can either create custom reports to show what's in the genprof tables for both records, or (simpler solution) I can just have the import software create a separate document for one of the GP records, merging the other one into CDR19862. CIAT can use the new document to get the information which needs to be merged by hand into CDR19862, then have it deleted. And I can have the mailer conversion software hook mailer events for both GP records to CDR19862.
Does this sound like a reasonable solution?
For the GP6 mailer(s), should I just discard them, or should I create Mailer documents anyway, leaving the Recipient and Document elements empty?
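Both problems described in this comment (two GP records mapped to one CDR document, and mailer records whose GP has no row in tblMain) could be detected up front with checks like the ones below. This is a sketch with hypothetical function names; the inputs are assumed to be simple lists of IDs pulled from the mapping spreadsheet and the genprof tables.

```python
from collections import defaultdict


def duplicate_targets(mapping_rows):
    """mapping_rows: iterable of (gp_id, cdr_id) pairs from the mapping.

    Returns {cdr_id: [gp_ids]} for CDR documents mapped more than once.
    """
    by_doc = defaultdict(list)
    for gp_id, cdr_id in mapping_rows:
        by_doc[cdr_id].append(gp_id)
    return {cdr_id: sorted(gp_ids)
            for cdr_id, gp_ids in by_doc.items() if len(gp_ids) > 1}


def orphan_mailers(mailer_gp_ids, main_gp_ids):
    """GP IDs which have mailer records but no row in the main table."""
    return sorted(set(mailer_gp_ids) - set(main_gp_ids))
```

For example, `duplicate_targets([(169, 19862), (312, 19862)])` reports both GP IDs under CDR document 19862, which is the situation found for GP169 and GP312.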
BZDATETIME::2009-11-10 14:40:52
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::143
(In reply to comment #142)
> While I was doing some preliminary work on converting the GP mailer documents I
> noticed that there were mailer records for two GPs which I couldn't find in the
> CDR after the test conversion of Franck. One was for GP6, which doesn't have a
> row in tblMain at all (apparently there are no referential integrity
> constraints in the DB), and the other is for GP312. It took me a long time to
> figure out why the conversion program showed no trace of GP312, but I finally
> found that in the attachment mapping GP records to existing Person documents
> both GP312 and GP169 were mapped to the same CDR document (CDR19862, Kent
> Hoskins). Assuming CIAT confirms that GP312 and GP169 really do represent the
> same person (and from the location information it would appear that they do),
You're right; both records refer to the same person. It is noted in the comments of GP169 as "Duplicate of 312 therefore, post "NO" to WEB pad."
> I propose making the LegacyID multiply occurring, putting both GP IDs in the
> converted document, and having CIAT fix the other information by hand, merging
> data from the two genprof records manually. To assist them, I can either
> create custom reports to show what's in the genprof tables for both records, or
> (simpler solution) I can just have the import software create a separate
> document for one of the GP records, merging the other one into CDR19862. CIAT
> can use the new document to get the information which needs to be merged by
> hand into CDR19862, then have it deleted. And I can have the mailer conversion
> software hook mailer events for both GP records to CDR19862.
>
> Does this sound like a reasonable solution?
>
This sounds good to me. I think the first option of creating a custom report should be good for us. Thanks!
> For the GP6 mailer(s), should I just discard them, or should I create Mailer
> documents anyway, leaving the Recipient and Document elements empty?
Please create a mailer document. What is the consequence of not creating a mailer document at this time? Does it mean we will not be able to send any mailers in the future? If this is the case, I think you should create a mailer document for GP6.
BZDATETIME::2009-11-10 15:51:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::144
(In reply to comment #140)
> When I originally asked Sheri about why CIAT stored GP addresses in two
> different tables, and why there was often duplication between the addresses in
> those tables, she said nobody knew. After reviewing the printout from the old
> electronic mailer example I think I now know the answer. My guess is that the
> address stored in the tblMain table is used just for contacting the person
> annually for review of the information, and all of the other addresses (whether
> or not they duplicate the first address) are published in the directory.
Sergey Shishov (sshishov@icfi.com), who was there when the GenProf database was built, is now here at Z-Tech. If necessary, please send him an email and he will try to answer any questions. I have a copy of the code and can give it to him if he needs it. I could ask him the questions myself, but I thought it would be better for you to do that to prevent any misunderstanding.
BZDATETIME::2009-11-10 17:23:36
BZCOMMENTOR::Bob Kline
BZCOMMENT::145
(In reply to comment #144)
> Sergey Shishov (sshishov@icfi.com), who was there when the GenProf database was
> built, is now here at Z-Tech. If it is necessary, please send him an email and
> he will try to answer any questions.
Sergey confirmed my analysis (in comment #140) of the semantics of the genprof tables holding address information. So I will need answers to the last two questions in comment #140 (see also the question in comment #141).
BZDATETIME::2009-11-12 14:24:46
BZCOMMENTOR::Bob Kline
BZCOMMENT::146
(In reply to comment #145)
> (In reply to comment #144)
>
> > Sergey Shishov (sshishov@icfi.com), who was there when the GenProf database was
> > built, is now here at Z-Tech. If it is necessary, please send him an email and
> > he will try to answer any questions.
>
> Sergey confirmed my analysis (in comment #140) of the semantics of the genprof
> tables holding address information. So I will need answers to the last two
> questions in comment #140 (see also the question in comment #141).
Got the answers at this afternoon's status meeting:
> 2. If so, am I right in thinking it would be unacceptable to force
> all mailers for a person with multiple roles to use the same address?
Yes.
> 3. If so, which of the options above should we use?
#2: Add another value to the UsedFor valid values list (GPMailer)
> ... might it be prudent to have the CIAT staff who work directly with
> the GP data participate in the review of the test conversion results?
There's only one staff member who works with the genetics directory, and she's brand new.
I will modify the GP conversion software to make the distinction between addresses to be used internally for the mailers, those intended for publication in the online directory, and those to be used for both purposes, and then run another conversion job on Franck.
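The three-way distinction described above (mailer only, directory only, or both) might be derived roughly as follows. This is a minimal sketch, not the conversion software itself: the function names and the `GPDirectory` label are hypothetical, and only the `GPMailer` value comes from the decision recorded in this comment. It assumes, per comment #140, that the tblMain address is the contact address and that the other stored addresses are the published ones.

```python
def normalize(address):
    """Collapse case and whitespace so trivially different copies compare equal."""
    return " ".join(address.lower().split())


def classify_addresses(contact_address, directory_addresses):
    """Return a list of (address, uses) pairs, where uses is a set of labels.

    contact_address: the address held for the annual review mailer (may be None).
    directory_addresses: addresses intended for the published directory.
    """
    result = []
    contact_key = normalize(contact_address) if contact_address else None
    contact_matched = False
    for address in directory_addresses:
        uses = {"GPDirectory"}  # hypothetical label for published addresses
        if contact_key is not None and normalize(address) == contact_key:
            uses.add("GPMailer")  # same address serves both purposes
            contact_matched = True
        result.append((address, uses))
    if contact_key is not None and not contact_matched:
        # Contact address is not published anywhere: internal use only.
        result.append((contact_address, {"GPMailer"}))
    return result


# Invented addresses, for illustration only.
pairs = classify_addresses(
    "1 Main St,  Bethesda MD",
    ["1 main st, bethesda md", "9 Elm Ave, Boston MA"],
)
for address, uses in pairs:
    print(address, sorted(uses))
```

Real address matching would need to compare structured fields rather than flattened strings; the string normalization here only stands in for that step.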
BZDATETIME::2009-11-13 08:22:30
BZCOMMENTOR::Bob Kline
BZCOMMENT::147
I did a run in which the part to merge GP information into existing Person documents was performed in test mode, in order to be able to run again without another refresh of Franck, which would wipe out the work we're doing on testing the CT.gov import modifications. The diffs for the modified Person documents are at
http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2009-11-12_18-02-26
Take a look at these looking for any obvious problems, but don't spend too much time on them. When you're done with this quick review, I'll delete the newly created Person documents and run it again, this time with the second half set to process in live mode, actually altering the merged Person documents in the CDR (on Franck) and I'll have you take a closer look at the results of that (and I'll use those results to create test GP mailer history documents).
You should also look at the newly created Person documents created by this run (they won't be any different – except for their CDR IDs – from what will be created by the subsequent test).
Attachment Request4522-20091112.log has been added with description: Test on Franck with code to handle distinction of address types
BZDATETIME::2009-11-16 11:18:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::148
(In reply to comment #143)
> You're right; both records refer to the same person. It is noted in the
> comments of GP169 as "Duplicate of 312 therefore, post "NO" to WEB pad."
I have added GP169 to the DO_NOT_CONVERT set, which means GP312 will be merged into CDR19862.
> This sounds good to me. I think the first option of creating a custom report
> should be good for us. Thanks!
I have preserved the GP169 information which got merged into CDR19862 in the previous conversion test here:
http://franck.nci.nih.gov/gp169.xml
I can dig out anything else you need from the genprof tables, though I believe gp169.xml should have everything you need.
See the mailer issue (#4630) for responses to the mailer-related questions in comment #143.
BZDATETIME::2009-11-16 11:38:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::149
(In reply to comment #147)
> Created an attachment (id=1817) [details]
> Test on Franck with code to handle distinction of address types
>
> I did a run in which the part to merge GP information into existing Person
> documents was performed in test mode, in order to be able to run again without
> another refresh of Franck, which would wipe out the work we're doing on testing
> the CT.gov import modifications. The diffs for the modified Person documents
> are at
>
> http://franck.nci.nih.gov/cgi-bin/cdr/ShowGlobalChangeTestResults.py?dir=2009-11-12_18-02-26
>
> Take a look at these looking for any obvious problems, but don't spend too much
> time on them. When you're done with this quick review, I'll delete the newly
> created Person documents and run it again, this time with the second half set
> to process in live mode, actually altering the merged Person documents in the
> CDR (on Franck) and I'll have you take a closer look at the results of that
> (and I'll use those results to create test GP mailer history documents).
>
I have looked at the results above and I did not see any mistakes.
> You should also look at the newly created Person documents created by this run
> (they won't be any different – except for their CDR IDs – from what will be
> created by the subsequent test).
I do not understand this part. Since you stated that you did the conversion in test mode, I did not expect to see any modified documents on Franck. Am I missing anything?
BZDATETIME::2009-11-16 11:47:23
BZCOMMENTOR::Bob Kline
BZCOMMENT::150
(In reply to comment #149)
> > You should also look at the newly created Person documents created by this
> > run (they won't be any different – except for their CDR IDs – from what will
> > be created by the subsequent test).
>
> I do not understand this part. Since you stated that you did the conversion in
> test mode, I did not expect to see any modified documents on Franck. Am I
> missing anything?
Yes, you are. The GP conversion program has two parts. One part merges information for genetics professionals who are already represented in the CDR (as indicated by the mapping you provided in the GenProfPerson mapping spreadsheet) into the Person documents you identified. The other part creates new Person documents for the rest of the genetics professionals found in the database. I didn't say that I ran the conversion in test mode. What I said was that I had performed a test run, and that "the part to merge GP information into existing Person documents was performed in test mode." I did this because I can easily delete the new Person documents, but that's not true for the Person documents which were already there, into which I'm merging additional information.
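The two-part structure described here can be pictured with a small planning routine. It is a sketch under assumptions rather than the actual conversion program: `plan_conversion` and `merge_live` are hypothetical names, GP400 is an invented record, and only `DO_NOT_CONVERT`, GP169, GP312 and CDR19862 are taken from this thread.

```python
DO_NOT_CONVERT = {169}  # GP169 is the duplicate of GP312 (see comment #143)


def plan_conversion(gp_ids, gp_to_cdr, merge_live=False):
    """Split GP records into merge and create actions.

    Returns a list of (action, gp_id, cdr_id, will_save) tuples. New documents
    are always saved because they are easy to delete afterwards; merges into
    existing Person documents are saved only when merge_live is True.
    """
    plan = []
    for gp_id in sorted(gp_ids):
        if gp_id in DO_NOT_CONVERT:
            continue  # skipped entirely, neither merged nor created
        cdr_id = gp_to_cdr.get(gp_id)
        if cdr_id is not None:
            plan.append(("merge", gp_id, cdr_id, merge_live))
        else:
            plan.append(("create", gp_id, None, True))
    return plan


# GP400 is a made-up unmapped record, included to show the "create" path.
plan = plan_conversion([169, 312, 400], {169: 19862, 312: 19862})
for step in plan:
    print(step)
# ('merge', 312, 19862, False)
# ('create', 400, None, True)
```

Keeping the merge half switchable independently of the create half is what allows a run to be repeated without refreshing the test server, since only the easily deleted new documents are written.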
BZDATETIME::2009-11-16 12:03:50
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::151
(In reply to comment #150)
> (In reply to comment #149)
>
> Yes, you are. The GP conversion program has two parts. One part merges
> information for genetics professionals who are already represented in the CDR
> (as indicated by the mapping you provided in the GenProfPerson mapping
> spreadsheet) into the Person documents you identified. The other part creates
> new Person documents for the rest of the genetics professionals found in the
> database. I didn't say that I ran the conversion in test mode. What I said
> was that I had performed a test run, and that "the part to merge GP information
> into existing Person documents was performed in test mode." I did this because
> I can easily delete the new Person documents, but that's not true for the
> Person documents which were already there, into which I'm merging additional
> information.
Thank you! I understand now. Please proceed with the next steps. I did not see anything unusual.
BZDATETIME::2009-11-16 13:34:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::152
(In reply to comment #151)
> Please proceed with the next steps.
This test was live for both halves. Please review the results carefully. If they look OK Volker can run his publishing tests against them, and I'll generate mailer history documents using them.
Attachment Request4522-20091116.log has been added with description: New conversion test on Franck
BZDATETIME::2009-11-18 15:32:03
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::153
(In reply to comment #152)
> Created an attachment (id=1818) [details]
> New conversion test on Franck
>
> (In reply to comment #151)
>
> > Please proceed with the next steps.
>
> This test was live for both halves. Please review the results carefully. If
> they look OK Volker can run his publishing tests against them, and I'll
> generate mailer history documents using them.
I did not see anything unusual in the results. We also looked at several of the newly created documents and did not find any problems we had not seen in our previous reviews. There were a few documents without emails. We will try to get the emails when it is time to publish the documents on Bach.
BZDATETIME::2009-12-04 12:12:35
BZCOMMENTOR::Bob Kline
BZCOMMENT::154
Waiting for results of the review CIAT is performing on the publishing tests (issue #4629).
BZDATETIME::2009-12-14 11:22:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::155
(In reply to comment #154)
> Waiting for results of the review CIAT is performing on the publishing tests
> (issue #4629).
CIAT has finished testing on Franck in issue # 4659.
BZDATETIME::2009-12-14 11:23:22
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::156
(In reply to comment #155)
> (In reply to comment #154)
> > Waiting for results of the review CIAT is performing on the publishing tests
> > (issue #4629).
>
> CIAT has finished testing on Franck in issue # 4659.
correction: It should be OCECDR-2954. Sorry!
BZDATETIME::2009-12-14 12:10:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::157
(In reply to comment #156)
> > CIAT has finished testing on Franck in issue # 4659.
>
> correction: It should be OCECDR-2954. Sorry!
Next step (in that same issue) is for Volker to draft, get approval for, and send out a message to the licensees informing them that we will no longer be including Person documents in the weekly export, and asking them to let us know of any impact that might have on their operations.
BZDATETIME::2009-12-21 12:15:24
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::158
Margaret requested the creation of a new email address, GeneticsDirectory@cancer.gov, which has now been created and tested. This new address should be used on the mailer and in all communications.
The application submission form on cancer.gov currently sends to the Lockheed/Aspensys email address. It should instead send to the new address, GeneticsDirectory@cancer.gov. I am adding this comment so that this change will be included with the other cancer.gov changes for the Genetics Directory.
BZDATETIME::2009-12-22 10:57:43
BZCOMMENTOR::Bob Kline
BZCOMMENT::159
(In reply to comment #158)
> Margaret requested the creation of the new email address-
> GeneticsDirectory@cancer.gov which has now been created and tested. It is this
> new email address that should be on the mailer and all communications.
>
> The application submission form on cancer.gov is currently sent to the
> Lockheed/Aspensys email address. It should rather be sent to this new email
> address-GeneticsDirectory@cancer.gov. I am adding this comment so that this
> change will be included with the other cancer.gov changes for the Genetics
> Directory.
Does this request have any implications for changes I need to make to the conversion software? Or is it just relevant for the publishing (vendor filter) and mailer issues?
BZDATETIME::2009-12-22 17:13:53
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::160
(In reply to comment #159)
> (In reply to comment #158)
> > Margaret requested the creation of the new email address-
> > GeneticsDirectory@cancer.gov which has now been created and tested. It is this
> > new email address that should be on the mailer and all communications.
> >
> > The application submission form on cancer.gov is currently sent to the
> > Lockheed/Aspensys email address. It should rather be sent to this new email
> > address-GeneticsDirectory@cancer.gov. I am adding this comment so that this
> > change will be included with the other cancer.gov changes for the Genetics
> > Directory.
>
> Does this request have any implications for changes I need to make to the
> conversion software? Or is it just relevant for the publishing (vendor filter)
> and mailer issues?
I do not believe so. Sorry about the confusion. I should have included this in OCECDR-2955.
BZDATETIME::2009-12-31 14:31:25
BZCOMMENTOR::Bob Kline
BZCOMMENT::161
(In reply to comment #157)
> Next step (in that same issue) is for Volker to draft, get approval for, and
> send out a message to the licensees informing them that we will no longer be
> including Person documents in the weekly export, and asking them to let us know
> of any impact that might have on their operations.
That notice has gone out, so the obstacle caused by publishing the Person documents separately has been removed.
As soon as we have finished testing for task 4725 (pre-population of external map table) on Franck, Volker will refresh Franck and we'll do another GP conversion and publication test. If all goes well with that test, we'll want to sit down and talk about timing. Part of that discussion will focus on the validation errors which CIAT needs to clean up on Bach after the conversion, before we can have Volker plug in the new publishing control document.
BZDATETIME::2010-01-07 11:14:59
BZCOMMENTOR::Bob Kline
BZCOMMENT::162
Ready for Volker to test with the new publication code.
Attachment Request4522-20100107.log has been added with description: New conversion test on Franck
BZDATETIME::2010-01-14 13:29:22
BZCOMMENTOR::Bob Kline
BZCOMMENT::163
The mapping spreadsheet for the syndromes has an error (for familial carcinoid syndrome). It should be CDR0000654587 (instead of CDR618606). Bob will fix the sheet by hand before the next conversion run.
BZDATETIME::2010-01-14 14:29:26
BZCOMMENTOR::Bob Kline
BZCOMMENT::164
(In reply to comment #163)
> The mapping spreadsheet for the syndromes has an error (for familial carcinoid
> syndrome). It should be CDR0000654587 (instead of CDR618606). Bob will fix
> the sheet by hand before the next conversion run.
The mapping has been corrected.
BZDATETIME::2010-01-15 11:32:34
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::165
(In reply to comment #164)
> (In reply to comment #163)
> > The mapping spreadsheet for the syndromes has an error (for familial carcinoid
> > syndrome). It should be CDR0000654587 (instead of CDR618606). Bob will fix
> > the sheet by hand before the next conversion run.
>
> The mapping has been corrected.
Now the Syndrome and Type of Cancer columns in publish preview display correctly when 'Familial carcinoid syndrome' is selected in the person document.
BZDATETIME::2010-01-15 12:50:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::166
(In reply to comment #165)
> (In reply to comment #164)
> > (In reply to comment #163)
> > > The mapping spreadsheet for the syndromes has an error (for familial carcinoid
> > > syndrome). It should be CDR0000654587 (instead of CDR618606). Bob will fix
> > > the sheet by hand before the next conversion run.
> >
> > The mapping has been corrected.
>
> Now the Syndrome and Type of Cancer columns in publish preview display
> correctly when 'Familial carcinoid syndrome' is selected in the person
> document.
To avoid any confusion: the behavior described in comment #165 is not a result of the correction recorded in comment #164. That correction was made in the spreadsheet used for conversion (not in the CDR documents), and won't have any effect until the next conversion is performed.
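The two IDs in the mapping correction are written in different styles (CDR0000654587 is zero-padded, CDR618606 is not), which makes spreadsheet entries harder to compare by eye. A helper like the following could normalize them before comparison. It is a hypothetical sketch, and it assumes the canonical form is "CDR" followed by ten zero-padded digits, as the padded ID in this thread suggests.

```python
import re


def normalize_cdr_id(value):
    """Return a CDR ID in zero-padded form, e.g. 'CDR618606' -> 'CDR0000618606'.

    Accepts strings with or without the 'CDR' prefix, and plain integers.
    Assumes a ten-digit canonical width (an assumption, not confirmed here).
    """
    match = re.fullmatch(r"\s*(?:CDR)?0*(\d+)\s*", str(value), flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"not a CDR ID: {value!r}")
    return f"CDR{int(match.group(1)):010d}"


print(normalize_cdr_id("CDR618606"))      # CDR0000618606
print(normalize_cdr_id("CDR0000654587"))  # CDR0000654587
print(normalize_cdr_id(19862))            # CDR0000019862
```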
BZDATETIME::2010-01-21 13:08:54
BZCOMMENTOR::Bob Kline
BZCOMMENT::167
Margaret and William agreed that we are ready to convert the GP data on Bach. We'll hold off until next week, so that publication of Person documents is turned off before the conversion.
BZDATETIME::2010-01-27 10:58:27
BZCOMMENTOR::Bob Kline
BZCOMMENT::168
Please review the results carefully and let me know immediately if you encounter any unexpected problems. CIAT needs to correct the known data problems.
Attachment Request4522.log has been added with description: Log from conversion on production system
BZDATETIME::2010-02-03 21:29:06
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::169
(In reply to comment #168)
> Created an attachment (id=1851) [details]
> Log from conversion on production system
> Please review the results carefully and let me know immediately if you
> encounter any unexpected problems. CIAT needs to correct the known data
> problems.
We have reviewed over 40 documents so far and did not find any conversion problems. I am closing this issue at this point. Thank you!
File Name | Posted | User |
---|---|---|
CGDir History.doc | 2009-03-12 09:46:47 | Osei-Poku, William (NIH/NCI) [C] |
Diagram.mdb | 2009-03-12 09:44:31 | Osei-Poku, William (NIH/NCI) [C] |
ERROR LOGS FOR CIAT.txt | 2009-10-27 11:27:48 | Osei-Poku, William (NIH/NCI) [C] |
GPCancerSites-1.xls | 2009-08-19 19:35:15 | |
gp-conv.html | 2009-06-29 13:55:24 | |
gp-conv-20091103-franck.log | 2009-11-03 17:41:22 | |
GPLocations.xls | 2009-09-11 16:51:46 | |
GPLocations(1).xls | 2009-09-16 12:40:17 | Osei-Poku, William (NIH/NCI) [C] |
GPLocations-4.xls | 2009-08-14 17:17:07 | |
GPLocs.xls | 2009-09-02 10:23:35 | Osei-Poku, William (NIH/NCI) [C] |
GPLocs.xls | 2009-08-26 17:02:58 | Osei-Poku, William (NIH/NCI) [C] |
GPLocs.xls | 2009-08-06 11:32:53 | |
GPMatchCandidates.xls | 2009-09-30 16:01:11 | Osei-Poku, William (NIH/NCI) [C] |
GPMatchCandidates.xls | 2009-08-28 13:31:34 | |
gp-match-candidates.xls | 2009-08-26 11:56:51 | |
GPMatchCandidates(1).xls | 2009-09-03 10:01:30 | Osei-Poku, William (NIH/NCI) [C] |
gp-syndromes.xls | 2009-08-20 14:39:57 | |
gp-syndromes082109MBMEB.xls | 2009-09-10 11:08:49 | |
gp-syndromes082109MBMEB.xls | 2009-09-10 08:41:41 | |
Paper_Mailer.pdf | 2009-04-23 14:15:36 | Osei-Poku, William (NIH/NCI) [C] |
Request4522.log | 2010-01-27 10:58:27 | |
Request4522-20091112.log | 2009-11-13 08:22:30 | |
Request4522-20091116.log | 2009-11-16 13:34:42 | |
Request4522-20100107.log | 2010-01-07 11:14:59 | |
Request4522-extra.log | 2009-10-23 09:14:24 | |
Request4522-FranckTest.log | 2009-10-22 14:39:43 | |
Request4522-FrankTest2.log | 2009-10-22 17:50:07 | |
Screenshots (2).doc | 2009-03-12 09:38:01 | Osei-Poku, William (NIH/NCI) [C] |
UniqueGPLocations.txt | 2009-07-31 11:06:38 | |
UniqueGPLocations.xls | 2009-07-31 11:11:48 | |
Web_Mailer_Email.pdf | 2009-04-23 14:23:12 | Osei-Poku, William (NIH/NCI) [C] |
Web_Mailer_Form.pdf | 2009-04-23 14:20:06 | Osei-Poku, William (NIH/NCI) [C] |
Web Mailer_Submitted_Output_Email.pdf | 2009-04-23 14:25:03 | Osei-Poku, William (NIH/NCI) [C] |