CDR Tickets

Issue Number 3247
Summary [EBMS/CiteMS] Notes on a new system
Created 2010-10-15 00:16:43
Issue Type Bug
Submitted By alan
Assigned To alan
Status Closed
Resolved 2012-10-22 11:44:31
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.107575
Description

BZISSUE::4937
BZDATETIME::2010-10-15 00:16:43
BZCREATOR::Alan Meyer
BZASSIGNEE::Alan Meyer
BZQACONTACT::Margaret Beckwith

This Bugzilla issue exists as a central place to put notes,
ideas, analysis, wish lists, etc. regarding what should be done
if and when we build a new Citation Management System. Please
feel free to enter any notes or attach any documents that are
relevant.

For those of you who don't use Bugzilla regularly and aren't
comfortable with it, please email your notes or documents to me
(Alan) and I will post them here on your behalf.

Comment entered 2010-11-01 13:09:40 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-11-01 13:09:40
BZCOMMENTOR::Robin Juthe
BZCOMMENT::1

A comment came up in a meeting today about a possible report in the new system. The onsite staff currently maintain a spreadsheet of the number of literature surveillance returns for each Board each month, in order to prepare total numbers to submit to Debbie for the CIAT monthly report. This information is all stored in the CiteMS, so it would be nice if there was a report to capture it in the future.

From what I understand, the numbers collected are specifically the number of "Yes" responses (meaning any response other than "no changes") and the number of "No" responses that are received from Board members for each Board each month. Presumably the report could have a date range so the user could run it for a single month for the monthly report or for a year for the annual report or any other period as needed.
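
A minimal sketch of the tally described above, assuming each return is available as a (board, date received, response text) record. The record layout, the `tally_returns` name and the sample boards are hypothetical and are not taken from the CiteMS.

```python
from collections import Counter
from datetime import date


def tally_returns(returns, start, end):
    """Count "Yes" and "No" returns per board within [start, end].

    Each item of `returns` is a (board, received, response) tuple
    (an assumed layout).  As described above, "no changes" counts
    as "No" and any other response counts as "Yes".
    """
    counts = Counter()
    for board, received, response in returns:
        if start <= received <= end:
            is_no = response.strip().lower() == "no changes"
            counts[(board, "No" if is_no else "Yes")] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    ("Board A", date(2010, 10, 5), "no changes"),
    ("Board A", date(2010, 10, 12), "cite in summary"),
    ("Board B", date(2010, 10, 20), "No changes"),
    ("Board B", date(2010, 11, 2), "discuss at meeting"),
]

# Run for a single month; a wider range would serve an annual report.
october = tally_returns(sample, date(2010, 10, 1), date(2010, 10, 31))
```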

Comment entered 2010-11-01 15:19:11 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2010-11-01 15:19:11
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::2

This report is currently available in the CiteMS: from the client welcome page under Reports and then Mailed Citation Report, and from the staff Admin Tool tab under Mailed Citations Report.
All responses except the null value "<not returned>" seem to be working. But it would be a good idea to use the spreadsheet that was mentioned below to look back at previous months/years to see if the data in the report matches what's in the spreadsheet.
Work on the null response was discontinued years ago and was never completed.
Have a look at the report in the CiteMS and let me know if you have questions.

Comment entered 2010-11-01 15:36:27 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2010-11-01 15:36:27
BZCOMMENTOR::Robin Juthe
BZCOMMENT::3

Thanks, Cynthia. I think the Mailed Citation Report tallies the number of citations on which a board member (or members) had comments. What I think Bonnie is interested in is the number of returns rather than the number of citations. In other words, 4 people could reply with comments on a single article but I think it would only display on the Mailed Citations Report once yet it would count as 4 returns in terms of her monthly report data. It would be best to talk with Bonnie though so I'll send her your comments and maybe you, she, and Judy Morris can touch base to see if this report would work for her or whether there's another solution for getting this information with the current CiteMS setup.
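
The difference between the two counts can be illustrated with a small sketch. It assumes each reply is recorded as a (citation ID, board member) pair, which is a hypothetical layout and not the actual CiteMS structure.

```python
def count_citations_and_returns(replies):
    """Return (distinct citations with comments, total returns).

    `replies` holds one (citation_id, board_member) pair per reply
    received (an assumed layout).
    """
    citations = len({citation_id for citation_id, _ in replies})
    returns = len(replies)
    return citations, returns


# Four people reply with comments on a single article.
replies = [
    (1001, "member A"),
    (1001, "member B"),
    (1001, "member C"),
    (1001, "member D"),
]
citations, returns = count_citations_and_returns(replies)
# The article counts once as a citation but four times as returns.
```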

Comment entered 2010-11-01 15:53:11 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2010-11-01 15:53:11
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::4

I think this is just a matter of redisplaying the data in the report. The report also allows you to select responses by board member which would be returns based on reviewer not based on total citations. I think we may just need to re-work the report a bit.

Comment entered 2010-11-04 22:57:58 by alan

BZDATETIME::2010-11-04 22:57:58
BZCOMMENTOR::Alan Meyer
BZCOMMENT::5

The attached Microsoft Word document contains some very high level notes I've made about issues and a possible future for the Citation Management System.

Comment entered 2010-11-04 22:57:58 by alan

Attachment CiteMS_Notes_Draft2.doc has been added with description: Brief notes on system issues and future development.

Comment entered 2010-11-16 20:49:18 by alan

BZDATETIME::2010-11-16 20:49:18
BZCOMMENTOR::Alan Meyer
BZCOMMENT::6

Based on the work done for Issue #4841, I've done some analysis
concerning how to handle journal records in any new system.

The attached document lists the ideas that occurred to me.

Comment entered 2010-11-16 20:49:18 by alan

Attachment Journal Management in a New CiteMS.doc has been added with description: Journal Management in a New CiteMS

Comment entered 2011-02-04 23:08:53 by alan

BZDATETIME::2011-02-04 23:08:53
BZCOMMENTOR::Alan Meyer
BZCOMMENT::7

I've been making lots of notes.

At some point I'd like to sit down with a current CiteMS user
(perhaps Robin or one of the other board managers) and work
through the entire system, seeing the operations done, looking at
outputs and reports, and asking questions. It looks like we can
do this in the test database, making any changes we want, without
any board members or managers receiving email from the system or
being otherwise affected.

I'm thinking that a good way to do this is to spend an hour or
two until one or the other of us is worn out or has to do other
work. Then I'll go off and organize what I've learned and later
come back for more, maybe the next time I'm in or the next week.

What I'm planning to produce is a first rough draft of a
requirements outline.

Margaret pointed out in the last CDR status meeting that some of
the functions currently done by the system may no longer be
needed while there are new functions that will be needed. I'm
thinking that a reasonable way to document all this is to do
something like the following:

Produce an outline of all requirements for a new system,
organized in a functional hierarchy.

Tag all identified functions as:

"old" - the existing system does it.

"new" - the existing system doesn't do it.

"modified" - the existing system does it but we now want
to do it a significantly different way.

I think I should also try to list all of the things that the
existing system does that need not be done by the new one. If we
fail to document those, someone reading the requirements document
might not notice that one of the functions they depend on was not
included in the requirements for a new system.
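
One possible way to record these tags is sketched below. The `Requirement` class, the extra "dropped" tag for functions that will not be carried over, and the sample entries are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    OLD = "old"            # the existing system does it
    NEW = "new"            # the existing system doesn't do it
    MODIFIED = "modified"  # existing function, done a significantly different way
    DROPPED = "dropped"    # assumed extra tag: existing function not carried over


@dataclass
class Requirement:
    number: str  # position in the outline
    title: str
    tag: Tag


def not_carried_over(requirements):
    """List existing functions that the new system will not provide."""
    return [r for r in requirements if r.tag is Tag.DROPPED]


# Hypothetical outline entries.
outline = [
    Requirement("1.1", "Example function kept as is", Tag.OLD),
    Requirement("1.2", "Example function added", Tag.NEW),
    Requirement("1.3", "Example function no longer needed", Tag.DROPPED),
]
omitted = not_carried_over(outline)
```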

Once we've got a rough draft, we can show it to everyone and get
additions, comments and criticisms to develop a requirements
specification followed by a systems design.

Comment entered 2011-02-07 10:23:54 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-02-07 10:23:54
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::8

I think this is a good idea, but please keep in mind that there are 3 distinct types of CiteMS users each of which uses selected relevant features (some of which are user restricted) to do their specific tasks.
I am available for questions as you need.

Comment entered 2011-02-07 10:31:09 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2011-02-07 10:31:09
BZCOMMENTOR::Robin Juthe
BZCOMMENT::9

I would be happy to meet to talk about the use of the CiteMS from the Board Manager perspective, and to some extent, the onsite perspective. I agree with Cynthia though - it will be necessary to get input from each angle - Cynthia/Minaxi, Bonnie, and the Board Managers.

Comment entered 2011-02-25 01:43:33 by alan

BZDATETIME::2011-02-25 01:43:33
BZCOMMENTOR::Alan Meyer
BZCOMMENT::10

I have attached a first draft outline in HTML format of
requirements for a new Citation Management System.

I would like to have spent more time on this and done a better
job before submitting it to users. However it has probably
reached a stage where "stakeholders" (as we now call them) might
start to get nervous about what I'm doing with all of this and
want to see where I'm heading and whether it makes any sense.

There are many problems with this draft - errors, omissions,
half-baked ideas and inconsistencies. There are places where I
put some ideas in one section of the outline, then addressed them
again in another part - not necessarily the same way. With more
time I would sort these out and produce a more polished draft.
There are some topics that I've thought out more deeply than
others and some that I've hardly thought about at all that will
need to be added or substantially beefed up in the next draft.

However I will be away for two weeks and don't want to delay
things too long.

Much of what I have written is tentative. There is language
like, "The system might ..." rather than "The system shall ...".
This language is typically used where the ideas depart from
existing practice in the old CiteMS. Such tentative language
should be tightened up in a later draft.

I would appreciate any comments or criticisms that anyone can
offer. Users responding to the document should probably
concentrate on the biggest issues first - the biggest errors,
biggest omissions, best and worst new ideas, etc.

I'll be out of town from February 28 to March 12. I may have
little or no access to email during that time. If anyone is able
to respond to the document I will read the responses when I get
back.

Please do not hesitate to express criticism. No offense will be
taken. We'll all work together to make this the best system that
we can.

Comment entered 2011-02-25 01:43:33 by alan

Attachment newCiteMSNotes.html has been added with description: New CiteMS requirements outline - Draft 1

Comment entered 2011-03-10 17:33:50 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-03-10 17:33:50
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::11

Alan,
William, Minaxi and I have reviewed your CiteMS requirements outline - draft one. We met today to start discussing our comments and plan to meet again on Monday. We would like to set up a meeting with you to discuss our comments. We think it would be better to have a discussion so that you can make notes in your own words rather than us trying to add our comments to your document.
Cynthia

Comment entered 2011-03-12 16:22:52 by alan

BZDATETIME::2011-03-12 16:22:52
BZCOMMENTOR::Alan Meyer
BZCOMMENT::12

(In reply to comment #11)
> Alan,
> William, Minaxi and I have reviewed your CiteMS requirements outline - draft
> one. We met today to start discussing our comments and plan to meet again on
> Monday. We would like to set up a meeting with you to discuss our comments. We
> think it would be better to have a discussion so that you can make notes in
> your own words rather than us trying to add our comments to your document.
> Cynthia

I'm back in town.

I'll be available any time Monday, Tuesday or Thursday (March 14, 15 or 17)
at my office at NCI. If Wednesday or Friday is better I can talk from home.

I look forward to hearing your comments and discussing the system.

Comment entered 2011-03-14 11:57:43 by alan

BZDATETIME::2011-03-14 11:57:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::13

Shall we go ahead and set up a meeting for this?

Can it be done via a telephone conference call?

Does ZTech have a conference call system set up? Or should I find out how to do it at NCI?

Comment entered 2011-03-14 22:43:20 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-14 22:43:20
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::14

(In reply to comment #13)
> Shall we go ahead and setup a meeting for this?
>
> Can it be done via a telephone conference call?
>
> Does ZTech have a conference call system setup? Or should I find out how to do
> it at NCI?

Please check your email. I just sent the invite for the teleconference. Let me know if the time and date is not good for you (Tuesday 2-4).

Comment entered 2011-03-14 22:55:29 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-03-14 22:55:29
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::15

(In reply to comment #14)
> (In reply to comment #13)
> Please check your email. I just sent the invite for the teleconference. Let me
> know if the time and date is not good for you (Tuesday 2-4).
^^^^
or
self-correction.
(If 2:00PM is not good for you, we can start from 12:00 or 1:00PM tomorrow. On the other hand we can have the teleconference on Wednesday)

Comment entered 2011-03-14 23:09:41 by alan

BZDATETIME::2011-03-14 23:09:41
BZCOMMENTOR::Alan Meyer
BZCOMMENT::16

(In reply to comment #15)

> (If 2:00PM is not good for you, we can start from 12:00 or 1:00PM tomorrow. On
> the other hand we can have the teleconference on Wednesday)

2-4 pm sounds good to me. There is another meeting going on from 2-3 pm, but
the original plan was for Volker and Bob to handle that one so, unless Volker or Bob think they need me for that, I'll stick with that plan and join the CiteMS meeting at 2 pm.

Talk to you then.

Comment entered 2011-03-16 00:33:30 by alan

BZDATETIME::2011-03-16 00:33:30
BZCOMMENTOR::Alan Meyer
BZCOMMENT::17

Cynthia, William and I had a telephone conference today to
discuss the first draft of the Citation Management System
requirements. We discussed the "Data" section of the
requirements outline. We plan to discuss the "Functions" section
this coming Thursday, March 17 by telephone.

After the conference I prepared the revised draft which is
attached. The revisions are:

  • Cleaned up numerous typos and garbled sentences.

No doubt I also created new ones.

  • Firmed up some tentative language.

A number of topics were treated tentatively or with more
questions than answers. Where Cynthia supplied answers I was
able to remove some questions and firm up the language.

Although some questions have been answered and removed as
questions, I've also got some new ones.

  • Reorganized some material.

I moved some ideas from one section to another where that
seemed to make things clearer.

  • Added new material.

Cynthia identified a number of areas where my knowledge of
the existing system and processes was too limited. I was able
to add some new material to incorporate the ideas she
supplied.

Most of the changes are in the Data section of the outline but a
number are in the "Functions" and "Design Notes" sections. In
most cases they are changes in functionality that were occasioned
by my better understanding of the data after our conference.

Although the changes in Draft 2 reflect input from Cynthia, I
can't be sure that I've captured all of her ideas faithfully. We
crammed a lot of discussion into a short period and I may have
misunderstood some things. But I believe that we're getting
closer to our goal.

At some point I'll try to format the outline better using one of
Bob's elegant CSS stylesheets for outlines.

Comment entered 2011-03-16 00:33:30 by alan

Attachment newCiteMSNotes.html has been added with description: New CiteMS requirements outline - Draft 2

Comment entered 2011-03-16 13:56:36 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-03-16 13:56:36
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::18

I spoke to Minaxi this morning regarding the question you asked about updating pre-medline citations imported into the CiteMS. She confirmed that this is an update of the original record with the complete medline citation data and not a replacement.

Comment entered 2011-03-18 01:02:22 by alan

BZDATETIME::2011-03-18 01:02:22
BZCOMMENTOR::Alan Meyer
BZCOMMENT::19

Cynthia, William and I had another telephone conference today.
This draft incorporates my notes from the meeting.

We only went through a few of the entries in the Functions
section, so I'm giving this draft the interim number 2.5 instead
of 3. We'll talk again on Monday and try to get through more of
them.

The main changes are:

Two new objects appear in the Data section:

Status values.

Tags.

A new top level function appears in the Functions section:

Initial ("pre-publishing") review

A number of other sections have been edited. A new entry has
appeared in the "Some user interface concepts" section under
"Design Notes". It is:

Dynamic HTML

Stay tuned for more.

Comment entered 2011-03-18 01:02:22 by alan

Attachment newCiteMSNotes.html has been added with description: New CiteMS requirements outline - Draft 2.5

Comment entered 2011-03-21 18:23:55 by alan

BZDATETIME::2011-03-21 18:23:55
BZCOMMENTOR::Alan Meyer
BZCOMMENT::20

This draft is a strawman. It is FAR from complete. However
I've decided to post it as a kind of advance notice of how I'm
thinking about the design of the new system. I think it might be
of special relevance to EBMS designers who might share some data
with it. However some of the most likely shared tables are not
yet defined in this document.

Besides being incomplete, it contains a number of questions and
second guesses, marked with an "XXX" prefix.

The format is a MySQL SQL script that will actually create the
database. Small changes are required if we use SQL Server or
another database management system instead.

Comment entered 2011-03-21 18:23:55 by alan

Attachment createdb.sql has been added with description: Database definitions for new CiteMS - Draft 1

Comment entered 2011-03-21 18:35:14 by alan

BZDATETIME::2011-03-21 18:35:14
BZCOMMENTOR::Alan Meyer
BZCOMMENT::21

Cynthia, Minaxi, William and I met this morning for a long
telephone session. We completed the initial review of the
requirements outline.

I worked on some preliminary database definitions today and have
not yet prepared another draft of the outline, however I will be
working on that this week.

If possible, to assist in completing the requirements outline, I
would like to meet with Bonnie and see what she is doing with her
part of the CiteMS. Perhaps that can be set up for tomorrow or
Thursday? I'll try to incorporate any new knowledge gained in
the next draft.

Comment entered 2011-04-01 15:21:24 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-04-01 15:21:24
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::22

I am posting our requirements document for importing, reviewing, and publishing. I think this document will be a good spring board for further discussion. Please let me know when you are available and we can set up a meeting with Minaxi and William to discuss this document.

Comment entered 2011-04-01 15:21:24 by Boggess, Cynthia (NIH/NCI) [C]

Attachment Importing&Publishing_Req_Doc.doc has been added with description: Requirements Document - Importing & Publishing

Comment entered 2011-04-02 21:58:29 by alan

BZDATETIME::2011-04-02 21:58:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::23

(In reply to comment #22)

> ...
> I am posting our requirements document for importing
> ...

I've read the document. It's excellent. There are a number of good
ideas there that had not surfaced before as well as a number of
refinements of ideas that we already discussed.

The best times for discussion for me would be Tuesday at 10 am
or 1:30 pm, or Thursday at 10 am or 3 pm.

Comment entered 2011-04-04 09:04:47 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-04-04 09:04:47
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::24

(In reply to comment #23)
> (In reply to comment #22)
> > ...
> > I am posting our requirements document for importing
> > ...
> I've read the document. It's excellent. There are a number of good
> ideas there that had not surfaced before as well as a number of
> refinements of ideas that we already discussed.
> The best times for discussion for me would be Tuesday at 10 am
> or 1:30 pm, or Thursday at 10 am or 3 pm.

I vote for Tuesday at 1:30pm.

Comment entered 2011-04-04 10:06:23 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2011-04-04 10:06:23
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::25

(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > ...
> > > I am posting our requirements document for importing
> > > ...
> > I've read the document. It's excellent. There are a number of good
> > ideas there that had not surfaced before as well as a number of
> > refinements of ideas that we already discussed.
> > The best times for discussion for me would be Tuesday at 10 am
> > or 1:30 pm, or Thursday at 10 am or 3 pm.
> I vote for Tuesday at 1:30pm.

Tuesday 1:30 PM works for me too. We will use same conference number.

Comment entered 2011-06-06 20:59:50 by alan

BZDATETIME::2011-06-06 20:59:50
BZCOMMENTOR::Alan Meyer
BZCOMMENT::26

I'm reviewing my notes and attempting to put them in shape. I
have a question for Cynthia and Minaxi.

When we execute a search for a summary topic we sometimes pull in
a citation that was also pulled in for a previous search. Normally
this would happen in the same review cycle, but could appear in a
different review cycle if it was a special search, not limited to
the current month's publications.

Here is my reading of what we do in each of the possible cases:

1. Same summary topic, same or different review cycle.

Case:

A special search retrieves a citation for a topic. The
citation has already been imported in a previous review
cycle, or perhaps the search was re-executed in the same
review but with changes in the search criteria and many
of the same citations are retrieved.

Disposition:

The citation is not re-imported. No action is taken for
the citation. It appears in the list and count of
"Duplicates not imported."

2. Different summary topic, same or different editorial board,
same review cycle.

Case:

The same citation appears in two searches for two
different topics. This happens in the same review cycle.

Disposition:

The citation is not re-imported. Another summary topic
is assigned to the citation. The citation is already in
the current review cycle and need not be added again. It
appears in the list and count of "Citations already in
the database, summary topic only added."

3. Different summary topic, same or different editorial board,
different review cycle.

Case:

The same citation appears in two searches for two
different topics managed by the same or a different
board, in two different review cycles. This should only
happen when the citation appeared as a result of a
"special" search that looks back retrospectively.

This is the case I'm having trouble with. I know that
we've discussed this issue, but I'm not sure what our
conclusion was.

Disposition:

The citation is not re-imported.

Questions:

Do we do anything else?

Assign the new summary topic?

Add it to the current review cycle?

Report it in the "... summary topic only" category?

Treat it differently based on whether the same board
has already seen this article?

Treat it differently based on whether the article
was previously accepted as relevant or rejected?

It would seem to be a mistake to assign a summary topic
to it if it doesn't also appear in a new review
cycle. Otherwise it won't automatically be re-reviewed
for the new topic.

Thanks.

Comment entered 2011-06-07 12:57:17 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-06-07 12:57:17
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::27

#1 and #2 are correct.
#3…bottom line…treat as a “new” citation. Assign (aka - add to existing citation record) current review cycle and new summary topic. Report as summary topic added. Maybe we need to have “Summary topic only added” as well as “Summary topic and review cycle added”. Citation would then be reviewed and published as a “new” citation. Note: The current CiteMS treats such citations as “new” and they are reported as “summary topic only added”.
These citations would be reviewed by ICF staff before publishing and rejected or accepted as usual. Since these citations have been previously reviewed for a different topic, we want to let them be reviewed again but let the reviewer know in some way that they have already reviewed this citation for topic X and what decision they made at that time.
These citations are not always the result of a “special” search. Upon import (as we have discussed previously) we need to be able to flag batches or single citations to identify them as regular monthly searches, fast track or retro/special search request, etc. This flag would then indicate how we treated the citation further.
This is a bit tricky and may be easier to explain over the phone. Minaxi and I are available if you would like to discuss this. Just let us know.
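
The dispositions discussed in comments #26 and #27 can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the in-memory `db` dictionary, the `import_citation` name and the topic and cycle values are hypothetical, and the label "Summary topic and review cycle added" is the report category suggested above, not an existing one. Batch flags (regular monthly, fast track, retro/special) are not modeled.

```python
def import_citation(db, pmid, topic, cycle):
    """Apply the import rules to one search hit and return its report category.

    `db` maps a PMID to {"topics": set, "cycles": set} (an assumed model).
    """
    record = db.get(pmid)
    if record is None:
        # Not yet in the database: a normal import.
        db[pmid] = {"topics": {topic}, "cycles": {cycle}}
        return "Imported"
    if topic in record["topics"]:
        # Case 1: same summary topic, same or different review cycle.
        return "Duplicate not imported"
    record["topics"].add(topic)
    if cycle in record["cycles"]:
        # Case 2: different topic, citation already in this review cycle.
        return "Summary topic only added"
    # Case 3: different topic and different review cycle; the citation
    # joins the current cycle so that it is reviewed again as "new".
    record["cycles"].add(cycle)
    return "Summary topic and review cycle added"


db = {}
results = [
    import_citation(db, 1, "topic X", "2011-05"),
    import_citation(db, 1, "topic X", "2011-06"),
    import_citation(db, 1, "topic Y", "2011-05"),
    import_citation(db, 1, "topic Z", "2011-06"),
]
```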

Comment entered 2011-06-08 00:11:45 by alan

BZDATETIME::2011-06-08 00:11:45
BZCOMMENTOR::Alan Meyer
BZCOMMENT::28

(In reply to comment #27)

I think I understand all of the notes in comment #27. I've updated the requirements specification accordingly and we can review it there to be sure I've got it right.

If no new CDR tasks intervene, I'm going to make a mighty effort to finish draft 3 by the end of next week. I've re-written a lot of the import section in light of the comments in the 3-29-2011 "Importing, Publishing & Admin Tool Requirements Document" attached in comment #22. However there's still a lot to revise.

When the document is ready, the next steps might be to:

Review it with all concerned parties, including board managers.

Fix errors.

Review it with respect to EBMS integration.

Prepare a high level design.

Prepare an implementation plan - what to implement first, what
is highest priority, etc.

Begin coding.

Comment entered 2011-06-17 00:17:29 by alan

BZDATETIME::2011-06-17 00:17:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::29

I'm happy with a lot of the requirements outline, but there are still some sections with which I'm not satisfied.

I'm going to put off distribution for another day and aim for next Tuesday.

Comment entered 2011-06-22 00:29:35 by alan

BZDATETIME::2011-06-22 00:29:35
BZCOMMENTOR::Alan Meyer
BZCOMMENT::30

Here's the new draft. I took out some things that I didn't think really belonged there, so it's not larger than the old one.

Every time I read it I see something else that I would like to change, but I've reached the point where getting it out the door is more important than getting it perfect - an impossible goal anyway. So here it is.

I've linked it to Bob's outline stylesheet so it looks nicer than the default output from the outliner.

Comment entered 2011-06-22 00:29:35 by alan

Attachment newCiteMSNotes3.html has been added with description: New CiteMS requirements outline - Draft 3.0

Comment entered 2011-07-06 17:03:50 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-07-06 17:03:50
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::31

Minaxi and I have reviewed and discussed your new Draft 3.0. We have a few corrections, comments and questions. Would you like us to post them to bugzilla or discuss them with you?

Comment entered 2011-07-06 17:43:04 by alan

BZDATETIME::2011-07-06 17:43:04
BZCOMMENTOR::Alan Meyer
BZCOMMENT::32

(In reply to comment #31)
> Minaxi and I have reviewed and discussed your new Draft 3.0. We have a few
> corrections, comments and questions. Would you like us to post them to bugzilla
> or discuss them with you?

You can do either one if you like, or both. Posting them in the bugzilla issue has the advantage of making them available to everyone else as well, which I think is a good thing.

Comment entered 2011-07-07 16:55:29 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-07-07 16:55:29
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::33

Minaxi and I combined our comments into one document. Please review and let me know if you have questions or wish to discuss.

Comment entered 2011-07-07 16:55:29 by Boggess, Cynthia (NIH/NCI) [C]

Attachment New Citation Management System Requirements and Notes_7-7-2011_CB_MT_Notes.doc has been added with description: Notes on Draft 3.0 from Cynthia & Minaxi

Comment entered 2011-07-12 13:53:17 by alan

BZDATETIME::2011-07-12 13:53:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::34

(In reply to comment #33)
> Created attachment 2133 [details]
> Notes on Draft 3.0 from Cynthia & Minaxi
>
> Minaxi and I combined our comments into one document. Please review and let me
> know if you have questions or wish to discuss.

I've gone through all of the comments, corrections and questions and think I understand them. I didn't see anything problematic.

Rather than respond to the notes separately, I'm going to update the draft document to fix the mistakes in accordance with the corrections in the notes, resolve issues where questions were raised, and replace the questions I raised in my document with the answers as given by Cynthia and Minaxi.

When I'm finished I'll upload Draft 3.1 incorporating the changes.

Comment entered 2011-07-13 21:57:07 by alan

BZDATETIME::2011-07-13 21:57:07
BZCOMMENTOR::Alan Meyer
BZCOMMENT::35

The attached draft 3.1 has only minor changes to incorporate comments and corrections from Cynthia's and Minaxi's notes. In some cases I removed a "Question:" that I had asked and replaced it with a statement based on their answer. In some cases I made changes to the wording. In some I modified proposed functions or added new ones to accommodate their ideas. Most of the changes are in the import sections of the document.

Following a suggestion from Bob, I collapsed the table of contents to a more reasonable, two level representation.

If anyone wants to know whether an entry of interest has changed, the easiest way to find out is probably to look at Cynthia's and Minaxi's notes and see if their red highlighted text bears upon the entry of interest. If so, it is then easy to find the section of the document text.

Comment entered 2011-07-13 21:57:07 by alan

Attachment newCiteMSNotes3.1.html has been added with description: New CiteMS requirements outline - Draft 3.1

Comment entered 2011-07-19 11:53:15 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-07-19 11:53:15
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::36

Minaxi and I have reviewed the new draft 3.1 and everything looks good. There is only one question that we are a bit confused about. Question: is status also summary topic specific? You say no and we say yes. Maybe your thinking is different from ours on this but we may need to clarify.

Comment entered 2011-07-19 13:40:36 by alan

BZDATETIME::2011-07-19 13:40:36
BZCOMMENTOR::Alan Meyer
BZCOMMENT::37

(In reply to comment #36)
> Minaxi and I have reviewed the new draft 3.1 and everything looks good. There
> is only one question that we are a bit confused about. Question: is status also
> summary topic specific? You say no and we say yes. Maybe your thinking is
> different from ours on this but we may need to clarify.

I agree that it should be summary topic specific. My failure to fix that in the document is an oversight on my part. I meant to correct that, but I must have forgotten, or else maybe I corrected it in one place but not another.

I see that item 2.13 has it wrong.

I'll fix it.

Comment entered 2011-07-19 16:48:50 by alan

BZDATETIME::2011-07-19 16:48:50
BZCOMMENTOR::Alan Meyer
BZCOMMENT::38

This draft 3.1.1 contains corrections as per comment #36 posted by Cynthia.

The only changes are in section 2.13 on status, fixing some places I had overlooked when noting that citation status is summary topic specific.
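
A minimal sketch of what "summary topic specific" status could look like in data terms, assuming status is stored per (citation, summary topic) pair. The function names and the status values shown are hypothetical and are not taken from the requirements outline.

```python
def set_status(statuses, pmid, topic, status):
    """Record a status for one citation under one summary topic."""
    statuses[(pmid, topic)] = status


def get_status(statuses, pmid, topic):
    """Return the status for this citation and topic, or None if unset."""
    return statuses.get((pmid, topic))


statuses = {}
# The same citation can hold a different status under each topic.
set_status(statuses, 21741692, "topic X", "accepted")
set_status(statuses, 21741692, "topic Y", "rejected")
```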

Comment entered 2011-07-19 16:48:50 by alan

Attachment newCiteMSNotes3.1.1.html has been added with description: New CiteMS requirements outline - Draft 3.1.1

Comment entered 2011-08-10 00:30:48 by alan

BZDATETIME::2011-08-10 00:30:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::39

After discussing some design issues with Bob, and following his
suggestions, I did the following:

1. Wrote a program to download CiteMS citations from PubMed.

The program selects PMIDs from the existing CiteMS database,
then fetches the corresponding citation records in XML from
PubMed and stores them in a file.

2. Wrote a program to scan the downloaded file for non-ASCII
characters.

The program parsed the XML, examined every character in the
text content, and categorized records as containing no
extended characters, extended characters that fit in 8 bits
(e.g., Latin-1), and extended characters that don't fit in 8
bits and are only available in Unicode, not Latin-1.
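
The categorization in step 2 might look something like the sketch below. It is not the program that was actually run; the function names and the tiny sample records (which do not follow the real PubMed XML schema) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def classify_text(text):
    """Categorize text by the highest code point it contains."""
    highest = max((ord(ch) for ch in text), default=0)
    if highest < 0x80:
        return "ascii"          # plain ASCII only
    if highest <= 0xFF:
        return "8-bit"          # fits in 8 bits, e.g. Latin-1
    return "beyond 8-bit"       # only representable in Unicode


def classify_record(xml_string):
    """Parse one record and classify all of its text content."""
    root = ET.fromstring(xml_string)
    return classify_text("".join(root.itertext()))


# Hypothetical records: plain ASCII, o-umlaut (U+00F6), and
# Greek beta (U+03B2) with a single-character >= sign (U+2265).
samples = [
    "<record><title>Plain title</title></record>",
    "<record><title>Sch\u00f6n</title></record>",
    "<record><title>TGF-\u03b2 \u2265 5</title></record>",
]
categories = [classify_record(s) for s in samples]
```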

I ran the two programs.

The first one timed out after downloading 2,000 records. I may
need a longer wait time, or it may be that PubMed is designed to
cut off programs that consume too many resources in too short a
time. I'll probably need to do some experimentation and tuning.

I ran the second program on the 2,000 record set. The results
were as follows:

Records requiring 16 bit characters = 398.

Records requiring 8 bit chars, but not 16 = 716

All the rest (plain ASCII) = 886

I looked at a few of the records requiring 16 bit chars, for
example:

PMID:

21741692
21765105
20979021

The extended 16 bit characters I saw included >= (as one
character), and Greek alpha and beta characters.

These all display properly in the PubMed HTML and are represented
correctly in the XML. However they are not properly represented
in the Medline Print format. Other characters are substituted
for them that don't always convey the meaning, at least to a
relatively inexperienced PubMed user such as myself.

In the existing system, PubMed IDs are hyperlinked and users can
click the links to see the record in the full and accurate PubMed
character set, but they can't see the fully correct data in
the Citation Management System displays.

Given that at least 20% of the records are not displaying
perfectly in Medline Print format, I am inclined to import XML
into the system and store the data in Unicode.

The CIAT staff that search for data now are accustomed to seeing
search results in Medline Print format and then feeding that to
the Import Utility. I think we should either provide style
sheets to enable them to look at the XML in a pretty format, or
else allow them to import the Medline Print format but, when each
record is read in, have the system go to PubMed to get the XML.

Comment entered 2011-08-11 23:38:40 by alan

BZDATETIME::2011-08-11 23:38:40
BZCOMMENTOR::Alan Meyer
BZCOMMENT::40

I sent an email to NLM to find out if they have any guidelines for how many records should be downloaded in what way and over what period of time in order to be "well behaved".

In the meantime, as an experiment, I followed another suggestion from Bob and modified my program to retry (up to 5 times) after a timeout. I then ran it on a sample of over 50,000 records and downloaded all of them with no problem. I will still try to follow any reasonable guidelines but, in the meantime, I have strong evidence that we can download whatever we need, for example for bulk replacement of the entire original CMS database with new XML versions.
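
A minimal sketch of a retry-after-timeout wrapper along the lines described above. The actual program is not shown in this ticket, so the names here are hypothetical: `fetch` stands for any callable that performs one download request, and the one-second pause between tries is an assumption.

```python
import socket
import time

def fetch_with_retry(fetch, max_retries=5, pause=1.0, sleep=time.sleep):
    """Call fetch(); after a timeout, retry up to max_retries more times."""
    retries = 0
    while True:
        try:
            return fetch()
        except (socket.timeout, TimeoutError):
            if retries == max_retries:
                raise           # give up: every retry has been used
            retries += 1
            sleep(pause)        # brief pause before trying again
```

Passing `sleep` as a parameter makes the wrapper easy to exercise without real delays.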

Comment entered 2011-08-22 17:01:39 by alan

BZDATETIME::2011-08-22 17:01:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::41

I found some particularly useful NLM documents for system design that others might also be interested in. They are:

NLM XML Element Descriptions and their Attributes:
http://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html

Extended characters used in PubMed:
http://www.ncbi.nlm.nih.gov/sites/pubmedutils/charmap

Comment entered 2011-08-30 10:14:38 by alan

BZDATETIME::2011-08-30 10:14:38
BZCOMMENTOR::Alan Meyer
BZCOMMENT::42

(In reply to comment #40)
> I sent an email to NLM to find out if they have any guidelines for how many
> records should be downloaded in what way and over what period of time in order
> to be "well behaved".

I got a response from NLM which is attached.

It looks like what I want to do is to put a large number of Pubmed IDs in a single request and send them in one batch, then wait a second before sending more. 500 in a batch was the suggested number; up to 10,000 is legal.

That should give us plenty of leeway to do what we need for citation management.
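
A minimal sketch of that batching pattern, under stated assumptions: `fetch_batch` is a hypothetical callable that takes a comma-separated string of Pubmed IDs and returns whatever the request produces. The batch size of 500, the 10,000 limit, and the one-second wait are the figures quoted above.

```python
import time

def batches(pmids, size=500):
    """Yield successive lists of at most `size` Pubmed IDs."""
    if not 1 <= size <= 10000:
        raise ValueError("batch size must be between 1 and 10,000")
    for start in range(0, len(pmids), size):
        yield pmids[start:start + size]

def fetch_all(pmids, fetch_batch, size=500, delay=1.0, sleep=time.sleep):
    """Send the IDs in batches, waiting `delay` seconds between requests."""
    results = []
    for index, batch in enumerate(batches(pmids, size)):
        if index > 0:
            sleep(delay)    # wait before every request after the first
        results.append(fetch_batch(",".join(str(p) for p in batch)))
    return results
```

With 1,200 IDs this would send three requests (500, 500, and 200 IDs) with two one-second waits in between.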

Comment entered 2011-08-30 10:14:38 by alan

Attachment Re Guidelines for proper submission of bulk requests to the Entrez API.eml has been added with description: Guidelines for sending computer to computer requests to NLM

Comment entered 2011-09-08 23:41:11 by alan

BZDATETIME::2011-09-08 23:41:11
BZCOMMENTOR::Alan Meyer
BZCOMMENT::43

Here are some more notes on the new system.

The data proffered by NLM in XML format has some differences from
the Medline Print format. Each of these requires some decisions
on our part of how to handle them in the new CiteMS.

Here are some of the differences I've noticed so far:

1. Unicode character set.

The subset of Unicode used by NLM includes the European
language accented characters plus a number of scientific
symbols, including symbols like +/-, >=, Greek characters,
super and subscripts, etc. You can see the differences
between the Unicode and plain ASCII versions by calling up a
citation on Pubmed. It shows the Unicode symbols by default.
If you switch to the MEDLINE display setting for a record that
has these characters you'll see how they've been transformed.

I think that we can store and display Unicode with no problem,
but keyboard input is an issue. We don't input records with
the keyboard, but we do enter search strings from the keyboard
and need to make some decisions about how to handle searches
for names, journal titles, etc., that have non-ASCII
characters.

Two approaches are:

a) Require users to know how to enter special characters from
their keyboards, or else use wild cards.

For example: assume that the name "Cortazar" has an accent
over the first 'a'.

The user either enters the accented a, or types "Cort?zar",
where the question mark stands for any character in that
spot.

b) Create ASCII indexes.

We can create an index with the accented 'a' replaced by a
plain ASCII 'a', mapping all of the accented characters to
non-accented versions.

I'm inclined to think that we should start with approach a)
and only go to b) if there is a need.

2. Abstracts.

The Medline format for abstracts is simpler than the one used
in Pubmed. The entire Medline abstract is put into a single
paragraph, possibly with embedded headers like "RESULTS: ",
"CONCLUSIONS: ", etc. The more modern XML and Pubmed formats
allow separate paragraphs.

I've written a routine that correctly (I think) creates a
Medline format abstract from the XML, but I don't know if we
want it or not.

This may not be a significant issue.

3. SO field.

Every Medline record has an SO (source) field that contains a
citation to the article (title abbreviation, volume, issue,
pages, etc.) in a single string. This is used quite a bit in
the existing CiteMS. To my surprise, I found that the XML
doesn't have that field.

NLM publishes the algorithm for creating SO style citations
from the XML data and I've implemented it, so if we want this
style of citation, it's now there. But maybe we don't want
it.
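
For item 1 above, the two search approaches can be sketched as
follows. This is only an illustration using the Python standard
library, not a design decision; the function name is invented.
Approach a) is shown with a shell-style wildcard match, approach
b) by decomposing accented letters and dropping the accent marks.

```python
import fnmatch
import unicodedata

def ascii_fold(text):
    """Map accented letters to their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents) left behind by decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

name = "Cort\u00e1zar"   # "Cortazar" with an accent over the first 'a'

# Approach a): the user types a wildcard in place of the accented letter.
print(fnmatch.fnmatchcase(name, "Cort?zar"))

# Approach b): the index stores the folded form, so plain "Cortazar" matches.
print(ascii_fold(name) == "Cortazar")
```

Symbols that have no base letter, such as Greek beta or the
single-character >=, pass through ascii_fold unchanged, so approach
b) would still need an explicit mapping table for those.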

Comment entered 2011-09-27 23:39:17 by alan

BZDATETIME::2011-09-27 23:39:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::44

Pardon the length of this posting but the devil is in the details
here and I thought it best to get the details out.

Cynthia, Minaxi, and Bob may be the only people that need to
really weigh in on this one.

I've given a lot of thought to the problem of representing
journal title information in the new CiteMS database. I've
worked out about four different schemes for doing it, including
some with a table of journal titles, some with a separate table
of journals and a separate table of titles (allowing one journal
with one NLM Unique ID to change titles over time), and a simple
one with no journal table at all.

I'm now inclining toward the simplest solution, as follows:

Store the following 3 columns in EVERY citation record:

NLM unique journal ID (NlmUniqueID)
Full title (Title)
Medline abbreviation (MedlineTA)

Do not store:

ISSN (the MedlineTA is less ambiguous)
ISO Title Abbreviation (a few records don't have them)

(However we can get these if we need them because I'll
also store the full imported XML and we can re-parse it
if we have to.)

Do not keep any separate title list.

Here's what we would do to solve various problems:

1. Search by title

We just select citation records with that title, or portion
thereof.

This might get more than one actual journal, but that's okay.
The user specified a title and that's what we find for him.

The same would apply to searches by short titles.

2. Find all articles from one journal.

User enters one of:

NLM unique journal ID
Specific citation (e.g., by its Pubmed ID)
Title

The first two of these work easily. The title search
might sometimes require that the user go through an
intermediate disambiguation step to get to the NLM Unique ID
before finding all of the articles.

3. NOT lists.

We would store NLM unique journal IDs in the NOT lists, not
titles. We would block all citations with the specified ID,
not all with the same title.

To display a NOT list:
For each unique journal ID on the list:
Display the ID
Display the latest title as far as we can figure it
out (not a straightforward process) or maybe
display all the titles associated with that ID.

The title displays can be messy but using the unique ID
produces a more accurate NOT list than if we put in titles.
It's safer than the current method.
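
A minimal sketch of how the three cases above might look with the
journal columns stored in every citation record. It uses an
in-memory SQLite database purely for illustration; the table names,
column names, and journal IDs are all invented and are not the
actual CiteMS definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE citation (
        pmid          INTEGER PRIMARY KEY,
        nlm_unique_id TEXT NOT NULL,   -- NlmUniqueID
        journal_title TEXT NOT NULL,   -- Title as published
        medline_ta    TEXT NOT NULL    -- MedlineTA
    );
    CREATE TABLE not_list (nlm_unique_id TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO citation VALUES (?, ?, ?, ?)", [
    (1, "J100", "Journal of Examples", "J Ex"),
    (2, "J100", "Journal of Examples and Samples", "J Ex Samp"),
    (3, "J200", "Journal of Examples", "J Examples"),
])
conn.execute("INSERT INTO not_list VALUES ('J100')")

def search_by_title(fragment):
    """Case 1: match on title text; may return more than one journal."""
    rows = conn.execute(
        "SELECT pmid FROM citation WHERE journal_title LIKE ? ORDER BY pmid",
        ("%" + fragment + "%",))
    return [row[0] for row in rows]

def titles_for_journal(nlm_id):
    """Cases 2 and 3: all titles seen for one NLM unique journal ID."""
    rows = conn.execute(
        "SELECT DISTINCT journal_title FROM citation "
        "WHERE nlm_unique_id = ? ORDER BY journal_title", (nlm_id,))
    return [row[0] for row in rows]

def unblocked():
    """Case 3: citations whose journal ID is not on the NOT list."""
    rows = conn.execute(
        "SELECT pmid FROM citation WHERE nlm_unique_id NOT IN "
        "(SELECT nlm_unique_id FROM not_list) ORDER BY pmid")
    return [row[0] for row in rows]
```

In this made-up data, journal J100 changed its title and J200
shares a title with it, so a title search finds all three
citations while the NOT list blocks only the two that carry ID
J100.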

The advantages of the approach, as I see them, are:

+ We have the title of the journal for every article exactly as
it was when it was published.

+ We do NO maintenance of title files. NLM does it all. We
simply use their unique journal ID.

+ I think the total implementation is more straightforward
and robust. I can't be certain of that but I think it's the
case.

The disadvantages are:

- If a journal article record ever comes through with multiple
titles in the record, we could have a problem.

I believe this is illegal now. If it ever happens in the
future I'm hoping that NLM will create some separate field
for title added entries (as in MARC) that we can ignore, or
that we can just use the first title and ignore any others.

- If we cite journals that aren't in Pubmed, we need special
provisions for handling them.

I believe we can handle the problem and I'm not certain that
it's significantly harder than if we have a separate title
file. We just need to think about it up front so that we
don't do something that makes it hard when the time comes to
implement this.

? More disk storage is required.

But I estimate it's no more than about 60 MB to cover the
next 10 years - trivial in today's world of disk storage.
It's really not a significant disadvantage.

I'm going to proceed on the assumption that the above plan is the
right one unless someone figures out that it's wrong.

Comment entered 2011-09-28 09:55:54 by priced

BZDATETIME::2011-09-28 09:55:54
BZCOMMENTOR::Minaxi Trivedi
BZCOMMENT::45

I fully agree with the plan.
Minaxi
(In reply to comment #44)
> Pardon the length of this posting but the devil is in the details
> here and I thought it best to get the details out.
>
> Cynthia, Minaxi, and Bob may be the only people that need to
> really weigh in on this one.
>
>
> I've given a lot of thought to the problem of representing
> journal title information in the new CiteMS database. I've
> worked out about four different schemes for doing it, including
> some with a table of journal titles, some with a separate table
> of journals and a separate table of titles (allowing one journal
> with one NLM Unique ID to change titles over time), and a simple
> one with no journal table at all.
>
> I'm now inclining toward the simplest solution, as follows:
>
> Store the following 3 columns in EVERY citation record:
>
> NLM unique journal ID (NlmUniqueID)
> Full title (Title)
> Medline abbreviation (MedlineTA)
>
> Do not store:
>
> ISSN (the MedlineTA is less ambiguous)
> ISO Title Abbreviation (a few records don't have them)
>
> (However we can get these if we need them because I'll
> also store the full imported XML and we can re-parse it
> if we have to.)
>
> Do not keep any separate title list.
>
> Here's what we would do to solve various problems:
>
> 1. Search by title
>
> We just select citation records with that title, or portion
> thereof.
>
> This might get more than one actual journal, but that's okay.
> The user specified a title and that's what we find for him.
>
> The same would apply to searches by short titles.
>
> 2. Find all articles from one journal.
>
> User enters one of:
>
> NLM unique journal ID
> Specific citation (e.g., by its Pubmed ID)
> Title
>
> The first two of these work easily. The title search
> might sometimes require that the user go through an
> intermediate disambiguation step to get to the NLM Unique ID
> before finding all of the articles.
>
> 3. NOT lists.
>
> We would store NLM unique journal IDs in the NOT lists, not
> titles. We would block all citations with the specified ID,
> not all with the same title.
>
> To display a NOT list:
> For each unique journal ID on the list:
> Display the ID
> Display the latest title as far as we can figure it
> out (not a straightforward process) or maybe
> display all the titles associated with that ID.
>
> The title displays can be messy but using the unique ID
> produces a more accurate NOT list than if we put in titles.
> It's safer than the current method.
>
> The advantages of the approach, as I see them, are:
>
> + We have the title of the journal for every article exactly as
> it was when it was published.
>
> + We do NO maintenance of title files. NLM does it all. We
> simply use their unique journal ID.
>
> + I think the total implementation is more straightforward
> and robust. I can't be certain of that but I think it's the
> case.
>
> The disadvantages are:
>
> - If a journal article record ever comes through with multiple
> titles in the record, we could have a problem.
>
> I believe this is illegal now. If it ever happens in the
> future I'm hoping that NLM will create some separate field
> for title added entries (as in MARC) that we can ignore, or
> that we can just use the first title and ignore any others.
>
> - If we cite journals that aren't in Pubmed, we need special
> provisions for handling them.
>
> I believe we can handle the problem and I'm not certain that
> it's significantly harder than if we have a separate title
> file. We just need to think about it up front so that we
> don't do something that makes it hard when the times comes to
> implement this.
>
> ? More disk storage is required.
>
> But I estimate it's no more than about 60 MB to cover the
> next 10 years - trivial in today's world of disk storage.
> It's really not a significant disadvantage.
>
>
> I'm going to proceed on the assumption that the above plan is the
> right one unless someone figure out that it's wrong.

Comment entered 2011-09-28 10:03:35 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-09-28 10:03:35
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::46

I think this looks like a good plan as well. But here are a few comments.
1. Searching by title: if we sort the results of this search by journal title by default, users can scroll to the title they want. Or is it possible to present them with a list of possible matches first, so they can click on the title(s) they want?
2. All articles from one journal: "intermediate disambiguation step" ...an ID look up?
3. I agree we need to stick with the ID here and let NLM do the work for us.

Comment entered 2011-09-28 10:30:24 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-09-28 10:30:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::47

[Just a reminder to everyone that it's not necessary to quote the entirety of very long comments when adding a very short reply or when replying to a specific portion of the original comment.]

(In reply to comment #44)

> Store the following 3 columns in EVERY citation record:
>
> NLM unique journal ID (NlmUniqueID)
> Full title (Title)
> Medline abbreviation (MedlineTA)
>
> Do not store:
>
> ISSN (the MedlineTA is less ambiguous)
> ISO Title Abbreviation (a few records don't have them)
>
> (However we can get these if we need them because I'll
> also store the full imported XML and we can re-parse it
> if we have to.)

You refer later to the problem of dealing with citations to journals which aren't in PubMed, without saying exactly how we'd handle them. Would, for example, the literature review interface ever need to present articles in such journals? We'll want to address these issues at the outset, rather than leave them as functionality to be tacked onto the side of the system.

Comment entered 2011-09-28 11:21:56 by alan

BZDATETIME::2011-09-28 11:21:56
BZCOMMENTOR::Alan Meyer
BZCOMMENT::48

(In reply to comment #47)
...
> You refer later to the problem of dealing with citations to journals which
> aren't in PubMed, without saying exactly how we'd handle them. Would, for
> example, the literature review interface ever need to present articles in such
> journals? We'll want to address these issues at the outset, rather than leave
> them as functionality to be tacked onto the side of the system.

Right now non-Pubmed citations are handled outside the system but I think they can be incorporated without too much effort. I assume that some stages of the literature review processing would be involved, or at least could be involved.

I've included a "source_id" column in the table of citations. If that's something other than Pubmed, we can process data differently. If, for example, we want to do something that only applies to Pubmed records, we ought to incorporate something like:

AND source_id = {however we identify Pubmed}

I'm planning to store citations in two ways. One is the citation record that contains columns for title, journal title, various IDs, etc., along with the associated author table. The other is by storing the original XML or other source citation text in a column of its own. The stored XML, or whatever it is, will be different for different data sources, but I would hope that if we incorporate another source besides Pubmed we can write a parser/loader for it that will produce highly compatible data in the relational columns. If so, then almost all software would be able to process all of the citations without regard to their source. Only a few functions would have to be source aware. For example any software that produces hyperlinks to Pubmed would have to be aware of which records are and which aren't Pubmed sourced. The NOT list processing might also be affected (depending on how we do it), and so on.
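
A minimal sketch of one such source-aware function, the hyperlink case. Everything here is an assumption made for illustration: the "pubmed" source value, the dictionary keys, and the URL pattern are not taken from the actual database definitions.

```python
PUBMED_SOURCE = "pubmed"   # hypothetical value identifying Pubmed records

def pubmed_link(citation):
    """Return a Pubmed hyperlink for Pubmed-sourced citations, else None."""
    if citation["source_id"] != PUBMED_SOURCE:
        return None        # other sources get no Pubmed link
    return "https://pubmed.ncbi.nlm.nih.gov/%s/" % citation["source_pk"]
```

Code that does not care about the source would simply never look at `source_id`, which is the point of keeping the relational columns compatible across sources.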

That's the general strategy. No doubt when we get around to actually importing non-Pubmed data we'll find problems that we didn't anticipate, but I'm hoping they'll all be moderate sized.

Comment entered 2011-10-13 22:43:32 by alan

BZDATETIME::2011-10-13 22:43:32
BZCOMMENTOR::Alan Meyer
BZCOMMENT::49

I have some questions about "PreMed" processing for Cynthia or
Minaxi:

From reading the code in the import system, it looks like the
program manages PreMed updates as follows:

A user must choose whether she is doing an "Import" or an
"Update PreMed".

If the user chooses "Import" then:

If a citation in the import file does not already exist
in the database (i.e., no match on PMID) then:

Insert it into the database.

Else (the citation already exists) then:

If the citation is already associated with the same
summary topic for which this import is occurring then:

Reject the citation as a duplicate.

Do not update the citation database, even if the
newly imported record is different from the old
one, e.g., it has a "MEDLINE" status and the old
one did not.

Else (it's imported for a different summary topic)
then:

Do not load the citation. It is a duplicate.

Do not update the citation database, even if the
newly imported record is different from the old
one, e.g., it has a "MEDLINE" status and the old
one did not.

Associate the existing citation with the new
summary topic.

Else (the user chooses "Update PreMed") then:

If a citation in the import file matches an existing
citation (PMIDs match) then:

If the newly imported citation contains a
STAT="MEDLINE" field, then:

Replace the data in the citation database with
the newly imported record.

No checking is done to determine if the new
record is different from the old.
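
The same logic can be restated as a small runnable sketch. This
is not the import system's code; it models the database as a
dictionary keyed by PMID, and the function name, mode strings,
and return values are invented. What happens in "Update PreMed"
when there is no match or no STAT="MEDLINE" is an assumption
(nothing changes).

```python
def import_citation(db, record, topic, mode):
    """db maps PMID -> {"record": dict, "topics": set of summary topics}."""
    pmid = record["PMID"]
    existing = db.get(pmid)
    if mode == "import":
        if existing is None:
            db[pmid] = {"record": record, "topics": {topic}}
            return "inserted"
        if topic in existing["topics"]:
            return "duplicate"          # rejected, stored record untouched
        existing["topics"].add(topic)   # stored record still untouched
        return "topic added"
    if mode == "update_premed":
        if existing is not None and record.get("STAT") == "MEDLINE":
            existing["record"] = record  # replaced without comparing
            return "replaced"
        return "unchanged"
    raise ValueError("unknown mode: %r" % mode)
```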

Is that correct?

If it is (or even if it's not), how does the user identify the
articles in the citation management system for which updates are
needed? I presume this information is stored in the database
somewhere, but I'm not having any luck finding it.

How does the user get the new versions of the records from
Pubmed?

I'm assuming we can do something more automated than the above
but I want to understand what's done now before designing a
better approach.

Thanks.

Comment entered 2011-10-14 10:05:45 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-10-14 10:05:45
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::50

Yes this is correct. The only changes made to the database during the import process include adding new citation records and adding new summary topics to existing records.
The only way to update the citation record is to use the Update PreMed feature. Note this feature is limited to updating records that came into the database as premedline only. If a citation came in as a complete medline record and later is corrected or updated in pubmed, we currently have no means to update it in the citems.
All premedline records that are imported into the citems are somehow flagged upon import and the flag remains until it has been updated or rather replaced with a complete medline record.
We identify these records using the "Outstanding Pre-Publication PMIDs Report" located in the Admin Tool section of the citems. http://citems.nci.nih.gov/cmsReports_PMID.asp
This report generates a list of pmids for all records flagged as premedline in the database for a specified date range. This list of pmids is then copied and pasted into the search window of pubmed and a textfile is made listing all these citations in medline format. This file is then "re-imported" using the Update Premed option and the existing premedline records are replaced with the new medline records.
If any of the citations generated by the "Outstanding Pre-Publication PMIDs Report" are still premedline records at the time of update, then they will not be updated/replaced. They will remain flagged as premedline and continue to be listed on the report.
Let me know if you have additional questions.

Comment entered 2011-11-15 14:41:55 by alan

BZDATETIME::2011-11-15 14:41:55
BZCOMMENTOR::Alan Meyer
BZCOMMENT::51

I notice that five users in the old system appear to have more than one user record. These are:

Bonnie Ferguson / Bonnie Fergeson
Nathan Harper (twice)
Robin Harrison (twice)
Sharon Kasner (twice + Sharon Q Kasner)
Tana Smith

I think we want to import all of the user records, including those for people who are no longer active in the existing CMS. They are still linked to the history of various citations and actions. However I wouldn't think we should import multiple records for each of these people.

I could do the cleanup in the old system, or the new, or in the conversion process. I'm not certain yet which of these is best (probably in the conversion process) but my goal would be to only have one record for each user in the new system. Any data that is carried over into the new system would use only one userid for that person and any other userids that were used by him or her would be mapped to that one.

Let me know if there is a problem with that thinking.

Comment entered 2011-11-15 15:14:27 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2011-11-15 15:14:27
BZCOMMENTOR::Bob Kline
BZCOMMENT::52

Let's first find out why there are multiple accounts. If such a user were wearing two different hats, using the different account for each role, we might not want to lose that information. If it's just a mistake, then of course a merge would be fine.

Comment entered 2011-11-15 15:48:39 by alan

BZDATETIME::2011-11-15 15:48:39
BZCOMMENTOR::Alan Meyer
BZCOMMENT::53

(In reply to comment #52)
> Let's first find out why there are multiple accounts. If such a user were
> wearing two different hats, using the different account for each role, we might
> not want to lose that information. If it's just a mistake, then of course a
> merge would be fine.

Does anyone know the answer to Bob's question?

I notice that, in every case where there is more than one user record with the same name, there is at most one active user record, which makes me think that these were mistakes. It's possible that people became inactive in the system, then became active, and someone just created a new record for them.

There is a "level" field in the user record that is supposed to set access permissions. In each case the level values are the same for each of the records for a given person.

Comment entered 2011-11-15 16:50:51 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-11-15 16:50:51
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::54

At some point years ago we had a problem where selected user profiles were somehow corrupted. Creating a new profile for these users fixed the problem at that time. This is the case for Sharon and Robin and I think Nathan as well. Bonnie's was a mistake. Her name was spelled incorrectly in the email I received to set up her profile. I did an edit and am not sure why there would be two instead of just the edited original.
Also both Sharon and Robin were reassigned to different boards/roles. Sharon moved from the genetics board to peds. Robin started with a staff profile and then left to work for OCCM on the Genetics board with a client profile.
I am not sure why Tana would have more than one.
So yes some cleanup is probably needed.

Comment entered 2011-11-15 17:00:29 by alan

BZDATETIME::2011-11-15 17:00:29
BZCOMMENTOR::Alan Meyer
BZCOMMENT::55

(In reply to comment #54)
...
> So yes some cleanup is probably needed.

Thanks for the explanation. It's great to have someone with a history going back to the beginning of the system and a memory of what happened.

All of the cited reasons sound like they were specific to the old system and that it would be appropriate to consolidate the records in the new one. The result will be that if a user were to search, for example, for a note or a status set by Robin Harrison, all such notes or status values would appear, regardless of which of Robin's records were originally used.

The alternative to that would be to create two records for each person, three for Sharon.

I'll plan to consolidate these records during conversion.
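
A minimal sketch of the userid mapping such a consolidation would need, under stated assumptions: the user IDs are invented, the rule of preferring the active record (falling back to the lowest userid) is a guess rather than a decision, and misspelled names have to be supplied by hand as aliases.

```python
def build_user_map(users, aliases=None):
    """users: (userid, name, active) tuples. Return {old userid: kept userid}."""
    aliases = aliases or {}
    by_name = {}
    for userid, name, active in users:
        # Group records under the corrected spelling of the name, if any.
        by_name.setdefault(aliases.get(name, name), []).append((userid, active))
    mapping = {}
    for records in by_name.values():
        # Keep the active record if there is one, else the lowest userid.
        kept = min(records, key=lambda rec: (not rec[1], rec[0]))[0]
        for userid, _active in records:
            mapping[userid] = kept
    return mapping

users = [
    (10, "Nathan Harper", False),
    (11, "Nathan Harper", True),
    (12, "Bonnie Fergeson", False),
    (13, "Bonnie Ferguson", False),
]
user_map = build_user_map(users, aliases={"Bonnie Fergeson": "Bonnie Ferguson"})
```

The conversion program would then replace every userid it carries over with `user_map[userid]`.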

Comment entered 2011-11-15 20:11:03 by alan

BZDATETIME::2011-11-15 20:11:03
BZCOMMENTOR::Alan Meyer
BZCOMMENT::56

Here's another conversion question.

In the original system there is a "notes" field in each citation
record. It isn't required but about 99.5% of the records have
them. About 210,000 of the total 242,000 records with notes have
notes with some variant of "ok" (variants include " ok" and
"okok".)

In the new system we have specified a more flexible system of
"tagging" citations. A user can assign a tag to a citation,
choosing the tag from a controlled list of tags, and associate a
comment/note with the tag. Tags are described in the "New CiteMS
requirements outline" (about which more in the next comment after
this one.)

I propose that we create a tag with a name something like "CMS1
Note". When we convert the citation records to load into the new
CiteMS, if the citation has any data in the notes field, I would:

o Assign a "CMS1 Note" tag to the citation.

o Copy the note into the comment text field associated with the
tag.

o Assign a userid of some type to the tag. The safest userid might
be a special one for "conversion program" or "old CMS", or
something like that.

o Assign the original citation input date and time to the tag.

I like this better than creating a notes field in the new
citation record because notes have much less flexibility than the
new tagging approach.

Is that a reasonable thing to do?

Are there better approaches?

Is there a better approach to choosing a userid?

Should I do this even for notes that currently just have the
value "ok", or only for notes with more text?
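
A minimal sketch of the proposed conversion step, with the open
question about "ok" notes left as a switch. The field names, the
"conversion" userid, and the dictionary layout are placeholders,
not the real table definitions.

```python
import re

# Matches notes that are nothing but "ok", " ok", "okok", "OK", etc.
OK_ONLY = re.compile(r"^\s*(ok)+\s*$", re.IGNORECASE)

def note_to_tag(citation, skip_ok_only=False):
    """Return a "CMS1 Note" tag record for a citation's note, or None."""
    note = citation.get("notes")
    if not note or not note.strip():
        return None                      # no note, so no tag
    if skip_ok_only and OK_ONLY.match(note):
        return None                      # optionally drop bare "ok" notes
    return {
        "tag": "CMS1 Note",
        "citation_id": citation["id"],
        "comment": note,                 # note text copied unchanged
        "user_id": "conversion",         # placeholder special userid
        "assigned": citation["input_date"],
    }
```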

Comment entered 2011-11-15 20:29:52 by alan

BZDATETIME::2011-11-15 20:29:52
BZCOMMENTOR::Alan Meyer
BZCOMMENT::57

A question about tagging:

In the requirements outline the fields are specified for what
goes into a "status" record and what goes into a "tag" record.

Both kinds of record have the following fields:

Status (or Tag) identifier
Citation identifier
Person (user) identifier
Date-time assigned
Comment (free text)

In addition, status records always have:

Summary topic identifier

I'm thinking that's right as far as it goes but there is one
small fly in the ointment. If a citation is associated with more
than one topic, then when we view a status display a board
manager will necessarily see tags that are relevant to multiple
topics, not just one, and possibly even topics on different
boards.

That might be just fine. The amount of data may not be all that
large and the bother of specifying a summary topic for a tag
might not be worth the return, and the danger of not seeing
important tags and comments because someone didn't assign a topic
might be too great. Or maybe the bother is justified.

If we do allow optional topics on tag/notes, I'd be inclined to
leave them out of the "CMS1 Note" tags that I create on
converting citation records (see comment #56.)

I will proceed for now with the original plan that has no Summary
topics for tags, as documented in our requirements. If someone
thinks that's not the best approach, please say so.

Comment entered 2011-11-16 00:17:36 by alan

BZDATETIME::2011-11-16 00:17:36
BZCOMMENTOR::Alan Meyer
BZCOMMENT::58

This is an improved draft of new database definitions. It has a number of additional tables, drops the journal table (we've decided not to keep our own journal authority), and has some added or changed fields in some tables.

As with Draft 1, Draft 2 does not include any database tables that are already defined for EBMS (like editorial board, summary topic, user, etc.) We'll be using the already defined EBMS tables for that. These are just new ones that weren't needed in EBMS up until now, or ones where EBMS stubbed out simpler tables that will be filled in more completely when the citation management functions are ready.

The definitions will continue to evolve. They will require reconciling with EBMS, which may need changes for EBMS purposes and vice versa for CiteMS uses of EBMS tables.

I'm also working in parallel on a new version of the spreadsheet that documents the existing Citation Management System database. The older version is at:

http://verdi.nci.nih.gov/tracker/attachment.cgi?id=2081

The new version adds a column showing where the data in each field in the old CMS goes in the new CiteMS. I'll be using that new spreadsheet to identify holes in our new database definitions. If there is data in the old system that should be preserved but can't be mapped anywhere in the new system, I'm calling that a "hole". It requires a revision of the new database definitions to eliminate it.

I'll also use the spreadsheet as a conversion specification, showing what the data conversion program must do with every single field in the old system.

It will be a while before the new version of the spreadsheet is ready. I often have to examine data in the system for each individual field in the old system, count occurrences, look at where the data is used elsewhere in the old system, and think about cleanup issues for that field. Cleanups are required if, for example, some records in the old system contain a null value in a field where there really ought to be data - something that I have encountered in a number of fields. We'll use the built-in database integrity features of the DBMS in the new system to ensure that these kinds of errors can't occur in the new system.

Comment entered 2011-11-16 00:17:36 by alan

Attachment createdb.sql has been added with description: Database definitions for new CiteMS - Draft 2

Comment entered 2011-11-16 00:25:18 by alan

BZDATETIME::2011-11-16 00:25:18
BZCOMMENTOR::Alan Meyer
BZCOMMENT::59

Oops. Fixed a mistake and re-uploaded.

(Working hard to refrain from making Rick Perry jokes.)

Comment entered 2011-11-16 00:25:18 by alan

Attachment createdb.sql has been added with description: Database definitions for new CiteMS - Draft 2.1

Comment entered 2011-11-16 11:08:51 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2011-11-16 11:08:51
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::60

None of the notes in the CiteMS to date are summary-topic dependent, simply because the functionality to do so does not exist. There are, however, notes linked to specific decisions and other recorded data.
Every citation gets an “OK” note when I have given it my final review. This has been our way of tagging the record as ready to be published. The “OKOK” note is just a typo. I type the word “ok” about 3000 times a month so my fingers tend to get carried away on occasion. Because we do not have a date stamp for when I gave my final “ok”, this note loses value after the citation is published. If a citation has only the “ok” note, I do not think we need to go out of our way to keep it.
All other notes we need to keep and be able to search. If possible, we should also use these notes to identify groups of citations to tag retrospectively. For example, citations with a note containing the words “Added” OR “Fast” OR “FT” could be identified and tagged as fast track citations.

Comment entered 2011-11-17 23:34:25 by alan

BZDATETIME::2011-11-17 23:34:25
BZCOMMENTOR::Alan Meyer
BZCOMMENT::61

(In reply to comment #60)

> All of the notes in the CiteMS to date are not summary topic
> dependent simply because the functionality to do so does not
> exist. There are, however, notes linked to specific decisions
> and other recorded data.

I've tentatively added an optional topic ID to the tag table. My
thinking is that it would be used for searching. A user might
want to find all the records tagged a certain way for a
particular topic, or a particular board (which could be found
through topics), or the user might not care about the topic.

We might want to implement support for using this from the
beginning, or add it later on. It might turn out to be easy
since pretty much whatever we did for "status" associations would
probably work with very small changes for "tag" associations.

> All citations get an "OK" note when I have given it my final
> review. This has been our way of tagging the record as ready
> to be published. The "OKOK" note is just a typo. I type the
> word "ok" about 3000 times a month so my fingers tend to get
> carried away on occasion. Due to the fact that we do not have
> a date stamp for when I gave my final "ok" this note loses
> value after the citation is published. If a citation has only
> the "ok" note, I do not think we need to go out of the way to
> keep it.

Got it.

I'll plan on discarding those notes.

> All other notes we need to keep and need to be able to search.
> And if possible use these notes to identify groups of citations
> to retrospectively tag. For example, citations with a note
> containing the words “Added” OR “Fast” OR “FT” could be
> identified and tagged as fast track citations.

I'll save them in tag comments, as described in comment #56.
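
A rough sketch of how the conversion program might sort the legacy notes, assuming case-insensitive whole-word matching on the three words Cynthia listed. The function name, the return labels, and the matching rules are all assumptions for illustration; the real rules would need to be checked against the actual note data.

```python
import re

# Assumption: match whole words only, ignoring case.
FAST_TRACK_WORDS = re.compile(r"\b(added|fast|ft)\b", re.IGNORECASE)
# A note consisting only of one or more "ok"s (covers the "OKOK" typo).
OK_ONLY = re.compile(r"^\s*(ok)+\s*$", re.IGNORECASE)

def classify_note(note):
    """Return 'discard', 'fast track', or 'keep' for one legacy note."""
    if OK_ONLY.match(note):
        return "discard"
    if FAST_TRACK_WORDS.search(note):
        return "fast track"
    return "keep"
```

Notes classified as "fast track" would still be kept as tag comments; the label only identifies candidates for the retrospective tag.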

Comment entered 2011-11-22 15:02:52 by alan

BZDATETIME::2011-11-22 15:02:52
BZCOMMENTOR::Alan Meyer
BZCOMMENT::62

(In reply to comment #42)

> I got a response from NLM which is attached.

Somewhere in the process the URL for the NLM usage guidelines got a bit mangled in the posted attachment. The correct URL is:

http://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requirements

Comment entered 2011-12-29 10:08:26 by alan

BZDATETIME::2011-12-29 10:08:26
BZCOMMENTOR::Alan Meyer
BZCOMMENT::63

This outline specifies an applications programmer interface for bringing articles into the EBMS/CiteMS from NLM, and for doing things with them in the system. It's intended as programmer documentation that will probably not be of interest to users.

In this and future documents I am using the term "article" in place of the term "citation" for consistency with the naming convention in the EBMS.

Comment entered 2011-12-29 10:08:26 by alan

Attachment newCiteMS_API_1.0.html has been added with description: Draft 1.0 API for article import and information

Comment entered 2011-12-29 11:58:02 by alan

BZDATETIME::2011-12-29 11:58:02
BZCOMMENTOR::Alan Meyer
BZCOMMENT::64

The existing CiteMS system appears to have the following rules
when storing authors:

Store up to 250 (or 251 for 3 cases) characters of author
names and initials for each article. If there are more
authors than fit in 250 characters, the string of authors is
truncated, in the middle of a name if need be.

Store only the last name + initials form of the author name.
This matches an earlier Medline print format rule. The
current Medline print format and the XML format also have
full first names when available. However, while that
information is stored at NLM it does not appear in the
default Pubmed display format.

Use only the ASCII form of names. Diacritics or other
non-English characters are not used.

Should we continue these policies, or should we have new ones?
The policies affect both search and display. Specific questions
are:

1. Should we only import up to some maximum number or character
count of authors?

If there are more than that, they cannot be searched or
displayed, though they may be seen by linking out to NLM.

2. Should we import full names if they are available, or only
lastname + initials?

If we want to search on full names, we'll need to import
them.

I'll leave the issue of the non-English characters out for now as
it's a general question that can affect other fields.

Comment entered 2011-12-29 12:15:57 by priced

BZDATETIME::2011-12-29 12:15:57
BZCOMMENTOR::Minaxi Trivedi
BZCOMMENT::65

(In reply to comment #64)

1. In my opinion we should import up to three authors.

2. Import only Last name + initials.

Minaxi

> Should we continue these policies, or should we have new ones?
> The policies affect both search and display. Specific questions
> are:
>
> 1. Should we only import up to some maximum number or character
> count of authors?
>
> If there are more than that, they cannot be searched or
> displayed, though they may be seen by linking out to NLM.
>
> 2. Should we import full names if they are available, or only
> lastname + initials?
>
> If we want to search on full names, we'll need to import
> them.
>
> I'll leave the issue of the non-English characters out for now as
> it's a general question that can affect other fields.

Comment entered 2011-12-29 15:40:26 by alan

BZDATETIME::2011-12-29 15:40:26
BZCOMMENTOR::Alan Meyer
BZCOMMENT::66

(In reply to comment #64)

> 1. Should we only import up to some maximum number or character
> count of authors?

We discussed the issue at the CDR status meeting today. Minaxi doesn't need all of the authors but it looks like the Board Members will. The last author in a list is special in that, by scientific publishing conventions, that person is often the director of the lab where the research was done and may well be the person that a board member remembers when searching or when looking at an article display.

So our plan will be to keep all of the authors.

> 2. Should we import full names if they are available, or only
> lastname + initials?

Again, lastname + initials appears to be fine for the import process, but full names may be useful in the full system with board member access. Searching Pubmed (admittedly a much bigger database than ours) I found the following:

12,858 Smith A
1,024 Smith AD
129 Smith Alan
27 Smith Alan D

So full first names can significantly help with searching.

> I'll leave the issue of the non-English characters out for now as
> it's a general question that can affect other fields.

My understanding of the sense of the meeting was that our first priority should be to try to make the searching work pretty well for users that don't know how to enter non-English characters and, if practical, also try to make it work for users who do know how to use them.

Author names will be affected but some titles will also be affected, for example those with Greek letters in names of molecules. In both cases we want searching to work as well as it can (within practical limits) for the user who doesn't know how to enter a diacritic or a Greek letter.
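
One possible approach, sketched here under assumptions, is to index and search on a folded ASCII form of each string while storing the original for display. The function name and the tiny Greek letter map are hypothetical; a real map would need review by the users.

```python
import unicodedata

# Hypothetical, deliberately tiny map; a real one would need review.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma", "κ": "kappa"}

def search_key(text):
    """Fold text to a lowercase ASCII form for index and query matching."""
    # Spell out the Greek letters we know about.
    spelled = "".join(GREEK_NAMES.get(ch, ch) for ch in text)
    # Split accented letters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", spelled)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is dropped.
    return stripped.encode("ascii", "ignore").decode("ascii").lower()
```

If the same function is applied to the user's query, a user who types "Gonzalez-Losa" and one who types "González-Losa" both find the same author. Letters that have no Unicode decomposition (for example "ø") are simply dropped by this sketch, so they would need their own entries in a mapping table.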

Comment entered 2011-12-29 23:09:48 by alan

BZDATETIME::2011-12-29 23:09:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::67

I've been looking at author names in Pubmed. Here is an extract
of all of the authors of one of the articles (PMID = 15447788):

<Author ValidYN="Y">
<LastName>Malthaner</LastName>
<ForeName>Richard A</ForeName>
<Initials>RA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Wong</LastName>
<ForeName>Rebecca Ks</ForeName>
<Initials>RK</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Rumble</LastName>
<ForeName>R Bryan</ForeName>
<Initials>RB</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Zuraw</LastName>
<ForeName>Lisa</ForeName>
<Initials>L</Initials>
</Author>

Here's one from 1998 with no first names (PMID = 9855693):

<Author>
<LastName>Charland</LastName>
<Initials>SL</Initials>
</Author>
<Author>
<LastName>Hui</LastName>
<Initials>JW</Initials>
</Author>

And one with no first names but with first name fields
(PMID = 9748936):

<Author ValidYN="Y">
<LastName>Kilstoff</LastName>
<ForeName>K</ForeName>
<Initials>K</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Chenoweth</LastName>
<ForeName>L</ForeName>
<Initials>L</Initials>
</Author>

There's some tricky stuff there. Minaxi's suggestion of using
only last name and initials sounds like the voice of experience.

I'm not going to overanalyze this right now. I think we should
probably do what we can do with the low hanging fruit and not
risk our necks trying to climb to the top of the tree. But I
thought I would let people know that these kinds of issues exist.
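
To make the cases above concrete, here is a minimal sketch of pulling names out of the Author elements. It assumes the authors are wrapped in an AuthorList element, and the rule for deciding whether ForeName holds a real first name (it must say more than Initials does) is only my guess at a workable heuristic. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def author_names(author_list_xml):
    """Return (short_name, full_name_or_None) for each Author element."""
    names = []
    for author in ET.fromstring(author_list_xml).iter("Author"):
        last = (author.findtext("LastName") or "").strip()
        fore = (author.findtext("ForeName") or "").strip()
        initials = (author.findtext("Initials") or "").strip()
        short = f"{last} {initials}".strip()
        # Treat ForeName as a real first name only if it says more than
        # the initials do (compared with spaces removed).
        has_first_name = bool(fore) and (
            fore.replace(" ", "") != initials.replace(" ", "")
        )
        full = f"{last} {fore}" if has_first_name else None
        names.append((short, full))
    return names

SAMPLE = """<AuthorList>
  <Author ValidYN="Y"><LastName>Malthaner</LastName><ForeName>Richard A</ForeName><Initials>RA</Initials></Author>
  <Author><LastName>Charland</LastName><Initials>SL</Initials></Author>
  <Author ValidYN="Y"><LastName>Kilstoff</LastName><ForeName>K</ForeName><Initials>K</Initials></Author>
</AuthorList>"""

names = author_names(SAMPLE)
```

The short form is always available, which is one argument for indexing it in every case and treating the full name as an optional extra.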

Comment entered 2011-12-29 23:26:55 by alan

BZDATETIME::2011-12-29 23:26:55
BZCOMMENTOR::Alan Meyer
BZCOMMENT::68

Just for kicks, how about this one:

<Author ValidYN="Y">
<LastName>González-Losa</LastName>
<ForeName>Maria del Refugio</ForeName>
<Initials>Mdel R</Initials>
</Author>

Those furriners really know how to mess things up.

Comment entered 2012-01-05 23:57:43 by alan

BZDATETIME::2012-01-05 23:57:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::69

Another draft of database definitions.

This one will:

NOT compile.
Does NOT incorporate Bob's latest changes.
Has stuff in it purely for test.

But I wanted to get something up to show what my current thinking
about tables is, and reflecting the latest discussions between
myself and Bob. So here it is. I'll do some reconciliation and
testing on Monday.

Comment entered 2012-01-05 23:57:43 by alan

Attachment ebmsalan.sql has been added with description: Database definitions for new CiteMS - Draft 2.2

Comment entered 2012-01-06 14:55:33 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2012-01-06 14:55:33
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::70

The attached document was drafted by Minaxi Trivedi addressing the issue of non-journal citations in pubmed. This is not a problem at the moment, but something to think about for the new CiteMS.

Comment entered 2012-01-06 14:55:33 by Boggess, Cynthia (NIH/NCI) [C]

Attachment Missing source field.htm has been added with description: regarding non-journal citations in pubmed

Comment entered 2012-01-09 14:49:45 by alan

BZDATETIME::2012-01-09 14:49:45
BZCOMMENTOR::Alan Meyer
BZCOMMENT::71

Another draft database creation script.

This one:

Will compile.
Incorporates Bob's latest changes, as of today.
Has test code but it's commented out.

I'll begin working on writing code. No doubt I will need to make changes to the database definitions, but the attached draft will, hopefully, be close to what we need.

Comment entered 2012-01-09 14:49:45 by alan

Attachment ebmsalan.sql has been added with description: Database definitions for new CiteMS - Draft 2.3

Comment entered 2012-01-09 14:52:26 by alan

BZDATETIME::2012-01-09 14:52:26
BZCOMMENTOR::Alan Meyer
BZCOMMENT::72

(In reply to comment #70)
> Created attachment 2211 [details]
> regarding non-journal citations in pubmed
>
> The attached document was drafted by Minaxi Trivedi addressing the issue of
> non-journal citations in pubmed. This is not a problem at the moment, but
> something to think about for the new CiteMS.

Cynthia or Minaxi,

Could you check the attachment? All I'm seeing is a cover letter for Minaxi's document. I'm not seeing the document itself.

Thanks.

Comment entered 2012-01-09 15:01:51 by priced

BZDATETIME::2012-01-09 15:01:51
BZCOMMENTOR::Minaxi Trivedi
BZCOMMENT::73

The correct file is attached.
Minaxi

Comment entered 2012-01-09 15:01:51 by priced

Attachment Missing Source Field.doc has been added with description: Missing source field

Comment entered 2012-01-09 15:58:17 by alan

BZDATETIME::2012-01-09 15:58:17
BZCOMMENTOR::Alan Meyer
BZCOMMENT::74

(In reply to comment #73)
> Created attachment 2213 [details]
> Missing source field
...

The check for the SO field was one I added. Perhaps that was a
mistake.

The reasons I added the check were:

1. The SO field appeared to be the last field on any journal
article (I wasn't aware of the book chapter problem at the
time.)

There had been bugs in the system in the past where it failed
to correctly identify record boundaries, causing corrupted
records to be added. I thought that by adding the SO check
(and by some other changes) that would prevent that kind of
database corruption.

I was imagining, of course, that if there were no SO field,
something had gone terribly wrong, possibly including the
start of a new record without the completion of a previous
one.

2. I saw significant code in the system that used the SO field.

It was used to create the brief citation in the display, and
to identify the full journal for the article. I seem to
recall that there were other uses too. Not realizing that it
might be legitimate to load a record without an SO field, it
seemed like a good idea to stop importing.

I think the two main issues now are:

1. What should we do about this in the existing system?

a. Modify all the searches.

b. Remove the check that I put in.

I believe that I can do this without a lot of work. It
may open the door to some errors, but whether anything
will come through that door isn't known yet.

c. Change the software to reject any record without an SO,
but not stop processing, i.e., keep processing the rest of
the records.

I don't know how hard that would be. It might be easy or
might not. I'd have to study it.

d. Do nothing.

If and when the error occurs again, have someone delete
the record by hand from the input file.

That would be the cheapest solution if this happens
infrequently and the new system comes online soon.

Whatever we do, I don't think it's worthwhile to try to adapt
the old software to display or otherwise process these
records correctly. As I understand it, we aren't sending
them to board members for review in any case.

2. What should we do in the new system?

a. Reject records that are not journal articles, e.g., by
refusing to import them and putting them on the import
report in a special category.

After each import we'll produce a report that says how
many articles were imported, how many were duplicates, how
many got a new summary topic and possibly a new review
cycle, etc.

I could add a category to indicate articles that were not
imported because they were not journal articles.

b. Allow import of non-journal articles.

We'd have to analyse how to store these in the system,
display them, and whether to put them through the usual
processing that journal articles get, or create some
special path for them.

I'd be inclined to use solution 2.a., at least unless and
until we decide that there is a use case for importing
records that are not journal articles.

Comment entered 2012-01-10 15:05:31 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2012-01-10 15:05:31
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::75

(In reply to comment #74)
> (In reply to comment #73)
> > Created attachment 2213 [details]
> > Missing source field
Minaxi and I discussed your questions and have commented below:

> 1. What should we do about this in the existing system?
> d. Do nothing.

Right now this is not a problem. So far there are not very many non-journal article citations coming up in our searches. If we get an error on import, we will remove the non-journal record from our text file and reimport.
Also who knows what else PubMed is planning on adding. They still do not have the book chapter publication type searchable yet so I am guessing things are still in the works. So watching and waiting seems like a good plan right now.

> 2. What should we do in the new system?
> a. Reject records that are not journal articles, e.g., by
> refusing to import them and putting them on the import
> report in a special category.
> After each import we'll produce a report that says how
> many articles were imported, how many were duplicates, how
> many got a new summary topic and possibly a new review
> cycle, etc.
> I could add a category to indicate articles that were not
> imported because they were not journal articles.

Until a decision is made to include these non-journal resources in our monthly literature surveillance, we shouldn't go out of our way to store them. For now let's reject them for import. Give us a count as well as a list of PMIDs. We can review and evaluate these resources and forward any that seem relevant via email. If we see an increase in the number of non-journal article resources retrieved, or if the editorial boards find these resources useful, then we can revisit this issue.

Comment entered 2012-01-10 15:07:53 by alan

BZDATETIME::2012-01-10 15:07:53
BZCOMMENTOR::Alan Meyer
BZCOMMENT::76

(In reply to comment #75)

> Until a decision is made to include these non-journal resources in our monthly
> literature surveillance, we shouldn't go out of the way to store them. For now
> let's reject them for import. Give us a count as well as a list of pmids. We can
> review and evaluate these resources and forward any that seem relevant via
> email. If we see an increase in the number of non-journal article resources
> retrieved or if the editorial boards find these resources useful then we can
> revisit this issue.

I'll put that into the design.

Comment entered 2012-01-10 15:54:38 by alan

BZDATETIME::2012-01-10 15:54:38
BZCOMMENTOR::Alan Meyer
BZCOMMENT::77

The design for the import program now proposes the following
seven possible dispositions for a record import.

Note: "test mode" means that the import doesn't update the
database. If a user selects test mode and imports a batch of
records, no records are actually put into the database. Instead,
a report is created that just says what would have been done with
them. In "live mode" the same report is produced but records
actually update the database where appropriate.

  • "new"
    Article was not in the database, would be (test mode) or was (live
    mode) added.

  • "new topic"
    Article was in the database but did not have this topic assigned.
    New topic would be (test mode), or was, assigned.

  • "duplicate"
    Article was in the database for this topic. It would not be loaded.
    It may be that the topic was de-assigned at one time but we still
    regard it as a duplicate.

  • "NOT listed"
    Article was published in a journal that is on the NOT list. Article
    would be (test mode) or was imported but automatically marked as
    rejected during initial review, with a comment in the status record
    explaining why.

  • "replacement"
    Article fetched from Pubmed is newer than what we've got. In live
    mode it would replace the record we have.

  • "non-journal"
    The imported record was for something other than a journal article.
    It might, for example, be a record for a chapter in a book. The
    record would be (test mode) or was rejected, not added anywhere.

  • "error"
    Document could not be loaded. No articleId. otherId might or might
    not be present, depending on the error. messages is an array of one
    or more strings, each containing one error. A typical example of an
    error is that we searched Pubmed for this ID but didn't find it
    (should be extremely rare.)
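
The dispositions and the test/live distinction could be represented along these lines. This is only a sketch: the enum member names, the report layout, and the function name are assumptions, not the design document's actual interface.

```python
from collections import Counter
from enum import Enum

class Disposition(Enum):
    NEW = "new"
    NEW_TOPIC = "new topic"
    DUPLICATE = "duplicate"
    NOT_LISTED = "NOT listed"
    REPLACEMENT = "replacement"
    NON_JOURNAL = "non-journal"
    ERROR = "error"

# Dispositions that change the database when running in live mode.
UPDATES_DB = {
    Disposition.NEW,
    Disposition.NEW_TOPIC,
    Disposition.NOT_LISTED,
    Disposition.REPLACEMENT,
}

def import_report(results, live):
    """Summarize (pmid, disposition) pairs; the counts are the same in
    test and live mode, only the number of records written differs."""
    counts = Counter(disp for _, disp in results)
    written = sum(counts[d] for d in UPDATES_DB) if live else 0
    return {
        "mode": "live" if live else "test",
        "counts": {d.value: counts[d] for d in Disposition},
        "records_written": written,
    }
```

Running the same batch first with live=False and then with live=True should give identical counts, which is the point of test mode.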

I notice that the XML format for book chapters (See
PMID=21882460) is very different. If we are to know what to do
with these non-journal article records we'll need to know more
about the XML Schemas or DTDs for the different kinds of records
we might encounter. I did a cursory Google search for this data
but haven't found it yet. If anyone knows where to find it,
and/or knows what else we might find besides articles and book
chapters in Pubmed search results, please post the info here.
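
Pending better documentation, here is a sketch of how the import program might separate the two kinds of record. It assumes that journal articles arrive as PubmedArticle elements and book chapters as PubmedBookArticle elements inside a PubmedArticleSet, which should be checked against the NLM DTDs before relying on it. Anything that is not a PubmedArticle is treated as non-journal. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def split_by_record_type(article_set_xml):
    """Return (journal_pmids, non_journal_pmids) from a PubmedArticleSet."""
    journal, non_journal = [], []
    for record in ET.fromstring(article_set_xml):
        # Take the first PMID found anywhere inside the record.
        pmid = record.findtext(".//PMID")
        if record.tag == "PubmedArticle":
            journal.append(pmid)
        else:
            non_journal.append(pmid)
    return journal, non_journal

# Skeleton records only; real records carry far more data.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>15447788</PMID></MedlineCitation></PubmedArticle>
  <PubmedBookArticle><BookDocument><PMID>21882460</PMID></BookDocument></PubmedBookArticle>
</PubmedArticleSet>"""

journal_pmids, non_journal_pmids = split_by_record_type(SAMPLE)
```

The second list gives both the count and the list of PMIDs requested for the import report.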

Comment entered 2012-01-19 23:01:09 by alan

BZDATETIME::2012-01-19 23:01:09
BZCOMMENTOR::Alan Meyer
BZCOMMENT::78

Two sections in our requirements outline are:

2.12 Queues (or states, or workflow).

2.13 Status information.

I'm working on this part of the system now and have come up with
the following fairly simple scheme for representing status or
states in the system. I would like very much to hear anyone's
opinion on whether this makes sense.

To begin with, let me describe some specific status states that
I'm considering.

Imported, awaiting initial review.

Rejected in initial review (end of the line.)

Passed initial review.

Rejected in NCI abstract review (end of the line.)

Passed NCI abstract review.

Full text requested, awaiting full text retrieval.

Full text retrieved, awaiting full text review (or maybe just
"Awaiting full text review" - see below.)

Rejected in NCI full text review (end of the line.)

Passed NCI full text review.

Those are examples. I'll need to know the real values to use
sometime soon, but not immediately. If the concept is right, the
software will accommodate whatever particular status states we
want.

The states are implicitly ordered in a way that I'll explain, but
first I'll describe the database tables needed to represent this
information.

Two tables are required. Conceptually, they contain the following
information:

1. Article states table:

There is one record here for each state that an article has
entered for a particular topic. Fields are:

article ID - Unique ID of the article.
topic ID - An article can be in different states for
different topics.
state ID - An ID representing one of the above status
states.
user ID - Who put the article in this state.
datetime - When this state was entered.
comment - Any free text the user wants to associate
with this article/topic/state. I'm not sure
about this one. We might want to use "tags"
instead.

2. State types table:

There is one record here for each type of state. If we need
to track some new type of state that we didn't track in the
past, we enter a new record in this table.

state ID - Unique ID of the state.
state name - For example, "Passed NCI abstract review".
description- Longer text used in help displays,
for example: "An NCI reviewer read the
citation and abstract and decided that this
article should be considered for further,
more comprehensive, review".
ordinality - A number from 1 to N, where 1 is the lowest
value and N is highest. See below.
inactive - If we ever decide to stop using a state
we'll need to keep it in the table so we can
show what happened to articles that had this
state in the past, but we'll flag it as
inactive, not to be used again.
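
A sketch of the two tables, using SQLite from Python only for illustration. The table and column names, the types, and the choice of primary key are assumptions. The article, topic, and user IDs would reference existing EBMS tables, which are left out here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE state_type (
    state_id    INTEGER PRIMARY KEY,
    state_name  TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL,
    ordinality  INTEGER NOT NULL UNIQUE,
    inactive    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE article_state (
    article_id  INTEGER NOT NULL,
    topic_id    INTEGER NOT NULL,
    state_id    INTEGER NOT NULL REFERENCES state_type (state_id),
    user_id     INTEGER NOT NULL,
    entered     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    comment     TEXT,
    -- Assumption: an article enters a given state at most once per topic.
    PRIMARY KEY (article_id, topic_id, state_id)
);
"""

def open_db():
    """Create an in-memory database with the two state tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
    conn.executescript(SCHEMA)
    return conn
```

The foreign key means an article can only be put into a state that exists in the state types table.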

The cleverly simple (or maybe simply dumb) part of this design is
the ordinality field. It is used as follows:

Each state is sequentially ordered with respect to other
states by its ordinal number. Numbers are assigned as
needed, starting with 1, the lowest article state in the
system. For example, we might assign ordinal numbers to the
states listed above as:

1. Imported, awaiting initial review.
2. Rejected in initial review (end of the line.)
3. Passed initial review.
4. Rejected in NCI abstract review (end of the line.)
5. Passed NCI abstract review.
6. Full text requested, awaiting full text retrieval.
7. Full text retrieved, awaiting full text review.
8. Rejected in NCI full text review (end of the line.)
9. Passed NCI full text review.

An article in one of these states can move forward and acquire
the next one. For example, a good article might travel through
the following states - all recorded in the status table:

1, 3, 5, 6, 7, 9.

A user might have retrieved the full text herself, in which case
the same article might have had these states (state 6 was
skipped):

1, 3, 5, 7, 9.

An article that fails initial review might only have had two
states:

1, 2

States can skip forward, for example from 1 straight to 9
(1,9) for an article that a board member brought up and wanted
discussed at the meeting, but can only go backward by erasing the
higher ordinality states.

For example, the CIAT reviewer might approve the article (state
3) and then change her mind. In that case we erase the record
that says this was in state 3 and record state 2 (rejected)
instead. It is as if it was never in state 3. We'll still have a
log of what happened so that we can go back and see the sequence,
but that would only be for debugging or maybe for reports. From
the system's point of view, the record was in states 1 and 2 and
that was the end of the line.

The "current" state is always the highest numbered one that the
article has reached.
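
Here is a small sketch of these rules, treating the history for one article and topic as a plain list of ordinal numbers. The function names are hypothetical and the real code would work against the status table rather than a list.

```python
def current_state(history):
    """The current state is the highest ordinality reached so far."""
    return max(history)

def advance(history, new_state):
    """Record a move forward; skipping ahead is allowed, going back is not."""
    if history and new_state <= max(history):
        raise ValueError("use revert_to() to go backward")
    return history + [new_state]

def revert_to(history, state):
    """Go backward by erasing every state above the given one."""
    return [s for s in history if s <= state]

# The reviewer passes the article (3), changes her mind, then rejects it (2).
history = advance([1], 3)
history = revert_to(history, 1)
history = advance(history, 2)
final_state = current_state(history)
```

After the sequence above the history is [1, 2], so the article ends in state 2 as if state 3 had never been recorded.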

If we ever need a new state, we insert it in the ordinal sequence
where we want it. For example, we might have a state called
"Rejected via NOT list". We might insert that state between
state 1 (Imported) and 2 (Rejected in initial review.) To
accommodate that, we renumber the states, move 2 to 3, 3 to 4,
etc., and create a new number 2.
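
The renumbering can be sketched as follows, with the state types held in a simple name-to-ordinality map. The function name is hypothetical.

```python
def insert_state(states, name, ordinality):
    """Return a new {name: ordinality} map with `name` inserted at
    `ordinality`; every state at or above that position moves up by one."""
    shifted = {
        n: (o + 1 if o >= ordinality else o) for n, o in states.items()
    }
    shifted[name] = ordinality
    return shifted

states = {
    "Imported, awaiting initial review": 1,
    "Rejected in initial review": 2,
    "Passed initial review": 3,
}
renumbered = insert_state(states, "Rejected via NOT list", 2)
```

Because the article states table stores the state ID and not the ordinality, a renumbering like this only touches the state types table. The existing article state records are unaffected.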

There are some issues to consider with this scheme. Three that I
can think of are:

1. Are there some states that are outside the ordinality
ordering?

I think the answer is no. If we want to attach information
to an article that doesn't have to do with processing and
work flow, we can use the "tag" mechanism. See 2.14
"Descriptive Tagging" in the requirements outline.

2. Is it ever legal to go backward without erasing states?

A conceivable use case is, I'm looking at a paper journal.
I pass the article without ever having retrieved electronic
full text. Later, I retrieve the PDF and put it in the
system.

One approach is to not go back for that. We put the full
text in the system but don't ever use the "full text
retrieved" state. Another is to make an exception for that
state. Perhaps the best is to not have that state at all,
instead having a state "Awaiting full text review". If a
user has requested full text, Bonnie, or whoever, gets it
and sets the new state to "Awaiting full text review".
Entry of the PDF into the system is not a "state" for
workflow purposes.

A smart way to handle this is to have the program that
stores the PDF check the status table. If the highest
status value is "Awaiting full text retrieval", the program
automatically enters "Awaiting full text review". If the
highest state is already past that, it leaves it alone.

3. Are there some states with equal ordinality?

Again, I think the answer is no. Work proceeds in a
sequence. Items in the sequence can be skipped, but an
article doesn't go into two states at the same time. If
that's not so, let's come up with the use cases for it.
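
The check described under issue 2, for the program that stores the PDF, might look like this. The state names are the examples used in this comment, and the behavior for an article that has not yet reached "Awaiting full text retrieval" (leave it alone) is an assumption, since only the other two cases are spelled out above.

```python
AWAITING_RETRIEVAL = "Awaiting full text retrieval"
AWAITING_REVIEW = "Awaiting full text review"

def state_after_pdf_stored(current):
    """Return the state an article should have once its PDF is stored.

    `current` is the name of the article's highest state so far.
    """
    if current == AWAITING_RETRIEVAL:
        # Someone was waiting for the full text: move the article along.
        return AWAITING_REVIEW
    # Already past that point (or, by assumption, not there yet):
    # storing the PDF is not a workflow event.
    return current
```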

This is where I'm heading as of this moment. Someone please
stick out a foot and trip me if I'm heading off a cliff.

Comment entered 2012-01-19 23:12:08 by alan

BZDATETIME::2012-01-19 23:12:08
BZCOMMENTOR::Alan Meyer
BZCOMMENT::79

(In reply to comment #78)
...
> I'm working on this part of the system now and have come up with
> the following fairly simple scheme for representing status or
> states in the system.
...

In the discussion in comment #78 I didn't talk about board member
packets or responses in relation to article states.

We already have a model for board member responses that has been
prototyped, reviewed, and approved. It is a process that is more
complex than the simple states and state ordering that I've
described. I presume that we want to leave that alone and not
try to integrate it into this scheme or reflect it in these state
tables.

Comment entered 2012-01-19 23:38:14 by alan

BZDATETIME::2012-01-19 23:38:14
BZCOMMENTOR::Alan Meyer
BZCOMMENT::80

(In reply to comment #78)

We might need to add "review cycle" to the fields for an article
state.

If it's legal for the staff to review an article in, say, December
of 2012, reject it for a topic at some point, then reconsider it
again in March, 2013 for the same topic, we have to consider how
to represent that.

We might overwrite the old status, or we might keep it but attach
a review cycle to each of the status records so as to keep the
history of each of the two review processes. If we overwrite, I
still plan to keep a log that we can use for debugging and/or some
reports, so I don't expect to lose important information.

Comment entered 2012-01-20 13:18:24 by priced

BZDATETIME::2012-01-20 13:18:24
BZCOMMENTOR::Minaxi Trivedi
BZCOMMENT::81

(In reply to comment #78)
> Two sections in our requirements outline are:
>
> 2.12 Queues (or states, or workflow).
>
> 2.13 Status information.
>
> I'm working on this part of the system now and have come up with
> the following fairly simple scheme for representing status or
> states in the system. I would like very much to hear anyone's
> opinion on whether this makes sense.
>
In my opinion it makes perfect sense. Minaxi

Comment entered 2012-01-23 10:17:11 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2012-01-23 10:17:11
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::82

(In reply to comment #80)
The only way a citation gets another review cycle assigned is if it enters the CiteMS with a new summary topic, which would start a new sequence of states for that topic.
An overwrite would only be needed in the following situation:
A user searches the CiteMS and retrieves several citations that were not passed for further review for a specific topic, and now wants to review them again for that same topic. The old status would need to be overwritten to kick the citation back into the review process for that topic. How far along the citation was in the original review process would determine which status gets overwritten. The review cycle should not change.

Comment entered 2012-01-31 17:29:27 by alan

BZDATETIME::2012-01-31 17:29:27
BZCOMMENTOR::Alan Meyer
BZCOMMENT::83

I've done more research on author names in Pubmed. There are
some pieces of information regarding authors that will be in the
XML data that we download from Pubmed. They can be displayed if
we want to display them, but I am not planning to consider them
in constructing author indexes. These are:

Suffix:

Values I observed were:
Jr
Sr
2nd
3rd
4th
5th

In a sample of 51,845 records with 278,537 authors,
Suffixes occurred 1,154 times.

I think we should ignore these for searching on the
theory that they may be more confusing than helpful, just
as non-English characters may be.

ValidYN:

This is a required author attribute. It almost always
has the value 'Y'. If NLM happens to know that the
author's name is spelled wrong in the article, they
publish the author's name as it appears in the article
but set ValidYN='N' to indicate that they know this is an
error.

There were 82 ValidYN='N' in the above sample of 278,537
authors. Sometimes the correct spelling appears in
another author field preceding the invalid name, for
example:

Valid: Marco Zappa.
Invalid: Zappa Marco.

But sometimes the valid name does not appear to be in the
record at all.

I plan to ignore the validity indicator because I can't
think of anything useful to do with it. The only times
we know the valid name are when it also appears in the
data, in which case both the valid and invalid forms will
be indexed.

Even if we have a valid name and know which invalid name
it should replace, I think it would be odd not to index
the name that the publisher used, valid or not.

NameID:

This element isn't used yet but is intended to be used in
the future. If present, it will indicate that the name
has a unique ID in some NLM recognized authority file.
A hypothetical example in the NLM documentation is:

<NameID Source='NCBI'>123456</NameID>

Until we know more about this, and until all or almost
all Pubmed author names have these, I don't see them
helping us in searching. So I plan to ignore this if and
when it appears in data.

The information about these came from scanning a sample of data,
and from this document:

http://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html

I don't think there's any need for anyone to comment on what I've
written unless you think I'm making a mistake and we should index
some of this data.
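
To make the plan above concrete, here is a minimal Python sketch of how author index terms might be built from Pubmed Author elements while ignoring Suffix, ValidYN, and NameID. The function names, the two index forms (last name plus forename, last name plus initials), the lowercasing, and the sample records are all assumptions for illustration, not a description of the actual indexing code.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, shaped like a Pubmed AuthorList.
SAMPLE = """
<AuthorList>
  <Author ValidYN="Y">
    <LastName>Zappa</LastName><ForeName>Marco</ForeName><Initials>M</Initials>
  </Author>
  <Author ValidYN="N">
    <LastName>Marco</LastName><ForeName>Zappa</ForeName><Initials>Z</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Smith</LastName><ForeName>John</ForeName><Initials>J</Initials>
    <Suffix>Jr</Suffix>
  </Author>
</AuthorList>
"""

def author_index_terms(author):
    """Return index terms for one Author element.

    Suffix, ValidYN and NameID are deliberately not consulted, so valid
    and invalid spellings are indexed in exactly the same way.
    """
    last = (author.findtext("LastName") or "").strip()
    fore = (author.findtext("ForeName") or "").strip()
    initials = (author.findtext("Initials") or "").strip()
    terms = []
    if last:
        if fore:
            terms.append(f"{last} {fore}".lower())
        if initials:
            terms.append(f"{last} {initials}".lower())
        if not terms:
            terms.append(last.lower())
    return terms

def index_authors(xml_text):
    """Collect index terms for every Author element in the document."""
    root = ET.fromstring(xml_text)
    return [t for a in root.iter("Author") for t in author_index_terms(a)]

terms = index_authors(SAMPLE)
```

With this approach both "zappa marco" and "marco zappa" end up in the index, and the "Jr" suffix never does.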

Comment entered 2012-02-10 13:35:20 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-02-10 13:35:20
BZCOMMENTOR::Bob Kline
BZCOMMENT::84

I'm working on converting the legacy data for the new EBMS, and I've got a question about the topic assignment. Most of the topics originally assigned when an article is imported are recorded in the asc_ORIGINAL_SUMMARIES table, as well as in the asc_CITE_SUMMARIES table (that's 333,805 original topic assignments). A smaller number (14,771) don't show up in the asc_CITE_SUMMARIES table (for example, PMID 21407093, Ref_ID 246801, imported 2011-09-06). Is there any significance in this? I'm wondering whether I should merge the information from the two tables, or just use the asc_CITE_SUMMARIES table, on the assumption that the ones that didn't make it into that table weren't really viewed as having been assigned a topic.

Comment entered 2012-02-10 14:04:43 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2012-02-10 14:04:43
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::85

(In reply to comment #84)
Summary topics can be added after the original import during the initial review process as well as after publication for client review. My guess is that this table is for citations that have had summary topics added after import. Can you provide a few more examples so that we can confirm this?

Comment entered 2012-02-10 14:42:12 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-02-10 14:42:12
BZCOMMENTOR::Bob Kline
BZCOMMENT::86

(In reply to comment #85)
> (In reply to comment #84)
> Summary topics can be added after the original import during the initial review
> process as well as after publication for client review. My guess is that this
> table is for citations that have had summary topics added after import. Can you
> provide a few more examples so that we can confirm this?

When I first looked at the asc_CITE_SUMMARIES table, I assumed its purpose was as you describe it above: namely, to assign another topic to an article, in addition to the one assigned when the article was imported. This assumption was based, in part, on the name of the "date_added" column, which would seem to imply that the topic is being added to the one originally assigned, rather than copying the information already in the asc_ORIGINAL_SUMMARIES table. But the number I gave in comment #84 (333,805) was for the original topic assignments made when the article was imported, for which the assignment of that original topic was also copied into the asc_CITE_SUMMARIES table. There are, indeed, rows in the asc_CITE_SUMMARIES table which represent assignment of another topic, different from the one originally assigned to the article (approximately 40,000 such rows). But most of the rows in the asc_CITE_SUMMARIES table are for assignment of topics already recorded in the asc_ORIGINAL_SUMMARIES table. What I'm looking for here is the reason why not all of the original topic assignments got copied into the asc_CITE_SUMMARIES table (about 4% of the original assignments didn't get copied). Is there something about those 4% of assignments which caused them to be somehow revoked, in which case I wouldn't want to bother carrying them over into the EBMS, but would instead just use the assignments which are represented in the asc_CITE_SUMMARIES table?
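
For anyone who wants to reproduce the count, the check I'm describing amounts to an anti-join between the two tables. The sketch below runs against an in-memory SQLite database purely so it is self-contained (the legacy data lives in SQL Server), and the column layout (ref_id, summary_id, date_added) and the rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asc_ORIGINAL_SUMMARIES (ref_id INTEGER, summary_id INTEGER);
    CREATE TABLE asc_CITE_SUMMARIES (ref_id INTEGER, summary_id INTEGER,
                                     date_added TEXT);
    -- Made-up rows: the original topic of article 1 was copied over,
    -- the original topic of article 2 was not.
    INSERT INTO asc_ORIGINAL_SUMMARIES VALUES (1, 10), (2, 20);
    INSERT INTO asc_CITE_SUMMARIES VALUES (1, 10, '2011-08-05'),
                                          (2, 21, '2011-09-16');
""")

# Original assignments with no matching row in the cite table.
MISSING = """
    SELECT o.ref_id, o.summary_id
      FROM asc_ORIGINAL_SUMMARIES o
      LEFT JOIN asc_CITE_SUMMARIES c
        ON c.ref_id = o.ref_id AND c.summary_id = o.summary_id
     WHERE c.ref_id IS NULL
     ORDER BY o.ref_id
"""
missing = conn.execute(MISSING).fetchall()
```

Here `missing` holds only the article 2 assignment, which is the kind of row that makes up the 4%.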

Comment entered 2012-02-10 15:56:13 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2012-02-10 15:56:13
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::87

(In reply to comment #86)
It would be really helpful to see more examples of this 4%. Minaxi and I keep all of our text files used to import citations and may be able to figure this out.
Another guess would be that these citations were part of an import error...if the import failed and these citations had to be added again or altered in some way. If we can see more examples we can compare the dates to see if they match to previous errors.

Comment entered 2012-02-10 16:00:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-02-10 16:00:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::88

You asked for additional examples of what I am describing (in fact, when I first tried posting this comment, I ran into a conflict with your repeated request for more examples; sorry it took so long). Here are the original assignments for a handful of articles (15 of them):

246801 21407093 2011-09-06 Prevention of Bladder Cancer* September 2011
246617 19564173 2011-09-06 Prevention of Bladder Cancer* September 2011
245241 21458152 2011-08-31 Prevention of Bladder Cancer* September 2011
242906 19968734 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
242905 21140451 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
242904 21276407 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
242903 21540861 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
242902 21719393 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
242872 21447720 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
242398 20622646 2011-08-05 Gastrointestinal Carcinoid Tumor August 2011
241859 21743467 2011-07-27 Genetics of Skin Cancer June 2011
236046 20229076 2011-05-04 Pediatric Supportive Care May 2011
234540 20619924 2011-04-11 Prevention of Lung Cancer March 2011
234539 21168236 2011-04-11 Prevention of Lung Cancer March 2011
234538 21169285 2011-04-11 Prevention of Lung Cancer March 2011

The first column is the Ref_ID (the legacy ID for the article), followed by the Pubmed ID, the date the article was imported and the original topic assigned, the topic name, and the cycle. The asc_CITE_SUMMARIES table has none of these topic->article assignments, but instead has these assignments for those articles:

246801 21407093 2011-09-16 Screening for Bladder and Other Urothelial Cancers September 2011
246801 21407093 2011-09-16 Cancer Screening Overview September 2011
246617 19564173 2011-09-28 Screening for Bladder and Other Urothelial Cancers September 2011
246617 19564173 2011-09-28 Cancer Screening Overview September 2011
246617 19564173 2011-09-28 Cancer Prevention Overview September 2011
245241 21458152 2011-09-16 Bladder Cancer September 2011
245241 21458152 2011-09-16 Screening for Bladder and Other Urothelial Cancers September 2011
245241 21458152 2011-09-16 Cancer Screening Overview September 2011
242906 19968734 2011-08-05 Gastrointestinal Stromal Tumors August 2011
242905 21140451 2011-08-05 Gastrointestinal Stromal Tumors August 2011
242904 21276407 2011-08-05 Gastrointestinal Stromal Tumors August 2011
242903 21540861 2011-08-05 Gastrointestinal Stromal Tumors August 2011
242902 21719393 2011-08-05 Gastrointestinal Stromal Tumors August 2011
242872 21447720 2011-08-05 Gastrointestinal Stromal Tumors August 2011
242872 21447720 2011-08-05 Gastric Cancer August 2011
242398 20622646 2011-08-05 Anal Cancer August 2011
242398 20622646 2011-08-05 Gastrointestinal Stromal Tumors August 2011
241859 21743467 2011-11-02 Prostate Cancer November 2011
241859 21743467 2011-07-27 Genetics of Prostate Cancer July 2011
236046 20229076 2011-05-23 Late Effects of Treatment for Childhood Cancer May 2011
236046 20229076 2011-05-23 General Supportive Care* May 2011
234540 20619924 2011-11-04 Cancer Screening Overview November 2011
234540 20619924 2011-11-04 Screening for Lung Cancer March 2011
234539 21168236 2011-11-04 Cancer Screening Overview November 2011
234539 21168236 2011-11-04 Screening for Lung Cancer March 2011
234538 21169285 2011-04-11 Screening for Lung Cancer March 2011

Does this help?

Comment entered 2012-02-10 17:38:21 by Boggess, Cynthia (NIH/NCI) [C]

BZDATETIME::2012-02-10 17:38:21
BZCOMMENTOR::Cynthia Boggess
BZCOMMENT::89

Thanks for the examples but unfortunately we cannot determine how these citations are unique. They all have topics that were added after import but there are so many more such citations in the system that have had topics added as well. Why these particular 4% are in their own table is truly a mystery. These citations have different topics in each table. We need to keep all the topics so you will need to merge the data from the two tables.

Comment entered 2012-02-10 17:43:42 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-02-10 17:43:42
BZCOMMENTOR::Bob Kline
BZCOMMENT::90

(In reply to comment #89)
> Thanks for the examples but unfortunately we cannot determine how these
> citations are unique. They all have topics that were added after import but
> there are so many more such citations in the system that have had topics added
> as well. Why these particular 4% are in their own table is truly a mystery.
> These citations have different topics in each table. We need to keep all the
> topics so you will need to merge the data from the two tables.

OK. I had a theory that in these cases it had been determined that the originally assigned topics weren't the best ones after all, and were being replaced by more appropriate topic assignments, but I gather you've determined that the original assignments were correct even for the ones where the assignments weren't copied over into the larger table. Thanks for investigating.

Comment entered 2012-02-16 16:11:58 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-02-16 16:11:58
BZCOMMENTOR::Bob Kline
BZCOMMENT::91

I don't see any mention of the fact in this issue's comments or in the latest requirements document (possibly because I'm not reading them carefully enough), so I'll make a note here of the fact that the earlier notion of having a table to associate tags (and optional comments) with article-topic combinations has been replaced by the article state table. So in the conversion I'll be storing the contents of the notes column of the legacy bib table (except for those which are just "ok" – see comment #61) in a row in the ebms_article_state table. The columns for that row are:

article_id: tells us which article we're talking about
topic_id: derived from the asc_ORIGINAL_SUMMARIES table
state_id: key to value "Imported"
user_id: derived from bib.user_id and the converted mt_USERS table
status_dt: derived from bib.Date_input
comments: from bib.notes

Where there was more than one topic assigned to an article at import (that is, the asc_ORIGINAL_SUMMARIES table has more than one row for that article's ref_id) I will create multiple rows in the ebms_article_state table, one for each topic. Each of the rows will get the same value from bib.notes in the comments column.

There are quite a few articles in the bib table whose ref_id does not appear at all in the asc_ORIGINAL_SUMMARIES table. I suggest that the solution to this problem would be to change the topic_id column in the ebms_article_state table to allow NULL values (a better solution than losing the value of the bib.notes column).
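
A rough Python sketch of the row-building logic described above, for discussion. It is not the conversion code itself: the names StateRow, clean_note, and imported_state_rows are hypothetical, the user_id and status_dt columns are left out for brevity, and treating "ok" case-insensitively and still creating the row when the note is dropped are both assumptions.

```python
from typing import List, NamedTuple, Optional

class StateRow(NamedTuple):
    article_id: int
    topic_id: Optional[int]    # None stands for SQL NULL
    state: str
    comments: Optional[str]

def clean_note(note: Optional[str]) -> Optional[str]:
    """Drop empty notes and notes that are just "ok"."""
    if note is None or note.strip().lower() in ("", "ok"):
        return None
    return note

def imported_state_rows(article_id, note, topic_ids) -> List[StateRow]:
    """One "Imported" row per original topic, each with the same comment.

    If the article has no rows in asc_ORIGINAL_SUMMARIES, a single row
    with a NULL topic keeps the note from being lost.
    """
    comments = clean_note(note)
    topics = list(topic_ids) or [None]
    return [StateRow(article_id, t, "Imported", comments) for t in topics]

two_topics = imported_state_rows(1, "check dosage table", [10, 11])
no_topic = imported_state_rows(2, "see erratum", [])
```

The second call only works if topic_id is allowed to be NULL, which is the schema change suggested above.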

Comment entered 2012-03-07 17:52:23 by alan

BZDATETIME::2012-03-07 17:52:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::92

(In reply to comment #84)
> I'm working on converting the legacy data for the new EBMS, and
> I've got a question about the topic assignment. Most of the
> topics originally assigned when an article is imported are
> recorded in the asc_ORIGINAL_SUMMARIES table, as well as in the
> asc_CITE_SUMMARIES table (that's 333,805 original topic
> assignments). A smaller number (14,771) don't show up in the
> asc_CITE_SUMMARIES table (for example, PMID 21407093, Ref_ID
> 246801, imported 2011-09-06). Is there any significance in
> this? ...

This is all very confusing.

I wondered if I could find the answer to Bob's question by
searching the code. I looked to see where the inserts were
performed and whether any code deleted data from one table but
not the other. I looked at the import code, the stored
procedures, and the ASP code. I also did some experiments. At
the end of the process, I am more confused than ever.

I didn't find any deletes for either table. All of the inserts I
found were in the import utility - which surprised me. Maybe I
missed one by searching for the wrong patterns, so I tested the
actual system.

I logged into the test database and added a "new path" to record
246801. I added board=CAM, Reviewer=CAM Reviewer, and
Summary=Acupuncture. This did not result in any new summary_id
in either the "cite" or "original" summaries table, and in fact,
did not show the summary name in the history display for the
record. I then added a "decision" for the record, selecting
Acupuncture as the topic. This too did not add a record to
either of our tables, though Acupuncture does appear in the
decision section (but not the Summary Topics line) in the history
display.

So now I'm not sure that even asc_cite_summaries contains all of
the summary topics assigned to an article. They seem to be
recorded somewhere else when adding a new topic.

The code used in the original import is also a little suspect.

There are two places in the code that update the two tables of
interest. In each case the code does the following:

Search to see if there is already an entry in the
cite_summaries table. If not, add one.

If the function returns True, meaning that a summary was
successfully added, then check the original_summaries table
and, if it isn't found, try to insert it there.

But there is a possible problem with the code. The function
returns True if it successfully inserts a row in the cite
table and False if the insertion fails. However if the function
finds that there is already a row for this in the cite table, it
doesn't attempt to insert one - which is fine, but then it still
returns a boolean value, one that was initialized by Visual
Basic, not by the programmer, to False. When False is returned,
the calling program treats this as an error, as if an insertion
were attempted and failed, and doesn't do anything with the
original_summaries table.

This might still be what the programmer intended. Perhaps the
programmer believed that it was reasonable to assume that if
there was a row in one table the row would already be in the
other (though we now know that isn't the case and, even if it
were, why should this invoke an error routine?). However, I'd be
more comfortable with it if there were a comment in the code
explaining what was going on.
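
To illustrate the control flow I'm describing, here is a Python rendering of it. This is not the Visual Basic code from the import utility; the function names and the use of in-memory sets in place of the two tables are assumptions. The first pair of functions mimics the suspect behavior, and the second shows one way the three outcomes could be told apart.

```python
from enum import Enum

def add_cite_summary_flawed(cite, key):
    """Returns False both when an insert fails and when the row exists."""
    added = False                 # stands in for VB's default Boolean value
    if key not in cite:
        cite.add(key)
        added = True
    return added

def import_topic_flawed(cite, original, key):
    if add_cite_summary_flawed(cite, key):
        original.add(key)         # only reached after a fresh cite insert
        return True
    return False                  # the caller treats this as an error

class Outcome(Enum):
    INSERTED = 1
    ALREADY_PRESENT = 2
    FAILED = 3    # a database insert could fail; a set insert cannot

def add_if_missing(table, key):
    if key in table:
        return Outcome.ALREADY_PRESENT
    table.add(key)
    return Outcome.INSERTED

def import_topic_explicit(cite, original, key):
    """Check each table on its own; an existing row is not an error."""
    results = (add_if_missing(cite, key), add_if_missing(original, key))
    return Outcome.FAILED not in results

cite, original = {(246801, 10)}, set()
flawed_ok = import_topic_flawed(cite, original, (246801, 10))
original_after_flawed = set(original)
explicit_ok = import_topic_explicit(cite, original, (246801, 10))
```

In the flawed version a row that is already in the cite table never reaches the original table, which is at least one way the two tables could drift apart.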

The only conclusion I came to was that a simple reading of the
code will not answer Bob's question. A hard slog through it will
be required if we really want to know what's happening.

Comment entered 2012-03-07 21:55:49 by alan

BZDATETIME::2012-03-07 21:55:49
BZCOMMENTOR::Alan Meyer
BZCOMMENT::93

(In reply to comment #91)
...
> I'll make a note here of the fact that the earlier notion of
> having a table to associate tags (and optional comments) with
> article-topic combinations has been replaced by the article
> state table.
...

This needs further discussion, but I'm thinking that my earlier
plan to combine status and tags isn't the best approach. I still
haven't caught up on all of the Bugzilla comments so I hesitate
to be more specific. Perhaps we can discuss the technical issues
tomorrow.

Comment entered 2012-03-08 09:59:43 by alan

BZDATETIME::2012-03-08 09:59:43
BZCOMMENTOR::Alan Meyer
BZCOMMENT::94

(In reply to comment #93)

> ...
> This needs further discussion, but I'm thinking that my earlier
> plan to combine status and tags isn't the best approach.
> ...

However, Bob's plan to store the "notes" from the old system's bib
table in the comment field for the "Imported" status in the state
table still seems to me to be a good one. We might prepend an
indicator like "Old CMS note: " to the actual notes that are
copied over.

Comment entered 2012-03-08 21:59:46 by alan

BZDATETIME::2012-03-08 21:59:46
BZCOMMENTOR::Alan Meyer
BZCOMMENT::95

(In reply to comment #92)
...
> I logged into the test database and added a "new path" to record
> 246801. I added board=CAM, Reviewer=CAM Reviewer, and
> Summary=Acupuncture. This did not result in any new summary_id
> in either the "cite" or "original" summaries table, and in fact,
> did not show the summary name in the history display for the
> record. I then added a "decision" for the record, selecting
> Acupuncture as the topic. This too did not add a record to
> either of our tables, though Acupuncture does appear in the
> decision section (but not the Summary Topics line) in the history
> display.
>
> So now I'm not sure that even asc_cite_summaries contains all of
> the summary topics assigned to an article. They seem to be
> recorded somewhere else when adding a new topic.
...

I found where the summary topic ID was recorded. It appears in
the mt_decision_history table, but not the asc_cite_summaries or
asc_original_summaries tables.

That would seem to be a risky strategy on the part of the
original developers. There are at least three places to look to
find article <-> topic associations. I don't know whether or how
they tried to guarantee that every program needing to know a
summary topic ID for an article looked in all three places.

The implications for our conversion code are that we probably
have to look at the mt_decision_history table to find some of the
article <-> summary topic associations during conversion.

Searching through the SQL Server generated SQL code to create the
database, I didn't find any other tables that associated an
article with something called a "summary_id". So I'm hopeful
that it's just those three places, and not four or five.
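
As a sketch of what the conversion would need to do, the Python below gathers the topics for one article from all three places and remembers where each one was found. The function name, the (ref_id, summary_id) pair layout, and the sample values are assumptions for illustration only.

```python
def topic_sources(ref_id, sources):
    """Map each summary_id found for ref_id to the set of tables naming it.

    `sources` maps a table name to an iterable of (ref_id, summary_id)
    pairs pulled from that table.
    """
    found = {}
    for table_name, pairs in sources.items():
        for rid, summary_id in pairs:
            if rid == ref_id:
                found.setdefault(summary_id, set()).add(table_name)
    return found

# Made-up data: topic 30 is recorded only in the decision history.
sources = {
    "asc_original_summaries": [(246801, 10)],
    "asc_cite_summaries": [(246801, 10), (246801, 20), (999, 40)],
    "mt_decision_history": [(246801, 30)],
}
topics = topic_sources(246801, sources)
```

A topic that shows up in only one of the three tables, like 30 here, is exactly the case a two-table merge would miss.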

Comment entered 2012-03-09 00:21:41 by alan

BZDATETIME::2012-03-09 00:21:41
BZCOMMENTOR::Alan Meyer
BZCOMMENT::96

Bob and I discussed status management this afternoon. We came to
a number of tentative decisions intended to make the conversion
easier and to support capabilities that we didn't fully realize
were required until the conversion was attempted.

Here are our tentative decisions:

1. We'll assign a unique automatically generated row ID primary
key to each row in the ebms_article_state table.

This will give us the flexibility to have rows in the table
that don't have topic IDs - which were otherwise required as
parts of the primary keys to uniquely identify an article
state.

We may or may not want to add new rows to the state table
that have no topic IDs, but it appears that we will have some
articles and states in the converted data for which we cannot
determine a topic ID.

2. Make topic ID an optional column in the state table.

In line with decision 1 above, it has to be optional if we
have to convert old data that has no determinable topic ID.

There may also be states for which a topic is irrelevant even
in the new system. If an article is sent out FYI to people
who are not reviewers of the topic assigned to the article,
why do we want a topic ID? If full text cannot be found for
an article, what does that have to do with a topic?

If we do this, we'll have to consider what to do when a user
requests information about an article with respect to a
particular board or topic. It will probably make sense to
show all states for that board or topic and also all states
for which no board or topic IDs were recorded.

3. Add a column for editorial board ID in the state table.

Where a topic is known, the editorial board will be derived
from that. Where a topic is not known but an editorial board
ID is known, we'll record that and make the topic null.

Some states, such as "Not listed" and "FYI", may be board but
not topic specific.

Board ID will also be optional. There may be some converted
records for which we cannot determine either a board or topic -
though we might arguably not want to convert those.

If a topic is ever moved from one editorial board to another,
for example if a board is split into two boards, recording
both the board and topic makes it easy to see which board
assigned a particular state to a topic (though it might also
be possible to do so by dates using more cumbersome
techniques.)

4. Control board and topic requirements by state type.

We'll add columns to the ebms_article_state_type table to
indicate whether a board ID or a topic ID, both, or neither
is required for this state type and enforce the requirements
in application software rather than using DBMS integrity
constraints.

We expect most states to require both a board and a topic ID.

5. The new states identified by Robin will be added to the state
type table.

The only one that we discussed at length was the state
related to FYI assignments. Using the state table to record
FYI assignments has some limitations.

We've tentatively decided that we should go ahead and make a
state type for this, but other possibilities should be
discussed such as using "tags" or using a separate mechanism
for FYI - which adds more complexity to the system. Some
limitations we thought of are:

a. It does not support electronic distribution of
articles to specific reviewers by a program.

To use software to do this we'd have to have a place to
record board member IDs in the state table - something
not required for any other state (it would require
another table) and requiring extra software as well as
data.

My understanding is that Robin has no problem with that.
Distribution will be initiated by informal, manual means
and comments will be used to record who receives the
articles - as is done in the old system.

b. It is not clear what the "sequence" number for
this state should be.

It would seem to make sense to send an FYI anywhere in
the sequence of processing steps. No other steps depend
on it and it seems odd to say that the current state of
an article is that an FYI was sent out.

See below for more about sequence numbers.

Bob and I discussed the above issues enough to reach the above
tentative decisions for further discussion with users.
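
To show what enforcing decision 4 in application software might look like, here is a minimal Python sketch. The class and field names (StateType, ArticleState, board_required, topic_required) and the flag settings for the example state types are assumptions for discussion, not settled design.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class StateType:
    name: str
    board_required: bool = True    # most states are expected to need both
    topic_required: bool = True

@dataclass
class ArticleState:
    article_id: int
    state_type: StateType
    board_id: Optional[int] = None
    topic_id: Optional[int] = None

def validate_state(state: ArticleState) -> List[str]:
    """Return a list of problems; an empty list means the row is acceptable."""
    problems = []
    kind = state.state_type
    if kind.board_required and state.board_id is None:
        problems.append(f"{kind.name}: board ID is required")
    if kind.topic_required and state.topic_id is None:
        problems.append(f"{kind.name}: topic ID is required")
    return problems

# Hypothetical settings: FYI is board specific but not topic specific.
FYI = StateType("FYI", board_required=True, topic_required=False)
PASSED = StateType("Passed initial review")

fyi_problems = validate_state(ArticleState(1, FYI, board_id=3))
passed_problems = validate_state(ArticleState(1, PASSED, board_id=3))
```

Because the check lives in software, converted legacy rows could bypass it while new rows are held to the stricter rules.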

Some other status/state related topics that we identified but
didn't discuss enough to agree on tentative decisions are listed
below. These need further discussion:

6. Should there always be a sequence number?

My original thinking on this (see comment #78) was that all
states require a sequence number. But FYI doesn't fit this
model well and there may be other types of things that we
want to record in the state table that aren't processing
sequence specific. Comment #78 said we should treat these as
tags, not states.

We could make sequence number null (or 0) for some state
types and treat these as states. If we ask the question,
"What is the current state of this article for this topic?",
we'd ignore any with a null (or 0) sequence number.

Or we could use tags, or something else. It requires
discussion.

7. Should we ever erase state information?

My original proposal in comment #78 was that if an article
reverted to an earlier state, later state information (i.e.,
states with higher sequence numbers for that article/topic)
would be erased.

I'm now thinking that's too radical. A better approach would
be to add an "active_status" column to each state table row.
If a step back is taken, e.g., a decision that an article
passed a review step is reversed, it's better to just
inactivate and hide the fact that it once passed review.
We might have a user interface button that appears on a
history screen if and only if there is hidden, inactive state
information. If the user presses the button, we display all
of the history, including the parts that were superseded.

One reason I now prefer this approach is that someone can
look at the history of an article and think, "Wait a minute,
I thought I approved that article." Keeping the inactive
history makes it easy to figure out what happened and may
also make it easier to recover from mistakes, by
recreating an inactive state we can see in the history.

We probably don't want ever to just re-activate an inactive
state row. If we did that it would confuse the history - the
same row would represent two different dates. It would be
better in the above case to record:

Article passed review.
Article rejected in review.
    Passed review state automatically inactivated.
Article passed review.
    Superseding (or automatically inactivating, see 8.
    below) rejected state.

8. Are there states that should have identical sequence numbers?

In comment #78 I said No. I couldn't think of a use case.
But I now have one.

For every state that involves a pass/fail decision, it would
seem to make sense to make the passed state and the failed
state equal in sequence number. The reason is that either
decision should supersede and inactivate any previous decision
of the same type.

If an article fails initial review and then upon later
reconsideration it passes, we should inactivate the failed
state.

But if it passes initial review and fails on reconsideration,
then we should inactivate the passed state.

An easy way to do this is to give passed and failed the same
sequence number and say that the assignment of any state
inactivates any equal or higher sequenced state - unless it's
a state like FYI that doesn't participate in sequence
ordering.

9. How should we handle full text requested and retrieved or
not?

Bob raised this question regarding the "retrieved - no"
state. Cynthia explained some of the significance of the
issue. We don't want to leave the article in the full text
requested state forever if full text can't be retrieved -
having someone try and try to retrieve it forever.

I addressed this obliquely in comment #78 - discussing a
different aspect of the issue.

I thought then that this shouldn't be treated as a state but
I can see, as Cynthia pointed out, that it plays an
important role in workflow - which sounds a lot like a state
to me.

There are conversion issues here too concerning full text
retrieval in general. Many articles processed in the past
did not involve retrieval of PDF forms of the documents. I
can imagine cases where we might want to request PDFs for
some of them. But we certainly don't want to revert them to
the "Full text requested" or its retrieved yes/no states,
which is what would happen if we inserted state rows for
that.

We need to discuss it at greater length. The solution may
just be to retrieve such PDFs outside the state management
system. Full text was once retrieved. If it's retrieved
again, fine, but no change is needed to the state of the
article.

Knowing as we do that doing something right the first time is
vastly faster and cheaper than doing it wrong and re-doing it,
we'll want to get as many of these details right as we can.
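
As a discussion aid for items 6 through 8, here is a small Python sketch of the inactivation rule: assigning a state inactivates any active state with an equal or higher sequence number, rows are never re-activated, and states with no sequence number (such as FYI) neither inactivate anything nor get inactivated. The names, the use of None for "no sequence number", and the sequence value 20 are assumptions, not decisions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class State:
    name: str
    sequence: Optional[int]    # None: takes no part in sequence ordering
    active: bool = True

def assign_state(history: List[State], name: str,
                 sequence: Optional[int]) -> None:
    """Append a new row, inactivating equal or higher sequenced rows."""
    if sequence is not None:
        for row in history:
            if (row.active and row.sequence is not None
                    and row.sequence >= sequence):
                row.active = False
    history.append(State(name, sequence))

def current_state(history: List[State]) -> Optional[State]:
    """Latest active state among those that have a sequence number."""
    for row in reversed(history):
        if row.active and row.sequence is not None:
            return row
    return None

history: List[State] = []
assign_state(history, "Passed initial review", 20)
assign_state(history, "FYI", None)
assign_state(history, "Rejected in initial review", 20)
assign_state(history, "Passed initial review", 20)
```

After these four calls the history still has four rows. The first pass and the rejection are inactive, the FYI row is untouched, and the current state is the second "Passed initial review" row rather than a re-activated copy of the first.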

Comment entered 2012-03-09 12:28:04 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-09 12:28:04
BZCOMMENTOR::Bob Kline
BZCOMMENT::97

I checked in a version of ebms.sql which incorporates the decisions captured in the previous comment. I started to set the board_required and topic_required column values, but for now I've left them all at the default of 'Y'; since you said you'll be using those in the API to enforce the rules, I thought it might be appropriate to ignore the values for the conversion of the legacy states, and let you come up with stricter values for the future state rows which don't have to accommodate the dirty data we're inheriting. Let's discuss next week.

Comment entered 2012-03-09 14:44:49 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-09 14:44:49
BZCOMMENTOR::Bob Kline
BZCOMMENT::98

Alan:

You were surprised yesterday when I told you there were rows in the history table for which the "review board" listed was not the board associated with the topic listed. I created a report which listed all such cases (there are 192 of them).
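
The check behind the report is simple enough to sketch in a few lines of Python. This is not the script that produced the attachment; the row layout, the function name, and the sample values are assumptions for illustration.

```python
def mismatched_rows(history_rows, topic_board):
    """Return history rows whose board is not the board owning the topic.

    `topic_board` maps a topic ID to the ID of its editorial board.
    Rows whose topic is unknown are skipped rather than reported.
    """
    return [row for row in history_rows
            if row["topic_id"] in topic_board
            and row["board_id"] != topic_board[row["topic_id"]]]

# Made-up data: topic 10 belongs to board 1, topic 20 to board 2.
topic_board = {10: 1, 20: 2}
history_rows = [
    {"ref_id": 1001, "topic_id": 10, "board_id": 1},
    {"ref_id": 1002, "topic_id": 20, "board_id": 1},   # mismatch
    {"ref_id": 1003, "topic_id": 99, "board_id": 2},   # unknown topic
]
mismatches = mismatched_rows(history_rows, topic_board)
```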

Comment entered 2012-03-09 14:44:49 by Kline, Bob (NIH/NCI) [C]

Attachment mismatched-state-topic-boards.txt has been added with description: States with mismatched topic/board combo

Comment entered 2012-03-09 16:51:08 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2012-03-09 16:51:08
BZCOMMENTOR::Robin Juthe
BZCOMMENT::99

I'm attaching our decisions regarding article status values that can be assigned following review by Board members.

We would like to get your thoughts on the "Rejected for Agenda" step in particular. This category will include articles that should not be placed on a meeting agenda because there is consensus that they will not be cited in the summary and articles for which the change that was proposed is so minor that it doesn't warrant putting the article on the agenda for a meeting. In the first case, no further action is needed, and in the second case, additional action (by the Board manager) will be required. We have proposed an attribute on the "Rejected for Agenda" status to handle these different states (we would like to have a report down the road that could show only those citations requiring further action), but we aren't sure if that is the best solution. We're open to other ideas if you have any.

In terms of mapping data from the old system, the free-text comments in the "On Agenda, No Decision" category should be mapped to the new "On Agenda" status. This isn't a perfect fit, but it is the best fit given the different ways that field has been used in the old system.

The "Editorial Board Decision" values of "Yes" and "No" in the old system should be mapped as follows:

  • "Yes" should map to "Cited (legacy)"

  • "No" should map to "Not cited"
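
For the developers, the mapping above could be captured in a small lookup along these lines. This is only a Python sketch of the rules as stated in this comment; the function name and the way a legacy entry is passed in (category, value, comment) are assumptions.

```python
BOARD_DECISION_MAP = {
    "Yes": "Cited (legacy)",
    "No": "Not cited",
}

def convert_legacy(category, value=None, comment=None):
    """Return (new_status, comment) for one legacy entry.

    Anything not covered by the stated rules raises ValueError so that
    unmapped data is noticed instead of being silently guessed at.
    """
    if category == "On Agenda, No Decision":
        # The free-text comment is carried over to the new status.
        return ("On Agenda", comment)
    if category == "Editorial Board Decision":
        if value not in BOARD_DECISION_MAP:
            raise ValueError(f"unmapped board decision: {value!r}")
        return (BOARD_DECISION_MAP[value], comment)
    raise ValueError(f"unmapped legacy category: {category!r}")

agenda = convert_legacy("On Agenda, No Decision", comment="hold for June")
cited = convert_legacy("Editorial Board Decision", "Yes")
```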

Thanks!

Comment entered 2012-03-09 16:51:08 by Juthe, Robin (NIH/NCI) [E]

Attachment Article Status Values - Post Discussion with BMgrs 3_8_12.doc has been added with description: Article Status Values Following Board Member Review

Comment entered 2012-03-12 14:29:24 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-12 14:29:24
BZCOMMENTOR::Bob Kline
BZCOMMENT::100

Alan:

I have checked in a new version of ebms.sql to handle the requirements from the previous comment. At first I considered using a NULL value for the meeting column to reflect cases in which the article was not discussed, obviating the need for a separate Yes/No column, but in the end I decided to take Robin's description literally, implying that it was necessary to record which meeting the decision was made at, even for articles which were not discussed at the meeting (though I'm not sure how a decision could be made without any discussion at all). Now that the state table has a real primary key of its own, I'm using that to link back to the state table from the board decision table directly, instead of re-specifying the article and topic. The free-text comment will go in the state table. Note that if the interface works as described in Robin's attachment to the previous comment, we'll have decisions for an article tied to more than one topic in some cases, which means a bit of denormalization as we repeat the comment in each of the rows we add to the state table.

I'm modifying the conversion software to use the new table definitions. Please look them over and let me know if you have any questions or concerns. We can discuss more fully tomorrow.

Comment entered 2012-03-12 15:47:19 by alan

BZDATETIME::2012-03-12 15:47:19
BZCOMMENTOR::Alan Meyer
BZCOMMENT::101

(In reply to comment #100)

> ... At first I considered using a NULL value for the meeting
> column to reflect cases in which the article was not discussed,
> obviating the need for a separate Yes/No column, but in the end
> I decided to take Robin's description literally, implying that
> it was necessary to record which meeting the decision was made
> at, even for articles which were not discussed at the meeting
> (though I'm not sure how a decision could be made without any
> discussion at all).

I guess that in this case we're thinking of the "meeting"
associated with a board decision as the meeting for which that
decision was immediately relevant, whether or not an actual
discussion took place at that meeting.

I will note here, mainly for Robin's benefit, that I'm a bit
uncomfortable with the possible confusion of a review cycle and a
meeting date. We may never have two meetings for one board in
one review cycle and, if we do, it may be okay to treat them as
if they were one meeting, but I think we might make that
perspective clearer by calling the column "cycle_id" instead of
"meeting_date".

> Now that the state table has a real primary key of its
> own, I'm using that to link back to the state table from the
> board decision table directly, instead of re-specifying the
> article and topic.

We should be prepared with a plan for what to do, if anything,
when a linked row in the state table is marked "inactive". That
should be rare but on the principle that anything that can happen
will happen (amply demonstrated in the existing CiteMS), we need
a position on it. The position might be:

Fine, it's linked to a state that is no longer active, so
what?

But I'd like a comment in the code or the SQL that says so. And
we'll need to be sure that our software traces links into all
states, not just active ones.

> The free-text comment will go in the state table. Note that if
> the interface works as described in Robin's attachment to the
> previous comment, we'll have decisions for an article tied to
> more than one topic in some cases, which means a bit of
> denormalization as we repeat the comment in each of the rows we
> add to the state table.

That seems fine to me. We probably shouldn't regard identical
comments as denormalizations of each other - even in this case
where both comments were generated by the same keyboard action.
In principle, two comments are independent, whether they have the
same text or not.

> I'm modifying the conversion software to use the new table
> definitions. Please look them over and let me know if you have
> any questions or concerns. We can discuss more fully tomorrow.

I think tomorrow will be a big meeting.

Comment entered 2012-03-15 22:21:34 by alan

BZDATETIME::2012-03-15 22:21:34
BZCOMMENTOR::Alan Meyer
BZCOMMENT::102

I've designed the function to set status in the new system.
There are parts of what I'm proposing in the design that might be
controversial, so I'm posting the ideas here before I actually
write the code in case anyone disagrees:

The calling program passes all of the status information for the
new state:

article id
state name (e.g. "Passed initial review")
user id
date/time (uses current date time if not passed)
topic id (may be null for some states)
board id (may be null for some states)

Steps:

1. Find out everything about the state from the state_type
table.

2. Validate passed parameters:

State name is known.
State type is active and allowed to be used.
Topic id is present if required.
Board id is present if required, or can be derived from a topic.
Passed board ID matches the board for the topic.

3. Create the new row in the article state table.

(Now comes the only part I take to be controversial)

4. Inactivate any rows that must be inactivated:

   If the type sequence number of the new state is not null:

      If any active states exist for this article with higher
      type sequence numbers:

         If there is a topic associated with the new state:

           • Inactivate any active rows with the same article id,
             the same topic, and a higher type sequence number.

         Else if there is a board associated with the new state:

           • Inactivate any active rows with the same article id,
             the same board, and a higher type sequence number.

         Else

           • Inactivate any active rows with the same article id
             and a higher type sequence number.

Inactivating a state will do the following:

Set the "active_status" column to 'I' (inactive).

Append a program generated comment to any existing comment
with content something like:

"State inactivated 2012-03-15 by user 'AHM' setting
status = 'Rejected by Board Manager' for board 'Adult
Treatment' and topic 'Lung Cancer Treatment'. See
state row = 123456."

What this means is that if we have a state like "Rejected by
Board Manager", and if we decide that it requires a board but not
a topic, then:

If a user sets the status "Rejected by Board Manager" and
names a specific topic, then any higher states like "Passed
full text review" will be inactivated IF AND ONLY IF they
have the same topic id.

But if the user sets "Rejected by Board Manager" and only
specifies a board, not a specific topic, then ALL higher
sequenced states that pertain to that board, even if they
have various different topics, will be inactivated.

Similarly, if we have any state types that don't require a board
or topic (I don't think we'll have any of these with non-null
sequence numbers - so this should be a moot point), and if no
specific board or topic is specified, we would inactivate
EVERY active state with a higher sequence number.
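
The steps above can be sketched in code. This is only an illustration of the proposal as written here, using in-memory objects rather than database tables. The class names, field names, the topic_boards mapping, and the shortened generated note are all assumptions, and the comparison is strictly "higher" as described in this comment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class StateType:                   # hypothetical stand-in for a state_type row
    name: str
    sequence: Optional[int]        # None means the type is unsequenced
    active: bool = True
    topic_required: bool = False
    board_required: bool = False


@dataclass
class StateRow:                    # hypothetical stand-in for an article state row
    row_id: int
    article_id: int
    state: StateType
    user_id: str
    when: datetime
    topic_id: Optional[int]
    board_id: Optional[int]
    active_status: str = "A"
    comment: str = ""


def set_state(rows: List[StateRow], types: Dict[str, StateType],
              topic_boards: Dict[int, int], article_id: int, state_name: str,
              user_id: str, when: Optional[datetime] = None,
              topic_id: Optional[int] = None,
              board_id: Optional[int] = None) -> StateRow:
    # Steps 1 and 2: look up the state type and validate the parameters.
    stype = types.get(state_name)
    if stype is None:
        raise ValueError(f"unknown state name: {state_name!r}")
    if not stype.active:
        raise ValueError(f"state type is not active: {state_name!r}")
    if stype.topic_required and topic_id is None:
        raise ValueError("topic id is required")
    if topic_id is not None:
        derived = topic_boards[topic_id]       # board that owns the topic
        if board_id is None:
            board_id = derived
        elif board_id != derived:
            raise ValueError("board id does not match the topic's board")
    if stype.board_required and board_id is None:
        raise ValueError("board id is required")

    # Step 3: create the new row.
    new_row = StateRow(len(rows) + 1, article_id, stype, user_id,
                       when or datetime.now(), topic_id, board_id)
    rows.append(new_row)

    # Step 4: inactivate active rows with a higher sequence number,
    # narrowed by topic if one was given, else by board if one is known.
    if stype.sequence is not None:
        for row in rows[:-1]:
            seq = row.state.sequence
            if (row.article_id != article_id or row.active_status != "A"
                    or seq is None or seq <= stype.sequence):
                continue
            if topic_id is not None and row.topic_id != topic_id:
                continue
            if (topic_id is None and board_id is not None
                    and row.board_id != board_id):
                continue
            row.active_status = "I"
            note = (f"State inactivated {new_row.when:%Y-%m-%d} by user "
                    f"{user_id!r} setting status = {state_name!r}. "
                    f"See state row = {new_row.row_id}.")
            row.comment = f"{row.comment} {note}".strip()
    return new_row
```

With two "Passed full text review" rows for different topics of one board, setting "Rejected by Board Manager" with a topic inactivates only the row for that topic, while setting it with only the board inactivates both.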

Comment entered 2012-03-15 22:39:23 by alan

BZDATETIME::2012-03-15 22:39:23
BZCOMMENTOR::Alan Meyer
BZCOMMENT::103

This is a technical comment to Bob. Others can ignore it but I
thought it would be desirable to have the comment in the Bugzilla
record rather than in a more ephemeral email exchange.

I believe it is important to have a single API function to set
the state of an article because there is a lot of checking to do
and some side effects (inactivating higher numbered states) that
need to be implemented consistently, with no exceptions.

However I am of two minds as to whether we need API based
retrieval functions.

Retrieval functions would centralize retrieval functionality,
which might reduce code duplication. However there are also
disadvantages to centralization.

1. We don't know if there would actually be much code
duplication without it.

2. It's not easy to provide all of the functionality that every
calling program could want - especially if the article state
information needs to be joined with information from other
tables like the article, user, board, topic, article_tag, and
the state and tag type tables.

I originally envisioned building a kitchen sink function that
gets everything a caller could possibly want and implements lots
of parameters to control what comes back. But that might be
overkill that adds complexity with little useful purpose. It
might be better to just implement some views to simplify access
where that would be helpful and have the programs that need
status information just directly execute SQL to get what they
want.

That's the way I'm leaning now.

I suspect that you lean that way too, but since I originally
proposed the other approach, I'll ask now about which direction
you are leaning towards.

Comment entered 2012-03-16 09:47:51 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-16 09:47:51
BZCOMMENTOR::Bob Kline
BZCOMMENT::104

(In reply to comment #103)

> That's the way I'm leaning now.
>
> I suspect that you lean that way too, but since I originally
> proposed the other approach, I'll ask now about which direction
> you are leaning towards.

I think we're pretty much on the same page. Matches the approach taken by the CDR Server: tightly controlled writes, flexible reads.

Comment entered 2012-03-16 09:59:53 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-16 09:59:53
BZCOMMENTOR::Bob Kline
BZCOMMENT::105

(In reply to comment #102)

> I've designed the function to set status in the new system.
> ....
> The calling program passes all of the status information for the
> new state:
>
> article id
> state name (e.g. "Passed initial review")
> user id
> date/time (uses current date time if not passed)
> topic id (may be null for some states)
> board id (may be null for some states)

I would put the user id and date/time last, since they're the most likely to be defaulted.

When you wrote about invalidating states with higher sequence numbers, I think you meant "higher or equal" sequence numbers.

It's tempting to have separate columns for user comments and system notes.

Comment entered 2012-03-20 22:23:34 by alan

BZDATETIME::2012-03-20 22:23:34
BZCOMMENTOR::Alan Meyer
BZCOMMENT::106

I have attached an excerpt from ebms.sql, the script that creates
our EBMS database. The excerpt contains modified definitions of
the state type table, "ebms_article_state_type", the table that
defines what states an article can be in.

The changes are:

1. Fixed the "completed" flag for the "Imported" state. An
article in this state is not completed (this came up in one
of our meetings.)

2. Added a new state called "Ready for initial review".

Bob and I decided, in part because of conversion issues and
in part because an import is logically distinct from a topic
assignment even if it's done as one user action, that we
would not associate a topic ID with the Imported state.
However, topics do need to be assigned so we created a new
state to show that a topic was assigned and the article is
ready for initial review. I voted to call it "Topic
assigned" or "Topic associated" but Bob much preferred "Ready
for initial review" and that has advantages too. So that's
what we're calling it.

Our original idea was to sequence this after "Rejected by NOT
list" and before "Rejected in initial review" on the theory
that NOT list processing is board, not topic, specific.
However if we do that, the knowledge that a particular
article was imported for a particular topic would be lost
since, once the article was rejected by the NOT list, no
further processing should be done on it unless we intend to
activate it again. So I put this state before the NOT list
state.

3. Changed accepted/rejected state pairs to have the same
sequence numbers.

This is so that if an accepted/passed or a rejected state is
superseded by its opposite, the software will inactivate the
other. For example:

"Passed Board Manager"

will inactivate:

"Rejected by Board Manager"

and vice versa, if the opposite state exists and has the same
topic or same board - depending on how the board manager
accepted or rejected it.

A couple of other pairs were also changed.

4. Changed the "completed" indicator in "Flagged as FYI" from
'N' to 'Y'.

It is my understanding that, when an article is flagged as
FYI, that's the end of the line for it. If I got that wrong,
let me know.

We had talked about giving this FYI state a null sequence
number, the idea being that any article could be sent as an
FYI to someone without stopping further consideration of it.
But if my new understanding of FYI is right, that's not true.
FYI means this article is done. If so, then we want a
sequence number and we want to inactivate any later states.

We haven't committed to using the completed flag for anything
but I think we should make it accurate in case we do commit.
If nothing else, it's documentation.

5. Changed the description of "Passed initial review".

From:
'Article "published" for board manager review'
To:
'Article passed by initial reviewer, now ready for Board
Manager review'

"Publishing" meant something related but different in the old
CiteMS. I revised the text to be more accurate.

6. Updated sequence numbers to accommodate the changes.

I have NOT updated the test conversion database with the new type
values. I wanted to put these changes out for review by the
project people before doing that. Also, it will be easier to
install the changes if we do a repeat conversion and just re-run
the ebms.sql script beforehand. So I'll hold off in hopes that
I don't have to make these changes in the database by hand.
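
Change 3 above depends on the "higher or equal" comparison from comment #105. A minimal sketch of why equal sequence numbers make paired states displace each other follows. The Row class, the add_state helper, and the sequence value 40 are invented for illustration, and only the same-topic case is handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:                      # hypothetical, simplified state row
    state: str
    sequence: int
    topic_id: Optional[int]
    active: bool = True


def add_state(history: List[Row], state: str, sequence: int,
              topic_id: Optional[int]) -> Row:
    """Append a state, inactivating same-topic rows whose sequence
    number is higher than OR EQUAL TO the new one."""
    for row in history:
        if row.active and row.topic_id == topic_id and row.sequence >= sequence:
            row.active = False
    new_row = Row(state, sequence, topic_id)
    history.append(new_row)
    return new_row


history: List[Row] = []
rejected = add_state(history, "Rejected by Board Manager", 40, topic_id=1)
passed = add_state(history, "Passed Board Manager", 40, topic_id=1)
# rejected is now inactive because the two states share sequence 40.
```

With a strictly "higher" comparison the rejected row would have stayed active next to its opposite, which is the situation the shared sequence numbers are meant to prevent.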

Comment entered 2012-03-20 22:23:34 by alan

Attachment StateTypes.sql has been added with description: Updated state type definitions

Comment entered 2012-03-23 00:48:45 by alan

BZDATETIME::2012-03-23 00:48:45
BZCOMMENTOR::Alan Meyer
BZCOMMENT::107

(In reply to comment #105)
...
> It's tempting to have separate columns for user comments and system notes.

I'm proceeding without a separate column on the theory that system messages can be made very self-evident. But if you're strongly tempted, let's discuss it further.

Comment entered 2012-03-26 15:50:21 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2012-03-26 15:50:21
BZCOMMENTOR::Robin Juthe
BZCOMMENT::108

(In reply to comment #106)

In general, when an article is sent as an FYI, that is the end of the line. We do not send a coversheet with the article soliciting Board member comments on the paper. However, we do sometimes get comments back and Board members might suggest that the paper be discussed in a future meeting. This is rare, but it does happen. I think what you propose is OK as long as we could easily activate those future steps in the event that we do receive comments about an FYI article or we decide for some other reason that it needs to go onto a meeting agenda.

Comment entered 2012-03-26 16:19:58 by Juthe, Robin (NIH/NCI) [E]

BZDATETIME::2012-03-26 16:19:58
BZCOMMENTOR::Robin Juthe
BZCOMMENT::109

All of the Board managers met on Friday to continue discussing report specifications (I will post those to the EBMS issue soon, Bob). We also talked further about article states, and we would like to request two changes to the set of new states that we're adding after Board member review. I'm not sure if we're too late in the game to make additional changes, but we figured it didn't hurt to ask.

1. Rejected for Agenda. We had proposed using an attribute to distinguish those articles that required further action from those that did not. Instead, we think it would be cleaner to split this into two independent states. These are:

  • No further action

  • Changes not for agenda

2. Approved for Agenda. We would also like to break this down into two separate states. These two values are:

  • For future agenda (with changes)

  • For future agenda (discussion only)

If I understand sequence numbering correctly, I am thinking all 4 of these values would have the same sequence number.

We also had another question about states. We are thinking it would be helpful to have a report to identify which articles have been sent out for review but have not been assigned a later status by a Board manager (in other words, one of the 4 states described above has not been given to the article). This would be a way for us to tie up loose ends and be sure we don't miss any important papers that should be discussed in a meeting. Do we need an additional state for "included in packet"?

Comment entered 2012-03-26 16:33:31 by alan

BZDATETIME::2012-03-26 16:33:31
BZCOMMENTOR::Alan Meyer
BZCOMMENT::110

(In reply to comment #109)
> ... I'm not sure if we're too late in the game to make
> additional changes, but we figured it didn't hurt to ask. ...

The principal motivation behind using a state table rather than
hard wiring states into the code was precisely so that we could
make changes. Right now, there shouldn't be any problems adding
states.

Later on there will be more code keyed to specific states and
more work will be required to change things, but hopefully it
will still be reasonably flexible.

I don't think any of my code is affected by the requested changes
but Bob will have to weigh in. It's his code that works with
agendas.

...
> We also had another question about states. We are thinking it
> would be helpful to have a report to identify which articles
> have been sent out for review but have not been assigned a
> later status by a Board manager (in other words, one of the 4
> states described above has not been given to the article). This
> would be a way for us to tie up loose ends and be sure we don't
> miss any important papers that should be discussed in a
> meeting. Do we need an additional state for "included in
> packet"?

One possibility is to produce a generic status report or a
generic search in which a user chooses states from a drop down
list, or maybe two of them, one for states of articles a user
wants to see on the report and one for states that must not be in
the history of those articles. We could of course combine that
with review cycle, editorial board, and other selection criteria.

That may be worth doing if we have lots of state based reports.
Even if we "can" individual reports we could have something inside
that works that way to select and format the data for all of
them.

Or maybe not. My first instinct is often to over-abstract
things. So as my son tells me, I'm just saying ...
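
To make the two-list idea concrete, here is a rough sketch of the selection query, run against SQLite for convenience. The article_state table, its columns, and the function name are assumptions rather than the real schema, and both lists are assumed to be non-empty.

```python
import sqlite3
from typing import List, Sequence


def articles_with_states(conn: sqlite3.Connection,
                         wanted: Sequence[str],
                         excluded: Sequence[str]) -> List[int]:
    """Return ids of articles that have at least one of the wanted
    states and none of the excluded states anywhere in their history."""
    wanted_marks = ",".join("?" * len(wanted))
    excluded_marks = ",".join("?" * len(excluded))
    sql = f"""
        SELECT DISTINCT s.article_id
          FROM article_state s
         WHERE s.state_name IN ({wanted_marks})
           AND s.article_id NOT IN (
               SELECT x.article_id
                 FROM article_state x
                WHERE x.state_name IN ({excluded_marks}))
         ORDER BY s.article_id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, *excluded])]
```

For the report Robin describes, the excluded list would hold the four post-review states from comment #109, and the wanted list would hold whatever marks an article as sent out for review. Review cycle and board filters could be added as further WHERE conditions.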

Comment entered 2012-03-26 16:35:03 by Kline, Bob (NIH/NCI) [C]

BZDATETIME::2012-03-26 16:35:03
BZCOMMENTOR::Bob Kline
BZCOMMENT::111

(In reply to comment #109)

> We also had another question about states. We are thinking it would be helpful
> to have a report to identify which articles have been sent out for review but
> have not been assigned a later status by a Board manager (in other words, one
> of the 4 states described above has not been given to the article). This would
> be a way for us to tie up loose ends and be sure we don't miss any important
> papers that should be discussed in a meeting. Do we need an additional state
> for "included in packet"?

I'll let Alan address the other questions about states, but I believe the answer to this one is "no" - the system already has enough information to know which articles have been included in packets without having to consult the state table.

Comment entered 2012-03-26 16:42:41 by alan

BZDATETIME::2012-03-26 16:42:41
BZCOMMENTOR::Alan Meyer
BZCOMMENT::112

(In reply to comment #108)
> (In reply to comment #106)
>
> In general, when an article is sent as an FYI, that is the end of the line. We
> do not send a coversheet with the article soliciting Board member comments on
> the paper. However, we do sometimes get comments back and Board members might
> suggest that the paper be discussed in a future meeting. This is rare, but it
> does happen. I think what you propose is OK as long as we could easily activate
> those future steps in the event that we do receive comments about an FYI
> article or we decide for some other reason that it needs to go onto a meeting
> agenda.

The idea is that any state can be overridden by a higher sequence state, or inactivated by an equal or lower sequence state. The "current" state of an article is the highest sequenced, active state. So if an article is assigned "FYI", and then something else happens, either the FYI will be inactivated if an equal or lower sequenced state occurs, or it will remain active but be superseded by any higher sequenced states that are added.
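
As a small illustration of that definition, the "current" state could be computed as below. This is a sketch only: the State tuple and the sequence numbers are invented, and leaving unsequenced states out of the comparison is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class State(NamedTuple):        # hypothetical, simplified state row
    name: str
    sequence: Optional[int]     # None for unsequenced state types
    active: bool


def current_state(states: Sequence[State]) -> Optional[State]:
    """Return the highest sequenced active state, or None if there is none."""
    candidates = [s for s in states if s.active and s.sequence is not None]
    return max(candidates, key=lambda s: s.sequence, default=None)
```

An article whose only active sequenced state is "Flagged as FYI" has that as its current state. Adding any active state with a higher sequence number changes the answer without touching the FYI row.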

The way we've designed the state table there is a "completed" (Y/N) flag that says whether a particular state is the end of the line. We don't know right now whether we're actually going to use that for anything. It might just be documentation of the intent of that state. But we might possibly use it to indicate whether an article in a completed state should be part of any workflow queue. That's what I had in mind for it. But even if we do that there should be no problem adding higher sequenced states and getting the article moving again.

Comment entered 2012-03-26 16:47:04 by alan

BZDATETIME::2012-03-26 16:47:04
BZCOMMENTOR::Alan Meyer
BZCOMMENT::113

(In reply to comment #112)
> ... So if an article is assigned
> "FYI", and then something else happens, either the FYI will be inactivated if
> an equal or lower sequenced state occurs ...

That language is admittedly strange. When an FYI occurs and articles are sent out to board members, changing that state to "inactive" doesn't pull the articles back. It's more like Richard Nixon's pronouncement that an earlier statement he made is "inoperative". It's out there but we're ignoring it 🙂

Comment entered 2012-04-03 15:22:15 by Shields, Victoria (NIH/NCI) [E]

BZDATETIME::2012-04-03 15:22:15
BZCOMMENTOR::Victoria Shields
BZCOMMENT::114

Margaret, Robin, and I met with Cynthia and Minaxi and included their feedback on the CiteMS issues we discussed on 3/20/12.

Comment entered 2012-04-03 15:22:15 by Shields, Victoria (NIH/NCI) [E]

Attachment CiteMS Feedback 3-30-12.odt has been added with description: CiteMS Feedback including comments from Cynthia & Minaxi

Comment entered 2012-04-10 11:37:48 by alan

BZDATETIME::2012-04-10 11:37:48
BZCOMMENTOR::Alan Meyer
BZCOMMENT::115

I have updated the section on Descriptive Tagging from the Requirements Document and extracted it here as an attachment to use for today's meeting on tagging. The changes largely parallel similar changes made for status management - namely that comments are immutable, that there can be multiple comments attached to a single tag, and that we "delete" tags by inactivating them rather than by actually erasing them from the database.

Comment entered 2012-04-10 11:37:48 by alan

Attachment DescriptiveTagging.txt has been added with description: Descriptive tagging

Comment entered 2012-04-11 00:45:06 by alan

BZDATETIME::2012-04-11 00:45:06
BZCOMMENTOR::Alan Meyer
BZCOMMENT::116

This draft reflects some changes relating to article states and descriptive tags, plus small changes in a few other areas of the document. It supersedes previous versions and the "Descriptive tagging" notes attached in comment #115.

It is still citation management specific - no EBMS specific material is included.

Comment entered 2012-04-11 00:45:06 by alan

Attachment newCiteMSNotes3.1.4.html has been added with description: New CiteMS requirements outline - Draft 3.1.4

Comment entered 2012-10-22 11:40:13 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-10-22 11:40:13
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::117

We decided to close this issue since notes on the new system are no longer being tracked in this issue.
Marked as "RESOLVED"

Comment entered 2012-10-22 11:44:31 by Osei-Poku, William (NIH/NCI) [C]

BZDATETIME::2012-10-22 11:44:31
BZCOMMENTOR::William Osei-Poku
BZCOMMENT::118

Issue closed.

Attachments
File Name Posted User
Article Status Values - Post Discussion with BMgrs 3_8_12.doc 2012-03-09 16:51:08
CiteMS_Notes_Draft2.doc 2010-11-04 22:57:58
CiteMS Feedback 3-30-12.odt 2012-04-03 15:22:15 Shields, Victoria (NIH/NCI) [E]
createdb.sql 2011-11-16 00:25:18
createdb.sql 2011-11-16 00:17:36
createdb.sql 2011-03-21 18:23:55
DescriptiveTagging.txt 2012-04-10 11:37:48
ebmsalan.sql 2012-01-09 14:49:45
ebmsalan.sql 2012-01-05 23:57:43
Importing&Publishing_Req_Doc.doc 2011-04-01 15:21:24
Journal Management in a New CiteMS.doc 2010-11-16 20:49:18
mismatched-state-topic-boards.txt 2012-03-09 14:44:49
Missing Source Field.doc 2012-01-09 15:01:51
Missing source field.htm 2012-01-06 14:55:33
New Citation Management System Requirements and Notes_7-7-2011_CB_MT_Notes.doc 2011-07-07 16:55:29
newCiteMS_API_1.0.html 2011-12-29 10:08:26
newCiteMSNotes.html 2011-03-18 01:02:22
newCiteMSNotes.html 2011-03-16 00:33:30
newCiteMSNotes.html 2011-02-25 01:43:33
newCiteMSNotes3.1.1.html 2011-07-19 16:48:50
newCiteMSNotes3.1.4.html 2012-04-11 00:45:06
newCiteMSNotes3.1.html 2011-07-13 21:57:07
newCiteMSNotes3.html 2011-06-22 00:29:35
Re Guidelines for proper submission of bulk requests to the Entrez API.eml 2011-08-30 10:14:38
StateTypes.sql 2012-03-20 22:23:34
