New Citation Management System Requirements and Notes

Draft 3.0 June 21, 2011

This draft document is an attempt to specify requirements in outline form for a new Citation Management System ("CiteMS").

1 Organization of the document
2 Data to maintain
3 Functionality in the New CiteMS
4 Design Notes

1 Organization of the document

The document is divided into three main parts:

1.1 Data

This describes the information that must be maintained in the system. It is the foundation of any information system.

The data is described as a set of information objects or categories, for example, citations, journals, persons, etc. For each category I've documented what I take to be the purpose of maintaining the information, followed by notes regarding its content, use, source, etc.

1.2 Functionality

This describes the basic functions that the system has to perform. Again, I've organized the functions into categories. For each category I've described the purpose of functions in this category followed by a list of what I take to be logical functions, for example retrieving and displaying certain information or supporting the entry and processing of new information.

1.3 Design notes

These are notes on requirements or possible designs that don't comfortably fit into the above two categories. In most cases they aren't really requirements so much as ideas for design that came up while thinking about the requirements and which I've recorded here so that they won't get lost.

2 Data to maintain

The tag ":linked:" following an information category means that the source and control of this data are outside the CiteMS. If an item is marked as linked, a "Source" note says where the master data comes from.

In some cases, the source can be accessed directly. If the source is the CDR or EBMS system, we may be able to leave it there and access the data directly in real time.

If the source is PubMed, the data is accessed by importing it in batches. We will also want periodic batch maintenance processes that verify that the information we imported from PubMed is still accurate. If PubMed has updated the information, we will probably want to get the updates.
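
As an illustration only, the periodic verification could work by storing a fingerprint of each record at import time and comparing it with a fingerprint of the freshly downloaded record. The function names and the dictionary layout below are assumptions made for this sketch, and the actual download from PubMed is not shown.

```python
import hashlib


def fingerprint(xml_text: str) -> str:
    """Return a stable digest of a citation record as downloaded."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def find_stale(stored: dict[str, str], fresh: dict[str, str]) -> list[str]:
    """Return the PubMed IDs whose freshly downloaded record differs
    from the fingerprint saved at import time.

    stored: PubMed ID -> fingerprint saved at import (hypothetical layout)
    fresh:  PubMed ID -> XML text just downloaded in a batch
    """
    return sorted(
        pmid
        for pmid, xml_text in fresh.items()
        if stored.get(pmid) != fingerprint(xml_text)
    )


# Made-up records: "111" has changed at the source, "222" has not.
stored = {"111": fingerprint("<a>old</a>"), "222": fingerprint("<b>same</b>")}
fresh = {"111": "<a>new</a>", "222": "<b>same</b>"}
stale = find_stale(stored, fresh)
```

Only the records listed in `stale` would need to be re-imported, which keeps the maintenance batch small if PubMed changes are as rare as expected.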

2.1 Citations linked

2.1.1 Purpose

Provide information to identify a single journal article, book, book chapter, or whatever, that can be cited. Information may be used in the new CiteMS itself in reports, displays, correspondence sent to physicians, etc. and may also be sent to the CDR, EBMS, or other consumers of citation data from the CiteMS.

2.1.2 Notes

"bib" table in old system
- Currently contains plain ASCII text loaded from Medline format
- Not maintained. Changes in PubMed are not reflected.
New system
- Download xml from PubMed instead of parsing the Medline format
  - Much easier to parse
  - Contains much more information
  - Unicode, better representation of non-English journals
- Never edit PubMed info
  - This is PubMed controlled data, not ours
    We do not need to update the content of PubMed citations. Even if there are errors in the data, we might be better off ignoring them than correcting them. See the following point about refreshing data.
  - We can periodically refresh it from PubMed to update with any changes
    PubMed does not make a lot of changes to records once they are in the system. MeSH headings change, but they are not significant to CiteMS. Bibliographic changes are relatively rare. We might, say once per year after the new MeSH is incorporated into PubMed, refresh all of our data from PubMed using an automated process.
  - Pre-Medline citations
    Some searches only download citations from PubMed that are fully indexed and processed. Others also allow "Pre-Medline" citations that contain partial records that will be supplemented and corrected later. See the Import functions for a discussion of this.
- Allow other sources of citations
  All citations currently come from PubMed. This may continue to be true for the life of the new system. However, it is conceivable that at least some citations to literature outside of PubMed might be considered to be in scope in the future.
  It makes sense to use data structures and hooks that will enable us to add non-PubMed data in the future without having to re-design the system. Our goal should be to not write software or allocate significant resources for the support of non-PubMed data unless and until we actually want to use such data, but to design the system to accommodate the changes if and when we do need them.
  
  Accommodating non-PubMed data in the design might use fairly simple techniques like marking each citation with a source indicator and only performing PubMed specific operations on a citation if the source is PubMed. All citations would be PubMed in the initial system and there might never be any other sources, but it might cost very little to add source checks where they are needed.
  
  It is very possible that if and when we do allow non-PubMed data into the system, we will be adding very small numbers of records, not importing large batches as we do from PubMed. If so, the most cost effective way to import such data might be to type it in rather than to write new import modules. If the new system uses XML, this might be as easy as using the PubMed XML schema with an off-the-shelf XML editor like XMetal, and ensuring that the import program does not require the source of the citations to be PubMed.
- Source
  PubMed. Maybe also other sources in the future.
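
The source indicator idea described above might look something like the following sketch. The class, field names, and the "Manual" source value are hypothetical, chosen only to show how cheap the source check could be.

```python
from dataclasses import dataclass

PUBMED = "PubMed"  # the only source in the initial system


@dataclass
class Citation:
    citation_id: int
    source: str      # source indicator, e.g. "PubMed"
    source_id: str   # ID assigned by the source (a PMID for PubMed)


def needs_pubmed_refresh(citation: Citation) -> bool:
    """PubMed specific operations apply only to PubMed citations."""
    return citation.source == PUBMED


citations = [
    Citation(1, PUBMED, "12345"),
    Citation(2, "Manual", "local-0001"),  # hypothetical non-PubMed record
]
to_refresh = [c.citation_id for c in citations if needs_pubmed_refresh(c)]
```

With a check like this in place, a hand-entered record passes through the system untouched by the PubMed refresh process.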

2.2 Journals linked

2.2.1 Purpose

Every citation record should link to a journal title that is stored only once. If the journal title changes, is replaced, etc., there is only one place to update it and one unique ID to link citations to it.

2.2.2 Notes

Journal complexity
- Titles change
  A citation published under the old form of the title might need to be recoverable under that form and under the new form. For example, a report on what citations we have from journal ABC might or might not look at previous versions, depending on what the report user wants.
  This is easy to do if the link is to a journal record that has a history of title strings associated with it. It is the journal unique ID and the journal itself, not the title string, that a citation is linked to.
  
  Note that two journals can have the same title string. The strings should not be considered to be always unique, especially over time.
- Titles merge and split
  This is a tricky problem having a significant impact on how we record the history of a title, assuming we do so.
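
A minimal sketch of a journal record that carries a history of title strings, assuming we decide to keep the history at all. The names are hypothetical and merges and splits are deliberately not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    journal_id: int  # citations link to this, never to a title string
    titles: list[str] = field(default_factory=list)  # oldest first

    @property
    def current_title(self) -> str:
        return self.titles[-1]


def journals_matching(journals: list[Journal], title: str,
                      include_history: bool = True) -> list[int]:
    """Return IDs of journals that carry the title now or, optionally,
    carried it at some point in the past."""
    hits = []
    for journal in journals:
        candidates = journal.titles if include_history else [journal.current_title]
        if title in candidates:
            hits.append(journal.journal_id)
    return hits


journals = [
    Journal(10, ["ABC Review", "ABC Journal"]),
    Journal(11, ["ABC Review"]),  # a different journal with the same string
]
```

Note that a search on "ABC Review" finds both journals when history is included, which is why the title string cannot serve as the key.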
Relegate all journal management to NLM
Using NLM as the authority for our journal titles may have small drawbacks but has enormous benefits.
Ideally, we should do zero management of journal titles. We should download them from NLM and, when necessary, apply updates automatically.
Journal title searching
Question: Do we need to keep cross references from old versions of journal titles, possibly in a special table, or possibly in the audit trail?
There are different possibilities in searching that depend on our decisions regarding title history:
- Allow searching on old versions of titles?
  If a user can search on two different versions of a title and get a unified result set, we need to keep the old versions.
- Separate search results on old and new versions of titles?
  The problem is different if we need to retrieve separate results when searching on old and new versions of a title. For example, if title A1 became A2 on January 1, 2011, we might want a search on A1 to retrieve citations from before 1/1/11 and a search on A2 to retrieve citations on or after that date.
- Unified search?
  Still another possibility is to only store the last version of a title and only permit searching on that.
- Kitchen sink?
  Hardest of all is to allow any of the above three search types, under user control.
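
To make the differences between the search possibilities concrete, here is a small sketch using the A1/A2 example. The data, the mode names, and the function are all made up for illustration. The "kitchen sink" option amounts to letting the user pick the mode.

```python
from datetime import date

# Hypothetical title history for one journal: (title, first date in use)
HISTORY = [("A1", date(1990, 1, 1)), ("A2", date(2011, 1, 1))]

# Hypothetical citations for that journal: (citation ID, publication date)
CITATIONS = [(1, date(2010, 6, 1)), (2, date(2011, 1, 1)), (3, date(2011, 3, 15))]


def search(title: str, mode: str) -> list[int]:
    """Return citation IDs for a title search under one of three modes."""
    titles = [t for t, _ in HISTORY]
    if mode == "latest_only":
        # Only the last version of the title is stored and searchable.
        return [cid for cid, _ in CITATIONS] if title == titles[-1] else []
    if title not in titles:
        return []
    if mode == "unified":
        # Any version of the title retrieves the whole result set.
        return [cid for cid, _ in CITATIONS]
    if mode == "separate":
        # Each version retrieves only citations from its own period.
        i = titles.index(title)
        start = HISTORY[i][1]
        end = HISTORY[i + 1][1] if i + 1 < len(HISTORY) else date.max
        return [cid for cid, pub in CITATIONS if start <= pub < end]
    raise ValueError(f"unknown mode: {mode}")
```

The "unified" and "separate" modes both require that old title versions be kept. Only "latest_only" can do without them.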
Source
PubMed. Maybe others in the future.

2.3 NOT lists

2.3.1 Purpose

Some journals have been found to be of little or no value in cancer research, but their articles show up when searching for a topic.

Listing the journals individually in a PubMed search is not as practical as keeping a list in the CiteMS and weeding the articles from those journals out at import time.

NOT lists are currently editorial board specific.

2.3.2 Notes

NOT list content
The old system used the title of a journal as the value of a NOT list entry. This caused thousands of errors.
The new system should use journal title unique identifiers. That way there should never be an entry in a NOT list that doesn't exist in the journal title table.
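
A sketch of weeding at import time with NOT lists keyed by journal unique ID. The board names, journal IDs, and record layout are invented for the example.

```python
# Hypothetical NOT lists: board -> set of journal unique IDs to exclude
NOT_LISTS = {
    "adult_treatment": {501, 502},
    "genetics": {502},
}


def weed(board: str, incoming: list[dict]) -> list[dict]:
    """Drop incoming citations whose journal is on the board's NOT list."""
    excluded = NOT_LISTS.get(board, set())
    return [c for c in incoming if c["journal_id"] not in excluded]


batch = [
    {"pmid": "1", "journal_id": 501},
    {"pmid": "2", "journal_id": 777},
]
kept = weed("adult_treatment", batch)
```

Because the lists hold IDs rather than title strings, a change in a journal's title would not silently break an entry.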

2.4 People linked

2.4.1 Purpose

Store each person associated with the system only and exactly once. Any information we need to associate with that person can be associated with this record, e.g., address, phone number, affiliation, titles, etc. This is what we did in the CDR except that the CDR also has separate user records - which may or may not be something we want to normalize away.

It looks like we can have a single person table for all users in the system and have separate structures that define roles and permissions (see below) to which people are linked.

2.4.2 Notes

User descriptions
Having a single person table enables us to use a common format for all people for storing descriptive information - name, affiliation, contact details, etc., and common software for many user oriented functions such as sending mail or email to a user, formatting a display of user information, etc.
User profiles
It is desirable to have user editable profiles that enable them to control the way the CiteMS behaves for them.
Example: Robin finds the default/fixed search results page length of 5 citations with abstracts or 10 without to be too limiting. She'd prefer much longer outputs with more citations on one display. This might be in a profile - possibly as a user settable value, or possibly as a remembered last value selected.

There should be a standard way to create new slots in a profile, a standard way to display and edit them, and a standard way for software to access them.
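
One possible shape for profile "slots" is a small registry that declares each slot's type and default, with a single access path for all software. The slot names and the class below are assumptions for illustration, not a proposed design.

```python
# Slot registry: slot name -> (type, default). Names are hypothetical.
SLOTS = {
    "results_per_page": (int, 10),
    "show_abstracts": (bool, False),
}


class Profile:
    """Per-user settings, read and written only through declared slots."""

    def __init__(self) -> None:
        self._values: dict[str, object] = {}

    def get(self, slot: str):
        """Return the user's value, or the slot default if unset."""
        _, default = SLOTS[slot]
        return self._values.get(slot, default)

    def set(self, slot: str, value) -> None:
        """Store a value after checking it against the slot's type."""
        slot_type, _ = SLOTS[slot]
        if not isinstance(value, slot_type):
            raise TypeError(f"{slot} expects {slot_type.__name__}")
        self._values[slot] = value


robin = Profile()
robin.set("results_per_page", 50)
```

Adding a new slot would then mean adding one registry entry, and a generic edit screen could be generated from the registry.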
Source
EBMS.
The largest number of people in the system are board members and managers, who are all in the CDR, where full records are already kept. The new EBMS will also need the same people. If the people records are in a single table shared by EBMS and CiteMS, then whatever technique EBMS uses to stay in sync with the CDR should automatically keep CiteMS in sync too.

A small number of people are only involved in literature surveillance and do not participate in EBMS activities. However it may still make sense to use a single database for all people, with one authentication and authorization system.

2.5 Roles and Permissions

2.5.1 Purpose

Different people in the system will have different roles and different things they are permitted to do. Currently there are four distinct roles in the existing CiteMS that see different functions on menus and/or have different permissions.

It's possible to set up custom tables for each type of permission, but it would be more flexible for the future to have a scheme that allows new roles to be added as required.

Possible roles might be: board member, board manager, CIPS reviewer (not always a board manager), citation importer, print manager, system administrator.

Typical permissions might be: import citations, perform initial review, save full text, etc. - corresponding to any system functions that require permission control.

2.5.2 Notes

Adding roles and permissions
Presumably, the total number of roles and permissions is small and stable. It's the IDs of the people in each role that change. We may not need a sophisticated user interface to create new roles and permissions but can simply directly modify the values in the tables if and when software modifications define new roles and functions.
However, we do need a user interface for all non-programmer systems administrators to assign and de-assign users to roles and perhaps to assign and de-assign permissions to roles.
Many to many user <-> role relationships
A single user might perform multiple roles in the system. A single role might be performed by multiple users.
Context specific permissions
Many roles and permissions exist in the context of a particular board and possibly a particular summary topic. There may be a generic way to represent this in the database by saying that a particular role or permission is board specific or summary specific. In that case, the authorization software would check not only that the user has the required role to perform an action, but also that the user is a member of the requisite board or summary topic review group that is associated with the citation or other object on which the action will be performed.
If we store user specific preferences in a user specific profile, then only that user or a systems administrator should be able to change the profile.
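
The many to many relationships and the board specific check could be sketched as follows. The user names, role names, and permission names are hypothetical, and summary topic context is left out to keep the example short.

```python
from typing import Optional

# Many to many: (user, role, board). board=None means not board specific.
USER_ROLES = {
    ("alice", "board_manager", "genetics"),
    ("alice", "citation_importer", None),
    ("bob", "board_member", "genetics"),
}

# Many to many: role -> permissions it grants.
ROLE_PERMISSIONS = {
    "board_manager": {"request_full_text", "assign_reviewers"},
    "citation_importer": {"import_citations"},
    "board_member": {"submit_response"},
}


def is_allowed(user: str, permission: str, board: Optional[str] = None) -> bool:
    """True if any of the user's roles grants the permission in this context."""
    for role_user, role, role_board in USER_ROLES:
        if role_user != user:
            continue
        if permission not in ROLE_PERMISSIONS.get(role, set()):
            continue
        # A board specific role only applies to objects of that board.
        if role_board is None or role_board == board:
            return True
    return False
```

Here the same user holds two roles, and the board manager role carries weight only for objects belonging to that user's own board.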

2.6 Control tables

2.6.1 Purpose

These are domains of valid values with which objects (people, citations, actions, etc.) can be associated.

2.6.2 Notes

Components of a control table entry
Each entry in a control table has at least the following properties. There may be quite a few others too.
- A unique identifier
  This is a number that is used for linking control information to other data objects in the system. There are some areas in the old system where unique identifiers should have been used but were not.
- A human readable character string
  This is what identifies the control information to users. Sometimes it changes.
  When there is a change in the language of a control value but not its function, the character string changes but the unique identifier should stay the same.
  
  When there is a new function added to the system it should always get a new unique identifier as well as a new string.
- Active status
  If a control value has been used in the past but is no longer used it may be necessary to keep the value in the table so that, when looking at old records or old activities, we can display the human readable value for the events that took place in the past.
  However the values need to be marked so that they will not be used again.
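
The three components above might be represented roughly like this. The table contents are examples only, and the retired "Hold" value is invented to show the active flag at work.

```python
from dataclasses import dataclass


@dataclass
class ControlValue:
    value_id: int        # unique identifier used for all links
    label: str           # human readable string; may be reworded
    active: bool = True  # False = keep for history, never use again


DECISIONS = {
    1: ControlValue(1, "Request full text"),
    2: ControlValue(2, "Send for review"),
    3: ControlValue(3, "Hold", active=False),  # retired, kept for old records
}


def selectable(table: dict[int, ControlValue]) -> list[str]:
    """Labels that may be offered for new work."""
    return [v.label for v in table.values() if v.active]


def display(table: dict[int, ControlValue], value_id: int) -> str:
    """Label for any stored ID, including retired values."""
    return table[value_id].label


# The wording changes but the identifier, and every link to it, stays put.
DECISIONS[2].label = "Send to board member"
```

Old records that point at ID 3 can still be displayed, while data entry screens would offer only the active values.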
Examples

Here are some examples of control tables. There will be more than just these.
- Citation decisions/actions
  - Purpose
    Actions taken for a citation, request full text, send for review, etc.
    Had ten values. They look a bit odd to me, but I presume they are what the users wanted.
  - Source
    Created by Board Managers and Literature Surveillance staff. Currently stored in lu_decision in the existing CiteMS.
- Categories of evidence
  - Purpose
    Contains strings that characterize the nature of the evidence presented by a research study. Some are descriptions of how a study was conducted ("Randomized controlled clinical trials. Double-blinded") and some look like end points measured by a study ("Total mortality", "Indirect surrogates-Event free survival").
  - Notes
    - Board specific usage
      Not all boards use the same categories of evidence. Some appear to use just the descriptions of how a study was conducted while others appear to use just the study end-points. Others may not use this at all.
      The use of categories of evidence is currently under review. There may be an effort to consolidate and systematize their usage. Whatever we do will have to be flexible enough to accommodate change.
    - Source
      These are NCI designated strings, under our control.
- Responses
  - Purpose
    Records what a reviewer/board member says should be done about a citation.
  - Notes
    A citation may receive more than one response, possibly including more than one from the same reviewer, for the same board, and same topic. (Is that right? Even if not, it doesn't seem like a bad idea to enable it.)
    - Relationship to "citation decisions"
      The difference between "citation decisions" and "responses" appears to be that the former are dispositions made by CIPS/CIAT staff and the latter are dispositions made by board member reviewers.
    - Source
      Devised by the lit surveillance group.
See references?
Since the names of controlled values sometimes change, it may be desirable to keep see references from the old values. A user searching on the old value sees everything.
The problem of old and new values may have been more serious in the old system since it appears that, in some cases, linking was done by character string value instead of by unique ID. When the value changed, it could become necessary for a user to search on both old and new values to find all citations with the same logical status. The new system should always use unique IDs (see above), eliminating this problem.

2.7 Editorial boards linked

2.7.1 Purpose

Editorial boards are composed of groups of scientists and clinicians organized by NCI to review broad areas of cancer research. Each board has its own board manager and its own working style that affects many aspects of the management of citations for topics handled by that board.

Citations are associated directly with summary topics and summary topics are associated directly with editorial boards. One citation can be associated with multiple summary topics but each summary topic can only be associated with one board. There are concepts that have application across editorial board boundaries but, where that happens, different summary topics are created.

Editorial boards have a name, a board manager, an associated list of members, possible sub-boards, and other things.

2.7.2 Notes

Source
Board names are used in the CDR, cancer.gov, the new EBMS, and probably elsewhere. The data must match the editorial board names used elsewhere.
Board names need not be physically linked to other systems since the number of boards is very small and volatility is very low. It would be acceptable (I think) to manually update the CiteMS board list if and when a change occurs in PDQ Board designations, even though the same changes are also made manually in other systems.

However, see the section on linking below.
Sub-boards
Pediatrics and Genetics currently contain smaller groups. The ability to create smaller groups within a board should be built into the system and usable by any board that wishes to do so. Some boards that do not now divide into sub-boards might do so in the future.
Question: We'll need to figure out just what a sub-board is. What kinds of objects are associated with sub-boards: summary topics, members, categories of evidence, other things?

Question: Will we ever need sub-sub-boards? Are the editorial boards large enough to ever require it? Do we need to generalize the grouping of boards within boards within boards …?

Question: Do sub-boards need to be visible to the system at all, or are they better treated as methods of organization outside the system? Are they just informal groups of board members with a particular interest that do not need any special recognition in the system?

The issues of sub-boards may best be resolved by meeting with all of the board managers together so that they can hear each other's views as well as communicate their own.

[That's probably a good idea for lots of issues.]
Source
EBMS.

2.8 Review cycles

2.8.1 Purpose

The work of literature surveillance is broken down into monthly cycles. Each batch of citations entered into the system is identified as belonging to the monthly cycle in which it was entered.

The same citation processed in one monthly review cycle can also be associated with a different review cycle for another board or another Summary Topic.

Review cycles are heavily used in reports.

2.8.2 Notes

History
The notion of a "review cycle" is inherited from the pre-CiteMS processes.
Review cycles permeate the system but their actual use is more significant to some kinds of users than others. They may play a larger role in the import and initial review process, before citations get to board managers, than they do after that.

Review cycles are roughly, but not rigorously, synchronized with calendar months. The work of a review cycle might take five weeks or three weeks and a new review cycle might be established a little earlier or later than the actual first day of a month.

Question: What are all the tasks that are currently slaved to review cycles?

Question: What would happen to the tasks currently organized into review cycles if they were freed from that organization and allowed to "float" on more fluid date-time boundaries?
"Fast tracked" citations
Sometimes citations are imported into the system immediately as a result of a board member or manager request, independently of the usual topic searches and, in effect, independent of the review cycles. The concept is analogous to the "hotfix" publishing concept in the CDR.
Fast tracked citations may be new or may be old, for example old citations from before the first CiteMS became operational that are nevertheless important to have in the system.

The fast track capability is important and must be preserved.
Alternatives to review cycles
Even in the current system, almost every action pertaining to a citation is date-time stamped. It is possible that we could just rely on date-time stamps for synchronizing events, but treat the interval periods flexibly - using months for some purposes, weeks for others, longer periods for others.
Alternatively, we could continue the review cycle process but allow all processes that use it to "float" their cycles in whatever way is best for that process.
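
If we rely on date-time stamps, a cycle could be derived from a stamp rather than stored with every record. The sketch below assumes a small table of cycle start times, which need not fall on the first of a month. The cycle names and dates are made up.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical cycle start times, in ascending order.
CYCLE_STARTS = [
    (datetime(2011, 3, 30), "2011-04"),
    (datetime(2011, 5, 2), "2011-05"),
    (datetime(2011, 6, 1), "2011-06"),
]


def cycle_for(stamp: datetime) -> str:
    """Return the cycle whose start is the latest one at or before stamp."""
    starts = [start for start, _ in CYCLE_STARTS]
    i = bisect_right(starts, stamp)
    if i == 0:
        raise ValueError("stamp precedes the first known cycle")
    return CYCLE_STARTS[i - 1][1]
```

A process that wants weekly or quarterly intervals could use its own boundary table against the same stamps.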
Reasons for keeping review cycles

Although alternatives are possible, there are at least a few very important areas where the review cycle concept would seem to be central to operations. These include:
- Batching PubMed searches
  PubMed searches are run to find all new citations since the last search. If we were to run all searches every day, finding any citations created in the last day, and import them into the system, there would be an explosion of the search and import workload. The benefit would be very small at best, and very likely non-existent. Downstream processes, including the cycle of board meetings, which certainly do not occur every day, would mean that the final decisions based on our literature surveillance would be held up and batched anyway.
  It makes sense to run the searches in batches over a reasonable period of time - not so long a time that we get behind in our appreciation of new scientific developments - not so short a time that we are making unnecessary work for ourselves.
- Batching CIPS based reviews
  The initial review is done one record at a time. However, the reviewer (usually Cynthia) sometimes finds that her opinion of the value of a citation is altered by what she sees later in her review. For example, it is possible that a better article is found on the same topic, or even that an article seen later proves that the conclusions published in the one seen earlier can't be right.
  By batching the process and "publishing" all of the initial review decisions for an editorial board at once, the reviewer can better use the knowledge gained from the whole process.
  
  The same principle also applies to reviews done by Board Managers and other CIPS staff reviewers. The decisions regarding what full text is worth reading and what is worth sending to board member reviewers can also be better informed if the final decisions are released all at once instead of at the time that each article is considered.
- Batching distribution of articles to board members.
  Currently, whether mailing by FedEx or distributing articles electronically, we try to send all articles that a board member has to review in one batch. Besides saving on delivery costs (not a factor for electronic delivery) batching the articles is helpful to the board member reviewers. The board member receives a complete paper or electronic "packet" that shows him everything he has been asked to review.
  Having the articles all together, the board member can allocate his time far more effectively than if he received one article at a time. He can glance over all the articles and pick out the ones that he thinks merit more of his attention. He can see what his total workload is and decide when and how to work on it.
Further analysis
My sense is that we must retain the review cycle concept. However, we should try to ensure that working outside the usual cycles, the so-called "fast track" processing, is not prevented or hampered by our implementation of review cycles, and that the fast tracks do not in their turn interfere with our standard review cycle work loads.
[Of course that's all easy to say. The devil is in the details of how fast tracking is designed and implemented.]
Source
Devised by the lit surveillance group.

2.9 Summary topics linked

2.9.1 Purpose

Identify topics that are separately considered in PDQ board work. Most, but not all, of the topics are essentially the same as those in the CDR and cancer.gov, though they sometimes have different names. However they may differ from CDR/cancer.gov topics, for example when certain board members work on a topic, such as psycho-social aspects of genetics testing and counseling, that bears on multiple specific PDQ Summaries but is considered as a single topic for the purposes of editorial board work.

A single citation can be reviewed for inclusion in more than one summary topic.

The objects here are not the summaries themselves, just the topics, but we do need access to the topics.

2.9.2 Notes

Editorial board linkage
Each summary topic belongs to one and only one board. In those cases where two boards had an interest in the same topic from different perspectives, the topic was split into two summary topics, one assigned to each board.
Because of this, it is not necessary and is probably not desirable to directly associate a citation with an editorial board. Strictly speaking, the citation is associated with a summary topic and, only through the topic, with a board. To store a board association with a citation constitutes a redundant, denormalized, and hence undesirable data representation.
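
The normalized arrangement can be sketched as two mappings, with the board derived rather than stored. The topic names, board names, and citation IDs are hypothetical.

```python
# Each summary topic belongs to exactly one board.
TOPIC_BOARD = {
    "breast_cancer_treatment": "adult_treatment",
    "breast_cancer_genetics": "genetics",
}

# A citation may be linked to several topics; no board is stored with it.
CITATION_TOPICS = {
    9001: {"breast_cancer_treatment", "breast_cancer_genetics"},
    9002: {"breast_cancer_genetics"},
}


def boards_for(citation_id: int) -> set[str]:
    """Derive a citation's boards through its summary topics."""
    return {TOPIC_BOARD[t] for t in CITATION_TOPICS.get(citation_id, set())}
```

If a topic were ever reassigned to another board, only the one topic entry would change and every citation would follow automatically.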
Source
EBMS.
Although summary topics in CiteMS are not one for one with those in the CDR and cancer.gov they should be one for one with those in EBMS.

Both systems should use the same database table.

2.10 PubMed searches

2.10.1 Purpose

Find newly published articles about cancer for review by the boards.

Each search is aimed at a particular summary topic and attempts to retrieve citations that are pertinent to that one topic.

2.10.2 Notes

Searches are very complicated. Sometimes a single search requires multiple pages to print.

Searches change all the time. They are continuously refined.

No automatic search execution
Searches are currently stored outside the CiteMS system. It may be counterproductive to bring them in.
Minaxi executes a search and then examines the results. Then she might import the citations or, alternatively, might refine the search in light of what she sees in the search results and then re-execute it. For this reason, it would appear to be a mistake to store searches in the system and have the system execute them without human direction. That would save time in executing each search but would result in retrieving more dross, less topic-accurate results, and would cost more labor time in the long run.

2.11 Full text PDFs?

2.11.1 Purpose

The current method of retrieving full text for use by reviewers is to fetch PDFs. They may be retrieved from one of the online academic journal vendors using the NIH account or, if no machine readable PDF is available, a request is sent to NLM where the document is scanned from a paper journal and sent as a PDF image back to us.

The person in charge of retrieving the full text saves the PDF file after printing it. Some board managers also save their own copy of the PDF on a shared drive for future reference.

It may be desirable to provide for storing all of the PDFs in our database, associating each one with its citation. That would have a number of advantages, including:

2.11.2 Notes

Ability to use any search capability in CiteMS to find it.
Currently, I don't think users are using any search software for PDFs stored in the file system. They use a conventional name for the file and browse for it.
That could be fixed in the file system using a Microsoft Windows indexing service configured to use an Adobe PDF "iFilter". This might actually be a lot better than anything we could easily provide, though it's possible that we could integrate the Adobe indexer with our database software.

However it's not clear that full text indexing is required or would be used in our application. We are typically not searching for material on a particular topic but rather searching for a particular article by its bibliographic description - something we'll have in the CiteMS.
Ability to click on a link in the citation to see the full text.
Easy access for all people with access to CiteMS.
This could include physician Board Members if desired.
No need for multiple people to save the PDF separately.
Unless they want their own copy for some reason, they can just access it through CiteMS.
Copyright issues
I don't know what copyright implications might exist on a practice like this, and whether they are better or worse for us than what we're doing now.
I'm not personally qualified to research the copyright issues and will not further address them in this document. I will assume that we can store PDFs and access them when we need them. If we can, we will. If we can't, well then, we won't.

2.12 Queues (or states, or workflow)

2.12.1 Purpose

A citation may be in one or more queues awaiting further processing.

The notion of objects moving through states or queues is also known as "workflow".

2.12.2 Notes

Queues share a number of properties.

It might be desirable to have common workflow oriented software to do much of the processing for all queues. It might even be desirable to have one common data structure with multiple views or slices, each representing a different logical queue.

Properties of queues
Some properties of queues include:
- Name of the queue
  "Awaiting assignment to Summary Topic" "Awaiting initial review" "Awaiting full text retrieval" etc.
- Responsible agent
  There may be a board, a department, a committee, or a person who is expected to deal with items in a particular queue, and who should receive both the lists of items in the queue and the alerts generated when an item is not being processed in a timely fashion.
- Time limit on the queue
  How long should items be allowed to remain in the queue before action is required or alerts generated? Maybe this is variable, dependent on the object (e.g., book citations get more time than articles (I'm making that up)).
  Maybe individual items can have their time limits increased on a case by case basis.
- Next steps
  Software needs to know what happens to an item when whatever action is required by the item's being in the queue is complete. Obviously, the next steps are conditional based on what action was taken in the queue.
Properties of individual entries in a queue
Each object in a queue has certain properties attached to it by virtue of its being in the queue.
- Identifier of the object in the queue
  Typically it's a reference to a citation. There may be multiple references to the same citation in different queues or different (or the same) parts of one queue, for example a citation that is being reviewed by two different boards.
- Person that placed the object in the queue
- Agency to which the task has been assigned
  This would often be a group of people working on these tasks. Many other times it's an individual. Sometimes it might be a group which then further assigns it to an individual.
- Date-time of entry into the queue
- Due date-time
  After the due date alerts might start appearing if the task is not complete.
- Date-time completed
- Final disposition
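The queue and entry properties listed above could be represented along these lines. This is only a sketch: the class names, field names, and the `extra_time` parameter (standing in for the case-by-case extension of a time limit) are all assumptions, not decided design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Queue:
    name: str                 # e.g. "Awaiting full text retrieval"
    responsible_agent: str    # board, department, committee, or person
    time_limit: timedelta     # default time allowed before alerts start


@dataclass
class QueueEntry:
    citation_id: int                    # identifier of the object in the queue
    placed_by: str                      # person who placed the object in the queue
    assigned_to: str                    # group or individual handling the task
    entered: datetime
    due: datetime
    completed: Optional[datetime] = None
    disposition: Optional[str] = None


def enqueue(queue, citation_id, placed_by, assigned_to, now, extra_time=timedelta(0)):
    """Create an entry due after the queue's time limit plus any individual extension."""
    due = now + queue.time_limit + extra_time
    return QueueEntry(citation_id, placed_by, assigned_to, now, due)


def is_overdue(entry, now):
    """An entry is overdue if it is still open after its due date-time."""
    return entry.completed is None and now > entry.due
```

A periodic job could call `is_overdue` on every open entry and send alerts to the queue's responsible agent.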
Auditing
All of the queue/workflow related actions that occur should be logged. That enables us to find out the history of our processing of a citation and get throughput reports (How many cites did we … last month, etc.)

2.13 Status information

2.13.1 Purpose

A great deal of what the CiteMS does is tracking the status of citations. Status values tell us what the decisions have been regarding a citation - what we are going to do with it and what has been done with it in the past.

We can think about different kinds of citation status as disparate things, but many attributes and properties of the different kinds of status can be seen more abstractly as common to all of them. We should therefore have a unified concept of what a status value is and of how we represent and process status values in the system.

Status information is intimately bound up with queues and workflows (see above) and with auditing (see below).

2.13.2 Notes

Examples of status values
Particular status values are to be decided upon later by the users. The software should be written in such a way that status values are not embedded in the program code. They are in the database and in control tables that specify workflows based on status. Over time, status values may be added or inactivated (set to not be used again) by the systems administrators.
Here are some made-up examples.
- At import time
  - Awaiting initial review
  - Accepted - ready to publish (or re-publish)
  - Rejected
- At board manager review time
  - Awaiting board manager review
  - Awaiting full text retrieval
  - Retrieval complete, awaiting disposition
  - Assigned to board member(s) for review
  - Rejected
- At board member review time
  - Awaiting board member review
  - Board member response received
  - Scheduled for discussion at board meeting
- Final disposition
  - Included in summary
  - Not included in summary
Status components
A single status item might contain the following components
- Status ID
  The unique identifier of a value in a control table for status values. Changing the human readable representation of a status value (for example from "Retrieved PDF" to "Retrieved full text"), would not require changing thousands of status records, just the string in one control table entry.
- Citation ID
  The unique identifier of a citation.
- Person ID
  ID of the person who assigned this status.
  Perhaps status can also be assigned by a group, a department, or a program.
- Board ID
  If one citation is being reviewed by two or more different boards, it has two or more different status values. It may be awaiting board manager full text review by one board, already assigned for board member review in another board, and rejected by a third board.
- Date-time assigned
  The date-time stamp for when the citation acquired this status.
- Comment
  It should be possible to enter a comment to any status value, for example, "I changed the status from accepted to rejected because Sharon said we shouldn't be citing any articles from this journal."
  Comments can be added to or edited.
  
  If and when a comment is added or edited, the system should record information in the audit trail so that it will be possible to recover older comments or versions of comments that have been changed by an edit.
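To illustrate why a status record should carry a Status ID rather than the status text, here is a minimal sketch. The `STATUS_LABELS` table, the `StatusRecord` class, and the field names are hypothetical stand-ins for a control table and a status table in the database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical control table: status ID -> human-readable label.
STATUS_LABELS = {1: "Awaiting initial review", 2: "Retrieved PDF"}


@dataclass
class StatusRecord:
    status_id: int                 # key into the control table
    citation_id: int
    person_id: int                 # who assigned the status
    board_id: int
    assigned: datetime
    comment: Optional[str] = None


def label(record):
    """Look up the display text at read time instead of storing it in the record."""
    return STATUS_LABELS[record.status_id]
```

Changing `STATUS_LABELS[2]` to "Retrieved full text" changes what every existing record displays without touching any of the records.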
Status history
Every citation has a status history. The citation moves from one status to another. It may go back to a previous status but, if so, it still has a history that shows it once had what would normally have been a later status.
The history has to be displayable, as it is in the existing system. Any functions that allow a user to assign a status must also be able to show the user the complete status history, including the person, date-time, and any comment(s) as well as the status value.
Status currency
Every citation has one and only one current status for one board. Having a particular status may imply certain previous status values. For example, a status like "Awaiting board member review" implies that statuses like "Accepted for inclusion" and "Full text retrieved" are almost certainly in the history for this citation, but there is only ever one current status for a citation on a given board.
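One way to keep history and currency consistent is to store only the history and derive the current status from it. The sketch below assumes that approach; the `Status` tuple and its field names are illustrative only.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical history row; field names are illustrative only.
Status = namedtuple("Status", "citation_id board_id status assigned")


def history(records, citation_id):
    """All status records for a citation, oldest first."""
    mine = (r for r in records if r.citation_id == citation_id)
    return sorted(mine, key=lambda r: r.assigned)


def current_status(records, citation_id):
    """Map each board to the most recently assigned status for the citation."""
    current = {}
    for r in history(records, citation_id):
        current[r.board_id] = r.status   # later records overwrite earlier ones
    return current
```

A citation that goes back to an earlier status simply gets a new history row, so the record that it once held a later status is never lost.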
Question: Is status also summary topic specific?
If a citation is assigned to two different summary topics in the same editorial board, does it track status separately?
I'm thinking that the answer is No.

2.14 Descriptive tagging

2.14.1 Purpose

It is desirable to be able to tag citations with keywords or phrases chosen from a controlled vocabulary and then use those tags in subsequent searching, retrieval, and citation review.

There is no exact analog to this in the existing system. The closest thing to it is the addition of free text notes in a restricted form that a user can search for at a later time. For example, Cynthia might put the text "MXT" somewhere in the note field for a citation with the intent that Minaxi will search for that string and look at all of the citations marked that way. However, the use of free text in a note field is not a reliable system for this. It is a workaround for a capability that is not present in the existing system but could be added to the new one.

Some uses for tags include:

Mark citations that need special handling
Call specific citations to the attention of other users
Add comments and notes to citations
Mark and recall citations that have been set aside for some reason

2.14.2 Notes

Tags vs. status
A tag is not the same thing as a status. Every citation always has a particular status but tags are entirely optional.
Status values determine what will happen to a citation, who will see it, who has responsibility for it, etc. Tags do not determine anything about the processing of the citation. Tags are for the use of human users, not software.
Vocabulary control
Tags are most useful if they come from a controlled vocabulary. It would defeat the purpose if people used different character strings to represent what is really a single concept. So some kind of vocabulary control is desirable.
The vocabulary should be under the control of users, perhaps only the system administrators - who could add or edit new tags. Perhaps the addition of new tags should be something that all of the board managers discuss before they are added.

For the present, let's assume that there is a single pool of tags available to all board managers and users. Creating user or board specific tags is a complication that shouldn't be indulged in unless and until experience shows a definite requirement for it.
Information associated with a tag
When a citation is tagged, the following information is stored:
- Unique id
  Enables people and programs to refer to this specific tag record.
- Citation id
  Citation tagged.
- Tag id - what tag is this
  Identifies a row in a table containing all of the allowable tags, with their names.
- User id of person adding the tag to the citation
- Datetime of tagging of this citation
- Optional comment
  Should be editable later on. We should probably allow anyone who can edit tags to edit it, even if someone else created it. This allows another person to comment on a comment or answer a question.
Multiple occurrences
Unlike status values, we would allow a citation to accumulate any number of tags.
Deleting tags
It must be possible to remove a tag from a citation.
The audit trail will retain all of the information about a tag and the information can be retrieved for exceptional purposes (for example for debugging). However for ordinary purposes, a deleted tag will not appear in a display of information about a citation.

We should probably allow anyone who has authority to assign tags to records to also delete them - whether that person created the tag or not. This relies on collegiality among users to keep order.
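The tagging rules in this section (controlled vocabulary, any number of tags per citation, deleted tags hidden from ordinary display but not lost) might be sketched as follows. The names are hypothetical, and the `deleted` flag is an assumption standing in for whatever the audit trail actually retains.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count
from typing import Optional

ALLOWED_TAGS = {1: "Needs special handling", 2: "Set aside"}  # hypothetical vocabulary
_next_id = count(1)


@dataclass
class CitationTag:
    id: int                        # unique id of this tag record
    citation_id: int
    tag_id: int                    # row in the table of allowable tags
    user_id: int
    tagged: datetime
    comment: Optional[str] = None
    deleted: bool = False


def add_tag(tags, citation_id, tag_id, user_id, now, comment=None):
    """Attach a tag, refusing anything outside the controlled vocabulary."""
    if tag_id not in ALLOWED_TAGS:
        raise ValueError(f"tag {tag_id} is not in the controlled vocabulary")
    record = CitationTag(next(_next_id), citation_id, tag_id, user_id, now, comment)
    tags.append(record)
    return record


def delete_tag(tags, record_id):
    """Mark the tag record deleted instead of erasing it, so its history survives."""
    for t in tags:
        if t.id == record_id:
            t.deleted = True


def visible_tags(tags, citation_id):
    """Tag names shown in an ordinary display of the citation."""
    return [ALLOWED_TAGS[t.tag_id] for t in tags
            if t.citation_id == citation_id and not t.deleted]
```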

2.15 Messages and communications

2.15.1 Purpose

It is sometimes necessary for one user to send a message to another, or for a program to notify a user of something, especially about a citation. The system should support this.

2.15.2 Notes

Support it with email
There are different ways to support messaging. The simplest way is to send email from the system.
The email should contain a From field, e.g., "CiteMS@…" that enables users to readily distinguish CiteMS messages and, if desired, use an automatic filter to put all CiteMS messages into a single folder.

Email clients already support highlighting of unread messages, sorting, archiving, multiple folders, hierarchical folders, visual or audible alerts for new mail, sending replies, embedding links, using HTML messaging, invoking web browsers and other applications, deleting, flagging, etc. To implement all of that in new software would be a huge task.

I therefore suggest that we plan to support messaging through email only, and do something different only if experience proves the need.

See "Messaging and communications" under "Functionality in the New CiteMS".
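As a rough illustration of the email approach, the sketch below builds a message with Python's standard `email.message.EmailMessage`. The function name, the "[CiteMS]" subject prefix, and the sender address passed in by the caller are assumptions; the real From address is not decided here.

```python
from email.message import EmailMessage


def build_notification(sender, recipient, citation_id, text):
    """Build a CiteMS notification that mail filters can recognize by sender or subject."""
    msg = EmailMessage()
    msg["From"] = sender                                # a fixed CiteMS address, to be chosen
    msg["To"] = recipient
    msg["Subject"] = f"[CiteMS] Citation {citation_id}"  # hypothetical subject convention
    msg.set_content(text)
    return msg
```

The resulting message could be handed to `smtplib.SMTP(...).send_message(msg)`; the mail host is a deployment detail not covered here.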

2.16 Audit trail information

2.16.1 Purpose

Every action in the system should be recorded. The record can be used for reports and can be invaluable for diagnosing problems. Seeing what happened in what order is essential when something goes wrong.

Sometimes an action will be taken and then rescinded. A status might be assigned to a citation and then taken away again. The status effectively disappears. But the audit trail should show that the status was assigned, by whom, and when, and should show the same information for the de-assignment. Information that had an ephemeral life in the system would leave a permanent record in the audit trail.

If disk space becomes a problem (it probably won't), we can periodically archive the audit trail in the same way that we archive document versions in the CDR.

2.16.2 Notes

Centralized auditing
There should be a single internal auditing function in the system in order to guarantee consistent information.
Serialization
Events should be recorded in such a way that, as nearly as possible, we see the exact order of operations.
Contents
It is useful to have certain common information recorded identically in every log entry:
- Common information
  - Start and stop times (or at least one of them - the same one each time)
  - Person or program id
  - Citation id, if applicable
  - Editorial board id, if applicable
  - Action id
- Transaction specific content
  The content will vary depending on what kind of action is logged.
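A single auditing function that every part of the system calls is one way to get consistent, ordered entries. The sketch below assumes an in-memory list in place of a database table and a simple counter for serialization; the function and field names are hypothetical.

```python
import itertools
import json
from datetime import datetime, timezone

_sequence = itertools.count(1)   # gives every entry a strict order
AUDIT_LOG = []                   # stand-in for a database table


def audit(action_id, actor_id, citation_id=None, board_id=None, **details):
    """Record one action with the common fields plus transaction-specific details."""
    entry = {
        "seq": next(_sequence),
        "time": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,          # person or program id
        "citation_id": citation_id,    # None when not applicable
        "board_id": board_id,          # None when not applicable
        "action_id": action_id,
        "details": json.dumps(details, sort_keys=True),
    }
    AUDIT_LOG.append(entry)
    return entry
```

Assigning a status and then removing it would produce two entries, so the ephemeral status leaves a permanent trace.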

3 Functionality in the New CiteMS

3.1 User authentication

3.1.1 Purpose

Determine who is able to use the system and what they are allowed to do.

3.1.2 Functions

Authentication requirements include:

Control access to the system at all
Control access to system functions
Users with different roles need access to different functions.
Consider editorial board and summary topic in granting access to functions
Sometimes access to a function is also dependent upon the user being associated with a specific editorial board or summary topic. See the section above on Data / Roles and Permissions.
Control access to user specific information
Only a user or a systems administrator should be able to change a user record or profile.
Provide single sign-on with EBMS and possibly other systems
Many users will need access to both EBMS and CiteMS. It might be desirable if, when a user logs into one, he is automatically logged into the other.
Or maybe not. Maybe this is not a big deal.
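The function-level and board-level access checks described in this section could look something like the following. The role names, function names, and the shape of the user record are all made up for illustration.

```python
# Hypothetical role table: role -> functions the role may use.
ROLE_FUNCTIONS = {
    "importer": {"import_citations", "initial_review"},
    "board_manager": {"request_full_text", "assign_reviewers"},
}
# Functions that additionally require membership in the relevant editorial board.
BOARD_SCOPED = {"request_full_text", "assign_reviewers"}


def may_use(user, function, board_id=None):
    """Return True if any of the user's roles grants the function for this board."""
    allowed = any(function in ROLE_FUNCTIONS.get(role, set()) for role in user["roles"])
    if not allowed:
        return False
    if function in BOARD_SCOPED:
        return board_id in user["boards"]
    return True
```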

3.2 Import citations

3.2.1 Purpose

Load citations from PubMed into our system making some initial dispositions for each citation.

We do not now and may not ever import citations from any other source. However, it intuitively seems like a bad idea to design the system in such a way that it is impossible to add citations from another source.

3.2.2 Functions

Search for new citations (outside CiteMS)
The first step in importing citations is to search PubMed for new citations relevant to the mission of the editorial boards. This is done outside of and prior to any activity in the CiteMS itself. This process should continue to work as before, outside the system.
Import citations from PubMed search results file
We need to decide whether the results file should be in XML format or the older Medline print format.
- XML format
  - It's easier for the machine to parse.
  - It supports Unicode for non-English characters, if we want them.
- Medline print format
  - It's easier for a human to read and review.
  - It has been updated to include about the same info as the XML.
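To give a feel for the XML option, here is a sketch that pulls a few fields out of a results file with the standard library parser. The element names reflect my understanding of the PubMed XML layout and should be checked against the current NLM documentation; the sample record is made up and heavily trimmed.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed sample record for illustration only.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <Journal><ISSN>0000-0000</ISSN><Title>Example Journal</Title></Journal>
        <ArticleTitle>An example title</ArticleTitle>
      </Article>
      <MedlineJournalInfo><NlmUniqueID>0000001</NlmUniqueID></MedlineJournalInfo>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_citations(xml_text):
    """Yield one dict per article with the fields the import step needs."""
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        cit = article.find("MedlineCitation")
        yield {
            "pmid": cit.findtext("PMID"),
            "title": cit.findtext("Article/ArticleTitle"),
            "issn": cit.findtext("Article/Journal/ISSN"),
            "nlm_journal_id": cit.findtext("MedlineJournalInfo/NlmUniqueID"),
        }
```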
Select a review cycle and assign all citations in an import to it
The import user selects a review cycle for an import. This is done for the entire batch at once. But see also notes on Review Cycles under Data above.
Select an editorial board for the import
A user selects an editorial board for an import. Selecting a board enables the program to determine what summary topics should be shown in the Summary Topic drop down list for the import.
Select a Summary Topic and assign all citations in an import to it
Most searches are conducted for specific summary topics and all citations for one search are imported and automatically associated with that topic. They may then be individually re-assigned to other topics, or other topics may be added to them.
Optionally assign one or more tags to each citation in an import
Each batch of citations is imported for a specific topic. Optionally, the entire batch can also be assigned one or more "tags". Each tag would apply to each citation in the batch and could be used to find citations with that tag.
Tags can also be individually assigned to citations. See elsewhere in this document for a discussion of tags.
Assignment of citations to editorial boards is automatic
Assignment of a citation to a Summary Topic automatically implies that it is assigned to the particular editorial board that reviews that topic. No separate action is or should be required to assign a citation to a board.
Assigning a citation to two or more Summary Topics automatically implies that the citation is being reviewed by each board that controls those topics. There may be one board controlling all of the topics assigned or there may be multiple boards.
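Because the board follows from the topic, the set of boards reviewing a citation can be computed rather than stored. A tiny sketch, with a hypothetical topic-to-board table:

```python
# Hypothetical mapping of summary topic -> controlling editorial board.
TOPIC_BOARD = {"Topic A": "Board 1", "Topic B": "Board 1", "Topic C": "Board 2"}


def boards_for(topics):
    """Boards implied by a citation's summary topics (no separate assignment needed)."""
    return {TOPIC_BOARD[t] for t in topics}
```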
De-duplicate imports
Different PubMed searches will often retrieve overlapping sets of citations. The import program has to check each citation, currently by PubMed ID (but conceivably by other external ID if we ever support import from other sources) and not import citations that are already in the database.
Duplicate citations are never created in the database. There must always be only one copy of a particular citation. However the handling of status values, queues, and reviews is different for different cases.

There are three different ways to process duplicates, depending on the summary topic and review cycles of the original and duplicate imports.
- Same summary topic, same or different review cycle
  A special search retrieves a citation for a topic. The citation has already been imported for this topic in a previous review cycle, or perhaps the search was re-executed in the same review cycle but with changes in the search criteria and many of the same citations are retrieved. Note: whether it was also imported for another topic in the past doesn't matter. Then processing is unchanged.
  - Add it to the list of "Duplicates not imported" (see below)
- Different summary topic, same review cycle
  The same citation appears in two searches for two different topics. This happens in the same review cycle.
  - Assign another summary topic to it
  - Add it to the list of "Duplicate, summary topic added"
    
    Note that the citation is already in the new review cycle for both summary topics
- Different summary topic, different review cycle
  A citation appears that had appeared for a different summary topic in an earlier review cycle.
  - Assign another summary topic to it
  - Assign another review cycle to it
  - Add it to the list of "Duplicate, summary topic and review cycle added"
    
    Assigning another review cycle to it ensures that the citation will come to the attention of the import reviewer and, if it passes initial review, the board manager for the new topic.
    
    This requires that the database support a many to one relationship of review cycle to citation.
    
    Note: This requires some discussion. The existing CiteMS system did not assign another review cycle for a citation that has already been reviewed by the same board for a different topic associated with that board.
    
    Perhaps this should be a board specific board manager decision?
    
    However, whether or not we assign a new review cycle to a citation, the status display for the citation should indicate that it has been reviewed before, when, and for what topics.
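The three duplicate cases, plus the ordinary new-citation case, reduce to a short decision procedure. The sketch below assumes citations are keyed by PubMed ID and that each one carries a set of topics and a set of review cycles; the in-memory `db` dictionary and the function name are illustrative, and the open question about board-specific handling of review cycles is not modeled.

```python
NEW = "Imported as new"
DUPLICATE = "Duplicates not imported"
TOPIC_ADDED = "Duplicate, summary topic added"
TOPIC_AND_CYCLE_ADDED = "Duplicate, summary topic and review cycle added"


def import_citation(db, pmid, topic, cycle):
    """db maps PubMed ID -> {"topics": set, "cycles": set}; returns the disposition."""
    existing = db.get(pmid)
    if existing is None:
        db[pmid] = {"topics": {topic}, "cycles": {cycle}}
        return NEW
    if topic in existing["topics"]:
        # Same topic, same or different review cycle: processing is unchanged.
        return DUPLICATE
    existing["topics"].add(topic)
    if cycle in existing["cycles"]:
        # Different topic, same review cycle.
        return TOPIC_ADDED
    # Different topic, different review cycle.
    existing["cycles"].add(cycle)
    return TOPIC_AND_CYCLE_ADDED
```

Note that only one record per PubMed ID ever exists in `db`, matching the rule that duplicate citations are never created.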
Replace pre-Medline citations

Searches conducted at different times may produce "pre-Medline" citations that have not yet been indexed or reviewed for quality control. These records will be replaced in PubMed by full Medline records. The system needs to import these when they are available and replace the provisional records.

Currently, this happens when two versions of a record appear in two different searches, or when a user (Minaxi) runs a report to identify pre-Medline records and download and import updates. However, if possible, we should automatically download the full records when they become available without requiring searches or human intervention. This could be done in a periodic batch process that connects directly to PubMed and retrieves the new versions. This might be a process that periodically updates records in general, whether or not the update is due to replacement of a pre-Medline record.

In a few cases, pre-Medline citations are withdrawn from PubMed, for example if it is discovered that they are out of scope for the PubMed database. Any citation withdrawn from PubMed should be so marked in our database.
Managing special imports
Most citations are imported as a result of searches for new citations in PubMed, but there are other ways that citations can be located for import.
Sometimes retrospective searches are done. For example, a new search or an important refinement of a search might be executed asking for documents from the last X years instead of just the last month.

Sometimes individual citations are imported, for example from a list supplied by a board manager or board member. In some cases these might be articles published before the inception of the old CiteMS that were never imported but are important in the history of cancer research.

There are some special problems posed by special imports.

Question: We need to list these and be sure we are properly handling them.
Import journal titles from PubMed
This is not part of the current CiteMS but should be. Thousands of citations were loaded into the system with no journal title at all.
When a PubMed citation is imported the citation contains an ISSN and an NLM journal title unique ID. If the title is not in our journal title table, then the system should automatically fetch and add the journal title to our journal title table, linking the citation to the title.

There should probably also be batch processes that can be run from time to time, even if only once or a few times a year, to update the journal titles, replacing modified records by updated versions that match the journal title IDs of existing records.
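The fetch-on-first-sight behavior for journal titles might be sketched like this. `fetch_title` is a hypothetical callable that would retrieve the title for an NLM journal ID; how it talks to PubMed is left open.

```python
def link_journal(journals, citation, fetch_title):
    """Link a citation to its journal, fetching the title only if we do not have it yet."""
    journal_id = citation["nlm_journal_id"]
    if journal_id not in journals:
        journals[journal_id] = fetch_title(journal_id)   # one fetch per unknown journal
    citation["journal_id"] = journal_id
    return journals[journal_id]
```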
Logging and reports
When search results are imported the system should both log and display the information about the import batch.
- Description of the import
  Records the date, time, review cycle, summary topic, and user performing the import.
- Disposition of the citations
  For each citation in the import batch, we keep information linking it to the particular batch record, and the disposition of the citation in that import event.
  There are four different things that can happen to an imported citation. We record the disposition and use the information to generate a report showing the count of citations in each category of disposition and allow a user to see the actual citations.
  
  The categories are:
  - Imported as new
  - Rejected as duplicates
  - New summary topic added to an existing cite
  - New summary topic and new review cycle added to an existing cite
- Searching for information about an import
  A user should be able to search for information about an import, specifying any of the following search criteria to retrieve counts and, optionally, the citations.
  - Review cycle
  - Date
  - Summary topic
  - User
  - Editorial board
  - Disposition
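Given one logged row per citation per import, the disposition report is a filter followed by a count. A sketch, assuming each row is a dictionary whose keys match the search criteria (the key names are hypothetical):

```python
from collections import Counter


def disposition_counts(rows, **criteria):
    """Return (counts per disposition, matching rows) for rows meeting all criteria."""
    matching = [r for r in rows
                if all(r.get(key) == value for key, value in criteria.items())]
    return Counter(r["disposition"] for r in matching), matching
```

For example, `disposition_counts(rows, topic="T1")` would give the counts for one summary topic along with the rows behind them.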
Error handling
Import errors can occur if, for example, one of the citations in a batch is malformed or contains invalid data. The system should handle such errors gracefully. Specifically, it should:
- Validate input records
  If anything is wrong with an input citation, i.e., it is malformed or contains detectably invalid data, an error should be declared.
- Provide useful descriptive information about any validation errors
  At a minimum, this should include identifiers of the last citation successfully processed and a description of the error in the first erroneous citation after it.
- Treat an import batch as a DBMS transaction
  If any error occurs, the entire import batch should be halted and all data backed out. A user can then fix the problem and re-run the import.
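Treating the batch as one transaction can be shown with SQLite from the Python standard library, used here only because it runs anywhere; the production DBMS, the `citation` table, and its columns are assumptions. The connection's context manager commits if the block finishes and rolls back if an exception escapes.

```python
import sqlite3


def import_batch(conn, records):
    """Insert every record or none. Raises ValueError describing the first bad record."""
    last_ok = None
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for rec in records:
                pmid = rec.get("pmid")
                if not pmid or not str(pmid).isdigit():
                    raise ValueError(
                        f"invalid PubMed ID {pmid!r}; last good record: {last_ok}")
                conn.execute("INSERT INTO citation (pmid, title) VALUES (?, ?)",
                             (int(pmid), rec.get("title")))
                last_ok = pmid
    except sqlite3.IntegrityError as exc:
        raise ValueError(f"database rejected a record after {last_ok}: {exc}") from exc
    return last_ok
```

After a failed batch the table is exactly as it was before the import started, so the user can fix the input file and re-run.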

3.3 Initial (pre-"publishing") review

3.3.1 Purpose

In the current system a user executes searches of PubMed and imports citations. Then each citation is reviewed for scientific value and relevance. Citations can be accepted or rejected during this review.

Rejected citations remain in the citation database but are not further processed. They do not appear in a board manager's queue of citations to review. They will not be seen again unless they are specifically called up later.

Accepted citations continue on to be seen by board managers and, perhaps, by board members.

This is considered to be part of the import process. Only when all citations have been reviewed are the results of the review "published". At that time the system forwards all accepted citations into the queues of board managers and the system turns to the next import cycle.

3.3.2 Functions

Search for citations not yet marked as accepted or rejected
The system presents a list of such citations to the reviewer who can mark them one way or the other, tag them, or leave them in the queue for later review.
The method of finding citations for initial review has to be flexible with regard to the citations that are presented to the user and the order in which they are presented. When the work load is high, the reviewer might decide to look in the most likely places for important articles first. For example, she might search for articles from the most prestigious journals, or articles reporting the results of randomized, double blind, placebo controlled trials, or articles on particularly important topics of the day.

[Inspiration: It would be possible to write SQL that would find the journals that have produced the largest number of citations that made it all the way through our review process, and create a search strategy that would find all citations in the current review cycle that are from those journals. One of those things to do when we've done all the critical functionality.]

Or alternatively, the user might just plow through the queue of citations in date or internal ID order.

All the standard citation search facilities should be available, just as they are in other parts of the system. The only difference is that, for initial review purposes, the search is restricted to citations that have not yet been through initial review. When searching for any other purposes, these records are normally excluded.
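The bracketed idea above about ranking journals by past success could be prototyped as follows. The schema (a single `citation` table with `journal_id`, `review_cycle`, and `final_disposition` columns) is invented for the sketch, and SQLite is used only so that it is runnable; a real version would also restrict the second query to citations not yet through initial review.

```python
import sqlite3

# Journals ranked by how many of their citations ended up included in a summary.
TOP_JOURNALS_SQL = """
SELECT journal_id, COUNT(*) AS included
  FROM citation
 WHERE final_disposition = 'Included in summary'
 GROUP BY journal_id
 ORDER BY included DESC, journal_id
 LIMIT ?
"""


def citations_from_top_journals(conn, cycle, limit=10):
    """IDs of citations in the given review cycle that come from the top journals."""
    top = [row[0] for row in conn.execute(TOP_JOURNALS_SQL, (limit,))]
    if not top:
        return []
    marks = ",".join("?" * len(top))
    sql = (f"SELECT id FROM citation WHERE review_cycle = ? "
           f"AND journal_id IN ({marks}) ORDER BY id")
    return [row[0] for row in conn.execute(sql, (cycle, *top))]
```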
Set the summary topic for a citation
By default, all citations in one import batch are assigned the same summary topic at the time they are imported. During initial review, this summary topic assignment can be modified. The modification can either add to or delete summary topics for the citation.
It is desirable for a user to be able to specifically search for all citations that have more than one summary topic assigned.
Set the initial review status of a citation
The default status is "Not yet reviewed".
During the initial review process, the status might change to any of:
- "Reviewed, awaiting final decision"
- "Reviewed, ready to publish"
- "Published"
- "Re-published"
  The "Re-published" category did not exist in the old system. It works the same as "Published" but highlights the fact that this citation has been through the system before.
"Publishing"
The existing system retains all citations in a state where they are only visible to the import reviewer. When all of the citations have been reviewed, the reviewer "publishes" them, releasing the accepted citations for board manager review and sending an email to the board managers informing them that the citations in the most recent review cycle are now available.
Normally, all citations for one editorial board would be published together.

"Publishing" batches in this way achieves a number of goals. It allows a reviewer to provisionally accept or reject an article and then change her mind based on what she sees in subsequent citations. It allows the board manager to see at once how many citations were accepted in this review cycle, look at all of them together, and efficiently organize her work with the citations.

However, there are times when it is desirable to publish a particular article or collection of articles to a board manager, even though we are not yet ready to publish everything for the corresponding editorial board. It should therefore be possible to "publish" individual citations before or after the entire review cycle is published for one board.
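Batch and individual publishing can share one routine if the caller may optionally name the citations to release. This is a sketch only: the dictionary keys, the `seen_before` flag, and the use of literal status strings (which the real system would read from control tables) are simplifying assumptions.

```python
def publish(citations, board, only_ids=None):
    """Publish ready citations for a board; restrict to only_ids when it is given."""
    published = []
    for c in citations:
        if c["board"] != board or c["status"] != "Reviewed, ready to publish":
            continue
        if only_ids is not None and c["id"] not in only_ids:
            continue
        # Citations that have been through the system before are flagged as such.
        c["status"] = "Re-published" if c.get("seen_before") else "Published"
        published.append(c["id"])
    return published
```

Calling `publish(citations, board, only_ids={...})` releases selected citations early; a later call without `only_ids` publishes the rest of the board's batch.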

3.4 Initial CIPS review for full text retrieval

3.4.1 Purpose

A CIPS reviewer (a role that includes, but is not restricted to, board managers) looks at each imported citation and determines whether it is desirable to see the full text or not. Seeing the full text doesn't guarantee that the article will be sent on for review by board members, but it is a necessary prerequisite for such a review.

Note: The specific step of requesting full text for an article and recording a status for a citation of awaiting full text retrieval can be skipped for any article. The reason is that CIPS reviewers have immediate access to the full text of most articles through PubMed. They don't necessarily request full text as a step in the system, they just go look at the full text and move straight on to either rejecting the article or assigning it for scientific review.

If we save full text PDF files in the system, a CIPS reviewer who fetches full text on their own should be able to save the PDF in the same place and manner used by the person who specializes in retrieving full text.

3.4.2 Functions

View "published" citations which have not yet been further reviewed
The system presents to the board manager a list of citations that have passed the import review (i.e., "published" citations) but have not yet been further reviewed.
It is desirable to allow flexible sorting of these citations, for example: order by review cycle, summary topic, author, journal title, or possibly other criteria.

It is desirable to allow the board manager to select the number of citations to display on one HTML "page", and whether to include abstracts. See search and display functions.
Mark articles regarding full text retrieval
For each citation, the system provides form controls to mark it as requiring full text retrieval or not requiring it. The user may take a number of different actions:
- Reject the article
  The citation will remain in the CiteMS database but has reached the end of its work flow processing, at least for the current board and/or topic. It will not appear on any more task lists for this board and/or topic unless a user specifically calls it up for re-review.
- Mark for full text retrieval
  Enqueues a notification for this citation to the person that retrieves full text. The citation will continue in the work flow and will not appear again in the list of citations awaiting full text retrieval determination for this board.
  See "Retrieving full text" for more.
- Leave citation unmarked
  If no determination is made, the citation remains in the queue for full text retrieval determination. The next time the board manager asks to see citations awaiting such determination, this document will appear.
  This is the "I don't want to deal with this one right now" (non) action.
- All / None / Reset controls
  The current system allows a user to select all citations in the list at once for marking or to reset all checkboxes back to default values.

3.5 Retrieving full text for internal review

3.5.1 Purpose

The full text of articles that may have implications for the work of the PDQ boards must be read. Full text is retrieved for those who want to see print, and distributed to CIPS reviewers to read and review.

In the past, the retrieval of full text required xerox copying of printed journal articles. Today, most of it is done by locating PDF format versions and printing or distributing them. For those articles for which a PDF format is not available, a request is sent to the National Library of Medicine where the paper form article will be scanned to a PDF image and electronically transmitted to CIPS.

Currently a single person is able to do all of the full text retrieval. Since board managers are now sometimes getting full text PDFs from PubMed themselves, and since the number of journals for which PDFs are available is increasing, the amounts of centralized retrieval and scanning requests of NLM are both likely to decline.

See also the "Data to maintain" section above on "Full text PDFs" and the Functionality section on "Initial CIPS review for full text retrieval".

3.5.2 Functions

View the queue of citations awaiting full text retrieval
It may be desirable for a user to be able to sort these, for example by journal or perhaps by source (Science Direct, Springer, Wiley, etc.) if we have that information for many of the journals.
Retrieve the full text
Search for it in PubMed. If a PDF is available by a link out to one of the journal access organizations, retrieve the PDF. If not, send a request to NLM for the article.
When the PDF is retrieved either directly or by email from NLM, store it in a place where it can be retrieved again.
Print full text
For CIPS reviewers wishing printouts, print the full text.
For each printout, also print a "cover sheet". This is a piece of paper to be attached to the full text printout with information associating the text with a citation in the CiteMS.

A "cover sheet" is not the same as a "cover letter", which is a form sent out to board members who receive the full text and review it.

For those reviewers who wish to see electronic versions of the articles and have not already retrieved the electronic copy themselves, a link should be sent that links to the stored PDF.
Mark a citation as having full text obtained
Done by a staff member when the full text is available. The text is then physically routed to the board manager who requested it.

3.6 Select articles for scientific review

3.6.1 Purpose

After the full text of an article is retrieved, a board manager must decide whether the article should or should not be sent out for board member scientific review.

3.6.2 Functions

View citations that have not yet been marked for review
A CIPS reviewer should see the same information she sees when making the initial decision about whether to retrieve full text for a citation, and also see controls for entering the review decision and any desired comments.
There may be a link to the full text. See "Retrieving full text" above.
Mark an article for scientific review
If the user clicks a control to request scientific review of a citation for a summary topic, the system will display the list of board members who are registered as reviewers for that topic.
If the board manager is satisfied with that list (the most common case), no more need be done. When the board manager submits her decision to mark an article, or a list of articles, for review, the system will make the assignments.

A control should be available in place of the ordinary submit button to enable the board manager to optionally edit the list of board member assignments for an article. See below under "Optionally alter reviewer assignments" for what happens if that control is clicked.
Mark an article as not needing review
As in the initial review before requesting full text retrieval, marking the article at this point ends the life cycle of the citation. It remains in the database but requires no further processing unless and until a user initiates a new review of the article.
Leave citation unmarked
As in previous stages, leaving an article unmarked leaves it in its current state. It will appear again the next time a list of articles is generated to determine whether scientific review is warranted.

3.7 Optionally alter reviewer assignments

3.7.1 Purpose

The system knows what board member reviewers are associated with each summary topic on each editorial board. It automatically assigns reviews to all of those board members for any citation associated with that topic. However at least some board managers make (or would like to make) individual decisions, for at least some articles, about who should review the article.

The functions described below are intended to provide system support for an assignment process that incorporates individual board manager decisions.

3.7.2 Functions

Assign reviewers
The system should show two blocks of reviewers listed on the screen.
Each reviewer should have controls listed to select one of: "Assigned", "Not-assigned", "FYI"
- Reviewer blocks
  There should be one block each for:
  - Board members registered for the summary topic
    These are the board members who would be automatically assigned to review the cited article for this summary. For each of them, the "Assigned" control should be pre-selected by the system.
    To de-assign one of them, the board manager selects either "Not-assigned" or "FYI".
  - Board members not registered for the summary topic
    These are other members of the same board who are not registered for the summary topic. If the board manager wants one of them to receive a copy of the article anyway, she selects either "Assigned" or "FYI".
- Controls
  The three controls behave as follows:
  - Assigned
    The full text is sent to this board member with a cover sheet for responses. See notes elsewhere on cover letters and sheets.
  - Not-assigned
    The board member is not sent a copy of the full text of the article and is not expected to review it.
  - FYI
    The board member is sent a copy of the full text of the article but receives a separate cover saying that it is for his or her information only. No response is expected.
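The defaults and overrides described above might be sketched as follows. This is only an illustration in Python, not a design. The names Choice, default_choices and recipients are hypothetical, and the default of "Not-assigned" for board members who are not registered for the topic is my assumption, since only the "Assigned" pre-selection is stated above.

```python
from enum import Enum


class Choice(Enum):
    ASSIGNED = "Assigned"
    NOT_ASSIGNED = "Not-assigned"
    FYI = "FYI"


def default_choices(board_members, topic_reviewers):
    """Return the pre-selected control for every member of the board.

    Members registered for the summary topic default to ASSIGNED.
    All other board members default to NOT_ASSIGNED (an assumption).
    """
    return {
        member: Choice.ASSIGNED if member in topic_reviewers else Choice.NOT_ASSIGNED
        for member in board_members
    }


def recipients(choices):
    """Members who are sent the full text, with the kind of cover each gets."""
    return {m: c for m, c in choices.items() if c is not Choice.NOT_ASSIGNED}


# Hypothetical board: Adams and Baker are registered for the topic.
choices = default_choices(["Adams", "Baker", "Clark"], {"Adams", "Baker"})
choices["Baker"] = Choice.NOT_ASSIGNED   # board manager de-assigns Baker
choices["Clark"] = Choice.FYI            # and sends Clark an FYI copy
```

After the two overrides, recipients(choices) contains Adams as "Assigned" and Clark as "FYI", and Baker receives nothing.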
Optional preparation of a customized cover sheet?
The existing system produces standardized cover sheets that go out with printed articles. Some board managers would like to customize these with specific messages for individual reviewers or for all reviewers of a specific article.
If a board manager sends a customized cover letter to one or more reviewers, the system might prepare a draft cover letter with standard information (see Route … below) to accompany the mailing or emailing.

The user might then type in additional individual notes to include in the letter.

Alternatively, the system might output a file that can be read by Microsoft Word, or whatever, for the board manager to completely customize as desired.

Whether or not this function exists, or where it should appear, is an issue that overlaps EBMS functionality and might be better discussed there.
A note on timing issues
It is possible that a board manager could accept default assignments for reviewers for an article, then later change her mind and want to make customized assignments.
This is probably not a problem when the board manager adds another assignment or FYI, but is a problem if she wants to delete an assignment. We can record the withdrawal of the assignment but, if the article has already gone out, it would be up to the board manager to contact the reviewer outside the CiteMS and tell the board member not to bother to send in a response for that article.

3.8 Print and route materials to reviewers

3.8.1 Purpose

Assist in the manual process of preparing print materials for mailing to board member reviewers.

The discussion below pertains to printed materials only. Distribution of materials electronically is currently done for one editorial board using the commercial Dropbox service, handled directly by the board manager herself. In the future, electronic distributions may come under the purview of the Electronic Board Member System.

This may be obsoleted by electronic distribution via the EBMS.

3.8.2 Functions

Communicate information to the print manager staff
A single article will often be sent to more than one recipient, and one recipient will typically receive more than one article. It is therefore necessary for the system to prepare package information for the support person that specifies for each article how many copies to print, and who will receive each one.
- For each article to be printed the system should say:
  - What article to print
  - Where the PDF is stored (filename)
  - How many copies to print
  - Who the recipients are
- For each person to receive printouts the system should say:
  - How many articles should be sent to this person
  - What the articles are
  - The recipient's name and mailing address
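The two views of the package information listed above can both be derived from one list of assignments. What follows is a rough Python sketch under assumed data. The tuple layout, the function names by_article and by_recipient, and the sample values are all hypothetical, and the mailing address lookup is left out.

```python
from collections import defaultdict

# Hypothetical assignment rows: (citation id, PDF filename, recipient).
assignments = [
    (101, "101.pdf", "Adams"),
    (101, "101.pdf", "Baker"),
    (102, "102.pdf", "Adams"),
]


def by_article(rows):
    """For each article: where the PDF is, who receives it, how many copies."""
    out = {}
    for cite_id, pdf, person in rows:
        entry = out.setdefault(cite_id, {"pdf": pdf, "recipients": []})
        entry["recipients"].append(person)
    for entry in out.values():
        entry["copies"] = len(entry["recipients"])
    return out


def by_recipient(rows):
    """For each recipient: the list of articles to put in the package."""
    out = defaultdict(list)
    for cite_id, _pdf, person in rows:
        out[person].append(cite_id)
    return dict(out)
```

The number of articles for a recipient is just the length of that recipient's list.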
Produce cover letters and response sheets specific to each package
Cover letters and response cover sheets might be printed together
What follows describes support for a manual, paper oriented process. This may be obsoleted by the new EBMS.

We might have a system for customizing cover letters and/or response sheets (see the previous discussion of this). If customized text is available for a cover letter or sheet, it should be printed.
- The name and address of the recipient
- Boilerplate instructions
- Descriptions of the content
  - For each article
    - Citation
    - Summary topic
    - Review cycle
    - Special instructions?
    - Form for mailing back a response
    - Optional customized text for the article
- Optional customized text for the whole package

3.9 Enter board member responses

3.9.1 Purpose

Board members do not directly access the existing system and will not access the new system. If responses continue to be received in paper format, functions will be needed for CIPS staff to enter and edit those responses.

If responses are entered electronically via the web, this process will be handled entirely in the EBMS.

3.9.2 Functions

If we continue to support paper responses then we need a function to:

Enter responses received from reviewers
For review responses received by mail, email, fax, etc., a CIPS staff member needs a function to enter the response into the system.
The system needs to record the ID of the reviewer who sent in a response and the responses themselves, including comments. It should also record the date and the ID of the CIPS staff member who entered the data.

3.10 Task/workflow management

3.10.1 Purpose

Every citation in the system has a current status.

In some cases the status is not editorial board specific. Citations that have been rejected don't belong to any particular board.

Citations that have not, or not yet, been rejected, are assigned to one or more summary topics. Assignment to a summary topic also implies assignment to the editorial board that manages that topic.

Citations that have a particular status with respect to a board and summary topic are ready for some particular action. For example, a citation that has been reviewed and accepted by the import reviewer is ready for publication. After it is published, it is ready for review by a CIPS reviewer to determine whether full text is needed. After that it may be awaiting full text retrieval, or awaiting full text review, and so on.

A "workflow" describes the flow of citations through the various status values. A citation in any particular status can be routed to one or another queue and assigned a follow on status.

Workflow management may be control table and rule driven or may be hard coded. The system is likely to be straightforward and stable enough to make hard coded workflow management simpler to program.

The internals of workflow management are outside the scope of this document. They have more to do with design.
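Purely as an illustration of the "control table" idea, and not as a design decision, a table driven workflow might look something like the Python sketch below. The status names and the transitions between them are hypothetical, loosely following the example above, and the sketch ignores the fact that status is specific to a board and summary topic.

```python
# Hypothetical status names and transitions, loosely following the example above.
TRANSITIONS = {
    "accepted_on_import": {"published"},
    "published": {"full_text_requested", "rejected"},
    "full_text_requested": {"full_text_retrieved"},
    "full_text_retrieved": {"sent_for_review", "rejected"},
}


def allowed_next(status):
    """Statuses a citation may move to from its current status."""
    return TRANSITIONS.get(status, set())


def advance(status, new_status):
    """Return the new status, or raise if the workflow does not allow the move."""
    if new_status not in allowed_next(status):
        raise ValueError(f"cannot move from {status!r} to {new_status!r}")
    return new_status
```

A hard coded workflow would put the same rules into program logic instead of a table. The table form makes it easier to ask which actions are legal for a citation in a given status.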

3.11 View citation history

3.11.1 Purpose

It is necessary to be able to see every action taken regarding a citation. A citation history screen should show all attributes of a citation and all actions taken for it.

The actual design of screen displays is outside the scope of this document. What follows are just notes on decisions to be made during screen design.

3.11.2 Functions

View history
There are many possible ways to sort and organize history displays for a single citation.
- Chronological ordering
  A straight chronological order shows each status and tag assignment and each action, in the order in which they occurred.
- Action ordering
  An action ordering shows all actions grouped by action type and, within action type, chronologically.
- Board specific vs. all actions
  The normal display of a citation history will show the display for one board only. In the great majority of cases there is only one board that has ever reviewed a citation.
  For citations that have been reviewed by more than one board, the system should indicate on each board specific display that other boards have also reviewed this citation. If the user requests it, the display should enable a user to switch boards to see what the citation history looked like for another board.
  
  It may also be desirable to be able to see the citation history from all boards in a single chronological or action ordered display.
- Full or filtered display
  A full display will show everything that happened to a citation, with all text comments fully expanded.
  A filtered display will only show brief information for each status assigned to a citation, for example, status, user, date time, tag, but not full text comments. A user would click a link to see the comment associated with a status value or tag.
  
  We may also have specialized displays that only select certain status values, ignoring others.
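The orderings and the board filter described above amount to a few sorts and selections over the same history rows. Here is a small Python sketch, assuming a made-up row layout of (date, board, action type, comment). The function names and the sample rows are hypothetical.

```python
from datetime import date

# Hypothetical history rows: (date, board, action type, comment).
history = [
    (date(2011, 3, 2), "Adult Treatment", "full_text_requested", ""),
    (date(2011, 2, 1), "Adult Treatment", "imported", ""),
    (date(2011, 4, 5), "Genetics", "imported", "second board"),
]


def chronological(rows, board=None):
    """All actions in date order, optionally limited to one board."""
    selected = [r for r in rows if board is None or r[1] == board]
    return sorted(selected, key=lambda r: r[0])


def by_action(rows, board=None):
    """Actions grouped by type and, within type, in date order.

    sorted() is stable, so the date order is kept inside each action type.
    """
    return sorted(chronological(rows, board), key=lambda r: r[2])


def other_boards(rows, board):
    """Boards other than the displayed one that have also acted on the citation."""
    return sorted({r[1] for r in rows} - {board})
```

A board specific display would call chronological(history, board) and use other_boards(history, board) to decide whether to show the note that other boards have also reviewed the citation.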
Actions available from the history display
An authorized user should be able to add new summary topics, new tags, and new comments for a citation. A natural place to do this is from the history display - which is how it can be done in the existing system.
Some actions can also be added. Some cannot. Whether it is legal to add an action depends on the workflow definitions for what status a citation must have before any particular new action can be taken for it.

3.12 Change the status of a citation

3.12.1 Purpose

There are two ways to change the status of a citation.

One is to do so in the normal workflow for citations. Such changes are described above in the separate functions for each action or status type.

The other is to make changes from the history screen. This is also discussed above.

3.13 Search

3.13.1 Purpose

Find information in the system.

3.13.2 Functions

Citation searching
The existing system has a very powerful search capability, providing the following functionality, all of which should be replicated in the new CiteMS.
See the search form in the existing system. (http://citems-dev.nci.nih.gov/StaffSearch.asp)

There may be ways to organize the searching a bit better. See design notes on user interfaces.
- Access points
  There are currently 18 different ways to search for citations on the search form. There wouldn't seem to be any reason not to provide all of them in the new system, plus at least one more for the new "tags" feature.
- Value entry with wild cards
  For free text entry fields allow the use of SQL wild cards in the search strings. Title searches, for example:
  "%robot%prostatectomy%"
- Combinatorial logic
  For selection list fields it is possible to select multiple values to be OR'd together.
  All fields are then AND'd together to perform the search.
- Sorted output
  The existing system allows user selectable sort orders for the output.
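The combinatorial logic above (OR within a selection list, AND across fields, SQL wild cards in free text) can be expressed as a parameterized WHERE clause. The sketch below uses Python with an in-memory SQLite database only so that it can be run. The table, the column names, the sample rows and the function build_where are all hypothetical and say nothing about how the existing system does it.

```python
import sqlite3


def build_where(text_fields, list_fields):
    """Build a parameterized WHERE clause.

    text_fields: {column: pattern} matched with LIKE (SQL wild cards allowed).
    list_fields: {column: [values]} where the values for one column are OR'd.
    All fields are AND'd together. Column names are assumed to come from a
    fixed list in the program, never from user input.
    """
    clauses, params = [], []
    for column, pattern in text_fields.items():
        clauses.append(f"{column} LIKE ?")
        params.append(pattern)
    for column, values in list_fields.items():
        marks = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({marks})")
        params.extend(values)
    return " AND ".join(clauses) or "1=1", params


# Hypothetical table and rows for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citation (id INTEGER, title TEXT, board TEXT)")
conn.executemany(
    "INSERT INTO citation VALUES (?, ?, ?)",
    [
        (1, "Outcomes of robot-assisted prostatectomy", "Adult Treatment"),
        (2, "Robot-assisted prostatectomy in practice", "Genetics"),
        (3, "Dietary fiber and colon cancer", "Adult Treatment"),
    ],
)
where, params = build_where(
    {"title": "%robot%prostatectomy%"},
    {"board": ["Adult Treatment", "Screening"]},
)
found = [row[0] for row in
         conn.execute(f"SELECT id FROM citation WHERE {where} ORDER BY id", params)]
```

Passing the user's values as parameters rather than pasting them into the SQL string keeps the wild card feature without opening the search form to SQL injection.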
Search for users
Users should be able to search for users by various criteria.
This may be an EBMS rather than a CiteMS function.
- Search criteria
  - Name
  - Type of user (board member, manager, CIPS staff, CIAT, etc.)
  - Editorial board membership
  - Summary topic this person reviews
Outputs to a file
It's not a bad idea to have the search system, which is very flexible in the old system and should be equally so in the new, be able to direct output to a report file, possibly also to a spreadsheet compatible file if tools for that are available in the tool set we use for development.
Saving searches / reports
It would be useful for a user to be able to save a search for re-execution whenever desired.
NOTE: The existing CiteMS appears to have some infrastructure built in to support this, but I'm not sure it's fully implemented and usable.
- Accessibility
  Saved search criteria might have different scopes, for example:
  - Local saves
    Created by a user for his own use. Saved in his private space with whatever name he assigns.
  - Global saves
    Created by a sysadmin (or user?) for anyone to use, accessible to all.
  - Permissions?
    Question: Do we need to store permissions with a search, i.e., this search is for Admin users, this one is for board members and admin users, this one is for CIPS staff, this one is for everyone?
NOT lists
The existing system has the ability to exclude journal titles from a search results list based on the title being in an editorial board specific NOT list.
It appears to be a useful capability and should be replicated.

3.14 Integration with the Board Member System

3.14.1 Purpose

Some data and some functionality are relevant to both EBMS and CiteMS. There might be some savings in both programming and data maintenance if that data and functionality are shared.

3.14.2 Functions

No design has been done yet for CiteMS. It may be that we won't want a separate CiteMS system at all but will simply add CiteMS data and functionality to the EBMS, making one integrated system. Or we might want two systems that share data, or two systems that exchange data. These are design issues for later resolution.

3.15 Messaging and communications

3.15.1 Purpose

As described in the "Data to Maintain" / "Messaging and communications" section, messaging will be done by email.

The system will send email but not receive it.

3.15.2 Functions

Send a message to one or more users
Sending functionality includes:
- Automatic "From" addressing
  When sending a message, the system will automatically use some fixed CiteMS address as the From address. If we have two CiteMS systems, e.g., a production system and a test system, then the From address should reflect the specific system it comes from.
  The system should also automatically create a "Reply-to" address containing the user's email address. For example, mail from user John Smith might have something like:
  - From: CiteMS@beethoven.nci.nih.gov
  - Reply-to: jsmith@mail.nci.nih.
- Entering a subject
- Choosing recipients from among available users
  More than one should be allowed. We might have short aliases that can be used for direct entry.
- Entering a text body
- Adding hyperlinks to citations
  Links to multiple citations, one link per citation, should be allowed. When an email recipient clicks a citation hyperlink in the mail it should bring up one of the citation displays, perhaps a status display or a full citation display, or perhaps a link to PubMed. Content and variability remains to be determined.
- Add a copy to self
  It may be desirable to configure that in a user's profile, or perhaps provide a checkbox for selecting it.
- Send
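The addressing rules above can be sketched with the standard Python email package. This only builds the message and does not send it. The function name build_message, the recipient addresses and the use of a "Cc" header for the copy to self are assumptions made for illustration, and the From address is simply the example value given above.

```python
from email.message import EmailMessage

# Fixed system address; hypothetical value taken from the example above.
SYSTEM_FROM = "CiteMS@beethoven.nci.nih.gov"


def build_message(sender_email, recipients, subject, body, copy_to_self=False):
    """Build (but do not send) a message with automatic From and Reply-To."""
    msg = EmailMessage()
    msg["From"] = SYSTEM_FROM
    msg["Reply-To"] = sender_email
    msg["To"] = ", ".join(recipients)
    if copy_to_self:
        msg["Cc"] = sender_email
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message(
    "jsmith@example.org",                 # placeholder user address
    ["reviewer1@example.org", "reviewer2@example.org"],
    "Citations for review",
    "Please see citation 12345.",
    copy_to_self=True,
)
```

A test system would use a different SYSTEM_FROM value so that recipients can tell which system a message came from.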

3.16 Reports

3.16.1 Purpose

The existing system supports a number of reports that board managers and other staff members use in regular operations. Equivalent reports need to be available in the new system.

3.16.2 Functions

Canned reports
There are currently ten reports available to board managers and, I think, some others accessible to people with other roles. Each of these reports accepts variable input parameters, typically allowing a user to limit retrievals by editorial board or review cycle.
Equivalent reports need to be implemented.
Custom reports
We have used a system in the CDR in which custom SQL queries can be named, saved, retrieved, and executed on demand. This approach has been pretty useful and is fairly easy to implement. It can be implemented securely by running all SQL queries with a user id that only has read access to the database.
Not sure what reports should be treated as reports and what as just searches.

We might possibly also want to implement a little more sophisticated approach in which a report designer can write queries with prompts and replaceable parameters.

See saved searches above for another way to think about retrieving information using saved parameters.
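A minimal Python sketch of the named query idea follows, with replaceable parameters and default values. It is not how the CDR does it. The registry SAVED_QUERIES, the function run_saved, the table and the rows are hypothetical, and SQLite's query_only pragma is used here only as a stand-in for running under a database login that has read access only.

```python
import sqlite3

# Hypothetical saved queries: name -> (SQL with named placeholders, defaults).
SAVED_QUERIES = {
    "citations_by_board": (
        "SELECT id, title FROM citation WHERE board = :board ORDER BY id",
        {"board": "Adult Treatment"},
    ),
}


def run_saved(conn, name, **args):
    """Run a saved query, filling in defaults for any argument not supplied."""
    sql, defaults = SAVED_QUERIES[name]
    return conn.execute(sql, {**defaults, **args}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citation (id INTEGER, title TEXT, board TEXT)")
conn.executemany("INSERT INTO citation VALUES (?, ?, ?)",
                 [(1, "A", "Adult Treatment"), (2, "B", "Genetics")])
conn.commit()
# Stand-in for a read-only database login: SQLite refuses writes from here on.
conn.execute("PRAGMA query_only = ON")
```

With this in place, run_saved(conn, "citations_by_board") uses the default board, and run_saved(conn, "citations_by_board", board="Genetics") overrides it.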

3.17 Auditing

3.17.1 Purpose

We ought to have an audit trail for all changes that occur in the database. This is useful for management, security, and debugging.

A significant amount of auditing is manageable if it is built into the design, but it can be extremely expensive to add on later. However, even when it is designed in, we will still have to make decisions about what is cost effective to keep.

If we have a comprehensive, integrated component for managing all queues in the system, it's possible that the queue manager might be the only place that needs to update the audit trail. If every "action" updates a queue and goes through the queue/workflow management software, then it may be that auditing can be implemented very simply and completely in one place.

See the "Audit trail" and Queues sections above.
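To make the "one place" idea concrete, here is a small Python sketch in which every status change goes through a single method that also writes the audit record. The class QueueManager, its fields and the status names are hypothetical. A real audit trail would live in a database table rather than a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QueueManager:
    """Hypothetical single entry point for every status change."""
    status: dict = field(default_factory=dict)    # citation id -> current status
    audit: list = field(default_factory=list)     # append-only audit trail

    def move(self, citation_id, new_status, user):
        """Change a citation's status and record the change in one place."""
        old_status = self.status.get(citation_id)
        self.status[citation_id] = new_status
        self.audit.append({
            "when": datetime.now(timezone.utc),
            "user": user,
            "citation": citation_id,
            "from": old_status,
            "to": new_status,
        })


qm = QueueManager()
qm.move(101, "published", user="importer")
qm.move(101, "full_text_requested", user="reviewer")
```

Because no other code touches the status, no other code needs to know about auditing.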

3.17.2 Functions

Record entries
Retrieve information
The most likely regular use of the audit trail is in management information and history reports. It would also be useful in debugging.

3.18 System administration

There are a number of tasks for systems administrators that are outside the regular flow of citation processing. These include:

3.18.1 Manage users

Purpose
Users enter or leave the system from time to time. System administrators and/or board managers need to be able to add users, modify their profiles, and inactivate them.
It may also be desirable for users who have no authority to manage records pertaining to other users to nevertheless have some ability to modify some aspects of their own profiles.

It is not clear that we need to implement any functions at all in this area in CiteMS since the new Electronic Board Member System should already do everything we need.
Functions
Functions required (perhaps via EBMS) include:
- Add a new user
- Edit a user record
- De-activate a user record
  Users must never be completely deleted from the system since they are linked to actions in the past and the links must remain. But they can be marked as no longer active in the system and no longer able to log in.

3.18.2 Manage the NOT lists

Purpose
Authorized users need to be able to specify journal titles that will be ignored or automatically rejected on import. These are editorial board specific.
Functions
- Add a journal title to a NOT list.
- Remove a journal title from a NOT list.
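A board specific NOT list is essentially a set of journal titles per board. The Python sketch below shows the two maintenance functions and how an import might use the list. The data layout, the function names and the journal titles are hypothetical, and matching on the exact title string is an assumption.

```python
# Hypothetical NOT lists: editorial board -> journal titles to reject on import.
not_lists = {"Adult Treatment": {"Journal of Irrelevant Results"}}


def add_title(board, title):
    """Add a journal title to a board's NOT list."""
    not_lists.setdefault(board, set()).add(title)


def remove_title(board, title):
    """Remove a journal title from a board's NOT list, if it is there."""
    not_lists.get(board, set()).discard(title)


def split_import(board, citations):
    """Split (journal title, citation id) pairs into kept and rejected ids."""
    blocked = not_lists.get(board, set())
    kept = [cid for journal, cid in citations if journal not in blocked]
    rejected = [cid for journal, cid in citations if journal in blocked]
    return kept, rejected
```

The same batch of citations can give different results for different boards, since each board has its own list.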

3.18.3 Manage editorial boards

Rarely, new editorial boards are created and even more rarely, they may be deleted.

In the future, "working groups", and maybe subgroups, may also be created and deleted.

This activity will probably be wholly within the province of the EBMS with the CiteMS piggybacking on EBMS.

3.18.4 Add a new review cycle

At some point around the beginning of each month, a new review cycle is initiated and new searches are executed to import new citations.

We do not do this automatically because there may be tasks in the existing review cycle that are not quite finished. A systems administrator therefore creates a new review cycle when people are ready for it.

3.19 Help

3.19.1 Purpose

Training and assistance to users.

3.19.2 Functions

The existing system has help screens online. They appear to be essentially the same as the pages in the training manual.

There is no context sensitivity and no index. However the pages are organized hierarchically and it looks pretty easy to find anything a user might be looking for.

A similar capability may be very useful in the new system. If we can also have context sensitivity that would be nice to have but is not required.

4 Design Notes

4.1 Relationship to the existing system

4.1.1 Existing functionality

When designing and programming parts of the new CiteMS, it is important to carefully compare the functionality in each new module with the equivalent functionality in the existing system. The new system should be able to do everything that the old system could do unless we consciously decide that it need not do it.

We don't want functionality lost by accident.

4.1.2 Existing data

We will need to load data from the existing system into the new one.

This will be a MAJOR task requiring a lot of effort.

Data conversion is a complex topic beyond the scope of this document, but here are some notes regarding conversion.

Data from NLM
Data from NLM can be loaded by going back to NLM. We have a PubMed ID for every citation. We can use them to download complete XML versions of each citation from NLM, picking up the latest form and fullest information for each citation.
Each citation we retrieve from NLM will have an NLM journal title unique ID. We can use that to retrieve complete journal information to match every citation.

If we get the bibliographic data in that way then we will have accurate, up to date data. If not, we won't.
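As a rough sketch of the reload from NLM, the Python below builds a request URL for the NCBI E-utilities EFetch service and pulls the NLM journal unique ID out of the returned XML. No network call is made here. The sample record is abbreviated and made up, and the function names are hypothetical. Batch sizes and NLM's usage limits would need to be checked before a real conversion run.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """URL that asks PubMed for the full XML of a batch of PubMed IDs."""
    query = {"db": "pubmed", "retmode": "xml", "id": ",".join(map(str, pmids))}
    return f"{EFETCH}?{urlencode(query)}"


def journal_ids(xml_text):
    """Map each PubMed ID in a PubmedArticleSet to its NLM journal unique ID."""
    result = {}
    for cite in ET.fromstring(xml_text).iter("MedlineCitation"):
        result[cite.findtext("PMID")] = cite.findtext("MedlineJournalInfo/NlmUniqueID")
    return result


# Abbreviated, made-up record in the shape PubMed returns.
sample = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
  <PMID>12345</PMID>
  <MedlineJournalInfo><NlmUniqueID>0000001</NlmUniqueID></MedlineJournalInfo>
</MedlineCitation></PubmedArticle></PubmedArticleSet>"""
```

The journal IDs collected this way could then drive a second pass that retrieves the journal records.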
Data from EBMS or CDR
Some board member, manager, summary topic, and other information can either be loaded from EBMS or CDR, or can be accessed directly from those systems without creating new copies in the CiteMS. This is a complex issue because not all of the data we need will necessarily be in either of those other systems, possibly including historical data for people who are no longer associated with the editorial boards but who are linked to past actions and so may (or may not) need to be loaded into the new system.
If we do decide to get some information from EBMS by linking rather than copying, then we will probably need to add historical records from the old CiteMS into the EBMS and convert unique identifiers from the old form to the new when loading other linked data.
Data from the old CiteMS
All of the data pertaining to the history of a citation will have to be converted and imported.
Data cleanup
Citation and journal data can be cleaned up programmatically by fetching new versions from NLM. Some other data may need more labor intensive cleanup. We often have, for example, multiple records in the system for the same user. We probably want to write scripts to merge these into a single record for one single person. There might also be other data that requires a combination of manual and scripted work to convert.

4.2 Functionality in all boards isn't exactly the same

We need to think about why this is so and whether and how it should be so in the new system.

Examples of differences seem to include:

4.2.1 Subgroups

Pediatric and Genetics boards have them, others don't.

4.2.2 Handling special board situations

I'd like not to put special code into the system that is board dependent. It would seem to be more flexible and maintainable to have capabilities that are turned on or off for specific boards rather than have "if board == X then …" logic hard-wired into the program code.

An example of special board specific code in the existing system is the requirement that "CIPS staff can only make changes to citations that are assigned to their editorial board or to the Adult Treatment editorial board." (Training document, page 3)

If we must hardwire board specific logic then, ideally, it would be much better to have that code isolated in a collection of specialized routines that are separated from the main logic and invoked from the main logic by some sort of lookup rather than having the specialized logic embedded in the code.

Another approach is to subclass board functionality with almost all functionality in the superclass.

Whatever solution we adopt, it makes sense to think hard about alternatives before making a decision, and to emphasize isolation of non-generic code.

Note: this was a key design goal in the CDR, isolating any document type specific functionality away from the main C++ code. Most of it was isolated in XSLT scripts and Schema documents. This decision was eminently successful.
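One way to picture "capabilities that are turned on or off for specific boards" is a small control table that the generic code consults. The Python sketch below is only an illustration. The table contents, the capability name "subgroups" as a switch, and the function names are hypothetical.

```python
# Hypothetical control table: capabilities switched on per board.
BOARD_CAPABILITIES = {
    "Pediatric": {"subgroups"},
    "Genetics": {"subgroups"},
    "Adult Treatment": set(),
}


def has_capability(board, capability):
    """True if the capability is switched on for the board."""
    return capability in BOARD_CAPABILITIES.get(board, set())


def review_form_fields(board):
    """Generic code asks about a capability; it never names a board."""
    fields = ["summary_topic", "reviewers"]
    if has_capability(board, "subgroups"):
        fields.append("subgroup")
    return fields
```

Giving another board subgroups then means changing a table entry, not the program.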

4.3 XML in the system

The existing system uses relational tables with integers and ASCII text. However, there are many cases in which XML might be a useful alternative. Possibilities include:

4.3.1 Citations

These are available from NLM in a somewhat richer form in XML than in the Medline print format. They are also easier for a program to parse. On the other hand, the Medline print format is easier for a human to read, assuming there is no special software used to style the XML.

4.3.2 Journals

Maybe the same issues here as for citations.

4.3.3 Saved and/or canned search and report specifications

It may be practical to produce a generic search and report module that uses specifications stored in XML to generate selections. For example, XML fields can contain:

SQL, with placeholders
Default values for arguments
Prompts to show to the user
Specification of row and column headers in outputs
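A specification of that kind might look something like the example below, read here with Python's standard XML parser. The element names, attribute names and the sample query are invented for illustration. No such format exists yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format; element and attribute names are invented.
SPEC = """<report name="citations_by_cycle">
  <sql>SELECT id, title FROM citation WHERE cycle = :cycle AND board = :board</sql>
  <arg name="cycle" prompt="Review cycle" default="2011-06"/>
  <arg name="board" prompt="Editorial board" default="Adult Treatment"/>
  <column header="ID"/>
  <column header="Title"/>
</report>"""


def load_spec(xml_text):
    """Parse a report specification into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "sql": root.findtext("sql").strip(),
        "prompts": {a.get("name"): a.get("prompt") for a in root.iter("arg")},
        "defaults": {a.get("name"): a.get("default") for a in root.iter("arg")},
        "headers": [c.get("header") for c in root.iter("column")],
    }


spec = load_spec(SPEC)
```

A generic module could show spec["prompts"] to the user, fill in spec["defaults"] for anything left blank, and run spec["sql"] with the resulting values.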

4.3.4 Outputs

It may be desirable to make many of the outputs of the system XML rather than HTML or text. Final formatting for display can be done with XSLT on the server or CSS on the client, or both.

Possible advantages include:

Easier programming of outputs to serve both screen and print use
Easier handling of conversion to spreadsheet format
Maybe less re-programming to change output formatting?
Easier presentation of the same information for different user types?

4.4 Links to other systems

4.4.1 Should the database be shared with the EBMS?

It is my current understanding that the EBMS will be developed with a database that is independent of the CDR in the sense that data needed in the EBMS may be copied out of the CDR, but there will not be direct access to the CDR database. The reasons for this are beyond the scope of this document and will be taken for granted here.

However it is not clear that the CiteMS and EBMS need fully independent databases. It may be a good idea to use a single database for both with tables for summary topics, editorial boards, board members, and other objects that are shared between EBMS and CiteMS.

4.4.2 Should there be one EBMS/CiteMS system?

It might even be the case that the CiteMS and EBMS should be one single system sharing a number of modules as well as many database tables.

By using control tables that configure EBMS and CiteMS functions separately, it may be possible to get the same software to do the different tasks for each logical system differently.

Some of the modules that might conceivably be shared include:

User authentication
For example, common management of logins and logouts, session management, groups/roles, permissions, etc.
User editing, update, profiling, etc.
The user database will be almost the same with only a few users that use one logical system but not the other.
Task/queue management
For example, common techniques for finding objects in a queue by date, user, action, or other parameters, common mechanisms for recording an action, completing a task, moving an object from one queue to the next queue, finding objects that are overdue for task completion, etc.
Auditing
For example common techniques for recording actions and a single common data structure for all audit records.
Email
For example common mechanisms for sending mail to people in the people table.
Indexing and searching
For example using common index tables (like query_term in the CDR), common search parsing and normalization, etc.
Report generation
For example common storage and searching for reports, a common query parameter substitution mechanism, common execution mechanism, and maybe some common results formatting.
Some display management
For example common headers and footers, CSS stylesheets, XML and HTML generation, Javascript libraries, etc.

4.5 Some user interface concepts

4.5.1 Re-organization of screens and flows

The existing system uses a list of hierarchical functions to get to any action to perform. This works pretty well, but can require the user to do a lot of work to get from one place to another when processing a single citation or list of citations.

Alternatives are possible. For example:

Have a single way to get a list of citations
It might be a list with a particular status, a particular tag, a particular board, etc. The existing Search capability does this, but it doesn't lead to any specific functionality.
Maybe there should be a way to get a list of citations, like the existing Search function, and then choose an action to perform, enabling a user to get back to the same list after performing it.
Be able to access a citation status display from anywhere
The existing system has a very informative citation history display and allows many different actions from that display.
We could have a similar, possibly even richer, history display, and make it easy to get there from any screen with a citation on it.

4.5.2 Correctability

As many functions as possible should be designed with the understanding that they can be done wrong but mistakes can be corrected. This is not always the case in the current CiteMS. Some not uncommon errors can only be corrected by a programmer - and not always very easily or robustly, even for a programmer.

Where practicable (and it's not always practicable), corrections are made using the same user interface as the original actions, or perhaps using the ubiquitous citation history display. Someone who knows how to do something will automatically know where to go and what to do to undo it.

This is not a universal rule. There may be certain actions that only an administrator should be allowed to undo and there may be others for which it might not be a good idea to allow them to be undone at all. However errors should always be correctable, even if the correction is not strictly an undo-ing of the original action.

Examples of commonly executed actions that should be correctable include:

Delete a citation that's been imported by mistake
Re-assign a review to someone else
Re-assign a citation to a different board or summary topic
etc.

4.5.3 Consistency

We should design all of the interfaces to use similar concepts. Where Yes/No/No-action choices are available, it seems like a good idea to always use the same form controls (checkboxes, radio buttons, or whatever), to put the controls in the same place relative to citations that they control, to always have a "No-action" radio button if we have "Yes" and "No" buttons, to have consistent headers and footers on pages, etc.

We are fortunate to have a number of professional web designers available in OCE who can assist with user interface design.

The existing system seems to me to do a pretty good job of user interface consistency. We should strive to do at least as well, and better if we can.

4.5.4 Dynamic HTML

The existing system, designed in a somewhat earlier era of web development, does a lot of web page jumps and backtracking. For example, a user might look at a list of citations for review. She wants to see the abstract for one of the citations and clicks a button. The system takes her to a new page with the abstract. Then she wants to see the status history. The system takes her to a new page. Then she wants to go back to the review list. She has to back up two pages or click another button and go through another jarring screen jump back to the original page - possibly positioned in a different place.

A more modern approach would be to use dynamic HTML using JavaScript and, possibly, AJAX. The system could display the page of citations for review. The user could click one to see an abstract. The screen could open in place to display the abstract in context. It could open again to see status in context. It could then close again, leaving the user right where she was, on the same citation, positioned where it had been.

This concept can be applied to any place where it makes sense to add more information in context.
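
On the server side, this approach implies returning small HTML fragments rather than whole pages, which the JavaScript on the page then inserts in place. The following is a rough sketch of that server-side half only. The `CITATIONS` dictionary is an in-memory stand-in for the real database, and `render_fragment` and the panel names are hypothetical.

```python
from html import escape

# Hypothetical in-memory stand-in for the citation tables.
CITATIONS = {
    101: {
        "abstract": "Example abstract <with markup>.",
        "history": ["Imported", "Passed initial review"],
    },
}


def render_fragment(citation_id, panel):
    """Return an HTML fragment (not a full page) for one citation panel."""
    citation = CITATIONS.get(citation_id)
    if citation is None:
        return '<div class="panel error">Citation not found</div>'
    if panel == "abstract":
        body = "<p>" + escape(citation["abstract"]) + "</p>"
    elif panel == "history":
        items = "".join("<li>" + escape(s) + "</li>" for s in citation["history"])
        body = "<ol>" + items + "</ol>"
    else:
        raise ValueError("unknown panel: " + panel)
    # The id lets client-side script find and later close this panel.
    return '<div class="panel %s" id="%s-%d">%s</div>' % (
        panel, panel, citation_id, body)
```

The review list page itself never reloads in this design. Only the fragment for the clicked citation travels over the network, which is why the user stays positioned where she was.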

4.6 Security

The system needs to be available to contractors outside NIH. It may not need to be available to board members since that will be handled by the EBMS.

The existing system was not well designed for security. We need to prepare a list of security conventions that should be followed in order to meet NIH and industry best practices from the beginning. Security designed in is always cheaper and more secure than security added on.

4.7 Phased implementation

When we have a firm set of requirements and a design, we should assign priorities for implementing functions.

We can make a new system operational and begin getting benefits from it once the basic functionality is tested and working, and then add other functions while the system is in use. Examples of lower-priority functions might include:

4.7.1 Batch interfaces to download/update modified NLM data

Programs to automatically update journal titles and citations from NLM after changes will be needed, but not necessarily on day 1.
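
The core of such a batch program is a comparison between what we have stored and what NLM currently reports. The sketch below shows only that comparison step and assumes the fresh records have already been retrieved by some other means. The function names, the record layout, and the tracked field names are all hypothetical.

```python
def find_updates(local, fetched, fields=("title", "journal")):
    """Return {pmid: {field: (old, new)}} for records whose fields changed."""
    updates = {}
    for pmid, new in fetched.items():
        old = local.get(pmid)
        if old is None:
            continue  # not in our database; importing is a separate function
        changed = {f: (old.get(f), new.get(f))
                   for f in fields if old.get(f) != new.get(f)}
        if changed:
            updates[pmid] = changed
    return updates


def apply_updates(local, updates):
    """Copy the new values into the local records, in place."""
    for pmid, changed in updates.items():
        for field_name, (_old, new_value) in changed.items():
            local[pmid][field_name] = new_value
```

Keeping the comparison separate from the update would let the program produce a report of pending changes for review before anything is written, which may be useful while the batch interface is still new.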

4.7.2 Reports

Some reports are critical. There are others we can live without for a while if necessary.

4.7.3 Low volume user interfaces

Some database changes, such as new values in control tables and probably other things, could be made by hand by a programmer until a user interface is ready.

Author: <alan@NCI-01802749>

Date: 2011-06-22 00:06:17
