New Citation Management System Requirements and Notes

Draft 2.5 March 17, 2008

This draft document is an attempt to specify requirements, in outline form, for a new Citation Management System ("CiteMS").

1 Organization of the document

The document is divided into three main parts:

1.1 Data

This describes the information that must be maintained in the system. It is the foundation of any information system.

The data is described as a set of information objects or categories, for example, citations, journals, persons, etc. For each category I've documented what I take to be the purpose of maintaining the information, followed by notes regarding its content, use, source, etc.

1.2 Functionality

This describes the basic functions that the system has to perform. Again, I've organized the functions into categories. For each category I've described the purpose of functions in this category followed by a list of what I take to be logical functions, for example retrieving and displaying certain information or supporting the entry and processing of new information.

1.3 Design notes

These are notes on requirements or possible designs that don't comfortably fit into the above two categories. In most cases they aren't really requirements so much as ideas for design that came up while thinking about the requirements and which I've recorded here so that they won't get lost.

2 Data to maintain

The tag "linked" following an information category means that the source, and possibly the control, of this data is outside the CiteMS.

2.1 Citations    linked

2.1.1 Purpose

Provide information to identify a single journal article, book, book chapter, or whatever, that can be cited. Information may be used in the new CiteMS itself in reports, displays, correspondence sent to physicians, etc. and may also be sent to the CDR, EBMS, or other consumers of citation data from the CiteMS.

2.1.2 Notes

  • "bib" table in old system
    • Currently contains plain ASCII text loaded from Medline format
    • Not maintained. Changes in PubMed are not reflected.
  • New system
    • Download xml from PubMed instead of parsing the Medline format
      • Much easier to parse
      • Contains much more information
      • Unicode, better representation of non-English journals
    • Never edit PubMed info
      • This is PubMed controlled data, not ours
        We do not need to update the content of PubMed citations. Even if there are errors in the data, we might be better off ignoring them than correcting them. See the following point about refreshing data.
      • We can periodically refresh it from PubMed to update with any changes
        PubMed does not make a lot of changes to records once they are in the system. MeSH headings change, but they are not significant to CiteMS. Bibliographic changes are relatively rare. We might, say once per year after the new MeSH is incorporated into PubMed, refresh all of our data from PubMed using an automated process.
      • Pre-Medline citations
        Some searches only download citations from PubMed that are fully indexed and processed. Others also allow "Pre-Medline" citations that contain partial records that will be supplemented and corrected later. See the Import functions for a discussion of this.
    • Allow other sources of citations
      All citations currently come from PubMed. This may continue to be true for the life of the new system. However, it is conceivable that at least some citations to literature outside of PubMed might be considered to be in scope in the future.

      It makes sense to use data structures and hooks that will enable us to add non-PubMed data in the future without having to re-design the system. Our goal should be to not write software or allocate significant resources for the support of non-PubMed data unless and until we actually want to use such data, but to design the system to accommodate the changes if and when we do need them.

      Accommodating non-PubMed data in the design might use fairly simple techniques like marking each citation with a source indicator and only performing PubMed specific operations on a citation if the source is PubMed. All citations would be PubMed in the initial system and there might never be any other sources, but it might cost very little to add source checks where they are needed.

      It is very possible that if and when we do allow non-PubMed data into the system, we will be adding very small numbers of records, not importing large batches as we do from PubMed. If so, the most cost effective way to import such data might be to type it in rather than to write new import modules. If the new system uses XML, this might be as easy as using the PubMed XML schema with an off-the-shelf XML editor like XMetal, and ensuring that the import program does not require the source of the citations to be PubMed.
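
The source indicator idea above can be sketched as follows. This is only an illustration under assumptions: the names Source, Citation, needs_pubmed_refresh and select_for_refresh are hypothetical, as is the "manual" source value. The point is simply that PubMed specific operations, such as the periodic refresh, check the source before acting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Source(Enum):
    PUBMED = "pubmed"
    MANUAL = "manual"  # hypothetical non-PubMed source, e.g. typed in by hand


@dataclass
class Citation:
    cite_id: int     # our own unique ID
    source: Source   # where the record came from
    source_id: str   # e.g. the PubMed ID when source is PUBMED
    xml: str         # the citation record itself, stored as XML


def needs_pubmed_refresh(citation: Citation) -> bool:
    """Only PubMed-sourced citations take part in PubMed specific operations."""
    return citation.source is Source.PUBMED


def select_for_refresh(citations: List[Citation]) -> List[str]:
    """Return the source IDs that a periodic PubMed refresh would re-download."""
    return [c.source_id for c in citations if needs_pubmed_refresh(c)]
```

In the initial system every record would carry the PubMed value, so the check costs almost nothing until a second source actually exists.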

2.2 Journals    linked

2.2.1 Purpose

Every citation record should link to a journal title that is stored only once. If the journal title changes, is replaced, etc., there is only one place to update it and one unique ID to link citations to it.

2.2.2 Notes

  • Journal complexity
    • Titles change
      A citation published under the old form of the title might need to be recoverable under that form and under the new form. For example, a report on what citations we have from journal ABC might or might not look at previous versions, depending on what the report user wants.

      This is easy to do if the link is to a journal record that has a history of title strings associated with it. It is the journal unique ID and the journal itself, not the title string, that a citation is linked to.

      Note that two journals can have the same title string. The strings should not be considered to be always unique, especially over time.

    • Titles merge and split
      This is a tricky problem having a significant impact on how we record the history of a title, assuming we do so.
  • Relegate all journal management to NLM
    Using NLM as the authority for our journal titles may have small drawbacks but has enormous benefits.

    Ideally, we should do zero management of journal titles. We should download them and, when necessary, apply updates, automatically as needed from NLM.

  • Journal title searching
    Question: Do we need to keep cross references from old versions of journal titles, possibly in a special table, or possibly in the audit trail?

    There are different possibilities in searching that depend on our decisions regarding title history:

    • Allow searching on old versions of titles?
      If a user can search on two different versions of a title and get a unified result set, we need to keep the old versions.
    • Separate search results on old and new versions of titles?
      The problem is different if we need to retrieve separate results when searching on old and new versions of a title. For example, if title A1 became A2 on January 1, 2011, we might want a search on A1 to retrieve citations from before 1/1/11 and a search on A2 to retrieve citations on or after that date.
    • Unified search?
      Still another possibility is to only store the last version of a title and only permit searching on that.
    • Kitchen sink?
      Hardest of all is to allow any of the above three search types, under user control.
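
To make the search alternatives concrete, here is a minimal sketch, assuming a journal record that carries a dated history of title strings and citations that link to the journal ID rather than to a title string. All names (TitlePeriod, Journal, Citation, search_unified, search_separate) are hypothetical, and merges and splits are deliberately ignored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class TitlePeriod:
    title: str
    start: date                 # first day the title was in force
    end: Optional[date] = None  # first day it was no longer in force; None = current


@dataclass
class Journal:
    journal_id: str             # unique ID, assumed to come from NLM
    titles: List[TitlePeriod] = field(default_factory=list)


@dataclass
class Citation:
    cite_id: int
    journal_id: str             # link is to the journal, not to a title string
    published: date


def search_unified(title: str, journals: List[Journal],
                   citations: List[Citation]) -> List[int]:
    """Any version of a title retrieves every citation for that journal."""
    ids = {j.journal_id for j in journals
           if any(t.title == title for t in j.titles)}
    return [c.cite_id for c in citations if c.journal_id in ids]


def search_separate(title: str, journals: List[Journal],
                    citations: List[Citation]) -> List[int]:
    """A title retrieves only citations published while it was in force."""
    hits = []
    for j in journals:
        for t in j.titles:
            if t.title != title:
                continue
            for c in citations:
                if (c.journal_id == j.journal_id
                        and c.published >= t.start
                        and (t.end is None or c.published < t.end)):
                    hits.append(c.cite_id)
    return hits
```

Storing only the last version of a title corresponds to keeping a single TitlePeriod per journal. The "kitchen sink" option amounts to letting the user pick which of these functions runs.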

2.3 People    linked

2.3.1 Purpose

Store each person associated with the system once and only once. Any information we need to associate with that person can be associated with this record, e.g., address, phone number, affiliation, titles, etc. This is what we did in the CDR except that the CDR also has separate user records, which may or may not be something we want to normalize away.

It looks like we can have a single person table for all users in the system and have separate structures that define roles and permissions (see below) to which people are linked.

2.3.2 Notes

  • User descriptions
    Having a single person table enables us to use a common format for all people for storing descriptive information - name, affiliation, contact details, etc., and common software for many user oriented functions such as sending mail or email to a user, formatting a display of user information, etc.
  • User profiles
    Users might find it helpful to have user editable profiles that enable them to control the way the CiteMS behaves for them.

    Example: Robin finds the default/fixed search results page length of 5 citations with abstracts or 10 without to be too limiting. She'd prefer much longer outputs with more citations on one display. This might be in a profile - possibly as a user settable value, or possibly as a remembered last value selected.

    If there are no other uses for a profile, that may not be enough to justify it. We have to think about whether there are others.

  • Source
    EBMS.

    The largest number of people in the system are board members and managers, who are all in the CDR, where full records are already kept. The new EBMS will also need the same people. If the people records are in a single table shared by EBMS and CiteMS, then whatever technique EBMS uses to stay in sync with the CDR should automatically keep CiteMS in sync too.

    A small number of people are only involved in literature surveillance and do not participate in EBMS activities. However it may still make sense to use a single database for all people, with one authentication and authorization system.

2.4 Roles and Permissions

2.4.1 Purpose

Different people in the system will have different roles and different things they are permitted to do. Currently there are four distinct roles in the existing CiteMS that see different functions on menus and/or have different permissions.

It's possible to set up custom tables for each type of permission, but it would be more flexible for the future to have a scheme that allows new roles to be added as required.

Possible roles might be: board member, board manager, CIPS reviewer, citation importer, system administrator, communications/mail person, etc.

Typical permissions might be: import citations, perform initial review, save full text, etc. - corresponding to any system functions that require permission control.

2.4.2 Notes

  • Adding roles and permissions
    Presumably, the total number of roles and permissions is small and stable. It's the IDs of the people in each role that change. We may not need a sophisticated user interface to create new roles and permissions but can simply directly modify the values in the tables if and when software modifications define new roles and functions.

    However, we do need a user interface to allow non-programmer system administrators to assign and de-assign users to roles and perhaps to assign and de-assign permissions to roles.

  • Many to many user <-> role relationships
    A single user might perform multiple roles in the system. A single role might be performed by multiple users.
  • Context specific permissions
    Many roles and permissions exist in the context of a particular board and possibly a particular summary topic. There may be a generic way to represent this in the database by saying that a particular role or permission is board specific or summary specific. In that case, the authorization software would check not only that the user has the required role to perform an action, but also that the user is a member of the requisite board or summary topic review group that is associated with the citation or other object on which the action will be performed.

    We may also have user specific contexts. For example, it is possible that we will store user specific preferences in a profile for that user. Naturally, a user should only be able to update his own profile.

  • Roles in the permission structure are not the same as job titles
    There are certain functions that are typically associated with job titles. For example, board managers review citations and assign some of them to scientific reviewers.

    This works for most of the boards but not for the Adult Treatment board, for which there are too many citations for one person to review them all. Other CIPS staff members assist with the review of these citations even though there is only one person who is the manager of the Adult Treatment board.

    This isn't a difficult problem, but it is something that needs to be understood when approaching permissions.
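
The notes above might be sketched like this. It is a rough illustration, not a design: the role to permission mapping, the RoleAssignment structure, and the is_permitted function are all assumptions. It shows many to many user/role assignments, board specific context, and roles that are independent of job titles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class RoleAssignment:
    user_id: int
    role: str
    board: Optional[str] = None  # None means the role is not board specific


# Hypothetical mapping of roles to permissions; the real lists are still open.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "board manager": {"review citations"},
    "CIPS reviewer": {"review citations"},
    "citation importer": {"import citations"},
}


def is_permitted(user_id: int, permission: str,
                 assignments: List[RoleAssignment],
                 board: Optional[str] = None) -> bool:
    """True if any of the user's roles grants the permission in this context."""
    for a in assignments:
        if a.user_id != user_id:
            continue
        if permission not in ROLE_PERMISSIONS.get(a.role, set()):
            continue
        # A board specific assignment only counts for that board.
        if a.board is None or a.board == board:
            return True
    return False
```

With this shape, the Adult Treatment case is handled by giving additional staff a reviewing role for that board, without making them board managers.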

2.5 Control tables

2.5.1 Purpose

These are domains of valid values with which objects (people, citations, actions, etc.) can be associated.

2.5.2 Notes

  • Components of a control table entry
    Each entry in a control table has at least the following properties. There may be quite a few others too.
    • A unique identifier
      This is a number that is used for linking control information to other data objects in the system. There are some areas in the old system where unique identifiers should have been used but were not.
    • A human readable character string
      This is what identifies the control information to users. Sometimes it changes.

      When there is a change in the language of a control value but not its function, the character string changes but the unique identifier should stay the same.

      When there is a new function added to the system it should always get a new unique identifier as well as a new string.

    • Active status
      If a control value has been used in the past but is no longer used it may be necessary to keep the value in the table so that, when looking at old records or old activities, we can display the human readable value for the events that took place in the past.

      However the values need to be marked so that they will not be used again.

  • Examples

    Here are some examples of control tables. There will be more than just these.

    • Citation decisions/actions
      • Purpose
        Actions taken for a citation, request full text, send for review, etc.

        Had ten values. They look a bit odd to me, but I presume they are what the users wanted.

      • Source
        Created by Board Managers and Literature Surveillance staff. Currently stored in ludecision in the existing CiteMS.
    • Categories of evidence
      • Purpose
        Contains strings that characterize the nature of the evidence presented by a research study. Some are descriptions of how a study was conducted ("Randomized controlled clinical trials. Double-blinded") and some look like end points measured by a study ("Total mortality", "Indirect surrogates-Event free survival").
      • Notes
        • Board specific usage
          Not all boards use the same categories of evidence. Some appear to use just the descriptions of how a study was conducted while others appear to use just the study end-points. Others may not use this at all.

          The use of categories of evidence is currently under review. There may be an effort to consolidate and systematize their usage. Whatever we do will have to be flexible enough to accommodate change.

        • Source
          These are NCI designated strings, under our control.
    • Responses
      • Purpose
        Records what a reviewer/board member says should be done about a citation.
      • Notes
        A citation may receive more than one response, possibly including more than one from the same reviewer, for the same board, and same topic (Is that right? Even if not, it doesn't seem like a bad idea to enable it.)
        • Relationship to "citation decisions"
          The difference between "citation decisions" and "responses" appears to be that the former are dispositions made by CIPS/CIAT staff and the latter are dispositions made by board member reviewers.
        • Source
          Devised by the lit surveillance group.
  • See references?
    Since the names of controlled values sometimes change, it may be desirable to keep see references from the old values. A user searching on the old value sees everything.

    The problem of old and new values may have been more serious in the old system since it appears that, in some cases, linking was done by character string value instead of by unique ID. When the value changed, it could become necessary for a user to search on both old and new values to find all citations with the same logical status. The new system should always use unique IDs (see above), eliminating this problem.
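
A control table entry with the three properties described above (unique identifier, human readable string, active status) could look roughly like this. The class and method names are hypothetical. The sketch is only meant to show that renaming a value leaves the ID, and therefore every link to it, untouched, and that retired values remain displayable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ControlValue:
    value_id: int        # stable unique identifier used for all linking
    label: str           # human readable string; may change over time
    active: bool = True  # False = kept for history, not offered for new use


class ControlTable:
    def __init__(self) -> None:
        self._values: Dict[int, ControlValue] = {}

    def add(self, value_id: int, label: str) -> None:
        """A new function always gets a new ID as well as a new string."""
        if value_id in self._values:
            raise ValueError(f"ID {value_id} is already in use")
        self._values[value_id] = ControlValue(value_id, label)

    def rename(self, value_id: int, new_label: str) -> None:
        """Change the wording only; records linked by ID are unaffected."""
        self._values[value_id].label = new_label

    def retire(self, value_id: int) -> None:
        """Mark a value so that it will not be used again."""
        self._values[value_id].active = False

    def label(self, value_id: int) -> str:
        """Display string for any value, including retired ones."""
        return self._values[value_id].label

    def choices(self) -> List[ControlValue]:
        """Values that may still be assigned to new records."""
        return [v for v in self._values.values() if v.active]
```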

2.6 Editorial boards    linked

2.6.1 Purpose

Editorial boards are composed of groups of scientists and clinicians organized by NCI to review broad areas of cancer research. Each board has its own board manager and its own working style that affects many aspects of the management of citations for topics handled by that board.

Citations are associated directly with summary topics and summary topics are associated directly with editorial boards. One citation can be associated with multiple summary topics but each summary topic can only be associated with one board. There are concepts that have application across editorial board boundaries but, where that happens, different summary topics are created.

Editorial boards have a name, a board manager, an associated list of members, possible sub-boards, and other things.

2.6.2 Notes

  • Source
    Board names are used in the CDR, cancer.gov, the new EBMS, and probably elsewhere. The data must match the editorial board names used elsewhere.

    Board names need not be physically linked to other systems since the number of boards is very small and volatility is very low. It would be acceptable (I think) to manually update the CiteMS board list if and when a change occurs in PDQ Board designations, even though the same changes are also made manually in other systems.

    However, see the section on linking below.

  • Sub-boards
    Pediatrics and Genetics currently contain smaller groups. The ability to create smaller groups within a board should be built into the system and usable by any board that wishes to do so. Some boards that do not now divide into sub-boards might do so in the future.

    Question: We'll need to figure out just what a sub-board is. What kinds of objects are associated with sub-boards: summary topics, members, categories of evidence, other things?

    Question: Will we ever need sub-sub-boards? Are the editorial boards large enough to ever require it? Do we need to generalize the grouping of boards within boards within boards …?

    Question: Do sub-boards need to be visible to the system at all, or are they better treated as methods of organization outside the system? Are they just informal groups of board members with a particular interest that do not need any special recognition in the system?

    The issues of sub-boards may best be resolved by meeting with all of the board managers together so that they can hear each other's views as well as communicate their own.

    [That's probably a good idea for lots of issues.]

2.7 Review cycles

2.7.1 Purpose

The work of literature surveillance is broken down into monthly cycles. Each batch of citations entered into the system is identified as belonging to the monthly cycle in which it was entered.

The same citation processed in one monthly review cycle can also be associated with a different review cycle for another board or another Summary Topic.

Review cycles are heavily used in reports.

2.7.2 Notes

  • History
    The notion of a "review cycle" is inherited from the pre-CiteMS processes.

    Review cycles permeate the system but their actual use is more significant to some kinds of users than others. They may play a larger role in the import and initial review process, before citations get to board managers, than they do after that.

    Review cycles are roughly, but not rigorously, synchronized with calendar months. The work of a review cycle might take five weeks or three weeks and a new review cycle might be established a little earlier or later than the actual first day of a month.

    Question: What are all the tasks that are currently slaved to review cycles?

    Question: What would happen to the tasks currently organized into review cycles if they were freed from that organization and allowed to "float" on more fluid date-time boundaries?

  • "Fast tracked" citations
    Sometimes citations are imported into the system immediately as a result of a board member or manager request, independently of the usual topic searches and, in effect, independent of the review cycles. The concept is analogous to the "hotfix" publishing concept in the CDR.

    Fast tracked citations may be new or may be old, for example old citations from before the first CiteMS became operational that are nonetheless important to have in the system.

    The fast track capability is important and must be preserved.

  • Alternatives to review cycles
    Even in the current system, almost every action pertaining to a citation is date-time stamped. It is possible that we could just rely on date-time stamps for synchronizing events, but treat the interval periods flexibly - using months for some purposes, weeks for others, longer periods for others.

    Alternatively, we could continue the review cycle process but allow all processes that use it to "float" their cycles in whatever way is best for that process.

  • Further analysis
    It's too early to make decisions regarding the future of the review cycle concept. We'll need to answer the questions and flesh out how alternatives would work before we have enough information to decide whether we want to keep the review cycle, strengthen it, weaken it, or adopt alternatives either in place of or in parallel with review cycles.
  • Source
    Devised by the lit surveillance group.
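
As a small illustration of the date-time stamp alternative, the sketch below groups event time stamps into months for one purpose and weeks for another, with no stored review cycle at all. The function names are hypothetical and the use of ISO week numbers is an assumption made only for the example.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def bucket(ts: datetime, period: str) -> Tuple[int, int]:
    """Map a time stamp to a (year, month) or (ISO year, ISO week) bucket."""
    if period == "month":
        return (ts.year, ts.month)
    if period == "week":
        iso = ts.isocalendar()
        return (iso[0], iso[1])
    raise ValueError(f"unknown period: {period}")


def count_by_period(timestamps: Iterable[datetime], period: str) -> Counter:
    """Count events per bucket, e.g. citations imported per month or week."""
    return Counter(bucket(ts, period) for ts in timestamps)
```

A report that today says "review cycle X" would instead ask for a period, which may or may not line up with calendar months.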

2.8 Summary topics    linked

2.8.1 Purpose

Identify topics that are separately considered in PDQ board work. Most, but not all, of the topics are essentially the same as those in the CDR and cancer.gov, though they sometimes have different names. However they may differ from CDR/cancer.gov topics, for example when certain board members work on a topic, such as psycho-social aspects of genetics testing and counseling, that bears on multiple specific PDQ Summaries but is considered as a single topic for the purposes of editorial board work.

A single citation can be reviewed for inclusion in more than one summary topic.

The objects here are not the summaries themselves, just the topics, but we do need access to the topics.

2.8.2 Notes

  • Editorial board linkage
    Each summary topic belongs to one and only one board. In those cases where two boards had an interest in the same topic from different perspectives, the topic was split into two summary topics, one assigned to each board.

    Because of this, it is not necessary and is probably not desirable to directly associate a citation with an editorial board. Strictly speaking, the citation is associated with a summary topic, and only through the topic, with a board. To store a board association with a citation constitutes a redundant, denormalized, and hence undesirable data representation.

  • Source
    Although summary topics in CiteMS are not one for one with those in the CDR and cancer.gov they should be one for one with those in EBMS.

    Both systems should use the same database table.
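
The normalized representation described under "Editorial board linkage" can be sketched as follows: each topic maps to exactly one board, citations link only to topics, and a citation's boards are derived rather than stored. The function name and the topic IDs used in testing are made up for illustration.

```python
from typing import Dict, List, Tuple


def boards_for_citation(cite_id: int,
                        citation_topics: List[Tuple[int, str]],
                        topic_board: Dict[str, str]) -> List[str]:
    """Derive a citation's boards through its summary topics.

    citation_topics holds (citation ID, topic ID) pairs, many to many.
    topic_board maps each topic ID to its one and only board.
    """
    boards = {topic_board[topic] for cite, topic in citation_topics
              if cite == cite_id}
    return sorted(boards)
```

If a topic is ever reassigned to another board, only the one topic_board entry changes. No citation records need to be touched.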

2.9 PubMed searches

2.9.1 Purpose

Find newly published articles about cancer for review by the boards.

Each search is aimed at a particular summary topic and attempts to retrieve citations that are pertinent to that one topic.

2.9.2 Notes

Searches are very complicated. Sometimes a single search requires multiple pages to print.

Searches change all the time. They are continuously refined.

  • No automatic search execution
    Searches are currently stored outside the CiteMS system. It may be counterproductive to bring them in.

    I presume that Minaxi will execute a search and examine the results. Then she might import the citations or, alternatively, might refine the search in light of what she sees in the search results and then re-execute it. For this reason, it would appear to be a mistake to store searches in the system and have the system execute them without human direction. That would save time in executing each search but would result in retrieving more dross, less topic-accurate results, and might cost more labor time in the long run.

2.10 Full text PDFs?

2.10.1 Purpose

The current method of retrieving full text for use by reviewers is to fetch PDFs. They may be retrieved from one of the online academic journal vendors using the NIH account or, if no machine readable PDF is available, a request is sent to NLM where the document is scanned from a paper journal and sent as a PDF image back to us.

The person in charge of retrieving the full text saves the PDF file after printing it. Some board managers also save their own copy of the PDF on a shared drive for future reference.

It may be desirable to provide for storing all of the PDFs in our database, associating each one with its citation. That would have a number of advantages, including:

2.10.2 Notes

  • Ability to use any search capability in CiteMS to find it.
    Currently, I don't think users are using any search software for PDFs stored in the file system. They use a conventional name for the file and browse for it.

    That could be fixed in the file system using a Microsoft Windows indexing service configured to use an Adobe PDF "iFilter". This might actually be a lot better than anything we could easily provide, though it's possible that we could integrate the Adobe indexer with our database software.

    However it's not clear that full text indexing is required or would be used in our application. We are typically not searching for material on a particular topic but rather searching for a particular article by its bibliographic description - something we'll have in the CiteMS.

  • Ability to click on a link in the citation to see the full text.
  • Easy access for all people with access to CiteMS.
    This could include physician Board Members if desired.
  • No need for multiple people to save the PDF separately.
    Unless they want their own copy for some reason, they can just access it through CiteMS.
  • Copyright issues
    I don't know what copyright implications might exist on a practice like this, and whether they are better or worse for us than what we're doing now.

    Copyright issues might also be affected by organization boundaries. What NCI chooses to do might be a problem for contractors or vice versa. If so, we might be constrained to the most restrictive requirements of any organization using the system.

    Question: Is there an expert at NCI who can advise us on what constitutes "fair use" of PDF files?

2.11 Queues (or states, or workflow)

2.11.1 Purpose

A citation may be in one or more queues awaiting further processing.

The notion of objects moving through states or queues is also known as "workflow".

2.11.2 Notes

Queues share a number of properties.

It might be desirable to have common workflow oriented software to do much of the processing for all queues. It might even be desirable to have one common data structure with multiple views or slices, each representing a different logical queue.

  • Properties of queues
    Some properties of queues include:
    • Name of the queue
      "Awaiting assignment to Summary Topic" "Awaiting initial review" "Awaiting full text retrieval" etc.
    • Responsible agent
      There may be a board, a department, a committee, or a person who is expected to deal with items in a particular queue, and who should receive both the lists of items in the queue and the alerts generated when an item is not being processed in a timely fashion.
    • Time limit on the queue
      How long should items be allowed to remain in the queue before action is required or alerts generated? Maybe this is variable, dependent on the object (e.g., book citations get more time than articles (I'm making that up)).

      Maybe individual items can have their time limits increased on a case by case basis.

    • Next steps
      Software needs to know what happens to an item when whatever action is required by the item's being in the queue is complete. Obviously, the next steps are conditional based on what action was taken in the queue.
  • Properties of individual entries in a queue
    Each object in a queue has certain properties attached to it by virtue of its being in the queue.
    • Identifier of the object in the queue
      Typically it's a reference to a citation. There may be multiple references to the same citation in different queues or different (or the same) parts of one queue, for example a citation that is being reviewed by two different boards.
    • Person that placed the object in the queue
    • Agency to which the task has been assigned
      This would often be a group of people working on these tasks. Many other times it's an individual. Sometimes it might be a group which then further assigns it to an individual.
    • Date-time of entry into the queue
    • Due date-time
      After the due date alerts might start appearing if the task is not complete.
    • Date-time completed
    • Final disposition
  • Auditing
    All of the queue/workflow related actions that occur should be logged. That enables us to find out the history of our processing of a citation and get throughput reports (How many cites did we … last month, etc.)
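
One possible shape for the common queue data structure is sketched below, using the queue and entry properties listed above. The names Queue, QueueEntry and overdue are hypothetical, and next steps and audit logging are left out. The due date-time defaults to entry time plus the queue's time limit but can be set per item.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Queue:
    name: str               # e.g. "Awaiting full text retrieval"
    responsible_agent: str  # board, department, committee, or person
    time_limit: timedelta   # default time an item may wait in the queue


@dataclass
class QueueEntry:
    queue: Queue
    citation_id: int        # the object in the queue
    placed_by: str          # person that placed the object in the queue
    assigned_to: str        # group or individual the task is assigned to
    entered: datetime
    due: Optional[datetime] = None        # per-item override of the time limit
    completed: Optional[datetime] = None
    disposition: Optional[str] = None     # final disposition, once complete

    def __post_init__(self) -> None:
        # Without an explicit due date-time, apply the queue's default limit.
        if self.due is None:
            self.due = self.entered + self.queue.time_limit


def overdue(entries: List[QueueEntry], now: datetime) -> List[QueueEntry]:
    """Entries that are still open past their due date-time (alert candidates)."""
    return [e for e in entries if e.completed is None and now > e.due]
```

Each logical queue is then just a view of the entries filtered by queue name, which is one way to get "one common data structure with multiple views".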

2.12 Status information

2.12.1 Purpose

A great deal of what the CiteMS does is tracking the status of citations. Status values tell us what the decisions have been regarding a citation - what we are going to do with it and what has been done with it in the past.

We can think about different kinds of citation status as disparate things, but there are a lot of attributes and properties of all of the different kinds of status that can be seen more abstractly as common to all of them. We should therefore probably have a unified concept of what is a status value, and how we represent and process them in the system.

Status information is intimately bound up with queues and workflows (see above) and with auditing (see below).

2.12.2 Notes

  • Examples of status values
    Some examples might include:
    • At import time
      • Awaiting initial review
      • Accepted
      • Rejected
    • At board manager review time
      • Awaiting board manager review
      • Awaiting full text retrieval
      • Retrieval complete, awaiting disposition
      • Assigned for board member review
    • At board member review time
      • Awaiting board member review
      • Board member response received
    • etc.
  • Status components
    A single status item might contain the following components
    • Status ID
      The unique identifier of a value in a control table for status values. Changing the human readable representation of a status value (for example from "Retrieved PDF" to "Retrieved full text"), would not require changing thousands of status records, just the string in one control table entry.
    • Citation ID
      The unique identifier of a citation.
    • Person ID
      ID of the person who assigned this status. Perhaps status can also be assigned by a group, a department, or a program.
    • Date-time assigned
      The date-time stamp for when the citation acquired this status.
    • Comment
      It should be possible to enter a comment to any status value, for example, "I changed the status from accepted to rejected because Sharon said we shouldn't be citing any articles from this journal."

      Question: Is it necessary or desirable to be able to enter more than one comment for a status value, possibly with each comment having its own author and date-time independent of the date-time that the status was assigned?

      Question: If we do need multiple comments, do we need the ability to add comments to old status values? For example, Cynthia assigns a status to a citation and passes it on to Robin. Robin does something to the record that assigns a different status. Then she notices that Cynthia's comment had a question in it. Can she attach a comment to the earlier status value in direct answer to Cynthia's question?

      This is not a trivial issue because there might be programs that look at status history sequentially. If comments are not in the same date-time order as the status assignments they are associated with, this could cause confusion.

      Question: Should comments be editable? If so, should they only be editable by the person who entered the comment?

      Note: whatever we do with comments should be flexible and powerful, but also simple and straightforward. As always, these goals are in conflict with one another.

  • Status history
    Every citation has a status history. The citation moves from one status to another. It may go back to a previous status but, if so, it still has a history that shows it once had what would normally have been a later status.

    The history has to be displayable, as it is in the existing system. Any functions that allow a user to assign a status must also be able to show the user the complete status history, including the person, date-time, and any comment(s) as well as the status value.

  • Status currency
    It might be the case that every citation has one and only one current status. Being in one status may imply certain previous status values. For example, a status like "Awaiting board member review" implies that statuses like "Accepted for inclusion" and "Full text retrieved" are almost certainly in the history for this citation, but there is only one status that applies right now.

    Question: Is that right? If it is right, we probably get a system that is simpler, easier to program, and easier to learn and use. Everything becomes sequential without complex parallelisms. However we don't want to oversimplify. This is an important decision.
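
The status components, history, and currency described above might be sketched as follows. This is only an illustration under stated assumptions: the names StatusRecord, StatusHistory, and STATUS_LABELS are hypothetical, the status IDs are invented, and the sketch assumes the "one and only one current status" answer to the question above, which has not been decided.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical control table: status ID -> human readable label.
# Renaming a label here changes no status records.
STATUS_LABELS = {
    1: "Awaiting initial review",
    2: "Accepted",
    3: "Awaiting full text retrieval",
}


@dataclass(frozen=True)
class StatusRecord:
    status_id: int                 # key into the control table
    citation_id: int
    person_id: int                 # who assigned the status
    assigned_at: datetime
    comment: Optional[str] = None  # single comment (multiple comments are an open question)


@dataclass
class StatusHistory:
    citation_id: int
    records: list = field(default_factory=list)

    def assign(self, status_id, person_id, assigned_at, comment=None):
        """Append a new status; earlier records are never altered."""
        if status_id not in STATUS_LABELS:
            raise ValueError(f"unknown status id {status_id}")
        record = StatusRecord(status_id, self.citation_id, person_id,
                              assigned_at, comment)
        self.records.append(record)
        return record

    def current(self):
        """Assumes one current status: the most recently appended record."""
        return self.records[-1] if self.records else None

    def display(self):
        """Full history with label, person, date-time, and comment."""
        return [
            (STATUS_LABELS[r.status_id], r.person_id,
             r.assigned_at.isoformat(), r.comment)
            for r in self.records
        ]
```

Because records only store the status ID, editing one entry in the control table changes how every past and present status is displayed.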


2.13 Descriptive tagging

2.13.1 Purpose

It is desirable to be able to tag citations with keywords or phrases chosen from a controlled vocabulary and then use those tags in subsequent searching. This could be used for a variety of different purposes, enabling the person who tags the citations to efficiently put citations into pigeon holes from which they can be easily found again.

There is no exact analog to this in the existing system. The closest thing to it is the addition of free text notes in a restricted form that a user can search for at a later time. For example, Cynthia might put the text "MXT" somewhere in the note field for a citation with the intent that Minaxi will search for that string and look at all of the citations marked that way. However the use of free text in a note field is not a reliable system for this. It is a workaround for a capability that is not present in the existing system but could be added to the new one.

It might be useful to treat tags in a somewhat similar way to the way we treat status values. See the notes below.

This is an experimental concept.

2.13.2 Notes

  • Vocabulary control
    Tags are most useful if they come from a controlled vocabulary. It would defeat the purpose if people used different character strings to represent what is really a single concept. So some kind of vocabulary control is necessary.

    The vocabulary should be under the control of users, perhaps only the system administrators, who could add or edit tags. Perhaps new tags should be something that all of the board managers discuss before they are added.

    Question: Do we need to partition the vocabulary, e.g., hierarchically or by editorial board or by place in the process (import tags, CIPS review tags, etc.)? I'm thinking that the initial answer should be No. Let's not make a big deal of an informal and experimental process. However experience might later make a more structured system desirable.

  • Information associated with a tag
    It might be desirable to keep the kind of information with a tag that is similar to the kind kept for a status value. For example, a date-time stamp, a userid, a citation id, and even a comment.

    It's not hard to envision situations in which a comment would be useful. For example, a reviewer might decide that she doesn't want to make a decision about a citation now. She needs to see if any better citations are in the import queue on the same topic. She tags the citation with a tag like "Hold for further review" and adds a free text comment to help her remember why she held up the citation, e.g., "Wasn't there an article by D'Amato on this topic? Is it better?"

    Later, she calls up all citations with the tag "Hold for further review" and sees what it was that she was thinking about when she tagged the article.

    In this example tags are being used somewhat like status values, but in a less formal way. The tag does not affect how the citation is processed by software. It's a note from a user to him or herself, or to another user, not a marker that is significant to the CiteMS program.

  • Multiple occurrences
    Unlike status values, we would allow a citation to accumulate any number of tags.
  • Deleting tags
    Question: Should a user be able to delete tags? Probably yes. Old values will still be in the audit trail. A user could add or delete them at will.

    Question: Should a user be allowed to delete tags added by a different user? Probably yes. This is an informal process and we can probably benefit from trusting each other to do the right thing. In any case, a mistake would not send a citation into the virtual dustbin or out to a board member.
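
The tagging notes above could be sketched roughly as follows. Everything here is an assumption made for illustration: the class names TagStore and TagAssignment, the in-memory lists, and the "probably yes" answers to the deletion questions (any user may delete any tag, and old values survive only in the audit list).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TagAssignment:
    tag: str
    citation_id: int
    user_id: str
    assigned_at: datetime
    comment: str = ""


class TagStore:
    """Hypothetical in-memory store for tags from a controlled vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)  # the only strings accepted as tags
        self.assignments = []              # tags currently on citations
        self.audit = []                    # (action, user_id, assignment); never pruned

    def add_tag(self, tag, citation_id, user_id, when, comment=""):
        if tag not in self.vocabulary:
            raise ValueError(f"{tag!r} is not in the controlled vocabulary")
        assignment = TagAssignment(tag, citation_id, user_id, when, comment)
        self.assignments.append(assignment)
        self.audit.append(("add", user_id, assignment))
        return assignment

    def delete_tag(self, tag, citation_id, deleted_by):
        """Remove a tag regardless of who added it; return the number removed."""
        removed = [a for a in self.assignments
                   if a.tag == tag and a.citation_id == citation_id]
        self.assignments = [a for a in self.assignments if a not in removed]
        self.audit.extend(("delete", deleted_by, a) for a in removed)
        return len(removed)

    def citations_with(self, tag):
        """Citation IDs currently carrying the tag, in ascending order."""
        return sorted({a.citation_id for a in self.assignments if a.tag == tag})
```

A citation can accumulate any number of tags here, and a misspelled tag is rejected rather than silently creating a second string for the same concept.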

2.14 Audit trail information

2.14.1 Purpose

Every action in the system should be recorded. The record can be used for reports and can be invaluable for diagnosing problems. Seeing what happened in what order is essential when something goes wrong.

Sometimes an action will be taken and then rescinded. A status might be assigned to a citation and then taken away again. The status effectively disappears. But the audit trail should show that the status was assigned, by whom, and when, and should show the same information for the de-assignment. Information that had an ephemeral life in the system would leave a permanent record in the audit trail.

2.14.2 Notes

  • Centralized auditing
    It's useful to have a single place where auditing is done in order to get consistent information.
  • Serialization
    It is valuable to record events in such a way that we see the exact order of operations. One way to do that is to record two date/time stamps, one at the initiation of an operation and one at the end. If two operations by two users overlapped, we'll have a record of that in the trail.

    We might want to use some inter-process communication method to ensure that only one timestamp is recorded at a time.

  • Contents
    It is useful to have certain common information recorded identically in every log entry, for example, start and stop times, person, citation, board, and action. Other information might be transaction specific.
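
One way to picture centralized, serialized auditing is the sketch below. It is not a design decision. The AuditTrail class and its field names are hypothetical, and a threading.Lock only serializes stamps within one process; a multi-process system would need a shared mechanism such as a database sequence. The start_seq and end_seq counters are an added assumption, used so that ordering does not depend on clock resolution.

```python
import itertools
import threading
from contextlib import contextmanager
from datetime import datetime, timezone


class AuditTrail:
    """Hypothetical single point through which every action is logged."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seq = itertools.count(1)
        self.entries = []

    def _stamp(self):
        # Only one stamp is issued at a time, so sequence numbers are strictly ordered.
        with self._lock:
            return next(self._seq), datetime.now(timezone.utc)

    @contextmanager
    def record(self, person, action, citation_id=None, board=None, **details):
        start_seq, start = self._stamp()
        entry = {
            # Common information, recorded identically in every entry.
            "start_seq": start_seq, "start": start,
            "end_seq": None, "end": None,
            "person": person, "action": action,
            "citation_id": citation_id, "board": board,
            # Transaction specific information.
            "details": details,
        }
        try:
            yield entry
        finally:
            entry["end_seq"], entry["end"] = self._stamp()
            with self._lock:
                self.entries.append(entry)
```

Two operations overlapped if one entry's start_seq falls between the other's start_seq and end_seq.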

3 Functionality in the New CiteMS

3.1 User authentication

3.1.1 Purpose

Determine who is able to use the system and what they are allowed to do.

3.1.2 Functions

Authentication requirements include:

  • Control whether a user can access the system at all
  • Control access to system functions
    Users with different roles need access to different functions.
  • Consider editorial board and summary topic in granting access to functions
    Sometimes access to a function is also dependent upon the user being associated with a specific editorial board or summary topic. See the section above on Data / Roles and Permissions.
  • Control access to user specific information, such as profiles
  • Provide single sign-on with EBMS and possibly other systems
    Many users will need access to both EBMS and CiteMS. It might be desirable if, when a user logs into one, he is automatically logged into the other.

    Or maybe not. Maybe this is not a big deal.
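
The access rules listed above (system access, function access, and access that depends on editorial board or summary topic) might be checked along these lines. This is a sketch only: Grant, User, and is_allowed are hypothetical names, the function names and the "Genetics" board are invented for the example, and the real rules belong to the Roles and Permissions data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Grant:
    function: str                 # e.g. "assign_reviewers" (illustrative name)
    board: Optional[str] = None   # None means the grant applies to any board
    topic: Optional[str] = None   # None means the grant applies to any topic


@dataclass
class User:
    user_id: str
    active: bool = True           # False blocks access to the system at all
    grants: set = field(default_factory=set)


def is_allowed(user, function, board=None, topic=None):
    """Return True if some grant covers this function for this board and topic."""
    if not user.active:
        return False
    for grant in user.grants:
        if grant.function != function:
            continue
        if grant.board is not None and grant.board != board:
            continue
        if grant.topic is not None and grant.topic != topic:
            continue
        return True
    return False
```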

3.2 Import citations

3.2.1 Purpose

Load citations from PubMed into our system making some initial dispositions for each citation.

We do not now and may not ever import citations from any other source. However it intuitively seems like a bad idea to design the system in such a way that it is impossible to add citations from another source.

3.2.2 Functions

  • Search for new citations (outside CiteMS)
    The first step in importing citations is to search PubMed for new citations relevant to the mission of the editorial boards. This is done outside of and prior to any activity in the CiteMS itself. This process should continue to work as before, outside the system.
  • Import citations from PubMed search results file
    For the future, the file should be in XML format rather than Medline format.
  • Assign all citations to a review cycle
    Normally done for the entire batch at once. But see also notes on Review Cycles under Data above.
  • Assign each citation to one or more editorial boards
    Normally but not necessarily, all citations entered in a batch might be assigned to one board with later additions or modifications made individually if needed.
  • Assign each citation to a Summary Topic
    In the most common case, searches are conducted for specific summary topics and all citations for one search are imported and automatically associated with that topic.

    However it is possible to import citations without specifically naming a summary topic and to assign them to topics afterwards. Any citation can be re-assigned to a different topic or can be assigned to additional topics after import.

  • De-duplicate imports
    Different PubMed searches will often retrieve overlapping sets of citations. The import program has to check each citation, currently by PubMed ID (but conceivably by other external ID if we ever support import from other sources) and not import citations that are already in the database.

    Question: If a citation imported for topic A is later imported again for topic B, does/should the system automatically assign the new topic B to this citation?

    We need to define exactly what happens in a de-duplication process. We may need to read code and run some imports in the test database to follow exactly what happens when a duplicate is discovered.

    The existing system does not currently allow a duplicate citation to re-appear in a board manager's input queue. This is not always for the best.

    Question: Can we figure out what needs to be done in these cases - both the default cases and the exceptional cases?

  • Replace pre-Medline citations

    Searches conducted at different times may produce "pre-Medline" records that have not yet been indexed or reviewed for quality control. These records will be replaced in PubMed by full Medline records. The system needs to import these when they are available and replace the provisional records.

    Currently, this happens when two versions of a record appear in two different searches, or when a user (Minaxi) runs a report to identify pre-Medline records and download and import updates. However we might want to automatically download the full records when they become available without requiring searches or human intervention. This could be done in a batch, periodic process that connects directly to PubMed and retrieves the new versions. This might be a process that periodically updates records in general, whether or not the update is due to replacement of a pre-Medline record.

  • Managing special imports
    Most citations are imported as a result of searches for new citations in PubMed, but there are other ways that citations can be located for import.

    Sometimes retrospective searches are done. For example, a new search or an important refinement of a search might be executed asking for documents from the last X years instead of just the last month.

    Sometimes individual citations are imported, for example from a list supplied by a board manager or board member. In some cases these might be articles published before the inception of the old CiteMS that were never imported but are important in the history of cancer research.

    There are some special problems posed by special imports.

    Question: We need to list these and be sure we are properly handling them.

  • Import journal titles from PubMed
    This is not part of the current CiteMS but should be. Thousands of citations were loaded into the system with no journal title at all.

    When a PubMed citation is imported the citation contains an ISSN and an NLM journal title unique ID. If the title is not in our journal title table, then the system should automatically fetch and add the journal title to our journal title table, linking the citation to the title.

    There should probably also be batch processes that can be run from time to time, even if only once or a few times a year, to update the journal titles, replacing modified records by updated versions that match the journal title IDs of existing records.
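
To make the de-duplication, pre-Medline replacement, and journal title steps above more concrete, here is a rough sketch. It assumes records have already been parsed into dictionaries; the keys "pmid", "nlm_journal_id", and "medline_status" are hypothetical, as are import_batch and the fetch_journal_title callable. Whether a duplicate should pick up the new topic is an open question in the text, so it is a parameter here rather than a decision.

```python
def import_batch(db, journals, records, topic, fetch_journal_title,
                 add_topic_on_duplicate=True):
    """Import parsed records into db (pmid -> citation dict).

    journals maps NLM journal ID -> title and is filled in on demand.
    Returns which PubMed IDs were added, skipped as duplicates, or replaced.
    """
    summary = {"added": [], "duplicates": [], "replaced": []}
    for rec in records:
        pmid = rec["pmid"]
        journal_id = rec["nlm_journal_id"]
        if journal_id not in journals:
            # Title not in our table yet: fetch it and add it.
            journals[journal_id] = fetch_journal_title(journal_id)

        existing = db.get(pmid)
        if existing is None:
            db[pmid] = {**rec, "topics": {topic}}
            summary["added"].append(pmid)
            continue

        topics = existing["topics"] | {topic} if add_topic_on_duplicate else existing["topics"]
        if (existing["medline_status"] == "pre-Medline"
                and rec["medline_status"] == "Medline"):
            # Full Medline record replaces the provisional one.
            db[pmid] = {**rec, "topics": topics}
            summary["replaced"].append(pmid)
        else:
            existing["topics"] = topics
            summary["duplicates"].append(pmid)
    return summary
```

Returning the summary gives the importer a place to report duplicates instead of dropping them silently, which may help with the exceptional cases raised above.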

3.3 Initial (pre-"publishing") review

3.3.1 Purpose

In the current system a user executes searches of PubMed and imports citations. Then each citation is reviewed for scientific value and relevance. Citations can be accepted or rejected during this review.

Rejected citations remain in the citation database but are not further processed. They do not appear in a board manager's queue of citations to review. They will not be seen again unless they are specifically called up later.

Accepted citations continue on to be seen by board managers and, perhaps, by board members.

This is considered to be part of the import process. Only when all citations have been reviewed are the results of the review "published". At that time the system forwards all accepted citations into the queues of board managers and the system turns to the next import cycle.

3.3.2 Functions

  • View citations that have not yet been marked as accepted or rejected
    The system presents a list of such citations to the reviewer who can mark them one way or the other, tag them, or leave them in the queue for later review.
  • Searching
    The review process has to be flexible with regard to the citations that are presented to the user and the order in which they are presented. When the work load is high, the reviewer might decide to look in the most likely places for important articles first. For example, she might search for articles from the most prestigious journals, or articles reporting the results of randomized, double blind, placebo controlled trials, or articles on particularly important topics of the day.

    Or alternatively, the user might just plow through the queue of citations in date or internal ID order.

  • "Publishing"
    The existing process retains all citations in a state where they are only visible to the import reviewer. When all of the citations have been reviewed, the reviewer "publishes" them, releasing the accepted citations for board manager review, and sends an email to the board managers informing them that the citations in the most recent review cycle are now available.

    We don't really know whether that is the best way to release citations. It might be. Or it might be desirable to release singly, immediately upon review, or in batches smaller than a complete import cycle.

    The batch release approach does have practical value. The reviewer can provisionally accept or reject an article and then change her mind based on what she sees in subsequent citations. By holding everything up, both the early reviewed and the late reviewed citations can benefit from the reviewer's slowly acquired knowledge of the entire review cycle batch.

    However, clearly, there are citations that are obviously acceptable. Keeping them in the import reviewer's queue holds up their availability to board managers and scientific reviewers.

    I therefore propose that we support batch publishing of citations but allow the reviewer to flexibly determine what is in the batch. For example, the reviewer might release an article immediately in a "batch" consisting of one citation, or use "tags" (q.v. in the Data section) to mark numbers of citations that are to be released together.
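
The proposal for reviewer-defined batches might look something like this sketch. The citation dictionaries, the publish function, and the selector helpers are all hypothetical; the point is only that "a batch of one", "everything with this tag", and "the whole cycle" can share one release mechanism.

```python
def publish(citations, selector):
    """Release accepted, unpublished citations chosen by selector.

    Returns the IDs released in this batch. Rejected or undecided
    citations are never released, whatever the selector says.
    """
    released = []
    for citation in citations:
        if (citation["decision"] == "accepted"
                and not citation["published"]
                and selector(citation)):
            citation["published"] = True
            released.append(citation["id"])
    return released


def by_tag(tag):
    """Select citations carrying a given tag."""
    return lambda citation: tag in citation["tags"]


def by_ids(*ids):
    """Select specific citations, e.g. a batch of one."""
    wanted = set(ids)
    return lambda citation: citation["id"] in wanted


def everything(citation):
    """Select the whole queue, as in the existing batch release."""
    return True
```

For example, publish(queue, by_ids(2)) releases one obviously acceptable article immediately, while publish(queue, everything) at the end of the cycle releases whatever accepted citations remain.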

3.4 Initial CIPS review for full text retrieval

3.4.1 Purpose

A board manager looks at each imported citation and determines whether it is desirable to see the full text or not. Seeing the full text doesn't guarantee that the article will be sent on for review by board members, but it is a necessary prerequisite for such a review.

3.4.2 Functions

  • View citations which have not yet been marked for full text retrieval
    The system presents a list of citations to the board manager for which no full text retrieval decision has yet been made.

    It may be desirable to allow flexible sorting of these citations, for example: order by summary topic, by author, by journal title, or possibly other criteria.

    It may be desirable to allow the board manager to select the number of citations to display on one HTML "page", and whether to include abstracts. See search and display functions.

    Question: In the current system the citations are divided by review cycle. Will that be desirable in the new system? There are other review cycle issues and questions like this in other parts of this document.

  • Mark articles regarding full text retrieval
    For each citation, the system provides form controls to mark it as requiring full text retrieval or not requiring it. The user may select full text retrieval, no full text retrieval, or no action at all.
    • Mark for full text retrieval
      Enqueues a notification for this citation to the team that retrieves full text. The citation will continue in the work flow and will not appear again in the list of citations awaiting full text retrieval determination for this board.

      Question: Should the full text retrieval indication for a citation be board and summary topic specific?

      Assuming that it is specific either to a board or a board + topic, we should probably include information in the display showing the past history of full text retrieval for this citation. This might help a board manager in deciding whether to request it for the current board and topic, and also reduce the time and effort needed to retrieve the full text since we either already have it or are in the process of getting it for another board or topic.

    • Mark for no full text retrieval
      Marking for no full text retrieval means that the citation will remain in the CiteMS database but has reached the end of its work flow processing, at least for the current board and/or topic. It will not appear on any more task lists for this board and/or topic unless a user specifically calls it up for further processing.
    • Leave citation unmarked
      If no determination is made, the citation remains in the queue for full text retrieval determination. The next time the board manager asks to see citations awaiting such determination, the citation will appear again.
    • All / None / Reset controls
      The current system allows a user to select all citations in the list at once for marking or to reset all checkboxes back to default values. Similar controls might be useful in the new system.
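
The three possible outcomes for each citation, and the All / None / Reset style controls, could be sketched as below. The names Decision, apply_decisions, and mark_all are hypothetical, and the sketch assumes the decision is specific to a (citation, board) pair, which is still an open question above.

```python
from enum import Enum


class Decision(Enum):
    RETRIEVE = "retrieve full text"
    NO_RETRIEVE = "no full text retrieval"
    UNMARKED = "no action"


def apply_decisions(pending, decisions, retrieval_queue):
    """Process one submitted form.

    pending: list of (citation_id, board) awaiting a decision.
    decisions: dict mapping (citation_id, board) -> Decision.
    retrieval_queue: list that receives citations needing full text.
    Returns the list still awaiting a decision.
    """
    still_pending = []
    for key in pending:
        decision = decisions.get(key, Decision.UNMARKED)
        if decision is Decision.RETRIEVE:
            retrieval_queue.append(key)   # notify the retrieval team
        elif decision is Decision.NO_RETRIEVE:
            pass                          # end of work flow for this board
        else:
            still_pending.append(key)     # shown again next time
    return still_pending


def mark_all(pending, decision):
    """'All' control: apply one decision to every citation in the list."""
    return {key: decision for key in pending}
```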

3.5 Retrieving full text

3.5.1 Purpose

The full text of articles that may have implications for the work of the PDQ boards must be read. Full text is retrieved, printed and given to board managers who may proceed further with it.

In the past, the retrieval of full text required xerox copying of printed journal articles. Today, most of it is done by locating PDF format versions and printing them. For those articles for which a PDF format is not available, a request may be sent to the National Library of Medicine where the paper form article will be scanned to a PDF image and electronically transmitted to CIPS.

3.5.2 Functions

  • View the queue of citations awaiting full text retrieval
    It may be desirable for a user to be able to sort these, for example by journal or perhaps by source (Science Direct, Springer, Wiley, etc.) if we have that information for many of the journals.

    Question: How does the existing system signal the requirement for full text retrieval? Does the user who retrieves full text read a screen or must a piece of paper be printed? Should it always be printed? Is it a convenience for the person who retrieves full text or will some people like it and some not?

  • Produce lists of articles for which review is required
    There must be an intelligent way to organize these to assist in retrieval. Perhaps we can record which journals are available online, which are only on paper in the CIPS library, which may require access to the NIH library and then use that information to organize more efficient full text retrievals.
  • Alternative retrieval of full text
    Board managers all have quick and easy access to PubMed and to the full text of articles via NIH's full text retrieval services offered through PubMed links.

    Increasingly, they may not send a request out for full text. They just fetch it themselves. It's fast and easy. It avoids sending a request for full text to be searched for and printed when, after seeing it, the board manager decides not to use it.

    Right now, this appears to be an informal process that is more or less outside the system. We should probably incorporate it into the system so that, whoever retrieves the full text, status information and possibly the full text itself is appropriately recorded.

  • View and print "cover sheets"
    A "cover sheet" is a piece of paper to be attached to a full text printout with information associating the text with a citation in the CiteMS.

    A "cover sheet" is not the same as a "cover letter", which is a form sent out to board members who receive the full text and review it.

  • Mark a citation as having full text obtained
    Done by a staff member when the full text is available. The text is then physically routed to the board manager who requested it.

    Question: I presume that our operation is small enough that we don't need computerized tracking of the paper copy (e.g., Bonnie handed it off to Robin.) Is that right?

    Question: If multiple copies are required to send to multiple board members, when is the copying done? I presume that can't be done yet (i.e. at the time the full text is first acquired) because the board manager has not yet read the full text and determined that this article really does merit scientific review.

  • Store PDF of the article?
    Currently, the person that retrieves full text keeps the PDF file on a hard disk as well as printing paper copies. Some board managers also keep PDF files.

    Question: Would it be useful to store PDFs in a database, perhaps retrievable via the citation and any search that can find the citation?

    Question: Are there legal issues? If we are able to do it within NCI, are we also allowed to distribute the PDF electronically to reviewers, or must we send them paper? Are there special steps we should take regarding agreements with publishers or reviewers, etc.?

    If we do store the full text PDF for an article in a way that is accessible to the system, then all functions that display a citation might have a link to the full text just as they now have a PubMed ID hyperlink that takes them to the PubMed display of the citation.

    Question: If it turns out that we can legally store and access article PDF files for some journals but not others, is it worthwhile to keep information for each title saying whether we can do that and providing full text services around some titles but not others?

3.6 Select articles for scientific review

3.6.1 Purpose

After the full text of an article is retrieved, a board manager must decide whether the article should or should not be sent out for board member scientific review.

3.6.2 Functions

  • View citations that have not yet been marked for review or no review
    A CIPS reviewer should see the same information she sees when making the initial decision about whether to retrieve full text for a citation, and also see controls for entering the review decision and any desired comments.

    As with the initial review, this display is editorial board specific.

    There may be a link to the full text. See "Retrieving full text" above.

  • Mark an article for scientific review
    If the user clicks a control to request scientific review, a screen should appear to enable the article to be assigned to reviewers. See "Assign each citation to one or more reviewers" below.
  • Mark an article as not needing review
    As in the initial review before requesting full text retrieval, marking the article at this point ends the life cycle of the citation. It remains in the database but requires no further processing unless and until a user initiates a new review of the article.
  • Leave citation unmarked
    As in previous stages, leaving an article unmarked leaves it in its current state. It will appear again the next time a list of articles is generated to determine whether scientific review is warranted.

3.7 Assign each citation to one or more reviewers

3.7.1 Purpose

The system knows what board member reviewers are associated with each summary topic on each editorial board. It could automatically assign reviews to all of them. However at least some board managers make individual decisions, for at least some articles, about who should review the article.

The functions described below are intended to provide system support for an assignment process that incorporates individual board manager decisions.

3.7.2 Functions

  • The basis of assignments of citations to reviewers
    A number of factors are considered in determining who gets a citation for review.
    • Editorial board
      The citation is assigned to a member of the specific board for which the review is undertaken.

      Question: The training document says (bottom of page 3) that a staff member can make citation status changes for citations assigned to their own board or to the Adult Treatment board. What does that mean and why is it so?

    • Summary topic
      Each board member is recorded as having expertise in specific summary topics. A citation designated as belonging to a particular summary topic would normally be reviewed by the board members who are recorded as having expertise in those topics.

      There should not be any barrier preventing a board manager from sending an article to any board member for review, whether or not that member is recorded as being expert and interested in the particular topic of the article.

      A board manager cannot assign an article for review to someone who is not a member of her board. However she might forward an article to any board member for their information (typically marking it "FYI").

      Question: Should all the board members associated with a topic always receive every citation to which that topic has been assigned? Should that be automatic?

    • Review cycle?
      Question: What is the relationship of reviews to review cycle?

      The training document says that CIPS staff examines citations from the latest review cycle for each editorial board and summary topic. Shouldn't all citations for which a decision to review has been made, but no assignments been made, appear in any list of citations to be assigned, regardless of their review cycle?

    • Human decisions about who gets a citation
      The final decision about who will receive an article for review should probably always be made by a board manager. The system should produce a list of eligible board members, possibly even with check marks defaulted on for all of those with an association with the summary topic for the article. But the board manager should have the option of altering assignments before submitting the decisions to the system.

      The board manager may make the decision based on such factors as how many board members are qualified to review the article, how many are needed, whether the article has special aspects to it of particular interest to someone who may or may not be associated with the specific summary topic, the current outstanding or past work load of individual board members, and what the history of board member review performance has been in the past.

  • Information presented to a board manager for assignment
    It might be desirable to display all information about relevant board members to a board manager when she is making assignments. For example, the system might display a list of all members of the particular editorial board organized with those who are associated with a summary topic for this citation sorted at the top and the rest in alphabetical order.

    We could configure the system so that, when an article is first assigned for review, all of the reviewers who are associated with the relevant summary topic for the current editorial board will, by default, be marked to receive the article for review. The board manager would only use the screen to make changes to the default settings.

    For each board member we could display something like the following information.

    • Name
    • Whether this citation has already been assigned to this board member
      This is useful if the board manager is considering adding to the list of members who are currently assigned to review the article.

      If we've configured the system to make default assignments, then all of the board members associated with the relevant summary topic will show up as assigned. If the screen is brought up later in order to add or delete a reviewer or FYI person, no defaulting will be done. Otherwise the defaults would overwrite decisions already made.

    • Whether the full text has already been sent [added 3-15-2011]
      If a board manager changes her mind about someone who has been assigned but full text has not yet been sent, the item should be removed from the queue for sending full text to that board member.

      This is all complicated by the Dropbox and other electronic delivery mechanisms for which "sent" and even "queued for sending" full text is an ambiguous concept.

    • Number of summary topics registered for this board member
    • Number of reviews ever assigned
    • Number of reviews ever completed
    • Number of reviews currently outstanding
      This might be just for the current review cycle, the last few cycles, or for a user entered date range.
    • Question
      Question: Is any of this useful? Do board managers consider this sort of thing when making assignments? If so, is it all in their heads anyway and of no use in the system?

      The data to support this sort of display will be in the system anyway and should be fairly easy to retrieve if we want it.

  • Review assignment controls
    We might have a control for each board member on the form. For board members not currently assigned to review the article, the control allows the board manager either to assign the article to them for review, or to send the article to the person "FYI".

    For those members who are already assigned, the control might allow the board manager to remove the assignment or the FYI (this might be useful if someone hasn't yet sent out the full text to the FYI recipient and the board manager changes her mind about whether the article will be of interest for the reviewer.)

  • Additional FYI controls
    It is my understanding that a board manager will sometimes send an article as an FYI to a person who is not on that manager's editorial board.

    We might support this capability, for example by listing all of the other board members after the end of the members of this board, each with a "Send FYI" control.

    If this is rarely used we might just have a control that a user can click to add such a list to the display so that it's not there when it's not needed.

  • Preparation of a cover letter?
    If a board manager sends a customized cover letter to one or more reviewers, the system might prepare a draft cover letter with standard information (see Route … below) to accompany the mailing or emailing.

    The user might then type in additional individual notes to include in the letter.

    Alternatively, the system might output a file that can be read by Microsoft Word, or whatever, for the board manager to completely customize as desired.

  • Completion of an assignment.
    When a user submits the reviewer assignment form, the article moves into the next stage for routing and the user is taken back to the screen for marking documents for review.

    It might be desirable to reconstruct the display to then only show other documents waiting for review assignment. The current article has now completed this stage and no longer belongs on a list of articles for which a decision is pending.

    If the assignment is canceled instead of submitted, the system should take the user back to the screen that includes the citation that was up for assignment. The citation is no longer marked for review. It can be so marked again, taking the user to the assignment screen again, rejected for assignment, or left in the queue for later action.

  • A note on "correctability"
    In this and some other functions there will be cases where a decision might be reversed but a trail needs to be recorded in the system showing what was done. This trail needs to be separate from the current status information so as not to confuse the current status.

    For example, if a citation is marked for review by Joe Oncologist and then later it is decided to send it to Mary Geneticist instead, we may need to keep a historical record of the fact that it was assigned and then later de-assigned to Joe. Reports of how many articles have been assigned to Joe, or what is currently on his plate, shouldn't see the assignment. On the other hand, a board manager trying to trace the history of the disposition of this article needs to see that it was first assigned to Joe, then de-assigned and re-assigned to Mary, and needs to be able to see any associated comments.

    This is achievable if we keep separate status and audit trail tables and point our reports accordingly to the correct source for information.
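
The separation of current status from the audit trail might look something like the following. This is only a sketch to illustrate the idea, not a proposed schema: the table names, column names, and function names are all invented here, and SQLite stands in for whatever database is eventually chosen.

```python
import sqlite3


def open_db():
    """Create an in-memory database with separate status and audit tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE review_status (      -- current assignments only
            citation_id INTEGER,
            reviewer    TEXT,
            PRIMARY KEY (citation_id, reviewer));
        CREATE TABLE audit_trail (        -- append-only history of actions
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            citation_id INTEGER,
            reviewer    TEXT,
            action      TEXT,
            actor       TEXT,
            comment     TEXT,
            at          TEXT DEFAULT CURRENT_TIMESTAMP);
    """)
    return db


def _audit(db, citation_id, reviewer, action, actor, comment):
    db.execute(
        "INSERT INTO audit_trail (citation_id, reviewer, action, actor, comment)"
        " VALUES (?, ?, ?, ?, ?)",
        (citation_id, reviewer, action, actor, comment))


def assign(db, citation_id, reviewer, actor, comment=""):
    with db:  # one transaction: status and audit rows change together
        db.execute("INSERT INTO review_status VALUES (?, ?)",
                   (citation_id, reviewer))
        _audit(db, citation_id, reviewer, "assigned", actor, comment)


def deassign(db, citation_id, reviewer, actor, comment=""):
    with db:
        db.execute(
            "DELETE FROM review_status WHERE citation_id = ? AND reviewer = ?",
            (citation_id, reviewer))
        _audit(db, citation_id, reviewer, "deassigned", actor, comment)


def current_reviewers(db, citation_id):
    """What workload reports should read: current status only."""
    rows = db.execute(
        "SELECT reviewer FROM review_status WHERE citation_id = ?"
        " ORDER BY reviewer", (citation_id,))
    return [r[0] for r in rows]


def history(db, citation_id):
    """What a board manager tracing a citation should read."""
    rows = db.execute(
        "SELECT reviewer, action FROM audit_trail WHERE citation_id = ?"
        " ORDER BY id", (citation_id,))
    return list(rows)
```

In the Joe and Mary example, assigning to Joe, de-assigning him, and then assigning to Mary would leave only Mary in `review_status`, while `audit_trail` keeps all three actions with their comments.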

3.8 Route materials to reviewers

3.8.1 Purpose

Assist in the preparation of materials for mailing or emailing to board member reviewers.

3.8.2 Functions

  • Communicate information to support staff
    For paper communications, support staff need to see what article is to be sent, who will get it, etc.

    The staff will have to procure the printout of the article for mailing.

  • Produce packages for reviewer
    • Paper packages
      If the reviewer prefers paper, or an article is only available in paper, a paper cover letter and perhaps a mailing label need to be printed.
    • Electronic "package"

      If the reviewer prefers electronic delivery the system may be able to prepare the emails and attachments.

      • Determine whether article is available electronically
        We may be able to store info with the journal to tell us this.

        Some journals pose no problems at all. The journal is "open" and free. Others may have copyright issues. See the separate section on PDFs.

      • Determine whether it is best to send a URL, a PDF, or something else
        At least one board manager, Robin Harrison, is using the free "Dropbox" service to make information available to board members. This is convenient, easy to set up, costs nothing for the board members to use, transmits documents transparently without requiring action by the recipient, and avoids clogging email inboxes.

        It seems like a creative approach. Other creative approaches are also possible.

      • Produce a cover letter specific to this package
        See the section above on system support for producing cover letters. Cover letters and response cover sheets might be printed (if printing rather than communicating electronically) at the same time.
        • Addressed to person
        • Boilerplate
        • Descriptions of the content
          • For each article
            • Citation
            • Summary topic
            • Review cycle
            • Description of material (paper, url, page count?)
            • URL if appropriate
            • Special instructions?
            • Comments?
        • Produce return mailer for the response?
          Some reviewers may want a return mailer. Others, maybe all(?), might prefer to return their comments online.
    • Send email
      For emailings, the system can directly send the mail.

      We need to think about how to coordinate that with paper, Dropbox, or other delivery methods for the full text of the articles.
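
To make the cover letter contents listed above a little more concrete, here is a rough sketch of how a draft letter might be assembled from the package data. The class name, field names, and boilerplate text are all placeholders invented for illustration. The real wording and layout would come from the board managers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackageItem:
    citation: str
    summary_topic: str
    review_cycle: str
    material: str                 # e.g. "paper" or "PDF"
    url: Optional[str] = None     # only when the article is sent as a link
    instructions: str = ""


# Placeholder text; the actual boilerplate is not specified in this document.
BOILERPLATE = "Please review the following material and return your response."


def cover_letter(reviewer_name: str, items: List[PackageItem]) -> str:
    """Build a plain text draft that a board manager could then edit."""
    lines = [f"Dear {reviewer_name},", "", BOILERPLATE, ""]
    for n, item in enumerate(items, start=1):
        lines.append(f"{n}. {item.citation}")
        lines.append(f"   Summary topic: {item.summary_topic}")
        lines.append(f"   Review cycle:  {item.review_cycle}")
        lines.append(f"   Material:      {item.material}")
        if item.url:
            lines.append(f"   URL:           {item.url}")
        if item.instructions:
            lines.append(f"   Instructions:  {item.instructions}")
        lines.append("")
    return "\n".join(lines)
```

The same draft text could be printed for a paper package or used as the body of an email.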

3.9 User/reviewer functions

3.9.1 Purpose

Board members do not directly access the existing system. Responses to articles are sent in on paper, or perhaps by fax or email.

We need functionality to support the entry of information received in that way and may also want functions that board members can use directly, through a web browser, to transmit their responses online.

3.9.2 Functions

  • CIPS staff functions
    For review responses received by mail, email, fax, etc., a CIPS staff member needs a function to enter the response into the system.

    The system should record the userids of both the board member who wrote the response and the CIPS staff member who entered the data in the audit trail.

  • Possible reviewer functions
    • Review citation history?
      See the history of actions and decisions relating to a citation - when was it imported, what review cycles, boards, topics, reviewers have been associated with the citation, in what order.

      See what responses have been made by others in the system.

      Question: Should this be accessible to board member reviewers? Or should it only be accessible to CIPS and CIAT staff?

      Question: If this sort of information should be available to board members online, should it be in the CiteMS or EBMS (assuming those two are different)?

      Question: Should board members be able to see each other's responses online?

      Question: How does this relate to EBMS activities? It would seem desirable to handle issues very consistently between the two systems (if they are two systems) and possibly even make the two appear to be one system to board members.

    • Enter responses
      Responses include a combination of selecting from among a fixed list of dispositions that the reviewer is recommending, and free text notes.
    • Enter further comments and notes
      Just like CIPS users, board members should be able to correct mistakes.

      It may not be a good idea to allow them to delete or edit comments or notes they've already made. There could be problems in which someone reads a comment and acts on it, then the comment disappears.

      We might save all comments in the audit trail (q.v.) but that's probably not something that board members would have access to, and if we have access to a board member's comments it seems right that he should be able to see whatever he did that we can see.

    • See his own history - what he's done in the past
    • See his current queue - what's waiting for him to do
    • See the Summary for which a citation is to be considered?
      This may be an EBMS function rather than a CiteMS function. We might, however, provide links to EBMS, or invoke an EBMS function to provide information to the user.
      • Link to cancer.gov to see the published version
      • Link to CDR to see QA or other version of current working doc
      • Link to CDR for old versions?

3.10 Task/queue management

3.10.1 Purpose

There are multiple queues of outstanding tasks. See below for a list of them.

Ideally, these should all be managed in a coherent fashion, possibly sharing a significant amount of software and displays.

  • Logical queues
    Here are some queues or states that a citation can be in.

    See also the discussion of "Queues" in the section on data to maintain.

    • Imported, awaiting assignment to a Summary Topic
    • Awaiting initial board manager review
    • Imported, awaiting full text examination
    • Awaiting assignment to Summary and Reviewer
    • Ready to send to Reviewer
    • Out for review
    • Returned from review, awaiting final disposition

3.10.2 Functions

  • Task grouping?
    One task is composed of subtasks which may in turn have subtasks, recursively.

    Higher level task completes when all subtasks are complete.

  • Task editing
    We need to be able to edit tasks, deleting or modifying them after they're already enqueued.
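
The recursive task grouping idea can be sketched in a few lines. This is an illustration only, with an invented `Task` class. It assumes that only leaf tasks are marked done directly and that a parent's completion is always derived from its subtasks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    done: bool = False                      # meaningful only for leaf tasks
    subtasks: List["Task"] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A leaf is complete when marked done; a parent when all subtasks are."""
        if not self.subtasks:
            return self.done
        return all(t.is_complete() for t in self.subtasks)
```

For example, a "send package" task with "print cover letter" and "mail article" subtasks would report itself complete only after both subtasks are marked done.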

3.11 View citation history

3.11.1 Purpose

It is necessary to be able to see every action taken regarding a citation. A citation history screen should show this, broken down into separate sections by editorial board.

3.11.2 Functions

  • View history
    Each of the actions taken should be listed in chronological order, showing the date, action, person, and comments.

    The existing system organizes the history by action, showing all actions of a given type in one block, ordered chronologically (I think), then all actions of the next logical type, and so on. Presumably this works very well for most citations.

    Other display orders might possibly be better for some citations or some uses. We need to think about whether alternative orderings are desirable.

  • Add new actions
    The existing system presents controls to the user right on the history display for adding a decision, status, or review response to the document. This seems like a very logical approach because it ensures that the user can see everything that has been done to a citation before doing something new.

    If there are some users who should be able to view history but not change it, then we might have the same history screen with and without user controls based on the role of the logged in user.

    The user manual for the existing system notes that actions must be added without skipping any steps in the workflow. We should not, for example, be able to record a review response for an article that has not been sent out for review.

    Question: This seems like a very convenient function but it would seem to result in two different ways to do some of the same actions. Is that a good thing? Is it avoidable?
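
A check that actions are added without skipping steps might be sketched as follows. The state names are my own simplification of the queue list in 3.10.1, and the sketch assumes a strictly linear workflow, which may well turn out to be too simple for the real system.

```python
# Hypothetical, simplified workflow order (an assumption, not a requirement).
WORKFLOW = [
    "imported",
    "manager_review",
    "full_text_review",
    "awaiting_assignment",
    "ready_to_send",
    "out_for_review",
    "returned",
]


class WorkflowError(Exception):
    """Raised when an action would skip or repeat a workflow step."""


def advance(current: str, new: str) -> str:
    """Return the new state if it is the immediate successor of current."""
    cur_i = WORKFLOW.index(current)   # ValueError for an unknown state
    new_i = WORKFLOW.index(new)
    if new_i != cur_i + 1:
        raise WorkflowError(f"cannot go from {current!r} to {new!r}")
    return new
```

Both the history screen and the regular workflow screens could call the same check, which would at least keep the two ways of doing an action consistent.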

3.12 Manage users

3.12.1 Purpose

Users enter or leave the system from time to time. System administrators and/or board managers need to be able to add users, modify their profiles, and inactivate them.

It may also be desirable for users who have no authority to manage records pertaining to other users to nevertheless have some ability to modify some aspects of their own profiles.

It is not clear that we need to implement any functions at all in this area in CiteMS since the new Electronic Board Member System should already do everything we need.

3.12.2 Functions

Functions required (perhaps via EBMS) include:

  • Add a user
  • Edit user profile
    Personal info, email address, system preferences, board membership, topics, etc.
  • Inactivate user
    We can't actually delete users from the system if any actions have been linked to them. Their records need to remain accessible, though they have to be inactivated so that their userids cannot perform additional actions.

3.13 Search

3.13.1 Purpose

Find information in the system.

3.13.2 Functions

  • Citation searching
    The existing system has a very powerful search capability, providing the following functionality, all of which should be replicated in the new CiteMS.

    See the search form in the existing system. (http://citems-dev.nci.nih.gov/StaffSearch.asp)

    • Access points
      There are currently 18 different ways to search for citations on the search form. There wouldn't seem to be any reason not to provide all of them in the new system.
    • Value entry with wild cards
      For free text entry fields allow the use of wild cards in the search strings. Title searches, for example:

      "%robot%prostatectomy%"

      Some free text entry searches will be different in the new system. For example, if we use the XML citations from NLM much more author information is available, and it's available in separate author elements rather than jumbled together in a string.

      Character sets will also be different and we'll need to think about how to index authors and titles in non-English languages.

    • Combinatorial logic
      For selection list fields it is possible to select multiple values to be OR'd together.

      All fields are then AND'd together to perform the search. This allows the user to combine all of the search criteria with boolean ANDs and to sort the results by any of six fields.

    • Sorted output
      The existing system allows user selectable sort orders for the output.
  • Search for users
    An administrator or maybe all users should be able to search for users by various criteria
    • Search criteria
      • Name
      • Type of user (board member, manager, CIPS staff, CIAT, etc.)
      • Editorial board membership
      • Summary topic
  • Outputs to a file
    It's not a bad idea to have the search system, which is very flexible in the old system and should be equally so in the new, be able to direct output to a report file, probably also to a spreadsheet compatible file, possibly in CSV format.
  • Saving searches / reports
    It would be useful for a user to be able to save a search for re-execution whenever desired.

    NOTE: The existing CiteMS has some capabilities in this regard. I've seen it in the import system, maybe also the web based system. I need to study what's there.

    This should work both for citation searches and user searches.

    • Accessibility
      Saved search criteria might have different scopes, for example:
      • Local saves
        Created by a user for his own use. Saved in his private space with whatever name he assigns.
      • Global saves
        Created by a sysadmin (or user?) for anyone to use, accessible to all.
      • Permissions?
        Question: Do we need to store permissions with a search, i.e., this search is for Admin users, this one is for board members and admin users, this one is for CIPS staff, this one is for everyone?
  • NOT lists
    The existing system has the ability to exclude journal titles from a search results list based on the title being in an editorial board specific NOT list.

    It appears to be a useful capability and should be replicated.
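
The citation search logic described above (wild cards in free text fields, OR within a selection list, AND across fields, NOT list exclusion, and a user selected sort) could be sketched roughly as below. The table name, column names, and function names are invented, and SQLite is used only so the sketch is runnable. Column names are checked against a fixed list and values are passed as parameters so that user input never becomes part of the SQL text.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

ALLOWED = {"title", "author", "journal", "board"}   # hypothetical columns


def build_search(text_fields: Dict[str, str],
                 list_fields: Dict[str, Sequence[str]],
                 not_list: Sequence[str] = (),
                 sort_by: str = "title") -> Tuple[str, List[str]]:
    clauses: List[str] = []
    params: List[str] = []
    for col, pattern in text_fields.items():        # wild card text search
        if col not in ALLOWED:
            raise ValueError(f"unknown field {col!r}")
        clauses.append(f"{col} LIKE ?")
        params.append(pattern)
    for col, values in list_fields.items():         # values OR'd together
        if col not in ALLOWED:
            raise ValueError(f"unknown field {col!r}")
        if values:
            marks = ", ".join("?" * len(values))
            clauses.append(f"{col} IN ({marks})")
            params.extend(values)
    if not_list:                                    # board specific NOT list
        marks = ", ".join("?" * len(not_list))
        clauses.append(f"journal NOT IN ({marks})")
        params.extend(not_list)
    if sort_by not in ALLOWED:
        raise ValueError(f"unknown sort field {sort_by!r}")
    where = " AND ".join(clauses) or "1 = 1"        # fields AND'd together
    sql = f"SELECT id, title FROM citation WHERE {where} ORDER BY {sort_by}"
    return sql, params


def search(db: sqlite3.Connection, *args, **kwargs):
    sql, params = build_search(*args, **kwargs)
    return db.execute(sql, params).fetchall()
```

A title search for "%robot%prostatectomy%" limited to two boards would then be a single call with one text field and one list field.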

3.14 Integration with the Board Member System

3.14.1 Purpose

Some data and some functionality are relevant to both EBMS and CiteMS. There might be some savings in both programming and data maintenance if that data and functionality are shared.

3.14.2 Functions

No design has been done yet for CiteMS. It may be that we won't want a separate CiteMS system at all but will simply add CiteMS data and functionality to the EBMS, making one integrated system. Or we might want two systems that share data, or two systems that exchange data. These are design issues for later resolution.

3.15 Email

3.15.1 Purpose

The existing system appears to make no use of email from the system itself. The new system could use email for some communications. Simple email functionality is easy to add and is available in pre-packaged modules in most modern programming environments.

3.16 Reports

3.16.1 Purpose

The existing system supports a number of reports that board managers and other staff members use in regular operations. Equivalent reports need to be available in the new system.

3.16.2 Functions

  • Canned reports
    There are currently ten reports available to board managers and, I think, some others accessible to people with other roles. Each of these reports accepts variable input parameters, typically allowing a user to limit retrievals by editorial board or review cycle.

    Equivalent reports need to be implemented.

  • Custom reports
    We have used a system in the CDR in which custom SQL queries can be named, saved, retrieved, and executed on demand. This approach has been pretty useful and is fairly easy to implement. It can be implemented securely by running all SQL queries with a user id that only has read access to the database.

    Not sure what reports should be treated as reports and what as just searches.

    We might possibly also want to implement a little more sophisticated approach in which a report designer can write queries with prompts and replaceable parameters.

    See saved searches above for another way to think about retrieving information using saved parameters.
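
As a rough illustration of named, saved queries that run with read access only, here is a small sketch. It is not how the CDR does it. The report name, table, and function are invented, and SQLite's `PRAGMA query_only` is used here only as a stand-in for running reports under a database login that has read permission and nothing else.

```python
import sqlite3

# Hypothetical saved reports: a name mapped to SQL with named placeholders.
SAVED_REPORTS = {
    "citations_by_board":
        "SELECT COUNT(*) FROM citation WHERE board = :board",
}


def run_report(db: sqlite3.Connection, name: str, **params):
    """Run a saved query with writes disabled for the duration of the call.

    Assumes db is a connection dedicated to reporting, since any
    transaction left open on it is rolled back afterwards.
    """
    db.execute("PRAGMA query_only = ON")
    try:
        return db.execute(SAVED_REPORTS[name], params).fetchall()
    finally:
        db.rollback()                         # discard anything left open
        db.execute("PRAGMA query_only = OFF")
```

A saved query that tried to modify data would fail with a database error instead of changing anything. In a production database the same effect would come from the permissions granted to the reporting user id.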

3.17 Auditing

3.17.1 Purpose

We ought to have an audit trail for all changes that occur in the database. This is useful for management, security, and debugging.

A significant amount of auditing is manageable if it's built into the design, but it can be extremely expensive to add on later. However, even when it is designed in, we will still have to make decisions about what is cost effective to keep.

If we have a comprehensive, integrated component for managing all queues in the system, it's possible that the queue manager might be the only place that needs to update the audit trail. If every "action" updates a queue and goes through the queue/workflow management software, then it may be that auditing can be implemented very simply and completely in one place.

See the "Audit trail" and "Queues" sections above.

3.17.2 Functions

  • Record entries
  • Retrieve information
    The most likely regular use of the audit trail is in management information and history reports. It would also be useful in debugging.

3.18 Help

3.18.1 Purpose

Training and assistance to users.

3.18.2 Functions

The existing system has help screens online. They appear to be essentially the same as the pages in the training manual.

There is no context sensitivity and no index. However the pages are organized hierarchically and it looks pretty easy to find anything a user might be looking for.

A similar capability may be very useful in the new system.

4 Design Notes

4.1 Relationship to the existing system

4.1.1 Existing functionality

When designing and programming parts of the new CiteMS, it is important to carefully compare the functionality in each new module with the equivalent functionality in the existing system. The new system should be able to do everything that the old system could do unless users have decided not to do it in the new system.

4.1.2 Existing data

We will probably need to load data from the existing system into the new one.

Data conversion is a complex topic beyond the scope of this document, but here are some notes regarding conversion.

  • Data from NLM
    Data from NLM can be loaded by going back to NLM. We have a PubMed ID for every citation. We can use them to download complete XML versions of each citation from NLM, picking up the latest form and fullest information for each citation.

    Each citation we retrieve from NLM will have an NLM journal title unique ID. We can use that to retrieve complete journal information to match every citation.

  • Data from EBMS or CDR
    It is possible that some board member and manager information can be loaded from EBMS or CDR. This is a complex issue because not all of the data we need will necessarily be in either of those other systems, possibly including historical data for people who are no longer associated with the editorial boards but who are linked to past actions and so may (or may not) need to be loaded into the new system.
  • Data from the old CiteMS
    All of the data pertaining to the history of a citation will have to be converted and imported.
  • Data cleanup
    Citation and journal data can be cleaned up programmatically by fetching new versions from NLM. Some other data may need more labor intensive cleanup. We often have, for example, multiple records in the system for the same user. We probably want to write scripts to merge these into a single record for one single person. There might be some other data that requires manual work to convert.
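
Reloading citations from NLM by PubMed ID might start from something like the sketch below. The URL follows the NCBI E-utilities EFetch service as I understand it, and the element names follow the PubMed XML format, but both should be checked against NLM's current documentation. The sample record is made up for illustration and nothing here actually contacts NLM.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build the URL for fetching the given PubMed IDs as XML."""
    query = urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"})
    return f"{EFETCH}?{query}"


# Made-up record with the same general shape as a PubMed XML citation.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">12345678</PMID>
      <Article>
        <ArticleTitle>An example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
      <MedlineJournalInfo><NlmUniqueID>0000000</NlmUniqueID></MedlineJournalInfo>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_citations(xml_text):
    """Extract the fields needed to match citations to journals."""
    results = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        cit = art.find("MedlineCitation")
        results.append({
            "pmid": cit.findtext("PMID"),
            # Real titles can contain inline markup; findtext ignores that.
            "title": cit.findtext("Article/ArticleTitle"),
            "journal_id": cit.findtext("MedlineJournalInfo/NlmUniqueID"),
            "authors": [(a.findtext("LastName"), a.findtext("ForeName"))
                        for a in cit.iterfind("Article/AuthorList/Author")],
        })
    return results
```

Note that the authors arrive as separate elements, which is what makes the richer author searching mentioned in the Search section possible.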

4.2 Functionality in all boards isn't exactly the same

We need to think about why this is so and whether and how it should be so in the new system.

Examples of differences seem to include:

4.2.1 Subgroups

Pediatric and Genetics boards have them, others don't.

4.2.2 Authorization to work with citations by board

"CIPS staff can only make changes to citations that are assigned to their editorial board or to the Adult Treatment editorial board." (Training document, page 3)

Question: Why is everyone able to make changes to citations in the Adult Treatment board?

4.2.3 Handling special board situations

I'd like not to put special code into the system that is board dependent. It would seem to be more flexible and maintainable to have capabilities that are turned on or off for specific boards rather than have "if board == X then …" logic hard-wired into the program code.

If we must hardwire logic then, ideally, it would be much better to have that code isolated in a collection of specialized routines that are separated from the main logic and invoked from the main logic by some sort of lookup rather than having the specialized logic embedded in the code.

Another approach is to subclass board functionality with almost all functionality in the superclass.

Whatever solution we adopt, it makes sense to think hard about alternatives before making a decision, and to emphasize isolation of non-generic code.

Note: this was a key design goal in the CDR, isolating any document type specific functionality away from the main C++ code. Most of it was isolated in XSLT scripts and Schema documents. This decision was eminently successful.
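
As an illustration of capabilities that are turned on or off per board, in place of "if board == X" tests, here is a small sketch. The flag names are invented, and in practice the settings would presumably live in a control table instead of in code. The two flags shown correspond to the differences noted in 4.2.1 and 4.2.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardConfig:
    name: str
    has_subgroups: bool = False            # see 4.2.1
    open_to_all_cips_staff: bool = False   # see 4.2.2


# Hypothetical settings; a control table would hold these in a real system.
BOARDS = {b.name: b for b in [
    BoardConfig("Pediatric", has_subgroups=True),
    BoardConfig("Genetics", has_subgroups=True),
    BoardConfig("Adult Treatment", open_to_all_cips_staff=True),
]}


def can_edit(staff_board: str, citation_board: str) -> bool:
    """Generic rule: no board name appears in the logic itself."""
    return (staff_board == citation_board
            or BOARDS[citation_board].open_to_all_cips_staff)
```

If the answer to the Adult Treatment question changes, only the configuration changes and `can_edit` stays as it is.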

4.3 XML in the system

The existing system uses relational tables with integers and ASCII text. However, there are many cases in which XML might be a useful alternative. Possibilities include:

4.3.1 Citations

These are available from NLM in far richer forms than the old Medline format. There are more fields with more information and more international character sets.

4.3.2 Journals

Maybe the same issues here as for citations.

4.3.3 Saved and/or canned search and report specifications

It may be practical to produce a generic search and report module that uses specifications stored in XML to generate selections. For example, XML fields can contain:

  • SQL, with placeholders
  • Default values for arguments
  • Prompts to show to the user
  • Specification of row and column headers in outputs
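
A report specification of this kind might look something like the following sketch. The element and attribute names are invented purely for illustration. The code only loads the specification and fills in parameter values from user answers or defaults. Executing the SQL is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format; none of these names are settled.
SPEC = """<report name="reviews_by_board">
  <sql>SELECT reviewer, COUNT(*) FROM review
       WHERE board = :board AND cycle = :cycle GROUP BY reviewer</sql>
  <param name="board" prompt="Editorial board" default="Adult Treatment"/>
  <param name="cycle" prompt="Review cycle"/>
  <column>Reviewer</column>
  <column>Reviews</column>
</report>"""


def load_spec(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "sql": root.findtext("sql").strip(),
        "params": [(p.get("name"), p.get("prompt"), p.get("default"))
                   for p in root.findall("param")],
        "columns": [c.text for c in root.findall("column")],
    }


def resolve_params(spec, answers):
    """Use the user's answer if given, else the default, else complain."""
    values = {}
    for name, prompt, default in spec["params"]:
        if name in answers:
            values[name] = answers[name]
        elif default is not None:
            values[name] = default
        else:
            raise ValueError(f"no value supplied for {prompt!r}")
    return values
```

A generic module could show the prompts on a form, call `resolve_params` with what the user entered, and pass the resulting values to the database along with the SQL.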

4.3.4 Outputs

It may be desirable to make many of the outputs of the system XML rather than HTML or text. Final formatting for display can be done with XSLT on the server or CSS on the client, or both.

Possible advantages include:

  • Easier programming of outputs to serve both screen and print use
  • Easier handling of conversion to spreadsheet format
  • Maybe less re-programming to change output formatting?
  • Easier presentation of the same information for different user types?

4.4 Links to other systems

4.4.1 Should the database be shared with the EBMS?

It is my current understanding that the EBMS will be developed with a database that is independent of the CDR in the sense that data needed in the EBMS may be copied out of the CDR, but there will not be direct access to the CDR database. The reasons for this are beyond the scope of this document and will be taken for granted here.

However it is not clear that the CiteMS and EBMS need fully independent databases. It may be a good idea to use a single database for both with tables for summary topics, editorial boards, board members, and other objects that are shared between EBMS and CiteMS.

4.4.2 Should there be one EBMS/CiteMS system?

It might even be the case that the CiteMS and EBMS should be one single system sharing a number of modules as well as many database tables.

By using control tables that configure EBMS and CiteMS functions separately, it may be possible to have the same software perform the different tasks of each logical system.

Some of the modules that might conceivably be shared include:

  • User authentication
    For example, common management of logins and logouts, session management, groups/roles, permissions, etc.
  • User editing, update, profiling, etc.
    The user database will be almost the same with only a few users that use one logical system but not the other.
  • Task/queue management
    For example, common techniques for finding objects in a queue by date, user, action, or other parameters, common mechanisms for recording an action, completing a task, moving an object from one queue to the next queue, finding objects that are overdue for task completion, etc.
  • Auditing
    For example common techniques for recording actions and a single common data structure for all audit records.
  • Email
    For example common mechanisms for sending mail to people in the people table.
  • Indexing and searching
    For example using common index tables (like queryterm in the CDR), common search parsing and normalization, etc.
  • Report generation
    For example common storage and searching for reports, a common query parameter substitution mechanism, common execution mechanism, and maybe some common results formatting.
  • Some display management
    For example common headers and footers, CSS stylesheets, XML and HTML generation, Javascript libraries, etc.

4.5 Some user interface concepts

4.5.1 Same interfaces for CIPS and non-CIPS users?

I'm thinking that any functionality in common between these two groups should be implemented identically - same screens, same code. The differences should be expressed using additional stuff for each group rather than completely separate interfaces that wind up re-implementing the same functionality independently and, very likely, inconsistently. Board managers and other CIPS users would have no trouble assisting board members since they would, themselves, use the same interfaces.

4.5.2 Correctability

As many functions as possible should be designed with the understanding that things can be done wrong but mistakes can be corrected. This is not always the case in the current CiteMS.

Where practicable (and it's not always practicable), corrections should be made using the same user interface as the original actions. That way, someone who knows how to do something knows where to go and what to do to undo it.

This is not a universal rule. There may be certain actions that only an administrator should be allowed to undo and there may be others for which it might not be a good idea to allow them to be undone at all. However, errors should always be correctable, even if the correction is not strictly an undoing of the original action.

Examples of commonly executed actions that should be correctable include:

  • Delete a citation that's been added (at least logically).
  • Re-assign a review to someone else.
  • Re-assign a citation to a different board or summary topic
  • etc.

4.5.3 Consistency

It's probably a good idea to design all of the interfaces to use similar concepts. Where Yes/No/No-action choices are available, it seems like a good idea to always use the same form controls (checkboxes, radio buttons, or whatever), to put the controls in the same place relative to citations that they control, to have consistent headers and footers on pages, etc.

We are fortunate to have a number of professional web designers available in OCE who can assist with user interface design.

The existing system seems to me to do a pretty good job of user interface consistency.

4.5.4 Dynamic HTML

The existing system, designed in a slightly earlier era of web development, does a lot of web page jumps and backtracks. For example, a user might look at a list of citations for review. She wants to see the abstract for one of the citations and clicks a button. The system takes her to a new page with the abstract. Then she wants to see the status history. The system takes her to a new page. Then she wants to go back to the review list. She has to back up two pages or click another button and go through another jarring screen jump back to the original page - possibly positioned in a different place.

A more modern approach would be to use dynamic HTML using JavaScript and, possibly, AJAX. The system could display the page of citations for review. The user could click one to see an abstract. The screen could open in place to display the abstract in context. It could open again to see status in context. It could then close again, leaving the user right where she was, on the same citation, positioned where it had been.

This concept can be applied to any place where it makes sense to add more information in context.

4.6 Security

The system needs to be available to board members outside NIH.

4.6.1 No VPN required for board members

It should not require VPN access for functions that board members can perform. Using a VPN puts an onerous and almost certainly unacceptable burden on people who spend only a small amount of time using the system and who are essentially volunteers.

4.6.2 Database must still remain secure

We need to study this. The cancer.gov project may provide a model for how to do this. [I'm not a security expert and need advice on this aspect of the system design.]

4.7 Phased implementation

When we have a firm set of requirements and a design, we should assign priorities for implementing functions.

We can make a new system operational and begin getting benefits from it when the basic functionality is tested and working and add other functions while the system is in use. Examples of lower priority functions might include:

4.7.1 Batch interfaces to download/update modified NLM data

Programs to automatically update journal titles and citations from NLM after changes would be needed, but not necessarily on day 1.

4.7.2 Reports

Some reports are critical. There are others we can live without for a while if necessary.

4.7.3 Low volume user interfaces

Some database changes such as new values in control tables and probably other things could be made by hand by a programmer until a user interface is ready.

Author: <alan@NCI-01802749>

Date: 2011-03-18 00:56:49
