============================== ARTICLE 206010 ==============================
2010-04-13 08:23:22 Y 1 160 114 56 30 u'fast track'
2010-04-13 09:39:19 Y 1 160 114 56 30 u'fast tracked for lung, small cell and non-small cell'
2010-04-13 09:39:19 Y 1 80 114 56 30 u'fast tracked for lung, small cell and non-small cell'
2010-04-13 09:39:19 Y 1 136 114 56 30 u'fast tracked for lung, small cell and non-small cell'

Should we:

1. eliminate the first row, ignoring the difference in the notes field?
2. collapse the first two rows, concatenating their notes (note A; note B)?
3. keep all four rows?

I vote for #3.

============================== ARTICLE 8407 ==============================
2002-08-09 13:55:59 Y 4 118 20 16 22 u''
2002-08-09 13:59:00 Y 4 118 20 16 22 u'Note: Above mailer date incorrect..'

This is an odd case. I'm inclined to keep both rows because of the difference in the notes field. The problem is that the note refers to information (the mailer date) that we're not preserving. I guess that's OK.

============================== ARTICLE 9168 ==============================
2002-09-12 12:47:28 N 1 - 22 16 8 u''
2002-10-11 16:10:57 N 1 - 22 16 8 u''

This one isn't really a question; I'm just going on record to say that I plan to keep both rows because the dates are different.

============================== ARTICLE 10938 ==============================
2003-07-28 12:15:53 Y 3 27 23 26 4 u'Added to August 2003 mailing by Marianne. The first mailing date was September 2002'
2003-07-28 12:15:54 Y 3 27 23 26 4 u'Added to August 2003 mailing by Marianne.'

This is similar to the second case above (article 8407), but a little more puzzling: the two rows were recorded less than a second apart, and there's no record anywhere in the history table of a 2002 mailing for this article (though the review cycle for all of the article's history rows is August 2002). Again, I'm inclined to apply our principle of "keep it; we can always weed things out later."
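As I read the cases above, the rule applied in this report so far is: collapse a row into the one just before it only when both fall on the same calendar day and every field other than the time stamp (notes included) is identical. A minimal Python sketch of that reading follows. It is not the conversion code: the `Row` layout, the `parse` helper for lines in the format of these listings, and keeping the earlier of two collapsed rows are all assumptions for illustration, and "different dates" is read as "different calendar days".

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """One history row as shown in these listings (field names are hypothetical)."""
    stamp: datetime           # time stamp, whole seconds
    decision: str             # "Y" or "N"
    fields: Tuple[str, ...]   # the numeric columns; the topic is fields[1] ("-" if none)
    notes: str                # the notes column, without the u'...' wrapper


def parse(line: str) -> Row:
    """Parse a listing line such as "2002-08-09 13:55:59 Y 4 118 20 16 22 u''"."""
    date, time, decision, rest = line.split(" ", 3)
    values, _, quoted = rest.partition(" u'")
    notes = quoted[:-1] if quoted.endswith("'") else quoted  # drop the closing quote
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return Row(stamp, decision, tuple(values.split()), notes)


def collapsible(a: Row, b: Row) -> bool:
    """Same calendar day, and every field except the time stamp (notes included) matches."""
    return a.stamp.date() == b.stamp.date() and a[1:] == b[1:]


def collapse_adjacent(rows: List[Row]) -> List[Row]:
    """Drop a row only if it duplicates the last row kept; rows are in time order."""
    kept: List[Row] = []
    for row in rows:
        if kept and collapsible(kept[-1], row):
            continue  # assumption: the earlier of the two rows is the one kept
        kept.append(row)
    return kept
```

Under this sketch, articles 8407, 9168 and 10938 keep both of their rows (notes or dates differ), and the four article 11111 rows all survive because no two adjacent rows share a topic.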
As a side note on article 10938, the time stamp for the first row is actually going to come out as 2003-07-28 12:15:54, because the pywintypes or adodb library is doing a surprising (to me) rounding up of the fractional second whenever that fraction is greater than 1/2. I fixed it by hand above after eyeballing what was actually in the database, but I obviously won't be in a position to do that during the conversion.

============================== ARTICLE 9168 ==============================
2002-09-12 12:47:28 N 1 - 22 16 8 u''
2002-10-11 16:10:57 N 1 - 22 16 8 u''

Again, not a question; just confirming that I'm going to preserve both rows because the dates are different.

============================== ARTICLE 11111 ==============================
2002-11-15 10:22:05 Y 5 165 23 16 23 u''
2002-11-15 10:22:05 Y 5 143 23 16 23 u''
2002-11-15 10:22:27 Y 5 165 23 16 23 u''
2002-11-15 10:22:27 Y 5 143 23 16 23 u''

OK, here we go. If I stick with the logic used in this report, I'll keep all four rows, because the topic changes back and forth as I go through the sequence. It's a little inconsistent, though, to keep them all, knowing that had there been only two rows (for the same topic), the two would have been collapsed into one. So the question is: do we complicate the logic to detect this case and collapse even though the rows with the same topic aren't next to each other?

I'm inclined to say yes. It will complicate the logic more than you might think at first glance. If I find the sequence:

2002-11-15 10:00:00 Y 5 165 23 16 23 u''
2002-11-15 11:00:00 N 5 - 23 16 23 u''
2002-11-15 12:00:00 Y 5 165 23 16 23 u''

I'm going to keep all three rows (since we're not 100% certain about the semantics of a topic-less "no"), so when other rows are intermixed I'll have to be careful to collapse only across rows for other topics, and not across a row like this topic-less "no".
I'll also need to give some thought to what should happen if we get rows for the same article with decisions other than "committee decision" mixed in between two "committee decision" rows that fall on the same day for the same article/topic combination and are identical in every respect except the time stamp. That case should almost certainly be processed without collapsing.

============================== ARTICLE 11240 ==============================
2002-09-20 12:32:41 Y 10 186 23 16 5 u''
2002-09-20 12:33:08 Y 10 186 23 16 5 u'Sent by Pat independently'

This is another example where I plan to avoid collapsing. We can override this decision later by post-weeding (as noted above), but I'd rather not have a comment like "entered again to produce cover sheet" show up without the first entry being preserved, even if the first entry had no comment at all. So the general principle would be: don't collapse if anything but the time stamp is different, even if the only difference is that one row has a comment and the other doesn't.

============================== ARTICLE 13238 ==============================
2003-01-02 12:38:40 Y 3 18 25 16 4 u''
2003-01-02 12:38:40 Y 3 19 25 16 4 u''
2003-01-02 12:39:19 Y 3 18 25 16 4 u''
2003-01-02 12:39:20 Y 3 19 25 16 4 u''
2003-01-02 12:39:31 Y 3 18 25 16 4 u''
2003-01-02 12:39:32 Y 3 19 25 16 4 u''
2003-01-02 12:41:31 Y 3 18 25 16 4 u''
2003-01-02 12:41:32 Y 3 19 25 16 4 u''

This is an extreme example of the case illustrated above with article 11111. It makes the argument for complicating the logic, in order to collapse these rows down to two, even more compelling.

============================== ARTICLE 20447 ==============================
2003-04-21 11:44:32 Y 3 30 29 16 4 u''
2005-01-06 12:45:47 Y 3 30 29 29 4 u'To be resent per Marianne.'
2005-01-21 10:48:11 Y 3 30 29 29 4 u'To be resent per Marianne.'

We'll keep all three rows: each has its own date.
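If we do complicate the logic along the lines discussed under articles 11111 and 13238, one possible shape is sketched below: a row is dropped when an identical row (time stamp aside) has already been kept for the same day, even if rows for other topics sit in between, but never across a "barrier" row. Everything here is an assumption for illustration: the `Row` layout and `row` helper are hypothetical; the topic is taken to be the second numeric column (`fields[1]`), as it appears to be in these listings; barriers are taken to be topic-less rows plus any decision other than a committee decision; and `kind` is a placeholder for the decision type, which these listings don't show.

```python
from datetime import datetime
from typing import List, NamedTuple, Set, Tuple

COMMITTEE = "committee decision"


class Row(NamedTuple):
    stamp: datetime
    decision: str             # "Y" or "N"
    fields: Tuple[str, ...]   # numeric columns; fields[1] is the topic, "-" if none
    notes: str
    kind: str = COMMITTEE     # hypothetical: decision type, not shown in the listings


def row(text: str, kind: str = COMMITTEE) -> Row:
    """Build a Row from a listing line such as "2003-01-02 12:38:40 Y 3 18 25 16 4 u''"."""
    date, time, decision, rest = text.split(" ", 3)
    values, _, quoted = rest.partition(" u'")
    notes = quoted[:-1] if quoted.endswith("'") else quoted
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return Row(stamp, decision, tuple(values.split()), notes, kind)


def is_barrier(r: Row) -> bool:
    """Rows that must never be collapsed across: a topic-less row (e.g. a bare "no")
    or, hypothetically, any decision other than a committee decision."""
    return r.fields[1] == "-" or r.kind != COMMITTEE


def collapse_interleaved(rows: List[Row]) -> List[Row]:
    """Input: one article's rows in time order. A row is dropped if an identical
    row (time stamp aside) was already kept on the same day and no barrier row
    has appeared since; rows for other topics in between don't block that."""
    kept: List[Row] = []
    seen: Set[tuple] = set()
    for r in rows:
        key = (r.stamp.date(),) + tuple(r[1:])  # everything except the time of day
        if is_barrier(r):
            kept.append(r)   # barrier rows are always kept...
            seen.clear()     # ...and nothing before them may absorb a later row
            continue
        if key in seen:
            continue         # assumption: the earliest matching row is the one kept
        seen.add(key)
        kept.append(r)
    return kept
```

On the article 13238 rows this keeps just the two 12:38:40 rows (topics 18 and 19), and on the hypothetical Y/N/Y sequence under article 11111 it keeps all three. As written, it would also merge the two identical topic-104 rows for article 98493 (11:27:31 and 11:27:43), so that article would need separate handling if every row there is to be preserved.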
============================== ARTICLE 98493 ==============================
2006-06-29 11:27:31 Y 4 104 68 33 22 u''
2006-06-29 11:27:32 Y 4 180 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 105 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 106 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 106 68 33 22 u''
2006-06-29 11:27:32 Y 4 107 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 107 68 33 22 u''
2006-06-29 11:27:32 Y 4 108 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 108 68 33 22 u''
2006-06-29 11:27:32 Y 4 183 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 183 68 33 22 u''
2006-06-29 11:27:32 Y 4 109 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 109 68 33 22 u''
2006-06-29 11:27:32 Y 4 110 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 110 68 33 22 u''
2006-06-29 11:27:32 Y 4 176 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 176 68 33 22 u''
2006-06-29 11:27:32 Y 4 112 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 112 68 33 22 u''
2006-06-29 11:27:32 Y 4 113 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 113 68 33 22 u''
2006-06-29 11:27:32 Y 4 114 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 114 68 33 22 u''
2006-06-29 11:27:32 Y 4 115 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 115 68 33 22 u''
2006-06-29 11:27:32 Y 4 116 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:32 Y 4 116 68 33 22 u''
2006-06-29 11:27:33 Y 4 117 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 118 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 118 68 33 22 u''
2006-06-29 11:27:33 Y 4 120 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 120 68 33 22 u''
2006-06-29 11:27:33 Y 4 121 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 121 68 33 22 u''
2006-06-29 11:27:33 Y 4 122 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 122 68 33 22 u''
2006-06-29 11:27:33 Y 4 123 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 123 68 33 22 u''
2006-06-29 11:27:33 Y 4 124 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 124 68 33 22 u''
2006-06-29 11:27:33 Y 4 125 68 33 22 u'mistake - only to screening and prevention of lung cancer. -vrico'
2006-06-29 11:27:33 Y 4 125 68 33 22 u''
2006-06-29 11:27:33 Y 4 126 68 33 22 u''
2006-06-29 11:27:43 Y 4 104 68 33 22 u''

Not sure what to make of this one. You can see from the comments that many of the "yes" values should have been "no", but there's no row anywhere for this article recording "no" as the decision. I'm going to preserve every row.
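On the rounding problem from the article 10938 side note: if the conversion can get at the stored value as text with its fractional seconds intact (an untested assumption; one possibility would be converting the column to a string in the query itself), truncating instead of rounding is straightforward. This is a minimal sketch; the function names are hypothetical, and the .703 fraction in the example is made up, since the actual fraction for that row isn't recorded here. Only fractions strictly greater than 1/2 are rounded up, matching the side note; behaviour at exactly 1/2 wasn't observed.

```python
from datetime import datetime, timedelta

FORMAT = "%Y-%m-%d %H:%M:%S"


def truncate_to_second(raw: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' and simply discard the fraction."""
    return datetime.strptime(raw.split(".", 1)[0], FORMAT)


def round_to_second(raw: str) -> datetime:
    """For comparison: mimic the behaviour described in the side note by pushing
    fractions greater than 1/2 up to the next whole second."""
    if "." in raw:
        stamp = datetime.strptime(raw, FORMAT + ".%f")
    else:
        stamp = datetime.strptime(raw, FORMAT)
    if stamp.microsecond > 500_000:
        stamp += timedelta(seconds=1)
    return stamp.replace(microsecond=0)


# Hypothetical raw value for the first article 10938 row.
raw = "2003-07-28 12:15:53.703"
print(truncate_to_second(raw))  # 2003-07-28 12:15:53
print(round_to_second(raw))     # 2003-07-28 12:15:54
```

Truncating keeps the two article 10938 rows in their recorded order (12:15:53, then 12:15:54) instead of giving them the same time stamp.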