Issue Number | 4 |
---|---|
Summary | [Search Database] AND connector for summary topics |
Created | 2013-09-17 08:50:45 |
Issue Type | Bug |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | alan |
Status | Closed |
Resolved | 2014-06-12 15:24:21 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/oceebms/issue.113284 |
TIR #2199 created 2012-12-17 by Robin Juthe
The AND search connector for summary topics doesn't appear to be working. I get 0 results when using the AND connector with topics that have a lot of overlap.
Juthe, Robin (NIH/NCI) [E] (7/23/2013 6:37 PM): Moving back to new
since some adjustments are needed.
Juthe, Robin (NIH/NCI) [E] (5/15/2013 4:54 PM): I verified that the
Boards are now ANDed (on QA) - this is good - and I verified that AND
and OR are working properly for summary topics.
However, when these are used in combination with each other, I ran into a problem. If I select multiple Boards (e.g., Supportive Care and Genetics) and multiple topics (e.g., Adjustment to Cancer and Psychosocial Aspects) with the OR connector, then the ANDing of the Boards is cancelled out.
Kline, Bob (NIH/NCI) [C] (4/17/2013 4:27 PM): Hi, Robin. Another one
that was left in "awaiting build" and assigned to Sridhar. Can you
confirm that this is fixed, as Alan says? Thanks!
Meyer, Alan (NIH/NCI) [C] (1/6/2013 10:19 PM): This is now fixed on dev
and qa.
In looking at it, I noticed that searching for multiple boards was implemented the same way multiple topics had been implemented, i.e., it OR'ed multiple boards in a search but did not AND them, which I believe it was supposed to do. So I fixed that too. Now if two or more boards are selected, only articles with at least one topic in each of them will be selected.
I commented out the old code so that, if ever we are supposed to OR
multiple boards, I can re-instate that.
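The difference between OR'ing and AND'ing boards described above can be sketched with plain sets. This is only an illustrative model, not the EBMS code: the article IDs and the `ARTICLES` structure are made up, and the board and topic names are borrowed from the examples in this ticket.

```python
# Hypothetical in-memory model: article ID -> set of (board, topic) pairs.
# Article IDs are invented; board and topic names come from the ticket.
ARTICLES = {
    101: {("Supportive Care", "Adjustment to Cancer")},
    102: {("Genetics", "Psychosocial Aspects")},
    103: {("Supportive Care", "Adjustment to Cancer"),
          ("Genetics", "Psychosocial Aspects")},
}


def boards_of(article_id):
    """Return the boards in which the article has at least one topic."""
    return {board for board, _topic in ARTICLES[article_id]}


def search_boards(selected, require_all):
    """Return sorted IDs matching all (AND) or any (OR) of the boards."""
    selected = set(selected)
    if require_all:
        # AND: every selected board must be present on the article.
        return sorted(a for a in ARTICLES if selected <= boards_of(a))
    # OR: one selected board is enough.
    return sorted(a for a in ARTICLES if selected & boards_of(a))


both = ["Supportive Care", "Genetics"]
print(search_boards(both, require_all=False))  # old behavior (OR)
print(search_boards(both, require_all=True))   # fixed behavior (AND)
```

In this toy data set only article 103 has a topic in both boards, so it is the only article the AND form returns.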
Kline, Bob (NIH/NCI) [C] (12/18/2012 4:54 PM):
Another search API issue.
Juthe, Robin (NIH/NCI) [E] (12/17/2012 5:06 PM): FYI, in case this is useful - two topics that have a lot of overlap and may be a good example for testing are "Cancer Genetics Risk Assessment and Counseling" and "Psychosocial Aspects of Cancer Genetics".
I tested with the example given on (5/15/2013 4:54 PM), namely:
"If I select multiple Boards (e.g., Supportive Care and Genetics) and
multiple topics (e.g., Adjustment to Cancer and Psychosocial Aspects)
with the OR connector, then the ANDing of the Boards is cancelled out."
It appeared to work for me. Testing on DEV, the OR produced 6,604 hits.
The AND produced 146 hits, all of which had both topics - which can't
happen unless both boards are involved. On QA the numbers were 7,003
and 155.
Is it possible that Bob fixed this but didn't record the fix in JIRA,
or that the version Robin tested on 5/15 didn't have the latest fix
from me? [Unlikely I should think.]
Or am I misinterpreting the nature of the bug? [More likely?]
Can we get a response to Alan's question from a year ago, or close this ticket?
The problem I cited above still applies. Here's another example.
I selected the following boards: Cancer Genetics & Pediatric Tx
I selected the following topics (with the OR connector): Wilms Tumor &
Genetics of Kidney Cancer
I expected to see articles that are associated with both Boards but
associated with either summary topic.
Instead, I am seeing articles that are associated with either summary
topic and either Board. For example, PMID: 16575893 is associated with
the Peds Board only (I don't see any connection to the Genetics Board).
I understand the problem now. I believe it's caused by an optimization
I put in the code that says that topics imply boards, if topics are
specified boards don't need to be further checked.
I think I can fix this without completely abandoning the optimization.
What I'll do is something like this:
    If all topics specified point to a single board:
        The topic screening is sufficient. No additional processing is
        needed to limit the search to that specific board.
    If two or more topics specify two or more boards:
        If the topics are AND'ed together:
            The topic screening is sufficient. No additional processing
            is needed to limit the search to that specific board.
        Else:
            Perform the additional board searching.
For certain specialized searches, this will add to the time taken, but
the majority of searches should be unaffected.
I'll implement that and test it.
Upon reflection, I no longer think the solutions proposed in the
previous comment are correct.
A user could choose two boards and one topic. The user's intent here is
to find all of the articles that are about that one specific topic but
that also have an active state for both boards, one board with the
specified topic, the other board for ANY topic.
An example use case is a board manager trying to find out if any
articles on the topic of breast cancer appear in the Cancer Genetics
article queue. Breast cancer is not a Cancer Genetics topic, so there's
no way to find these articles by topic without combining the two boards.
Searching by title substrings within the Cancer Genetics board is
possible but it's too likely to produce false and missed hits. What we
want is to choose two boards plus the summary topic.
So the solution is to just give up the optimization that I proposed for
any search that involves more than one board. This will slow down all
searches that choose more than one board plus at least one topic.
However that's probably a fairly small minority of searches.
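To make the reasoning in this comment concrete, here is a small sketch of why the "topics imply boards" shortcut breaks once two boards are selected. It is a simplified model under stated assumptions, not the EBMS implementation: the `ARTICLES` data and article IDs are invented, and the board and topic names are reused from the Wilms Tumor example earlier in the ticket.

```python
# Hypothetical data: article ID -> {board: topics active for that board}.
ARTICLES = {
    301: {"Pediatric Tx": {"Wilms Tumor"}},
    302: {"Pediatric Tx": {"Wilms Tumor"},
          "Cancer Genetics": {"Genetics of Kidney Cancer"}},
    303: {"Cancer Genetics": {"Genetics of Kidney Cancer"}},
}


def topics_only(topics):
    """Shortcut: match on topic alone and skip the board check."""
    return sorted(a for a, by_board in ARTICLES.items()
                  if any(topics & found for found in by_board.values()))


def topics_and_boards(topics, boards):
    """No shortcut: match a topic and also require every selected board."""
    return sorted(a for a in topics_only(topics)
                  if boards <= set(ARTICLES[a]))


topics = {"Wilms Tumor"}
boards = {"Cancer Genetics", "Pediatric Tx"}
print(topics_only(topics))                # includes the Peds-only article
print(topics_and_boards(topics, boards))  # only the article on both boards
```

With one topic and two boards, the shortcut lets article 301 through even though it has no connection to Cancer Genetics; the explicit board check removes it.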
If I'm right about what's going on, the fix should be easy. It may be
as simple as replacing one line of code from the
EbmsSearch.srchTopicsBoards() function. Specifically:
Change the following block:
    // No topics specified. Any boards?
    else {
        // Separate join required for each board.
        foreach ($this->boards as $boardId) {
            $alias = 'topic_alias_' . $delta++;
            $this->qry->join('ebms_article_state', $alias,
                "$alias.article_id = art.article_id");
            $this->qry->condition("$alias.board_id", $boardId);
        }
    }
To something like this:
    // If one or more topics were selected, but they all came from the
    // same board, we don't need to qualify the search by board.
    // The topics already limit the results to just that one board.
    // But if there were no topics or more than one board, we must limit
    // the results by board.
    if (empty($this->topics) || count($this->boards) > 1) {
        // Separate join required for each board.
        foreach ($this->boards as $boardId) {
            $alias = 'topic_alias_' . $delta++;
            $this->qry->join('ebms_article_state', $alias,
                "$alias.article_id = art.article_id");
            $this->qry->condition("$alias.board_id", $boardId);
        }
    }
It's possible that that's all I need to do to solve the problem.
However I'll put it aside for now and work on tasks for earlier
iterations.
If we don't get to the iteration where this task is listed I suggest we
at least test the above change and see if it solves the problem. If so,
it's a cheap fix that we should put into the release.
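The "separate join required for each board" idea in the snippet above can be tried out against an in-memory SQLite database. This is a sketch only: the `ebms_article` table name and the two-column schema are assumptions (the ticket shows only the `art` alias and the `ebms_article_state` table), the board IDs are invented, and the board condition sits in the `ON` clause rather than in a separate `condition()` call, which is equivalent for inner joins.

```python
import sqlite3


def articles_on_all_boards(conn, board_ids):
    """Return IDs of articles with a state row for every board given."""
    sql = ["SELECT DISTINCT art.article_id FROM ebms_article art"]
    params = []
    # One self-join per board: each alias must find its own matching row,
    # so the boards are effectively AND'ed together.
    for delta, board_id in enumerate(board_ids):
        alias = f"topic_alias_{delta}"
        sql.append(
            f"JOIN ebms_article_state {alias}"
            f" ON {alias}.article_id = art.article_id"
            f" AND {alias}.board_id = ?"
        )
        params.append(board_id)
    sql.append("ORDER BY art.article_id")
    return [row[0] for row in conn.execute(" ".join(sql), params)]


# Hypothetical schema and rows, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ebms_article (article_id INTEGER PRIMARY KEY);
    CREATE TABLE ebms_article_state (article_id INTEGER, board_id INTEGER);
    INSERT INTO ebms_article VALUES (1), (2), (3);
    INSERT INTO ebms_article_state VALUES (1, 10), (2, 20), (3, 10), (3, 20);
""")
print(articles_on_all_boards(conn, [10, 20]))
```

A single join with `board_id IN (10, 20)` would instead return any article attached to either board, which is the OR behavior this ticket is trying to avoid.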
I didn't plan to implement this but I kept thinking and thinking about
it and realized that the plan I explained in the previous comment,
although on the right track, wasn't sufficient. There are complex cases
where multiple boards are specified and qualified by topic, and cases
that go through different logic when qualified by state (e.g., reviewer
decision, committee decision, etc.). I also rethought the optimization
and decided that the decision on how to handle states could be made in
a single routine and applied in multiple places.
Rather than write all of that down and have to go through all of the
thought processes again when it came time to implement it, I went ahead
and implemented it now.
It's installed on DEV, but not yet in version control - I need to
coordinate with Bob on that.
To test, look especially at the following cases:
1. Two boards and two topics specified.
2. Two boards and one topic specified.
3. Two boards and a state specified.
The behavior is expected to be different on DEV vs. PROD or QA.
For case 3 above the software will select all articles that have both
boards and have the specified state, but the specified state can appear
on either board or with both of them.
We have no sophisticated way in the search input form to specify that we
want articles selected for two boards, but with a decision in one
particular one. So a search of type 3 for Genetics and Pediatric
Treatment with a committee decision can turn up with a Genetics decision
or a Pediatric decision or both.
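The behavior described for case 3 can be modeled as "every selected board must be present, and the state must appear on at least one of them". The sketch below is an assumption-laden illustration of that rule, not the deployed code: the `ARTICLES` data, the article IDs, and the `search` helper are all hypothetical.

```python
# Hypothetical data: article ID -> {board: states recorded for that board}.
ARTICLES = {
    401: {"Genetics": {"committee decision"}, "Pediatric Treatment": set()},
    402: {"Genetics": set(), "Pediatric Treatment": {"committee decision"}},
    403: {"Genetics": {"committee decision"}},
    404: {"Genetics": set(), "Pediatric Treatment": set()},
}


def search(boards, state):
    """Return IDs on all boards with the state on at least one of them."""
    hits = []
    for article_id, by_board in sorted(ARTICLES.items()):
        on_all_boards = boards <= set(by_board)
        state_on_any = any(state in by_board.get(b, set()) for b in boards)
        if on_all_boards and state_on_any:
            hits.append(article_id)
    return hits


selected = {"Genetics", "Pediatric Treatment"}
print(search(selected, "committee decision"))
```

Articles 401 and 402 both match even though the decision belongs to a different board in each, which mirrors the limitation noted above: the search form cannot tie the state to one particular board.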
Search times are longer than the old way when multiple boards are
selected but coming up with wrong results quickly doesn't seem like an
attractive alternative.
Search times should NOT be longer when 0 or 1 board is selected.
Marking this as resolved fixed, ready for testing.
Alan:
You said in the 10 June comment that this wasn't checked into version control. I assume that's no longer true. Can you confirm my assumption here in the ticket?
Thanks.
The AND connector for summary topics with multiple board selections is working as expected so I'm marking this QA verified. Thanks, Alan!
> You said in the 10 June comment that this wasn't checked into version control. ...
I checked it in on June 17 and it has been deployed to QA and Stage as well as Dev. In my comment on June 10 I said that I wanted to coordinate with you (Bob) before checking it in. Perhaps we coordinated and forgot that we did? Perhaps we didn't coordinate and I forgot that I wanted to? Perhaps I just got antsy about having a significant change not checked in while big development was starting?
Whatever. It is checked in and I believe it should be.
Thanks, just wanted to make sure the check-in got recorded in the ticket.
Verified on prod.