CDR Tickets

Issue Number 3651
Summary External Refs Report
Created 2013-08-28 13:24:05
Issue Type New Feature
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To alan
Status Closed
Resolved 2013-10-24 11:07:53
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.112536
Description

We need a new CDR report to be able to quickly identify external refs and changes to the page titles of the links. This will affect mostly summaries and glossaries but we want to be able to use it for other document types in the future. It will be similar to the URL check report so let me know if it will be better to modify the URL report to show only external refs.
Report details:
1. The report will be run by document type, audience and language.
2. The results should display the following info.
-CDR ID
-DOC TITLE
-URL
-CDR TITLE (From new attribute or element)*
-WEB PAGE TITLE (Indicate changes with different font color?)**
-CHANGES? (YES/NO) or just highlight changes in page title (WEB PAGE TITLE) **

* We will need to modify the external ref element to add a new attribute or element called Page Title (I will create a new ticket for the schema changes after we’ve discussed this report). We will copy the page title from the Cancer.gov source page into this attribute whenever we add a new external link. This can then be used to check for changes in the page title when the report is run, and any title changes can be shown in the report display. If an attribute cannot be used for this purpose then a new element is the obvious alternative.

** If you can display changes in a different font color, there will be no need for the CHANGES? column.

3. FORMATS of Report: HTML and EXCEL

Comment entered 2013-08-28 18:48:39 by Englisch, Volker (NIH/NCI) [C]

This request sounds very similar to an earlier request we had discussed (OCECDR-3547) and identified that it will be very difficult to properly identify the title of a web page.

Comment entered 2013-08-29 08:19:57 by Osei-Poku, William (NIH/NCI) [C]

In that issue (OCECDR-3547), we said that checking title changes would be difficult because we were not storing the text from the page's html title tags in the CDR. So in this issue, I have proposed capturing that piece of information with the proposed attribute or element. As stated above, we intend to copy the text from the title tags from the page source on Cancer.gov and store it in the new attribute/element so that you can use it to check for changes. If that is also not possible, then I guess a simple ad-hoc query identifying external refs will do.

Comment entered 2013-08-29 08:53:50 by Kline, Bob (NIH/NCI) [C]

Displaying the value you enter for the new attribute is easy. Detecting a change in what the user considers the title to be on an external web page (remember, we're not talking about the html/head/title element) faces the same obstacles identified in OCECDR-3547. It's not possible in the general case to reliably identify the portion of an HTML document which the user will perceive to be the title, as there are countless ways to give a string sufficient relative prominence (including the use of images with embedded text) that the user will mentally associate the string with the role of "title."

Comment entered 2013-08-29 09:21:51 by Osei-Poku, William (NIH/NCI) [C]

What I am proposing is that users will look at the html source page of the link on Cancer.gov, copy the text in the html/head/title element and store it in the new attribute so that the program can use that information to check for and report possible changes. Isn't it possible to match the text in the new attribute with the text in the html/head/title of the link for possible changes?

Comment entered 2013-08-29 09:30:44 by Kline, Bob (NIH/NCI) [C]

That's certainly possible, but my recollection of the discussions for OCECDR-3547 was that CIAT had decided that element wasn't a reliable source for the "title."

Comment entered 2013-09-24 15:49:33 by alan
I've taken over this issue from Volker.  He has more on his plate than I
do.

Reading over the issue, the comments, and the earlier issue OCECDR-3547,
I'm not certain we're all on the same page with the requirements.

Here's an example of a case where the title of a document is not
obtainable from the <html><head><title> element:

    html title:
        "Bladder Cancer Home Page - National Cancer Institute"

    Title that appears to the user on the page display:
        "Bladder Cancer"

The title that appears to the user comes from a deeply nested inner
element as follows:

    <html>
     <body>
      <div>
       <div>
        <div>
         <div class="document-title-block">
          <h1>

Here's another case with a little different title variation:

    html title:
        "How to Find a Cancer Treatment Trial - National Cancer Institute"

    Title that appears to the user:
        "How to Find a Cancer Treatment Trial: A 10-Step Guide"

If we store the html title in a CDR document that has an xref url to
these documents, and have the program validate that they match, that
won't tell us if the displayed title string stored in the
document-title-block changed.  If we store the display string in the CDR
document, the report program will report an error every time it is run.

I would guess that there are many more cases where the html title does
not exactly match the display title than there are cases where it does.
It might even be true in every single case.

Given that the <html><head><title> element is not displayed to the user
and might not be the best choice of text to put in the cdr:xref, would
it still be helpful to check if the html/head/title element has changed?

Incidentally, this report will be tricky to test in the CBIIT
environment because we can't test anything in DEV or QA that accesses
cancer.gov.  For the same reason, the existing URL check report also
won't run on DEV or QA.

It won't be impossible to test this, but it won't be easy.

[I mentioned this to Volker and he said (passing on a comment from
Brett), "Yeah, that sucks."]

If we decide to proceed with this report anyway, I don't think we can
get it done in time for this coming CDR release.

Comment entered 2013-09-26 23:59:23 by alan

As we decided at the status meeting today, I spoke to Bryan about this
issue. He wanted to better understand the use of the report. He
thought that, if he knew exactly what CIAT was trying to accomplish, he
might be able to find a better way to do it, possibly using some logic
in Percussion and/or Gatekeeper to do things that just can't be done in
the CDR.

It may be practical, for example, to have Percussion automatically
generate some kind of notice when something changes that CIAT wants to
be notified about.

That makes sense to me. I suggest we write up the requirements in a
more generic way, show them to Bryan and Blair, and then decide whether
this is best implemented in the CDR, in Percussion, or in some
combination of systems.

That means it won't be ready for this CDR release, but we may get a more
useful outcome by waiting.

Comment entered 2013-09-27 10:40:49 by Englisch, Volker (NIH/NCI) [C]

Although this may not be a bad idea, since it would cover the majority of the links, we shouldn't forget that the ExternalRef element doesn't link only to Cancer.gov, and we have no influence on those "external ExternalRefs".
Given William's statement that CIAT would be very selective about adding the new attribute, your proposed approach might not have any effect if the attribute is only included for non-Cancer.gov links.

Comment entered 2013-10-02 17:03:50 by Osei-Poku, William (NIH/NCI) [C]

We create links to resources that are outside of the CDR, in Summaries and Glossary terms, by copying the URLs of the web pages and storing them in the CDR. At the same time, when creating the links in the CDR, we also enter the web page display titles in the CDR. These stored titles and URLs in the CDR need to be maintained as the pages and titles change. Sometimes it becomes necessary to remove links (URLs) from the CDR when they no longer exist or when they have been comprehensively updated. This report is expected to help us easily identify CDR records containing links that need to be removed or updated because the underlying web pages have been removed or updated. The majority of the links we create go to non-PDQ pages on Cancer.gov. Since we record and store the web page display titles and URLs in the CDR, we want to be able to know when the titles have been updated in the target pages so that we can also update the recorded display titles in the CDR accordingly.

Currently, there is no way to know if the title of a web page we've linked to or recorded in the CDR has changed. We either have to be informed by email or be lucky enough to come across it by chance. We are hoping to run this report periodically to retrieve a list of pages whose titles have changed so we can update the corresponding CDR records.

Clearly, the straightforward approach would be to try to match the display title on the web page with the title that is stored in the CDR since they match one to one. But as we've noticed with some of the web pages, it is impossible to extract text from images that are embedded in the web pages. So, the option left to us is to try and find a match between the display title and the title text in the <html><head><title> tag. While researching this, we noticed that for most of the web pages, the display title is included in the <html><head><title> text, which makes it likely that if there is a change in the display title, the <html><head><title> text will also be updated to include the updated display title. If this assumption is true, then reporting the changes in the <html><head><title> in the selected or target web pages should be sufficient. We do recognize that this is not a perfect solution but it seems to be a better solution than what we are currently doing.

Comment entered 2013-10-03 15:02:05 by alan
Thanks for the analysis, William.

It looks like we've got two problems we'd like to solve:

  1. Are any of the external links (cdr:xref attribute values) broken,
     i.e. the linked-to web page does not exist?

  2. Have any of the linked-to documents changed in such a way that the
     content of the element with the xref needs to change?  For example,
     we might have an element with the content="Radiation therapy" but
     that document was split up so that there are now separate documents
     for separate types of radiation therapy and so the links need to be
     updated and the text of the reference to the document changed.

For the first goal, we don't need to use stored titles at all.  We can
just write a program that reads our query_term table to find xrefs,
check each one, and report any that timed out or returned error codes.
That's not a bad idea, though I'd want to check with cancer.gov to see
if they already use a program that does that and we just need them to
send us the reports that pertain to CDR documents.  If they're not using
such a program it might be better to write or procure one for them in
order to cover all of the cancer.gov documents, not just the ones
originating in the CDR.
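
A minimal sketch of the first goal, assuming the xref URLs have already been read from the query_term table into a list. The names check_url and find_broken are hypothetical and are not taken from the existing URL Check report, and the 30-second timeout is just an assumed default.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=30, opener=urllib.request.urlopen):
    """Return None if the page can be fetched, else a short reason string."""
    try:
        with opener(url, timeout=timeout) as response:
            status = getattr(response, "status", None)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 500.
        return f"HTTP error {err.code}"
    except (OSError, ValueError) as err:
        # Covers URLError, refused connections, DNS failures, timeouts,
        # and malformed URLs.
        return f"unreachable: {err}"
    if status is None or status < 400:
        return None
    return f"HTTP status {status}"


def find_broken(urls, checker=check_url):
    """Map each failing URL to the reason it failed."""
    problems = {}
    for url in urls:
        reason = checker(url)
        if reason is not None:
            problems[url] = reason
    return problems
```

The opener and checker parameters exist only so the logic can be exercised on a server with no outside network access, by passing in a stand-in function.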

The second goal is harder to achieve.  I think the approach suggested by
William won't find all of the problematic links and may find some that
are not problematic, but it might help and I can't think of anything
better to implement purely in the CDR.  We'd want to study some sample
documents that we link to to see if other elements like meta tags, or
other heuristics like document size, are worth checking.

For links to articles on cancer.gov stored in Percussion, there might be
something better.  From my memory of a conversation with Bryan he
suggested exploring the possibility of storing something in Percussion
that would help out.  We could, for example:

  a. Store an optional suggested title in Percussion, a change to which
     would trigger a report to owners of documents that link to this
     document.  

  b. Alternatively, we could store an optional date element in
     Percussion that records the date on which a major change was made
     that could affect documents that link to this document.   A change
     to the date could trigger a report.

These approaches would not help with external references outside
cancer.gov, and they would require cancer.gov work.  However they would
benefit other content providers in addition to the CDR.

If we do implement William's suggestion in the CDR, without bothering
cancer.gov, we have two ways to acquire the titles:

  a. Have CIAT staff capture the titles of linked documents that the
     staff thinks need to be checked and add them to the documents
     containing the links.
  
  b. Have a batch job that reads all of our xrefs (which can be done
     just using our query_term table), and gather all of the html head
     titles from the target links, reporting any that are new or
     changed.

The second approach would cover more documents than the first and
require much less effort by CIAT.  If we only need to check specific
links, we could do that by adding a generic attribute, e.g.,
checkTitle="Y" to each such link, without requiring CIAT to actually go
and check anything - again saving some labor.
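
The batch approach (b) can be sketched as a comparison between the titles gathered on a previous run and those gathered on the current run. This is only an illustration: diff_titles and the dictionary-of-titles representation are assumptions, not part of any existing CDR code.

```python
def diff_titles(previous, current):
    """Compare two {url: html_head_title} snapshots.

    Returns a list of (url, kind, old_title, new_title) tuples, where
    kind is "new" for a URL not seen before and "changed" for a URL
    whose title differs from the earlier snapshot.
    """
    changes = []
    for url, title in sorted(current.items()):
        if url not in previous:
            changes.append((url, "new", None, title))
        elif previous[url] != title:
            changes.append((url, "changed", previous[url], title))
    return changes
```

URLs whose titles are unchanged produce no output, so the resulting report stays short when little has moved.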

I've added Bryan as a watcher on this issue to get his opinion about it.

Any thoughts?

Comment entered 2013-10-03 17:40:12 by Osei-Poku, William (NIH/NCI) [C]

We have an existing program that currently handles problem 1. It is called the URL check report. Here is the path to report -

CIAT/OCCM Staff/Reports/General Reports/23.URL Check (Batch job - runs ~15 min)

Comment entered 2013-10-08 14:04:47 by alan
I looked at a number of alternative ways to handle this but finally
decided that the best thing to do is to implement it as William
specified it.  If it turns out not to be perfect, we'll change it
afterwards.

It will be a partial modification of the URL Check report.  I'll either
use the same user interface and add some radio buttons to enable a check
of URLs or of page titles, or I'll just create a new, similar
interface that looks the same.

One trivial modification I propose to William's requested user interface
is a checkbox that says whether to include everything on the report, or
only include external reference page titles that do not match the title
stored in our documents.

I would also need a pair of radio buttons to specify either html or
Excel.  However I propose to implement the html output first and get it
working then add the Excel later if we still want it.  I think I can get
it working faster that way, and it might turn out that the mismatch-only
checkbox will make the report small enough in the most useful case that
Excel is superfluous.

The internal logic will have some of the same logic that Bob used to
find web pages corresponding with URLs but with different selection
criteria using a new index (see below for ExRefPageTitle) and different
actions to take when an externally referenced page is found.

I haven't found any evidence that PageTitle has been added to the
schemas, so I propose to add a new optional attribute "ExRefPageTitle"
to the ExternalRef element defined in CdrCommonBase.xml.  I like the
"ExRef" prefix because I want to index it with a general rule (see
below) and not have any danger of conflicting with some future attribute
named "PageTitle".

The general indexing rule I would add to the query_term_def table is:

   "//@ExRefPageTitle".

That will index every occurrence of this attribute in any document type
where it is added.  We'll be able to run the report on new document
types or new elements within a document type without having to change 
any indexing definitions.
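
To illustrate what a rule like "//@ExRefPageTitle" picks up, here is a small sketch that collects the attribute from every element of a document, whatever the element or document type. It does not use the CDR indexing code; the function name and the sample document are invented for the example, and because the standard library's ElementTree cannot evaluate "//@name" directly, the sketch walks the tree instead.

```python
import xml.etree.ElementTree as ET

ATTR = "ExRefPageTitle"


def indexed_values(xml_text, attr=ATTR):
    """Return (path, value) pairs for every element carrying the attribute."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, path):
        here = f"{path}/{element.tag}"
        if attr in element.attrib:
            found.append((f"{here}/@{attr}", element.attrib[attr]))
        for child in element:
            walk(child, here)

    walk(root, "")
    return found


# Hypothetical, simplified document; real CDR documents are more complex.
sample = (
    '<Summary><Para>See <ExternalRef ExRefPageTitle="Bladder Cancer">'
    "this page</ExternalRef></Para></Summary>"
)
print(indexed_values(sample))
```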

A user will start the report the same way the URL Check report
is started, entering the parameters and submitting the job.

When the report runs it will work (at a high level) as follows:

    Select all of the ExRefPageTitle values that meet the user's
    selection criteria.

    For each one found:

        Fetch the corresponding external page.

        If it can't be retrieved:
        
            Add an error message to the report output.

        Else:

            Parse the page and extract the <html><head><title> content.

            Normalize the content:

                Replace each html tag inside the title with a single
                space.  This eliminates the problem of super and
                subscripts, bold, or other tags inside the title that
                cannot be easily stored in an attribute, and are
                probably not relevant to what we are trying to do.

                Normalize each run of whitespace characters (spaces,
                tabs, newlines, carriage returns) to a single
                space.

                Trim away leading and trailing whitespace.

            Normalize the content of the ExRefPageTitle using the exact
            same algorithm.

            Perform a case insensitive comparison of the two strings.

            If the two strings differ:

                Add a row to the output reporting the difference.

            Else:

                If a full output is requested, i.e., not just
                differences:

                    Add a row to the output reporting that there is no
                    change.
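
The normalization and comparison steps above could look roughly like the following. It is a sketch, not the code in CdrLongReports: the function names are made up for illustration, and the title is pulled out with a regular expression rather than a real HTML parser, which is a simplification.

```python
import re

TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def normalize_title(text):
    """Apply the three normalization steps to a title string."""
    text = TAG_RE.sub(" ", text)       # each inner tag becomes one space
    text = re.sub(r"\s+", " ", text)   # collapse runs of whitespace
    return text.strip()                # trim both ends


def extract_title(page_html):
    """Return the raw content of the first <title> element, or None."""
    match = TITLE_RE.search(page_html)
    return match.group(1) if match else None


def titles_match(stored_title, page_html):
    """Case-insensitive comparison of the stored and fetched titles."""
    found = extract_title(page_html)
    if found is None:
        return False
    return normalize_title(stored_title).lower() == normalize_title(found).lower()


page = "<html><head><title>\n CDR   Administration </title></head></html>"
print(titles_match("Cdr Administration", page))
print(titles_match("CDR Administration is right, but this ain't it", page))
```

Both strings go through the same normalize_title function, so a difference in spacing or letter case alone never produces a mismatch.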

I'm not sure how I'll test it yet.  The DEV server has no access to
external web pages, even to cancer.gov.  What I'll probably do is pick a
document and add or modify some cdr:xref elements to refer to a page in
the CDR itself.

That's my plan.  Unless I hear otherwise, I'll start implementation some
time this afternoon.

Comment entered 2013-10-09 22:45:28 by alan

I've completed a draft of the report and gotten a clean compile.

Tomorrow I plan to make the required schema change on DEV, create some test data, and start testing.

I made a small modification to the user interface for the URL Check report. The same user interface will launch either report, depending on the user selection of a radio button. The generation of the report itself is new code integrated into the CdrLongReports module and conforming to the interfaces and reporting conventions established there.

Comment entered 2013-10-11 01:41:43 by alan
I have completed initial testing of the new report and it seems to
work; however, I have only a trivial amount of totally artificial
data with which to test.

The DEV server is completely isolated from the rest of the world, even
from other computers at NIH.  That makes it impossible for me to
construct realistic test data that refers to pages on cancer.gov.

In lieu of that I made some modifications to the Summary document
CDR258032 as follows:

  Changed /Summary/SummaryMetaData/SummaryURL/@cdr:xref
   from:
    http://cancer.gov/cancertopics/pdq/screening/prostate/Patient
   to:
    https://localhost/CdrAdmin.html
    SourceTitle = "Cdr Administration"
  Changed /Summary/SummaryMetaData/MobileURL
   from
    http://m.cancer.gov/topics/testing-screening/bycancer/prostate/patient
   to:
    https://localhost/CdrAdmin.html
    SourceTitle = "CDR Administration is right, but this ain't it"

The two references that would have gone to prostate cancer pages on
cancer.gov now point instead to the CDR Admin page on the DEV server
itself (https://localhost/CdrAdmin.html).

The title of that page is "CDR Administration".

I stored two SourceTitle attributes in the two elements:

    One is correct except for having the case of the letters changed
    ("Cdr" instead of "CDR").  This difference does not cause a mismatch
    in the title comparisons.

    The other element has additional characters that do cause a mismatch
    in title comparisons.

The match outcome is reported using colors.  A title shown in red is a
mismatch between the stored title and the actual title found on the web
page.  If the appropriate radio button is selected on the user interface,
the report only shows errors and mismatches, not successful matches.

Here is some more information about the program:

    Errors, i.e., inability to connect to a server or to retrieve a web
    page in a reasonable time (30 seconds), are reported in the place
    where a title would normally be reported.  Page retrieval errors are
    in dark red.  (This is not tested yet - I'll work on that.)

    My reporting of retrieval errors is simpler than the reporting Bob
    did in the URL Check report.  I don't analyze why it didn't work to
    the depth that he did.  We already have his report for that and it
    seemed like an unnecessary duplication to reproduce it here.

    The program is designed to work with any of the document types that
    can contain an ExternalRef type element, but it has only been tested
    so far with this one Summary.

    Audience is only checked if the document type is Summary.  Language
    is only checked if it's Summary or a GlossaryTerm type.  If nothing
    is selected for those, then all audiences and languages are
    included.

    The program only looks at the query_term table, not the
    query_term_pub table.  It is therefore only checking current working
    documents, not publishable versions of documents.  That seemed like
    what we would want but, if not, it's easy to change.

    I had said in an earlier comment that I would add an
    "ExRefPageTitle" attribute to the ExternalRef element type.  However
    I see Bob has already added "SourceTitle", so that's what I used.

It's possible that I can test this in a more realistic way from my
workstation instead of on DEV or PROD, but it will require some effort
to do that.  There are many issues.

When we are back at work, I'll talk to CBIIT about it.  If they open a
single hole in the firewall just to go out to cancer.gov it will make
testing much easier.

Comment entered 2013-10-11 08:45:15 by Englisch, Volker (NIH/NCI) [C]

I don't know if this makes sense but would it be any easier for a "pseudo test" to run this on bastion-2 which has Internet but not DB access?

Comment entered 2013-10-14 18:16:48 by alan
Volker wrote:

> I don't know if this makes sense but would it be any easier for a "pseudo test" to run this on 
> bastion-2 which has Internet but not DB access?

That's an interesting idea, but I think the work involved would turn out to be greater than doing what I did.

I've modelled successful page title comparisons and several different types of failures using DEV alone, 
so hopefully everything will work on PROD.

We'll find out.

Comment entered 2013-10-14 18:51:55 by alan

I've tested what I can think to test and it looks like everything is working.

I've put the modified code into svn.

Testing on DEV is possible. All document types except Summary will report that no documents were found matching the input criteria. Testing with English language Patient Summaries will show the test data that I created.

Comment entered 2013-10-14 18:53:26 by alan

Implementing the changes in QA and production requires the following steps:

1. Update the CdrCommonBase.xml schema to the latest version.

2. Update CheckUrls.py in the CGI directory to the latest version.

3. Update CdrLongReports.py in the Lib/Python directory to the latest version.

Comment entered 2013-10-14 18:56:41 by Osei-Poku, William (NIH/NCI) [C]

Is QA set up the same way as Dev? I was just wondering why we can't test on QA before putting it into production.

Comment entered 2013-10-14 19:05:01 by alan

Unfortunately, QA is also unable to see cancer.gov.

When the craziness subsides and we're back at work I'll talk to CBIIT to see if we can open a hole in the firewall for either DEV, QA, or both to at least communicate with cancer.gov.

I'm not very knowledgeable about security issues, but I would think that opening a hole that allows programs on DEV or QA to initiate a connection to cancer.gov and receive responses from it would be reasonably secure. We don't need to enable cancer.gov or any other servers other than the bastion hosts to initiate a connection to DEV or QA.

Comment entered 2013-10-14 21:12:38 by Englisch, Volker (NIH/NCI) [C]

> When the craziness subsides and we're back at work I'll talk to CBIIT to see if we can open a hole in the firewall for either DEV, QA, or both to at least communicate with cancer.gov.

I know that you can connect to www-qa.cancer.gov (or whatever that machine is called). I had them poke a hole in the firewall in order to access the CSS for publish preview.

Comment entered 2013-10-14 21:28:34 by alan

> ... I had them poke a hole in the firewall in order to access the CSS for publish preview.

That's a good precedent. Maybe they'll do the same for us for this problem.

Comment entered 2013-10-14 21:50:09 by Englisch, Volker (NIH/NCI) [C]

I don't think so. I wanted them to let me connect to Cancer.gov but only after intense negotiations did they finally agree to let me go to qa.cancer.gov. Maybe you're more persuasive than I am.

Comment entered 2013-10-21 14:18:51 by alan

I consulted with Wenling Bao at CBIIT and she suggested using the DEV and QA versions of cancer.gov - which looks like an excellent idea.

What I therefore plan to do is modify the program so that if it's running on DEV it will change the URLs

from:
http://cancer.gov/ ....
to:
http://www-blue.dev.cancer.gov/ ...

and something similar for QA.

The data in the documents will still point to cancer.gov. It will just be that the report will modify where the references go when not running in production.

The test won't be 100% perfect. I could make an error in the URL rewrite that causes it to fail on PROD, but I think the required modification should be simple enough that it will be fairly easy to guard against that kind of bug.

I'll post when it's ready.

Comment entered 2013-10-21 14:34:56 by alan

To make this work I'm going to put two new entries in cdrapphosts.rc:

CBIIT:DEV:CG:www-blue:dev.cancer.gov
CBIIT:QA:CG:www.qa.cancer.gov

I'll probably have to lengthen the timeouts too. The report will take a lot longer to run than in PROD.

Comment entered 2013-10-21 14:38:41 by Kline, Bob (NIH/NCI) [C]

Looks like the number of fields is mismatched between the two entries. Is "www-blue" supposed to be part of the last field?

Comment entered 2013-10-21 15:33:26 by alan

Good catch.

Actually I think it may be the other way around, i.e., a colon, not a period, after www in the QA entry. The fields are org:tier:use:name:domain. So I really want:

CBIIT:DEV:CG:www-blue:dev.cancer.gov
CBIIT:QA:CG:www:qa.cancer.gov

"CG" is a new use, not now included in our .rc file. I'll be using "CG" to stand for "cancer.gov".

The way I plan to implement the changes in the code, I won't need CG entries for PROD and maybe not for stage, if and when we get one of those. I haven't decided yet whether to make the entries anyway.
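
The org:tier:use:name:domain layout described above lends itself to a simple field-count check, which would have flagged the mismatched entry automatically. A rough sketch follows; HostEntry and parse_host_entry are illustrative names, not functions from cdr.py.

```python
from collections import namedtuple

HostEntry = namedtuple("HostEntry", "org tier use name domain")


def parse_host_entry(line):
    """Split one entry into its five colon-separated fields."""
    fields = line.strip().split(":")
    if len(fields) != len(HostEntry._fields):
        raise ValueError(
            f"expected {len(HostEntry._fields)} fields, "
            f"found {len(fields)}: {line!r}"
        )
    return HostEntry(*fields)


entry = parse_host_entry("CBIIT:DEV:CG:www-blue:dev.cancer.gov")
print(f"{entry.name}.{entry.domain}")
```

With this check, the earlier form "CBIIT:QA:CG:www.qa.cancer.gov" raises a ValueError because it has only four fields.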

Comment entered 2013-10-22 14:10:00 by alan
I think I've got things working on DEV using the new technique.  When testing
on DEV or QA, any url pointing to www.cancer.gov or m.cancer.gov will be
transformed on the fly to point to one of the accessible sites:

    www-blue.dev.cancer.gov
    m-blue.dev.cancer.gov
    www.qa.cancer.gov
    m.qa.cancer.gov

The report displays the original urls from the url attribute in the
ExternalRef in the CDR document, not the transformed one.  For example, if a
url has the value:

    http://www.cancer.gov/cancertopics/types/bladder

That's what the report will show, and not the transformed url that was tested,
i.e.:

    http://www-blue.dev.cancer.gov/cancertopics/types/bladder
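
A sketch of how such an on-the-fly rewrite might work is shown here. It is not the MutateCGUrl class itself: the function name, the mapping table, and the treatment of the bare cancer.gov host are assumptions made for illustration, using only the host names listed above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed mapping from production hosts to the hosts reachable on each tier.
TIER_HOSTS = {
    "DEV": {
        "www.cancer.gov": "www-blue.dev.cancer.gov",
        "cancer.gov": "www-blue.dev.cancer.gov",
        "m.cancer.gov": "m-blue.dev.cancer.gov",
    },
    "QA": {
        "www.cancer.gov": "www.qa.cancer.gov",
        "cancer.gov": "www.qa.cancer.gov",
        "m.cancer.gov": "m.qa.cancer.gov",
    },
}


def url_to_fetch(url, tier):
    """Return the URL to request on this tier; the report still shows `url`."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    replacement = TIER_HOSTS.get(tier, {}).get(host)
    if replacement is None:
        # PROD, or a host outside cancer.gov: fetch the URL unchanged.
        return url
    netloc = replacement if parts.port is None else f"{replacement}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(url_to_fetch("http://www.cancer.gov/cancertopics/types/bladder", "DEV"))
```

Keeping the rewrite in a single function that returns a new string leaves the original URL available for display in the report.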

Bear in mind that data on DEV and QA is likely to be both volatile and out of
date, so testing may show errors that will not appear on PROD.

My testing is very limited.  I modified two more Summaries to test links to
cancer.gov and mobile.cancer.gov in addition to the other links I had.

I have not modified the old URL Check report.  It still won't work on DEV or
QA.  However since no changes were made to that I presume no new testing is
required.

Comment entered 2013-10-22 14:13:02 by alan

The latest notes on deploying the changes in QA and production are:

1. Update the CdrCommonBase.xml schema to the latest version.

2. Update CheckUrls.py in the CGI directory to the latest version.

3. Update CdrLongReports.py in the Lib/Python directory to the latest version.

4. Update cdr.py (in order to get the new MutateCGUrl class.)

5. For QA only, not needed in production:
Update d:\etc\cdrapphosts.rc to include the entries for CG and CGMOBILE.

Comment entered 2013-10-23 13:09:20 by Osei-Poku, William (NIH/NCI) [C]

I ran the report a couple of times on DEV and each run completed successfully, but there was no output to review, which I believe is to be expected. We may have to wait a few weeks after the changes are put in production to see anything show up in the report with regard to the new changes.

Comment entered 2013-10-24 10:21:28 by alan

All of the test data I created was with the following parameters:

Doc Type: Summary
Language: English
Audience: Patients

If you try that one you should see 7 lines of output from 4 CDR documents for all titles.

Comment entered 2013-10-24 10:42:54 by Osei-Poku, William (NIH/NCI) [C]

I can see the test data now. It looks good to me.

Comment entered 2013-10-24 11:07:53 by Osei-Poku, William (NIH/NCI) [C]

Marked it as Resolved.

Comment entered 2013-10-24 11:08:12 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV.

Comment entered 2013-11-07 20:31:45 by alan

I did some more testing of this and found that I was looking at the wrong field (language) instead of the right field (UseWith) for finding GlossaryTermConcept titles. I fixed that.

I also made a trivial change to the user interface. It had validation code that required a user to choose audience and language if the document type is Summary or GlossaryTermConcept. I removed that validation if the user is trying to run our new report. Audience is irrelevant to the new report and I am allowing a user to pick a language or not as he or she pleases. If no language is selected I'll include everything in the output - both English and Spanish.

Comment entered 2013-12-05 11:39:20 by Osei-Poku, William (NIH/NCI) [C]

This one is proving difficult to test. We need to wait for changes on Cancer.gov before we will be able to see them in the report, and we don't know when those changes will happen. I tried to test by entering the source title in the document and then modifying it to see if the mismatch would show, but it didn't work when I tried it. Do you have any ideas about how to test this? Or do we just have to wait until there is an update?

Comment entered 2013-12-05 13:04:17 by alan

Let's talk about this at the status meeting today. I think what you did should work but I may be misunderstanding your explanation - or maybe I understand it fine but there is a bug in the code.

Comment entered 2013-12-05 19:54:01 by alan

The problem was caused by my forgetting to add a new index term to the query term definitions on Prod for the SourceTitle attribute. The query that selects documents searches the query_term table for docs that meet the doctype and other criteria, and have a SourceTitle attribute. But there were none because I hadn't specified that attribute for indexing.

I did that and re-indexed Summaries and it now appears to work. It should also work for documents of all other types in which we create a SourceTitle attribute because the query term definition applies to all doctypes, not just summaries. I only re-indexed Summaries on the assumption that no other documents have had SourceTitles inserted.
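
The effect of the missing index definition can be illustrated with a toy version of the selection query. This is a sketch under stated assumptions: it uses an in-memory SQLite table with an assumed (doc_id, path, value) layout rather than the real CDR database, and the function name is hypothetical.

```python
import sqlite3


def docs_with_source_title(conn, doctype):
    """Return IDs of documents of one type that have an indexed SourceTitle."""
    sql = (
        "SELECT DISTINCT doc_id FROM query_term "
        "WHERE path LIKE ? ORDER BY doc_id"
    )
    pattern = f"/{doctype}/%/@SourceTitle"
    return [row[0] for row in conn.execute(sql, (pattern,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT)")

# Without an index definition and re-index, no SourceTitle rows exist,
# so the selection finds nothing even though documents carry the attribute.
print(docs_with_source_title(conn, "Summary"))

# After re-indexing, a row appears for each SourceTitle attribute.
conn.execute(
    "INSERT INTO query_term VALUES (?, ?, ?)",
    (258032, "/Summary/SummaryMetaData/SummaryURL/@SourceTitle",
     "Cdr Administration"),
)
print(docs_with_source_title(conn, "Summary"))
```

A document type that has not been re-indexed behaves like the first call: the query returns no rows for it.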

Comment entered 2013-12-06 08:08:43 by Osei-Poku, William (NIH/NCI) [C]

Yes. It worked for summaries but not GTCs. I actually added the source title to at least one of the GTCs so please re-index when you get the chance.

Comment entered 2013-12-06 10:15:24 by Englisch, Volker (NIH/NCI) [C]

Alan did mention that he only re-indexed the summaries, so unless you had added the title to the GTC document today you wouldn't have gotten a result for any other document type.

I'm in the process of running the re-index for GTCs this morning. Please double-check that GTCs are working in a couple of minutes.

Comment entered 2013-12-06 10:36:42 by Osei-Poku, William (NIH/NCI) [C]

Sure I will check later to see if it works.

Comment entered 2013-12-06 11:58:08 by Osei-Poku, William (NIH/NCI) [C]

Worked for GTCs. Thanks!

Comment entered 2013-12-06 14:28:05 by alan

Sorry, I forgot that you said you had also modified some Glossary docs.
