CDR Tickets

Issue Number 5095
Summary [Media] New Image Demographic Information Report
Created 2022-01-13 12:58:55
Issue Type New Feature
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2023-05-02 09:35:48
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.308809
Description

This is a placeholder for a new report for the new block of elements being developed in OCECDR-4961. I will provide the requirements next week.

Comment entered 2022-03-24 13:32:11 by Osei-Poku, William (NIH/NCI) [C]

Media Demographic Information Report_Updated.docx

Sample Summary Report.xlsx

 

I have attached the requirements document (MS Word Doc) and the sample reports in two MS Excel spreadsheets (one for each flavor of the report) for your review. The sample reports are also in the requirements document but they are not formatted well in Word.

Comment entered 2022-09-13 12:08:50 by Kline, Bob (NIH/NCI) [C]

What does "include module documents in ... the selection criteria" mean?

Comment entered 2022-09-13 15:45:05 by Osei-Poku, William (NIH/NCI) [C]

I think we want to have the option to either include or exclude module documents in the results.

Comment entered 2022-10-24 11:28:00 by Kline, Bob (NIH/NCI) [C]

In this context does "module document" mean "a summary document which can be used as a module" or "a summary document which can ONLY be used as a module"?

Comment entered 2022-10-24 11:30:43 by Kline, Bob (NIH/NCI) [C]

(It would be a good idea to always make that distinction clear in requirements for new tickets.)

Comment entered 2022-10-24 12:21:21 by Kline, Bob (NIH/NCI) [C]

Why is image category showing up twice in the selection criteria?

  1. "Ability to search by the Image Category" (see first page of the requirements document)

  2. Repeated under "Additional Requirements" (page 2 of the requirements document)

Comment entered 2022-10-24 12:25:28 by Kline, Bob (NIH/NCI) [C]

  and — just want to make sure you're both aware that this is an expensive item and that you're both satisfied that its implementation is justified.

Comment entered 2022-10-25 06:52:10 by Kline, Bob (NIH/NCI) [C]

Display Option (For images)

i. All Images (please display all image (if any) thumbnails with both English and Spanish demographic information and thumbnails)
ii. English Only (please display only images in the English docs without Spanish demographic information or image thumbnails)
iii. Spanish Only (please display only images in the Spanish docs without English demographic information or image thumbnails)

Could you please explain what "... in the [language] docs without [other language] demographic information or image thumbnails" would mean? The demographic information block doesn't have any language designation. Also, up until this point, I have assumed that when the requirements referred to "thumbnails" they were referring to scaled versions of the image generated on the fly by the report software, so I'm having a hard time imagining what is meant by "without ... image thumbnails." The only place where the word "thumbnail" occurs in the Media schema is as a member of the DerivationMethod valid value set. Since the software can generate a thumbnail for any given image, the only interpretation of this requirement would seem to be to exclude images which don't have "thumbnail" as the value of the method attribute of the optional FromMethod element, which seems far-fetched.

Please give very specific concrete examples of cases in which (a) images would be included, and (b) images would be excluded, covering all of the logical possibilities for which the software must make decisions to comply with this portion of the requirements.

Comment entered 2022-10-25 14:49:18 by Kline, Bob (NIH/NCI) [C]

- just wanted to make sure you are getting notifications for the requirements questions posted for this ticket (they're blocking implementation).

Comment entered 2022-10-25 15:01:57 by Osei-Poku, William (NIH/NCI) [C]

Yes, I have been getting them. I plan to respond to them soon. Thanks for the heads up.

Comment entered 2022-10-25 15:09:32 by Kline, Bob (NIH/NCI) [C]

Also, just so you're aware, because the filtering requirements are so complex (for example, look for language and audience in these half-dozen places in the documents, with complex rules about what to do in case of conflicting values), don't be surprised if this report takes a long time to run.

Comment entered 2022-10-25 15:15:46 by Kline, Bob (NIH/NCI) [C]

Good. Probably a good idea to focus on the loose ends for the tickets we already have before adding new ones to the pile. 🙂

Comment entered 2022-10-25 16:54:24 by Osei-Poku, William (NIH/NCI) [C]

If you find it more appropriate to make it a batch report that is emailed to the user after completion, that is OK.

Comment entered 2022-10-25 17:05:44 by Osei-Poku, William (NIH/NCI) [C]


Could you please explain what "... in the [language] docs without [other language] demographic information or image thumbnails" would mean? The demographic information block doesn't have any language designation. 

The media docs are language-specific. That is, the Spanish images don't share documents with the English ones, so the demographic information block for a Spanish image will be in the Spanish media doc.

 


 Also, up until this point, I have assumed that when the requirements referred to "thumbnails" they were referring to scaled versions of the image generated on the fly by the report software

Your assumption is right. We want to see scaled-down versions of the images, not specific thumbnails stored in the media docs. I am not sure whether this answers all the questions in your comments above, so please let me know.

 


Please give very specific concrete examples of cases in which (a) images would be included, and (b) images would be excluded, covering all of the logical possibilities for which the software must make decisions to comply with this portion of the requirements.

Sure. Will give it a try. Thanks!
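
A minimal sketch of the "scaled down on the fly" idea confirmed in this exchange: computing thumbnail dimensions which preserve the aspect ratio of the original image. The function name `thumbnail_size` and the 200-pixel bound are assumptions made for illustration; nothing in this ticket says how the report software actually sizes its thumbnails.

```python
def thumbnail_size(width: int, height: int, max_side: int = 200) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side.

    max_side=200 is an illustrative default, not a value from the requirements.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never scale up
    scale = max_side / longest
    # Guard against a dimension rounding down to zero for very thin images.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Because only the dimensions are computed here, the same bound could be applied to any stored image regardless of whether the Media document records a "thumbnail" derivation method.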

Comment entered 2022-10-25 17:09:46 by Osei-Poku, William (NIH/NCI) [C]

They are talking about the same thing but the one under "Additional Requirements" provided more information. Please ignore one of them.

Comment entered 2022-10-25 18:02:01 by Kline, Bob (NIH/NCI) [C]

Thanks. Be sure to include examples which illustrate "without ... image thumbnail" cases.

Comment entered 2022-10-26 15:28:33 by Osei-Poku, William (NIH/NCI) [C]


Could you please explain what "... in the [language] docs without [other language] demographic information or image thumbnails" would mean? The demographic information block doesn't have any language designation. 

 

  1. All Images (please display all image (if any) thumbnails with both English and Spanish demographic information and thumbnails

  2. English Only (please display only images in the English docs without Spanish demographic information or image thumbnails)

  3. Spanish Only (please display only images in the Spanish docs without English demographic information or image thumbnails)

The above quote is from the requirements document.

This report will be used by both the Spanish and English teams. So, depending on who is running the report and for what purpose, they may want to see the English versions of the images (with the corresponding demographic information), the Spanish versions of the images (with the corresponding demographic information), or both the English and Spanish versions together (with the corresponding demographic information). In that light we expect that there will be 3 display options for images, probably with radio buttons so that the user will select one of them.

  • All Images

  • English Only 

  • Spanish Only

Note, you could skip over the information I put in parentheses to make things clearer, but I put it there to communicate that we also want to see the demographic information and not just the scaled-down versions of the images.

If the user selects All Images, then "display scaled down versions of the images from both English and Spanish docs that have been retrieved as part of the selection criteria, (including the display of corresponding demographic information)". All in this case means English and Spanish Images (and demographic information). A more appropriate phrase would be "Both English and Spanish Images" from the retrieved documents. 

If the user selects English Only then "display scaled down versions of images in the English docs (and not the images in the Spanish docs) for the retrieved documents, including the corresponding demographic information"

If the user selects Spanish Only then "display only images in the Spanish docs (and not images in the English docs) for the retrieved documents, including the corresponding demographic information"

 

Please let me know if this helps to clarify the requirements.

Comment entered 2022-10-26 16:06:03 by Kline, Bob (NIH/NCI) [C]

AHA! Now I think I know what you meant for this part of the requirements. You understand that "please display only images in the English docs without Spanish demographic information or image thumbnails" sounded like you wanted me to find images in documents which do not have Spanish demographic information or image thumbnails, right?

 

I think part of the confusion stems from the fact that in your mind (I'm basing this on comments you have made in recent conversations) each image Media document is specific to only one language—it's either an English document, or a Spanish document. But this understanding is not consistent with the structure of the Media documents, in which the document-wide MediaLanguage element is optional, and there are lots of places throughout the schema which support (and in most cases, require) a separate language designation for the different parts of the document, which logically implies that it is expected that more than one language can be represented in the various parts of a single Media document. Does what I'm saying make sense?

Comment entered 2022-10-26 16:21:44 by Kline, Bob (NIH/NCI) [C]

To put it in simpler terms: if it's really true that each Media document is specific to exactly one language, then we should

  1. make the document-wide MediaLanguage element required; and

  2. remove all the other language elements from the Media schema

Comment entered 2022-10-26 17:11:41 by Osei-Poku, William (NIH/NCI) [C]

That is right. In relatively rare cases some HP images that come from cancer journals do require copyright permissions and don't get translated. In such cases the English and Spanish share the same documents. So, yes, it is possible for more than one language to share the same media document.

Comment entered 2022-10-27 07:26:48 by Kline, Bob (NIH/NCI) [C]

Now that we've cleared up the wording in the last portion of the requirements document, can you explain how the "filtering by language" and "display options by language" parts of the requirements are supposed to interact? Please provide examples showing how (for example) selecting "Spanish Only" for the Display Options portion of the form would produce different report output than selecting "Spanish" in the Filter by Language portion of the form. I need to know exactly what you want the software to do.

Comment entered 2022-10-27 13:16:33 by Osei-Poku, William (NIH/NCI) [C]

Sure. It will be simpler to have only one language option in this report, in which case, if you select a language, only images in that language's documents are displayed. If you select both languages, then images in both languages' documents are displayed by default.

I think we wanted to be granular with the image display options, but that is not necessary, since no one would run the report for both languages and then want to see images for only one of them.

So, we can remove the Display Options completely from the requirements document and just add a requirement that, depending on the language selected, display the images in that language and when both languages are selected, display images in both languages’ documents. I will update the requirements document accordingly.
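
The simplified rule agreed on here (one language selector, with the displayed images following whichever languages are selected) might be sketched as below. The record layout, the function name `images_to_display`, and the document IDs are all hypothetical; the sketch assumes each image has already been labeled with a single language code.

```python
# Hypothetical records; the IDs are made up for illustration.
SAMPLE_IMAGES = [
    {"doc_id": 1001, "language": "en"},
    {"doc_id": 1002, "language": "es"},
    {"doc_id": 1003, "language": "en"},
]


def images_to_display(images, selected_languages):
    """Keep only the images whose language was selected on the form."""
    selected = set(selected_languages)
    if not selected:
        raise ValueError("at least one language must be selected")
    return [image for image in images if image["language"] in selected]


english_only = images_to_display(SAMPLE_IMAGES, ["en"])
both = images_to_display(SAMPLE_IMAGES, ["en", "es"])
```

With a single selector there is no second "display option" which could contradict the filter, which is the interaction problem raised earlier in the thread.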

Comment entered 2023-01-05 13:19:47 by Kline, Bob (NIH/NCI) [C]

Similar question about image categories. Why do we want two places where we ask the user which image categories to select for the report? What should the software do if the user says she wants to "search by" image categories A and B, and then further down indicates that she wants to "filter by" categories C and D? Surely we don't want this much convoluted complication in the report. 😛

Comment entered 2023-01-06 09:55:44 by Kline, Bob (NIH/NCI) [C]

Just to be as transparent and explicit as possible: I don't see any response recorded in the ticket to my comment asking whether and are OK with the LOE required for this ticket (though who knows what JIRA is hiding from me), so I'm proceeding with implementation of the request (when I get a response from to my latest questions). If that's wrong, please stop me now. 🙂

Comment entered 2023-01-06 10:50:08 by Kline, Bob (NIH/NCI) [C]

Manual task for this ticket: add the following paths to the query_term_def table (and reindex Media docs):

  • /Media/DemographicInformation/Age

  • /Media/DemographicInformation/Sex

  • /Media/DemographicInformation/Race

  • /Media/DemographicInformation/SkinTone

  • /Media/DemographicInformation/Ethnicity
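
One way the manual task above could be scripted so that it is safe to run more than once is sketched below. This is only an illustration: it uses an in-memory SQLite database as a stand-in for the CDR database, it assumes a `query_term_def` table with nothing but a `path` column (the real table may have more), and it does not perform the reindexing of Media documents which the task also calls for.

```python
import sqlite3

DEMOGRAPHIC_PATHS = [
    "/Media/DemographicInformation/Age",
    "/Media/DemographicInformation/Sex",
    "/Media/DemographicInformation/Race",
    "/Media/DemographicInformation/SkinTone",
    "/Media/DemographicInformation/Ethnicity",
]


def add_query_term_defs(conn, paths):
    """Insert any paths not already defined; return the number added."""
    cursor = conn.cursor()
    added = 0
    for path in paths:
        cursor.execute(
            "SELECT COUNT(*) FROM query_term_def WHERE path = ?", (path,)
        )
        if cursor.fetchone()[0] == 0:
            cursor.execute(
                "INSERT INTO query_term_def (path) VALUES (?)", (path,)
            )
            added += 1
    conn.commit()
    return added


# Stand-in database (assumed single-column layout) so the sketch runs alone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_term_def (path TEXT NOT NULL)")
first_run = add_query_term_defs(conn, DEMOGRAPHIC_PATHS)
second_run = add_query_term_defs(conn, DEMOGRAPHIC_PATHS)
```

Checking for an existing row before inserting means a second run adds nothing, so the same script could be reused on each tier without creating duplicate definitions.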

Comment entered 2023-01-06 11:17:35 by Kline, Bob (NIH/NCI) [C]

What am I to make of the fact that I see lots (by "lots" I mean "thousands") of rows in the query_term table for Media documents which have "es" as the @language attribute value, but no TranslationOf element, as well as documents with "en" in the @language attribute AND a TranslationOf element? This is on the production server. Doesn't this undermine the logic I'm being asked to use for identifying the language of a Media document? To complicate matters further, I see that there are a bunch of Media documents (again, on the production server) which have both a @language attribute of "en" AND a @language attribute with "es" which doesn't inspire confidence in that attribute to reliably tell me whether I'm looking at an English or a Spanish Media document. 😛

I'd be delighted if you can show me that I've got a flaw in my SQL queries, and that the problems I think I'm seeing aren't real. But if there is such a flaw, I haven't been able to find it yet.

 

-- Language attributes that disagree with TranslationOf: "en" on a document
-- that has a TranslationOf element, or a non-"en" value on one that lacks it.
SELECT d.id AS "Media ID",
       t.path AS "Translation Path",
       l.path AS "Language Path",
       l.value AS "Language"
FROM document d
JOIN query_term l ON l.doc_id = d.id
LEFT OUTER JOIN query_term t ON t.doc_id = d.id AND t.path = '/Media/TranslationOf/@cdr:ref'
WHERE l.path LIKE '/Media%@language'
AND ((t.path IS NOT NULL AND l.value = 'en') OR (t.path IS NULL AND l.value <> 'en'))
ORDER BY d.id;

-- Media documents carrying both an "en" and an "es" @language attribute.
SELECT DISTINCT doc_id
FROM query_term
WHERE path LIKE '/Media%@language'
AND value = 'en'
AND doc_id IN (
    SELECT DISTINCT doc_id
    FROM query_term
    WHERE path LIKE '/Media%@language'
    AND value = 'es'
);

Comment entered 2023-01-06 11:34:23 by Kline, Bob (NIH/NCI) [C]

I'm inclined to think I should abandon any notion that I could use the @language attribute as a means of reliably determining the language of a Media document, and instead rely on the presence (Spanish) or absence (English) of the TranslationOf element. I'm going to need that element anyway, in order to pull together the English/Spanish document pairs which have to appear on the same row of the report. Agreed?
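
The proposal above (no TranslationOf element means English, a TranslationOf element means Spanish, and the same element is used to put each English/Spanish pair on one report row) can be sketched as follows. The function name, the input mapping, and the document IDs are hypothetical, and the sketch assumes at most one Spanish translation per English document.

```python
def pair_by_translation(translation_of):
    """Group Media documents into (English ID, Spanish ID) report rows.

    translation_of maps each document ID to the ID named by its
    TranslationOf element, or to None when the element is absent.
    """
    # Index the Spanish documents by the English document they translate.
    spanish_for = {}
    for doc_id, english_id in translation_of.items():
        if english_id is not None:
            spanish_for[english_id] = doc_id

    rows = []
    for doc_id in sorted(translation_of):
        if translation_of[doc_id] is None:  # no TranslationOf: English
            rows.append((doc_id, spanish_for.get(doc_id)))

    # Spanish documents whose English original is not in the input set.
    for english_id, spanish_id in sorted(spanish_for.items()):
        if english_id not in translation_of:
            rows.append((None, spanish_id))
    return rows


# Made-up IDs: 102 translates 101, 103 has no translation, and 104
# translates a document (999) which is not part of the selection.
SAMPLE = {101: None, 102: 101, 103: None, 104: 999}
ROWS = pair_by_translation(SAMPLE)
```

A row with `None` on either side marks a document whose counterpart is missing from the selection, which the report could render as an empty cell.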

Comment entered 2023-01-09 09:49:29 by Osei-Poku, William (NIH/NCI) [C]

I think you're retrieving all media documents instead of just images. I believe in the past we have used the Category element to limit it to one of the Media types.

Comment entered 2023-01-09 09:57:51 by Osei-Poku, William (NIH/NCI) [C]

In our last discussion about this ticket, we said that it would be OK to just get the barest minimum we can get from this report. For example, we can completely ignore the MS Excel option. We discussed including a link to the QC report to view the images, although having the images readily shown would be a good feature to have. Would you be able to let us know which minimum solution you can implement with less LOE than the current estimate?

Comment entered 2023-01-09 10:24:17 by Osei-Poku, William (NIH/NCI) [C]

This is not intentional. We don't want two places where a user would choose the category. Specifying the category in one place should be OK. Thanks!

Comment entered 2023-01-09 10:34:50 by Kline, Bob (NIH/NCI) [C]

I don't understand how your comment about categories relates to my question about languages.

Comment entered 2023-01-09 12:09:51 by Osei-Poku, William (NIH/NCI) [C]

I understood your original question to be about the TranslationOf element not being able to reliably identify the language of a media document because your query was retrieving thousands of rows in the query_term table with different language attributes. And my response is that you're seeing all those rows because the query is looking at the whole universe of media terms in the CDR (Pronunciation, Meeting Recording etc). I think if you limit the query to only Images, you will be able to see that you can identify the language of the Media Image doc by the TranslationOf element. Please let me know if this does not answer your question.
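
The suggestion to restrict the check to image documents could be tried along the following lines. This is a sketch under several assumptions: an in-memory SQLite table stands in for the CDR's query_term table, the caption path and the toy rows are invented, and `CATEGORY_PATH` is a hypothetical path standing in for whatever the Category element's indexed path really is. It treats "has a value at the category path" as the marker for an image, following the comment about having used the Category element for that purpose.

```python
import sqlite3

# Hypothetical paths; the real indexed paths in the CDR may differ.
CATEGORY_PATH = "/Media/MediaContent/Categories/Category"
TRANSLATION_PATH = "/Media/TranslationOf/@cdr:ref"
CAPTION_LANGUAGE = "/Media/MediaContent/Captions/MediaCaption/@language"

# Same mismatch test as the original query, limited to categorized documents.
MISMATCH_QUERY = """
SELECT DISTINCT l.doc_id
  FROM query_term l
  JOIN query_term c
    ON c.doc_id = l.doc_id
   AND c.path = ?
  LEFT OUTER JOIN query_term t
    ON t.doc_id = l.doc_id
   AND t.path = ?
 WHERE l.path LIKE '/Media%@language'
   AND ((t.path IS NOT NULL AND l.value = 'en')
     OR (t.path IS NULL AND l.value <> 'en'))
 ORDER BY l.doc_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO query_term VALUES (?, ?, ?)",
    [
        (1, CAPTION_LANGUAGE, "en"),   # English image, consistent
        (1, CATEGORY_PATH, "anatomy"),
        (2, CAPTION_LANGUAGE, "es"),   # Spanish image, consistent
        (2, TRANSLATION_PATH, "CDR0000000001"),
        (2, CATEGORY_PATH, "anatomy"),
        (3, CAPTION_LANGUAGE, "es"),   # "es" but no TranslationOf
        (3, CATEGORY_PATH, "anatomy"),
        (4, CAPTION_LANGUAGE, "es"),   # same mismatch, but no category
    ],
)
rows = conn.execute(MISMATCH_QUERY, (CATEGORY_PATH, TRANSLATION_PATH))
image_mismatches = [doc_id for (doc_id,) in rows]
```

If the count of mismatches drops to nothing once the restriction is in place, that would support the claim that the inconsistent rows come from non-image Media documents; if mismatches remain, the @language attribute is unreliable for images as well.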

Comment entered 2023-01-09 12:57:04 by Kline, Bob (NIH/NCI) [C]

I think if you go back and read my comments in this thread, you'll see that I was actually questioning the reliability of the @language attribute, which is where the original requirements tell me to look first as my source for determining the language, and that I was suggesting that we should use the TranslationOf element instead. Which is what you're recommending, right? So we agree on the solution, but you just didn't realize it. 🙂 (Perhaps JIRA was not showing you my 2023-01-06 11:34 comment.)

It would seem, in light of what you're telling me, that New Image Demographic Information Report would be a more suitable title for this ticket. Do you agree?

Comment entered 2023-01-09 13:20:52 by Osei-Poku, William (NIH/NCI) [C]

OK. Got it, and yes use the TranslationOf element. The revised report name is fine. Thanks!

Comment entered 2023-01-09 13:48:27 by Kline, Bob (NIH/NCI) [C]

I was referring to the title of the ticket, not the title of the report, though we can change that, too. Ticket title modified.

Comment entered 2023-01-23 09:17:53 by Kline, Bob (NIH/NCI) [C]

Installed on CDR DEV.

Comment entered 2023-03-07 09:55:19 by Kline, Bob (NIH/NCI) [C]
Comment entered 2023-04-25 21:36:49 by Osei-Poku, William (NIH/NCI) [C]

There seem to be a few problems with this report.

1. Some results include images that do not have a Demographic Information Block yet, when you select 

  • Images

  • By Image Category and check "anatomy"

  • Any Audience

  • Any Language

  • Rest of criteria should all remain default

The results include doc 780508, which is a blocked document and does not contain any demographic information block. They also include 810405, which does not have any demographic information.

2. One minor thing I see: when you select Image Category as part of your selection criteria, run the report successfully, and then click the back button, Image Category is still correctly selected, but the options for selecting the specific image category are replaced by the title field even though the Selection Method is not "By Image Title". Please see the attached image.

3. Also, when selecting Age, Sex, Skin tone as part of your selection criteria, you get no results at all. Could it be that you need to re-index the documents before they will work?

4. Is it possible to have the option to exclude blocked documents? Or to exclude blocked documents altogether?

Comment entered 2023-04-26 09:30:19 by Kline, Bob (NIH/NCI) [C]

The new query term definitions were wiped out by the refresh. I have restored them and a reindex job is running. If, after that job completes (I will let you know when it has) you are still seeing combinations of criteria which do not produce the expected results, please provide explicit instructions for reproducing the problem. The requirements don't say anything about excluding blocked documents (or documents which meet the specified criteria but do not have any demographic information at all). If you want to change these requirements, put in a new ticket for a future release. The back button has never been a reliable way of returning to a dynamically populated browser form. If you want a button which redraws the form with a state which reflects the currently selected options, please add a ticket for implementing that enhancement in a subsequent release.

Comment entered 2023-04-26 10:44:17 by Kline, Bob (NIH/NCI) [C]

The re-index job has finished on QA.

Comment entered 2023-05-02 09:35:35 by Osei-Poku, William (NIH/NCI) [C]

This looks good on QA. I will create a ticket to enhance the report for Quinn.

Comment entered 2023-06-12 20:26:30 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!

Attachments
File Name Posted User
Image_Cate_Title.PNG 2023-04-25 21:36:17 Osei-Poku, William (NIH/NCI) [C]
Media Demographic Information Report_Updated.docx 2022-03-24 13:31:25 Osei-Poku, William (NIH/NCI) [C]
Sample Summary Report.xlsx 2022-03-24 13:32:05 Osei-Poku, William (NIH/NCI) [C]
Saple Media Report.xlsx 2022-03-24 13:31:25 Osei-Poku, William (NIH/NCI) [C]
