Issue Number | 5095 |
---|---|
Summary | [Media] New Image Demographic Information Report |
Created | 2022-01-13 12:58:55 |
Issue Type | New Feature |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2023-05-02 09:35:48 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.308809 |
This is a placeholder for a new report for the new block of elements being developed in OCECDR-4961. I will provide the requirements next week.
Media Demographic Information Report_Updated.docx
I have attached the requirements document (MS Word Doc) and the sample reports in two MS Excel spreadsheets (one for each flavor of the report) for your review. The sample reports are also in the requirements document but they are not formatted well in Word.
What does "include module documents in ... the selection criteria" mean?
I think we want to have the option to either include or exclude module documents in the results.
In this context does "module document" mean "a summary document which can be used as a module" or "a summary document which can ONLY be used as a module"?
(It would be a good idea to always make that distinction clear in requirements for new tickets.)
Why is image category showing up twice in the selection criteria?
"Ability to search by the Image Category" (see first page of the requirements document)
Repeated under "Additional Requirements" (page 2 of the requirements document)
~duganal and ~juther — just want to make sure you're both aware that this is an expensive item and that you're both satisfied that its implementation is justified.
Display Option (For images)
i. All Images (please display all image (if any) thumbnails with both English and Spanish demographic information and thumbnails)
ii. English Only (please display only images in the English docs without Spanish demographic information or image thumbnails)
iii. Spanish Only (please display only images in the Spanish docs without English demographic information or image thumbnails)
Could you please explain what "... in the [language] docs without [other language] demographic information or image thumbnails" would mean? The demographic information block doesn't have any language designation. Also, up until this point, I have assumed that when the requirements referred to "thumbnails" they were referring to scaled versions of the image generated on the fly by the report software, so I'm having a hard time imagining what is meant by "without ... image thumbnails." The only place where the word "thumbnail" occurs in the Media schema is as a member of the DerivationMethod valid value set. Since the software can generate a thumbnail for any given image, the only interpretation of this requirement would seem to be to exclude images which don't have "thumbnail" as the value of the method attribute of the optional FromMethod element, which seems far-fetched.
Please give very specific concrete examples of cases in which (a) images would be included, and (b) images would be excluded, covering all of the logical possibilities for which the software must make decisions to comply with this portion of the requirements.
~oseipokuw - just wanted to make sure you are getting notifications for the requirements questions posted for this ticket (they're blocking implementation).
Yes, I have been getting them. They are on my list to respond to them soon. Thanks for the heads up.
Also, just so you're aware, because the filtering requirements are so complex (for example, look for language and audience in these half-dozen places in the documents, with complex rules about what to do in case of conflicting values), don't be surprised if this report takes a long time to run.
Good. Probably a good idea to focus on the loose ends for the tickets we already have before adding new ones to the pile. 🙂
If you find it more appropriate to make it a batch report that is emailed to the user after completion, that is OK.
Could you please explain what "... in the [language] docs without [other language] demographic information or image thumbnails" would mean? The demographic information block doesn't have any language designation.
The media docs are language specific. That is, Spanish image docs don't share the same documents as the English ones. So, the demographic information block for the Spanish images will be in the Spanish media docs.
Also, up until this point, I have assumed that when the requirements referred to "thumbnails" they were referring to scaled versions of the image generated on the fly by the report software
Your assumptions are right here. We want to see a scaled down version of the images and not specific thumbnails stored in the media docs. I am not sure if this answers all the questions in your comments above so, please let me know.
Please give very specific concrete examples of cases in which (a) images would be included, and (b) images would be excluded, covering all of the logical possibilities for which the software must make decisions to comply with this portion of the requirements.
Sure. Will give it a try. Thanks!
They are talking about the same thing but the one under "Additional Requirements" provided more information. Please ignore one of them.
Thanks. Be sure to include examples which illustrate "without ... image thumbnail" cases.
Could you please explain what "... in the [language] docs without [other language] demographic information or image thumbnails" would mean? The demographic information block doesn't have any language designation.
All Images (please display all image (if any) thumbnails with both English and Spanish demographic information and thumbnails)
English Only (please display only images in the English docs without Spanish demographic information or image thumbnails)
Spanish Only (please display only images in the Spanish docs without English demographic information or image thumbnails)
The above quote is from the requirements document.
This report will be used by both the Spanish and English teams. So, depending on who is running the report and for what purpose, they may want to see either the English versions of the images (with the corresponding demographic information), or the Spanish versions of the Images (with corresponding demographic information), or Both the English and Spanish versions of the Images together (with corresponding demographic information). In that light we expect that there will be 3 display options for images, probably with radio buttons so that the user will select one of them.
All Images
English Only
Spanish Only
Note, you could skip over the information I put in parenthesis to make things clearer, but I put them in there to communicate that we want to also see the demographic information and not just the scaled down versions of the images.
If the user selects All Images, then "display scaled down versions of the images from both English and Spanish docs that have been retrieved as part of the selection criteria, (including the display of corresponding demographic information)". All in this case means English and Spanish Images (and demographic information). A more appropriate phrase would be "Both English and Spanish Images" from the retrieved documents.
If the user selects English Only then "display scaled down versions of images in the English docs (and not the images in the Spanish docs) for the retrieved documents, including the corresponding demographic information"
If the user selects Spanish Only then "display only images in the Spanish docs (and not images in the English docs) for the retrieved documents, including the corresponding demographic information"
Please let me know if this helps to clarify the requirements.
AHA! Now I think I know what you meant for this part of the requirements. You understand that "please display only images in the English docs without Spanish demographic information or image thumbnails" sounded like you wanted me to find images in documents which do not have Spanish demographic information or image thumbnails, right?
I think part of the confusion stems from the fact that in your mind (I'm basing this on comments you have made in recent conversations) each image Media document is specific to only one language: it's either an English document, or a Spanish document. But this understanding is not consistent with the structure of the Media documents, in which the document-wide MediaLanguage element is optional, and there are lots of places throughout the schema which support (and in most cases, require) a separate language designation for the different parts of the document, which logically implies that it is expected that more than one language can be represented in the various parts of a single Media document. Does what I'm saying make sense?
To put it in simpler terms: if it's really true that each Media document is specific to exactly one language, then we should
make the document-wide MediaLanguage element required; and
remove all the other language elements from the Media schema
That is right. In relatively rare cases some HP images that come from cancer journals do require copyright permissions and don't get translated. In such cases the English and Spanish share the same documents. So, yes, it is possible for more than one language to share the same media document.
Now that we've cleared up the wording in the last portion of the requirements document, can you explain how the "filtering by language" and "display options by language" parts of the requirements are supposed to interact? Please provide examples showing how (for example) selecting "Spanish Only" for the Display Options portion of the form would produce different report output than selecting "Spanish" in the Filter by Language portion of the form. I need to know exactly what you want the software to do.
Sure. It will be simpler to have only one language option in this report, in which case, if you select a language, only images in that language's documents are displayed. If you select both languages, then images in both languages' documents are displayed by default.
I think we wanted to be granular with the images display options, but it is not necessary in that no one would want to run the report for both languages but will only want to see images for just one of the languages.
So, we can remove the Display Options completely from the requirements document and just add a requirement that, depending on the language selected, display the images in that language and when both languages are selected, display images in both languages’ documents. I will update the requirements document accordingly.
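The simplified rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `MediaDoc` record, its `language` field, and the `select_images` function are hypothetical names rather than anything in the CDR code, the document IDs are made up, and the sketch assumes the language of each document has already been determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaDoc:
    """Hypothetical stand-in for an image Media document."""
    doc_id: int
    language: str  # assumed to be "en" or "es"


def select_images(docs, languages):
    """Return the documents whose language is among the selected ones.

    Selecting both languages returns images from both sets of
    documents, so no separate display option is needed.
    """
    wanted = set(languages)
    return [doc for doc in docs if doc.language in wanted]


# Made-up documents, for illustration only.
docs = [MediaDoc(1, "en"), MediaDoc(2, "es"), MediaDoc(3, "en")]
english_only = select_images(docs, ["en"])
both = select_images(docs, ["en", "es"])
```

With a single language control, the choice made on the form fully determines which images are shown, which is the point of dropping the separate Display Options.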
Similar question about image categories. Why do we want two places where we ask the user which image categories to select for the report? What should the software do if the user says she wants to "search by" image categories A and B, and then further down indicates that she wants to "filter by" categories C and D? Surely we don't want this much convoluted complication in the report. 😛
Just to be as transparent and explicit as possible: I don't see any response recorded in the ticket to my comment asking whether ~juther and ~duganal are OK with the LOE required for this ticket (though who knows what JIRA is hiding from me), so I'm proceeding with implementation of the request (when I get a response from ~oseipokuw to my latest questions). If that's wrong, please stop me now. 🙂
Manual task for this ticket: add the following paths to the query_term_def table (and reindex Media docs):
/Media/DemographicInformation/Age
/Media/DemographicInformation/Sex
/Media/DemographicInformation/Race
/Media/DemographicInformation/SkinTone
/Media/DemographicInformation/Ethnicity
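A repeatable way of applying this manual task might look something like the sketch below. It is an assumption-laden illustration, not the procedure actually used: it runs against an in-memory SQLite database with a `query_term_def` table reduced to a single `path` column (the real table may have other columns), the `add_query_term_defs` helper is hypothetical, and the reindexing step is not shown.

```python
import sqlite3

NEW_PATHS = [
    "/Media/DemographicInformation/Age",
    "/Media/DemographicInformation/Sex",
    "/Media/DemographicInformation/Race",
    "/Media/DemographicInformation/SkinTone",
    "/Media/DemographicInformation/Ethnicity",
]


def add_query_term_defs(conn, paths):
    """Insert any paths not already present; return the paths added."""
    added = []
    for path in paths:
        found = conn.execute(
            "SELECT 1 FROM query_term_def WHERE path = ?", (path,)
        ).fetchone()
        if found is None:
            conn.execute(
                "INSERT INTO query_term_def (path) VALUES (?)", (path,)
            )
            added.append(path)
    conn.commit()
    return added


# Stand-in database; the one-column table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_term_def (path TEXT PRIMARY KEY)")
first_run = add_query_term_defs(conn, NEW_PATHS)
second_run = add_query_term_defs(conn, NEW_PATHS)  # nothing left to add
```

Checking for each path before inserting makes the step safe to run again if the definitions ever have to be reapplied.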
What am I to make of the fact that I see lots (by "lots" I mean "thousands") of rows in the query_term table for Media documents which have "es" as the @language attribute value, but no TranslationOf element, as well as documents with "en" in the @language attribute AND a TranslationOf element? This is on the production server. Doesn't this undermine the logic I'm being asked to use for identifying the language of a Media document? To complicate matters further, I see that there are a bunch of Media documents (again, on the production server) which have both a @language attribute of "en" AND a @language attribute with "es" which doesn't inspire confidence in that attribute to reliably tell me whether I'm looking at an English or a Spanish Media document. 😛
I'd be delighted if you can show me that I've got a flaw in my SQL queries, and that the problems I think I'm seeing aren't real. But if there is such a flaw, I haven't been able to find it yet.
SELECT d.id AS "Media ID",
t.path AS "Translation Path",
l.path AS "Language Path",
l.value AS "Language"
FROM document d
JOIN query_term l ON l.doc_id = d.id
LEFT OUTER JOIN query_term t ON t.doc_id = d.id AND t.path = '/Media/TranslationOf/@cdr:ref'
WHERE l.path LIKE '/Media%@language'
AND ((t.path IS NOT NULL AND l.value = 'en') OR (t.path IS NULL AND l.value <> 'en'))
ORDER BY d.id
SELECT DISTINCT doc_id
FROM query_term
WHERE path LIKE '/Media%@language'
AND value = 'en'
AND doc_id IN (
SELECT DISTINCT doc_id
FROM query_term
WHERE path LIKE '/Media%@language'
AND value = 'es'
)
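To make it easier to check the logic of these two queries, here is a small sketch which runs equivalent SQL against toy data in an in-memory SQLite database. Everything about the data is an assumption made for illustration: the reduced `document` and `query_term` table layouts, the `/Media/Example/...` and `/Media/Other/...` paths, the document IDs, and the `cdr:ref` values are all invented, and SQLite stands in for the real database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY);
CREATE TABLE query_term (doc_id INTEGER, path TEXT, value TEXT);
""")
conn.executemany("INSERT INTO document (id) VALUES (?)",
                 [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO query_term VALUES (?, ?, ?)", [
    (1, "/Media/Example/@language", "en"),            # en, no translation
    (2, "/Media/Example/@language", "es"),            # es, is a translation
    (2, "/Media/TranslationOf/@cdr:ref", "CDR0000000001"),
    (3, "/Media/Example/@language", "es"),            # es, no translation
    (4, "/Media/Example/@language", "en"),            # both languages
    (4, "/Media/Other/@language", "es"),
    (4, "/Media/TranslationOf/@cdr:ref", "CDR0000000001"),
])

# Language rows which disagree with presence/absence of TranslationOf.
MISMATCH_SQL = """
SELECT d.id, t.path, l.path, l.value
  FROM document d
  JOIN query_term l ON l.doc_id = d.id
  LEFT OUTER JOIN query_term t
    ON t.doc_id = d.id AND t.path = '/Media/TranslationOf/@cdr:ref'
 WHERE l.path LIKE '/Media%@language'
   AND ((t.path IS NOT NULL AND l.value = 'en')
     OR (t.path IS NULL AND l.value <> 'en'))
 ORDER BY d.id
"""

# Documents which carry both an "en" and an "es" language attribute.
BOTH_SQL = """
SELECT DISTINCT doc_id
  FROM query_term
 WHERE path LIKE '/Media%@language'
   AND value = 'en'
   AND doc_id IN (SELECT DISTINCT doc_id
                    FROM query_term
                   WHERE path LIKE '/Media%@language'
                     AND value = 'es')
"""

mismatched_ids = [row[0] for row in conn.execute(MISMATCH_SQL)]
both_ids = [row[0] for row in conn.execute(BOTH_SQL)]
```

In this toy data, documents 3 and 4 are flagged by the first query and document 4 by the second, which is the behavior the queries are intended to have.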
I'm inclined to think I should abandon any notion that I could use the @language attribute as a means of reliably determining the language of a Media document, and instead rely on the presence (Spanish) or absence (English) of the TranslationOf element. I'm going to need that element anyway, in order to pull together the English/Spanish document pairs which have to appear on the same row of the report. Agreed?
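The approach proposed here (treat a document as Spanish when it has a TranslationOf element, as English when it does not, and use the same element to line up the pairs) could be sketched as follows. This is a minimal illustration, not the report's implementation: the `pair_by_translation` function and the input mapping are hypothetical, the document IDs are made up, and the sketch assumes at most one Spanish translation per English document.

```python
def pair_by_translation(translation_of):
    """Build (english_id, spanish_id) report rows.

    translation_of maps each Media document ID to the ID of the English
    document it translates, or to None when the document has no
    TranslationOf element (and is therefore treated as English).
    """
    # Index the Spanish documents by the English document they translate.
    spanish_for = {}
    for doc_id, english_id in translation_of.items():
        if english_id is not None:
            spanish_for[english_id] = doc_id

    rows = []
    # One row per English document, with its translation if there is one.
    for doc_id in sorted(translation_of):
        if translation_of[doc_id] is None:
            rows.append((doc_id, spanish_for.get(doc_id)))
    # Spanish documents whose English original is not in the selected set.
    for doc_id in sorted(translation_of):
        english_id = translation_of[doc_id]
        if english_id is not None and english_id not in translation_of:
            rows.append((None, doc_id))
    return rows


# Made-up IDs: 101 translates 100; 102 is untranslated English;
# 103 translates a document (999) which was not selected.
rows = pair_by_translation({100: None, 101: 100, 102: None, 103: 999})
```

How a Spanish document should be shown when its English original falls outside the selection is a design decision; the sketch simply gives it a row of its own.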
I think you're retrieving all media documents instead of just images. I believe in the past we have used the Category element to limit it to one of the Media types.
In our last discussion about this ticket, we said that it would be OK to just get the barest minimum we can get from this report. For example, we can completely ignore the MS Excel option. We also discussed including a link to the QC report to view the images, although having the images readily show would be a good feature to have. Would you be able to let us know which minimum solution you can implement with less LOE than currently estimated?
This is not intentional. We don't want two places where a user would choose the category. Specifying the category in one place should be OK. Thanks!
I don't understand how your comment about categories relates to my question about languages.
I understood your original question to be about the TranslationOf element not being able to reliably identify the language of a media document because your query was retrieving thousands of rows in the query_term table with different language attributes. And my response is that you're seeing all those rows because the query is looking at the whole universe of media terms in the CDR (Pronunciation, Meeting Recording etc). I think if you limit the query to only Images, you will be able to see that you can identify the language of the Media Image doc by the TranslationOf element. Please let me know if this does not answer your question.
I think if you go back and read my comments in this thread, you'll see that I was actually questioning the reliability of the @language attribute, which is where the original requirements tell me to look first as my source for determining the language, and that I was suggesting that we should use the TranslationOf element instead. Which is what you're recommending, right? So we agree on the solution, but you just didn't realize it. 🙂 (Perhaps JIRA was not showing you my 2023-01-06 11:34 comment.)
It would seem, in light of what you're telling me, that New Image Demographic Information Report would be a more suitable title for this ticket. Do you agree?
OK. Got it, and yes use the TranslationOf element. The revised report name is fine. Thanks!
I was referring to the title of the ticket, not the title of the report, though we can change that, too. Ticket title modified.
Installed on CDR DEV.
There seem to be a few problems with this report.
1. Some results include images that do not have a Demographic Information Block yet, when you select
Images
By Image Category and check "anatomy"
Any Audience
Any Language
Rest of criteria should all remain default
The results include doc 780508, which is a blocked document and does not contain any demographic information block. They also include 810405, which does not have any demographic information.
2. One minor thing I see: when you select Image Category as part of your selection criteria, run the report successfully, and then click the back button, Image Category is still correctly selected, but the options for selecting the specific image category are replaced by the title field even though the Selection Method is not "By Image Title". Please see the attached image.
3. Also, when selecting Age, Sex, Skin tone as part of your selection criteria, you get no results at all. Could it be that you need to re-index the documents before they will work?
4. Is it possible to have the option to exclude blocked documents? Or to exclude blocked documents altogether?
The new query term definitions were wiped out by the refresh. I have restored them and a reindex job is running. If, after that job completes (I will let you know when it has), you are still seeing combinations of criteria which do not produce the expected results, please provide explicit instructions for reproducing the problem. The requirements don't say anything about excluding blocked documents (or documents which meet the specified criteria but do not have any demographic information at all). If you want to change these requirements, put in a new ticket for a future release. The back button has never been a reliable way of returning to a dynamically populated browser form. If you want a button which redraws the form with a state which reflects the currently selected options, please add a ticket for implementing that enhancement in a subsequent release.
The re-index job has finished on QA.
This looks good on QA. I will create a ticket to enhance the report for Quinn.
Verified on PROD. Thanks!
File Name | Posted | User |
---|---|---|
Image_Cate_Title.PNG | 2023-04-25 21:36:17 | Osei-Poku, William (NIH/NCI) [C] |
Media Demographic Information Report_Updated.docx | 2022-03-24 13:31:25 | Osei-Poku, William (NIH/NCI) [C] |
Sample Summary Report.xlsx | 2022-03-24 13:32:05 | Osei-Poku, William (NIH/NCI) [C] |
Saple Media Report.xlsx | 2022-03-24 13:31:25 | Osei-Poku, William (NIH/NCI) [C] |