CDR Tickets

Issue Number 3773
Summary [Summaries] Word Count of HP Summaries
Created 2014-05-29 15:47:49
Issue Type Task
Submitted By Juthe, Robin (NIH/NCI) [E]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2014-05-29 16:48:11
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.127642
Description

As we discussed today, we would like to get an estimated word count of the text content (excluding references) for all of the English HP summaries.

As decided in our meeting, Bob will use the vendor output of the summaries and count the words in every text element except the reference sections. He will create an Excel spreadsheet listing the Summary Type, CDR ID, and Word Count for each summary. Thanks!

Comment entered 2014-05-29 15:48:12 by Juthe, Robin (NIH/NCI) [E]

Adding Margaret to this issue.

Comment entered 2014-05-29 16:02:58 by Kline, Bob (NIH/NCI) [C]

I will treat hyphenated compounds as multiple words (discussed with Margaret). Same thing for slashes?

Comment entered 2014-05-29 16:05:03 by Kline, Bob (NIH/NCI) [C]

Also, should I skip "words" consisting solely of digits?

Comment entered 2014-05-29 16:14:32 by Juthe, Robin (NIH/NCI) [E]

I just talked to Margaret and her answer is "yes" to both questions posed above.
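
For reference, the counting rules agreed on in this thread (hyphens and slashes split words, digit-only tokens are skipped, reference sections are excluded) could be sketched roughly as follows. This is only an illustration, not the script used for the report: the element name `ReferenceSection` and the sample XML are assumptions, and the real vendor output may be structured differently.

```python
import re
import string
import xml.etree.ElementTree as ET

# Hypothetical tag name for reference sections; the actual vendor
# output may use a different element name.
REFERENCE_TAG = "ReferenceSection"


def count_words(text):
    """Count words in a string of plain text.

    Whitespace, hyphens, and slashes all separate words, so
    "well-known" and "and/or" each count as two words. Tokens made
    up solely of digits (after trimming punctuation) are skipped.
    """
    count = 0
    for token in re.findall(r"[^\s/\-]+", text):
        token = token.strip(string.punctuation)
        if token and not token.isdigit():
            count += 1
    return count


def count_element_words(node):
    """Recursively count words under an element, skipping references."""
    if node.tag == REFERENCE_TAG:
        return 0
    total = count_words(node.text or "")
    for child in node:
        total += count_element_words(child)
        # Text following a child element belongs to the parent, so it
        # is counted even when the child itself is a reference section.
        total += count_words(child.tail or "")
    return total


def count_summary_words(xml_string):
    """Return the word count for one summary document."""
    return count_element_words(ET.fromstring(xml_string))
```

Because words are counted per text node, a word split across inline markup would be counted twice, which seems acceptable for an estimate.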

Comment entered 2014-05-29 16:47:36 by Kline, Bob (NIH/NCI) [C]

A little under a million words for the English HP summaries.

Comment entered 2014-05-29 16:48:11 by Kline, Bob (NIH/NCI) [C]

Report posted.

Comment entered 2014-05-29 16:55:45 by Beckwith, Margaret (NIH/NCI) [E]

Looks great, Bob! Thank you for doing this so quickly.

Comment entered 2014-05-29 18:03:43 by Englisch, Volker (NIH/NCI) [C]

"A little under a million words for the English HP summaries."

Bob, did your count include the rule that a picture is worth 1,000 words?

Attachments
File Name Posted User
ocecdr-3773.xlsx 2014-05-29 16:47:36
