Issue Number | 3773 |
---|---|
Summary | [Summaries] Word Count of HP Summaries |
Created | 2014-05-29 15:47:49 |
Issue Type | Task |
Submitted By | Juthe, Robin (NIH/NCI) [E] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2014-05-29 16:48:11 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.127642 |
As we discussed today, we would like to get an estimated word count of the text content (excluding references) for all of the English HP summaries.
As decided in our meeting, Bob will use the vendor output of the summaries and determine the word count including each of the text elements with the exception of the reference sections. He will create an Excel spreadsheet including the Summary Type, CDR ID, and Word Count for each summary. Thanks!
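For illustration only, here is a minimal sketch of how the text of one summary might be pulled out of the vendor XML while skipping the reference sections. The element name `ReferenceSection` and the helper names `visible_text` and `summary_text` are assumptions made for this example; they are not taken from the ticket and may not match the actual vendor schema or the script that was used.

```python
import xml.etree.ElementTree as ET

# Assumed element name for the sections to exclude; adjust to the real schema.
SKIP_TAGS = {"ReferenceSection"}


def visible_text(elem, skip_tags=SKIP_TAGS):
    """Return the text fragments under elem, skipping excluded subtrees."""
    if elem.tag in skip_tags:
        return []
    parts = [elem.text or ""]
    for child in elem:
        parts.extend(visible_text(child, skip_tags))
        # Text following a child belongs to the parent, even if the child was skipped.
        parts.append(child.tail or "")
    return parts


def summary_text(xml_string):
    """Parse one summary document and return its text with whitespace normalized."""
    root = ET.fromstring(xml_string)
    # Joining with a space keeps adjacent elements from fusing into one word,
    # at the cost of splitting a word that has inline markup in the middle.
    return " ".join(" ".join(visible_text(root)).split())
```

Joining fragments with a space is a deliberate trade-off for an estimate: it may overcount slightly where inline markup falls inside a word, but it avoids merging the last word of one element with the first word of the next.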
Adding Margaret to this issue.
I will treat hyphenated compounds as multiple words (discussed with Margaret). Same thing for slashes?
Also, should I skip "words" consisting solely of digits?
I just talked to Margaret and her answer is "yes" to both questions posed above.
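The counting rules agreed above (hyphens and slashes split words, and tokens made up only of digits are skipped) could be sketched roughly as follows. The function name `count_words` and the choice to strip surrounding punctuation before testing for digits are assumptions for this example, not a record of the code that produced the report.

```python
import re
import string


def count_words(text):
    """Count words, splitting on whitespace, hyphens, and slashes.

    Tokens consisting solely of digits are not counted.
    """
    # Hyphens and slashes act as separators, so "and/or" is two words.
    raw_tokens = re.split(r"[\s/-]+", text)
    # Assumption: drop surrounding punctuation so "2014." is treated as digits only.
    tokens = (t.strip(string.punctuation) for t in raw_tokens)
    return sum(1 for t in tokens if t and not t.isdigit())
```

Under these rules "non-small cell lung cancer" counts as five words, and "5-FU" counts as one because the "5" is dropped.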
A little under a million words for the English HP summaries.
Report posted.
Looks great, Bob! Thank you for doing this so quickly.
A little under a million words for the English HP summaries.
Bob, did you include in your count the rule that a picture is worth a thousand words?
File Name | Posted | User |
---|---|---|
ocecdr-3773.xlsx | 2014-05-29 16:47:36 |