Issue Number | 4469 |
---|---|
Summary | Character count in XMetal |
Created | 2018-05-09 15:16:32 |
Issue Type | New Feature |
Submitted By | Osei-Poku, William (NIH/NCI) [C] |
Assigned To | Kline, Bob (NIH/NCI) [C] |
Status | Closed |
Resolved | 2018-05-18 12:26:53 |
Resolution | Fixed |
Path | /home/bkline/backups/jira/ocecdr/issue.225724 |
We want a new feature in XMetaL that provides a word count, a character count, or both. If possible, the feature should work on any element that can contain free text. Where text is wrapped in markup, only the words or characters should be counted; that is, the tags should be ignored and only the text within them counted.
We discussed having the count apply to selected text. It could be used for a single element or more than one element. The count will include the number of characters within a given selection. We've decided not to add a word count at this time.
I implemented the least expensive approach, counting the non-markup
characters in the JavaScript macro directly. It was fast enough for the
easy cases (too fast to measure with a stopwatch for an 818-character
SummaryAbstract block), but started to get a little more painful for
more substantial blocks (a little over 13 seconds for a 2,962-character
SummarySection block), and then the times start to shoot up (over three
minutes for even a small 10,575-character liver cancer prevention
patient summary when the entire Summary element is selected). So even
if the expectation is that the common usage for this macro is for
smaller blocks, I'm going to implement this at a lower level. I'd rather
not have inadvertent invocations with large selections block the user
from working for that long.
Ah, much better. I implemented the count logic in C++ inside the DLL,
and it's many orders of magnitude faster than XMetaL's JavaScript
interpreter. It took 0.203 seconds to count the 642,789 non-markup
characters in the Summary block for CDR62855.
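For readers who want a feel for the counting rule, here is a minimal C++ sketch. It is not the code in the DLL, which is not shown in this ticket. The function name countTextChars is hypothetical, and the sketch assumes the selection is available as a serialized, well-formed XML string without comments, CDATA sections, or processing instructions. The element names in the usage note (Para, Emphasis) are illustrative only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the actual DLL code): counts the characters of a
// serialized XML fragment that lie outside of tags.
// Assumptions: the fragment is well formed, has no comments, CDATA sections,
// or processing instructions, and no '>' occurs inside an attribute value.
// Each entity or character reference (such as &amp;) counts as one character.
// Spaces are counted like any other character.
std::size_t countTextChars(const std::string& xml) {
    std::size_t count = 0;
    bool inTag = false;     // between '<' and the matching '>'
    bool inEntity = false;  // between '&' and the closing ';'
    for (unsigned char c : xml) {
        if (inTag) {
            if (c == '>') inTag = false;
        } else if (inEntity) {
            if (c == ';') inEntity = false;
        } else if (c == '<') {
            inTag = true;
        } else if (c == '&') {
            inEntity = true;
            ++count;  // the whole reference counts as a single character
        } else if ((c & 0xC0) != 0x80) {
            // Skip UTF-8 continuation bytes so that a multi-byte character
            // is counted once.
            ++count;
        }
    }
    return count;
}
```

With these assumptions, countTextChars("<Para>Liver <Emphasis>cancer</Emphasis> risk</Para>") returns 17: the tags are skipped and the spaces are counted.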
Installed on DEV.
... many orders of magnitude faster ...
By that I mean between four and five orders of magnitude faster. What took 183 seconds using JustSystems' JavaScript interpreter (counting the 10,575 non-markup characters in the liver cancer prevention patient summary) took 0.005 seconds with my C++ implementation, which makes it about 36,600 times faster. It's reassuring to know that both implementations consistently came up with the same answers.
I don't think I'll repeat the benchmark comparison using CDR62855. :-)
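The speedup figure quoted above can be checked directly from the two timings reported for the 10,575-character summary:

```latex
\[
\frac{183\ \text{s}}{0.005\ \text{s}} = 36{,}600,
\qquad
\log_{10} 36{,}600 \approx 4.56
\]
```

A base-10 logarithm of about 4.56 is what "between four and five orders of magnitude" refers to.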
It looks like spaces are counted by default. Is it possible to have the option to count or not count spaces?
That would be unusual. The standard approach to counting characters includes all characters.
I have been spoiled by MS Word 😃. MS Word reports counts both with and without spaces. However, if that cannot be implemented in XMetaL, that is fine. As long as users know that spaces are counted, that should be fine.
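For illustration only, here is how the two kinds of count compare. This is a hedged C++ sketch, not something implemented for this ticket. The struct CharCounts and the function countWithAndWithoutSpaces are hypothetical names, and the sketch assumes the markup has already been removed from the text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: both counts for the same text.
struct CharCounts {
    std::size_t withSpaces = 0;     // every character, spaces included
    std::size_t withoutSpaces = 0;  // whitespace characters excluded
};

// Hypothetical helper: counts the characters of markup-free text with and
// without whitespace. Only ASCII whitespace (space, tab, newline, and so on)
// is treated as space; a non-breaking space counts as an ordinary character.
CharCounts countWithAndWithoutSpaces(const std::string& text) {
    CharCounts counts;
    for (unsigned char c : text) {
        // Skip UTF-8 continuation bytes so that a multi-byte character
        // is counted once.
        if ((c & 0xC0) == 0x80) continue;
        ++counts.withSpaces;
        if (!std::isspace(c)) ++counts.withoutSpaces;
    }
    return counts;
}
```

For the text "liver cancer prevention" this gives 23 characters with spaces and 21 without.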
Verified on DEV.
Verified on QA. Thanks!
Verified on PROD. Thanks!