CDR Tickets

Issue Number 4469
Summary Character count in XMetaL
Created 2018-05-09 15:16:32
Issue Type New Feature
Submitted By Osei-Poku, William (NIH/NCI) [C]
Assigned To Kline, Bob (NIH/NCI) [C]
Status Closed
Resolved 2018-05-18 12:26:53
Resolution Fixed
Path /home/bkline/backups/jira/ocecdr/issue.225724
Description

We want a new feature in XMetaL that provides a word count, a character count, or both. If possible, the feature should work on any element that can contain free text. When the text contains markup, only the words or characters should be counted; that is, the tags should be ignored and only the text inside them counted.

Comment entered 2018-05-10 14:31:05 by Juthe, Robin (NIH/NCI) [E]

We discussed having the count apply to selected text. It could be used on a single element or on more than one element. The count will report the number of characters within a given selection. We've decided not to add a word count at this time.

Comment entered 2018-05-18 11:53:41 by Kline, Bob (NIH/NCI) [C]

I first implemented the approach that was least expensive to build, counting the non-markup characters directly in the JavaScript macro. It was fast enough for the easy cases (too fast to measure with a stopwatch for an 818-character SummaryAbstract block), but it became more painful for more substantial blocks (a little over 13 seconds for a 2,962-character SummarySection block), and then the times started to shoot up (over three minutes for even a small 10,575-character liver cancer prevention patient summary when the entire Summary element is selected). So even if the expectation is that this macro will mostly be used on smaller blocks, I'm going to implement this at a lower level. I'd rather not have an inadvertent invocation on a large selection block the user from working for that long.

Comment entered 2018-05-18 12:26:09 by Kline, Bob (NIH/NCI) [C]

Ah, much better. I implemented the count logic in C++ inside the DLL, and it's many orders of magnitude faster than XMetaL's JavaScript interpreter. It took 0.203 seconds to count the 642,789 non-markup characters in the Summary block for CDR62855. Installed on DEV.

https://github.com/NCIOCPL/cdr-client/commit/c65bf52
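
For illustration only, a minimal C++ sketch of counting non-markup characters by walking a document tree. The `Node` struct and all function names here are hypothetical; they are not taken from the commit above or from the XMetaL API. The sketch also assumes UTF-8 text and counts Unicode code points, spaces included.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; not the XMetaL or cdr-client API.
struct Node {
    bool is_text = false;        // true for a text node, false for an element
    std::string text;            // UTF-8 content (used by text nodes only)
    std::vector<Node> children;  // child nodes (used by elements only)
};

// Helper: build a text node.
Node make_text(std::string s) {
    Node n;
    n.is_text = true;
    n.text = std::move(s);
    return n;
}

// Helper: build an element node with the given children.
Node make_element(std::vector<Node> kids) {
    Node n;
    n.children = std::move(kids);
    return n;
}

// Count Unicode code points in a UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Sum the characters of every text node under `node`.
// Element tags themselves contribute nothing to the total.
std::size_t count_text_chars(const Node& node) {
    if (node.is_text) return count_code_points(node.text);
    std::size_t total = 0;
    for (const Node& child : node.children) {
        total += count_text_chars(child);
    }
    return total;
}
```

For an element holding the text "Liver ", an inline element wrapping "cancer", and the text " risk", `count_text_chars` returns 17: the tags contribute nothing and the two spaces are counted.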

Comment entered 2018-05-18 12:49:38 by Kline, Bob (NIH/NCI) [C]

"... many orders of magnitude faster ..."

By that I mean between four and five orders of magnitude faster. What took 183 seconds using JustSystems' JavaScript interpreter (counting the 10,575 non-markup characters in the liver cancer prevention patient summary) took 0.005 seconds with my C++ implementation, which makes it about 36,600 times faster. It is reassuring to know that both implementations consistently came up with the same answers.

I don't think I'll repeat the benchmark comparison using CDR62855. :-)
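
The arithmetic behind those figures can be checked with a small C++ sketch. The function names are hypothetical; the timings are the ones quoted above.

```cpp
#include <cmath>

// Ratio of the slower elapsed time to the faster one.
double speedup(double slow_seconds, double fast_seconds) {
    return slow_seconds / fast_seconds;
}

// Base-10 logarithm of a ratio, i.e. how many orders of magnitude it spans.
double orders_of_magnitude(double ratio) {
    return std::log10(ratio);
}
```

With 183 and 0.005 seconds, the ratio is 36,600 and its base-10 logarithm is roughly 4.56, which is between four and five orders of magnitude.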

Comment entered 2018-06-14 10:10:10 by Osei-Poku, William (NIH/NCI) [C]

It looks like spaces are counted by default. Is it possible to have the option to count or not count spaces?

Comment entered 2018-06-14 10:15:00 by Kline, Bob (NIH/NCI) [C]

That would be unusual. The standard approach to counting characters includes all characters.

Comment entered 2018-06-14 10:22:04 by Osei-Poku, William (NIH/NCI) [C]

I have been spoiled by MS Word 😃, which reports counts both with and without spaces. However, if that cannot be implemented in XMetaL, that is fine, as long as users know that spaces are counted.

Comment entered 2018-07-09 21:43:35 by Osei-Poku, William (NIH/NCI) [C]

Verified on DEV.

Comment entered 2018-07-19 10:45:16 by Osei-Poku, William (NIH/NCI) [C]

Verified on QA. Thanks!

Comment entered 2018-09-05 09:31:22 by Osei-Poku, William (NIH/NCI) [C]

Verified on PROD. Thanks!
