The new decimal library is in place, and the new firmware now feels a lot more responsive when using trig functions: results appear instantly, where before there used to be a slight delay.

To measure the speed of the new library, a new benchmark was prepared that uses transcendental functions and compares the speed against the 50g doing the exact same calculation. The same program was executed on a 50g with stock firmware and on a 50g with newRPL. The program computes 100 points of a damped oscillatory function, A*e^(-w0*c*t) * SIN(sqrt(1-c^2)*w0*t). In both cases the parameters were A=10, c=0.3, w0=1.25. The points are calculated from t=0 to 10 seconds, with a step of 0.1. The resulting userRPL program was:

<< -> A c w0 << 0 10 FOR t A c w0 t * * NEG EXP * 1 c DUP * - [SQRT] w0 * t * SIN * 0.1 STEP >> >>

And it was executed by a wrapper that gave the parameters:

<< 10. 0.3 1.25 OSC >>
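
As a cross-check of the values the benchmark produces, the same function can be evaluated in ordinary double precision on a PC. This is only a reference sketch in Python, not part of the benchmark: the names damped_osc and sample are made up here, and it assumes the loop includes both endpoints, which yields 101 samples under that assumption.

```python
import math

def damped_osc(t, A=10.0, c=0.3, w0=1.25):
    """A*e^(-w0*c*t) * sin(sqrt(1-c^2)*w0*t), with the benchmark parameters as defaults."""
    return A * math.exp(-w0 * c * t) * math.sin(math.sqrt(1.0 - c * c) * w0 * t)

def sample(t_end=10.0, step=0.1):
    """Evaluate the function from t=0 to t_end, both ends included (an assumption)."""
    n = round(t_end / step)
    # Build t as i*step instead of accumulating, to limit binary rounding drift.
    return [(i * step, damped_osc(i * step)) for i in range(n + 1)]

if __name__ == "__main__":
    # Print the first few points for a quick visual comparison with the calculator.
    for t, y in sample()[:5]:
        print(f"{t:.1f}  {y:.12f}")
```

Doubles carry only about 16 significant digits, so this is useful for checking the 12-digit results, not the higher-precision runs.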

Here are the results of the benchmark:

TEST CASE                       TIME [sec]
Stock firmware (12 digits)      3.507
newRPL (12 digits)              0.794 / 0.369
newRPL (32 digits - DEFAULT)    1.041 / 0.613
newRPL (120 digits)             3.466 / 3.033
newRPL (200 digits)             6.847 / 6.423
newRPL (1000 digits)            97.846
newRPL (2000 digits)            335.348

 

Where two times are given for newRPL, the first one is the time "as the user perceives it", including the first 300 ms running in slow mode before switching to full speed. The second one is "raw throughput", measured once the CPU is running at full speed, and gives a good idea of the computing power.

This benchmark executes about 700 multiplications and 300 transcendental functions (100 SIN, 100 EXP and 100 SQRT) in that time. At comparable precision (12 digits), newRPL is about 9.5x faster than the stock firmware in raw throughput (0.369 s vs. 3.507 s), and about 4.4x faster as the user perceives it (0.794 s). At around 120 digits of precision, newRPL finishes the task in about the same time the stock firmware needs at 12 digits.

Unofficially, the same program was also executed on an older version of the firmware that still used the mpdecimal library. At the default precision of 36 digits, execution took about 8 seconds. That figure can't be reported in the table for two reasons: the ROM didn't have proper TICKS support, so it couldn't measure time accurately, and it produced incorrect results, which disqualified it from the comparison. Despite the bug, the number of operations was the same, so the times are still roughly comparable.

After quite some time spent researching algorithms and coding, the new arbitrary precision library is getting close to completion.

The new library uses radix 1e8 instead of 1e9 to store and process the long numbers. On one side this increases memory usage by a factor of 9/8 (this is a bad thing), but on the other side it allows for some speed improvements that are not possible with the 1e9 radix that was being used.

Basically, 1e9 fits only twice (and a fraction) into a signed 32-bit integer, which forces carry correction after every operation, while 1e8 fits more than 20 times. This makes it possible, for example, to perform several additions in a row without worrying about carry or overflow.
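
The idea of postponing the carry can be sketched in a few lines. This is a Python illustration only, with made-up helper names (to_limbs, add_no_carry, normalize), not the library's actual code; Python integers never overflow, so the 32-bit limit is simulated with an assertion.

```python
RADIX = 10**8              # one word holds 8 decimal digits
INT32_MAX = 2**31 - 1

# How many maximal words can be summed before a signed 32-bit word overflows.
MAX_DEFERRED = INT32_MAX // (RADIX - 1)    # 21 for 1e8 (it would be 2 for 1e9)

def to_limbs(n, count):
    """Split a non-negative integer into `count` radix-1e8 words, least significant first."""
    limbs = []
    for _ in range(count):
        n, r = divmod(n, RADIX)
        limbs.append(r)
    assert n == 0, "value does not fit in the requested number of words"
    return limbs

def from_limbs(limbs):
    """Rebuild the integer from normalized or un-normalized words."""
    value = 0
    for limb in reversed(limbs):
        value = value * RADIX + limb
    return value

def add_no_carry(acc, x):
    """Add x into acc word by word, leaving carries for later."""
    for i, limb in enumerate(x):
        acc[i] += limb
        assert acc[i] <= INT32_MAX, "a real 32-bit word would have overflowed"

def normalize(limbs):
    """Single carry-propagation pass; returns the carry out of the top word."""
    carry = 0
    for i in range(len(limbs)):
        carry, limbs[i] = divmod(limbs[i] + carry, RADIX)
    return carry

if __name__ == "__main__":
    worst = RADIX**3 - 1                   # three words, all 99999999
    acc = [0, 0, 0, 0]
    for _ in range(MAX_DEFERRED):
        add_no_carry(acc, to_limbs(worst, 3))
    normalize(acc)
    print(from_limbs(acc) == MAX_DEFERRED * worst)
```

With radix 1e9 the same loop would hit the overflow assertion on the third addition, which is why carries had to be corrected after every operation.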

The new library uses multiply-high methods to perform all divisions. This doesn't have any major impact on the PC because of its hardware division capability, but it is expected to have a major impact on ARM, where divisions were done in software. Also, some routines were optimized to process 3 long-digits (words each packing 8 decimal digits) at once, taking advantage of the triple pipeline on ARM.
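
For readers unfamiliar with the technique, here is a generic sketch of division by a constant through a multiplication and a shift. It is not the library's routine: the names magic_for, div_const and split_product are hypothetical, the 54-bit bound is chosen here because the product of two 8-digit words is below 1e16, and Python's big integers stand in for the widening multiply a CPU would provide.

```python
RADIX = 10**8

def magic_for(divisor, nbits):
    """Return (magic, shift) with (x * magic) >> shift == x // divisor for 0 <= x < 2**nbits."""
    l = (divisor - 1).bit_length()           # ceil(log2(divisor))
    shift = nbits + l
    magic = -(-(1 << shift) // divisor)      # ceil(2**shift / divisor)
    return magic, shift

def div_const(x, magic, shift):
    """Divide by the constant using only a multiply and a right shift."""
    return (x * magic) >> shift

# The product of two words is below 1e16, which fits in 54 bits.
MAGIC, SHIFT = magic_for(RADIX, 54)

def split_product(a, b):
    """Multiply two words and split the result into (high word, low word)."""
    p = a * b
    hi = div_const(p, MAGIC, SHIFT)
    lo = p - hi * RADIX                      # remainder without a second division
    return hi, lo

if __name__ == "__main__":
    print(split_product(99999999, 99999999))
    print(divmod(99999999 * 99999999, RADIX))
```

Rounding the magic number up and shifting by nbits plus the bit length of the divisor is the standard choice that keeps the result exact over the whole input range.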

Overall, there are no benchmarks done on ARM yet, but on the PC the new library computes a Taylor series sin() expansion 20% faster than mpdecimal working with the same number of digits. That's not bad considering it has to do 12.5% more multiplications, additions, etc. The speedup on ARM is expected to be even higher.

But a 20% speedup is not a good enough reason to justify a new library from scratch. The new 1e8 radix allows an optimization that should increase the performance of transcendental functions by a larger factor. The typical decimal CORDIC algorithm requires 20 additions per decimal digit. An optimized algorithm was used that requires only 8, but one of its downsides is that it requires multiplying a long number by a small constant (2 and 5). This can be accomplished by using more than one addition (going back to the 20 additions), and with the higher radix each case required slow carry propagation. Now it can be done in a single operation, with the exact same speed cost as an addition. Since additions need to align the digits of one of the parameters, there's an implicit shift operation in all additions, which is performed through multiply-high methods. The small constant was included in the magic numbers, so that secondary multiplication is done at virtually zero cost.
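
One way to picture a constant "included in the magic numbers" is shown below. This is an assumption about the general idea, not a description of the library's code: the function names are invented, and the sketch only demonstrates that a decimal shift by some digits and a multiplication by 2 or 5 can share one multiply and one shift.

```python
LIMB_BITS = 27    # a word below 1e8 fits in 27 bits

def scaled_shift_magic(const, digits, nbits=LIMB_BITS):
    """Return (magic, shift) with (x * magic) >> shift == (const * x) // 10**digits
    for 0 <= x < 2**nbits."""
    d = 10**digits
    shift = nbits + (d - 1).bit_length()
    magic = -(-(const << shift) // d)        # ceil(const * 2**shift / d)
    return magic, shift

def scale_and_split(x, const, digits):
    """Split const*x at a decimal boundary: returns (upper part, lower `digits` digits)."""
    magic, shift = scaled_shift_magic(const, digits)
    hi = (x * magic) >> shift                # shift and constant in one multiply
    lo = const * x - hi * 10**digits
    return hi, lo

if __name__ == "__main__":
    # 5 * 12345678 = 61728390, split three digits from the right.
    print(scale_and_split(12345678, 5, 3))
    print(divmod(5 * 12345678, 10**3))
```

In a real routine the magic numbers would be precomputed in a table indexed by shift amount and constant, so the multiplication by 2 or 5 adds no extra step to the aligned addition.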

 

The downside of all this is that the next demo is delayed until the new library completely replaces mpdecimal. But progress doesn't stop.

 

 

 

Yesterday, newRPL had its first day at work. I took a 50g loaded with the latest firmware (a very close candidate to become Alpha demo 5) and headed out to the office for some real-world number-crunching. I'm pleased with the overall feel of the keyboard and the quick stack operation. The first demo has a very limited UI: only the stack and most of the keys on the keyboard are functional. While there are no working soft-menus yet, all commands can be typed in Alpha mode if they are not available directly on the keyboard.

The OFF functionality is not implemented yet, so the calculator stayed ON for over four hours.

As expected, the fast mode almost never kicked in during simple number crunching, which helps conserve batteries as the calculator remains running at only 6 MHz.

The speed of transcendental functions was somewhat disappointing; more optimization is needed in that area.

Overall, using newRPL was just as pleasant as using the 50g, and more importantly, it got the job done without any crashes. I had to pull a battery to finally put it to rest at the end of the day, and that's a good sign. A few more tweaks here and there and the new demo will be ready to be released to the public.

 

The next demo is being delayed a few weeks in order to add Unicode support to newRPL. This was a necessary step to properly exchange information with other devices. All strings in newRPL, including variable names, will be UTF-8 encoded. This addition means several things for the project:

  • Users will be able to open and edit RPL source code with any UTF-8 compatible text editor (which nowadays means almost any of them), and pass it directly to newRPL without any conversion, as long as the proper Unicode characters are used. On the other hand, programs written on the calculator can be edited on a different device and even shared on the web with all symbols looking (and working) properly.
  • The user interface will no longer be limited to a character set with 256 symbols. More symbols will be available for use. For example, a variable named A₁ will be a valid name, where the 1 is the subscript number 1 (Unicode U+2081).
  • Due to screen and font limitations, not all Unicode symbols will be available, but fonts can add more symbols over time, as the project matures.
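
To make the encoding concrete, the short Python sketch below shows what a name like A₁ looks like once stored as UTF-8. It only illustrates the encoding itself; how newRPL lays out strings internally is not shown here, and the helper name describe is made up.

```python
import unicodedata

name = "A\u2081"                     # "A" followed by SUBSCRIPT ONE
encoded = name.encode("utf-8")       # the bytes a UTF-8 text editor would save

def describe(text):
    """List each character's code point, Unicode name and UTF-8 byte count."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))
            for ch in text]

if __name__ == "__main__":
    print(encoded.hex(" "))          # 41 e2 82 81
    for row in describe(name):
        print(row)
```

The name has two characters but occupies four bytes, so code that handles such strings has to distinguish character counts from byte counts.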

The only negative consequence is the delay in the release of the next demo.