After battling with transcendental functions, the next step was to create a small demo. It's a console application that provides an interactive RPL calculator. Simplistic by design, this demo is not intended to show how the final user interface will look or feel. It's a barebones application that displays a minimal stack and a prompt. Depending on how successful this demo is among users, perhaps a GUI version will be developed later. A pre-compiled version of this demo will be made available for the most popular operating systems, to encourage testing. This means the first alpha release is getting closer and closer.
Transcendental functions are definitely not trivial to implement. The one that's been holding me back for the last week was arctanh() or tanh-1(). The CORDIC algorithm can easily calculate it, but there's a catch: it only converges for arguments within a certain range.
Turns out that arctanh(0.8) is at the limit of convergence, but the nature of this function makes it only useful in the approximate range -1.1 < x < 1.1 (therefore -0.8 < tanh(x) < 0.8). The problem with tanh() is that it maps the entire real numbers space into the range -1,1, and becomes quickly asymptotic to 1 as x grows. For example:
tanh(1100) = 1 - 7.1e-956
This is 0.999999... (about 956 nines) and then some useful digits. Since this is well within the 2000 digits target of our system, it was imperative that we use an algorithm capable of producing accurate results even in those extreme circumstances.
After reading through several papers in the subject, none of the solutions could be applied to our case (the only one that could reduce the arguments into the proper range of convergence simply didn't work for unknown reasons, perhaps I misunderstood the paper). In the end I found a paper with an interesting technique to increase the range of convergence: adding more CORDIC iterations at the beginning, with negative exponents. The paper was for binary CORDIC, not decimal so it wouldn't work, and also it required storing more constants for those negative exponents, and we have too many constants already.
Inspired by that idea I tried (without much hope I admit) repeating the first iteration of the algorithm (with the same exponent, to avoid storing any additional constants), and to my surprise the method converged, and the range was increased to 0.89 (from 0.8). Without any mathematical proof, I repeated the same iteration multiple times and it still converged, with the range increased with each repetition.
One additional iteration would approximately give one extra digit (one of those nines in the .99999...), but after about 40 digits it started needing more iterations. Again, with zero mathematical rigor I simply doubled the number of iterations per digit, and it worked! Perhaps with proper mathematical proof I'd be able to know exactly how many extra iterations are needed per digit and make it faster, but I don't think it's needed in this case. Fact is that we have now an algorithm capable of producing 2000 digits accurate arctanh, throughout the range, and at most only takes 3 times the original number of iterations (hence triple the time it takes to compute). Since this extra time is only added "as-needed" based on how close to 1.0 is the given argument, it will only affect those bad corner cases, and most important: produces correct results every time.
Maybe somebody someday will prove why this method converged and why it gives good accuracy. Perhaps I should contact the authors of the original paper with this range extending method to let them know of this finding and let them prove why it works. For now the project must go on...
Many transcendental functions are already implemented with up to 2000 digits precision. Due to rounding errors, it works internally with 9 digits more than the requested precision. For most functions, only the last 2 or 3 digits are incorrect, and using 9 extra digits produces correctly rounded results for all the test cases.
Both normal and hyperbolic functions are implemented, though the final range reduction for some functions (ln, log, atanh) are still pending.
The total amount of space required for the lookup tables of precomputed constants is almost 1 MByte. This can be reduced by half if we are willing to sacrifice speed for some special cases. Specifically, the CORDIC method utilized requires additional tables in order to be able to handle very small numbers at the same speed as it does for large numbers. The additional 2 tables could only be compressed to about 250 kbytes each, almost doubling the space requirements.
The target hardware has 2 MBytes of flash, so at the moment ROM space is not considered critical, as the core code is relatively small.
Now regarding speed, on the PC the exponential function provided by the MPDecimal library takes about 1500 msec to compute exp(x) with 2007 digits precision. At the same precision, the new algorithm produces the same result in less than 500 msec, so we have achieved a speedup of about 3x. Also, the new algorithm does not require temporary memory storage, while the more generic function needs to allocate up to 32 kbytes of RAM just to compute exp() or ln(). Since RAM is very limited on our target, this is one of the main issues with the generic implementation.
The implementation that comes with the library only has exp() and ln(), but no trigonometric functions or hyperbolics. In order to implement a hyperbolic function by its exponential definition cosh(x)=1/2*(exp(x)+exp(-x)) for example, requires 2 exponentials at 1500 msec each, making it really slow at 3 seconds per operation (keep in mind these timings are measure on a PC, it will be orders of magnitude slower on the calculator). The new algorithm computes those functions directly, therefore it takes the same 500 msec to produce any of them.
The main advantage of the default implementation is that it can compute those transcendental functions without lookup tables and on unlimited arbitrary precision. The new algorithm requires tables, therefore even though the precision remains arbitrary, it is capped to a maximum being the precision used to compute the constants. Since our target will not be able to manage larger numbers due to both RAM limitations and processing speed, the 2000 digit limit seems reasonable.
It will take a couple more weeks to finalize all transcendental functions, but it's definitely on track. After that a small console demo will be developed and published that runs on the PC, it will be the initial source code release and we might release also binaries of the demo for easy testing.
NewRPL breezed through its first job.
It generated 3 tables with 1008 numbers each at 2016 digits precision. To compute each number, there was about 3000 iterations of a series expansion with real multiplications, divisions and powers. All being executed with 2016 digits precision. All this number crunching was done in only 5 seconds per table on a laptop (that's about 9 million floating point operations at 2016 digits in 5 seconds). Of course, all this was on a PC, not on the real hardware.
This job couldn't be done on a 50g due to the size of the tables. Each table takes 880 kbytes, all 3 are 2.4MBytes. How will the calculator handle these tables? A new compression algorithm was written from scratch. The algorithm compressed each table to a size between 84 and 88 kbytes, for an estimated compression factor of 10:1. Not bad, considering the algorithm needs to be able to extract each individual number from the data and decompress it on the fly, as fast as possible. This is unlike normal compression algorithms that need to decompress the entire "blob" to access data within.
The algorithm was inspired on LZ4 (but ended up quite different), and on the PC it was able to decompress all 1008 numbers on a table, 1000 times in less than 3 seconds. In terms of throughput, this is equivalent to about 900 MBytes/second.
No we are ready to implement the decimal CORDIC method.... but that's a different story.