Transcendental functions are definitely not trivial to implement. The one that's been holding me back for the last week was arctanh() or tanh⁻¹(). The CORDIC algorithm can easily calculate it, but there's a catch: it only converges for arguments within a certain range.
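For illustration, here is a minimal sketch of hyperbolic CORDIC in vectoring mode, written in plain double-precision Python rather than our 2000-digit decimal arithmetic (the function name and structure are hypothetical, just to make the convergence limit visible): the greedy rotations can only accumulate an angle of about 1.118, so the method breaks down once the argument goes past roughly 0.8069.

```python
import math

def cordic_atanh(t, iterations=60):
    # Vectoring mode: start at (x, y) = (1, t) and rotate hyperbolically
    # until y is driven to 0; the accumulated angle z is then atanh(t).
    x, y, z = 1.0, t, 0.0
    i, k, count = 1, 4, 0                 # iterations 4, 13, 40, ... are repeated
    while count < iterations:
        repeats = 2 if i == k else 1
        if i == k:
            k = 3 * k + 1
        for _ in range(repeats):
            d = -1.0 if y > 0 else 1.0    # pick the rotation that shrinks |y|
            x, y = x + d * y * 2.0**-i, y + d * x * 2.0**-i
            z -= d * math.atanh(2.0**-i)  # accumulate the applied angle
            count += 1
        i += 1
    return z

print(cordic_atanh(0.5), math.atanh(0.5))   # agrees to about 15 digits
print(cordic_atanh(0.9), math.atanh(0.9))   # 0.9 is out of range: wrong result
```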

It turns out that arctanh(0.8) is right at the limit of convergence: the algorithm is only useful when the result lies roughly within -1.1 < x < 1.1, and therefore when the argument lies within -0.8 < tanh(x) < 0.8. The problem with tanh() is that it maps the entire real line into the range (-1, 1), and it quickly becomes asymptotic to 1 as x grows. For example:

tanh(1100) = 1 - 7.1e-956

This is 0.999999... (about 956 nines) and then some useful digits. Since this is well within the 2000-digit target of our system, it was imperative to use an algorithm capable of producing accurate results even in those extreme circumstances.
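A quick sanity check of that figure with Python's mpmath (not our decimal library, just an independent tool):

```python
from mpmath import mp, mpf, tanh

mp.dps = 1000                  # 1000 decimal digits, enough to resolve the gap
print(1 - tanh(mpf(1100)))     # about 7.1e-956, confirming the figure above
```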

After reading through several papers on the subject, I found that none of the solutions could be applied to our case (the only one that could have reduced the argument into the proper range of convergence simply didn't work, for unknown reasons; perhaps I misunderstood the paper). In the end I found a paper with an interesting technique to increase the range of convergence: adding more CORDIC iterations at the beginning, with negative exponents. The paper was for binary CORDIC, not decimal, so it wouldn't work directly, and it also required storing more constants for those negative exponents, and we have too many constants already.

Inspired by that idea, I tried (without much hope, I admit) repeating the first iteration of the algorithm (with the same exponent, to avoid storing any additional constants), and to my surprise the method converged, with the range increased to 0.89 (from 0.8). Without any mathematical proof, I repeated the same iteration multiple times and it still converged, with the range increasing with each repetition.
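The same trick can be reproduced on the binary double-precision sketch from earlier (again only an illustration, not the actual decimal code): prepend extra copies of the first iteration, reusing the atanh(1/2) constant that is already in the table. In binary each repetition extends the largest reachable result by atanh(1/2) ≈ 0.55, so the exact numbers differ from the 0.8-to-0.89 jump of the decimal version, but the behavior is the same.

```python
import math

def cordic_atanh_ext(t, extra=0, iterations=60):
    # Same vectoring-mode CORDIC as before, but with `extra` additional
    # copies of the first iteration (exponent 1) run up front.
    x, y, z = 1.0, t, 0.0
    for _ in range(extra):                # repeated first iteration, no new constants
        d = -1.0 if y > 0 else 1.0
        x, y = x + d * y * 0.5, y + d * x * 0.5
        z -= d * math.atanh(0.5)
    i, k, count = 1, 4, 0
    while count < iterations:
        repeats = 2 if i == k else 1
        if i == k:
            k = 3 * k + 1
        for _ in range(repeats):
            d = -1.0 if y > 0 else 1.0
            x, y = x + d * y * 2.0**-i, y + d * x * 2.0**-i
            z -= d * math.atanh(2.0**-i)
            count += 1
        i += 1
    return z

print(cordic_atanh_ext(0.95, extra=0), math.atanh(0.95))   # out of range: fails
print(cordic_atanh_ext(0.95, extra=2), math.atanh(0.95))   # converges again
```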

Each additional iteration gives approximately one extra digit (one of those nines in the 0.99999...), but after about 40 digits it started needing more iterations per digit. Again with zero mathematical rigor, I simply doubled the number of iterations per digit, and it worked! Perhaps with a proper mathematical proof I'd know exactly how many extra iterations are needed per digit and could make it faster, but I don't think that's needed in this case. The fact is that we now have an algorithm capable of producing arctanh accurate to 2000 digits throughout the range, and at worst it takes only 3 times the original number of iterations (hence triple the time to compute). Since this extra time is only added "as needed", based on how close the given argument is to 1.0, it will only affect those bad corner cases, and most importantly: it produces correct results every time.
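In the binary sketch the number of extra repetitions can even be estimated up front from how close the argument is to 1, since atanh(t) ≈ 0.5·ln(2/(1-t)) near 1. This is only a back-of-the-envelope sizing for the toy version above, not the per-digit rule used in the decimal code, which is purely empirical.

```python
import math

def extra_needed(t, base_range=1.1182, step=math.atanh(0.5)):
    # Convergence of the binary sketch needs atanh(|t|) <= base_range + extra*step.
    # Near 1, atanh(t) is roughly 0.5*ln(2/(1 - t)).
    need = 0.5 * math.log(2.0 / (1.0 - abs(t)))
    return max(0, math.ceil((need - base_range) / step))

print(extra_needed(0.95), extra_needed(0.999999), extra_needed(1 - 1e-15))
```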

Maybe somebody someday will prove why this method converges and why it gives good accuracy. Perhaps I should contact the authors of the original paper on this range-extending method to let them know of this finding and let them prove why it works. For now, the project must go on...