Modular inverses using Newton iteration

Let $ R$ be a commutative ring with identity element. Given $ f \in R[x]$ and $ {\ell} \in {\mbox{${\mathbb{N}}$}}$ such that $ f(0) = 1$, compute a polynomial $ g \in R[x]$ such that

$\displaystyle f \, g \ {\equiv} \ 1 \mod{ \ x^{\ell} } \ \ {\rm and} \ \ {\deg} \, g < {\ell}.$ (8)

In Proposition 3 we observe that if there is a solution, then it is unique.

Proposition 3   If Equation (8) has a solution $ g \in R[x]$ with degree less than $ {\ell}$ then it is unique.


Proof. Indeed, let $ g_1$ and $ g_2$ be solutions of Equation (8). Then the product $ f \, (g_1 - g_2)$ is a multiple of $ x^{\ell}$ . In particular its constant term, which is $ f(0) \, (g_1(0) - g_2(0)) = g_1(0) - g_2(0)$ , must be 0 . Hence there is a constant $ c \in R$ and polynomials $ h_1, h_2$ with degree less than $ {\ell} - 1$ such that

$\displaystyle g_1(x) \ = \ h_1(x) \, x + c \ \ {\rm and} \ \ g_2(x) \ = \ h_2(x) \, x + c$ (9)

It follows that $ f \, (h_1 - h_2)$ is a multiple of $ x^{{\ell} -1}$ . Repeating the same argument shows that $ h_1(0) = h_2(0)$ , and by induction on $ {\ell}$ we obtain $ g_1 = g_2$ . $ \qedsymbol$


Remark 3   Since Equation (8) is an equation in $ R[x]/\langle x^{\ell} \rangle$ , a solution of this equation can be viewed as an approximation of a more general problem. Think of truncated Taylor expansions! So let us recall from numerical analysis the celebrated Newton iteration. Let $ {\phi}(g) = 0$ be an equation that we want to solve, where $ {\phi}: {\mbox{${\mathbb{R}}$}} \longrightarrow {\mbox{${\mathbb{R}}$}} $ is a differentiable function. From a suitable initial approximation $ g_0$ , the sequence defined by the Newton iteration step

$\displaystyle g_{i+1} \ = \ g_i - \frac{{\phi}(g_i)}{{\phi}'(g_{i})}$ (10)

computes successive approximations that converge toward a desired solution. In our case we have $ {\phi}(g) = 1/g - f$ and the Newton iteration step becomes

$\displaystyle g_{i+1} \ = \ g_i - \frac{ 1/g_i - f}{ - 1/{g_i}^2} \ = \ 2 g_i - f \, {g_i}^2.$ (11)
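
As a small aside, the iteration (11) already works numerically: it computes a reciprocal without performing any division, which is exactly the feature exploited below for power series. The following Python sketch is an illustration of ours, not part of the algorithmic development.

\begin{verbatim}
# Newton iteration (11) for phi(g) = 1/g - f: computes 1/f without any division.
def reciprocal(f, g0, steps=6):
    """Approximate 1/f starting from the initial guess g0."""
    g = g0
    for _ in range(steps):
        g = 2 * g - f * g * g   # Newton step: g <- 2 g - f g^2
    return g

# Example: 1/3 from the initial guess 0.3; the error 1 - f*g is squared at each
# step, so six steps are far more than enough here.
print(reciprocal(3.0, 0.3))     # 0.3333333333333333
\end{verbatim}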


Theorem 1   Let $ R$ be a commutative ring with identity element. Let $ f$ be a polynomial in $ R[x]$ such that $ f(0) = 1$ . Let $ g_0, g_1, g_2, \ldots$ be the sequence of polynomials defined for all $ i \geq 0$ by

$\displaystyle \left\{ \begin{array}{rcl} g_0 & = & 1 \\ g_{i+1} & \equiv & 2 g_i - f \, {g_i}^2 \ \mod{\ x^{2^{i+1}}} \end{array} \right.$ (12)

Then for $ i \geq 0$ we have

$\displaystyle f \, g_i \ \equiv \ 1 \ \mod{ \ x^{2^i}}$ (13)


Proof. By induction on $ i \geq 0$ . For $ i = 0$ we have $ x^{2^i} = x$ and thus

$\displaystyle f \, g_0 \ \equiv \ f(0) \, g_0 \ \equiv \ 1 \, \times \, 1 \ \equiv \ 1 \ \mod{ \ x}$ (14)

For the induction step we have

\begin{displaymath}\begin{array}{rcll} 1 - f \, g_{i+1} & \equiv & 1 - f \, (2 g_i - f \, {g_i}^2) & \mod{ \ x^{2^{i+1}}} \\ & \equiv & 1 - 2 f \, g_i + f^2 \, {g_i}^2 & \mod{ \ x^{2^{i+1}}} \\ & \equiv & (1 - f \, g_i)^2 & \mod{ \ x^{2^{i+1}}} \\ & \equiv & 0 & \mod{ \ x^{2^{i+1}}} \\ \end{array}\end{displaymath} (15)

Indeed $ f \, g_i \ \equiv \ 1 \ \mod{ \ x^{2^i}}$ means that $ x^{2^i}$ divides $ 1 - f \, g_i$ . Thus $ x^{2^{i+1}} = x^{2^i + 2^i} = x^{2^i} \, x^{2^i}$ divides $ (1 - f \, g_i)^2$ . $ \qedsymbol$
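
The key identity used above, $ 1 - f \, g_{i+1} \equiv (1 - f \, g_i)^2 \mod{\ x^{2^{i+1}}}$ , can be checked on a small example. The Python sketch below does so over $ {\mathbb{Z}}$ , representing polynomials as coefficient lists; the helpers mul_trunc and sub are ours, written only for this illustration.

\begin{verbatim}
# Check that 1 - f*g_{i+1} equals (1 - f*g_i)^2 modulo x^(2^(i+1)).
# Polynomials are coefficient lists over Z; index k holds the coefficient of x^k.

def mul_trunc(a, b, n):
    """Product of a and b truncated modulo x^n."""
    c = [0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def sub(a, b):
    """Difference a - b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

f = [1, 3, -2, 5]        # f(0) = 1, as required
g1 = [1, -3]             # inverse of f modulo x^2  (the case i = 1)
n = 4                    # 2^(i+1)

g2 = sub([2 * c for c in g1], mul_trunc(f, mul_trunc(g1, g1, n), n))  # g_{i+1}
lhs = sub([1], mul_trunc(f, g2, n))          # 1 - f*g_{i+1}  mod x^4
err = sub([1], mul_trunc(f, g1, n))          # 1 - f*g_i
rhs = mul_trunc(err, err, n)                 # (1 - f*g_i)^2  mod x^4
print(lhs == rhs, lhs == [0, 0, 0, 0])       # True True
\end{verbatim}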

Algorithm 2  

Input: $ f \in R[x]$ such that $ f(0) = 1$ and $ {\ell} \in {\mbox{${\mathbb{N}}$}}$ .

Output: $ g \in R[x]$ such that $ f \, g \ \equiv \ 1 \mod{ \ x^{\ell}}$ .

$ g_0 := 1$
$ r := \lceil \log_2({\ell}) \rceil$
for $ i = 1 \cdots r$ repeat
    $ g_i := \left( 2 g_{i-1} - f \, {g_{i-1}}^2 \right) \mod{ \ x^{2^i}}$
return $ g_r$
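
To make the procedure concrete, here is a straightforward Python transcription of Algorithm 2 over a prime field $ {\mathbb{Z}}/p{\mathbb{Z}}$ , using naive truncated multiplication. The names inverse_mod_power and mul_trunc_mod are ours, and this is only a sketch, not an optimized implementation (in particular it ignores Remarks 4 and 6 below).

\begin{verbatim}
from math import ceil, log2

def mul_trunc_mod(a, b, n, p):
    """Product of a and b truncated modulo x^n, coefficients in Z/pZ (naive)."""
    c = [0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] = (c[i + j] + ai * bj) % p
    return c

def inverse_mod_power(f, l, p):
    """Inverse of f modulo x^l over Z/pZ, assuming f[0] == 1 (Algorithm 2)."""
    assert f[0] % p == 1
    r = ceil(log2(l)) if l > 1 else 0
    g = [1]                                     # g_0
    for i in range(1, r + 1):
        n = 2 ** i
        fg2 = mul_trunc_mod(f, mul_trunc_mod(g, g, n, p), n, p)  # f*g_{i-1}^2 mod x^(2^i)
        g = [(2 * (g[k] if k < len(g) else 0) - fg2[k]) % p for k in range(n)]
    return g[:l]

# Usage: invert f = 1 + x + x^2 modulo x^8 over Z/101Z and check the product.
p, l = 101, 8
f = [1, 1, 1]
g = inverse_mod_power(f, l, p)
print(mul_trunc_mod(f, g, l, p))                # [1, 0, 0, 0, 0, 0, 0, 0]
\end{verbatim}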


Definition 2   A multiplication time is a function $ {\ensuremath{\mathsf{M}}}: {\mbox{${\mathbb{N}}$}} \longrightarrow {\mbox{${\mathbb{R}}$}}$ such that for any commutative ring $ R$ with a $ 1$ , for every $ n \in {\mbox{${\mathbb{N}}$}}$ , any pair of polynomials in $ R[x]$ of degree less than $ n$ can be multiplied in at most $ {\ensuremath{\mathsf{M}}}(n)$ operations of $ R$ . In addition, $ {\ensuremath{\mathsf{M}}}$ must satisfy $ {\ensuremath{\mathsf{M}}}(n) / n \geq {\ensuremath{\mathsf{M}}}(m) / m$ , for every $ m,n \in {\mbox{${\mathbb{N}}$}}$ , with $ n \geq m$ . This implies the superlinearity properties, that is, for every $ m,n \in {\mbox{${\mathbb{N}}$}}$

$\displaystyle {\ensuremath{\mathsf{M}}}(n m) \ \geq \ m \, {\ensuremath{\mathsf{M}}}(n), \ \ \ {\ensuremath{\mathsf{M}}}(n + m) \ \geq \ {\ensuremath{\mathsf{M}}}(m) + {\ensuremath{\mathsf{M}}}(n) \ \ {\rm and} \ \ {\ensuremath{\mathsf{M}}}(n) \geq n.$ (16)
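
As a short verification (ours), the first two properties follow directly from the assumption $ {\ensuremath{\mathsf{M}}}(n)/n \geq {\ensuremath{\mathsf{M}}}(m)/m$ for $ n \geq m$ : on the one hand $ {\ensuremath{\mathsf{M}}}(n m) = n m \, {\ensuremath{\mathsf{M}}}(n m)/(n m) \geq n m \, {\ensuremath{\mathsf{M}}}(n)/n = m \, {\ensuremath{\mathsf{M}}}(n)$ , and on the other hand

$\displaystyle {\ensuremath{\mathsf{M}}}(n + m) \ = \ n \, \frac{{\ensuremath{\mathsf{M}}}(n+m)}{n+m} + m \, \frac{{\ensuremath{\mathsf{M}}}(n+m)}{n+m} \ \geq \ n \, \frac{{\ensuremath{\mathsf{M}}}(n)}{n} + m \, \frac{{\ensuremath{\mathsf{M}}}(m)}{m} \ = \ {\ensuremath{\mathsf{M}}}(n) + {\ensuremath{\mathsf{M}}}(m).$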

Example 1   Examples of multiplication times are: $ {\ensuremath{\mathsf{M}}}(n) = 2 n^2$ for classical multiplication, $ {\ensuremath{\mathsf{M}}}(n) \in O(n^{\log_2 3})$ for Karatsuba's algorithm, and $ {\ensuremath{\mathsf{M}}}(n) \in O(n \log(n) \log\log(n))$ for FFT-based (Schönhage-Strassen) multiplication. Note that the FFT-based multiplication in degree $ d$ over a ring that supports the FFT (that is, possessing a primitive $ n$ -th root of unity, where $ n$ is a power of $ 2$ greater than $ 2d$ ) can run in $ C\, d\log(d)$ operations in $ R$ , with some $ C\geq 18$ .


Theorem 2   Algorithm 2 computes the inverse of $ f$ modulo $ x^{\ell}$ in $ 3 \ensuremath{\mathsf{M}}(\ell) + O(\ell)$ operations in $ R$ .


Proof. Theorem 1 tells us that Algorithm 2 computes the inverse of $ f$ modulo $ x^{2^r}$ . Since $ x^{\ell}$ divides $ x^{2^r}$ , the result is also valid modulo $ x^{\ell}$ . Before proving the complexity result, we point out the following relation for $ i = 1 \cdots r$ .

$\displaystyle g_i \ \equiv \ g_{i-1} \ \mod{ \ x^{2^{i-1}}}$ (17)

Indeed, by virtue of Theorem 1 we have

\begin{displaymath}\begin{array}{rcll} g_i & \equiv & 2 g_{i-1} - f \, {g_{i-1}}^2 & \mod{ \ x^{2^{i-1}}} \\ & \equiv & 2 g_{i-1} - (f \, g_{i-1}) \, g_{i-1} & \mod{ \ x^{2^{i-1}}} \\ & \equiv & 2 g_{i-1} - g_{i-1} & \mod{ \ x^{2^{i-1}}} \\ & \equiv & g_{i-1} & \mod{ \ x^{2^{i-1}}} \\ \end{array}\end{displaymath} (18)

Therefore, when computing $ g_i$ , we only care about the powers of $ x$ in the range $ x^{2^{i-1}} \cdots x^{2^i}$ : the lower-order terms of $ g_i$ are simply those of $ g_{i-1}$ and need not be recomputed. Now recall that

$\displaystyle \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \ = \ 1$ (19)

So, roughly speaking, the cost of the whole algorithm is of the order of magnitude of the cost of the last iteration, leading to $ 2 {\ensuremath{\mathsf{M}}}(2^r) + O(2^r)$ operations in $ R$ . This is not a formal proof, although the principle is correct. Let us give a more formal proof.

The cost for the $ i$ -th iteration is $ \ensuremath{\mathsf{M}}(2^{i-1})$ operations for computing $ {g_{i-1}}^2$ , plus $ \ensuremath{\mathsf{M}}(2^{i})$ operations for computing $ f \, {g_{i-1}}^2 \mod{\ x^{2^i}}$ , plus $ 2^{i-1}$ operations for subtracting it from $ 2 g_{i-1}$ (only the terms of degree at least $ 2^{i-1}$ need to be updated).

Thus we have $ \ensuremath{\mathsf{M}}(2^i) + \ensuremath{\mathsf{M}}(2^{i-1}) + 2^{i-1} \le \frac{3}{2} \, \ensuremath{\mathsf{M}}(2^i) + 2^{i-1}$ , resulting in a total running time:

$\displaystyle \sum_{1\le{i}\le{r}}{\left( \frac{3}{2} \, \ensuremath{\mathsf{M}}(2^i) + 2^{i-1} \right)} \ \le \ {3 \, \ensuremath{\mathsf{M}}(2^r)+2^r} = {3\,\ensuremath{\mathsf{M}}(\ell) +\ell}$ (20)

since $ 2\ensuremath{\mathsf{M}}(n) \le \ensuremath{\mathsf{M}}(2n)$ for all $ n \in$   $ \mbox{${\mathbb{N}}$}$ $ \qedsymbol$
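
As a quick sanity check (ours, not part of the original argument), the bound can be tested numerically for a specific multiplication time, say the classical $ \ensuremath{\mathsf{M}}(n) = 2 n^2$ , taking $ {\ell} = 2^r$ .

\begin{verbatim}
# Compare the exact per-iteration cost M(2^i) + M(2^(i-1)) + 2^(i-1), summed over
# i = 1..r, with the bound 3 M(l) + l from the proof, for M(n) = 2 n^2 and l = 2^r.
M = lambda n: 2 * n * n

for r in range(1, 8):
    l = 2 ** r
    total = sum(M(2 ** i) + M(2 ** (i - 1)) + 2 ** (i - 1) for i in range(1, r + 1))
    bound = 3 * M(l) + l
    print(r, total, bound, total <= bound)      # the bound always holds
\end{verbatim}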


Remark 4   Once again in Algorithm 2 for $ i = 1 \cdots r$ we have

$\displaystyle g_i \ \equiv \ g_{i-1} \ \mod{ \ x^{2^{i-1}}}$ (21)

So when implementing Algorithm 2 one should be careful not to recompute the low-order terms of $ g_i$ , which are already known from $ g_{i-1}$ ; see the sketch below.
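
The following variant of the sketch given after Algorithm 2 (reusing its helpers inverse_mod_power and mul_trunc_mod, with names of our own choosing) takes this remark into account: at step $ i$ it only produces the coefficients of $ x^{2^{i-1}}, \ldots, x^{2^i - 1}$ and leaves the already known lower half of $ g$ untouched.

\begin{verbatim}
from math import ceil, log2

def inverse_upper_half_only(f, l, p):
    """Algorithm 2 over Z/pZ, appending only the new upper-half coefficients."""
    r = ceil(log2(l)) if l > 1 else 0
    g = [1]                                     # g_0
    for i in range(1, r + 1):
        n = 2 ** i
        half = n // 2
        fg2 = mul_trunc_mod(f, mul_trunc_mod(g, g, n, p), n, p)
        # By Equation (21), g_i agrees with g_{i-1} below x^half; since g_{i-1}
        # has degree < half, the new upper coefficients are simply -fg2[k] mod p.
        g = g + [(-fg2[k]) % p for k in range(half, n)]
    return g[:l]

# Same result as the plain transcription of Algorithm 2.
print(inverse_upper_half_only([1, 1, 1], 8, 101) == inverse_mod_power([1, 1, 1], 8, 101))
\end{verbatim}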


Remark 5   Algorithm 2 can be adapted to the case where $ f(0)$ is a unit different from $ 1$ by initializing $ g_0$ to the inverse of $ f(0)$ instead of $ 1$ . If $ f(0)$ is not a unit, then no inverse of $ f$ modulo $ x^{\ell}$ exists. Indeed $ f \, g \equiv 1 \mod{ \ x^{\ell}}$ implies $ f(0) g(0) = 1$ which says that $ f(0)$ is a unit.
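
As a small illustration of this remark (a sketch of ours, reusing the helpers introduced after Algorithm 2), one can equivalently normalize $ f$ by $ f(0)^{-1}$ , invert the normalized polynomial with Algorithm 2 as stated, and rescale the result, which avoids modifying the initialization.

\begin{verbatim}
# Inverting f with f(0) = 2 (a unit modulo p) by normalization and rescaling.
p, l = 101, 8
f = [2, 1, 1]
c = pow(f[0], -1, p)                             # 2^(-1) = 51 modulo 101
g = [(c * a) % p for a in inverse_mod_power([(c * a) % p for a in f], l, p)]
print(mul_trunc_mod(f, g, l, p))                 # [1, 0, 0, 0, 0, 0, 0, 0]
\end{verbatim}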


Remark 6   Let us take a closer look at the computation of

$\displaystyle g_{i-1} (2 - f \, g_{i-1}) \ \mod{ \ x^{2^{i}}}$ (22)

in Algorithm 2. Consider first the product $ f \, g_{i-1}$ . It satisfies:

$\displaystyle f \, g_{i-1} \ \equiv \ 1 \ \mod{ \ x^{2^{i-1}}}$ (23)

Moreover, the polynomials $ f$ and $ g_{i-1}$ can be seen as polynomials with degrees less than $ 2^{i}$ and $ 2^{i-1}$ respectively. Hence, there exist polynomials $ S, T \in R[x]$ with degree less than $ 2^{i-1}$ such that we have:

$\displaystyle f \, g_{i-1} = 1 + T x^{2^{i-1}} + S x^{2^{i}}.$ (24)

We are only interested in computing $ T$ . In order to avoid computing $ S$ , let us observe that we have

$\displaystyle f \, g_{i-1} \ \equiv \ (1 + S) + T x^{2^{i-1}} \mod{ \ x^{2^{i}} - 1}.$ (25)

In other words, the upper part (that is, the terms of degree at least $ 2^{i-1}$ ) of the cyclic convolution product of $ f$ and $ g_{i-1}$ , that is, of their product modulo $ x^{2^{i}} - 1$ , gives us exactly $ T$ .

So let us assume from now on that we have at hand a primitive $ 2^i$ -th root of unity, so that we can compute DFTs. Then we can compute $ T$ at the cost of one multiplication in degree less than $ 2^{i-1}$ .

Consider now that we have computed $ 2 - f \, g_{i-1} \mod{ \ x^{2^{i}}}$ . Viewing $ 2 - f \, g_{i-1}$ and $ g_{i-1}$ as polynomials with degrees less than $ 2^{i}$ and $ 2^{i-1}$ respectively, there exist polynomials $ U, V, W \in R[x]$ with degree less than $ 2^{i-1}$ such that

$\displaystyle g_{i-1} (2 - f \, g_{i-1}) = U + V x^{2^{i-1}} + W x^{2^{i}}$ (26)

We know that $ g_{i-1} \ \equiv \ U \ \mod{ \ x^{2^{i-1}}}$ . Hence, we are only interested in computing $ V$ . Similarly to the above, we observe that

$\displaystyle g_{i-1} (2 - f \, g_{i-1}) \ \equiv \ (U + W) + V x^{2^{i-1}} \mod{ \ x^{2^{i}} - 1}$ (27)

Therefore, using DFT, we can compute $ V$ at the cost of one multiplication in degree less than $ 2^{i-1}$ .

It follows that, in the complexity analysis above (in the proof of Theorem 2) we can replace $ \ensuremath{\mathsf{M}}(2^i) + \ensuremath{\mathsf{M}}(2^{i-1})$ by $ \ensuremath{\mathsf{M}}(2^{i-1}) + \ensuremath{\mathsf{M}}(2^{i-1})$ leading to $ 2\,\ensuremath{\mathsf{M}}(\ell) + O({\ell})$ instead of $ 3\,\ensuremath{\mathsf{M}}(\ell) + O({\ell})$ .
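
The wrapped-product observation in Equation (25) can be illustrated numerically. The Python sketch below (ours; it uses numpy's complex FFT over small integer coefficients, with rounding, rather than a DFT over $ R$ ) computes the length-$ 2^i$ cyclic convolution of $ f$ and $ g_{i-1}$ and checks that its upper half is exactly the polynomial $ T$ of Equation (24).

\begin{verbatim}
import numpy as np

def cyclic_convolution(a, b, n):
    """Coefficients of a*b modulo x^n - 1, via length-n FFTs."""
    fa = np.fft.fft(np.asarray(a + [0] * (n - len(a)), dtype=float))
    fb = np.fft.fft(np.asarray(b + [0] * (n - len(b)), dtype=float))
    return [int(round(c)) for c in np.fft.ifft(fa * fb).real]

i = 3
n = 2 ** i                       # 8
f = [1, 2, -1, 3, 4, -2, 1, 5]   # deg f < 2^i and f(0) = 1
g = [1, -2, 5, -15]              # one checks that f*g = 1 modulo x^(2^(i-1)) = x^4

wrapped = cyclic_convolution(f, g, n)
T = wrapped[n // 2:]             # upper half of the wrapped product
plain = np.convolve(f, g).tolist()
print(T == plain[n // 2: n])     # True: the wrapped upper half is exactly T
\end{verbatim}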

Marc Moreno Maza
2008-01-07