Skip to content

FFT Processors Layouts

Nicolas Gama edited this page Jul 12, 2023 · 11 revisions

Layouts

FFT processor layouts (cplx and reim)

FFT and iFFT algorithms is carried-out in place on one of the FFT processors layouts: cplx or reim. The iFFT algorithm is non normalized, so if we run FFT then iFFT, one obtains $m$ times the original vector.

The cplx and reim layouts are both used to represent vectors of complex numbers; in the cplx layout, each complex element is a contiguous pair of floats (float64[2]), whereas in the reim layout, we have one contiguous array of with all the real parts followed by one contiguous array with all the imaginary parts.

Given a complex polynomial $$P(X)=c_0+c_1.X+...+c_{m-1}X^{m-1} \mod X^m-i$$

The complex coefficients of P are always presented in natural order $$[c_0, \dots, c_{m-1}]$$

The fft vector of $m$ evaluations of $P$ are presented in fracrevbit order: if the first element is $P(\omega)$ on the base root $\omega$, the evaluations order is:

$$[P(\omega), P(\omega+\exp(2i\pi.\textrm{frb}_{1}), \dots, P(\omega+\exp(2i\pi.\textrm{frb}_{m-1}) ]$$

where $(\textrm{frb}_k)_{k\in\mathbb{N}}$ is the fracrevbit sequence $(0,1/2,1/4,3/4,\dots)$. More generally, if the bits of $k$ are $k=\sum_{p\geq 0} e_i.2^i$, then $\textrm{frb}_k=\sum_{p\geq 0} e_i.2^{-i}$.

By extension, if $Q(X)$ is a real polynomial modulo $X^N+1$ where $N=2m$: $$Q(X)=a_0+a_1.X+...+a_{N-1}X^{N-1} \mod X^N+1$$ $Q$ is (uniquely and isomorphically) identified to the complex polynomial $P(X) = Q(X) \mod X^m-i$ of half degree, whose coefficients are $c_k = a_k + i. a_{k+m}$. The FFT on $P$ and $Q$ are identical and correspond to the vector of evaluations on the same list of roots of unity. The iFFT recovers the $m$ coefficients of $P$, and from there, we deduce the $N$ coefficients of $Q$ by extracting the real and imaginary parts.

cplx layout (adjacent layout)

The cplx layout is the classical representation for vector of complex numbers: one complex element is represented by the cplx = float64[2] pair (real,imag). A vector of m complex elements has layout cplx[m].

For a complex polynomial, the float64 coefficients are presented in order: $$[Re(c_0), Im(c_0), \dots, Re(c_{m-1}),Im(c_{m-1})].$$

For a real polynomial, the float64 coefficients are presented in order: $$[a_0, a_m, a_1, a_{m+1},\dots, a_{m-1},a_{2m-1}].$$

Evaluations are in fracrevbits order described above, each complex evaluation comes as a (real,imag) pair.

reim layout (split layout)

In the reim layout, a vector of $m$ complex numbers is split into two different float64 vectors of $m$ elements, one for all the real parts, and then one for all the imaginary parts.

When the reim layout is used to represent the coefficients of P, they are given in order $$[Re(c_0), \dots, Re(c_{m-1}),Im(c_0), \dots, Im(c_{m-1})].$$

When the reim layout is used to represent the coefficients of a real polynomial P, they are given in the "friendly" order $$[a_0, \dots, a_{N-1}].$$

As usual, FFT Evaluations are presented in the same fracrevbits order described above, the m real parts first, followed by the m imaginary parts.

Coefficient space layouts

In addition to the two FFT processors layouts, the library supports a few coefficients space layouts, that are convertible to and from cplx/reim.

Conversion between coefficient spaces and FFT-processor space are in general out-of-place. Backwards conversion from reim/cplx to a coefficient space contains an optional division (to renormalize the result after an iFFT) and a rounding (if the result is meant to be integer).

Real-valued coefficients rnx

The rnx layout represents the coefficients of a real polynomial mod X^N+1 in natural order as an array of N double-precision floats. This layout is in fact bitwise identical to the reim layout, except that rnx is usually parametrized by N instead of m=N/2.

Integer coefficients znx32 and znx64

The znx32 and znx64 layout represents the coefficients of an integer polynomial mod X^N+1 in natural order as an array of N signed integers (of resp. 32 or 64 bits).

Forward conversions to reim/cplx are exact from znx32 and retains at most 52 bits of precision for znx64.

Backwards conversion require a static guarantee on the output interval: the programmer must guarantee that after the optional division and rounding, the result fits in an int32 if the destination is znx32, and the programmer must provide an explicit bound if the destination is znx64.

Torus coefficients tnx32 and tnx64

The tnx32 and tnx64 layouts represents the coefficients of an polynomial mod X^N+1 modulo 1 in natural order as an array of N signed integers (of resp. 32 or 64 bits). For tnx32, the a value $u$ represents $u/2^{32} \mod 1$, and For tnx64, the a value $u$ represents $u/2^{64} \mod 1$.

Forward conversions to reim/cplx are exact from tnx32 and retains at most 52 bits of precision for tnx64.

Backwards conversion require a static guarantee by the programmer on the overhead (i.e. bound after the optional division): overhead <= 18 bits are efficient, an overhead of 20 bits can retain at most 32 bits of fractional precision, on the other hand, an overhead of 52 bits would lose all the bits and be completely useless.