Skip to content

dft api

Nicolas Gama edited this page Jun 2, 2024 · 2 revisions

SLQLios DFT api

The DFT api supports DFT/iDFT mod $X^N+1$ where $N$ is a power of two, as well as coefficient-wise add-mul in DFT space.

Here, DFT means either real/complex FFT, or NTT modulo a fixed modulus.

Many FHE and other post-quantum lattice constructions can be expressed solely on those operations.

Mathematically, even-though DFT, iDFT and addmul are sufficient to reach the best asymptotic complexity, from an engineering point of view, such constructions may result in a tremendous amounts of tiny library calls, and also to bottlenecks in memory bandwidth. If this happens in your project, you may consider using the higher-level arithmetic layers.

Summary of the DFT API's main module

FFT over real vectors modulo $X^N+1$

negacyclic real FFT is powered by the equivalent complex FFT modulo $X^m-i$ where $m=n/2$. Each real coefficient is represented by a double, and two complex layouts are supported:

  • the cplx layout (also known as "adjacent" layout), where real and imaginary parts are always together (e.g. typedef double[2] cplx). The complexes coefficient and evaluations arrays are in both cases cplx[m], or equivalently double [m][2].
  • the reim layout (also known as "split" layout), where the $m$ real parts are grouped together in a double[m], followed by the $m$ imaginary parts in another double[m] array. The evaluation space always consists in $m$ complex numbers (i.e.double evals[2][m]). For the coefficient space, reim layout offers the advantage that there is an bit-level equivalence between an array of $m$ complexe coefficients modulo $X^m-i$ (i.e. double complex_coeffs[2][m]) and an array of $N$ real coefficients modulo $X^N+1$ (i.e. double real_coeffs[N]).
  • In both layouts, coefficients are presented in natural order, and the evaluations are in mirror index order.

main FFT/iFFT and admul API

/** apply fft in place */
void reim_fft_simple(uint32_t m, void* data);
/** apply ifft in place (the result is not rescaled: m times larger) */
void reim_ifft_simple(uint32_t m, void* data);

/** element-wise multiplication r=ab in FFT space */
void reim_fftvec_mul_simple(uint32_t m, void* r, const void* a, const void* b);
/** element-wise addmul r+=ab in FFT space */
void reim_fftvec_addmul_simple(uint32_t m, void* r, const void* a, const void* b);

fast conversions to/from integers coefficients

/** converts r=a from ZnX 64bits. 
    log2bound is static upper-bound on log2(||a||): <=50 is efficient */
void reim_from_znx64_simple(uint32_t m, uint32_t log2bound, 
                            void* r, const int64_t* a);
/** converts r=a/divisor to ZnX 64bits.
    log2bound is static upper-bound on log2(||a/dividor||): <=50 is efficient */
void reim_to_znx64_simple(uint32_t m, double divisor, uint32_t log2bound,
                          int64_t* r, const void* a);


/** converts r=a from ZnX 32bits. 
    log2bound is static upper-bound on log2(||a||): <= 31 is efficient */
void reim_from_znx32_simple(uint32_t m, uint32_t log2bound, 
                                   void* r, const int32_t* x);
/** converts r=a/divisor to ZnX 32bits.
    log2bound is static upper-bound on log2(||a/dividor||): <=31 is efficient */
void reim_to_znx32_simple(uint32_t m, double divisor, uint32_t log2bound,
                                 int64_t* r, const void* a);

fast conversions to/from torus coefficients

/** converts r=a centermod 1 from TnX 32bits. */
void reim_from_tnx32_simple(uint32_t m, void* r, const int32_t* a);
/** converts r=a/divisor centermod 1 to TnX 32bits. 
    log2overhead is static upper-bound on log2(||a/dividor||): <= 19 is efficient */
void reim_to_tnx32_simple(uint32_t m, double divisor, uint32_t log2overhead,
                          int32_t* r, const void* a);

/** converts r=a/divisor centermod 1 to TnX doubles. 
    log2overhead is static upper-bound on log2(||a/dividor||): <= 18 is efficient */
void reim_to_tnx_simple(uint32_t m, double divisor, uint32_t log2overhead, 
                        double* r, const double* a);

NTT modulo Q120

Negacyclic NTT is supported modulo a fixed modulus Q120 = Q1.Q2.Q3.Q4 where

$\begin{array}{l} Q_1=2^{30}-2.2^{17}+1\ Q_2=2^{30}-17.2^{17}+1\ Q_3=2^{30}-23.2^{17}+1 \ Q_4=2^{30}-42.2^{17}+1 \end{array}$

Most of the API calls are presented in layout q120b, where each number modulo x Q120 is represented non-uniquely by four uint64_t: (x1,x2,x3,x4) s.t. xi mod Qi = x mod Qi.

// API is still experimental, currently supports only X86_64 AVX2

q120_ntt_precomp* q120_new_ntt_bb_precomp(const uint64_t n);
void q120_del_ntt_bb_precomp(q120_ntt_precomp* precomp);

q120_ntt_precomp* q120_new_intt_bb_precomp(const uint64_t n);
void q120_del_intt_bb_precomp(q120_ntt_precomp* precomp);

/** @brief computes a direct ntt in-place over data. */
void q120_ntt_bb_avx2(const q120_ntt_precomp* const precomp, 
                      q120b* data);

/** @brief computes an inverse ntt in-place over data. */
void q120_intt_bb_avx2(const q120_ntt_precomp* const precomp, 
                       q120b* data);

NTT modulo Q104 (TBD)