DFT API
The DFT API supports DFT/iDFT modulo $X^N+1$.
Here, DFT means either the real/complex FFT, or an NTT modulo a fixed modulus.
Many FHE and other post-quantum lattice constructions can be expressed solely in terms of these operations.
Mathematically, even though DFT, iDFT and addmul are sufficient to reach the best asymptotic complexity, from an engineering point of view such constructions may result in a tremendous number of tiny library calls, and in memory-bandwidth bottlenecks. If this happens in your project, you may consider using the higher-level arithmetic layers.
The negacyclic real FFT over `double` is powered by the equivalent complex FFT modulo $X^m-i$, and two complex layouts are supported:
- the `cplx` layout (also known as "adjacent" layout), where real and imaginary parts are always stored together (e.g. `typedef double cplx[2]`). The complex coefficient and evaluation arrays are in both cases `cplx[m]`, or equivalently `double[m][2]`.
- the `reim` layout (also known as "split" layout), where the $m$ real parts are grouped together in a `double[m]`, followed by the $m$ imaginary parts in another `double[m]` array. The evaluation space always consists of $m$ complex numbers (i.e. `double evals[2][m]`). For the coefficient space, the `reim` layout offers the advantage that there is a bit-level equivalence between an array of $m$ complex coefficients modulo $X^m-i$ (i.e. `double complex_coeffs[2][m]`) and an array of $N$ real coefficients modulo $X^N+1$ (i.e. `double real_coeffs[N]`), as the sketch after this list illustrates.
- In both layouts, coefficients are presented in natural order, and the evaluations are in mirror index order.
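
To make the bit-level equivalence concrete, here is a minimal standalone sketch (not part of the API; the dimensions are illustrative) showing that the same `reim` buffer can be read either as $N$ real coefficients modulo $X^N+1$ or as $m=N/2$ complex coefficients modulo $X^m-i$:

```c
#include <stdio.h>

#define N 16
#define M (N / 2)

int main(void) {
  /* N real coefficients of a polynomial modulo X^N+1 */
  double real_coeffs[N] = {0};
  /* reim view of the same memory: row 0 holds the M real parts, row 1 the
     M imaginary parts, i.e. complex coefficient j is
     real_coeffs[j] + i*real_coeffs[j+M] */
  double(*complex_coeffs)[M] = (double(*)[M])real_coeffs;
  complex_coeffs[1][3] = 2.5; /* imaginary part of complex coefficient 3 */
  printf("%f\n", real_coeffs[3 + M]); /* prints 2.500000: same memory */
  return 0;
}
```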
```c
/** applies the fft in place */
void reim_fft_simple(uint32_t m, void* data);

/** applies the ifft in place (the result is not rescaled: m times larger) */
void reim_ifft_simple(uint32_t m, void* data);

/** element-wise multiplication r=a*b in FFT space */
void reim_fftvec_mul_simple(uint32_t m, void* r, const void* a, const void* b);

/** element-wise addmul r+=a*b in FFT space */
void reim_fftvec_addmul_simple(uint32_t m, void* r, const void* a, const void* b);

/** converts r=a from ZnX 64 bits.
    log2bound is a static upper bound on log2(||a||): <=50 is efficient */
void reim_from_znx64_simple(uint32_t m, uint32_t log2bound,
                            void* r, const int64_t* a);

/** converts r=a/divisor to ZnX 64 bits.
    log2bound is a static upper bound on log2(||a/divisor||): <=50 is efficient */
void reim_to_znx64_simple(uint32_t m, double divisor, uint32_t log2bound,
                          int64_t* r, const void* a);

/** converts r=a from ZnX 32 bits.
    log2bound is a static upper bound on log2(||a||): <=31 is efficient */
void reim_from_znx32_simple(uint32_t m, uint32_t log2bound,
                            void* r, const int32_t* a);

/** converts r=a/divisor to ZnX 32 bits.
    log2bound is a static upper bound on log2(||a/divisor||): <=31 is efficient */
void reim_to_znx32_simple(uint32_t m, double divisor, uint32_t log2bound,
                          int32_t* r, const void* a);

/** converts r=a centermod 1 from TnX 32 bits. */
void reim_from_tnx32_simple(uint32_t m, void* r, const int32_t* a);

/** converts r=a/divisor centermod 1 to TnX 32 bits.
    log2overhead is a static upper bound on log2(||a/divisor||): <=19 is efficient */
void reim_to_tnx32_simple(uint32_t m, double divisor, uint32_t log2overhead,
                          int32_t* r, const void* a);

/** converts r=a/divisor centermod 1 to TnX doubles.
    log2overhead is a static upper bound on log2(||a/divisor||): <=18 is efficient */
void reim_to_tnx_simple(uint32_t m, double divisor, uint32_t log2overhead,
                        double* r, const double* a);
```
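
As a usage illustration, the sketch below multiplies two integer polynomials modulo $X^N+1$ through the FFT space. It assumes the declarations above are in scope and that the ZnX arrays hold the $N=2m$ integer coefficients; the dimensions, inputs and bounds are illustrative:

```c
#include <stdint.h>

int main(void) {
  const uint32_t m = 512, N = 2 * m; /* complex dimension m, real dimension N */
  int64_t a[1024], b[1024], c[1024]; /* N coefficients each, small entries */
  double fa[1024], fb[1024];         /* reim buffers of N doubles each */
  for (uint32_t i = 0; i < N; ++i) {
    a[i] = (int64_t)(i % 3) - 1;       /* ||a|| <= 1 */
    b[i] = (int64_t)((i * 7) % 5) - 2; /* ||b|| <= 2 */
  }
  reim_from_znx64_simple(m, 2, fa, a); /* log2(||a||) <= 2 */
  reim_from_znx64_simple(m, 2, fb, b); /* log2(||b||) <= 2 */
  reim_fft_simple(m, fa);
  reim_fft_simple(m, fb);
  reim_fftvec_mul_simple(m, fa, fa, fb); /* pointwise product in FFT space */
  reim_ifft_simple(m, fa);               /* result is m times too large */
  /* rescale by m during extraction; ||a*b|| <= N*1*2 = 2048, so log2bound = 12 */
  reim_to_znx64_simple(m, (double)m, 12, c, fa);
  (void)c;
  return 0;
}
```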
Negacyclic NTT is supported modulo the fixed modulus $Q_{120} = Q_1 Q_2 Q_3 Q_4$, where

$$\begin{aligned}
Q_1 &= 2^{30} - 2\cdot 2^{17} + 1\\
Q_2 &= 2^{30} - 17\cdot 2^{17} + 1\\
Q_3 &= 2^{30} - 23\cdot 2^{17} + 1\\
Q_4 &= 2^{30} - 42\cdot 2^{17} + 1
\end{aligned}$$
Most of the API calls are presented in the `q120b` layout, where each number $x$ modulo $Q_{120}$ is represented non-uniquely by four `uint64_t` values $(x_1,x_2,x_3,x_4)$ such that $x_i \bmod Q_i = x \bmod Q_i$.
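
Since the representation is non-unique, a residue that already fits in a single machine word can be loaded into the `q120b` layout by simply repeating it four times. A minimal sketch (the helper name is hypothetical, not part of the API):

```c
#include <stdint.h>

/* hypothetical helper: encodes one residue x (fitting in a uint64_t)
   in the q120b layout */
void q120b_from_u64(uint64_t x, uint64_t out[4]) {
  /* taking xi = x trivially satisfies xi mod Qi = x mod Qi for each i */
  out[0] = x;
  out[1] = x;
  out[2] = x;
  out[3] = x;
}
```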
```c
// API is still experimental, currently supports only x86_64 AVX2

/** @brief allocates the precomputed tables for a direct ntt of dimension n */
q120_ntt_precomp* q120_new_ntt_bb_precomp(const uint64_t n);
/** @brief frees the direct ntt precomputation */
void q120_del_ntt_bb_precomp(q120_ntt_precomp* precomp);
/** @brief allocates the precomputed tables for an inverse ntt of dimension n */
q120_ntt_precomp* q120_new_intt_bb_precomp(const uint64_t n);
/** @brief frees the inverse ntt precomputation */
void q120_del_intt_bb_precomp(q120_ntt_precomp* precomp);

/** @brief computes a direct ntt in-place over data. */
void q120_ntt_bb_avx2(const q120_ntt_precomp* const precomp,
                      q120b* data);

/** @brief computes an inverse ntt in-place over data. */
void q120_intt_bb_avx2(const q120_ntt_precomp* const precomp,
                       q120b* data);
```
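
A minimal round-trip sketch, under the assumption (to be checked against the actual header) that a `q120b` vector of length `n` occupies `4*n` contiguous `uint64_t` words:

```c
#include <stdint.h>
#include <stdlib.h>

void q120_ntt_roundtrip(uint64_t n) {
  q120_ntt_precomp* p_ntt = q120_new_ntt_bb_precomp(n);
  q120_ntt_precomp* p_intt = q120_new_intt_bb_precomp(n);
  /* assumption: one coefficient = four uint64_t residues in q120b layout */
  uint64_t* buf = (uint64_t*)calloc(4 * n, sizeof(uint64_t));
  /* ... fill buf with the q120b representation of the coefficients ... */
  q120_ntt_bb_avx2(p_ntt, (q120b*)buf);   /* direct ntt in place */
  q120_intt_bb_avx2(p_intt, (q120b*)buf); /* inverse ntt in place */
  free(buf);
  q120_del_intt_bb_precomp(p_intt);
  q120_del_ntt_bb_precomp(p_ntt);
}
```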