An Efficient Hybrid Framework for Privacy-preserving Machine Learning Using HE and TEE
HT2ML is a C++-based framework for privacy-preserving machine learning (PPML) based on Homomorphic Encryption (HE) and Intel SGX. To accelerate HE-based computations, HT2ML selectively outsources HE-unfriendly computations to the SGX enclave while preserving the integrity and privacy of the computation.
This repository contains the source code for the paper, which was accepted for publication in the journal Computers & Security, 2023 (paper link).
HT2ML runs with three dependencies:
- HElib: an open-source software library that implements two HE schemes, focusing mostly on effective use of the Smart-Vercauteren ciphertext packing techniques and the Gentry-Halevi-Smart optimizations:
  - The Brakerski-Gentry-Vaikuntanathan (BGV) scheme with bootstrapping.
  - The Approximate Number scheme of Cheon-Kim-Kim-Song (CKKS).
- Open Enclave SDK: a hardware-agnostic open-source library for developing applications that utilize hardware-based Trusted Execution Environments (a.k.a. enclaves). Open Enclave (OE) is an SDK for building enclave applications in C and C++. An enclave application partitions itself into two components: an untrusted component (host) and a trusted component (enclave).
- HEMat: a software package for performing a secure outsourced matrix computation using HE. HEMat is implemented based on the HE library HEAAN. It is described in more detail in the CCS2018 paper. Note: to support both integer-based and rational number-based HE matrix computations, HT2ML re-implements HEMat with HElib based on BGV and CKKS schemes.
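To give a feel for the kind of computation HEMat performs, the following plaintext sketch multiplies two square matrices using only index permutations and element-wise products, the operations that packed HE ciphertexts support natively (slot rotations and slot-wise multiplication). The permutations follow our reading of the CCS2018 construction. The names `permute` and `permuted_matmul` are hypothetical and are not part of HEMat or HElib, and no encryption is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Matrix = std::vector<std::vector<long>>;
using IndexMap = std::function<std::size_t(std::size_t, std::size_t)>;

// Hypothetical helper: out[i][j] = in[row(i, j)][col(i, j)].
// On packed ciphertexts this would be a slot permutation built from rotations.
Matrix permute(const Matrix& in, const IndexMap& row, const IndexMap& col) {
    const std::size_t d = in.size();
    for (const auto& r : in) {
        assert(r.size() == d);  // this sketch handles square matrices only
    }
    Matrix out(d, std::vector<long>(d, 0));
    for (std::size_t i = 0; i < d; ++i) {
        for (std::size_t j = 0; j < d; ++j) {
            out[i][j] = in[row(i, j)][col(i, j)];
        }
    }
    return out;
}

// Computes A * B as a sum of d element-wise products of permuted matrices.
Matrix permuted_matmul(const Matrix& A, const Matrix& B) {
    const std::size_t d = A.size();
    assert(B.size() == d);
    const IndexMap same_row = [](std::size_t i, std::size_t) { return i; };
    const IndexMap same_col = [](std::size_t, std::size_t j) { return j; };
    const IndexMap diag = [d](std::size_t i, std::size_t j) { return (i + j) % d; };

    // sigma(A)[i][j] = A[i][i + j] and tau(B)[i][j] = B[i + j][j], indices mod d.
    const Matrix sA = permute(A, same_row, diag);
    const Matrix tB = permute(B, diag, same_col);

    Matrix C(d, std::vector<long>(d, 0));
    for (std::size_t k = 0; k < d; ++k) {
        // Shift sigma(A) by k columns and tau(B) by k rows.
        const Matrix Ak = permute(
            sA, same_row, [d, k](std::size_t, std::size_t j) { return (j + k) % d; });
        const Matrix Bk = permute(
            tB, [d, k](std::size_t i, std::size_t) { return (i + k) % d; }, same_col);
        // Element-wise multiply and accumulate (slot-wise in the HE setting).
        for (std::size_t i = 0; i < d; ++i) {
            for (std::size_t j = 0; j < d; ++j) {
                C[i][j] += Ak[i][j] * Bk[i][j];
            }
        }
    }
    return C;
}
```

Entry (i, j) of the k-th term is A[i][i+j+k] * B[i+j+k][j], so summing over k covers every inner index exactly once and yields the ordinary matrix product.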
In this implementation, HT2ML provides two examples: linear regression (LR) training and convolutional neural network (CNN) inference.
Linear Regression
- HT2ML vs HE-only baseline
To explore the performance gains that HT2ML achieves when processing integer-based tasks, we compare HT2ML with an HE-only baseline. The HE-only baseline is an implementation of linear regression protected by HE alone. We re-implement Wu et al.'s work SI-HE with the latest version of HElib and use it as the HE-only baseline.
CNN Inference
- HT2ML vs HE-only baseline (E2DM)
- HT2ML vs Oblivious baseline
For CNN inference, we perform rational number-based computations using CKKS and compare HT2ML with three baselines. The HE-only baseline is adopted from the CCS2018 paper (called E2DM); we implement E2DM with HElib according to the algorithms designed in the paper. The Oblivious baseline is implemented using oblivious primitives inside the enclave. HCNN is the most recent and most similar work that uses HE and SGX to protect the evaluation of CNNs.
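As background on what "oblivious primitives" usually means, here is a minimal sketch of three common building blocks: a branch-free select, a ReLU built on it, and a table read that touches every element. This text does not list which primitives the Oblivious baseline actually uses, so the function names and the choice of operations below are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a if cond is true, otherwise b,
// without branching on cond.
inline std::int64_t oblivious_select(bool cond, std::int64_t a, std::int64_t b) {
    // mask is all ones when cond is true and all zeros otherwise.
    const std::uint64_t mask =
        static_cast<std::uint64_t>(0) - static_cast<std::uint64_t>(cond);
    return static_cast<std::int64_t>((static_cast<std::uint64_t>(a) & mask) |
                                     (static_cast<std::uint64_t>(b) & ~mask));
}

// ReLU expressed through the select, so no branch depends on the sign of x.
inline std::int64_t oblivious_relu(std::int64_t x) {
    return oblivious_select(x > 0, x, 0);
}

// Reads table[index] while touching every element, so the memory access
// pattern does not depend on index.
inline std::int64_t oblivious_read(const std::vector<std::int64_t>& table,
                                   std::size_t index) {
    assert(index < table.size());
    std::int64_t result = 0;
    for (std::size_t i = 0; i < table.size(); ++i) {
        result = oblivious_select(i == index, table[i], result);
    }
    return result;
}
```

A compiler is free to turn such code back into branches, so real implementations typically rely on inline assembly (for example `cmov`) and check the generated machine code. The full table scan also shows why an oblivious baseline tends to be slower than unprotected enclave code.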
The following instructions will create an environment for HT2ML. Note that HT2ML has only been tested on Ubuntu 18.04, so we recommend that you install everything on Ubuntu 18.04.
OE provides several options, such as Ubuntu 18.04 or 20.04 with SGX hardware or simulation mode. Check which SGX level your machine supports before installing OE, then install OE according to the corresponding instructions. Note that the Open Enclave SDK has not been tested on Ubuntu 18.04 since release v0.19.0, so we recommend installing an earlier version (HT2ML is tested on v0.16.0).
Note that HT2ML is tested on Ubuntu 18.04 with SGX1+FLC mode. In SGX1+FLC mode, the Open Enclave SDK takes advantage of the Flexible Launch Control mode for better managing architectural enclaves.
HT2ML performs HE-friendly computations (e.g., matrix/vector multiplications) in the host while performing HE-unfriendly computations (e.g., evaluating non-linear functions or refreshing the HE ciphertexts) inside the enclave. Therefore, we need to build the required libraries NTL and GMP against both the GLIBC (host) and MUSL (enclave) C libraries.
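The division of labour described above can be pictured with a small mock-up. This is only a sketch of the control flow under loud assumptions: `MockCiphertext` stores values in the clear and merely tracks a level counter as a stand-in for the remaining multiplicative depth, `kFreshLevels` is an arbitrary number, ReLU is used as an example non-linear function, and none of the names come from HT2ML, HElib, or the Open Enclave SDK. In the real framework the host step would run on HE ciphertexts and the enclave step would run inside SGX behind an enclave call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for an HE ciphertext. Values are stored in the clear here;
// this struct provides no security and exists only to show the data flow.
struct MockCiphertext {
    std::vector<double> slots;
    int levels_left;  // stand-in for the remaining multiplicative depth
};

constexpr int kFreshLevels = 3;  // arbitrary value chosen for this sketch

// Host side (untrusted): an HE-friendly linear step, y = W * x.
// Each multiplication consumes one level of the mock budget.
MockCiphertext host_matvec(const std::vector<std::vector<double>>& W,
                           const MockCiphertext& x) {
    assert(x.levels_left > 0);
    MockCiphertext y{std::vector<double>(W.size(), 0.0), x.levels_left - 1};
    for (std::size_t i = 0; i < W.size(); ++i) {
        assert(W[i].size() == x.slots.size());
        for (std::size_t j = 0; j < W[i].size(); ++j) {
            y.slots[i] += W[i][j] * x.slots[j];
        }
    }
    return y;
}

// Enclave side (trusted): models "decrypt, apply the non-linear function,
// re-encrypt". Re-encryption yields a fresh ciphertext with a full budget.
MockCiphertext enclave_relu_and_refresh(const MockCiphertext& x) {
    MockCiphertext out{x.slots, kFreshLevels};
    for (double& v : out.slots) {
        v = v > 0.0 ? v : 0.0;
    }
    return out;
}
```

For example, a two-layer pipeline would call `host_matvec`, then `enclave_relu_and_refresh`, then `host_matvec` again; the refresh in between is what keeps the second linear step within the level budget.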
Build the NTL and GMP against GLIBC (i.e., in the host)
Take GMP as an example:
# Download GMP
lzip -d gmp-6.1.2.tar.lz;
tar -xvf gmp-6.1.2.tar;
# Build GMP
mkdir host_gmp_install;
cd gmp-6.1.2 || exit 1;
make clean;
./configure CC=clang CFLAGS="-g -Og" --prefix=$SCRIPT_DIR/host_gmp_install;
make -j16;
make install;
cp $SCRIPT_DIR/host_gmp_install/lib/libgmp.a $SCRIPT_DIR/../host/;
cp $SCRIPT_DIR/host_gmp_install/include/gmp.h $SCRIPT_DIR/../host/;
Build the NTL and GMP against MUSL (i.e., in the enclave)
To build GMP, NTL, and HElib against MUSL, we build them in MUSL-based Alpine Linux using the Microsoft package manager apkman. apkman is a self-contained bash script that helps us install packages in Alpine Linux.
Install apkman
chmod +x apkman
Build GMP using apkman
mkdir enclave_gmp_install;
cd gmp-6.1.2 || exit 1;
apkman exec make clean;
apkman exec ./configure CC=clang CFLAGS="-g -Og" --prefix=$SCRIPT_DIR/enclave_gmp_install;
apkman exec make -j16;
apkman exec make install;
cp $SCRIPT_DIR/enclave_gmp_install/lib/libgmp.a $SCRIPT_DIR/../enclave/;
cp $SCRIPT_DIR/enclave_gmp_install/include/gmp.h $SCRIPT_DIR/../enclave/;
We provide a script for installing all the libraries in the host and enclave.
Configure apkman
To configure apkman, we add the following commands to the CMakeLists.txt of each directory:
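# Create an imported executable for use by tests.
add_executable(apkman IMPORTED)
set_target_properties(
  apkman PROPERTIES IMPORTED_LOCATION ${PROJECT_SOURCE_DIR}/../../../../alpineapkman/apkman)
# Check that apkman runs; ANY is assumed as the COMMAND_ERROR_IS_FATAL value.
execute_process(COMMAND ${PROJECT_SOURCE_DIR}/../../../../alpineapkman/apkman help
                COMMAND_ERROR_IS_FATAL ANY)
# Fetch apkman root folder for accessing includes and libraries.
# The variable name APKMAN_ROOT is an assumption.
execute_process(
  COMMAND ${PROJECT_SOURCE_DIR}/../../../../alpineapkman/apkman root COMMAND_ERROR_IS_FATAL
          ANY
  OUTPUT_VARIABLE APKMAN_ROOT
  OUTPUT_STRIP_TRAILING_WHITESPACE)
# Ensure that apkman is initialized (add_custom_target is assumed here).
add_custom_target(
  apkman-init # Setup so that packages can be built from source.
  COMMAND apkman add build-base clang gdb cmake git
  COMMAND apkman --wrap-ld)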
Note that you may have to change the path of apkman in the configuration to match your local apkman installation path.
Take the linear regression example HETEE_LR as an example:
# Source the openenclaverc file
. /opt/openenclave/share/openenclave/openenclaverc
mkdir build && cd build
cmake ..
make run
This open source project is for proof of concept purposes only and should not be used in production environments.