This repository contains the official implementation of the paper "Are audio DeepFake detection models polyglots?", authored by Bartłomiej Marek, Piotr Kawa, and Piotr Syga.
In this study, we introduce a benchmark for advancing audio DF detection in multilingual settings and empirically explore three essential questions in this area: to what extent detection efficacy varies by language, whether models trained on English benchmarks are sufficient for effective cross-linguistic detection, and which targeted strategies (intra- or cross-lingual adaptation) best support DF detection in specific languages, **even assuming access to very limited non-English resources**.
The paper is available on [arXiv](https://arxiv.org/abs/2412.17924).
The codebase supports training, fine-tuning, and evaluating multilingual audio DeepFake detection models:
- W2V+AASIST
- Whisper+AASIST
- LFCC+AASIST
- LFCC+MesoNet
- RawGAT-ST
Available languages include:
- Germanic: en (English), de (German)
- Romance: fr (French), es (Spanish), it (Italian)
- Slavic: pl (Polish), ru (Russian), uk (Ukrainian)
For the experiments, we use the ASVspoof2019 (LA) and MLAAD datasets.
The MLAAD dataset has the following structure:
```
MLAAD/
├── bonafide
│   ├── de_DE
│   │   └── by_book
│   │       ├── female
│   │       │   ├── angela_merkel
│   │       │   │   └── merkel_alone
│   │       │   │       └── wavs
│   │       │   ├── eva_k
│   │       │   │   ├── grune_haus
│   │       │   │   │   └── wavs
│   │       │   ...
│   ├── en_UK
│   │   └── by_book
│   │       └── female
│   │           └── elizabeth_klett
│   │               ├── jane_eyre
│   │               │   └── wavs
│   │               └── wives_and_daughters
│   │                   └── wavs
│   ...
└── fake
    ├── ar
    │   └── tts_models_multilingual_multi-dataset_xtts_v1.1
    ├── bg
    │   └── tts_models_bg_cv_vits
    ├── cs
    │   └── tts_models_cs_cv_vits
    ...
```
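A minimal shell sketch for a quick look at the data, assuming the dataset is unpacked under `MLAAD/` exactly as in the tree above:

```bash
# Count the bona fide German recordings
find MLAAD/bonafide/de_DE -type f -name "*.wav" | wc -l

# List the TTS systems that generated the fakes, per language
ls MLAAD/fake
```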
The repository is organized as follows:
```
├── configs/          # Example configuration files for training
├── scripts/          # Shell scripts for running experiments
├── src/              # Core Python scripts for training, fine-tuning, and evaluation
├── test.py           # Main evaluation script
├── train.py          # Main training script
├── finetune.py       # Main fine-tuning script
├── requirements.txt  # Required dependencies
└── .env              # Defines constant values
```
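The `.env` file holds constants shared by the scripts (e.g., dataset locations). A hypothetical sketch; the variable names below are placeholders, not necessarily the ones the code in `src/` reads:

```bash
# Hypothetical .env contents -- variable names are placeholders
ASVSPOOF_PATH=/data/ASVspoof2019/LA
MLAAD_PATH=/data/MLAAD
```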
---
## Usage
### Training
Train models from scratch using `train.sh` or the Python script directly.
#### Using `train.sh`
```bash
bash scripts/train.sh
```

#### Using `train.py` directly

```bash
python train.py \
    --config ${config_dir}/${cf} \
    --model_out_dir ${out_dir}
```
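For example (the config filename below is a hypothetical placeholder; point it at one of the files shipped in `configs/`):

```bash
# Hypothetical config name -- substitute an actual file from configs/
python train.py \
    --config configs/example_config.yaml \
    --model_out_dir trained_models
```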
### Fine-tuning
Fine-tune pre-trained models on specific languages using `fine-tuning.sh` or the Python script directly.
#### Using `fine-tuning.sh`
```bash
bash scripts/fine-tuning.sh
```

#### Using `finetune.py` directly

```bash
python finetune.py \
    --config ${folder}/configuration.yaml \
    --model_out_dir ${model_out_dir} \
    --ft_languages ${ft_lang} \
    --lr ${lr} \
    --weight_decay ${weight_decay}
```
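For instance, to fine-tune a pre-trained model on German (the config path and hyperparameter values here are illustrative, not prescribed by the paper):

```bash
# Hypothetical paths and hyperparameters -- adjust to your setup
python finetune.py \
    --config trained_models/w2v_aasist/configuration.yaml \
    --model_out_dir finetuned_models \
    --ft_languages de \
    --lr 1e-5 \
    --weight_decay 1e-4
```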
### Evaluation
Run the evaluation pipeline using `test.sh` or the Python script directly.
#### Using `test.sh`
Edit `scripts/test.sh` to specify the languages for evaluation and execute:
```bash
bash scripts/test.sh
```

#### Using `test.py` directly

```bash
python test.py \
    --config ${folder}/configuration.yaml \
    --output_file ${output_file} \
    --eval_languages ${lang}
```
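For example, to evaluate a fine-tuned model on Polish (the paths are illustrative):

```bash
# Hypothetical paths -- adjust to your setup
python test.py \
    --config finetuned_models/w2v_aasist/configuration.yaml \
    --output_file results/pl_results.csv \
    --eval_languages pl
```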
Evaluation results are saved to the specified `output_file`; use these outputs to analyze model performance across languages. An example of processing the results is provided in `notebooks/post_processing.ipynb`.
## Citation
If you use this code in your research, please cite:
```bibtex
@misc{marek2024audiodeepfakedetectionmodels,
  title={Are audio DeepFake detection models polyglots?},
  author={Bartłomiej Marek and Piotr Kawa and Piotr Syga},
  year={2024},
  eprint={2412.17924},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2412.17924},
}
```
For questions or issues, please open an issue in the repository.