This repository contains the necessary code to run the experiments in the paper "ExDDV: A New Dataset for Explainable Deepfake Detection in Video". If you use this dataset or code in your research, please cite the corresponding paper:
- Vlad Hondru, Eduard Hogea, Darian Onchis, Radu Tudor Ionescu. ExDDV: A New Dataset for Explainable Deepfake Detection in Video. arXiv preprint arXiv:2503.14421 (2025).
Bibtex:
```bibtex
@article{hondru2025exddv,
  title={ExDDV: A New Dataset for Explainable Deepfake Detection in Video},
  author={Hondru, Vlad and Hogea, Eduard and Onchis, Darian and Ionescu, Radu Tudor},
  journal={arXiv preprint arXiv:2503.14421},
  year={2025}
}
```
The csv file containing the annotations can be found in this repository. Use the `movie_name` and `dataset` columns to determine the input movie name.
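As a minimal illustration (not part of the released code), the snippet below assumes the annotation CSV is `dataset_last.csv` and that videos live under the `data/<dataset>/` layout described in the setup section further down; the notebooks additionally take the manipulation type into account when locating files.

```python
import os
import pandas as pd

# Load the annotation CSV shipped with this repository.
annotations = pd.read_csv("dataset_last.csv")

# Resolve a video path from the `dataset` and `movie_name` columns
# (illustrative only; the notebooks also use the manipulation type).
row = annotations.iloc[0]
video_path = os.path.join("data", row["dataset"], row["movie_name"])
print(video_path)
```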
There are three different folders inside the `src` folder, containing the three models we used: LAVIS (BLIP-2), LLaVA and Phi3-Vision-Finetune. These are forked from the following repositories: LAVIS, LLaVA and Phi3-Vision-Finetune.
Each model is trained as per its corresponding repository. We used the respective training scripts: BLIP-2, LLaVA and Phi3-Vision. The only modification needed is to replace the paths to the datasets in the corresponding files.
**Note for Phi3-Vision:** The file `processing_phi3_v.py` from HuggingFace Transformers must be replaced with the script from here.
The current implementation uses LLaVA 1.5, BLIP-2 and Phi3-Vision.

Note: The code for each model is provided in separate Jupyter notebooks:
- LLaVA: `llava_incontext.ipynb`
- Phi3-Vision: `phi_incontext.ipynb`
- BLIP-2: `blip_incontext.ipynb`
Each notebook implements the same in-context pipeline:
- **Extracting visual embeddings** from training and test video frames using a pre-trained vision model.
- **Constructing in-context prompts** by retrieving the top-k most similar training annotations (see the sketch after this list).
- **Analyzing deepfake artifacts** in test frames using a language-vision model (e.g., LLaVA).
- **Optionally applying spatial masks** to focus the analysis on specific facial regions.
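The sketch below illustrates the retrieval step under simplifying assumptions: it uses only the final CLIP embedding (the notebooks also support intermediate layers), and the function names `embed_frame` and `build_prompt` are ours, not the repository's.

```python
import clip  # https://github.com/openai/CLIP
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN101", device=device)  # vision backbone used for retrieval

def embed_frame(frame_path: str) -> torch.Tensor:
    """Return a normalized CLIP embedding for a single video frame."""
    image = preprocess(Image.open(frame_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        emb = model.encode_image(image)
    return emb / emb.norm(dim=-1, keepdim=True)

def build_prompt(test_emb, train_embs, train_texts, k=3):
    """Retrieve the top-k most similar training annotations and build an in-context prompt."""
    sims = (train_embs @ test_emb.T).squeeze(1)  # cosine similarity (embeddings are normalized)
    topk = sims.topk(k).indices.tolist()
    examples = "\n".join(f"Example explanation: {train_texts[i]}" for i in topk)
    return examples + "\nDescribe the deepfake artifacts visible in the given frame."
```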
In more detail, the pipeline:
- Works with dataset names as used in the annotation CSV (e.g., "FaceForensics++").
- Locates movie files based on dataset and manipulation type.
- Extracts training and test frames listed in the CSV file.
- Uses a vision model (e.g., CLIP with RN101) with support for different extraction layers ("first", "middle", "last") to compute image embeddings.
- Generates custom prompts from training annotations using multiple prompt templates (different versions influence the level of detail in the response).
- Loads the model and evaluates test frames using the generated prompts.
- Applies optional hard or soft masks around keypoints (a masking sketch follows after this list).
- Compares test frame embeddings with training embeddings using cosine similarity.
- Retrieves the top-k training examples to form a contextual prompt.
- Saves detailed results to CSV files.
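As an illustration of the soft-mask option, here is a minimal NumPy sketch that attenuates the frame away from the annotated click locations with a Gaussian falloff; the exact masking used in the notebooks (hard vs. soft, radius, parsing of `click_locations`) may differ.

```python
import numpy as np

def apply_soft_mask(frame: np.ndarray, keypoints, sigma: float = 40.0) -> np.ndarray:
    """Keep regions near the annotated keypoints and fade out the rest (Gaussian falloff)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=np.float32)
    for x, y in keypoints:  # (x, y) click locations parsed from the CSV
        mask = np.maximum(mask, np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)))
    return (frame.astype(np.float32) * mask[..., None]).astype(np.uint8)
```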
Requirements differ depending on the vision model used; we followed the setup instructions provided by the original authors. For LLaVA, for example, use the following steps:
- Clone the LLaVA repository and navigate to the LLaVA folder:

  ```bash
  git clone https://github.com/haotian-liu/LLaVA.git
  cd LLaVA
  ```

- Install the package in a Conda environment:

  ```bash
  conda create -n llava python=3.10 -y
  conda activate llava
  pip install --upgrade pip  # enable PEP 660 support
  pip install -e .
  ```
For other vision models (e.g., BLIP-2, Phi3-Vision), refer to their official installation instructions.
- **Dataset CSV:** Ensure your dataset CSV (e.g., `dataset_last.csv`) includes columns such as `movie_name`, `dataset`, `manipulation`, `movie_path`, `click_locations`, and `text`. The CSV should be split into training, validation (not used here), and test sets.
- **Video Files:** Place your video files in the `data/` folder. The directory structure should follow the dataset names (e.g., `data/Faceforensics++/`). A sanity-check sketch for this layout follows after this list.
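A quick sanity check of the CSV and folder layout might look like the sketch below. It assumes the column names listed above and paths built as `data/<dataset>/<movie_name>`; the actual notebooks may instead resolve paths via `movie_path` or the manipulation type.

```python
import os
import pandas as pd

df = pd.read_csv("dataset_last.csv")

# Verify that the expected annotation columns are present.
required = {"movie_name", "dataset", "manipulation", "movie_path", "click_locations", "text"}
missing = required - set(df.columns)
assert not missing, f"CSV is missing columns: {missing}"

# Warn about any annotated video that is not present under data/<dataset>/.
for _, row in df.iterrows():
    path = os.path.join("data", row["dataset"], row["movie_name"])
    if not os.path.isfile(path):
        print(f"Missing video: {path}")
```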
The pipeline outputs CSV files containing (an illustrative snippet follows after this list):
- Test video information (file path and frame number).
- Ground truth annotations.
- Contextual prompts from top-k training annotations.
- Cosine similarity scores.
- Deepfake analysis results.
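For illustration only (the column names and values below are hypothetical; each notebook defines its own schema), a results row could be written like this:

```python
import pandas as pd

# Hypothetical column names and placeholder values; the notebooks define their own schema.
rows = [{
    "movie_path": "data/FaceForensics++/example.mp4",  # test video file path
    "frame": 42,                                        # frame number
    "ground_truth": "...",                              # annotation text from the CSV
    "context_prompt": "...",                            # prompt built from top-k training annotations
    "cosine_similarity": 0.87,
    "deepfake_analysis": "...",                         # model output
}]
pd.DataFrame(rows).to_csv("results.csv", index=False)
```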