Cyber Threat Intelligence Dataset

This repository hosts the dataset used for model training and evaluation in our CTI research:

Threat Behavior Textual Search by Attention Graph Isomorphism (Bae et al., EACL 2024)

The repository consists of a pretraining dataset, threat reports grouped by APT group, and a collector tool (used to gather everything in this collection, and needed to add reports published after our work).

Large-scale Pretraining, Threat Reports Corpus Dataset

  • Textual corpus of threat reports

  • Collected from 8 vendors

Threat Reports, Classified by APT Groups

  • A collection of threat reports by APT groups

  • Our evaluation set is a well-filtered, manually verified set.

  • We also provide the lists copied from two public websites (Malpedia, ThaiCERT).
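
As a rough illustration of working with reports grouped by APT group, the sketch below builds an index from group name to report files. The directory layout (`<root>/<group_name>/<report>.txt`) and the function name `index_reports_by_group` are assumptions made for this example only; they are not taken from this repository, so adjust the glob pattern to the actual layout.

```python
from collections import defaultdict
from pathlib import Path


def index_reports_by_group(root):
    """Map each group directory name to the report files inside it.

    ASSUMPTION: reports are stored as <root>/<group_name>/<report>.txt.
    This layout is hypothetical and may differ from the real dataset.
    """
    index = defaultdict(list)
    # Sort so the order of reports within each group is deterministic.
    for path in sorted(Path(root).glob("*/*.txt")):
        index[path.parent.name].append(path)
    return dict(index)
```

Calling `index_reports_by_group("path/to/reports")` returns a plain dictionary, so the number of reports for a group is `len(index[group_name])`.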

Threat Report Collector

  • Our dataset is current as of 2022.06. We will release our collector as a tool (work in progress; it will be uploaded soon).

MISC

  • Copyright of all data in this dataset belongs to the original authors or their vendors.

  • Any misuse of attack information is strictly prohibited.

  • Please contact us (Chanwoo Bae, bae68@purdue.edu) for any questions.

  • We kindly request that you cite our paper if you use the dataset.