The script removes all sequences belonging to reference species from the alignments. Reference sequences are identified by their 1KITE [1] sequence header format, so the script can be used on any alignment whose headers follow this format.
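The filtering step can be thought of as a header test applied to each FASTA record, with only the records that fail the test being kept. The sketch below is not the script's own code. It assumes, hypothetically, that header fields are separated by `|`, that the second field names the taxon, and that the user supplies the set of reference taxa. The real 1KITE rule may differ. The actual script relies on Biopython, but this version parses FASTA with plain Python so that it runs without extra dependencies. All function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def make_reference_test(reference_taxa: Set[str], taxon_field: int = 1) -> Callable[[str], bool]:
    """HYPOTHETICAL rule: a header belongs to a reference species if its
    '|'-separated field at index `taxon_field` is one of `reference_taxa`."""
    def is_reference(header: str) -> bool:
        fields = header.split("|")
        return len(fields) > taxon_field and fields[taxon_field] in reference_taxa
    return is_reference


def drop_references(records: Iterable[Record], is_reference: Callable[[str], bool]) -> List[Record]:
    """Keep only the records whose header is not classified as a reference."""
    return [(h, s) for h, s in records if not is_reference(h)]


alignment = ">OG1|RefTaxon_A|ref1\nMK-L\n>OG1|Taxon_X|seq42\nMKAL\n"
kept = drop_references(parse_fasta(alignment), make_reference_test({"RefTaxon_A"}))
print(kept)  # [('OG1|Taxon_X|seq42', 'MKAL')]
```

Passing the header test in as a function keeps the parsing independent of whichever header convention is actually used.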
- Python 3.6.4 or newer
- Required Python modules: Bio (Biopython), os, shutil (os and shutil are part of the Python standard library)
The script has to be in the same directory as the files from which the reference sequences are to be removed. Input files must have the ending .fas or .fasta. The script automatically creates a subdirectory named no_refseqs/, in which the modified files are saved.
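The file handling described above might look roughly like the following. This is a hedged, stdlib-only sketch, not the script itself. The function names are hypothetical, and the header test is passed in as a parameter because the exact 1KITE rule is not spelled out here. The sketch also assumes, as its own choice, that an existing no_refseqs/ directory is reused rather than deleted, and that the .fas/.fasta check is case-sensitive.

```python
import os
from typing import Callable, List

FASTA_ENDINGS = (".fas", ".fasta")
OUT_DIR = "no_refseqs"


def filter_fasta_text(text: str, is_reference: Callable[[str], bool]) -> str:
    """Return FASTA text without the records whose header is a reference.
    Lines before the first header are dropped."""
    out_lines: List[str] = []
    keep = False
    for line in text.splitlines():
        if line.startswith(">"):
            # Decide once per record, at its header line.
            keep = not is_reference(line[1:].strip())
        if keep:
            out_lines.append(line)
    return "\n".join(out_lines) + ("\n" if out_lines else "")


def process_directory(directory: str, is_reference: Callable[[str], bool]) -> List[str]:
    """Filter every .fas/.fasta file in `directory` into `directory`/no_refseqs/.
    Returns the names of the files written."""
    out_dir = os.path.join(directory, OUT_DIR)
    os.makedirs(out_dir, exist_ok=True)  # assumption: reuse an existing folder
    written: List[str] = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and name.endswith(FASTA_ENDINGS)):
            continue  # skip the output folder, the script itself, other files
        with open(path) as handle:
            filtered = filter_fasta_text(handle.read(), is_reference)
        with open(os.path.join(out_dir, name), "w") as handle:
            handle.write(filtered)
        written.append(name)
    return written
```

Run from the alignment folder, a call such as `process_directory(".", is_reference)` mirrors the README's requirement that the script sit next to its input files, with `is_reference` being whatever header test matches the real 1KITE format.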
[1] Misof, B., Liu, S., Meusemann, K. et al. Phylogenomics resolves the timing and pattern of insect evolution. Science 346, 6210 (2014). https://doi.org/10.1126/science.1257570