The PAT database organizes information on experimentally validated antimicrobial toxins reported in the literature, together with their corresponding immunity proteins, delivery mechanisms, structural characteristics, and sequences. PAT also provides predicted antimicrobial toxins from prokaryotic genomes and shows the taxonomic sources and environmental distribution of typical antimicrobial toxins. Putative antimicrobial toxins are listed under "EXPANSION" to distinguish them from experimentally validated ones.
Sequence information for reported antimicrobial toxin proteins, immunity proteins, and secretion-related markers was collected from the literature. The secretion-related markers comprise trafficking domains, repeat domains, pre-toxins, and conserved motifs. Trafficking domains include, for example, VgrG, PAAR, LXG, DUF4157, Trp-Xaa-Gly (WXG), SpvB, TANFOR, Phage_Mu_F, and FhaB. Repeat domains include Haemagg_act and RHS repeats, among others. Pre-toxins are proteins containing PT-HINT, PT-TG, PT-VENN, or similar domains. Conserved motifs include MIX and FIX. All prokaryotic reference genomes were scanned with RPS-BLAST (E-value threshold 0.01) to identify genes that encode both an antimicrobial toxin domain and a secretion-related marker domain. Based on these predicted antimicrobial toxin genes, the taxonomic sources of typical antimicrobial toxins were counted and visualized.
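The co-occurrence screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used to build PAT: it assumes the RPS-BLAST results were written in BLAST tabular format (-outfmt 6) and that subject identifiers have already been translated to domain names. The two domain sets, the sample hits, and the function name candidate_toxin_genes are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict

E_VALUE_MAX = 0.01  # expected-value threshold stated in the text

# Hypothetical, deliberately incomplete lookups. A real screen would list
# every toxin and marker profile in the RPS-BLAST database.
TOXIN_DOMAINS = {"Ntox15", "Tox-REase-7"}
MARKER_DOMAINS = {"PAAR", "VgrG", "LXG", "WXG", "PT-HINT"}


def candidate_toxin_genes(tabular_hits):
    """Return proteins with a toxin-domain hit AND a marker-domain hit.

    tabular_hits: file-like object in BLAST tabular format (-outfmt 6);
    column 1 is the query protein, column 2 the domain, column 11 the E-value.
    """
    domains_by_protein = defaultdict(set)
    for row in csv.reader(tabular_hits, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        protein, domain, evalue = row[0], row[1], float(row[10])
        if evalue <= E_VALUE_MAX:
            domains_by_protein[protein].add(domain)
    return sorted(
        protein
        for protein, domains in domains_by_protein.items()
        if domains & TOXIN_DOMAINS and domains & MARKER_DOMAINS
    )


# Made-up hits for illustration only.
SAMPLE_HITS = "\n".join(
    "\t".join(fields)
    for fields in [
        # geneA: significant marker and toxin hits, so it is kept
        ("geneA", "PAAR", "90.0", "80", "8", "0", "1", "80", "1", "80", "1e-20", "150"),
        ("geneA", "Ntox15", "85.0", "100", "15", "0", "120", "219", "1", "100", "3e-12", "90"),
        # geneB: marker hit only, so it is dropped
        ("geneB", "VgrG", "70.0", "300", "90", "0", "1", "300", "1", "300", "1e-50", "400"),
        # geneC: the toxin hit fails the E-value threshold, so it is dropped
        ("geneC", "LXG", "60.0", "150", "60", "0", "1", "150", "1", "150", "1e-8", "70"),
        ("geneC", "Tox-REase-7", "30.0", "60", "42", "0", "200", "259", "1", "60", "0.5", "25"),
    ]
)

if __name__ == "__main__":
    print(candidate_toxin_genes(io.StringIO(SAMPLE_HITS)))
```

Collecting the significant domains per protein first, and testing for the toxin and marker categories afterwards, keeps the screen independent of the order in which hits appear in the output file.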
The Earth Microbiome Project (EMP) was founded in 2010 to sample the Earth’s microbial communities at an unprecedented scale and thereby advance understanding of the biogeographic principles that govern microbial community structure. A total of 262,011 OTUs, with their abundances and nucleic acid sequences, were collected from the 10,000 samples released by the EMP; these OTUs had been generated with the Deblur software. For chimera filtering, we relied on the processing already performed by the EMP. The EMP Ontology (EMPO) classifies 17 microbial environments (level 3) as free-living or host-associated (level 1) and as saline or non-saline (if free-living) or animal or plant (if host-associated) (level 2).
The NCBI Reference Sequence (RefSeq) database is a curated, non-redundant collection of sequences representing complete or draft genomes. We obtained all 217,614 bacterial and archaeal genomes in the database. In addition, the taxonomic classification of the OTU sequences was obtained from the NCBI Taxonomy database. The EMP OTUs were aligned to the prokaryotic genomes with BLASTn, and an OTU was assigned to a genome when the identity over the 16S rRNA V4 region exceeded 97%. Based on the 10,000 EMP samples, the occurrence frequency of each toxin family among the EMP samples of a given environment was calculated under this criterion (>97% 16S rRNA identity), yielding the occurrence frequency of each toxin family across global environments. Statistics were compiled at the level of the 17 microbial environments (EMPO_3).
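The frequency calculation can be illustrated with a small Python sketch. The text does not give the exact formula, so the code makes an assumption: the occurrence frequency of a toxin family in an environment is taken as the fraction of that environment's samples containing at least one OTU that maps, at >97% identity, to a genome carrying the family. All identifiers, input structures, and values below are hypothetical.

```python
from collections import defaultdict

IDENTITY_MIN = 97.0  # percent-identity cut-off stated in the text


def otu_to_genomes(blast_hits):
    """Map each OTU to the genomes it matches at > 97% identity.

    blast_hits: iterable of (otu_id, genome_id, percent_identity) tuples,
    e.g. parsed from BLASTn tabular output.
    """
    mapping = defaultdict(set)
    for otu, genome, identity in blast_hits:
        if identity > IDENTITY_MIN:
            mapping[otu].add(genome)
    return mapping


def occurrence_frequency(sample_otus, sample_env, otu_genomes, genome_families):
    """Per environment, the fraction of samples in which each family occurs.

    Assumed definition (not given in the text): a family occurs in a sample
    if any OTU present there maps to a genome that carries the family.
    """
    samples_per_env = defaultdict(int)
    positive = defaultdict(lambda: defaultdict(int))
    for sample, otus in sample_otus.items():
        env = sample_env[sample]
        samples_per_env[env] += 1
        families = set()
        for otu in otus:
            for genome in otu_genomes.get(otu, ()):
                families |= genome_families.get(genome, set())
        for family in families:  # count each sample once per family
            positive[env][family] += 1
    return {
        env: {fam: n / samples_per_env[env] for fam, n in fams.items()}
        for env, fams in positive.items()
    }


# Made-up inputs for illustration only.
hits = [("otu1", "g1", 99.2), ("otu2", "g2", 96.5), ("otu3", "g2", 98.0)]
genome_families = {"g1": {"family_A"}, "g2": {"family_B"}}
sample_otus = {"s1": {"otu1"}, "s2": {"otu2"}, "s3": {"otu1", "otu3"}}
sample_env = {
    "s1": "Soil (non-saline)",
    "s2": "Soil (non-saline)",
    "s3": "Water (saline)",
}

freq = occurrence_frequency(
    sample_otus, sample_env, otu_to_genomes(hits), genome_families
)

if __name__ == "__main__":
    for env, families in sorted(freq.items()):
        print(env, dict(sorted(families.items())))
```

In this toy input, otu2 falls below the identity cut-off and is therefore not linked to any genome, so sample s2 contributes to the sample count of its environment but not to any toxin family.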
Zhang Z, Wang J, Wang J, et al. Estimate of the sequenced proportion of the global prokaryotic genome. Microbiome, 2020, 8: 134.
Zhang Z, Liu Y, Zhang P, et al. PAAR proteins are versatile clips that enrich the antimicrobial weapon arsenals of prokaryotes. mSystems, 2021, 6(6): e00953-21.
Liu Y, Zhang Z, Wang F, et al. Identification of type VI secretion system toxic effectors using adaptors as markers. Computational and Structural Biotechnology Journal, 2020, 18: 3723-3733.
Liu Y, Wang J, Zhang Z, et al. Two PAAR proteins with different C-terminal extended domains have distinct ecological functions in Myxococcus xanthus. Applied and Environmental Microbiology, 2021, 87(9): e00080-21.
Gong Y, Zhang Z, Liu Y, et al. A nuclease‐toxin and immunity system for kin discrimination in Myxococcus xanthus. Environmental Microbiology, 2018, 20(7): 2552-2567.