The online tool “T6SS-Comp” (Type VI Secretion System Gene Cluster Comparative Analyses) performs BLASTP- and TBLASTN-based comparative analysis of bacterial genomic contexts. It provides efficient retrieval and visualization of the identified gene clusters using MultiGeneBlast [Medema et al. Mol Biol Evol, 2013].

For convenience, users can submit a sequenced genome or scaffolds/contigs to generate a custom database. In addition, two analysis models have been integrated: syntenic conservation (taking as input an annotated file with the genomic region of interest) and gene re-organization architecture (taking a multiple-protein FASTA file).
Gene cluster detection using T6SS-Comp
Input
(1) You will be asked to input the T6SS locus of interest.
(2) You can upload your sequences in Multi-FASTA format.

(1) Choose your subject genome(s) by accession number or select from "Genome List".
(2) Separate multiple accession numbers with line breaks (one accession per line).

(3) After selecting the subject genome(s), click 'Submit' to submit the accession number(s).
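Because the accession box expects one accession per line, the list can be checked before it is submitted. The sketch below is a hypothetical client-side helper: the name `parse_accessions` and the RefSeq-style pattern (two capital letters, an underscore, digits, optional version) are assumptions for illustration, not validation rules taken from T6SS-Comp.

```python
import re

# RefSeq-style accession, e.g. NC_002655 or NC_002655.2.
# Assumption: this pattern is illustrative, not T6SS-Comp's own check.
ACCESSION_RE = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")


def parse_accessions(text: str) -> list[str]:
    """Split a return-separated accession list, skipping blank lines and duplicates."""
    accessions: list[str] = []
    for line in text.splitlines():
        acc = line.strip()
        if not acc:
            continue  # ignore empty lines between entries
        if not ACCESSION_RE.match(acc):
            raise ValueError(f"not a RefSeq-style accession: {acc!r}")
        if acc not in accessions:
            accessions.append(acc)  # keep first occurrence, preserve order
    return accessions


print(parse_accessions("NC_002695\nNC_017626\n\n"))
```

Rejecting malformed lines early avoids submitting a job that would fail on the server side.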

Set the parameters and click "Run".

Compare
The selected annotated gene cluster, which carries a Type VI secretion system (T6SS), was previously identified in the Escherichia coli O157:H7 EDL933 genome and shows high similarity to the RS218-derived genomic island 1 (RDI-1) characterized in Escherichia coli K1 RS218 [Zhou Y et al., 2012]. It is aligned against the completely sequenced genomes of E. coli 042 and O157:H7 Sakai.
Query genome: E. coli O157:H7 EDL933 (NC_002655)
Subject genomes: E. coli O157:H7 Sakai (NC_002695) and E. coli 042 (NC_017626)
T6SS-Comp uses the H-value, together with identity criteria, to measure the degree of similarity between each CDS examined and the set of comparator genomes, in terms of both the length of the match and the degree of identity (Fukiya et al., 2004). For each query, the H-value is calculated as follows:
$$H = I \times \frac{L_m}{L_q}$$

where $I$ is the level of identity of the region with the highest bit score, expressed as a ratio from 0 to 1, $L_m$ is the length of the highest-scoring matching sequence, and $L_q$ is the query length. Therefore $H \in [0, 1]$.
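The H-value, H = I × L_m / L_q (identity of the highest-bit-score region times the fraction of the query it covers), translates directly into code. The following is a minimal Python sketch, not T6SS-Comp's implementation: the function name, the input checks, and the clamping of L_m to L_q (so that a gapped alignment longer than the query still gives H ≤ 1) are assumptions. L_m and L_q are assumed to be in the same units, e.g. amino acids for BLASTP.

```python
def h_value(identity: float, match_length: int, query_length: int) -> float:
    """H = I * (L_m / L_q) for the highest-bit-score hit of one query CDS.

    identity     -- I, identity of the highest-bit-score region, ratio 0..1
    match_length -- L_m, length of the highest-scoring matching sequence
    query_length -- L_q, length of the query
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a ratio between 0 and 1")
    # Assumption: clamp L_m into [0, L_q] so that H always stays in [0, 1].
    coverage = min(max(match_length, 0), query_length) / query_length
    return identity * coverage


# Example: 70% identity over a match covering 60 of 100 query residues.
print(round(h_value(0.70, 60, 100), 2))  # 0.42
```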
For each annotated CDS, a threshold H-value (default 0.42) is used to classify the CDS as 'conserved' or 'specific'. Taking Escherichia coli O157:H7 EDL933 (RefSeq genome accession: NC_002655) as the query (reference) genome, all annotated CDS were analysed against each of the two subject genomes, E. coli 042 (NC_017626) and E. coli O157:H7 Sakai (NC_002695), in turn. If both H-values for a given E. coli O157:H7 EDL933 CDS are below the default threshold of 0.42, the CDS is considered 'specific' with respect to E. coli 042 and E. coli O157:H7 Sakai.
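The classification rule, 'specific' only when the CDS scores below 0.42 against every subject genome, can be sketched as follows. This is an illustration under stated assumptions: `classify_cds` is a hypothetical name, a subject genome with no BLAST hit is passed as `None` and treated as H = 0, and the H-values in the example calls are made up.

```python
from typing import Mapping, Optional

DEFAULT_THRESHOLD = 0.42  # default H-value cutoff


def classify_cds(h_by_subject: Mapping[str, Optional[float]],
                 threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return 'specific' if H < threshold against every subject genome,
    otherwise 'conserved'.

    Assumption: a subject with no hit is given as None and counted as H = 0.
    """
    if not h_by_subject:
        raise ValueError("need at least one subject genome")
    scores = [0.0 if h is None else h for h in h_by_subject.values()]
    return "specific" if all(h < threshold for h in scores) else "conserved"


# Made-up H-values for one query CDS against the two subject genomes.
print(classify_cds({"NC_017626": 0.10, "NC_002695": None}))  # specific
print(classify_cds({"NC_017626": 0.10, "NC_002695": 0.95}))  # conserved
```

A single subject genome with H ≥ 0.42 is enough to mark the CDS as conserved, which matches the "all H-values below the threshold" condition for 'specific'.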
Investigate genomic mosaicism and examine variable regions of Escherichia coli T6SS with T6SS-Comp
Figure 1 Synteny conservation: genome map of Escherichia coli O157:H7 EDL933 (NC_002655, positions 238349 to 271907, as input) with CDS color-coded according to the comparator E. coli genomes identified as harboring a sequence-conserved homologue; CDS shown in the matching colour are conserved in the two other fully sequenced E. coli genomes, E. coli O157:H7 Sakai (NC_002695) and 042 (NC_017626). E. coli O157:H7 EDL933-specific CDS were identified using an H-value cutoff of 0.42 (H < 0.42). The H-value reflects the degree of similarity, in terms of both the length of the match and the degree of identity, between the matching query genome sequence and the CDS examined.
Figure 2 Gene re-organization: a T6SS map of Escherichia coli 042 (multiple protein FASTA sequences as input) with CDS color-coded according to the comparator genomes identified as harboring a sequence-conserved homologue; CDS shown in the matching colour are conserved across the selected fully sequenced Salmonella enterica, Vibrio cholerae, Pseudomonas aeruginosa and Escherichia coli genomes.
CDSeasy, fast gene prediction and functional annotation for bacterial genomes, e.g. from 454 or Solexa de novo assemblies.
To prepare the assembled contig/scaffold sequences of a partially sequenced bacterial genome for T6SS-Comp:
Step 1 Prepare a plain-text file containing the assembled contig/scaffold nucleotide sequences in Multi-FASTA format, e.g. mysequence.fas (3.9 Mb).
Step 2 Use CDSeasy to annotate your sequences.
  Upload your file, mysequence.fas, to CDSeasy to generate a GenBank file, e.g. mysequence_.gbk;
  It takes ~10 minutes for CDSeasy to annotate the 5.3-Mb chromosomal sequence of K. pneumoniae strain HS11286.
Step 3 Upload the annotated sequences as the T6SS-Comp reference sequence.
  Select 'Upload sequence' and then click the radio button " or Upload a GenBank file containing the nucleotide sequence and annotation";
  Upload the CDSeasy output file, mysequence_.gbk.

For partially sequenced bacterial genomes, CDSeasy first generates a 'virtual complete genome' ('pseudochromosome') by concatenating the contig sequences without regard to contig order, and provides both contig-specific gene coordinates and the corresponding pseudochromosome data. CDSeasy outputs include the sequence and annotation files in commonly used formats, such as GenBank and NCBI PTT.
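The pseudochromosome idea (join contigs in input order and keep a table that maps positions back to contigs) can be sketched in a few lines. This is not CDSeasy's code: whether CDSeasy places a spacer between contigs is not stated here, so `linker` is a hypothetical parameter (empty by default), and all function names are illustrative.

```python
from bisect import bisect_right


def read_multifasta(text: str) -> list[tuple[str, str]]:
    """Parse Multi-FASTA text into (identifier, sequence) pairs, keeping input order."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_pseudochromosome(contigs, linker=""):
    """Join contigs in input order (no reordering) into one sequence.

    Returns the joined sequence and a list of (1-based start, contig name, length).
    `linker` is a hypothetical spacer inserted between contigs.
    """
    parts, starts, pos = [], [], 1
    for i, (name, seq) in enumerate(contigs):
        if i > 0 and linker:
            parts.append(linker)
            pos += len(linker)
        starts.append((pos, name, len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), starts


def to_contig_coord(starts, pseudo_pos):
    """Map a 1-based pseudochromosome position to (contig name, 1-based position)."""
    idx = bisect_right([s for s, _, _ in starts], pseudo_pos) - 1
    if idx < 0:
        raise ValueError("position precedes the first contig")
    start, name, length = starts[idx]
    offset = pseudo_pos - start + 1
    if offset > length:
        raise ValueError("position falls inside a linker, not a contig")
    return name, offset


fasta = ">ctg1 first contig\nACGTACGT\n>ctg2\nGGGCCC\n"
seq, starts = build_pseudochromosome(read_multifasta(fasta), linker="NNNNN")
print(len(seq))                     # 19
print(to_contig_coord(starts, 15))  # ('ctg2', 2)
```

Keeping the start table alongside the joined sequence is what allows a gene predicted on the pseudochromosome to be reported in contig-specific coordinates as well.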

(1) Ou, H.Y., He, X., Harrison, E.M., Kulasekara, B.R., Thani, A.B., Kadioglu, A., Lory, S., Hinton, J.C., Barer, M.R., Deng, Z. and Rajakumar, K. (2007) MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands. Nucleic Acids Res, 35, W97-W104.

(2) Medema, M.H., Takano, E. and Breitling, R. (2013) Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol Biol Evol, 30, 1218-1223.
Feel free to send comments or questions about T6SS-Comp to Hong-Yu Ou at
Laboratory of Molecular Microbiology
School of Life Sciences & Biotechnology
Shanghai Jiao Tong University
1954 Huashan Road
Shanghai 200030 P.R. China
Tel: +86 21 62933765
Fax: +86 21 62932418
Useful links
Bioinformatics tools/databases used by SecReT6 and T6SS-Comp
  • NCBI BLAST, NCBI Basic Local Alignment Search Tool
  • MUSCLE, protein multiple sequence alignment
  • Jalview, a multiple alignment editor written in Java
  • Primer3Plus, pick primers from a DNA sequence
  • CGView, generate circular genome maps

  • ACLAME, A Classification of Genetic Mobile Elements
  • VFDB, Virulence Factor database
  • DEG, Database of essential genes
  • DrugBank, a knowledgebase for drugs, drug actions and drug targets
  • TTD, Therapeutic Target Database
  • ARDB, Antibiotic Resistance Genes Database
  • BacMet, Antibacterial biocide and metal resistance genes database
Other tools or servers of interest
  • WebACT, Database of sequence comparisons between prokaryotic genome sequences
  • Mauve, Multiple Genome Alignment
  • MUMmer, Ultra-fast alignment of large-scale DNA and protein sequences
  • xBASE, Database for comparative bacterial genomics
  • xBASE Annotation Service, quick annotation for unfinished bacterial genome sequences where a similar reference sequence is available
  • CGView, a comparative genomics tool for circular genomes

  • MobilomeFINDER, in silico and experimental identification of bacterial genomic islands
  • IslandViewer, a computational tool that integrates three different genomic island prediction methods
  • SIGI-HMM, Prediction of Genomic Islands in Prokaryotic Genomes Using Hidden Markov Model

  • IslandPath, An aid to the characterization of genomic islands
  • Islander, Database of Genomic Islands
  • GIST, Genomic island suite of tools for predicting genomic islands in genomic sequences
  • INTEGRALL, Collection of integron sequences and genetic arrangements
  • HGT-DB, Horizontal Gene Transfer Database
  • PAIDB, Pathogenicity island database

  • IS Finder, Reference centre for bacterial insertion sequences

Permitted Sequence File Formats
A single genome sequence file: FASTA
A single genome sequence file is prepared in FASTA format. It begins with a single-line description, followed by lines of sequence data. The description line must begin with a ">" symbol in the first column. It is recommended that all lines of text be shorter than 80 characters. Users are advised to download the *.fna or other required genome files in FASTA format from the NCBI at or from the relevant genome sequencing centres.
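The FASTA rules above (a single description line starting with ">", then sequence lines, ideally under 80 characters) are easy to check before upload. This is a minimal sketch with a hypothetical function name; treating the 80-character limit as a warning rather than an error follows the "recommended" wording above, and rejecting a second ">" record is an assumption based on the "single genome" requirement.

```python
def check_single_fasta(text: str, max_width: int = 80) -> list[str]:
    """Check single-genome FASTA text against the rules described above.

    Raises ValueError for hard errors; returns warnings for lines that are
    not shorter than `max_width` characters (a recommendation, not a rule).
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must be a description beginning with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("more than one '>' record; expected a single genome")
    if not any(line.strip() for line in lines[1:]):
        raise ValueError("no sequence data after the description line")
    return [f"line {n} has {len(line)} characters"
            for n, line in enumerate(lines, start=1) if len(line) >= max_width]


print(check_single_fasta(">demo genome\n" + "ACGT" * 30 + "\n"))
# ['line 2 has 120 characters']
```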
[Example] The genome sequence file of Escherichia coli O157:H7 EDL933: NC_002655.fna

CDS annotation file: NCBI PTT format
Tabular list of all protein-coding regions (CDS) in the corresponding genome sequence should be prepared in the NCBI PTT format.
The PTT file format is a tabular document of genomic protein features which are found in It has the following information:
Line 1 (optional)
Description of sequence to which the features belong
e.g. "Escherichia coli O157:H7 EDL933, complete genome"
It is usually equivalent to the DEFINITION line of a GenBank file,
with the length of the sequence appended.
Line 2 (optional)
Number of feature lines in the table
e.g. " 5312 proteins"
Line 3 (*required)
Column headers, tab separated
e.g. "Location Strand Length PID Gene Synonym Code COG Product"

[Example] the CDS annotation file of Escherichia coli O157:H7 EDL933: NC_002655.ptt
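Given the layout above (two optional description lines, a required tab-separated header starting with "Location", then one tab-separated row per CDS with Location written as "start..end"), a PTT table can be read with the standard library. The sketch below is illustrative: the record type, the function names and the region filter are assumptions, and the two-row table is made-up data, not the real EDL933 annotation. The filter uses the 238349 to 271907 region from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class PttRecord:
    start: int      # from the Location column, "start..end"
    end: int
    strand: str
    length: int     # protein length as given in the table
    pid: str
    gene: str
    synonym: str
    code: str
    cog: str
    product: str


def parse_ptt(text: str) -> list[PttRecord]:
    """Parse an NCBI PTT table; the two optional leading lines may be absent."""
    lines = text.splitlines()
    # Find the required tab-separated column-header line (Line 3 above).
    header_idx = next((i for i, line in enumerate(lines)
                       if line.split("\t")[0] == "Location"), None)
    if header_idx is None:
        raise ValueError("PTT column header line not found")
    records = []
    for line in lines[header_idx + 1:]:
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        location, strand, length, pid, gene, synonym, code, cog, product = fields
        start, end = (int(x) for x in location.split(".."))
        records.append(PttRecord(start, end, strand, int(length), pid, gene,
                                 synonym, code, cog, product))
    return records


def cds_in_region(records, region_start, region_end):
    """Keep CDS lying entirely within [region_start, region_end] (1-based, inclusive)."""
    return [r for r in records if r.start >= region_start and r.end <= region_end]


# Made-up two-row table in PTT layout (not real EDL933 data).
ptt = (
    "Demo genome, complete genome - 1..300000\n"
    "2 proteins\n"
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
    "240001..241500\t+\t499\t-\tgeneA\tDEMO_0001\t-\t-\thypothetical protein\n"
    "280001..280900\t-\t299\t-\tgeneB\tDEMO_0002\t-\t-\thypothetical protein\n"
)
hits = cds_in_region(parse_ptt(ptt), 238349, 271907)
print([r.synonym for r in hits])  # ['DEMO_0001']
```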
NCBI GenBank format
A GenBank file consists of description lines followed by sequence data. The header of the file describes the sequence, including its type, shape, length and source. The features of the genome sequence follow the header and include protein translations. The DNA sequence is the last element of the file, which must end with a double slash (//).
[Example] the GenBank file of Escherichia coli O157:H7 EDL933: NC_002655.gbk
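The structure described above (header, features, then the sequence under ORIGIN, terminated by "//") means the nucleotide sequence of a single-record GenBank file can be recovered with a short scan. This is a minimal stdlib sketch with a hypothetical function name and a toy record; it ignores the header and features entirely, and for real files a full parser such as Biopython's `SeqIO` is the safer choice.

```python
import re


def genbank_sequence(text: str) -> str:
    """Return the nucleotide sequence stored between ORIGIN and the closing '//'."""
    lines = text.splitlines()
    if not any(line.strip() == "//" for line in lines):
        raise ValueError("GenBank record must end with '//'")
    in_origin, chunks = False, []
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.strip() == "//":
            break  # end of the (first) record
        if in_origin:
            # Sequence lines look like "        1 agcttttcat ...": drop the
            # position number and the spaces, keep only the bases.
            chunks.append(re.sub(r"[\d\s]", "", line))
    if not in_origin:
        raise ValueError("no ORIGIN section found")
    return "".join(chunks).upper()


# Toy record for illustration only.
record = """LOCUS       DEMO  20 bp    DNA     linear   BCT 01-JAN-2000
FEATURES             Location/Qualifiers
ORIGIN
        1 agcttttcat tctgactgca
//
"""
print(genbank_sequence(record))  # AGCTTTTCATTCTGACTGCA
```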
Last updated on 18 November 2013.