ThioFinder Tutorial for identification of a thiopeptide gene cluster in a DNA sequence
1. Input: Example I, paste a plain nucleotide sequence (genotype: Type I)
2. Input: Example II, upload a genome supercontig (genotype: Type II)
3. Input: Example III, select an NCBI-recorded genome sequence (genotype: Type III)
4. Input: Example IV, upload a genome sequence and annotation (genotype: Type III)
5. Output
6. Multiple sequence alignment with MUSCLE and Jalview
7. Primer design with Primer3Plus

1. Input: Example I, paste a plain nucleotide sequence (genotype: Type I)
A single nucleotide sequence file is prepared in FASTA format. It begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters. Users are advised to download the *.fna or other required genome files in FASTA format from the NCBI at ftp.ncbi.nih.gov/genome/bacteria or from the relevant genome sequencing centres.
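The FASTA rules described above can be checked before submission with a few lines of Python. This is only a minimal sketch, not part of ThioFinder: the function name `check_fasta` and the demo record are hypothetical, and the 80-character limit is treated here as the recommendation it is, reported but not enforced.

```python
def check_fasta(text, max_line_len=80):
    """Return a list of problems found in a single-record FASTA string."""
    problems = []
    lines = text.splitlines()

    # The description line must start with ">" in the first column.
    if not lines or not lines[0].startswith(">"):
        problems.append("first line must start with '>'")

    # A single sequence is expected, so only one description line.
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) > 1:
        problems.append("more than one description line found")

    # Lines are recommended to be shorter than 80 characters.
    for number, line in enumerate(lines, start=1):
        if len(line) >= max_line_len:
            problems.append(f"line {number} is {len(line)} characters long")

    # There must be some sequence data after the description line.
    if not any(ln.strip() for ln in lines[1:] if not ln.startswith(">")):
        problems.append("no sequence data after the description line")

    return problems


# Hypothetical record used only for illustration.
example = ">demo sequence (hypothetical)\nACGTACGT\n"
print(check_fasta(example))  # prints []
```

An empty list means that none of the checks failed; it does not guarantee that the server will accept the file.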
[Example I] Paste a 35-kb plain nucleotide sequence for nosiheptide biosynthesis (Download)
 
Figure 1. The input single nucleotide sequence prepared in FASTA format.
2. Input: Example II, upload a genome supercontig (genotype: Type II)
[Example II] Upload a 6.6-Mb genome supercontig of Micromonospora sp. ATCC 39149 (Download the gzip-compressed .gz file, 1.8 Mb)

Note: only a single gzip-compressed FASTA file (.gz) is accepted.
 
Figure 2. The uploaded single genome sequence file (NZ_GG657738.fas.gz) prepared in FASTA format and compressed by gzip.
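If a genome file is available only as uncompressed FASTA, it can be gzip-compressed before upload. The sketch below uses Python's standard `gzip` and `shutil` modules; the function name `gzip_fasta` and the file name in the usage comment are hypothetical and serve only as an illustration.

```python
import gzip
import shutil


def gzip_fasta(src_path, dest_path=None):
    """Compress src_path with gzip and return the path of the .gz file."""
    # By default, append ".gz" to the original name (e.g. genome.fas.gz).
    if dest_path is None:
        dest_path = src_path + ".gz"

    # Stream the file in chunks so that large genomes are not loaded
    # into memory all at once.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)

    return dest_path


# Usage (hypothetical file name):
# gzip_fasta("my_supercontig.fas")  # writes my_supercontig.fas.gz
```

The original file is left untouched, so the uncompressed copy remains available for other tools.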
3. Input: Example III, select an NCBI-recorded genome sequence (genotype: Type III)
[Example III] Select the NCBI-recorded genome sequence of Bacillus cereus ATCC 14579

 
Figure 3. Selecting the NCBI-recorded genome sequence of Bacillus cereus ATCC 14579.
4. Input: Example IV, upload a genome sequence and annotation (genotype: Type III)
 
Figure 4. Uploading a complete nucleotide sequence (NC_004722.fna.gz) and an annotation file (NC_004722.ptt) for the Bacillus cereus ATCC 14579 genome.
Genome sequence file: FASTA format and gzip-compressed
A single genome sequence file is prepared in FASTA format. It begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters. Users are advised to download the *.fna or other required genome files in FASTA format from the NCBI at ftp.ncbi.nih.gov/genome/bacteria or from the relevant genome sequencing centres.

Note: only a single gzip-compressed FASTA file (.gz) is accepted.
[Example IV, file 1] The genome sequence file of Bacillus cereus ATCC 14579: NC_004722.fna.gz (1.6 Mb)

Genome annotation file: NCBI PTT format
A tabular list of all protein-coding regions (CDS) in the corresponding genome sequence should be prepared in NCBI PTT format.
The PTT file format is a table of protein features. It is used mainly by the NCBI, which produces PTT files for all of its published genomes at ftp://ftp.ncbi.nih.gov/genomes/.
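To inspect a PTT file before uploading it, the table can be read with Python's standard `csv` module. The sketch below rests on an assumption about the layout that is not stated on this page: a few preamble lines, then a tab-separated header row beginning with "Location", with locations written as "start..end". The function name `parse_ptt` and the sample records are hypothetical.

```python
import csv


def parse_ptt(text):
    """Parse PTT-formatted text into a list of dicts, one per CDS."""
    lines = text.splitlines()

    # Assumption: the column header is the first line starting with
    # "Location"; everything before it is preamble.
    header_index = next(
        i for i, ln in enumerate(lines) if ln.startswith("Location")
    )

    reader = csv.DictReader(lines[header_index:], delimiter="\t")
    records = []
    for row in reader:
        # Assumption: locations are written as "start..end".
        start, end = row["Location"].split("..")
        row["Start"], row["End"] = int(start), int(end)
        records.append(row)
    return records


# Hypothetical two-gene table, for illustration only.
sample = (
    "Hypothetical genome, complete genome - 1..5000\n"
    "2 proteins\n"
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
    "190..255\t+\t21\t-\t-\tDEMO_0001\t-\t-\thypothetical protein\n"
    "400..1200\t-\t266\t-\tdemoA\tDEMO_0002\t-\t-\thypothetical protein\n"
)
for cds in parse_ptt(sample):
    print(cds["Synonym"], cds["Start"], cds["End"], cds["Strand"])
```

If the header row cannot be found, `next` raises `StopIteration`, which is a quick sign that the file does not follow the assumed layout.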

[Example IV, file 2] The CDS annotation file of Bacillus cereus ATCC 14579: NC_004722.ptt
5. Output
 
Figure 5. ThioFinder output for the plain nucleotide sequence input shown as Example I.
6. Multiple sequence alignment with MUSCLE and Jalview
 
Figure 6. Multiple sequence alignment for the predicted proteins of interest.
7. Primer design with Primer3Plus
Primer3Plus can find suitable primer pairs for cloning an entire gene for heterologous expression, or primer pairs for small internal gene fragments, which are useful for screening cosmid libraries by PCR.
 
Figure 7. Primer design with Primer3Plus.