Interesting Segment Identification

This page walks through an example of using SEGID.
Start SEGID from the webpage by clicking the "start SEGID" button. A window will pop up.

 Now, input a multiple sequence alignment to get started. Click the 'Input' button, then click the 'Input Alignment' button in the pop-up dialog. Alternatively, choose menu 'input'>'input Alignment'. An input dialog appears. Enter the alignment in the text area. You may
     1. Type directly in the text area (editing works as in common text editors), OR
     2. Copy and paste from an existing file. If you have a file containing the alignment, open it in any
        text editor and copy its contents. For example, open the file with Notepad, choose menu 'edit'>
        'select all', then choose menu 'edit'>'copy'. Then return to SEGID, click in the text area of the
        input dialog to give it focus, and press Ctrl+V to paste the alignment. Under Unix, you can
        instead click the middle mouse button.

Choose 'protein' or 'DNA' according to your data, and choose the appropriate data format.

Finally, click 'submit' to submit the alignment.

SEGID recognizes the FASTA, CLUSTAL, GCG-MSF, and Stockholm alignment formats. An example alignment is provided for each format. To load one into the input text area, click the 'load example' button in the input dialog and choose the corresponding format. For example, the following is a multiple sequence alignment of 9 sequences in Clustal format.

CLUSTAL W (1.81) multiple sequence alignment

CARP            CCAGGACGACTAAATCAAGCCGCCTTTATTGCCTCACGCCCAGGGGTCTTTTACGGACAA
LOACH           CCAGGACGCCTTAACCAAACCGCCTTTATTGCCTCCCGCCCCGGGGTATTCTATGGGCAA
CHICKEN         CCTGGACGACTAAATCAAACCTCCTTCATCACCACTCGACCAGGAGTGTTTTACGGACAA
COW             CCAGGCCGTCTAAACCAAACAACCCTTATATCGTCCCGTCCAGGCTTATATTACGGTCAA
WHALE           CCAGGACGCCTAAACCAAACAACCTTAATATCAACACGACCAGGCCTATTTTATGGACAA
SEAL            CCAGGACGACTAAACCAAACAACCCTAATAACCATACGACCAGGACTGTACTACGGTCAA
MOUSE           CCAGGCCGACTAAATCAAGCAACAGTAACATCAAACCGACCAGGGTTATTCTATGGCCAA
RAT             CCCGGCCGCCTAAACCAAGCTACAGTCACATCAAACCGACCAGGTCTATTCTATGGCCAA
HUMAN           CCCGGACGTCTAAACCAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAA
                ** ** ** ** ** *** *  *  * *   *    ** ** **  * *  ** ** ***

CARP            TGCTCTGAAATTTGTGGAGCTAATCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCT
LOACH           TGCTCAGAAATCTGTGGAGCAAACCACAGCTTTATACCCATCGTAGTAGAAGCGGTCCCA
CHICKEN         TGCTCAGAAATCTGCGGAGCTAACCACAGCTACATACCCATTGTAGTAGAGTCTACCCCC
COW             TGCTCAGAAATTTGCGGGTCAAACCACAGTTTCATACCCATTGTCCTTGAGTTAGTCCCA
WHALE           TGCTCAGAGATCTGCGGCTCAAACCACAGTTTCATACCAATTGTCCTAGAACTAGTACCC
SEAL            TGCTCAGAAATCTGTGGTTCAAACCACAGCTTCATACCTATTGTCCTCGAATTGGTCCCA
MOUSE           TGCTCTGAAATTTGTGGATCTAACCATAGCTTTATGCCCATTGTCCTAGAAATGGTTCCA
RAT             TGCTCTGAAATTTGCGGCTCAAATCACAGCTTCATACCCATTGTACTAGAAATAGTGCCT
HUMAN           TGCTCTGAAATCTGTGGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCC
                ***** ** ** ** **  * ** ** ** *  ** ** ** **  * **       ** 

CARP            CTCGAACACTTCGAAAAC---------------------TGATCCTCATTAATACTAGAA
LOACH           CTATCTCACTTCGAAAAC---------------------TGGTCCACCCTTATACTAAAA
CHICKEN         CTAAAACACTTTGAAGCC---------------------TGATCCTCACTA---------
COW             CTAAAGTACTTTGAAAAA---------------------TGATCTGCGTCAATATTA---
WHALE           CTAGAAGTCTTTGAAAAA---------------------TGATCTGTATCAATACTA---
SEAL            CTATCCCACTTCGAGAAA---------------------TGATCTACCTCAATGCTT---
MOUSE           CTAAAATATTTCGAAAAC---------------------TGATCTGCTTCAATAATT---
RAT             CTAAAATATTTCGAAAAC---------------------TGATCAGCTTCTATAATT---
HUMAN           CTAAAAATCTTTGAAATA---------------------GGGCCCGTATTTACCCTATAG
                **       ** **                          *  *                

CARP            GACGCCTCGCTAGGAAGCTAA
LOACH           GACGCCTCACTAGGAAGCTAA
CHICKEN         ---------CTGTCATCTTAA
COW             ------------------TAA
WHALE           ------------------TAA
SEAL            ------------------TAA
MOUSE           ------------------TAA
RAT             ------------------TAA
HUMAN           ---------------------

To load this example, click the 'load example' button in the input dialog, then choose 'Clustal' in the pop-up dialog.
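
If you want to inspect a Clustal file before pasting it into SEGID, its block structure can be read with a few lines of Python. The sketch below is not part of SEGID and does not describe how SEGID itself parses input. The function name parse_clustal is hypothetical, and the code assumes that the first line is the CLUSTAL header, that each sequence line starts with the sequence name, and that conservation lines start with whitespace, as in the example above.

```python
def parse_clustal(text):
    """Return {name: full_sequence} from Clustal-formatted text.

    Assumes the first line is the CLUSTAL header and that conservation
    lines (the rows of '*') begin with whitespace.
    """
    sequences = {}
    for line in text.splitlines()[1:]:       # skip the header line
        if not line.strip() or line[0].isspace():
            continue                         # blank or conservation line
        parts = line.split()
        if len(parts) < 2:
            continue                         # not a "name block" line
        name, block = parts[0], parts[1]
        # Blocks for the same name are concatenated in file order.
        sequences[name] = sequences.get(name, "") + block
    return sequences


example = "\n".join([
    "CLUSTAL W (1.81) multiple sequence alignment",
    "",
    "CARP            CCAGGACG",
    "LOACH           CCAGGACG",
    "                ********",
    "",
    "CARP            TGC---",
    "LOACH           TGCTAA",
])
for name, seq in parse_clustal(example).items():
    print(name, seq)
```

Every sequence returned this way should have the same length. If the lengths differ, the file is probably not a valid alignment.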

SEGID reads the alignment and calculates a score for every column with the chosen scoring scheme. The default is the SP-score with the IDENTITY matrix. Users can specify the scoring method and matrix via 'set scoring scheme'. The alignment is then displayed, and conserved segments (high-scoring substrings) are identified. Three algorithms for identifying conserved segments are provided.
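
To make the default scoring concrete, here is a minimal Python sketch of a sum-of-pairs (SP) column score under an identity matrix: every pair of identical residues in a column contributes 1, and every other pair contributes 0. The function names are hypothetical, and the treatment of gaps (a gap never matches anything, including another gap) is an assumption. SEGID's own handling of gaps may differ.

```python
from itertools import combinations

GAP = "-"


def sp_identity_score(column):
    """Sum-of-pairs score of one column under an identity matrix.

    Assumption: a pair involving a gap scores 0, even gap against gap.
    """
    score = 0
    for a, b in combinations(column, 2):
        if a != GAP and a == b:
            score += 1
    return score


def column_scores(sequences):
    """Score every column of an alignment given as equal-length strings."""
    return [sp_identity_score(col) for col in zip(*sequences)]


rows = ["CCAGG", "CCAGG", "CCTGG"]
print(column_scores(rows))  # [3, 3, 1, 3, 3]
```

With three sequences, a fully conserved column has three matching pairs and scores 3, while the third column (A, A, T) has only one matching pair and scores 1.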

By default, all maximal-length segments that satisfy lower bounds on average score and length are colored pink in the alignment. Within these segments, columns with particularly poor scores (below the threshold set by the user) are colored light gray, and good columns are colored magenta. Users can switch to another algorithm by clicking the 'choose algorithm' button or choosing menu 'view'>'algorithm'. The exact positions of all these segments are listed at the bottom of the window. Users can locate a segment in the alignment by clicking on its position data.
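
The idea behind the default view can be sketched in Python as follows. This is a simple quadratic-time illustration, not SEGID's actual algorithm. The function name and parameters are hypothetical, and "maximal" is taken here to mean that a qualifying segment is not contained in any other qualifying segment. Segments are reported as half-open ranges (start, end) over 0-based column indices.

```python
def maximal_segments(scores, min_avg, min_len):
    """Find segments with average >= min_avg and length >= min_len
    that are not contained in any other such segment.

    Returns a list of half-open (start, end) column ranges.
    """
    n = len(scores)
    # prefix[i] holds the sum of scores[0:i], so any segment sum is O(1).
    prefix = [0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    result = []
    furthest_end = -1
    for start in range(n):
        # Try the longest segment first; stop at the first that qualifies.
        for end in range(n, start + min_len - 1, -1):
            if prefix[end] - prefix[start] >= min_avg * (end - start):
                # Starts increase, so this segment is contained in an
                # earlier one exactly when it does not extend further right.
                if end > furthest_end:
                    result.append((start, end))
                    furthest_end = end
                break
    return result


scores = [3, 3, 1, 3, 3, 0, 0, 3, 3, 3]
print(maximal_segments(scores, min_avg=2.5, min_len=3))  # [(0, 5), (7, 10)]
```

In this example the first segment includes one weaker column (score 1) because the average over columns 0 to 4 is still 2.6. Such a column is the kind that the default view would color light gray if it falls below the user-set threshold.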

For more information about how to use this software, please refer to the "help" menu or button in SEGID.
