KEY-STRING DIVERGENCE METHOD

 
     Author: Matko Gluncic (matko@phy.hr)
 
 

If you use the KEY-STRING DIVERGENCE METHOD as a tool in your published research, we ask that you cite the following reference:

   Rosandic, M., Paar, V., Gluncic, M., Basar, I., Pavin, N.
   Key-string Algorithm - Novel Approach to Computational Analysis of Repetitive Sequences in Human Centromeric DNA.
   Croatian Medical Journal 44(4):386-406, 2003.

      The KSA-divergence method enables simple visual identification of higher-order repeats (HORs) in a given genomic sequence. The method consists of the following four steps.

      First step. The key-string algorithm segments a given genomic sequence into alpha monomers; the resulting segments are referred to as KSA subsequences.
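The segmentation in the first step can be sketched as follows, assuming that the key-string algorithm cuts the sequence immediately before each occurrence of a short key string. The function name `key_string_segments` and the choice to skip overlapping matches are illustrative assumptions, and no particular key string is implied here; the key string actually used is given in the cited paper.

```python
def key_string_segments(sequence: str, key_string: str) -> list[str]:
    """Hypothetical sketch: cut `sequence` right before each key-string occurrence.

    Assumptions: matching is case-insensitive, overlapping matches are skipped,
    and any text before the first occurrence becomes the first segment.
    """
    if not key_string:
        raise ValueError("key_string must be non-empty")
    seq = sequence.upper()
    key = key_string.upper()
    cuts = []
    pos = seq.find(key)
    while pos != -1:
        if pos > 0:  # a match at position 0 does not create an empty segment
            cuts.append(pos)
        pos = seq.find(key, pos + len(key))  # continue after this match
    # Segment boundaries: sequence start, every cut position, sequence end.
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, `key_string_segments("TTGACCGATT", "GA")` yields three segments, each new segment starting with the key string.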

      Second step. The KSA subsequences from the first step are transformed, in order of appearance, into modified KSA subsequences as follows:

- If the length of a subsequence is between 168 and 173 bp (inclusive), the subsequence is left unchanged.
- If a subsequence is longer than 173 bp, its first 171 basepairs are retained and the remaining basepairs are deleted.
- If a subsequence is shorter than 168 bp and the following subsequence, in order of appearance, is longer than 167 bp, the short subsequence is deleted. For example, if a 79-bp subsequence is followed by a 170-bp subsequence, the whole 79-bp subsequence is deleted.
- If a subsequence is shorter than 168 bp and the following subsequence is also shorter than 168 bp, the two are fused into a single subsequence, of which only the first 171 basepairs are retained. For example, a 79-bp subsequence followed by a 150-bp subsequence is fused into one subsequence of 79 bp + 150 bp = 229 bp (the 79 basepairs of the first subsequence followed by the 150 basepairs of the second). Of this fused 229-bp subsequence only the first 171 basepairs are retained, while the rest is deleted.
In this way, a given genomic sequence is transformed into an array of subsequences of approximately 171 bp, referred to as the KSA divergence-array.
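The four rules of the second step can be expressed as a single left-to-right pass over the KSA subsequences. This is a minimal sketch; the function name `build_divergence_array` is hypothetical, and two cases the rules do not spell out are handled by assumption: a fused pair is emitted as-is (it is not fused again with the subsequence that follows it), and a short final subsequence with no successor is dropped.

```python
MIN_LEN = 168   # subsequences of 168..173 bp are kept unchanged
MAX_LEN = 173
KEEP_LEN = 171  # longer or fused subsequences are truncated to their first 171 bp


def build_divergence_array(subsequences: list[str]) -> list[str]:
    """Hypothetical sketch of the second-step transformation rules."""
    result = []
    i = 0
    n = len(subsequences)
    while i < n:
        current = subsequences[i]
        if MIN_LEN <= len(current) <= MAX_LEN:
            result.append(current)             # rule 1: leave unchanged
            i += 1
        elif len(current) > MAX_LEN:
            result.append(current[:KEEP_LEN])  # rule 2: keep first 171 bp
            i += 1
        elif i + 1 < n and len(subsequences[i + 1]) >= MIN_LEN:
            i += 1                             # rule 3: delete the short subsequence
        elif i + 1 < n:
            fused = current + subsequences[i + 1]
            result.append(fused[:KEEP_LEN])    # rule 4: fuse two short ones, keep 171 bp
            i += 2                             # assumption: both parts are consumed
        else:
            i += 1                             # assumption: trailing short one is dropped
    return result
```

With the examples from the rules, a 79-bp subsequence followed by a 170-bp one yields only the 170-bp subsequence, while a 79-bp subsequence followed by a 150-bp one yields a single 171-bp subsequence.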

      Third step. The divergence is computed between every pair of subsequences in the divergence-array.
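The text does not state how the divergence between two subsequences is measured. The sketch below assumes a simple position-by-position mismatch percentage, with any length difference counted as mismatches and the total divided by the longer length; an alignment-based measure could be substituted. The names `divergence` and `divergence_matrix` are hypothetical.

```python
def divergence(a: str, b: str) -> float:
    """Assumed measure: percent of mismatched positions (length difference counts as mismatches)."""
    if not a and not b:
        return 0.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)  # zip stops at the shorter string
    mismatches += abs(len(a) - len(b))
    return 100.0 * mismatches / max(len(a), len(b))


def divergence_matrix(array: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise divergences (in %) for a divergence-array."""
    n = len(array)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = divergence(array[i], array[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Because the measure is symmetric, only the upper triangle is computed and mirrored.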

      Fourth step. The divergence is displayed graphically in N divergence diagrams, where N is the number of subsequences in the divergence-array. The first diagram shows the divergence of subsequence 1 with respect to all other subsequences (2, 3, 4, ...); in that case subsequence 1 is referred to as the referent subsequence. The horizontal axis displays the index k of a subsequence in the array (2, 3, 4, ...), and the vertical axis displays its divergence with respect to the referent subsequence (1). The second diagram shows the divergence of subsequence 2 with respect to all other subsequences (1, 3, 4, ...), and so on. In these diagrams we visually identify a domain with a periodic pattern. We then arbitrarily select one subsequence from this periodic domain as a new referent subsequence and compute the corresponding divergence diagram. This diagram exhibits strongly pronounced periodic minima (divergence below 1%) at every nth position in the periodic domain; these minima identify the HORs, whose repeat unit spans n subsequences (monomers).

KSA Web site designed by Matko Gluncic.
Please send any requests or problem reports to the web site manager (matko@phy.hr).