KEY-STRING DIVERGENCE METHOD

Author: Matko Gluncic (matko@phy.hr)
If you use the KEY-STRING DIVERGENCE METHOD as a tool in your published research, we ask that the
following reference be cited: Rosandic, M., Paar, V., Gluncic, M., Basar, I., Pavin, N.
The KSA-divergence method (KSA: key-string algorithm) enables simple visual identification of higher-order repeats (HORs) in a given genomic sequence. The method consists of four steps.

First step. The genomic sequence is segmented into KSA subsequences using the key-string algorithm, which segments the sequence into alpha monomers.

Second step. The KSA subsequences from the first step are transformed, in order of appearance, into modified KSA subsequences according to the following rules:
- If the length of a subsequence is between 168 and 173 bp (inclusive), the subsequence is left unchanged.
- If the length of a subsequence is greater than 173 bp, the first 171 bp are retained and the remaining basepairs are deleted.
- If a subsequence is shorter than 168 bp and the following subsequence, in order of appearance, is longer than 167 bp, the short subsequence is deleted. For example, if a 79-bp subsequence is followed by a 170-bp subsequence, the whole 79-bp subsequence is deleted.
- If a subsequence is shorter than 168 bp and the following subsequence is also shorter than 168 bp, the two are fused into a single subsequence, of which only the first 171 bp are retained. For example, a 79-bp subsequence followed by a 150-bp subsequence is fused into one 229-bp subsequence (the 79 bp of the first followed by the 150 bp of the second), and only the first 171 bp of this fused subsequence are retained.

In this way, the genomic sequence is transformed into an array of subsequences of approximately 171 bp each, referred to as the KSA divergence-array.

Third step. The divergence is computed between every pair of subsequences in the divergence-array.

Fourth step. The divergence is displayed graphically using N divergence diagrams, where N is the number of subsequences in the divergence-array. The diagrams are defined as follows.
In the first diagram, we display the divergence of subsequence 1 with respect to all the other subsequences (2, 3, 4, ...); subsequence 1 is then referred to as the referent subsequence. The horizontal axis displays the index k of a subsequence in the array (2, 3, 4, ...), and the vertical axis displays its divergence with respect to the referent subsequence (1). In the second diagram, we display the divergence of subsequence 2 with respect to all the other subsequences (1, 3, 4, ...), and so on. In these diagrams we visually identify a domain with a periodic pattern. We then select an arbitrary subsequence from this periodic domain as the new referent subsequence and compute the corresponding divergence diagram. This diagram exhibits strongly pronounced periodic minima (divergence below 1%) at every nth position within the periodic domain, where n is the number of monomers in one HOR copy; these minima identify the HORs.
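The first two steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original implementation. The key string used by the key-string algorithm is not given in this document, so `key_string` is a hypothetical parameter, and segmentation is assumed to cut the sequence at the start of every non-overlapping key-string occurrence, discarding the region before the first one. The rules above also do not say what happens to a short final subsequence with no successor, or whether a fused pair is checked again; the sketch drops the former and does not re-examine the latter. The function names are illustrative only.

```python
from typing import List

# Length thresholds (bp) taken from the second step of the method.
MIN_LEN = 168   # shortest subsequence left unchanged
MAX_LEN = 173   # longest subsequence left unchanged
TRIM_LEN = 171  # length retained when a subsequence is trimmed


def key_string_segments(sequence: str, key_string: str) -> List[str]:
    """First step (assumed form): cut the sequence at the start of every
    non-overlapping occurrence of the hypothetical key_string.
    The region before the first occurrence is dropped (assumption)."""
    if not key_string:
        raise ValueError("key_string must be non-empty")
    starts = []
    pos = sequence.find(key_string)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(key_string, pos + len(key_string))
    ends = starts[1:] + [len(sequence)]
    # Each segment runs from one key-string occurrence up to the next one.
    return [sequence[s:e] for s, e in zip(starts, ends)]


def divergence_array(segments: List[str]) -> List[str]:
    """Second step: turn KSA subsequences into ~171-bp subsequences."""
    result = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        has_next = i + 1 < len(segments)
        if MIN_LEN <= len(seg) <= MAX_LEN:
            result.append(seg)                 # 168-173 bp: keep unchanged
            i += 1
        elif len(seg) > MAX_LEN:
            result.append(seg[:TRIM_LEN])      # > 173 bp: keep first 171 bp
            i += 1
        elif has_next and len(segments[i + 1]) >= MIN_LEN:
            i += 1                             # short, next is full length: delete it
        elif has_next:
            fused = seg + segments[i + 1]      # two short ones: fuse, keep first 171 bp
            result.append(fused[:TRIM_LEN])
            i += 2                             # assumption: fused pair is not re-examined
        else:
            i += 1                             # assumption: trailing short segment is dropped
    return result


# Toy input mirroring the examples in the second step.
segments = ["A" * 171, "C" * 79, "G" * 170, "T" * 180, "C" * 79, "G" * 150, "A" * 169]
print([len(s) for s in divergence_array(segments)])  # [171, 170, 171, 171, 169]
```

With these toy lengths, the 79-bp subsequence followed by a 170-bp one is deleted, the 180-bp subsequence is trimmed to 171 bp, and the 79-bp plus 150-bp pair is fused into 229 bp and trimmed to 171 bp, matching the examples given for the second step.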
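The third and fourth steps can be sketched in the same way. This document does not define the divergence measure, so the sketch below assumes one: the Levenshtein edit distance between two subsequences, expressed as a percentage of the longer subsequence. The published method may use a different, alignment-based measure. Instead of the full N x N matrix, the sketch computes the divergences for one referent subsequence, which is exactly the data plotted in one divergence diagram. As a hypothetical automation of the visual inspection, it then lists the positions with divergence below 1% and takes the most common spacing between them as the HOR period n. The helper names (`divergence_diagram`, `periodic_minima`, `hor_period`) are illustrative and not part of the original tool.

```python
from collections import Counter
from typing import List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def divergence(a: str, b: str) -> float:
    """Assumed definition: edit distance as a percentage of the longer subsequence."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 100.0 * edit_distance(a, b) / longest


def divergence_diagram(array: List[str], ref: int) -> List[Tuple[int, float]]:
    """Points (k, divergence) for referent subsequence `ref`; k and ref are 1-based."""
    referent = array[ref - 1]
    return [(k, divergence(referent, s))
            for k, s in enumerate(array, start=1) if k != ref]


def periodic_minima(diagram: List[Tuple[int, float]],
                    threshold: float = 1.0) -> List[int]:
    """Positions k whose divergence from the referent is below `threshold` percent."""
    return [k for k, d in diagram if d < threshold]


def hor_period(minima: List[int]) -> Optional[int]:
    """Hypothetical helper: most common spacing n between consecutive minima."""
    gaps = [b - a for a, b in zip(minima, minima[1:])]
    return Counter(gaps).most_common(1)[0][0] if gaps else None


# Toy divergence-array: a HOR of three distinct monomers repeated four times.
unit = ["A" * 20, "C" * 20, "G" * 20]
array = unit * 4
minima = periodic_minima(divergence_diagram(array, ref=1))
print(minima, hor_period(minima))  # [4, 7, 10] 3
```

In this toy array, the referent (subsequence 1) matches exactly every third subsequence, so the minima fall at k = 4, 7, 10 and the spacing n = 3 equals the number of monomers in one HOR copy. On real data the monomers within a HOR copy are similar rather than completely different, which is why the 1% threshold, and not simply the lowest value, is used to pick out the minima.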