

Hi:
I have the following table and I am interested in calculating the mismatch ratio. I am not completely clear on how to do this, and any help is deeply appreciated.

Length  Matches
77      24A0T9T36
71      25^T9^T37
60      25^T9^T26
62      42A19

The Length column holds the length of the character string. The Matches column describes how the string matches my reference string.

In the first case, where the length is 77, reading Matches from left to right: the first 24 characters matched my reference string, followed by an extra character A, a null (the 0, which does not count as a problem), an extra T, 9 matches, an extra T, and 36 matches. In total there are 3 mismatches.

In case 2, I lost 2 characters (^ = loss of a character compared to the reference sentence):

TOMISAGOODBOY
T^MISAGOOD^OY (here I lost 2 characters) = I have 2 mismatches
TOMISAGOOODBOOY (here I have 2 extra characters, O and O) = I have 2 mismatches

In case 4, I have 42 matches, an extra A, and 19 matches, so I have 1 mismatch.

How can I get that mismatch number from the Matches string?
1. I have to count how many A, T, G, or C characters there are (believe me, only these 4 letters will appear; I will not see Z or B or K, etc.).
2. ^T, ^A, ^G, or ^C will also be a mismatch.

Desired output:

Length  Matches     Mismatches
77      24A0T9T36   3
71      25^T9^T37   2
60      25^T9^T26   2
62      42A19       1
10      6^TTT1      3

Thanks,
Hs.
Hi,
I do not completely follow you, but perhaps you could check out this page: http://code.activestate.com/recipes/576869longestcommonsubsequenceproblemsolver/
Another source of inspiration could be the Levenshtein distance.
Regards,
AlbertJan
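
For readers who do have both strings at hand (the reference and the observed text), the Levenshtein distance mentioned above can be sketched roughly as follows. This is only an illustration under that assumption; the function name `levenshtein` is made up here, and the original question starts from an already-encoded Matches string rather than from the two raw strings.

```python
def levenshtein(reference, observed):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions that turn reference into observed."""
    # previous[j] is the distance between the first i-1 characters of
    # reference and the first j characters of observed.
    previous = list(range(len(observed) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # turning i reference characters into an empty string
        for j, obs_char in enumerate(observed, start=1):
            cost = 0 if ref_char == obs_char else 1
            current.append(min(
                previous[j] + 1,         # drop ref_char
                current[j - 1] + 1,      # add obs_char
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


# The two example sentences from the question.
print(levenshtein("TOMISAGOODBOY", "TMISAGOODOY"))
print(levenshtein("TOMISAGOODBOY", "TOMISAGOOODBOOY"))
```

Both example sentences from the question differ from the reference by two single-character edits, which agrees with the "2 mismatches" counted by hand there.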
Bob Gailer
9196364239
Chapel Hill NC
On Fri, Mar 2, 2012 at 2:11 PM, Hs Hs < [hidden email]> wrote:
> Hi:
> I have the following table and I am interested in calculating mismatch
> ratio. I am not completely clear how to do this and any help is deeply
> appreciated.
>
> Length Matches
> 77 24A0T9T36
> 71 25^T9^T37
> 60 25^T9^T26
> 62 42A19
>
>
> In length column I have length of the character string.
> In the second column I have the matches my reference string.
>
>
> In fist case, where 77 is length, in matches from left to right, first 24
> matched my reference string following by a extra character A, a null (does
> not account to proble) and extra T, 9 matches, extra T and 36 matches.
> Totally there are 3 mismatches
>
> In case 2, I lost 2 characters (^ = loss of character compared to reference
> sentence) 
>
> TOMISAGOODBOY
> T^MISAGOOD^OY (here I lost 2 characters) = I have 2 mismatches
> TOMISAGOOODBOOY (here I have 2 extra characters O and O) = I have two
> mismatches
>
>
> In case 4: I have 42 matches, extra A and 19 matches = so I have 1 mismatch
>
>
> How can that mismatch number from matches string.
> 1. I have to count how many A or T or G or C (believe me only these 4
> letters will appear in this, i will not see Z or B or K etc)
> 2. ^T or ^A or ^G or ^C will also be a mismatch
>
>
> desired output:
>
> Length Matches mismatches
> 77 24A0T9T36 3
> 71 25^T9^T37 2
> 60 25^T9^T26 2
> 62 42A19 1
> 10 6^TTT1 3
>
It looks like all you need to do is count the number of A, T, C, and G
characters in your Matches column. Maybe something like this:
differences = [
    [77, '24A0T9T36'],
    [71, '25^T9^T37'],
    [60, '25^T9^T26'],
    [62, '42A19']
]

for length, matches in differences:
    # Each A, T, G, or C in the Matches string counts as one mismatch.
    mismatches = 0
    for char in matches:
        if char in ('A', 'T', 'G', 'C'):
            mismatches += 1
    print(length, matches, mismatches)
which produces the following output:
77 24A0T9T36 3
71 25^T9^T37 2
60 25^T9^T26 2
62 42A19 1
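
The same count can also be written with a regular expression, and the question asks for a mismatch ratio as well. The sketch below assumes that "mismatch ratio" means the number of mismatches divided by the Length column; the thread never defines it, so treat that as a guess. The helper name `count_mismatches` is made up for this example.

```python
import re


def count_mismatches(matches):
    """Count every A, C, G, or T in a Matches string, including the
    letters that follow a '^' deletion marker."""
    return len(re.findall(r"[ACGT]", matches))


rows = [
    (77, "24A0T9T36"),
    (71, "25^T9^T37"),
    (60, "25^T9^T26"),
    (62, "42A19"),
    (10, "6^TTT1"),
]

for length, matches in rows:
    mismatches = count_mismatches(matches)
    # Assumption: ratio = mismatches / Length (not defined in the question).
    ratio = mismatches / length
    print(f"{length}\t{matches}\t{mismatches}\t{ratio:.3f}")
```

Because every letter is counted, `6^TTT1` yields 3, matching the last row of the desired output.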

Jerry
On 02/03/12 19:11, Hs Hs wrote:
> 1. I have to count how many A or T or G or C (believe me only these 4
> letters will appear in this, i will not see Z or B or K etc)
This suggests to me that it's related to chromosome analysis or somesuch?
There are some Python libraries for biochemistry work.
Maybe you should Google for that and see if there is something
already out there that can do what you want?
Your explanation doesn't really make sense to me outside that context
and, since I'm not a biologist, it doesn't mean that much in that
context either!

Alan G
Author of the Learn to Program web site
On 3/2/2012 11:11 AM Hs Hs said...
> Hi:
> I have the following table and I am interested in calculating mismatch
> ratio. I am not completely clear how to do this and any help is deeply
> appreciated.
>
...and then there's always the standard library:
Help on class SequenceMatcher in module difflib:
class SequenceMatcher
 SequenceMatcher is a flexible class for comparing pairs of sequences of
 any type, so long as the sequence elements are hashable.
Emile
