For all you CS people who wanted to try your hand at bioinformatics, here's a tiny little problem I am working on. I am an inexperienced programmer, but I might know enough to fake my way through. I just need a little help getting started.
I will have about 200 strings, each 23 letters long, and each letter can be A, C, G, or T.
I want to "score" these strings based on certain letters at certain positions.
Example string:
nnnnXnnnnnnXnnXnnXXXXXnn
In this example I want to add a point for the letter A in the 5th position, and add a point for a T in the 11th position. I also want to subtract a point for a G in the 13th position. All of the n's can be any of the four letters.
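To make the scoring rule concrete, here is a minimal Python sketch of what I have in mind. The `RULES` table, the function name `score_sequence`, and the sample sequence are all made up for illustration, and I am assuming positions are counted from 1, as in the description above. It is only a rough starting point, not a finished tool.

```python
# Hypothetical scoring rules taken from the description above:
# (1-based position, letter, points to add or subtract).
RULES = [
    (5, "A", 1),    # +1 for an A in the 5th position
    (11, "T", 1),   # +1 for a T in the 11th position
    (13, "G", -1),  # -1 for a G in the 13th position
]


def score_sequence(seq, rules=RULES):
    """Return the total score of one sequence under the given rules."""
    seq = seq.upper()
    total = 0
    for position, letter, points in rules:
        index = position - 1  # convert the 1-based position to a 0-based index
        # Skip rules that point past the end of a short sequence.
        if index < len(seq) and seq[index] == letter:
            total += points
    return total


if __name__ == "__main__":
    # Made-up sequence: A at position 5, T at position 11, G at position 13.
    example = "CCCC" + "A" + "CCCCC" + "T" + "C" + "G" + "C" * 10
    print(example, score_sequence(example))
```

For about 200 strings, the same function could be called once per line of a text file, and more rules can be added by appending tuples to the table.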
I have a website that will search a longer string and pull these patterns out, but I can't figure out a command for the scoring step. Does anyone know a simple command to do something like this? I don't really have a language preference, but I know Perl is regularly used for DNA sequence analysis.
Thanks.
-MS