UVA Problem 1368 – DNA Consensus String Solution:
Click here to go to this problem in uva Online Judge.
Solving Technique:
Rank 174, Run time 12 ms ( 0.012 s ).
First input is number of test cases. Next of two inputs one is number of strings to input and another is the size of string. Our strings will only contain A, C, G, T characters.
For solving this problem we first need to find consensus string. That is of all strings we check each column of all strings and count the maximum occurring character in that column. This character is our consensus string character for that column.
Consensus string size is same as the input string size. Since each of its characters are maximum occurring characters in the input string. Example,
TATGATAC TAAGCTAC AAAGATCC TGAGATAC TAAGATGT ======== TAAGATAC /* Consensus String, check each column for max character count */
Now that we have found the consensus string we just need to find consensus error. I will only explain the logic here, I have commented my code below.
TATGATAC TAAGCTAC AAAGATCC TGAGATAC TAAGATGT ======== TAAGATAC /* Consensus String */ ======== 11101021 /* Consensus Error, 1+1+1+0+1+0+2+1 = 7, count every character of a column that don't match that consensus string character */
Now if you understood my explanation time to go write a better solution :).
Important: Be sure to add or print a new line after each output unless otherwise specified. The outputs should match exactly because sometimes even a space character causes the answer to be marked as wrong answer.
Input:
3 5 8 TATGATAC TAAGCTAC AAAGATCC TGAGATAC TAAGATGT 4 10 ACGTACGTAC CCGTACGTAG GCGTACGTAT TCGTACGTAA 6 10 ATGTTACCAT AAGTTACGAT AACAAAGCAA AAGTTACCTT AAGTTACCAA TACTTACCAA
Output:
TAAGATAC 7 ACGTACGTAA 6 AAGTTACCAA 12
Code:
/*
* Author: Quickgrid ( Asif Ahmed )
* Site: https://quickgrid.wordpress.com
* Problem: UVA 1368 ( DNA Consensus String )
*/
#include<stdio.h>
int main(){
register unsigned n, a, b, i, j, k, error;
scanf("%u", &n);
while(n--){
scanf("%u%u", &a, &b);
/**
* Create the string array
*/
char **s = new char*[a+1];
for(i = 0; i < a; ++i){
/**
* Allocate space for each string
*/
s[i] = new char[b+1];
/**
* Input a string in i th position
*/
scanf("%s", s[i]);
}
/**
* Consensus string declaration
*/
char *conseq = new char[b+1];
/**
* K is used below
*/
for(k = i = 0; i < b; ++i){
/**
* Reset A,C,G,T count array
*/
unsigned int seq[4] = {0};
for(j = 0; j < a; ++j){
switch(s[j][i]){
case 65: /* 65, 'A' here seq[0] is A */
++seq[0];
break;
case 67: /* 'C' */
++seq[1];
break;
case 71: /* 'G' */
++seq[2];
break;
case 84: /* 'T' */
++seq[3];
}
}
unsigned max = seq[0], maxIndex = 0;
for(j = 1; j < 4; ++j){
if(seq[j] > max){
/**
* Find the max count of A,C,G,T in a column, in all strings
*/
max = seq[j];
/**
* Remember the maxIndex for creating consensus string
*/
maxIndex = j;
}
}
/**
* We keep adding to our consensus string the most count of A,C,G,T
*/
switch(maxIndex){
case 0:
conseq[k++] = 65;
break;
case 1:
conseq[k++] = 67;
break;
case 2:
conseq[k++] = 71;
break;
case 3:
conseq[k++] = 84;
}
}
for(error = i = 0; i < b; ++i){
for(j = 0; j < a; ++j){
/**
* Calculate the error count by moving column by column of all strings, then comparing against the consensus string character in that column
*/
if(s[j][i] != conseq[i])
++error;
}
}
/**
* Print the consensus string and error count EACH ON A NEW LINE
*/
printf("%s\n%u\n", conseq, error);
}
return 0;
}