Hamming Time

So in my previous post I mentioned Exercism. This website consists of a series of coding challenges in several different coding languages. My chosen track is Ruby, of course. Again, the list of exercises in this track is at GitHub.

I have worked through several of the exercises thus far. One whose very premise made me happy was the Hamming exercise… put in two lengths of DNA and calculate their Hamming distance. In one of my previous incarnations, I worked in a lab tossing plasmids into sweet innocent bacterial strains… slave labour to manufacture proteins. Anyway, I knew exactly what this exercise was trying to measure.

Following is the README, or basic instructions of the exercise:

Write a program that can calculate the Hamming difference between two DNA strands.

A mutation is simply a mistake that occurs during the creation or copying of a nucleic acid, in particular DNA. Because nucleic acids are vital to cellular functions, mutations tend to cause a ripple effect throughout the cell. Although mutations are technically mistakes, a very rare mutation may equip the cell with a beneficial attribute. In fact, the macro effects of evolution are attributable by the accumulated result of beneficial microscopic mutations over many generations.

The simplest and most common type of nucleic acid mutation is a point mutation, which replaces one base with another at a single nucleotide.

By counting the number of differences between two homologous DNA strands taken from different genomes with a common ancestor, we get a measure of the minimum number of point mutations that could have occurred on the evolutionary path between the two strands.

This is called the ‘Hamming distance’

My first iteration of this exercise is as follows:


class Hamming
  def self.compute(dna_strand_1, dna_strand_2)
    dna_strand_1_array = dna_strand_1.split(//)
    dna_strand_2_array = dna_strand_2.split(//)

    length_shortest_strand = dna_strand_1.length
    length_shortest_strand = dna_strand_2.length if dna_strand_2.length < dna_strand_1.length

    i=0
    hamming_distance = 0

    while i < length_shortest_strand
      if dna_strand_1_array[i] != dna_strand_2_array[i]
        hamming_distance = hamming_distance+1
      end
      i+=1
    end

    hamming_distance
  end
end

In my second iteration, I realized that there is no need to split the DNA strand into an array due to the each_char method. In addition, iterating to the end of strand 1 and ensuring that we were comparing to an existing base pair in strand 2, the length_shortest_strand variable could be eliminated.


class Hamming
  def self.compute(dna_strand_1, dna_strand_2)
    i = 0
    hamming_distance = 0
    dna_strand_1.each_char do |x|
      hamming_distance += 1 if dna_strand_2.byteslice(i) != x && dna_strand_2.byteslice(i) != nil
      i += 1
    end

    hamming_distance
  end
end

My last iteration utilized the with_index method to eliminate the iterator variable. I also assigned the individual base pairs that were being compared to the variable base_pair_a and base_pair_b for more readability.


class Hamming
  def self.compute(dna_strand_a, dna_strand_b)
    hamming_distance = 0
    dna_strand_a.each_char.with_index do |base_pair_a, i|
      base_pair_b = dna_strand_b[i]
      hamming_distance += 1 if base_pair_b != base_pair_a && base_pair_b != nil
    end
    hamming_distance
  end
end

This was a fun exercise!