Jasieniuk:Implementation of a method to calculate microsatellite genotype distance in polyploids: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
m (→‎Overview: word order typo)
(added "processing a file of genotypes" heading)
Line 57: Line 57:


<code>}</code>
<code>}</code>
==Processing a file of genotypes==


==References==
==References==

Revision as of 17:02, 22 December 2009

Overview

This is an implementation of the method of Bruvo et al. (2004). It calculates a pairwise measure of genetic distance between individuals at a single microsatellite locus. The allele copy number does not need to be known and the individuals do not even need to be the same ploidy level. Its primary advantage over band-sharing measures of genetic distance is that it allows for mutation, such that alleles aren't considered to be qualitatively different, but instead the difference in repeat number between two alleles is taken into account. The paper offers two options for dealing with virtual alleles (when one individual has more alleles than the other). My implementation uses the simpler option and sets virtual alleles as equal to infinity.

This is my first adventure into the R programming environment, so I apologize for any programming faux pas.

Arguments and Output

The function genotype.distance takes as arguments two single locus microsatellite genotypes. In the code these are genotype1 and genotype2. Each genotype should be a vector containing all of the alleles for a given individual at a given locus. The vectors do not need to be the same length, and should only be as long as needed to contain the alleles. The alleles should be expressed in repeat number rather than nucleotides (DNA fragment length). Because it is the difference in allele size that is measured, you can simply divide the nucleotide length by the repeat size (e.g. divide by two if you have dinucleotide repeats).

The output is a single number that is the distance between genotypes. In the paper, this is dl.

Code for genotype.distance

genotype.distance <- function(genotype1, genotype2) {

require(combinat) # for permn function

if(identical(genotype1, genotype2)) {dist <- 0} else { if(length(genotype1) >= length(genotype2)) {genotypeL <- genotype1; genotypeS <- genotype2} else {genotypeL <- genotype2; genotypeS <- genotype1}

# if genotypes are identical, just return a distance of zero without doing the rest of the calculation

# whichever genotype has more alleles, make this genotypeL (long) and the other genotypeS (short)

k <- length(genotypeL) # sets the ploidy level for this genotype comparison

genotypeSE <- c(genotypeS[1:length(genotypeS)], rep(Inf, k -length(genotypeS))) # brings both genotypes up to the same ploidy

allele.distances <- array(0 , c(k,k)) # Create an empty matrix to contain the distances between alleles

for(n in 1:k) { for(m in 1:k) {allele.distances[n,m] <- genotypeL[n] - genotypeSE[m]}} # fills the array with the differences in allele length

geometric.distances <- array(1 - 2^-abs(allele.distances) , c(k,k)) # geometric transformation based on mutation probabilities

#Next, find the minimum distance sum among all permutations

permutations <- permn(k) # we will go through the columns in every possible order

row <- 1:k # we will go through the rows in a constant, sequential order

mindist <- Inf # this variable will store the minimum sum encountered so far.

for(i in 1:factorial(k)) { # the loop to go through every possible sum of compatible allele comparisons

sum <- 0 # this is si, the sum of allele comparisons

column <- permutations[[i]] # choose one permutation of column orders for this round

for(j in 1:k) {sum <- sum + geometric.distances[row[j],column[j]]} # the loop to calculate the sum for this permutation

if(sum < mindist) {mindist <- sum} # is this the minimum sum found so far?

}

dist <- mindist/k # divide this minimum sum by the number of alleles

}

dist

}

Processing a file of genotypes

References

Bruvo R, Michiels NK, D'Souza TG, Schulenberg H (2004). "A simple method for the calculation of microsatellite genotype distances irrespective of ploidy level." Molecular Ecology 13:2101-2106.

Contact

User:Lindsay_V._Clark