

 I have sequences of a bacterial gene, one from each strain. Five of the strains belong to the same sequence type (i.e. they are highly similar), and another four belong to a second sequence type. The remaining strains are from different sequence types, so they are quite divergent both from each other and from the strains in the two groups above.

 I label the first five as one clade, the next four as a second clade, and the rest as a third clade that serves as the background, marking the tree as instructed in the manual. So now I have a Clade Model C with three clades. I am mostly interested in whether diversifying selection is acting between the first two clades. What is the right way to test this?

 I came across this paper, which says I can use a null model that combines the two clades of interest into one clade, which corresponds to the constraint w3 == w4. Then, using an alternative model that has three clades, I can run an LRT with df= to test for diversifying selection:
http://www.biomedcentral.com/1471-2148/12/206

 Does this make sense? Is this the right way for me to test for diversifying selection between two clades?

 Assume you have a simple tree with three labeled clades, as follows: ((A1,A2)$ , (B1,B2)$ , (C1,C2)$); Clade A is labeled with $, clade B with $, and clade C with $. This last labelling doesn't need to be specified; codeml will automatically label the unlabelled branches/clades with ''.

 First, to test for significant variation among clades, you can compare the fit of Clade model C (using the labeled input tree shown above) versus M2a_rel. M2a_rel assumes that $, $, $, etc., are all evolving under the same selection pressures. This test should have degrees of freedom.

 Second, to test for significant variation between clades A and B while simultaneously allowing clade C to be different, you can compare the fit of CmC when run using the tree provided above versus CmC when run using a simpler tree. In this case, the simpler tree would assign clades A and B to the same group, like so: ((A1,A2)$ , (B1,B2)$ , (C1,C2)$); . This test should have degree of freedom.
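
 Either comparison comes down to a likelihood ratio test on the two lnL values that codeml reports. A minimal sketch of that calculation, using only the Python standard library, might look like the following. The function name and the example lnL values are made up for illustration, and the closed-form chi-square tail is only implemented for df = 1 and df = 2; supply whichever df applies to the pair of models you are comparing.

```python
import math

def lrt_p_value(lnl_null, lnl_alt, df):
    """Return (2*delta_lnL, p-value) for a likelihood ratio test.

    Hypothetical helper: lnl_null and lnl_alt are the log-likelihoods
    of the nested (null) and more general (alternative) models.
    """
    # Test statistic; clamp tiny negative values caused by optimisation noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p

# Made-up lnL values, purely to show the call.
stat, p = lrt_p_value(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
print(f"2*delta_lnL = {stat:.3f}, p = {p:.4f}")
```

 If the alternative model reports a lower lnL than the null, that usually points to a convergence problem, so it is worth rerunning with different starting values rather than trusting the clamped statistic.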
 So by comparing a three-clade model C against M2a_rel, you can test whether each of the three clades is itself undergoing diversifying selection to adapt to its environment.

 Not exactly.  You can't say that each of the clades is divergent.

 If you have three clades, you have three freely-estimated 'site class 2' parameters under CmC: w2, w3, and w4.  M2a_rel assumes that w2 = w3 = w4.  
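
 The degrees of freedom for these comparisons follow from counting the separately estimated 'site class 2' parameters in each model. A small sketch, assuming the two models differ only in how many distinct site-class-2 w values are estimated (the function name is hypothetical):

```python
def lrt_df(n_omega_alt, n_omega_null):
    """Degrees of freedom = difference in freely estimated site-class-2 w values.

    Assumes the models are nested and identical in every other parameter.
    """
    if n_omega_null >= n_omega_alt:
        raise ValueError("the null model must have fewer free w values")
    return n_omega_alt - n_omega_null

# Three-clade CmC (w2, w3, w4) versus M2a_rel (a single shared w2).
print(lrt_df(3, 1))  # -> 2

# Three-clade CmC versus CmC with two of the clades merged into one group.
print(lrt_df(3, 2))  # -> 1
```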

 Comparing these two models tests whether the M2a_rel assumption holds, and if the test is significant then you can conclude that the assumption doesn't hold.  However, there are several ways to violate this M2a_rel assumption.

 For example:
w2 and w3 and w4 might all be different from each other
w2 might equal w3, with w4 being different
w2 might equal w4, with w3 being different

 By analogy, think of an ANOVA where you're comparing the mean value in three groups. If you have a significant test result, you can conclude that there is significant variation among the group means. However, you can't know for sure which groups differ significantly unless you restructure your test by designing a more appropriate null model, or unless you conduct pairwise comparisons between groups.

 If we test the three-clade model C against a model with two of the clades merged, then we are testing whether there is diversifying selection between the two clades that are merged.
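
 The tree for such a merged-clade null model can be derived from the fully labeled tree by rewriting one clade label to match another. A rough sketch follows; the labels `$1`, `$2` and `$0` and the function name are assumptions chosen for illustration, so substitute whatever labels your own tree file uses.

```python
import re

def merge_clade_label(newick, old_label, new_label):
    """Relabel every occurrence of old_label as new_label in a Newick string.

    Hypothetical helper: the lookahead stops '$1' from matching inside '$12'.
    """
    pattern = re.escape(old_label) + r"(?!\d)"
    return re.sub(pattern, lambda _match: new_label, newick)

# Illustrative three-clade tree with assumed labels.
full_tree = "((A1,A2)$1,(B1,B2)$2,(C1,C2)$0);"

# Null tree: clades A and B share a label, clade C stays separate.
null_tree = merge_clade_label(full_tree, "$2", "$1")
print(null_tree)  # -> ((A1,A2)$1,(B1,B2)$1,(C1,C2)$0);
```

 Running CmC once with each tree gives the two lnL values needed for the comparison.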

