Human genetic clustering – Wikipedia, the free encyclopedia

Posted: December 5, 2013 at 9:45 am

Human genetic clustering analysis uses mathematical cluster analysis of the degree of similarity of genetic data between individuals and groups to infer population structures and assign individuals to groups that often correspond with their self-identified geographical ancestry. A similar analysis can be done using principal components analysis, which in earlier research was a popular method.[1] Many studies in recent years have returned to using principal components analysis.
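As a rough illustration of how principal components analysis separates individuals, here is a minimal pure-Python sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele at each locus, uses a small hypothetical data set, and finds only the first component by power iteration instead of a full eigendecomposition; it is not the procedure of any particular study.

```python
import math
import random


def center_columns(g):
    """Subtract each locus's mean genotype from every entry in that column."""
    n = len(g)
    means = [sum(col) / n for col in zip(*g)]
    return [[x - mu for x, mu in zip(row, means)] for row in g]


def individual_covariance(x):
    """n-by-n matrix of dot products between centred individuals, scaled by #loci."""
    m = len(x[0])
    return [[sum(a * b for a, b in zip(xi, xk)) / m for xk in x] for xi in x]


def leading_eigenvector(c, iters=500, seed=0):
    """Power iteration: multiply by C and renormalise until the direction settles."""
    rng = random.Random(seed)
    # A random (not uniform) start: after centring, the all-ones vector lies in C's null space.
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iters):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


def pc1_scores(genotypes):
    """Values proportional to each individual's PC1 coordinate; the overall sign is arbitrary."""
    return leading_eigenvector(individual_covariance(center_columns(genotypes)))


# Hypothetical toy data: two groups with roughly opposite allele-frequency patterns.
genotypes = [
    [2, 2, 0, 1], [2, 1, 0, 0], [1, 2, 0, 0],  # group A
    [0, 0, 2, 2], [0, 1, 2, 1], [0, 0, 1, 2],  # group B
]
scores = pc1_scores(genotypes)
```

On this toy matrix the three group A individuals receive PC1 values of one sign and the three group B individuals values of the opposite sign. In practice a linear-algebra library would compute all leading components at once.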

In 2004, Lynn Jorde and Steven Wooding argued that "Analysis of many loci now yields reasonably accurate estimates of genetic similarity among individuals, rather than populations. Clustering of individuals is correlated with geographic origin or ancestry."[2]


A 2005 study by Neil Risch and colleagues used 326 microsatellite markers together with self-identified race/ethnic group (SIRE) to represent discrete "populations". Participants had to choose one of four categories: white (European American), African-American (black), Asian and Hispanic. The study showed distinct, non-overlapping clustering of the white, African-American and Asian samples. The authors claimed the results confirm the integrity of self-described ancestry: "We have shown a nearly perfect correspondence between genetic cluster and SIRE for major ethnic groups living in the United States, with a discrepancy rate of only 0.14%." (Tang, 2005)
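To make the idea of a "discrepancy rate" concrete, the sketch below assumes one simple definition: each genetic cluster is labelled with its most common SIRE, and every individual whose SIRE differs from that label counts as a discrepancy. This definition and the toy labels are assumptions for illustration, not necessarily the exact procedure used by Tang et al.; a rate of 0.14% would correspond to roughly 1.4 discrepancies per 1,000 individuals.

```python
from collections import Counter, defaultdict


def discrepancy_rate(clusters, sire):
    """Fraction of individuals whose SIRE differs from their cluster's majority SIRE.

    Assumed definition: each cluster is matched to its most common SIRE label
    (two clusters could in principle map to the same label).
    """
    by_cluster = defaultdict(list)
    for c, s in zip(clusters, sire):
        by_cluster[c].append(s)
    mismatches = 0
    for labels in by_cluster.values():
        _, majority_count = Counter(labels).most_common(1)[0]
        mismatches += len(labels) - majority_count
    return mismatches / len(sire)


# Hypothetical toy example: ten individuals, three clusters, one mismatch in cluster 0.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
sire = ["white", "white", "white", "black",
        "black", "black", "black", "black",
        "asian", "asian"]
rate = discrepancy_rate(clusters, sire)  # 1 mismatch out of 10 -> 0.1
```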

Studies such as those by Risch and Rosenberg use a computer program called STRUCTURE to find human populations (gene clusters). STRUCTURE is a statistical program that places individuals into one of a user-chosen number of clusters based on their overall genetic similarity, testing many possible cluster assignments for each individual.[3] These populations are based on multiple genetic markers that are often shared between different human populations, even over large geographic ranges. The notion of a genetic cluster is that people within the cluster share, on average, more similar allele frequencies with each other than with those in other clusters (A. W. F. Edwards, 2003). In a test of idealised populations, STRUCTURE was found to consistently underestimate the number of populations in the data set when high migration rates between populations and slow mutation rates (such as those of single-nucleotide polymorphisms) were considered.[4]
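The core idea of allele-frequency-based clustering can be sketched in a few lines. STRUCTURE itself fits a Bayesian model by Markov chain Monte Carlo and can model admixture; the sketch below swaps in a much simpler hard-assignment EM under a no-admixture, Hardy–Weinberg model. The 0/1/2 genotype coding, the +1 pseudo-count, the farthest-point starting rule and the toy data are all assumptions made for illustration.

```python
import math


def log_lik(genotype, freqs):
    """Log-likelihood (up to a constant) of one diploid genotype vector under
    Hardy-Weinberg proportions with the given per-locus allele frequencies."""
    return sum(g * math.log(p) + (2 - g) * math.log(1 - p)
               for g, p in zip(genotype, freqs))


def estimate_freqs(members, n_loci):
    """Per-locus allele frequency with a +1/+1 pseudo-count so p stays inside (0, 1)."""
    return [(sum(m[l] for m in members) + 1) / (2 * len(members) + 2)
            for l in range(n_loci)]


def farthest_point_seeds(genotypes, k):
    """Deterministic start: individual 0, then whoever is farthest (L1 distance) from chosen seeds."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    seeds = [0]
    while len(seeds) < k:
        best = max(range(len(genotypes)),
                   key=lambda i: min(dist(genotypes[i], genotypes[s]) for s in seeds))
        seeds.append(best)
    return seeds


def cluster(genotypes, k, max_iter=100):
    """Alternate: assign each individual to its most likely cluster, then re-estimate frequencies."""
    n_loci = len(genotypes[0])
    freqs = [estimate_freqs([genotypes[s]], n_loci) for s in farthest_point_seeds(genotypes, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: log_lik(g, freqs[c])) for g in genotypes]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        freqs = [estimate_freqs([g for g, lab in zip(genotypes, labels) if lab == c], n_loci)
                 for c in range(k)]
    return labels


# Hypothetical toy data: two groups with roughly opposite allele-frequency patterns.
genotypes = [
    [2, 2, 0, 1], [2, 1, 0, 0], [1, 2, 0, 0],
    [0, 0, 2, 2], [0, 1, 2, 1], [0, 0, 1, 2],
]
labels = cluster(genotypes, 2)  # -> [0, 0, 0, 1, 1, 1]
```

As with STRUCTURE, the number of clusters `k` is chosen by the user; the algorithm will always return `k` groups whether or not that many populations exist.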

Nevertheless, the Rosenberg et al. (2002) paper shows that individuals can be assigned to specific clusters with a high degree of accuracy. One of the underlying questions regarding the distribution of human genetic diversity is the degree to which genes are shared between the observed clusters. It has been observed repeatedly that the majority of variation in the global human population is found within populations. This partitioning is usually quantified using Sewall Wright's fixation index (FST), which measures the proportion of total genetic variation that lies between groups. The exact figures vary somewhat with the type of locus studied, but it is commonly claimed that ~85% of genetic variation is found within groups, ~6–10% between groups within the same continent and ~6–10% between continental groups. For example, the Human Genome Project states that "two random individuals from any one group are almost as different [genetically] as any two random individuals from the entire world."[5] Sarich and Miele, however, have argued that estimates of genetic difference between individuals of different populations fail to take into account human diploidy.
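For a single biallelic locus, the within/between split can be computed from allele frequencies using the heterozygosity form of FST (Nei's version, often written GST): FST = (HT − HS) / HT, where HS is the mean expected heterozygosity within groups and HT the expected heterozygosity of the pooled population. The sketch assumes equally weighted groups; the frequencies in the example are hypothetical.

```python
def fst(freqs):
    """Heterozygosity-based F_ST for one biallelic locus.

    freqs: allele frequency in each subpopulation (groups weighted equally).
    """
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-group heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic locus: no variation to partition
    return (h_t - h_s) / h_t


between = fst([0.2, 0.8])  # 0.36: 36% of variation between groups at this locus
within = 1 - between       # 0.64: the remainder lies within groups
```

On this reading, the commonly quoted figure of ~85% of variation within groups corresponds to an FST of roughly 0.15 averaged over loci.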

The point is that we are diploid organisms, getting one set of chromosomes from one parent and a second from the other. To the extent that your mother and father are not especially closely related, then, those two sets of chromosomes will come close to being a random sample of the chromosomes in your population. And the sets present in some randomly chosen member of yours will also be about as different from your two sets as they are from one another. So how much of the variability will be distributed where?

First is the 15 percent that is interpopulational. The other 85 percent will then split half and half (42.5 percent) between the intra- and interindividual within-population comparisons. The increase in variability in between-population comparisons is thus 15 percent against the 42.5 percent that is between-individual within-population. Thus, 15/42.5 is 35 percent, a much more impressive and, more important, more legitimate value than 15 percent.[6]
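The arithmetic in Sarich and Miele's argument can be reproduced directly. The sketch below simply follows their stated assumption that the within-population share splits evenly between intra- and interindividual comparisons; it restates their calculation and does not evaluate whether that assumption is correct.

```python
def diploid_ratio(between_pop_share):
    """Ratio of between-population to between-individual variation,
    under Sarich and Miele's half-and-half split of the within-population share."""
    within_pop = 1.0 - between_pop_share         # e.g. 0.85
    between_individual = within_pop / 2          # e.g. 0.425
    return between_pop_share / between_individual


ratio = diploid_ratio(0.15)  # 0.15 / 0.425, about 0.353 (roughly 35 percent)
```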

Additionally, Edwards (2003) claims in his essay "Lewontin's Fallacy" that "It is not true, as Nature claimed, that 'two random individuals from any one group are almost as different as any two random individuals from the entire world'", and Risch et al. (2002) state "Two Caucasians are more similar to each other genetically than a Caucasian and an Asian." These two statements are not equivalent. Risch et al. simply state that two indigenous individuals from the same geographical region are more similar to each other than either is to an indigenous individual from a different geographical region, a claim few would dispute. Jorde et al. put it this way:

The picture that begins to emerge from this and other analyses of human genetic variation is that variation tends to be geographically structured, such that most individuals from the same geographic region will be more similar to one another than to individuals from a distant region.[2]

Read more from the original source:
Human genetic clustering - Wikipedia, the free encyclopedia
