No, but a brother is already pretty close to you genetically. So a couple of billion people is nowhere near enough for someone similar to be generated in China through mutations, because those mutations have been thoroughly reshuffling their own genes ever since the lineages split.
Comparing against relatives hardly makes sense anyway.
Looks like someone has worked out the probabilities here:
Genetic Similarities Within and Between Human Populations.
Thus the answer to the question “How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?” depends on the number of polymorphisms used to define that dissimilarity and the populations being compared. The answer, ω, can be read from Figure 2. Given 10 loci, three distinct populations, and the full spectrum of polymorphisms (Figure 2E), the answer is ω ≅ 0.3, or nearly one-third of the time. With 100 loci, the answer is ∼20% of the time and even using 1000 loci, ω ≅ 10%. However, if genetic similarity is measured over many thousands of loci, the answer becomes “never” when individuals are sampled from geographically separated populations.
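The dependence on the number of loci can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not the paper's data or method: two hypothetical populations whose allele frequencies differ by a fixed `delta` at every locus, diploid genotypes drawn independently per locus, and a simple sum-of-absolute-differences distance. The names `estimate_omega`, `delta` and the other parameters are invented for this illustration, and ties are counted as "not more dissimilar", which may differ from the paper's convention.

```python
import random

def simulate_population(freqs, n_individuals, rng):
    """Draw diploid genotypes: one allele count (0, 1 or 2) per locus."""
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]

def distance(a, b):
    """Toy genetic distance: sum of absolute allele-count differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dissimilarity_fraction(pop_a, pop_b, n_trials, rng):
    """Estimate how often a random within-population pair is more
    dissimilar than a random between-population pair."""
    hits = 0
    for _ in range(n_trials):
        same_pop = rng.choice((pop_a, pop_b))
        w1, w2 = rng.sample(same_pop, 2)  # two distinct individuals
        d_within = distance(w1, w2)
        d_between = distance(rng.choice(pop_a), rng.choice(pop_b))
        if d_within > d_between:  # assumption: ties do not count
            hits += 1
    return hits / n_trials

def estimate_omega(n_loci, delta=0.15, n_individuals=50, n_trials=1000, seed=1):
    """Simulate two hypothetical populations and estimate the fraction."""
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    # Assumption: the second population is shifted by +/- delta per locus.
    shifted = [p + rng.choice((-delta, delta)) for p in base]
    pop_a = simulate_population(base, n_individuals, rng)
    pop_b = simulate_population(shifted, n_individuals, rng)
    return dissimilarity_fraction(pop_a, pop_b, n_trials, rng)

if __name__ == "__main__":
    for n_loci in (10, 100, 1000):
        print(n_loci, estimate_omega(n_loci))
```

The absolute values depend entirely on the chosen `delta` and sample sizes, so they should not be compared with the figures quoted above; only the downward trend with more loci is the point.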
On the other hand, if the entire world population were analyzed, the inclusion of many closely related and admixed populations would increase ω. This is illustrated by the fact that ω and the classification error rates, CC and CT, all remain greater than zero when such populations are analyzed, despite the use of >10,000 polymorphisms (Table 1, microarray data set; Figure 2D). In a similar vein, Romualdi et al. (2002) and Serre and Pääbo (2004) have suggested that highly accurate classification of individuals from continuously sampled (and therefore closely related) populations may be impossible. However, those studies lacked the statistical power required to answer that question (see Rosenberg et al. 2005).
How can the observations of accurate classifiability be reconciled with high between-population similarities among individuals? Classification methods typically make use of aggregate properties of populations, not just properties of individuals or even of pairs of individuals. For instance, the centroid classification method computes the distances between individuals and population centroids and then clusters individuals around the nearest centroid. The population trait method relies on information about the frequencies of each allele in each population to compute individual trait values and on the means and variances of the trait distributions to classify individuals. The Structure classification algorithm (Pritchard et al. 2000) also relies on aggregate properties of populations, such as Hardy–Weinberg and linkage equilibrium. In contrast, the pairwise distances used to compute ω make no use of population-level information and are strongly affected by the high level of within-groups variation typical of human populations. This accounts for the difference in behavior between ω and the classification results.
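The centroid idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: genotypes are lists of allele counts, the distance is squared Euclidean, each centroid includes the individual being classified (a simplification), and the toy data and all names (`toy_populations`, `classify`, `misclassification_rate`) are invented for the example.

```python
def centroid(individuals):
    """Mean allele count at each locus across a population's members."""
    n = len(individuals)
    return [sum(locus) / n for locus in zip(*individuals)]

def squared_distance(a, b):
    """Squared Euclidean distance between two allele-count vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(individual, centroids):
    """Return the label of the population whose centroid is nearest."""
    return min(
        centroids,
        key=lambda label: squared_distance(individual, centroids[label]),
    )

def misclassification_rate(populations):
    """Fraction of individuals assigned to a population other than
    the one they were sampled from."""
    centroids = {
        label: centroid(members) for label, members in populations.items()
    }
    total = wrong = 0
    for label, members in populations.items():
        for individual in members:
            total += 1
            if classify(individual, centroids) != label:
                wrong += 1
    return wrong / total

# Hypothetical toy data: 4 loci, allele counts 0-2 per locus.
# The last member of "A" deliberately looks like a member of "B".
toy_populations = {
    "A": [[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 1], [2, 2, 2, 1]],
    "B": [[2, 2, 1, 2], [2, 1, 2, 2], [1, 2, 2, 2]],
}

if __name__ == "__main__":
    print(misclassification_rate(toy_populations))
```

Because each individual is compared with population averages rather than with one other individual, within-population noise largely cancels out, which is why a classifier of this kind can do well even when many individual pairs are more similar across populations than within them.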
Since an individual's geographic ancestry can often be inferred from his or her genetic makeup, knowledge of one's population of origin should allow some inferences about individual genotypes. To the extent that phenotypically important genetic variation resembles the variation studied here, we may extrapolate from genotypic to phenotypic patterns. Resequencing studies of gene-coding regions show patterns similar to those seen here (e.g., Stephens et al. 2001), and many common disease-associated alleles are not unusually differentiated across populations (Lohmueller et al. 2006). Thus it may be possible to infer something about an individual's phenotype from knowledge of his or her ancestry.
However, consider a hypothetical phenotype of biomedical interest that is determined primarily by a dozen additive loci of equal effect whose worldwide distributions resemble those in the insertion data set (e.g., with FST = 0.15; Table 1). Given these assumptions, the genetic distance used in computing ω and CC is equivalent to a phenotypic distance, so Figure 2 can be used to analyze this hypothetical trait. Figure 2A shows that a trait determined by 12 such loci will typically yield ω = 0.31 (0.20–0.41) and CC = 0.14 (0.054–0.29; medians and 90% ranges).
About one-third of the time (ω = 0.31) an individual will be phenotypically more similar to someone from another population than to another member of the same population. Similarly, individuals will be more similar to the average or “typical” phenotype of another population than to the average phenotype in their own population with a probability of ∼14% (CC = 0.14). It follows that variation in such a trait will often be discordant with population labels.
The population groups in this example are quite distinct from one another: Europeans, sub-Saharan Africans, and East Asians. Many factors will further weaken the correlation between an individual's phenotype and their geographic ancestry. These include considering more closely related or admixed populations, studying phenotypes influenced by fewer loci, unevenly distributed effects across loci, nonadditive effects, developmental and environmental effects, and uncertainties about individuals' ancestry and actual populations of origin. The typical frequencies of alleles that influence a phenotype are also relevant, as our results show that rare polymorphisms yield high values of ω, CC, and CT, even when many such polymorphisms are studied. This implies that complex phenotypes influenced primarily by rare alleles may correspond poorly with population labels and other population-typical traits (in contrast to some Mendelian diseases). However, the typical frequencies of alleles responsible for common complex diseases remain unknown. A final complication arises when racial classifications are used as proxies for geographic ancestry. Although many concepts of race are correlated with geographic ancestry, the two are not interchangeable, and relying on racial classifications will reduce predictive power still further.
The fact that, given enough genetic data, individuals can be correctly assigned to their populations of origin is compatible with the observation that most human genetic variation is found within populations, not between them. It is also compatible with our finding that, even when the most distinct populations are considered and hundreds of loci are used, individuals are frequently more similar to members of other populations than to members of their own population. Thus, caution should be used when using geographic or genetic ancestry to make inferences about individual phenotypes.