16.1 Sampling design

There exist in the population of interest a number of statistical units. For simplicity, we can think of these units as households or individuals. From an ethical perspective, it is usually preferable to consider individuals as the statistical units of interest, since it is in the welfare of individuals that we are ultimately interested, but for some purposes (such as the distribution of aggregate household wellbeing) households may also be appropriate statistical units. These statistical units are those for which we would like to observe socio-economic information such as household composition, labor activity, income or consumption.

Since it is usually too costly to gather information on all of the statistical units of a large population, one is typically constrained to obtain information on only a sample of such units. Distributive analysis is therefore usually done using survey data. Since surveys are not censuses, we must take care to distinguish unobserved "true" population values from observed sample values. Sample differences across surveys are due both to true population differences and to sampling variability.

Population values are generally not observed (otherwise, we would not need surveys). Sample values as such are rarely of interest: they would be of interest in themselves only if the statistical units that appeared by chance in a sample were also precisely those which were of ethical interest. This is not usually the case. Hence, sample values matter inasmuch as they can help infer true population values. The process by which such inference is performed is called statistical inference. The sampling process should thus ideally be such that it supports statistically sensible distributive analysis at the level of the population, and not solely for the samples drawn.
Sampling errors thus arise because distributive estimates are typically made on the basis of only some of the statistical units of interest in a population. Because we have no information on some of the population's statistical units, we infer with sampling error the population value of the distributive indicators in which we are interested. The error made when relying solely on the information content of one sample depends on the statistical units present in that sample. The drawing of other samples would generate different sampling errors. Because samples are drawn randomly, the sampling errors that arise from the use of these samples are also random.

Since the true population values are unknown, the sampling error associated with the use of a given sample is also unknown. Statistical theory does, however, allow one to estimate the distribution of sampling errors from which actual (but unobserved) sampling errors arise. This nevertheless requires samples to be probabilistic, viz., that there be a known probability distribution associated with the distribution of statistical units in a sample. Strictly, this also means that no unquantifiable or subjective criteria enter the choice of units. If this were not so, it would not be possible to assess reliably the sampling distribution of the estimators.

To draw a sample, a sampling base is used. A sampling base is made of all the sampling units (SU) from which a sample can be drawn. The base of sampling units (e.g., the census of all households within a country) is usually different from the entire population of statistical units (e.g., the population of individuals). There are several reasons for this, an important one being that it is generally cost effective to seek information only within a limited number of clusters of statistical units, grouped geographically or socio-economically. This also facilitates the collection of cluster-level (e.g., village-level) information.
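The randomness of sampling errors can be illustrated with a small simulation. The sketch below is only indicative: it assumes a hypothetical population of 10,000 log-normally distributed incomes and simple random sampling without replacement, neither of which comes from the text, and all names are illustrative. It draws repeated samples, records the sampling error of the sample mean in each, and compares the spread of these errors with the standard error estimated from a single sample.

```python
import random
import statistics

def sampling_errors(population, n, n_samples, seed=0):
    """Return the error of the sample mean for repeated simple random samples."""
    rng = random.Random(seed)
    # The true mean is known here only because we built the population ourselves.
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(n_samples):
        sample = rng.sample(population, n)  # drawn without replacement
        errors.append(statistics.fmean(sample) - true_mean)
    return errors

# Hypothetical population of incomes (an assumption made for illustration).
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]

n = 200
errors = sampling_errors(population, n=n, n_samples=500)

# Standard error estimated from one sample only, as an analyst would do.
one_sample = random.Random(1).sample(population, n)
estimated_se = statistics.stdev(one_sample) / n ** 0.5

print("mean sampling error:", statistics.fmean(errors))
print("spread of sampling errors:", statistics.stdev(errors))
print("standard error from a single sample:", estimated_se)
```

In an actual survey only one sample is observed, so the list of errors is never available; the single-sample standard error is what statistical theory offers in its place.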
A process of simple random sampling draws sample observations randomly and independently from a base of sampling units, each with an equal probability of selection. Simple random sampling is rarely used in practice to generate household surveys. Instead, a population of interest (a country, say) is often first divided into geographical or administrative zones and areas, called strata. The first stage of random selection takes place from within a list of primary sampling units (denoted as PSU's) built for each stratum. Within each stratum, a number of PSU's are then randomly selected. PSU's are often departments, villages, etc. This random selection of PSU's provides "clusters" of information.

Since the cost of surveying all statistical units in each of these clusters may be prohibitive, it may be necessary to proceed to further stages of random selection within each selected PSU. For instance, within each department, a number of villages may be randomly selected, and within every selected village, a number of households may also be randomly selected. The final stage of random selection is done at the level of the last sampling units (LSU's). These LSU's are often households. Each selected LSU can then provide information on all individuals found within that LSU. These individuals are usually not selected: information on all of them appears in the sample. They therefore do not represent LSU's in statistical terminology.

16.2 Sampling weights

Sampling weights (also called inverse probability weights, expansion factors or inflation factors) are the inverse of the sampling probabilities, viz., of the probabilities of a sampling unit appearing in the sample. These sampling weights are SU-specific. The sum of these weights is an estimator of the size of the population of SU's.

Samples are sometimes "self-weighted". Each sampling unit then has the same chance of being included in the survey.
This arises, for instance, when the number of clusters selected in each stratum is proportional to the size of each stratum, when the clusters are randomly selected with probability proportional to their size, and when an identical number of households (or LSU's) across clusters is then selected with equal probability within each cluster. It is, however, common for the inclusion probability to differ across households. One reason comes simply from the complexity of sample designs, which makes differential sampling weights occur frequently. Another reason is that the costs of surveying SU's vary, which makes it more cost effective to survey some households (e.g., urban ones) than others. Sampling precision can also be enhanced with differential probabilities of household inclusion. The idea here is to survey with greater probability those households who contribute more to the phenomenon of interest. It leads to a sampling process usually called sampling with "probability proportional to size". Assume for instance that we are interested in estimating the value of a distribution-sensitive poverty index. The most important contributors to that index are obviously the poor households, and more precisely the poorest among them. An a priori suspicion might be that such poorest households are proportionately more likely to be found in some areas than in others. Making inclusion probabilities larger for households in these more deprived areas will then enhance the sampling precision of the estimator of the distribution-sensitive poverty index since it will gather data that are more statistically informative. A reverse sample-design argument would apply for a survey intended to estimate total income in a population. The most important contributors to total income are the richest households, and it would thus be sensible to sample them with a greater probability. 
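The self-weighting case described above can be checked with simple arithmetic. The sketch below assumes a single stratum with hypothetical cluster sizes, a first stage in which each cluster's inclusion probability is proportional to its size, and a second stage that draws a fixed number of households with equal probability within each selected cluster. The function and variable names are illustrative only.

```python
from fractions import Fraction

def household_inclusion_probabilities(cluster_sizes, clusters_drawn, households_per_cluster):
    """Inclusion probability of a household in each cluster under a two-stage design.

    Assumes (for illustration) that the first-stage inclusion probability of a
    cluster is clusters_drawn * size / total, which must not exceed 1.
    """
    total = sum(cluster_sizes)
    probabilities = []
    for size in cluster_sizes:
        # First stage: probability proportional to cluster size.
        p_cluster = Fraction(clusters_drawn * size, total)
        if p_cluster > 1:
            raise ValueError("cluster too large for this simple PPS rule")
        # Second stage: equal probability for every household in the cluster.
        p_household = Fraction(households_per_cluster, size)
        probabilities.append(p_cluster * p_household)
    return probabilities

cluster_sizes = [100, 200, 300, 400]  # hypothetical numbers of households per cluster
probabilities = household_inclusion_probabilities(
    cluster_sizes, clusters_drawn=2, households_per_cluster=10
)
# Sampling weights are the inverse of the inclusion probabilities.
weights = [1 / p for p in probabilities]

print(probabilities)
print(weights)
```

Under these assumptions every household has an inclusion probability of 1/50, and hence a weight of 50, whatever the size of its cluster. Drawing a fixed share of households within each selected cluster, rather than a fixed number, would break this equality.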
Yet one more consequence of the principle of "probability proportional to size" is the desirability of sampling with greater probability those households of larger sizes. Distributive analysis is normally concerned with the distribution of individual well-being. Ceteris paribus, larger households contribute more information towards such assessment, and should therefore be sampled with a greater probability (roughly speaking, with a probability proportional to their size).

Omitting sampling weights in distributive analysis will systematically bias both the estimators of the values of indices and points on curves and the estimation of the sampling variance of these estimators¹. Including such weights will usually make the analysis free of asymptotic biases. To see this, we follow Deaton (1998), p.45, and let $Y = \sum_{i=1}^{N} x_i$ be the population total of the $x$'s, with a population of size $N$. An estimator of that population total is then given by

$$\hat{Y} = \sum_{i=1}^{N} t_i w_i x_i \tag{16.1}$$
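where $t_i$ is the number of times unit $i$ appears in a random sample of size $n$ and where $w_i$ is the sampling weight. Let $\pi_i$ be the probability that unit $i$ is selected each time an observation is drawn from the population. Households with a low value of $\pi_i$ will have a low probability of being selected in the survey, relative to others with a higher $\pi_i$. Over $n$ draws, unit $i$ is expected to appear $E[t_i] = n\pi_i$ times, and its sampling weight is the inverse of that number, $w_i = (n\pi_i)^{-1}$. Then,

$$E[\hat{Y}] = \sum_{i=1}^{N} E[t_i]\, w_i x_i = \sum_{i=1}^{N} n\pi_i\,(n\pi_i)^{-1} x_i = \sum_{i=1}^{N} x_i = Y \tag{16.2}$$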
and $\hat{Y}$ is therefore an unbiased estimator of $Y$. An analogous argument applies to show that the sum of the sampling weights over the sample, $\sum_{i=1}^{N} t_i w_i$, is an unbiased estimator of the population size $N$.

16.3 Stratification

The sampling base is usually stratified into a number of strata. The basic advantage of stratification is to use prior information on the distribution of the population, and to partition it into parts that are thought to differ significantly from each other. Sampling then draws information systematically from each of those parts of the population. With stratification, no part of the sampling base therefore goes totally unrepresented in the final sample.

To be more specific, a variable of interest, such as household per capita income, often tends to be less variable within a given stratum than across an entire population. This is because households within the same stratum typically share, to a greater extent than households in the entire population, some socio-economic characteristics (such as geographical locations, climatic conditions, and demographic characteristics) that are determinants of the incomes of these households. Stratification thus helps generate systematic sample information from a diversity of "socio-economic areas". Because information from a "broader" spectrum of the population leads on average to more precise estimates, stratification generally decreases the sampling variance of estimators. For instance, suppose at the extreme that household income is the same for all households in a given stratum, and this for each and every stratum. In this case, supposing also that the population size of each stratum is known in advance, it would be sufficient to draw only one household from each stratum to know exactly the distribution of income in the population.

¹ DAD: Poverty | FGT Index.

16.4 Multi-stage sampling

Multi-stage sampling implies that SU's end up in a sample only after a process of multi-stage selection. Groups (or "clusters") of SU's are first randomly selected within a population (which may be stratified).
This is followed by further sampling within the selected groups, and followed by yet another process of random selection within the subgroups just selected. The first stage of random selection is done at the level of primary sampling units (PSU). An important condition would seem to be that first-stage sampling be random and with replacement for the selection of a PSU to be done independently from that of another. There are many cases, however, in which this condition is not met.
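A two-stage selection of this kind can be sketched as follows. The example is a simulation under assumptions that are not in the text: a hypothetical population of 100 equal-sized clusters, incomes made of a cluster-level component plus a household-level component, PSU's drawn with replacement and with equal probability, and a fixed number of households drawn within each selected PSU. It also draws simple random samples (without replacement) of the same total size, so that the precision of the two designs can be compared.

```python
import random
import statistics

def make_population(n_clusters, cluster_size, rng):
    """Hypothetical population: income = 10 + cluster effect + household effect."""
    population = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0.0, 1.0)
        population.append([10.0 + cluster_effect + rng.gauss(0.0, 1.0)
                           for _ in range(cluster_size)])
    return population

def two_stage_mean(population, n_psu, m, rng):
    """Mean income from a two-stage sample: PSUs first, then households within PSUs."""
    psus = rng.choices(population, k=n_psu)  # first stage, with replacement
    # Second stage: m households drawn without replacement within each selected PSU.
    values = [x for psu in psus for x in rng.sample(psu, m)]
    return statistics.fmean(values)

def srs_mean(households, n, rng):
    """Mean income from a simple random sample of households."""
    return statistics.fmean(rng.sample(households, n))

rng = random.Random(2024)
population = make_population(n_clusters=100, cluster_size=50, rng=rng)
households = [x for cluster in population for x in cluster]

# Repeat both designs many times with the same total sample size (100 households).
replications = 500
two_stage = [two_stage_mean(population, n_psu=10, m=10, rng=rng)
             for _ in range(replications)]
srs = [srs_mean(households, n=100, rng=rng) for _ in range(replications)]

var_two_stage = statistics.variance(two_stage)
var_srs = statistics.variance(srs)
print("variance of the mean, two-stage sample:", var_two_stage)
print("variance of the mean, simple random sample:", var_srs)
```

Because all clusters have the same size and the same number of households is drawn in each, the design is self-weighted and the unweighted mean can be used. With these assumed parameters the two-stage estimate of the mean should be noticeably more variable than the simple random sampling one, since households drawn from the same cluster share the cluster-level component.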
Generally, variables of interest (such as incomes) vary less within a cluster than between clusters. Hence, ceteris paribus, multi-stage selection reduces the "diversity" of information generated compared to simple random sampling and leads to a less informative coverage of the population. The impact of clustering sample observations is therefore to tend to decrease the precision of estimators, and thus to increase their sampling variance.

Ceteris paribus, the lower the within-cluster variability of a variable of interest, the smaller the gain of information that there is in sampling further within the same clusters. To see this, suppose the extreme case in which household income happens to be the same for all households in a cluster, and this for all clusters. In such cases, it is clearly wasteful to adopt multi-stage sampling: it would be sufficient to draw one household from each cluster in order to know the distribution of income within that cluster. More information would be gained from sampling from other clusters.

16.5 Impact of sampling design on sampling variability

There are two modelling approaches to thinking about how data were initially generated. The first one, which is also the more traditional in the sampling design literature, is the finite population approach. The second approach is the super-population one: the actual population is a sample drawn from all possible populations, the infinite super-population. This second approach sometimes presents analytical advantages, and it is therefore also regularly used in econometrics.

To illustrate the impact of stratification and clustering on sampling variability, consider the following "super-population model", based on Deaton (1998), p.56. The income $x_{hij}$ of a household $j$ from a cluster $i$ of a stratum $h$ can be modelled as:

$$x_{hij} = \mu + \alpha_h + \beta_{hi} + \varepsilon_{hij} \tag{16.3}$$

where $\alpha_h$ is a stratum effect, $\beta_{hi}$ a cluster effect and $\varepsilon_{hij}$ a household-specific term.

For simplicity, assume that the $x_{hij}$ are drawn from the same number $n$ of clusters in each of the $L$ strata, and that the same number $m$ of LSU's (or "households") is selected in each of the clusters. The indices $hij$ then stand for:

h = 1, ..., L: stratum h;
i = 1, ..., n: cluster i (in stratum h);
j = 1, ..., m: household j (in cluster i of stratum h).

For simplicity, also assume that $\alpha_h$ is distributed with mean 0 and variance $\sigma_\alpha^2$.

16.5.1 Stratification

Say that we wish to estimate mean income $\mu$. The estimator is the mean of the $Lnm$ sample observations,

$$\hat{\mu} = (Lnm)^{-1} \sum_{h=1}^{L}\sum_{i=1}^{n}\sum_{j=1}^{m} x_{hij} \tag{16.4}$$
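Let

$$\hat{\mu}_h = (nm)^{-1} \sum_{i=1}^{n}\sum_{j=1}^{m} x_{hij} \tag{16.5}$$

be the estimator of the mean $\mu_h$ of stratum $h$. Clearly,

$$\hat{\mu} = L^{-1} \sum_{h=1}^{L} \hat{\mu}_h \tag{16.6}$$

and

$$E[\hat{\mu}_h] = \mu_h. \tag{16.7}$$

Because of the independence of sampling across strata, we also have that

$$\mathrm{var}(\hat{\mu}) = L^{-2} \sum_{h=1}^{L} \mathrm{var}(\hat{\mu}_h). \tag{16.8}$$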
The sampling variability of Stratification can in fact be thought of as an extreme case of clustering, with the number of selected clusters corresponding to the number of population clusters, and with sampling being done without replacement to ensure that all population clusters will appear in the sample. Suppose instead that one were to select L strata randomly and with replacement, to make it possible that not all of the strata will be selected. This is in a sense what happens when stratification is dropped and clustering is introduced. Using (16.4) and (16.5), we then have that
where th is a random variable showing the number of times stratum h was selected. Then, recalling that µh = µ + αh and linearizing (16.9), we have approximately that
and thus that
since
Since th follows a multinomial distribution, with var(th) = (L - 1)/L and cov(th, ti) = - 1/L, we find that
Hence, using (16.12) and (16.15), we obtain
The last term in (16.16) is the effect upon sampling variability of removing stratification. The larger this term, the greater the fall in sampling variability that originates from stratification.

16.5.2 Clustering

Let us now investigate the effect of clustering on the sampling variance, that is, on var($\hat{\mu}_h$).
The first line of (16.17) follows by the definition of $\hat{\mu}_h$. Hence, for a per-stratum given number of observations $mn$, it is better to have a large $n$ to reduce sampling variability, namely, it is better to draw observations from a large number of clusters. The larger the cross-cluster variability

16.5.3 Finite population corrections

Sampling without replacement imposes that all of the selected sampling units are different. It therefore extracts on average more information from the sampling base than sampling with replacement, and ensures that the samples drawn are on average closer to the population of sampling units. Sampling without replacement therefore increases the precision of sample estimators. To account for this increase in sampling precision, a finite population correction (FPC) factor can be used, although it slightly complicates the estimation of the variance of the relevant estimators.

Assume simple random sampling of $n$ sampling units from a population of $N$ sampling units. Thus, we have that $w_i = N/n$ for all of the $n$ sample observations. To illustrate the derivation of an FPC factor in this simplified case, we follow Cochran (1977) and Deaton (1998), p.42-44. An estimator $\hat{Y}$ of the population total $Y$ is given by

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{N} t_i x_i \tag{16.18}$$

where the random variable $t_i$ indicates whether (and how many times) the population unit $i$ was included in the sample. Taking the variance of (16.18), we find:

$$\mathrm{var}(\hat{Y}) = \left(\frac{N}{n}\right)^2 \left[\sum_{i=1}^{N} x_i^2\, \mathrm{var}(t_i) + \sum_{i=1}^{N}\sum_{j \neq i} x_i x_j\, \mathrm{cov}(t_i, t_j)\right] \tag{16.19}$$

Using (16.18) and (16.19), the distinction between simple random sampling with and without replacement is analogous to the distinction between a binomial and a multinomial distribution for the $t_i$. With sampling without replacement, the probability that any one population unit appears in the final sample is equal to $n/N$, i.e., $E[t_i] = n/N$. Since $t_i$ then takes either a 0 or a 1 value, it follows a binomial distribution with parameter $n/N$. The variance of $t_i$ is then given by

$$\mathrm{var}(t_i) = \frac{n}{N}\left(1 - \frac{n}{N}\right) \tag{16.20}$$

and it can be checked that $\mathrm{cov}(t_i, t_j) = -n(N-n)/\left(N^2(N-1)\right)$ for $i \neq j$. Substituting var($t_i$) and cov($t_i$, $t_j$) into (16.19), and defining

$$S^2 = (N-1)^{-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2, \qquad \bar{x} = Y/N, \tag{16.21}$$

we find

$$\mathrm{var}(\hat{Y}) = N^2 (1-f)\,\frac{S^2}{n} \tag{16.22}$$

where $f = n/N$ and $1 - f = (N - n)/N$ is an FPC factor.

Take now the case of simple random sampling with replacement. We can then express $t_i$ for any given population unit $i$ as a sum of $n$ independent draws $t_{ij}$, with $j = 1, ..., n$, each $t_{ij}$ indicating whether unit $i$ was selected in draw $j$. Thus:

$$t_i = \sum_{j=1}^{n} t_{ij} \tag{16.23}$$

Since for any draw $j$, $E[t_{ij}] = 1/N$, the expected value of $t_i$ is again $n/N$, but $t_i$ may now take values greater than 1. The draws $t_{ij}$ being independent, and each draw having a binomial distribution with parameter $1/N$, we have that

$$\mathrm{var}(t_i) = n\,\frac{1}{N}\left(1 - \frac{1}{N}\right) \tag{16.24}$$

which is the variance of a multinomial distribution with parameters $n$ and $1/N$. It can be checked that the covariance cov($t_i$, $t_j$) is given by $-n/N^2$. Substituting var($t_i$) and cov($t_i$, $t_j$) into (16.19) again, we now find

$$\mathrm{var}(\hat{Y}) = N^2\,\frac{N-1}{N}\,\frac{S^2}{n} \tag{16.25}$$

This is larger than (16.22): the difference between the two results equals

$$N^2\,\frac{n-1}{N}\,\frac{S^2}{n} \tag{16.26}$$

and depends on the magnitude of $n$ relative to $N$. The larger the value of $n$ relative to $N$, the greater the sampling precision gains that there are in sampling without replacement.

16.5.4 Weighting

We follow once more the approach of Cochran (1977) and Deaton (1998), pp.45-49. Suppose that we are again interested in estimating the variance of the estimator $\hat{Y}$ of a total $Y$, but for simplicity assume that sampling is done with replacement so that we can for now ignore FPC factors. $\hat{Y}$ is now defined as:

$$\hat{Y} = \sum_{i=1}^{N} t_i w_i x_i \tag{16.27}$$
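Taking its variance, we find

$$\mathrm{var}(\hat{Y}) = \sum_{i=1}^{N} w_i^2 x_i^2\, \mathrm{var}(t_i) + \sum_{i=1}^{N}\sum_{j \neq i} w_i w_j x_i x_j\, \mathrm{cov}(t_i, t_j) \tag{16.28}$$

$t_i$ follows once more a multinomial distribution, but now with var($t_i$) = $n\pi_i(1 - \pi_i)$ and cov($t_i$, $t_j$) = $-n\pi_i\pi_j$. Substituting this into (16.28), and recalling that $w_i = (n\pi_i)^{-1}$, we find

$$\mathrm{var}(\hat{Y}) = n^{-1}\sum_{i=1}^{N} \pi_i\left(\frac{x_i}{\pi_i} - Y\right)^2 \tag{16.29}$$

To estimate (16.29), we can replace population values by sample values and thus use the estimator

$$\widehat{\mathrm{var}}(\hat{Y}) = n^{-2}\sum_{i=1}^{n}\left(\frac{x_i}{\pi_i} - \hat{Y}\right)^2 = \sum_{i=1}^{n}\left(w_i x_i - \frac{\hat{Y}}{n}\right)^2 \tag{16.30}$$

Denote as $y_i = w_i x_i$, $i = 1, ..., n$, the $n$ sample values of $w_i x_i$, and let $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i = \hat{Y}/n$ be their mean. The estimator in (16.30) can then be rewritten as

$$\widehat{\mathrm{var}}(\hat{Y}) = \frac{n}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \tag{16.31}$$

with the difference that a familiar $n/(n - 1)$ small-sample correction factor has been introduced in (16.31) to correct for the small-sample bias in estimating the variance of the $y_i$. Incorporating weights in the estimation of sampling variances is thus relatively straightforward.

16.5.5 Summary

The above material calls to mind the importance for statistical offices of making available sampling design information. This includes providing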
Equipped with this information, distributive analysts can provide reliable estimates of the sampling precision of their estimators².

16.6 Estimating a sampling distribution with complex sample designs

We provide in this section a detailed account of the computation of sampling variances in DAD, taking full account of the sampling design. Let:
² DAD: Edit | Set Sample Design.
The sampling covariance of two totals,
where
— note the similarity with (16.21) and (16.22) — and where fh, is a function of a user-specified FPC factor, fpch, for stratum h, such that,
Recall that setting fh ≠ 0 is useful only when the sampling design is of the form either of simple random sampling or of stratified random sampling with no subsequent sub-sampling within the PSU's selected. In both cases, sampling must have been done without replacement. The variance An often-used indicator of the impact of sampling design on sampling variability is called the design effect, deff. The design effect is the ratio of the design-based estimator of the sampling variance
For simple random sampling, we would have that
and, recalling (16.22), the sampling variance of
where var(y) is the variance of the population yhij, and where
Some of the above variables often take familiar forms and names:
16.7 References

General references on estimation and inference taking into account survey design include Asselin (1984) and Cochran (1977). Applications to economic analysis are discussed and presented in Deaton (1998), Howes and Lanjouw (1998) (focussing on poverty analysis), and Zheng (2002) (with a specific focus on Lorenz curves). Alternative approaches to taking into account survey design can be found inter alia in Cowell (1989) (modelling sampling weights as jointly distributed with living standards), and in Biewen (2002b) and Schluter and Trede (2002a) (for dependence across members of the same sampling unit, households in their case). Kennickell and Woodburn (1999) illustrate the impact of a consistent estimation of survey weights in the US Surveys of Consumer Finances for the analysis of the distribution of wealth.