15.1 Density estimation

15.1.1 Univariate density estimation

It is often useful to visualize the shapes of income distributions. There are essentially two main approaches to doing so, and a mixture of the two. The first approach uses parametric models of income distributions. These models assume that the income distribution follows a known functional form, but with unknown parameters. Popular examples of such functional forms include the log-normal, the Pareto, and variants of the beta or gamma distributions. The main statistical challenge is then to estimate the unknown parameters of that functional form, and to test whether a given functional form fits the observed distribution of income better than another.

The second approach does not posit a particular functional form and does not require the estimation of functional parameters. Instead, it lets the data entirely "speak for themselves". It is therefore said to be non-parametric. The method is most easily understood by starting with a review of the density estimation used by traditional histograms. Histograms estimate the density of a variable y by counting how many observations fall into "bins", and by dividing that number by the width of the bin times the number of observations in the sample. To see this more clearly, denote the origin of the bins by $y_0$ and the bins of the histogram by $[y_0 + mh,\, y_0 + (m+1)h]$ for positive or negative integers m. For instance, if we take m = 0, then the bin is the interval ranging from the origin to the origin plus h. Also, let n be the number of observations in the sample. The histogram estimate of the density at y is then

$$\hat f(y) = \frac{1}{nh}\,\big(\text{number of } y_i \text{ in the same bin as } y\big).$$
Such a histogram is shown in Figure 15.1 by the rectangles of varying heights over identical widths, starting with origin $y_0$. For bins defined by $[y_0 + mh,\, y_0 + (m+1)h]$, the bin width is a constant set to h, but we can also allow the widths to vary across the bins of the histogram. The choice of h controls the amount of smoothing performed by the histogram. A small bin width h will lead to significant fluctuations in the value of the histogram, and a very large width will set the histogram to the constant $h^{-1}$ (all observations then fall into a single bin). Choosing an appropriate value for such a smoothing parameter is in fact a pervasive preoccupation in non-parametric estimation procedures, as we will discuss later. The choice of the origin can also be important, especially when n is not very large. There is, however, little guidance on that latter choice, except perhaps when the nature of the data suggests a natural value for $y_0$. One way to avoid choosing $y_0$ is to construct what will soon appear to be a "naive" kernel density estimator, that is, one in which the point y is always at the center of its own bin of width 2h:

$$\hat f(y) = \frac{1}{2nh}\,\big(\text{number of } y_i \text{ in } (y-h,\, y+h)\big).$$
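The two estimators can be compared with a minimal Python sketch. The function names and the toy sample are illustrative assumptions, not part of DAD. The point is only that the histogram value at y changes with the origin $y_0$, whereas the naive estimator needs no origin at all.

```python
import math

def histogram_density(y, sample, y0, h):
    """Histogram estimate at y: number of y_i in y's bin, divided by n*h.

    Bins are [y0 + m*h, y0 + (m + 1)*h) for integer m.
    """
    m = math.floor((y - y0) / h)  # index of the bin containing y
    count = sum(1 for yi in sample if math.floor((yi - y0) / h) == m)
    return count / (len(sample) * h)

def naive_density(y, sample, h):
    """Naive estimate at y: number of y_i in (y - h, y + h), divided by 2*n*h."""
    count = sum(1 for yi in sample if y - h < yi < y + h)
    return count / (2 * len(sample) * h)

sample = [1.0, 1.2, 1.9, 2.1, 2.4, 3.5, 3.7, 4.0]
print(histogram_density(2.0, sample, y0=0.0, h=1.0))  # bin [2, 3): 0.25
print(histogram_density(2.0, sample, y0=0.5, h=1.0))  # bin [1.5, 2.5): 0.375
print(naive_density(2.0, sample, h=0.5))              # window (1.5, 2.5): 0.375
```

Shifting the origin by half a bin changes the estimate at y = 2 from 0.25 to 0.375, which illustrates why the choice of $y_0$ matters in small samples.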
This naive estimator can also be obtained from the use of a weight function w(u), defined as:

$$w(u) = \begin{cases} 1/2 & \text{if } |u| < 1, \\ 0 & \text{otherwise,} \end{cases}$$
and by defining

$$\hat f(y) = \frac{1}{nh}\sum_{i=1}^{n} w\!\left(\frac{y - y_i}{h}\right).$$
This frees the density estimation from the choice of $y_0$. This naive estimator can also be improved statistically by choosing weighting functions that are smoother than w(u) in (15.3). For this, we can think of replacing the weight function w(u) by a general "kernel function" K(u), such that $\int_{-\infty}^{\infty} K(u)\,du = 1$, and by estimating the density as[1]

$$\hat f(y) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{y - y_i}{h}\right).$$

[1] DAD: Distribution | Density Function.

A smooth kernel estimate of the density function that generated the histogram is shown in Figure 15.1.

Figure 15.1: Histograms and density functions
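As a check on the weight-function form, the sketch below codes the kernel estimator for an arbitrary kernel and plugs in w(u); it should reproduce the count-based naive estimate exactly. The names `w` and `kernel_density` and the toy sample are illustrative assumptions.

```python
def w(u):
    """Naive weight function: 1/2 on (-1, 1), 0 elsewhere."""
    return 0.5 if abs(u) < 1 else 0.0

def kernel_density(y, sample, h, kernel):
    """Kernel estimate (1 / (n*h)) * sum_i K((y - y_i) / h) for a given kernel K."""
    return sum(kernel((y - yi) / h) for yi in sample) / (len(sample) * h)

sample = [1.0, 1.2, 1.9, 2.1, 2.4, 3.5, 3.7, 4.0]
# With K = w this is the naive estimator: 3 points in (1.5, 2.5) -> 3 / (2 * 8 * 0.5)
print(kernel_density(2.0, sample, h=0.5, kernel=w))  # 0.375
```

Any other function that integrates to one can be passed as `kernel`, which is exactly the generalization from w(u) to K(u) described above.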
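In general, we would wish K(u) to be symmetric around zero (E: 18.5.1). A popular choice is the Gaussian kernel:

$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}.$$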
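With the Gaussian kernel, the estimator can be sketched as follows. Each rescaled bump $(nh)^{-1}K((y - y_i)/h)$ integrates to $1/n$, so a simple Riemann sum over a wide grid should return an area very close to one. The grid limits, step, window width and sample are arbitrary choices for this illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(y, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate at y with window width h."""
    return sum(kernel((y - yi) / h) for yi in sample) / (len(sample) * h)

sample = [1.0, 1.2, 1.9, 2.1, 2.4, 3.5, 3.7, 4.0]
h, step = 0.4, 0.01
grid = [-5.0 + step * k for k in range(1301)]  # covers [-5, 8]
area = sum(kde(g, sample, h) for g in grid) * step
print(round(area, 6))  # approximately 1
```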
The "bumps" provided by the Gaussian kernel have the familiar bell shapes, are smoothly differentiable up to any desired order, and are such that $\hat f(y)$ inherits these smoothness properties.

15.1.2 Statistical properties of kernel density estimation

The efficiency of non-parametric estimation procedures is usually measured by the mean square error (MSE) incurred in estimating the function f(y) at a point y. The MSE in estimating f(y) by $\hat f(y)$ is

$$\mathrm{MSE}\big(\hat f(y)\big) = E\Big[\big(\hat f(y) - f(y)\big)^2\Big].$$
The most common measure of global accuracy simply integrates the mean square error over all values of y. This yields the mean integrated square error (or MISE)[2], a measure of the accuracy of estimating f(y) over the whole range of y:

$$\mathrm{MISE}(\hat f) = \int E\Big[\big(\hat f(y) - f(y)\big)^2\Big]\,dy.$$
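When the true density is known, as in a simulation, the MISE can be approximated by Monte Carlo: draw repeated samples, compute the integrated squared error of each estimate on a grid, and average. The sketch below assumes a standard normal f(y) and a Gaussian kernel; the grid, sample size, number of replications and seed are illustrative choices. The value h = 0.42 is roughly the normal-based window width discussed in Section 15.1.3 for n = 100.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density: used both as K(u) and as the true f(y)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(y, sample, h):
    return sum(gaussian_kernel((y - yi) / h) for yi in sample) / (len(sample) * h)

def mise_monte_carlo(n, h, reps=20, seed=1):
    """Average over `reps` samples of the integrated squared error on [-5, 5]."""
    rng = random.Random(seed)
    step = 0.1
    grid = [-5.0 + step * k for k in range(101)]
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # draws from the true f = N(0, 1)
        ise = sum((kde(g, sample, h) - gaussian_kernel(g)) ** 2 for g in grid) * step
        total += ise
    return total / reps

print(mise_monte_carlo(100, h=0.42))  # near the normal-based optimum
print(mise_monte_carlo(100, h=3.0))   # heavily oversmoothed, much larger MISE
```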
[2] DAD: Distribution | Density Function.

The relative efficiency of a particular choice of kernel function K(u) can then be assessed against the kernel function that minimizes the MISE. The Gaussian kernel has very good efficiency properties, although they are not quite as good as those of some other (less smooth) kernel functions, such as the (efficiency-optimal) Epanechnikov, the biweight or the triangular kernels, which are described and discussed for instance in Silverman (1986) (see in particular Table 3.1).

15.1.3 Choosing a window width

Even if we were to agree on a particular shape for an argument-centered kernel function, however, there would still remain the question of which window width to choose. Again, conditional on the choice of a particular form for K(u), we can choose the window width that minimizes the MISE. To see what this implies, note first that we can decompose the MSE at y as the sum of the squared bias and the variance of $\hat f(y)$:

$$\mathrm{MSE}\big(\hat f(y)\big) = \Big(E\big[\hat f(y)\big] - f(y)\Big)^2 + \mathrm{Var}\big[\hat f(y)\big].$$
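For symmetric kernel functions, the bias can be shown to be approximately equal to

$$E\big[\hat f(y)\big] - f(y) \approx \frac{h^2}{2}\, f^{(2)}(y) \int u^2 K(u)\,du, \qquad (15.10)$$

where, as before, $f^{(i)}(y)$ stands for the ith-order derivative of f(y). The variance is approximately

$$\mathrm{Var}\big[\hat f(y)\big] \approx \frac{f(y)\, c_K}{nh}, \qquad (15.11)$$

where

$$c_K = \int K(u)^2\,du.$$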
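The two kernel constants in (15.10) and (15.11), $\int u^2K(u)\,du$ and $c_K$, also determine how kernels compare. At the MISE-minimizing window width the MISE is proportional to $(\int u^2K)^{2/5}\,c_K^{4/5}$, so the efficiency of a kernel K relative to the Epanechnikov kernel $K_E$ can be computed as $\sqrt{\int u^2K_E}\;c_{K_E} / (\sqrt{\int u^2K}\;c_K)$. The sketch below evaluates these integrals numerically; the results should be close to the efficiencies reported in Silverman (1986, Table 3.1). The kernel scalings and quadrature settings are choices made for this illustration; the efficiency ratio does not depend on the scaling.

```python
import math

def integrate(f, a, b, m=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    step = (b - a) / m
    return sum(f(a + (k + 0.5) * step) for k in range(m)) * step

# kernel, and the interval used to integrate it
KERNELS = {
    "gaussian": (lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi), -10.0, 10.0),
    "epanechnikov": (lambda u: 0.75 * (1.0 - u * u), -1.0, 1.0),
    "biweight": (lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2, -1.0, 1.0),
    "triangular": (lambda u: 1.0 - abs(u), -1.0, 1.0),
}

def kernel_constants(name):
    """Return (integral of u^2 K(u), c_K = integral of K(u)^2)."""
    K, a, b = KERNELS[name]
    mu2 = integrate(lambda u: u * u * K(u), a, b)
    c_k = integrate(lambda u: K(u) ** 2, a, b)
    return mu2, c_k

def efficiency(name):
    """MISE efficiency of a kernel relative to the Epanechnikov kernel."""
    mu2_e, c_e = kernel_constants("epanechnikov")
    mu2_k, c_k = kernel_constants(name)
    return (math.sqrt(mu2_e) * c_e) / (math.sqrt(mu2_k) * c_k)

for name in KERNELS:
    print(name, round(efficiency(name), 4))
```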
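Hence, considering (15.10), we find that the bias of $\hat f(y)$ increases with h (it is of order $h^2$) and with the curvature of f at y. Looking at (15.11), we find ceteris paribus that a flatter kernel (i.e., one with a lower $c_K$) decreases the variance of $\hat f(y)$, as does a larger sample size n. An increase in h plays an offsetting role on the precision of $\hat f(y)$: it raises the bias but lowers the variance. Integrating the squared bias and the variance over y and minimizing the resulting (approximate) MISE with respect to h gives the optimal window width

$$h^* = \left[\frac{c_K}{n \left(\int u^2 K(u)\,du\right)^2 \int \big(f^{(2)}(y)\big)^2 dy}\right]^{1/5}.$$

When K(u) is the Gaussian kernel and f(y) is a normal density with standard deviation σ, this reduces to

$$h^* = \left(\tfrac{4}{3}\right)^{1/5} \sigma\, n^{-1/5} \approx 1.06\,\sigma\, n^{-1/5}.$$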
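This value of $h^*$ is conditional on both K(u) and f(y) being normal density functions. Silverman (1986) also argues for a more robust choice of $h^*$, given by

$$h^* = 0.9\, A\, n^{-1/5}, \qquad (15.14)$$

where A = min(standard deviation, interquartile range/1.34); for a normal distribution, the interquartile range is about 1.34 standard deviations. This is because (15.14) is less likely than the normal-based value to oversmooth the density when f(y) is skewed, heavy-tailed or multimodal.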
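Both window-width rules are easy to compute from a sample. In the sketch below the quartiles come from Python's `statistics.quantiles` with its default "exclusive" method, which is only one of several quartile conventions; that choice and the function names are assumptions of this illustration. With a single extreme observation, the robust rule reacts far less than the normal-based one.

```python
import statistics

def normal_reference_h(sample):
    """h* = 1.06 * sigma * n^(-1/5): optimal when K and f are both normal."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)

def robust_h(sample):
    """Robust rule (15.14): h* = 0.9 * A * n^(-1/5), A = min(sd, IQR / 1.34)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartiles, 'exclusive' method
    a = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * a * len(sample) ** (-0.2)

regular = list(range(1, 33))                 # 32 evenly spaced values
with_outlier = list(range(1, 32)) + [1000]   # same size, one extreme value
for s in (regular, with_outlier):
    print(round(normal_reference_h(s), 2), round(robust_h(s), 2))
```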
Further (asymptotic) results show that, under some mild assumptions (in particular, that the density function f(y) is continuous at y, and that $h \to 0$ and $nh \to \infty$ as $n \to \infty$), the kernel estimator $\hat f(y)$ is a consistent estimator of f(y).

15.1.4 Multivariate density estimation

Kernel estimation can also be used for multivariate density estimation. Let u, y and $y_i$ be d-dimensional vectors. We can estimate a d-dimensional density function as[3]

$$\hat f(y) = \frac{1}{nh^d}\sum_{i=1}^{n} K\!\left(\frac{y - y_i}{h}\right),$$
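where h is a window width common to all of the dimensions. The multivariate Gaussian kernel is given by

$$K(u) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\, u^\top u\right).$$

[3] DAD: Distribution | Joint Density Function.

15.1.5 Simulating from a density estimate

Simulations from an estimated density are sometimes needed to compute estimates of functionals of the unknown true density function. This is the case, for instance, for the estimation in DAD of indices of classical horizontal inequity (HI). The estimation of such indices requires information on the net income distribution of those who have the same gross incomes, and such information cannot be gathered directly from sample observations of net and gross incomes, since very few (if any) exact equals can be observed in random samples of finite size. Another use of simulated distributions is for computing bootstrap estimates of the sampling distribution of some estimators. The usual bootstrap procedure proceeds by conducting successive random sampling (with replacement) from the original sample $y_1, \dots, y_n$.

Consider first the case of generating J independent realizations, $y_j^s$, $j = 1, \dots, J$, from the univariate density estimate $\hat f(y)$:

Step 1 Choose i with replacement from $\{1, \dots, n\}$;
Step 2 Choose ε randomly using the probability density function K;
Step 3 Set $y_j^s = y_i + h\varepsilon$.

Note that this algorithm does not even require computing $\hat f(y)$ directly.

For the multivariate case, the above algorithm becomes just slightly more complicated. For instance, for the estimation of classical HI at gross income x, we need to generate a random sample of net incomes, $y_j^s$, $j = 1, \dots, J$, from the estimated distribution of net incomes conditional on gross income x. Denoting by $(x_i, y_i)$ the sample gross and net incomes, the steps become:

Step 1 Choose i with replacement from $\{1, \dots, n\}$, with probability
$$p_i(x) = \frac{K\big((x - x_i)/h\big)}{\sum_{k=1}^{n} K\big((x - x_k)/h\big)};$$
Step 2 Choose ε randomly using the probability density function K;
Step 3 Set $y_j^s = y_i + h\varepsilon$.

This gives a simulated sample of net incomes $y_j^s$, $j = 1, \dots, J$, at gross income x.

Because they follow an estimated density function that is on average smoother than the true one, the simulated samples generated by the above algorithms will have a variance that is generally larger than both the variance observed in the sample and the true population variance. Let for instance the sample variance of the $y_i$ be denoted as $\hat\sigma^2$, their mean as $\bar y$, and let $\sigma_K^2 = \int u^2 K(u)\,du$ be the variance of the kernel. The variance of the simulated sample can be brought back to $\hat\sigma^2$ by replacing Step 3 with

Step 3' Set $y_j^s = \bar y + \dfrac{y_i - \bar y + h\varepsilon}{\sqrt{1 + h^2\sigma_K^2/\hat\sigma^2}}$

in the univariate case. For the bivariate case, we also use Step 3', but replace $\bar y$ by

$$\bar y(x) = \sum_{i=1}^{n} p_i(x)\, y_i \qquad (15.16)$$

and $\hat\sigma^2$ by

$$\hat\sigma^2(x) = \sum_{i=1}^{n} p_i(x)\,\big(y_i - \bar y(x)\big)^2.$$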
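The univariate algorithm, including the variance correction of Step 3', can be sketched as follows with a Gaussian kernel, whose variance $\int u^2K(u)\,du$ equals 1. The sample variance is computed here with divisor n, the variance of the discrete sample distribution, which makes the correction exact in expectation; that divisor, the function name and the simulated data are assumptions of this illustration.

```python
import math
import random
import statistics

def smoothed_bootstrap(sample, h, J, rng, rescale=True):
    """Draw J values from a Gaussian-kernel density estimate of `sample`.

    Steps 1-3: pick y_i at random, then add h * eps with eps ~ K = N(0, 1).
    If rescale is True, Step 3' shrinks each draw toward the sample mean so
    that the expected variance of the draws equals the sample variance.
    """
    n = len(sample)
    ybar = statistics.fmean(sample)
    s2 = statistics.pvariance(sample)   # divisor n (assumption, see text)
    sigma_k2 = 1.0                      # variance of the Gaussian kernel
    shrink = 1.0 / math.sqrt(1.0 + h * h * sigma_k2 / s2)
    draws = []
    for _ in range(J):
        yi = sample[rng.randrange(n)]   # Step 1: choose i with replacement
        eps = rng.gauss(0.0, 1.0)       # Step 2: eps drawn from K
        if rescale:
            draws.append(ybar + (yi - ybar + h * eps) * shrink)  # Step 3'
        else:
            draws.append(yi + h * eps)                           # Step 3
    return draws

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
raw = smoothed_bootstrap(sample, h=0.5, J=100_000, rng=rng, rescale=False)
adj = smoothed_bootstrap(sample, h=0.5, J=100_000, rng=rng, rescale=True)
print(statistics.pvariance(sample), statistics.pvariance(raw), statistics.pvariance(adj))
```

Without Step 3' the simulated variance exceeds the sample variance by roughly $h^2 = 0.25$; with it, the two variances essentially coincide.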
Equation (15.16) is in fact an example of a kernel regression of y on x, a procedure to which we now turn.

15.2 Non-parametric regressions

The estimation of an expected relationship between variables is the second main area of recent application of kernel estimation techniques. Non-parametric regressions offer several useful applications in distributive analysis. An example of such an application is the estimation of the relationship between expenditures and calorie intake. Regressing expenditure non-parametrically on calorie intake does not impose a fixed functional relationship between those two variables along the entire range of calorie intake. On the contrary, it allows a fair amount of flexibility by estimating the link between the two variables through a local weighting procedure. The local weighting procedure essentially considers the expenditures of those individuals with a calorie intake in the "region" of the specified calorie intake. It weights those expenditures with weights that decrease rapidly with the distance from the specified calorie intake. Hence, those with calorie intake far from the specified level will contribute little to the estimation of the expenditure needed to attain that level. The results of this method are thus less affected by the presence of "outliers" in the distribution of incomes, and less prone to biases stemming from an incorrect specification of the link between spending and calorie intake.

Basically, then, one is interested in estimating the predicted response, m(x), of a variable y at a given value of a (possibly multivariate) variable x, that is,

$$m(x) = E[\,y \mid x\,].$$
Alternatively, if the joint density f(x, y) exists and if f(x) > 0, m(x) can also be defined as:

$$m(x) = \frac{\int y\, f(x, y)\,dy}{f(x)}.$$
The difficulty in estimating the function m(x) is that we typically do not observe in a sample a response of y at that particular value of x. Furthermore, even if we do, there are rarely other observations with exactly the same value of x that would allow us to compute reliably the expected response in which we are interested. Let then $(x_i, y_i)$, $i = 1, \dots, n$, denote the sample observations, and define $u_i = (x - x_i)/h$.
To estimate m(x), kernel regression techniques use a local averaging procedure that involves weights K(u) analogous to those used in Section 15.1 for density estimation. Recalling (15.5) and (15.19), this leads to the following Nadaraya-Watson non-parametric estimator of m(x)[4] (E: 18.8.1):

$$\hat m(x) = \frac{\sum_{i=1}^{n} K\big((x - x_i)/h\big)\, y_i}{\sum_{i=1}^{n} K\big((x - x_i)/h\big)}.$$
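A direct sketch of the Nadaraya-Watson estimator with a Gaussian kernel: it is simply a weighted average of the $y_i$, with weights $K((x - x_i)/h)$. With an evenly spaced design and a linear m(x), the symmetric weights at an interior point reproduce the line exactly, which makes a convenient check. The data, window width and function names are illustrative assumptions.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Kernel-weighted average of ys, with weights K((x - x_i) / h)."""
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation receives positive weight at x")
    return sum(wi * yi for wi, yi in zip(weights, ys)) / total

xs = [k / 10 for k in range(101)]   # evenly spaced design on [0, 10]
ys = [2.0 * xi for xi in xs]        # linear m(x) = 2x, no noise
print(nadaraya_watson(5.0, xs, ys, h=0.5))  # interior point: approximately 10.0
```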
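To reduce the bias of using neighboring $y_i$'s, the kernel weights $K\big((x - x_i)/h\big)$ give less importance to observations whose $x_i$ lies farther from x. As in the case of the kernel density estimators, the kernel smoother $\hat m(x)$ is nevertheless biased; for symmetric kernels, its asymptotic bias is approximately

$$E\big[\hat m(x)\big] - m(x) \approx h^2 \int u^2 K(u)\,du \left(\frac{m^{(2)}(x)}{2} + \frac{m^{(1)}(x)\, f^{(1)}(x)}{f(x)}\right),$$

where f(x) is the density of x.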
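This asymptotic bias can be estimated consistently using estimates of $m^{(2)}(x)$, $m^{(1)}(x)$, $f^{(1)}(x)$ and $f(x)$. Such an estimation, however, significantly complicates the computation of the sampling distribution of $\hat m(x)$.

[4] DAD: Distribution | Non-Parametric Regression.

The variance of $\hat m(x)$ is approximately

$$\mathrm{Var}\big[\hat m(x)\big] \approx \frac{\sigma^2(x)\, c_K}{nh\, f(x)},$$

where $\sigma^2(x) = \mathrm{Var}(y \mid x)$ and $c_K = \int K(u)^2\,du$.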
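The variance approximation can be turned into a rough plug-in standard error. The sketch below assumes a Gaussian kernel (so $c_K = 1/(2\sqrt{\pi})$), estimates f(x) with the kernel density estimator of Section 15.1, and estimates $\sigma^2(x)$ by a kernel regression of squared residuals on x. That last choice is only one possibility and is an assumption of this illustration, as are the simulated data and the function names. The sketch ignores the bias term.

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h):
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(wi * yi for wi, yi in zip(weights, ys)) / sum(weights)

def kde(x, xs, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in xs) / (len(xs) * h)

def nw_variance(x, xs, ys, h):
    """Plug-in approximation: Var[m_hat(x)] ~ sigma2(x) * c_K / (n * h * f(x))."""
    c_k = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K(u)^2 for the Gaussian kernel
    resid2 = [(yi - nadaraya_watson(xi, xs, ys, h)) ** 2 for xi, yi in zip(xs, ys)]
    sigma2 = nadaraya_watson(x, xs, resid2, h)  # assumed estimator of Var(y | x)
    return sigma2 * c_k / (len(xs) * h * kde(x, xs, h))

rng = random.Random(3)
xs = [rng.uniform(0.0, 1.0) for _ in range(400)]
ys = [math.sin(2.0 * math.pi * xi) + rng.gauss(0.0, 0.5) for xi in xs]
var = nw_variance(0.5, xs, ys, h=0.1)
print(var, math.sqrt(var))  # approximate variance and standard error of m_hat(0.5)
```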
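The conditional variance $\sigma^2(x) = \mathrm{Var}(y \mid x)$ must itself be estimated, for instance by a kernel regression of the squared residuals $(y_i - \hat m(x_i))^2$ on $x_i$.

15.3 References

This chapter draws significantly from Silverman (1986) and Härdle (1990), to which readers are referred for more details and in-depth analysis.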