14.1 Introduction

DAD (which stands for "Distributive analysis/Analyse distributive") is designed to facilitate the analysis and comparison of social welfare, inequality, poverty and equity using micro (or disaggregated) data. It is freely distributed and its use does not require purchasing any commercial software. DAD's features include the estimation of a large number of indices and curves that are useful for distributive comparisons. It also provides various statistical tools to enable statistical inference. Many of DAD's features are useful for estimating the impact of programs (and reforms to these programs) on poverty and equity.

The first version of DAD was launched in September 1998. It initially came to life following a request by the Canadian International Development Research Centre (IDRC) to Université Laval to support research then carried out in Africa in the context of the IDRC's program on the Micro Impacts of Macro-economic and Adjustment Policies (MIMAP). Improved versions of DAD subsequently appeared as errors and bugs were corrected and as attempts were made to make it more reliable, more flexible and broader in scope. The current version (January 2006) is 4.4.

Several factors motivated us in the process of building DAD. First, there seemed to be an ever-increasing need for developing-country analysts to carry out poverty and inequality "profiles". Much of development policy is indeed now assessed through poverty criteria, and this is carried out among other things through the elaboration of poverty assessments, poverty reduction strategy papers (the now well-known PRSPs), poverty and social impact analyses, etc. Much of this distributive assessment had earlier typically been done by foreign consultants and by international organizations' technical staff. Little was left in the form of national capacity building and local empowerment following these largely external exercises.
Local researchers and national policy analysts typically felt alienated by these poverty assessments that they often did not understand and that they could not usually influence. To break that segregation between foreign experts and local policy makers and analysts, it seemed useful to introduce tools that would benefit developing-country analysts pedagogically and operationally.

Second, micro-data accessibility was becoming less of a problem for developing-country researchers. This followed what had occurred in more developed countries some 20 years earlier, when data tapes and records started to circulate widely in research centers and universities. This was made possible in large part by the amazing increase in storage capacity and processing speed that the computer revolution was creating. Developing-country analysts were gaining from the same advances, though with some lag due to tighter resource constraints. Furthermore, in addition to the computing and technical demands that handling large data sets involved, developing-country analysts often had to deal with data accessibility difficulties. This meant inter alia having to face skepticism and rent-seeking behavior from statistical agencies and international organization staff when requesting access to data that were supposed in principle to be public. That problem had also become less severe by the end of the 1990s, in part due to outside pressure. Processing and analyzing these data then typically became the next barrier to break.

Third, much of distributive analysis was (and still is) handled as if it were not subject to statistical uncertainty. Indeed, a considerable amount of energy and resources seems to be wasted in discussions of poverty and inequality "results" that cannot be trusted on formal statistical grounds. Even changes in poverty headcounts of around 4% or 5% are often statistically insignificant within the usual statistical precision criteria.
Needless to say, the efforts deployed by analysts and policy makers to account for variations of less than 1% or 2% (as often occurs) in poverty rates are typically a pure loss of resources. This unfortunate state of affairs needed to be remedied by a much greater use of appropriate statistical techniques. Though conceptually relatively simple, the use of these techniques nevertheless required reading through some technical literature as well as writing tedious computer programs. DAD was in large part written to help bypass these hurdles. Achieving this meant clearing the ground of statistically insignificant results and leaving more time and resources for the interpretation of those distributive findings that were statistically significant.

DAD was thus conceived to help policy analysts and researchers analyze poverty and equity using disaggregated data. An overriding operational objective was to make DAD's environment as accessible and as user-friendly as possible. Carl Fortin, our co-author, convincingly argued from the start that we should program DAD in the Java programming language. An object-oriented language, Java created a new paradigm of platform independence: once written, Java applications could run on any operating system as well as on the internet. Introduced by Sun in 1995, Java could still be considered in 1998 to be an infant programming language. By now, however, it has become an important pillar of the programming and internet industry. To make DAD completely free of charge, we also chose not to tie its use to commercial software such as Excel, SPSS, SAS or STATA. We therefore opted to design DAD from scratch using some of Java's packages as building blocks.

To make DAD as user-friendly as possible, we use pop-up application windows and spreadsheets as the main working tools. This enables users to visualize a lot of information at a glance, and to manage that information easily.
Most of the relevant variables and options needed for running applications can be selected from single application windows. DAD's use of spreadsheets has the advantage of displaying the entire data sets to be used. Small data sets can easily be entered manually. Changes to cell values can be made directly on the spreadsheet. The results of operations on data vectors can be checked easily. DAD also allows loading two databases simultaneously, and makes it possible to display each of these two databases alternately on the spreadsheet. This makes it easy to carry out applications with either one or two databases. That structure also enables DAD to account for whether the databases are independent when it comes to computing standard errors on distributive estimators that use information from two samples.

14.2 Loading, editing and saving databases in DAD

DAD's databases are displayed on spreadsheets similar to those of SPSS, STATA, or Microsoft's Excel (see Figure 14.1). Every line in a sheet represents one observation or one data "record". Typically, an observation consists of one of the sampling or statistical units that were drawn into a survey. In distributive analyses, a sampling unit is often a household since it is households that are typically the last sampled units in surveys. When observations represent households, there will thus be as many lines or observations in the data as there are households drawn into the household survey. The statistical units (or units of interest) are usually (for ethical reasons) the individuals. Even though the sampling units originally drawn into the survey may have been the households, data sets are sometimes re-organized in such a way that each individual in a household is assigned his or her own line of data. There will then be as many observations in a data set as there are individuals found in the households.
A database used in DAD is then a matrix (a set of columns) whose length is the number of observations discussed above and whose width is the number of variables contained in the database. Each column displays the values of a variable. A variable has as many values as there are observations in the database. All columns in DAD are therefore of the same length. Variable values can have a "float" format (indicating, for example, the level of household income) or an "integer" format (showing, for instance, the socio-economic category to which a household belongs).
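The rectangular structure described above can be illustrated with a small Python sketch. It is not DAD's internal representation (DAD is written in Java and its data classes are not documented here); the class name `Database`, its methods and the example variables are all hypothetical and only show the idea of equal-length columns holding "float" or "integer" values.

```python
class Database:
    """Column-oriented data set in which every variable has the same length.

    Hypothetical illustration only; this is not DAD's own data structure.
    """

    def __init__(self):
        self.columns = {}  # variable name -> list of values

    @property
    def n_observations(self):
        # All columns share one length, so the first column is representative.
        for values in self.columns.values():
            return len(values)
        return 0

    @property
    def n_variables(self):
        return len(self.columns)

    def add_variable(self, name, values):
        values = list(values)
        # Enforce the rectangular shape: new columns must match existing ones.
        if self.columns and len(values) != self.n_observations:
            raise ValueError(
                f"variable {name!r} has {len(values)} values, "
                f"expected {self.n_observations}"
            )
        self.columns[name] = values

    def record(self, i):
        """Return observation i (one spreadsheet line) as a dict."""
        return {name: values[i] for name, values in self.columns.items()}


db = Database()
db.add_variable("income", [1250.0, 830.5, 2100.0, 460.75])  # "float" variable
db.add_variable("category", [2, 1, 3, 1])                    # "integer" variable
print(db.n_observations, db.n_variables)
print(db.record(1))
```

Storing the data by column rather than by record mirrors the description of a database as a set of variables, and makes the equal-length requirement easy to check whenever a variable is added.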
Figure 14.1: The spreadsheet for handling and visualizing data in DAD.

There are several options for entering data into DAD. The first is to create a new database in DAD and then enter the variable values manually. This can be useful for exploratory or pedagogical purposes. Clearly, however, this option is not convenient for entering large databases into DAD.

A second option, for reading existing databases into DAD, is to use the usual copy/paste facilities. Before doing this, however, a new database must be created in DAD and then assigned a number of observations (or size) that corresponds to the length of the variables that will be copied and pasted.

The third possibility for entering data into DAD is typically more reliable (and also faster) than the first two and involves two steps. The first step saves the database in an ASCII (or text) format. The way in which this is done in practice depends on the software in which the data were previously handled. DAD's Users Manual gives examples of such output procedures for several common commercial software packages. One fast alternative is offered by STAT/TRANSFER (note, however, that this requires buying a license), which transforms databases rapidly from the most popular formats into an ASCII format. Once the database is in ASCII format, it can easily be imported, in the second step, using DAD's Data Import Wizard. The wizard ensures inter alia that the imported database does not contain missing or unreadable values.

Once the data are read into DAD, they can be submitted to a number of arithmetical and logical operations, variable names can be added or changed, and new variables can be created. Databases can subsequently be saved in DAD's preferred ASCII format (identified by the extension .daf). As already mentioned, many of DAD's applications can use two databases simultaneously.
To use a second database, the user should first activate a second file by clicking on the button File2, and then follow the same procedures as for loading a first file.

14.3 Inputting the sampling design information

The process of generating random surveys usually displays four important characteristics (this is discussed in more detail in Chapter 16):
Recent versions of DAD make it possible to take that structure into account in the estimation of the various distributive statistics as well as in the computation of the sampling distributions of these statistics. When a data file is first read or typed into DAD, the survey design assigned to it by default is Simple Random Sampling. This supposes that the observations were independently selected from a large base of sampling units. This, however, is rarely how surveys are designed and implemented. Once the data are loaded, the exact sampling design structure can nevertheless be easily specified using the Set Sample Design dialogue box. Specifying the sample design structure can involve letting DAD know about up to five vectors (see Figure 14.2).
Figure 14.2: The Set Sample Design window in DAD.

14.4 Applications in DAD: basic procedures

Once data have been read into DAD and the sampling design has been specified, the field is wide open for the estimation of distributive statistics and for performing statistical tests. For every application programmed in DAD, there is a specific application window that facilitates the specification of variables, parameters and options to generate the desired distributive statistics. For example, Figure 14.3 shows the specific application window for computing the FGT poverty index with one distribution. There is a separate specific window for the case of two distributions. The list of all applications available in DAD's current version 4.4 appears in Tables 14.1 and 14.2.

Most application windows, including that of Figure 14.3, are divided into three panels. The first panel is used to specify the relevant database variables needed for the estimation. The second panel (generally at the bottom of the application window) specifies the parameter values and options to be used by the estimator; examples include the level of inequality aversion, the value of the poverty line, the percentile to be considered, as well as whether indices should be normalized and whether statistical inference should be performed. The third panel activates buttons to generate various types of results.

Some application windows can also generate pop-up dialogue boxes. One example of this can be found when clicking on the Compute line button in the Poverty|FGT application window. This serves to specify the manner in which the poverty line should be (or was) estimated.

The following basic variables are typically required for carrying out DAD's computations.
DAD's applications with two distributions can be launched after two databases have been loaded. Each time one launches an application that can support two distributions, the dialogue box shown in Figure 14.4 opens to allow the user to specify the desired number of distributions to be used as well as the names of the databases for these distributions. The application window for two distributions is very similar to that for one. The main difference is the addition of a second panel to specify the relevant variables to be used for the second distribution. The application for two distributions generally serves to compute distributive differences across the two distributions. For curve applications with two distributions, for instance, differences between the curves of the two distributions can usually be drawn.

14.5 Curves

DAD has built-in tools that facilitate the use of curves to display distributive information. Say, for instance, that we wish to graph a Lorenz curve. We can compare it to the 45° line to observe by how much income shares differ from population shares. This is done by following these steps:
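Independently of DAD's menus, the quantities behind such a graph can be sketched in a few lines of Python. This is a minimal illustration of the standard empirical Lorenz curve, assuming non-negative incomes with a positive total and optional sampling weights; the function name `lorenz_curve` and the example data are hypothetical and do not reproduce DAD's own estimator.

```python
def lorenz_curve(incomes, weights=None):
    """Return the points (p, L(p)) of the empirical Lorenz curve.

    p is the cumulative population share and L(p) the cumulative income
    share of the poorest p. Both lists start at 0 and end at 1.
    """
    if weights is None:
        weights = [1.0] * len(incomes)
    # Rank observations from poorest to richest, keeping weights attached.
    pairs = sorted(zip(incomes, weights))
    total_weight = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)

    p, L = [0.0], [0.0]
    cum_weight = 0.0
    cum_income = 0.0
    for y, w in pairs:
        cum_weight += w
        cum_income += y * w
        p.append(cum_weight / total_weight)
        L.append(cum_income / total_income)
    return p, L


incomes = [40, 10, 30, 20]  # made-up data, in arbitrary units
p, L = lorenz_curve(incomes)

# Vertical distance between the 45-degree line and the Lorenz curve:
# how far income shares fall short of population shares at each p.
gaps = [pi - Li for pi, Li in zip(p, L)]
for pi, Li, gi in zip(p, L, gaps):
    print(f"p={pi:.2f}  L(p)={Li:.2f}  p-L(p)={gi:.2f}")
```

With these four incomes, the poorest half of the population holds 30% of total income, so the curve lies 0.20 below the 45° line at p = 0.5.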
Table 14.1: DAD's applications (version 4.4)
Table 14.2: DAD's applications (version 4.4 - continued)
Figure 14.5 shows an example of Lorenz curves drawn by DAD. We can also compare two Lorenz curves to test for inequality dominance of one distribution over the other. For this, we choose again the application Curves|Lorenz, but this time with two distributions.

DAD can also usually draw curves that show how the levels of some distributive statistics vary with ethical parameters, such as inequality or poverty aversion parameters. Take for instance the Atkinson index of inequality. It may be informative to check how fast it varies as a function of ε, its parameter of inequality aversion. To do this, follow these steps:
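Outside DAD's interface, the dependence of the Atkinson index on ε can be sketched numerically. The Python example below uses the textbook definition, one minus the ratio of the equally distributed equivalent income to mean income, and assumes unweighted, strictly positive incomes. The function name `atkinson`, the grid of ε values and the data are hypothetical, and the sketch is not DAD's implementation.

```python
import math


def atkinson(incomes, epsilon):
    """Atkinson inequality index for inequality aversion epsilon >= 0.

    Assumes unweighted data and strictly positive incomes.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if any(y <= 0 for y in incomes):
        raise ValueError("incomes must be strictly positive")
    n = len(incomes)
    mean = sum(incomes) / n
    if epsilon == 1:
        # Limiting case: the equivalent income is the geometric mean.
        ede = math.exp(sum(math.log(y) for y in incomes) / n)
    else:
        # Generalized (power) mean of order 1 - epsilon.
        ede = (sum(y ** (1 - epsilon) for y in incomes) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mean


incomes = [10, 20, 30, 40]          # made-up data
epsilons = [0, 0.5, 1, 2, 3]        # grid of inequality aversion values
curve = [atkinson(incomes, e) for e in epsilons]
for e, a in zip(epsilons, curve):
    print(f"epsilon={e:<4} Atkinson={a:.4f}")
```

The index is zero at ε = 0 and rises with ε, since a higher aversion to inequality gives more weight to the lowest incomes when computing the equally distributed equivalent income.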
14.6 Graphs

Recent versions of DAD are quite flexible in terms of editing, saving and printing graphs. On most application windows, a Graph button is available to draw graphs instantly. The type of graph drawn depends on the application and on the Graph button selected. There are for instance two Graph buttons in the Poverty|FGT Index application window. Clicking on the Graph button plots estimates of the FGT index for a range of alternative poverty lines. Clicking on the Graph2 button draws instead estimates of the equally-distributed poverty gap that is equivalent to the estimated FGT poverty index, for a range of poverty aversion parameters α.

Most of the options for editing DAD's graphs can be accessed from the Graph Properties dialogue box (see Figure 14.7). DAD's graphs can also be saved in a variety of formats. Table 14.3 lists some of them.

Curves are useful tools to check various types of distributive dominance. Table 14.4 sums up some of the links between some of the applications and curves found in DAD and the tests for various orders of social welfare, poverty and inequality dominance.

Table 14.3: Available formats for saving DAD's graphs.
Table 14.4: Curves and stochastic dominance.
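The curve produced by the Graph button of the Poverty|FGT Index window, described earlier in this section, plots the FGT index against alternative poverty lines. A minimal Python sketch of that calculation follows. It assumes unweighted data, the normalized form of the index (poverty gaps divided by the poverty line), and that a unit is poor when its income is strictly below the line. The function name `fgt`, the grid of poverty lines and the data are hypothetical, and DAD's own estimator also handles sampling weights and design information that are ignored here.

```python
def fgt(incomes, z, alpha=0.0):
    """Normalized FGT poverty index for poverty line z and aversion alpha.

    alpha = 0 gives the headcount ratio, alpha = 1 the average normalized
    poverty gap, alpha = 2 the average squared normalized gap.
    """
    if z <= 0:
        raise ValueError("the poverty line must be positive")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    total = 0.0
    for y in incomes:
        if y < z:
            # Only the poor contribute; their gap is strictly positive,
            # so gap ** 0 correctly counts them once when alpha is 0.
            total += ((z - y) / z) ** alpha
    return total / len(incomes)


incomes = [10, 20, 30, 40]              # made-up data
poverty_lines = [15, 25, 35, 45]        # range of alternative poverty lines

# One curve per value of alpha: FGT index as a function of the poverty line.
curves = {
    alpha: [fgt(incomes, z, alpha) for z in poverty_lines]
    for alpha in (0, 1, 2)
}
for alpha, values in curves.items():
    print(alpha, [round(v, 4) for v in values])
```

Comparing such curves for two distributions over a common range of poverty lines is the kind of exercise that the dominance tests summarized in Table 14.4 formalize.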
14.7 Statistical inference: sampling distributions, confidence intervals and hypothesis testing

DAD facilitates statistical inference in a number of original ways:
The Standard deviation, confidence interval and hypothesis testing dialogue box is the main tool for telling DAD what to do in terms of statistical inference. This box is shown in Figure 14.8.

14.8 References

For further information on Java's development and structure, see the introductory book by Deitel and Deitel (2003), or Chapter 1 of Lewis and Loftus (2000). DAD's official web page provides access to extensive information on the software:
Figure 14.3: Application window for estimating the FGT poverty index (one distribution).
Figure 14.4: Choosing between configurations of one or two distributions.
Figure 14.5: Lorenz curves for two distributions
Figure 14.6: Differences in Lorenz curves drawn by DAD
Figure 14.7: The dialogue box for graphical options
Figure 14.8: The STD option