Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality

Juan F. Muñoz, Pablo J. Moya-Fernández, Encarnación Álvarez-Verdejo

DSEID: DSEID-001-2360116
DOI: 10.1177/00491241231176847
Journal: Sociological Methods & Research
Publisher: SAGE Publications
Published: 2025-2
Status: available

Abstract

The Gini index is probably the most commonly used indicator to measure inequality. For continuous distributions, the Gini index can be computed using several equivalent formulations. However, this is not the case with discrete distributions, where controversy remains regarding the expression to be used to estimate the Gini index. We attempt to bring a better understanding of the underlying problem by regrouping and classifying the most common estimators of the Gini index proposed in both infinite and finite populations, and focusing on the biases. We use Monte Carlo simulation studies to analyse the bias of the various estimators under a wide range of scenarios. Extremely large biases are observed in heavy-tailed distributions with high Gini indices, and bias corrections are recommended in this situation. We propose the use of some (new and traditional) bootstrap-based and jackknife-based strategies to mitigate this bias problem. Results are based on continuous distributions often used in the modelling of income distributions. We describe a simulation-based criterion for deciding when to use bias corrections. Various real data sets are used to illustrate the practical application of the suggested bias corrected procedures.


1 Introduction

Lorenz (1905) and Gini (1912) were the first to develop measures of inequality. More than a century after these contributions, inequality analysis remains an active and essential topic in numerous fields. The Gini index, also often referred to as the Gini coefficient, is probably the most commonly used indicator of inequality. This indicator ranges between 0 and 1, where 0 indicates perfect equality and 1 indicates maximum inequality. Inequality is of special interest in economic studies (Piketty, 2015; Tridico, 2018), and the Gini index is used especially to measure income inequality. Many studies indicate that income inequality has been increasing overall in recent years (Bonacini et al., 2021), and marked differences can be observed across various countries. For instance, results from the EU-SILC (European Union Statistics on Income and Living Conditions) survey show that Slovakia has the smallest Gini index estimate (0.21), while Turkey (0.43) has the highest of all the countries in the EU-SILC. At a global level, the World Bank indicates that South Africa has the highest Gini index (0.63). These results are associated with countries, but more extreme values are expected at regional level, in subpopulations, small areas, etc.

The aforementioned differences across countries indicate that institutions and policies may have an important role to play in reducing inequality. In fact, reducing inequality is one of the 17 Sustainable Development Goals of the United Nations 2030 Agenda for Sustainable Development.

The Gini index is a common statistical tool employed by the 2030 Agenda for measuring inequality (see Szymańska, 2021). However, it should be noted that in order to reduce inequality, it is crucial to be able to measure this phenomenon accurately, without biases and/or errors. Indeed, Efron (1990) argued that a large bias is usually an undesirable aspect of an estimator's performance. The relevance of the Gini index has also been demonstrated by its use to describe inequality in many fields and among different socioeconomic groups, such as length of life or well-being (Wang et al., 2020), educational opportunity (Bulle, 2016), housing prices (Villar and Raya, 2015), gender inequality (Larraz, 2015; Larraz et al., 2019), input and outcome inequality (Jasso, 2021), and horizontal and vertical inequality (Canelas and Gisselquist, 2019).

There is a formal theoretical definition of the Gini index for continuous distributions, and many equivalent formulations have been proposed in the literature. As discussed by Davidson (2009), there is no disagreement about the definition of the Gini index for continuous distributions, since the various existing expressions provide the same outcome. However, for discrete distributions, many different formulations have been suggested in the extensive literature, and there has been notable controversy surrounding the appropriate version to use in this scenario (see also Langel and Tillé, 2013). For discrete distributions, various expressions of the Gini index are plug-in formulations of theoretical definitions of the Gini index for continuous distributions. Throughout this article, we refer to this value derived from a continuous distribution as the true value of the Gini index, and formulations of the Gini index for discrete distributions are referred to as empirical versions or estimators of the true Gini index. A highly debated topic in the literature is whether or not to use a specific bias corrected estimator, which is denoted as $\hat{G}_n^c$ in this paper. Jasso (1979), Deltas (2003) and Davidson (2009) provide some arguments in favour of $\hat{G}_n^c$.

Statistical techniques can be based on infinite or finite populations. Classical statistical theory assumes that sampled units are independently selected from an infinite population, whereas survey sampling theory (see Särndal et al., 2003) considers that samples are selected from a finite population. Survey sampling has specific features, and this implies that statistical techniques designed for infinite populations must be modified so that they can be used for finite populations. For instance, the usual assumption of independence is not satisfied in finite populations when samples are selected without replacement. Note also that the use of continuous probabilistic distributions to model income and wealth distributions is common practice in many real-world applications. For instance, the Dagum, Pareto, Weibull and Gamma distributions are used, respectively, by Pérez and Alaiz (2011) , Atkinson (2017) , Bakar and Pathmanathan (2020) and Salem and Mount (1974) . The Lognormal distribution is often used to model household income in many countries (see Clementi and Gallegati, 2005) .

This paper describes, in Section 2, the most common formulations for calculating the Gini index from both discrete and continuous distributions, and in scenarios of infinite and finite populations. The first aim of this paper is to regroup and classify existing empirical versions of the Gini index, and provide a better overview of the problem of estimating this parameter. The second aim is to analyse, in Section 3, the biases of different versions of the Gini index. For this purpose, we consider a variety of Gini indices and various probabilistic distributions commonly used to model income distributions. Our results reveal that extremely large biases may appear, especially for heavy-tailed distributions and large Gini indices, and bias correction procedures are recommended in this situation. As expected, the bias problem is more serious in small samples, as is the case of rural studies (Wan, 2001), small areas (Fabrizi and Trivisano, 2016), subpopulations (Särndal et al., 2003, p. 386), etc. The third contribution is to describe, in Section 4, bias correction procedures that may reduce the aforementioned large biases. Bootstrap and jackknife methods are considered, and a novel empirical bootstrap is also adapted to the problem of estimating the Gini index. In Section 5, the bias correction procedures are analysed using Monte Carlo simulation studies. Section 6 describes a simulation-based criterion for deciding when to use bias correction procedures, which are then illustrated, in Section 7, by application to various real data sets. Finally, a brief discussion is presented in Section 8.
The supplementary material contains: (i) the selected parameters of the analysed probabilistic distributions; (ii) results from simulation studies based on large samples (𝑛 = 500); (iii) description of the bias functions suggested in Section 4, and information on their percentages of use; and (iv) efficiency and bias ratios of estimators of the Gini index, which are explored in Sections 5 and 6.

2 The Gini index

Definition

We assume that inequality is analysed using a variable of interest $Y$, which is a nonnegative continuous random variable. A popular formulation of the Gini index is defined in terms of the average absolute difference between each possible pair of individuals (Qin et al., 2010), i.e.,

$$G = \frac{1}{2\mu} \int_0^\infty \int_0^\infty |x - y| \, dF(x) \, dF(y), \tag{1}$$

where

$$\mu = E[Y] = \int_0^\infty y f(y) \, dy = \int_0^\infty y \, dF(y)$$

is the mean of $Y$, and $F(y) = P(Y \le y)$ and $f(y)$ are, respectively, the distribution function and the probability density function of $Y$. A formulation of $G$ based on the distribution function is (Qin et al., 2010; Berger and Gedik-Balay, 2020):

$$G = \frac{1}{\mu} \int_0^\infty \{2F(y) - 1\} \, y \, dF(y). \tag{2}$$

Anand (1983) showed that the Gini index $G$ can be computed as $2/\mu$ times the covariance between $Y$ and the distribution function $F(y)$, i.e.,

$$G = \frac{2}{\mu} \operatorname{cov}\{Y, F(y)\}. \tag{3}$$

Finally, Yitzhaki (1998) and Berger and Gedik-Balay (2020) consider the expression

$$G = 1 - \frac{\mu_Z}{\mu},$$

where $\mu_Z = E(Z) = \int_0^\infty \{1 - F(z)\}^2 \, dz$ is the expectation of the minimum $Z = \min\{Y_1, Y_2\}$, and $Y_1$ and $Y_2$ are two independent random variables with the same distribution as $Y$. For continuous distributions, the Gini index can be defined in many other ways, as can be seen in Yitzhaki (1998), Giorgi and Gigliarano (2017), etc. In practice, the value of $G$ is estimated by means of a sample $S$, with size $n$, which can be selected from either infinite or finite populations (Langel and Tillé, 2013). The estimation of $G$ under both scenarios is discussed in Section 2.2.
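As an informal numerical check that the formulations above agree, the following sketch evaluates each of them by Monte Carlo for an Exponential(1) distribution, for which $\mu = 1$, $F(y) = 1 - e^{-y}$ and $G = 0.5$. The choice of distribution, the sample size and the function name `gini_formulations_exponential` are assumptions made only for this illustration; they are not part of the paper.

```python
import numpy as np

def gini_formulations_exponential(n=200_000, seed=0):
    """Monte Carlo evaluation of four formulations of G for Exp(1) (illustrative)."""
    rng = np.random.default_rng(seed)
    y1 = rng.exponential(scale=1.0, size=n)   # draws of Y_1
    y2 = rng.exponential(scale=1.0, size=n)   # independent draws of Y_2
    mu = 1.0                                  # known mean of Exp(1)
    F = 1.0 - np.exp(-y1)                     # distribution function evaluated at Y_1

    return {
        "eq1_mean_difference": np.mean(np.abs(y1 - y2)) / (2.0 * mu),
        "eq2_distribution_fn": np.mean((2.0 * F - 1.0) * y1) / mu,
        "eq3_covariance": 2.0 * np.cov(y1, F)[0, 1] / mu,
        "min_expectation": 1.0 - np.mean(np.minimum(y1, y2)) / mu,
    }

values = gini_formulations_exponential()
for name, g in values.items():
    print(f"{name}: {g:.4f}")
```

All four values should be close to 0.5, up to Monte Carlo error.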

Estimation

For infinite populations, $\{Y_i : i \in S\}$ are considered as a sequence, with size $n$, of nonnegative random variables with the same distribution as the variable of interest $Y$. The Gini index is estimated using an estimator of $G$ based on the observations of individuals selected in the sample $S$, which are denoted as $\{y_i : i \in S\}$. Such estimators are usually defined as plug-in formulations derived from a theoretical definition of $G$. This methodology may introduce a bias in comparison to the true parameter $G$, especially for extreme values of the Gini index. As can be seen in Section 3, a notable example is the plug-in expression of Equation (2), which is defined as (see Qin et al., 2010; Berger and Gedik-Balay, 2020):

$$\hat{G}_n^a = \frac{1}{n\bar{y}} \sum_{i \in S} \{2\hat{F}_n(y_i) - 1\} \, y_i = \frac{2}{n\bar{y}} \sum_{i \in S} y_i \hat{F}_n(y_i) - 1, \tag{4}$$

where $\bar{y} = n^{-1} \sum_{i \in S} y_i$ is the sample mean,

$$\hat{F}_n(t) = n^{-1} \sum_{i \in S} \delta(y_i \le t)$$

is the sample (empirical) distribution function, and $\delta(\cdot)$ is the indicator variable that takes the value 1 if its argument is true and 0 otherwise. The classical empirical version of $G$ (Giorgi and Gigliarano, 2017) is the plug-in expression of Equation (1), i.e.:

$$\hat{G}_n^b = \frac{1}{2n^2\bar{y}} \sum_{i \in S} \sum_{j \in S} |y_i - y_j|. \tag{5}$$

Note that many equivalent versions of $\hat{G}_n^b$ have been suggested in the extensive literature on the Gini index. For instance, Sen (1973) proposed the popular formulation

$$\hat{G}_n^b = \frac{2}{n^2\bar{y}} \sum_{i \in S} i \, y_{(i)} - \frac{n+1}{n} = \frac{2}{n^2\bar{y}} \sum_{i \in S} r_i \, y_i - \frac{n+1}{n},$$

where $y_{(i)}$ are the values $y_i$ sorted in increasing order and $r_i$ is the rank of unit $i$ in the sample $S$.

Similarly, the Gini index can be defined using the regression coefficient of an ordinary least squares regression (see Ogwang, 2000). This is the idea behind

$$\hat{G}_n^b = \frac{2\hat{\beta}}{n} - \frac{n+1}{n},$$

which assumes the regression model $i = \beta + u_i$, where the heteroscedastic error $u_i$ has variance $\sigma^2 / y_{(i)}$. The least squares estimator of $\beta$ is given by

$$\hat{\beta} = \frac{\sum_{i \in S} i \, y_{(i)}}{\sum_{i \in S} y_{(i)}}. \tag{6}$$

Finally, an equivalent version of $\hat{G}_n^b$ is the empirical version of Equation (3), i.e.,

$$\hat{G}_n^b = \frac{2}{n\bar{y}} \operatorname{cov}\{i, y_{(i)}\},$$

where

$$\operatorname{cov}\{i, y_{(i)}\} = \frac{1}{n} \sum_{i \in S} i \, y_{(i)} - \frac{n+1}{2} \bar{y}.$$

The estimator $\hat{G}_n^b$ and its equivalent versions satisfy the symmetry axiom of Sen (1973), which establishes that an estimator of $G$ based on a set of observations, say $\{y_i : i \in S\}$, must coincide with the Gini index estimated by means of the same approach but using the sample $S$ exactly replicated, i.e., doubled in size (see Davidson, 2009). Alternatively, the bias corrected estimator

$$\hat{G}_n^c = \frac{n}{n-1} \hat{G}_n^b \tag{7}$$

is often used instead of $\hat{G}_n^b$. Some equivalent expressions of $\hat{G}_n^c$ are

$$\hat{G}_n^c = \frac{2}{n(n-1)\bar{y}} \sum_{i \in S} i \, y_{(i)} - \frac{n+1}{n-1}; \qquad \hat{G}_n^c = \frac{1}{2\bar{y}} \binom{n}{2}^{-1} \sum_{i < j} |y_i - y_j|; \qquad \hat{G}_n^c = 1 - \frac{\bar{z}}{\bar{y}},$$

where $\bar{z} = n^{-1} \sum_{i \in S} z_{i\cdot}$ and

$$z_{i\cdot} = \frac{1}{n-1} \sum_{j \in S, \, j \ne i} \min(y_i, y_j).$$

The first expression was suggested by Jasso (1979), the second is considered by Wang et al. (2016), and the third is used by Berger and Gedik-Balay (2020). Giles (2004) and Davidson (2009) provide theoretical justifications for the use of $\hat{G}_n^c$ to reduce the bias of $\hat{G}_n^b$. As can be seen in Section 3, $\hat{G}_n^a$ may result in serious biases for small Gini indices, but this problem can be easily solved by replacing $\hat{F}_n(t)$ in Equation (4) with the smooth (or midpoint) distribution function $\hat{F}_n^*(t) = n^{-1} \sum_{i \in S} [\delta(y_i < t) + 0.5\,\delta(y_i = t)]$, and the resulting estimator coincides with $\hat{G}_n^b$ (see Berger, 2008). In addition, $\hat{G}_n^a$, $\hat{G}_n^b$ and $\hat{G}_n^c$ are related when $y_i \ne y_j$ for all $i \ne j$, since

$$\hat{G}_n^b = \hat{G}_n^a - \frac{1}{n} \tag{8}$$

if this condition is satisfied, and

$$\hat{G}_n^c = \frac{n}{n-1} \left( \hat{G}_n^a - \frac{1}{n} \right) \tag{9}$$

according to Equations (7) and (8). Note that expressions (5) and (7), or their equivalent formulations, are more frequently used in practice, and practitioners must be aware of the bias of $\hat{G}_n^a$ when $G$ is small. The use of the smooth distribution function in Equation (4) will prevent this bias problem. For empirical distributions, additional formulations of the Gini index can be seen in Giorgi and Gigliarano (2017).
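The three empirical versions for infinite populations, and the relations in Equations (8) and (9), can be sketched in a few lines. This is a minimal illustration rather than the authors' code; the function names `gini_a`, `gini_b` and `gini_c` are hypothetical, and `gini_b` is computed through Sen's rank formula instead of the double sum of Equation (5).

```python
import numpy as np

def gini_a(y):
    """Plug-in estimator of Eq. (4), based on the empirical distribution function."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ys = np.sort(y)
    # F_n(y_i) = (number of observations <= y_i) / n
    F = np.searchsorted(ys, y, side="right") / n
    return 2.0 * np.sum(y * F) / (n * y.mean()) - 1.0

def gini_b(y):
    """Classical estimator of Eq. (5), computed with Sen's rank formula."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * ys) / (n**2 * ys.mean()) - (n + 1) / n

def gini_c(y):
    """Bias corrected estimator of Eq. (7)."""
    n = len(y)
    return n / (n - 1) * gini_b(y)

y = np.array([1.0, 2.0, 3.0, 4.0])   # toy data with no ties
n = y.size
print(gini_a(y), gini_b(y), gini_c(y))
# Relations (8) and (9) hold because all observations are different
print(np.isclose(gini_b(y), gini_a(y) - 1 / n))
print(np.isclose(gini_c(y), n / (n - 1) * (gini_a(y) - 1 / n)))
```

For this toy sample the three estimates are 0.5, 0.25 and 1/3, which makes the upward shift of $1/n$ in the plug-in version of Equation (4) easy to see.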

For a finite population $U$ with $N$ individuals, $\{Y_i : i \in U\}$ denotes a sequence of nonnegative random variables with the same distribution function $F(y)$, and $\{y_i : i \in U\}$ are the population values of the variable of interest. In practice, social surveys are used to estimate the Gini index, and they are generally based on complex sampling designs with unequal probabilities. Therefore, the sample $S$ is now selected from $U$ by using a sampling design with survey weights $w_i = \pi_i^{-1}$, where $\pi_i = P(i \in S)$ are the inclusion probabilities, with $i \in S$. The problem of estimating $G$ from finite populations thus entails two steps. First, an empirical version of $G$ based on the population values $\{y_i : i \in U\}$ is required. We denote the population empirical versions of $G$ in finite populations as $G_N^a$, $G_N^b$ and $G_N^c$, and they are defined as $\hat{G}_n^a$, $\hat{G}_n^b$ and $\hat{G}_n^c$, respectively, after substituting the sample values with the population values in Equations (4), (5) and (7). The second step is to estimate the selected population empirical version ($G_N^a$, $G_N^b$ or $G_N^c$) using weighted estimators. Some that can be found in the literature are:

$$\hat{G}_w^a = \frac{2}{\hat{N}\bar{y}_w} \sum_{i \in S} w_i \, y_i \, \hat{F}_w(y_i) - 1; \tag{10}$$

$$\hat{G}_w^b = \frac{1}{2\hat{N}^2\bar{y}_w} \sum_{i \in S} \sum_{j \in S} w_i w_j |y_i - y_j|; \tag{11}$$

and

$$\hat{G}_w^c = 1 - \frac{\bar{z}_w}{\bar{y}_w}, \tag{12}$$

where

$$\hat{N} = \sum_{i \in S} w_i, \quad \bar{y}_w = \hat{N}^{-1} \sum_{i \in S} w_i y_i, \quad \bar{z}_w = \hat{N}^{-1} \sum_{i \in S} w_i z_{i\cdot}, \quad z_{i\cdot} = \frac{1}{\hat{N} - w_i} \sum_{j \in S, \, j \ne i} w_j \min(y_i, y_j),$$

and $\hat{F}_w(t) = \hat{N}^{-1} \sum_{i \in S} w_i \, \delta(y_i \le t)$. Note that Equations (10), (11) and (12) reduce, respectively, to Equations (4), (5) and (7) under simple random sampling without replacement (SRSWOR).
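The weighted estimators of Equations (10), (11) and (12) can be sketched as follows. The function name `weighted_gini` and the toy data are assumptions made for illustration; the code uses dense $n \times n$ matrices, so it is only suitable for small samples.

```python
import numpy as np

def weighted_gini(y, w):
    """Weighted estimators of Eqs. (10), (11) and (12) (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    N_hat = w.sum()                          # estimated population size
    y_bar = np.sum(w * y) / N_hat            # weighted mean

    # Eq. (10): weighted empirical distribution function evaluated at each y_i
    F_w = np.array([np.sum(w[y <= t]) for t in y]) / N_hat
    g_a = 2.0 * np.sum(w * y * F_w) / (N_hat * y_bar) - 1.0

    # Eq. (11): weighted mean absolute difference over all pairs
    diff = np.abs(y[:, None] - y[None, :])
    g_b = np.sum(w[:, None] * w[None, :] * diff) / (2.0 * N_hat**2 * y_bar)

    # Eq. (12): weighted means of pairwise minima, excluding the pair (i, i)
    mins = np.minimum(y[:, None], y[None, :])
    np.fill_diagonal(mins, 0.0)
    z_i = (mins @ w) / (N_hat - w)
    z_bar = np.sum(w * z_i) / N_hat
    g_c = 1.0 - z_bar / y_bar
    return g_a, g_b, g_c

# With equal weights (as under SRSWOR) the results match Eqs. (4), (5) and (7)
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.full(4, 25.0)                          # hypothetical weights N/n = 100/4
print(weighted_gini(y, w))
```

With equal weights, the three values coincide with the unweighted estimates 0.5, 0.25 and 1/3 for this toy sample, which illustrates the reduction under SRSWOR noted above.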

3 Simulation studies to analyse the bias

In this section, we analyse the bias of $\hat{G}_n^a$, $\hat{G}_n^b$ and $\hat{G}_n^c$ in comparison to the true (asymptotic) value $G$, using samples selected from infinite populations. This analysis is equivalent to the problem of analysing the bias of their finite-population counterparts in comparison to the true (asymptotic) value $G$.

Description

We consider various continuous probabilistic distributions (Pareto, Dagum, Lognormal, Weibull and Gamma) often used in the modelling of income distributions. For each probabilistic distribution, parameters involved in the theoretical formulation of 𝐺 are selected such that 𝐺 takes the values {0.1,0.2, … ,0.8}, thus allowing us to examine different levels of inequality. Additional parameters required in distributions are also fixed, and all of them can be seen in the supplementary material (Table A1 ). For the Dagum distribution, the theoretical value of 𝐺 depends on both shape parameters 𝑎 and 𝑝, and for this reason the values 𝑝 = {0.5,20} are also fixed, and such distributions are denoted, respectively, as Dagum-p0.5 and Dagum-p20. The aim is to analyse the biases of the various estimators of the Gini index under the described scenarios, with these estimators being calculated using samples randomly drawn from an underlying continuous distribution with a true value 𝐺 for the Gini index. This framework is also adopted by Deltas (2003) , Davidson (2009) , Berger and Gedik-Balay (2020) , etc. We analyse both small and large sample sizes, specifically, 𝑛 = {50,500}. This study is equivalent to analysing the biases for samples, with size 𝑛, selected under SRSWOR from a large finite population (𝑁 → ∞), with population values drawn from the analysed probabilistic distributions.

Let $\hat{\theta}$ be a given statistic for the unknown parameter $\theta$, based on the observations $\{y_i : i \in S\}$. Throughout this article, the expected value based on $R$ replications of $\hat{\theta}$ is defined as

$$E[\hat{\theta}] = \bar{\theta} = \frac{1}{R} \sum_{r=1}^{R} \hat{\theta}^{(r)}, \tag{13}$$

where $\hat{\theta}^{(r)}$ is the statistic $\hat{\theta}$ evaluated at the $r$-th pseudo original sample $S^{(r)}$, which is also selected, with size $n$, from the distribution function $F(y)$. We consider $R = 1000$ replications in the simulation studies. The empirical measures can be expressed in terms of either the true Gini index $G$ or the expected values of estimators. We use the expected values because large biases can be obtained and $G$ is unknown in practice. Reporting the results in terms of $G$ makes it more difficult for empirical researchers to assess the performance of estimators for the specific data that they are analysing. Finally, note that various figures in this paper require only a customary estimator of $G$, and we use $\hat{G}_n^c$ because it is less biased than its competitors ($\hat{G}_n^a$ and $\hat{G}_n^b$).

In this section, we first use Monte Carlo simulations to investigate the relative bias ($RB$) of the various empirical versions ($\hat{G}_n^a$, $\hat{G}_n^b$ and $\hat{G}_n^c$) in comparison to the true (asymptotic) value $G$. For a given statistic $\hat{\theta}$, this measure is defined as

$$RB[\hat{\theta}] = 100 \times \frac{B[\hat{\theta}]}{\theta},$$

where the empirical bias is given by $B[\hat{\theta}] = E[\hat{\theta}] - \theta = \bar{\theta} - \theta$. Comparisons are based on distributions with different levels of skewness because the value of the coefficient of skewness may have an impact on the bias of estimators of the Gini index. For discrete distributions, the coefficient of skewness is defined as:

$$\hat{\gamma} = \frac{\hat{\mu}_3}{\hat{\sigma}^3}, \tag{14}$$

where $\hat{\sigma} = (\hat{\mu}_2)^{1/2}$ is the sample standard deviation, and $\hat{\mu}_\alpha = n^{-1} \sum_{i \in S} (y_i - \bar{y})^\alpha$ is the $\alpha$-th central moment based on $S$. The aim of Figure 1 is to investigate the skewness for the probabilistic distributions considered in this paper, so this figure displays the expected values $\bar{\gamma}$ versus the expected values $\bar{G}_n^c$, where $\bar{\gamma}$ and $\bar{G}_n^c$ are calculated using Equation (13) after substituting $\hat{\theta}^{(r)}$ with $\hat{\gamma}^{(r)}$ and $\hat{G}_n^{c(r)}$, respectively, which are computed using Equations (14) and (7) at the $r$-th pseudo original sample $S^{(r)}$.
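The Monte Carlo quantities of this section (the expected value of Equation (13), the relative bias and the skewness coefficient of Equation (14)) can be sketched for one distribution. The example assumes a Lognormal distribution, for which the Gini index satisfies $G = 2\Phi(\sigma/\sqrt{2}) - 1$; the log-scale mean of 0, the seed and all function names are arbitrary choices for this illustration, and the parameter values actually used in the paper are those of its supplementary material.

```python
import numpy as np
from statistics import NormalDist

def gini_c(y):
    """Estimator of Eq. (7), via Sen's rank formula for Eq. (5)."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    ranks = np.arange(1, n + 1)
    g_b = 2.0 * np.sum(ranks * ys) / (n * ys.sum()) - (n + 1) / n
    return n / (n - 1) * g_b

def skewness(y):
    """Coefficient of skewness of Eq. (14)."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def lognormal_sigma(G):
    """Lognormal shape parameter giving a true Gini index G."""
    return np.sqrt(2.0) * NormalDist().inv_cdf((G + 1.0) / 2.0)

def relative_bias(G, n=50, R=1000, seed=0):
    """Relative bias (in %) of gini_c based on R pseudo original samples."""
    rng = np.random.default_rng(seed)
    sigma = lognormal_sigma(G)
    estimates = [gini_c(rng.lognormal(0.0, sigma, n)) for _ in range(R)]
    g_bar = np.mean(estimates)               # expected value, Eq. (13)
    return 100.0 * (g_bar - G) / G

for G in (0.2, 0.5, 0.8):
    print(G, round(relative_bias(G), 2))
```

Running the loop over several values of $G$ gives one curve of the kind shown in the relative bias figures; the other distributions would require their own parameterisations.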

Figure 1: Expected values of the coefficient of skewness ($\bar{\gamma}$) based on samples with sizes $n = \{50, 500\}$, and randomly selected from various continuous probabilistic distributions (infinite populations). The x-axes show the expected values of the estimator $\hat{G}_n^c$ ($\bar{G}_n^c$). [Panels: $n = 50$ and $n = 500$; y-axis: expected skewness; legend: Pareto, Dagum-p20, Dagum-p0.5, Lognormal, Weibull, Gamma.]

From Figure 1 we observe that the Pareto distribution is the most highly skewed distribution, followed by the Dagum-p20, Dagum-p0.5 and Lognormal distributions, in that order. The Weibull and Gamma distributions have similar values of $\bar{\gamma}$, and they are the least skewed distributions in this study. For highly skewed distributions, serious biases can be observed when $G$ is large, as a result of which the maximum value of $\bar{G}_n^c$ is far from $G = 0.8$, the maximum Gini index used in this study. This is not the case with less skewed distributions, since accurate estimates are also obtained when $G$ is large, and the expected values $\bar{G}_n^c$ are close to the required Gini index. For the various probabilistic distributions, the expected skewness increases as the sample size rises. This can be explained by the upper bound for $\hat{\gamma}$ based on the sample size and suggested by Cramer (1957). This bound indicates that estimates for the coefficient of skewness may underestimate the true value when the sample size is small (Dorić et al., 2009). For the Dagum distribution, the expected skewness increases as its shape parameter $p$ increases.

In this section we also analyse the impact of the skewness on the bias using box plots for estimates of $\hat{G}_n^c$. Thus, we illustrate this relationship between skewness and bias by comparing two different distributions in terms of skewness (Pareto and Gamma, as can be seen in Figure 1).

Results and conclusions

Figure 2 displays the $RB$s of $\hat{G}_n^a$, $\hat{G}_n^b$ and $\hat{G}_n^c$ when $n = 50$. First, we analyse the results from the less skewed distributions (Weibull and Gamma). The bias of $\hat{G}_n^c$ is negligible for the various expected values of estimators. The bias of $\hat{G}_n^b$ is slightly larger, in absolute terms, than that of $\hat{G}_n^c$, but lies within a reasonable range. Biases of both $\hat{G}_n^b$ and $\hat{G}_n^c$ do not seem to be affected by the value of the Gini index. $\hat{G}_n^a$ is severely biased when the expected values of estimators are small, with values of $RB$ that can be close to 20%. This empirical version must be modified to correct this bias, and two simple solutions are discussed in Section 2.2. First, the distribution function $\hat{F}_n(t)$ can be replaced, in Equation (4), by the smooth distribution function $\hat{F}_n^*(t)$. This adjustment allows empirical versions $\hat{G}_n^a$ and $\hat{G}_n^b$ to be equivalent. Second, we can use one of the transformations described in Equations (8) and (9) when all observations are different. As the Gini index increases, the $RB$ of $\hat{G}_n^a$ decreases and $\hat{G}_n^a$ and $\hat{G}_n^c$ have similar $RB$s.

For heavy-tailed distributions (Pareto, Dagum-p20, Dagum-p0.5 and Lognormal), biases of $\hat{G}_n^a$, $\hat{G}_n^b$ and $\hat{G}_n^c$ seem to be affected by the value of the Gini index, and serious negative $RB$s are obtained as the expected values of estimators increase (as much as -25%). We also observe a strong relationship between the $RB$ and the coefficient of skewness. The largest $RB$s, in absolute terms, are produced by the Pareto distribution, which is the most skewed (see Figure 1), and biases, in absolute terms, decrease as the values of $\bar{\gamma}$ decrease.

For larger sample sizes, readers are referred to the supplementary material, where Figure A1 replicates Figure 2 for samples with size $n = 500$. We point out that the $RB$, in absolute terms, decreases as the sample size increases. For less skewed distributions, the bias of $\hat{G}_n^c$ is negligible, and the $RB$ of $\hat{G}_n^a$ can be close to 2% for the various distributions. Non-negligible biases are also observed for heavy-tailed distributions, with values of $RB$ close to -15% when $n = 500$. In Figure 3 we investigate the effect of the skewness on the bias of $\hat{G}_n^c$ using box plots and various Gini indices, with the most (Pareto) and the least (Gamma) skewed distributions from this study.

From Figure 2 we observe that the bias of $\hat{G}_n^c$ is negligible for the various Gini indices when samples are selected from the Gamma distribution, while Figure 3 confirms that the estimates are concentrated, with a low variability, around the target value $G$. This is not the case with the Pareto distribution, which shows highly biased estimates and marked variability. From Figures 2 and 3 we observe that the bias of $\hat{G}_n^c$, in absolute terms, increases as the Gini index rises, while from Figure 3 we see that the variability of estimates also becomes higher as $G$ increases, with values of $\hat{G}_n^c$ that can be larger than 0.9 when $G = 0.4$, or smaller than 0.3 when $G = 0.8$.

[Figure 2. Panels: Pareto, Dagum-p20, Dagum-p0.5, Lognormal, Weibull, Gamma; x-axis: expected values of estimators; y-axis: RB; legend: $\hat{G}_n^a$, $\hat{G}_n^b$, $\hat{G}_n^c$.]

4 Bias correction procedures

Results from Section 3 indicate that the customary estimators of the Gini index can be severely biased, especially for heavy-tailed distributions and large Gini indices, and that the use of a bias correction procedure may alleviate this bias problem. Various bias correction procedures are presented in this section, before being analysed in Section 5, and then applied to various real data sets in Section 7. Section 6 describes a criterion for deciding when to use bias corrections.

[Figure 3. Box plots of estimates of the Gini index for the Pareto and Gamma distributions; x-axis: $G$; y-axis: estimation of the Gini index.]

Bootstrap and jackknife techniques (see Efron and Tibshirani, 1993, and Wolter, 2007) can be used as bias correction procedures. Some authors who have demonstrated the capacity of these methods to correct biases are Pfeffermann and Correa (2012) and Jiao and Han (2020). When it comes to the problem of estimating the Gini index, such statistical techniques have been used mainly for the construction of confidence intervals and variance estimation (see Moran, 2006, and Larraz et al., 2020, for bootstrap techniques, and Berger, 2008, and Davidson, 2009, for jackknife techniques). The estimator $\hat{G}_n^c$ emerges as the bias corrected version of $\hat{G}_n^b$ (Deltas, 2003; Davidson, 2009), but Section 3 shows that $\hat{G}_n^c$ can also be severely biased. Van Ourti and Clarke (2011) investigated a bias correction method for the Gini index, but focussing on the bias due to grouped data. We now explore correction procedures for the biases discussed in Section 3. Bias corrections are applied to $\hat{G}_n^c$ and to the weighted estimator $\hat{G}_w^c$ of Equation (12) (defined, respectively, for infinite and finite populations) because they are less biased than the alternative empirical versions described in Section 2. However, bias correction procedures can also be applied to any other empirical version.

For an infinite population, we first suggest the jackknife technique proposed by Ogwang (2000), which can be easily implemented by a fast algorithm. Langel and Tillé (2013) showed that this method has desirable properties for the variance estimation of the Gini index. Ogwang (2000) proposed the application of the jackknife technique on $\hat{G}_n^b$, with jackknife estimates defined as

$$\hat{G}_n^b(k) = \hat{G}_n^b + \frac{2}{n\bar{y} - y_{(k)}} \left[ \frac{y_{(k)} \hat{\beta}}{n} + \frac{\sum_{i \in S} i \, y_{(i)}}{n(n-1)} - \frac{n\bar{y} - \sum_{i=1}^{k} y_{(i)} + k \, y_{(k)}}{n-1} \right] - \frac{1}{n(n-1)},$$

where $\hat{\beta}$ is the regression coefficient defined by Equation (6). Note that $\hat{G}_n^b(k)$ is equivalent to applying $\hat{G}_n^b$ to the observations $\{y_{(i)} : i \in S\}$ after removing the $k$-th unit, successively for $k = 1, \ldots, n$. The bias corrected estimator applied to $\hat{G}_n^b$ and based on Ogwang's jackknife is defined as

$$\hat{G}_n^{b.jack} = n\hat{G}_n^b - (n-1)\bar{G}_n^b(\cdot),$$

where $\bar{G}_n^b(\cdot) = n^{-1} \sum_{k=1}^{n} \hat{G}_n^b(k)$. The bias corrected estimator applied to $\hat{G}_n^c$ and based on jackknife is given by

$$\hat{G}_n^{c.jack} = n\hat{G}_n^c - (n-1)\bar{G}_n^c(\cdot), \tag{15}$$

where $\bar{G}_n^c(\cdot) = n^{-1} \sum_{k=1}^{n} \hat{G}_n^c(k)$, and $\hat{G}_n^c(k)$ is the estimator $\hat{G}_n^c$ computed from the observations $\{y_{(i)} : i \in S\}$ after removing the $k$-th unit. Note that $\hat{G}_n^{c.jack}$ is one of the two bias corrected estimators that we report in the results from infinite populations.

Pfeffermann and Correa (2012) proposed an empirical bootstrap bias correction procedure based on pseudo original and bootstrap samples selected from plausible parameters. This method was used to estimate the prediction mean square error in small area estimation of proportions. We also propose the adaptation of this empirical bootstrap method to the problem of estimating the bias of $\hat{G}_n^c$, thus giving rise to a novel bias corrected estimator of $G$.
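The jackknife correction can be sketched as follows. The function `jackknife_corrected` recomputes the estimator on each leave-one-out sample by brute force, while `ogwang_leave_one_out` evaluates the closed-form jackknife estimates for the classical estimator of Equation (5) so that both routes can be compared. All function names are hypothetical, and the brute-force loop is an assumption made for clarity rather than the fast algorithm referred to in the text.

```python
import numpy as np

def gini_b(y):
    """Classical estimator of Eq. (5), via Sen's rank formula."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * ys) / (n * ys.sum()) - (n + 1) / n

def gini_c(y):
    """Bias corrected estimator of Eq. (7)."""
    n = len(y)
    return n / (n - 1) * gini_b(y)

def jackknife_corrected(y, estimator):
    """n * estimate - (n - 1) * mean of leave-one-out estimates, as in Eq. (15)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    loo = np.array([estimator(np.delete(y, k)) for k in range(n)])
    return n * estimator(y) - (n - 1) * loo.mean()

def ogwang_leave_one_out(y):
    """Closed-form leave-one-out values of gini_b, indexed by sorted position k."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    k = np.arange(1, n + 1)
    total = ys.sum()                          # n * y_bar
    rank_sum = np.sum(k * ys)                 # sum of i * y_(i)
    beta = rank_sum / total                   # Eq. (6)
    cum = np.cumsum(ys)                       # partial sums y_(1) + ... + y_(k)
    bracket = (ys * beta / n + rank_sum / (n * (n - 1))
               - (total - cum + k * ys) / (n - 1))
    return gini_b(ys) + 2.0 / (total - ys) * bracket - 1.0 / (n * (n - 1))

rng = np.random.default_rng(1)
y = rng.lognormal(0.0, 1.0, 30)
brute = np.array([gini_b(np.delete(np.sort(y), k)) for k in range(y.size)])
print(np.allclose(ogwang_leave_one_out(y), brute))   # the two routes agree
print(gini_c(y), jackknife_corrected(y, gini_c))
```

The closed form avoids re-sorting and recomputing the estimator $n$ times, which is why it is attractive for large samples; the brute-force version remains useful as a check and applies to any estimator.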

The empirical bootstrap procedure considers a set of plausible parameters, which are randomly generated from a confidence interval for $G$. For each plausible parameter, a pseudo original sample is generated from the underlying distribution of the original sample data. This method uses a cross-validation procedure that splits the various pseudo original samples into two groups: training and validation. In Section 3 we observed that both the Gini index and the coefficient of skewness have an impact on the bias of $\hat{G}_n^c$ for heavy-tailed distributions. For the training group, we suggest various functions underlying the bias correction, which depend on the estimates of $G$ and $\gamma$ computed from each pseudo original sample and on the expected values of $\hat{G}_n^c$ and $\hat{\gamma}$ based on bootstrap samples. Efron and Tibshirani (1993) and Hall and Maiti (2006) indicate that bias corrections may increase the variance, so the validation group is used to choose the optimum bias function that minimizes the mean square error (MSE) of the suggested bias corrected estimator. In Section 5, we also investigate the impact of using bias corrections on the MSE. For an infinite population, the algorithm for estimating the bias of $\hat{G}_n^c$ and for computing the suggested bias corrected estimator is described in detail as follows:

Step 1 (Plausible parameters). Select at random 𝐻 plausible values for the target parameter 𝐺 from a Uniform distribution, i.e., 𝐺 ∼ 𝑈𝑛(𝐺 , 𝐺 ), with ℎ = 1, … , 𝐻, and where 𝐺 and 𝐺 are, respectively, the lower and upper limits of a confidence interval for the true Gini index 𝐺.

Step 2 (Pseudo original samples for training and validation groups). Generate a pseudo original sample 𝑆 , with size 𝑛, from 𝑓(𝑦; 𝐺 ) and for each ℎ = 1, … , 𝐻, where 𝑓(𝑦; 𝐺 ) is the probability density function 𝑓(𝑦) with a Gini index equal to 𝐺 . Then, split the 𝐻 samples at random into two groups, the training group 𝛺 and the validation group 𝛺 , such that 𝛺 contains a set of 𝑇 samples (𝑆 , with 𝑡 = 1, … , 𝑇), 𝛺 contains 𝑉 samples (𝑆 , with 𝑣 = 1, … , 𝑉) and 𝐻 = 𝑇 + 𝑉.

Step 3 (Estimates from the pseudo original samples). For the training group 𝛺 , compute 𝐺 and 𝛾 for each sample 𝑆 , using Equations (7) and (14), respectively. Similarly, for the validation group 𝛺 , compute 𝐺 and 𝛾 using the samples 𝑆 .

Step 4 (Training phase: expected values based on bootstrap samples). For the training group 𝛺 , generate 𝐵 bootstrap samples 𝑆 (𝑏) , with size 𝑛, from each sample 𝑆 , with 𝑏 = 1, … , 𝐵. Compute 𝐺 (𝑏) and 𝛾 (𝑏) for each bootstrap sample 𝑆 (𝑏) , using Equations (7) and (14), respectively. The expected values of 𝐺 and 𝛾 based on bootstrap samples are denoted, respectively, as 𝐺 ‾ . and 𝛾 ‾ , and are computed using 𝐺 (𝑏) and 𝛾 (𝑏) in Equation (13), after substituting 𝑟 and 𝑅 with 𝑏 and 𝐵, respectively.

Step 5 (Training phase: expected values based on pseudo original samples). For the training group 𝛺 , generate 𝑅 pseudo original samples 𝑆 (𝑟) , with size 𝑛, from 𝑓(𝑦; 𝐺 ) for each 𝑡 = 1, … , 𝑇, with 𝑟 = 1, … , 𝑅. Compute 𝐺 (𝑟) for each sample 𝑆 (𝑟) using Equation (7). The expected value based on pseudo original samples is denoted as 𝐺 ‾ . , and is computed using Equation (13).

Step 6 (Training phase: coefficient estimates). For the training group 𝛺 , estimate the unknown coefficients of a set of eligible bias functions 𝑞_𝑙 (𝐺 ; 𝐺 ‾ . ; 𝛾 ; 𝛾 ‾ ), with 𝑙 = 1, … , 𝐿, that predict the variable 𝐷 = 𝐺 ‾ . − 𝐺 . An example of a bias function is the linear expression

𝐺 ‾ . − 𝐺 = 𝑎₀ + 𝑎₁ (𝛾 ‾ − 𝛾 ). (16)

Step 7 (Validation phase: bias corrected estimators). For the validation group 𝛺 and for each function 𝑞 , compute the suggested bias corrected estimator of 𝐺 , defined by

𝐺 . (𝑙) = 𝐺 − 𝑞 (𝐺 ; 𝐺 ‾ . ; 𝛾 ; 𝛾 ‾ ),

where 𝑞 is the bias function 𝑞 after substituting its coefficients with the estimates computed in Step 6.

Step 8 (Validation phase: optimum function). For the validation group 𝛺 , identify the optimum function 𝑞 that minimizes the MSE of the estimators 𝐺 . (𝑙), and which is defined as

\[ MSE_l = \frac{1}{V} \sum_{v=1}^{V} \left( \hat{G}_{v\cdot}(l) - G_v \right)^2 . \]

Step 9 (Bias corrected estimator). Compute the suggested bias corrected estimator

𝐺 . = 𝐺 − 𝑞 (𝐺 ; 𝐺 ‾ . ; 𝛾 ; 𝛾 ‾ ), (17)

where 𝑞 is the optimum bias function identified in Step 8, and 𝐺 ‾ . and 𝛾 ‾ are, respectively, the expected values of 𝐺 and 𝛾 based on bootstrap samples and derived from the original sample 𝑆. □

The estimator 𝐺 . defined in Equation (17) is the second bias corrected estimator computed for infinite populations. Any method can be used to construct the confidence interval required in Step 1. Some existing confidence intervals for the Gini index are based on bootstrap (Qin et al., 2010), jackknife (Berger, 2008), linearization (Deville, 1999) or empirical likelihood (Berger and Gedik-Balay, 2020). Variance estimators for the Gini index (Langel and Tillé, 2013) can also be used to construct confidence intervals based on the normality assumption. For the sake of simplicity, we use the traditional bootstrap method with confidence interval limits given by 𝐺_𝐿 = 𝐺 (0.005) and 𝐺_𝑈 = 𝐺 (0.995), where 𝐺 (𝛼) is the 𝛼-th quantile of the bootstrap estimates 𝐺 (𝑏). The latter are computed using the estimator 𝐺 on the bootstrap sample 𝑆 (𝑏), which is taken, with size 𝑛, from the original sample 𝑆. As discussed by Pfeffermann and Correa (2012), the selected interval must be broad enough to contain the target parameter 𝐺. We use a confidence level of 99% because of the serious biases detected in Section 3. Pfeffermann and Correa (2012) also argue that the size of this confidence interval has no direct effect on the bound of the bias, and give a discussion on the number of parameters that should be included in the training and validation groups.
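The nine steps above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not taken from the paper: the data are modelled with a Lognormal distribution (whose Gini index is 2Φ(σ/√2) − 1), `gini` and `skewness` are common plug-in formulas standing in for Equations (7) and (14), only two hypothetical candidate bias functions are compared (a constant and the linear form of Equation (16)), and the default values of `H`, `T`, `B` and `R` are kept small for speed. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def gini(y):
    """Plug-in Gini estimate from ordered data (stand-in for Equation (7))."""
    ys = sorted(y)
    n = len(ys)
    return 2.0 * sum((i + 1) * v for i, v in enumerate(ys)) / (n * sum(ys)) - (n + 1) / n


def skewness(y):
    """Moment coefficient of skewness mu_3 / sigma^3 (stand-in for Equation (14))."""
    m = fmean(y)
    m2 = fmean([(v - m) ** 2 for v in y])
    m3 = fmean([(v - m) ** 3 for v in y])
    return m3 / m2 ** 1.5


def lognormal_sample(g, n, rng):
    """Draw n values from a Lognormal whose Gini index equals g (assumed model)."""
    sigma = math.sqrt(2.0) * NormalDist().inv_cdf((g + 1.0) / 2.0)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n)]


def boot_means(y, B, rng):
    """Bootstrap mean of the skewness estimates and the list of Gini replicates."""
    reps = [rng.choices(y, k=len(y)) for _ in range(B)]
    return fmean([skewness(s) for s in reps]), [gini(s) for s in reps]


def fit_candidates(x, d):
    """Least-squares (a0, a1) for two illustrative bias functions."""
    xm, dm = fmean(x), fmean(d)
    sxx = sum((xi - xm) ** 2 for xi in x)
    a1 = sum((xi - xm) * (di - dm) for xi, di in zip(x, d)) / sxx
    return [(dm, 0.0), (dm - a1 * xm, a1)]  # constant only; linear form of (16)


def empirical_bootstrap_gini(y, H=40, T=12, B=100, R=100, level=0.99, seed=1):
    rng = random.Random(seed)
    n = len(y)
    g_hat, gam_hat = gini(y), skewness(y)
    gam_bar, reps = boot_means(y, B, rng)
    # Step 1: percentile bootstrap interval and H plausible parameters.
    reps.sort()
    tail = (1.0 - level) / 2.0
    lower = reps[int(tail * (B - 1))]
    upper = reps[int(round((1.0 - tail) * (B - 1)))]
    rows = []
    for _ in range(H):
        g = rng.uniform(lower, upper)
        s = lognormal_sample(g, n, rng)            # Step 2: pseudo original sample
        gam_b, _ = boot_means(s, B, rng)           # Step 4 (also run for validation samples)
        rows.append((g, gini(s), gam_b - skewness(s)))  # Step 3 estimates
    # The plausible values are i.i.d., so the first T rows form a random training group.
    train, valid = rows[:T], rows[T:]
    # Step 5: expected value of the estimator at each training parameter, minus the parameter.
    d = [fmean([gini(lognormal_sample(g, n, rng)) for _ in range(R)]) - g
         for g, _, _ in train]
    coefs = fit_candidates([x for _, _, x in train], d)  # Step 6
    # Steps 7-8: validation MSE of each corrected estimator; keep the best function.
    mse = [fmean([(gh - (a0 + a1 * x) - g) ** 2 for g, gh, x in valid])
           for a0, a1 in coefs]
    a0, a1 = coefs[mse.index(min(mse))]
    # Step 9: correct the estimate from the original sample.
    return g_hat - (a0 + a1 * (gam_bar - gam_hat))
```

A call such as `empirical_bootstrap_gini(y)` on a list of positive incomes returns the corrected estimate. In practice the fitted distribution, the candidate bias functions and the sizes `H`, `T`, `B` and `R` would follow the choices described in the text rather than these placeholders.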

For the training group, Step 6 requires bias functions 𝑞 that predict 𝐷 = 𝐺 ‾ . -𝐺 with the aim of estimating the bias of 𝐺 , i.e., 𝐵 𝐺 = 𝐸 𝐺 -𝐺. In Section 3, we concluded that both the Gini index and the coefficient of skewness may have an impact on the 𝑅𝐵 of 𝐺 , so we suggest bias functions that depend on 𝐺 and 𝛾 . The expected values 𝐺 ‾ . and 𝛾 ‾ based on bootstrap samples are also considered. Table A2 from the supplementary material describes the 𝐿 = 7 candidate bias functions considered in Step 6 of the suggested algorithm. For the sake of simplicity, we only consider multiple linear regression functions, but more complex functions and/or additional statistics can also be used, and are expected to yield more accurate results.

For an infinite population, additional bias correction procedures can be computed (see Wolter, 2007). For instance, we also calculated, in Section 5, bootstrap methods based on additive and multiplicative corrections (see Hall and Maiti, 2006, and Pfeffermann and Correa, 2012, for detailed definitions), but we omitted them because the bias corrected estimators (15) and (17) are less biased. Pfeffermann and Correa (2012) also argue that the aforementioned additive and multiplicative corrections may yield non-negligible biases with small samples, meaning alternative bias correction procedures may be preferable.

Bootstrap and jackknife techniques were originally designed for infinite populations, and do not have a direct application to finite populations due to the inherent features of survey sampling. Adjustments are thus required to apply these methods to finite populations (Quatember, 2015). For finite populations, the rescaled bootstrap technique (Rao et al., 1992) can be used for bias correction of a given empirical version of 𝐺. This method has been used in many research studies (Berger and Muñoz, 2015; Moya et al., 2020; etc.) in many areas (see Yang et al., 2010; Muñoz et al., 2018; etc.). Simplicity is the main advantage of the rescaled bootstrap over alternative bootstrap methods, which can be more computationally intensive. The rescaled bootstrap consists of computing a new set of weights (named bootstrap weights) for each bootstrap sample, which are obtained by applying a scale adjustment to the original survey weights 𝑤 . Specifically, the bootstrap weights are given by

\[ w_i^{(b)} = w_i \, r_i^{(b)} \, \frac{n}{n-1}, \]

with 𝑖 ∈ 𝑆 and 𝑏 = 1, … , 𝐵, where \( r_i^{(b)} \) denotes the number of times that the 𝑖-th unit is selected in the bootstrap sample \( S^{(b)} \). For a finite population, we first consider the additive bias corrected estimator of 𝐺 based on the rescaled bootstrap, which is defined as

𝐺 . = 𝐺 − (𝐺 ‾ . − 𝐺 ) = 2𝐺 − 𝐺 ‾ . , (18)

where 𝐺 ‾ . is the expected value of 𝐺 based on the bootstrap estimates 𝐺 ( ) , and which are defined as 𝐺 after substituting the original survey weights 𝑤 with the bootstrap weights 𝑤 ( ) .
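A minimal Python sketch of the rescaled bootstrap weights and the additive correction in Equation (18) is given below. It rests on assumptions that are not stated in the text: each bootstrap sample is formed by 𝑛 − 1 draws with replacement (the usual choice that goes with the factor 𝑛/(𝑛 − 1)), a single stratum is used, and `weighted_gini` is a generic weighted mean-difference formula standing in for the weighted estimator of Equation (12). All function names are hypothetical.

```python
import random
from statistics import fmean


def weighted_gini(y, w):
    """Weighted Gini: weighted mean absolute difference over twice the weighted mean."""
    total_w = sum(w)
    mean = sum(wi * yi for yi, wi in zip(y, w)) / total_w
    mad = sum(wi * wj * abs(yi - yj)
              for yi, wi in zip(y, w) for yj, wj in zip(y, w))
    return mad / (2.0 * total_w ** 2 * mean)


def rescaled_bootstrap_weights(w, rng):
    """Bootstrap weights w_i * r_i * n / (n - 1), with r_i counted over n - 1 draws."""
    n = len(w)
    counts = [0] * n
    for _ in range(n - 1):
        counts[rng.randrange(n)] += 1
    return [wi * ri * n / (n - 1) for wi, ri in zip(w, counts)]


def additive_corrected_gini(y, w, B=200, seed=1):
    """Additive bias correction 2 * G_hat - G_bar, as in Equation (18)."""
    rng = random.Random(seed)
    g_hat = weighted_gini(y, w)
    g_bar = fmean([weighted_gini(y, rescaled_bootstrap_weights(w, rng))
                   for _ in range(B)])
    return 2.0 * g_hat - g_bar
```

Units that are not drawn receive a bootstrap weight of zero and drop out of the weighted estimate, so no explicit resampling of the observations is needed.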

Second, we also consider the aforementioned empirical bootstrap bias correction. We now describe an extension of this method to finite populations. It requires the rescaled bootstrap technique along with confidence intervals and estimators based on survey weights. A confidence interval that can be used in Step 1 is given by the limits 𝐺_𝐿 = 𝐺 (0.005) and 𝐺_𝑈 = 𝐺 (0.995), where 𝐺 (𝛼) is the 𝛼-th quantile of the weighted estimates 𝐺 (𝑏) derived from the rescaled bootstrap and based on 𝐺 . The following Step 1-b must be included between Steps 1 and 2:

Step 1-b (Pseudo original finite population). Generate a pseudo original population 𝑈 * with observations given by {𝑦 * : 𝑖 ∈ 𝑈 * } and selected from 𝑓(𝑦; 𝐺 ), with 𝑘 = 1, … , 𝐾.

The pseudo original samples of Steps 2 and 5 are selected from 𝑈 * instead of 𝑓(𝑦; 𝐺 ), and using the same sampling design as for the original sample 𝑆. The weighted coefficient of skewness is defined as

\[ \hat{\gamma}_w = \frac{\hat{\mu}_{3.w}}{\hat{\sigma}_w^{3}}, \qquad (19) \]

where \( \hat{\sigma}_w = (\hat{\mu}_{2.w})^{1/2} \) and \( \hat{\mu}_{k.w} = \hat{N}^{-1} \sum_{i \in S} w_i (y_i - \bar{y}_w)^k \). In Step 3, the estimates of 𝐺 and 𝛾 are replaced by their weighted counterparts, which are calculated using the sample 𝑆 and Equations (12) and (19), respectively. In Step 4, bootstrap estimates are substituted with 𝐺 (𝑏) and 𝛾 (𝑏) , which are obtained using the rescaled bootstrap method. The expected values of 𝐺 and 𝛾 based on the rescaled bootstrap are denoted, respectively, as 𝐺 ‾ . and 𝛾 ‾ , and they are computed using 𝐺 (𝑏) and 𝛾 (𝑏) in Equation (13), after substituting 𝑟 and 𝑅 with 𝑏 and 𝐵, respectively. The same set of eligible functions is used in Step 6, but they depend on weighted quantities, i.e., 𝑞 (𝐺 ; 𝐺 ‾ . ; 𝛾 ; 𝛾 ‾ ). In Step 7, the suggested bias corrected estimators of 𝐺 are defined by 𝐺 . (𝑙) = 𝐺 − 𝑞 (𝐺 ; 𝐺 ‾ . ; 𝛾 ; 𝛾 ‾ ), and they are used to identify, in Step 8, the optimum function 𝑞 . Finally, the suggested bias corrected estimator that we compute in finite populations is given by

𝐺 . = 𝐺 − 𝑞 (𝐺 ; 𝐺 ‾ . ; 𝛾 ; 𝛾 ‾ ). (20)
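The weighted coefficient of skewness in Equation (19) can be computed directly from the survey weights. The sketch below assumes that the population size is estimated by the sum of the weights and that the centring value is the weighted mean; the function name is hypothetical.

```python
def weighted_skewness(y, w):
    """Weighted skewness mu_3 / sigma^3 built from weighted central moments."""
    n_hat = sum(w)  # assumed estimate of the population size
    mean = sum(wi * yi for yi, wi in zip(y, w)) / n_hat

    def central_moment(k):
        return sum(wi * (yi - mean) ** k for yi, wi in zip(y, w)) / n_hat

    return central_moment(3) / central_moment(2) ** 1.5
```

With equal weights this reduces to the usual moment coefficient of skewness, and giving a unit a weight of 2 has the same effect as listing it twice with weight 1.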

For finite populations, additional bias correction procedures can also be computed. For instance, we also calculated, in Section 5, Campbell's (1980) jackknife (see Berger and Skinner, 2005, and Berger, 2008) and the multiplicative bootstrap method (see Hall and Maiti, 2006 ), but we omitted them because Campbell's jackknife is more biased than ( 18 ) and ( 20 ), and additive and multiplicative methods give similar results.

5 Simulation studies to analyse the bias correction procedures

Description

We now evaluate various bias correction procedures by means of the RB measure defined in Section 3, and using the probabilistic distributions and sample sizes also described in that section. As discussed in Section 4, bias correction procedures substantially mitigate the detected biases, but the price to pay is a possible increase in the MSE. For this reason, we use the relative root mean square error (𝑅𝑅𝑀𝑆𝐸) to investigate the effect of bias corrections on efficiency. For a given statistic 𝜃 , the corresponding 𝑅𝑅𝑀𝑆𝐸 based on 𝑅 replications is defined as

\[ RRMSE(\hat{\theta}) = 100 \times \frac{MSE(\hat{\theta})^{1/2}}{\theta}, \]

where the empirical mean square error is given by \( MSE(\hat{\theta}) = R^{-1} \sum_{r=1}^{R} (\hat{\theta}^{(r)} - \theta)^2 \). Sections 6 and 8 give more detailed discussions on the importance of both bias and MSE measures. In Section 3, we observed that 𝐺 yields extremely large values of 𝑅𝐵 when 𝐺 is small, and 𝐺 is slightly more biased than 𝐺 . For the sake of clarity, 𝐺 and 𝐺 and the corresponding bias corrected estimators are omitted from the figures in this section, but they can also be computed, as discussed in Section 4. Similarly, for finite populations, weighted estimators of 𝐺 and 𝐺 are omitted. For infinite populations, the percentage of times that each eligible bias function is selected as the optimum function of the suggested algorithm described in Section 4 can be seen in the supplementary material (Table A3). For the various probabilistic distributions, the bias function defined in Equation (16) is the one most often selected as the optimum function.
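For a set of Monte Carlo replicates, the two measures can be computed as in the following sketch. The RB formula is an assumption here, since its definition is given in Section 3 and not in this section: it is taken to be the percentage difference between the mean of the replicates and the true value. The function names are hypothetical.

```python
import math
from statistics import fmean


def relative_bias(estimates, theta):
    """RB (%): 100 * (mean of the replicates - theta) / theta (assumed definition)."""
    return 100.0 * (fmean(estimates) - theta) / theta


def rrmse(estimates, theta):
    """RRMSE (%): 100 * sqrt(empirical MSE around theta) / theta."""
    mse = fmean([(e - theta) ** 2 for e in estimates])
    return 100.0 * math.sqrt(mse) / theta
```

For example, two replicates 0.4 and 0.6 around a true value of 0.5 give an RB of 0% but an RRMSE of 20%, which is why both measures are reported.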

We use 𝐵 = 1000 bootstrap samples in the bootstrap methods. Following Pfeffermann and Correa (2012), we consider 𝐻 = 200 plausible parameters, of which 𝑇 = 60 and 𝑉 = 140 are used for the training and validation groups, respectively. Samples are selected from finite populations with size 𝑁 = 10000, which in turn are drawn from the investigated continuous distributions. We consider unequal inclusion probabilities by using the randomized systematic sampling design (Wu and Thompson, 2020). The effect of the design is increased by generating inclusion probabilities 𝜋 with a correlation of 0.7 between 𝜋 and 𝑦 (Berger and Gedik-Balay, 2020).

[Figure 4. Panels: Pareto, Dagum-p20, Dagum-p0.5 and Lognormal. x-axis: expected values of estimators. y-axis: RB. Series: G n c, G n c.Jo, G n c.Bp. Based on samples with size 𝑛 = 50, randomly selected from various continuous probabilistic distributions (infinite populations).]

A simulation-based criterion for deciding when to use bias correction

[Figure 6. Panels: Pareto, Dagum-p20, Dagum-p0.5 and Lognormal. x-axis: expected values of estimators. y-axis: RRMSE. Series: G n c, G n c.Jo, G n c.Bp.]

Results from Section 3 indicate that the three common empirical versions of 𝐺 can be biased for heavy-tailed distributions, which may be a serious issue when the Gini index is large. An important problem that arises in practice is determining when to use bias correction procedures. Note that the bias and the MSE (or equivalently the RB and the RRMSE) are two relevant measures to evaluate the quality of estimators. However, as noted by Särndal et al. (2003, p. 164), the bias must also be small relative to the standard error, since failure to meet this requirement may result in invalid confidence intervals and/or undesirable coverage probabilities. This ratio between the bias and the standard error is popularly referred to as the bias ratio. For a given statistic 𝜃 and 𝑅 replications, the empirical bias ratio (BR) is defined as

\[ BR(\hat{\theta}) = \frac{B(\hat{\theta})}{V(\hat{\theta})^{1/2}}, \]

where the empirical variance is given by \( V(\hat{\theta}) = R^{-1} \sum_{r=1}^{R} (\hat{\theta}^{(r)} - \bar{\theta})^2 \). Like the MSE, the BR also involves both bias and variance of the estimator. Särndal et al. (2003, p. 41) advise empirical researchers to avoid estimators that are considerably biased, and instead seek out estimators with small biases, and then choose one with a small variance. Following this idea, the suggested criterion consists of analysing both RB and BR measures for the estimator 𝐺 , and bias corrections are suggested when non-negligible biases are observed. RRMSE values can be used to choose the most efficient bias corrected estimator. Särndal et al. (2003, p. 165) indicate that the effect of the bias ratio on the coverage probability can be ignored when |𝐵𝑅| < 0.1, and the use of bias corrected estimators is not justified here. The effect on the coverage probability is not extremely pronounced when |𝐵𝑅| ≤ 0.5, but it can be a serious problem otherwise. On the other hand, absolute values of RB lower than 2% can be considered negligible, and bias correction procedures are not recommended if this is the case. In summary, bias corrections are suggested when the estimator 𝐺 satisfies |𝐵𝑅| ≥ 0.1 and |𝑅𝐵| ≥ 2%.
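The decision rule summarised above can be written as a short check on simulated replicates. The sketch below assumes that the empirical bias is the mean of the replicates minus the true value, and that RB is that bias expressed as a percentage of the true value; the function names are hypothetical.

```python
import math
from statistics import fmean


def bias_ratio(estimates, theta):
    """Empirical bias divided by the empirical standard deviation (divisor R)."""
    mean = fmean(estimates)
    variance = fmean([(e - mean) ** 2 for e in estimates])
    return (mean - theta) / math.sqrt(variance)


def correction_suggested(estimates, theta):
    """True when |BR| >= 0.1 and |RB| >= 2%, the two conditions stated in the text."""
    rb = 100.0 * (fmean(estimates) - theta) / theta
    return abs(bias_ratio(estimates, theta)) >= 0.1 and abs(rb) >= 2.0
```

The ratio is undefined when all replicates are identical (zero variance), a case a practical implementation would need to handle separately.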

The aim of Figure 7 is to show that the customary estimator 𝐺 can yield poor bias ratios, with bias corrections justified because they substantially reduce this problem. For infinite populations and samples with size 𝑛 = 50, 𝐺 yields absolute values of 𝐵𝑅 close to 1.4, and poor coverage probabilities are expected. Furthermore, the vertical lines in Figure 7 indicate the first expected value (𝐺 ̅ . ) with non-negligible biases, i.e., with absolute values of 𝑅𝐵 larger than 2%. We see that the condition imposed by the bias ratio (|𝐵𝑅| < 0.1) is more demanding than the condition based on the relative bias (|𝑅𝐵| < 2%), i.e., the first value of 𝐺 ̅ . with a |𝐵𝑅| ≥ 0.1 is smaller than the first value of 𝐺 ̅ . with a |𝑅𝐵| ≥ 2%. For example, non-negligible biases are observed for the Pareto distribution when 𝐺 ̅ . ≅ 0.2, and the absolute value of 𝐵𝑅 is larger than 0.1 in this situation. These results reveal the presence of a mild bias problem, which can be solved using bias correction procedures, as can be seen in Figures 4 and 7. From Figure 7 we also observe that the BR values of the corrected estimators, in absolute terms, are generally smaller than 0.5, and are substantially smaller than those of 𝐺 . The desirable properties in terms of both 𝐵𝑅 and 𝑅𝐵 measures and the negligible impact on the efficiency (see the Pareto distribution in Figure 6 when 𝐺 ̅ . = 2) indicate that correction procedures are recommended to mitigate the detected biases.

Figure 7 (caption, continued): Estimators are based on samples with size 𝑛 = 50, randomly selected from various continuous probabilistic distributions (infinite populations). Using the estimator 𝐺 , horizontal and vertical dotted lines are fixed, respectively, at |𝐵𝑅| = 0.1 and at the first expected value with |𝑅𝐵| > 2%.

In Figure 8 we suggest a simulation-based criterion for deciding when to use bias correction procedures. This method is based on the expected values 𝐺 ̅ . and 𝛾̅ , since the Gini index and the skewness have a direct effect on the bias. Samples, with sizes between 50 and 1000, are drawn from the most skewed probabilistic distributions described in Section 3.

Using the estimator 𝐺 , this criterion is based on two conditions: |𝐵𝑅| ≥ 0.1 and |𝑅𝐵| ≥ 2%. A grading scale classifies the non-negligible biases into three categories: mild (2 ≤ |𝑅𝐵| < 5), moderate (5 ≤ |𝑅𝐵| < 10) and severe (|𝑅𝐵| ≥ 10). This scale can be used to identify the scenarios where bias corrections are either weakly or strongly recommended. Thus, while bias is not a serious issue for mild biases, the use of bias correction procedures is suggested to reduce this bias. Bias corrections are highly recommended in the case of moderate biases. The bias is a serious problem in the presence of severe biases, meaning bias corrections are strongly advised.

[Figure 7. Panels: Pareto, Dagum-p20, Dagum-p0.5 and Lognormal. x-axis: expected values of estimators. y-axis: |BR|. Series: G n c, G n c.Jo, G n c.Bp.]

[Figure 8. Panels: n=50, n=200, n=500 and n=1000. x-axis: expected values of G n c. y-axis: expected skewness. Legend: Mild, Moderate, Severe.]

For samples with size 𝑛 = 200, mild biases are expected when 𝐺 ‾ . ≥ 0.39, and moderate biases are obtained when 𝐺 ‾ . ≥ 0.47. For samples with size 𝑛 = 500, mild and moderate biases are expected when 𝐺 ‾ . ≥ 0.48 and 𝐺 ‾ . ≥ 0.56, respectively. Finally, for 𝑛 = 1000, bias corrections could be applied when 𝐺 ‾ . ≥ 0.49, and moderate biases are observed when 𝐺 ‾ . ≥ 0.57. Severe biases are not expected when 𝑛 ≥ 1000 and 𝐺 ≤ 0.8 (the maximum true Gini index considered in this study). In summary, for small sample sizes (e.g., 𝑛 = 50), bias correction procedures may be required when the estimates of the Gini index are greater than 0.2, and can be highly recommended when they exceed 0.37. For larger sample sizes (e.g., 𝑛 = 500), mild biases can be expected when the estimates of the Gini index are greater than 0.48, and bias corrections are highly advisable when the estimates of the Gini index are larger than 0.56. For different sample sizes and Gini indices associated with specific data that empirical researchers are analysing, the aim of Figure 8 is to depict the values of the coefficient of skewness that would require the use of bias corrections.
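The grading scale can be expressed as a small classification function. This is a sketch of the thresholds quoted in the text (RB in percent, BR as a plain ratio); the function name and the label "negligible" for cases that fail either condition are choices made here for illustration.

```python
def grade_bias(rb_percent, br):
    """Classify a bias as negligible, mild, moderate or severe from RB (%) and BR."""
    rb, br = abs(rb_percent), abs(br)
    if br < 0.1 or rb < 2.0:
        return "negligible"  # bias correction not suggested
    if rb < 5.0:
        return "mild"        # correction suggested
    if rb < 10.0:
        return "moderate"    # correction highly recommended
    return "severe"          # correction strongly advised
```

For instance, `grade_bias(-7.0, -0.8)` falls in the moderate band, while a relative bias of 6% with a bias ratio below 0.1 is still treated as negligible because both conditions must hold.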

Applications to real data sets

In this section, bias correction procedures are applied for estimating the Gini index in a total of six subpopulations with sizes between 26 and 503, and derived from three real data sets (see Table 1). A common goal in most surveys is not only to provide estimates for the whole population, but also for specific subpopulations (also named domains). For instance, estimates of unemployment in labour-force surveys are provided at national level, but this information is also of special interest at provincial and local levels. In household surveys, subpopulations are usually created on the basis of household sizes or consumption units. Age, sex and occupational groups are also often used to create subpopulations in many studies.

The first real data set consists of total net household incomes extracted from the 2019 Spanish Survey on Income and Living Conditions (ES-SILC). Subpopulations, with sizes 𝑛 = {26,51}, are created using different consumption units, with the aim of using the Gini index to estimate income inequality. The second real data set is obtained from the World Bank's Enterprise Survey (WBES), which has been used extensively in international management studies (Vendrell-Herrero et al., 2022; Gomes et al., 2018; etc.). For private sector firms from over 130 developed and developing countries, the WBES contains information on a broad range of topics including competition performance, corruption, financial data, infrastructure, technology, etc. Using this survey, we estimate the Gini index of the labour productivity per hour worked in Argentinean firms for the years 2017 and 2018. The sizes of the resulting subpopulations are 𝑛 = {61,503}.

Finally, the third real data set (named WATER) consists of a survey on shower habits conducted in Andalusia, a region in southern Spain facing water scarcity. The interest is to analyse the inequality in time spent showering, creating subpopulations, with sizes 𝑛 = {38,74}, using the number of inhabitants at provincial level.

The bias corrected estimator 𝐺 . is based on a continuous probabilistic distribution, and for this reason the Kolmogorov-Smirnov (KS) goodness-of-fit test is used to assess the fit of candidate distributions to the various subpopulations used in this study. From Table 1 we observe that the Lognormal, Fisk, Dagum and Weibull distributions yield KS p-values above the usual significance level (5%), and the null hypothesis that data come from the corresponding continuous probabilistic distribution is not rejected. For the various subpopulations, the simulation-based criterion described in Section 6 indicates that the use of a bias correction procedure is recommended, since non-negligible biases and BRs greater than 0.1 are expected according to Figure 8. In particular, the estimator 𝐺 is expected to underestimate the true Gini index, with higher estimates expected from the bias corrected estimators.

For the subpopulation with size 𝑛 = 51 derived from ES-SILC, we observe that estimates of the coefficient of skewness and the Gini index are, respectively, 𝛾 = 3.51 and 𝐺 = 0.476. These results indicate that moderate biases are expected according to Figure 8. As we expected, bias corrected estimators provide higher estimates than 𝐺 , with values as much as 4.8% larger than 𝐺 = 0.476 (see the estimation of 𝐺 . based on the Dagum distribution). For 𝑛 = 503 in the WBES population, the estimates 𝛾 = 15.85 and 𝐺 = 0.734 indicate the presence of serious biases, and the difference with respect to the estimator 𝐺 goes from 2.5% (𝐺 . = 0.752) to 6.8% (𝐺 . = 0.784). For the various subpopulations in this study, we observe that estimates derived from the bias correction procedures are larger than estimates based on 𝐺 , a result which coincides with the findings of Sections 3 and 5.
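The percentages quoted above are relative differences between each corrected estimate and the uncorrected one. They can be reproduced from the values in Table 1 with a short check; the helper name is hypothetical.

```python
def pct_increase(corrected, uncorrected):
    """Relative difference (%) of a corrected estimate over the uncorrected one."""
    return 100.0 * (corrected - uncorrected) / uncorrected


es_silc_dagum = round(pct_increase(0.499, 0.476), 1)  # ES-SILC, n = 51, Dagum fit
wbes_low = round(pct_increase(0.752, 0.734), 1)       # WBES, n = 503
wbes_high = round(pct_increase(0.784, 0.734), 1)      # WBES, n = 503
print(es_silc_dagum, wbes_low, wbes_high)             # prints 4.8 2.5 6.8
```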

Discussion

The Gini index is a very popular indicator to measure inequality that has been used in many economic studies. For discrete distributions, the Gini index is usually estimated using a plug-in formulation of a given theoretical definition of the Gini index for continuous distributions. This methodology may introduce a serious bias in comparison to the true (asymptotic) value of the Gini index. Note that the Gini index can also be estimated using techniques such as empirical likelihood (Owen, 2001), but there is no simple application of this method to complex sampling designs. The analysis of alternative estimation methodologies is beyond the scope of this paper, i.e., we assume the classical formulations derived from theoretical definitions of the Gini index.

First, this paper attempts to provide a better overview of the problem of estimating the Gini index by regrouping and classifying the most common empirical versions proposed for discrete distributions, and defined under the two existing statistical theories (infinite and finite populations). Second, this paper identifies the scenarios where the bias may be a serious issue, and such scenarios are based on common continuous distributions often used in the modelling of income distributions. For instance, 𝐺 (denoted as 𝐺 in finite populations) yields large biases when the Gini index and the sample size are small, but this bias problem can be easily solved by using the midpoint distribution function in the definition of 𝐺 . When all the sample observations are different, another solution is to use one of the transformations described in Equations ( 8 ) and ( 9 ). In addition, results derived from this study indicate that the various empirical versions of 𝐺 produce serious biases in the presence of heavy-tailed distributions and large Gini indices. Accordingly, bias correction procedures are suggested to mitigate this bias problem, and they are investigated using Monte Carlo simulation studies. We also describe a simulation-based criterion for deciding when to use bias corrections. Finally, bias corrected procedures are illustrated by application to the problem of estimating the Gini index in various real data sets.

The empirical bootstrap obtains less biased estimates than alternative bias correction procedures. With infinite populations, the traditional jackknife performs well in terms of relative bias. For finite populations, the rescaled bootstrap may reduce the bias of the existing empirical versions of 𝐺. It is important to note that the empirical bootstrap is a parametric procedure that requires generating sets of data from the probabilistic distribution fitted to the original sample. However, the use of continuous distributions in the modelling of income distributions is a common practice in many real-world applications, and the empirical bootstrap can thus be implemented if this is the case. In addition, it should be noted that for the sake of simplicity the empirical bootstrap bias correction is based only on standard regression functions, but alternative bias functions can also be used, and they may potentially improve the performance of this method. Finally, the empirical bootstrap is more computationally intensive than alternative procedures, but this is not a problem with current computing facilities.

The outcome of the grading scale described in Section 6 can help empirical researchers decide whether the specific data they are analysing have non-negligible biases and large bias ratios, in which case the use of bias corrections would be recommended. For heavy-tailed distributions, non-negligible biases may appear in small samples (e.g., 𝑛 = 50) from low estimates of the Gini index (e.g., 𝐺 ≥ 0.2). For samples with sizes 𝑛 = 200 and 𝑛 = 1000, non-negligible biases can be expected for estimates of the Gini index greater than 0.4 and 0.5, respectively. Severe biases are not expected when the sample size is larger than 1000. Figure 8 gives a more precise understanding of the conditions required in practice to apply a bias correction, which depend on the sample size and estimates of both the coefficient of skewness and the Gini index.

Both bias and MSE measures are important to evaluate the quality of estimators. Numerous authors indicate that the use of bias correction procedures may have an impact on the efficiency of bias corrected estimators. This issue has also been investigated in this paper, with the results indicating that said impact is not relevant, especially as the sample size increases. The empirical bootstrap is more efficient than alternative bias correction procedures, but slightly less efficient than the customary empirical versions of 𝐺, and may even have the smallest MSEs for large Gini indices. Conventional advice in the literature is to avoid estimators that are considerably biased, so empirical researchers should seek estimators with smaller biases, and then choose one with a small variance. Following this idea, the empirical bootstrap can be a good choice for estimating the Gini index in the scenarios discussed in Sections 5 and 6. However, alternative bias correction procedures also perform well in terms of bias and efficiency in many situations, and they may be preferable in terms of simplicity.

For less skewed distributions (e.g., Weibull and Gamma), the bias of 𝐺 is not a problem, and the bias of 𝐺 lies within a reasonable range. This implies that bias correction procedures are not required for less skewed distributions. Bias corrections are applied to 𝐺 because it shows the best performance in this study. However, such procedures can easily be applied to any other estimation method in the literature.

The observed biases may have an important impact on the coverage rates of confidence intervals of the Gini index, especially in the case of the large bias ratios obtained by the estimator 𝐺 . This implies that bias corrected estimators are highly recommended for the construction of confidence intervals, since intervals based on 𝐺 can be invalid and/or have undesirable coverage probabilities in the case of moderate or severe biases. Large biases are also observed for the bias corrected estimators in the case of large Gini indices and highly skewed distributions. These observations suggest promising directions for future research. For instance, the interval estimation based on bias corrected estimators can be investigated to analyse when such confidence intervals have desirable empirical coverages. Alternative estimation methodologies can also be used to improve the estimation of the Gini index. In particular, it would be interesting to reduce the biases that still remain in the aforementioned extreme situations (highly skewed distributions with large Gini indices). For instance, information from auxiliary variables can be incorporated at the estimation stage, and more accurate results are expected.

Government of Andalusia and the European Regional Development Fund (project P18-RT-576) and two grants of the University of Granada (Unidad Científica de Excelencia "Desigualdad, Derechos Humanos y Sostenibilidad -DEHUSO" del Plan Propio; and Programa de Ayudas a la revisión de textos científicos de la Facultad de Ciencias Económicas y Empresariales) .

Figure 2: Relative biases (𝑅𝐵𝑠) of estimators 𝐺 , 𝐺 and 𝐺 based on samples with size 𝑛 = 50, and randomly selected from various continuous probabilistic distributions (infinite populations).

Figure 3: Box plots for 1000 estimates of the Gini index (𝐺) using the estimator 𝐺 and various values of 𝐺. Samples, with size 𝑛 = 50, are randomly selected from the Pareto and Gamma distributions (infinite populations).

Figure 4: Relative biases (𝑅𝐵𝑠) of estimators 𝐺 , 𝐺 . and 𝐺 . based on samples with size 𝑛 = 50, and randomly selected from various continuous probabilistic distributions (infinite populations).

Figure 7: Bias Ratios (𝐵𝑅𝑠), in absolute terms, of estimators 𝐺 , 𝐺 . and 𝐺 .

Figure 8: Grading scale based on the Relative Biases (RBs) and Bias Ratios (BRs) of 𝐺 when samples, with sizes 𝑛 = {50,200,500,1000}, are randomly selected from the Pareto, Dagum-p20, Dagum-p0.5 and Lognormal distributions (infinite populations). Non-negligible biases with |𝐵𝑅| ≥ 0.1 are considered as mild (2 ≤ |𝑅𝐵| < 5), moderate (5 ≤ |𝑅𝐵| < 10) or severe (|𝑅𝐵| ≥ 10). The x- and y-axes show the expected values of 𝐺 and 𝛾 (𝐺 ‾ . and 𝛾 ‾ , respectively).

For small sample sizes (𝑛 = 50) and 𝛾 ‾ ≥ 2.6, mild biases are expected when 𝐺 ‾ . ≥ 0.2, approximately. Moderate biases are observed when 𝛾 ‾ ≥ 3.5 and 𝐺 ‾ . ≥ 0.37, and bias corrections are highly recommended in this situation. For samples with size 𝑛 = 200, mild biases

Table 1: Estimates of the Gini index for various subpopulations derived from the ES-SILC, WBES and WATER populations. The null hypothesis that data come from a specific continuous probabilistic distribution is tested using the Kolmogorov-Smirnov (KS) Goodness of Fit test, and the corresponding p-values are provided. The bias corrected estimator 𝐺 . is based on probabilistic distributions that fit the data.

Population   𝑛     𝛾       𝐺       𝐺 .     𝐺 . (by fitted distribution)   Distribution   KS p-value
ES-SILC      26    2.81    0.518   0.534   0.540                          Fisk           0.85
                                           0.526                          Lognormal      0.65
             51    3.51    0.476   0.486   0.488                          Fisk           0.99
                                           0.480                          Lognormal      0.62
                                           0.499                          Dagum          0.11
WBES         61    3.22    0.505   0.513   0.509                          Lognormal      0.38
                                           0.516                          Fisk           0.37
             503   15.85   0.734   0.784   0.752                          Fisk           0.57
WATER        38    3.92    0.358   0.368   0.370                          Dagum          0.07
                                           0.359                          Weibull        0.06
                                           0.366                          Fisk           0.06
                                           0.363                          Lognormal      0.05
             74    3.09    0.435   0.439   0.442                          Fisk           0.10
                                           0.442                          Dagum          0.10

Metadata

Title: Exploring and Correcting the Bias in the Estimation of the Gini Measure of Inequality
Delta ID: DSEID-001-2360116
Authors: Juan F. Muñoz, Pablo J. Moya-Fernández, Encarnación Álvarez-Verdejo
Abstract source: crossref
Source URL: https://digibug.ugr.es/bitstream/10481/85932/5/BiasCorrectionPrePrintC.pdf
Access: open_repository
Licence: cc-by-nc-nd
PDF SHA-256: 34424ff295f543e6b601ab2172d084a5bf95af518d41dedbc319fe152c49d3a4
TEI SHA-256: f36d9d86acc2746e9846b8a2cab40bfc148493070eecc10f500df0b4d5f36f97
GROBID: {"version":"0.8.2","revision":"a91ee48"}
