On Fitting Probability Distribution to Univariate Grouped Actuarial Data with Both Group Mean and Relative Frequencies

Khemka, G., Pitt, D., and Zhang, J., 2022, North American Actuarial Journal, 27(1), 185-205.

Many publicly available datasets relevant to actuarial work contain data grouped in various ways. For example, operational loss data are often reported in a grouped format that includes group boundaries, loss frequency, and the average or total loss amount for each group. Fitting a parametric distribution to grouped data becomes more complex, but potentially more accurate, when additional information such as group means is incorporated in the estimation process. This article compares the relative performance of three methods of inference using distributions suitable for actuarial applications, particularly those that are right-skewed, heavy-tailed, and left-truncated. We compare the traditional maximum likelihood method, which considers only the group limits and the frequency of observations in each group, with two research innovations: a modified maximum likelihood method and a modified generalized method of moments approach, both of which incorporate the additional group mean information in the estimation process. In a simulation study, the proposed methods outperform both the traditional maximum likelihood method and the maximum entropy distribution, whether the true underlying distribution is known or unknown. Further, we apply the methods to three actuarial datasets: operational loss data, pension fund data, and car insurance claims data. Here we compare the performance of the three methods along with the maximum entropy distribution (under the traditional maximum likelihood and the modified maximum likelihood methods) and find that, for all three datasets, the proposed methods outperform the traditional maximum likelihood method. We conclude that there is merit in considering the proposed methods when fitting a parametric distribution to grouped data.
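To make the distinction concrete, the following minimal Python sketch fits a lognormal distribution to grouped data in two ways. The first objective is the traditional grouped-data likelihood, which uses only the group limits and frequencies. The second adds a squared relative discrepancy between observed and model-implied group means. That penalty is an illustrative stand-in chosen for this sketch, not the modified maximum likelihood or modified generalized method of moments estimators of the article. The lognormal choice, the group limits, the synthetic counts and means, the grid-search optimizer, and all function names are assumptions made for illustration only.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _phi_at(x, mu, sigma, shift=0.0):
    """Phi((ln x - mu - shift) / sigma), with the limits at 0 and infinity."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return norm_cdf((math.log(x) - mu - shift) / sigma)

def group_probs(bounds, mu, sigma):
    """Lognormal probability of each group (lo, hi]."""
    cdf = [_phi_at(b, mu, sigma) for b in bounds]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

def group_means(bounds, mu, sigma):
    """Model-implied E[X | lo < X <= hi], from lognormal partial expectations."""
    overall_mean = math.exp(mu + 0.5 * sigma ** 2)
    pe = [_phi_at(b, mu, sigma, shift=sigma ** 2) for b in bounds]
    probs = group_probs(bounds, mu, sigma)
    means = []
    for lo, p, pe_lo, pe_hi in zip(bounds, probs, pe, pe[1:]):
        # Fall back to the lower limit when the group is numerically empty.
        means.append(overall_mean * (pe_hi - pe_lo) / p if p > 1e-12 else lo)
    return means

def grouped_nll(bounds, counts, mu, sigma):
    """Traditional grouped-data negative log-likelihood (limits and counts only)."""
    probs = group_probs(bounds, mu, sigma)
    return -sum(n * math.log(max(p, 1e-300)) for n, p in zip(counts, probs))

def mean_augmented_objective(bounds, counts, obs_means, mu, sigma):
    """Illustrative stand-in (NOT the article's estimator): grouped NLL plus a
    count-weighted squared relative error between observed and fitted group means."""
    penalty = 0.0
    for n, m_obs, m_fit in zip(counts, obs_means, group_means(bounds, mu, sigma)):
        rel = (m_obs - m_fit) / m_obs
        penalty += n * rel * rel
    return grouped_nll(bounds, counts, mu, sigma) + penalty

def refine_grid_search(objective, mu_range, sigma_range, rounds=8, steps=20):
    """Crude optimizer: evaluate a grid, then halve the window around the best point."""
    (mu_lo, mu_hi), (s_lo, s_hi) = mu_range, sigma_range
    mu, s = mu_lo, s_lo
    for _ in range(rounds):
        grid = [(mu_lo + i * (mu_hi - mu_lo) / steps, s_lo + j * (s_hi - s_lo) / steps)
                for i in range(steps + 1) for j in range(steps + 1)]
        mu, s = min(grid, key=lambda point: objective(*point))
        d_mu, d_s = (mu_hi - mu_lo) / 4.0, (s_hi - s_lo) / 4.0
        mu_lo, mu_hi = mu - d_mu, mu + d_mu
        s_lo, s_hi = max(s - d_s, 1e-3), s + d_s
    return mu, s

# Synthetic grouped data generated from a known lognormal (hypothetical values).
TRUE_MU, TRUE_SIGMA = 1.0, 0.8
bounds = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0, math.inf]
counts = [10_000 * p for p in group_probs(bounds, TRUE_MU, TRUE_SIGMA)]
obs_means = group_means(bounds, TRUE_MU, TRUE_SIGMA)

fit_freq = refine_grid_search(
    lambda m, s: grouped_nll(bounds, counts, m, s), (-2.0, 4.0), (0.1, 3.0))
fit_both = refine_grid_search(
    lambda m, s: mean_augmented_objective(bounds, counts, obs_means, m, s),
    (-2.0, 4.0), (0.1, 3.0))
print("frequencies only:", fit_freq)
print("frequencies and group means:", fit_both)
```

Because the synthetic counts and means here are exact model values, both objectives are minimized at the same parameters. With real data, where frequencies and group means are subject to sampling noise, the extra term would be expected to pull the fit toward parameters that also reproduce the reported group averages. A production fit would more likely use a numerical optimizer than this grid search, and would need to handle left truncation, which the sketch omits.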