Understanding the creditworthiness of counterparties is a key factor in business decisions. Investors need to know how likely it is that money invested in bonds or extended as loans will be repaid. Companies must quantify the creditworthiness of suppliers, customers, acquisition candidates, and competitors.

The traditional measure of credit quality is a corporate rating, such as those issued by Standard & Poor’s, Moody’s, or Fitch. However, such ratings are available only for the largest companies, not for the millions of smaller ones. To quantify the creditworthiness of smaller companies, analysts usually turn to an alternative method: the probability of default (PD) model.

## Calculating PD

Calculating PDs requires sophisticated modeling and a large data set of past defaults, along with a complete set of fundamental financial variables for a large universe of companies. In most cases, companies that choose to use PD models license them from one of a small number of providers. However, some large financial institutions have built their own PD models.

Building a model requires collecting and analyzing data, including historical fundamentals. This information usually comes from financial statements. Once the data is compiled, the next task is to form financial ratios, or “drivers”: the variables that drive the result. These drivers are typically grouped into six categories: leverage ratios, liquidity ratios, profitability ratios, size measures, expense ratios, and asset quality ratios. Credit analysis professionals widely accept these measures as relevant to estimating creditworthiness.
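
As a rough illustration of turning statement items into drivers, the sketch below computes one ratio per category. The specific ratios, the field names, and the sample figures are all assumptions made for this example; the text does not prescribe which ratio represents each category.

```python
import math

def compute_drivers(fs):
    """Return one illustrative driver per category from a dict of statement items.

    The choice of ratio for each category is an assumption for this sketch.
    """
    return {
        "leverage": fs["total_debt"] / fs["total_assets"],
        "liquidity": fs["current_assets"] / fs["current_liabilities"],
        "profitability": fs["net_income"] / fs["total_assets"],
        "size": math.log(fs["total_assets"]),  # log scale tames very large firms
        "expense": fs["interest_expense"] / fs["revenue"],
        "asset_quality": fs["intangible_assets"] / fs["total_assets"],
    }

# Hypothetical company, figures in thousands.
statements = {
    "total_assets": 1000.0,
    "total_debt": 600.0,
    "current_assets": 300.0,
    "current_liabilities": 200.0,
    "net_income": 20.0,
    "revenue": 800.0,
    "interest_expense": 40.0,
    "intangible_assets": 100.0,
}

drivers = compute_drivers(statements)
for name, value in drivers.items():
    print(f"{name:>14}: {value:.3f}")
```

A production model would typically compute many candidate ratios per category and let the later selection step decide which ones to keep.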

The next step is to identify which companies in the sample are “defaulters”, that is, companies that have actually defaulted. With this information in hand, a logistic regression model can be estimated. Statistical methods are used to test dozens of candidate drivers and then to select those that are most significant in explaining future defaults.
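
The estimation step can be sketched in a few lines. The example below fits a logistic regression by plain batch gradient descent on a tiny, made-up sample with two drivers; the data, the learning rate, and the helper names `fit_logistic` and `predict` are assumptions for illustration. A real project would more likely use a statistics package and a far larger sample.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(beta, x):
    """PD for one company; beta[0] is the intercept, beta[1:] the driver coefficients."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Estimate coefficients by batch gradient descent on the average log-loss."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(beta, xi) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Made-up sample: [debt / assets, return on assets] and a default flag (1 = defaulted).
X = [[0.20, 0.10], [0.30, 0.08], [0.40, 0.05], [0.50, 0.06], [0.55, 0.03],
     [0.45, 0.04], [0.60, 0.02], [0.70, 0.01], [0.80, -0.02], [0.90, -0.05]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

beta = fit_logistic(X, y)
pd_risky = predict(beta, [0.90, -0.05])
pd_safe = predict(beta, [0.20, 0.10])
print(f"risky firm PD: {pd_risky:.2f}, safe firm PD: {pd_safe:.2f}")
```

Driver selection, which the text describes as testing dozens of candidates, is not shown here; it would wrap a fit like this one in a procedure that compares alternative driver sets.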

The regression model relates default events to the various drivers. A distinctive feature of this model is that its output is bounded between 0 and 1, which maps onto a default probability of 0% to 100%. The coefficients from the final regression constitute the model used to estimate a company’s default probability from its drivers.
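
In symbols, the standard logistic form makes the bounded output explicit. The notation here is an assumption introduced for illustration: $x_1, \dots, x_k$ are a company’s drivers and $\beta_0, \dots, \beta_k$ are the estimated coefficients.

$$
\mathrm{PD} = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
$$

As $z$ becomes very negative the PD approaches 0, and as $z$ becomes very large it approaches 1, so any combination of driver values yields a valid probability.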

Finally, you can examine performance measures for the resulting model. These are likely to be statistical tests of how well the model predicts defaults. For example, the model may be estimated using financial data for a five-year period (2001-2005). The resulting model is then applied to data from a different period (2006-2009) to predict defaults. Since we know which companies defaulted from 2006 to 2009, we can judge how well the model performed.
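
One way to score such an out-of-sample test is the area under the ROC curve (AUC). The text does not name a specific statistic, so AUC is an assumption here, as are the hold-out figures below. The sketch computes the share of defaulter/non-defaulter pairs in which the defaulter received the higher predicted PD.

```python
def auc(pds, defaulted):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count as half."""
    bad = [p for p, d in zip(pds, defaulted) if d]
    good = [p for p, d in zip(pds, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))

# Hypothetical hold-out sample: PDs from a model estimated on the earlier period,
# paired with what actually happened in the later period (1 = defaulted).
holdout_pds = [0.02, 0.10, 0.35, 0.04, 0.60, 0.08]
holdout_defaulted = [0, 0, 1, 0, 1, 1]

score = auc(holdout_pds, holdout_defaulted)
print(f"out-of-sample AUC: {score:.3f}")
```

A value of 0.5 means the model ranks companies no better than chance, and 1.0 means every defaulter was assigned a higher PD than every non-defaulter.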

To understand how the model works, consider a small company with high leverage and low profitability. We have just specified three of the model’s drivers for this company. Most likely, the model will predict a relatively high probability of default: the company is small, so its revenue stream may be unstable; it is highly leveraged, so it may carry a heavy burden of interest payments to creditors; and its profitability is low, so it generates little cash to cover its expenses, including that heavy debt burden. Taken together, the company may well find itself unable to service its debts in the near future. This means it has a high probability of default.
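
The same reasoning can be traced numerically. In the sketch below the coefficients are hypothetical values chosen only so that the signs match the intuition above (larger size and higher profitability lower the PD, higher leverage raises it); they are not estimated from any data, and the two companies are invented.

```python
import math

# Hypothetical coefficients for illustration only (not estimated from data).
INTERCEPT = -2.0
COEFS = {"log_assets": -0.3, "debt_to_assets": 4.0, "return_on_assets": -8.0}

def default_probability(drivers):
    """Apply the logistic formula to one company's drivers."""
    z = INTERCEPT + sum(COEFS[name] * drivers[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

# Small, highly leveraged, barely profitable company.
small_risky = {"log_assets": 2.0, "debt_to_assets": 0.9, "return_on_assets": 0.01}
# Large, lightly leveraged, profitable company for comparison.
large_safe = {"log_assets": 8.0, "debt_to_assets": 0.3, "return_on_assets": 0.10}

pd_small = default_probability(small_risky)
pd_large = default_probability(large_safe)
print(f"small, leveraged firm: {pd_small:.1%}")
print(f"large, profitable firm: {pd_large:.1%}")
```

With these assumed coefficients, all three drivers push the small company’s score in the same direction, which is why its PD ends up far above that of the comparison company.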

## Art and science

So far, the model-building process has been entirely mechanical, relying on statistics. Now the “art” of the process comes in. Examine the drivers selected in the final model (likely somewhere between 6 and 10). Ideally, there should be at least one driver from each of the six categories described earlier.

However, the mechanical process described above can produce a model with, say, six drivers that all come from the leverage category, with none representing liquidity, profitability, and so on. Bank loan officers who are asked to use such a model to support lending decisions are likely to complain. Their strong intuition will tell them that the other driver categories must also matter, and the absence of those drivers may lead many to conclude that the model is inadequate.
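
A category coverage check of this kind is easy to automate. The driver names and the driver-to-category mapping below are hypothetical; the sketch simply reports which of the six categories a selected driver set fails to represent.

```python
CATEGORIES = {"leverage", "liquidity", "profitability", "size", "expense", "asset_quality"}

# Hypothetical mapping from candidate driver names to their category.
CATEGORY_OF = {
    "debt_to_assets": "leverage",
    "debt_to_equity": "leverage",
    "liabilities_to_assets": "leverage",
    "current_ratio": "liquidity",
    "return_on_assets": "profitability",
    "log_assets": "size",
    "interest_to_revenue": "expense",
    "intangibles_to_assets": "asset_quality",
}

def missing_categories(selected):
    """Return the categories with no representative among the selected drivers."""
    covered = {CATEGORY_OF[name] for name in selected}
    return sorted(CATEGORIES - covered)

# A purely mechanical selection that picked only leverage drivers.
mechanical_pick = ["debt_to_assets", "debt_to_equity", "liabilities_to_assets"]
print(missing_categories(mechanical_pick))
```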

The obvious solution is to replace some of the leverage drivers with drivers from the missing categories. However, this raises a problem. The original model was selected to deliver the highest statistical performance. Changing the composition of the drivers is therefore likely to reduce the model’s performance from a purely mathematical point of view.

Therefore, a trade-off must be made between including a broad selection of drivers to maximize the model’s intuitive appeal (art) and the potential loss of predictive power as measured statistically (science).

## Criticism of the PD model

The quality of a model depends primarily on the number of defaults available for calibration and on the cleanliness of the financial data. In many cases, this is no trivial requirement, because many data sets contain errors or have missing data.

These models use only historical information, and the inputs are sometimes a year or more out of date. This weakens the model’s predictive power, especially if a significant change, such as a change in accounting practices or regulations, has made a driver less relevant.

Ideally, a model should be built for a specific industry within a specific country. This ensures that the economic, legal, and accounting factors unique to that country and industry are properly captured. The challenge is that data is often scarce to begin with, especially the number of identified defaults. If that scarce data must be split further into country-industry buckets, each country-industry model has even fewer data points.

Because missing data is a fact of life when building such models, many techniques have been developed to fill in the missing values. However, some of these techniques may introduce inaccuracies. Data scarcity also means that default probabilities calculated from a small sample may differ from the true underlying default probabilities for the country or industry in question. In some cases, the model output can be scaled to match the underlying default experience more closely.
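
Both ideas can be sketched briefly. The text does not say which fill-in or scaling technique is used, so median imputation and a simple proportional rescaling are assumptions chosen here for their simplicity, and the sample values are invented.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def scale_to_base_rate(pds, base_rate):
    """Rescale PDs proportionally so their average matches an assumed base default rate.

    Values are capped at 1.0; if the cap binds, the average ends up slightly below target.
    """
    factor = base_rate / (sum(pds) / len(pds))
    return [min(1.0, p * factor) for p in pds]

# Hypothetical leverage ratios with one missing observation.
leverage = impute_median([0.2, None, 0.6, 0.4])

# Hypothetical model PDs that average 2%, rescaled to an assumed 4% base rate.
scaled = scale_to_base_rate([0.01, 0.02, 0.03], base_rate=0.04)
print(leverage, [round(p, 4) for p in scaled])
```

Median imputation illustrates the inaccuracy the text warns about: every company with a missing value is treated as typical, which may be least true of exactly the companies that fail to report.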

The modeling technique described here can also be used to calculate PDs for large companies. However, far more data is available on large companies, because they typically have publicly traded stock and are subject to significant public disclosure requirements. This data availability makes it possible to build other PD models (known as market-based models) that are more powerful than those described above.

## Conclusion

Industry practitioners and regulators are well aware of the importance of PD models and of their main limitation: data scarcity. Accordingly, various efforts have been made around the world (under the auspices of Basel II, for example) to improve financial institutions’ ability to capture useful financial data, including the accurate identification of defaulting companies. As the size and accuracy of these data sets increase, the quality of the resulting models will also improve.
