Financial information, plus limited qualitative information such as "age of the firm" and "firm size"
Limitations
- Limited sample size
- Equal sample sizes were used for default and non-default firms, which may understate error rates compared to those obtained on population data
Literature Review 3
Modelling Credit Risk for SMEs: Evidence from the US Market
Edward I. Altman and Gabriele Sabato
Methodology used
- Quantitative (Financial) information is used in the following way:
- Conditional Logit Model
Data Requirements
Financial information, such as various financial ratios indicating leverage, liquidity, profitability, coverage, etc.
Limitations
- No qualitative information used
- Logarithm values were used to reduce the variance of financial ratios across companies belonging to different sectors, which may not be a good idea if the number of sectors is large
Literature Review 4
Expected Default Probability, Credit Spreads and Distance-from-Default
Dr. Heng-Chih Chou, Ming-Chuan University, Taiwan
- The article does not propose a new model but analyzes the Option Pricing Model (Black-Scholes/Merton) as it applies to valuing a firm
- It further shows how Distance to Default (DD), in conjunction with the OPM, can be used to arrive at a credit risk spread; that is, the credit risk spread is expressed as a function of the firm's distance from default
- The DD method used to arrive at the probability of default is not specific to SMEs; it can be applied generally to establish a relationship between actual defaults and the DD measure, provided sufficient data on defaulting firms is available
- Limitations: The DD method used in the article therefore requires extensive default information to arrive at expected default probabilities
Literature Review 5
Estimating Expected Default Probabilities using the Option Pricing Model
Chih-Min Hung
Methodology used
- Merton’s OPM is used to arrive at a default probability for the firms.
- The value of the debt of the firm being analyzed is assumed to equal its book value
- For firms issuing bonds, the maturity is measured using duration
Data Requirements
- Time series data on debt and firm values
- Market prices of bonds issued by the debt raising firms
Limitations
Difficult to apply to SMEs because of the way debt is valued for firms that do not issue bonds; the assumptions made suit large corporations better than SMEs
Methodology
This section discusses the implementation details of the project. Essentially, we deal with three areas:
- Choice of Method for predicting default.
- Data Collection and Choice of Sample.
- Choice of Variables.
- Choice of Method
The Option Pricing Method discussed in a few of the research papers is difficult to implement in the Indian context, as it requires market valuations of debt and equity. Of the remaining methods, the two statistical methods widely used for predicting default are
- Multivariate Discriminant Analysis
- Logit Model
We decided to go with the logit model for a variety of reasons:
- MDA requires the variables to be normally distributed, which is difficult to obtain.
- The logit model directly yields a probability of default, whereas MDA scores need to be put into slabs to predict a default event.
- Logit model coefficients can be interpreted directly as the change in the probability of default from changing a variable; MDA coefficients cannot be interpreted in terms of changes in individual variables.
- MDA requires samples in equal proportion, i.e. an equal number of cases of both outcomes, which is difficult to get in our case.
- The logit model applies a non-linear transformation to the data, decreasing the influence of outliers.
- Data Collection and Choice of Sample
Data collection: In reality, data collection turned out to be a very difficult task. We had planned to collect data on defaulting SMEs from the RBI's released list of defaulters.
We further referred to Capitaline and CMIE Prowess for financial information on the SMEs. However, these sources were limited for our purpose, as they do not provide data on companies that have undergone credit restructuring or have defaulted on their commitments.
We were unable to obtain the RBI's defaulters list. However, we could get a list of names of firms that had filed for bankruptcy from the website of the Board for Industrial & Financial Reconstruction (BIFR).
The data collection procedure involved the following steps:
After going through the entire list available on the BIFR website, we obtained data for 36 SME companies that had defaulted on their loans and whose relevant data was available on either Capitaline or Prowess.
The final sample consisted of 30 SMEs that defaulted on their loans (6 of the data points obtained earlier were excluded because complete data was unavailable), along with 80 SMEs that did not default. The non-defaulting SMEs were selected randomly from Prowess. The entire sample thus had 110 data points, with 2.66 non-defaulting companies per defaulting company.
Due to the small number of data points, the sample could not be divided into training and testing samples. We decided to monitor performance through intra-sample testing only.
- Choice of Variables
Qualitative vs. Quantitative
An ideal default prediction system would have both qualitative and quantitative variables. However, our model is based primarily on quantitative variables, to avoid the complexity arising from errors in qualitative variables.
Financial Ratios as Variables
Various financial ratios primarily form the variables (inputs) of our model. We have listed all applicable variables.
We obtained variables from each of the following four categories:
- Net Income Ratios
- Liquid Assets to Total Assets Ratio
- Liquid Assets to Current Debt Ratio
- Turnover Ratios
The variables obtained are shown in the table below.
The following procedures were performed to generate the optimum variables.
Factor Analysis
The remaining financial ratios, grouped under separate categories, were subjected to factor analysis to generate factor scores for use in the logit model.
Application of Factor Analysis
We have four variable categories, and all variables within a category were subjected to factor analysis together. Ideally, we would have liked to retain only one factor from each category; however, in three of the categories a single factor failed to explain a significant portion of the original information, and in those cases two factors were retained. A sketch of the per-category commands is shown below.
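As an illustration, a minimal Stata sketch of this per-category workflow, with hypothetical ratio names for one category (the other three categories are handled the same way):

* factor analysis within one variable category (hypothetical ratio names)
factor ratio1 ratio2 ratio3 ratio4, pcf   // principal component factors
rotate, varimax                           // varimax rotation
predict factor11 factor12                 // factor scores (regression method)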
- Flowchart of the Entire Process
Results Obtained
Factor Analysis was performed category-wise for all four categories.
7 factors were obtained from the factor analysis, as follows
(For detailed results refer Appendix C)
(For interpretation of Factor Analysis Results in Stata refer Appendix D)
High uniqueness values were noted for net income to total assets, net income to net worth, working capital to total assets, and quick assets to sales.
The 7 factors obtained in the factor analysis were used as inputs to the logit model, with the default flag as the dependent variable.
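A minimal Stata sketch of this step, assuming the default flag is a 0/1 variable named default and the factor scores carry the names used in Appendix F:

* logit model on the seven factor scores (default = 1 for defaulting firms)
logit default factor11 factor12 factor21 factor22 factor31 factor41 factor42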
The logit model output shows that the Liquid Assets to Current Debt ratios are the most significant category for default prediction, followed by the Net Income ratios.
(For Detailed Logit Model Output Refer Appendix E)
(For Interpretation of Typical Logit Model Output Refer Appendix F)
The model built from these factors was subjected to intra-sample testing, with the following results. Based on the estimated coefficients, default probabilities were predicted; the best results, shown in the table, were obtained at a threshold of 0.3.
Values of Type I and Type II errors for different thresholds
As can be seen from the table, 0.3 is the threshold best suited to minimizing the errors.
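A minimal Stata sketch of this intra-sample check, assuming the logit model above has just been fitted (the error figures themselves come from the actual data, not from this sketch):

* in-sample predicted probabilities and classification at the 0.3 cutoff
predict phat                                      // predicted probability of default
gen byte pred03 = phat >= 0.3 if !missing(phat)   // classify at threshold 0.3
tabulate default pred03                           // off-diagonal cells give Type I and Type II errors
estat classification, cutoff(0.3)                 // built-in classification table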
Various combinations of the variables with high uniqueness values were input to the logit model along with the factors obtained from factor analysis; however, the improvement to the model was insignificant in all cases.
Application of Model
The flowchart for using this model is as follows
Drawbacks of Model
- The model has so far been tested only on intra-sample data; its performance on real-world, out-of-sample data remains to be seen.
- For each defaulting firm, the data used was from one year before the default. Companies defaulting on loans have been observed to exhibit sick characteristics for years, so ideally the model should be built on ratios from at least 3 years before default.
- No qualitative variables, such as the state of the economy, regulations, or type of industry, were considered.
Conclusion
In this project we tried to contribute to the vast literature on default prediction.
Our model's performance has shown that, in spite of the disadvantages associated with not using qualitative variables, default can be predicted with an accuracy of 85%, which is remarkable.
The Liquid Assets to Current Debt ratios appear to be the most significant category for default prediction, followed by the Net Income ratios. The remaining categories appear insignificant for default prediction.
References
- Erkki K. Laitinen, "Data System for Assessing Probability of Failure in SME Reorganization"
- Lyudmila Lugovskaya, "Predicting Default of Russian SMEs on the Basis of Financial and Non-financial Variables"
- Edward I. Altman and Gabriele Sabato, "Modelling Credit Risk for SMEs: Evidence from the US Market"
- Heng-Chih Chou, "Expected Default Probability, Credit Spreads and Distance-from-Default"
- Chih-Min Hung, "Estimating Expected Default Probabilities Using the Option Pricing Model"
- Edward I. Altman, "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy"
Appendix A
Logit Function
The logistic function is a sigmoid-shaped function used in statistics. Its value ranges from 0 to 1. The logit function is the inverse of the logistic function, and its value for a number p between 0 and 1 is given by the formula:

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$$
The logistic function of any number α is hence given by the inverse of the logit:

$$\operatorname{logit}^{-1}(\alpha) = \frac{1}{1+e^{-\alpha}}$$
If p is a probability, then p/(1 − p) is the corresponding odds, and the logit of the probability is the logarithm of the odds; similarly, the difference between the logits of two probabilities is the logarithm of the odds ratio R, providing a shorthand for combining odds ratios simply by adding and subtracting:

$$\ln R = \operatorname{logit}(p_1) - \operatorname{logit}(p_2) = \ln\left(\frac{p_1/(1-p_1)}{p_2/(1-p_2)}\right)$$
Logistic regression
Logistic regression (also known as the logistic model) is used for predicting the probability of occurrence of an event (usually a categorical variable) from independent variables. The regression tries to fit the data to a logistic function curve; the independent variables can be either numerical or categorical.
A logistic function is used because, for an input of any value from negative infinity to positive infinity, the output is confined to values between 0 and 1. For logistic regression, the input to the logistic function is denoted by the variable z, which is usually defined as

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$$
where β0 is called the intercept and β1, β2, β3, etc. are the coefficients of x1, x2, x3, respectively. The value of a regression coefficient marks the size of that risk factor's contribution: a positive coefficient means the explanatory variable increases the probability of the outcome, a negative coefficient means it decreases that probability, a large coefficient means the risk factor strongly influences the probability, and a near-zero coefficient means it has little influence.
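As a small worked example (with made-up coefficients, purely for illustration): if a fitted model gives $z = -2 + 0.8x_1$ and a firm has $x_1 = 3$, then $z = 0.4$ and the predicted probability is

$$p = \frac{1}{1+e^{-0.4}} \approx 0.60,$$

i.e. roughly a 60% probability of the event.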
Appendix B
Multivariate Discriminant Analysis (MDA)
Multiple discriminant analysis is a generalized form of discriminant analysis, with many of the same assumptions and tests. MDA is used to explain a categorical dependent variable which has more than two categories, using a number of interval or dummy independent variables. It is closely related to regression analysis, principal component analysis and factor analysis, in that it tries to find a linear combination of features which characterizes or separates two or more classes of objects or events. Discriminant coefficients are calculated in a way similar to ANOVA: the coefficients depend on the between-groups sum of squares and the within-group sums of squares, and are chosen so that the difference between the groups is maximized.
Discriminant analysis approaches the problem by assuming that the conditional probability density functions p(x | y = 0) and p(x | y = 1) are both normally distributed, with mean and covariance parameters (μ0, Σ0) and (μ1, Σ1), respectively. Under this assumption, the optimal solution is to classify points based on the ratio of the log-likelihoods being below some threshold T, so that

$$(x-\mu_0)^{T}\Sigma_0^{-1}(x-\mu_0) + \ln|\Sigma_0| - (x-\mu_1)^{T}\Sigma_1^{-1}(x-\mu_1) - \ln|\Sigma_1| > T$$
It is often useful to see the conclusion in geometrical terms: the criterion of an input x being in a class y is purely a function of the projection of the multidimensional-space point x onto a direction w.
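For comparison with the logit workflow, a minimal Stata sketch of a two-group discriminant analysis, assuming a hypothetical dataset with a 0/1 group variable named default and three ratio variables:

* linear discriminant analysis (hypothetical variable names)
discrim lda ratio1 ratio2 ratio3, group(default)
estat classtable                          // in-sample classification table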
Appendix C
Factor Analysis Results
- Net Income Ratios
- Liquid asset to total asset ratios
- Liquid Assets to Current Debt Ratios
- Turnover Ratios
Regression coefficients to obtain factor scores
Appendix D
Interpretation of typical Factor Analysis Output in Stata
The 1st step in factor analysis using Stata is to obtain the raw factors, as shown in the following block.
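(The original output block is not reproduced here; a sketch of the command that produces it, with hypothetical variable names, is:)

* step 1: extract raw factors via principal component factors
factor ratio1 ratio2 ratio3 ratio4, pcf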
1) The method used here is Principal Component Factors; other options are Principal Factors, Iterated Principal Factors and Maximum Likelihood Factors. Since here we are concerned with the correlation among the variables within a single category, it is better to go for Principal Component Factors.
2) Eigenvalue: In layman's terms, the eigenvalue represents the number of variables' worth of information explained by that particular factor. The sum of all eigenvalues is always equal to the total number of variables. Usually, factors with eigenvalues greater than 1 are assumed to have compressed the data of multiple variables. In this example we can see that the eigenvalue for factor 1 is significantly higher, at 5.074. Here we take factor 1 and factor 2 for our analysis and discard the rest of the factors.
3) Factor loadings are the correlations between each variable and the factors obtained in the factor analysis. The higher the correlation, the greater the weight of the variable in the factor. This analysis is done for only 2 factors, as only 2 factors had eigenvalues greater than 1.
4) Uniqueness can be defined as the variance of a variable that is unique to itself. So when the quick assets to sales ratio has a uniqueness of 0.63, it means that 63% of that variable's variance is not explained by, or shared with, the other variables. It can also be seen that the higher the uniqueness, the lower the correlation with the factors.
5) Rotation: The factor scores given by this 1st step may not be optimal. Better factor scores can be obtained by rotating the factors, which is our 2nd step.
In the 2nd step we rotate the factors. A popular method of rotation is Varimax rotation.
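(The corresponding command, applied after the extraction step above, is simply:)

* step 2: varimax rotation of the retained factors
rotate, varimax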
1) The loadings table here is regenerated after rotation, and it offers a clearer picture of the correlations between the variables and the factors. If you compare it with the previous factor loadings table, you can observe changes in some values.
2) The factor rotation matrix shows the relative rotation performed in this step.
In the 3rd step the factor scores are estimated (a command sketch follows the steps below):
- Regression coefficients are obtained using the rotated factors.
- Using these coefficients, individual factor scores are computed.
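(A sketch of the scoring command; Stata's predict uses the regression method by default:)

* step 3: compute individual factor scores from the rotated solution
predict f1 f2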
Appendix E
Logit Results Obtained
Appendix F
Interpretation of typical Logit Model Output
- Starting from the top: maximum likelihood estimation finds the values of the β coefficients for which the likelihood function is maximized.
- The likelihood-ratio chi-squared test indicates, with Prob > chi2 < 0.0001, that this model is better than no model.
- The p-value for each variable indicates its statistical significance. In this case, factor11 and factor31 are statistically significant at the 10% significance level.
- The coefficients obtained can be interpreted as log-odds coefficients:

$$\ln\left(\frac{P(\text{firm defaults})}{P(\text{firm does not default})}\right) = -1.58\,\text{factor11} - 15.37\,\text{factor12} + 0.74\,\text{factor21} + 0.054\,\text{factor22} - 42.15\,\text{factor31} - 6.83\,\text{factor41} - 9.11\,\text{factor42} - 13.38$$
- The standard error indicates the ± uncertainty around the estimated logit coefficient for that variable.
Appendix G
Stata Do file
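(The original do file is not reproduced here. A minimal sketch of the workflow described in this report, with hypothetical variable and file names, is given below as an illustration; it is not the authors' actual file.)

* --- illustrative reconstruction of the analysis workflow ---
use sme_sample.dta, clear                 // hypothetical dataset: default flag plus financial ratios

* category-wise factor analysis (repeat for each of the four categories)
factor ni1 ni2 ni3, pcf                   // e.g. Net Income Ratios category
rotate, varimax
predict factor11 factor12
* ... likewise for the other categories, giving factor21 factor22 factor31 factor41 factor42

* logit model on the seven factor scores
logit default factor11 factor12 factor21 factor22 factor31 factor41 factor42

* intra-sample classification at the 0.3 threshold
predict phat
gen byte pred03 = phat >= 0.3 if !missing(phat)
tabulate default pred03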
Indian Institute of Management, Kozhikode 2010