[SPSS dendrogram (continued): lower portion of the average-linkage tree from the previous approach; the ASCII tree is not reproducible in plain text]
Cluster 1:
The cluster formed was quite large, so little can be deduced from it directly. The differences within it are not very pronounced because we deliberately took a relatively big cluster so that Factor Analysis could be carried out on it; if we concentrate on smaller clusters, the demographic details match more closely within a cluster. This has been highlighted by taking a smaller cluster within Cluster 1.
Factor Analysis of this cluster did not give the same result as the factor analysis for the entire data, partly because the sample size is insufficient.
Cluster Analysis with the factors as the clustering variables should give a better representation of the respondents' characteristics and demographics.
Cluster Analysis: Approach 2
In this approach, we use the 4 factors as the clustering variables; the weight assigned to each variable within a factor is its factor loading.
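A minimal sketch of this weighting scheme, using synthetic data and placeholder loadings (not the survey's actual values): each respondent's factor score is a loading-weighted sum of the variables in that factor, and hierarchical clustering is then run on the scores.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))    # 20 respondents, 3 variables in one factor

# Placeholder factor loadings used as weights (NOT the report's actual values)
loadings = np.array([0.89, 0.44, 0.71])
factor_score = X @ loadings     # one loading-weighted score per respondent

# Average-linkage hierarchical clustering on the factor scores,
# mirroring SPSS's "Average Linkage (Between Groups)"
Z = linkage(factor_score.reshape(-1, 1), method="average", metric="euclidean")

# Cut the dendrogram at a chosen distance to read off clusters
labels = fcluster(Z, t=1.5, criterion="distance")
print(labels)
```

In practice the same cut-at-a-distance step corresponds to choosing where to slice the SPSS dendrogram.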
* * * * * * * * * * * * * * * * * * * H I E R A R C H I C A L C L U S T E R A N A L Y S I S * * * * * * * * * * * * * * * * * * *
Dendrogram using Average Linkage (Between Groups)
Rescaled Distance Cluster Combine
[Average-linkage dendrogram over the 114 cases, rescaled distance 0-25; the ASCII tree is not reproducible in plain text]
Cluster 1
There are some common characteristics in the cluster formed, as highlighted in the tables above. This cluster has more characteristics in common than the previous one because:
- Cluster Distance (Euclidean) has been reduced to 7.5
- Cluster Analysis is done on the 4 Factors rather than the 11 Variables.
Cluster Analysis: Approach 3
In this approach, we coded the demographic details into numbers and used them, along with the 11 variables, as the clustering variables.
* * * * * * * * * * * * * * * * * * * H I E R A R C H I C A L C L U S T E R A N A L Y S I S * * * * * * * * * * * * * * * * * * *
Dendrogram using Average Linkage (Between Groups)
Rescaled Distance Cluster Combine
[Average-linkage dendrogram over the 114 cases, rescaled distance 0-25; the ASCII tree is not reproducible in plain text]
We cut at Euclidean Distance 4.5.
Cluster 2
The common characteristics in the cluster are clearly visible with this approach to Cluster Analysis.
Factor Analysis on this Cluster
Factor 1: 2Ps of Marketing
F – Discounts & Promotional Offers (0.891)
G – Membership Cards (0.444)
H – Price Advantage (0.707)
Factor 2: Cashing in on the impulsiveness of the buyer by giving them a wide range of products and a good overall experience.
C – Product Offering (Variety & Range) (0.758)
L – Experience during the visit (0.684)
M – Impulsive Buyer (0.969)
Factor 3: Products
D – Trust on Product Quality (0.748)
E – Brand Comparison between products (0.665)
J – Difference between Budgeted and Actual Spending (0.757)
The result from factor analysis on this cluster is quite similar to that for the entire dataset.
Assignment on Heteroscedasticity
Objective:
To check if there is any Heteroscedasticity in the explanatory variables of the chosen data set. If it exists, provide a remedy for the econometric problem.
Data Set:
Cross-sectional data from the Forbes database has been taken for the analysis. The annual data of the following variables is for the year 2006 (50 data points).
The regression is run on 45 data points; the remaining 5 are held out for out-of-sample checking.
1. Annual Profits of a company
2. Annual Revenues of a company
A priori Reasoning:
The profits of a company depend on its revenues. The main determinant of profits must be the business the company does, i.e. its sales/revenues, since these are the main source of cash inflow. There can be other determinants of profits too, but sales can be considered the single most relevant driving force behind a company's profits.
This reasoning is applied to the Forbes 50 companies of 2006. Every year Forbes publishes its list of the 50 most profitable companies, ranked by the size of their profits.
Analysis:
Output of the regression run on 45 data points
Profits = -3203.4 + 0.104*(Revenues)
Profits and Revenues in Millions of US$
Significance of t-statistic of variable Revenue = 0.00%
Examination of the data for Heteroscedasticity
1. Scatter plots
Two graphs have been plotted:
a. Ui2 (square of the residual term) against Xi (Revenues)
b. Ui2 (square of the residual term) against Xi2 (square of the revenues)
Both plots give inconclusive results regarding Heteroscedasticity.
2. The Park test is performed
Here LnUi2 (the log of the squared residual, a proxy for the error variance) is taken as the dependent variable and Revenues (Xi) as the independent variable, and a regression is run on the data set thus obtained.
Since the t-statistic of the independent variable is SIGNIFICANT, this suggests the presence of Heteroscedasticity in the data set.
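A sketch of the Park test on synthetic data (note: the textbook form of the test regresses ln(Ui2) on ln(Xi); all numbers below are illustrative, not the Forbes figures).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 45
X = rng.uniform(1_000, 100_000, n)     # synthetic "revenues" (illustrative)
u = rng.normal(scale=0.002 * X)        # error s.d. grows with X (heteroscedastic)
y = -3000 + 0.10 * X + u               # synthetic "profits"

# Step 1: OLS of profits on revenues, keep the residuals
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Step 2 (Park test): regress ln(u_i^2) on ln(X_i); a significant slope
# means the error variance depends on X, i.e. heteroscedasticity
B = np.column_stack([np.ones(n), np.log(X)])
lnu2 = np.log(resid ** 2)
gamma, *_ = np.linalg.lstsq(B, lnu2, rcond=None)

# t-statistic of the slope coefficient
e = lnu2 - B @ gamma
s2 = e @ e / (n - 2)
se_slope = np.sqrt(s2 * np.linalg.inv(B.T @ B)[1, 1])
t_slope = gamma[1] / se_slope
print(round(t_slope, 2))
```

A large t-statistic on the slope is what leads to the "heteroscedasticity present" conclusion above.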
Remedial Measure Employed
1. Based on the analysis above, and since the error variance is proportional to Xi (Revenues), the data is transformed as follows (with Profits = Pi and Revenues = Ri):
Pi/Sqrt(Ri) = B1/Sqrt(Ri) + B2*Sqrt(Ri) + Ui/Sqrt(Ri)
Regression output on the transformed dataset.
Yi = -16.281 + 0.124*(Xi)
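The square-root transformation can be sketched on synthetic data (the true coefficients below are illustrative, chosen only to resemble the estimated equation): dividing through by Sqrt(Ri) makes the transformed error homoscedastic, so OLS on the transformed variables is a weighted least squares fit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 45
R = rng.uniform(1_000, 100_000, n)     # synthetic revenues (illustrative)
u = rng.normal(scale=np.sqrt(R))       # error variance proportional to R
P = -3000 + 0.10 * R + u               # synthetic profits

# Transformed model: P/sqrt(R) = B1*(1/sqrt(R)) + B2*sqrt(R) + u/sqrt(R)
# The transformed error u/sqrt(R) has constant variance (homoscedastic).
w = np.sqrt(R)
A = np.column_stack([1 / w, w])        # regressors of the transformed model
b1, b2 = np.linalg.lstsq(A, P / w, rcond=None)[0]
print(round(b1, 1), round(b2, 3))      # estimates of the intercept and slope
```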
Analysis of the Estimated Equation
When the estimated equation was checked on the out-of-sample data, the error variance came down from 7% to 3.4%.
(Refer to the excel sheet for calculations)
Assignment on Multicollinearity:
Objective:
To check if there is any Multicollinearity in the explanatory variables of the chosen data set. If it exists, provide a remedy for the econometric problem.
Data Set:
Time series data from the American Federal database has been taken for the analysis. The monthly data of the following variables runs from January 2004 to June 2009, i.e. 12*5 + 6 = 66 data points.
The regression is run on 62 data points; the remaining 4 are held out for out-of-sample checking.
1. Average Hourly Earnings: Manufacturing Sector (earnings)
2. Average Weekly Hours: Manufacturing Sector (hours)
3. All Employees: Durable Goods Manufacturing (empindur)
4. All Employees: Non-Durable Goods Manufacturing (empinnondur)
A priori Reasoning:
For a given time period, hourly labour earnings in a sector should be driven by the number of workers employed in that sector and the labour hours they put in. It is assumed that if more labour hours are put in, hourly earnings go up. Similarly, if the number of people employed goes up while industry conditions remain stable, hourly earnings should come down, and vice versa.
This reasoning is applied to the manufacturing sector in the USA, which is broken down into the durable-goods and non-durable-goods producing sectors.
SPSS output:
Examination of the correlation matrix
Variable 'empindur' shows high correlation with 'hours' (81.5%) and also with 'empinnondur' (89.5%).
Examination of the data for multicollinearity
1. The correlation-matrix output suggests that 'empindur' could be dropped from the analysis, as it could be causing multicollinearity.
2. Auxiliary regressions and an F-test are performed:
The regression is performed with each of the 3 independent variables in turn as the dependent variable.
The F-value is highest when 'empindur' is taken as the dependent variable. This suggests that the employees-in-durable-goods variable is causing multicollinearity in the system and hence must be dropped.
Detection of Multicollinearity
Auxiliary Regressors
Case (a): dependent variable = hours; independent variables = employees in non-durable goods, employees in durable goods
Case (b): dependent variable = employees in non-durable goods; independent variables = hours, employees in durable goods
Case (c): dependent variable = employees in durable goods; independent variables = hours, employees in non-durable goods
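The auxiliary-regression idea can be sketched with synthetic data: regress each explanatory variable on the others and compute the auxiliary R2 (and the related variance inflation factor, VIF). The variable names mirror the report's, but all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 62
hours = rng.normal(40, 1, n)                              # synthetic weekly hours
empindur = 0.9 * hours + rng.normal(scale=0.2, size=n)    # strongly tied to hours
empinnondur = rng.normal(30, 2, n)                        # mostly independent

def aux_r2(y, others):
    """R^2 of the auxiliary regression of one regressor on the rest."""
    A = np.column_stack([np.ones(len(y))] + others)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - resid @ resid / tss

r2_dur = aux_r2(empindur, [hours, empinnondur])
vif_dur = 1 / (1 - r2_dur)      # VIF > 10 is a common rule of thumb
print(round(r2_dur, 3), round(vif_dur, 1))
```

A high auxiliary R2 (equivalently, a large auxiliary F or VIF) for 'empindur' is what justifies dropping it.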
Analysis
Remedial Measure Employed
1. Based on the analysis above, the variable 'empindur' is dropped and the regression is re-run on the remaining data set. The new estimated equation is checked against the original on the out-of-sample data.
Analysis of the Estimated Equation
Hourly Earnings = -13624.18 +382.754*(hours) +1.313*(empinnondur)
1.) The significance level of the t-statistic of both explanatory variables is 0%.
The standardized beta coefficients are 0.455 for 'hours' and 0.649 for 'empinnondur', indicating the relative contribution of each explanatory variable to the dependent variable.
The estimated equation does not exhibit Multicollinearity.
2.) For the out-of-sample data, the error with the estimated equation is 13%, against 11% with the original regression equation.
Assignment on Autocorrelation:
Objective:
To check if there is any Autocorrelation in the disturbance terms of the chosen data set. If it exists, provide a remedy for the econometric problem.
Examination of the data for Autocorrelation
1. The graphical method suggests the presence of positive Autocorrelation.
2. The Durbin-Watson test is run:
K=3, n=66
dL=1.503 (From Table)
dU=1.696 (From Table)
Durbin-Watson=.268
DW (.268) < dL (1.503). This implies positive serial correlation exists in the chosen data set.
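The Durbin-Watson computation can be sketched on synthetic AR(1) data (all numbers illustrative, not the Fed series): d is roughly 2(1 - rho), so strong positive autocorrelation pushes d toward 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 66
e = np.zeros(n)
for t in range(1, n):                  # AR(1) disturbances, rho = 0.9
    e[t] = 0.9 * e[t - 1] + rng.normal()

x = np.arange(n, dtype=float)          # a simple synthetic regressor
y = 5.0 + 0.3 * x + e

# OLS residuals
A = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
u = y - A @ b

# Durbin-Watson statistic: d ~ 2(1 - rho); d near 0 => positive autocorrelation
d = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)
print(round(d, 3))
```

Comparing d with the tabulated dL and dU bounds then gives the decision, exactly as done above.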
Remedial Measure Employed
1. Rho (ρ) transformation of the existing data set to remove autocorrelation.
The value ρ = 1 - d/2 is used to transform the data set (generalized differencing), and the regression is run on the transformed data.
Transformed Equation:
Hourly Earnings = 7.749 -0.007*(hours) -0.004*(empinnondur)
Analysis of the Estimated Equation
For the out-of-sample data, the error with the estimated equation is 2.62%, against 11% with the original regression equation.
2. First-Difference Method (one degree of freedom is lost)
Here (X(t) - X(t-1)) is used instead of X(t), and similarly for the other variables.
The regression is run on the new data set thus obtained.
The equation obtained:
Hourly earnings = 0.017-0.043*(hours)+0.001*(empindur)-0.002*(empinnondur)
Analysis of the Estimated Equation
For the out-of-sample data, the error with the estimated equation is 6.12%, against 11% with the original regression equation. (Refer to the Excel sheet for the calculations.)
Assignment on Independent Dummy Variable ( Cross Section Data)
Objective:
There are cases when factors that are qualitative in nature and measured on a nominal scale need to be introduced into a regression model. This exercise is meant to facilitate the understanding of the effect of introducing dummy variables, and chiefly deals with the introduction of independent dummy variables.
There are two kinds of independent dummy variable:
- Intercept Dummy
- Slope Dummy
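Both kinds of dummy can be sketched in one regression on synthetic savings data (the gender coding and all coefficients below are illustrative, not the report's data): the intercept dummy shifts the level, the slope dummy is the interaction of the dummy with income.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 25
income = rng.uniform(10, 50, n)                 # illustrative income levels
gender = (np.arange(n) >= 12).astype(float)     # 0 = female (cases 1-12), 1 = male

# Synthetic savings with a different intercept AND slope by gender
savings = (1.0 + 0.10 * income
           + gender * (0.5 + 0.05 * income)
           + rng.normal(scale=0.3, size=n))

# Model: savings = a1 + a2*D + b1*income + b2*(D*income) + u
#   a2 = intercept dummy, b2 = slope (interaction) dummy
A = np.column_stack([np.ones(n), gender, income, gender * income])
a1, a2, b1, b2 = np.linalg.lstsq(A, savings, rcond=None)[0]
print(round(a2, 2), round(b2, 3))   # gender shift in intercept and in slope
```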
Assumptions
In regression analysis, the dependent variable is influenced by quantitative as well as qualitative variables.
Data Set:
The source of the data is the “Economic Report of the President”, 1997. The data gives observations for 25 cases: the savings and income of the respondents. Data points 1-12 are the savings and income data of female respondents, and data points 13-26 are the savings and income data of male respondents.
The a priori reasoning is that gender plays a role in the savings and income pattern of American residents.
To test this a priori reasoning, and to find whether the hypothesis is significant or not, we ran a cross-section dummy-variable exercise.
Steps Involved in the process and subsequent Analysis
- A simple intercept-dummy regression model was used. The output of the regression is shown below.
The step function or ANOVA model shows that the dummy is significant, as shown in the table below.
STEP 2: In this step we introduce one dummy variable for the intercept.
Both the slope dummy and the intercept dummy are found to be significant, as shown in the previous table. This suggests that savings vary with gender, i.e. the intercept of the line differs by gender, and that savings (the Y variable) also vary in slope with income levels.
Step 3
To find out whether the rate of change differs with respect to gender, we carry out the third step. The results of the third step are shown below.
Final part
As we can clearly see from the findings, the conclusions are the same as in Step 2: both gender and income levels affect savings. So both the slope and the intercept dummy are significant and affect the dependent variable.
Assignment on Dummy Variable (Time series data and structural stability/instability)
Objective:
Long-term (time series) data analysis is attempted primarily to
- understand the pattern
- forecast, for planning.
However, long-run analysis is influenced by several factors such as policy changes, changes in the administrative set-up, and socio-economic & political factors. The trend or pattern of the data, and forecasts based on it, will not be properly represented unless these factors are taken into account. This exercise checks whether there has been any structural change in the regression across the time period.
DATA SET:
The data has been taken from India Stat, 2007. The data gives the components of GDP in India (Rs Crore): the contribution of Electricity, Gas and Supply as a component of GDP, as well as Finance, Insurance and Real Estate.
A Priori Reasoning
The a priori reasoning is that since India underwent economic liberalization in 1991, with large-scale policy changes by the government, we can expect a structural change in the equation that has Finance, Insurance and Real Estate as the dependent variable and Electricity, Gas and Supply as the independent variable. A further a priori reasoning is that if people start to consume less electricity and gas, they are reducing their marginal propensity to consume, saving more and investing less, and thus the Finance, Insurance and Real Estate sector will be affected.
Steps Involved and Subsequent Analysis
- The data points from 1980-81 to 1989-90 are regressed.
- The data points from 1990-91 to 2000-01 are regressed.
- The pooled data set (1980-81 to 2000-01) is regressed.
- A dummy variable is introduced, coded as follows: 1980-81 to 1989-90 is coded as 0, and 1990-91 to 2000-01 is coded as 1.
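The steps above can be sketched on synthetic data: a pooled regression with an intercept dummy (alpha2) and a slope dummy (beta2) for the post-1991 period detects the structural change (all numbers below are illustrative, not the IndiaStat series).

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2 = 10, 11                         # 1980-81..1989-90 and 1990-91..2000-01
x = rng.uniform(100, 500, n1 + n2)      # synthetic "Electricity, Gas and Supply"
D = np.r_[np.zeros(n1), np.ones(n2)]    # 0 = pre-liberalization, 1 = post
# Post-1991 observations get a larger intercept and a steeper slope
y = 50 + 0.4 * x + D * (30 + 0.3 * x) + rng.normal(scale=5, size=n1 + n2)

# Pooled regression with intercept dummy (alpha2) and slope dummy (beta2):
# y = alpha1 + alpha2*D + beta1*x + beta2*(D*x) + u
A = np.column_stack([np.ones(n1 + n2), D, x, D * x])
alpha1, alpha2, beta1, beta2 = np.linalg.lstsq(A, y, rcond=None)[0]
print(round(alpha2, 1), round(beta2, 2))
```

Significant alpha2 and beta2 correspond to different intercepts and different slopes across the two sub-periods, which is the conclusion drawn below.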
Analysis
- A structural change in the equations has been found: the beta coefficients of the first regression equation and those of the second regression equation do not add up to the beta coefficients of the pooled-data equation.
- After introducing the dummy variable, we find that alpha2 (α2) is significant, as the results below show, which indicates that the intercepts are different.
- Again, β2 is also significant, as shown in the final table of this exercise, indicating that the slopes are statistically different.
Step 1: Results from SPSS
Step 2: Results from SPSS
Step 3 (Pooled Data): Results from SPSS
Results after Introducing Dummy Variable
Assignment on LPM, Discriminant and Logit Analysis
LPM
Se = (1.341)(0.238)(0.229)(0.225)
t = (-0.326)(0.719)(0.318)(-0.165)
Discriminant Analysis
Analysis 1: Summary of Canonical Discriminant Functions
Logit Model:
Logistic Regression
Block 0: Beginning Block
Block 1: Method = Enter
Block 2: Method = Enter
Block 3: Method = Enter
The logit model, too, is able to correctly identify 56% of the cases of students who took Finance as a major in second year, based on the inputs CQPI, Gender and Engineering background. However, as shown by the Wald statistic, only engineering background is significant, and inversely so: the Exp(B) value for Engg suggests that for a unit increase in the predictor (Engg), the predicted change in odds is 0.855, i.e. less than 1, meaning the odds decrease as the predictor value increases.
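The Exp(B) reading can be sketched numerically (the 0.855 value is the report's; the baseline odds below are purely illustrative): in a logit model, a one-unit increase in a predictor multiplies the odds of the outcome by exp(B).

```python
import math

# Interpreting Exp(B): a one-unit increase in a logit predictor multiplies
# the odds of the outcome by exp(B). The 0.855 below is the report's Exp(B)
# for Engg; the baseline odds are purely illustrative.
B_engg = math.log(0.855)

odds_before = 1.0
odds_after = odds_before * math.exp(B_engg)
print(round(odds_after / odds_before, 3))   # odds shrink to 0.855x, i.e. -14.5%
```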
Assignment on EViews - Panel Data
Objective:
To perform an analysis on panel data using the software EViews, and to examine fixed and random effects.
Data Set:
Panel data taken from Prowess (CMIE). The data is for five IT companies (Infosys, TCS, HCL, Satyam, Wipro) over five years (2004 to 2008), giving 25 observations.
The regression is run on the 25 data points.
1. Dependent variable - Profits
2. Independent variable 1 – Sales
3. Independent variable 2 – Personnel cost (compensation given to employees, which indirectly indicates the number of employees in a company)
4. Independent variable 3 – Assets
A priori Reasoning:
According to us, the profits of a company will not depend on sales: there are many other large expenses that do not depend on sales, so the coefficient of sales should be insignificant.
Profit will depend on personnel cost, i.e. the compensation paid to employees, which indirectly indicates the number of employees. In IT companies the size of the company is mainly determined by its workforce, not by its assets, so profit should depend on personnel cost.
Profit will not depend on assets. Generally profits depend on assets, but IT companies do not hold huge physical assets, so their profits should not depend much on the total assets on their balance sheets. Since it is the human resource that is the main asset of such a company, the coefficient of assets should be insignificant.
Data set used for the exercise
Results and Analysis
- Pooled estimation
Analysis:
PROFIT= 15.03+ 0.09*SALES+ 0.34*PERSONNEL_COST- 0.02*ASSETS
Sig: (.29) (.03) (.63)
Here we see that the adjusted R-square is 93%, which is sufficiently high. The coefficient for personnel cost is significant, while the coefficients of the other two independent variables are not. This is in accordance with our a priori reasoning: profit in IT companies is mainly driven by personnel cost and does not depend on assets or sales.
- Fixed effect: using the function @expand
Analysis:
PROFIT=.17*SALES+0.26*PERSONNEL_COST-0.06*ASSETS +152.7*HCL +498.7*INFO-133.35*SAT –74.1*TCS – 235.3*WIPRO
Sig: (.12) (.17) (.19) (.29) (.005) (.39) (.69) (.30)
Here we see that the adjusted R-square is 97%, which is sufficiently high. None of the coefficients of the three independent variables is significant; only the coefficient for Infosys is. Here we have fixed the companies and analysed the results across the five years, so the companies act like dummy variables. Since only Infosys comes out significant, one could say that the four companies whose coefficients are insignificant share the same intercept; another reason for the insignificance could be that the data taken is not sufficient.
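The @expand company dummies can be mimicked manually on synthetic data (a sketch, not EViews output; the firm effects and coefficients below are made up): one 0/1 column per firm replaces the common intercept.

```python
import numpy as np

rng = np.random.default_rng(8)
firms, years = 5, 5
n = firms * years                          # 25 firm-year observations
firm_id = np.repeat(np.arange(firms), years)

sales = rng.uniform(100, 1000, n)
firm_effect = np.array([150.0, 500.0, -130.0, -75.0, -235.0])  # made-up intercepts
profit = 0.2 * sales + firm_effect[firm_id] + rng.normal(scale=10, size=n)

# Fixed effects via one 0/1 dummy column per firm (no common intercept),
# the manual analogue of EViews' @expand(company)
dummies = (firm_id[:, None] == np.arange(firms)[None, :]).astype(float)
A = np.column_stack([sales, dummies])
coef = np.linalg.lstsq(A, profit, rcond=None)[0]
b_sales, effects = coef[0], coef[1:]
print(round(b_sales, 3))                   # slope on sales, net of firm effects
```

Swapping `firm_id` for a year index gives the period-fixed-effects variant used in the next step.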
- Fixed effect: using the function @expand
Analysis:
PROFIT=.17*SALES+0.26*PERSONNEL_COST-0.06*ASSETS +152.7*2004 +498.7*2005-133.35*2006 –74.1*2007– 235.3*2008
Sig: (.87) (.017) (.65) (.71) (.67) (.79) (.91) (.26)
Here we see that the adjusted R-square is 93%, which is sufficiently high. Only the coefficient for personnel cost is significant, which is in line with our a priori reasoning. Here we have fixed the time (year) and analysed the results across the five companies, so the years act like dummy variables. None of the coefficients for the years comes out significant. This could mean that the different years have no effect, i.e. they share the same intercept, or it could be that the data taken is not sufficient.
- Fixed effect: using the normalized function
Analysis:
PROFIT= -90.83+ 0.06*SALES+ 0.40*PERSONNEL_COST- 0.004*ASSETS
Sig: (.64) (.07) (.93)
Here we see that the adjusted R-square is 97%, which is sufficiently high. The coefficient for personnel cost is significant, while the coefficients of the other two independent variables are not. This is in accordance with our a priori reasoning: profit in IT companies is mainly driven by personnel cost and does not depend on assets or sales. We must be careful in this analysis, as the reported coefficients are normalized coefficients containing the fixed effects.
- Random effect: using the normalized function
Analysis:
PROFIT= 42.33+ 0.16*SALES+ 0.27*PERSONNEL_COST- 0.06*ASSETS
Sig: (.12) (.14) (.19)
Here we see that the adjusted R-square is 96%, which is sufficiently high. None of the coefficients of the independent variables comes out significant, which is not in accordance with our a priori reasoning; we cannot conclude anything concrete from this. It could be due to the normalized effect contained in the coefficients because of the random effects, or due to data insufficiency.