The mortality of companies
B. Datasets, Definitions, and Survival biases
Arguments about the statistics of company lifespans hinge on how the death of a company is interpreted. In this paper, definitions of ‘birth’ and ‘death’ are based on the sales reports available in the Compustat database.
-> firms may ‘die’ through a variety of processes: they may split, merge or liquidate as economic and technological conditions change.
Data on publicly traded companies were obtained from the Compustat North America and Compustat Historical databases, which cover the period 1950–2009 and contain most of the financial information that North American firms and overseas American Depositary Receipt firms report in the income statements and balance sheets they file with the US Securities and Exchange Commission. The database includes a total of 28853 publicly traded companies.
-From these, we excluded 2292 companies that did not report any sales over the 60-year timespan.
-We also noted that 6868 companies were listed (alive) either in 1950 or in 2009, with 160 of those companies reporting sales for the full 60-year span of the dataset.
We define ‘birth’ to occur not at a company’s founding, but rather when it first reports sales in the Compustat database. We take ‘death’ to occur in the year when a company stops reporting sales. For each company, we define lifespan to be the total number of years for which the company reports non-zero sales.
<- Some companies fail to report for several years between years of activity. Such re-entries are not counted as additional births or deaths; the additional years of activity are simply added to the total lifespan.
-> This definition is similar to the Bureau of Labor Statistics Business Employment Dynamics measures of entries, which include mergers, takeovers and industrial reclassification.
-> This broad definition of death will affect the conclusions we can draw from our data, as an instance of firm death does not necessarily connote failure (Carroll & Delacroix 1982).
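A minimal Python sketch of these birth, death and lifespan rules, assuming each firm's record is a hypothetical mapping from fiscal year to reported sales, with missing years simply absent. The function name, the `end_year` parameter and the convention of returning the last year with non-zero sales as the end of activity are illustrative choices, not the authors' code.

```python
from typing import Dict, Optional, Tuple


def firm_lifespan(
    sales_by_year: Dict[int, Optional[float]], end_year: int = 2009
) -> Optional[Tuple[int, int, int, bool]]:
    """Return (birth_year, last_active_year, lifespan, censored) for one firm.

    Hypothetical conventions for illustration:
    - birth is the first year with non-zero sales;
    - the last year with non-zero sales is returned as the end of activity;
    - lifespan counts only years with non-zero sales, so reporting gaps
      (re-entries) add years but never extra births or deaths;
    - a firm still reporting in end_year has not died (right-censored).
    """
    # Zero and None both count as "no sales reported" for that year.
    active = sorted(year for year, sales in sales_by_year.items() if sales)
    if not active:
        return None  # never reported sales: excluded from the analysis
    return active[0], active[-1], len(active), active[-1] >= end_year


# A firm that skips 1982 and reports zero sales in 1984.
print(firm_lifespan({1980: 5.0, 1981: 6.0, 1983: 7.0, 1984: 0.0}))
# → (1980, 1983, 3, False)
```

The gap year 1982 contributes nothing, and the firm's activity after it is simply added to the lifespan, mirroring the re-entry rule above.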
** As a metric of mortality that is closely related to lifespan, we use the term half-life, defined as the time it takes for half of the firms in a given cohort to die (following the above definition of death).
* For survival analysis, this half-life corresponds to the age t at which the cumulative mortality fraction reaches M(t) = 0.5 (50%).
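For a single cohort whose deaths are all observed, the half-life can be read directly off the empirical cumulative mortality M(t). The sketch below assumes lifespans in whole years and that no firm in the cohort is still alive at the end of the data; the function names are hypothetical.

```python
from typing import Sequence


def cumulative_mortality(lifespans: Sequence[int], t: float) -> float:
    """M(t): fraction of the cohort whose lifespan is at most t years."""
    return sum(1 for life in lifespans if life <= t) / len(lifespans)


def cohort_half_life(lifespans: Sequence[int]) -> int:
    """Smallest observed age t (in years) at which M(t) reaches 0.5."""
    if not lifespans:
        raise ValueError("empty cohort")
    for t in sorted(set(lifespans)):
        if cumulative_mortality(lifespans, t) >= 0.5:
            return t
    raise AssertionError("unreachable: M(max lifespan) == 1")


print(cohort_half_life([1, 2, 3, 4, 5, 6]))  # → 3
```

When some firms are still alive at the end of the window, this simple count no longer applies and a censoring-aware estimator is needed.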
iii. Survival biases and subsampling
The historical data in the Compustat database suffer from survival bias (Ball & Watts 1979): almost no firms die in the first 20 years of the dataset. To account for the effects of this bias, we ran our analysis both on the entire dataset and on a subset limited to firms reporting sales between 1975 and 2009. A comparison of the two analyses suggests that the effect of survival bias is limited, likely because the first 20 years make up a very small proportion of the entire dataset.
F. Estimation and Results
Annual numbers of entries (births) and exits (deaths) for publicly traded firms in the Compustat database
-> The number of births and deaths per year in this dataset varies substantially over time, reflecting, in part, general economic conditions.
-> Note that before 1975 very few firms die, reflecting a survival bias in the Compustat dataset.
-> Similarly, the two spikes in births in 1960 and 1974 may reflect changes in the Compustat database or in the conditions of market entry, rather than the patterns we seek to analyse.
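The yearly entry and exit series is a simple tally over firms. A minimal sketch, assuming each firm is reduced to a hypothetical (birth_year, last_active_year) pair and that a death is recorded in the last year with non-zero sales; firms still active in 2009 are treated as censored and contribute no death.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def annual_entries_exits(
    firms: Iterable[Tuple[int, int]], end_year: int = 2009
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Count births and deaths per calendar year.

    Each firm is a hypothetical (birth_year, last_active_year) pair. A firm
    still active in end_year is right-censored and contributes no death.
    """
    births: Counter = Counter()
    deaths: Counter = Counter()
    for birth_year, last_year in firms:
        births[birth_year] += 1
        if last_year < end_year:
            deaths[last_year] += 1
    return dict(births), dict(deaths)


births, deaths = annual_entries_exits([(1960, 1970), (1960, 2009), (1974, 1980)])
print(births)  # → {1960: 2, 1974: 1}
print(deaths)  # → {1970: 1, 1980: 1}
```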
Frequency distribution of firm lifespans
-> a) shows that the frequency distribution of firm lifespans is approximately exponential, independent of business sector.
Insets show the lifespan frequency distributions before normalization by sector size.
(the telecommunications, utilities and transportation sectors were omitted based on small sample size)
-> b) shows a similar frequency plot where colours denote the most common causes of mortality
(the reasons ‘other’ and privatization were omitted)
-> c) shows that the aggregate distributions are well fit by a simple exponential function (solid lines), both for the full window and when restricted to 1975 onwards.
–>> For the full window, the fit is $N(t,T) = 2226\,e^{-\lambda t}$ with λ = 0.098 and 95% confidence interval . For the constrained window, the fit is $N(t,T) = 2279\,e^{-\lambda t}$ with λ = 0.131 and 95% confidence interval .
–>> fewer firms die in their first few years after entering the market than a purely exponential distribution would suggest. This provides initial evidence for the liability of adolescence.
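An exponential of this kind can be fitted by ordinary least squares on the logarithm of the counts. This is a sketch under that assumption: the notes do not say which fitting procedure produced the values above, and a log-linear fit weights small counts differently from a nonlinear fit of N(t, T) itself. Ages with zero counts are dropped because their logarithm is undefined; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_exponential(ages: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Fit counts ≈ A * exp(-lam * age) by least squares on ln(counts).

    Returns (A, lam). Ages with zero counts are skipped.
    """
    points = [(t, math.log(n)) for t, n in zip(ages, counts) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-zero counts")
    m = len(points)
    mean_t = sum(t for t, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


# Synthetic check: counts generated from A = 2000, lam = 0.1 are recovered.
ages = list(range(1, 31))
A, lam = fit_exponential(ages, [2000 * math.exp(-0.1 * t) for t in ages])
print(round(A, 6), round(lam, 6))  # → 2000.0 0.1
```

On real lifespan data, observed counts falling below the fitted curve at the youngest ages would be the signature of the liability of adolescence described above.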
In the following, we perform a set of more rigorous statistical analyses to test the idea of a constant death rate as a function of age and to provide a set of estimates of the half-lives of publicly traded firms.
F.1 Constant death rate and exponentially distributed lifespans
The simplest conceptual framework for understanding the distribution of lifespans is inspired by a decay process in which the number of decays per unit time is proportional to the number of remaining constituents.
For firms, this translates into the assumption that the number of deaths, ΔNd(t, T), occurring in some small discrete time interval from t to t + Δt is proportional to the number of companies remaining alive at time t, N(t, T), out of an initial cohort of N(0, T) firms at time t = 0. T denotes the time window of observation, which can be varied arbitrarily within the total timespan covered by the dataset; in the present case, T ≤ 60 years. Thus,
$$\Delta N_d(t,T) = \lambda(t,T)\, N(t,T)\, \Delta t,$$
where λ is the exit (or hazard) rate, which in general depends on both t and T. Since the number of firms remaining alive at time t is $N(t,T) = N(0,T) - N_d(t,T)$, we have $\Delta N_d(t,T) = -\Delta N(t,T)$ if the time window T is kept fixed. In the limit of continuous time ($\Delta t \to 0$), this leads to
$$\frac{dN(t,T)}{dt} = -\lambda(t,T)\, N(t,T),$$
whose general solution is given by
$$N(t,T) = N(0,T)\, \exp\!\left[-\int_0^t \lambda(t',T)\, dt'\right].$$
-> If λ is independent of t, but not necessarily of T, this reduces to the classic exponential form $N(t,T) = N(0,T)\, e^{-\lambda(T)\, t}$, which, as discussed above, is a good fit to the data.
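To see how the discrete rule ΔNd(t, T) = λ N(t, T) Δt turns into the exponential form, the sketch below iterates it for a constant, illustrative λ = 0.1 per year and compares the result with N(0)e^{−λt} as the step Δt shrinks. The starting cohort of 1000 firms and the horizon of 7 years are arbitrary.

```python
import math


def surviving_firms(n0: float, lam: float, t: float, dt: float) -> float:
    """Iterate N <- N - lam * N * dt, i.e. deaths proportional to survivors."""
    n = n0
    for _ in range(round(t / dt)):
        n -= lam * n * dt
    return n


exact = 1000 * math.exp(-0.1 * 7)
for dt in (1.0, 0.1, 0.01):
    approx = surviving_firms(1000, 0.1, 7, dt)
    print(f"dt={dt}: {approx:.2f} vs exact {exact:.2f}")
```

Each step multiplies N by (1 − λΔt), and (1 − λΔt)^{t/Δt} approaches e^{−λt} as Δt → 0, which is the continuous-time limit taken above.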
The corresponding cumulative distribution function, M(t), for the fraction of companies that have died by time t within the observation window T is given by
$$M(t,T) = \frac{1 - e^{-\lambda t}}{1 - e^{-\lambda T}}, \qquad 0 \le t \le T,$$
with λ ≡ λ(T).
Fits to data at successively larger T:
We observe that the modified exponential provides very good agreement with data over a broad range of values of T.
-> From these fits, we can compute the half-life of firms, t1/2, defined as the time taken for half of the original cohort, N(0, T), to die.
This is determined by solving M(t1/2) = 1/2, which results in
$$t_{1/2} = \frac{1}{\lambda}\left[\ln 2 - \ln\left(1 + e^{-\lambda T}\right)\right].$$
For λT ≪ 1 this gives t1/2 ≈ T/2, whereas for λT ≫ 1, t1/2 ≈ ln 2/λ.
Fitting the half-life formula to estimates across varying time windows gives, for large T, a hazard rate λ ≈ 0.099 yr⁻¹, corresponding to an asymptotic half-life of about 7 years.
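The half-life expression can be checked numerically. The sketch below assumes the truncated-exponential form M(t, T) = (1 − e^{−λt})/(1 − e^{−λT}) for firms with lifespans of at most T, which is the form whose solution of M(t1/2) = 1/2 reproduces the half-life formula quoted for the constant-hazard model. It confirms the limits t1/2 ≈ T/2 for λT ≪ 1 and t1/2 → ln 2/λ for λT ≫ 1. Function names are hypothetical.

```python
import math


def mortality_cdf(t: float, lam: float, T: float) -> float:
    """Fraction of firms (lifespan <= T) that have died by age t, constant hazard lam."""
    return (1 - math.exp(-lam * t)) / (1 - math.exp(-lam * T))


def half_life(lam: float, T: float) -> float:
    """Solve mortality_cdf(t, lam, T) = 1/2 for t."""
    return (math.log(2) - math.log(1 + math.exp(-lam * T))) / lam


print(half_life(0.1, 1.0))          # short window: close to T/2
print(half_life(0.0988, math.inf))  # long window: ln 2 / 0.0988, about 7.02 years
```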
Predicted Half-Life Estimates from Constant Hazard Model
The curves of Figure 2 yield estimates of the hazard rate λ, assumed constant within each time window. These are then entered into the cumulative distribution function to determine the half-life of the set of firms with lifespans of T or less. The resulting curve is well fit by $t_{1/2} = \frac{\ln 2}{\lambda}\left[1 - \frac{\ln\left(1 + e^{-\lambda T}\right)}{\ln 2}\right]$, with λ ≈ 0.0988 and 95% confidence interval [0.0896, 0.1094]. This predicts that, as T → ∞, it will take approximately t1/2 = 7.019 years for half of all firms to die.
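A sketch of how λ could be estimated from half-life estimates measured at several window lengths T: minimize the squared error between the measured values and the half-life formula above with a golden-section search, which is sufficient here because the predicted half-life decreases monotonically with λ. The search bounds, tolerance and synthetic data are assumptions, and the result is a point estimate only, without the confidence interval reported in the text.

```python
import math
from typing import Sequence


def half_life(lam: float, T: float) -> float:
    """Constant-hazard half-life for firms with lifespans of at most T."""
    return (math.log(2) - math.log(1 + math.exp(-lam * T))) / lam


def fit_hazard(windows: Sequence[float], half_lives: Sequence[float],
               lo: float = 1e-3, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Least-squares estimate of lam via golden-section search on [lo, hi]."""
    def sse(lam: float) -> float:
        return sum((half_life(lam, T) - h) ** 2 for T, h in zip(windows, half_lives))

    inv_phi = (math.sqrt(5) - 1) / 2  # golden-ratio shrink factor, about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Synthetic check: half-lives generated with lam = 0.1 for T = 5, 10, ..., 60.
windows = list(range(5, 61, 5))
estimate = fit_hazard(windows, [half_life(0.1, T) for T in windows])
print(f"{estimate:.4f}, asymptotic half-life {math.log(2) / estimate:.2f} years")
```

A library routine such as a nonlinear curve fitter would also report parameter uncertainties; the hand-rolled search is used here only to keep the sketch dependency-free.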
The Kaplan-Meier Mortality Curve
The nonparametric Kaplan-Meier estimator yields the mortality curve $M(T) = 1 - S(T)$. The curve fitted to the full dataset is $M(T) = 0.925\,(1 - e^{-0.074T})$, with 95% confidence intervals [0.918, 0.931] for the amplitude and [0.073, 0.076] for the rate. For the constrained version, we estimate $M(T) = 0.765\,(1 - e^{-0.090T})$, with 95% confidence intervals [0.749, 0.782] for the amplitude and [0.085, 0.095] for the rate.
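The Kaplan-Meier estimator handles firms still alive at the end of the window (right-censored) by removing them from the risk set without counting a death. A minimal pure-Python sketch, assuming lifespans in years and a hypothetical boolean flag that is True when the death was observed; libraries such as lifelines (`KaplanMeierFitter`) provide a full implementation with confidence intervals.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 observed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct age t where at least one death occurs.

    durations: years each firm was observed; observed: True if the firm died,
    False if it was still alive when observation ended (right-censored).
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Group all firms with duration t; censored firms at t still count as at risk.
        while i < len(records) and records[i][0] == t:
            deaths += int(records[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4, 5], [True, True, False, True, False, True])
mortality = [(t, 1 - s) for t, s in curve]  # M(t) = 1 - S(t)
print(mortality)
```

A curve of the form M(T) = a(1 − e^{−bT}) could then be fitted to these points; an amplitude a below 1 is what allows some firms never to be counted as dying.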
1. However, the assumption that the hazard rate λ is independent of t for each time window T necessarily implies that all firms have finite lifespans, and therefore presumes a priori that they all eventually die.
2. Our half-life estimate appears low: if we look at all firms born in a particular year, it often takes longer than 7 years for half of them to disappear. For the 1975 cohort, for example, it took almost 12 years for half of the firms to die. When we omit the firms that do not die before 2009, the half-life is reached more than 2 years earlier.
-> The cause is the omission of firms whose lifespans exceed the entire window of observation (right-censored firms), which biases the estimates towards early mortality.
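A toy simulation makes this bias concrete. It assumes firms born uniformly over a hypothetical 35-year window with exponentially distributed lifespans at λ = 0.1 per year (true half-life ln 2/0.1 ≈ 6.93 years), and compares the median lifespan of only those firms seen to die with the standard maximum-likelihood estimate for a constant hazard under right censoring, λ̂ = (observed deaths)/(total firm-years observed). All parameters are illustrative and not fitted to Compustat.

```python
import math
import random
import statistics
from typing import List, Tuple


def simulate_window(n_firms: int = 20000, lam: float = 0.1, window: float = 35.0,
                    seed: int = 0) -> Tuple[List[float], List[bool]]:
    """Firms born uniformly in [0, window) with exponential lifespans (rate lam).

    Returns observed durations and flags: True if the death falls inside the
    window, False if the firm is still alive when observation ends (censored).
    """
    rng = random.Random(seed)
    durations, observed = [], []
    for _ in range(n_firms):
        birth = rng.uniform(0.0, window)
        life = rng.expovariate(lam)
        remaining = window - birth
        durations.append(min(life, remaining))
        observed.append(life <= remaining)
    return durations, observed


durations, observed = simulate_window()
# Biased: median lifespan of only those firms seen to die inside the window.
naive = statistics.median([d for d, o in zip(durations, observed) if o])
# Constant-hazard MLE with right censoring: deaths / total firm-years observed.
lam_hat = sum(observed) / sum(durations)
mle_half = math.log(2) / lam_hat
print(f"true {math.log(2) / 0.1:.2f}, deaths only {naive:.2f}, censoring-aware {mle_half:.2f}")
```

In this toy setup the deaths-only median lands well below the true half-life, while the censoring-aware estimate stays close to it.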
F.2 Survival analysis
Survival function, S(t): the probability that a firm is still alive at time t (equivalently, that its lifespan exceeds t)
-> For a cohort of firms born at an initial time, t = 0, S(t) is expressed as the fraction of firms still alive at time t: S(t) = N(t)/N(0).
–> cumulative mortality function introduced above: M(t) ≡ F(t) = 1 − S(t).
–> probability density, or event density, p(t) ≡ f(t) ≡ −dS(t)/dt (p(t)Δt is the fraction of the initial cohort dying in the time interval from t to t + Δt)
–>> hazard rate, λ(t), is the normalized mortality rate at time t: λ(t) = −d ln N(t)/dt = p(t)/S(t),
–>>> it is often useful to introduce the cumulative hazard function, Λ(t), defined as $\Lambda(t) = \int_0^t \lambda(t')\,dt'$, so that
–>>> $S(t) = e^{-\Lambda(t)}$
The case of constant λ, discussed in F.1,
-> leads straightforwardly to the conventional exponential distributions, $S(t) = e^{-\lambda t}$ and $p(t) = \lambda e^{-\lambda t}$, with a cumulative hazard function that increases linearly with time: $\Lambda(t) = \lambda t$.
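These relations can be checked numerically for any hazard function. The sketch below computes Λ(t) with the trapezoidal rule, then S(t) = e^{−Λ(t)} and p(t) = λ(t)S(t), which follows from λ = p/S; for a constant hazard it reproduces S(t) = e^{−λt} and Λ(t) = λt. The step count and the example hazard of 0.1 per year are arbitrary choices, and the function names are hypothetical.

```python
import math
from typing import Callable

Hazard = Callable[[float], float]


def cumulative_hazard(hazard: Hazard, t: float, steps: int = 1000) -> float:
    """Λ(t) = ∫_0^t λ(u) du, approximated with the trapezoidal rule."""
    h = t / steps
    inner = sum(hazard(k * h) for k in range(1, steps))
    return h * (0.5 * hazard(0.0) + inner + 0.5 * hazard(t))


def survival(hazard: Hazard, t: float) -> float:
    """S(t) = exp(-Λ(t))."""
    return math.exp(-cumulative_hazard(hazard, t))


def event_density(hazard: Hazard, t: float) -> float:
    """p(t) = λ(t) S(t), from the definition λ = p / S."""
    return hazard(t) * survival(hazard, t)


def constant_hazard(u: float) -> float:
    """Illustrative constant hazard of 0.1 per year."""
    return 0.1


print(survival(constant_hazard, 7.0), math.exp(-0.7))
```

Replacing `constant_hazard` with an age-dependent function is one way to explore departures from the constant-hazard model, such as the liability of adolescence.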
F.3 Maximum-likelihood estimator for constant hazard rate