Astratinvest

A commentary on why most of the structure you estimate is noise, why your optimiser trusts that noise the most, and what random matrix theory has to say about both — with the one equation that ties it all together.

There is a particular kind of beautiful portfolio I have learned to distrust on sight.

It comes out of the optimiser looking immaculate — low predicted volatility, sensible weights, a risk number you would be glad to put in front of an investment committee. Then it goes into the world and behaves nothing like its own risk model. The drawdowns arrive in directions the matrix swore were quiet. The diversification you thought you had measured turns out to have been an artefact of the measuring.

For a long time I treated this as a data problem — not enough history, a regime change, bad luck. It is none of those. It is a structural feature of estimating a large covariance matrix from a finite sample, and the cleanest language for it is random matrix theory. What I want to show is that almost everything that matters here drops out of a single self-consistent equation. Once you see that, you stop being surprised by the beautiful portfolio and start being suspicious of it, which is the correct response.

The object every allocator leans on

Strip a quant process down and somewhere near the centre is a covariance matrix. Mean-variance, minimum-variance, risk parity, factor-risk budgeting — all of them take the covariance of returns as input and, almost always, take its inverse as the thing that drives the weights.

You never have the true covariance $C$. You have an estimate built from $N$ assets and $T$ observations, the sample covariance

E = \frac{1}{T}\, X X^{\top}

with $X$ the $N \times T$ matrix of demeaned, variance-normalised returns. The whole question is how badly $E$ misrepresents $C$, and the control parameter is the aspect ratio

q = \frac{N}{T}

A universe of $N = 500$ names with four years of daily data, $T \approx 1000$, gives $q = 0.5$. That feels comfortable. It is not.

What pure noise looks like

Here is the experiment that reorganised how I read a correlation matrix. Take returns that are pure independent noise, so the true covariance is the identity and every true eigenvalue is exactly $1$. Form $E$ and look at its eigenvalue spectrum. If the sample were faithful you would see a spike at $1$. You don't:

[GRAPH IMAGE] Five hundred assets, a thousand observations, true covariance equal to the identity. The measured eigenvalues spread across a factor of more than thirty, and trace out the Marchenko–Pastur density exactly.

In the limit $N, T \to \infty$ at fixed $q$, the eigenvalues fill a continuous band with the Marchenko–Pastur density

\rho(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi q \lambda}

\lambda_\pm = \left(1 \pm \sqrt{q}\right)^{2}

For $q = 0.5$ that runs from $\lambda_- \approx 0.086$ to $\lambda_+ \approx 2.91$. The true eigenvalues are all $1$; the measured ones range from below a tenth to nearly three, generated by nothing at all. Dispersion in your spectrum is not, by itself, evidence of structure. And it worsens at the boundary: when $T < N$, so $q > 1$, a fraction $1 - 1/q$ of the eigenvalues are exactly zero. The matrix is singular, and the optimiser demanding $E^{-1}$ is asking for something that does not exist.
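The experiment is easy to rerun. What follows is a minimal sketch, assuming i.i.d. Gaussian returns (so the demeaning and variance-normalisation steps are skipped) and a fixed random seed; the helper names `mp_edges` and `mp_density` are illustrative, not taken from any library.

```python
import numpy as np

def mp_edges(q, sigma2=1.0):
    """Band edges sigma2 * (1 -/+ sqrt(q))^2 of the Marchenko-Pastur law."""
    root_q = np.sqrt(q)
    return sigma2 * (1 - root_q) ** 2, sigma2 * (1 + root_q) ** 2

def mp_density(lam, q, sigma2=1.0):
    """Marchenko-Pastur density for q < 1; zero outside the band."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = mp_edges(q, sigma2)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    rho[inside] = np.sqrt((hi - x) * (x - lo)) / (2 * np.pi * q * sigma2 * x)
    return rho

rng = np.random.default_rng(0)       # fixed seed, for reproducibility only
N, T = 500, 1000
q = N / T
X = rng.standard_normal((N, T))      # pure noise: the true covariance is the identity
E = X @ X.T / T                      # sample covariance
eigs = np.linalg.eigvalsh(E)         # returned in ascending order

lo, hi = mp_edges(q)
print(f"predicted band: [{lo:.3f}, {hi:.3f}]")
print(f"measured range: [{eigs.min():.3f}, {eigs.max():.3f}]")

# Normalised histogram of the sample eigenvalues against the limiting density
counts, bin_edges = np.histogram(eigs, bins=40, range=(0.0, 3.2), density=True)
centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
gap = np.max(np.abs(counts - mp_density(centres, q)))
print(f"largest histogram-vs-density gap: {gap:.3f}")
```

At finite $N$ the match is close rather than perfect: the extreme eigenvalues fluctuate slightly around the theoretical edges, and the histogram bins nearest the edges are the noisiest.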

One equation behind all of it

Everything above, and most of what follows, comes from a single function. Define the Stieltjes transform of the spectral density,

\mathfrak{g}(z) = \int \frac{\rho(\lambda)}{z - \lambda}\, d\lambda

It is a generating function for the whole spectrum: its singularities sit on the eigenvalues, its expansion at infinity gives the moments, and its boundary values give the density. For the null model it satisfies a self-consistent quadratic — the Marchenko–Pastur fixed point —

q z\, \mathfrak{g}(z)^{2} - (z + q - 1)\, \mathfrak{g}(z) + 1 = 0

Three facts you care about fall straight out of this one line.

The band edges are where the two roots collide, i.e. where the discriminant vanishes:

(z + q - 1)^{2} - 4qz = z^{2} - 2(q+1)z + (q-1)^{2} = 0 \implies z = (q+1) \pm 2\sqrt{q} = \left(1 \pm \sqrt{q}\right)^{2}

There are your $\lambda_\pm$, with no probability needed, just the discriminant of a quadratic.

The density is the jump across the cut, $\rho(\lambda) = -\frac{1}{\pi} \operatorname{Im} \mathfrak{g}(\lambda + i0^{+})$. On the support the discriminant is negative, the square root turns imaginary, and $-\operatorname{Im} \mathfrak{g} / \pi$ is exactly $\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)} / (2\pi q \lambda)$: the Marchenko–Pastur law again.

The inverse moment, the one that will bury the optimiser, comes from evaluating the equation at $z = 0$ on the root that stays finite there (this needs $q < 1$, so that the support stays away from zero). Setting $z = 0$ kills the quadratic term and leaves $-(q - 1)\, \mathfrak{g}(0) + 1 = 0$, so

\mathfrak{g}(0) = \frac{1}{q - 1} = -\frac{1}{1 - q}

But $\mathfrak{g}(0) = \int \rho(\lambda)/(0 - \lambda)\, d\lambda = -\int \rho(\lambda)/\lambda\, d\lambda$, which gives the clean and consequential result

\boxed{\int \frac{\rho(\lambda)}{\lambda}\, d\lambda = \frac{1}{1 - q}} \qquad (q < 1)

The average of $1/\lambda$ over a pure-noise spectrum is not $1$; it is $1/(1 - q)$, and it diverges as $q \to 1$. Hold onto that number. (For the connoisseur: this is all the shadow of free probability. Marchenko–Pastur is the free multiplicative analogue of a deterministic spectrum, with $S$-transform $S(x) = 1/(1 + qx)$, and the quadratic above is what that statement looks like once you unwrap it.)
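All three facts can be checked numerically from the quadratic alone. The sketch below is illustrative: `stieltjes_mp` is a hypothetical helper that solves the fixed-point equation and picks the root with $\operatorname{Im} \mathfrak{g} < 0$ for $\operatorname{Im} z > 0$, and the small offsets `eta` and `1e-6j` are arbitrary choices standing in for $i0^{+}$.

```python
import numpy as np

def stieltjes_mp(z, q):
    """Physical root of q z g^2 - (z + q - 1) g + 1 = 0, valid for Im z > 0."""
    z = np.asarray(z, dtype=complex)
    b = z + q - 1
    root = np.sqrt(b * b - 4 * q * z)
    g_plus = (b + root) / (2 * q * z)
    g_minus = (b - root) / (2 * q * z)
    # With g(z) = int rho / (z - lambda), Im g < 0 whenever Im z > 0
    return np.where(g_plus.imag < 0, g_plus, g_minus)

q = 0.5

# 1. Band edges: roots of the discriminant z^2 - 2(q + 1) z + (q - 1)^2
lam_minus, lam_plus = np.sort(np.roots([1.0, -2.0 * (q + 1.0), (q - 1.0) ** 2]).real)

# 2. Density: minus the imaginary part just above the real axis, divided by pi
eta = 1e-9
lam = np.linspace(lam_minus, lam_plus, 200_001)[1:-1]
rho = -stieltjes_mp(lam + 1j * eta, q).imag / np.pi

# 3. Inverse moment, once by integrating the density and once from g near z = 0
dlam = lam[1] - lam[0]
mass = rho.sum() * dlam
inv_moment = (rho / lam).sum() * dlam
g_near_zero = stieltjes_mp(1e-6j, q)

print(f"edges: {lam_minus:.4f}, {lam_plus:.4f}")
print(f"total mass: {mass:.4f}")
print(f"integral of rho/lambda: {inv_moment:.4f}   1/(1-q): {1 / (1 - q):.4f}")
print(f"-Re g(0+): {-g_near_zero.real:.4f}")
```

Selecting the root by the sign of its imaginary part avoids hard-coding a branch of the square root, which otherwise flips sign partway across the band.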

How much of a real matrix is real

A genuine return matrix is not pure noise: a few eigenvalues punch clean through $\lambda_+$. The largest is almost always the market mode, a single direction in which everything moves together, often carrying a third or more of the total variance; below it sit a handful of sector and style modes. Everything else lies inside the band.

[GRAPH IMAGE — 1*bxHp8YT7yjCwb2vhbXcH-Q.png] One market factor, six sector factors, idiosyncratic noise. The market mode lands near $\lambda \approx 157$ , off the right of the chart; once the six signal modes are removed, the remaining bulk is described almost perfectly by Marchenko–Pastur — fitted here with an effective variance $\sigma^2_{\text{eff}} = 0.65$ , the variance left over after the signal is stripped out.

That effective-variance fit is itself instructive: because the market mode siphons off so much variance and the trace is conserved, the noise that remains has less than unit variance, so the bulk edge sits well below the naive $2.91$. The count is the sobering part. In a five-hundred-name universe you might find five to twenty eigenvalues above the edge; the other four-hundred-plus are statistically indistinguishable from noise. Laloux, Cizeau, Bouchaud and Potters made this measurement on real equity data and named it perfectly: the noise dressing of financial correlation matrices.
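A toy version of that count, as a sketch under loud assumptions: the return model below (equal market loading of 0.6, equal sector loading of 0.5, six equal-sized sectors, unit idiosyncratic noise) is invented for illustration and is not the model behind the figure. The effective variance is estimated with one crude convention, the variance left once everything above the naive edge is subtracted from the trace.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, n_sectors = 300, 600, 6
q = N / T

# Hypothetical factor model: one market factor, six sector factors, idiosyncratic noise
sector_of = np.repeat(np.arange(n_sectors), N // n_sectors)
market = rng.standard_normal(T)
sectors = rng.standard_normal((n_sectors, T))
returns = 0.6 * market + 0.5 * sectors[sector_of] + rng.standard_normal((N, T))

# Demean and normalise each series, so E is a correlation matrix with trace N
X = returns - returns.mean(axis=1, keepdims=True)
X /= X.std(axis=1, keepdims=True)
E = X @ X.T / T
eigs = np.linalg.eigvalsh(E)[::-1]             # descending order

naive_edge = (1 + np.sqrt(q)) ** 2             # edge if the bulk had unit variance
signal = eigs > naive_edge
n_signal = int(signal.sum())

# Variance left in the bulk after the signal modes are stripped out
sigma2_eff = 1.0 - eigs[signal].sum() / N
eff_edge = sigma2_eff * naive_edge

print(f"largest eigenvalue: {eigs[0]:.1f} ({eigs[0] / N:.0%} of total variance)")
print(f"eigenvalues above the naive edge {naive_edge:.2f}: {n_signal} of {N}")
print(f"effective bulk variance: {sigma2_eff:.2f}, rescaled edge: {eff_edge:.2f}")
```

Because sample eigenvalues of the signal modes are biased upward, this estimate of the bulk variance tends to come out slightly low; fitting the Marchenko–Pastur density to the bulk directly is the more careful route.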

Why the optimiser reaches straight for the noise

If the story stopped at "the bulk is noisy" it would be a footnote. What makes it dangerous is the inverse. The optimiser never sees $C$, only the estimate. Decompose $E = \sum_i \lambda_i v_i v_i^{\top}$; then

E^{-1} = \sum_i \frac{1}{\lambda_i}\, v_i v_i^{\top}

and the minimum-variance portfolio puts weight $w \propto E^{-1} \mathbf{1}$. The inverse weights each eigenvector by $1/\lambda_i$, so the smallest eigenvalues dominate the solution: the optimiser is drawn, by construction, toward the directions it believes are quietest. And the smallest eigenvalues, down near $\lambda_-$, are precisely the ones most deformed by noise. The optimiser concentrates the book on the least trustworthy eigenvectors, having mistaken their noise-deflated sample eigenvalues for genuine low risk. Michaud's old line that mean-variance optimisers are "estimation-error maximisers" is just this mechanism, in English.
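The pull toward small eigenvalues can be made visible by writing the minimum-variance weights in the eigenbasis of the sample matrix. This is a sketch under the pure-noise assumption (true covariance equal to the identity, so no direction is genuinely quieter than any other); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 200, 400
X = rng.standard_normal((N, T))      # pure noise: all directions are equally risky
E = X @ X.T / T
lam, V = np.linalg.eigh(E)           # ascending eigenvalues, eigenvectors in columns

ones = np.ones(N)
proj = V.T @ ones                    # overlap of the all-ones vector with each eigenvector
coeff = proj / lam                   # coordinates of E^{-1} 1 in the eigenbasis
w = V @ coeff
w /= w.sum()                         # minimum-variance weights, normalised to sum to one

# Share of the squared weight vector carried by each eigenvector
share = coeff ** 2 / np.sum(coeff ** 2)
bottom_half = share[: N // 2].sum()

print(f"share of |w|^2 on the smallest half of the eigenvalues: {bottom_half:.0%}")
```

Since the true covariance here is the identity, any concentration on the low end of the spectrum is manufactured entirely by the sample.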

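And now the boxed result pays off. The quantity the optimiser effectively integrates is the mean inverse eigenvalue of the sample matrix, $(1/N)\operatorname{tr} E^{-1} \to \int \rho(\lambda)/\lambda\, d\lambda = 1/(1 - q)$. The same factor governs the global minimum-variance portfolio on both sides of the truth. Writing $\sigma^2_{\text{true}}$ for the variance of the portfolio you would hold if you knew $C$, the risk you predict in-sample sits below it by a factor $1 - q$, and the risk you actually realise sits above it by $1/(1 - q)$:

\sigma^2_{\text{predicted}} \approx (1 - q)\, \sigma^2_{\text{true}}, \qquad \sigma^2_{\text{realised}} \approx \frac{\sigma^2_{\text{true}}}{1 - q} = \frac{\sigma^2_{\text{predicted}}}{(1 - q)^{2}}

[GRAPH IMAGE — 1*QvkJL1zmlaRrCB814-dFYA.png] "How much your in-sample risk number lies": the variance inflation factor $1/(1 - q)$ as a function of the aspect ratio $q$.

The in-sample risk number understates the best achievable variance by $1/(1 - q)$, and the realised variance overshoots it by the same factor again. At $q = 0.5$ the true optimum is already double the variance your model promised, and what you realise is four times that promise, or twice the volatility; as $q \to 1$ it runs away entirely.

The immaculate portfolio at the top of this piece was not unlucky. It was reporting a number the mathematics guarantees is optimistic: by $1/(1 - q)$ against the best achievable risk, and by $1/(1 - q)^{2}$ against the risk it went on to realise.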
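The gap is simple to measure in simulation. The sketch below assumes the null model (true covariance equal to the identity, Gaussian returns) and compares three variances for the global minimum-variance portfolio: the one the sample matrix predicts, the best achievable one, and the one actually realised under the true covariance. The standard large-$N$ result is that the predicted variance is about $(1 - q)$ times the best achievable, and the realised variance about $1/(1 - q)$ times it. `gmv_risk_ratios` is a hypothetical helper, and the trial count is arbitrary.

```python
import numpy as np

def gmv_risk_ratios(N, T, n_trials, rng):
    """Mean predicted and realised GMV variance, each relative to the true optimum."""
    ones = np.ones(N)
    true_var = 1.0 / N                     # best achievable variance when C = I
    predicted, realised = [], []
    for _ in range(n_trials):
        X = rng.standard_normal((N, T))
        E = X @ X.T / T
        w = np.linalg.solve(E, ones)       # proportional to E^{-1} 1
        w /= w.sum()
        predicted.append(w @ E @ w)        # what the sample matrix reports
        realised.append(w @ w)             # w' C w with C = I
    return np.mean(predicted) / true_var, np.mean(realised) / true_var

rng = np.random.default_rng(3)
N, T = 100, 200
q = N / T
pred_ratio, real_ratio = gmv_risk_ratios(N, T, n_trials=200, rng=rng)

print(f"predicted / true: {pred_ratio:.2f}   (1 - q     = {1 - q:.2f})")
print(f"realised  / true: {real_ratio:.2f}   1 / (1 - q) = {1 / (1 - q):.2f}")
print(f"realised / predicted: {real_ratio / pred_ratio:.2f}")
```

Averaging over trials matters: in a single draw at this size both ratios fluctuate by roughly ten percent.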

Cleaning the matrix

The point of all this is not despair; you can repair the eigenvalues you cannot trust, and the literature offers a ladder.

The crudest effective move is eigenvalue clipping: keep every eigenvalue above $\lambda_+$ as signal, and replace every eigenvalue inside the band with a single trace-preserving constant. You are declaring the bulk to be noise and refusing to let the optimiser trade on it.

[GRAPH IMAGE — 1*G0APoI5dCmJDvFGEcYz8JA.png] The raw spectrum (solid) versus the clipped one (dashed), log scale. Everything below the noise edge collapses to a single value; the genuine factor modes are left untouched. The flattened bulk no longer dominates the inverse.
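Clipping takes only a few lines. This is a minimal sketch, assuming the edge is $\sigma^2 (1 + \sqrt{q})^2$ with a bulk variance `sigma2` supplied by the caller; `clip_eigenvalues` and the synthetic one-factor test data are illustrative.

```python
import numpy as np

def clip_eigenvalues(E, q, sigma2=1.0):
    """Replace every eigenvalue at or below the noise edge by their common mean."""
    lam, V = np.linalg.eigh(E)
    edge = sigma2 * (1 + np.sqrt(q)) ** 2
    noise = lam <= edge
    cleaned = lam.copy()
    if noise.any():
        cleaned[noise] = lam[noise].mean()   # the mean keeps the trace unchanged
    # Rebuild the matrix from the original eigenvectors and the cleaned eigenvalues
    return (V * cleaned) @ V.T, cleaned

rng = np.random.default_rng(4)
N, T = 100, 200
common = rng.standard_normal(T)                   # one hypothetical common factor
X = rng.standard_normal((N, T)) + 0.8 * common
E = X @ X.T / T

E_clipped, cleaned = clip_eigenvalues(E, N / T)
raw = np.linalg.eigvalsh(E)

print(f"trace before and after: {np.trace(E):.2f}, {np.trace(E_clipped):.2f}")
print(f"smallest eigenvalue before and after: {raw[0]:.3f}, {cleaned.min():.3f}")
print(f"largest eigenvalue before and after: {raw[-1]:.2f}, {cleaned.max():.2f}")
```

Using the mean of the clipped eigenvalues as the replacement value is what makes the constant trace-preserving.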

One rung up is linear shrinkage (Ledoit–Wolf): pull the sample matrix toward a structured target,

\hat{C} = (1 - \alpha)\, E + \alpha\, \bar{\mu}\, I

with $\bar{\mu} = \operatorname{tr}(E)/N$ the average eigenvalue and $\alpha \in [0, 1]$ chosen to minimise expected error, dragging the over-dispersed eigenvalues back toward the centre without drawing a hard line at the edge.
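In eigenvalue terms linear shrinkage is just the map $\lambda_i \mapsto (1 - \alpha)\lambda_i + \alpha \bar{\mu}$ with the eigenvectors untouched, assuming $\bar{\mu}$ is the mean eigenvalue $\operatorname{tr}(E)/N$. The sketch below takes $\alpha$ as given; the value `0.3` is an arbitrary placeholder, and the data-driven choice of $\alpha$ from Ledoit and Wolf is not implemented here.

```python
import numpy as np

def linear_shrink(E, alpha):
    """Shrink E toward a multiple of the identity with the same trace."""
    N = E.shape[0]
    mu_bar = np.trace(E) / N               # average eigenvalue of E
    return (1.0 - alpha) * E + alpha * mu_bar * np.eye(N)

rng = np.random.default_rng(5)
N, T = 100, 200
X = rng.standard_normal((N, T))
E = X @ X.T / T

alpha = 0.3                                # placeholder intensity, not an optimised value
C_hat = linear_shrink(E, alpha)

lam = np.linalg.eigvalsh(E)
lam_hat = np.linalg.eigvalsh(C_hat)

print(f"raw spectrum:    [{lam[0]:.3f}, {lam[-1]:.3f}]")
print(f"shrunk spectrum: [{lam_hat[0]:.3f}, {lam_hat[-1]:.3f}]")
print(f"condition number: {lam[-1] / lam[0]:.1f} -> {lam_hat[-1] / lam_hat[0]:.1f}")
```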

The state of the art is the rotationally invariant estimator (Ledoit–Péché, Bouchaud–Potters). Within the class of estimators that privilege no direction you cannot improve on the sample eigenvectors, so you keep them; but you can optimally transform each eigenvalue. The oracle-optimal cleaned value is

\xi_i = \frac{\lambda_i}{\left|\, 1 - q + q\,\lambda_i\, \mathfrak{g}(\lambda_i) \,\right|^{2}}

where $\mathfrak{g}$ is the very Stieltjes transform from earlier, now evaluated on the empirical spectrum just off the real axis. The intuition under the algebra is clean: each eigenvalue is shrunk by an amount set by how strongly the rest of the spectrum interferes with it, measured through the same transform that produced Marchenko–Pastur in the first place. The whole arc — edges, density, risk factor, optimal cleaning — runs through one function.
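A bare-bones version of the cleaning formula, as a sketch only: the Stieltjes transform is replaced by its discrete counterpart over the sample eigenvalues, evaluated at $\lambda_i - i\eta$, and the width $\eta = N^{-1/2}$ is an assumed smoothing choice rather than something fixed by the text. Finite-size corrections for the smallest eigenvalues, which practical implementations usually add, are left out. `rie_clean` is a hypothetical name.

```python
import numpy as np

def rie_clean(E, q, eta=None):
    """Cleaned eigenvalues xi_i = lambda_i / |1 - q + q lambda_i g(lambda_i - i eta)|^2."""
    lam, V = np.linalg.eigh(E)
    N = lam.size
    if eta is None:
        eta = N ** -0.5                    # assumed smoothing width off the real axis
    z = lam - 1j * eta
    # Discrete Stieltjes transform: g(z_i) = (1/N) * sum_j 1 / (z_i - lambda_j)
    g = np.mean(1.0 / (z[:, None] - lam[None, :]), axis=1)
    xi = lam / np.abs(1 - q + q * lam * g) ** 2
    return xi, V

rng = np.random.default_rng(6)
N, T = 300, 600
X = rng.standard_normal((N, T))            # pure noise: the oracle answer is xi_i = 1
E = X @ X.T / T

xi, V = rie_clean(E, N / T)
lam = np.linalg.eigvalsh(E)
C_clean = (V * xi) @ V.T                   # same eigenvectors, cleaned eigenvalues

print(f"raw spectrum:     [{lam.min():.3f}, {lam.max():.3f}], std {lam.std():.3f}")
print(f"cleaned spectrum: [{xi.min():.3f}, {xi.max():.3f}], std {xi.std():.3f}")
```

On pure noise the continuum formula returns exactly $1$ for every eigenvalue inside the band, so what dispersion remains in the cleaned spectrum comes from the finite $N$ and the smoothing width, most visibly near the band edges.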

Back to the beautiful portfolio

What I eventually understood is that the immaculate output was not telling me about the market. It was telling me about my sample. The low predicted risk was real in-sample and meaningless out-of-sample, because the optimiser had loaded onto eigenvectors that existed only in the noise of four years of returns, and the inverse had amplified exactly those — by a factor I can now write down, $1/(1 - q)$ .

The discipline that follows is simple. Before you trust a covariance matrix, compute $\lambda_+ = (1 + \sqrt{q})^{2}$, fit the bulk with its effective variance, and ask how many eigenvalues actually clear the edge. Treat the rest as noise until you have cleaned it. And keep the headline close: most of what your risk model knows is noise, your optimiser trusts that noise the most, and the diversification you believe you measured is, until proven otherwise, a hallucination of the sample.

The market will eventually tell you which parts of your matrix were real. It is cheaper to ask the mathematics first.