Why You Can Start Gradient Descent Anywhere — Even on an NP-Hard Recovery Problem
With order-n complex Gaussian quadratic measurements, recovering a vector up to phase becomes both identifiable and solvable by gradient descent from any starting point — two faces of one geometric condition.
Suppose I hand you a vector that you cannot see directly. All you get are numbers, each of the form , where every is a known Hermitian matrix. Each measurement is a quadratic form — energy-like, sign-blind, and worst of all, blind to global phase: replace by and every is unchanged. Two questions hang over this immediately. First, is there even enough information to pin down (up to that unavoidable phase)? Second, even if there is, the obvious way to recover it is to minimize a nonconvex loss — and nonconvex least-squares over is the kind of problem that, in general, is NP-hard. So why would any honest person expect to solve it from a cold start?
This is the quadratic feasibility problem, and it is the subject of work I led with my co-authors Gautam Dasarathy and Angelia Nedić at Arizona State. Read the paper · arXiv:2002.01066.
What is actually at stake, and what I am not claiming
Quadratic feasibility — find with for all — is the feasibility core of a quadratically constrained quadratic program (QCQP). It is not a toy: it models power-system state estimation, x-ray crystallography, the turnpike problem, and unlabeled distance geometry. Its most famous special case is phase retrieval, where each is rank one and the measurement collapses to — you observe only squared magnitudes. Our setting allows arbitrary, full-rank Hermitian .
Let me fence off what I am not doing. I will assume the are complex Gaussian random matrices; I will assume the measurements are exact (no noise model); and I will give you a landscape result, not a finite-time convergence rate. Those are real limitations and I will come back to them. The payoff in exchange is a clean, two-part statement about when this problem is solvable at all and whether a dumb algorithm solves it.
Accept what phase retrieval already taught us — then find the crack
The "benign nonconvex landscape" program is by now a beautiful body of work. Sun, Qu, and Wright showed that the phase-retrieval loss has no spurious local minima and only strict saddles — every local min is global, every critical point that isn't a min has a direction of strict negative curvature. Ge, Jin, and Zheng gave a unified geometric analysis for low-rank problems in the same spirit. So you might think: just cite them and go home.
Here is the crack. Those results live, for the most part, in , and the real and complex problems are not the same problem with a cosmetic change of field. In the real case, the sign ambiguity gives you exactly two isolated global minima. In the complex case, the phase ambiguity gives you a whole continuum of equivalent global minima. A landscape proof that counts isolated minima, or that leans on real differentiability, simply does not transport. Separately, Huang, Gupta, and Dokmanić showed complex quadratic systems with full-rank random matrices are solvable by gradient descent — but only from a carefully chosen spectral initialization. The question I cared about: can you drop the clever initialization entirely?
Why the obvious recovery objective fights back
The obvious move is to minimize
the squared residual, with and exactly at feasible points. Two things go wrong at once. The loss is nonconvex, so a priori there could be spurious local minima where gradient descent gets stuck. And is not complex-differentiable — it depends on both and — so ordinary calculus on does not even hand us a gradient to descend on. Lurking underneath both is the prior question of whether has an essentially unique answer at all.
The one idea: measure distance the way the problem does
Here is the move the whole paper rests on. Stop measuring how far apart and are with — that metric is offended by phase, reporting a large distance between and even though they are the same point for us. Instead use the phase-invariant distance
Here is the rank-one Hermitian matrix (the "lifted" version of ), and is the Frobenius norm. The single fact that makes this the right yardstick: if and only if for some phase . The metric is zero exactly on the orbits we cannot and should not distinguish.
The mental image to keep: lift every vector to its outer product, and do all your geometry up there, where phase has already been quotiented out. Once you are in the world of , "recover up to phase" becomes "recover the matrix ," and the awkward symmetry evaporates.
The small algebraic engine that powers everything is a polarization identity. Write and , and define the symmetric outer product . Then
This converts "tell and apart" into "the must not crush the rank-2 Hermitian matrix ." That is a statement about how a linear map — the acting on matrices — treats low-rank matrices, exactly the territory where RIP-style isometry arguments live.
The smallest nontrivial case, with numbers
Let me make this concrete in . Take and — orthogonal unit vectors, clearly not equal up to any phase. Lift them:
So , i.e. . Now check the engine. With , and , so
That is exactly . The identity is not symbol-pushing; it reproduces the matrix entrywise.
Now watch a single random measurement act on this. For a complex Gaussian Hermitian (real diagonal, complex off-diagonal fixed by symmetry), the inner product against our diagonal real collapses to — a difference of two independent standard normals. Its expected square is . So one random quadratic measurement is an unbiased estimate of the squared phase-invariant distance. I averaged over random Hermitian Gaussians and got , right on top of . Average enough of them and the estimate concentrates — the only remaining question is how many "enough" is.
From unbiasedness to identifiability: an isometry
Call the measurement map . Say is -stable if
The structural fact, via Wang and Xu's phase-retrievability characterization: is injective up to phase if and only if it is -stable. Identifiability is a RIP-like sandwich in the metric — not analogous to one, literally the same statement. (The bookkeeping that turns injectivity into the two-sided bound on is in the paper.)
To earn that sandwich for Gaussian , combine the unbiasedness above with a concentration bound: the empirical average stays within of its mean except with probability . That controls one pair . To make it hold uniformly over the sphere, cover the sphere with a -net of size and take a union bound. This is where the sample complexity is born: the union bound multiplies the per-point failure by the net size , so the exponent must beat the net, . The sitting in the net's exponent is exactly what forces .
Theorem 1 (near-isometry / identifiability). If the Hermitian are complex Gaussian and , then for any , with probability , for all , with explicit and . Distinct inputs give distinct measurements: the model is identifiable.
The same condition makes the landscape benign
Now the second face. Because is not holomorphic, we differentiate with Wirtinger calculus — treat and as independent variables. The goal is to rule out spurious minima, and the technique is to exhibit strict negative curvature everywhere a non-solution could hide: at every point with small gradient there is a direction with
where is the ground truth generating the . Notice the right-hand side is — the same phase-invariant distance, again doing the bookkeeping. Strict negative curvature at every non-optimal stationary point means no flat traps.
Theorem 2 (benign landscape). For complex Gaussian with , with probability : is strict-saddle, and every local minimum satisfies . No spurious local minima; every local min is global up to phase. Corollary. A gradient method from an arbitrary initialization therefore converges to a global minimum, via the saddle-avoidance results of Lee, Simchowitz, Jordan, and Recht.
The one sentence to keep for a week: identifiability and tractable optimization are governed by one and the same geometric condition — the -stability that complex Gaussian measurements buy you at .
What the result really says, and where it goes vacuous
The defended claim, with its boundary: order- Gaussian quadratic measurements suffice for both identifiability and a globally optimizable landscape, with no clever initialization. That last clause is the sharp contrast with Huang–Gupta–Dokmanić, who needed a spectral start.
Now the honest limits. The guarantees are for complex Gaussian — not structured, deterministic, or coded measurements. The constants (, , , ) are existence-type and unoptimized; is order-, not a sharp threshold, and pushing smaller (stronger stability) inflates — a real tradeoff. The corollary gives asymptotic saddle-avoidance, not a finite-time rate. There is no noise model: are exact. And to be candid, this manuscript is theory-focused; I would not have you take a benchmark number from me here.
The load-bearing assumption is the Gaussianity, and it is exactly where the result threatens to go vacuous: every "with high probability" rides on the second-moment identity and on the concentration bound above. Break those — heavy tails, strong structure, adversarial — and neither the isometry nor the curvature lower bound is guaranteed.
What I would do next
The cleanest open question: how far can the Gaussian assumption be relaxed — to sub-Gaussian, coded-diffraction, or sparse — while keeping the landscape benign? After that: the tight sample-complexity constant (is optimal?); finite-time iteration complexity to replace the qualitative convergence; a noise/robustness model with stable recovery bounds; and exploiting structure (sparse or low-rank ). If you take one thing away, let it be the reframing: lift to , measure with , and identifiability and optimizability turn out to be the same fact seen twice.