Convergence in probability of order statistics.

probability
Published August 15, 2021

Order statistics converge in probability to the corresponding population quantiles, basically no matter what (the only real requirement is that the quantile be unique). That is a fact that I was surprised to find missing (as far as I could see) from the texts on my bookshelf. The books I have seem to analyze order statistics under stricter regularity conditions in order to get central limit theorems.

Obviously this is not new and the proof is nothing special, but some things are easier to prove yourself than to find in a book, I guess. It's a result that's nice to have around, especially if you're thinking about our AMIP paper. I had to sweat a little to avoid treating point masses and continuity points separately, which would have been an unnecessary complication.

Statement:

Let the data \(x_1, \ldots, x_N\) be IID with distribution function \(F(x) = p(X \le x)\), and let \(F_{-}(x) = p(X < x)\). For \(0 < \alpha < 1\), let \(x_{(\lfloor \alpha N \rfloor)}\) denote the \(\lfloor \alpha N \rfloor\)-th order statistic of the dataset, where \(x_{(0)}\) is undefined, and let \(q(\alpha) := \inf \{x: F(x) \ge \alpha \}\). Suppose that \(F(x) > \alpha\) for every \(x > q(\alpha)\), i.e., that the \(\alpha\)-quantile is unique. Then \(x_{(\lfloor \alpha N \rfloor)} \rightarrow q(\alpha)\) in probability as \(N \rightarrow \infty\).
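To make the objects in the statement concrete, here is a small Python sketch for a distribution on finitely many points. It uses exact fractions so that the comparison \(F(x) \ge \alpha\) is not affected by floating-point rounding. The helpers `cdf`, `quantile`, and `order_statistic` are hypothetical names, and the three-point distribution (with an atom of mass \(1/2\) at its median) is an arbitrary illustration, not anything from the argument itself.

```python
import math
import random
from fractions import Fraction

def cdf(support, probs, x):
    """F(x) = p(X <= x) for a distribution on finitely many points."""
    return sum((p for s, p in zip(support, probs) if s <= x), Fraction(0))

def quantile(support, probs, alpha):
    """q(alpha) = inf{x : F(x) >= alpha}. Here F is a right-continuous step
    function, so the infimum is the smallest support point where F reaches alpha."""
    for s in sorted(support):
        if cdf(support, probs, s) >= alpha:
            return s
    raise ValueError("alpha must lie in (0, 1]")

def order_statistic(xs, k):
    """The k-th order statistic x_(k), 1-indexed, so x_(1) is the minimum."""
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= N")
    return sorted(xs)[k - 1]

# Hypothetical example distribution with an atom of mass 1/2 at the median.
support = [0, 1, 2]
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
alpha = Fraction(1, 2)
q = quantile(support, probs, alpha)  # F(0) = 1/5 < 1/2 <= F(1) = 7/10, so q = 1

# One simulated dataset; x_(floor(alpha N)) should sit at q for large N.
rng = random.Random(0)
N = 2000
xs = rng.choices(support, weights=[float(p) for p in probs], k=N)
x_k = order_statistic(xs, math.floor(alpha * N))
print(q, x_k)
```

In this example \(F\) jumps from \(1/5\) to \(7/10\) at \(q(\alpha) = 1\) and stays at or above \(7/10\) afterwards, so the quantile is unique even though \(F\) is not continuous there.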

Proof:

Let \(x_{(k)}\) denote the \(k\)-th order statistic, i.e., the \(k\)-th smallest of \(x_1, \ldots, x_N\), for \(1 \le k \le N\). By definition, for any real \(x\),

\[
x_{(k)} \le x \Leftrightarrow \sum_{n=1}^N \mathbb{I}\left(x_n \le x\right) \ge k
\quad\textrm{and}\quad
x_{(k)} \ge x \Leftrightarrow \sum_{n=1}^N \mathbb{I}\left(x_n \ge x\right) \ge N - k + 1.
\]
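Both equivalences are easy to get off by one when the data contain ties, so a brute-force check can be reassuring. The Python sketch below (the function name `equivalences_hold` is hypothetical) enumerates every dataset of length 5 over the values \(\{0, 1, 2\}\) and, for every \(k\) and a grid of thresholds \(x\) that includes the data values themselves, compares \(x_{(k)}\) against the two counting conditions.

```python
import itertools

def equivalences_hold(xs):
    """Check, for every k and a grid of thresholds x, that
    x_(k) <= x  iff  #{n : x_n <= x} >= k, and
    x_(k) >= x  iff  #{n : x_n >= x} >= N - k + 1,
    where x_(k) is the k-th smallest value (1-indexed)."""
    n = len(xs)
    ordered = sorted(xs)
    values = sorted(set(xs))
    # Thresholds at, between, below, and above the data values, so that ties
    # (x equal to a data point) are exercised as well as strict gaps.
    thresholds = values + [v + 0.5 for v in values] + [values[0] - 0.5]
    for k in range(1, n + 1):
        x_k = ordered[k - 1]
        for x in thresholds:
            count_le = sum(xi <= x for xi in xs)
            count_ge = sum(xi >= x for xi in xs)
            if (x_k <= x) != (count_le >= k):
                return False
            if (x_k >= x) != (count_ge >= n - k + 1):
                return False
    return True

# Every dataset of length 5 over {0, 1, 2}; most of them contain ties.
datasets = list(itertools.product([0, 1, 2], repeat=5))
print(all(equivalences_hold(list(xs)) for xs in datasets))  # prints True
```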

For sufficiently large \(N\), \(\lfloor \alpha N \rfloor > 0\), so \(x_{(\lfloor N\alpha \rfloor)}\) is well-defined. Applying the first equivalence with \(k = \lfloor \alpha N \rfloor\) and any \(\epsilon > 0\) gives

\[
\begin{aligned}
p\left(x_{(\lfloor N\alpha \rfloor)} \le q(\alpha) - \epsilon \right)
={}& p\left(\sum_{n=1}^N \mathbb{I}\left(x_n \le q(\alpha) - \epsilon\right) \ge \lfloor N \alpha \rfloor \right) \\
={}& p\left(\frac{1}{N}\sum_{n=1}^N \mathbb{I}\left(x_n \le q(\alpha) - \epsilon\right) \ge \frac{\lfloor N \alpha \rfloor}{N} \right) \\
\rightarrow{}& \mathbb{I}\left(F(q(\alpha) - \epsilon) \ge \alpha\right) = 0,
\end{aligned}
\]

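by the strong law of large numbers, since \(\lfloor N\alpha \rfloor / N \rightarrow \alpha\) and \(F(q(\alpha) - \epsilon) < \alpha\) by the definition of \(q(\alpha)\) as an infimum. Similarly, applying the second equivalence,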

\[
\begin{aligned}
p\left(x_{(\lfloor N\alpha \rfloor)} \ge q(\alpha) + \epsilon\right)
={}& p\left(\sum_{n=1}^N \mathbb{I}\left(x_n < q(\alpha) + \epsilon\right) \le \lfloor N \alpha \rfloor - 1 \right) \\
={}& p\left(\frac{1}{N}\sum_{n=1}^N \mathbb{I}\left(x_n < q(\alpha) + \epsilon\right) \le \frac{\lfloor N \alpha \rfloor}{N} - \frac{1}{N} \right) \\
\rightarrow{}& \mathbb{I}\left(F_{-}(q(\alpha) + \epsilon) < \alpha\right) = 0,
\end{aligned}
\]

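again by the strong law of large numbers. The limit is zero only because \(F_{-}(q(\alpha) + \epsilon) > \alpha\) strictly: monotonicity of \(F\) and \(F(q(\alpha)) \ge \alpha\) give only \(F_{-}(q(\alpha) + \epsilon) \ge \alpha\), and if \(F_{-}(q(\alpha) + \epsilon) = \alpha\) the law of large numbers does not determine the limit. The strict inequality holds when \(F(x) > \alpha\) for every \(x > q(\alpha)\), i.e., when the \(\alpha\)-quantile is unique, since then \(F_{-}(q(\alpha) + \epsilon) \ge F(q(\alpha) + \epsilon/2) > \alpha\). Without it the result can fail: for Bernoulli(1/2) data and \(\alpha = 1/2\), \(q(\alpha) = 0\), but \(x_{(\lfloor N/2 \rfloor)} = 1\) with probability tending to \(1/2\). With the strict inequality, it follows that \(x_{(\lfloor N\alpha \rfloor)} \in (q(\alpha)- \epsilon, q(\alpha) + \epsilon)\) with probability approaching one, for any \(\epsilon > 0\). QED.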
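Both probabilities in the proof are binomial tails: \(\sum_n \mathbb{I}(x_n \le q(\alpha) - \epsilon)\) is Binomial\((N, F(q(\alpha) - \epsilon))\), and \(\sum_n \mathbb{I}(x_n < q(\alpha) + \epsilon)\) is Binomial\((N, F_{-}(q(\alpha) + \epsilon))\). The Python sketch below evaluates them exactly (in log space, to avoid overflow for large \(N\)) with \(\alpha = 1/2\). The values \(F(q(\alpha) - \epsilon) = 0.4\) and \(F_{-}(q(\alpha) + \epsilon) = 0.6\) are made up for illustration; the third column uses Bernoulli(1/2) data, where \(q(1/2) = 0\) and \(F_{-}(\epsilon) = 1/2 = \alpha\) for \(0 < \epsilon \le 1\). The helper names are hypothetical.

```python
import math

def log_binom_pmf(n, p, j):
    """log P(Binomial(n, p) = j), for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p))

def sum_exp(log_terms):
    """Numerically stable sum of exp(t) over a list of log-probabilities."""
    if not log_terms:
        return 0.0
    m = max(log_terms)
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)

def prob_below(n, alpha, f_lower):
    """p(x_(k) <= q(alpha) - eps) with k = floor(alpha n) and
    f_lower = F(q(alpha) - eps), i.e. P(Binomial(n, f_lower) >= k)."""
    k = math.floor(alpha * n)
    return sum_exp([log_binom_pmf(n, f_lower, j) for j in range(k, n + 1)])

def prob_above(n, alpha, f_minus):
    """p(x_(k) >= q(alpha) + eps) with k = floor(alpha n) and
    f_minus = F_-(q(alpha) + eps), i.e. P(Binomial(n, f_minus) <= k - 1)."""
    k = math.floor(alpha * n)
    return sum_exp([log_binom_pmf(n, f_minus, j) for j in range(0, k)])

alpha = 0.5
for n in [10, 100, 1000, 10000]:
    below = prob_below(n, alpha, 0.4)  # hypothetical F(q - eps) = 0.4 < alpha
    above = prob_above(n, alpha, 0.6)  # hypothetical F_-(q + eps) = 0.6 > alpha
    flat = prob_above(n, alpha, 0.5)   # Bernoulli(1/2): F_-(eps) = alpha
    print(f"n={n:>5}  below: {below:.2e}  above: {above:.2e}  flat: {flat:.3f}")
```

The first two columns shrink toward zero as \(n\) grows, while the last one approaches \(1/2\): when \(F_{-}(q(\alpha) + \epsilon)\) equals \(\alpha\) exactly, the sample mean converges to the cutoff itself and the law of large numbers alone cannot push the probability to zero.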