Ultraproducts

1. Introduction
2. Prerequisites
3. Motivation
4. Ultrafilters
5. Ultraproducts
6. Łoś’s Theorem
7. Examples
8. Ultrapowers
9. Nonstandard Analysis
- 9.1. Hyperreals
- 9.2. Calculus

\newcommand{\F}{\mathbf{F}}
\newcommand{\R}{\mathbf{R}}
\newcommand{\N}{\mathbf{N}}
\newcommand{\Q}{\mathbf{Q}}
\DeclareMathOperator{\monad}{monad}
\DeclareMathOperator{\galaxy}{galaxy}
\DeclareMathOperator{\st}{st}

1. Introduction

I’ve been learning about ultraproducts lately, as that’s a word I’ve heard a lot, but don’t know a ton about. It turns out that they are quite fascinating!

The definitions of ultraproducts and ultrafilters are inspired by some exercises in Marker’s Model Theory: An Introduction. The proofs are mostly just my solutions to those exercises. The section on nonstandard analysis is heavily inspired by Ultraproducts and their Application to Non-Standard Analysis by Miriam Pérez López-Serrano, available here.

2. Prerequisites

You should be able to understand this blog post if you have:

An understanding of first order logic. An undergraduate course in logic should suffice, as long as it covers:
1. The basic syntax of first order languages (e.g. function symbols, relation symbols, terms, formulas, connectives, etc.),
2. What a structure is,
3. And what it means for \(\mathcal{M} \models \phi\), for a structure \(\mathcal{M}\) and formula \(\phi\).
Familiarity with abstract algebra. Several of my examples come from algebra. Knowing some basic facts about groups and fields should suffice. The details of the examples don’t matter too much.
Familiarity with basic analysis. I compare proofs in nonstandard analysis with “the standard” \(\epsilon\)-\(\delta\) proofs, and assume you have some familiarity with these. If you’ve proved a limit using the \(\epsilon\)-\(\delta\) definition, that’s probably enough.

3. Motivation

Ultraproducts are an important construction in model theory. Most of model theory involves creating a model of some theory with some specific properties, and ultraproducts are a very powerful tool for doing this. Ultraproducts, in some sense, allow us to take a “limit” (this can be made rigorous in category theory of course) of an infinite family of models. So, for instance, we can take the “limit” of the finite fields \(\F_p\) to obtain an infinite field with characteristic zero that still behaves indistinguishably from a finite field. Or, we can take the “limit” of the models of finite subsets of a finitely satisfiable theory to obtain a model of the whole theory, i.e. prove the compactness theorem.

While you can view an ultraproduct as a “limit” of a family of models, that isn’t necessarily the best intuition. For one, ultraproducts are not unique, and can even vary quite widely, as we will see in section 7.3. A more precise way to think about ultraproducts is that the elements of the universe are sequences, where each component is from a different model. Then the element has whatever properties “most” components share. This allows for the kind of limiting behavior. For instance, regarding the finite field example, you can have the sequence \(x = (1, 1, 1, 1, \dots)\) and note that every component has order greater than \(1\), all but one have order greater than \(2\), all but two have order greater than \(3\), etc. so \(x\) must have infinite order. These sequences enable ultraproducts to have some sort of limiting or averaging behavior, but can be much less predictable than you might expect.

Now let’s get on with the construction! The idea is we will take the product of the entire infinite family of models to obtain our universe¹. We want our model to only have properties shared by “most” models, so we’ll quotient it out by an equivalence relation, where \((a_i) \sim (b_i)\) if \(a_i = b_i\) in “most” of the models \(\mathcal{M}_i\). This ends up being enough to create a full model, and prove that it satisfies some very nice properties. Unfortunately, what we mean by “most” is quite subtle, and requires some extra machinery, namely ultrafilters.

4. Ultrafilters

First, we need filters.

4.1. Filters

Given a set \(I\), a filter on \(I\) is a collection \(D\) of subsets of \(I\). Intuitively, the elements of \(D\) are “almost all” of \(I\). A useful example to have in your head is the collection of cofinite sets. \(D\) needs to satisfy the following properties:

\(I \in D\) and \(\emptyset \notin D\). So “all of \(I\)” is “almost all”, and “none of \(I\)” isn’t.
If \(A \in D\) and \(A \subseteq B\), then \(B \in D\). If \(A\) is “almost all” of \(I\), and \(B\) is a bigger set, then it should be “almost all” of \(I\) as well.
If \(A, B \in D\), then \(A \cap B \in D\). If \(A\) and \(B\) are “almost all” of \(I\), then so is \(A \cap B\).

As I suggested, the set of cofinite subsets of an infinite set \(I\) forms a filter known as the Frechét filter, which we can easily prove. Let \(D = \{X \subseteq I \mid I\setminus X \text{ is finite}\}\). Then we have:

\(I \in D\), since \(I\setminus I = \emptyset\) is finite. Likewise \(I \setminus \emptyset = I\) is infinite, so \(\emptyset \notin D\).
If \(A \in D\), so \(A\) is cofinite, and \(A \subseteq B\), then \(I\setminus B \subseteq I \setminus A\), so \(B\) is also cofinite.
If \(A, B \in D\) are cofinite, then \(I\setminus (A \cap B) = I\setminus A \cup I \setminus B\) is the union of two finite sets, and hence also finite.

One useful tool for creating filters comes from the finite intersection property.

A collection \(D\) of subsets of \(I\) has the finite intersection property if every finite intersection of elements of \(D\) is nonempty.

Let \(D\) be a collection of subsets of \(I\) with the finite intersection property. Then the set \(D'\) of supersets of finite intersections of \(D\) is a filter.

To be a little more explicit about \(D'\), we can say \[ D' = \{X \subseteq I \mid X \supseteq X_1 \cap \dots \cap X_n,\quad X_i \in D\}. \]

\(I\) is a superset of every element of \(D\), so it is in \(D'\). Conversely, the empty set is not a superset of a finite intersection of elements of \(D\), since every finite intersection of \(D\) is nonempty.
Let \(A \in D'\), so \(A \supseteq X_1 \cap \dots \cap X_n\), and let \(B \supseteq A\). Then \(B \supseteq X_1 \cap \dots \cap X_n\), so \(B \in D'\).
Let \(A, B \in D'\), so \(A \supseteq X_1 \cap \dots \cap X_n\) and \(B \supseteq Y_1 \cap \dots \cap Y_m\). Then \(A \cap B \supseteq X_1 \cap \dots \cap X_n \cap Y_1 \cap \dots \cap Y_m\), so \(A \cap B \in D'\).

4.2. Ultrafilters

Now we can make them ultra.

A filter \(U\) is an ultrafilter if it also satisfies that for every \(X \subseteq I\), either \(X \in U\) or \(I\setminus X \in U\).

This is a weird property. For example, consider the Frechét filter \(D\) on \(\omega\). This is not an ultrafilter as, for instance, neither the set of even nor odd numbers are in \(D\). This makes sense, as it doesn’t make much sense to say that either the even or the odd numbers are “almost all” of the natural numbers. However, we need this property in order to prove Łoś’s theorem later down the line, which is essential for ultraproducts to be useful.

4.3. Principal Ultrafilters

It’s pretty hard to come up with concrete ultrafilters (usually impossible, in fact), but there are some easy ones, known as the principal ultrafilters. Given a set \(I\) and a particular \(x \in I\), define \(D = \{X \subseteq I \mid x \in X\}\). We can easily see that \(D\) is an ultrafilter.

\(x \in I\), so \(I \in D\), and \(x \notin \emptyset\), so \(\emptyset \notin I\).
If \(A \subseteq B\) and \(x \in A\), then \(x \in B\).
If \(x \in A\) and \(x \in B\), then \(x \in A \cap B\).
Given \(X \subseteq I\), if \(x \in X\), then \(X \in D\). Otherwise \(x \in I \setminus X\), so \(I \setminus X \in D\).

Remarkably, there are non-principal ultrafilters. Not only that, but the ultrafilter theorem says that we can always extend a filter to an ultrafilter!

4.4. Ultrafilter Theorem

Let \(D\) be a filter on \(I\). There exists an ultrafilter \(U\) such that \(D \subseteq U\).

The idea is to use Zorn’s Lemma to construct a filter that is maximal with respect to \(\subseteq\), then show that this maximal filter is an ultrafilter. First, we can note that the set of filters on \(I\) containing \(D\) is partially ordered with respect to \(\subseteq\). Now, let \(\mathcal{C}\) be a chain. I claim that \(\bigcup \mathcal{C}\) is an upper bound. Certainly, if it is a filter, it is an upper bound for \(\mathcal{C}\), so we just have to show that it is a filter.

Since \(I \in C\) for every \(C \in \mathcal{C}\), \(I \in \bigcup \mathcal{C}\). On the other hand, if \(\emptyset \in \bigcup \mathcal{C}\), then there exists a \(C \in \mathcal{C}\) such that \(\emptyset \in C\), which contradicts the fact that \(C\) is a filter.
Suppose \(A \in \bigcup \mathcal{C}\) and \(B \supseteq A\). Then there exists a \(C \in \mathcal{C}\) such that \(A \in C\). Hence \(B \in C\), so \(B \in \bigcup \mathcal{C}\).
Suppose \(A, B \in \bigcup \mathcal{C}\). Then there exist \(C_1, C_2 \in \mathcal{C}\) with \(A \in C_1\) and \(B \in C_2\). Since \(\mathcal{C}\) is a chain, either \(C_1 \subseteq C_2\) or \(C_2 \subseteq C_1\). Without loss of generality, suppose \(C_1 \subseteq C_2\). Then \(A, B \in C_2\), so \(A \cap B \in C_2\) and hence \(A \cap B \in \bigcup \mathcal{C}\).

Now, since every chain has an upper bound, Zorn’s lemma implies that there exists a maximal element \(U\). We just need to show that \(U\) has the ultrafilter property. This isn’t difficult, but is a little tedious and technical. Let \(X \subseteq I\). If \(X \in U\), we are done, so suppose \(X \notin U\). We will define a new filter \(U'\) as \[ U' = \{Y \subseteq I \mid \exists Z \in U,\, Z \setminus X \subseteq Y\}. \] First, let’s show that \(U'\) is a filter.

\(I \setminus X \subseteq I\), so \(I \in U'\). If \(\emptyset \in U'\), then there exists a \(Z \in U\) with \(Z \setminus X \subseteq \emptyset\), which means \(Z \setminus X = \emptyset\), and hence \(Z \subseteq X\), implying that \(X \in U\), a contradiction.
Suppose \(A \in U'\) with \(Z\setminus X \subseteq A\) and \(B \supseteq A\). Then \(Z \setminus X \subseteq A \subseteq B\), so \(B \in U'\).
Finally, suppose \(A, B \in U'\) with \(Z_1 \setminus X \subseteq A\) and \(Z_2 \setminus X \subseteq B\). Then \(Z_1 \cap Z_2 \setminus X \subseteq A \cap B\) and \(Z_1 \cap Z_2 \in U\), so \(A \cap B \in U'\).

Now, note that \(I \setminus X \in U'\), since \(I \in U\). Finally, we have \(U \subseteq U'\), since for every \(Z \in U\), \(Z \setminus X \subseteq Z\). But, since \(U\) is maximal, \(U' = U\), so \(I \setminus X \in U\), and our proof is complete.

What makes this proof remarkable is, since the proof is basically just a single application of Zorn’s lemma, it is extremely non-constructive. Given a filter, we can extend it to an ultrafilter, but this ultrafilter is not even remotely unique, and impossible to describe! Despite this, and the fact that the choice of ultrafilter does impact the resulting ultraproduct, the details of the ultrafilter chosen are completely irrelevant most of the time. Usually it just matters that a non-principal ultrafilter exists. As we will see shortly, every non-principal ultrafilter contains the cofinite sets, which is usually all we need.

4.5. Some quick useful lemmas

There are some facts about non-principal ultrafilters that are very easy to state and prove, but will come in handy.

First, if an ultrafilter contains a singleton set, then it must be principal. Suppose \(\{x\} \in U\). Then certainly every set containing \(x\) is in \(U\). Conversely, let \(X \in U\). If \(x \notin X\), then \(x \in I \setminus X\), so \(X\) and \(I \setminus X\) are in \(U\), hence \(X \cap I \setminus X = \emptyset \in U\), a contradiction.

Furthermore, if an ultrafilter contains a finite set, then it must be principal. Suppose \(U\) contains \(\{x_1, \dots, x_n\}\), and suppose \(n\) is the smallest cardinality of \(U\)’s members. Then consider \(\{x_1\}\). Since \(U\) is an ultrafilter, either \(\{x_1\} \in U\) and hence \(U\) is principal by the previous lemma, or \(I \setminus \{x_1\} \in U\). In the latter case, if \(n > 1\), then \(\{x_2, \dots, x_n\} = I \setminus \{x_1\} \cap \{x_1, \dots, x_n\} \in U\) is a smaller finite set contained in \(U\) contradicting the minimality of \(n\).

As a corollary, we can note that every non-principal ultrafilter \(U\) must contain every cofinite set, as otherwise \(U\) would contain a finite set and hence be principal. Conversely, if an ultrafilter \(U\) contains every cofinite set, it must be non-principal, since, in particular, \(I \setminus \{x\} \in U\) for every \(x \in I\), so \(\{x\} \notin U\) and principal ultrafilters contain a singleton set.

Since the Frechét filter always exists and can be extended to a non-principal ultrafilter, as it contains every cofinite set, and all that matters in an ultraproduct is that the ultrafilter contains all the cofinite sets, we don’t really have to worry about the particular choice of ultrafilter at all, even though the resulting object depends on the choice.

5. Ultraproducts

Now let’s put these ultrafilters to good use and construct an ultraproduct! Fix some first order language \(\mathcal{L}\). Let \(I\) be some index set, and for each \(i \in I\), let \(\mathcal{M}_i\) be some \(\mathcal{L}\)-structure with universe \(M_i\). Likewise, let \(U\) be some ultrafilter on \(I\). Our goal will be to construct an \(\mathcal{L}\)-structure \(\mathcal{M} = \prod_{i \in I} \mathcal{M}_i / U\).

We begin by defining \(X = \prod_{i \in I} M_i\). In particular, we are using the general product, so elements of \(X\) are functions \(f\) mapping an index \(i\) into \(M_i\). Next, we will define an equivalence relation \(\sim\) on \(X\) by \(f \sim g\) if \(\{i \in I \mid f(i) = g(i)\} \in U\). We can interpret this as considering \(f\) and \(g\) to be the same if they agree almost everywhere. This is highly analogous to \(L_p\) spaces, where we work modulo differences on measure zero spaces (in fact, the collections of sets whose complement is measure zero forms a filter on the subsets of a measure space). We need to prove that \(\sim\) is an equivalence relation. Fortunately this is not hard. We have reflexivity, since \(\{i \in I \mid f(i) = f(i)\} = I \in U\). Likewise, if \(f \sim g\), so \(\{i \mid f(i) = g(i)\} \in U\), then \(\{i \mid g(i) = f(i)\} = \{i \mid f(i) = g(i)\} \in U\). Finally, suppose \(f \sim g\) and \(g \sim h\), so \(\{i \mid f(i) = g(i)\}\) and \(\{i \mid g(i) = h(i)\}\) are in \(U\). Then \(\{i \mid f(i) = h(i)\} \supseteq \{i \mid f(i) = g(i)\} \cap \{i \mid g(i) = h(i)\} \in U\), so \(f \sim h\).

The universe of \(\mathcal{M}\) will be \(M = X / \sim\). The equivalence class of a function \(f \in X\) is notated by \([f]\). We still have to provide interpretations of the constant, function, and relation symbols in \(\mathcal{M}\). First, for every constant symbol \(c\), we can easily define \(c^{\mathcal{M}} = [i \mapsto c^{\mathcal{M}_i}]\). For every relation symbol \(R\) of arity \(n\), we can say that \(([x_1], \dots, [x_n]) \in R^{\mathcal{M}}\) iff \(\{i \mid (x_1(i), \dots, x_n(i)) \in R^{\mathcal{M}_i}\} \in U\), i.e. we have \(([x_1], \dots, [x_n]) \in R^{\mathcal{M}}\) precisely when “most” \(\mathcal{M}_i\) agree. We do have to show that \(R^{\mathcal{M}}\) is well-defined. Suppose that \([x_1] = [y_1], \dots, [x_n] = [y_n]\), so \(A_1 = \{i \mid x_1(i) = y_1(i)\} \in U\), \(\dots\), \(A_n = \{i \mid x_n(i) = y_n(i)\} \in U\). Then for every \(i \in A_1 \cap \dots \cap A_n\), \(x_1(i) = y_1(i)\), \(\dots\), and \(x_n(i) = y_n(i)\). As such, if \(B = \{i \mid (x_1(i), \dots, x_n(i)) \in R^{\mathcal{M}_i}\}\), then \(\{(i \mid y_1(i), \dots, y_n(i)) \in R^{\mathcal{M}_i}\} \supseteq B \cap A_1 \cap \dots \cap A_n \in U\).

Function symbols can be interpreted by relationalizing them or by directly defining \(f^{\mathcal{M}}([a_1], \dots, [a_n]) = [b]\) if \(\{i \mid f^{\mathcal{M}_i}(a_1(i), \dots, a_n(i)) = b(i)\} \in U\). In the latter case, the choice of \([b]\) is unique, since \(f^{\mathcal{M}_i}\) are all functions. We have to verify that \(f^{\mathcal{M}}\) is well-defined, but the argument is virtually identical to that for relation symbols.

Putting all this together, we have an \(\mathcal{L}\)-structure \(\mathcal{M} = \prod_{i \in I} \mathcal{M}_i / U\) that is, in some sense, an average or limit of the \(\mathcal{M}_i\)’s. The whole point of this is the absolutely remarkable Łoś’s theorem.

6. Łoś’s Theorem

Given some \(\mathcal{L}\)-formula \(\phi\) with free variables among \(v_1, \dots, v_n\), denoted \(\phi(v_1, \dots, v_n)\), and \(([g_1, \dots, g_n]) \in M^{n}\), we have that \(\mathcal{M} \models \phi(v_1, \dots, v_n)\) if and only if \(A = \{i \in I \mid \mathcal{M}_i \models \phi(g_1(i), \dots, g_n(i))\} \in U\).

Loosely summarized, a formula \(\phi\) is true in \(\mathcal{M}\) precisely when it is true in “almost all” of the \(\mathcal{M}_i\)’s. This is an absolutely incredible theorem, and we will see some remarkable consequences of this theorem once we prove it. The reason ultrafilters are defined the way they are is to make this theorem work out. In particular, we needed to use the properties of filters in order to construct \(\mathcal{M}\), but we never used the fact that \(U\) is an ultrafilter. We will need it in this proof. Keep an eye out for where, as it is kind of interesting. This proof, as a whole, is surprisingly easy and routine, but it is enlightening seeing how the properties of ultrafilters and ultraproducts are used.

By induction on the structure of \(\phi\).

If \(\phi \equiv R(v_{j_1}, \dots, v_{j_m})\), then
\begin{align*} \mathcal{M} \models \phi([g_1], \dots, [g_n]) &\iff ([g_{j_1}], \dots, [g_{j_m}]) \in R^{\mathcal{M}} \\ &\iff \{i \mid (g_{j_1}(i), \dots, g_{j_m}(i)) \in R^{\mathcal{M}_{i}}\} \in U \\ &\iff \{i \mid \mathcal{M}_i \models \phi(g_1(i), \dots, g_n(i))\} \in U \end{align*}
If \(\phi \equiv s(v_1, \dots, v_n) = t(v_1, \dots, v_n)\), then
\begin{align*} \mathcal{M} \models \phi([g_1], \dots, [g_n]) &\iff s^{\mathcal{M}}([g_1], \dots, [g_n]) = t^{\mathcal{M}}([g_1], \dots, [g_n]) \\ &\iff \{i \mid s^{\mathcal{M}_i}(g_1(i), \dots, g_n(i)) = t^{\mathcal{M}_i}(g_1(i), \dots, g_n(i)\} \in U \\ &\iff \{i \mid \mathcal{M}_i \models s(v_1, \dots, v_n) = t(v_1, \dots, v_n)\} \in U. \end{align*}
This does technically require the lemma that \(s^{\mathcal{M}}([g_1], \dots, [g_n])(i) = s^{\mathcal{M}_i}(g_1(i), \dots, g_n(i))\), but this follows immediately from the definition of constants and functions and a quick induction.
I’m going to start to get a little sloppy with notation and leave off the free variables/arguments. They are very easy to put back in and don’t substantively change the argument, just massively clutter the notation. Suppose \(\phi \equiv \psi \wedge \theta\). We’ll start with the forward direction. Suppose \(\mathcal{M} \models \phi\). Then \(\mathcal{M} \models \psi\) and \(\mathcal{M} \models \theta\). By the inductive hypothesis, \(A = \{i \mid \mathcal{M}_i \models \psi\} \in U\) and \(B = \{i \mid \mathcal{M}_i \models \theta\} \in U\). Hence \[ \{i \mid \mathcal{M}_i \models \phi\} = \{i \mid \mathcal{M}_i \models \psi \text{ and } \mathcal{M}_i \models \theta\} \supseteq A \cap B \in U. \]

The reverse direction isn’t too bad either. Suppose \(\{i \mid \mathcal{M}_i \models \phi\} \in U\). Then \(\{i \mid \mathcal{M}_i \models \psi\} \supseteq \{i \mid \mathcal{M}_i \models \phi\} \in U\), so by the inductive hypothesis, \(\mathcal{M} \models \psi\). The same logic implies \(\mathcal{M} \models \theta\).
Next, suppose \(\phi \equiv \neg \psi\). Then we have
\begin{align*} \mathcal{M} \models \neg \psi &\iff \mathcal{M} \not\models \psi \\ &\iff \{i \mid \mathcal{M}_i \models \psi\} \notin U \\ &\iff \{i \mid \mathcal{M}_i \not\models \psi\} \in U \\ &\iff \{i \mid \mathcal{M}_i \models \neg \psi\} \in U. \end{align*}
Finally, suppose \(\phi \equiv \exists v\, \psi\). For the forward direction, suppose \(\mathcal{M} \models \exists v\, \psi\). Then there exists a witness \([b]\) such that \(\mathcal{M} \models \psi([b])\), which inductively means that \(\{i \mid \mathcal{M} \models \psi(b_i)\} \in U\). Hence the set of indices where \(\exists v\, \psi\) has a witness in \(\mathcal{M}_i\) contains \(\{i \mid \mathcal{M} \models \psi(b_i)\}\), as \(b_i\) is such a witness, and hence is in \(U\). Conversely, if \(\{i \mid \mathcal{M}_i \models \exists v\, \psi\} \in U\), then set \(b(i)\) to be the witness if it exists in \(\mathcal{M}_i\) and some arbitrary element of \(M_i\) otherwise². Then \(\{i \mid \mathcal{M}_i \models \psi(b_i)\} \supseteq \{i \mid \mathcal{M}_i \models \exists v\, \psi\}\), so by the inductive hypothesis, \(\mathcal{M} \models \psi([b])\) and hence \(\mathcal{M} \models \exists v\, \psi\).

7. Examples

This was a bit of a whirlwind, but we now have ultraproducts, and know what formulas they satisfy. Here’s a super quick proof of the compactness theorem using ultraproducts to show off the power of this construction.

7.1. Compactness

Let \(T\) be a finitely satisfiable theory. Let \(I\) be the set of finite subsets of \(T\). Since \(T\) is finitely satisfiable, for every finite set \(i \in I\), there exists an \(\mathcal{L}\)-structure \(\mathcal{M}_i\) such that \(\mathcal{M}_i \models i\). For each \(\phi \in T\), let \(X_\phi = \{i \in I \mid \phi \in i\}\). The \(X_\phi\) have the finite intersection property (\(\{\phi_1, \dots \phi_n\} \in X_{\phi_1} \cap \dots \cap X_{\phi_n}\)), and thus generate a filter. Extend this filter to an ultrafilter \(U\). Then we have \(\mathcal{M} = \prod_{i\in I} \mathcal{M}_i / U\). I claim \(\mathcal{M} \models T\). Let \(\phi \in T\). Certainly \(\mathcal{M}_i \models \phi\) for every \(i \in X_\phi\), so \(\{i \mid \mathcal{M}_i \models \phi\} \supseteq X_\phi \in U\), so by Łoś’s theorem \(\mathcal{M} \models \phi\).

Isn’t that remarkable? We didn’t have to bother with Henkin constructions or proof systems and Gödel’s completeness theorem. We just glue all the \(\mathcal{L}\)-structures modeling the finite subsets together and average them out and we have a model of \(T\).

7.2. Pseudofinite Fields

Now that we are armed with ultraproducts, let’s look at some really weird constructions. Take \(\mathcal{L}\) to be the language of fields. Then for each prime \(p\), \(\F_p\) is an \(\mathcal{L}\)-structure. Take any ultrafilter \(U\) on the prime numbers. Then we can construct \(K = \prod_p \F_p / U\). This is a very strange \(\mathcal{L}\)-structure. First, since every \(\F_p\) is a field, and that can be expressed in \(\mathcal{L}\), so is \(K\). However, unlike \(\F_p\), \(K\) is characteristic zero. We can see this because for every \(n\), the sentence \(\underbrace{1 + 1 + \dots + 1}_{n \text{ times}} \neq 0\) is true in all but at most one \(\F_p\) and hence true in \(K\) as well. It gets even weirder, however, because \(K\) is uncountable and characteristic zero, but only has a unique extension of every degree, as that is a property expressible in \(\mathcal{L}\) and true of every \(\F_p\).

Things get even weirder too. Pick any elliptic curve with rational coefficients. It is known that the number of solutions in \(\F_p\), \(N_p\), satisfies \(N_p \geq p - 2\sqrt p\), which grows without bound. Hence for any \(n\), all but finitely many fields \(\F_p\) satisfy \(N_p \geq n\). Thus every elliptic curve with rational coefficients has infinitely many solutions in \(K\).

What makes all this so weird is that, purely in terms of first order logic, \(K\) is indistinguishable from a finite field. Everything that is true about all finite fields is true about \(K\). In finite time, you can’t tell that \(K\) isn’t a finite field. For instance, you could keep adding one to try to find its characteristic, but wherever you give up and stop, there are finite fields with larger characteristics. You can check all of its field extensions, and those line up perfectly with what you’d expect from a finite field. You can count solutions to elliptic curves. You could tell that it has a lot of solutions, but you can’t tell in finite time that it has infinitely many, and large finite fields have tons of solutions to elliptic curves as well. But despite this, not only is \(K\) not finite, it’s uncountably infinite and has a lot of bizarre properties that are very different from finite fields, you just can never find out.

This isn’t very important, but it is too absurd and delightful not to share³. There are infinitely many primes \(p\) such that \(\F_p\) has no solution to \(x^2 - 2 = 0\) (follows immediately from quadratic reciprocity and Dirichlet’s theorem). Let \(P\) be the set of these primes. Then if \(U\) is a non-principal ultrafilter containing \(P\), then \(K = \prod_p \F_p / U\) is a pseudofinite field with characteristic zero (hence is an extension of \(\Q\)), where \(x^2 - 2 = 0\) has no root. Thus \(\sqrt 2 \notin K\), and since \(K\) is an extension of \(\Q\), \(\sqrt 2\) must be irrational.

7.3. Cyclic Groups

Here’s a weird example. We can glue almost anything with an index together using ultraproducts, so how about \(G = \prod C_n / U\), where \(C_n\) is the cyclic group of order \(n\)? It turns out, the resulting group varies wildly depending on the ultrafilter. No matter what, it will be Abelian, since every cyclic group is Abelian. Furthermore, no matter what it will definitely have elements of infinite order. For every \(k\), all but finitely many of the \(C_n\)’s have an element with order greater than \(k\), so this product will have an element of infinite order (in fact, infinitely many elements of infinite order). The question gets weirder if we ask about torsion elements, since it’s impossible to state that an element has finite order in general in first order logic, but we can ask about specific orders.

So, does \(G\) have an element of order 2? \(G\) having an element of order 2 means \(G \models \exists x\, x\cdot x = e\), which is true if \(\{n \mid C_n \models \exists x\, x \cdot x = e\} \in U\). But \(C_n\) has an element of order 2 if and only if \(n\) is even, so \(G\) has an element of order 2 if and only if the set of even numbers is in \(U\)! The set of even numbers isn’t cofinite, so it depends on the choice of non-principal ultrafilter \(U\). Similar analysis holds for every other order.

We can even choose precisely which finite orders we want. For any natural number \(n\), let \(X_n = n \N\) and \(Y_n = \N \setminus X_n\). The \(X_n\)’s have the finite intersection property, in fact they are closed under finite intersection, so they generate a non-principal ultrafilter \(U_1\). Likewise, the \(Y_n\)’s have the finite intersection property, so they generate a non-principal ultrafilter \(U_2\).

First, let’s look at \(G_1 = \prod C_n / U_1\). The set of even integers is just \(X_2\), so \(G_1\) has an element of order 2, in fact exactly one (since every group with even order has \(\phi(2) = 1\) elements of order 2). Likewise, the set of integers divisible by 3 is \(X_3\), so \(G_1\) also has two elements of order 3. Carrying this logic forward, \(G_1\) has elements of every finite order! More specifically, for every \(n \in \N\), \(G_1\) has \(\phi(n)\) elements of order \(n\). It also has infinitely many elements of infinite order, too! It’s really hard to say much more beyond this about \(G_1\), unfortunately. It is an enormously complicated, uncountably infinite group.

Now let’s consider \(G_2 = \prod C_n / U_2\). This group is much easier to talk about, though it is still quite weird. Unlike \(G_1\) which has elements of every finite order, \(G_2\) only has elements with infinite order, and no elements of any finite order. Things get weirder. Divisibility of groups is not expressible in first order logic, but we can say something about the divisibility of \(G_2\). In order for \(G_2\) to be divisible, we’d want for every \(k\), \[ G_2 \models \forall x\, \exists y\, \underbrace{y \cdot y \cdot \dots \cdot y}_{k\text{ times}} = x. \] Switching gears to \(C_n\), we can say that \(C_n \models \forall x\, \exists y\, y^{k} = x\) if and only if \(k\) and \(n\) are relatively prime. So for a fixed \(k\), we want to see if the set \(Z_k = \{n \mid \gcd(n, k) = 1\} \in U\). Suppose the prime factors of \(k\) are \(p_1\), \(p_2\), \(\dots\), \(p_r\). Then \(Z_k = Y_{p_1} \cap \dots \cap Y_{p_r}\) (\(n\) and \(k\) are relatively prime if and only if \(n\) is not a multiple of any of the prime factors of \(k\)). But then this means that \(C_n\) is a divisible, torsion free, Abelian group! So by the classification of divisible Abelian groups, \(G_2 \cong \Q^{(\mathfrak{c})}\).

8. Ultrapowers

We can also take products of multiple copies of the same structure. This is known as an ultrapower. We will use an ultrapower to construct the hyperreals and develop nonstandard analysis.

Let \(\mathcal{M}\) be an \(\mathcal{L}\)-structure, and \(U\) an ultrafilter on some set \(I\). Then \(\prod_{i \in I} \mathcal{M} / U\) is an ultrapower and is denoted \(\mathcal{M}^{U}\).

Elements of \(\mathcal{M}^{U}\) are equivalence classes of sequences of elements of \(\mathcal{M}\). Unlike with ultraproducts, which can be quite different from their factors, \(\mathcal{M}\) embeds elementarily via the canonical embedding \(d : M \to M^{U}\), where \(x\) maps to the equivalence class of its constant function.

The canonical embedding is an elementary embedding.

Certainly \(d\) is an injection. Suppose \(\mathcal{M} \models \phi(g_1, \dots, g_n)\). Then \(\{i | \mathcal{M} \models \phi(d(g_1)(i), \dots, d(g_n)(i))\} = I\), so by Łoś’s theorem, \(\mathcal{M}^{U} \models \phi(d(g_1), \dots, d(g_n))\). All this logic is reversible, so we have the converse as well.

9. Nonstandard Analysis

Calculus was famously developed non-rigorously using the ideas of infinites and infinitesimals. It turns out, by using a nonstandard model of the real numbers, we can rigorously develop the notion of infinites and infinitesimals and use them to do calculus.

Throughout this section, fix \(\mathcal{L} = \{0, 1, +, -, \cdot, <\}\), the language of ordered fields. Likewise fix the \(\mathcal{L}\)-structure \(\R = \{\R, 0, 1, +, -, \cdot, <\}\) consisting of all the usual operations of the real numbers. Note that \(\R\) has all of its usual properties, even though we cannot state all of them in \(\mathcal{L}\). For instance, stating the least upper bound property requires being able to quantify over subsets of \(\R\), which we cannot do in a first order language like \(\mathcal{L}\).

9.1. Hyperreals

Let \(U\) be some non-principal ultrafilter on \(\omega\). Then \(\R^{*} = \R^{U}\) is known as the set of hyperreal numbers.

A hyperreal number is an equivalence class of sequences of real numbers. The usual real numbers canonically embed in the hyperreals. So, for example, \(d(0) = [(0, 0, 0, 0, \dots)]\). We will abuse notation and often identify real numbers with their image under the canonical embedding, as well as not distinguish, for example, the binary function symbol \(+\), from the function \(+^{\R}\), from the function \(+^{\R^{*}}\), and just use \(+\) for all of them.

By Łoś’s theorem, we immediately get that everything that is true of \(\R\) that is expressible in \(\mathcal{L}\), is also true of \(\R^{*}\). So, for example, the sentence \(\phi = \forall x\, \forall y\, x + y = y + x\) is true in \(\R\), so it is also true in \(\R^{*}\). As such, \(\R^{*}\) is a real ordered field, it has square roots of non-negative elements, every odd degree polynomial has a root, etc.

However, \(\R\) and \(\R^*\) are different. For instance, \(\R\) has the Archimedean Property (for every \(x \in \R\), there exists a natural number \(n\) such that \(n > x\)), while \(\R^{*}\) does not! For instance, consider the hyperreal number \(x = [(0, 1, 2, 3, 4, 5, \dots)]\). Then for any natural number \(n\), all but \(n + 1\) components of \(x\) are greater than \(n\), meaning \(x > n\). In fact, \(x\) is greater than any real number, making it an infinite number.

A hyperreal number \(x\) is called infinite if \(|x| > r\) for every real number \(r\)⁴. It is called infinitesimal if \(|x| < r\) for every positive real \(r\). Note that \(0\) is considered infinitesimal.

We already saw that \([(0, 1, 2, 3, 4, 5, \dots)]\) is infinite. Infinitesimals also exist. An example would be \(x = [(1, 1 / 2, 1 / 3, 1 / 4, \dots)]\).

There are some weird hyperreals. Consider \(x = [(1, -1, 1, -1, 1, -1, \dots)]\). Is \(0 < x\)? That depends on the choice of ultrafilter. Fortunately, as we will soon see, every hyperreal is equivalent to a more sane number. It is best not to worry about the sequence representation of hyperreals. There are much more convenient and sensible ways of working with hyperreals purely algebraically.

Let \(x, y\) be hyperreals. We say \(x \approx y\) if \(x - y\) is infinitesimal.

We will show that \(\approx\) is an equivalence relation, but it will help to have some algebraic tools first.

Let \(x\) be hyperreal. The monad of \(x\) is defined to be \[ \monad(x) = \{y \in \R^{*} \mid x \approx y\}. \] The galaxy of \(x\) is defined to be \[ \galaxy(x) = \{y \in \R^{*} \mid x - y \text{ is finite}\}. \]

It turns out that \(\galaxy(0)\) of finite elements is a subring of \(\R^{*}\) with \(\monad(0)\) of infinitesimals being another subring of \(\R^{*}\) and an ideal of \(\galaxy(x)\).

The set \(\galaxy(0)\) is a subring of \(\R^{*}\).

Let \(x, y\) be finite. Then there exist real numbers \(r, s\) such that \(|x| < r\) and \(|y| < s\). Then \(|x + y| \leq |x| + |y| < r + s\) so \(|x + y|\) is finite. Recall that, by Łoś’s theorem, all the usual facts about absolute value hold in the hyperreal numbers as well. Similarly, \(|xy| = |x| |y| < rs\) so \(xy\) is finite. Finally, \(|-x| = |x| < r\). Thus \(\galaxy(0)\) is closed under addition, negation, and multiplication as desired.

\(x - y\) being finite is an equivalence relation.

\(x - x\) = 0 is finite.
\(|x - y| = |y - x|\) so \(x - y\) is finite iff \(y - x\) is.
Suppose \(x - y\) and \(y - z\) are finite. Then \(x - z = (x - y) + (y - z)\) is the sum of two finite numbers and hence is finite.

\(\approx\) is an equivalence relation.

\(x - x = 0\) is infinitesimal.
\(|x - y| = |y - x|\), so \(x - y\) is infinitesimal iff \(y - x\) is.
Let \(r\) be a positive real number and suppose \(x \approx y\) and \(y \approx z\). We want to show that \(|z - x| < r\). Since \(x \approx y\), \(|y - x| < r / 2\). Likewise, \(|z - y| < r / 2\). Thus \(|z - x| = |(z - y) + (y - x)| \leq |z - y| + |y - x| < r\).

Then \(\monad(x)\) is the equivalence class of \(x\) modulo \(\approx\).

The set \(\monad(0)\) of infinitesimal hyperreals is a subring of \(\R^{*}\) and, furthermore, an ideal in \(\galaxy(0)\), that is to say sums, products, and differences of infinitesimals are infinitesimal, and the product of an infinitesimal and a finite number is infinitesimal.

Next, let \(a\) be finite and let \(r\) be any positive real number. Then \(|\epsilon| < r / |a|\), so \(|\epsilon a| = |\epsilon| |a|< r\).

Is it starting to seem nice yet? These infinitesimals play a very similar role to the usual \(\epsilon\)’s and \(\delta\)’s in analysis, but are smaller than every positive real number, so you don’t need to mess around with constantly updating and processing their bounds. In particular, the product of an infinitesimal \(\epsilon\) and a finite number \(a\) is infinitesimal, so if you want \(\epsilon a\) to be small, you don’t need to update your bounds on \(\epsilon\), you just get it to be small for free.

We still have those weird hyperreals that we saw earlier that are hard to characterize. Fortunately, it turns out that every finite hyperreal can be uniquely written as \(r + \epsilon\) for some real number \(r\) and infinitesimal \(\epsilon\).

Every finite hyperreal \(x\) is infinitely close to a unique real number.

First uniqueness. Suppose \(x \approx r\) and \(x \approx s\) for real numbers \(r\) and \(s\). Then \(r \approx s\), so \(r - s \approx 0\). But, since \(r - s\) is real, \(r - s = 0\).

Now for existence. Let \(X = \{s \in \R \mid s < x\}\). Since \(x\) is finite, there exists a real number \(r\) such that \(|x| < r\), so \(r\) is an upper bound for \(X\). Hence \(X\) has a least upper bound, say \(t\)⁵.

Then for any real \(\epsilon > 0\), \(t + \epsilon \notin X\), so \(t + \epsilon \geq x\), i.e. \(x - t \leq \epsilon\). Likewise, \(t - \epsilon \in X\), so \(t - \epsilon \leq x\), so \(-(x - t) \leq -\epsilon\). Putting these together, we have \(|x - t| < \epsilon\). This was true for every real \(\epsilon\), so \(x \approx t\) as desired.

Let \(x \in \R^{*}\). We call the unique \(r \in \R\) such that \(x \approx r\) the standard part, and denote it \(\st(x)\).

From here we can really get going. Here’s a list of some easy to prove facts about \(\st\). I encourage the reader to think these through, as they are very easy to prove, and it is very fun to explore these ideas.

\(x \approx y\) iff \(\st(x) = \st(y)\).
\(x \approx \st(x)\).
If \(r \in \R\), \(\st(r) = r\).
If \(x \leq y\), then \(\st(x) \leq \st(y)\).
If \(x = r + \epsilon\) and \(y = s + \delta\), then \(\st(x + y) = r + s\) and \(\st(x - y) = r - s\).
Likewise, \(\st(x \cdot y) = r \cdot s\)⁶.

Now that we are familiar with the algebra of infinitesimals and the hyperreals, we can use them to develop calculus.

9.2. Calculus

We won’t take this too far, though I strongly encourage you to play around with these ideas if you are interested and translate as much elementary analysis as you can into non-standard analysis, as it is very fun to do.

The fundamental idea of the derivative is considering how a function \(f : \R \to \R\) changes if its input is changed by an infinitesimal amount. We can very easily formulate this idea using infinitesimals.

A real number \(r\) is called the derivative of \(f : \R \to \R\) at \(x \in \R\) if for every nonzero infinitesimal \(\delta x\) we have \[ r = \st\left( \frac{f(x + \delta x) - f(x)}{\delta x} \right). \]

Note that this requires evaluating a real function on hyperreal inputs. For any function definable in \(\mathcal{L}\), that is to say all the interesting ones, we get its extension into \(\R^{*}\) for free by Łoś’s theorem.

We are very quickly leaving the realm of logic and model theory and entering the domain of nonstandard analysis, so, to quote Michael Penn, this is a good place to stop.

Footnotes:

One interesting consequence of taking this infinite general product is that ultraproducts are really big. The smallest nontrivial ultraproducts, nontrivial ultraproducts of countably many finite models, has cardinality continuum.

Casual usage of the Axiom of Choice here. This use is probably much less problematic than most, which aren’t even that problematic, as it is only actually needed in fairly extreme cases. For example, if there’s ever a constant symbol in \(\mathcal{L}\), say \(c\), then you can use \(c^{\mathcal{M}_i}\) as the default target. Even if there aren’t any constant symbols, like in set theory, the theory the structures model can guarantee the existence of certain elements, like the empty set. Even if the theory doesn’t guarantee the existence of any single element, like the theory of linear orders for example, you still may not need the Axiom of Choice if you know of at least one element in each structure. For instance, you might construct various linear orders as subsets of the real numbers and happen to know that every structure contains \(0\), even if the existence isn’t guaranteed by the theory under study.

Credit: this math stackexchange answer.

⁴

Note that \(|\cdot|\) is definable. In particular, it is true in \(\R\) that for every \(x\), there exists a unique \(y\) such that \(x \geq 0 \wedge x = y \vee x < 0 \wedge x = -y\). Therefore we can use the absolute value function without worry.

⁵

Using the least upper bound property means this only applies to the real numbers, rather than any \(\mathcal{L}\)-structure, or even any \(\mathcal{L}\)-structure modeling the theory of ordered real linear fields.

⁶

In proving this theorem, you should use the fact that the product of an infinitesimal and a real number is infinitesimal. Isn’t it nice not having to worry about massaging all the \(\epsilon\)’s?