The Logical Implication

Languages and models of a first-order logic.

Mort Yao


Prologue. This is the summary of a set of notes I took in KU’s Introduction to Mathematical Logic course, that I should have finished months ago but somehow procrastinated until recently. I basically followed Enderton’s A Mathematical Introduction to Logic book (with all its conventions of mathematical notations). A few issues to address in advance:

Remark 7.1. (Language of propositional logic) The language of propositional logic may be seen as a stripped form of first-order languages, in which parentheses, connective symbols and sentential symbols (the only parameters; may be treated as 0-place predicate symbols) are present. Intuitively, that language might seem too weak to encode our formal reasoning in all kinds of mathematics and many practical areas, so to speak.

Remark 7.2. (Metatheory and philosophical concerns) A first-order expression, as a finite sequence (also called a tuple), may be defined in terms of ordered pairs in axiomatic set theory. But we will not appeal to set theory in our first definitions of expressions in logic. (So far we have no notion about what a “set” formally is!)

A further concern is whether our definitions of terms and formulas are well established, that is, since we are defining the notions of terms and formulas inductively, would it be possible that there is a certain term or formula that is covered by our recursive definition, but can never be actually built using these operations? To show that first-order terms/formulas are well-defined, a beginner might try to prove these induction principles by mathematical induction on the complexities of terms/formulas, but that would rely on the fact that the set of natural numbers \(\omega\) is well-ordered so that we can apply induction on numbers; to justify things like this, it is essential to use set theory or second-order logic, which we don’t even have until we define a first-order logic. Thus, unavoidable circularity emerges if we try to look really fundamentally.

For now, we must appeal to a metatheory that we can easily convince ourselves by intuition, so that we will accept these induction principles and the notion of “naïve sets” (or collections, if we don’t want to abuse the formal term of sets too much). Notwithstanding, I believe that a prudent person can bootstrap theories like this without drawing in any inconsistency.

Remark 7.3. (Context freedom, unique readability and parentheses) Since the formations of first-order terms and formulas make use of context-free rules, one familiar with formal languages and automata theory might ask, “Are the set of terms/formulas context-free languages?” Generally they are not, since our set \(V\) of variables (samely for predicate and function symbols) could be infinitely (or even uncountably) large, but a context-free grammar requires that every set must be finite. However, in our first-order language \(\mathcal{L}\), if these symbols can be effectively decidable, then there is an algorithm that accepts terms or formulas (or parses them). Furthermore, such parses are guaranteed to be unique, as shown by the Unique Readability Theorems in Enderton p. 105ff. Indeed, the inclusion of parentheses in our first-order language enables us to write any formula unambiguously. If we leave out all the parentheses, does a formula like \(\forall x P x \to \lnot Q x\) mean \((\forall x P x \to (\lnot Q x))\) or \(\forall x (P x \to (\lnot Q x))\)? An alternative syntax would be to use logical connectives in a prefix manner, e.g., \(\to \forall x P x \lnot Q x\) and \(\forall x \to P x \lnot Q x\), but that is hardly as comprehensible as our chosen syntax.

Remark 7.4. (Abbreviations on notation) Why don’t we have the existential quantifier \(\exists\) and some other connectives such like \(\land\), \(\lor\) and \(\leftrightarrow\), in our language? Because any first-order formula that makes use of these symbols can be seen as syntactical abbreviations and should be rewritten using \(\forall\), \(\to\) and \(\lnot\), as will be shown. A deeper reason is that \(\{ \to, \lnot \}\) is a functionally complete set of Boolean algebraic operators that is sufficient to express all possible truth tables in propositional logic. On the other hand, a formula like \(\exists x \varphi\) is just \((\lnot \forall x (\lnot \varphi))\), following from our understanding of what an existential quantification is.

Remark 7.5. (Sentences and truth values) In propositional logic, we don’t generally know whether a formula evaluates to true until every sentential symbol is assigned a truth value. (Sometimes we can tell the truth value with a little less information than what is required, if we apply a so-called short-circuit evaluation strategy, e.g., if \(A_1\) is false then we immediately know \((A_1 \to A_2)\) is true, or if \(A_2\) is true then \((A_1 \to A_2)\) is also true. But it is not the general case, and one should expect to evaluate both \(A_1\) and \(A_2\) before getting the answer.) Similarly, in a first-order logic, every free variable needs to have a definite assignment so as to give rise to the truth value of a formula. This is done by specifying a function \(s\) (where \(\operatorname{dom} s = V\) is the set of all variables) as the assignment of variables, and when applying \(s\) to a formula \(\varphi\) we get \(\varphi[s]\), which is a sentence that has a definite meaning (i.e., no variable occurs free). Note that the assignment of variables alone is not sufficient to determine the truth of a sentence – For example, \((P x y \to P x f y)\ [s(x \,|\, 0)(y \,|\, 0)]\) is a sentence since no variable occurs free in it, but we can’t decide whether it is true because we don’t know what the predicate \(P\) and the function \(f\) are. If we say, \(P\) is the arithmetic “less-than” relation and \(f\) is the successor function \(f : x \mapsto x + 1\), then we can tell that this is a true sentence (in fact \(P x y\) is false as \(0 < 0\) is false, but \(P x f y\) is true as \(0 < 1\) is true, so the overall sentence as a conditional is true). We could write \(P\) and \(f\) as \(<\) and \(S\), but the conventional interpretation of these symbols should not be taken for granted as if every symbol comes with an inherited meaning – They don’t, until we give them meanings.

Remark 7.6. (Dichotomy of semantic truthness and the liar’s paradox) It should be made clear from the definition that given a structure and an assignment, either \(\models_\mathfrak{A} \varphi[s]\) (exclusive) or \(\not\models_\mathfrak{A} \varphi[s]\), but not both! It follows from our intuition that a statement is either semantically true or false; and there is no third possibility.

A problem arises with self-referential terms, woefully: Assume that we have a first-order language \(\mathcal{L}\) with a 1-place predicate symbol \(P\), and the structure \(\mathfrak{A}\) assigns it the domain \(|\mathfrak{A}| = \text{Formula}(\mathcal{L})\), \(P\) is interpreted as \(P^\mathfrak{A} = \{ \langle \sigma \rangle \,:\, \models \sigma \}\), that is, \(\sigma \in P^\mathfrak{A}\) iff \(\models \sigma\). Let the sentence \(\tau\) be \((\lnot P x)\) and the assignment \(s : V \to |\mathfrak{A}|\) maps the variable \(x\) to the sentence \(\tau\), then is \(\tau[s]\) true or false in \(\mathfrak{A}\)? If we take \(\tau[s]\) as true, that is, \((\lnot P \tau)\) is true, then \(P \tau\) must be false, so \(\tau \not\in P^\mathfrak{A}\) thus \(\not\models \tau\). If we take \(\tau[s]\) as false, that is, \((\lnot P \tau)\) is false, then \(P \tau\) must be true, so \(\tau \in P^\mathfrak{A}\) thus \(\models \tau\). This is known as the classical liar’s paradox. One possible way to resolve this (given by Alfred Tarski) is by disabling impredicativity in our structures; more precisely, one can define a semantic hierarchy of structures that allows us to predicate truth only of a formula at a lower level, but never at the same or a higher level. This matter is far beyond the scope of this summary, but the important lesson to learn here is that it is generally a bad idea to allow something both true and false in our semantics; it would put our enduring effort to cumulate all “mathematical truths” into void.

Remark 7.7. (Decidability of truth/validity) In propositional logic, it is easy to see that given a truth assignment of sentential symbols, every formula can be decided for its truth or falsehood. Moreover, even without any truth assignment, one can enumerate a truth table to find out whether a given formula is a tautology. Truth and validity are decidable in propositional logic. However, this is often not the case in first-order logic: In order to decide whether a sentence is true, one needs to find the truth values of all prime formulas (i.e., formulas like \(P t_1 \cdots t_n\) and \(\forall v_i \psi\)) first, but the domain \(|\mathfrak{A}|\) may be an (uncountably) infinite set, thus makes it impossible to mechanically check the universal quantification for all members; moreover, the functions used in building terms may not be Turing-computable at all. To decide the validity of a sentence, we have to check its truth in all structures of the language (whose set may also be uncountably large), and that is an even more impossible task.2

If semantic truth/validity is generally undecidable, how do we say that some formula is true in a predefined structure? Well, we can’t, in most general cases, since an infinite argument of truth is a useless argument (you can’t present it to someone / some Turing machine, as no physical device is capable of handling such an infinite object). Fortunately, there is a feasible way to say something is true, without appealing to any specific structures (that may give rise to unwanted undecidability), and that is called a formal deduction (also called a proof, expectedly).

Remark 7.8. (Validity of logical axioms) It should be intuitively clear that all logical axioms are convincing, and that their validity can be argued without appealing to any specific model. In particular, for an axiom \(\theta \in \Lambda\), there is \(\vdash \theta\); we must be able to argue (in our meta-language) that \(\models \theta\), so that we can be convinced that our deductive system is a sound one. Remember that for any formula \(\varphi\), either \(\models \varphi\) or \(\not\models \varphi\) (which is just \(\models (\lnot \varphi)\)). If a proof of \(\theta\) (not as a formal deduction, but as an argument in our meta-language) does not even imply \(\models \theta\), that would be very frustrating.

Remark 7.9. (Tautological implication, logical implication and deduction) If \(\Gamma \models_t \varphi\) (i.e., \(\varphi\) is tautologically implied by \(\Gamma\) in propositional logic), we can argue that \(\Gamma \models \varphi\) when replacing sentential symbols by prime formulas in first-order logic. In the special case that \(\Gamma = \emptyset\), we are proving the validity of Axiom Group 1: \(\models_t \alpha \implies \models \alpha\) (every tautology is valid). The converse does not hold though, since we have \(\models (\alpha \to \forall x \alpha)\) (by Axiom Group 4), but \(\not\models_t (\alpha \to \forall x \alpha)\) as \(\alpha\) and \(\forall x \alpha\) are two different sentential symbols (surely \((A_1 \to A_2)\) is not a tautology in propositional logic!).

It is worth noticing that even though \(\Gamma \models \varphi \not\implies \Gamma \models_t \varphi\), we do have \(\Gamma \models \varphi \iff \Gamma \cup \Lambda \models_t \varphi\). Intuitively, the set \(\Lambda\) of logical axioms gives us a chance to establish truths about quantifiers and equalities (other than treating these prime formulas as sentential symbols that are too unrefined for our first-order logic). I haven’t done a proof of this, but I believe it should be non-trivial on both directions. Combining with Theorem 24B in Enderton p. 115, we get the nice result concluding that \[\Gamma \vdash \varphi \iff \Gamma \cup \Lambda \models_t \varphi \iff \Gamma \models \varphi\] which entails both the soundness and the completeness theorems. It is basically saying that these three things are equivalent:

  1. \(\Gamma\) proves \(\varphi\). (There is a formal deduction that derives \(\varphi\) from \(\Gamma\) and axioms \(\Lambda\) in our deductive system; it is finite and purely syntactical, in the sense that it does not depend on any structure or assignment of variables.)
  2. \(\Gamma\) logically implies \(\varphi\). (For every structure and every assignment of variables, \(\varphi\) holds true given \(\Gamma\). Of course, this is a semantical notion in the sense that it does involve structures and assignments of variables, which are infinite in numbers so it would be impossible for one to check this mechanically.)
  3. \(\Gamma \cup \Lambda\) tautologically implies \(\varphi\). (We can reduce a first-order logic to propositional logic by adding logical axioms to the set of hypotheses, preserving all truths. For each prime formula, this is still a semantical notion for its truth value depends on structures / assignments of variables.)

Remark 7.10. (Soundness and completeness) The soundness and the completeness theorems together evidence the equivalence of consistency and satisfiability of a set of formulas, or that of validity and provability of a formula. The completeness theorem is by no means an obvious result; the first proof was given by Kurt Gödel in 19303, but the proof that we use today is given by Leon Henkin in 1949 [1], which easily generalizes to uncountable languages.

Remark 7.11. (Completeness and incompleteness) Note that the completeness theorem should not be confused with Gödel’s incompleteness theorems. Specifically, the completeness theorem claims that (unconditionally) every formula that is logically implied by \(\Gamma\) is also deducible from \(\Gamma\) (i.e., \(\Gamma \models \varphi \implies \Gamma \vdash \varphi\)), while the first incompleteness theorem claims that (under some conditions) some consistent deductive systems are incomplete (i.e., there is some formula \(\varphi\) such that neither \(\Gamma \vdash \varphi\) nor \(\Gamma \vdash (\lnot\varphi)\)). As is clearly seen, the incompleteness theorem is purely syntactical and matters for provability (or decidability, one might say). The aforementioned liar’s paradox, where our semantics raises a contradiction that neither \(\Gamma \models \varphi\) nor \(\Gamma \models (\lnot\varphi)\) reasonably holds, may be seen as a semantical analogue of the first incompleteness theorem.

Equality is logical

The equality symbol \(=\) is a logical symbol, in the sense that the equivalence relation it represents is not open to interpretation and always means what it’s intended to mean (i.e., “the LHS is equal to the RHS”). But if so, how do we say \[1+1=2\] is a true sentence then? Can’t we just interpret the equality symbol as something else in a structure \(\mathfrak{A}\) such that \(\models_\mathfrak{A} (\lnot 1+1=2)\)?

One reason is that in many first-order systems, functions (operations) are defined as axioms using equalities; we need a general way to say “something is defined as…” or just “something is…” There would be no better way of saying this rather than representing it as an equality, so we won’t have the hassle of interpreting a made-up relation in every model. Consider the language of elementary number theory, in the intended model \(\mathfrak{N} = (\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)\) of Peano arithmetic, the addition function is defined as a set of domain-specific axioms: \[\begin{align*} a + \mathbf{0} &= a &\qquad(1) \\ a + \mathbf{S} b &= \mathbf{S} (a + b) &\qquad(2) \end{align*}\] By Axiom Group 6 we have \(\mathbf{S0} + \mathbf{0} = \mathbf{S0} \to (\mathbf{S0} + \mathbf{S0} = \mathbf{S}(\mathbf{S0} + \mathbf{0}) \to \mathbf{S0} + \mathbf{S0} = \mathbf{S}\mathbf{S0})\). By (1) \(\mathbf{S0} + \mathbf{0} = \mathbf{S0}\). By (2) \(\mathbf{S0} + \mathbf{S0} = \mathbf{S}(\mathbf{S0} + \mathbf{0})\). Applying modus ponens twice, we get \(\mathbf{S0} + \mathbf{S0} = \mathbf{S}\mathbf{S0}\), which is the result we want (sloppily written as \(1+1=2\)).

The equality is a little special, since it is the most common relation with reflexivity, as justified by Axiom Group 5, i.e., \(x = x\) for any variable \(x\). We could exclude these from our logical axioms, but in many cases we would still need a reflexive relation symbol to denote equivalence (justified by some domain-specific axioms) otherwise. Technically, it would be convenient to just treat it as a logical symbol (together with the useful Axiom Groups 5 and 6). Note that though our logical axioms did not say anything about other properties like symmetry, antisymmetry and transitivity, they follow easily from Axiom Groups 5 and 6, in our deductive system:

Lemma 7.12. (Symmetry) If \(x = y\), then \(y = x\).

Proof. Given \(x = y\). By Axiom Group 5 we have \(x = x\). By Axiom Group 6 we have \(x = y \to (x = x \to y = x)\). Applying modus ponens twice, we get \(y = x\).

Lemma 7.13. (Transitivity) If \(x = y\) and \(y = z\), then \(x = z\).

Proof. Given \(x = y\), by symmetry it holds \(y = x\). Also \(y = z\). By Axiom Group 6 we have \(y = x \to (y = z \to x = z)\). Applying modus ponens twice, we get \(x = z\).

Lemma 7.14. (Antisymmetry) If \(x = y\) and \(y = x\), then \(x = y\).

Proof. Trivial.

Note that if any partial order is definable on a structure, the equality symbol may not be indispensable in our language, e.g., consider the language of set theory, where the equivalence of two sets \(x = y\) may be defined as \[(\forall v (v \in x \to v \in y) \land \forall v (v \in y \to v \in x))\]

The limitation of first-order logic

Consider again the language of elementary number theory, in the intended model \(\mathfrak{N} = (\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)\) of Peano arithmetic, we have the set of all true sentences \(\text{Th}\mathfrak{N}\).4 Now we add a new constant symbol \(c'\) to our language, and extend our theory with a countable set of sentences \(\text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}\) (Here we define \(x < y\) as \(\lnot\forall z ((\lnot z = \mathbf{0}) \to (\lnot x + z = y))\)). Is there still a model for this extended theory?

For any given \(n \in \mathbb{N}\), the sentence \(\underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c'\) is clearly satisfiable (by simply taking \(c' = \mathbf{S} n\)). Then it is easily shown that every finite subset \(\Gamma_0 \subseteq \text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}\) is satisfiable. By the compactness theorem (b.), \(\text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}\) is also satisfiable. This means that we can construct a non-standard model of arithmetic \(\mathfrak{N}'\) with an additional bizarre element (specifically \(c'_0\)) that turns out to be greater than any other number (thus the model of this theory is not isomorphic to our standard model \(\mathfrak{N}\)).

Recall that in a Peano arithmetic modeled by \(\mathfrak{N}'\), numbers are closed under the successor function \(\mathbf{S}\). More precisely, if \(k \in |\mathfrak{N}'|\) , then \(\mathbf{S}k \in |\mathfrak{N}'|\) and \(\mathbf{S}k \neq \mathbf{0}\). This implies that not only \(c'_0 \in |\mathfrak{N}'|\), but also \(\mathbf{S} c'_0, \mathbf{S}\mathbf{S} c'_0, \mathbf{S}\mathbf{S}\mathbf{S} c'_0, \dots\) are all non-standard numbers in \(|\mathfrak{N}'|\). As none of these numbers is equal to \(\mathbf{0}\) (by one of Peano axioms), they form an infinite “chain” separately besides our familiar standard ones. One can write down all the (standard and non-standard) numbers sloppily as the following sequence: \[\langle 0, 1, 2, \dots, \quad c'_0, c'_1, c'_2, \dots \rangle\] where \(0\) is just \(\mathbf{0}\), \(1\) is \(\mathbf{S0}\), \(2\) is \(\mathbf{SS0}\), \(c'_1\) is \(\mathbf{S} c'_0\), \(c'_2\) is \(\mathbf{SS} c'_0\), etc.

Clearly, every number but \(0\) and \(c'_0\) in the sequence has a unique predecessor. There is certainly no such a predecessor \(j\) of \(0\), because otherwise we would have \(\mathbf{S}j = \mathbf{0}\), contradicting our axioms. But can we have a predecessor of \(c'_0\)? There is no axiom preventing us from constructing that thing! So here we go, enlarge our collection of numbers to: \[\langle 0, 1, 2, \dots, \quad \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots \rangle\] where for each \(c'_{i}\), \(c'_{i+1} = \mathbf{S} c'_{i}\). Again, we know that every such non-standard numbers \(c'_i\) is greater than any standard number \(n\) (otherwise we can find a standard number \(n-i\) such that \((\lnot n-i < c'_0)\), contradicting our initial construction of \(c'_0\) by compactness. So the non-standard part is still a separate chain, thus as written above.

We can go even further. Let \(|\mathfrak{N}'|\) be this set of standard and non-standard numbers, and \(\mathfrak{N}' = (|\mathfrak{N}'|; \mathbf{0}, \mathbf{S}, +, \cdot)\) is still the intended model of Peano arithmetic on \(|\mathfrak{N}'|\). Consider adding yet another constant symbol \(c''\). Is \(\text{Th}\mathfrak{N}' \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n'\ \text{times}} \mathbf{0} < c'' \,:\, n' \in |\mathfrak{N}'| \}\) satisfiable? By the same reasoning as before, we get a model \(\mathfrak{N}''\), with its domain of numbers \[\langle 0, 1, 2, \dots, \quad \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots, \quad \dots, c''_{-2}, c''_{-1}, c''_0, c''_1, c''_2, \dots \rangle\]

Counter-intuitively, this is not the kind of “arithmetic” we are used to. What we are trying to do is to formulate the arithmetic for standard natural numbers that we use everyday (i.e., \(0, 1, 2, \dots\)) in first-order logic, but these non-standard numbers come out of nowhere and there is an infinite “stage” of them, such that each number in a latter stage is greater than every number in a former stage (How is that even possible?). And what is worse, in a subset of the non-standard model \(\mathfrak{N}'\): \[\{ \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots \}\] There is no minimal element with respect to the intended ordering relation \(<\), thus it is not well-ordered by \(<\), so our good old mathematical induction will no longer work with non-standard numbers.

Well, the lucky part is that we can at least have the induction axiom as a first-order sentence: \[\varphi(\mathbf{0}, y_1, \dots, y_k) \land \forall x (\varphi (x, y_1, \dots, y_k) \to \varphi (\mathbf{S}(x), y_1, \dots, y_k)) \to \forall x \varphi(x, y_1, \dots, y_k))\] Since the standard model \(\mathfrak{N}\) and the non-standard model \(\mathfrak{N}'\) are elementarily equivalent (i.e., they satisfy the same sentences in a language excluding non-standard numbers), we still enjoy the nice induction principle for all of standard natural numbers. But for the non-standard part, we’re out of luck.

Ideally, we would like to have a bunch of axioms that perfectly defines the model of arithmetic, where no non-standard part is allowed to exist, i.e., the set of numbers is well-ordered by a definable ordering relation \(<\) so that we can apply the induction principle on all of them.5 Unfortunately, this is infeasible in first-order logic (without formulating the notion of sets). We will demonstrate two potential resolutions.

The intuition here is that in order to rule out any non-standard chains of numbers, we must find a 1-place predicate \(P \subseteq |\mathfrak{N}|\) such that for every standard number \(n\) we have \(P n\), distinguishing it from \((\lnot P n')\) where \(n'\) is non-standard. Certainly \(\mathbf{0}\) is a standard number; consequently every standard number \(x\) is followed by \(\mathbf{S}x\), which is also a standard one. \[P \mathbf{0} \land \forall x (P x \to P \mathbf{S} x)\] Once we have this \(P\) in mind, we say that every number \(n \in P\), so it is also standard. There would be no other numbers in our model, thus define our set of all numbers: \[\forall n P n\] Notice that \(P\) is not in our language; it is something we have yet to construct for our standard model. How to deal with this issue?

  1. The first approach is by allowing quantification over relations. So we can say “for all such relations \(P\), it holds that \(\forall n P n\)”. Formally: \[\forall P (P \mathbf{0} \land \forall x (P x \to P \mathbf{S} x)) \to \forall n P n\] Of course, in our previous definition of first-order languages, for \(\forall v_i \psi\) to be a well-formed formula, \(v_i\) is a variable such that \(v_i \in |\mathfrak{N}|\) given a model \(\mathfrak{N}\); here we have \(P \subseteq |\mathfrak{N}|\) hence \(P \in \mathcal{P}(|\mathfrak{N}|)\). So in a first-order logic we would not be able to do this (we can only quantify a variable in the domain \(|\mathfrak{N}|\)). This approach leads to a second-order logic (where we can not only quantify over a plain variable in \(|\mathfrak{N}|\), but also quantify over a relation variable in the power set of the domain, i.e., \(\mathcal{P}(|\mathfrak{N}|)\); that gives our logic more expressive power!).
  2. As we see, a relation is essentially a subset of \(|\mathfrak{N}|\) (thus its range is also a set); it is tempting to formulate Peano arithmetic using the notion of sets. Indeed, we can rewrite the formula in Approach 1 into the language of set theory as: \[\forall y (\emptyset \in y \land \forall x (x \in y \to S(x) \in y)) \to \forall n\ n \in y\] where we encode the standard number \(\mathbf{0}\) as \(\emptyset\), \(\mathbf{S}x\) as \(S(x) = x \cup \{x\}\). Clearly there is no non-standard number in this set-theoretic model. This is exactly how we define natural numbers \(\mathbb{N}\) (as a minimal inductive set \(\omega\)) in set theory, and its existence is justified by the so-called axiom of infinity. Note that once we introduce the set theory (using a first-order language), we have the equivalently expressive (sometimes paradoxical) power of any arbitrary higher-order logic. Fundamentally.6

References and further reading


Herbert B. Enderton, A Mathematical Introduction to Logic, 2nd ed.

Kenneth Kunen, The Foundations of Mathematics.


Terence Tao, “The completeness and compactness theorems of first-order logic,”

Asger Törnquist, “The completeness theorem: a guided tour,”

Eliezer Yudkowsky, “Godel’s completeness and incompleteness theorems,”

Eliezer Yudkowsky, “Standard and nonstandard numbers,”

David A. Ross, “Fun with nonstandard models,”


[1] L. Henkin, “The completeness of the first-order functional calculus,” The journal of symbolic logic, vol. 14, no. 3, pp. 159–166, 1949.

  1. Is there a constructive approach as a replacement of Henkin’s construction?

  2. We are using the notion of decidability/undecidability here even before we get to Gödel’s incompleteness theorem, but hopefully it’s no stranger to us as computability theory has all the model-specific issues (though non-logical) covered.

  3. Gödel’s original proof of the completeness theorem:

  4. It might be interesting to know that \(\text{Th}(\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)\) is an undecidable theory, as shown by the aforementioned incompleteness theorem.

  5. If we accept the Axiom of Choice, then every set can have a well-ordering; so this is actually a reasonable request.

  6. Many logicians (Kurt Gödel, Thoralf Skolem, Willard Van Quine) seem to adhere to first-order logic. And that’s why.