On a listless afternoon I asked myself: what are some of the most neglected theorems that deserve more attention? Many famous old theorems in mathematics, as I recalled, are about impossibilities: \(\sqrt{2}\) is irrational, \(\pi\) is transcendental, polynomial equations of degree 5 or higher are generally unsolvable, squaring the circle is impossible, Fermat’s last theorem, etc.; by arguing about impossibility they have reshaped many subfields of mathematics in a revolutionary way. Compared to mathematics, formal logic is a relatively young field, not really fruitful before Frege, and the theory of computation is even less cultivated, with quite a few landmark results still underrated or poorly publicized. Among those neglected theorems in logic and computation, I decided to do a writeup on each one I have ever heard of, to help myself understand them better. Undoubtedly, they ought to be brought into the limelight and made better understood by today’s working mathematicians and programmers.
It should be put in the first paragraph of the preface to The Book on logic, if there could be one, that truth in a language cannot be defined in that same language. We have the classic liar’s sentence, which says “This sentence is false.” If this kind of inconsistency does not yet seem disastrous enough to bug you, as you may guarantee never to involve yourself in any circular reasoning, then think about the following truth-teller’s sentence:
If this sentence is true, then God exists.
Symbolically,
\[\sigma = (\text{True}(\sigma) \to \exists x\,\text{Godlike}(x)) \]
If you take this sentence to be false, then since it is a material conditional the only way it can ever be false is that the antecedent (\(\text{True}(\sigma)\)) is true but the consequent (\(\exists x\,\text{Godlike}(x)\)) is false. However, from that \(\text{True}(\sigma)\) is true we would immediately get that \(\sigma\) is true, which contradicts our assumption that \(\sigma\) is false. On the other hand, if you take this sentence to be true, then everything should work perfectly: \(\text{True}(\sigma)\) is true, \(\exists x\,\text{Godlike}(x)\) is also true, therefore the whole sentence \(\sigma\) is true, which just verifies the fact that \(\text{True}(\sigma)\) is true. Therefore, to maintain logical consistency in our reasoning, the only choice for a rational person is to accept the truth of \(\exists x\,\text{Godlike}(x)\), i.e., “God exists.”
This is not really about the ontological argument of God’s existence; it’s about how the inherent inconsistency of an unrestricted language can lead to any random truth trivially (ex contradictione quodlibet). If you believe that the above argument is all reasonable and God indeed exists, try yourself on the following sentence:
If this sentence is true, then Satan rules the world.
Let’s get it over with: natural language is tricky and facile, but no one is really hurt by its expressive power.^{1} The sure thing is, no serious mathematician wants to derive ridiculous random facts (e.g., if some self-referential proposition is true, then “\(2 + 2 = 3\)”), so in a formal language we are obliged to have a metatheorem like:
Undefinability theorem (Tarski 1936). Let \(\mathcal{L}\) be a formal language, and let \(\mathfrak{A}\) be an \(\mathcal{L}\)-structure whose theory \(\operatorname{Th}\mathfrak{A}\) is capable of representing all computable functions. Let a Gödel numbering \(\mathfrak{g}(\phi)\) be defined for every \(\mathcal{L}\)-formula \(\phi\). Then there is no formula \(\text{True}(n)\) such that for every \(\mathcal{L}\)-formula \(\varphi\), \(\text{True}(\mathfrak{g}(\varphi)) \leftrightarrow \varphi\).
Before proceeding with the proof of Tarski’s undefinability theorem, let us first get our terminology straight and prove a very important lemma, namely the diagonal lemma. The definitions of language, structure and theory were already given in Mst. #7. The capability of representing all computable functions can be informally understood as: for every Turing-computable function \(f : x \mapsto y\), there is an \(\mathcal{L}\)-formula \(\gamma_f\) such that \(\gamma_f(x, y) \in \operatorname{Th}\mathfrak{A} \leftrightarrow y = f(x)\).
A Gödel numbering for a formal language \(\mathcal{L}\) is an injective function \(\mathfrak{g} : \Phi \to \mathbb{N}\) that maps every \(\mathcal{L}\)-formula to a unique natural number \(n \in \mathbb{N}\). It may be defined in any way that satisfies the desired injective property. For example, in a first-order language \(\mathcal{L}_\text{Set}\) consisting of only variables \(x_i\) (\(i = 0, 1, 2, \dots\)), two connective symbols \(\{ \to, \neg \}\), the equality symbol \(=\), the quantifier symbol \(\forall\) and a predicate symbol \(\in\), one possible Gödel numbering function may be defined inductively as
\[\begin{align*} \mathfrak{g}(x_i) &= 2^0 \cdot 3^i \\ \mathfrak{g}(= t_1 t_2) &= 2^1 \cdot 3^{\mathfrak{g}(t_1)} \cdot 5^{\mathfrak{g}(t_2)} \\ \mathfrak{g}(\in t_1 t_2) &= 2^2 \cdot 3^{\mathfrak{g}(t_1)} \cdot 5^{\mathfrak{g}(t_2)} \\ \mathfrak{g}((\neg \psi)) &= 2^3 \cdot 3^{\mathfrak{g}(\psi)} \\ \mathfrak{g}((\psi \to \theta)) &= 2^4 \cdot 3^{\mathfrak{g}(\psi)} \cdot 5^{\mathfrak{g}(\theta)} \\ \mathfrak{g}(\forall x_i \psi) &= 2^5 \cdot 3^i \cdot 5^{\mathfrak{g}(\psi)} \\ \end{align*}\](*)
Claim 1. (*) defines a Gödel numbering \(\mathfrak{g}\) on the language \(\mathcal{L}_\text{Set}\). That is, for any \(\mathcal{L}_\text{Set}\)-formulas \(\varphi_1\) and \(\varphi_2\), if \(\mathfrak{g}(\varphi_1) = \mathfrak{g}(\varphi_2)\), then \(\varphi_1 \cong \varphi_2\).
Proof idea. By structural induction on the formula and the fundamental theorem of arithmetic (every integer greater than 1 has a unique prime factorization).
Example 2. The Gödel number of formula \(\forall x_0 (\neg \in x_0 x_0)\) using (*) is
\[\begin{align*} \mathfrak{g}(\forall x_0 (\neg \in x_0 x_0)) &= 2^5 \cdot 3^0 \cdot 5^{\mathfrak{g}(\neg \in x_0 x_0)} = 2^5 \cdot 3^0 \cdot 5^{2^3 \cdot 3^{\mathfrak{g}(\in x_0 x_0)}} = 2^5 \cdot 3^0 \cdot 5^{2^3 \cdot 3^{2^2 \cdot 3^{2^0 \cdot 3^0} \cdot 5^{2^0 \cdot 3^0}}} \\ &= 2^5 \cdot 3^0 \cdot 5^{2^3 \cdot 3^{60}} \\ &= 32 \cdot 5^{339129266201729628114355465608} \end{align*}\]Note that the choice of our numbering function is purely technical; it doesn’t make any difference if we choose another set of prime numbers \(\{5, 7, 11\}\) instead of \(\{2, 3, 5\}\) as basis, as long as the decomposition of a Gödel number is unique. (Of course, such numbers would be even bigger.)
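The numbering (*) is mechanical enough to run. Here is a small sketch in Python (the nested-tuple encoding of formulas is my own ad-hoc choice, not part of the original definition), which reproduces the exponent computed in Example 2 using arbitrary-precision integers:

```python
# A sketch of the Gödel numbering (*); formulas are encoded as nested tuples.
def godel(phi):
    tag = phi[0]
    if tag == "var":     # x_i      ->  2^0 * 3^i
        return 3 ** phi[1]
    if tag == "eq":      # = t1 t2  ->  2^1 * 3^g(t1) * 5^g(t2)
        return 2**1 * 3**godel(phi[1]) * 5**godel(phi[2])
    if tag == "in":      # ∈ t1 t2  ->  2^2 * 3^g(t1) * 5^g(t2)
        return 2**2 * 3**godel(phi[1]) * 5**godel(phi[2])
    if tag == "not":     # (¬ψ)     ->  2^3 * 3^g(ψ)
        return 2**3 * 3**godel(phi[1])
    if tag == "imp":     # (ψ→θ)    ->  2^4 * 3^g(ψ) * 5^g(θ)
        return 2**4 * 3**godel(phi[1]) * 5**godel(phi[2])
    if tag == "forall":  # ∀x_i ψ   ->  2^5 * 3^i * 5^g(ψ)
        return 2**5 * 3**phi[1] * 5**godel(phi[2])
    raise ValueError(f"unknown tag: {tag}")

# Example 2, stopped at the outer exponent (the full number 32 * 5^exp is too big to build):
body = ("not", ("in", ("var", 0), ("var", 0)))   # (¬ ∈ x0 x0)
exp = godel(body)                                # the exponent of 5 in g(∀x0 (¬ ∈ x0 x0))
print(exp)                                       # 339129266201729628114355465608 == 2^3 * 3^60
```

Note that we never materialize the outermost number \(32 \cdot 5^{\mathfrak{g}((\neg \in x_0 x_0))}\) itself: it has on the order of \(10^{29}\) digits.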
So, what is the whole point of this numbering, you might ask? As shown in Example 2, even a simple formula can yield a Gödel number which is insanely large (no kidding, far greater than the total number of atoms in the observable universe!). We don’t actually use these numbers; we construct them just to convince ourselves that we can encode virtually any theory into the theory of arithmetic, or, literally, strings into natural numbers. The overwhelming cost (complexity) is not our concern for now. Hence, from now on, we can virtually reduce any problem in a formal language (set theory, type theory, analysis, geometry, …) to an arithmetical problem, i.e., a problem simply about manipulating and deciding on natural numbers.
Diagonal lemma (Carnap 1934). Let \(\mathcal{L}_\text{A}\) be a first-order language of arithmetic, and let \(\mathfrak{A}\) be an \(\mathcal{L}_\text{A}\)-structure whose theory \(\operatorname{Th}\mathfrak{A}\) is capable of representing all computable functions. Let \(\phi\) be an \(\mathcal{L}_\text{A}\)-formula with one free variable. Then there is a sentence \(\psi\) such that \(\psi \leftrightarrow \phi(\mathfrak{g}(\psi))\) in \(\operatorname{Th}\mathfrak{A}\).
Proof. Let \(f\) be a total function on \(\mathfrak{A} = \mathbb{N}\) defined by \[f(\mathfrak{g}(\theta)) = \mathfrak{g}(\theta(\mathfrak{g}(\theta)))\] for each \(\mathcal{L}_\text{A}\)-formula \(\theta\) with one free variable, and by \(f(n) = 0\) for every other \(n\).
Since \(f\) is a computable function, there is a formula \(\gamma_f\) representing \(f\) in \(\operatorname{Th}\mathfrak{A}\). That is, for each formula \(\theta\), we have in \(\operatorname{Th}\mathfrak{A}\) \[\forall y\,(\gamma_f(\mathfrak{g}(\theta), y) \leftrightarrow y = f(\mathfrak{g}(\theta)))\] which is just \[\forall y\,(\gamma_f(\mathfrak{g}(\theta), y) \leftrightarrow y = \mathfrak{g}(\theta(\mathfrak{g}(\theta))))\] Define the formula \(\beta\) (with one free variable) as \[\beta(z) = \forall y\,(\gamma_f(z,y) \to \phi(y))\] Then we have in \(\operatorname{Th}\mathfrak{A}\) \[\beta(\mathfrak{g}(\theta)) \leftrightarrow \forall y\,(y = \mathfrak{g}(\theta(\mathfrak{g}(\theta))) \to \phi(y))\] which is just \[\beta(\mathfrak{g}(\theta)) \leftrightarrow \phi(\mathfrak{g}(\theta(\mathfrak{g}(\theta))))\] Since \(\theta\) can be any \(\mathcal{L}_\text{A}\)-formula, let \(\theta = \beta\). Then we have in \(\operatorname{Th}\mathfrak{A}\) \[\beta(\mathfrak{g}(\beta)) \leftrightarrow \phi(\mathfrak{g}(\beta(\mathfrak{g}(\beta))))\] Let \(\psi = \beta(\mathfrak{g}(\beta))\). Thus we have in \(\operatorname{Th}\mathfrak{A}\) \[\psi \leftrightarrow \phi(\mathfrak{g}(\psi))\] Therefore we obtain \(\psi\), which is the required sentence (a “fixed point”).
■
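The self-application \(\beta(\mathfrak{g}(\beta))\) in the proof has a concrete programming cousin: a quine. Below is a minimal Python sketch, with source strings standing in for Gödel numbers and `diag` (a name of my own) playing the role of the substitution function \(f\):

```python
# The diagonal lemma as a quine: build a program that is a fixed point of evaluation.
def diag(template: str) -> str:
    """The substitution f: θ ↦ θ(⌜θ⌝), filling a template's hole with its own quotation."""
    return template % repr(template)

t = 'print(diag(%s))'
program = diag(t)   # "print(diag('print(diag(%s))'))"
# Running `program` (with diag in scope) prints exactly the string `program` itself,
# mirroring ψ ↔ φ(⌜ψ⌝) where φ(n) reads "n is printed".
print(program)
```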
Now we are all set to prove the undefinability theorem:
Proof. By reductio ad absurdum: suppose there is an \(\mathcal{L}\)-formula \(\text{True}(n)\) such that for every \(\mathcal{L}\)-formula \(\varphi\), \(\text{True}(\mathfrak{g}(\varphi)) \leftrightarrow \varphi\). In particular, if \(\sigma\) is a sentence, then \(\text{True}(\mathfrak{g}(\sigma))\) holds in \(\mathfrak{A}\) iff \(\sigma\) is true in \(\mathfrak{A}\). Thus for all \(\sigma\), \(\text{True}(\mathfrak{g}(\sigma)) \leftrightarrow \sigma\) holds in \(\operatorname{Th}\mathfrak{A}\). However, applying the diagonal lemma to the formula \(\neg\text{True}(n)\), there is also a sentence \(\psi\) such that \(\psi \leftrightarrow \neg \text{True}(\mathfrak{g}(\psi))\) in \(\operatorname{Th}\mathfrak{A}\). That is a contradiction. Therefore, such a formula \(\text{True}(n)\) does not exist.
■
Tarski’s undefinability theorem, along with the auxiliary diagonal lemma, establishes some really fundamental limitative results in formal logic, and even more. The diagonal lemma takes its name from Cantor’s ingenious diagonal argument, which reveals the well-known fact that the set \(\mathbb{R}\) of all real numbers is uncountable. The same trick of diagonalization has been exploited in multiple contextually different but essentially interwoven arguments, e.g., Gödel’s incompleteness theorems and the undecidability of the halting problem. The category-theoretic generalization of this lemma is known as Lawvere’s fixed-point theorem, which applies in any cartesian closed category.
Let’s consider an informal but intuitive connection between the undefinability theorem and the (renowned) incompleteness theorem. Given the fact that the universal truth-telling formula \(\text{True}(n)\) is undefinable in a formal language, one may wonder: “If some statement is true, then by the completeness theorem of first-order logic it can always be proven true. But if truth is not even definable at all, how do I know whether any statement could as well be proven false?”
Here’s the thing: we don’t. If we did, we would be able to tell truth from falsehood completely and provably. That is to say, for every formula \(\varphi\) in our language, there would be either a proof (\(\vdash \varphi\)) or a disproof (\(\vdash \varphi \to \bot\), i.e., \(\vdash \neg \varphi\)); the class of all such proofs would make a perfect \(\text{True}(n)\), where \(\text{True}(\mathfrak{g}(\varphi))\) holds if and only if \(\varphi\) has a proof! That would contradict the undefinability theorem. So necessarily, we must take for granted that in a logical system powerful enough to represent all computable functions via Gödel numbering (e.g., a number theory of elementary arithmetic) there could be some falsehood whose existence we can never disprove (because otherwise we could have defined that \(\text{True}(n)\)). Plug this meta-statement into Löb’s theorem (let the modal operator \(\square\) denote the modality of provability, and let the constant \(\bot\) denote falsehood): \[\begin{align*} & \square (\square \bot \to \bot) \to \square \bot \\ \implies & \square (\neg \square \bot) \to \square \bot &\qquad\text{(definition of negation)} \\ \implies & \square (\neg \square \bot) \to \bot &\qquad\text{(assuming consistency } \square \bot \to \bot \text{, and transitivity)}\\ \implies & \neg \square (\neg \square \bot) &\qquad\text{(definition of negation)}\\ \end{align*}\] which informally says “it is not provable that falsehood is not provable”, which is just another way of claiming \(\mathfrak{N} \not\vdash \operatorname{Cons}\mathfrak{N}\): the consistency of the logical system cannot be proved from within the system itself, a.k.a. Gödel’s second incompleteness theorem.
Digressing a little from our main topic, formal logic, let’s consider an application of the undefinability theorem to programming language theory as a dessert. It is well understood that, by the Curry-Howard correspondence, a logical formula corresponds to a type, and the type checker of a programming language guarantees that every well-typed program inhabits only types that correspond to valid formulas. If you write some sophisticated type and its type checking succeeds, the underlying formula must be valid, as you have just proved it (so-called “programs are proofs”). Now, I challenge you: given a hypothetical, reasonably expressive programming language with full dependent types, can you write a type-checked self-interpreter in it? Once you succeeded, you would have proved that you can actually tell valid formulas (types) from false ones in all cases (by writing a self-interpreter that accepts only well-typed programs), thus effectively defining truth in the language itself. By the undefinability theorem, that would be impossible (otherwise your underlying logical system couldn’t be consistent!). A slightly different but related result is that a total programming language can’t have a self-interpreter; since a total language is not necessarily equipped with a type system or similar proof-theoretic constructions, the undefinability theorem may not directly apply, but a computability-theoretic argument for this is analogously diagonal (see [5]).^{2}
[1] Wikipedia, “Diagonal lemma”. https://en.wikipedia.org/wiki/Diagonal_lemma
[2] Wikipedia, “Tarski’s undefinability theorem”. https://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem#General_form_of_the_theorem
[3] nLab, “Lawvere’s fixed point theorem”. https://ncatlab.org/nlab/show/Lawvere%27s+fixed+point+theorem
[4] nLab, “Löb’s theorem”. https://ncatlab.org/nlab/show/Löb%27s+theorem
[5] Andrej Bauer, “Answer to: A total language that only a Turing complete language can interpret”. https://cstheory.stackexchange.com/a/24994/21291
[6] Matt Brown and Jens Palsberg, “Breaking Through the Normalization Barrier: A Self-Interpreter for F-omega”. http://compilers.cs.ucla.edu/popl16/popl16full.pdf
[7] Jason Gross, Jack Gallagher and Benya Fallenstein, “Löb’s Theorem – A functional pearl of dependently typed quining”. https://jasongross.github.io/lobpaper/nightly/lob.pdf
[8] Herbert B. Enderton, A Mathematical Introduction to Logic, 2nd ed.
One may find it intuitive to notice that whether \(\text{True}(\sigma)\) is true or not has nothing to do with the existence of God or whatever other claims of real interest. Indeed, instead of restricting our language as we do here, we could as well devise a restrictive set of inference rules for an alternative logic (e.g., relevance logic) so that explosions can be expelled.↩
I am aware of the “self-interpreter” for System \(\text{F}_\omega\) proposed in a POPL’16 paper by Brown & Palsberg [6]. According to a footnote in [7], and also per my unrefined understanding, that interpreter does not come with a separate syntax for types and thus fails to capture the underlying logical consistency; therefore neither Löb’s theorem nor the undefinability theorem may apply in that case.↩
`tracker-store` started to hog my CPU and disk space (again). For some really peculiar reason, GNOME developers decided that everyone should want to use their file-indexing service. Well, I had had enough of this due to concerns of performance and privacy, and surely uninstalled the `tracker` package a long time ago, but now it seems GNOME’s canonical file manager `nautilus` has started to depend on `tracker` (one of its old dependencies, `libtracker-sparql`, has been fused into the `tracker` package since version 2.0 in Arch’s downstream repository). That explains why it’s back, uninvited.
A few quirks noted (in Tracker 2.0.x):

- `tracker-preferences` has been summarily removed (d4a8d6e), while `tracker` provides no command-line alternative for intuitive gsettings configuration yet. (Then why would they remove this very helpful GUI?)
- Adding `Hidden=true` to an overriding autostart file such as `~/.config/autostart/tracker-store.desktop` (as in [1] [2]) no longer works in 2.0, because…
- `tracker` ships with a systemd user service now (`/usr/lib/systemd/user/tracker-store.service`), since they obviously think a desktop-level autostart is not enough. While Arch Linux wisely ships a `/usr/lib/systemd/system-preset/99-default.preset` containing “`disable *`”, which disables all new units by default [3], there is no equivalent user-preset file doing the same, which means that `tracker-store.service`, as a user service, is enabled by default and still gets to run on every boot.

Now that uninstalling `tracker` is not an option, one has to take both of the following measures to block `tracker-store` from running:

1. Copy the autostart file and hide it by appending `Hidden=true`:

       $ cp /etc/xdg/autostart/tracker-store.desktop ~/.config/autostart/tracker-store.desktop

   (The same applies to `tracker`’s friends, such as `tracker-miner-fs`, etc.)
2. Mask the `tracker`-related services:

       $ systemctl --user mask tracker-store
Look ma, no more hogs. Back to work!
[1] Ask Ubuntu, “tracker-store and tracker-miner-fs eating up my CPU on every startup”. https://askubuntu.com/questions/346211/tracker-store-and-tracker-miner-fs-eating-up-my-cpu-on-every-startup
[2] “Disabling GNOME Tracker and Other Info”. https://gist.github.com/vancluever/d34b41eb77e6d077887c
[3] ArchWiki, “systemd”. https://wiki.archlinux.org/index.php/Systemd
[4] FreeDesktop.org, “systemctl(1)”. https://www.freedesktop.org/software/systemd/man/systemctl.html
“This is clear, in the first place, if we define what the true and the false are. To say of what is that it is not, or of what is not that it is, is false, while to say of what is that it is, and of what is not that it is not, is true.”
– Aristotle, Metaphysics
On truth-bearers. A truth-bearer is something that can be said to have a truth-value: a statement, a sentence, a proposition, etc. We will use the three aforementioned notions interchangeably, but avoid using subjective notions such as assertion, belief, intuition and judgment as truth-bearers. (These words, however, may still occur with their originally intended meanings.)
In Mst. #7 we defined that (in a formal language) a sentence is just a well-formed formula with no free variables. Furthermore, in Mst. #8 we defined a statement, naïvely, as a sentence that says something is true. But what makes a sentence true or, oppositely, not true?
Aristotle’s definition of truth. Aristotle’s original truth definition in Metaphysics (360s B.C.) can be denoted using the modern device of Quine’s quasi-quotation, as follows:
⌜\(\psi\)⌝ is true if and only if \(\psi\).
⌜\(\psi\)⌝ is false if and only if it is not the case that \(\psi\).
Note that “⌜\(\psi\)⌝” is a mention of some statement while “\(\psi\)” is the use of that statement in the metalanguage, so the above definition does make sense (instead of the “be whatever you be” nonsense it might seem to an untrained eye).
An implication of this definition is that “false” is the opposite of “true”. The bivalence of truth and falsity is an essential property of classical logic. Thus, we have the following tautology: \[\psi \lor \lnot\psi \qquad\text{(Law of Excluded Middle)}\] that is, given any interpretation, whether \(\psi\) is true or false, the above proposition is always true (or said to be valid). This fact may be justified via a truth table.
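The truth-table justification is easy to mechanize. A tiny Python sketch (the helper names are mine), which enumerates every Boolean valuation:

```python
# Check validity (tautology-hood) by brute-forcing the full truth table.
from itertools import product

def valid(formula, n_vars):
    """True iff `formula` holds under every Boolean valuation of its n_vars inputs."""
    return all(formula(*vals) for vals in product([False, True], repeat=n_vars))

lem = lambda p: p or (not p)             # ψ ∨ ¬ψ  (Law of Excluded Middle)
contradiction = lambda p: p and (not p)  # ψ ∧ ¬ψ
print(valid(lem, 1), valid(contradiction, 1))  # True False
```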
Theories of truth. From a metaphysical perspective, several theories of truth have been proposed and well argued: the correspondence theory, the coherence theory, the pragmatic theory, and deflationary theories (including Ramsey’s redundancy theory), as covered in the reading list below.
Reading list
On the notion of truth and truthbearers:
 Michael Glanzberg, “Truth,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/truth/.
 Matthew McGrath, “Propositions,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/propositions/.
Correspondence theory:
 Marian David, “The Correspondence Theory of Truth,” Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/truthcorrespondence/.
Coherence theory:
 James O. Young, “The Coherence Theory of Truth,” Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/truthcoherence/.
Pragmatic theory:
 Christopher Hookway, “Pragmatism,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/pragmatism/#PraTheTru.
Deflationary theory:
 Daniel Stoljar and Nic Damnjanovic, “The Deflationary Theory of Truth,” Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/truthdeflationary/.
Ramsey’s redundancy theory of truth (a deflationary theory):
 Frank P. Ramsey, “Facts and Propositions.” [PDF]
Now we consider a (semi-)formal language in which we are able to state truth self-referentially.
Antinomies.
This sentence is false.
“Yields falsehood when preceded by its quotation”
yields falsehood when preceded by its quotation.
If we take the truth (or falsity) of a statement to be bivalent, then the truth-values of the above two sentences cannot be consistently determined. Such antinomies motivated Alfred Tarski’s proposal of the semantic theory of truth, which eliminates such potentially paradoxical uses in a formal language.
Semantic theory of truth. (Tarski, 1933) Given an object language \(\mathcal{L}_0\) and a metalanguage \(\mathcal{L}_1\),
⌜\(\psi_0\)⌝ is true if and only if \(\psi_1\). (Convention T)
where “\(\psi_0\)” is a sentence in \(\mathcal{L}_0\) and “\(\psi_1\)” is a sentence in \(\mathcal{L}_1\). It is demanded that \(\mathcal{L}_0\) be contained in \(\mathcal{L}_1\). Moreover, the word “true” (or “false”) does not occur in \(\psi_0\). Thus, no sentence in the object language \(\mathcal{L}_0\) can assert its own truth or falsity; rather, its truth-value must be asserted by a higher-level metalanguage \(\mathcal{L}_1\). Indeed, it is intuitive to imagine a semantic hierarchy of formal languages that rules out the use of self-reference: \[\mathcal{L}_0 \sqsubset \mathcal{L}_1 \sqsubset \mathcal{L}_2 \sqsubset \cdots\] that is, the truth of a sentence in an object language \(\mathcal{L}_i\) can only be asserted in its metalanguage \(\mathcal{L}_{i+1}\). Since there is no final stage in this hierarchy, the “most meta” truths may never be asserted.
T-schema. For a formal language containing a given set of logical connective symbols, we give an inductive definition of truth in the following form:
(Note that the subscripts distinguishing the object language / metalanguage are implicit.)
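For the standard connectives, these clauses take roughly the following textbook form (reconstructed here; the exact set of clauses depends on the chosen connective symbols):

```latex
% Inductive truth clauses (T-schema) for the standard connectives
\begin{aligned}
&\ulcorner \lnot\psi \urcorner \text{ is true} &&\iff \ulcorner \psi \urcorner \text{ is not true} \\
&\ulcorner \psi \land \theta \urcorner \text{ is true} &&\iff \ulcorner \psi \urcorner \text{ and } \ulcorner \theta \urcorner \text{ are both true} \\
&\ulcorner \psi \lor \theta \urcorner \text{ is true} &&\iff \text{at least one of } \ulcorner \psi \urcorner, \ulcorner \theta \urcorner \text{ is true} \\
&\ulcorner \psi \to \theta \urcorner \text{ is true} &&\iff \ulcorner \psi \urcorner \text{ is not true, or } \ulcorner \theta \urcorner \text{ is true}
\end{aligned}
```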
Tarski’s semantic theory of truth is not limited to applications in philosophical logic, though: a similar definition is also used in model theory, where we define the satisfaction of a formula \(\varphi\) with respect to a structure \(\mathfrak{A}\) and an assignment \(s\) inductively.
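To make the parallel concrete, here is a minimal Python sketch of Tarskian satisfaction over a finite structure (the nested-tuple encoding of formulas and the restriction to a single binary relation are my own simplifying assumptions):

```python
# Inductive (Tarskian) satisfaction ⊨_A φ[s] over a finite universe A with one binary relation R.
def sat(A, R, phi, s):
    """phi is a nested tuple; s is the assignment, a dict mapping variable names to elements."""
    tag = phi[0]
    if tag == "rel":   # atomic: R(x, y)
        return (s[phi[1]], s[phi[2]]) in R
    if tag == "not":   # ¬ψ
        return not sat(A, R, phi[1], s)
    if tag == "imp":   # ψ → θ
        return (not sat(A, R, phi[1], s)) or sat(A, R, phi[2], s)
    if tag == "all":   # ∀x ψ: try every reassignment of x
        return all(sat(A, R, phi[2], {**s, phi[1]: a}) for a in A)
    raise ValueError(f"unknown tag: {tag}")

A = {0, 1, 2}
less = {(0, 1), (0, 2), (1, 2)}                      # the usual strict order on A
irreflexive = ("all", "x", ("not", ("rel", "x", "x")))  # ∀x ¬(x < x)
print(sat(A, less, irreflexive, {}))                 # True
```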
Remark 9.1. (Semantic theory of truth and mathematical logic) The mathematical counterpart of Tarski’s semantic theory of truth yields the undefinability theorem (Tarski, 1936), which briefly states that arithmetical truth cannot be defined in arithmetic itself (it is worth noting that the proof is a nontrivial one which requires Gödel numbering on formulas in a formal language). The semantic undefinability also has a strong correspondence with the incompleteness theorems (Gödel, 1931), which will be covered in future notes hopefully.
Reading list
On Tarski’s semantic theory of truth:
 Wilfrid Hodges, “Tarski’s Truth Definitions,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/tarskitruth/.
 Alfred Tarski, “The Semantic Conception of Truth: And the Foundations of Semantics.” [HTML]
 Alfred Tarski, “Truth and Proof.” [PDF]
Truth and objectivity. Is truth objective? Yes. Given the object language \(\mathcal{L}_0\), consider a sentence \(\varphi[s]\) which is true under the model \(\mathfrak{A}\): \[\models_\mathfrak{A} \varphi[s]\] It is true because we can argue in our metalanguage \(\mathcal{L}_1\) that it is satisfied decidedly; thus it must be a truth in the object language. For example, we can argue extensionally that \[\models_\mathfrak{A} \text{Morning Star} = \text{Evening Star}\] In my opinionated view, there can be no such thing as a “subjective truth”, since the truth of a sentence must be justified in a higher-level metalanguage (per the semantic theory of truth). Given the object-metalanguage distinction, truth is objective by nature. Once we claim that “something holds” without appealing to a metalanguage for justification, it is merely a belief rather than a truth. Beliefs can be, of course, illogical and ungrounded.
Logical validity of arguments. A true sentence need not be true in every possible structure. Consider: \[\models_\mathfrak{B} \text{Morning Star} = \text{Evening Star}\] In an alternative universe \(\mathfrak{B}\) where Venus fails to be both the morning star and the evening star, this statement would simply be false. However, a logical argument must not depend on a specific structure or interpretation of its parameters; that is, when we argue logically that something is true, the argument itself must hold valid. Validity is a purely logical notion here. For example, consider adding the following axiom to our deductive system: \[t = t' \to t \cong t'\]
One can easily verify that this axiom is valid (i.e., always true): If \(t = t'\) is false, then \(t = t' \to t \cong t'\) is vacuously true; if \(t = t'\) is true, then we have \(t \cong t'\) so \(t = t' \to t \cong t'\) is also true.
Specifically, we have
\(\text{Morning Star} = \text{Evening Star} \to \text{Morning Star} \cong \text{Evening Star}\)
(If the morning star is the evening star, then it is isomorphic to the evening star.)
By modus ponens, the following deduction applies:
\(\Gamma; \text{Morning Star} = \text{Evening Star} \quad\vdash\quad \text{Morning Star} \cong \text{Evening Star}\)
(Assume that the morning star is the evening star, then it is isomorphic to the evening star.)
This is a valid argument, since it follows solely from our set of axioms and rules of inference, but relies on no particular structures or interpretations. Such a valid argument justifies the logical fact that the truth of its premises guarantees the truth of its conclusion; this, however, does not imply anything about the truth of premises (that would depend on the actual interpretation).
Soundness of arguments. To put it shortly, a sound argument is just a valid argument with true premises.
\(\frac{\text{Morning Star} = \text{Evening Star} \qquad \text{Morning Star} = \text{Evening Star} \to \text{Morning Star} \cong \text{Evening Star}}{\text{Morning Star} \cong \text{Evening Star}}\)
(Given that the morning star is the evening star, it is isomorphic to the evening star.)
Given true premises, the valid argument proves its conclusion, \(\text{Morning Star} \cong \text{Evening Star}\), and we know that it is guaranteed to be true. Such a truth is also called a logical truth, since it is obtained as the logical consequence of a sound argument in our deductive system. Notice that the above is just an instance of the soundness theorem, loosely stated as: every provable sentence is true.
Two approaches to logical consequence. \(\varphi\) may be taken as a logical consequence of \(\Gamma\) either model-theoretically (\(\Gamma \models \varphi\): every structure satisfying \(\Gamma\) also satisfies \(\varphi\)) or proof-theoretically (\(\Gamma \vdash \varphi\): there is a deduction of \(\varphi\) from \(\Gamma\) in our deductive system).
By the soundness and completeness theorems \(\Gamma \models \varphi \Leftrightarrow \Gamma \vdash \varphi\), the two approaches agree with each other extensionally.
Reading list
Philosophical logicrelated issues:
 Jc Beall and Greg Restall, “Logical Consequence,” Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/logicalconsequence/.
 Dorothy Edgington, “Indicative Conditionals,” Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/conditionals/.
 Michael Nelson, “Existence,” Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/existence/.
Motivations of alternative logics. In classical first-order logic, a deductive system preserves the semantic property of truth, as justified by the dual theorems of soundness/completeness. We will see that it does not always sufficiently capture the needs of logical reasoning; thus numerous alternative (non-classical) logics have emerged for this reason.
It is not uncommon to revise one or more axioms of classical first-order logic. The resulting non-classical logics can be useful in a constructive setting. In the following text, we will use syntactic forms like \(\alpha \to \beta\) to denote both a tautological conditional sentence and a valid deduction rule (\(\{\alpha\} \vdash \beta\)), as justified by the dual theorems of deduction/resolution.
In proof theory, substructural logics are a family of non-classical logics in which one or more structural rules (e.g., weakening, contraction, commutativity, associativity) are absent.
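For concreteness, the classical structural rules can be displayed in sequent-calculus form (a standard presentation; dropping one of these rules yields a substructural logic):

```latex
% Structural rules in sequent-calculus form
\[
\frac{\Gamma \vdash \Delta}{\Gamma, \alpha \vdash \Delta}\ \text{(weakening)}
\qquad
\frac{\Gamma, \alpha, \alpha \vdash \Delta}{\Gamma, \alpha \vdash \Delta}\ \text{(contraction)}
\qquad
\frac{\Gamma, \alpha, \beta, \Sigma \vdash \Delta}{\Gamma, \beta, \alpha, \Sigma \vdash \Delta}\ \text{(exchange)}
\]
```

Linear logic, for instance, rejects both weakening and contraction, while relevance logic rejects weakening.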
Reading list
On classical logic:
 Stewart Shapiro, “Classical Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logicclassical/.
On modality and modal logic:
 Boris Kment, “Varieties of Modality,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/modalityvarieties/.
 James Garson, “Modal Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logicmodal/.
On temporality, tense and temporal logic:
 Friedrich Hamm and Oliver Bott, “Tense and Aspect,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/tense-aspect/.
 Valentin Goranko and Antony Galton, “Temporal Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-temporal/.
On vagueness, many-valued logic and fuzzy logic:
 Roy Sorensen, “Vagueness,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/vagueness/.
 Siegfried Gottwald, “Many-Valued Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-manyvalued/.
 Petr Cintula, Christian G. Fermüller and Carles Noguera, “Fuzzy Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-fuzzy/.
On quantification, second-order and higher-order logic, and their correspondence with set theory:
 Gabriel Uzquiano, “Quantifiers and Quantification,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/quantification/.
 Herbert B. Enderton, “Second-order and Higher-order Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-higher-order/.
 Jouko Väänänen, “Second-Order Logic and Set Theory.” [PDF]
On free logic:
 John Nolt, “Free Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-free/.
On intuitionistic logic and constructive mathematics:
 Joan Moschovakis, “Intuitionistic Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-intuitionistic/.
 Douglas Bridges and Erik Palmgren, “Constructive Mathematics,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/mathematics-constructive/.
 David Meredith, “Separating Minimal, Intuitionist, and Classical Logic.” [PDF]
 Michael Dummett, “The Philosophical Basis of Intuitionistic Logic.”
On paraconsistent logic:
 Graham Priest, Koji Tanaka and Zach Weber, “Paraconsistent Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-paraconsistent/.
On substructural logics, relevance logic and linear logic:
 Greg Restall, “Substructural Logics,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-substructural/.
 Edwin Mares, “Relevance Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-relevance/.
 Roberto Di Cosmo and Dale Miller, “Linear Logic,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/logic-linear/.
As discussed in the subsection “The limitation of first-order logic” in Mst. #7, a practical issue with a first-order theory is that it provides no way to express the so-called “well-ordering property” (or its equivalents, like the second-order induction axiom) of sets: Every nonempty set of numbers has a least element. \[\forall P (\exists x Px \to \exists x (Px \land \forall y (Py \to (x = y \lor x < y))))\]
Recall that in first-order logic, we notably have the following model-theoretic results: the compactness theorem and the (downward and upward) Löwenheim–Skolem theorems;
however, they do not generally hold in the standard semantics of second-order (or any higher-order) logic. As a naïve counterexample [2] disproving compactness, consider a second-order sentence asserting the existence of a strict ordering \(R\) in which every element has a successor: \[\exists R (\forall x \forall y \forall z (Rxy \land Ryz \to Rxz) \land \forall x (\lnot Rxx \land \exists y Rxy))\] Let \(\lambda_\infty\) be the above sentence; it is true if and only if the universe is an infinite set. Let \(\lambda_i\) be the first-order sentence saying that “there are at least \(i\) urelements in the universe”. Then the infinite set \[\Gamma = \{ \lnot \lambda_\infty, \lambda_1, \lambda_2, \lambda_3, \dots \}\] has no model: any model satisfying every \(\lambda_i\) must have an infinite universe, making \(\lambda_\infty\) true and contradicting \(\lnot \lambda_\infty\). However, every finite subset of \(\Gamma\) is clearly satisfiable.
Another intuitive way of seeing this: second-order logic is capable of categorically axiomatizing Peano arithmetic (i.e., all of its models of arithmetic are isomorphic), but if the compactness theorem held, we would be able to build a nonstandard model of arithmetic via ultraproducts (as shown in Mst. #7); therefore the compactness theorem must fail.
In second-order logic, we have two separate “stages” of ranges that one may quantify over: individual variables and predicate/function variables, which may be denoted as set membership: \[v_0 \in v_1 \in \mathcal{P}(\mathfrak{A})\]
Naturally, in an \(n\)th-order logic we have \[v_0 \in v_1 \in \cdots \in v_{n-1} \in \mathcal{P}^{n-1}(\mathfrak{A})\]
Once the order of logic is raised to the limit ordinal \(\omega\), we reach the level of type theory. [2] Sloppily, in type theory, a type \(\text{T}_i\) is just a set \(V_i\), and a function \(\text{f} : \text{T}_x \to \text{T}_y\) is just a relation set \(f\) such that \(\forall x \forall y_1 \forall y_2 ((\langle x, y_1 \rangle \in f \land \langle x, y_2 \rangle \in f) \to y_1 = y_2)\), where \(x \in V_x\) and \(y_1, y_2 \in V_y\). Hence, to accommodate function types, one correspondingly needs a power set operation in set theory. The \(\omega\)-infinite hierarchy of sets of sets: \[V_0 \in V_1 \in \cdots \in V_n \in \cdots\] where \(V_0\) is the empty set \(\varnothing\) (a set that has no members), corresponds to the \(\omega\)-infinite hierarchy of types of types: \[\text{Type}_0 : \text{Type}_1 : \cdots : \text{Type}_n : \cdots\] where \(\text{Type}_0\) is the bottom type (a type that has no values).
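The single-valuedness condition on a “function as relation” above can be stated directly as executable code. A minimal Python sketch (the names and sample relations are ad hoc, not from any source):

```python
# A relation f ⊆ Vx × Vy is a function iff no x is paired with two distinct y's:
# ∀x ∀y1 ∀y2 ((⟨x, y1⟩ ∈ f ∧ ⟨x, y2⟩ ∈ f) → y1 = y2)
def is_functional(f):
    """Return True iff the relation f (a set of (x, y) pairs) is single-valued."""
    seen = {}
    for x, y in f:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

succ = {(0, 1), (1, 2), (2, 3)}   # a genuine function (the successor, restricted)
rel  = {(0, 1), (0, 2)}           # pairs 0 with two values: not a function
```

Here `is_functional(succ)` holds while `is_functional(rel)` fails, mirroring the uniqueness clause in the displayed formula.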
Both set theory and type theory can serve as logical foundations of mathematics [3], owing to their equivalent expressive power [4]. Some differences are worth noting:
Books:
Aristotle, Metaphysics. (English translation by W. D. Ross, 1925)
Available: https://ebooks.adelaide.edu.au/a/aristotle/metaphysics/
W. V. O. Quine, Philosophy of Logic, 2nd ed.
Michael Dummett, Truth and Other Enigmas.
Stewart Shapiro, Foundations Without Foundationalism: A Case for SecondOrder Logic.
Articles:
[1] J. Hintikka, “Two Papers on Symbolic Logic: Form and Content in Quantification Theory and Reductions in the Theory of Types,” 1955.
[2] H. B. Enderton, “Second-order and Higher-order Logic,” in The Stanford Encyclopedia of Philosophy, Fall 2015, E. N. Zalta, Ed. https://plato.stanford.edu/archives/fall2015/entries/logic-higher-order/; Metaphysics Research Lab, Stanford University, 2015.
[3] T. Coquand, “Type Theory,” in The Stanford Encyclopedia of Philosophy, Summer 2015, E. N. Zalta, Ed. https://plato.stanford.edu/archives/sum2015/entries/type-theory/; Metaphysics Research Lab, Stanford University, 2015.
[4] B. Werner, “Sets in Types, Types in Sets,” in Proceedings of TACS’97, 1997, pp. 530–546.
Inspired by Wittgenstein’s theory of “meaning as use”, it is fair to say that when we use a word, we simply mean it. The meaningful use of a word must be distinguished from a mention of it. Quotation, as the canonical device for making this distinction, is central to debates on the topic; Davidson argued that even quoted words can be both a use and a mention. Quine’s invention of quasiquotation (an alternative quotational device) is of technical importance, and the notion has found use in the inductive definitions of formal logic, in denotational semantics, in metaprogramming in Lisp, and in string interpolation in many practical programming languages.
As first noticed by Frege, the semantics of words comprises two distinct aspects: sense and reference. Referring expressions such as descriptions and proper names may both be analyzed under a descriptivist theory (proposed by Russell); the Russellian view of names, however, was criticized by Kripke in favor of a causal view grounded in possible worlds.
On quotation. In the following text, we will use English quotation marks extensively. Quoted text serves as a literal mention of whatever it encloses; by contrast, unquoted text is used with its ordinary meaning. For example,
I am who I am.
but
“I” is a word. “I” is not who I am.
(This issue will be discussed further in “Use–mention distinction”.)
Statement. A statement is defined as a sentence that says something is true. The notions of the English words “sentence” and “true” are clear in this informal context, and both carry their naïve meanings. “Something” is just an object (or objects) that the statement talks about.
Metalanguage and object language. To make a statement about something, we may use either a formal or a natural language. When that “something” is some language (or languages), we refer to it as the object language(s) that we are talking about and the language we are using to make the very statement as the metalanguage. For example,
The German phrase “ein weißes Pferd ist weiß” means “a white horse is white” in English.
Here we are talking about the isomorphic meanings of two sentences in German and in English (as our object languages), and English is the metalanguage that we use to make the statement. It doesn’t matter which metalanguage we use, or whether the metalanguage is identical to one of the object languages, as long as it is capable of representing the semantics we want; the following statement (using Esperanto as the metalanguage) has the same meaning as above:
La germana frazo “ein weißes Pferd ist weiß” signifas “a white horse is white” en la angla.
Reading list
On the application of metalanguages and object languages in mathematical logic, see:
 G. J. Mattey, “Object Language and Metalanguage.” [PDF]
 “Metalanguage,” Wikipedia, https://en.wikipedia.org/wiki/Metalanguage.
 “Object Language,” Wikipedia, https://en.wikipedia.org/wiki/Object_language.
Ambiguity in interpretations. It might be tempting to just use a natural language as our metalanguage, but ambiguity could arise due to the sloppy nature of human languages. Consider the following statement in Chinese:
白馬非馬。
(A white horse is not a horse.)
The argument is that the intension of “white horse” consists of the properties of being white and being horselike, while the intension of “horse” consists merely of the property of being horselike, hence conceptually “white horse” and “horse” are not identical to each other. However, if we interpret “是 (be)” as “be a kind of” thus “非 (be not)” as “be not a kind of”, as in the usual context, we could immediately tell that it is absurd to say:
白馬非馬。
(A white horse is not a kind of horses.)
Natural languages can be vague; however, we almost always need a definite interpretation when studying the semantics (of whatever language we like). It is most advisable to employ a formal language so that we don’t fall into dialectic traps like this one. Consider the formal language of set theory, where we define a relation \(P_H\) (“being horselike”) such that \(P_H x\) if and only if \(x \in P_H\), and a relation \(P_W\) (“being white”) such that \(P_W x\) if and only if \(x \in P_W\). Then we can write the first meaning of the informal statement sloppily, but unambiguously, as: \[\{ x \in P_H : x \in P_W \} \neq P_H\] which is true if and only if there is at least one \(x'\) such that \(x' \in P_H \land x' \not\in P_W\), so that \(x' \not\in \{ x \in P_H : x \in P_W \}\) but \(x' \in P_H\). This is a plausible argument, since we are easily convinced that “there is a horse that is not white”.
Exercise 8.1. Rewrite \(\{ x \in P_H : x \in P_W \} \neq P_H\) formally (i.e., without using set comprehension), in the first-order language of set theory containing only \(\in\) as a predicate symbol.
The second meaning of the statement can be unambiguously written as: \[\forall x ((x \in P_H \land x \in P_W) \to x \not\in P_H)\] which is true if and only if \(\forall x (\lnot (x \in P_H \land x \in P_W))\), i.e., “there is no such thing as a white horse”; it would also hold if \(\forall x (x \not\in P_H)\), i.e., “nothing is a horse”. Both are ungrounded beliefs contradicting our empirical evidence.
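As a sanity check, both readings can be evaluated in a toy finite model. A minimal Python sketch (the domain and its members are hypothetical data, chosen only to exercise the formulas):

```python
# Toy finite model: P_H = the horses, P_W = the white things.
P_H = {"white_horse", "black_horse"}
P_W = {"white_horse", "snow"}
universe = P_H | P_W

# First reading: {x ∈ P_H : x ∈ P_W} ≠ P_H — true here, witnessed by black_horse.
white_horses = {x for x in P_H if x in P_W}
first_reading = white_horses != P_H

# Second reading: ∀x ((x ∈ P_H ∧ x ∈ P_W) → x ∉ P_H) — false here,
# since white_horse makes the antecedent true and the consequent false.
second_reading = all((not (x in P_H and x in P_W)) or (x not in P_H)
                     for x in universe)
```

In this model `first_reading` comes out true and `second_reading` false, matching the informal analysis above.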
Ambiguity in referring expressions. Difficulties in translating from a natural language to a formal language often arise in unintended ways. Consider again, the statement:
A white horse is not a kind of horses.
Does the mentioned “white horse” refer to an arbitrary object that simply holds the properties of being a white horse, or to a very specific white horse that one happens to find to be not a kind of horses? This distinction is significant because in the latter case, we have to formulate the statement using an existential quantification instead of the universal one: \[\exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)\] which gives us a third meaning of the statement “白馬非馬”.
Exercise 8.2. (1) Can it be the case that the sentence \(\exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)\) is true? (2) Why don’t we formulate its meaning as \(\exists x ((x \in P_H \land x \in P_W) \to x \not\in P_H)\)?
Descriptions and names. To eliminate any potential ambiguity when referring to something in a natural language, we first review three kinds of referring expressions: descriptions, proper names and demonstratives.
A referring expression (singular term) picks out a particular object (or some particular objects) that the sentence talks about. The demonstratives are clearly context sensitive, so for now we will consider only descriptions and proper names for the sake of unambiguity.
For indefinite descriptions, consider the following sentence:
A white horse is white.
Based on our preceding discussion, there are two different ways of formulating it:
(If there is a white horse, then it is white.) \[\forall x ((x \in P_H \land x \in P_W) \to x \in P_W)\]
and
(There is a white horse which is white.) \[\exists x ((x \in P_H \land x \in P_W) \land x \in P_W)\]
N.B. The latter (and stronger) formulation was proposed by Bertrand Russell. [1]
For definite descriptions, consider:
The white horse is white.
How do we understand its semantics? Russell proposed the following formulation:
(There is exactly one white horse which is white.) \[\exists! x ((x \in P_H \land x \in P_W) \land x \in P_W)\]
Exercise 8.3. Rewrite the above sentence using the existential quantifier.
Russell’s theory of descriptions can lead to justified denials of some obviously ridiculous statements. For example, the sentence
The invisible pink unicorn is white.
is puzzling, since we have no idea what an “invisible pink unicorn” would be like. Most of us would agree that there is no such thing as an invisible pink unicorn. Can the above statement be true then? By Russell’s formulation of definite descriptions, we say that this is equivalent to
There is exactly one invisible pink unicorn which is white.
Then we can immediately assert that it is not true, given our belief that no invisible pink unicorn really exists.
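Russell’s truth conditions for “The F is G” can be checked mechanically over a toy domain. A Python sketch (my own encoding with hypothetical data; `the`, `is_unicorn` and `is_white` are made-up names):

```python
# Russell's analysis of "The F is G": true iff exactly one object in the
# domain is F, and that unique object is G.
def the(F, G, universe):
    Fs = [x for x in universe if F(x)]
    return len(Fs) == 1 and G(Fs[0])

universe = ["white_horse", "black_horse"]
is_unicorn = lambda x: "unicorn" in x
is_white = lambda x: "white" in x

# "The invisible pink unicorn is white": no unicorn in the domain, so false.
print(the(is_unicorn, is_white, universe))  # False
```

The same helper verifies the earlier example: with “the white horse” as the description, the uniqueness clause is satisfied and the sentence comes out true in this domain.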
As an alternative view of the theory of descriptions, Gottlob Frege suggested that a sentence like the above makes use of a definite description that fails to refer (as “invisible pink unicorn” does not refer to anything), and thus it does not express a proposition and has no truth value. [2]
For proper names, we could say things like
Princess Celestia is white.
It has been disputed, however, whether a descriptivist theory can be applied to proper names. Russell argued that most names have some descriptive meanings, e.g., “Princess Celestia” can be descriptively understood as “The benevolent ruler of Equestria”. Modern causalists such as Saul Kripke criticized the descriptivist theory [3], arguing that a name need not be a uniquely identifying description in order to refer to an object; rather, there must be a causal chain that makes the name a reference. For example, we know that the name “Princess Celestia” and the description “The benevolent ruler of Equestria” refer to the same referent because she defeated King Sombra and took over Equestria, or more realistically, because the storywriter let her be so. However, in an alternate world where King Sombra still rules Equestria, the name “Princess Celestia” would not have the same referent as “The benevolent ruler of Equestria” (in fact, the latter term would refer to nothing), but “Princess Celestia” will always be Princess Celestia; that is, a proper name is a rigid designator of objects, while a definite description may refer to very different things due to varying causality in different worlds. Kripke’s causal theory of reference suggests that a name refers to the same object in every possible world, regardless of any particular facts about it. By contrast, the descriptivist view of a name may fail to capture its referent consistently across all worlds.
Sense and reference. As per the discussion on descriptions and names, it is essential to make a distinction between a sense and a reference (this distinction is attributed to Frege’s 1892 work “Über Sinn und Bedeutung” (“On Sense and Reference”)). A reference denotes a particular object in the world that the term applies to, while a sense denotes the way in which the term presents itself, regardless of what the actual referent is (or whether there is a referent). For example, the description “The invisible pink unicorn” clearly makes sense, but it has no reference in any possible physical world. As a second example, the proper name “Princess Celestia” and the description “The benevolent ruler of Equestria” have different senses (i.e., “Princess Celestia” means Princess Celestia, “The benevolent ruler of Equestria” means the benevolent ruler of Equestria, and these two meanings are never the same); however, they do refer to the same thing in this world.
It is worth noting that the mapping from words or phrases to senses need not be one-to-one, as a word can present different senses in different contexts. For example, the word “structure” has a meaning in mathematical logic, and it also has a meaning in abstract algebra. Although their senses are similar in essence, they apply to different domains of mathematics and should not be taken as identical. They can, nonetheless, share one common reference: when we study a formal language \(\mathcal{A}\) of algebra for the theory of a particular algebraic structure (e.g., the abelian group \((A, \cdot)\)), that algebraic structure is also the structure \(\mathfrak{A}\) that we assign to our language \(\mathcal{A}\).
Reading list
On a first introduction to descriptions, proper names, the Fregean–Russellian descriptivist theory and Kripke’s causal theory, see Wikipedia:
 “Sense and Reference,” Wikipedia, https://en.wikipedia.org/wiki/Sense_and_reference.
 “Definite Description,” Wikipedia, https://en.wikipedia.org/wiki/Definite_description.
 “Proper Name,” Wikipedia, https://en.wikipedia.org/wiki/Proper_name_(philosophy).
 “Descriptivist Theory of Names,” Wikipedia, https://en.wikipedia.org/wiki/Descriptivist_theory_of_names.
 “Causal Theory of Reference,” Wikipedia, https://en.wikipedia.org/wiki/Causal_theory_of_reference.
Further reading on descriptions, names and the issue of reference:
 Peter Ludlow, “Descriptions,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/descriptions/.
 Marga Reimer and Eliot Michaelson, “Reference,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/reference/.
 P. F. Strawson, “On Referring.” [PDF]
On Frege’s innovative distinction between sense and reference:
 Gottlob Frege, “On Sense and Reference.” [PDF] (English translation of “Über Sinn und Bedeutung”)
 Michael Dummett, “Frege’s Distinction Between Sense and Reference.”
Russell’s original essay on his theory of descriptions:
 Bertrand Russell, “On Denoting.” [PDF]
 “On Denoting,” Wikipedia, https://en.wikipedia.org/wiki/On_Denoting.
Kripke’s objections to descriptivist theories of proper names:
 Saul Kripke, “Naming and Necessity.” [PDF]
Use–mention distinction. When trying to understand the semantics of a natural language, it is also critical to make the distinction between using a word (or phrase) and mentioning it. Consider again the following sentence:
The white horse is white.
This sentence is true because a white horse is indeed, white (in the conventional interpretation). Furthermore,
“The white horse” is a definite description.
This sentence is also true, because the phrase “the white horse” is well-defined, with a meaning that refers to the white horse in question. However, it does not make much sense to say:
The white horse is a definite description.
Since a white horse is just a white horse, not a word or phrase with a meaning. And we would also agree that
“The white horse” is white.
is confusing in some way, because the quotation marks suggest that “the white horse” may not carry its intended meaning, i.e., it may appear as a literal mention of the English phrase “the white horse” instead.
It is worth noting that though quotation marks are often an indication of mention while unquoted text is simply used, as in the above example, it is not always the case. Donald Davidson argued that “mixed quotation” exists so that quoted words can be simultaneously used and mentioned. [4]
Quasiquotation. While quotation marks can be used for denoting a mention, a practical issue is that when mentioning something (e.g., in a definition), we sometimes need to use the reference of a part of the mention inside it. Recall that when defining a first-order language in Mst. #7, we made claims like
If \(\psi\) is a well-formed formula, \((\lnot\psi)\) is a well-formed formula.
Note that the mention is implicit here, without any extra metasymbols. Can we just make the definition using quotation marks?
If \(\psi\) is a well-formed formula, “\((\lnot\psi)\)” is a well-formed formula.
It is immediately clear that the above definition does not fit our intended purpose: Let the open term (in our metalanguage) \(\psi\) be the formula (in the object language) “\((\alpha \to \beta)\)”, then what we are saying basically is
If “\((\alpha \to \beta)\)” is a well-formed formula, “\((\lnot\psi)\)” is a well-formed formula.
This sentence does not really make an inductive definition as intended. The problem is that the symbol “\(\psi\)” in “‘\((\lnot\psi)\)’” is not interpolated as what its use is (i.e., “\((\alpha \to \beta)\)”), but only appears as a literal Greek letter in the mention. To resolve this issue, W. V. O. Quine introduced the metasymbols called quasiquotation [5], so that it is technically feasible to make inductive definitions like this:
If \(\psi\) is a well-formed formula, ⌜\((\lnot\psi)\)⌝ is a well-formed formula.
Then let \(\psi\) be “\((\alpha \to \beta)\)”, ⌜\((\lnot\psi)\)⌝ becomes “\((\lnot (\alpha \to \beta))\)”. We have that
If “\((\alpha \to \beta)\)” is a well-formed formula, “\((\lnot (\alpha \to \beta))\)” is a well-formed formula.
just as intended.
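The same use/mention interplay shows up in string interpolation, one of the practical descendants of quasiquotation mentioned earlier. A minimal Python analogy (the variable names and ASCII rendering of the formula are ad hoc; f-strings stand in for Quine’s corner quotes, they are not his device):

```python
# Plain quotation: the letters "psi" appear literally, as in the failed
# definition above -- the variable's value is not used.
psi = "(a -> b)"          # stands for the object-language formula (α → β)
plain = "(~psi)"          # a mere mention of the Greek letter, not the formula

# Quasiquotation: interpolation splices the *value* of psi into the mention,
# which is what the corner quotes achieve in the inductive definition.
quasi = f"(~{psi})"       # -> "(~(a -> b))"
```

Here `plain` stays the literal string `(~psi)`, while `quasi` evaluates to `(~(a -> b))`, the interpolated mention intended by the inductive definition.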
Reading list
On the importance of the use–mention distinction:
 “Usemention Distinction,” Wikipedia, https://en.wikipedia.org/wiki/Use–mention_distinction.
 W. V. O. Quine, Mathematical Logic, chap. 1.4. (Use versus Mention)
 A. W. Moore, “How significant is the use/mention distinction?”
On quotation:
 W. V. O. Quine, Mathematical Logic, chap. 1.5. (Statements about Statements)
 Herman Cappelen and Ernest Lepore, “Quotation,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/quotation/.
 Donald Davidson, “Quotation.”
On the practical device of quasiquotation:
 “Quasiquotation,” Wikipedia, https://en.wikipedia.org/wiki/Quasi-quotation.
 W. V. O. Quine, Mathematical Logic, chap. 1.6. (QuasiQuotation)
The use–mention distinction and the auxiliary device of quasiquotation are philosophically essential not only for the definition of formulas/sentences in formal languages, but also for Alfred Tarski’s semantic theory of truth and models. Deeper debates on the essence of truth often involve the correspondence theory, the deflationary theory and the coherence theory. It is also informative to know how the liar’s paradox and Quine’s paradox can arise from direct or indirect self-referential statements, and what Tarski’s undefinability theorem implies. Michael Dummett’s characterization of the principle of bivalence (in classical logic) as realism, together with his defence of intuitionism (an anti-realism), are also highly relevant for understanding the philosophical basis of modern logic.
Exercise 8.1. By extensionality, \(\{ x \in P_H : x \in P_W \} \neq P_H\) is just \[\lnot \forall x ((x \in P_H \land x \in P_W) \leftrightarrow x \in P_H)\] which is just \[\lnot \forall x (\lnot (((\lnot (x \in P_H \to (\lnot x \in P_W))) \to x \in P_H) \to (\lnot (x \in P_H \to (\lnot (x \in P_H \to (\lnot x \in P_W)))))))\]
Exercise 8.2. (1) Notice that \(\lnot ((x \in P_H \land x \in P_W) \land x \not\in P_H)\) is a tautology, by generalization we get that \(\forall x (\lnot ((x \in P_H \land x \in P_W) \land x \not\in P_H))\). By double negation \[\lnot (\lnot \forall x (\lnot ((x \in P_H \land x \in P_W) \land x \not\in P_H)))\] which is just \[\lnot \exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)\] so \(\exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)\) cannot be satisfied by any interpretation. (or: it is never true.)
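The tautology claim in (1) can be verified by brute force over all valuations. A small Python check (here `A` and `B` are propositional stand-ins for the two membership atoms):

```python
from itertools import product

# A stands for "x ∈ P_H", B for "x ∈ P_W"; the claim is that
# ¬((A ∧ B) ∧ ¬A) is a propositional tautology.
def phi(A, B):
    return not ((A and B) and (not A))

# Enumerate the full truth table: phi is true under every valuation.
assert all(phi(A, B) for A, B in product([True, False], repeat=2))
```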
(2) \(\exists x ((x \in P_H \land x \in P_W) \to x \not\in P_H)\) is equivalent to \[\lnot \forall x (\lnot ((x \in P_H \land x \in P_W) \to x \not\in P_H))\] Assume that there is an \(x_c\) such that \(x_c \not\in P_H\), then \((x_c \in P_H \land x_c \in P_W) \to x_c \not\in P_H\) would be true.
Notice that given \(\Gamma; \forall x (\lnot ((x \in P_H \land x \in P_W) \to x \not\in P_H))\), by substituting \(x\) with \(x_c\) we get \(\lnot ((x_c \in P_H \land x_c \in P_W) \to x_c \not\in P_H)\), but that would be inconsistent with our previous assumption. Thus by Reductio ad Absurdum, \(\Gamma \vdash \lnot \forall x (\lnot ((x \in P_H \land x \in P_W) \to x \not\in P_H))\).
Now, we can easily be convinced that our assumption holds, that is, there exists something that is not a horse. So this formulation would be trivially true!
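The triviality is easy to see in a toy model. A Python sketch (hypothetical data; any non-horse in the domain witnesses the existential):

```python
# Toy model: "snow" is not a horse, so it makes the conditional
# (x ∈ P_H ∧ x ∈ P_W) → x ∉ P_H vacuously true, witnessing the existential.
P_H = {"white_horse"}
P_W = {"white_horse", "snow"}
universe = P_H | P_W

exists = any((not (x in P_H and x in P_W)) or (x not in P_H) for x in universe)
```

`exists` is true purely because the domain contains a non-horse, which is why this formulation fails to capture the intended third meaning.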
Exercise 8.3. \[\exists x (((x \in P_H \land x \in P_W) \land \forall y ((y \in P_H \land y \in P_W) \to y = x)) \land x \in P_W)\]
[1] P. Ludlow, “Descriptions,” in The Stanford Encyclopedia of Philosophy, Fall 2013, E. N. Zalta, Ed. https://plato.stanford.edu/archives/fall2013/entries/descriptions/; Metaphysics Research Lab, Stanford University, 2013.
[2] P. F. Strawson, “On Referring,” Mind, vol. 59, no. 235, pp. 320–344, 1950.
[3] S. A. Kripke, “Naming and Necessity,” in Semantics of Natural Language, Springer, 1972, pp. 253–355.
[4] D. Davidson, “Quotation,” Theory and Decision, vol. 11, no. 1, pp. 27–40, Mar. 1979 [Online]. Available: https://doi.org/10.1007/BF00126690
[5] W. V. Quine, Mathematical Logic. Cambridge: Harvard University Press, 1940.
Remark 7.1. (Language of propositional logic) The language of propositional logic may be seen as a stripped-down form of first-order languages, in which only parentheses, connective symbols and sentential symbols (the only parameters; they may be treated as 0-place predicate symbols) are present. Intuitively, such a language seems too weak to encode our formal reasoning in all kinds of mathematics and many practical areas.
Remark 7.2. (Metatheory and philosophical concerns) A first-order expression, as a finite sequence (also called a tuple), may be defined in terms of ordered pairs in axiomatic set theory. But we will not appeal to set theory in our first definitions of expressions in logic. (So far we have no notion of what a “set” formally is!)
A further concern is whether our definitions of terms and formulas are well established; that is, since we are defining the notions of terms and formulas inductively, would it be possible that there is a certain term or formula that is covered by our recursive definition, but can never actually be built using these operations? To show that first-order terms/formulas are well-defined, a beginner might try to prove these induction principles by mathematical induction on the complexities of terms/formulas, but that would rely on the fact that the set of natural numbers \(\omega\) is well-ordered so that we can apply induction on numbers; to justify things like this, it is essential to use set theory or second-order logic, which we don’t even have until we define a first-order logic. Thus, an unavoidable circularity emerges if we dig to the very foundations.
For now, we must appeal to a metatheory of which we can easily convince ourselves by intuition, so that we will accept these induction principles and the notion of “naïve sets” (or collections, if we don’t want to abuse the formal term “set” too much). Notwithstanding, I believe that a prudent person can bootstrap theories like this without introducing any inconsistency.
Remark 7.3. (Context freedom, unique readability and parentheses) Since the formations of first-order terms and formulas make use of context-free rules, one familiar with formal languages and automata theory might ask, “Are the sets of terms/formulas context-free languages?” Generally they are not, since our set \(V\) of variables (likewise for predicate and function symbols) could be infinitely (or even uncountably) large, but a context-free grammar requires all of its symbol sets to be finite. However, if the symbols of our first-order language \(\mathcal{L}\) are effectively decidable, then there is an algorithm that accepts (or parses) terms and formulas. Furthermore, such parses are guaranteed to be unique, as shown by the Unique Readability Theorems in Enderton, p. 105ff. Indeed, the inclusion of parentheses in our first-order language enables us to write any formula unambiguously. If we leave out all the parentheses, does a formula like \(\forall x P x \to \lnot Q x\) mean \((\forall x P x \to (\lnot Q x))\) or \(\forall x (P x \to (\lnot Q x))\)? An alternative syntax would be to use logical connectives in a prefix manner, e.g., \(\to \forall x P x \lnot Q x\) and \(\forall x \to P x \lnot Q x\), but that is hardly as comprehensible as our chosen syntax.
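To see unique readability concretely, here is a minimal recursive-descent parser of my own (a sketch, not Enderton’s proof) for a fully parenthesized propositional fragment, with `~` for negation and `->` for the conditional. Because every compound formula carries its own parentheses, each well-formed string parses in exactly one way:

```python
# Grammar: formula ::= letter | "(" "~" formula ")" | "(" formula "->" formula ")"
def parse(s):
    tree, rest = _formula(s)
    if rest:
        raise ValueError(f"trailing input: {rest!r}")
    return tree

def _formula(s):
    if not s:
        raise ValueError("unexpected end of input")
    if s[0].isalpha():                     # a sentence symbol
        return s[0], s[1:]
    if s[0] == "(":
        if s[1] == "~":                    # (~ phi)
            phi, rest = _formula(s[2:])
            assert rest[0] == ")"
            return ("~", phi), rest[1:]
        phi, rest = _formula(s[1:])        # (phi -> psi)
        assert rest[:2] == "->"
        psi, rest = _formula(rest[2:])
        assert rest[0] == ")"
        return ("->", phi, psi), rest[1:]
    raise ValueError(f"unexpected character: {s[0]!r}")

print(parse("((~A)->B)"))  # ('->', ('~', 'A'), 'B')
```

The parser never needs to backtrack: at each step the next character dictates which formation rule applies, which is exactly what the parentheses buy us.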
Remark 7.4. (Abbreviations on notation) Why don’t we have the existential quantifier \(\exists\) and some other connectives, such as \(\land\), \(\lor\) and \(\leftrightarrow\), in our language? Because any first-order formula that makes use of these symbols can be seen as a syntactical abbreviation and can be rewritten using \(\forall\), \(\to\) and \(\lnot\), as will be shown. A deeper reason is that \(\{ \to, \lnot \}\) is a functionally complete set of Boolean operators, sufficient to express all possible truth tables in propositional logic. On the other hand, a formula like \(\exists x \varphi\) is just \((\lnot \forall x (\lnot \varphi))\), following from our understanding of what an existential quantification is.
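The functional completeness of \(\{\to, \lnot\}\) can be checked by encoding the other connectives and comparing truth tables. A small Python verification (the encodings are the standard ones, the helper names are mine):

```python
from itertools import product

# Only implication and negation are primitive.
imp = lambda p, q: (not p) or q
neg = lambda p: not p

# Derived connectives, using {->, ~} alone:
and_ = lambda p, q: neg(imp(p, neg(q)))                   # p ∧ q ≡ ¬(p → ¬q)
or_  = lambda p, q: imp(neg(p), q)                        # p ∨ q ≡ ¬p → q
iff  = lambda p, q: neg(imp(imp(p, q), neg(imp(q, p))))   # p ↔ q ≡ ¬((p→q) → ¬(q→p))

# Compare against the intended truth tables over all valuations.
for p, q in product([True, False], repeat=2):
    assert and_(p, q) == (p and q)
    assert or_(p, q) == (p or q)
    assert iff(p, q) == (p == q)
```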
Remark 7.5. (Sentences and truth values) In propositional logic, we don’t generally know whether a formula evaluates to true until every sentential symbol is assigned a truth value. (Sometimes we can tell the truth value with a little less information than what is required, if we apply a so-called short-circuit evaluation strategy: e.g., if \(A_1\) is false then we immediately know \((A_1 \to A_2)\) is true, and if \(A_2\) is true then \((A_1 \to A_2)\) is also true. But this is not the general case, and one should expect to evaluate both \(A_1\) and \(A_2\) before getting the answer.) Similarly, in a first-order logic, every free variable needs a definite assignment in order to give rise to the truth value of a formula. This is done by specifying a function \(s\) (where \(\operatorname{dom} s = V\) is the set of all variables) as the assignment of variables, and when applying \(s\) to a formula \(\varphi\) we get \(\varphi[s]\), which is a sentence that has a definite meaning (i.e., no variable occurs free). Note that the assignment of variables alone is not sufficient to determine the truth of a sentence – for example, \((P x y \to P x f y)\ [s(x \mid 0)(y \mid 0)]\) is a sentence since no variable occurs free in it, but we can’t decide whether it is true because we don’t know what the predicate \(P\) and the function \(f\) are. If we say that \(P\) is the arithmetic “less-than” relation and \(f\) is the successor function \(f : x \mapsto x + 1\), then we can tell that this is a true sentence (in fact \(P x y\) is false as \(0 < 0\) is false, but \(P x f y\) is true as \(0 < 1\) is true, so the overall sentence, as a conditional, is true). We could write \(P\) and \(f\) as \(<\) and \(S\), but the conventional interpretation of these symbols should not be taken for granted, as if every symbol came with an inherent meaning – they don’t, until we give them meanings.
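The worked evaluation in the remark can be replayed directly in code. A minimal Python sketch (the dictionaries and lambda names are ad hoc; they mirror the interpretation of \(P\) as \(<\), \(f\) as successor, and the assignment \(x, y \mapsto 0\)):

```python
# Interpretation: P is "less than", f is the successor function.
P = lambda a, b: a < b
f = lambda a: a + 1

# Assignment of variables: x ↦ 0, y ↦ 0.
s = {"x": 0, "y": 0}

# Evaluate (P x y -> P x f y) under s.
antecedent = P(s["x"], s["y"])          # 0 < 0 is False
consequent = P(s["x"], f(s["y"]))       # 0 < 1 is True
truth = (not antecedent) or consequent  # material conditional: True
```

Swapping in a different interpretation of `P` or `f` can flip `truth`, which is the remark’s point: the symbols have no inherent meaning until the structure supplies one.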
Remark 7.6. (Dichotomy of semantic truth and the liar’s paradox) It should be made clear from the definition that given a structure and an assignment, either \(\models_\mathfrak{A} \varphi[s]\) or \(\not\models_\mathfrak{A} \varphi[s]\) (exclusively), but not both! It follows from our intuition that a statement is either semantically true or false, and there is no third possibility.
A problem arises with self-referential terms, woefully. Assume that we have a first-order language \(\mathcal{L}\) with a 1-place predicate symbol \(P\), and a structure \(\mathfrak{A}\) with domain \(\mathfrak{A} = \text{Formula}(\mathcal{L})\), where \(P\) is interpreted as \(P^\mathfrak{A} = \{ \langle \sigma \rangle \,:\, \models \sigma \}\); that is, \(\sigma \in P^\mathfrak{A}\) iff \(\models \sigma\). Let the formula \(\tau\) be \((\lnot P x)\) and let the assignment \(s : V \to \mathfrak{A}\) map the variable \(x\) to \(\tau\) itself; then is \(\tau[s]\) true or false in \(\mathfrak{A}\)? If we take \(\tau[s]\) as true, that is, \((\lnot P \tau)\) is true, then \(P \tau\) must be false, so \(\tau \not\in P^\mathfrak{A}\), thus \(\not\models \tau\). If we take \(\tau[s]\) as false, that is, \((\lnot P \tau)\) is false, then \(P \tau\) must be true, so \(\tau \in P^\mathfrak{A}\), thus \(\models \tau\). This is the classical liar’s paradox. One possible way to resolve it (given by Alfred Tarski) is to disable impredicativity in our structures; more precisely, one can define a semantic hierarchy of structures that allows us to predicate truth only of a formula at a lower level, but never at the same or a higher level. This matter is far beyond the scope of this summary, but the important lesson here is that it is generally a bad idea to allow something to be both true and false in our semantics; it would render our enduring effort to accumulate all “mathematical truths” void.
Remark 7.7. (Decidability of truth/validity) In propositional logic, it is easy to see that given a truth assignment of sentential symbols, every formula can be decided for its truth or falsehood. Moreover, even without any truth assignment, one can enumerate a truth table to find out whether a given formula is a tautology. Truth and validity are decidable in propositional logic. However, this is often not the case in first-order logic: in order to decide whether a sentence is true, one needs to find the truth values of all prime formulas (i.e., formulas like \(P t_1 \cdots t_n\) and \(\forall v_i \psi\)) first, but the domain \(\mathfrak{A}\) may be an (uncountably) infinite set, which makes it impossible to mechanically check the universal quantification for all members; moreover, the functions used in building terms may not be Turing-computable at all. To decide the validity of a sentence, we have to check its truth in all structures of the language (a collection which may also be uncountably large), and that is an even more hopeless task.^{2}
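The truth-table procedure for propositional logic is easy to sketch. In the snippet below (my own encoding, not from the text), a formula with \(n\) sentential symbols is modeled as a boolean function of \(n\) arguments, and we enumerate all \(2^n\) rows:

```python
from itertools import product

def is_tautology(formula, n):
    """Decide whether `formula` (a function of n boolean arguments)
    is a tautology by enumerating its entire truth table."""
    return all(formula(*row) for row in product((False, True), repeat=n))

imp = lambda p, q: (not p) or q  # material conditional

# (A1 -> A2) is satisfiable but not a tautology:
assert not is_tautology(lambda a1, a2: imp(a1, a2), 2)
# Peirce's law, ((A1 -> A2) -> A1) -> A1, is a tautology:
assert is_tautology(lambda a1, a2: imp(imp(imp(a1, a2), a1), a1), 2)
```

The procedure always terminates, which is exactly why truth and validity are decidable in the propositional case; no analogous finite enumeration exists for first-order structures.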
If semantic truth/validity is generally undecidable, how do we say that some formula is true in a predefined structure? Well, we can’t, in the most general case, since an infinite argument of truth is a useless argument (you can’t present it to someone, or to some Turing machine, as no physical device is capable of handling such an infinite object). Fortunately, there is a feasible way to say something is true without appealing to any specific structures (which may give rise to unwanted undecidability), and that is called a formal deduction (also called a proof, expectedly).
Remark 7.8. (Validity of logical axioms) It should be intuitively clear that all logical axioms are convincing, and that their validity can be argued without appealing to any specific model. In particular, for an axiom \(\theta \in \Lambda\), there is \(\vdash \theta\); we must be able to argue (in our metalanguage) that \(\models \theta\), so that we can be convinced that our deductive system is a sound one. Remember that for any formula \(\varphi\), either \(\models \varphi\) or \(\not\models \varphi\) (which is just \(\models (\lnot \varphi)\)). If a proof of \(\theta\) (not as a formal deduction, but as an argument in our metalanguage) does not even imply \(\models \theta\), that would be very frustrating.
Remark 7.9. (Tautological implication, logical implication and deduction) If \(\Gamma \models_t \varphi\) (i.e., \(\varphi\) is tautologically implied by \(\Gamma\) in propositional logic), we can argue that \(\Gamma \models \varphi\) when replacing sentential symbols by prime formulas in first-order logic. In the special case that \(\Gamma = \emptyset\), we are proving the validity of Axiom Group 1: \(\models_t \alpha \implies \models \alpha\) (every tautology is valid). The converse does not hold, though: we have \(\models (\alpha \to \forall x \alpha)\) when \(x\) does not occur free in \(\alpha\) (by Axiom Group 4), but \(\not\models_t (\alpha \to \forall x \alpha)\), as \(\alpha\) and \(\forall x \alpha\) are two different sentential symbols (surely \((A_1 \to A_2)\) is not a tautology in propositional logic!).
It is worth noticing that even though \(\Gamma \models \varphi \not\implies \Gamma \models_t \varphi\), we do have \(\Gamma \models \varphi \iff \Gamma \cup \Lambda \models_t \varphi\). Intuitively, the set \(\Lambda\) of logical axioms gives us a chance to establish truths about quantifiers and equality (rather than treating these prime formulas as sentential symbols, which are too unrefined for our first-order logic). I haven’t done a proof of this, but I believe it should be nontrivial in both directions. Combining this with Theorem 24B in Enderton (p. 115), we get the nice result that \[\Gamma \vdash \varphi \iff \Gamma \cup \Lambda \models_t \varphi \iff \Gamma \models \varphi\] which entails both the soundness and the completeness theorems. It is basically saying that these three things are equivalent: deducibility from \(\Gamma\), tautological implication by \(\Gamma\) together with the logical axioms, and logical implication by \(\Gamma\).
Remark 7.10. (Soundness and completeness) The soundness and completeness theorems together establish the equivalence of consistency and satisfiability of a set of formulas, and that of validity and provability of a formula. The completeness theorem is by no means an obvious result; the first proof was given by Kurt Gödel in 1930^{3}, but the proof that we use today is due to Leon Henkin in 1949 [1], which generalizes easily to uncountable languages.
Remark 7.11. (Completeness and incompleteness) Note that the completeness theorem should not be confused with Gödel’s incompleteness theorems. Specifically, the completeness theorem claims that (unconditionally) every formula that is logically implied by \(\Gamma\) is also deducible from \(\Gamma\) (i.e., \(\Gamma \models \varphi \implies \Gamma \vdash \varphi\)), while the first incompleteness theorem claims that (under some conditions) some consistent deductive systems are incomplete (i.e., there is some formula \(\varphi\) such that neither \(\Gamma \vdash \varphi\) nor \(\Gamma \vdash (\lnot\varphi)\)). As is clearly seen, the incompleteness theorem is purely syntactical and concerns provability (or decidability, one might say). The aforementioned liar’s paradox, where our semantics raises a contradiction such that neither \(\Gamma \models \varphi\) nor \(\Gamma \models (\lnot\varphi)\) reasonably holds, may be seen as a semantical analogue of the first incompleteness theorem.
The equality symbol \(=\) is a logical symbol, in the sense that the equivalence relation it represents is not open to interpretation and always means what it’s intended to mean (i.e., “the LHS is equal to the RHS”). But if so, how do we say \[1+1=2\] is a true sentence then? Can’t we just interpret the equality symbol as something else in a structure \(\mathfrak{A}\) such that \(\models_\mathfrak{A} (\lnot 1+1=2)\)?
One reason is that in many first-order systems, functions (operations) are defined by axioms using equalities; we need a general way to say “something is defined as…” or just “something is…”, and there would be no better way of saying this than representing it as an equality, so that we won’t have the hassle of interpreting a made-up relation in every model. Consider the language of elementary number theory: in the intended model \(\mathfrak{N} = (\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)\) of Peano arithmetic, the addition function is defined by a set of domain-specific axioms: \[\begin{align*} a + \mathbf{0} &= a &\qquad(1) \\ a + \mathbf{S} b &= \mathbf{S} (a + b) &\qquad(2) \end{align*}\] By Axiom Group 6 we have \(\mathbf{S0} + \mathbf{0} = \mathbf{S0} \to (\mathbf{S0} + \mathbf{S0} = \mathbf{S}(\mathbf{S0} + \mathbf{0}) \to \mathbf{S0} + \mathbf{S0} = \mathbf{S}\mathbf{S0})\). By (1), \(\mathbf{S0} + \mathbf{0} = \mathbf{S0}\). By (2), \(\mathbf{S0} + \mathbf{S0} = \mathbf{S}(\mathbf{S0} + \mathbf{0})\). Applying modus ponens twice, we get \(\mathbf{S0} + \mathbf{S0} = \mathbf{S}\mathbf{S0}\), which is the result we want (sloppily written as \(1+1=2\)).
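The two defining equations for addition can be animated as a tiny recursion. The tuple encoding of numerals below is purely my own illustration (0 as the empty tuple, \(\mathbf{S}n\) as one more layer of nesting):

```python
ZERO = ()  # the numeral 0

def S(n):
    # Successor: wrap the numeral in one more layer.
    return (n,)

def add(a, b):
    if b == ZERO:               # equation (1): a + 0 = a
        return a
    return S(add(a, b[0]))      # equation (2): a + S(b) = S(a + b)

one, two = S(ZERO), S(S(ZERO))
assert add(one, one) == two     # S0 + S0 = SS0, i.e., "1 + 1 = 2"
```

The recursion terminates because each call peels one \(\mathbf{S}\) off the second argument, mirroring how the formal deduction applies axiom (2) until axiom (1) closes it off.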
Equality is a little special, being the most common reflexive relation, its reflexivity justified by Axiom Group 5, i.e., \(x = x\) for any variable \(x\). We could exclude these from our logical axioms, but in many cases we would still need a reflexive relation symbol to denote equivalence (justified by some domain-specific axioms) otherwise. Technically, it is just convenient to treat it as a logical symbol (together with the useful Axiom Groups 5 and 6). Note that though our logical axioms say nothing about other properties like symmetry, antisymmetry and transitivity, these follow easily from Axiom Groups 5 and 6 in our deductive system:
Lemma 7.12. (Symmetry) If \(x = y\), then \(y = x\).
Proof. Given \(x = y\). By Axiom Group 5 we have \(x = x\). By Axiom Group 6 we have \(x = y \to (x = x \to y = x)\). Applying modus ponens twice, we get \(y = x\).
■
Lemma 7.13. (Transitivity) If \(x = y\) and \(y = z\), then \(x = z\).
Proof. Given \(x = y\), by symmetry it holds \(y = x\). Also \(y = z\). By Axiom Group 6 we have \(y = x \to (y = z \to x = z)\). Applying modus ponens twice, we get \(x = z\).
■
Lemma 7.14. (Antisymmetry) If \(x = y\) and \(y = x\), then \(x = y\).
Proof. Trivial.
Note that if a partial order is definable on a structure, the equality symbol may be dispensable in our language; e.g., consider the language of set theory, where the equality of two sets \(x = y\) may be defined as \[(\forall v (v \in x \to v \in y) \land \forall v (v \in y \to v \in x))\]
Consider again the language of elementary number theory, in the intended model \(\mathfrak{N} = (\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)\) of Peano arithmetic, we have the set of all true sentences \(\text{Th}\mathfrak{N}\).^{4} Now we add a new constant symbol \(c'\) to our language, and extend our theory with a countable set of sentences \(\text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}\) (Here we define \(x < y\) as \(\lnot\forall z ((\lnot z = \mathbf{0}) \to (\lnot x + z = y))\)). Is there still a model for this extended theory?
For any given \(n \in \mathbb{N}\), the sentence \(\underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c'\) is clearly satisfiable (by simply interpreting \(c'\) as \(n + 1\)). Then it is easily shown that every finite subset \(\Gamma_0 \subseteq \text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}\) is satisfiable (interpret \(c'\) as a number larger than any mentioned in \(\Gamma_0\)). By the compactness theorem (b.), \(\text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}\) is also satisfiable. This means that we can construct a nonstandard model of arithmetic \(\mathfrak{N}'\) with an additional bizarre element (specifically \(c'_0\)) that turns out to be greater than any standard number (thus the model of this theory is not isomorphic to our standard model \(\mathfrak{N}\)).
Recall that in Peano arithmetic as modeled by \(\mathfrak{N}'\), numbers are closed under the successor function \(\mathbf{S}\). More precisely, if \(k \in \mathfrak{N}'\), then \(\mathbf{S}k \in \mathfrak{N}'\) and \(\mathbf{S}k \neq \mathbf{0}\). This implies that not only \(c'_0 \in \mathfrak{N}'\), but also \(\mathbf{S} c'_0, \mathbf{S}\mathbf{S} c'_0, \mathbf{S}\mathbf{S}\mathbf{S} c'_0, \dots\) are all nonstandard numbers in \(\mathfrak{N}'\). As none of these numbers is equal to \(\mathbf{0}\) (by one of the Peano axioms), they form an infinite “chain” separate from our familiar standard ones. One can write down all the (standard and nonstandard) numbers sloppily as the following sequence: \[\langle 0, 1, 2, \dots, \quad c'_0, c'_1, c'_2, \dots \rangle\] where \(0\) is just \(\mathbf{0}\), \(1\) is \(\mathbf{S0}\), \(2\) is \(\mathbf{SS0}\), \(c'_1\) is \(\mathbf{S} c'_0\), \(c'_2\) is \(\mathbf{SS} c'_0\), etc.
Clearly, every number but \(0\) and \(c'_0\) in the sequence has a unique predecessor. There is certainly no such predecessor \(j\) of \(0\), because otherwise we would have \(\mathbf{S}j = \mathbf{0}\), contradicting our axioms. But can we have a predecessor of \(c'_0\)? There is no axiom preventing us from constructing that thing! So here we go, enlarging our collection of numbers to: \[\langle 0, 1, 2, \dots, \quad \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots \rangle\] where for each \(c'_{i}\), \(c'_{i+1} = \mathbf{S} c'_{i}\). Again, we know that every such nonstandard number \(c'_i\) is greater than any standard number \(n\) (otherwise we could find a standard number \(n - i\) such that \((\lnot\, n - i < c'_0)\), contradicting our initial construction of \(c'_0\) by compactness). So the nonstandard part is still a separate chain, as written above.
We can go even further. Let \(\mathfrak{N}'\) be this set of standard and nonstandard numbers, and let \(\mathfrak{N}' = (\mathfrak{N}'; \mathbf{0}, \mathbf{S}, +, \cdot)\) still be the intended model of Peano arithmetic on \(\mathfrak{N}'\). Consider adding yet another constant symbol \(c''\). Is \(\text{Th}\mathfrak{N}' \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n'\ \text{times}} \mathbf{0} < c'' \,:\, n' \in \mathfrak{N}' \}\) satisfiable? By the same reasoning as before, we get a model \(\mathfrak{N}''\), with its domain of numbers \[\langle 0, 1, 2, \dots, \quad \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots, \quad \dots, c''_{-2}, c''_{-1}, c''_0, c''_1, c''_2, \dots \rangle\]
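The order type just described can be mimicked with a toy encoding (my own, for illustration only): a number is a pair (stage, offset) compared lexicographically, with stage 0 holding the standard naturals and each later stage a copy of the integers.

```python
def S(n):
    # Successor: move one step right within the same stage.
    stage, k = n
    return (stage, k + 1)

zero = (0, 0)   # the standard 0
c1_0 = (1, 0)   # c'_0, the first nonstandard element
c2_0 = (2, 0)   # c''_0, one stage further out

# Every nonstandard number beats every standard number (lexicographic order):
assert all((0, n) < c1_0 for n in range(1000))
# c''_0 beats the entire first nonstandard chain:
assert all((1, k) < c2_0 for k in range(-1000, 1000))
# Unlike 0, a nonstandard number has a predecessor within the model:
assert S((1, -1)) == c1_0
```

This is only a picture of the ordering, of course; an actual nonstandard model must also interpret \(+\) and \(\cdot\) so that all of \(\text{Th}\mathfrak{N}\) holds, which is far subtler.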
Counterintuitively, this is not the kind of “arithmetic” we are used to. What we are trying to do is to formulate the arithmetic of the standard natural numbers that we use every day (i.e., \(0, 1, 2, \dots\)) in first-order logic, but these nonstandard numbers come out of nowhere, and there are whole infinite “stages” of them, such that each number in a later stage is greater than every number in an earlier stage (how is that even possible?). What is worse, in the following subset of the nonstandard model \(\mathfrak{N}'\): \[\{ \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots \}\] there is no minimal element with respect to the intended ordering relation \(<\); thus the set is not well-ordered by \(<\), so our good old mathematical induction will no longer work with nonstandard numbers.
Well, the lucky part is that we can at least have induction as a first-order axiom schema, with one instance \[(\varphi(\mathbf{0}, y_1, \dots, y_k) \land \forall x\, (\varphi (x, y_1, \dots, y_k) \to \varphi (\mathbf{S}x, y_1, \dots, y_k))) \to \forall x\, \varphi(x, y_1, \dots, y_k)\] for each formula \(\varphi\). Since the standard model \(\mathfrak{N}\) and the nonstandard model \(\mathfrak{N}'\) are elementarily equivalent (i.e., they satisfy the same sentences, in a language that cannot name the nonstandard numbers), we still enjoy the nice induction principle for all the standard natural numbers. But for the nonstandard part, we’re out of luck.
Ideally, we would like to have a bunch of axioms that perfectly pins down the model of arithmetic, where no nonstandard part is allowed to exist, i.e., the set of numbers is well-ordered by a definable ordering relation \(<\) so that we can apply the induction principle to all of them.^{5} Unfortunately, this is infeasible in first-order logic (without formulating the notion of sets). We will demonstrate two potential resolutions.
The intuition here is that in order to rule out any nonstandard chains of numbers, we must find a 1-place predicate \(P \subseteq \mathfrak{N}\) such that \(P n\) holds for every standard number \(n\), distinguishing it from any nonstandard \(n'\), for which \((\lnot P n')\). Certainly \(\mathbf{0}\) is a standard number, and every standard number \(x\) is followed by \(\mathbf{S}x\), which is also a standard one: \[P \mathbf{0} \land \forall x (P x \to P \mathbf{S} x)\] Once we have this \(P\) in mind, we say that every number \(n\) satisfies \(P n\), so that every number is standard and there are no other numbers in our model: \[\forall n\, P n\] Notice that \(P\) is not in our language; it is something we have yet to construct for our standard model. How do we deal with this issue?
Books:
Herbert B. Enderton, A Mathematical Introduction to Logic, 2nd ed.
Kenneth Kunen, The Foundations of Mathematics.
Articles:
Terence Tao, “The completeness and compactness theorems of first-order logic,” https://terrytao.wordpress.com/2009/04/10/the-completeness-and-compactness-theorems-of-first-order-logic/.
Asger Törnquist, “The completeness theorem: a guided tour,” http://www.math.ku.dk/~asgert/teachingnotes/imlcompletenessguide.pdf.
Eliezer Yudkowsky, “Godel’s completeness and incompleteness theorems,” http://lesswrong.com/lw/g1y/godels_completeness_and_incompleteness_theorems/.
Eliezer Yudkowsky, “Standard and nonstandard numbers,” http://lesswrong.com/lw/g0i/standard_and_nonstandard_numbers/.
David A. Ross, “Fun with nonstandard models,” http://www.math.hawaii.edu/~ross/fun_with_nsmodels.pdf.
Papers:
[1] L. Henkin, “The completeness of the first-order functional calculus,” The Journal of Symbolic Logic, vol. 14, no. 3, pp. 159–166, 1949.
Is there a constructive approach as a replacement of Henkin’s construction? https://math.stackexchange.com/questions/1462408/is-there-a-constructive-approach-as-a-replacement-of-henkins-construction↩
We are using the notion of decidability/undecidability here even before we get to Gödel’s incompleteness theorem, but hopefully it’s no stranger to us, as computability theory has all the model-specific issues (though nonlogical) covered.↩
Gödel’s original proof of the completeness theorem: https://en.wikipedia.org/wiki/Original_proof_of_G%C3%B6del%27s_completeness_theorem↩
It might be interesting to know that \(\text{Th}(\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)\) is an undecidable theory, as shown by the aforementioned incompleteness theorem.↩
If we accept the Axiom of Choice, then every set can have a well-ordering; so this is actually a reasonable request.↩
Many logicians (Kurt Gödel, Thoralf Skolem, Willard Van Orman Quine) seem to adhere to first-order logic. And that’s why.↩
libgjs.so.0.0.0, leaving me an unresponsive desktop:
kernel: gnome-shell[963]: segfault at 7f06645fffe8 ip 00007f06d94898dd
sp 00007fff562ee7e0 error 4 in libgjs.so.0.0.0[7f06d9
On crashing, GNOME Shell does not restart itself correctly, nor are the gnome-shell processes actually terminated, for some unknown reason. When this happens Mutter is completely frozen and Alt+F2 is unavailable; what’s even worse, Ctrl+Alt+F3 can’t get you a life-saving TTY.
Clearly, the only way one can access the box then is via SSH. But what to do next? While it is possible to simply restart gdm like so (for systemd users):

$ systemctl restart gdm

this would, however, destroy the whole X session, and all your open GUI processes would be lost! A more desirable, elegant approach is to restart gnome-shell alone, but this requires a few tricks in case you’re working through ssh and killall -9 gnome-shell doesn’t restart GNOME Shell properly.
Enable X11 forwarding in /etc/ssh/sshd_config, if you’ve never done it before (restart sshd on the remote machine after that):

X11Forwarding yes
Log in to the remote machine with X11 forwarding enabled:

$ ssh -X yourUser@yourHost
Kill all remaining gnome-shell processes, if any:

$ pkill gnome-shell
Set the DISPLAY environment variable, otherwise you won’t be able to run a WM from a remote shell:

$ export DISPLAY=:0
Finally, restart GNOME Shell detached from the SSH session:

$ setsid gnome-shell
Related Gjs bug(s): the bug has been there since gjs 1.48 and has been reported multiple times: #781799, #783935, #785657. Still, a recent gjs 1.49 build crashes my desktop :( It seems rather hard to find out what would trigger the issue, but no more worries when it goes bump in the night. Just get the WM back, and no running window is lost.
\[\text{FL} \subsetneq \text{RL} \subsetneq \text{DCFL} \subsetneq \text{CFL} \subsetneq \text{CSL} \subsetneq \text{R} \subsetneq \text{U} \subsetneq \mathcal{P}(\Sigma^*) \]
Grammars provide a mathematical, recursive perspective for describing languages, while automata provide a physically implementable, iterative approach to recognizing languages (via finite states and possibly infinite stacks). A language is Turing-recognizable if and only if it has a well-defined (unrestricted) grammar, so that it is possible to construct some Turing machine that recognizes it, although such a grammar can sometimes be hard to formulate. Quite naturally, if we cannot specify a grammar for a language \(L \subseteq \Sigma^*\) theoretically, it would be impossible to construct a Turing machine that recognizes it.
We consider mainly three types of decision problems concerning different computational models (generative grammars may be viewed as a form of machines, like automata, in the following setting):
| Model | Language | Acceptance problem | Emptiness problem | Equality problem |
| --- | --- | --- | --- | --- |
| DFA (deterministic finite automaton) | Regular language: decidable | \(A_\textsf{DFA}\): decidable | \(E_\textsf{DFA}\): decidable | \(EQ_\textsf{DFA}\): decidable |
| NFA (nondeterministic finite automaton) | Regular language: decidable | \(A_\textsf{NFA}\): decidable | \(E_\textsf{NFA}\): decidable | \(EQ_\textsf{NFA}\): decidable |
| Regular expression | Regular language: decidable | \(A_\textsf{REX}\): decidable | \(E_\textsf{REX}\): decidable | \(EQ_\textsf{REX}\): decidable |
| CFG (context-free grammar) | Context-free language: decidable | \(A_\textsf{CFG}\): decidable | \(E_\textsf{CFG}\): decidable | \(EQ_\textsf{CFG}\): undecidable |
| PDA (pushdown automaton) | Context-free language: decidable | \(A_\textsf{PDA}\): decidable | \(E_\textsf{PDA}\): decidable | \(EQ_\textsf{PDA}\): undecidable |
| CSG (context-sensitive grammar) | Context-sensitive language: decidable | \(A_\textsf{CSG}\): decidable | \(E_\textsf{CSG}\): undecidable | \(EQ_\textsf{CSG}\): undecidable |
| LBA (linear bounded automaton) | Context-sensitive language: decidable | \(A_\textsf{LBA}\): decidable | \(E_\textsf{LBA}\): undecidable | \(EQ_\textsf{LBA}\): undecidable |
| Turing machine and equivalent Turing-complete models | Turing-recognizable language: may be decidable or undecidable | \(A_\textsf{TM}\): undecidable | \(E_\textsf{TM}\): undecidable | \(EQ_\textsf{TM}\): undecidable (not Turing-recognizable) |
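The decidability of \(A_\textsf{DFA}\) in the table above is immediate: simulating a DFA on an input \(w\) takes exactly \(|w|\) transitions and always halts. A minimal sketch (the example machine is my own, purely for illustration):

```python
def dfa_accepts(delta, start, accepting, w):
    # Run the DFA: one transition per input symbol, then check the state.
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# Example DFA over {0, 1} accepting strings with an even number of 1s:
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
assert dfa_accepts(delta, "even", {"even"}, "1001")
assert not dfa_accepts(delta, "even", {"even"}, "10")
```

No such always-halting simulation exists for a Turing machine on an arbitrary input, which is where \(A_\textsf{TM}\) parts ways with the decidable rows of the table.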
The fact that \(A_\mathsf{TM}\) is undecidable implies that a Turing machine may not halt on some input \(w\). Consider the following problem (the halting problem [1]): \[\textit{HALT}_\textsf{TM} = \{ \langle M, w \rangle \mid M \text{ is a TM, and } M \text{ halts on input } w \}\]
Corollary 6.1. \(\textit{HALT}_\textsf{TM}\) is undecidable.
Proof. (By contraposition) Assume that there exists a Turing machine \(R\) that decides \(\textit{HALT}_\textsf{TM}\). We construct a Turing machine \(S\) that decides \(A_\mathsf{TM}\), using \(R\):
\(S =\) “On input \(\langle M, w \rangle\):
 Run \(R\) on input \(\langle M, w \rangle\).
 If \(R\) rejects, reject.
 If \(R\) accepts (so that we know \(M\) halts on \(w\)), simulate \(M\) on \(w\) until it halts.
 If \(M\) accepts, accept; otherwise, reject.”
If \(R\) decides \(\textit{HALT}_\textsf{TM}\), then \(S\) decides \(A_\mathsf{TM}\) by construction. Since \(A_\mathsf{TM}\) is undecidable, such an \(R\) that decides \(\textit{HALT}_\textsf{TM}\) does not exist.
■
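The same diagonal idea behind this family of proofs can be played out concretely. The sketch below (my own illustration, not from the text) builds the “contrarian” program \(D\) from a purported halting decider and shows that any fixed verdict the decider gives on \(D\) is wrong:

```python
def make_D(halts):
    # D does the opposite of whatever the decider predicts about D itself.
    def D():
        if halts(D):
            while True:   # decider said "halts", so loop forever
                pass
        return            # decider said "loops", so halt at once
    return D

for verdict in (True, False):
    D = make_D(lambda prog: verdict)   # a decider answering `verdict` on D
    if verdict:
        actually_halts = False         # D would enter its infinite loop
    else:
        D()                            # safe to run: D returns immediately
        actually_halts = True
    assert actually_halts != verdict   # the decider is wrong either way
```

Since every candidate decider must give some verdict on its own contrarian program, no total, always-correct `halts` can exist.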
Apart from Turing’s original halting problem, other undecidable problems of historical importance include the Post correspondence problem [2], the word problem for semigroups (Thue’s problem) [3], and the busy beaver problem [4].
Some problems, such as \(EQ_\mathsf{TM}\), are not even Turing-recognizable. Another example is \(MIN_\mathsf{TM}\), where a Turing machine is minimal if no machine with a shorter description recognizes the same language: \[MIN_\mathsf{TM} = \{ \langle M \rangle \mid M \text{ is a minimal Turing machine} \}\]
Theorem 6.2. \(MIN_\mathsf{TM}\) is not Turingrecognizable.
Proof. Assume that there exists an enumerator \(E\) that enumerates \(MIN_\mathsf{TM}\) (note that an enumerator is equivalent to a Turing machine). We construct a Turing machine \(C\) using \(E\):
\(C =\) “On input \(w\):
 Obtain its own description \(\langle C \rangle\). (by the recursion theorem)
 Run \(E\) until a machine \(D\) appears with a longer description than that of \(C\).
 Simulate \(D\) on \(w\).”
Since \(MIN_\textsf{TM}\) is an infinite set but the description of \(C\) is of finite length, the enumerator \(E\) must eventually print some machine \(D\) whose description is longer than that of \(C\). As \(C\) simulates \(D\) in the last step, \(C\) is equivalent to \(D\). But then the description of \(C\) is shorter than that of \(D\), so \(D\) could not be an output of \(E\) (which enumerates only \(MIN_\mathsf{TM}\)). That is a contradiction. Therefore such an enumerator \(E\) for \(MIN_\mathsf{TM}\) does not exist; that is, \(MIN_\mathsf{TM}\) is not Turing-recognizable.
■
The Turing-unrecognizability of \(MIN_\textsf{TM}\) implies that the Kolmogorov complexity (descriptive complexity) \(K(x)\) is not a computable function.
Per the Church–Turing thesis, a Turing machine (one finite automaton plus one infinite linear tape) defines the “limit of computation”. Several hypothetical models, sometimes referred to as hypercomputation [5] since they are assumed to have the ability to solve non-Turing-computable problems, have been proposed for the sake of theoretical interest, among them real-number computers and Zeno machines.
Note that standard (qubit-based) quantum computers are PSPACE-reducible, thus still a Turing-complete model of computation (i.e., quantum computers are not real-number computers or Zeno machines, and they do not break the Church–Turing thesis). Moreover, it has been proposed that the simulation of any (quantum) physical process in which only computable reals are present is actually a Turing-computable problem (the Church–Turing–Deutsch principle).
Finite languages. A finite language consists of a finite set of strings, so it may be written as a regular expression: a finite union of those strings. It may be recognized by a time-independent combinational circuit, which can be viewed as a special form of acyclic finite automaton. (See also the blog post: What I Wish I Knew When Learning Boolean Circuits.) There are several unsolved problems concerning circuit complexity.
Generalized regular expression. A generalized regular expression is defined as \[R ::= a \mid \varepsilon \mid \emptyset \mid (R_1 \cup R_2) \mid (R_1 \cap R_2) \mid (R_1 \circ R_2) \mid (\lnot R_1) \mid (R_1^*)\] where \(a \in \Sigma\), and \(R_1\) and \(R_2\) are generalized regular expressions. Although it comes with two extra operators (intersection \(\cap\) and complementation \(\lnot\)), it has exactly the same representational power as ordinary regular expressions, due to the fact that the class of regular languages is closed under intersection and complementation.
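The closure under complementation that makes the extra operators redundant is easy to witness on DFAs: flipping the accepting states of a total DFA yields a machine for exactly the complement language. A small sketch (the example machine is my own):

```python
def run(dfa, w):
    # Simulate a (total) DFA on the string w.
    states, delta, start, accepting = dfa
    state = start
    for a in w:
        state = delta[(state, a)]
    return state in accepting

def complement(dfa):
    # Same states and transitions; accept exactly where the original rejects.
    states, delta, start, accepting = dfa
    return (states, delta, start, states - accepting)

# DFA over {a} accepting the strings of even length:
even = ({0, 1}, {(0, "a"): 1, (1, "a"): 0}, 0, {0})
odd = complement(even)
for n in range(8):
    w = "a" * n
    assert run(even, w) != run(odd, w)   # exactly one machine accepts w
```

Closure under intersection works similarly via the product construction on two DFAs, which is why \(\cap\) and \(\lnot\) add convenience but no expressive power.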
Star-free languages. A star-free language is a regular language that can be described by a generalized regular expression without the Kleene star operation. Clearly, this class includes all finite languages. One example of an infinite star-free language is \(a^*\) (over the alphabet \(\{a, b\}\)), because \(a^* = \lnot((\lnot\emptyset) \circ b \circ (\lnot\emptyset))\), i.e., the set of strings containing no \(b\). On the other hand, \((a \circ a)^*\) is not star-free.
(Unsolved) Generalized star height problem. Can all regular languages be expressed using generalized regular expressions with a limited nesting depth of Kleene stars (star height)? Moreover, is the minimum required star height a computable function? For example, \(((((a \circ a)^*)^*)\cdots)^*\) can be expressed as \((a \circ a)^*\), which has a minimum required star height of 1 (but it is not star-free). The general answer, and whether the minimum required star height is computable, remain unknown.
Linear bounded automata. LBAs provide an accurate model for real-world computers, which have only bounded memory. Given sufficient memory on an LBA, we can simulate another (smaller) LBA; a practical example would be a virtual machine running on VirtualBox or QEMU.
(Unsolved) LBA problem. Is the class of languages accepted by LBA equal to the class of languages accepted by deterministic LBA? Or, in terms of the complexity theory, is \(\text{SPACE}(n) = \text{NSPACE}(n)\)? We know that DFA and NFA accept the same class of languages, so do TM and NTM, while PDA and DPDA are not completely equivalent. However, it is still an open question whether LBA and DLBA are equivalent models. Note that it is known that \(\text{PSPACE} = \bigcup_k \text{SPACE}(n^k)\) is equivalent to \(\text{NPSPACE} = \bigcup_k \text{NSPACE}(n^k)\), that is, deterministic polynomially bounded automata and nondeterministic polynomially bounded automata are equivalent, by Savitch’s theorem.
Books:
M. Sipser, Introduction to the Theory of Computation, 3rd ed.
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd ed.
Articles:
M. Davis, “What is a computation?” Mathematics Today: Twelve Informal Essays, pp. 241–267, 1978.
D. Zeilberger, “A 2minute proof of the 2ndmost important theorem of the 2nd millennium,” http://www.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/halt.
Papers:
[1] A. M. Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Proceedings of the London Mathematical Society, vol. 2, no. 1, pp. 230–265, 1937.
[2] E. L. Post, “A variant of a recursively unsolvable problem,” Bulletin of the American Mathematical Society, vol. 52, no. 4, pp. 264–268, 1946.
[3] E. L. Post, “Recursive unsolvability of a problem of Thue,” The Journal of Symbolic Logic, vol. 12, no. 1, pp. 1–11, 1947.
[4] T. Rado, “On noncomputable functions,” Bell System Technical Journal, vol. 41, no. 3, pp. 877–884, 1962.
[5] M. Davis, “The myth of hypercomputation,” in Alan Turing: Life and legacy of a great thinker, Springer, 2004, pp. 195–211.
Running pacman -Syu (daring any potential issues due to Arch’s infamous instability) earlier today, weird things did happen: something other than pacman was eating up my CPU. I assumed it was Chromium in the first place, but it wasn’t this time. It turned out to be gnome-settings-daemon, together with some other GUI processes running on GNOME, that were consistently hogging my CPU in turn, and together they made up 100% (!). As I killed the seemingly problematic process, something else would insist on emerging at the top of top with similarly excessive CPU usage. I decided that this was an unusual situation and, without much further thought, forced a reboot while pacman was still running. Then I ended up with a corrupted system: the kernel didn’t load during the early boot process, complaining about a missing modules.devname and that /dev/sda11 (which is the root partition) could not be found:
Warning: /lib/modules/4.10.8-1-ARCH/modules.devname not found - ignoring
starting version 232
Waiting 10 seconds for device /dev/sda11 ...
ERROR: device '/dev/sda11' not found. Skipping fsck.
mount: you must specify the filesystem type
You are now being dropped into an emergency shell.
Without proper modules loaded, even the keyboard was not working, so I couldn’t do anything in the emergency shell. But I wasn’t completely out of luck: I have a dual-boot of Arch and Ubuntu, prepared just for recovery purposes like this (otherwise I’d have had to make room for another live USB, which would be a little inconvenient; I haven’t used such rescueware in years).
Recovering an (unbootable) Arch system from another distro: this is mostly relevant if you can’t boot into the system (due to a corrupted kernel upgrade); otherwise it’s possible to just log in from a TTY (locally or via SSH) and perform the fix (e.g., in case it’s just X, the display manager or the WM that fails to load but the kernel boots well).
Mount the root partition of the broken system (here /dev/sda11) and chroot into it:

# TARGETDEV="/dev/sda11"
# TARGETDIR="/mnt/arch"
# mount $TARGETDEV $TARGETDIR
# mount -t proc /proc $TARGETDIR/proc
# mount --rbind /sys $TARGETDIR/sys
# mount --rbind /dev $TARGETDIR/dev
# cp /etc/hosts $TARGETDIR/etc
# cp /etc/resolv.conf $TARGETDIR/etc
# chroot $TARGETDIR rm /etc/mtab 2> /dev/null
# chroot $TARGETDIR ln -s /proc/mounts /etc/mtab
# chroot $TARGETDIR
Alternatively, use the arch-chroot script from an Arch bootstrap image:

# arch-chroot $TARGETDIR
Now redo the pacman upgrade, cleaning up any lock file first if needed:

[/mnt/arch]# pacman -Syu

Or, if there’s any unwanted package upgrade that caused the system to fail, downgrade it.
The important thing to remember is that, when continuing, a previously failed transaction will not run its hooks anymore. Therefore it might be wiser to find out which exact packages pacman was trying to upgrade in the last transaction (from /var/log/pacman.log) and reinstall all of them, rather than running a complementary pacman -Syu. But if it’s known which package is causing the problem (and in this very common case, it’s linux), simply run the following to regenerate the initramfs for the new kernel:

[/mnt/arch]# mkinitcpio -p linux
And we are done. A few commands and our Arch system is now back and booting.
Why the mess? Initramfs images need to be regenerated (from a corresponding mkinitcpio preset file) after each kernel update. Starting from Arch’s package linux 4.8.8-2, the call to mkinitcpio is specified in a PostTransaction alpm hook, in place of an explicit post_install() command invoked from the .install script as before. (The move towards alpm hooks is said to be a fix for #51818, as two or more packages in one transaction may need the regeneration of initramfs images.)
[Trigger]
Type = File
Operation = Install
Operation = Upgrade
Target = boot/vmlinuz-%PKGBASE%
Target = usr/lib/initcpio/*
[Action]
Description = Updating %PKGBASE% initcpios
When = PostTransaction
Exec = /usr/bin/mkinitcpio -p %PKGBASE%
The faulty part about PostTransaction alpm hooks is that they run in a post-transaction manner; in the context of Arch Linux, a “transaction” means a complete run of pacman, regardless of how many packages it installs or upgrades. Such PostTransaction hooks will not run until all preceding operations succeed. Quoted from the CAVEATS section of the alpm-hooks(5) manual page:
“PostTransaction hooks will not run if the transaction fails to complete for any reason.”
This (kind of) misbehavior could lead to unintended aftermath in a failed system upgrade, as it does in the case of the linux package. Confusingly, a pacman “transaction” can be an arbitrary set of unrelated package updates; it could possibly fail to complete at any point (e.g., .install script errors, the process accidentally killed, or even power lost), and ideally, it should remain safe when encountering such failures before commit. However, since the generation of the initramfs in the PostTransaction hook is actually an indispensable step for the newly upgraded linux package here (whether other packages complete their upgrades or not), it is not safe if a pacman “transaction” fails halfway: the PostTransaction hook won’t run, and all you get is a halfway system that doesn’t boot (so you’ll need to fix that through a fallback or alternative kernel). To see this problem more clearly, think of the following two scenarios:
A successful system upgrade:

pacman -Syu
  sanity check (for packages to upgrade)
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  start upgrading package A
  finish upgrading package A
  start upgrading package B
  finish upgrading package B
  complete transaction
  run post-transaction hook M
  done.

[packages L, A, B are upgraded; hook M is run; working system]
An (ungracefully) failing system upgrade:

pacman -Syu
  sanity check (for packages to upgrade)
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  start upgrading package A
  finish upgrading package A
  start upgrading package B
  crash!

[packages L, A are upgraded; hook M is not run; broken system]
How to perform a system upgrade safely? Although it ultimately depends on how we define a “safe system upgrade” and how much we want it, some robustness may be achieved at least. Clearly, the upgrade of package L and the run of hook M should be treated as one transaction; otherwise, treating M in a post-transaction way puts us at the risk that if some other package fails to upgrade (or the computer just crashes), we end up with only L upgraded but M not run (while it is desirable to have both or none).
A (gracefully) failing system upgrade:

  sanity check (for packages to upgrade)
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  run hook M
  complete transaction
  start upgrading package A
  finish upgrading package A
  start upgrading package B
  crash!

[packages L, A are upgraded; hook M is run; working system]
This is exactly the way things worked before (using the old-fashioned post_install()): It’s less fragile under possible failures, although in order to fix #51818 (another package L2 also requires hook M to run), we’ll need to manage the ordering of package upgrades properly:
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  start upgrading package L2 (requires hook M to run)
  finish upgrading package L2
  run hook M
  complete transaction
  start upgrading package A
  ...
That is, L, L2 and M are one transaction. It is not allowed to have only L and L2, or only L, without a run of hook M. Other unrelated packages (A, B, …) are prevented from causing such issues in the above scheme. There is still a possibility that the process crashes during the phase of L, L2 or M, but the chance would be much smaller, because we are working with a transaction consisting only of the strongly coupled upgrades of L, L2 and M, not a whole pacman “transaction” consisting of easily hundreds of arbitrary system upgrades and post-transaction hooks. Even better, in case of such failures, we can redo uncommitted transactions on the next run, so we don’t have to rerun hooks manually:
  redo uncommitted transactions (upgrades & hooks)
  start upgrading package B
  ...
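No existing package manager is claimed to work this way; the redo scheme above can be sketched in Python with a write-ahead journal. The `install`/`run_hook` functions and the journal path are hypothetical stand-ins for real package operations:

```python
import json, os

JOURNAL = "/tmp/upgrade-journal.json"  # hypothetical journal location

installed, hooks_run = [], []           # stand-ins for real package ops
def install(pkg): installed.append(pkg)
def run_hook(hook): hooks_run.append(hook)

def run_transaction(packages, hooks, journal=JOURNAL):
    """Upgrade `packages` and run their shared `hooks` as one unit."""
    # Write-ahead: record the intent before touching anything, so that a
    # crash at any later point leaves evidence of an uncommitted transaction.
    with open(journal, "w") as f:
        json.dump({"packages": packages, "hooks": hooks, "done": False}, f)
    for pkg in packages:
        install(pkg)        # e.g., L, then L2
    for hook in hooks:
        run_hook(hook)      # e.g., M -- runs *inside* the transaction
    # Commit: only now is the transaction marked complete.
    with open(journal, "w") as f:
        json.dump({"packages": packages, "hooks": hooks, "done": True}, f)

def redo_uncommitted(journal=JOURNAL):
    """On startup, redo any transaction that never committed."""
    if os.path.exists(journal):
        with open(journal) as f:
            txn = json.load(f)
        if not txn["done"]:
            run_transaction(txn["packages"], txn["hooks"], journal)
```

A real implementation would also need the operations themselves to be idempotent (so a redo after a mid-transaction crash is harmless), which is the hard part this sketch glosses over.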
I don’t know if there’s any software management tool implementing this fully recovering approach. Transactions need to be more fine-grained (pacman’s “everything is one transaction” is really a naïve semantics to have) and the ordering of upgrades is essential, in order to give the best robustness against possible crashes during a system upgrade, so it might not be as easy to implement as most package managers are (although still far less sophisticated than a real-world database). Or, with reasonable effort, if we cannot guarantee anything for a transaction, maybe a Windows-inspired “user-friendly” warning message like “Don’t turn off your computer” is good enough. (At least you’ll know when you’re breaking shit: a transaction is not completed yet, some necessary hooks are not run, and you’re on your own then.)
[1] ArchWiki, “Install from existing Linux”. https://wiki.archlinux.org/index.php/Install_from_existing_Linux
[2] ArchWiki, “Change root”. https://wiki.archlinux.org/index.php/change_root
[3] ArchWiki, “mkinitcpio”. https://wiki.archlinux.org/index.php/mkinitcpio
[4] alpm-hooks(5) Manual Page. https://www.archlinux.org/pacman/alpm-hooks.5.html
\frac{
\stackrel{\mathcal{E}_0}
{\langle b, \sigma \rangle \downarrow \textbf{true}} \qquad
\stackrel{\mathcal{E}_1}
{\langle c_0, \sigma \rangle \downarrow \sigma''} \qquad
\stackrel{\mathcal{E}_2}
{\langle \textbf{while } b \textbf{ do } c_0, \sigma'' \rangle \downarrow \sigma'}
}{
\langle \textbf{while } b \textbf{ do } c_0, \sigma \rangle \downarrow \sigma'}
I could just write it more neatly, intuitively, without all those hexes in magic backslashes and braces:
<b, sig> ! true  E_0
<c_0, sig> ! sig''  E_1
<while b do c_0, sig''> ! sig'  E_2
------------------------------------
<while b do c_0, sig> ! sig'
But, of course, while preserving the nice-looking outcome (from either TeX or MathJax).
Besides natural derivations, we also have a lot of evidently agreeable conventions for typesetting general mathematics: parentheses (brackets, braces) are almost always paired; “\left” and “\right” are very often desired, since \(\left[\frac{1}{2}\right]\) is a bit less awkward than \([\frac{1}{2}]\); when referring to a function name, “\operatorname” looks aesthetically better; etc. Furthermore, if we write down some math notation in plain text, clearly, “=>” has to be a “\(\Rightarrow\)” and “|->” has to be a “\(\mapsto\)”; “|-” means “\(\vdash\)” and “|=” means “\(\vDash\)”; a standalone letter “N” is just \(\mathbb{N}\); etc. Taking such matters into consideration, there could be some handy alternative syntax (I call it “lazybones’ syntax”) for typesetting formulas, at least for a specific field someone specializes in, where these conventions are consistent:
Provability: \(\Gamma \vdash \varphi\)
  TeX: \Gamma \vdash \varphi
  Lazybones’: Gam |- phi

Validity: \(\Gamma \vDash \varphi\)
  TeX: \Gamma \vDash \varphi
  Lazybones’: Gam |= phi

Big-step semantics: \(\langle b_0 \land b_1, \sigma \rangle \downarrow t\)
  TeX: \langle b_0 \land b_1, \sigma \rangle \downarrow t
  Lazybones’: <b_0 && b_1, sig> ! t

Small-step semantics: \(\sigma \vdash b_0 \land b_1 \to^* t\)
  TeX: \sigma \vdash b_0 \land b_1 \to^* t
  Lazybones’: sig |- b_0 && b_1 ->* t

Hoare logic: \(\vdash \{A\} \textbf{if } b \textbf{ then } c_0 \textbf{ else } c_1 \{B\}\)
  TeX: \vdash \{A\} \textbf{if } b \textbf{ then } c_0 \textbf{ else } c_1 \{B\}
  Lazybones’: |- <[A]> if b then c_0 else c_1 <[B]>

Denotational semantics: \(\mathcal{C}[\![X:=a]\!] = \lambda \sigma . \eta (\sigma[X \mapsto \mathcal{A}[\![a]\!] \sigma])\)
  TeX: \mathcal{C}[\![X:=a]\!] = \lambda \sigma . \eta (\sigma[X \mapsto \mathcal{A}[\![a]\!] \sigma])
  Lazybones’: C[[X:=a]] = lam sig . eta (sig[X |-> A[[a]] sig])
For simplicity, we require that all such transformations are regular, i.e., “lazybones’ syntax” may be translated into plain TeX code using merely substitutions of regular expressions.
As a basic example, let’s consider angle brackets, for which we desire to type simply “<” and “>” instead of “\langle” and “\rangle”. To avoid any ambiguity (with a normal less-than or greater-than sign), it is required that such brackets have no internal spacing; that is, we write “<x, y>” rather than “< x, y >”, but “1 < 2” instead of “1<2”. Once we fix the transformation rules, the implementation is quite straightforward in Haskell with Text.Regex, if we want a pandoc filter to handle these substitutions for everything embedded in TeX math mode:
import Text.Pandoc.JSON
import Text.Regex

-- "<x" becomes "\langle x"; "x>" becomes "x \rangle" (no internal spacing)
subLAngle s = subRegex (mkRegex "<([^[:blank:]])") s "\\langle \\1"
subRAngle s = subRegex (mkRegex "([^[:blank:]])>") s "\\1\\rangle "

-- Rewrite every (inline or display) math element.
semType :: Inline -> Inline
semType (Math mathType s) = Math mathType (subLAngle (subRAngle s))
semType x = x

main :: IO ()
main = toJSONFilter semType
So, why bother implementing a pandoc filter rather than just defining some TeX macros? Two main reasons:
Surely you can write pandoc filters in any language you like (officially Haskell and Python). Nothing like plain TeX (despite the fact that it’s Turing-complete): filters are easier to implement, unrestricted in the way they interact with structured documents, and come with a great deal of cool supporting libraries, even to blow up the earth. Although here, in this case, we are only using some no-brainer pattern matching to release our fingers from heavy math typing.
A proof-of-concept of this is in semtype.hs. While undocumented, all the tricks it does should be clear from the comments and regular expressions themselves.
There seem to be some common misconceptions about “proof by contradiction”. Recently I came across Robert Harper’s blog post on this issue, and couldn’t help reviewing it in my own way. I prefer a more formal treatment, nevertheless, without sacrificing its intuitive comprehensibility.
What is a “proof by contradiction”, exactly? When a classical mathematician (probably you) talks about proving by contradiction, they can mean either one of these two (syntactically) different things:

1. To prove \(P\): assume \(\lnot P\) and derive a contradiction \(\bot\).
2. To prove \(\lnot P\): assume \(P\) and derive a contradiction \(\bot\).
Both syntactic forms have some kind of reductio ad absurdum (reduction to / derivation of contradiction) argument. However, the first form provides an indirect proof for \(P\); this is what we call a genuine “proof by contradiction”. The second form provides a direct proof for \(\lnot P\) which is just the negation of \(P\); this is preferably called a “proof of negation”, as it’s not a proof of \(P\) itself, but rather a proof of the negation of \(P\).
But how are they any different? you may ask. In a classical setting, there is no semantic difference. Notice that in the first form (“proof by contradiction”), we could rewrite \(P\) as \(\lnot Q\); then we get

3. To prove \(\lnot Q\): assume \(\lnot\lnot Q\) (classically, just \(Q\)) and derive a contradiction \(\bot\),
which is just the second form, i.e., a “proof of negation”. Likewise, if we rewrite \(P\) as \(\lnot Q\) in the second form, we get

4. To prove \(\lnot\lnot Q\) (classically, just \(Q\)): assume \(\lnot Q\) and derive a contradiction \(\bot\),
which is just the first form, i.e., a “proof by contradiction”. That’s the very reason people often misconceive them – classically, a proof by contradiction and a proof of negation can be simply converted to the form of one another, without losing their semantic validity.
From a constructivist perspective, things are quite different. In the above rewritten forms, we introduced a new term \(\lnot Q := P\). For the rewrite in the 3rd form to be valid, the new assumption \(Q\) must be as strong as the original assumption \(\lnot P\), which is just \(\lnot \lnot Q\). Formally, there must be \(\lnot \lnot Q \implies Q\). For the rewrite in the 4th form to be valid, the new statement \(Q\) must be not any stronger than the original statement \(\lnot P\), so formally there must also be \(Q \implies \lnot \lnot Q\). In intuitionistic logic, although we can derive a “double negation introduction” (thus complete the rewrite in the 4th form), there is no way to derive a “double negation elimination” as required to get the 3rd form. So technically, while we can soundly rewrite from a “proof of negation” to a “proof by contradiction”, the other direction is impossible. Indeed, we must make a clear distinction between a “proof by contradiction” and a “proof of negation” here: Semantically, they are not even dual and should not be fused with each other.
Why is this distinction important? Because in intuitionistic logic, the second form (proof of negation) is a valid proof; the first form (proof by contradiction) is not. Take a look at the negation introduction rule: \[\lnot_\mathsf{I} : \frac{\Gamma, P \vdash \bot}{\Gamma \vdash \lnot P}\] which justifies the validity of “proof of negation”. However, there is no such rule saying that \[\mathsf{PBC} : \frac{\Gamma, \lnot P \vdash \bot}{\Gamma \vdash P}\] In classical logic, where a rule like \(\mathsf{PBC}\) is allowed, one can easily derive the double negation elimination which we begged for before: Given \(\Gamma \vdash \lnot \lnot P\), the only rule that ever introduces a negation is \(\lnot_\mathsf{I}\), so we must also have \(\Gamma, \lnot P \vdash \bot\). Then by \(\mathsf{PBC}\), we get a derivation of \(\Gamma \vdash P\), as desired. \[\mathsf{DNE} : \frac{\Gamma \vdash \lnot \lnot P}{\Gamma \vdash P}\] If we adopt \(\mathsf{PBC}\), then we will also have adopted \(\mathsf{DNE}\); if we have \(\mathsf{DNE}\), then it would be perfectly valid to rewrite a “proof by contradiction” into the form of a “proof of negation”, or the other way around, as is already shown before. Since constructivists do not want to accept rules like \(\mathsf{PBC}\) or \(\mathsf{DNE}\) at all, they claim that a “proof by contradiction” and a “proof of negation” are essentially different, in that the latter is a valid proof but the former is doubtful, while their distinction is purely syntactical for classical mathematicians as the semantic equivalence would be trivial with \(\mathsf{PBC}\) or \(\mathsf{DNE}\).
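For contrast, double negation introduction needs no classical rule at all. In the sequent style used above, a derivation using only \(\lnot_\mathsf{I}\) and \(\to_\mathsf{I}\) (the top sequent holds since the assumptions \(Q\) and \(\lnot Q\) together yield \(\bot\)):

\[\mathsf{DNI} : \frac{\dfrac{\Gamma, Q, \lnot Q \vdash \bot}{\Gamma, Q \vdash \lnot\lnot Q}\;\lnot_\mathsf{I}}{\Gamma \vdash Q \to \lnot\lnot Q}\;\to_\mathsf{I}\]

No comparable derivation of \(\lnot\lnot Q \to Q\) exists without a classical rule such as \(\mathsf{PBC}\) or \(\mathsf{DNE}\).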
The rationale behind constructivists’ choice of ruling out indirect proofs by rejecting derivation rules like \(\mathsf{PBC}\) and \(\mathsf{DNE}\) comes into view when talking about first-order theories, where existential quantifiers are used. Say, if we wish to prove that there exists some \(x\) with a property \(P\), we must specify an example \(t\) which makes this property hold: \[\exists_{\mathsf{I}_t} : \frac{\Gamma \vdash P[t/x]}{\Gamma \vdash \exists x : S.P}\]
(\(t\) is a term of sort \(S\))
This is something called a constructive proof, in the sense that in order to derive an existentialization, one must construct such a term explicitly, directly.
What if we allow \(\mathsf{PBC}\) in our proofs then? We will be given enough power to utilize an alternate approach: Assume to the contrary that for all \(x\) in \(S\), \(\lnot P\) holds. Then we derive a contradiction. Thus, by \(\mathsf{PBC}\), there must exist some \(x\) such that \(P\) holds (since \(\lnot (\exists x : S.P) \equiv \forall x : S.\lnot P\)). Formally, \[\frac{\Gamma, \forall x : S.\lnot P \vdash \bot}{\Gamma \vdash \exists x : S.P}\] Note that a term \(t\) is nowhere to be evident in this form of proof. The downside of this classical approach of existence proof is that it is nonconstructive, so even if you can derive a proof that some mathematical object exists, you can’t claim that you necessarily know what it is, since you have not concretely constructed such an object yet. Its existence is just “principally provable”, but not necessarily constructible or witnessable.
I would say it’s too much of a philosophical choice between classical logic and intuitionistic logic, at least for old-school mathematicians who don’t practically mechanize their proofs at all. But one with some logical maturity should be able to draw a semantic distinction between a “proof by contradiction” that \(\vdash P\) and a “proof of negation” that \(\vdash \lnot P\), being wary of how their treatments can diverge in non-classical logic settings. It is still questionable to me whether every theorem provable in classical logic can be proved constructively; regardless, a constructive proof almost always makes more sense: If you claim that you have an apple, just show me the apple, instead of arguing sophistically that it can’t be the case that you do not have an apple.
[1] Andrzej Filinski, “Course Notes for Semantics and Types, Lecture 1: Logic”.
[2] Robert Harper, “A ‘proof by contradiction’ is not a proof that ends with a contradiction”. https://existentialtype.wordpress.com/2017/03/04/a-proof-by-contradiction-is-not-a-proof-that-derives-a-contradiction/
[3] Andrej Bauer, “Proof of negation and proof by contradiction”. http://math.andrej.com/2010/03/29/proof-of-negation-and-proof-by-contradiction/
[4] Timothy Gowers, “When is proof by contradiction necessary?”. https://gowers.wordpress.com/2010/03/28/when-is-proof-by-contradiction-necessary/
[5] Terence Tao, “The ‘no self-defeating object’ argument”. https://terrytao.wordpress.com/2009/11/05/the-no-self-defeating-object-argument/
Still, I have thought of many fun things I could do with statistical learning: image classification and face recognition (I’ve always wanted a photo management application to be like that! I take and collect so many photos each day), a CBIR tool (like “Search by Image” in Google Images, but working locally on my computer), a GIMP Script-Fu that does tricks like image synthesis via ConvNets, etc. The good part about applying learning methods in image processing is that, once you can extract feature descriptors, everything becomes statistically learnable (and you as a programmer don’t really need prior knowledge about what an object visually is: an apple, or a pen?).
Cryptography, from what I learned, is a differently fun story. For a perfectly secure encryption scheme: \[\Pr[M=m \mid C=c] = \Pr[M=m]\] that is, knowing the ciphertext \(c\) will not update any attacker’s belief about whatever the underlying message \(m\) is. Even if you have a statistical training model, it cannot learn anything from pure observations of ciphertexts. But this unbreakable level of security comes with a price: the key space must expose entropy at least as high as that of the message space, thus the key length can be no shorter than the message length (given by Shannon’s theorem). In practice, the messages we send do not usually have the highest entropy possible, and we can safely assume that attackers’ computational ability is bounded by polynomial-time algorithms; thus, we as cryptosystem designers need only make up schemes that are assumed to be unbreakable (i.e., breakable with only a negligible probability) by any polynomial-time attackers. As we don’t know yet whether there actually are any polynomially unsolvable problems (e.g., is P ≠ NP?), the proof of security eventually relies on some unproven computational hardness assumptions: one-way functions exist, integer factorization is hard, discrete logarithm is hard, etc. If one can construct a provably secure scheme, it is guaranteed that statistical cryptanalysis would be theoretically impossible within polynomial time (except for side-channel attacks); of course, if the hardness assumption we made is proved invalid, then nothing but the one-time pad can be secure.
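The perfect-secrecy equation can be illustrated with a toy one-byte one-time pad in Python: under a uniformly random key, every message induces the exact same (uniform) ciphertext distribution, so observing \(c\) tells an attacker nothing about \(m\):

```python
from collections import Counter

def cipher_distribution(m):
    # Distribution of C given M=m, for a one-byte XOR pad with a uniform
    # key: XOR the message byte with every possible key byte.
    return Counter(m ^ k for k in range(256))

# Two very different messages yield identical ciphertext distributions,
# so Pr[M=m | C=c] = Pr[M=m]: the ciphertext carries no information.
d0, d1 = cipher_distribution(0x00), cipher_distribution(0x42)
assert d0 == d1 == Counter({c: 1 for c in range(256)})
```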
I might be writing one or two blog posts about cryptographic security from an informationtheoretic perspective and some basic cryptanalysis on insecure schemes, but now is the time to move on with my unfinished courses about logic and programming. Before that, I feel that I should add a little bit philosophy to my wiki so as to refresh my viewpoint and methodology. And here it is.
(Philosophy is a sophisticated, highly arguable subject, so pardon me if there’s any inconsistency with your textbook.)
We can talk about the philosophy of science (particularly, philosophy of mathematics and statistics) with the understanding of epistemology and logic: Are you a logician or a statistician? If a logician, does set theory or type theory suit you the best? If a statistician, are you a Bayesian or a frequentist?
(As a personally opinionated note, I often find myself subscribing to skepticism the most. But that doesn’t mean that logical reasoning and statistical inference aren’t useful to me; they are. Extremely. So as not to go too far with this subjective topic, I’ll be focusing more on classical / modal logic in the next few weeks.)
Papers:
[1] E. L. Gettier, “Is justified true belief knowledge?” Analysis, vol. 23, no. 6, pp. 121–123, 1963.
For random variables \(X\) and \(Y\), we define sets \(\tilde X\) and \(\tilde Y\). Then the information entropy \(\operatorname{H}\) may be viewed as a signed measure \(\mu\) over sets: \[\begin{align*} \operatorname{H}(X) &= \mu(\tilde X) \\ \operatorname{H}(Y) &= \mu(\tilde Y) \\ \operatorname{H}(X,Y) &= \mu(\tilde X \cup \tilde Y) \qquad \text{(Joint entropy is the measure of a set union)} \\ \operatorname{H}(X \mid Y) &= \mu(\tilde X \setminus \tilde Y) \qquad \text{(Conditional entropy is the measure of a set difference)} \\ \operatorname{I}(X;Y) &= \mu(\tilde X \cap \tilde Y) \qquad \text{(Mutual information is the measure of a set intersection)} \end{align*}\] The inclusion–exclusion principle: \[\begin{align*} \operatorname{H}(X,Y) &= \operatorname{H}(X) + \operatorname{H}(Y) - \operatorname{I}(X;Y) \\ \mu(\tilde X \cup \tilde Y) &= \mu(\tilde X) + \mu(\tilde Y) - \mu(\tilde X \cap \tilde Y) \end{align*}\] Bayes’ theorem: \[\begin{align*} \operatorname{H}(X \mid Y) &= \operatorname{H}(Y \mid X) + \operatorname{H}(X) - \operatorname{H}(Y) \\ \mu(\tilde X \setminus \tilde Y) &= \mu(\tilde Y \setminus \tilde X) + \mu(\tilde X) - \mu(\tilde Y) \end{align*}\]
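These identities are easy to verify numerically; a small Python sketch with an arbitrary example joint pmf:

```python
import math

# An arbitrary joint pmf of (X, Y) on {0,1}x{0,1}.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    # Shannon entropy in bits of a pmf given as {outcome: probability}.
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

px = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}   # marginal of X
py = {y: p[(0, y)] + p[(1, y)] for y in (0, 1)}   # marginal of Y
Hx, Hy, Hxy = H(px), H(py), H(p)

# Mutual information, straight from its definition.
Ixy = sum(p[(x, y)] * math.log2(p[(x, y)] / (px[x] * py[y]))
          for x in (0, 1) for y in (0, 1))
Hx_given_y = Hxy - Hy   # chain rule: H(X|Y) = H(X,Y) - H(Y)
Hy_given_x = Hxy - Hx

# Inclusion-exclusion: H(X,Y) = H(X) + H(Y) - I(X;Y)
assert abs(Hxy - (Hx + Hy - Ixy)) < 1e-9
# "Bayes' theorem": H(X|Y) = H(Y|X) + H(X) - H(Y)
assert abs(Hx_given_y - (Hy_given_x + Hx - Hy)) < 1e-9
```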
Absolute entropy (Shannon entropy) quantifies how much information is contained in some data. For data compression, the entropy gives the minimum size needed to reconstruct the original data (losslessly). Assume that we want to store a random binary string of length \(\ell\) (by “random”, we do not yet have any prior knowledge of what data is to be stored). Under the principle of maximum entropy, the entropy of its distribution \(p(x)\) should be maximized: \[\max \operatorname{H}(X) = \max \left\{ -\sum_{x\in\mathcal{X}} p(x) \log p(x) \right\}\] given the only constraint \[\sum_{x\in\mathcal{X}} p(x) = 1\] Let \(\lambda\) be the Lagrange multiplier, and set \[\mathcal{L} = -\sum_{x\in\mathcal{X}} p(x) \log p(x) - \lambda\left( \sum_{x\in\mathcal{X}} p(x) - 1 \right)\] We get \[\begin{align*} \frac{\partial\mathcal{L}}{\partial p(x)} = -(\log p(x) + 1 + \lambda) &= 0 \\ \log p(x) &= -\lambda - 1 \\ p(x) &= c \qquad \text{(constant)} \end{align*}\] That is, the discrete uniform distribution maximizes the entropy for a random string. Since \(|\mathcal{X}| = 2^\ell\), we have \(p(x) = 2^{-\ell}\) and \(\operatorname{H}(X) = -\sum_{x\in\mathcal{X}} 2^{-\ell} \log_2 2^{-\ell} = \ell\) (bits). We conclude that the information that can be represented in an \(\ell\)-bit string is at most \(\ell\) bits. Some practical results include
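The conclusion is easy to check numerically; a Python sketch comparing the uniform distribution over all \(2^\ell\) strings against an arbitrary skewed one:

```python
import math, random

def H(probs):
    # Shannon entropy in bits of a list of probabilities.
    return -sum(q * math.log2(q) for q in probs if q > 0)

ell = 8
n = 2 ** ell
uniform = [1.0 / n] * n
assert abs(H(uniform) - ell) < 1e-9   # H = l bits for an l-bit string

# Any other distribution over the same 2^l strings has strictly lower entropy.
random.seed(0)
w = [random.random() for _ in range(n)]
skewed = [x / sum(w) for x in w]
assert H(skewed) < ell
```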
Relative entropy (KL divergence) quantifies how much information diverges between two sets of data. For data differencing, the KL divergence gives the minimum patch size that is needed to reconstruct target data (with distribution \(p(x)\)) given source data (with distribution \(q(x)\)).
In particular, if \(p(x) = q(x)\), which means that the two distributions are identical, we have \(\operatorname{D}_\mathrm{KL}(p \,\|\, q) = 0\). This follows our intuition that no information is gained or lost during data encoding/decoding. If \(p(x_0) = 0\) at \(x=x_0\), we take \(p(x) \log \frac{p(x)}{q(x)} = 0\), to justify the fact that the target data is trivial to reconstruct at this point, no matter how much information \(q(x)\) contains. However, if \(q(x_0) = 0\) at \(x=x_0\), we should take \(p(x) \log \frac{p(x)}{q(x)} = +\infty\), so that the target data is impossible to reconstruct if we have only trivial \(q(x)\) at some point (unless \(p(x_0) = 0\)).
Lemma 4.1. (Gibbs’ inequality)^{2} The KL divergence is always nonnegative: \(\operatorname{D}_\mathrm{KL}(p \,\|\, q) \geq 0\).
Informally, Lemma 4.1 simply states that in order to reconstruct target data from source data, either more information (\(\operatorname{D}_\mathrm{KL}(p \,\|\, q) > 0\)) or no further information (\(\operatorname{D}_\mathrm{KL}(p \,\|\, q) = 0\)) is needed.
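The conventions above and Gibbs’ inequality can be checked directly on small pmfs; a minimal Python sketch (not tied to any particular compression tool):

```python
import math

def kl(p, q):
    # D_KL(p || q) in bits; conventions: 0*log(0/q) = 0, p*log(p/0) = inf.
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
assert kl(p, p) == 0                       # identical data: empty "patch"
assert kl(p, q) > 0                        # Gibbs' inequality
assert kl(p, [0.5, 0.5, 0.0]) == math.inf  # target impossible to rebuild
```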
Theorem 4.2. Normal distribution \(\mathcal{N}(\mu,\sigma^2)\) maximizes the differential entropy for given mean \(\mu\) and variance \(\sigma^2\).
Proof.^{3} Let \(g(x)\) be a pdf of the normal distribution with mean \(\mu\) and variance \(\sigma^2\). Let \(f(x)\) be an arbitrary pdf with the same mean and variance.
Consider the KL divergence between \(f(x)\) and \(g(x)\). By Lemma 4.1 (Gibbs’ inequality): \[\operatorname{D}_\mathrm{KL}(f \,\|\, g) = \int_{-\infty}^\infty f(x) \log \frac{f(x)}{g(x)} dx = \operatorname{H}(f,g) - \operatorname{H}(f) \geq 0\]
Notice that \[\begin{align*} \operatorname{H}(f,g) &= -\int_{-\infty}^\infty f(x) \log g(x) dx \\ &= -\int_{-\infty}^\infty f(x) \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right) dx \\ &= \frac{1}{2} \left( \log(2\pi\sigma^2) + 1 \right) \\ &= \operatorname{H}(g) \end{align*}\] Therefore, \[\operatorname{H}(g) \geq \operatorname{H}(f)\] That is, the distribution of \(g(x)\) (Gaussian) always has the maximum entropy.
■
It is also possible to derive the normal distribution directly from the principle of maximum entropy, under the constraint that \(\int_{-\infty}^\infty (x-\mu)^2f(x)dx = \sigma^2\).
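As a numerical sanity check of Theorem 4.2, we can compare the closed-form differential entropies (in nats) of a few distributions sharing the same variance; the uniform and Laplace formulas below are the standard closed forms, assumed here without derivation:

```python
import math

sigma = 1.7  # an arbitrary standard deviation; all three share variance sigma^2

h_gauss   = 0.5 * math.log(2 * math.pi * math.e * sigma**2)  # N(mu, sigma^2)
h_uniform = 0.5 * math.log(12 * sigma**2)        # U(a,b): var = (b-a)^2 / 12
h_laplace = 1 + math.log(math.sqrt(2) * sigma)   # Laplace(b): var = 2b^2

# The Gaussian dominates both for any fixed variance.
assert h_gauss > h_uniform
assert h_gauss > h_laplace
```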
The well-known central limit theorem (CLT), which states that the sum of independent random variables \(\{X_1,\dots,X_n\}\) tends toward a normal distribution, may be alternatively expressed as the monotonicity of the entropy of the normalized sum: \[\operatorname{H}\left( \frac{\sum_{i=1}^n X_i}{\sqrt{n}} \right)\] which is an increasing function of \(n\). [1]
Books:
T. M. Cover and J. A. Thomas. Elements of Information Theory, 2nd ed.
Papers:
[1] S. Artstein, K. Ball, F. Barthe, and A. Naor, “Solution of Shannon’s problem on the monotonicity of entropy,” Journal of the American Mathematical Society, vol. 17, no. 4, pp. 975–982, 2004.
Generally, pseudorandom generators used in probabilistic algorithms yield random bits according to the uniform distribution, so it is worth mentioning:
Cryptographic schemes are defined as tuples of deterministic or probabilistic algorithms:
A brief, formalized overview of some classical ciphers, and their security:
Lessons learned from these classical ciphers: While perfect secrecy is easy to achieve (one-time pads), designing practical cryptographic schemes (with shorter keys, and computationally hard to break) can be difficult.
The construction of private-key encryption schemes involves probabilistic algorithms. We simply assume that an unlimited supply of independent, unbiased random bits is available to these cryptographic algorithms. But in practice, this is a nontrivial issue, as the source of randomness must provide high-entropy data so as to accommodate cryptographically secure random bits.
In the perfectly secret scheme of one-time pads, the key generation algorithm \(\mathsf{Gen}\) requires access to a source of randomness in order to choose a uniformly random key \(k \in \mathcal{K}\). Practically, high-entropy data may be collected via physical input or even written fully by hand with human labor.
Theoretically, without external intervention, we have:
Conjecture 3.1. Pseudorandom generators exist.
Theorem 3.2. (Pseudorandom generator theorem) Pseudorandom generators exist if and only if one-way functions exist.
Pseudorandomness is also a basic construction in CPA-secure encryption algorithms (\(\mathsf{Enc}\)), e.g., in stream ciphers and block ciphers.
So what is an acceptable level of pseudorandomness, if we are not sure whether such generators theoretically exist? Intuitively, if one cannot distinguish between a “pseudorandom” string (generated by a PRG) and a truly random string (chosen according to the uniform distribution), we have confidence that the PRG is a good one. Various statistical tests have been designed for testing the randomness of PRGs.
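As an illustration, here is a minimal sketch of one such statistical test, a frequency (monobit) test in the spirit of the NIST suite (simplified here, not the official implementation):

```python
import math

def monobit_test(bits):
    # Under the null hypothesis of true randomness, sum(+/-1)/sqrt(n) is
    # approximately standard normal; return a two-sided p-value via the
    # complementary error function. Small p-value => reject as non-random.
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))

balanced = [0, 1] * 5000           # perfectly balanced bit frequencies
biased = [1] * 6000 + [0] * 4000   # 60% ones

assert monobit_test(balanced) == 1.0   # passes (erfc(0) = 1)
assert monobit_test(biased) < 0.01     # rejected as non-random
```

Note that the alternating sequence, while totally predictable, passes this particular test; that is exactly why real suites run a battery of many different tests rather than one.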
It holds true that:
Corollary 3.3. By redefining the key space, we can assume that any encryption scheme \(\Pi=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})\) satisfies
If so, why do we still need a probabilistic \(\mathsf{Enc}\) in CPA-secure encryption? Can’t we just make \(\mathsf{Enc}\) deterministic while still being CPA-secure?
The first thing to realize is that chosen-plaintext attacks are geared towards multiple encryptions (with the same secret key \(k\)), so when the adversary obtains a pair \((m_0, c_0)\) such that \(\Pr[C=c_0 \mid M=m_0] = 1\), the key is already leaked. (Recall that the adversary knows the deterministic algorithm \(\mathsf{Enc}_k\), thus recovering \(k\) from known \(m_0\) and \(c_0\) can be quite feasible; e.g., in a one-time pad, \(k = m_0 \oplus c_0\).) The only way to get around this is to make \(\mathsf{Enc}_k\) probabilistic (constructed from a pseudorandom function), such that an adversary cannot recover the key efficiently within polynomial time.
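A toy Python illustration of why a deterministic \(\mathsf{Enc}\) fails here, using XOR encryption as in the one-time-pad example above:

```python
# With deterministic XOR encryption, one known (m0, c0) pair leaks the key
# immediately: k = m0 XOR c0, and every later ciphertext under the same
# key becomes decryptable.
def enc(k, m):
    # Deterministic XOR "encryption" (key and message of equal length).
    return bytes(a ^ b for a, b in zip(k, m))

key = b"\x13\x37\xc0\xde"
m0 = b"test"
c0 = enc(key, m0)

recovered = bytes(a ^ b for a, b in zip(m0, c0))  # k = m0 XOR c0
assert recovered == key
# All future messages under the same key are now broken:
assert enc(recovered, enc(key, b"next")) == b"next"
```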
Note that perfect secrecy is not possible under CPA, since there is a small possibility that the adversary will reverse the key (by, for example, traversing an exponentially large lookup table of all random bits) and succeed in the further indistinguishability experiment with a slightly higher (but negligible) probability.
The one-time pad is one of the most (provably) secure encryption schemes, and its secrecy does not rely on any computational hardness assumption. However, it requires that \(|\mathcal{K}| \geq |\mathcal{M}|\) (which in fact is a necessary condition for any perfectly secret scheme), thus its real-world use is limited.
The one-time key \(k\) (uniformly chosen from the key space \(\mathcal{K}\)) may not simply be reused in multiple encryptions. Assuming \(|\mathcal{K}| = |\mathcal{M}|\), for encryptions of \(n\) messages the message space expands to size \(|\mathcal{M}|^n\) while the key space remains of size \(|\mathcal{K}|\), thus \(|\mathcal{K}| < |\mathcal{M}|^n\). Such a degraded scheme (a “many-time pad”) is theoretically insecure and vulnerable to several practical cryptanalyses.
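A toy Python demonstration of the many-time-pad leak: reusing the key makes it cancel out of the XOR of two ciphertexts:

```python
# c1 XOR c2 = (m1 XOR k) XOR (m2 XOR k) = m1 XOR m2: the key drops out,
# so an eavesdropper learns the XOR of the two plaintexts without k.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

k  = b"\xa5\x5a\xff\x00\x42\x42\x42"  # an arbitrary 7-byte key, reused twice
m1 = b"attack!"
m2 = b"defend!"
c1, c2 = xor(m1, k), xor(m2, k)

assert xor(c1, c2) == xor(m1, m2)      # key-independent leakage
# A crib (guessing m1) then reveals m2 outright:
assert xor(xor(c1, c2), m1) == m2
```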
A historical exploit of the vulnerability of the many-time pad occurred in the VENONA project, where the U.S. Army’s Signal Intelligence Service (later the NSA) aimed at decrypting messages sent by USSR intelligence agencies (the KGB) over a span of 4 decades. As the KGB mistakenly reused some portions of their one-time-pad codebooks, the SIS was able to break a good amount of the messages.^{1}
Books:
J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2nd ed.
Papers:
[1] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, vol. 28, no. 4, pp. 656–715, 1949.
R. L. Benson, “The Venona story.” https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/publications/coldwar/assets/files/venona_story.pdf↩
Intuitively, probability is a measure of uncertainty. Mathematically, probability is a realvalued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity (or simply, measure on probability space).
Typically, a probability density function (pdf) or a probability mass function (pmf) determines a distribution in the probability space.
Example 2.1. Consider the wave function of a particle: \[\Psi(x,t)\] where \(x\) is position and \(t\) is time.
If the particle’s position is measured, its location cannot be determined with certainty but is described by a probability distribution: The probability that the particle is found in \([x, x+\Delta x]\) is \[\Delta\Pr = |\Psi(x,t)|^2 \Delta x\]
The square modulus of the wave function (which is real-valued and nonnegative) \[\left|\Psi(x, t)\right|^2 = {\Psi(x, t)}^{*}\Psi(x, t) = \rho(x, t)\] is interpreted as the pdf.
Since the particle must be found somewhere, we have the normalization condition: (by the assumption of unit measure) \[\int\limits_{\infty}^\infty \Psi(x,t)^2 dx = 1\]
Distributions are also called generalized functions in analysis. They expand the notion of function to objects whose derivatives may not exist in the classical sense. Thus, it is not uncommon that many probability distributions cannot be described using classical (differentiable) functions. The Dirac delta function \(\delta\) (which is a generalized function) is often used to represent, as a pdf, a discrete distribution, or a partially discrete, partially continuous distribution.
Bayes’ theorem forms the basis for Bayesian inference, which is an important method of statistical inference that updates the probability for a hypothesis as more evidence or information becomes available.
Hypotheses can also be fallacies. In Bayesian inference, if one can assume that every event occurs independently and that the probability is identically distributed throughout lasting trials, it is easy to see that some common beliefs are mistaken.
Gambler's fallacy (Monte Carlo fallacy). If an outcome occurs more frequently than normal during some period, it will happen less frequently in the future; contrariwise, if an outcome happens less frequently than normal during some period, it will happen more frequently in the future. This is presumed to be nature's means of balancing itself.
The gambler's fallacy is indeed a fallacy if the probability of outcomes is known to be independently, identically distributed. Assuming that the future (the probability of event \(A_2\)) has no effect on the past (the probability of event \(A_1\)), we have \(\Pr[A_1 \mid A_2] = \Pr[A_1]\). From Bayes' theorem, it then holds that \[\Pr[A_2 \mid A_1] = \frac{\Pr[A_1 \mid A_2]\Pr[A_2]}{\Pr[A_1]} = \Pr[A_2]\] That is, past events should not increase or decrease our confidence in a future event.
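A quick simulation (a sketch added by me, not part of the original argument) confirms this for fair-coin flips: conditioning on the previous flip leaves the empirical probability of heads unchanged.

```python
import random

# Gambler's fallacy check: for i.i.d. fair-coin flips, the empirical
# Pr[heads | previous flip was heads] matches the unconditional Pr[heads].
random.seed(42)
flips = [random.random() < 0.5 for _ in range(200_000)]  # True = heads

overall = sum(flips) / len(flips)
after_heads = [cur for prev, cur in zip(flips, flips[1:]) if prev]
conditional = sum(after_heads) / len(after_heads)

assert abs(overall - 0.5) < 0.01
assert abs(conditional - overall) < 0.01  # the past does not shift the odds
```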
Hot-hand fallacy. A person who has experienced success with a seemingly random event has a greater chance of further success in additional attempts. That is, if an outcome occurs more frequently than normal during some period, it will also happen frequently in the future.
If psychological factors can be excluded, then the hot-hand fallacy is a fallacy caused by people's confirmation bias. As with the gambler's fallacy, if we can't assume that the probability of outcomes is independently, identically distributed, we can't simply conclude that this belief is mistaken.
Inverse gambler’s fallacy. If an unlikely outcome occurs, then the trials must have been repeated many times before.
Assuming that the past (the probability of event \(A_1\)) has no effect on the future (the probability of event \(A_2\)), we have \(\Pr[A_2 \mid A_1] = \Pr[A_2]\). From Bayes' theorem, it then holds that \[\Pr[A_1 \mid A_2] = \frac{\Pr[A_2 \mid A_1]\Pr[A_1]}{\Pr[A_2]} = \Pr[A_1]\] That is, our confidence in \(A_1\) should remain unchanged after we observe \(A_2\).
Fallacies of hasty generalization and slothful induction (law of small numbers). Informal fallacies of reaching an inductive generalization based on insufficient evidence, or of denying a reasonable conclusion of an inductive argument.
Statistically speaking, sampling from a small group can lead to misbeliefs that fail to hold for the entire population, if hypothesis testing is not carefully conducted.
Theorem 2.2. (Law of large numbers) Let \(X_1, X_2, \dots\) be an infinite sequence of i.i.d. Lebesgue integrable random variables with fixed expectation \(\operatorname{E}[X_1] = \operatorname{E}[X_2] = \cdots = \mu\). Define the sample average \[\overline{X}_n = \frac{1}{n}(X_1 + \dots + X_n)\] Then \(\overline{X}_n\) converges to \(\mu\) as \(n \to \infty\): in probability by the weak law, and almost surely by the strong law.
Chebyshev's inequality, \(\Pr[|X - \mu| \geq a] \leq \frac{\operatorname{Var}(X)}{a^2}\), provides an upper bound on the probability that a random variable deviates from its expected value; thus, it may be used to prove the weak law of large numbers.
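The concentration of sample averages is easy to observe empirically. Here is a small sketch I added (uniform variables are my choice of example, not from the text above):

```python
import random

# Empirical illustration of the law of large numbers: sample averages of
# i.i.d. Uniform(0,1) variables (mu = 0.5, Var = 1/12) concentrate around mu.
# By Chebyshev's inequality, Pr[|avg_n - mu| >= a] <= Var / (n * a^2),
# which tends to 0 for every fixed a > 0 -- the weak law.
random.seed(0)

def sample_average(n: int) -> float:
    return sum(random.random() for _ in range(n)) / n

averages = {n: sample_average(n) for n in (10, 1_000, 100_000)}
assert abs(averages[100_000] - 0.5) < 0.01  # tight around mu for large n
```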
The expected value of a (discrete) random variable \(X\) is \[\operatorname{E}[X] = \sum_{x \in \mathcal{X}} x \Pr[X=x]\] While it seemingly estimates how people would "expect" a random variable to take its value, it can sometimes lead to counterintuitive results, as shown by the following paradox.
St. Petersburg Paradox.^{1} A casino offers a game of chance in which a gambler flips a fair coin until it comes up tails. The initial stake starts at \(2\) dollars and is doubled every time heads appears. The first time tails appears, the game ends and the gambler wins whatever is in the pot. Thus if the coin comes up tails the first time, the gambler wins \(2^1=2\) dollars and the game ends. If the coin comes up heads, the coin is flipped again: tails on the second flip wins \(2^2=4\) dollars, tails on the third flip wins \(2^3=8\) dollars, and so on. Eventually the gambler wins \(2^k\) dollars, where \(k\) is the number of coin flips until tails appears. (It is easy to see that \(k\) follows the geometric distribution.) What would be a fair price to pay the casino for entering such a game? (Assume that there is no house edge.)
| \(k\)th coin flip | \(\Pr[\text{Tails}]\) | Stake | Expected payoff |
|---|---|---|---|
| 1 | \(\frac{1}{2}\) | 2 | 1 |
| 2 | \(\frac{1}{4}\) | 4 | 1 |
| 3 | \(\frac{1}{8}\) | 8 | 1 |
| 4 | \(\frac{1}{16}\) | 16 | 1 |
| 5 | \(\frac{1}{32}\) | 32 | 1 |
| … | … | … | … |
| \(k\) | \((1/2)^k\) | \(2^k\) | 1 |
The fair price should equal the gambler's expected payoff, which is \[\operatorname{E}[\text{Payoff}] = \sum_{k=1}^{+\infty} \left(\frac{1}{2}\right)^k \cdot 2^k = \sum_{k=1}^{+\infty} 1 = +\infty\]
If a rational gambler pays to enter a game if and only if its average payoff is larger than its price, then he would pay any price to enter this game (since its expected payoff is infinitely large). But in reality, few of us are willing to pay even tens of dollars to enter such a game. What went wrong? Furthermore, if mathematical expectation does not correctly reflect what people expect from a game, how do we quantify the "true" expectation?
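The divergence is also extremely slow in practice. A small simulation I added (not part of the original discussion) shows how modest typical winnings are despite the infinite expectation:

```python
import random

# Simulating the St. Petersburg game: despite E[payoff] = +inf, the average
# winnings over many games stay modest, since the empirical mean after N
# games grows only on the order of log2(N) dollars.
random.seed(2016)

def play_once() -> int:
    """Flip a fair coin until tails; win 2^k where k is the number of flips."""
    k = 1
    while random.random() < 0.5:  # heads: double the stake and flip again
        k += 1
    return 2 ** k

payoffs = [play_once() for _ in range(100_000)]
mean = sum(payoffs) / len(payoffs)
assert mean >= 2  # every single payoff is at least 2 dollars
```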
The St. Petersburg paradox was first stated by Nicolas Bernoulli in 1713. There are several proposed approaches to resolving the paradox, including expected utility theory with the hypothesis of diminishing marginal utility [1], and cumulative prospect theory. However, none of them is purely probability-theoretic, as they require hypothesized economic/behavioral models.
In probability theory, the variance of a random variable \(X\) is defined as \[\operatorname{Var}(X) = \operatorname{E}[(X-\mu)^2]\] For a population of \(N\) equally likely values, this gives the population variance \(\frac{1}{N} \sum_{i=1}^N (X_i-\bar{X})^2\).
In statistics, when calculating the sample variance in order to estimate the population variance, and the population mean is unknown, Bessel's correction^{2} (the use of \(N-1\) instead of \(N\)) is often preferred: \[s^2 = \frac{1}{N-1} \sum_{i=1}^N (X_i-\bar{X})^2\]
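The bias that Bessel's correction removes is easy to verify numerically. A sketch I added (uniform samples are my choice of example):

```python
import random

# Empirical check that dividing by N-1 (Bessel's correction) gives an
# unbiased estimate of the population variance, while dividing by N
# systematically underestimates it. Population: Uniform(0,1), Var = 1/12.
random.seed(1)
N, trials = 5, 200_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(N)]
    mean = sum(xs) / N
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    biased_sum += ss / N
    unbiased_sum += ss / (N - 1)

true_var = 1 / 12
biased = biased_sum / trials      # ≈ (N-1)/N * true_var = 1/15
unbiased = unbiased_sum / trials  # ≈ true_var = 1/12
assert abs(unbiased - true_var) < 0.002
assert biased < unbiased  # the 1/N estimator is biased low
```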
A few remarks and caveats:
Books:
M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
M. Baron, Probability and Statistics for Computer Scientists, 2nd ed.
Articles:
[1] R. Martin, "The St. Petersburg paradox," in The Stanford Encyclopedia of Philosophy, Summer 2014 ed., E. N. Zalta, Ed. Metaphysics Research Lab, Stanford University, 2014. https://plato.stanford.edu/archives/sum2014/entries/paradox-stpetersburg/
Yes, it bugs me when I can't restore my last open tabs, and I want my old session so badly. Remembering last tabs, if I get the history right, was a feature first introduced by Google Chrome, and it soon started to play an indispensable part in my daily workflow. I'm a multitasker, but the computing resources of my laptop are very limited. Say I have a session in which I'm working on a homework report, with loads of folders, web pages and editor buffers open that can easily fill up gigabytes of RAM; then I realize that I need to compile something really hardcore, or maybe just take a short rest and do some random surfing on the web. Certainly I would rather close all those engrossing processes for the time being, hoping that they could later continue with all the open tabs where I left off.
It's mainly four types of applications that account for so-called "work sessions" for me:
Terminals don't take up a lot of memory, so I wouldn't mind leaving them open. Typical browsers, including Chromium and Firefox, do support session resuming (and there are even extensions which allow you to save current tabs and recover them at any later time). Any decent text editor (or IDE) may also be configured to remember open tabs / sessions. Average file managers, however, fail to meet my basic productivity needs.
I'm on GNOME 3, but currently using the Caja file manager – ever since Nautilus 3.6 decided to remove two or three features I found important (compact mode, backspace navigation) and introduced an awkward, smug "search-whatever-shit-as-you-type" feature.
File managers I’ve tried so far:
Among all above, I settled down with Caja, simply because there was no reasonably good alternative. Still, I’m wishing for something that can save session states for me. After doing a little research, I realized that:
Let's take the issue into more technical detail. On Caja (or other similarly GTK+/GLib-based file managers), one needs to implement the following: in the `destroy` callback of the main GtkObject, all last remaining session data (e.g., internal states about open tabs and windows) must be saved to disk (after the main loop ends there's no more chance to get this information back).

With `caja_application_get_session_data()`, making a quick workaround that enables Caja to save and restore a session is somewhat trivial labor; however, it seems Caja doesn't record the correct (spatial) ordering of tabs in its session data – so I wouldn't consider this a satisfying solution to the issue, and I have no intent to send such an incomplete patch to Caja. Nevertheless, it's better than nothing, and, if the ordering of tabs really matters, it would be feasible to write a wrapper script that manipulates the XML file in `$HOME/.config/caja/last-session`.
And here goes the patch: (Applied to Caja 1.16.1; definitely UNWARRANTED)
Pandoc can convert a Markdown source with minimal YAML metadata (`title`, `author` and `date`) into a properly typeset PDF document:

$ pandoc src.md -o out.pdf
However, Markdown is not TeX. Not even close. Once you need some bleeding-edge control over the typesetting outcome, or perhaps just a little refinement of its LaTeX templating, you'll soon notice that Pandoc has its quirks and gotchas. I've been utilizing Pandoc in all my serious academic writing (incl. homework reports) for years, ever since I gave up on learning more about the overwhelmingly sophisticated TeX ecosystem and turned to something that "just works". Pandoc fits my needs well. And when it doesn't, there's almost always a workaround that achieves the same thing neatly. And that is what this writeup is mostly about.
Modifying `default.latex`? Bad idea.

You could, of course, modify the default template (`default.latex`) provided by Pandoc, as long as you're no stranger to LaTeX. In this way, you can achieve anything you want – in pure LaTeX.

$ pandoc --template mydefault.latex src.md -o out.pdf
There are, however, a few problems with this naïve approach:
I'm conservative about changing the templates. If it's a general issue that needs to be fixed in the default template, sending a pull request to pandoc-templates might be a better idea. Of course, if there's a certain submission format you have to stick with (e.g., given LaTeX templates for conference papers), then you're on your own.
I wouldn’t claim that I know the best practice of using Pandoc, but there’s such a common idiom that cannot be overstressed: Separate presentation and content!
In the YAML front matter of `src.md` (the main Markdown file you're writing), put only things that matter to your potential readers:

---
title: Boilerplating Pandoc for Academic Writing
subtitle: or How I Learned to Stop Typesetting and Concentrate on the Math
author: Mort Yao
date: 17 November 2016
abstract: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit,
  sed do eiusmod tempor incididunt ut labore et dolore magna
  aliqua. Ut enim ad minim veniam, quis nostrud exercitation
  ullamco laboris nisi ut aliquip ex ea commodo consequat.
---
And in a separate YAML file (let's call it `default.yaml`), here goes the formatting stuff:

---
geometry: margin=1.5in
indent: true
header-includes: |
  \usepackage{tcolorbox}
  \newcommand\qed{\hfill\rule{1em}{1em}}
---
Above is my personal default, and it’s worth a few words to explain:
`geometry` is where you control the geometric settings of your document. For example, you may narrow down the page margin with `margin=1.5in`, which is equivalent to the raw LaTeX: `\usepackage[margin=1.5in]{geometry}`

Set `indent` to any value other than `false` if paragraph indentation is desired. (And it is often desired in formal publications.)

`header-includes` is where you define your own macros, configure existing ones, or claim `\usepackage` in case you want to use a package not enabled by Pandoc (e.g., `tcolorbox`). Although you might as well define those in other places (e.g., in the content of a Markdown file), don't do that.
The macro `\newcommand\qed{\hfill\rule{1em}{1em}}` is my favorite of all time. It doesn't require the `amsthm` package.

With a separate `default.yaml`, now here we are:

$ pandoc default.yaml src.md -o out.pdf
`header-includes`

You might have already noticed that the `subtitle` field won't display in the produced PDF file. As far as I'm concerned (in Pandoc 1.18), this is the expected behavior. See here in README:

To make `subtitle` work with other LaTeX document classes, you can add the following to `header-includes`:
\providecommand{\subtitle}[1]{%
  \usepackage{titling}
  \posttitle{%
    \par\large#1\end{center}}}
Unfortunately, this won't work (until Issue #2139 is resolved), since Pandoc parses the `header-includes` metadata field as Markdown, and the bracketed `[1]` is misinterpreted as literals rather than as part of a LaTeX control sequence. So the workaround is: instead of embedding `header-includes` as a metadata field in YAML, we should separate it into another file for this dedicated purpose (it's simply raw LaTeX anyway), and include it using `--include-in-header`/`-H`:

$ pandoc -H header.tex default.yaml src.md -o out.pdf
Note that you can't have two `header-includes` for one document, so the `header-includes` field specified in the YAML metadata will be overridden by the content of `header.tex`.
While the Markdown syntax for citing is rather easy (`[@id]`), it takes effort to get things right, especially if you have a preferred citation format (APA, MLA, Chicago, IEEE, etc.).
The suggestion is: use pandoc-citeproc. Once you have a list of references you're interested in, you need two things to typeset them nicely in your document:

A citation style file (`.csl`), to specify the citation format you want to use.

A bibliography file (`.bib`), which is a list of all entries you might cite.

As part of the YAML metadata (assuming you have `ieee.csl` and `references.bib`):

csl: ieee.csl
bibliography: references.bib
Using `pandoc-citeproc` as a filter, generate the document with citations:

$ pandoc --filter pandoc-citeproc -H header.tex default.yaml src.md -o out.pdf
The list of references is appended to the end of the document. It is often desirable to give the references an obvious title ("References"), start them on a new page and avoid any further indentation, so the following goes at the end of the Markdown source:
\newpage
\setlength\parindent{0pt}
# References
Basically, we need 5 files in total:

`src.md` (Markdown + possibly LaTeX mixed format): main text.

`references.bib` (BibTeX/BibLaTeX format): list of references.

`default.yaml` (YAML format): format-related metadata.

`header.tex` (LaTeX format): content of `header-includes`; package imports and macro definitions.

`ieee.csl` (CSL XML format): citation style.

And one command:

$ pandoc --filter pandoc-citeproc -H header.tex default.yaml src.md -o out.pdf
What about `amsthm`?

Pandoc doesn't provide native support for `amsthm` (and I wonder if it ever will). You can still have the same thing in Pandoc Markdown:
\newtheorem{definition}{Definition}
\begin{definition}
Man is a rational animal.
\end{definition}
However, everything in between `\begin` and `\end` will be treated as raw LaTeX, and the expressiveness of Markdown is lost there. More importantly, this is purely a LaTeX-specific thing, so there's no way for Pandoc to convert it to HTML or any other format (unless you have a filter that does the trick). Consequently, I tend to write all definitions / theorems (lemmas, claims, corollaries, propositions…) in simple Markdown:
**Definition 1.** *Man is a rational animal.*
It does have some advantages over `amsthm`:

With `amsthm`, you cannot see the numbering of each theorem (definition, etc.) in the text editor (well, not without a dedicated plugin at least). This is inconvenient when you need to refer to a prior one later. By numbering them explicitly, you can clearly see these ordinals in the Markdown source.

This also has some drawbacks compared to using `amsthm`, though:
(Probably) the best solution is to write a filter that (conventionally) converts any plain text like `Definition 1` (and `Lemma 2`, `Theorem 3`, etc.) at the beginning of a paragraph into proper Markdown (for the HTML target) or the corresponding `amsthm` block (for the LaTeX target). Even better, it should be able to do cross-references accordingly (remember `Lemma 10.42`? Let's put an anchored link on that!). This is yet to be done, but it would be very helpful to anyone who writes a lot of theorems and proofs and wants to avoid the kludge of mixing raw LaTeX with semantically evident Markdown.
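The conversion idea can be sketched with a few lines of Python. This is a hypothetical helper of my own (a real solution would be a proper pandoc filter operating on the AST, not a regex pass), assuming theorems are written in the `**Theorem N.** *body*` convention shown above:

```python
import re

# Hypothetical sketch: rewrite paragraphs of the form "**Theorem N.** *body*"
# into amsthm environments for the LaTeX target. amsthm renumbers
# automatically, so the explicit ordinal N is dropped here.
KEYWORDS = ("Definition", "Theorem", "Lemma", "Claim", "Corollary", "Proposition")
PATTERN = re.compile(
    r"^\*\*(%s) (\d+)\.\*\* \*(.+?)\*$" % "|".join(KEYWORDS), re.MULTILINE
)

def to_amsthm(markdown: str) -> str:
    def repl(m: re.Match) -> str:
        env = m.group(1).lower()
        return "\\begin{%s}\n%s\n\\end{%s}" % (env, m.group(3), env)
    return PATTERN.sub(repl, markdown)

assert to_amsthm("**Definition 1.** *Man is a rational animal.*") == (
    "\\begin{definition}\nMan is a rational animal.\n\\end{definition}"
)
```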
This is an expository reading summary of a selected CRYPTO 2015 paper, which I did as an assignment for KU's Introduction to Modern Cryptography course. Adversarial-resilient Bloom filters are the counterparts of cryptographically secure hash functions in an adversarial setting, where adaptive adversaries that have access to a deterministic or non-deterministic query oracle may challenge the data structure in a way that intentionally increases its false positive rate. As a preliminary result, this paper shows that the resistance of Bloom filters against computationally bounded adversaries requires that one-way functions exist; furthermore, such constructions are possible using pseudorandom permutations. I do find the proof a bit involved, but the notions of security introduced for Bloom filters are new and appealing (I hadn't previously read them anywhere else).
Original paper:
Abstract
A Bloom filter is a hash-based probabilistic data structure which is space-efficient for set membership querying, with a small probability of false positives. Naor and Yogev's 2015 paper introduces the adversarial model and formally proposes a strong notion of security for Bloom filters, i.e., adversarial resilience, based on an adversarial game in a cryptographic setting. The paper also discusses the correspondence between adversarial-resilient Bloom filters and the open assumption that one-way functions exist, which enables theoretical constructions using pseudorandom permutations. We believe that such an understanding will help design practical Bloom filters that are safe from known attacks in software systems.
Probabilistic data structures are data structures that employ randomness in their designs to enable more efficient approaches to storing and querying data, compared to deterministic ones. Traditionally, the algorithmic probabilistic analysis of such data structures assumes a model where all inputs and queries are independent of the internal randomness of the data structure. In this work, we consider an adversarial environment, where a computationally bounded adversary^{1} may adaptively choose inputs and queries with the intention of degrading the efficiency of the underlying data structure of some computer system. By introducing the adversarial model, we analyze the behavior of such data representations under the cryptographic notion of computational security against adversaries; furthermore, it enables us to construct more efficient, provably secure schemes of probabilistic data structures.
As a concrete example, a Bloom filter is a probabilistic data structure that holds a set \(S\) of elements approximately, using significantly fewer bits of storage and allowing for faster access than a complete representation. As a trade-off between efficiency and preciseness, for any query of \(x \in S\), a Bloom filter always outputs a yes-answer, and for any query of \(x \not\in S\), it should output a yes-answer only with small probability. In other words, a no-answer given by a Bloom filter indicates unambiguously that \(x \not\in S\), while a yes-answer indicates that \(x \in S\) probably holds^{2}; that is, it allows false positives. Ideally, the error probability that a Bloom filter returns a false positive should be as small as possible.
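For concreteness, here is a minimal classical Bloom filter of my own (an illustrative sketch, not the paper's construction): an \(m\)-bit array with \(k\) hash functions, here derived from SHA-256.

```python
import hashlib

# Minimal Bloom filter sketch: an m-bit array with k hash functions.
# No-answers are always correct; yes-answers may be false positives.
class BloomFilter:
    def __init__(self, m: int = 1024, k: int = 4):
        self.m, self.k, self.bits = m, k, 0  # bits: an m-bit integer bitset

    def _indices(self, x: bytes):
        # Derive k indices by hashing x with k distinct one-byte salts
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + x).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, x: bytes) -> None:
        for idx in self._indices(x):
            self.bits |= 1 << idx

    def query(self, x: bytes) -> bool:
        return all(self.bits >> idx & 1 for idx in self._indices(x))

bf = BloomFilter()
for word in (b"alice", b"bob", b"carol"):
    bf.add(word)
assert bf.query(b"alice")  # completeness: members are never missed
```

Non-members are reported present only when all \(k\) of their bit positions happen to be set, which is the false-positive event discussed below.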
Approaching the commonly seen set membership problem, Bloom filters have been implemented widely in real-world applications, specifically as internal data representations for optimizing large-scale software systems. For example, Akamai's CDN^{3} servers maintain Bloom filters in memory to decide whether to look up the disk cache for a requested resource; a false positive of the Bloom filter causes a cache miss, meaning that the server has to make an unnecessary disk lookup at the expense of time and system workload. If an attacker exploits the behavior of the Bloom filter, it is possible for them to cast queries that degrade the disk cache hit rate of the CDN servers and, consequently, perform a Denial-of-Service (DoS) attack.[1] In another scenario, where Bitcoin clients apply Bloom filters in the Simplified Payment Verification (SPV) mode to increase the overall performance of wallet synchronization, an adversary may perform a DoS attack on an SPV node by learning from the responses of the Bloom filters they have access to.[2]
As discussed above, the adversarial model addresses some security issues; thus arises the necessity of defining security in adversarial environments and constructing provably secure Bloom filters. Essentially, it is desirable for a well-constructed Bloom filter to maintain its small error probability in an adversarial environment; we say that such a Bloom filter is adversarial resilient (or just resilient). Consider an adversarial game in which an adversary has oracle access to the Bloom filter and is allowed to make \(t\) queries before outputting an element \(x^*\) (that has not been queried before) which it believes to be a false positive; if it is, the adversary wins the game. We say that a Bloom filter is \((n,t,\varepsilon)\)-adversarial resilient if, when initialized over sets of size \(n\), after \(t\) queries the probability of \(x^*\) being a false positive is at most \(\varepsilon\). A Bloom filter that is resilient to any polynomially many queries is said to be strongly resilient.
Clearly, a trivial construction of a strongly resilient Bloom filter would be a deterministic lookup table that stores \(S\) precisely, so that there is no false positive for an adversary to find. However, such a construction does not take advantage of the space and time efficiency of a normal Bloom filter, since it stores every element in memory. In the following, we consider only nontrivial Bloom filters, and we show that for a nontrivial Bloom filter to be adversarial-resilient, one-way functions must exist; that is, if one-way functions do not exist, then any nontrivial Bloom filter can be attacked with non-negligible probability by an efficient adversary. Furthermore, under the assumption that one-way functions exist, a pseudorandom permutation (PRP) can be used to construct a strongly resilient Bloom filter with reasonable memory consumption.
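The spirit of the keyed construction can be approximated as follows (my own sketch, not the paper's exact PRP-based scheme): pass every element through a secret-keyed pseudorandom function before hashing, so that an adversary without the key cannot predict which bits an element sets. Here HMAC-SHA256 serves as a PRF stand-in.

```python
import hashlib, hmac, os

# Sketch of a keyed Bloom filter: a secret PRF (HMAC-SHA256 stand-in) maps
# each element to its bit positions, hiding the hash locations from an
# adversary without the key. Illustrative only; not the paper's exact scheme.
class KeyedBloomFilter:
    def __init__(self, m: int = 1024, k: int = 4, key: bytes = None):
        assert k <= 8  # we slice 4 bytes of the 32-byte tag per index
        self.m, self.k, self.bits = m, k, 0
        self.key = key if key is not None else os.urandom(32)

    def _indices(self, x: bytes):
        tag = hmac.new(self.key, x, hashlib.sha256).digest()
        for i in range(self.k):
            yield int.from_bytes(tag[4 * i:4 * i + 4], "big") % self.m

    def add(self, x: bytes) -> None:
        for idx in self._indices(x):
            self.bits |= 1 << idx

    def query(self, x: bytes) -> bool:
        return all(self.bits >> idx & 1 for idx in self._indices(x))

kbf = KeyedBloomFilter(key=b"a 32-byte demonstration key.....")
kbf.add(b"alice")
assert kbf.query(b"alice")
```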
The construction of a Bloom filter consists of two algorithms: an initialization algorithm that gets a set \(S\) and outputs a memory-efficient representation of \(S\), and a query algorithm that gets a representation of \(S\) and an element \(x\) to be checked, outputting \(1\) if \(x \in S\) and \(0\) otherwise. Typically, the initialization algorithm is randomized and the query algorithm is deterministic; that is, a query operation does not amend the existing representation. We say that such a Bloom filter has a steady representation.^{4}
In the following model, we consider a universal set \(U\) and a subset \(S \subset U\) to be stored in a Bloom filter. We denote \(u=|U|\) and \(n=|S|\).
Definition 1. (Steady-representation Bloom filter) Let \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\) be a pair of polynomial-time algorithms, where \(\mathbf{B}_1\) is a randomized algorithm that gets as input a set \(S\) and outputs a representation \(M\), and \(\mathbf{B}_2\) is a deterministic algorithm that gets as input a representation \(M\) and a query element \(x \in U\). We say that \(\mathbf{B}\) is an \((n,\varepsilon)\)-Bloom filter (with a steady representation) if for any set \(S \subset U\) of size \(n\) it holds that:
\(\forall x \in S, \Pr[\mathbf{B}_2(\mathbf{B}_1(S), x) = 1] = 1\) (Completeness)
\(\forall x \not\in S, \Pr[\mathbf{B}_2(\mathbf{B}_1(S), x) = 1] \leq \varepsilon\) (Soundness)
where the probability is taken over the randomness used by the algorithm \(\mathbf{B}_1\).
Intuitively, the first property (completeness) says that for all elements in the set \(S\), the Bloom filter is guaranteed to output a yesanswer correctly; the second property (soundness) gives the upper bound that the Bloom filter outputs a false positive, that is, the query algorithm returns \(1\) when an element does not actually belong to the set \(S\). Formally,
False positive and error rate. Given a representation \(M\) of \(S\), if \(x \not\in S\) and \(\mathbf{B}_2(M,x)=1\), we say that \(x\) is a false positive. And we say that the probability bound \(\varepsilon\) of outputting false positives is the error rate of \(\mathbf{B}\).
In an adversarial environment, consider the following experiment for a Bloom filter \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\), an adversary \(\mathcal{A}\), a value \(t\) bounding the number of queries \(\mathcal{A}\) can make, and a security parameter \(\lambda\).
The Bloom filter resilience challenge experiment \(\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda)\):
Definition 2. (Adversarial-resilient Bloom filter) Let \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\) be an \((n,\varepsilon)\)-Bloom filter (with a steady representation). We say that \(\mathbf{B}\) is an \((n,t,\varepsilon)\)-adversarial resilient Bloom filter if for any set \(S\) of size \(n\), for all sufficiently large \(\lambda \in \mathbb{N}\) and for all probabilistic polynomial-time adversaries \(\mathcal{A}\), it holds that \[\Pr[\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda) = 1] \leq \varepsilon\]
where the probability is taken over the randomness used by the algorithm \(\mathbf{B}_1\) and \(\mathcal{A}\).
To define the nontriviality of a Bloom filter formally, notice that it is always desirable to minimize the memory use of a Bloom filter. Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter that uses \(m\) bits of memory. It is shown[3] that \(m\) is bounded below by \(m \geq n \log \frac{1}{\varepsilon}\). Thus, we define
Definition 3. (Minimal error) Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter that uses \(m\) bits of memory. We say that \(\varepsilon_0 = 2^{-\frac{m}{n}}\) is the minimal error of \(\mathbf{B}\).
As mentioned previously, a trivial construction of Bloom filters is a lookup table that stores \(S\) precisely, in which case the memory use is \(m=\log \binom{u}{n} \approx n \log(\frac{u}{n})\); thus by the bound \(m \geq n \log \frac{1}{\varepsilon}\), a construction is nontrivial only if \(\varepsilon > \frac{n}{u}\). On the other hand, if \(u\) is super-polynomial in \(n\), then \(\varepsilon\) is negligible in \(n\) and every polynomial-time adversary has only negligible probability of finding any false positive; therefore for such \(S \subset U\), the Bloom filter must be trivial. Noticing that \(\varepsilon_0 \leq \varepsilon\), we then define
Definition 4. (Nontrivial Bloom filter) Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter and let \(\varepsilon_0\) be its minimal error. We say that \(\mathbf{B}\) is nontrivial if there exists a constant \(c \geq 1\) such that \(\varepsilon_0 > \max\{\frac{n}{u},\frac{1}{n^c}\}\).
We now show that the existence of adversarial-resilient Bloom filters depends on the existence of one-way functions; that is, if any nontrivial, strongly resilient Bloom filter exists, then one-way functions also exist.
Theorem 5. Let \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\) be any nontrivial Bloom filter (with a steady representation) of \(n\) elements that uses \(m\) bits of memory, and let \(\varepsilon_0\) be its minimal error. If \(\mathbf{B}\) is \((n,t,\varepsilon)\)-adversarial resilient for \(t=\mathcal{O}(\frac{m}{\varepsilon_0^2})\), then one-way functions exist.
Proof. First we assume that one-way functions do not exist; we then show that we can construct a polynomial-time adversary \(\mathcal{A}\) such that \[\Pr[\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda) = 1] > \varepsilon\] for a fixed value \(\varepsilon\). That is, \(\mathbf{B}\) cannot be \((n,t,\varepsilon)\)-adversarial resilient.
Define the following function \(f\): \[f(S,r,x_1,\dots,x_t) = (x_1,\dots,x_t,\mathbf{B}_2(M,x_1),\dots,\mathbf{B}_2(M,x_t))\] where \(S\) is a set of size \(n\), \(r\) is the random bits used by \(\mathbf{B}_1\), \(M\) is the representation of \(S\) generated by \(\mathbf{B}_1\) (using randomness \(r\)), and \(t=\frac{200m}{\varepsilon_0}\). Clearly, \(f\) is polynomial-time computable.
Since \(f\) is not a one-way function (under the assumption that one-way functions do not exist), there is an algorithm that inverts \(f\) efficiently. Thus we have:
Claim 6. Assume that one-way functions do not exist. Then there exists a polynomial-time algorithm \(\mathcal{A}\) that inverts \(f\) with a failure probability of at most \(\frac{1}{100}\): \[\Pr[f(\mathcal{A}(f(S,r,x_1,\dots,x_t))) \neq f(S,r,x_1,\dots,x_t)] < \frac{1}{100}\]
Proof. Because \(f\) is not a one-way function, there exists[4] an algorithm \(\mathcal{A}'\) such that \(\Pr[\mathsf{Invert}_{\mathcal{A}',f}(n) = 1] \geq \frac{1}{p(n)}\), where \(p(n)\) is polynomial in \(n\). Construct an algorithm \(\mathcal{A}\) that runs \(\mathcal{A}'\) independently \(\lceil\frac{\log 100}{\log(p(n)) - \log(p(n)-1)}\rceil\) times, so that the total failure probability is \(\Pr[f(\mathcal{A}(f(S,r,x_1,\dots,x_t))) \neq f(S,r,x_1,\dots,x_t)] < \left(1-\frac{1}{p(n)}\right)^{\lceil\frac{\log 100}{\log(p(n)) - \log(p(n)-1)}\rceil} \leq \frac{1}{100}\)
■
Using \(\mathcal{A}\), construct the following probabilistic polynomial-time algorithm \(\mathsf{Attack}\):
The Algorithm \(\mathsf{Attack}\)
\(\mathsf{Attack}\) is given oracle access to the query algorithm \(\mathbf{B}_2(M,\cdot)\), and gets \(1^\lambda\) as input.
 For \(i \in \{1,\dots,t\}\), sample \(x_i \in U\) uniformly, and query \(y_i = \mathbf{B}_2(M,x_i)\).
 Run \(\mathcal{A}\) (the inverter of \(f\)) and get \((S',r',x_1,\dots,x_t) \leftarrow \mathcal{A}(x_1,\dots,x_t,y_1,\dots,y_t)\).
 Compute \(M' \overset{r'}{\leftarrow} \mathbf{B}_1(1^\lambda,S')\), using \(r'\) as the randomness bits in the initialization.
 For \(k=1,\dots,\frac{100}{\varepsilon_0}\), do:
 Sample \(x^* \in U\) uniformly.
 If \(\mathbf{B}_2(M',x^*)=1\) and \(x^* \not\in \{x_1,\dots,x_t\}\), output \(x^*\) and HALT.
 Sample \(x^* \in U\) uniformly, and output \(x^*\).
Claim 7. Assume that \(\mathcal{A}\) inverts \(f\) successfully. For any representation \(M\), the probability that there exists a representation \(M'\) such that \(\mathbf{B}_2(M,x_i)=\mathbf{B}_2(M',x_i)\) for all \(i \in \{1,\dots,t\}\), yet the disagreement rate \(\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] > \frac{\varepsilon_0}{100}\), is at most \(\frac{1}{100}\).
Proof. For any fixed \(M'\) whose error rate exceeds \(\frac{\varepsilon_0}{100}\), by the independence of the choices of \(x_i\) we get \[\Pr[\forall i \in \{1,\dots,t\} : \mathbf{B}_2(M,x_i) = \mathbf{B}_2(M',x_i)] \leq \left(1 - \frac{\varepsilon_0}{100}\right)^t\]
Since the Bloom filter uses \(m\) bits of memory, there are \(2^m\) possible representations as candidates for \(M'\). Thus, by the union bound, \[\Pr[\exists M' \ \forall i \in \{1,\dots,t\} : \mathbf{B}_2(M,x_i) = \mathbf{B}_2(M',x_i)] \leq 2^m \left(1 - \frac{\varepsilon_0}{100}\right)^t \leq \frac{1}{100}\]
Since \(\mathcal{A}\) is assumed to invert \(f\) successfully, it must output a representation \(M'\) such that for \(i \in \{1,\dots,t\}\), \(\mathbf{B}_2(M,x_i)=\mathbf{B}_2(M',x_i)\). Therefore, the above bound holds.
■
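The union bound in Claim 7 can be checked numerically for concrete toy parameters; the values of \(m\) and \(\varepsilon_0\) below are assumptions for illustration, with \(t = \frac{200m}{\varepsilon_0}\) as implied by the expectation computation in Claim 9:

```python
import math

# Toy parameters (assumptions for illustration): memory m bits, error rate eps0.
m, eps0 = 100, 0.1
t = math.ceil(200 * m / eps0)  # the number of queries made by Attack

# Union bound over all 2**m candidate representations M':
bound = 2**m * (1 - eps0/100) ** t
assert bound <= 1/100
```

Intuitively, \((1-\frac{\varepsilon_0}{100})^t \approx e^{-2m}\) shrinks much faster than \(2^m\) grows, so the product is tiny.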
Define \(\mu(M)=\Pr_{x \in U}[\mathbf{B}_2(M,x)=1]\) as the positive rate over \(U\). We now show that for almost all representations \(M\) generated from a set \(S\) and randomness \(r\), it holds that \(\mu(M) > \frac{\varepsilon_0}{8}\), with only a negligible probability of exception:
Claim 8. \(\Pr_S[\exists r : \mu(M_r^S) \leq \frac{\varepsilon_0}{8}] \leq 2^{-n}\).
Proof. Let \(\mathsf{BAD}\) be the set of all sets \(S\) for which there exists an \(r\) such that \(\mu(M_r^S) \leq \frac{\varepsilon_0}{8}\). Given \(S \in \mathsf{BAD}\), let \(\hat{S}\) be the set of all elements \(x\) such that \(\mathbf{B}_2(M_r^S,x)=1\); then \(|\hat{S}| \leq \frac{\varepsilon_0}{8} \cdot u\). Notice that we can encode the set \(S\) by giving the representation \(M_r^S\) together with the index of \(S\) among all subsets of \(\hat{S}\) of size \(n\), and the number of encoding bits must be at least \(\log|\mathsf{BAD}|\): \[\log|\mathsf{BAD}| \leq m + \log\binom{\frac{\varepsilon_0 u}{8}}{n} \leq m + n \log\left(\frac{\varepsilon_0 u}{8}\right) - n \log n + 2n \leq -n + \log\binom{u}{n}\] thus \(|\mathsf{BAD}| \leq 2^{-n}\binom{u}{n}\). Since the number of possible sets \(S\) is \(\binom{u}{n}\), \(\Pr_S[\exists r : \mu(M_r^S) \leq \frac{\varepsilon_0}{8}] \leq 2^{-n}\).
■
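The chain of inequalities in the proof of Claim 8 can be checked numerically for concrete toy parameters; all values below are assumptions chosen only for illustration:

```python
import math

# Toy parameters (assumptions for illustration): universe size u, set size n,
# false-positive rate eps0, and memory m bits.
u, n, eps0, m = 2**20, 10, 0.1, 40

lhs = m + math.log2(math.comb(int(eps0 * u / 8), n))             # bits to encode S in BAD
mid = m + n * math.log2(eps0 * u / 8) - n * math.log2(n) + 2 * n  # middle estimate
rhs = -n + math.log2(math.comb(u, n))                             # target bound

# The chain of inequalities from Claim 8, checked numerically:
assert lhs <= mid <= rhs
```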
Claim 9. Assume that \(\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] \leq \frac{\varepsilon_0}{100}\) and that \(\mu(M) > \frac{\varepsilon_0}{8}\). The probability that \(\mathsf{Attack}\) does not halt on Step 4 is at most \(\frac{1}{100}\).
Proof. It follows directly from the assumptions that \(\mu(M') > \frac{\varepsilon_0}{8} - \frac{\varepsilon_0}{100} > \frac{\varepsilon_0}{10}\). For convenience, let \(\mathcal{X} = \{x_1,\dots,x_t\}\) and \(\hat{S'} = \{x : \mathbf{B}_2(M',x)=1\}\). We have that
\[E[|\hat{S'} \cap \mathcal{X}|] = t \cdot \mu(M') > \frac{200m}{\varepsilon_0} \cdot \frac{\varepsilon_0}{10} = 20m\]
By the Chernoff bound, with probability at least \(1-e^{-\Omega(m)}\) we have that \(|\hat{S'} \cap \mathcal{X}| < 40m\), thus \[|\hat{S'} \setminus \mathcal{X}|=|\hat{S'}|-|\hat{S'} \cap \mathcal{X}| > |\hat{S'}| - 40m \geq \frac{\varepsilon_0 u}{10} - 40m\]
Case 1. \(u=n^d\) (\(d\) is a constant). We show that \(|\hat{S'} \setminus \mathcal{X}| \geq 1\), so that an exhaustive search over the universe \(U\) is efficient and guaranteed to find an element \(x^*\) in \(\hat{S'} \setminus \mathcal{X}\). Let \(c\) be a constant such that \(\frac{1}{n^c} < \varepsilon_0 < \frac{1}{n^{c-1}}\). We have \(\frac{u}{n} = n^{d-1} \geq \frac{1}{\varepsilon_0} > n^{c-1}\), then \(d - c > 1\). Moreover, \(m \leq n \log\frac{u}{n} \leq nd \log n\), thus, \[|\hat{S'} \setminus \mathcal{X}| \geq \frac{\varepsilon_0 u}{10} - 40m \geq \frac{n^{d-c}}{10} - 40nd \log n > 1\]
Case 2. \(u\) is superpolynomial in \(n\). We show that the fraction of \(\hat{S'} \setminus \mathcal{X}\) in \(U\) is large enough that sampling finds an \(x^*\) with only a small failure probability. Since \(\frac{\varepsilon_0}{20}\) is polynomial in \(\frac{1}{n}\) but \(\frac{40m}{u} \leq \frac{40n \log u}{u}\) is negligible, we get that \(\frac{\varepsilon_0}{20} > \frac{40m}{u}\). It follows from \(\frac{|\hat{S'} \setminus \mathcal{X}|}{u} > \frac{\varepsilon_0}{10} - \frac{40m}{u}\) that \(\frac{|\hat{S'} \setminus \mathcal{X}|}{u} > \frac{\varepsilon_0}{20}\). Thus, the probability of sampling \(x \not\in \hat{S'} \setminus \mathcal{X}\) in all \(k\) attempts is bounded by \[\left(1-\frac{\varepsilon_0}{20}\right)^k = \left(1-\frac{\varepsilon_0}{20}\right)^{\frac{100}{\varepsilon_0}} < \frac{1}{100}\]
In both cases, the probability that \(\mathsf{Attack}\) fails to find \(x^*\) and halt on Step 4 is less than \(\frac{1}{100}\).
■
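The final bound in Case 2 is easy to verify numerically: \((1-\frac{\varepsilon_0}{20})^{100/\varepsilon_0}\) approaches \(e^{-5} \approx 0.0067\) as \(\varepsilon_0 \to 0\) and only shrinks for larger \(\varepsilon_0\), so it stays below \(\frac{1}{100}\). The error rates tried below are toy values for illustration:

```python
import math

# (1 - eps0/20)**(100/eps0) tends to e**-5 ~ 0.0067 as eps0 -> 0,
# and decreases for larger eps0, so it is below 1/100 throughout.
for eps0 in (0.5, 0.1, 0.01, 0.001):  # toy error rates
    k = math.ceil(100 / eps0)         # number of sampling attempts in Step 4
    assert (1 - eps0/20) ** k < 1/100
```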
Claim 10. \(\Pr[\mathbf{B}_2(M',x^*)=1 \wedge \mathbf{B}_2(M,x^*)=0] \leq \frac{1}{100}\)
Proof. This follows from the assumption that \(\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] \leq \frac{\varepsilon_0}{100}\).
■
Claims 6, 7, 9 & 10 cover all the cases in which \(\mathsf{Attack}\) fails (each failure occurs only when the respective assumption is violated), and Claim 8 contributes only a negligible term, so together they give an upper bound on the failure probability. Taking a union bound, the total failure probability is at most \(\frac{4}{100} + 2^{-n}\). Thus, we have constructed a polynomial-time adversary \(\mathsf{Attack}\) such that \[\Pr[\mathsf{Challenge}_{\mathsf{Attack},\mathbf{B},t}(\lambda) = 1] \geq 1-\frac{4}{100}-2^{-n} > \varepsilon\] therefore \(\mathbf{B}\) cannot be \((n,t,\varepsilon)\)-adversarial resilient, a contradiction implying that such adversarial resilient Bloom filters do not exist under the assumption that one-way functions do not exist. By modus tollens, if nontrivial \((n,t,\varepsilon)\)-adversarial resilient Bloom filters exist, then one-way functions exist.
■
In Theorem 5 we showed that the existence of adversarial resilient Bloom filters implies the existence of one-way functions. Conversely, assuming that one-way functions (and thus pseudorandom permutations) exist, they may be used to construct nontrivial, strongly adversarial resilient Bloom filters. We have
Proposition 11. (Construction using pseudorandom permutations)^{7} Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter using \(m\) bits of memory. If pseudorandom permutations exist, then for any security parameter \(\lambda\), there exists an \((n,\varepsilon+\mathsf{negl}(\lambda))\)-strongly resilient Bloom filter that uses \(m'=m+\lambda\) bits of memory.
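A minimal sketch of the idea behind Proposition 11: pass every element through a secretly keyed pseudorandom permutation before it reaches an ordinary Bloom filter, so an adversary without the key cannot steer queries toward false positives. Here a keyed HMAC stands in for the permutation, and the whole class is a toy illustration, not the construction from the paper:

```python
import hashlib
import hmac
import secrets

class ResilientBloom:
    """Toy sketch: elements are passed through a keyed pseudorandom
    function (HMAC-SHA256 as a stand-in for a real PRP / block cipher)
    before indexing into an ordinary bit-array Bloom filter."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 4):
        self.key = secrets.token_bytes(16)   # the extra lambda bits of memory
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _prp(self, x: bytes) -> bytes:
        # Stand-in for a pseudorandom permutation keyed by self.key.
        return hmac.new(self.key, x, hashlib.sha256).digest()

    def _indexes(self, x: bytes):
        d = self._prp(x)
        for i in range(self.k):
            h = int.from_bytes(hashlib.sha256(bytes([i]) + d).digest(), "big")
            yield h % self.m

    def insert(self, x: bytes) -> None:
        for i in self._indexes(x):
            self.bits[i // 8] |= 1 << (i % 8)

    def query(self, x: bytes) -> bool:
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(x))

bf = ResilientBloom()
for w in (b"alice", b"bob"):
    bf.insert(w)
assert bf.query(b"alice") and bf.query(b"bob")  # no false negatives
```

Without the key, the adversary sees only pseudorandom positions, so any query's false-positive probability stays close to the honest rate \(\varepsilon\).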
Unsteady representation and computationally unbounded adversaries. The above discussion about Bloom filters assumes a steady representation, that is, the query algorithm \(\mathbf{B}_2\) is deterministic. Some implementations allow \(\mathbf{B}_2\) to change the internal representation, so querying can also be probabilistic. Further results regarding unsteady representations may be found in [5], with the ACD^{8} framework proposed in [6]. Moreover, some results are shown to hold even for computationally unbounded adversaries.
Bloom filters and secrecy. Bloom filters, like hash functions, are not designed as encryption schemes. Thus, even adversarial resilient Bloom filters may leak considerable information in unintended ways. As a concrete example, in Bitcoin lightweight SPV clients, which rely on Bloom filters to store users’ Bitcoin addresses, an adversary can efficiently extract information about these addresses. See [7] for a discussion of this interesting scenario.
[1] B. M. Maggs and R. K. Sitaraman, “Algorithmic nuggets in content delivery,” ACM SIGCOMM Computer Communication Review, vol. 45, no. 3, pp. 52–66, 2015.
[2] R. Benjamin and E. K. Yasmine, “Attacks on bitcoin,” 2015.
[3] L. Carter, R. Floyd, J. Gill, G. Markowsky, and M. Wegman, “Exact and approximate membership testers,” in Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, 1978, pp. 59–65.
[4] J. Katz and Y. Lindell, “Introduction to Modern Cryptography,” 2014.
[5] M. Naor and E. Yogev, “Bloom filters in adversarial environments,” in Annual Cryptology Conference (CRYPTO), 2015, pp. 565–584.
[6] M. Naor and G. N. Rothblum, “Learning to impersonate,” in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 649–656.
[7] A. Gervais, S. Capkun, G. O. Karame, and D. Gruber, “On the privacy provisions of bloom filters in lightweight bitcoin clients,” in Proceedings of the 30th Annual Computer Security Applications Conference, 2014, pp. 326–335.
We will only consider polynomial-time adversaries here.↩
For this reason, a yes-answer is also referred to as a maybe-answer in some texts.↩
A CDN (Content Delivery Network) is a globally distributed network of proxy servers deployed in multiple data centers, in order to serve cached content to end-users with high availability and high performance.↩
In this work, we will consider only steady representations.↩
The security parameter \(\lambda\) is implied in the initialization algorithm \(\mathbf{B}_1\), thus we denote it as \(\mathbf{B}_1(1^\lambda,S)\).↩
To strengthen our definition, we assume that the adversary also gets the set \(S\), and it is important that the adversary can hardly find any false positive even if given \(S\).↩
A proof of this can be found in [5].↩
Learning model of adaptively changing distributions (ACD).↩
(20 Nov 2016) Correction: P ≠ NP is not sufficient to imply that one-way functions exist. See P versus NP problem and one-way functions.
Intro. Starting from November, I’ll summarize my study notes from the wiki into weekly blog posts. I’ve always wanted to keep my study progress on track; I feel it’s hard to convince even myself without visible footprints.
So here we have the first episode. (Hopefully it won’t be the last one)
Asymptotic notation is an important tool in analyzing the time/space efficiency of algorithms.
For decision problems, we now formally define the time complexity classes P and NP, and introduce the hardness of NP-complete problems, which plays an indispensable role in the study of algorithm design and modern cryptography.
Computational hardness assumption P ≠ NP. Although it is still an open proposition, many believe that P ≠ NP. Notably, this post originally claimed that if P ≠ NP holds true, then one-way functions exist.
(I stand corrected: there is no known proof that P ≠ NP implies the existence of one-way functions. Conversely, however, the existence of one-way functions does imply that P ≠ NP. There is an informal argument given by Peter Shor on StackExchange^{1}, rephrased in the section P versus NP problem and one-way functions.)
Later we will cover the notion of security in cryptography, so here is a refresher on basic probability. (Probability is also used extensively in analyzing the behavior of nondeterministic algorithms, hash functions, etc.)
Plan for next week:
Consider the following map: \[f : (x, r) \mapsto s\] where \(x\) is an arbitrary bit string, \(r\) is a string of random bits, and \(s\) is an instance of a \(k\)-SAT problem having \(x\) as a planted solution, with the randomness of \(r\) uniquely determining which \(k\)-SAT instance is chosen.
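A toy sketch of such a map in Python: the generator below simply resamples random 3-clauses until each one is satisfied by the planted assignment. This is only an illustration of the map \(f\); instances generated this naive way are not known to be hard on average:

```python
import random

def planted_3sat(x: list, n_clauses: int, r: int):
    """Sketch of the map f(x, r) -> s: produce a random 3-SAT instance
    that has the assignment x planted as a satisfying solution, with the
    randomness r fixing which instance is chosen.  A clause is a triple
    of signed, 1-based variable indices."""
    rng = random.Random(r)  # r determines the instance deterministically
    n = len(x)
    clauses = []
    while len(clauses) < n_clauses:
        vars_ = rng.sample(range(n), 3)
        lits = [(v + 1) if rng.random() < 0.5 else -(v + 1) for v in vars_]
        # Keep the clause only if the planted assignment satisfies it.
        if any((l > 0) == x[abs(l) - 1] for l in lits):
            clauses.append(tuple(lits))
    return clauses

x = [True, False, True, True, False]
s = planted_3sat(x, 20, r=42)
# Every clause is satisfied by the planted solution x:
assert all(any((l > 0) == x[abs(l) - 1] for l in c) for c in s)
```

Inverting \(f\) on an output \(s\) means recovering some planted solution, i.e., solving the instance.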
If we can invert the above function \(f\) (in polynomial time), we must already have solved the corresponding \(k\)-SAT instance \(s\) with planted solution \(x\). \(k\)-SAT is known to be NP-complete, and inverting such a function would be as hard as solving a \(k\)-SAT instance with a planted solution; that is, inverting \(f\) at a given point can be hard. Clearly, should we have a one-way function, inverting it is guaranteed to be no easier than inverting \(f\).
So what does it mean if P ≠ NP? We know that the \(k\)-SAT problem is hard to solve in the worst case, so the function \(f\) can be made as hard to invert as solving a worst-case \(k\)-SAT instance. However, we don’t know whether it’s possible to have a class \(\mathcal{S}\) of \(k\)-SAT problems with planted solutions that are as hard as general-case \(k\)-SAT problems. If such a class \(\mathcal{S}\) exists, then given any \(s \in \mathcal{S}\), no probabilistic polynomial-time algorithm is able to recover \(x\) with a non-negligible probability, so we can conclude that \(f\) is indeed a one-way function. [1]
Problem 1.1. Does there exist a class \(\mathcal{S}\) of \(k\)-SAT problems with planted solutions, such that every \(L \in \mathcal{S}\) is NP-hard?
Conjecture 1.2. If \(\mathrm{P} \neq \mathrm{NP}\), then oneway functions exist.
On the other hand, assume that \(f\) is a one-way function, so that one-way functions do exist; this implies that the \(k\)-SAT problem is hard to solve (in the worst case) by any polynomial-time algorithm, thus P ≠ NP. By modus tollens, if P = NP, then no one-way function exists. [2]
Theorem 1.3. If one-way functions exist, then \(\mathrm{P} \neq \mathrm{NP}\).
Proof. (Sketch) Let \(f : \{0,1\}^*\to\{0,1\}^*\) be a one-way function. There is a polynomial-time algorithm \(M_f\) that computes \(y=f(x)\) for all \(x\); thus, there exists a polynomial-size circuit family that outputs \(y=f(x)\) for all \(x\).
Since \(f\) is a one-way function, for every probabilistic polynomial-time algorithm \(\mathcal{A}\) there is a negligible function \(\mathsf{negl}\) such that \(\Pr[\mathsf{Invert}_{\mathcal{A},f}(n) = 1] \leq \mathsf{negl}(n)\), so no \(\mathcal{A}\) can compute a preimage in \(f^{-1}(y)\) for every given \(y\). An \(\mathcal{A}\) that fully computes \(f^{-1}\) would solve the corresponding CIRCUIT-SAT problems of the circuit in all cases. Thus, there must exist some CIRCUIT-SAT problems that cannot be decided by a polynomial-time algorithm; therefore, \(\mathrm{P} \neq \mathrm{NP}\).
■
Remark 1.4. If one can come up with a construction of a one-way function, or a proof that such functions exist, then it holds true that \(\mathrm{P} \neq \mathrm{NP}\).
Books:
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed.
M. Sipser, Introduction to the Theory of Computation, 3rd ed.
J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2nd ed.
Papers:
[1] A. L. Selman, “A survey of oneway functions in complexity theory,” Mathematical Systems Theory, vol. 25, no. 3, pp. 203–221, 1992.
[2] M. Abadi, E. Allender, A. Broder, J. Feigenbaum, and L. A. Hemachandra, “On generating solved instances of computational problems,” in Proceedings on Advances in Cryptology, 1990, pp. 297–310.