Underrated Gems (1)
Mort Yao, 2017-11-09 (updated 2017-11-11)

In a listless afternoon I asked myself: what are some of the most neglected theorems that deserve more attention? Many famous old theorems in mathematics, as I recalled, are about some kind of impossibility: $$\sqrt{2}$$ is irrational, $$\pi$$ is transcendental, polynomial equations of degree 5 or higher are generally unsolvable, squaring the circle is impossible, Fermat’s last theorem, etc.; by arguing about impossibility they reshaped many subfields of mathematics in a revolutionary way. Compared to mathematics, formal logic is a relatively young field, not really fruitful before Frege, and the theory of computation is even less cultivated, with quite a few landmark results still underrated or poorly publicized. Among those neglected theorems in logic and computation, I decided to do a write-up on each one I have heard of, to help myself understand them better. Undoubtedly, they ought to be brought into the limelight and better understood by working mathematicians and programmers today.

It should be stated in the first paragraph of the preface to The Book on logic, if there could be one, that truth in any language cannot be defined in that same language. We have the classical liar’s sentence, which says “This sentence is false”. If this kind of inconsistency does not seem disastrous enough to bother you yet, since you may promise never to involve yourself in any circular reasoning, then consider the following truth-teller’s sentence:

If this sentence is true, then God exists.

Symbolically,

$\sigma = (\text{True}(\sigma) \to \exists x\,\text{Godlike}(x))$

If you take this sentence to be false, then since it is a material conditional the only way it can ever be false is that the antecedent ($$\text{True}(\sigma)$$) is true but the consequent ($$\exists x\,\text{Godlike}(x)$$) is false. However, from that $$\text{True}(\sigma)$$ is true we would immediately get that $$\sigma$$ is true, which contradicts our assumption that $$\sigma$$ is false. On the other hand, if you take this sentence to be true, then everything should work perfectly: $$\text{True}(\sigma)$$ is true, $$\exists x\,\text{Godlike}(x)$$ is also true, therefore the whole sentence $$\sigma$$ is true, which just verifies the fact that $$\text{True}(\sigma)$$ is true. Therefore, to maintain logical consistency in our reasoning, the only choice for a rational person is to accept the truth of $$\exists x\,\text{Godlike}(x)$$, i.e., “God exists.”
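The case analysis above can be brute-forced. A minimal sketch (the boolean encoding of $$\sigma$$ and the name god are my own): enumerate all truth-value assignments to $$\sigma$$ and to “God exists”, and keep only those where $$\sigma$$’s assigned value agrees with the material conditional it expresses.

```python
# Enumerate assignments to sigma ("this sentence") and god ("God exists"),
# keeping those where sigma's value equals that of True(sigma) -> god.
consistent = [(sigma, god)
              for sigma in (True, False)
              for god in (True, False)
              if sigma == ((not sigma) or god)]   # material conditional
print(consistent)   # [(True, True)]
```

The only self-consistent assignment makes both $$\sigma$$ and “God exists” true, matching the argument above.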

This is not really about the ontological argument of God’s existence; it’s about how the inherent inconsistency of an unrestricted language can lead to any random truth trivially (ex contradictione quodlibet). If you believe that the above argument is all reasonable and God indeed exists, try yourself on the following sentence:

If this sentence is true, then Satan rules the world.

Let’s say we get over it, since natural language is tricky and facile, but no one is really hurt by its expressive power.1 The sure thing is, no serious mathematician wants to derive ridiculous random facts (e.g., if some self-referential proposition is true, then “$$2 + 2 = 3$$”), so in a formal language we are obliged to have a metatheorem like:

Undefinability theorem (Tarski 1936). Let $$\mathcal{L}$$ be a formal language, and let $$\mathfrak{A}$$ be an $$\mathcal{L}$$-structure, with its theory $$\operatorname{Th}\mathfrak{A}$$ capable of representing all computable functions. A Gödel numbering $$\mathfrak{g}(\phi)$$ is defined for every $$\mathcal{L}$$-formula $$\phi$$. There is no formula $$\text{True}(n)$$ such that for every $$\mathcal{L}$$-formula $$\varphi$$, $$\text{True}(\mathfrak{g}(\varphi)) \leftrightarrow \varphi$$.

Before proceeding with the proof of Tarski’s undefinability theorem, we get our terminology straight first and prove a very important lemma, namely the diagonal lemma. The definitions of language, structure and theory are already made in Mst. #7. The capability of representing all computable functions can be informally understood as, for every Turing-computable function $$f : x \mapsto y$$, there is an $$\mathcal{L}$$-formula $$\gamma_f$$ such that $$\gamma_f(x, y) \in \operatorname{Th}\mathfrak{A} \leftrightarrow y = f(x)$$.

A Gödel numbering in a formal language $$\mathcal{L}$$ is an injective function $$\mathfrak{g} : \Phi \to \mathbb{N}$$ that maps every $$\mathcal{L}$$-formula to a unique natural number $$n \in \mathbb{N}$$. It may be defined in any way that satisfies the desired injective property. For example, in a first-order language $$\mathcal{L}_\text{Set}$$ consisting of only variables $$x_i$$ ($$i = 0, 1, 2, \dots$$), two connective symbols $$\{ \to, \neg \}$$, the equality symbol $$=$$, the quantifier symbol $$\forall$$ and a predicate symbol $$\in$$, one possible Gödel numbering function may be defined inductively as

\begin{align*} \mathfrak{g}(x_i) &= 2^0 \cdot 3^i \\ \mathfrak{g}(= t_1 t_2) &= 2^1 \cdot 3^{\mathfrak{g}(t_1)} \cdot 5^{\mathfrak{g}(t_2)} \\ \mathfrak{g}(\in t_1 t_2) &= 2^2 \cdot 3^{\mathfrak{g}(t_1)} \cdot 5^{\mathfrak{g}(t_2)} \\ \mathfrak{g}((\neg \psi)) &= 2^3 \cdot 3^{\mathfrak{g}(\psi)} \\ \mathfrak{g}((\psi \to \theta)) &= 2^4 \cdot 3^{\mathfrak{g}(\psi)} \cdot 5^{\mathfrak{g}(\theta)} \\ \mathfrak{g}(\forall x_i \psi) &= 2^5 \cdot 3^i \cdot 5^{\mathfrak{g}(\psi)} \\ \end{align*}

(*)

Claim 1. (*) defines a Gödel numbering $$\mathfrak{g}$$ on the language $$\mathcal{L}_\text{Set}$$. That is, for any $$\mathcal{L}_\text{Set}$$-formulas $$\varphi_1$$ and $$\varphi_2$$, if $$\mathfrak{g}(\varphi_1) = \mathfrak{g}(\varphi_2)$$, then $$\varphi_1 \cong \varphi_2$$.

Proof idea. By structural induction on the formula and the fundamental theorem of arithmetic (every integer greater than 1 has a unique prime factorization).

Example 2. The Gödel number of formula $$\forall x_0 (\neg \in x_0 x_0)$$ using (*) is

\begin{align*} \mathfrak{g}(\forall x_0 (\neg \in x_0 x_0)) &= 2^5 \cdot 3^0 \cdot 5^{\mathfrak{g}(\neg \in x_0 x_0)} = 2^5 \cdot 3^0 \cdot 5^{2^3 \cdot 3^{\mathfrak{g}(\in x_0 x_0)}} = 2^5 \cdot 3^0 \cdot 5^{2^3 \cdot 3^{2^2 \cdot 3^{2^0 \cdot 3^0} \cdot 5^{2^0 \cdot 3^0}}} \\ &= 2^5 \cdot 3^0 \cdot 5^{2^3 \cdot 3^{60}} \\ &= 32 \cdot 5^{339129266201729628114355465608} \end{align*}

Note that the choice of our numbering function is purely technical; it doesn’t make any difference if we choose another set of prime numbers $$\{5, 7, 11\}$$ instead of $$\{2, 3, 5\}$$ as basis, as long as the decomposition of a Gödel number is unique. (Of course, such numbers would be even bigger.)
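As a sanity check on Example 2, the numbering (*) is easy to implement; the tuple-based syntax tree below is my own ad-hoc encoding. Note that the outermost number $$2^5 \cdot 3^0 \cdot 5^{\mathfrak{g}(\neg \in x_0 x_0)}$$ is far too large to materialize, so we compute only the inner numbers.

```python
# A sketch of the Goedel numbering (*); formulas are tuples tagged with
# my own labels 'var', 'eq', 'in', 'not', 'imp', 'all'.
def g(phi):
    tag = phi[0]
    if tag == 'var':                       # x_i  ->  2^0 * 3^i
        return 3 ** phi[1]
    if tag == 'eq':                        # = t1 t2
        return 2 ** 1 * 3 ** g(phi[1]) * 5 ** g(phi[2])
    if tag == 'in':                        # "element of" t1 t2
        return 2 ** 2 * 3 ** g(phi[1]) * 5 ** g(phi[2])
    if tag == 'not':                       # (not psi)
        return 2 ** 3 * 3 ** g(phi[1])
    if tag == 'imp':                       # (psi -> theta)
        return 2 ** 4 * 3 ** g(phi[1]) * 5 ** g(phi[2])
    if tag == 'all':                       # forall x_i psi
        return 2 ** 5 * 3 ** phi[1] * 5 ** g(phi[2])
    raise ValueError(tag)

x0 = ('var', 0)
print(g(('in', x0, x0)))           # 2^2 * 3^1 * 5^1 = 60
print(g(('not', ('in', x0, x0))))  # 2^3 * 3^60, the exponent in Example 2
```

Python’s arbitrary-precision integers handle $$2^3 \cdot 3^{60}$$ effortlessly, which is exactly why such encodings are a theoretical device rather than a practical one: one more layer of nesting and the number no longer fits in the observable universe.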

So, what is the whole point of this numbering thing, you might ask? As shown in Example 2, even a simple formula can yield an insanely large Gödel number (no kidding, much greater than the total number of atoms in the universe!). We don’t actually use these numbers; we construct them just to convince ourselves that we can encode virtually any theory into the theory of arithmetic, or literally, strings into natural numbers. The overwhelming cost (complexity) is not our concern for now. Hence, from now on, we can reduce virtually any problem in a formal language (set theory, type theory, analysis, geometry, …) to an arithmetical problem, i.e., a problem simply about manipulating and deciding on natural numbers.

Diagonal lemma (Carnap 1934). Let $$\mathcal{L}_\text{A}$$ be a first-order language of arithmetic, and let $$\mathfrak{A}$$ be an $$\mathcal{L}_\text{A}$$-structure, with its theory $$\operatorname{Th}\mathfrak{A}$$ capable of representing all computable functions. Let $$\phi$$ be an $$\mathcal{L}_\text{A}$$-formula with one free variable. There is a sentence $$\psi$$ such that $$\psi \leftrightarrow \phi(\mathfrak{g}(\psi))$$ in $$\operatorname{Th}\mathfrak{A}$$.

Proof. Let $$f$$ be a total function defined on $$|\mathfrak{A}| = \mathbb{N}$$: $f(\mathfrak{g}(\theta)) = \mathfrak{g}(\theta(\mathfrak{g}(\theta)))$ for each $$\mathcal{L}_\text{A}$$-formula $$\theta$$ with one free variable; otherwise, define $$f(n) = 0$$.

Since $$f$$ is a computable function, there is a formula $$\gamma_f$$ representing $$f$$ in $$\operatorname{Th}\mathfrak{A}$$. That is, for each formula $$\theta$$, we have in $$\operatorname{Th}\mathfrak{A}$$

$\forall y\,(\gamma_f(\mathfrak{g}(\theta), y) \leftrightarrow y = f(\mathfrak{g}(\theta)))$

which is just

$\forall y\,(\gamma_f(\mathfrak{g}(\theta), y) \leftrightarrow y = \mathfrak{g}(\theta(\mathfrak{g}(\theta))))$

Define the formula $$\beta$$ (with one free variable) as

$\beta(z) = \forall y\,(\gamma_f(z,y) \to \phi(y))$

Then we have in $$\operatorname{Th}\mathfrak{A}$$

$\beta(\mathfrak{g}(\theta)) \leftrightarrow \forall y\,(y = \mathfrak{g}(\theta(\mathfrak{g}(\theta))) \to \phi(y))$

which is just

$\beta(\mathfrak{g}(\theta)) \leftrightarrow \phi(\mathfrak{g}(\theta(\mathfrak{g}(\theta))))$

Since $$\theta$$ can be any $$\mathcal{L}_\text{A}$$-formula, let $$\theta = \beta$$. Then we have in $$\operatorname{Th}\mathfrak{A}$$

$\beta(\mathfrak{g}(\beta)) \leftrightarrow \phi(\mathfrak{g}(\beta(\mathfrak{g}(\beta))))$

Let $$\psi = \beta(\mathfrak{g}(\beta))$$. Thus we have in $$\operatorname{Th}\mathfrak{A}$$

$\psi \leftrightarrow \phi(\mathfrak{g}(\psi))$

Therefore we obtain $$\psi$$, which is the required sentence (“fixed point”).
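This construction is exactly how a quine works: $$\beta$$ applies $$\phi$$ to the result of the diagonal function $$f$$, and the fixed point $$\psi = \beta(\mathfrak{g}(\beta))$$ then speaks about its own code. A hedged sketch in Python, treating strings as their own Gödel codes and a one-slot format string as a formula with one free variable (all names here are my own):

```python
# f: code of theta |-> code of theta(<code of theta>): the diagonal
# function from the proof, with strings serving as their own Goedel codes.
def diagonalize(theta: str) -> str:
    return theta.format(code=repr(theta))

# beta feeds the diagonalization of its argument to a "formula" phi;
# here phi is simply "output your argument", so the fixed point
# psi = beta(<code of beta>) reproduces its own source when run.
beta = "s = {code}; print(s.format(code=repr(s)))"
psi = diagonalize(beta)

out = []
exec(psi, {"print": out.append})   # capture what psi "says"
print(out[0] == psi)               # True: psi talks about itself
```

Swapping in a different $$\phi$$ (any predicate on strings) yields a program asserting $$\phi$$ of its own source, which is the full strength of the lemma.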

Now we are all set to prove the undefinability theorem:

Proof. By reductio ad absurdum: Suppose that there is an $$\mathcal{L}$$-formula $$\text{True}(n)$$ such that for every $$\mathcal{L}$$-formula $$\varphi$$, $$\text{True}(\mathfrak{g}(\varphi)) \leftrightarrow \varphi$$. In particular, if $$\sigma$$ is a sentence, then $$\text{True}(\mathfrak{g}(\sigma))$$ holds in $$\mathfrak{A}$$ if and only if $$\sigma$$ is true in $$\mathfrak{A}$$. Thus for all $$\sigma$$, $$\text{True}(\mathfrak{g}(\sigma)) \leftrightarrow \sigma$$ holds in $$\operatorname{Th}\mathfrak{A}$$. However, by the diagonal lemma (taking $$\phi$$ to be $$\neg\text{True}$$), there is also a sentence $$\psi$$ such that $$\psi \leftrightarrow \neg \text{True}(\mathfrak{g}(\psi))$$ in $$\operatorname{Th}\mathfrak{A}$$. That is a contradiction. Therefore, such a formula $$\text{True}(n)$$ does not exist.

Tarski’s undefinability theorem, along with the auxiliary diagonal lemma, established some really fundamental limitative results in formal logic, and more. The diagonal lemma takes its name from Cantor’s ingenious diagonal argument, which reveals the well-known fact that the set $$\mathbb{R}$$ of all real numbers is uncountable. The same trick of diagonalization has been exploited in multiple contextually different but essentially interwoven arguments, e.g., Gödel’s incompleteness theorems and the undecidability of the halting problem. The category-theoretic generalization of this lemma is known as Lawvere’s fixed-point theorem, applicable in any cartesian closed category.

Let’s consider an informal but intuitive connection between the undefinability theorem and the (renowned) incompleteness theorem. Given the fact that the universal truth-telling formula $$\text{True}(n)$$ is undefinable in a formal language, one may wonder, “If some statement is true, then by the completeness theorem of first-order logic it can always be proven true. But if truth is not even definable at all, how do I know if any statement could as well be proven false?”

Here’s the thing: we don’t. If we did, we would be able to tell truth from falsehood completely and provably. That is to say, for every formula $$\varphi$$ in our language, there would be either a proof ($$\vdash \varphi$$) or a disproof ($$\vdash \varphi \to \bot$$, i.e., $$\vdash \neg \varphi$$); the class of all such proofs would make a perfect $$\text{True}(n)$$, where $$\text{True}(\mathfrak{g}(\varphi))$$ holds if and only if $$\varphi$$ has a proof! That would contradict the undefinability theorem. So necessarily, we must take for granted that there could be some falsehood in a logical system (one powerful enough to represent all computable functions via Gödel numbering, e.g., a number theory of elementary arithmetic) whose existence we can never disprove (because otherwise we could have defined that $$\text{True}(n)$$). Plug this metastatement into Löb’s theorem (let the modal operator $$\square$$ denote the modality of provability, and let the constant $$\bot$$ denote falsehood): \begin{align*} & \square (\square \bot \to \bot) \to \square \bot \\ \iff & \square (\neg \square \bot) \to \square \bot &\qquad\text{(Definition of negation)} \\ \iff & \square (\neg \square \bot) \to \bot &\qquad\text{(Axiom T and transitivity)}\\ \iff & \neg \square (\neg \square \bot) &\qquad\text{(Definition of negation)}\\ \end{align*} which informally says, “It is not provable that falsehood is not provable.” This is just another way of claiming that $$\mathfrak{N} \not\vdash \operatorname{Cons}\mathfrak{N}$$: the consistency of a logical system cannot be proved from within the system itself, a.k.a. Gödel’s second incompleteness theorem.

Digressing a little from our main topic, formal logic, let’s consider an application of the undefinability theorem to programming language theory as a dessert. It is well understood that, by the Curry-Howard correspondence, a logical formula corresponds to a type, and the type checker of a programming language guarantees that every well-typed program consists only of types that correspond to valid formulas. If you write some sophisticated type and its type checking succeeds, the underlying formula must be valid, as you have just proved it (so-called “programs are proofs”). Now, I challenge you: given a hypothetical, reasonably expressive programming language with full dependent types, can you write a type-checked self-interpreter in it? Once you succeeded, you would have proved that you can tell valid formulas (types) from false ones in all cases (by writing a self-interpreter that accepts only well-typed programs), thus effectively defining truth in the language itself. By the undefinability theorem, that is impossible (otherwise your underlying logical system could not be consistent!). A slightly different but related result is that a total programming language cannot have a self-interpreter; since a total language is not necessarily equipped with a type system or similar proof-theoretic constructions, the undefinability theorem may not directly apply, but the computability-theoretic argument for this is analogously diagonal.2

1. One may intuitively notice that whether $$\text{True}(\sigma)$$ is true or not has nothing to do with the existence of God or whatever other claims of real interest. Indeed, instead of restricting our language as we do here, we could as well devise a restrictive set of inference rules for an alternative logic (e.g., relevance logic) so that explosions can be expelled.

2. I am aware of the “self-interpreter” for System $$\text{F}_\omega$$ proposed in a POPL’16 paper by Brown & Palsberg. According to a footnote in that paper, and also from my unrefined understanding, the interpreter does not come with a separate syntax for types and thus fails to capture the underlying logical consistency; therefore neither Löb’s theorem nor the undefinability theorem may apply in that case.

Go Away, tracker-store
Mort Yao, 2017-11-03 (updated 2017-11-04)

Long story short, after an Arch update the annoying process called tracker-store started to hog my CPU and disk space (again). For some really peculiar reason, GNOME developers decided that everyone should want to use their awesome Tracker smarty-ware to index everything in pants, and that you should not get to turn it off so easily…

Well, I had had enough of this due to concerns of performance and privacy, and I surely uninstalled the tracker package a long time ago; but now it seems GNOME’s canonical file manager nautilus has started to depend on tracker (one of its old dependencies, libtracker-sparql, has been merged into the tracker package since version 2.0 in Arch’s downstream repository). That explains why it’s back, uninvited.

A few quirks noted (in Tracker 2.0.x):

1. The GUI configuration program tracker-preferences has been summarily removed (d4a8d6e), while tracker provides no command-line alternative for intuitive gsettings configuration yet. (Then why would they remove this very helpful GUI?)
2. The old trick of appending Hidden=true to an overriding autostart file such as ~/.config/autostart/tracker-store.desktop no longer works in 2.0, because…
3. tracker now ships with a systemd user service (/usr/lib/systemd/user/tracker-store.service), since they obviously think a desktop-level autostart is not enough. While Arch Linux wisely ships a /usr/lib/systemd/system-preset/99-default.preset containing “disable *”, which disables all new units by default, there is no equivalent user-preset file doing the same, which means that tracker-store.service, as a user service, is enabled by default and still gets to run on every boot.

Now that uninstalling tracker is not an option, one has to take both measures to block tracker-store from running:

1. Copy the autostart file and override it with a user-specific one:

$ cp /etc/xdg/autostart/tracker-store.desktop ~/.config/autostart/tracker-store.desktop

and append to it:

Hidden=true

(The same step also applies to tracker’s friends, such as tracker-miner-fs, etc.)

2. Mask (the strongest yet nondestructive way to “disable” a static systemd service completely) all tracker-related services:

$ systemctl --user mask tracker-store
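The first step above can be scripted. A minimal sketch (the helper name hide_autostart is mine; it assumes the standard XDG autostart layout, and the demo runs against a throwaway fixture so nothing in your real config is touched):

```shell
# Copy a .desktop entry into an autostart dir and append Hidden=true,
# which tells the session manager not to launch it.
hide_autostart() {
    src=$1; destdir=$2
    mkdir -p "$destdir"
    dest="$destdir/$(basename "$src")"
    cp "$src" "$dest"
    printf 'Hidden=true\n' >> "$dest"
}

# Demo against a throwaway fixture instead of the real /etc/xdg/autostart:
tmp=$(mktemp -d)
printf '[Desktop Entry]\nName=Tracker Store\n' > "$tmp/tracker-store.desktop"
hide_autostart "$tmp/tracker-store.desktop" "$tmp/autostart"
grep -c '^Hidden=true$' "$tmp/autostart/tracker-store.desktop"   # 1

# For the real thing, loop over /etc/xdg/autostart/tracker-*.desktop with
# destdir="$HOME/.config/autostart", then mask the user services:
#   systemctl --user mask tracker-store
```

The masking step is left as a comment since it modifies the live session.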

Look ma, no more hogs. Back to work!

## References

 Ask Ubuntu, “tracker-store and tracker-miner-fs eating up my CPU on every startup”. https://askubuntu.com/questions/346211/tracker-store-and-tracker-miner-fs-eating-up-my-cpu-on-every-startup

 “Disabling GNOME Tracker and Other Info”. https://gist.github.com/vancluever/d34b41eb77e6d077887c

 ArchWiki, “systemd”. https://wiki.archlinux.org/index.php/Systemd

 FreeDesktop.org, “systemctl(1)”. https://www.freedesktop.org/software/systemd/man/systemctl.html

The Semantic Truth
Mort Yao, 2017-10-07 (updated 2017-10-17)

“This is clear, in the first place, if we define what the true and the false are. To say of what is that it is not, or of what is not that it is, is false, while to say of what is that it is, and of what is not that it is not, is true.”

– Aristotle, Metaphysics

On truth-bearers. A truth-bearer is something that can be said to have a truth-value: a statement, a sentence, a proposition, etc. We will use the three aforementioned notions interchangeably, but avoid using subjective notions such as assertion, belief, intuition and judgment as truth-bearers. (These words, however, can still occur with their originally intended meanings.)

In Mst. #7 we defined that (in a formal language) a sentence is just a well-formed formula with no free variable. Furthermore, in Mst. #8 we defined a statement, naïvely, as a sentence that says something is true. But what makes a sentence true, or oppositely, not true?

Aristotle’s definition of truth. Aristotle’s original truth definition in Metaphysics (360s B.C.) can be denoted using the modern device of Quine’s quasi-quotation, as follows:

⌜$$\psi$$⌝ is true if and only if $$\psi$$.
⌜$$\psi$$⌝ is false if and only if it is not the case that $$\psi$$.

Note that “⌜$$\psi$$⌝” is a mention of some statement while “$$\psi$$” is the use of that statement in the metalanguage, so the above definition does make sense (instead of being some “be whatever you be” nonsense, as it might seem to an untrained eye).

An implication of this definition is that “false” is the opposite of “true”. The bivalence of truth and falsity is an essential property of classical logic. Thus, we have the following tautology: $\psi \lor \lnot\psi \qquad\text{(Law of Excluded Middle)}$ that is, under any interpretation, whether $$\psi$$ is true or false, the above proposition is always true (it is said to be valid). This fact may be justified via a truth table.
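For a one-variable formula the truth table has just two rows, so the check is one line of code; a sketch:

```python
# psi or not psi is true under every interpretation (every row of the
# truth table); contrast psi and not psi, true under none.
rows = (True, False)
print(all((psi or not psi) for psi in rows))   # True  (valid)
print(any((psi and not psi) for psi in rows))  # False (unsatisfiable)
```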

Theories of truth. From a metaphysical perspective, several theories of truth have been proposed and well argued:

• Correspondence theory: The truth of a statement is determined solely by how it relates to the world and whether it describes a fact about that world.
• For a statement to be true, there must be an isomorphism between it and the state of the real world that makes it true. Hence, the correspondence view is mainly accepted by realists/Platonists but rejected by most idealists.
• Coherence theory: The truth of a statement is determined by how consistent it is with our existing, coherent set of statements or beliefs.
• Pragmatic theory: The truth of a statement is determined by how well it enables us to achieve our goals.
• Deflationary theory (alternatively under the labels of redundancy theory, disquotational theory, prosentential theory or minimalist theory): The assertion of truth of a statement is nothing but an assertion of the statement itself; it does not attribute a property called “truth” to such a statement.
• The most widely accepted definitions of truth in modern treatment of formal logic (by Frege, F. P. Ramsey, Quine, etc.), are deflationary in essence.


Ramsey’s redundancy theory of truth (a deflationary theory):

• Frank P. Ramsey, “Facts and Propositions.” [PDF]

Now we consider a (semi-)formal language where we are able to state truth self-referentially.

Antinomies.

This sentence is false.

“Yields falsehood when preceded by its quotation”
yields falsehood when preceded by its quotation.

If we take the truth (or falsity) of a statement to be bivalent, then the truth-values of the above two sentences could not be consistently determined. Such antinomies motivated Alfred Tarski’s proposal of the semantic theory of truth, which eliminates such potentially paradoxical use in a formal language.

Semantic theory of truth. (Tarski, 1933) Given an object language $$\mathcal{L}_0$$ and a metalanguage $$\mathcal{L}_1$$,

⌜$$\psi_0$$⌝ is true if and only if $$\psi_1$$. (Convention T)

where “$$\psi_0$$” is a sentence in $$\mathcal{L}_0$$ and “$$\psi_1$$” is a sentence in $$\mathcal{L}_1$$. It is demanded that $$\mathcal{L}_0$$ be contained in $$\mathcal{L}_1$$. Moreover, the word “true” (or “false”) does not occur in $$\psi_0$$. Thus, no sentence in the object language $$\mathcal{L}_0$$ can assert the truth/falsity of itself; rather, its truth-value must be asserted by a higher-level metalanguage $$\mathcal{L}_1$$. It is then intuitive to imagine a semantic hierarchy of formal languages that rules out any use of self-reference: $\mathcal{L}_0 \sqsubset \mathcal{L}_1 \sqsubset \mathcal{L}_2 \sqsubset \cdots$ that is, the truth of a sentence in an object language $$\mathcal{L}_i$$ can only be asserted in $$\mathcal{L}_{i+1}$$ as the metalanguage. Since there is no final stage in this hierarchy, the “most-meta” truths may never be asserted.

T-schema. For a formal language containing a given set of logical connective symbols, we give an inductive definition of truth in the following form:

1. ⌜$$\psi$$⌝ is true if and only if $$\psi$$.
2. (Negation) ⌜$$\lnot\psi$$⌝ is true if and only if $$\psi$$ is not true.
3. (Conditional) ⌜$$\psi \to \theta$$⌝ is true if and only if $$\psi$$ is not true or $$\theta$$ is true, or both.
4. (Conjunction) ⌜$$\psi \land \theta$$⌝ is true if and only if $$\psi$$ and $$\theta$$.
5. (Disjunction) ⌜$$\psi \lor \theta$$⌝ is true if and only if $$\psi$$ or $$\theta$$, or both.
6. (Universality) ⌜$$\forall x \psi(x)$$⌝ is true if and only if every object $$x$$ satisfies $$\psi(x)$$.
7. (Existence) ⌜$$\exists x \psi(x)$$⌝ is true if and only if there is an object $$x$$ that satisfies $$\psi(x)$$.

(Note that the subscripts distinguishing the object language / metalanguage are implicit.)

Tarski’s semantic theory of truth is not only limited to the application in philosophical logic though: A similar definition is also used in model theory, where we define the satisfaction of a formula $$\varphi$$ with respect to a structure $$\mathfrak{A}$$ and an assignment $$s$$ inductively.
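The inductive clauses of the T-schema translate directly into a recursive satisfaction checker. A hedged sketch over a finite domain (the tagged-tuple formula encoding and tag names are my own; quantifiers simply loop over the domain):

```python
# Satisfaction of a formula in a structure with a finite domain, clause
# by clause along the T-schema; formulas are tagged tuples of my design.
def true_in(phi, domain, env=None):
    env = env or {}
    tag = phi[0]
    if tag == 'pred':                # atomic: a predicate applied to a variable
        return phi[1](env[phi[2]])
    if tag == 'not':                 # clause 2 (Negation)
        return not true_in(phi[1], domain, env)
    if tag == 'imp':                 # clause 3 (Conditional)
        return not true_in(phi[1], domain, env) or true_in(phi[2], domain, env)
    if tag == 'and':                 # clause 4 (Conjunction)
        return true_in(phi[1], domain, env) and true_in(phi[2], domain, env)
    if tag == 'or':                  # clause 5 (Disjunction)
        return true_in(phi[1], domain, env) or true_in(phi[2], domain, env)
    if tag == 'all':                 # clause 6: every object satisfies psi
        return all(true_in(phi[2], domain, {**env, phi[1]: d}) for d in domain)
    if tag == 'ex':                  # clause 7: some object satisfies psi
        return any(true_in(phi[2], domain, {**env, phi[1]: d}) for d in domain)
    raise ValueError(tag)

even = lambda n: n % 2 == 0
print(true_in(('ex', 'x', ('pred', even, 'x')), range(5)))   # True
print(true_in(('all', 'x', ('pred', even, 'x')), range(5)))  # False
```

Note how the truth of each connective is defined by using that very connective in the metalanguage (here, Python), which is precisely the disquotational flavor of the T-schema.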

Remark 9.1. (Semantic theory of truth and mathematical logic) The mathematical counterpart of Tarski’s semantic theory of truth yields the undefinability theorem (Tarski, 1936), which briefly states that arithmetical truth cannot be defined in arithmetic itself (it is worth noting that the proof is a non-trivial one which requires Gödel numbering on formulas in a formal language). The semantic undefinability also has a strong correspondence with the incompleteness theorems (Gödel, 1931), which will be covered in future notes hopefully.


Truth and objectivity. Is truth objective? Yes. Given the object language $$\mathcal{L}_0$$, consider a sentence $$\varphi[s]$$ which is true under the model $$\mathfrak{A}$$: $\models_\mathfrak{A} \varphi[s]$ It is true because we can argue in our metalanguage $$\mathcal{L}_1$$ that it is decidedly satisfied; thus it must be a truth in the object language. For example, we can argue extensionally that $\models_\mathfrak{A} \text{Morning Star} = \text{Evening Star}$ In my opinion, there can be no such thing as a “subjective truth”, since the truth of a sentence must be justified in a higher-level metalanguage (due to the semantic theory of truth). Per the object-metalanguage distinction, truth is objective by nature. Once we claim that “something holds” without appealing to a metalanguage for justification, it is merely a belief rather than a truth. Beliefs can, of course, be illogical and ungrounded.

Logical validity of arguments. A true sentence may not be true in every possible structure. Consider: $\models_\mathfrak{B} \text{Morning Star} = \text{Evening Star}$ In an alternative universe $$\mathfrak{B}$$ where Venus fails to be both the morning star and the evening star, such a statement would simply be false. However, a logical argument must not depend on a specific structure or interpretation of its parameters; that is, when we argue logically that something is true, the argument itself must hold valid. Validity is a purely logical notion here. For example, consider adding the following axiom to our deductive system: $t = t' \to t \cong t'$

One can easily verify that this axiom is valid (i.e., always true): If $$t = t'$$ is false, then $$t = t' \to t \cong t'$$ is vacuously true; if $$t = t'$$ is true, then we have $$t \cong t'$$ so $$t = t' \to t \cong t'$$ is also true.

Specifically, we have

$$\text{Morning Star} = \text{Evening Star} \to \text{Morning Star} \cong \text{Evening Star}$$
(If the morning star is the evening star, then it is isomorphic to the evening star.)

By modus ponens, the following deduction applies:

$$\Gamma; \text{Morning Star} = \text{Evening Star} \quad\vdash\quad \text{Morning Star} \cong \text{Evening Star}$$
(Assume that the morning star is the evening star, then it is isomorphic to the evening star.)

This is a valid argument, since it follows solely from our set of axioms and rules of inference, but relies on no particular structures or interpretations. Such a valid argument justifies the logical fact that the truth of its premises guarantees the truth of its conclusion; this, however, does not imply anything about the truth of premises (that would depend on the actual interpretation).

Soundness of arguments. To put it shortly, a sound argument is just a valid argument with true premises.

$$\frac{\text{Morning Star} = \text{Evening Star} \qquad \text{Morning Star} = \text{Evening Star} \to \text{Morning Star} \cong \text{Evening Star}}{\text{Morning Star} \cong \text{Evening Star}}$$
(Given that the morning star is the evening star, it is isomorphic to the evening star.)

Given true premises, the valid argument proves its conclusion: $$\text{Morning Star} \cong \text{Evening Star}$$, and we know that it is guaranteed to be true. Such a truth is also called a logical truth, since it follows from a logical consequence of a sound argument in our deductive system. Notice that the above is just an instance of the soundness theorem, loosely stated as: Every provable sentence is true.

Two approaches to logical consequence.

1. Model-theoretic (or “realist”) approach. The validity of an argument is justified by the absence of counterexamples. That is, if we can ever find a structure $$\mathfrak{A}$$ and an assignment function $$s : V \to |\mathfrak{A}|$$ such that $$\mathfrak{A}$$ satisfies every formula in the hypotheses $$\Gamma$$ with $$s$$, but $$\mathfrak{A}$$ fails to satisfy $$\varphi$$ with $$s$$, we will conclude that $$\Gamma \not\models \varphi$$. Otherwise, we will have $$\Gamma \models \varphi$$, where $$\varphi$$ is a logical truth.
2. Proof-theoretic (or “formalist”) approach. The validity of an argument is justified by a deduction. That is, from our set of hypotheses and axioms $$\Gamma \cup \Lambda$$ we obtain $$\varphi$$ by applying various rules of inference as a finite sequence, then $$\Gamma \vdash \varphi$$, where $$\varphi$$ is a logical truth.

By the soundness and completeness theorems $$\Gamma \models \varphi \Leftrightarrow \Gamma \vdash \varphi$$, the two approaches agree with each other extensionally.


Motivations of alternative logics. In classical first-order logic, a deductive system preserves the semantic property of truth, as justified by the dual theorems of soundness and completeness. We will see that this does not always sufficiently capture the needs of logical reasoning; thus numerous alternative (non-classical) logics have emerged.

• Modality. Instead of saying that something is the case, a modal statement says about what could be or must be the case, e.g., “Venus must be the morning star”, “Venus could be both the morning star and the evening star”. Modal logic supplements classical logic with two modal operators representing the notions of necessity (□) and possibility (◇).
• Temporality and tense. Instead of saying that something is the case, a temporal statement says something to be the case as qualified in terms of time, e.g., “Venus is always the morning star”, “The evening star will eventually be visible”. Temporal logic (or tense logic) supplements classical logic with a set of temporal modal operators.
• Vagueness. While we could simply say that something is true or false, sometimes questions like “how true is that?” arise, e.g., “Venus is a distant planet”. The vagueness of predicates in natural languages leads to the approach of many-valued logic, where we are able to speak of truth-values like “fully true”, “fully false” or “neutral” (three-valued logic), or even any real truth-value in the interval $$[0,1]$$ (infinite-valued fuzzy logic).
• Heterogeneity. In the semantics of classical logic, the universe (domain) is assumed to be a homogeneous collection of objects. Many-sorted logic allows us to divide the universe into disjoint subsets (sorts) so as to treat objects of different sorts differently.
• Quantification into predicates. In first-order logic, one can only quantify over a variable in the universe. Using the notation of set theory: $v \in |\mathfrak{A}|$ where $$v$$ is an urelement (i.e., not a set itself).
In second-order logic, any predicate or function may be quantified over as a “second-order variable” (which essentially represents a set), ranging over the power set of a finite power of the universe: $P \subseteq |\mathfrak{A}|^n, \quad\text{i.e., } P \in \mathcal{P}(|\mathfrak{A}|^n)$ $f \subseteq |\mathfrak{A}|^{n+1}, \quad\text{i.e., } f \in \mathcal{P}(|\mathfrak{A}|^{n+1})$ In an $$n$$-th-order logic, one can quantify over an $$n$$-th-order variable $$\chi$$ in the $$(n-1)$$-th power set of the universe (when $$n=1$$, the domain is just the original universe): $\chi \in \mathcal{P}^{n-1}(|\mathfrak{A}|)$ Therefore, we can say that higher-order logics are just “set theory in disguise” (Quine), since set theory in a first-order language is capable of representing notions like “set”, “set of sets”, etc.
It is worth mentioning here that since the power set operation is definable in second-order logic, second-order logic is expressive enough to simulate any finite-order logic. 
• Empty terms or domains. In classical logic, a structure $$\mathfrak{A}$$ (with a non-empty domain $$|\mathfrak{A}|$$) assigns each term an interpretation. Free logic allows for uninterpreted terms (that do not denote any object); furthermore, inclusive logic allows structures to have an empty domain $$|\mathfrak{A}| = \emptyset$$.
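To make the many-valued approach concrete, here is a minimal Python sketch. The min/max connectives are one common choice (that of Gödel/Zadeh-style fuzzy logic), not the only one, and the names `neg`, `conj`, `disj` are ad hoc:

```python
# Truth-values range over [0, 1]: 1.0 is "fully true", 0.0 is "fully false";
# restricting values to {0.0, 0.5, 1.0} yields a three-valued logic
# with 0.5 as "neutral".
def neg(a: float) -> float:
    return 1.0 - a

def conj(a: float, b: float) -> float:
    return min(a, b)

def disj(a: float, b: float) -> float:
    return max(a, b)

# "Venus is a distant planet" might be taken as only half true:
distant = 0.5
# The law of excluded middle is then no longer fully true:
assert disj(distant, neg(distant)) == 0.5
```

Note that under this choice of connectives, $$\psi \lor \lnot\psi$$ attains the value $$1$$ only when $$\psi$$ is fully true or fully false.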

It is not uncommon to revise one or more axioms in classical first-order logic. The resulting non-classical logics can be useful in a constructive setting. In the following text, we will use syntactical forms like $$\alpha \to \beta$$ to denote both a tautological conditional sentence and a valid deduction rule ($$\{\alpha\} \vdash \beta$$), as justified by the dual theorems of deduction/resolution.

• Justification. In classical logic we accept the principle of bivalence (i.e., anything is either true or false), thus we have the following tautologies: $\psi \lor \lnot\psi \qquad\text{(Law of Excluded Middle)}$ $\lnot\lnot\psi \to \psi \qquad\text{(Double Negation Elimination)}$ $(\lnot \psi \to \bot) \to \psi \qquad\text{(Proof by Contradiction)}$ which are denied by intuitionistic logic. From a model-theoretic view, there can be logical truths that, while not flawed, are not well justified. In a word, intuitionistic logic attempts to capture the semantic property of justification instead of truth. Particularly, in classical logic, assume that we can prove $$\forall x (\lnot\varphi) \to \bot$$; then $$\lnot\lnot \forall x (\lnot\varphi) \to \bot$$, and by PbC we get $$\lnot \forall x (\lnot\varphi)$$, which is classically just $$\exists x \varphi$$ – but we have no evidence yet that a “witness” $$t$$ (which makes $$\varphi^x_t$$ true) really exists. This intuitionistic thinking is extremely helpful in so-called constructive mathematics, where a proof of the existence of a mathematical object requires justification by providing an example.
• Inconsistency. In classical and intuitionistic logics we accept the principle of explosion (ex falso quodlibet), that is, a contradiction $$\psi \land \lnot\psi$$ (often abbreviated as $$\bot$$) would make a theory inconsistent, thus any sentence $$\theta$$ is trivially true: $(\psi \land \lnot\psi) \to \theta \qquad\text{(Ex Falso Quodlibet)}$ Sometimes, however, we are interested in formalizing inconsistent theories in a non-trivial way, in which case trivial truths by explosion must be excluded from our system. Paraconsistent logic is any such logic that rejects EFQ (or its equivalents). Specifically, minimal logic is the paraconsistent revision of intuitionistic logic.

In proof theory, substructural logics are a family of non-classical logics where one (or more) structural rule (e.g., weakening, contraction, commutativity, associativity) is absent.

• Relevance of implication. In classical logic we have the following tautologies (involving material/strict implication): $\psi \to (\theta \to \psi) \qquad\text{(1)}$ $\lnot\psi \to (\psi \to \theta) \qquad\text{(2)}$ $(\psi \to \theta) \lor (\theta \to \phi) \qquad\text{(3)}$ $\psi \to (\theta \to \theta) \qquad\text{(4)}$ $\psi \to (\theta \lor \lnot\theta) \qquad\text{(5)}$ It is easy to see that they all turn out to be counterintuitive in some way. For example, by (1) we are allowed to argue things like “If Mars is a planet, then ‘Venus is a star’ implies that Mars is a planet”; this is clearly absurd, not only because Venus isn’t astronomically a star, but also because whether Venus is a star or not has no direct relevance to Mars being a planet. Yet the whole sentence is valid in classical logic! Relevance logic denies these tautologies, since its implication $$\to$$ requires that the antecedent be somehow relevant to the consequent. Relevance logic is both a substructural logic (as it lacks the structural rule of weakening) and a paraconsistent logic (as it clearly denies EFQ, from which any irrelevant truth is trivially deducible).
• Resource consciousness. With an emphasis on resource boundedness, linear logic leaves out both structural rules of weakening and contraction, as a reflection of the limitation that resources cannot always be duplicated or thrown away arbitrarily.
• Other substructural logics include affine logic, which allows the structural rule of weakening but disallows contraction; ordered logic (non-commutative logic), which disallows the rule of exchange (commutativity) in addition to rejecting weakening and contraction.
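That sentences like (1)–(3) are classically valid – however irrelevant their antecedents – can be confirmed mechanically by a brute-force truth table. A small sketch; the helper names `tautology` and `implies` are ad hoc:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # material implication
    return (not p) or q

def tautology(f, arity: int) -> bool:
    # check f under every classical valuation of its sentence letters
    return all(f(*vs) for vs in product([False, True], repeat=arity))

# (1) ψ → (θ → ψ)
assert tautology(lambda psi, theta: implies(psi, implies(theta, psi)), 2)
# (2) ¬ψ → (ψ → θ)
assert tautology(lambda psi, theta: implies(not psi, implies(psi, theta)), 2)
# (3) (ψ → θ) ∨ (θ → φ)
assert tautology(lambda psi, theta, phi:
                 implies(psi, theta) or implies(theta, phi), 3)
```

All three checks pass in classical two-valued semantics, even though none of these schemes respects any relevance of antecedent to consequent.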

On classical logic:

On modality and modal logic:

On temporality, tense and temporal logic:

On vagueness, many-valued logic and fuzzy logic:

On quantification, second-order and higher-order logic, and their correspondence with set theory:

On free logic:

On intuitionistic logic and constructive mathematics:

On paraconsistent logic:

On substructural logics, relevance logic and linear logic:

## Towards higher-order logic (HOL) and type theory

As discussed in the subsection “The limitation of first-order logic” in Mst. #7, a practical issue with first-order theories is that they provide no way to express the so-called “well-ordering property” (or its equivalents, such as the second-order induction axiom) of sets: Every non-empty set of numbers has a least element. $\forall P (\exists x Px \to \exists x (Px \land \forall y (Py \to (x = y \lor x < y))))$
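Over a finite universe, the second-order quantifier $$\forall P$$ can be expanded by brute force, which lets us test the well-ordering sentence directly. A hedged Python sketch (the names `well_ordered`, `less` and the sample orders are hypothetical; only finiteness makes this expansion possible, since in the standard semantics $$P$$ ranges over all subsets of an arbitrary universe):

```python
from itertools import combinations

def well_ordered(universe, less) -> bool:
    """Every non-empty subset P of the universe has a least element x,
    i.e. an x that equals, or is `less` than, every y in P."""
    subsets = (set(c) for r in range(1, len(universe) + 1)
                      for c in combinations(universe, r))
    return all(any(all(x == y or less(x, y) for y in P) for x in P)
               for P in subsets)

# The usual order on a finite set of numbers is a well-ordering:
assert well_ordered({0, 1, 2, 3}, lambda a, b: a < b)
# A cyclic "order" is not: the subset {0, 1, 2} has no least element.
cyclic = {(0, 1), (1, 2), (2, 0)}
assert not well_ordered({0, 1, 2}, lambda a, b: (a, b) in cyclic)
```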

Recall that in first-order logic, we have notably the following model-theoretic results:

1. (Completeness) $$\Gamma \models \varphi \implies \Gamma \vdash \varphi$$.
2. (Compactness) If every finite subset $$\Gamma_0$$ of $$\Gamma$$ is satisfiable, then $$\Gamma$$ is satisfiable.
3. (Enumerability) For a countable language, the set of valid formulas is computably enumerable.
4. (Löwenheim-Skolem theorem) For a countable language, if a set of sentences has any model, then it has a countable model.

however, they do not generally hold in the standard semantics of second-order (or any higher-order) logic. As a naïve counterexample disproving compactness, consider a second-order sentence asserting that there is a transitive, irreflexive relation $$R$$ in which every element has an $$R$$-successor: $\exists R (\forall x \forall y \forall z (Rxy \land Ryz \to Rxz) \land \forall x (\lnot Rxx \land \exists y Rxy))$ Let $$\lambda_\infty$$ be the above sentence, which is true if and only if the universe is an infinite set. Let $$\lambda_i$$ be the first-order sentence saying that “there are at least $$i$$ urelements in the universe”. Then the infinite set $\Gamma = \{ \lnot \lambda_\infty, \lambda_1, \lambda_2, \lambda_3, \dots \}$ has no model: any model of all the $$\lambda_i$$ must be infinite, contradicting $$\lnot \lambda_\infty$$. However, every finite subset of $$\Gamma$$ is clearly satisfiable.
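For concreteness, each $$\lambda_i$$ can be written with $$i$$ existential quantifiers and pairwise inequalities, e.g., $\lambda_3 = \exists x_1 \exists x_2 \exists x_3 (x_1 \neq x_2 \land x_1 \neq x_3 \land x_2 \neq x_3)$ which is true exactly in those universes containing at least $$3$$ elements.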

Another intuitive way of thinking about this: second-order logic is capable of axiomatizing Peano arithmetic categorically (i.e., there is exactly one model of arithmetic up to isomorphism); but if the compactness theorem held, we would be able to build a non-standard model of arithmetic via ultraproducts (as shown in Mst. #7); therefore the compactness theorem must fail.

In second-order logic, we have two separate “stages” of ranges that one may quantify over: variables and predicates/functions, denoted as set membership: $v_0 \in v_1 \in \mathcal{P}(|\mathfrak{A}|)$

Naturally, in an $$n$$-th-order logic we have $v_0 \in v_1 \in \cdots \in v_{n-1} \in \mathcal{P}^{n-1}(|\mathfrak{A}|)$

Once the order of logic is raised to the limit ordinal $$\omega$$, we reach the level of type theory. Sloppily, in type theory, a type $$\text{T}_i$$ is just a set $$V_i$$, a function $$\text{f} : \text{T}_x \to \text{T}_y$$ is just a relation set $$f$$ such that $$\forall x \forall y_1 \forall y_2 ((\langle x, y_1 \rangle \in f \land \langle x, y_2 \rangle \in f) \to y_1 = y_2)$$, where $$x \in V_x$$ and $$y_1, y_2 \in V_y$$. Hence to accommodate the function type $$\text{f}$$, one needs a power set operation correspondingly in set theory. The $$\omega$$-infinite hierarchy of sets of sets: $V_0 \in V_1 \in \cdots \in V_n \in \cdots$ where $$V_0$$ is the empty set $$\varnothing$$ (a set that has no members), corresponds to the $$\omega$$-infinite hierarchy of types of types: $\text{Type}_0 : \text{Type}_1 : \cdots : \text{Type}_n : \cdots$ where $$\text{Type}_0$$ is the bottom type (a type that has no values).
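The single-valuedness condition above is easy to state computationally. A sketch in Python, treating a function as its set of ordered pairs; the names are ad hoc, and totality on the domain is checked as well, which the displayed condition leaves implicit:

```python
def is_function(rel, dom) -> bool:
    """rel is a set of pairs (x, y); it encodes a function on dom iff it is
    single-valued (no x maps to two distinct y's) and total on dom."""
    single_valued = all(y1 == y2
                        for (x1, y1) in rel
                        for (x2, y2) in rel
                        if x1 == x2)
    total = all(any(x == a for (a, _) in rel) for x in dom)
    return single_valued and total

f = {(0, "a"), (1, "b")}
assert is_function(f, {0, 1})
assert not is_function(f | {(0, "b")}, {0, 1})  # 0 maps to both "a" and "b"
```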

Both set theory and type theory can serve as logical foundations of mathematics, owing to their equivalent expressive power. Some differences are worth noting:

1. Set theory is built on top of classical first-order logic (although there are recent proposals of intuitionistic set theories), with its semantics denoting a concrete model (e.g., ZFC); type theory is essentially an extension of higher-order logic, which is syntactically strong enough on its own.
2. Set theory tends to use a Hilbert-like deductive system, with its specific non-logical axioms (e.g., ZF axioms, Axiom of Choice); type theory tends to use a natural or Gentzen-like deductive system, with its specific logical rules of inference.
3. Set theory needs the Axiom of Regularity to establish its well-foundedness, and numbers must be encoded as well-founded sets; in type theory one may define inductive types directly, which makes handling models like Peano arithmetic easier and more natural.
4. For a set theory to be constructive, one must reject both LEM (or its equivalents) and AC; a type theory is constructive as long as its underlying logic is constructive (i.e., it rejects LEM).
5. Type theory allows many sorts and has a straightforward correspondence with statically typed functional programming, i.e., “propositions as types” and “proofs as programs”. In set theory, functions must be encoded as sets of ordered pairs (subsets of Cartesian products) in a quite verbose way, which makes it unsuitable for computer-based proof mechanization.
6. Both naïve set theory and type theory were under the paradoxical crisis of inconsistency, as revealed by Russell’s paradox and Girard’s paradox respectively. The resolution of these paradoxes has led to multiple axiomatic set theories (e.g., ZF, NBG, NFU) and modern type theories (e.g., Martin-Löf Type Theory, Calculus of Inductive Constructions).
7. As an offshoot of set theory we can talk about large cardinals, whose existence is independent of (unprovable in) ZFC; there is no analog to these in proof-centric, constructive type theory yet.
8. Set theory is not well tailored for modeling today’s category theory, since not all categories are truly sets; in type theory, it is natural to think of types as objects and function types as morphisms, and a mutual interpretation between the semantics of type theory and category theory is intuitively possible.
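The “propositions as types” correspondence of point 5 can be glimpsed even through Python’s optional type annotations: a total function inhabiting a type is a proof of the corresponding proposition. A hedged sketch (Python does not enforce these types at runtime; a checker like mypy would):

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# A proof of (A ∧ B) → A: a pair can always yield its first component.
def proj1(p: Tuple[A, B]) -> A:
    return p[0]

# A proof of A → (B → A): a constant function ignores its second argument.
def const(a: A) -> Callable[[B], A]:
    return lambda b: a

assert proj1((1, "one")) == 1
assert const(42)("ignored") == 42
```

A type with no inhabitant (such as the bottom type mentioned above) corresponds to an unprovable proposition.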

Books:

Aristotle, Metaphysics. (English translation by W. D. Ross, 1925)

W. V. O. Quine, Philosophy of Logic, 2nd ed.

Michael Dummett, Truth and Other Enigmas.

Stewart Shapiro, Foundations Without Foundationalism: A Case for Second-Order Logic.

Articles:

J. Hintikka, “Two Papers on Symbolic Logic: Form and Content in Quantification Theory and Reductions in the Theory of Types,” 1955.

H. B. Enderton, “Second-order and higher-order logic,” in The Stanford Encyclopedia of Philosophy, Fall 2015, E. N. Zalta, Ed. https://plato.stanford.edu/archives/fall2015/entries/logic-higher-order/; Metaphysics Research Lab, Stanford University, 2015.

T. Coquand, “Type theory,” in The Stanford Encyclopedia of Philosophy, Summer 2015, E. N. Zalta, Ed. https://plato.stanford.edu/archives/sum2015/entries/type-theory/; Metaphysics Research Lab, Stanford University, 2015.

B. Werner, “Sets in types, types in sets,” in Proceedings of TACS’97, 1997, pp. 530–546.

]]>
The Referential Use tag:www.soimort.org,2017:/mst/8 2017-09-26T00:00:00+02:00 2017-10-01T00:00:00+02:00 Mort Yao So I was trying to make some sense of the relation between language and logic these days, and here comes my rendering of some fundamental issues in the philosophy of language. This should not be taken as a comprehensive summary on the topic, but hopefully it makes clearer how the translation from a natural language to a formal (e.g., first-order) logic can go unambiguously (or ambiguously). Moreover, each subtopic of notes is followed by a reading list introducing main concepts and debates, which are the “real fruits” here.

Inspired by Wittgenstein’s theory of “Meaning as Use”, it is fair to say that when we use a word, we just mean it. The meaningful use of a word and a mention of it must be distinguished. Quotation, as the canonical device of making the distinction, is central to debates on this topic. Davidson argued that even quoted words can be both a use and a mention. Quine’s invention of quasi-quotations (as an alternative quotational device) is of technical importance, and this notion has found its use in inductive definitions of formal logic, denotational semantics, metaprogramming in Lisp and string interpolations in many practical programming languages.

As first noticed by Frege, the semantics of words embraces two distinct aspects: sense and reference. Referring expressions, such as descriptions and proper names, may both be analyzed using a descriptivist theory (proposed by Russell); the Russellian view of names, however, was criticized by Kripke in favor of a causal, possible-worlds view.

On quotation. In the following text, we will use English quotation marks intensively. Quoted text is a literal mention of whatever it encloses; by contrast, unquoted text is used with whatever its meaning is. For example,

I am who I am.

but

“I” is a word. “I” is not who I am.

(This issue will be discussed further in “Use-mention distinction”.)

Statement. A statement is defined as a sentence that says something is true. The notions of the English words “sentence” and “true” are clear in this informal context: they have their naïve meanings. “Something” is just an object (or objects) that the statement talks about.

Metalanguage and object language. To make a statement about something, we may use either a formal or a natural language. When that “something” is some language (or languages), we refer to it as the object language(s) that we are talking about and the language we are using to make the very statement as the metalanguage. For example,

The German phrase “ein weißes Pferd ist weiß” means “a white horse is white” in English.

Here we are talking about the isomorphic meanings of two sentences in German and in English (as our object languages), and English is the metalanguage that we use to make the statement. It doesn’t matter which metalanguage we use and whether the metalanguage is identical to one of the object languages, as long as it is capable of representing the semantics we want; the following statement (using Esperanto as the metalanguage) has the same meaning as above:

La germana frazo “ein weißes Pferd ist weiß” signifas “a white horse is white” en la angla.

On the application of metalanguages and object languages in mathematical logic, see:

Ambiguity in interpretations. It might be tempting to just use a natural language as our metalanguage, but ambiguity could arise due to the sloppy nature of any human language. Consider the following statement in Chinese:

白馬非馬 (A white horse is not a horse.)

The argument is that the intension of “white horse” consists of the properties of being white and being horse-like, while the intension of “horse” consists merely of the property of being horse-like, hence conceptually “white horse” and “horse” are not identical to each other. However, if we interpret “是 (be)” as “be a kind of” thus “非 (be not)” as “be not a kind of”, as in the usual context, we could immediately tell that it is absurd to say:

(A white horse is not a kind of horses.)

Natural languages can be vague; however, we are almost always in need of a definite interpretation when studying semantics (of whatever language we like). It is most advisable to employ a formal language so that we don’t fall into dialectic traps like this. Consider the formal language of set theory, where we define a relation $$P_H$$ (“being horse-like”) such that $$P_H x$$ if and only if $$x \in P_H$$, and a relation $$P_W$$ (“being white”) such that $$P_W x$$ if and only if $$x \in P_W$$. Then we can write the first meaning of the informal statement sloppily, but unambiguously, as: $\{ x \in P_H : x \in P_W \} \neq P_H$ which is true if and only if there is at least one $$x'$$ such that $$x' \in P_H \land x' \not\in P_W$$, so that $$x' \not\in \{ x \in P_H : x \in P_W \}$$ but $$x' \in P_H$$. This is a plausible reading, since we are easily convinced that “There is a horse that is not white”.
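The set-theoretic reading can be played out on a toy universe in Python; the element names are of course hypothetical:

```python
P_H = {"white horse", "black horse"}  # "being horse-like"
P_W = {"white horse", "snow"}         # "being white"

# First meaning: the set of white horses is not the set of horses...
assert {x for x in P_H if x in P_W} != P_H
# ...exactly because some x' is horse-like but not white:
assert any(x in P_H and x not in P_W for x in P_H | P_W)
```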

Exercise 8.1. Rewrite $$\{ x \in P_H : x \in P_W \} \neq P_H$$ formally (i.e., without using set comprehension), in the first-order language of set theory containing only $$\in$$ as a predicate symbol.

The second meaning of the statement can be unambiguously written as: $\forall x ((x \in P_H \land x \in P_W) \to x \not\in P_H)$ which is true if and only if $$\forall x (\lnot (x \in P_H \land x \in P_W))$$, i.e., “There is no such thing as a white horse”; or even $$\forall x (x \not\in P_H)$$, i.e., “Nothing is a horse”. Both are ungrounded beliefs contradicting our empirical evidence.

Ambiguity in referring expressions. Difficulties in translating from a natural language to a formal language often arise in unintended ways. Consider again, the statement:

A white horse is not a kind of horses.

Does the mentioned “white horse” refer to any arbitrary object that simply holds the properties of being a white horse, or to a very specific white horse that one happens to find to be not a kind of horses? This distinction is significant because in the latter case, we have to formulate the statement using an existential quantification instead of the universal quantification: $\exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)$ which gives us a third meaning of the statement “白馬非馬”.

Exercise 8.2. (1) Can it be the case that the sentence $$\exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)$$ is true? (2) Why don’t we formulate its meaning as $$\exists x ((x \in P_H \land x \in P_W) \to x \not\in P_H)$$?

Descriptions and names. To eliminate any potential ambiguity when referring to something in a natural language, we firstly review three kinds of referring expressions:

1. Demonstratives. Examples: “this”; “this horse”; “that white horse”.
2. Descriptions.
• Indefinite descriptions. Examples: “a white horse”.
• Definite descriptions. Examples: “the white horse”.
3. Proper names (names that uniquely identify their referents). Examples: “Princess Celestia”; “Rainbow Dash”.

A referring expression (singular term) picks out a particular object (or some particular objects) that the sentence talks about. Demonstratives are clearly context-sensitive, so for now we will consider only descriptions and proper names, in order to avoid ambiguity.

For indefinite descriptions, consider the following sentence:

A white horse is white.

Based on our preceding discussion, there are two different ways of formulating it:

(If there is a white horse, then it is white.) $\forall x ((x \in P_H \land x \in P_W) \to x \in P_W)$

and

(There is a white horse which is white.) $\exists x ((x \in P_H \land x \in P_W) \land x \in P_W)$

N.B. The latter (and stronger) formulation was proposed by Bertrand Russell.
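The difference between the two formulations shows up once the domain contains no white horses at all. A quick Python check over a toy universe (hypothetical elements again):

```python
P_H = {"white horse", "black horse"}   # horses
P_W = {"white horse", "snow"}          # white things
universe = P_H | P_W

def universally(white):    # ∀x ((x ∈ P_H ∧ x ∈ white) → x ∈ white)
    return all(not (x in P_H and x in white) or x in white for x in universe)

def existentially(white):  # ∃x ((x ∈ P_H ∧ x ∈ white) ∧ x ∈ white)
    return any((x in P_H and x in white) and x in white for x in universe)

assert universally(P_W) and existentially(P_W)
# Remove all white horses: the universal reading stays (vacuously) true,
# while the Russellian existential reading becomes false.
assert universally({"snow"}) and not existentially({"snow"})
```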

For definite descriptions,

The white horse is white.

How do we understand its semantics? Russell proposed the following formulation:

(There is exactly one white horse which is white.) $\exists! x ((x \in P_H \land x \in P_W) \land x \in P_W)$

Exercise 8.3. Rewrite the above sentence using the existential quantifier.

Russell’s theory of descriptions can lead to justified denials of some obviously ridiculous statements. For example, the sentence

The invisible pink unicorn is white.

is puzzling, since we have no idea what an “invisible pink unicorn” would be like. Most of us would agree that there is no such thing as an invisible pink unicorn. Can the above statement be true then? By Russell’s formulation of definite descriptions, we say that this is equivalent to

There is exactly one invisible pink unicorn which is white.

Then we can immediately assert that it is not true, given our belief that no invisible pink unicorn really exists.

As an alternative view of the theory of descriptions, Gottlob Frege suggested that a sentence like the one above makes use of a definite description that fails to refer (as “invisible pink unicorn” does not refer to anything); thus it does not express a proposition and has no truth value.

For proper names, we could say things like

Princess Celestia is white.

It has been disputed, however, whether a descriptivist theory can be applied to proper names. Russell argued that most names have some descriptive meanings, e.g., “Princess Celestia” can be descriptively understood as “The benevolent ruler of Equestria”. Modern causalists such as Saul Kripke criticized the descriptivist theory: a name need not be a uniquely identifying description in order to refer to an object; rather, there must be a causal chain that makes the name a reference. For example, we know that the name “Princess Celestia” and the description “The benevolent ruler of Equestria” refer to the same referent because she defeated King Sombra and took over Equestria, or more realistically, because the story-writer let her be so. However, in an alternate world where King Sombra still rules Equestria, the name “Princess Celestia” would not have the same referent as “The benevolent ruler of Equestria” (in fact the latter term would refer to nothing), but “Princess Celestia” would still be Princess Celestia; that is, a proper name is a rigid designator of objects, while a definite description may refer to very different things due to varying causality in different worlds. Kripke’s causal theory of reference suggests that a name refers uniquely to the same object in every possible world, regardless of any particular facts about it; by contrast, the descriptivist view of a name may fail to capture its referent consistently across all worlds.

Sense and reference. As per the discussion on descriptions and names, it is essential to make a distinction between a sense and a reference (this distinction is attributed to Frege’s 1892 work “Über Sinn und Bedeutung” (On Sense and Reference)). A reference denotes a particular object in the world that the term applies to, while a sense denotes the way in which the term presents it, regardless of what the actual referent is (or whether there is a referent). For example, the description “The invisible pink unicorn” clearly makes sense, but it has no reference in any possible physical world. As a second example, the proper name “Princess Celestia” and the description “The benevolent ruler of Equestria” have different senses (i.e., “Princess Celestia” means Princess Celestia, “The benevolent ruler of Equestria” means the benevolent ruler of Equestria, and these two meanings are never the same); however, they do refer to the same thing in this world.

It is worth noting that senses may not be injectively mapped to words or phrases, as a word can present different senses in different contexts. For example, the word “structure” has a meaning in mathematical logic, and it also has a meaning in abstract algebra. Although their senses are similar in essence, they apply to different domains of mathematics and should not be taken as identical. They can, nonetheless, have one common reference: when we talk about a formal language $$\mathcal{A}$$ of algebra and study the theory of a particular algebraic structure (e.g., an abelian group $$(A, \cdot)$$), that algebraic structure is also a structure $$\mathfrak{A}$$ that we assign to our language $$\mathcal{A}$$.

On a first introduction to descriptions, proper names, Fregean-Russellian descriptivist theory and Kripke’s causal theory, see Wikipedia:

Further reading on descriptions, names and the issue of reference:

On Frege’s innovative distinction between sense and reference:

• Gottlob Frege, “On Sense and Reference.” [PDF] (English translation of “Über Sinn und Bedeutung”)
• Michael Dummett, “Frege’s Distinction Between Sense and Reference.”

Russell’s original essay on his theory of descriptions:

Kripke’s objections to descriptivist theories of proper names:

• Saul Kripke, “Naming and Necessity.” [PDF]

Use-mention distinction. When trying to understand the semantics of a natural language, it is also critical to make the distinction between using a word (or phrase) and mentioning it. Consider again, the following sentence:

The white horse is white.

This sentence is true because a white horse is indeed, white (in the conventional interpretation). Furthermore,

“The white horse” is a definite description.

This sentence is also true because the phrase “the white horse” is well defined with a meaning that refers to the collection of horses that are white. However, it does not make much sense to say:

The white horse is a definite description.

Since a white horse is just a white horse, not a word or phrase with meanings. And we would also agree that

“The white horse” is white.

is confusing in some way, because the quotation marks suggest that “the white horse” may not have the intended meaning, i.e., it may appear as a literal mention of the English phrase “the white horse” instead.

It is worth noting that though quotation marks are often an indication of mention while unquoted text is simply used, as in the above example, it is not always the case. Donald Davidson argued that “mixed quotation” exists so that quoted words can be simultaneously used and mentioned. 

Quasi-quotation. While quotation marks can be used for denoting a mention, a practical issue is that when mentioning something (e.g., in a definition), we sometimes need a part inside the mention to be used – that is, interpolated with its reference. Recall that when defining a first-order language in Mst. #7, we made claims like

If $$\psi$$ is a well-formed formula, $$(\lnot\psi)$$ is a well-formed formula.

Note that the mention is implicit here, without any extra meta-symbols. Can we just make the definition using quotation marks?

If $$\psi$$ is a well-formed formula, “$$(\lnot\psi)$$” is a well-formed formula.

It is immediately clear that the above definition does not fit our intended purpose: Let the open term (in our metalanguage) $$\psi$$ be the formula (in the object language) “$$(\alpha \to \beta)$$”, then what we are saying basically is

If “$$(\alpha \to \beta)$$” is a well-formed formula, “$$(\lnot\psi)$$” is a well-formed formula.

This sentence does not really make an inductive definition as intended. The problem is that the symbol “$$\psi$$” in “‘$$(\lnot\psi)$$’” is not interpolated with its use (i.e., “$$(\alpha \to \beta)$$”), but only appears as a literal Greek letter in the mention. To resolve this issue, W. V. O. Quine introduced the meta-symbols called quasi-quotation, so that it is technically feasible to make inductive definitions like this:

If $$\psi$$ is a well-formed formula, ⌜$$(\lnot\psi)$$⌝ is a well-formed formula.

Then let $$\psi$$ be “$$(\alpha \to \beta)$$”, ⌜$$(\lnot\psi)$$⌝ becomes “$$(\lnot (\alpha \to \beta))$$”. We have that

If “$$(\alpha \to \beta)$$” is a well-formed formula, “$$(\lnot (\alpha \to \beta))$$” is a well-formed formula.

just as intended.
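As noted earlier, quasi-quotation is essentially what string interpolation does in programming languages. A minimal Python sketch of the contrast:

```python
psi = "(α → β)"            # the object-language formula that ψ is used for

plain_quote = "(¬ψ)"       # plain quotation: "ψ" stays a literal Greek letter
quasi_quote = f"(¬{psi})"  # quasi-quotation: ψ is interpolated with its use

assert plain_quote == "(¬ψ)"
assert quasi_quote == "(¬(α → β))"
```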

On the importance of use-mention distinction:

On quotation:

• W. V. O. Quine, Mathematical Logic, chap. 1.5. (Statements about Statements)
• Herman Cappelen and Ernest Lepore, “Quotation,” Stanford Encyclopedia of Philosophy, https://plato.stanford.edu/entries/quotation/.
• Donald Davidson, “Quotation.”

On the practical device of quasi-quotation:

## What’s coming next

The use-mention distinction and the auxiliary device of quasi-quotation are philosophically essential not only for the definition of formulas/sentences in formal languages, but also for Alfred Tarski’s semantic theory of truth and models. Deeper debates on the essence of truth often involve the correspondence theory, the deflationary theory and the coherence theory. It is also informative to know how the liar’s paradox and Quine’s paradox can arise from direct or indirect self-referential statements, and what Tarski’s undefinability theorem implies. Michael Dummett’s characterization of the principle of bivalence (in classical logic) as realism, together with the defence of intuitionism (an anti-realism), is also highly relevant for understanding the philosophical basis of modern logic.

## Solutions to exercises

Exercise 8.1. $$\{ x \in P_H : x \in P_W \} \neq P_H$$ is just $\lnot \forall x ((x \in P_H \land x \in P_W) \leftrightarrow x \in P_H)$ which, eliminating $$\land$$ and $$\leftrightarrow$$ in favor of $$\lnot$$ and $$\to$$, is just $\lnot \forall x (\lnot (((\lnot (x \in P_H \to (\lnot x \in P_W))) \to x \in P_H) \to (\lnot (x \in P_H \to (\lnot (x \in P_H \to (\lnot x \in P_W)))))))$

Exercise 8.2. (1) Notice that $$\lnot ((x \in P_H \land x \in P_W) \land x \not\in P_H)$$ is a tautology, by generalization we get that $$\forall x (\lnot ((x \in P_H \land x \in P_W) \land x \not\in P_H))$$. By double negation $\lnot (\lnot \forall x (\lnot ((x \in P_H \land x \in P_W) \land x \not\in P_H)))$ which is just $\lnot \exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)$ so $$\exists x ((x \in P_H \land x \in P_W) \land x \not\in P_H)$$ cannot be satisfied by any interpretation. (or: it is never true.)

(2) $$\exists x ((x \in P_H \land x \in P_W) \to x \not\in P_H)$$ is equivalent to $\lnot \forall x (\lnot ((x \in P_H \land x \in P_W) \to x \not\in P_H))$ Assume that there is an $$x_c$$ such that $$x_c \not\in P_H$$, then $$(x_c \in P_H \land x_c \in P_W) \to x_c \not\in P_H$$ would be true.

Notice that given $$\Gamma; \forall x (\lnot ((x \in P_H \land x \in P_W) \to x \not\in P_H))$$, by substituting $$x$$ with $$x_c$$ we get $$\lnot ((x_c \in P_H \land x_c \in P_W) \to x_c \not\in P_H)$$, but that would be inconsistent with our previous assumption. Thus by Reductio ad Absurdum, $$\Gamma \vdash \lnot \forall x (\lnot ((x \in P_H \land x \in P_W) \to x \not\in P_H))$$.

And we are easily convinced that our assumption holds, that is, there exists something that is not a horse. So this formulation would be trivially true!

Exercise 8.3. $\exists x (((x \in P_H \land x \in P_W) \land \forall y ((y \in P_H \land y \in P_W) \to y = x)) \land x \in P_W)$

## References

P. Ludlow, “Descriptions,” in The Stanford Encyclopedia of Philosophy, Fall 2013, E. N. Zalta, Ed. https://plato.stanford.edu/archives/fall2013/entries/descriptions/; Metaphysics Research Lab, Stanford University, 2013.

P. F. Strawson, “On referring,” Mind, vol. 59, no. 235, pp. 320–344, 1950.

S. A. Kripke, “Naming and necessity,” in Semantics of Natural Language, Springer, 1972, pp. 253–355.

D. Davidson, “Quotation,” Theory and Decision, vol. 11, no. 1, pp. 27–40, Mar. 1979 [Online]. Available: https://doi.org/10.1007/BF00126690

W. V. Quine, Mathematical Logic. Cambridge: Harvard University Press, 1940.

]]>
The Logical Implication tag:www.soimort.org,2017:/mst/7 2017-09-06T00:00:00+02:00 2017-09-09T00:00:00+02:00 Mort Yao Prologue. This is the summary of a set of notes I took in KU’s Introduction to Mathematical Logic course, that I should have finished months ago but somehow procrastinated until recently. I basically followed Enderton’s A Mathematical Introduction to Logic book (with all its conventions of mathematical notations). A few issues to address in advance:

• These notes are mainly about first-order logic. Propositional logic is assumed to be a prerequisite. The important thing to know is that its semantics (as a Boolean algebra) can be fully described by a truth table, which is obviously finite and mechanizable.
• Naïve set theory is also a prerequisite, that is, one must accept the notion of sets unconditionally: A set is just a collection of objects, and it must exist as we describe it. We are convinced that no paradoxical use of sets will ever emerge, and that no “naïve set” could be a proper class. The formal notion of sets is built upon first-order logic, but without some informal reference to sets we can’t even say what a first-order logic is.
• A set $$S$$ is said to be “countable” iff there exists an injection $$f : S \to \mathbb{N}$$ (equivalently, $$S$$ is finite or there is a bijection $$\mathbb{N} \to S$$); otherwise it is “uncountable”.
• One must be willing to accept the validity of mathematical induction on natural numbers (or the well-ordering principle for natural numbers). Again, to formalize the induction principle we need set theory, but we don’t have it in the first place (until we introduce a first-order logic).
• On the model-theoretic part: Notes on definability and homomorphisms are omitted from this summary, not to mention ultraproducts and the Löwenheim-Skolem theorem. Model theory is a big topic in its own right and deserves an individual treatment, better with some algebraic and topological contexts. (Note that the homomorphism theorem is used in proving the completeness theorem; also the notion of definability is mentioned in the supplementary sections.)
• On the proof-theoretic part: Notes on some metatheorems are also omitted, as they are a purely technical aspect of a Hilbert-style deductive system. (One may find it convenient to prove an actual theorem with more metatheorems, but they are really not adding any extra power to our system.)
• The relation between logic and computability (i.e., Gödel’s incompleteness theorems) is not discussed.
• But the meanings of “decidable” and “undecidable” are clear from the previous notes Mst. #6 (from a computer scientist’s perspective).
• Axiomatic set theory, which is another big part of the course, is not included in these notes. (Maybe I’m still too unintelligent to grasp the topic.) But it is good to know:
• First-order logic has its limitation in definability (i.e., it’s not capable of ruling out non-standard models of arithmetic), until we assign to it a set-theoretic context. So set theory is often considered a foundation of all mathematics (for its expressive power).
• Axiom of Choice (AC) causes some counter-intuitive consequences, but it was shown to be consistent with the ZF axioms (Gödel 1938). And there are models of ZF$$\cup\lnot$$AC as well as of ZF$$\cup$$AC.
• Constructivists tend to avoid AC in mathematics. However, Henkin’s proof of the completeness theorem in first-order logic assumes AC (Step II in finding a maximal consistent set). (thus it is a non-constructive proof!1)
• Intuitionistic logic is not in the scope of these course notes. (And most logic books, including the Enderton one, are not written by constructive mathematicians.) Basically, in a Hilbert-style system, a classical logic would admit all tautologies in propositional logic as Axiom Group 1. Intuitionistic logic, in contrast, rejects those tautologies that are non-constructive in a first-order setting.
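
The “finite and mechanizable” semantics of propositional logic mentioned above can be made concrete with a brute-force truth-table check. A minimal Python sketch, under my own tuple encoding of formulas (not from the notes), using only the connectives $$\to$$ and $$\lnot$$:

```python
from itertools import product

# Formulas as nested tuples: a plain string is a sentential symbol,
# ('not', p) is negation, ('imp', p, q) is the conditional (p -> q).
def symbols(phi):
    if isinstance(phi, str):
        return {phi}
    return set().union(*(symbols(sub) for sub in phi[1:]))

def evaluate(phi, v):
    if isinstance(phi, str):
        return v[phi]
    if phi[0] == 'not':
        return not evaluate(phi[1], v)
    if phi[0] == 'imp':
        return (not evaluate(phi[1], v)) or evaluate(phi[2], v)
    raise ValueError(phi)

def is_tautology(phi):
    # Enumerate the full (finite) truth table: 2^n rows for n symbols.
    syms = sorted(symbols(phi))
    return all(evaluate(phi, dict(zip(syms, row)))
               for row in product([False, True], repeat=len(syms)))
```

Since $$\{\to, \lnot\}$$ is functionally complete, this little checker in principle covers every propositional tautology, e.g. `is_tautology(('imp', 'A1', 'A1'))` holds while `is_tautology(('imp', 'A1', 'A2'))` does not.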

• First-order language: A formal language consisting of the following symbols:
1. Logical symbols
• Parentheses: $$($$, $$)$$.
• Connective symbols: $$\to$$, $$\lnot$$.
• Variables: $$v_1$$, $$v_2$$, …
• Equality symbol (an optional 2-place predicate symbol): $$=$$.
2. Parameters (non-logical symbols; open to interpretation)
• Universal quantifier symbol: $$\forall$$.
• Predicate symbols (relation symbols): $$P_1$$, $$P_2$$, …
• Constant symbols (0-place function symbols): $$c_1$$, $$c_2$$, …
• Function symbols: $$f_1$$, $$f_2$$, …
• When specifying a concrete first-order language $$\mathcal{L}$$, we must say (i) whether the equality symbol is present; (ii) what the parameters are.

Remark 7.1. (Language of propositional logic) The language of propositional logic may be seen as a stripped form of first-order languages, in which only parentheses, connective symbols and sentential symbols (the only parameters; may be treated as 0-place predicate symbols) are present. Intuitively, such a language seems too weak to encode our formal reasoning in all kinds of mathematics and many practical areas.

• Terms and formulas
• An expression is a finite sequence of symbols (i.e., a finite string). Among all expressions, we are interested in two kinds of them which we refer to as terms and formulas.
• A term is either:
• a single variable or constant symbol; or
• $$f t_1 \cdots t_n$$, where $$f$$ is an $$n$$-place function symbol, and every $$t_i$$ $$(1 \leq i \leq n)$$ is also a term.
• A formula (or wff, well-formed formula) is either:
• $$P t_1 \cdots t_n$$, where $$P$$ is an $$n$$-place predicate symbol (or the equality symbol $$=$$), and every $$t_i$$ $$(1 \leq i \leq n)$$ is a term; or
• one of the following forms:
• $$(\lnot \psi)$$, where $$\psi$$ is also a formula;
• $$(\psi \to \theta)$$, where $$\psi$$ and $$\theta$$ are also formulas;
• $$\forall v_i \psi$$, where $$v_i$$ is a variable and $$\psi$$ is also a formula.
• A variable may occur free in a formula. A formula without any free variable is called a sentence.
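
As a sanity check on the inductive definitions above, here is a sketch of a recursive well-formedness test. The tuple encoding and the signature dictionaries (mapping each function/predicate symbol to its arity) are my own assumptions, not part of the formal development:

```python
def is_term(e, funcs):
    """A term is a variable, or an n-place function symbol applied to n terms.
    (Constants are 0-place function symbols.)"""
    if e[0] == 'var':
        return len(e) == 2
    if e[0] == 'fun':
        name, args = e[1], e[2:]
        return funcs.get(name) == len(args) and all(is_term(t, funcs) for t in args)
    return False

def is_wff(e, preds, funcs):
    if e[0] == 'pred':  # P t1 ... tn (equality would be a 2-place case)
        name, args = e[1], e[2:]
        return preds.get(name) == len(args) and all(is_term(t, funcs) for t in args)
    if e[0] == 'not':
        return len(e) == 2 and is_wff(e[1], preds, funcs)
    if e[0] == 'imp':
        return len(e) == 3 and all(is_wff(p, preds, funcs) for p in e[1:])
    if e[0] == 'forall':
        return len(e) == 3 and isinstance(e[1], str) and is_wff(e[2], preds, funcs)
    return False
```

For instance, with signatures `{'P': 2}` and `{'f': 1}`, the formula $$P x y \to P x f y$$ passes the check, while $$P x$$ (wrong arity) does not.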

Remark 7.2. (Metatheory and philosophical concerns) A first-order expression, as a finite sequence (also called a tuple), may be defined in terms of ordered pairs in axiomatic set theory. But we will not appeal to set theory in our first definitions of expressions in logic. (So far we have no notion about what a “set” formally is!)

A further concern is whether our definitions of terms and formulas are well established. Since we are defining the notions of terms and formulas inductively, could there be a certain term or formula that is covered by our recursive definition, but can never actually be built by finitely many applications of these operations? To show that first-order terms/formulas are well-defined, a beginner might try to prove these induction principles by mathematical induction on the complexities of terms/formulas, but that would rely on the fact that the set of natural numbers $$\omega$$ is well-ordered so that we can apply induction on numbers; to justify things like this, it is essential to use set theory or second-order logic, which we don’t even have until we define a first-order logic. Thus, an unavoidable circularity emerges if we dig really deep into the foundations.

For now, we must appeal to a metatheory that we can easily convince ourselves of by intuition, so that we will accept these induction principles and the notion of “naïve sets” (or collections, if we don’t want to abuse the formal term of sets too much). Notwithstanding, I believe that a prudent person can bootstrap theories like this without introducing any inconsistency.

Remark 7.3. (Context freedom, unique readability and parentheses) Since the formations of first-order terms and formulas make use of context-free rules, one familiar with formal languages and automata theory might ask, “Are the sets of terms/formulas context-free languages?” Generally they are not, since our set $$V$$ of variables (likewise for predicate and function symbols) could be infinite (or even uncountable), but a context-free grammar requires all its symbol sets to be finite. However, in our first-order language $$\mathcal{L}$$, if these symbols are effectively decidable, then there is an algorithm that accepts terms or formulas (or parses them). Furthermore, such parses are guaranteed to be unique, as shown by the Unique Readability Theorems in Enderton p. 105ff. Indeed, the inclusion of parentheses in our first-order language enables us to write any formula unambiguously. If we left out all the parentheses, would a formula like $$\forall x P x \to \lnot Q x$$ mean $$(\forall x P x \to (\lnot Q x))$$ or $$\forall x (P x \to (\lnot Q x))$$? An alternative syntax would be to use logical connectives in a prefix manner, e.g., $$\to \forall x P x \lnot Q x$$ and $$\forall x \to P x \lnot Q x$$, but that is hardly as comprehensible as our chosen syntax.

Remark 7.4. (Abbreviations on notation) Why don’t we have the existential quantifier $$\exists$$ and some other connectives such as $$\land$$, $$\lor$$ and $$\leftrightarrow$$ in our language? Because any first-order formula that makes use of these symbols can be seen as a syntactical abbreviation and can be rewritten using $$\forall$$, $$\to$$ and $$\lnot$$, as will be shown. A deeper reason is that $$\{ \to, \lnot \}$$ is a functionally complete set of Boolean algebraic operators that is sufficient to express all possible truth tables in propositional logic. On the other hand, a formula like $$\exists x \varphi$$ is just $$(\lnot \forall x (\lnot \varphi))$$, following from our understanding of what an existential quantification is.
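
The elimination of abbreviations described in this remark can be sketched mechanically. A hedged Python version under the same kind of home-made tuple encoding as before (the tags 'exists', 'and', 'or' are my labels for the abbreviated connectives):

```python
def desugar(e):
    """Rewrite a formula so that only forall, ->, and not remain."""
    tag = e[0]
    if tag == 'pred':
        return e  # atomic: leave as-is
    if tag == 'not':
        return ('not', desugar(e[1]))
    if tag == 'imp':
        return ('imp', desugar(e[1]), desugar(e[2]))
    if tag == 'forall':
        return ('forall', e[1], desugar(e[2]))
    if tag == 'exists':   # exists x phi  ~>  not forall x not phi
        return ('not', ('forall', e[1], ('not', desugar(e[2]))))
    if tag == 'and':      # a and b  ~>  not (a -> not b)
        return ('not', ('imp', desugar(e[1]), ('not', desugar(e[2]))))
    if tag == 'or':       # a or b  ~>  (not a) -> b
        return ('imp', ('not', desugar(e[1])), desugar(e[2]))
    raise ValueError(e)
```

For example, `desugar(('exists', 'x', ('pred', 'P', ('var', 'x'))))` yields the encoding of $$(\lnot \forall x (\lnot P x))$$.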

Remark 7.5. (Sentences and truth values) In propositional logic, we don’t generally know whether a formula evaluates to true until every sentential symbol is assigned a truth value. (Sometimes we can tell the truth value with a little less information than what is required, if we apply a so-called short-circuit evaluation strategy, e.g., if $$A_1$$ is false then we immediately know $$(A_1 \to A_2)$$ is true, or if $$A_2$$ is true then $$(A_1 \to A_2)$$ is also true. But this is not the general case, and one should expect to evaluate both $$A_1$$ and $$A_2$$ before getting the answer.) Similarly, in a first-order logic, every free variable needs to have a definite assignment so as to give rise to the truth value of a formula. This is done by specifying a function $$s$$ (where $$\operatorname{dom} s = V$$ is the set of all variables) as the assignment of variables, and when applying $$s$$ to a formula $$\varphi$$ we get $$\varphi[s]$$, which is a sentence that has a definite meaning (i.e., no variable occurs free). Note that the assignment of variables alone is not sufficient to determine the truth of a sentence – for example, $$(P x y \to P x f y)\ [s(x \,|\, 0)(y \,|\, 0)]$$ is a sentence since no variable occurs free in it, but we can’t decide whether it is true because we don’t know what the predicate $$P$$ and the function $$f$$ are. If we say $$P$$ is the arithmetic “less-than” relation and $$f$$ is the successor function $$f : x \mapsto x + 1$$, then we can tell that this is a true sentence (in fact $$P x y$$ is false as $$0 < 0$$ is false, but $$P x f y$$ is true as $$0 < 1$$ is true, so the overall sentence as a conditional is true). We could write $$P$$ and $$f$$ as $$<$$ and $$S$$, but the conventional interpretation of these symbols should not be taken for granted as if every symbol came with an inherent meaning – they don’t, until we give them meanings.

• Structures: A structure (or an interpretation) $$\mathfrak{A}$$ assigns a domain $$|\mathfrak{A}|$$ to the language $$\mathcal{L}$$, and:
• Every predicate symbol is assigned a relation $$P^\mathfrak{A} \subseteq |\mathfrak{A}|^n$$.
• Every function symbol is assigned a function $$f^\mathfrak{A} : |\mathfrak{A}|^n \to |\mathfrak{A}|$$.
• Every constant symbol is assigned a member $$c^\mathfrak{A}$$ of the domain $$|\mathfrak{A}|$$.
• The universal quantifier symbol $$\forall$$ is assigned the domain $$|\mathfrak{A}|$$. (So it makes sense to say: “for all $$x$$ in $$|\mathfrak{A}|$$…”)
• Satisfaction and truth
• Given a structure $$\mathfrak{A}$$ and an assignment of variables $$s : V \to |\mathfrak{A}|$$, we define an extension function $$\bar{s} : T \to |\mathfrak{A}|$$ (where $$T$$ is the set of all terms) that maps any term into the domain $$|\mathfrak{A}|$$.
• With the term valuation $$\bar{s}$$, we define recursively that a structure $$\mathfrak{A}$$ satisfies a formula $$\varphi$$ with an assignment $$s$$ of variables, written as $\models_\mathfrak{A} \varphi[s]$ If this is not the case, then $$\not\models_\mathfrak{A} \varphi[s]$$ and we say that $$\mathfrak{A}$$ does not satisfy $$\varphi$$ with $$s$$.
• For a sentence $$\sigma$$ (which is just a formula with no free variables), the assignment of variables $$s : V \to |\mathfrak{A}|$$ makes no difference as to whether $$\sigma$$ is satisfied by $$\mathfrak{A}$$. So if $$\models_\mathfrak{A} \sigma$$, we say that $$\sigma$$ is true in $$\mathfrak{A}$$ or that $$\mathfrak{A}$$ is a model of $$\sigma$$.
• Satisfiability of formulas: A set $$\Gamma$$ of formulas is said to be satisfiable iff there is a structure $$\mathfrak{A}$$ and an assignment $$s$$ of variables such that $$\models_\mathfrak{A} \Gamma[s]$$.
• Logical implication and validity
• In a language $$\mathcal{L}$$, we say that a set $$\Gamma$$ of formulas logically implies a formula $$\varphi$$, iff for every structure $$\mathfrak{A}$$ of $$\mathcal{L}$$ and every assignment $$s : V \to |\mathfrak{A}|$$ such that $$\models_\mathfrak{A} \gamma [s]$$ (for all $$\gamma \in \Gamma$$), it also holds that $$\models_\mathfrak{A} \varphi [s]$$ (note that $$\varphi$$ is not required to be a sentence): $\Gamma \models \varphi$ This is the analogue of tautological implication in propositional logic: $$A \Rightarrow B$$, iff every truth assignment that satisfies $$A$$ also satisfies $$B$$.
• If the empty set logically implies a formula, i.e., $$\emptyset \models \varphi$$, we write this fact simply as $$\models \varphi$$ and say that $$\varphi$$ is valid. A formula is valid iff given any assignment of variables, it is true in every structure; this is the analogue of tautologies in propositional logic: something that is considered “always true”.
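
When the domain is finite, the satisfaction relation $$\models_\mathfrak{A} \varphi[s]$$ is mechanically checkable by following its recursive definition (it is the infinite domains that make truth undecidable in general). A sketch, with the structure given as a plain dictionary and formulas in my own tuple encoding (neither is from the notes):

```python
def satisfies(A, phi, s):
    """Decide whether structure A satisfies phi with variable assignment s.
    A = {'domain': ..., 'preds': {name: set of tuples}, 'funcs': {name: callable}}."""
    def val(t):  # the term valuation s-bar
        if t[0] == 'var':
            return s[t[1]]
        name, args = t[1], t[2:]
        return A['funcs'][name](*map(val, args))
    if phi[0] == 'pred':
        name, args = phi[1], phi[2:]
        return tuple(map(val, args)) in A['preds'][name]
    if phi[0] == 'not':
        return not satisfies(A, phi[1], s)
    if phi[0] == 'imp':
        return (not satisfies(A, phi[1], s)) or satisfies(A, phi[2], s)
    if phi[0] == 'forall':
        return all(satisfies(A, phi[2], {**s, phi[1]: a}) for a in A['domain'])
    raise ValueError(phi)

# The example of Remark 7.5 on a small finite domain: P is less-than,
# f is successor (truncated at the top element so it stays total).
N = {'domain': range(5),
     'preds': {'P': {(a, b) for a in range(5) for b in range(5) if a < b}},
     'funcs': {'f': lambda x: min(x + 1, 4)}}
phi = ('imp', ('pred', 'P', ('var', 'x'), ('var', 'y')),
              ('pred', 'P', ('var', 'x'), ('fun', 'f', ('var', 'y'))))
# With s(x|0)(y|0): P x y fails (0 < 0 is false), so the conditional holds.
```

The `forall` case iterates over the whole domain, which is exactly why this strategy breaks down once $$|\mathfrak{A}|$$ is infinite.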

Remark 7.6. (Dichotomy of semantic truth and the liar’s paradox) It should be made clear from the definition that given a structure and an assignment, either $$\models_\mathfrak{A} \varphi[s]$$ or $$\not\models_\mathfrak{A} \varphi[s]$$, but not both! This accords with our intuition that a statement is either semantically true or false; there is no third possibility.

Woefully, a problem arises with self-reference: Assume that we have a first-order language $$\mathcal{L}$$ with a 1-place predicate symbol $$P$$, and the structure $$\mathfrak{A}$$ assigns it the domain $$|\mathfrak{A}| = \text{Formula}(\mathcal{L})$$, where $$P$$ is interpreted as $$P^\mathfrak{A} = \{ \langle \sigma \rangle \,:\, \models \sigma \}$$, that is, $$\sigma \in P^\mathfrak{A}$$ iff $$\models \sigma$$. Let the formula $$\tau$$ be $$(\lnot P x)$$ and let the assignment $$s : V \to |\mathfrak{A}|$$ map the variable $$x$$ to $$\tau$$ itself; then is $$\tau[s]$$ true or false in $$\mathfrak{A}$$? If we take $$\tau[s]$$ as true, that is, $$(\lnot P \tau)$$ is true, then $$P \tau$$ must be false, so $$\tau \not\in P^\mathfrak{A}$$ thus $$\not\models \tau$$. If we take $$\tau[s]$$ as false, that is, $$(\lnot P \tau)$$ is false, then $$P \tau$$ must be true, so $$\tau \in P^\mathfrak{A}$$ thus $$\models \tau$$. This is known as the classical liar’s paradox. One possible way to resolve this (given by Alfred Tarski) is to disable impredicativity in our structures; more precisely, one can define a semantic hierarchy of structures that allows us to predicate truth only of a formula at a lower level, but never at the same or a higher level. This matter is far beyond the scope of this summary, but the important lesson to learn here is that it is generally a bad idea to allow something to be both true and false in our semantics; it would render void our enduring effort to accumulate all “mathematical truths”.

Remark 7.7. (Decidability of truth/validity) In propositional logic, it is easy to see that given a truth assignment of sentential symbols, every formula can be decided for its truth or falsehood. Moreover, even without any truth assignment, one can enumerate a truth table to find out whether a given formula is a tautology. Truth and validity are decidable in propositional logic. However, this is often not the case in first-order logic: In order to decide whether a sentence is true, one needs to find the truth values of all prime formulas (i.e., formulas like $$P t_1 \cdots t_n$$ and $$\forall v_i \psi$$) first, but the domain $$|\mathfrak{A}|$$ may be an (uncountably) infinite set, thus making it impossible to mechanically check the universal quantification for all members; moreover, the functions used in building terms may not be Turing-computable at all. To decide the validity of a sentence, we have to check its truth in all structures of the language (whose collection may also be uncountably large), and that is an even more hopeless task.2

If semantic truth/validity is generally undecidable, how do we say that some formula is true in a predefined structure? Well, we can’t, in most general cases, since an infinite argument of truth is a useless argument (you can’t present it to someone / some Turing machine, as no physical device is capable of handling such an infinite object). Fortunately, there is a feasible way to say something is true, without appealing to any specific structures (that may give rise to unwanted undecidability), and that is called a formal deduction (also called a proof, expectedly).

• Formal deduction: Given a set $$\Lambda$$ of formulas (axioms) and a set of rules of inference, we say that a set $$\Gamma$$ of formulas (hypotheses) proves another formula $$\varphi$$, or that $$\varphi$$ is a theorem of $$\Gamma$$, iff there is a finite sequence (called a deduction of $$\varphi$$ from $$\Gamma$$) $$\langle \alpha_0, \dots, \alpha_n \rangle$$ such that
1. $$\alpha_n$$ is just $$\varphi$$.
2. For each $$0 \leq k \leq n$$, either
• $$\alpha_k \in \Gamma \cup \Lambda$$; or
• $$\alpha_k$$ is obtained by a rule of inference from a subset of the previous formulas $$A \subseteq \{ \alpha_i \,:\, 0 \leq i < k \}$$. In this case we write $\Gamma \vdash \varphi$
• Formal systems and proof calculi: Different deductive systems make different choices of axioms and rules of inference. A natural deduction system may consist of no axioms but many rules of inference; on the contrary, a Hilbert-style system (named obviously after David Hilbert) uses many axioms but only two rules of inference. A proof calculus is the approach to formal deduction in a specified system, and as it is called a “calculus”, any derivation in it must contain only a finite number of steps so as to be calculable (by a person or by a machine).
• We will use a Hilbert-style deductive system here:
• Rules of inference
1. Modus ponens $\frac{\Gamma \vdash \psi \quad \Gamma \vdash (\psi \to \varphi)}{\Gamma \vdash \varphi}$
2. Generalization $\frac{\vdash \theta}{\vdash \forall x_1 \cdots \forall x_n \theta}$ (where $$\theta \in \Lambda$$.)
• Logical axioms: In a deductive system, axioms are better called logical axioms, to stress the fact that they are logically valid formulas in every structure, i.e., that their validity is not open to interpretation.
1. (Tautology) $$\alpha$$, where $$\models_t \alpha$$. (take sentential symbols to be prime formulas in first-order logic)
2. (Substitution) $$\forall x \alpha \to \alpha^x_t$$, where $$t$$ is substitutable for $$x$$ in $$\alpha$$.
3. $$\forall x (\alpha \to \beta) \to (\forall x \alpha \to \forall x \beta)$$.
4. $$\alpha \to \forall x \alpha$$, where $$x$$ does not occur free in $$\alpha$$.
5. $$x = x$$.
6. $$x = y \to (\alpha \to \alpha')$$, where $$\alpha$$ is atomic and $$\alpha'$$ is obtained from $$\alpha$$ by replacing $$x$$ in zero or more places by $$y$$.
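
The definition of a deduction as a finite sequence is itself a mechanically checkable condition, which is what makes proofs communicable. A simplified sketch that verifies each line is a hypothesis, an axiom, or follows by modus ponens from earlier lines; the generalization rule and the actual axiom groups are omitted, and formulas are hashable tuples with ('imp', psi, phi) for conditionals (my own encoding):

```python
def is_deduction(seq, gamma, axioms):
    """Check that seq = <alpha_0, ..., alpha_n> is a deduction from gamma:
    each alpha_k is in gamma or axioms, or follows by modus ponens
    from two earlier formulas psi and ('imp', psi, alpha_k)."""
    for k, alpha in enumerate(seq):
        if alpha in gamma or alpha in axioms:
            continue
        earlier = set(seq[:k])
        if any(('imp', psi, alpha) in earlier for psi in earlier):
            continue
        return False  # alpha_k is not justified by any rule
    return True
```

For instance, with hypothesis `'A'` and axiom `('imp', 'A', 'B')`, the sequence `['A', ('imp', 'A', 'B'), 'B']` is a valid deduction of `'B'`, while `['B']` alone is not.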

Remark 7.8. (Validity of logical axioms) It should be intuitively clear that all logical axioms are convincing, and that their validity can be argued without appealing to any specific model. In particular, for an axiom $$\theta \in \Lambda$$, there is $$\vdash \theta$$; we must be able to argue (in our meta-language) that $$\models \theta$$, so that we can be convinced that our deductive system is a sound one. Remember that for any formula $$\varphi$$, either $$\models \varphi$$ or $$\not\models \varphi$$ (which is just $$\models (\lnot \varphi)$$). If a proof of $$\theta$$ (not as a formal deduction, but as an argument in our meta-language) does not even imply $$\models \theta$$, that would be very frustrating.

Remark 7.9. (Tautological implication, logical implication and deduction) If $$\Gamma \models_t \varphi$$ (i.e., $$\varphi$$ is tautologically implied by $$\Gamma$$ in propositional logic), we can argue that $$\Gamma \models \varphi$$ when replacing sentential symbols by prime formulas in first-order logic. In the special case that $$\Gamma = \emptyset$$, we are proving the validity of Axiom Group 1: $$\models_t \alpha \implies \models \alpha$$ (every tautology is valid). The converse does not hold though, since we have $$\models (\alpha \to \forall x \alpha)$$ (by Axiom Group 4), but $$\not\models_t (\alpha \to \forall x \alpha)$$ as $$\alpha$$ and $$\forall x \alpha$$ are two different sentential symbols (surely $$(A_1 \to A_2)$$ is not a tautology in propositional logic!).

It is worth noticing that even though $$\Gamma \models \varphi \not\implies \Gamma \models_t \varphi$$, we do have $$\Gamma \models \varphi \iff \Gamma \cup \Lambda \models_t \varphi$$. Intuitively, the set $$\Lambda$$ of logical axioms gives us a chance to establish truths about quantifiers and equalities (other than treating these prime formulas as sentential symbols that are too unrefined for our first-order logic). I haven’t done a proof of this, but I believe it should be non-trivial in both directions. Combining with Theorem 24B in Enderton p. 115, we get the nice result concluding that $\Gamma \vdash \varphi \iff \Gamma \cup \Lambda \models_t \varphi \iff \Gamma \models \varphi$ which entails both the soundness and the completeness theorems. It is basically saying that these three things are equivalent:

1. $$\Gamma$$ proves $$\varphi$$. (There is a formal deduction that derives $$\varphi$$ from $$\Gamma$$ and axioms $$\Lambda$$ in our deductive system; it is finite and purely syntactical, in the sense that it does not depend on any structure or assignment of variables.)
2. $$\Gamma$$ logically implies $$\varphi$$. (For every structure and every assignment of variables, $$\varphi$$ holds true given $$\Gamma$$. Of course, this is a semantical notion in the sense that it does involve structures and assignments of variables, which are infinite in numbers so it would be impossible for one to check this mechanically.)
3. $$\Gamma \cup \Lambda$$ tautologically implies $$\varphi$$. (We can reduce a first-order logic to propositional logic by adding logical axioms to the set of hypotheses, preserving all truths. For each prime formula, this is still a semantical notion for its truth value depends on structures / assignments of variables.)
• Soundness
1. $$\Gamma \vdash \varphi \implies \Gamma \models \varphi$$.
• Proof idea:
1. For $$\varphi \in \Lambda$$, show that every logical axiom is valid (Lemma 25A in Enderton p. 132ff.), that is, $$\models \varphi$$. Then trivially $$\Gamma \models \varphi$$;
2. For $$\varphi \in \Gamma$$, we have trivially $$\Gamma \models \varphi$$;
3. $$\varphi$$ is obtained by generalization on variable $$x$$ from a valid formula $$\theta$$. Since $$\models \theta$$ (if $$\theta$$ is an axiom, then this is already shown in Step 1; if $$\theta$$ is another generalization, then this can be shown by IH), for every structure $$\mathfrak{A}$$ and $$a \in |\mathfrak{A}|$$, $$\models_\mathfrak{A} \theta[s(x|a)]$$, then by definition we have $$\models_\mathfrak{A} \forall x \theta[s]$$. Therefore $$\models \forall x \theta$$;
4. $$\varphi$$ is obtained by modus ponens from $$\psi$$ and $$(\psi \to \varphi)$$. By IH we have $$\Gamma \models \psi$$ and $$\Gamma \models (\psi \to \varphi)$$. Show that $$\Gamma \models \varphi$$ using Exercise 1 in Enderton p. 99. (NB. the wording in the last line of Enderton p. 131, i.e., “follows at once”, seems too sloppy to me: we have not proved modus ponens semantically yet.)
• Consistency of formulas: A set $$\Gamma$$ of formulas is said to be consistent iff for no formula $$\varphi$$ it is the case that both $$\Gamma \vdash \varphi$$ and $$\Gamma \vdash (\lnot\varphi)$$.
• By the soundness theorem, an inconsistent set $$\Gamma$$ of formulas gives rise to both $$\Gamma \models \varphi$$ and $$\Gamma \models (\lnot\varphi)$$. As discussed before, this would throw our trust in mathematical truths into the fire – we would have proved that some statement is both true and false!
1. If $$\Gamma$$ is satisfiable, then $$\Gamma$$ is consistent.
(a. and b. are equivalent representations of the soundness theorem.)
• Completeness
1. $$\Gamma \models \varphi \implies \Gamma \vdash \varphi$$.
2. If $$\Gamma$$ is consistent, then $$\Gamma$$ is satisfiable.
(a. and b. are equivalent representations of the completeness theorem.)
• Proof idea: We will prove first a weaker form of b., i.e., the completeness for a countable language $$\mathcal{L}$$. Let $$\Gamma$$ be a consistent set of formulas. We show that it is satisfiable.
1. Extend the language $$\mathcal{L}$$ with a countable set $$\bar{C}$$ of new constant symbols and get a new language $$\mathcal{L}_\bar{C}$$;
2. Given the set $$\Gamma$$ of $$\mathcal{L}$$-formulas, show that there is a set $$\bar\Gamma \supseteq \Gamma$$ of $$\mathcal{L}_\bar{C}$$-formulas that is consistent, complete, deductively closed and Henkinized, i.e., for every formula $$\exists x \varphi \in \bar\Gamma$$ there is a “witness” constant $$\bar{c} \in \bar{C}$$ such that $$\varphi^x_\bar{c} \in \bar{\Gamma}$$;
3. Build a structure $$\mathfrak{A}_0$$ from $$\bar\Gamma$$ where $$|\mathfrak{A}_0|$$ is the set of terms of $$\mathcal{L}_\bar{C}$$. The assignment $$s$$ maps every variable to itself;
4. Define an equivalence relation $$E$$ on $$|\mathfrak{A}_0|$$: $$t E t' \iff t = t' \in \bar\Gamma$$. Show by induction that for any $$\mathcal{L}_\bar{C}$$-formula $$\varphi$$, $$\models_{\mathfrak{A}_0} \varphi^* [s] \iff \varphi \in \bar\Gamma$$ (where $$\varphi^*$$ is $$\varphi$$ with $$=$$ replaced by $$E$$ everywhere);
5. Show by the homomorphism theorem that $$\models_\mathfrak{A} \varphi[s] \iff \varphi \in \bar\Gamma$$ (where $$\mathfrak{A} = \mathfrak{A}_0 / E$$);
6. Restrict the structure $$\mathfrak{A}$$ (a model of $$\mathcal{L}_\bar{C}$$) to $$\mathcal{L}$$ by dropping all new constants $$\bar{C}$$. Then $$\Gamma$$ is satisfiable with $$\mathfrak{A}$$ and $$s$$ in $$\mathcal{L}$$.
• Compactness
1. $$\Gamma \models \varphi \implies$$ There is a finite $$\Gamma_0 \subseteq \Gamma$$ such that $$\Gamma_0 \models \varphi$$.
2. If every finite subset $$\Gamma_0$$ of $$\Gamma$$ is satisfiable, then $$\Gamma$$ is satisfiable.
(a. and b. are equivalent representations of the compactness theorem.)
• Proof idea: A simple corollary of the soundness and completeness theorems: if $$\Gamma \models \varphi$$, then $$\Gamma \vdash \varphi$$ by completeness; since a deduction is a finite sequence, it uses only finitely many hypotheses $$\Gamma_0 \subseteq \Gamma$$, so $$\Gamma_0 \vdash \varphi$$; by soundness, $$\Gamma_0 \models \varphi$$.

Remark 7.10. (Soundness and completeness) The soundness and the completeness theorems together evidence the equivalence of consistency and satisfiability of a set of formulas, or that of validity and provability of a formula. The completeness theorem is by no means an obvious result; the first proof was given by Kurt Gödel in 1930,3 but the proof that we use today was given by Leon Henkin in 1949, which easily generalizes to uncountable languages.

Remark 7.11. (Completeness and incompleteness) Note that the completeness theorem should not be confused with Gödel’s incompleteness theorems. Specifically, the completeness theorem claims that (unconditionally) every formula that is logically implied by $$\Gamma$$ is also deducible from $$\Gamma$$ (i.e., $$\Gamma \models \varphi \implies \Gamma \vdash \varphi$$), while the first incompleteness theorem claims that (under some conditions) some consistent deductive systems are incomplete (i.e., there is some formula $$\varphi$$ such that neither $$\Gamma \vdash \varphi$$ nor $$\Gamma \vdash (\lnot\varphi)$$). As is clearly seen, the incompleteness theorem is purely syntactical and matters for provability (or decidability, one might say). The aforementioned liar’s paradox, where our semantics raises a contradiction that neither $$\Gamma \models \varphi$$ nor $$\Gamma \models (\lnot\varphi)$$ reasonably holds, may be seen as a semantical analogue of the first incompleteness theorem.

## Equality is logical

The equality symbol $$=$$ is a logical symbol, in the sense that the equivalence relation it represents is not open to interpretation and always means what it’s intended to mean (i.e., “the LHS is equal to the RHS”). But if so, how do we say $1+1=2$ is a true sentence then? Can’t we just interpret the equality symbol as something else in a structure $$\mathfrak{A}$$ such that $$\models_\mathfrak{A} (\lnot 1+1=2)$$?

One reason is that in many first-order systems, functions (operations) are defined as axioms using equalities; we need a general way to say “something is defined as…” or just “something is…” There would be no better way of saying this than representing it as an equality, so we won’t have the hassle of interpreting a made-up relation in every model. Consider the language of elementary number theory: in the intended model $$\mathfrak{N} = (\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)$$ of Peano arithmetic, the addition function is defined as a set of domain-specific axioms: \begin{align*} a + \mathbf{0} &= a &\qquad(1) \\ a + \mathbf{S} b &= \mathbf{S} (a + b) &\qquad(2) \end{align*} By Axiom Group 6 we have $$\mathbf{S0} + \mathbf{0} = \mathbf{S0} \to (\mathbf{S0} + \mathbf{S0} = \mathbf{S}(\mathbf{S0} + \mathbf{0}) \to \mathbf{S0} + \mathbf{S0} = \mathbf{S}\mathbf{S0})$$. By (1) $$\mathbf{S0} + \mathbf{0} = \mathbf{S0}$$. By (2) $$\mathbf{S0} + \mathbf{S0} = \mathbf{S}(\mathbf{S0} + \mathbf{0})$$. Applying modus ponens twice, we get $$\mathbf{S0} + \mathbf{S0} = \mathbf{S}\mathbf{S0}$$, which is the result we want (sloppily written as $$1+1=2$$).
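
This derivation can be mimicked computationally by treating the two addition axioms as left-to-right rewrite rules on terms. A sketch under my own encoding (the string 'O' is $$\mathbf{0}$$, ('S', t) is $$\mathbf{S}t$$, ('add', a, b) is $$a + b$$):

```python
# The two Peano addition axioms as rewrite rules:
#   a + O    ~>  a            (1)
#   a + S b  ~>  S (a + b)    (2)
def reduce_add(t):
    if t[0] == 'S':
        return ('S', reduce_add(t[1]))
    if t[0] == 'add':
        a, b = reduce_add(t[1]), reduce_add(t[2])
        if b == 'O':         # axiom (1)
            return a
        if b[0] == 'S':      # axiom (2)
            return reduce_add(('S', ('add', a, b[1])))
        return ('add', a, b)
    return t

# Example: reduce_add(('add', ('S', 'O'), ('S', 'O'))) == ('S', ('S', 'O')),
# i.e. S0 + S0 rewrites to SS0, matching the deduction of 1 + 1 = 2 above.
```

Each rewrite step corresponds to an application of Axiom Group 6 plus modus ponens in the formal deduction, so the normal form is the numeral the deduction proves equal to the original term.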

Equality is a little special, since it is the most common relation with reflexivity, as justified by Axiom Group 5, i.e., $$x = x$$ for any variable $$x$$. We could exclude these from our logical axioms, but in many cases we would still need a reflexive relation symbol to denote equivalence (justified by some domain-specific axioms) otherwise. Technically, it is convenient to just treat it as a logical symbol (together with the useful Axiom Groups 5 and 6). Note that though our logical axioms say nothing about other properties like symmetry, antisymmetry and transitivity, these follow easily from Axiom Groups 5 and 6 in our deductive system:

Lemma 7.12. (Symmetry) If $$x = y$$, then $$y = x$$.

Proof. Given $$x = y$$. By Axiom Group 5 we have $$x = x$$. By Axiom Group 6 we have $$x = y \to (x = x \to y = x)$$. Applying modus ponens twice, we get $$y = x$$.

Lemma 7.13. (Transitivity) If $$x = y$$ and $$y = z$$, then $$x = z$$.

Proof. Given $$x = y$$, by symmetry it holds $$y = x$$. Also $$y = z$$. By Axiom Group 6 we have $$y = x \to (y = z \to x = z)$$. Applying modus ponens twice, we get $$x = z$$.

Lemma 7.14. (Antisymmetry) If $$x = y$$ and $$y = x$$, then $$x = y$$.

Proof. Trivial.

Note that if a suitable partial order is definable on a structure, the equality symbol may be dispensable in our language; e.g., consider the language of set theory, where the equivalence of two sets $$x = y$$ may be defined as $(\forall v (v \in x \to v \in y) \land \forall v (v \in y \to v \in x))$

## The limitation of first-order logic

Consider again the language of elementary number theory, in the intended model $$\mathfrak{N} = (\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)$$ of Peano arithmetic, we have the set of all true sentences $$\text{Th}\mathfrak{N}$$.4 Now we add a new constant symbol $$c'$$ to our language, and extend our theory with a countable set of sentences $$\text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}$$ (Here we define $$x < y$$ as $$\lnot\forall z ((\lnot z = \mathbf{0}) \to (\lnot x + z = y))$$). Is there still a model for this extended theory?

For any given $$n \in \mathbb{N}$$, the sentence $$\underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c'$$ is clearly satisfiable (by simply taking $$c'$$ to denote $$n+1$$). Then it is easily shown that every finite subset $$\Gamma_0 \subseteq \text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}$$ is satisfiable. By the compactness theorem (b.), $$\text{Th}\mathfrak{N} \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n\ \text{times}} \mathbf{0} < c' \,:\, n \in \mathbb{N} \}$$ is also satisfiable. This means that we can construct a non-standard model of arithmetic $$\mathfrak{N}'$$ with an additional bizarre element (namely $$c'_0$$, the interpretation of $$c'$$) that turns out to be greater than any other number (thus the model of this theory is not isomorphic to our standard model $$\mathfrak{N}$$).

Recall that in the model $$\mathfrak{N}'$$ of Peano arithmetic, numbers are closed under the successor function $$\mathbf{S}$$. More precisely, if $$k \in |\mathfrak{N}'|$$, then $$\mathbf{S}k \in |\mathfrak{N}'|$$ and $$\mathbf{S}k \neq \mathbf{0}$$. This implies that not only $$c'_0 \in |\mathfrak{N}'|$$, but also $$\mathbf{S} c'_0, \mathbf{S}\mathbf{S} c'_0, \mathbf{S}\mathbf{S}\mathbf{S} c'_0, \dots$$ are all non-standard numbers in $$|\mathfrak{N}'|$$. As none of these numbers is equal to $$\mathbf{0}$$ (by one of the Peano axioms), they form an infinite “chain” separate from our familiar standard numbers. One can write down all the (standard and non-standard) numbers sloppily as the following sequence: $\langle 0, 1, 2, \dots, \quad c'_0, c'_1, c'_2, \dots \rangle$ where $$0$$ is just $$\mathbf{0}$$, $$1$$ is $$\mathbf{S0}$$, $$2$$ is $$\mathbf{SS0}$$, $$c'_1$$ is $$\mathbf{S} c'_0$$, $$c'_2$$ is $$\mathbf{SS} c'_0$$, etc.

Clearly, every number but $$0$$ and $$c'_0$$ in the sequence has a unique predecessor. There is certainly no predecessor $$j$$ of $$0$$, because otherwise we would have $$\mathbf{S}j = \mathbf{0}$$, contradicting our axioms. But can we have a predecessor of $$c'_0$$? There is no axiom preventing us from constructing that thing! So here we go, enlarging our collection of numbers to: $\langle 0, 1, 2, \dots, \quad \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots \rangle$ where for each $$c'_{i}$$, $$c'_{i+1} = \mathbf{S} c'_{i}$$. Again, we know that every such non-standard number $$c'_i$$ is greater than any standard number $$n$$ (otherwise we could find a standard number $$n - i$$ such that $$\lnot (n - i < c'_0)$$, contradicting our initial construction of $$c'_0$$ by compactness). So the non-standard part is still a separate chain, as written above.

We can go even further. Let $$|\mathfrak{N}'|$$ be this set of standard and non-standard numbers, and let $$\mathfrak{N}' = (|\mathfrak{N}'|; \mathbf{0}, \mathbf{S}, +, \cdot)$$ be the resulting model of Peano arithmetic on $$|\mathfrak{N}'|$$. Consider adding yet another constant symbol $$c''$$. Is $$\text{Th}\mathfrak{N}' \cup \{ \underbrace{\mathbf{S} \cdots \mathbf{S}}_{n'\ \text{times}} \mathbf{0} < c'' \,:\, n' \in |\mathfrak{N}'| \}$$ satisfiable? By the same reasoning as before, we get a model $$\mathfrak{N}''$$ with its domain of numbers $\langle 0, 1, 2, \dots, \quad \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots, \quad \dots, c''_{-2}, c''_{-1}, c''_0, c''_1, c''_2, \dots \rangle$

Counter-intuitively, this is not the kind of “arithmetic” we are used to. What we are trying to do is formulate the arithmetic of the standard natural numbers that we use every day (i.e., $$0, 1, 2, \dots$$) in first-order logic, but these non-standard numbers come out of nowhere, and there can be an infinity of such “stages”, where each number in a later stage is greater than every number in a former stage (how is that even possible?). And what is worse, in the following subset of the non-standard model $$\mathfrak{N}'$$: $\{ \dots, c'_{-2}, c'_{-1}, c'_0, c'_1, c'_2, \dots \}$ there is no minimal element with respect to the intended ordering relation $$<$$; thus this set is not well-ordered by $$<$$, and our good old mathematical induction will no longer work with non-standard numbers.

Well, the lucky part is that we can at least have the induction axiom as a first-order schema, one sentence for each formula $$\varphi$$: $\big(\varphi(\mathbf{0}, y_1, \dots, y_k) \land \forall x (\varphi (x, y_1, \dots, y_k) \to \varphi (\mathbf{S}(x), y_1, \dots, y_k))\big) \to \forall x\, \varphi(x, y_1, \dots, y_k)$ Since the standard model $$\mathfrak{N}$$ and the non-standard model $$\mathfrak{N}'$$ are elementarily equivalent (i.e., they satisfy exactly the same first-order sentences of the original language), we still enjoy the nice induction principle for all standard natural numbers. But for the non-standard part, we’re out of luck.

Ideally, we would like to have a bunch of axioms that pins down the model of arithmetic uniquely, where no non-standard part is allowed to exist, i.e., the set of numbers is well-ordered by a definable ordering relation $$<$$ so that we can apply the induction principle to all of them.5 Unfortunately, this is infeasible in first-order logic (without formulating the notion of sets). We will demonstrate two potential resolutions.

The intuition here is that in order to rule out any non-standard chains of numbers, we must find a 1-place predicate $$P \subseteq |\mathfrak{N}|$$ such that for every standard number $$n$$ we have $$P n$$, distinguishing it from any non-standard number $$n'$$, for which $$\lnot P n'$$. Certainly $$\mathbf{0}$$ is a standard number, and every standard number $$x$$ is followed by $$\mathbf{S}x$$, which is also a standard one: $P \mathbf{0} \land \forall x (P x \to P \mathbf{S} x)$ Once we have this $$P$$ in mind, we demand that every number $$n$$ satisfy $$P n$$, so that it is also standard; there are to be no other numbers in our model, thus defining our set of all numbers: $\forall n\, P n$ Notice that $$P$$ is not in our language; it is something we have yet to construct for our standard model. How do we deal with this issue?

1. The first approach is to allow quantification over relations, so that we can say “for every such relation $$P$$, it holds that $$\forall n\, P n$$”. Formally: $\forall P\, \big((P \mathbf{0} \land \forall x (P x \to P \mathbf{S} x)) \to \forall n\, P n\big)$ Of course, in our previous definition of first-order languages, for $$\forall v_i\, \psi$$ to be a well-formed formula, $$v_i$$ must be a variable ranging over $$|\mathfrak{N}|$$ given a model $$\mathfrak{N}$$; here we have $$P \subseteq |\mathfrak{N}|$$, hence $$P \in \mathcal{P}(|\mathfrak{N}|)$$. So in first-order logic we would not be able to do this (we can only quantify over a variable in the domain $$|\mathfrak{N}|$$). This approach leads to second-order logic (where we can quantify not only over a plain variable in $$|\mathfrak{N}|$$, but also over a relation variable in the power set of the domain, i.e., $$\mathcal{P}(|\mathfrak{N}|)$$; that gives our logic more expressive power!).
2. As we see, a relation is essentially a subset of $$|\mathfrak{N}|$$ (thus its range is also a set); it is tempting to formulate Peano arithmetic using the notion of sets. Indeed, we can rewrite the formula in Approach 1 in the language of set theory as: $\forall y\, \big((\emptyset \in y \land \forall x (x \in y \to S(x) \in y)) \to \forall n\, (n \in y)\big)$ where we encode the standard number $$\mathbf{0}$$ as $$\emptyset$$, and $$\mathbf{S}x$$ as $$S(x) = x \cup \{x\}$$. Clearly there is no non-standard number in this set-theoretic model. This is exactly how we define the natural numbers $$\mathbb{N}$$ (as the minimal inductive set $$\omega$$) in set theory, whose existence is justified by the so-called axiom of infinity. Note that once we introduce set theory (using a first-order language), we fundamentally have the equivalently expressive (and sometimes paradoxical) power of any arbitrary higher-order logic.6
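The encoding in Approach 2 can be played with directly; a small Python sketch (my own illustration, not from the original text) of the von Neumann numerals $$\emptyset,\ S(x) = x \cup \{x\}$$:

```python
# Von Neumann encoding of the naturals as (frozen)sets: 0 := ∅, S(x) := x ∪ {x}.
def S(x):
    return x | frozenset([x])

zero = frozenset()   # ∅
one = S(zero)        # {∅}
two = S(one)         # {∅, {∅}}
three = S(two)       # {∅, {∅}, {∅, {∅}}}

# Each numeral n is a set with exactly n elements, and m < n iff m ∈ n:
sizes = [len(n) for n in (zero, one, two, three)]
```

Using `frozenset` (rather than `set`) is what makes the numerals hashable, so that each one can be a member of the next.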

Books:

Herbert B. Enderton, A Mathematical Introduction to Logic, 2nd ed.

Kenneth Kunen, The Foundations of Mathematics.

Articles:

Terence Tao, “The completeness and compactness theorems of first-order logic,” https://terrytao.wordpress.com/2009/04/10/the-completeness-and-compactness-theorems-of-first-order-logic/.

Asger Törnquist, “The completeness theorem: a guided tour,” http://www.math.ku.dk/~asgert/teachingnotes/iml-completenessguide.pdf.

Eliezer Yudkowsky, “Godel’s completeness and incompleteness theorems,” http://lesswrong.com/lw/g1y/godels_completeness_and_incompleteness_theorems/.

Eliezer Yudkowsky, “Standard and nonstandard numbers,” http://lesswrong.com/lw/g0i/standard_and_nonstandard_numbers/.

David A. Ross, “Fun with nonstandard models,” http://www.math.hawaii.edu/~ross/fun_with_nsmodels.pdf.

Papers:

 L. Henkin, “The completeness of the first-order functional calculus,” The journal of symbolic logic, vol. 14, no. 3, pp. 159–166, 1949.

1. Is there a constructive approach as a replacement of Henkin’s construction? https://math.stackexchange.com/questions/1462408/is-there-a-constructive-approach-as-a-replacement-of-henkins-construction

2. We are using the notion of decidability/undecidability here even before we get to Gödel’s incompleteness theorem, but hopefully it’s no stranger to us as computability theory has all the model-specific issues (though non-logical) covered.

3. Gödel’s original proof of the completeness theorem: https://en.wikipedia.org/wiki/Original_proof_of_G%C3%B6del%27s_completeness_theorem

4. It might be interesting to know that $$\text{Th}(\mathbb{N}; \mathbf{0}, \mathbf{S}, +, \cdot)$$ is an undecidable theory, as shown by the aforementioned incompleteness theorem.

5. If we accept the Axiom of Choice, then every set can have a well-ordering; so this is actually a reasonable request.

6. Many logicians (Kurt Gödel, Thoralf Skolem, Willard Van Quine) seem to adhere to first-order logic. And that’s why.

When GNOME Shell Freezes (But You Have Unsaved Stuff There) tag:www.soimort.org,2017:/notes/170902 2017-09-02T00:00:00+02:00 2017-09-02T00:00:00+02:00 Mort Yao

In the past few months, I kept running into a mysterious bug where GNOME Shell crashes with a segfault in libgjs.so.0.0.0, leaving me an unresponsive desktop:

  kernel: gnome-shell: segfault at 7f06645fffe8 ip 00007f06d94898dd
sp 00007fff562ee7e0 error 4 in libgjs.so.0.0.0[7f06d9

On crashing, GNOME Shell does not restart itself correctly, nor are the gnome-shell processes actually terminated, for some unknown reason. When this happens, Mutter is completely frozen and Alt+F2 is unavailable; what’s even worse, Ctrl+Alt+F3 can’t get you a lifesaving TTY.

Clearly, the only way one can access the box then is via SSH. But what to do next? While it is possible to simply restart gdm like this (for systemd users):

    $ systemctl restart gdm

This would, however, destroy the whole X session, and all your open GUI processes will be lost! A more desirable, elegant approach is to restart gnome-shell alone, but this requires a few tricks in case you’re working through ssh and killall -9 gnome-shell doesn’t restart GNOME Shell properly.

1. Login to the (borked) remote machine, and enable X11 forwarding in its SSH daemon by modifying /etc/ssh/sshd_config, if you’ve never done it before (note that this option is not the default on most distros; you’ll have to restart sshd on the remote machine afterwards):

        X11Forwarding yes

2. Login to the remote machine again, with X11 forwarding enabled this time:

        $ ssh yourUser@yourHost -X
3. On the borked remote machine, kill all the FUBARed gnome-shell processes, if any:
        $ pkill gnome-shell

4. Set the correct DISPLAY environment variable, otherwise you won’t be able to run a WM from a remote shell:

        $ export DISPLAY=:0
5. Bring back the GNOME Shell WM, detached (so it can keep running after closing the remote shell):
        $ setsid gnome-shell

Related Gjs bug(s). The bug has been there since gjs 1.48 and has been reported multiple times: #781799, #783935, #785657. Still, a recent gjs 1.49 build crashes my desktop :( It seems rather hard to find out what triggers the issue, but no more worries when it bumps in the night – just get back the WM, and no running window is lost.

Understanding Colors – A Walkthrough of Modern Color Theory and Practice tag:www.soimort.org,2017:/posts/170803 2017-08-03T00:00:00+02:00 2017-08-03T00:00:00+02:00 Mort Yao

The Stateful Automata tag:www.soimort.org,2017:/mst/6 2017-04-12T00:00:00+02:00 2017-04-18T00:00:00+02:00 Mort Yao

- What is a formal language? (review of Mst. #1)
  - An alphabet $$\Sigma$$ is a well-defined set of symbols. A string is a sequence of symbols.
  - A language $$L$$ over alphabet $$\Sigma$$ is a set of strings, where each string $$w \in \Sigma^*$$.
  - We can define a class of languages (which is a set of languages with some common properties).
  - Operations on languages (as sets): union, intersection, complement, concatenation, Kleene star.
  - A class of languages is said to be closed under some operation, if the operation yields a language that is in the same class. (closure properties)
- Finite language: Any finite set of strings.
  - Closure properties: FL is closed under union, intersection and concatenation. (but not under complementation or Kleene star)
- Regular language: Any language that is recognized by a DFA.
  - Closure properties: RL is closed under union, concatenation and Kleene star. (regular operations)
  - Equivalent models and conversions:
    - DFA (Deterministic finite automaton)
    - NFA (Nondeterministic finite automaton)
      - DFA to NFA: trivial (every DFA is an NFA).
      - NFA to DFA: powerset (Rabin-Scott construction).
    - Regular expression
      - RegExp to NFA: structural induction (Thompson’s construction).
      - RegExp to DFA: via NFA (structural induction and powerset).
      - NFA to RegExp: via GNFA (node removal).
      - DFA to RegExp: Kleene’s algorithm.
  - Show that a language is regular:
    1. By construction: find a DFA/NFA that recognizes it or a RegExp that describes it.
    2. By the Myhill-Nerode theorem.
    3. By hierarchy: every finite language is regular.
  - Show that a language is not regular:
    1. By the pumping lemma.
    2. By the Myhill-Nerode theorem.
  - Conversion among different models of RL
- (Nondeterministic) Context-free language: Any language that is generated by a context-free grammar.
  - Closure properties: CFL is closed under union, concatenation and Kleene star. (but not under complementation or intersection)
  - Equivalent models:
    - Context-free grammar
      - Every CFG can be converted into Chomsky normal form.
    - PDA (Nondeterministic pushdown automaton)
      - Finite automaton + 1 infinite stack
  - Show that a language is context-free:
    1. By construction: find a CFG that generates it or a PDA that recognizes it.
    2. By hierarchy: every regular language is context-free.
  - Show that a language is not context-free:
    1. By the pumping lemma.
- Church-Turing thesis
  - It is (informally) a “thesis” because it cannot be formalized as a provable proposition (since we use it to give a notion for what a computation is).
- Computation may be performed on unrestricted languages:
  - Recognizing a language $$L$$:
    - For every string $$w \in L$$, the machine $$M$$ must accept it.
    - We call such a machine $$M$$ an acceptor or a recognizer of the language $$L$$.
  - Deciding a language $$L$$:
    - For every string $$w \in L$$, the machine $$M$$ must accept it; moreover, for every $$w \notin L$$, the machine $$M$$ must reject it. That is, the machine $$M$$ must halt on every input $$w$$.
    - We call such a machine $$M$$ a decider of the language $$L$$.
- Equivalent Turing-complete models of computation:
  - (Single-tape, deterministic, read-write) Turing machine
    - Finite automaton + 1 infinite linear tape (or 2 infinite stacks)
  - Multi-tape Turing machine
  - Nondeterministic Turing machine
  - Enumerator
  - Abstract rewriting system
  - Lambda calculus (combinatory logic)
  - The Church-Turing thesis claims that all Turing-complete models are equivalent in their abilities to compute (recognize / decide a language).
- There are some languages that cannot be recognized by any Turing machine; that is, unrecognizable languages exist. (shown by the diagonal method)
- Turing-recognizable language: Any language that is recognized by a Turing machine.
  - Closure properties: closed under union, intersection, concatenation and Kleene star. (but not under complementation)
- Co-Turing-recognizable language: Any language whose complement is recognized by a Turing machine.
- Turing-decidable language (or simply decidable language): Any language that is decided by a Turing machine.
  - A language is decidable if and only if it is both Turing-recognizable and co-Turing-recognizable.
  - Closure properties: closed under union, intersection, complementation, concatenation and Kleene star.
- Show that a language is Turing-recognizable:
  1. By construction: find a Turing machine that recognizes it.
  2. By mapping reducibility: $$L \leq_\text{m} L'$$, where $$L'$$ is Turing-recognizable.
  3. By hierarchy: any context-sensitive language (thus also any context-free or regular language) is Turing-recognizable.
- Show that a language is not Turing-recognizable:
  1. By contraposition: we already know that $$\overline{L}$$ is Turing-recognizable; if $$L$$ were Turing-recognizable, then $$L$$ would be decidable. But we know that $$L$$ is undecidable, thus $$L$$ cannot be Turing-recognizable.
  2. By mapping reducibility: $$L' \leq_\text{m} L$$, where $$L'$$ is not Turing-recognizable.
- Show that a language is decidable:
  1. By construction: find a Turing machine that decides it.
  2. Show that it is both Turing-recognizable and co-Turing-recognizable.
  3. By mapping reducibility: $$L \leq_\text{m} L'$$, where $$L'$$ is decidable.
  4. By Turing reducibility: $$L \leq_\text{T} L'$$, where $$L'$$ is decidable.
  5. By hierarchy: any context-sensitive language (thus also any context-free or regular language) is decidable.
- Show that a language is undecidable:
  1. By mapping reducibility: $$L' \leq_\text{m} L$$, where $$L'$$ is undecidable.
  2. By Rice’s theorem.

## Chomsky hierarchy of languages, grammars and automata

$\text{FL} \subsetneq \text{RL} \subsetneq \text{DCFL} \subsetneq \text{CFL} \subsetneq \text{CSL} \subsetneq \text{R} \subsetneq \text{U} \subsetneq \mathcal{P}(\Sigma^*)$

- FL (Finite languages), described by finite enumeration, recognized by finite Boolean circuits.
- RL (Regular languages, Type-3 grammars), described by regular expressions, recognized by DFAs/NFAs.
- DCFL (Deterministic context-free languages), described by deterministic context-free grammars, recognized by deterministic pushdown automata.
- CFL (Context-free languages, Type-2 grammars), described by nondeterministic context-free grammars, recognized by nondeterministic pushdown automata.
- CSL (Context-sensitive languages, Type-1 grammars), described by context-sensitive grammars, recognized by linear bounded automata (a restricted form of Turing machines).
- R (Turing-decidable languages or recursive languages), described by recursive grammars, recognized by Turing machines that always halt (i.e., deciders).
- U (Turing-recognizable languages or recursively enumerable languages, Type-0 grammars), described by unrestricted grammars, recognized by Turing machines (but not guaranteed to halt, thus not necessarily decidable).
- $$\mathcal{P}(\Sigma^*)$$, the class of all languages over $$\Sigma$$: its members may have unrestricted grammars or no grammar at all, and are not necessarily Turing-recognizable.
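As a concrete aside (my own illustration, not part of the original notes), the powerset (Rabin-Scott) construction mentioned earlier, which witnesses that NFAs recognize only regular languages, fits in a few lines of Python; the example NFA at the bottom is made up for illustration:

```python
from itertools import chain

def nfa_to_dfa(alphabet, delta, start, accepting):
    """Powerset (Rabin-Scott) construction: NFA -> equivalent DFA.
    delta maps (state, symbol) -> set of successor states (no ε-moves here).
    DFA states are frozensets of NFA states."""
    dfa_start = frozenset([start])
    dfa_states, dfa_delta, dfa_accepting = set(), {}, set()
    worklist = [dfa_start]
    while worklist:
        S = worklist.pop()
        if S in dfa_states:
            continue
        dfa_states.add(S)
        if S & accepting:          # a DFA state accepts iff it contains
            dfa_accepting.add(S)   # some accepting NFA state
        for a in alphabet:
            T = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in S))
            dfa_delta[(S, a)] = T
            worklist.append(T)
    return dfa_delta, dfa_start, dfa_accepting

def dfa_accepts(dfa, w):
    delta, q, accepting = dfa
    for a in w:
        q = delta[(q, a)]
    return q in accepting

# Made-up example: an NFA over {0,1} accepting exactly the strings ending in 1.
nfa_delta = {(0, '0'): {0}, (0, '1'): {0, 1}}
dfa = nfa_to_dfa(['0', '1'], nfa_delta, start=0, accepting={1})
```

In the worst case the resulting DFA has exponentially many states, which is why the construction is a proof of equivalence rather than a free lunch.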
Grammars provide a mathematical, recursive perspective for describing languages, while automata provide a physically implementable, iterative approach to recognizing languages (via finite states and possibly infinite stacks). A language is Turing-recognizable if and only if it has a well-defined grammar (so that it is possible to construct some Turing machine that recognizes it), although such a grammar can sometimes be hard to formulate. Quite naturally, if we cannot specify a grammar for a language $$L \subseteq \Sigma^*$$ even in theory, it would be impossible to construct a Turing machine that recognizes it.

## Decidability and undecidability

We consider mainly three types of decision problems concerning different computational models (generative grammars may be viewed as a form of machines, like automata, in the following setting):

1. Acceptance problem: $$A_\mathsf{X} = \{ \langle M, w\rangle\ |\ M \text{ is a machine of type } X \text{, and } M \text{ accepts } w \}$$.
2. Emptiness problem: $$E_\mathsf{X} = \{ \langle M \rangle\ |\ M \text{ is a machine of type } X \text{, and } \mathcal{L}(M) = \emptyset \}$$.
3. Equality problem: $$EQ_\mathsf{X} = \{ \langle M_1, M_2 \rangle\ |\ M_1 \text{ and } M_2 \text{ are machines of type } X \text{, and } \mathcal{L}(M_1) = \mathcal{L}(M_2) \}$$.
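For the simplest of these models such problems are genuinely easy; e.g., the emptiness problem for DFAs reduces to plain graph reachability. A minimal Python sketch (my own illustration, not from the notes):

```python
from collections import deque

def dfa_is_empty(alphabet, delta, start, accepting):
    """Decide E_DFA: L(M) = ∅ iff no accepting state is reachable
    from the start state. Plain BFS over the transition graph."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False  # a reachable accepting state: L(M) ≠ ∅
        for a in alphabet:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True  # accepting states unreachable: L(M) = ∅
```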
| Model | Language | Acceptance problem | Emptiness problem | Equality problem |
|-------|----------|--------------------|-------------------|------------------|
| DFA (Deterministic finite automaton) | Regular language: decidable | $$A_\textsf{DFA}$$: decidable | $$E_\textsf{DFA}$$: decidable | $$EQ_\textsf{DFA}$$: decidable |
| NFA (Nondeterministic finite automaton) | Regular language: decidable | $$A_\textsf{NFA}$$: decidable | $$E_\textsf{NFA}$$: decidable | $$EQ_\textsf{NFA}$$: decidable |
| Regular expression | Regular language: decidable | $$A_\textsf{REX}$$: decidable | $$E_\textsf{REX}$$: decidable | $$EQ_\textsf{REX}$$: decidable |
| CFG (Context-free grammar) | Context-free language: decidable | $$A_\textsf{CFG}$$: decidable | $$E_\textsf{CFG}$$: decidable | $$EQ_\textsf{CFG}$$: undecidable |
| PDA (Pushdown automaton) | Context-free language: decidable | $$A_\textsf{PDA}$$: decidable | $$E_\textsf{PDA}$$: decidable | $$EQ_\textsf{PDA}$$: undecidable |
| CSG (Context-sensitive grammar) | Context-sensitive language: decidable | $$A_\textsf{CSG}$$: decidable | $$E_\textsf{CSG}$$: undecidable | $$EQ_\textsf{CSG}$$: undecidable |
| LBA (Linear bounded automaton) | Context-sensitive language: decidable | $$A_\textsf{LBA}$$: decidable | $$E_\textsf{LBA}$$: undecidable | $$EQ_\textsf{LBA}$$: undecidable |
| Turing machine and equivalent Turing-complete models | Turing-recognizable language: may be decidable or undecidable | $$A_\textsf{TM}$$: undecidable | $$E_\textsf{TM}$$: undecidable | $$EQ_\textsf{TM}$$: undecidable (not Turing-recognizable) |

The fact that $$A_\mathsf{TM}$$ is undecidable implies that a Turing machine may not halt on some input $$w$$. Consider the following problem (the halting problem):

$\textit{HALT}_\textsf{TM} = \{ \langle M, w \rangle\ | \ M \text{ is a TM, and } M \text{ halts on input } w \}$

Corollary 6.1. $$\textit{HALT}_\textsf{TM}$$ is undecidable.

Proof. (By contraposition) Assume that there exists a Turing machine $$R$$ that decides $$\textit{HALT}_\textsf{TM}$$. We construct a Turing machine $$S$$ that decides $$A_\mathsf{TM}$$, using $$R$$:

$$S =$$ “On input $$\langle M, w \rangle$$:
1. Run $$R$$ on input $$\langle M, w \rangle$$.
2. If $$R$$ rejects, reject.
3. If $$R$$ accepts (so that we know $$M$$ always halts on $$w$$), simulate $$M$$ on $$w$$ until it halts.
4. If $$M$$ accepts, accept; otherwise, reject.”

If $$R$$ decides $$\textit{HALT}_\textsf{TM}$$, then $$S$$ decides $$A_\mathsf{TM}$$ by construction. Since $$A_\mathsf{TM}$$ is undecidable, such an $$R$$ that decides $$\textit{HALT}_\textsf{TM}$$ does not exist.

Apart from Turing’s original halting problem, other undecidable problems of historical importance include:

- Post correspondence problem (Post 1946)
- Word problem for semigroups (proposed by Axel Thue in 1914; its undecidability was shown by Emil Post and Andrey Markov Jr. independently in 1947)
- The busy beaver game (Rado 1962)
- Hilbert’s tenth problem (proposed by D. Hilbert in 1900; its undecidability was shown by Matiyasevich’s theorem in 1970, and is in no way an obvious result)

Some problems, such as $$EQ_\mathsf{TM}$$, are not even Turing-recognizable. Another example is $$MIN_\mathsf{TM}$$:

$MIN_\mathsf{TM} = \{ \langle M \rangle\ | \ M \text{ is a minimal Turing machine} \}$

Theorem 6.2. $$MIN_\mathsf{TM}$$ is not Turing-recognizable.

Proof. Assume that there exists an enumerator $$E$$ that enumerates $$MIN_\mathsf{TM}$$ (note that an enumerator is equivalent to a Turing machine). We construct a Turing machine $$C$$ using $$E$$:

$$C =$$ “On input $$w$$:

1. Obtain the own description $$\langle C \rangle$$. (by the recursion theorem)
2. Run $$E$$ until a machine $$D$$ appears with a longer description than that of $$C$$.
3. Simulate $$D$$ on $$w$$.”

Since $$MIN_\textsf{TM}$$ is infinitely large but the description of $$C$$ is of finite length, the enumerator $$E$$ must eventually output some machine $$D$$ that has a longer description than that of $$C$$. As $$C$$ simulates $$D$$ in the last step, $$C$$ is equivalent to $$D$$.
But then the description of $$C$$ is shorter than that of $$D$$, thus $$D$$ could not be an output of $$E$$ (which enumerates only $$MIN_\mathsf{TM}$$). That is a contradiction. Therefore, such an enumerator $$E$$ for $$MIN_\mathsf{TM}$$ does not exist; that is, $$MIN_\mathsf{TM}$$ is not Turing-recognizable.

The Turing-unrecognizability of $$MIN_\textsf{TM}$$ implies that the Kolmogorov complexity (descriptive complexity) $$K(x)$$ is not a computable function.

## Beyond Turing machines

Per the Church-Turing thesis, a Turing machine (one finite automaton + one infinite linear tape) defines the “limitation of computation”. Several hypothetical models (sometimes referred to as hypercomputation, as they are assumed to have the ability to solve non-Turing-computable problems) have been proposed for the sake of theoretical interest:

- Oracle machine: a know-all machine that can solve a certain non-Turing-computable problem such as $$A_\textsf{TM}$$ or $$\textit{HALT}_\textsf{TM}$$ (in a black-boxed way).
- Real computer (e.g., the Blum-Shub-Smale machine): an idealized analog computer that can compute infinite-precision real numbers. (Real numbers are uncountable, as shown by Cantor’s diagonal argument, so Turing machines are incapable of handling arbitrary reals.)
- Zeno machine (i.e., supertasking): a Turing machine that can complete infinitely many steps in finite time.
- Infinite-time Turing machine: a Turing machine that is simply allowed to halt in an infinite amount of time.

Note that standard (qubit-based) quantum computers are PSPACE-reducible, thus they are still a Turing-complete model of computation (i.e., quantum computers cannot be real computers or Zeno machines, and they do not break the Church-Turing thesis). Moreover, it has been proposed that the simulation of every (quantum) physical process, where only computable reals are present, is actually a Turing-computable problem (the Church-Turing-Deutsch principle).
## Notes on interesting problems in computability theory

Finite languages. A finite language consists of a finite set of strings, thus it may be written as a regular expression: a finite union of those strings. It may be recognized by a time-independent combinational circuit, which can be viewed as a special form of acyclic finite automaton. (See also the blog post: What I Wish I Knew When Learning Boolean Circuits.) There are several unsolved problems concerning circuit complexity.

Generalized regular expression. A generalized regular expression is defined as $R ::= a\ |\ \varepsilon\ |\ \emptyset\ |\ (R_1 \cup R_2)\ |\ (R_1 \cap R_2)\ |\ (R_1 \circ R_2)\ |\ (\lnot R_1)\ |\ (R_1^*)$ where $$a \in \Sigma$$, and $$R_1$$ and $$R_2$$ are generalized regular expressions. Although it comes with two extra operators (intersection $$\cap$$ and complementation $$\lnot$$), it actually has the same representational power as regular expressions, due to the fact that the class of regular languages is closed under intersection and complementation.

Star-free languages. A star-free language is a regular language that can be described by a generalized regular expression without the Kleene star operation. Clearly, this class includes all finite languages. One example of an infinite star-free language is $$a^*$$, because $$a^* = \lnot((\lnot\emptyset) \circ (\lnot{a}) \circ (\lnot\emptyset))$$. On the other hand, $$(a \circ a)^*$$ is not star-free.

(Unsolved) Generalized star height problem. Can all regular languages be expressed using generalized regular expressions with a limited nesting depth of Kleene stars (star height)? Moreover, is the minimum required star height a computable function? For example, $$((((a \circ a)^*)^*)\cdots)^*$$ can be expressed as $$(a \circ a)^*$$, which has a minimum required star height of 1 (but is not star-free). The general result, and whether the minimum star height is computable, are not yet known.

Linear bounded automata.
LBAs provide an accurate model for real-world computers, which have only bounded memories. Given sufficient memory on an LBA, we can simulate another (smaller) LBA; a practical example would be a virtual machine running on VirtualBox or QEMU.

(Unsolved) LBA problem. Is the class of languages accepted by LBAs equal to the class of languages accepted by deterministic LBAs? Or, in terms of complexity theory, is $$\text{SPACE}(n) = \text{NSPACE}(n)$$? We know that DFAs and NFAs accept the same class of languages, and so do TMs and NTMs, while PDAs and DPDAs are not completely equivalent. However, it is still an open question whether LBA and DLBA are equivalent models. Note that it is known that $$\text{PSPACE} = \bigcup_k \text{SPACE}(n^k)$$ is equivalent to $$\text{NPSPACE} = \bigcup_k \text{NSPACE}(n^k)$$, that is, deterministic polynomially bounded automata and nondeterministic polynomially bounded automata are equivalent, by Savitch’s theorem.

## References and further reading

Books:

M. Sipser, Introduction to the Theory of Computation, 3rd ed.

J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd ed.

Articles:

M. Davis, “What is a computation?” Mathematics Today: Twelve Informal Essays, pp. 241–267, 1978.

D. Zeilberger, “A 2-minute proof of the 2nd-most important theorem of the 2nd millennium,” http://www.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/halt.

Papers:

A. M. Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Proceedings of the London Mathematical Society, vol. 2, no. 1, pp. 230–265, 1937.

E. L. Post, “A variant of a recursively unsolvable problem,” Bulletin of the American Mathematical Society, vol. 52, no. 4, pp. 264–268, 1946.

E. L. Post, “Recursive unsolvability of a problem of Thue,” The Journal of Symbolic Logic, vol. 12, no. 01, pp. 1–11, 1947.

T. Rado, “On non-computable functions,” Bell System Technical Journal, vol. 41, no. 3, pp. 877–884, 1962.

M.
Davis, “The myth of hypercomputation,” in Alan Turing: Life and Legacy of a Great Thinker, Springer, 2004, pp. 195–211.

Recovering from a Corrupted Arch Linux Upgrade tag:www.soimort.org,2017:/notes/170407 2017-04-07T00:00:00+02:00 2017-04-08T00:00:00+02:00 Mort Yao

(Probably unimportant) Backstory: It’s been a while since the last full-system upgrade of my Arch Linux laptop – it sat well with me for a span of months’ uptime. When I finally got the time to run a pacman -Syu (daring any potential issues due to Arch’s infamous instability) earlier today, weird things did happen: something other than pacman was eating up my CPU. I assumed it was Chromium at first, except it wasn’t this time. It turned out to be gnome-settings-daemon, together with some other GUI processes running on GNOME, that were consistently hogging my CPU in turn, adding up to a full 100% (!). As I killed each seemingly problematic process, something else would insist on emerging at the top of top with similarly excessive CPU usage. I decided that this was an unusual situation and, without much further thought, forced a reboot while pacman was still running. Then I ended up with a corrupted system: the kernel didn’t load during the early boot process, complaining about a missing modules.devname and that /dev/sda11 (which is the root partition) could not be found:

    Warning: /lib/modules/4.10.8-1-ARCH/modules.devname not found - ignoring
    starting version 232
    Waiting 10 seconds for device /dev/sda11 ...
    ERROR: device '/dev/sda11' not found. Skipping fsck.
    mount: you must specify the filesystem type
    You are now being dropped into an emergency shell.

Without proper modules loaded, even the keyboard was not working, so I couldn’t do anything under the emergency shell. But I wasn’t completely out of luck – I have a dual boot of Arch and Ubuntu, which is prepared just for recovery purposes like this.
(otherwise I had to make room for another Live USB, which was a little bit inconvenient; I don’t use those rescue-ware for years) Recover an (unbootable) Arch system from another distro: This is mostly relevant if you can’t boot into the system (due to a corrupted kernel upgrade), otherwise it’s possible to just login from TTY (locally or via SSH) and perform the fix (e.g., in the case that it’s just X, display manager or WM that fails to load but the kernel boots well). 1. On the host system (e.g., Ubuntu), create a working chroot environment: (Assume that the root for Arch is mounted to /dev/sda11)  # TARGETDEV="/dev/sda11" # TARGETDIR="/mnt/arch" # mount$TARGETDEV $TARGETDIR # mount -t proc /proc$TARGETDIR/proc
    # mount --rbind /sys $TARGETDIR/sys
    # mount --rbind /dev $TARGETDIR/dev

    # cp /etc/hosts $TARGETDIR/etc
    # cp /etc/resolv.conf $TARGETDIR/etc
    # chroot $TARGETDIR rm /etc/mtab 2> /dev/null
    # chroot $TARGETDIR ln -s /proc/mounts /etc/mtab

    # chroot $TARGETDIR

Alternatively, use the arch-chroot script from an Arch bootstrap image:

    # arch-chroot $TARGETDIR
2. Now that we have a chroot environment with access to the (currently broken) Arch system, continue with the unfinished pacman upgrade and clean up any lock file if needed:
    [/mnt/arch]# pacman -Syu

Or, if there’s any unwanted package upgrade that causes the system to fail, downgrade it.
The important thing to remember is that, when you continue, a previously failed transaction will not run its hooks anymore. Therefore it might be wiser to find out exactly which packages pacman was trying to upgrade in the last transaction (from /var/log/pacman.log) and reinstall all of them, rather than running a complementary pacman -Syu. But if it’s known which package is causing the problem (in this very common case, linux), simply run the following to regenerate the initramfs for the new kernel:

    [/mnt/arch]# mkinitcpio -p linux

And we are done. A few commands and our Arch system is now back and booting.
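To figure out which packages the failed transaction actually touched, the log can be scanned mechanically. A rough Python sketch (the log line format shown matches pacman 5.x; the function name and the sample lines are mine, so adjust the regex if your pacman.log differs):

```python
import re

# Matches pacman 5.x log lines such as:
#   [2017-04-07 16:00] [ALPM] upgraded linux (4.9.18-1 -> 4.10.8-1)
LINE = re.compile(r"^\[([^\]]+)\] \[ALPM\] (installed|upgraded) (\S+) ")

def last_transaction_packages(log_lines):
    """Return the packages touched in the last (possibly failed) transaction.

    We approximate a "transaction" as the final contiguous run of
    ALPM install/upgrade entries in the log.
    """
    packages = []
    run = []
    for line in log_lines:
        m = LINE.match(line)
        if m:
            run.append(m.group(3))
        elif run:
            packages, run = run, []
    return run or packages

log = [
    "[2017-04-07 15:59] [ALPM] running '60-linux.hook'...",
    "[2017-04-07 16:00] [ALPM] upgraded glibc (2.24-2 -> 2.25-1)",
    "[2017-04-07 16:00] [ALPM] upgraded linux (4.9.18-1 -> 4.10.8-1)",
]
print(last_transaction_packages(log))  # → ['glibc', 'linux']
```

Reinstalling everything this prints (from within the chroot) is the conservative fix.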

Why the mess? Initramfs images need to be regenerated (from a corresponding mkinitcpio preset file) after each kernel update. Starting with Arch’s package linux 4.8.8-2, the call to mkinitcpio is specified in a PostTransaction alpm hook, in place of an explicit post_install() command invoked from the .install script as before. (The move towards alpm hooks is said to be a fix for #51818, as two or more packages in one transaction may need the regeneration of initramfs images.)

[Trigger]
Type = File
Operation = Install
Target = boot/vmlinuz-%PKGBASE%
Target = usr/lib/initcpio/*

[Action]
Description = Updating %PKGBASE% initcpios
When = PostTransaction
Exec = /usr/bin/mkinitcpio -p %PKGBASE%

The faulty part about PostTransaction alpm hooks is that they run in a post-transaction manner; in the context of Arch Linux, a “transaction” means a complete run of pacman, regardless of how many packages it installs or upgrades. Such PostTransaction hooks will not run until all preceding operations succeed. Quoted from the CAVEATS section in the alpm-hooks(5) Manual Page:

“PostTransaction hooks will not run if the transaction fails to complete for any reason.”

This (kind of) misbehavior could lead to unintended aftermath in a failed system upgrade, as it does in the case of the linux package. Confusingly, a pacman “transaction” can be an arbitrary set of unrelated package updates; it could fail to complete at any point (e.g., an .install script error, the process being accidentally killed, or even power loss), and ideally, it should remain safe when encountering such failures before commit. However, since the generation of the initramfs in the PostTransaction hook is actually an indispensable step for the newly upgraded linux package (whether other packages complete their upgrades or not), it is not safe if a pacman “transaction” fails halfway: the PostTransaction hook won’t run, and all you get is a halfway-upgraded system that doesn’t boot (so you’ll need to fix that through a fallback or alternative kernel). To see the problem more clearly, consider the following two scenarios:

    pacman -Syu
    sanity check (for packages to upgrade)
    |
    start upgrading package L (requires hook M to run)
    |
    upgrade package A
    |
    upgrade package B
    |
    complete transaction
    run post-transaction hook M
    |
    done.
    [packages L, A, B are upgraded; hook M is run; working system]

    pacman -Syu
    sanity check (for packages to upgrade)
    |
    start upgrading package L (requires hook M to run)
    |
    upgrade package A
    |
    start upgrading package B
    |
    crash!
    [packages L, A are upgraded; hook M is not run; broken system]

How to perform a system upgrade safely? Although it ultimately depends on how we define a “safe system upgrade” and how much of it we want, at least some robustness can be achieved. It’s clear that the upgrade of package L and the run of hook M should be treated as one transaction; treating M in a post-transaction way puts us at the risk that, if some other package fails to upgrade (or the computer simply crashes), we end up with only L upgraded but M not run (while it is desirable to have both or none).

        sanity check (for packages to upgrade)
        |
        start upgrading package L (requires hook M to run)
        |
        run hook M
        |
        complete transaction
        |
        upgrade package A
        |
        crash!
        [packages L, A are upgraded; hook M is run; working system]

This is exactly the way things worked before (using the old-fashioned post_install()): It’s less fragile under possible failures, although in order to fix #51818 (another package L2 also requires hook M to run), we’ll need to manage the ordering of package upgrades properly:

        start upgrading package L (requires hook M to run)
|
start upgrading package L2 (requires hook M to run)
|
run hook M
|
complete transaction
...

That is, L, L2 and M are one transaction. It is not allowed to have only L and L2, or only L, upgraded without a run of hook M. Other unrelated packages (A, B, …) are prevented from causing such issues in this scheme. There is still a possibility that the process crashes during the phase of L, L2 or M, but the chance is much smaller, because we are working with a transaction consisting of the strongly coupled upgrades of L, L2 and M, not a whole pacman “transaction” consisting of easily hundreds of arbitrary package upgrades and post-transaction hooks. Even better, in case of such failures, we can redo any uncommitted transactions so that we don’t have to rerun hooks manually:

        redo uncommitted transactions (upgrades & hooks)
|
...

I don’t know if there’s any software management tool implementing this fully recoverable approach. Transactions need to be more fine-grained (pacman’s “everything is one transaction” is really a naïve semantics to have) and the ordering of upgrades is essential, in order to give the best robustness against possible crashes during a system upgrade, so it might not be as easily implementable as most package managers are (although still far less sophisticated than a real-world database). Or – if we cannot guarantee anything for a transaction with reasonable effort – maybe a Windows-inspired “user-friendly” warning message like “Don’t turn off your computer” is good enough. (At least you’ll know when you’re breaking shit: a transaction is not completed yet, some necessary hooks have not run, and you’re on your own then.)
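The difference between the two hook semantics can be made concrete in a toy model. This is a sketch of the semantics only, not of pacman’s actual implementation; all names (`upgrade`, `crash_after`, the hook map) are made up:

```python
def upgrade(packages, hooks, crash_after=None, post_transaction=True):
    """Simulate one run of a package manager.

    `hooks` maps a package to the name of the hook it requires.
    With post_transaction=True (alpm's PostTransaction behavior), hooks
    run only after every package upgrades; with False, each hook runs
    right after its own package (the old post_install() behavior).
    """
    upgraded, hooks_run = [], []
    for pkg in packages:
        if pkg == crash_after:        # simulate a crash mid-transaction
            return upgraded, hooks_run
        upgraded.append(pkg)
        if not post_transaction and pkg in hooks:
            hooks_run.append(hooks[pkg])
    if post_transaction:
        hooks_run = [hooks[p] for p in upgraded if p in hooks]
    return upgraded, hooks_run

hooks = {"linux": "mkinitcpio"}
# Crash while upgrading B: with a post-transaction hook, linux is
# upgraded but mkinitcpio never runs -> unbootable system.
print(upgrade(["linux", "A", "B"], hooks, crash_after="B"))
# → (['linux', 'A'], [])
print(upgrade(["linux", "A", "B"], hooks, crash_after="B",
              post_transaction=False))
# → (['linux', 'A'], ['mkinitcpio'])
```

The second call models the pre-4.8.8-2 behavior: the same crash leaves the system upgraded halfway, but still bootable.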

## References

 ArchWiki, “Install from existing Linux”. https://wiki.archlinux.org/index.php/Install_from_existing_Linux

 ArchWiki, “Change root”. https://wiki.archlinux.org/index.php/change_root

 ArchWiki, “mkinitcpio”. https://wiki.archlinux.org/index.php/mkinitcpio

 alpm-hooks(5) Manual Page. https://www.archlinux.org/pacman/alpm-hooks.5.html

A Pandoc Filter for Typesetting Operational Semantics tag:www.soimort.org,2017:/notes/170323 2017-03-23T00:00:00+01:00 2017-03-24T00:00:00+01:00 Mort Yao

Recently I decided to make my life a little easier by simplifying the laborious, tedious math typing that I have to do from time to time. For me this is most relevant when typesetting the formal semantics of programming languages, where proof trees are interwoven with a variety of programming constructs. Here’s my initial intention: to typeset something like (in big-step semantics) $\frac{ \stackrel{\mathcal{E}_0} {\langle b, \sigma \rangle \downarrow \textbf{true}} \qquad \stackrel{\mathcal{E}_1} {\langle c_0, \sigma \rangle \downarrow \sigma''} \qquad \stackrel{\mathcal{E}_2} {\langle \textbf{while } b \textbf{ do } c_0, \sigma'' \rangle \downarrow \sigma'} }{ \langle \textbf{while } b \textbf{ do } c_0, \sigma \rangle \downarrow \sigma'}$ instead of the bloated TeX syntax:

  \frac{
\stackrel{\mathcal{E}_0}
{\langle b, \sigma \rangle \downarrow \textbf{true}} \qquad
\stackrel{\mathcal{E}_1}
{\langle c_0, \sigma \rangle \downarrow \sigma''} \qquad
\stackrel{\mathcal{E}_2}
{\langle \textbf{while } b \textbf{ do } c_0, \sigma'' \rangle \downarrow \sigma'}
}{
\langle \textbf{while } b \textbf{ do } c_0, \sigma \rangle \downarrow \sigma'}

I could just write it more neatly, intuitively, without all those hexes in magic backslashes and braces:

  <b, sig> ! true                 -- E_0
<c_0, sig> ! sig''              -- E_1
<while b do c_0, sig''> ! sig'  -- E_2
----
<while b do c_0, sig> ! sig'

But, of course, preserving the nice-looking outcome (from either TeX or MathJax).

Besides natural derivations, there are also a lot of widely agreed conventions for typesetting general mathematics: parentheses (brackets, braces) are almost always paired; “\left” and “\right” are very often desired, since $$\left[\frac{1}{2}\right]$$ is a bit less awkward than $$[\frac{1}{2}]$$; when referring to a function name, “\operatorname” looks aesthetically better; etc. Furthermore, if we write down some math notation in plain text, clearly, “=>” has to be a “$$\Rightarrow$$” and “|->” has to be a “$$\mapsto$$”; “|-” means “$$\vdash$$” and “|=” means “$$\vDash$$”; a standalone letter “N” is just $$\mathbb{N}$$; etc. Taking such matters into consideration, there could be some handy alternative syntax (I call it “lazybones’ syntax”) for typesetting formulas, at least for a specific field someone specializes in, where these conventions are consistent:

• Provability $$\Gamma \vdash \varphi$$
  • TeX: `\Gamma \vdash \varphi`
  • Lazybones’: `Gam |- phi`
• Validity $$\Gamma \vDash \varphi$$
  • TeX: `\Gamma \vDash \varphi`
  • Lazybones’: `Gam |= phi`
• Big-step semantics $$\langle b_0 \land b_1, \sigma \rangle \downarrow t$$
  • TeX: `\langle b_0 \land b_1, \sigma \rangle \downarrow t`
  • Lazybones’: `<b_0 && b_1, sig> ! t`
• Small-step semantics $$\sigma \vdash b_0 \land b_1 \to^* t$$
  • TeX: `\sigma \vdash b_0 \land b_1 \to^* t`
  • Lazybones’: `sig |- b_0 && b_1 ->* t`
• Hoare logic $$\vdash \{A\} \textbf{if } b \textbf{ then } c_0 \textbf{ else } c_1 \{B\}$$
  • TeX: `\vdash \{A\} \textbf{if } b \textbf{ then } c_0 \textbf{ else } c_1 \{B\}`
  • Lazybones’: `|- <[A]> if b then c_0 else c_1 <[B]>`
• Denotational semantics $$\mathcal{C}[\![X:=a]\!] = \lambda \sigma . \eta (\sigma[X \mapsto \mathcal{A}[\![a]\!] \sigma])$$
  • TeX: `\mathcal{C}[\![X:=a]\!] = \lambda \sigma . \eta (\sigma[X \mapsto \mathcal{A}[\![a]\!] \sigma])`
  • Lazybones’: `C[[X:=a]] = lam sig . eta (sig[X |-> A[[a]] sig])`

For simplicity, we require that all such transformations are regular, i.e., “lazybones’ syntax” may be translated into plain TeX code using merely substitutions of regular expressions.

As a basic example, let’s consider angle brackets, for which we desire to type in simply “<” and “>” instead of “\langle” and “\rangle”. To avoid any ambiguity (with normal less-than or greater-than sign), it is required that such brackets have no internal spacing, that is, we write “<x, y>” rather than “< x, y >”, but “1 < 2” instead of “1<2”. Once we fix the transformation rules, implementation would be quite straightforward in Haskell with Text.Regex, if we want a pandoc filter to handle these substitutions for everything embedded in TeX math mode:

    subLAngle s = subRegex (mkRegex "<([^[:blank:]])") s $ "\\langle \\1"
    subRAngle s = subRegex (mkRegex "([^[:blank:]])>") s $ "\\1\\rangle "

    semType m@(Math mathType s) = do
      let s' = subLAngle $ subRAngle s
      return $ Math mathType s'
    semType x = return x
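Since pandoc filters can also be written in Python, the two substitutions translate directly to plain `re` calls. A sketch of just the regex logic (the function names are mine, and `\S` is used as an approximation of POSIX `[^[:blank:]]`; wiring it into an actual filter is left out):

```python
import re

def sub_langle(s):
    # "<" immediately followed by a non-blank character becomes "\langle "
    return re.sub(r"<(\S)", r"\\langle \1", s)

def sub_rangle(s):
    # a non-blank character immediately followed by ">" becomes "\rangle "
    return re.sub(r"(\S)>", r"\1\\rangle ", s)

print(sub_rangle(sub_langle("<x, y>")))  # → \langle x, y\rangle (trailing space)
print(sub_rangle(sub_langle("1 < 2")))   # → 1 < 2 (unchanged: spaced "<" is left alone)
```

The second call shows why the no-internal-spacing convention matters: a spaced comparison operator never matches either pattern.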

So, why bother implementing a pandoc filter rather than just defining some TeX macros? Two main reasons:

1. TeX code is nontrivial to write. And you can’t get rid of the overwhelming backslashes quite easily. It would be very hard, if not impossible, to achieve the same thing you’d do with a pandoc filter. (I would say that TeX is rather fine for typesetting, but only for typesetting; actual programming/manipulating macros in TeX seems very awkward for me.)
2. TeX macros are just… code in TeX, which means that they’re not designated to work for non-DVI originated publishing formats, such as HTML with MathJax. So if you do a lot of heavy lifting with TeX macros, they won’t play well for display on the Web (unless you’re willing to convert math formulas into pictures).

Surely you can write pandoc filters in any language you like (officially supported: Haskell and Python). Nothing like plain TeX (despite the fact that it’s Turing-complete): filters are easier to implement, unrestricted in the way they interact with structured documents, and come with a great deal of cool supporting libraries – even one to blow up the earth. Although here, in this case, we are only using some no-brainer pattern matching to release our fingers from heavy math typing.

A proof-of-concept of this is in semtype.hs. While not formally documented, all the tricks it does should be clear from the comments and the regular expressions themselves.

An Intuitive Exposition of<br /> “Proof by Contradiction vs. Proof of Negation” tag:www.soimort.org,2017:/notes/170306 2017-03-06T00:00:00+01:00 2017-03-06T00:00:00+01:00 Mort Yao

(Inspired by A “proof by contradiction” is not a proof that ends with a contradiction by Robert Harper.)

There seem to be some common misconceptions about “proof by contradiction”. Recently I came across Robert’s blog post on this issue, and couldn’t help reviewing it in my own way. I prefer a more formal treatment here, nevertheless without sacrificing intuitive comprehensibility.

What is a “proof by contradiction”, exactly? When a classical mathematician (probably you) talks about proving by contradiction, they can mean either one of these two (syntactically) different things:

1. Prove $$P$$: Assume $$\lnot P$$, we derive a contradiction ($$\bot$$). Therefore, we have $$P$$.
2. Prove $$\lnot P$$: Assume $$P$$, we derive a contradiction ($$\bot$$). Therefore, we have $$\lnot P$$.

Both syntactic forms have some kind of reductio ad absurdum (reduction to / derivation of contradiction) argument. However, the first form provides an indirect proof for $$P$$; this is what we call a genuine “proof by contradiction”. The second form provides a direct proof for $$\lnot P$$ which is just the negation of $$P$$; this is preferably called a “proof of negation”, as it’s not a proof of $$P$$ itself, but rather a proof of the negation of $$P$$.

But how are they any different? You may ask. In a classical setting, there is no semantic difference. Notice that in the first form (“proof by contradiction”), we could rewrite $$P$$ as $$\lnot Q$$, then we get

3. Prove $$\lnot Q$$: Assume $$Q$$, we derive a contradiction ($$\bot$$). Therefore, we have $$\lnot Q$$.

which is just the second form, i.e., a “proof of negation”. Likewise, if we rewrite $$P$$ as $$\lnot Q$$ in the second form, we get

4. Prove $$Q$$: Assume $$\lnot Q$$, we derive a contradiction ($$\bot$$). Therefore, we have $$Q$$.

which is just the first form, i.e., a “proof by contradiction”. That’s the very reason people often misconceive them – classically, a proof by contradiction and a proof of negation can be simply converted to the form of one another, without losing their semantic validity.

From a constructivist perspective, things are quite different. In the above rewritten forms, we introduced a new term $$\lnot Q := P$$. For the rewrite in the 3rd form to be valid, the new assumption $$Q$$ must be as strong as the original assumption $$\lnot P$$, which is just $$\lnot \lnot Q$$. Formally, there must be $$\lnot \lnot Q \implies Q$$. For the rewrite in the 4th form to be valid, the new statement $$Q$$ must be not any stronger than the original statement $$\lnot P$$, so formally there must also be $$Q \implies \lnot \lnot Q$$. In intuitionistic logic, although we can derive a “double negation introduction” (thus complete the rewrite in the 4th form), there is no way to derive a “double negation elimination” as required to get the 3rd form. So technically, while we can soundly rewrite from a “proof of negation” to a “proof by contradiction”, the other direction is impossible. Indeed, we must make a clear distinction between a “proof by contradiction” and a “proof of negation” here: Semantically, they are not even dual and should not be fused with each other.

Why is this distinction important? Because in intuitionistic logic, the second form (proof of negation) is a valid proof; the first form (proof by contradiction) is not. Take a look at the negation introduction rule: $\lnot_\mathsf{I} : \frac{\Gamma, P \vdash \bot}{\Gamma \vdash \lnot P}$ which justifies the validity of “proof of negation”. However, there is no such rule saying that $\mathsf{PBC} : \frac{\Gamma, \lnot P \vdash \bot}{\Gamma \vdash P}$ In classical logic, where a rule like $$\mathsf{PBC}$$ is allowed, one can easily derive the double negation elimination which we begged for before: Given $$\Gamma \vdash \lnot \lnot P$$, the only rule that ever introduces a negation is $$\lnot_\mathsf{I}$$, so we must also have $$\Gamma, \lnot P \vdash \bot$$. Then by $$\mathsf{PBC}$$, we get a derivation of $$\Gamma \vdash P$$, as desired. $\mathsf{DNE} : \frac{\Gamma \vdash \lnot \lnot P}{\Gamma \vdash P}$ If we adopt $$\mathsf{PBC}$$, then we will also have adopted $$\mathsf{DNE}$$; if we have $$\mathsf{DNE}$$, then it would be perfectly valid to rewrite a “proof by contradiction” into the form of a “proof of negation”, or the other way around, as is already shown before. Since constructivists do not want to accept rules like $$\mathsf{PBC}$$ or $$\mathsf{DNE}$$ at all, they claim that a “proof by contradiction” and a “proof of negation” are essentially different, in that the latter is a valid proof but the former is doubtful, while their distinction is purely syntactical for classical mathematicians as the semantic equivalence would be trivial with $$\mathsf{PBC}$$ or $$\mathsf{DNE}$$.
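The asymmetry can be made concrete in a proof assistant. A sketch in Lean 4 (using only core Lean; the point is that the second example, unlike the first, cannot be completed without the classical axioms behind `Classical.byContradiction`):

```lean
-- Proof of negation: intuitionistically valid.
-- ¬P is *defined* as P → False, so the assumption itself is the proof.
example (P : Prop) (h : P → False) : ¬P := h

-- Proof by contradiction: from ¬P → False (i.e., ¬¬P) conclude P.
-- This genuinely requires classical logic (double negation elimination).
example (P : Prop) (h : ¬P → False) : P :=
  Classical.byContradiction h
```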

The rationale behind constructivists’ choice of ruling out indirect proofs by rejecting derivation rules like $$\mathsf{PBC}$$ and $$\mathsf{DNE}$$ comes into view when talking about first-order theories, where existential quantifiers are used. Say, if we wish to prove that there exists some $$x$$ with a property $$P$$, we must specify an example $$t$$ which makes this property hold: $\exists_{\mathsf{I}_t} : \frac{\Gamma \vdash P[t/x]}{\Gamma \vdash \exists x : S.P}$

($$t$$ is a term of sort $$S$$)

This is something called a constructive proof, in the sense that in order to derive an existentialization, one must construct such a term explicitly, directly.

What if we allow $$\mathsf{PBC}$$ in our proofs, then? We are given enough power to take an alternate approach: Assume to the contrary that for all $$x$$ in $$S$$, $$\lnot P$$ holds. Then we derive a contradiction. Thus, by $$\mathsf{PBC}$$, there must exist some $$x$$ such that $$P$$ holds (since $$\lnot (\exists x : S.P) \equiv \forall x : S.\lnot P$$). Formally, $\frac{\Gamma, \forall x : S.\lnot P \vdash \bot}{\Gamma \vdash \exists x : S.P}$ Note that a term $$t$$ is nowhere to be found in this form of proof. The downside of this classical approach to existence proofs is that it is non-constructive: even if you can derive a proof that some mathematical object exists, you can’t claim that you necessarily know what it is, since you have not concretely constructed such an object. Its existence is merely provable “in principle”, but not necessarily constructible or witnessable.

I would say it’s very much a philosophical choice between classical logic and intuitionistic logic – at least for old-school mathematicians who don’t mechanize their proofs at all. But one with some logical maturity should be able to draw a semantic distinction between a “proof by contradiction” that $$\vdash P$$ and a “proof of negation” that $$\vdash \lnot P$$, being wary of how their treatments can diverge in non-classical logic settings. It is still questionable to me whether every theorem provable in classical logic can be proved constructively; in any case, a constructive proof almost always makes more sense: if you claim that you have an apple, just show me the apple, instead of arguing to me sophistically that it can’t be the case that you do not have an apple.

## References

 Andrzej Filinski, “Course Notes for Semantics and Types, Lecture 1: Logic”.

 Terence Tao, “The ‘no self-defeating object’ argument”. https://terrytao.wordpress.com/2009/11/05/the-no-self-defeating-object-argument/

What I Wish I Knew When Learning Boolean Circuits tag:www.soimort.org,2017:/posts/170306 2017-03-06T00:00:00+01:00 2017-03-06T00:00:00+01:00 Mort Yao

The Fundamental Justification tag:www.soimort.org,2017:/mst/5 2017-02-06T00:00:00+01:00 2017-02-06T00:00:00+01:00 Mort Yao

I don’t have much to write about this week. My plan was to finish the sections on statistics and machine learning in my wiki before moving on to more rigorous mathematics and logic, but that turned out to be an impossible task (shortly after the exam week, a new and incredibly tense semester is right under my nose). I wanted to write more about cryptography and information theory as a brush-up on previous courses I’ve taken, and that’s infeasible too (although the very introductory parts are done in #3 and #4).

Still, I have thought of many fun things I could do with statistical learning: image classification and face recognition (I always want a photo management application to be like that! I take and collect so many photos each day), a CBIR tool (like “Search by Image” in Google Images, but works locally on my computer), a GIMP Script-Fu that does tricks like image synthesis via ConvNets, etc. The good part about applying learning methods in image processing is that, once you can extract feature descriptors, everything becomes statistically learnable (and you as a programmer don’t really need to have prior knowledge about what an object visually is: an apple, or a pen?).

Cryptography, from what I learned, is a different kind of fun story. For a perfectly secure encryption scheme: $\Pr[M=m\,|\,C=c] = \Pr[M=m]$ that is, knowing the ciphertext $$c$$ will not update an attacker’s belief about what the underlying message $$m$$ is. Even if you have a statistical learning model, it cannot learn anything from observations of ciphertexts alone. But this unbreakable level of security comes at a price: the key space must have entropy at least as high as that of the message space, thus the key length can be no shorter than the message length (given by Shannon’s theorem). In practice, the messages we send do not usually have the highest entropy possible, and we can safely assume that the attackers’ computational ability is bounded by polynomial-time algorithms; thus, we as cryptosystem designers need only devise schemes that are assumed to be unbreakable (i.e., breakable with only a negligible probability) by any polynomial-time attacker. As we don’t know yet whether there actually are any problems unsolvable in polynomial time (e.g., is P ≠ NP?), the proof of security eventually relies on some unproven computational hardness assumptions: one-way functions exist, integer factorization is hard, discrete logarithm is hard, etc. If one can construct a provably secure scheme, it is guaranteed that statistical cryptanalysis is theoretically impossible within polynomial time (except for side-channel attacks); of course, if the hardness assumption we made turns out to be invalid, then nothing but the one-time pad can be secure.
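The perfect secrecy equation can be checked by brute force on a tiny one-time pad. A toy sketch over 2-bit messages and uniform 2-bit keys (all names are mine; exact arithmetic via `Fraction` so the check is not subject to rounding):

```python
from collections import Counter
from fractions import Fraction

BITS = 2
SPACE = range(1 << BITS)  # messages, keys and ciphertexts: 2-bit values

def otp_posterior(prior):
    """Return Pr[M=m | C=c] for every (m, c), with a uniform key."""
    joint = Counter()
    for m in SPACE:
        for k in SPACE:  # key is uniform: weight prior[m] * 1/|SPACE|
            joint[(m, m ^ k)] += prior[m] * Fraction(1, len(SPACE))
    pc = Counter()
    for (m, c), p in joint.items():
        pc[c] += p
    return {(m, c): joint[(m, c)] / pc[c] for m in SPACE for c in SPACE}

# Any (even heavily skewed) prior distribution on messages:
prior = {0: Fraction(1, 2), 1: Fraction(1, 4),
         2: Fraction(1, 8), 3: Fraction(1, 8)}
post = otp_posterior(prior)
# Perfect secrecy: the posterior equals the prior for every ciphertext.
assert all(post[(m, c)] == prior[m] for m in SPACE for c in SPACE)
```

Whatever the prior, every ciphertext is seen with probability 1/4 and reveals nothing; shrink the key space below the message space and the assertion fails.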

I might be writing one or two blog posts about cryptographic security from an information-theoretic perspective and some basic cryptanalysis of insecure schemes, but now is the time to move on with my unfinished courses on logic and programming. Before that, I feel that I should add a little bit of philosophy to my wiki so as to refresh my viewpoint and methodology. And here it is.

(Philosophy is a sophisticated, highly arguable subject, so pardon me if there’s any inconsistency with your textbook.)

• Metaphysics is about the fundamental nature of being and the world. Ontology, as a branch of metaphysics, studies the basic categories of being of entities and their relations.
• Epistemology is the study of knowledge.
• Where does knowledge come from?
• Empiricism claims that knowledge comes from empirical evidence.
• Rationalism claims that knowledge requires justification by reasoning.
• Skepticism rejects the certainty in knowledge and claims that it is impossible to have an adequate justification of knowledge.
• How to resolve the regress problem that the justification of knowledge is questioned ad infinitum?
• In foundationalism, a statement is inferred from a basis of premises that are accepted as sound without proof.
• This approach leads to the (axiomatic) foundations of mathematics and constructivism.
• The fundamentals of modern mathematics are the axiomatic set theory (which has several variations with different sets of axioms).
• The fundamentals of modern probability theory are the Kolmogorov axioms.
• The computational hardness assumptions in cryptography may also be seen as a practice of foundationalism.
• In coherentism, a statement holds true as long as it coheres with all other justified beliefs, including itself.
• In infinitism, a justification chain is allowed to be infinite, thus there could never be adequate justification for any statement in the chain (which is the point that it is often criticized for).
• So, what is knowledge, actually?
• Knowledge is justified true belief. (though questioned by the Gettier problem)
• How do we categorize our knowledge?
• A priori knowledge is independent of empirical evidence. Examples: knowledge deduced by logical deductions or mathematical proofs.
• A posteriori knowledge is dependent on empirical evidence. Examples: knowledge induced by statistical inference or learning (either human or machine learning).
• Science is the approach to filter out unreliable knowledge and gather together reliable knowledge as a theory.
• What qualifies as a science?
• See the demarcation problem.
• Scientific discovery is made of hypotheses. A hypothesis is a proposed explanation or prediction method for a phenomenon.
• Only formally proved or statistically tested hypotheses qualify as reliable knowledge.
• Epicurus’ principle of multiple explanations: If multiple hypotheses are consistent with the observations, one should retain them all.
• Occam’s razor: Among all hypotheses consistent with the observations, choose the simplest.
• Epicurus’ principle of multiple explanations and Occam’s razor find their uses in the learning theory, e.g., Occam’s razor bound.
• Logic is the study of reasoning and argument, which plays an essential part in gaining knowledge.
• How to reason / argue by inference?
• Deduction is to infer a conclusion by deriving a logical consequence (“a priori”) from some premises using rules of inference in a formal system. Examples: proofs, a priori probabilities.
• Induction is to infer a probable conclusion (“a posteriori”) from the generalization of some observations. Examples: statistical inference and learning, inductive programming.
• Abduction is to infer a probable explanation from some observations. Examples: deep learning.

We can talk about the philosophy of science (particularly, philosophy of mathematics and statistics) with the understanding of epistemology and logic: Are you a logician or a statistician? If a logician, does set theory or type theory suit you the best? If a statistician, are you a Bayesian or a frequentist?

(As a personally opinionated note, I often find myself subscribing to skepticism the most. But that doesn’t mean logical reasoning and statistical inference aren’t useful to me; they are. Extremely. So as not to go too far with this subjective topic, I’ll be focusing more on classical / modal logic in the next few weeks.)

Papers:

E. L. Gettier, “Is justified true belief knowledge?” Analysis, vol. 23, no. 6, pp. 121–123, 1963.

The Measurable Entropy tag:www.soimort.org,2017:/mst/4 2017-01-11T00:00:00+01:00 2017-01-14T00:00:00+01:00 Mort Yao

A brief introduction to basic information theory (entropy/information as a measure of the theoretical unpredictability of data) and descriptive statistics (quantitative properties of real-world data, including central tendency, dispersion and shape). The maximization of entropy under different constraints yields some common probability distributions: the uniform distribution (given no prior knowledge); the normal distribution (given that mean and variance are known).

• What is the measure for information?
• Intuitively, if a sample appears to be more naturally “random”, then it may contain more “information” of interest since it takes a greater size of data (more bits) to describe. But how to measure this quantitatively?
• Probability-theoretic view: Shannon entropy.
• Algorithmic view: Kolmogorov complexity. (TBD in February or March 2017)
• Basic information theory
• Shannon entropy
• For discrete random variable $$X$$ with pmf $$p(x)$$: $\operatorname{H}(X) = -\sum_{x\in\mathcal{X}} p(x) \log p(x).$
• For continuous random variable $$X$$ with pdf $$f(x)$$: $\operatorname{H}(X) = -\int_\mathcal{X} f(x) \log f(x) dx.$ (also referred to as differential entropy)
• The notion of entropy is an extension of the one in statistical thermodynamics (Gibbs entropy) and the measure-theoretic entropy of dynamical systems.
• Obviously, the entropy is determined by the pmf/pdf, which depends on the parameters of the specific probability distribution.
• In the context of Computer Science, the logarithm in the formula is often taken to the base $$2$$. Assume that we take a uniform binary string of length $$\ell$$, then $p(x) = 2^{-\ell}$ Thus, the entropy of the distribution is $\operatorname{H}(X) = -\sum_{x\in\mathcal{X}} p(x) \log p(x) = - (2^\ell \cdot 2^{-\ell} \log_2 2^{-\ell}) = \ell$ which is just the length of this ($$\ell$$-bit) binary string. Therefore, the unit of information (when applying binary logarithm) is often called a bit (also shannon).
• For the joint entropy $$\operatorname{H}(X,Y)$$ and the conditional entropy $$\operatorname{H}(X\,|\,Y)$$, the following equation holds: $\operatorname{H}(X,Y) = \operatorname{H}(X\,|\,Y) + \operatorname{H}(Y) = \operatorname{H}(Y\,|\,X) + \operatorname{H}(X)$ Notice that if $$\operatorname{H}(X\,|\,Y) = \operatorname{H}(X)$$, then $$\operatorname{H}(X,Y) = \operatorname{H}(X) + \operatorname{H}(Y)$$ and $$\operatorname{H}(Y\,|\,X) = \operatorname{H}(Y)$$. $$X$$ and $$Y$$ are said to be independent of each other in this case.
• Mutual information $$\operatorname{I}(X;Y) = \operatorname{H}(X) + \operatorname{H}(Y) - \operatorname{H}(X,Y) \geq 0$$ (equality holds iff $$X$$ and $$Y$$ are independent). Unlike the joint entropy or the conditional entropy, this notion does not reflect an actual probabilistic event thus it is referred to as information (sometimes correlation) rather than entropy.
• Kullback-Leibler divergence (relative entropy) $\operatorname{D}_\mathrm{KL}(p\|q) = \sum_{x\in\mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$ KL divergence is a measurement of the distance of two probability distributions $$p(x)$$ and $$q(x)$$.
• If $$p = q$$, $$\operatorname{D}_\mathrm{KL}(p\|q) = 0$$. (Any distribution has a KL divergence of 0 with itself.)
• $$\operatorname{I}(X;Y) = \operatorname{D}_\mathrm{KL}(p(x,y)\|p(x)p(y))$$.
• Cross entropy $\operatorname{H}(p,q) = \operatorname{H}(p) + \operatorname{D}_\mathrm{KL}(p\|q)$ Notice that cross entropy is defined on two distributions $$p$$ and $$q$$, rather than on two random variables under one distribution $$p$$ (as the joint entropy $$\operatorname{H}(X,Y)$$ is).
• Basic probability theory
• Normal (Gaussian) distribution.
• (Univariate) $$X \sim \mathcal{N}(\mu,\sigma^2)$$ $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ where $$\mu$$ is the mean, $$\sigma^2$$ is the variance of the distribution.
(Multivariate) $$\boldsymbol x \sim \mathcal{N}_k(\boldsymbol\mu,\mathbf\Sigma)$$ $f(\boldsymbol x) = (2\pi)^{-k/2} |\mathbf\Sigma|^{-1/2} e^{-\frac{1}{2} (\boldsymbol x - \boldsymbol\mu)^\mathrm{T} \mathbf\Sigma^{-1}(\boldsymbol x - \boldsymbol\mu)}$ where $$\boldsymbol\mu$$ is the mean vector, $$\mathbf\Sigma$$ is the covariance matrix of the distribution.
• Maximum entropy: normal distribution is the probability distribution that maximizes the entropy when the mean $$\mu$$ and the variance $$\sigma^2$$ are fixed.
• The normal distribution does not have any shape parameter. Moreover, its skewness and excess kurtosis are always 0.
• Basic descriptive statistics
• Descriptive statistics describe the properties of data sets quantitatively, without making any further inference.
• Population vs. sample.
• Three major descriptive statistics:
1. Central tendency: sample means (arithmetic mean $$\bar{x}$$, geometric, harmonic), median, mode, mid-range.
• Arithmetic mean is an unbiased estimator of the population mean (expectation).
• Median and mode are most robust in the presence of outliers.
2. Dispersion (or variability): minimum, maximum, range, IQR (interquartile range), maximum absolute deviation, MAD (mean absolute deviation), sample variance $$s^2$$ with Bessel’s correction, CV (coefficient of variation), VMR (index of dispersion).
• IQR and MAD are robust in the presence of outliers.
• Sample variance (with Bessel’s correction) is an unbiased estimator of the population variance.
• CV and VMR are sample standard deviation and sample variance normalized by the mean respectively, thus they are sometimes called relative standard deviation and relative variance; they are not unbiased though.
3. Shape: sample skewness, sample excess kurtosis.
• These statistics show how a sample deviates from normality, since the skewness and the excess kurtosis of a normal distribution are 0. The estimators could vary under different circumstances.

## Entropy as a measure1

For random variables $$X$$ and $$Y$$, we define sets $$\tilde X$$ and $$\tilde Y$$. Then the information entropy $$\operatorname{H}$$ may be viewed as a signed measure $$\mu$$ over sets: \begin{align*} \operatorname{H}(X) &= \mu(\tilde X) \\ \operatorname{H}(Y) &= \mu(\tilde Y) \\ \operatorname{H}(X,Y) &= \mu(\tilde X \cup \tilde Y) \qquad \text{(Joint entropy is the measure of a set union)} \\ \operatorname{H}(X\,|\,Y) &= \mu(\tilde X \setminus \tilde Y) \qquad \text{(Conditional entropy is the measure of a set difference)} \\ \operatorname{I}(X;Y) &= \mu(\tilde X \cap \tilde Y) \qquad \text{(Mutual information is the measure of a set intersection)} \end{align*} The inclusion–exclusion principle: \begin{align*} \operatorname{H}(X,Y) &= \operatorname{H}(X) + \operatorname{H}(Y) - \operatorname{I}(X;Y) \\ \mu(\tilde X \cup \tilde Y) &= \mu(\tilde X) + \mu(\tilde Y) - \mu(\tilde X \cap \tilde Y) \end{align*} Bayes’ theorem: \begin{align*} \operatorname{H}(X\,|\,Y) &= \operatorname{H}(Y\,|\,X) + \operatorname{H}(X) - \operatorname{H}(Y) \\ \mu(\tilde X \setminus \tilde Y) &= \mu(\tilde Y \setminus \tilde X) + \mu(\tilde X) - \mu(\tilde Y) \end{align*}
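These set-theoretic identities are easy to check numerically. Below is a minimal Python sketch (using a made-up joint pmf; any joint distribution works) that verifies the chain rule and the inclusion-exclusion principle in bits:

```python
from math import log2

# A small joint pmf for X, Y ∈ {0, 1} (made-up numbers for illustration).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

p_x = {x: p_xy[(x, 0)] + p_xy[(x, 1)] for x in (0, 1)}   # marginal of X
p_y = {y: p_xy[(0, y)] + p_xy[(1, y)] for y in (0, 1)}   # marginal of Y

H_x, H_y, H_xy = H(p_x), H(p_y), H(p_xy)
# H(X|Y) computed directly as −Σ p(x,y)·log p(x|y), not via the chain rule
H_x_given_y = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items())
I_xy = H_x + H_y - H_xy      # μ(X̃ ∩ Ỹ) by inclusion–exclusion

assert abs(H_xy - (H_x_given_y + H_y)) < 1e-12   # chain rule: set difference
assert I_xy >= 0                                  # mutual information ≥ 0
```

Computing $$\operatorname{H}(X\,|\,Y)$$ directly from its definition, rather than via the chain rule, makes the first assertion a genuine numerical check rather than an algebraic tautology.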

## Entropy and data coding

Absolute entropy (Shannon entropy) quantifies how much information is contained in some data. For data compression, the entropy gives the minimum size that is needed to reconstruct the original data (losslessly). Assume that we want to store a random binary string of length $$\ell$$ (by “random”, we mean we do not yet have any prior knowledge of the data to be stored). Under the principle of maximum entropy, the entropy of its distribution $$p(x)$$ should be maximized: $\max \operatorname{H}(X) = \max \left\{ -\sum_{x\in\mathcal{X}} p(x) \log p(x) \right\}$ given the only constraint $\sum_{x\in\mathcal{X}} p(x) = 1$ Let $$\lambda$$ be the Lagrange multiplier, and set $\mathcal{L} = - \sum_{x\in\mathcal{X}} p(x) \log p(x) - \lambda\left( \sum_{x\in\mathcal{X}} p(x) - 1 \right)$ We get \begin{align*} \frac{\partial\mathcal{L}}{\partial p(x)} = -(\log p(x) + 1) - \lambda &= 0 \\ \log p(x) &= - \lambda - 1 \\ p(x) &= c \qquad \text{(constant)} \end{align*} That is, the discrete uniform distribution maximizes the entropy for a random string. Since $$|\mathcal{X}| = 2^\ell$$, we have $$p(x) = 2^{-\ell}$$ and $$\operatorname{H}(X) = -\sum_{x\in\mathcal{X}} 2^{-\ell} \log_2 2^{-\ell} = \ell$$ (bits). We conclude that the information that can be represented in an $$\ell$$-bit string is at most $$\ell$$ bits. Some practical results include:

• In general, pseudorandom data (assume no prior knowledge) cannot be losslessly compressed, e.g., the uniform key used in one-time pad must have $$\log_2 |\mathcal{M}|$$ bits (lower bound) so as not to compromise the perfect secrecy. (Further topic: Shannon’s source coding theorem)
• Fully correct encoding/decoding of data, e.g., $$\mathsf{Enc}(m)$$ and $$\mathsf{Dec}(c)$$ algorithms in a private-key encryption scheme, must ensure that the probability distributions of $$m \in \mathcal{M}$$ and $$c \in \mathcal{C}$$ have the same entropy.
• An algorithm with finite input cannot generate randomness infinitely. Consider a circuit that takes the encoded algorithm with some input ($$\ell$$ bits in total) and outputs some randomness; the entropy of the output data is at most $$\ell$$ bits. (Further topic: Kolmogorov complexity)
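The maximum-entropy argument above can be verified numerically: the uniform pmf over all $$2^\ell$$ strings attains exactly $$\ell$$ bits, and any other pmf falls short. A small Python sketch:

```python
from math import log2
import random

ell = 8
n = 2 ** ell          # number of distinct ℓ-bit strings

def H(probs):
    """Shannon entropy (in bits) of a pmf given as a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# The uniform pmf attains the maximum: H = log2(2^ℓ) = ℓ bits.
uniform = [1 / n] * n
assert abs(H(uniform) - ell) < 1e-9

# Any other pmf over the same outcomes carries strictly less entropy.
random.seed(0)
w = [random.random() for _ in range(n)]
total = sum(w)
skewed = [x / total for x in w]
assert H(skewed) < ell
```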

Relative entropy (KL divergence) quantifies how much information diverges between two sets of data. For data differencing, the KL divergence gives the minimum patch size that is needed to reconstruct target data (with distribution $$p(x)$$) given source data (with distribution $$q(x)$$).

In particular, if $$p(x) = q(x)$$, which means that the two distributions are identical, we have $$\operatorname{D}_\mathrm{KL}(p\|q) = 0$$. This matches our intuition that no information is gained or lost during data encoding/decoding. If $$p(x_0) = 0$$ at $$x=x_0$$, we take $$p(x) \log \frac{p(x)}{q(x)} = 0$$, to justify the fact that the target data is trivial to reconstruct at this point, no matter how much information $$q(x)$$ contains. However, if $$q(x_0) = 0$$ at $$x=x_0$$, we should take $$p(x) \log \frac{p(x)}{q(x)} = \infty$$, so that the target data is impossible to reconstruct if we have only trivial $$q(x)$$ at some point (unless $$p(x_0) = 0$$).
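These conventions ($$0 \log \frac{0}{q} = 0$$ and $$p \log \frac{p}{0} = \infty$$) translate directly into code. A minimal Python sketch of discrete KL divergence, with distributions given as probability lists:

```python
from math import log2, inf

def kl_divergence(p, q):
    """Discrete D_KL(p‖q) in bits, with the conventions
    0·log(0/q) = 0 and p·log(p/0) = ∞ for p > 0."""
    d = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue          # target trivial at this point: contributes 0
        if qx == 0:
            return inf        # target impossible to reconstruct from q
        d += px * log2(px / qx)
    return d

assert kl_divergence([0.5, 0.5], [0.5, 0.5]) == 0.0   # identical distributions
assert kl_divergence([0.5, 0.5], [0.9, 0.1]) > 0      # Gibbs' inequality
assert kl_divergence([1.0, 0.0], [0.5, 0.5]) == 1.0   # exactly 1 bit of patch
assert kl_divergence([0.5, 0.5], [1.0, 0.0]) == inf   # q trivial where p > 0
```

Comparing the last two assertions also shows the asymmetry of the divergence: swapping target and source can change the answer from one bit to infinity.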

Lemma 4.1. (Gibbs’ inequality)2 The KL divergence is always non-negative: $$\operatorname{D}_\mathrm{KL}(p\|q) \geq 0$$.

Informally, Lemma 4.1 simply states that in order to reconstruct target data from source data, either more information ($$\operatorname{D}_\mathrm{KL}(p\|q) > 0$$) or no further information ($$\operatorname{D}_\mathrm{KL}(p\|q) = 0$$) is needed.

## Maximum entropy and normality

Theorem 4.2. Normal distribution $$\mathcal{N}(\mu,\sigma^2)$$ maximizes the differential entropy for given mean $$\mu$$ and variance $$\sigma^2$$.

Proof.3 Let $$g(x)$$ be a pdf of the normal distribution with mean $$\mu$$ and variance $$\sigma^2$$. Let $$f(x)$$ be an arbitrary pdf with the same mean and variance.

Consider the KL divergence between $$f(x)$$ and $$g(x)$$. By Lemma 4.1 (Gibbs’ inequality): $\operatorname{D}_\mathrm{KL}(f\|g) = \int_{-\infty}^\infty f(x) \log \frac{f(x)}{g(x)} dx = \operatorname{H}(f,g) - \operatorname{H}(f) \geq 0$

Notice that \begin{align*} \operatorname{H}(f,g) &= - \int_{-\infty}^\infty f(x) \log g(x) dx \\ &= - \int_{-\infty}^\infty f(x) \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right) dx \\ &= \frac{1}{2} \left( \log(2\pi\sigma^2) + 1 \right) \\ &= \operatorname{H}(g) \end{align*} Therefore, $\operatorname{H}(g) \geq \operatorname{H}(f)$ That is, the distribution of $$g(x)$$ (Gaussian) always has the maximum entropy.

It is also possible to derive the normal distribution directly from the principle of maximum entropy, under the constraint such that $$\int_{-\infty}^\infty (x-\mu)^2f(x)dx = \sigma^2$$.
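Theorem 4.2 can be sanity-checked against the closed-form differential entropies of other distributions sharing the same variance (formulas from standard tables; values in nats). The comparison below is a sketch, with an arbitrary $$\sigma$$:

```python
from math import log, pi, e, sqrt

sigma = 1.7  # any fixed standard deviation

# Closed-form differential entropies (in nats) of distributions
# that all share the same variance σ²:
h_normal  = 0.5 * log(2 * pi * e * sigma**2)   # N(μ, σ²)
h_uniform = log(sigma * sqrt(12))              # U(a,b) with (b−a)²/12 = σ²
h_laplace = 1 + log(sqrt(2) * sigma)           # Laplace with 2b² = σ²

# Theorem 4.2: the Gaussian attains the maximum entropy for fixed variance.
assert h_normal > h_uniform
assert h_normal > h_laplace
```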

The well-known central limit theorem (CLT), which states that the normalized sum of independent random variables $$\{X_1,\dots,X_n\}$$ tends toward a normal distribution, may be alternatively expressed as the monotonicity of the entropy of the normalized sum: $\operatorname{H}\left( \frac{\sum_{i=1}^n X_i}{\sqrt{n}} \right)$ which is an increasing function of $$n$$.

Books:

T. M. Cover and J. A. Thomas. Elements of Information Theory, 2nd ed.

Papers:

S. Artstein, K. Ball, F. Barthe, and A. Naor, “Solution of Shannon’s problem on the monotonicity of entropy,” Journal of the American Mathematical Society, vol. 17, no. 4, pp. 975–982, 2004.

]]>
The Adversarial Computation tag:www.soimort.org,2017:/mst/3 2017-01-01T00:00:00+01:00 2017-01-01T00:00:00+01:00 Mort Yao Theory of computation (computability and complexity) forms the basis for modern cryptography:

• What is an algorithm?
• An algorithm is a computational method for solving an abstract problem.
• An algorithm takes as input a problem instance from a set $$I$$ of problem instances, and outputs a solution from the set $$S$$ of problem solutions.
• An algorithm is represented in a well-defined formal language.
• An algorithm must be representable within a finite amount of time and space (otherwise it cannot actually be used for solving any problem).
• An algorithm can be simulated by any model of computation:
• A Turing machine is a model based on internal states and tape manipulation.
• λ-calculus is a model based on pure functions.
• All Turing-complete models are equivalent in their computational abilities.
• Computability: Not every abstract problem is solvable. Notably, there exists a decision problem for which some instances can neither be accepted nor rejected by any algorithm. (Undecidable problem)
• Complexity:
• The complexity class P is closed under polynomial-time reductions. Hence, proof by reduction can be a useful technique in provable security of cryptosystems.
• If one can prove that P = NP, then one-way functions do not exist. This would invalidate the construction of cryptographically secure pseudorandom generators (PRG). (Pseudorandom generator theorem)
• In many scenarios, we assume that an algorithm acts as a stateless computation and takes independent and identically distributed inputs. It differs from a computer program conceptually.
• An algorithm can be either deterministic or probabilistic.
• For probabilistic algorithms, the source of randomness may be from:
• External (physical) input of high entropy.
• Pseudorandomness: Since everything computational is deterministic, the existence of pseudorandomness relies on the (assumed) existence of one-way functions and PRGs.

Generally, pseudorandom generators used in probabilistic algorithms yield random bits according to the uniform distribution, so it is worth mentioning:

• Basic probability theory
• Discrete uniform distribution. $$\mathcal{U}\{a,b\}$$. The probability distribution where a finite number of values are equally likely to be observed (with probability $$\frac{1}{b-a+1}$$).

Cryptographic schemes are defined as tuples of deterministic or probabilistic algorithms:

• Principles of modern cryptography
• Formal description of a private-key encryption scheme $$\Pi=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})$$ with message space $$\mathcal{M}$$.
• $$\mathsf{Gen}$$, $$\mathsf{Enc}$$, $$\mathsf{Dec}$$ are three algorithms.
• Correctness: $$\mathsf{Dec}_k(\mathsf{Enc}_k(m)) = m$$.
• For the correctness equality to hold, $$\mathsf{Dec}$$ should be deterministic.
• Assuming that we have access to a source of randomness, $$\mathsf{Gen}$$ should choose a key at random and thus is probabilistic. If $$\mathsf{Gen}$$ were deterministic and always generated the same key, such an encryption scheme would be of no practical use and easy to break.
• $$\mathsf{Enc}$$ can be either deterministic (e.g., as in one-time pads) or probabilistic. Later we will see that for an encryption scheme to be CPA-secure, $$\mathsf{Enc}$$ should be probabilistic.
• Kerckhoffs’ principle (Shannon’s maxim) states that a cryptosystem should be secure even if the scheme $$(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})$$ is known to the adversary. That is, security should rely solely on the secrecy of the private key.
• Provable security of cryptosystems requires:
1. Formal definition of security;
2. Minimal assumptions;
3. Rigorous proofs of security.
• Common attacks and notions of security:
• Ciphertext-only attack.
• A cryptosystem is said to be perfectly secret if it is theoretically unbreakable under ciphertext-only attack.
• A cryptosystem is said to be computationally secure if it is resistant to ciphertext-only attack (by any polynomial-time adversary).
• Known-plaintext attack (KPA). A cryptosystem is KPA-secure if it is resistant to KPA.
• KPA-security implies ciphertext-only security.
• Chosen-plaintext attack (CPA). A cryptosystem is CPA-secure (or IND-CPA) if it is resistant to CPA.
• IND-CPA implies KPA-security.
• Chosen-ciphertext attack (CCA). A cryptosystem is CCA-secure (or IND-CCA1) if it is resistant to CCA; furthermore, a cryptosystem is IND-CCA2 if it is resistant to adaptive CCA (where the adversary may make further calls to the oracle, but may not submit the challenge ciphertext).
• IND-CCA1 implies IND-CPA.
• IND-CCA2 implies IND-CCA1. Thus, IND-CCA2 is the strongest of the above-mentioned definitions of security.
• Perfect secrecy
• Two equivalent definitions: (proof of equivalence uses Bayes’ theorem)
• $$\Pr[M=m\,|\,C=c] = \Pr[M=m]$$. (Observing a ciphertext $$c$$ does not leak any information about the underlying message $$m$$)
• $$\Pr[\mathsf{Enc}_K(m)=c] = \Pr[\mathsf{Enc}_K(m')=c]$$. (The adversary has no bias when distinguishing two messages if given only the ciphertext $$c$$)
• Perfect indistinguishability defined on adversarial indistinguishability experiment $$\mathsf{PrivK}_{\mathcal{A},\Pi}^\mathsf{eav}$$:
• $$\Pr[\mathsf{PrivK}_{\mathcal{A},\Pi}^\mathsf{eav} = 1] = \frac{1}{2}$$. (No adversary can win the indistinguishability game with a probability better than random guessing)
• Perfect indistinguishability is equivalent to the definition of perfect secrecy.
• The adversarial indistinguishability experiment is a very useful setting in defining provable security, e.g., the definition of computational indistinguishability: (for arbitrary input size $$n$$)
• $$\Pr[\mathsf{PrivK}_{\mathcal{A},\Pi}^\mathsf{eav}(n) = 1] \leq \frac{1}{2} + \mathsf{negl}(n)$$ where $$\mathsf{negl}(n)$$ is a negligible function.
• Perfect secrecy implies that $$|\mathcal{K}| \geq |\mathcal{M}|$$, i.e., the key space must be at least as large as the message space. If $$|\mathcal{K}| < |\mathcal{M}|$$, then the scheme cannot be perfectly secret.
• Shannon’s theorem: If $$|\mathcal{K}| = |\mathcal{M}| = |\mathcal{C}|$$, an encryption scheme is perfectly secret iff:
• $$k \in \mathcal{K}$$ is chosen uniformly.
• For every $$m \in \mathcal{M}$$ and $$c \in \mathcal{C}$$, there exists a unique $$k \in \mathcal{K}$$ such that $$\mathsf{Enc}_k(m) = c$$.

A brief, formalized overview of some classical ciphers, and their security:

• One-time pad (Vernam cipher): XOR cipher when $$|\mathcal{K}| = |\mathcal{M}|$$.
• One-time pad is perfectly secret. The proof simply follows from Bayes’ theorem. (Also verified by Shannon’s theorem. While the one-time pad was introduced in the 19th century and patented by G. Vernam in 1919, it was not until many years later that Claude Shannon gave a formal definition of information-theoretic security and proved that the one-time pad is a perfectly secret scheme in his groundbreaking paper.)
• One-time pad is deterministic. Moreover, it is a reciprocal cipher ($$\mathsf{Enc} = \mathsf{Dec}$$).
• One-time pad is not secure when the same key is applied in multiple encryptions, and it is not CPA-secure. In fact, an adversary can succeed in such indistinguishability experiments with probability 1.
• Insecure historical ciphers:
• Shift cipher: Defined with key space $$\mathcal{K}=\{0,\dots,n-1\}$$. ($$n=|\Sigma|$$)
• $$|\mathcal{K}|=n$$, $$|\mathcal{M}|=n^\ell$$.
• Cryptanalysis using frequency analysis.
• Substitution cipher: Defined with key space $$\mathcal{K} = \mathfrak{S}_\Sigma$$ (symmetric group on $$\Sigma$$).
• $$|\mathcal{K}|=n!$$, $$|\mathcal{M}|=n^\ell$$.
• Cryptanalysis using frequency analysis.
• Vigenère cipher (poly-alphabetic shift cipher): Like (mono-alphabetic) shift cipher, but the key length is an (unknown) integer $$t$$.
• $$|\mathcal{K}|=n^t$$, $$|\mathcal{M}|=n^\ell$$. (Typically $$t \ll \ell$$)
• Cryptanalysis using Kasiski’s method, index of coincidence method and frequency analysis.

Lessons learned from these classical ciphers: While perfect secrecy is easy to achieve (one-time pads), designing practical cryptographic schemes (with shorter keys, and computationally hard to break) can be difficult.
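As a concrete illustration, here is a minimal Python sketch of the one-time pad, together with a brute-force check of perfect secrecy on a tiny message space (the uniqueness condition of Shannon's theorem):

```python
import secrets
from itertools import product

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# One-time pad over bytes: Gen draws a uniform key with |K| = |M|,
# and Enc = Dec = XOR with the key (a reciprocal cipher).
m = b"ATTACK AT DAWN"
k = secrets.token_bytes(len(m))   # uniformly random key
c = xor(m, k)
assert xor(c, k) == m             # correctness: Dec_k(Enc_k(m)) = m

# Brute-force check of perfect secrecy on a tiny 3-bit message space:
# for every pair (m, c) there is exactly one key with Enc_k(m) = c,
# hence Pr[Enc_K(m) = c] = 1/|K| regardless of m (Shannon's theorem).
n = 3
for m0, c0 in product(range(2 ** n), repeat=2):
    matching = [key for key in range(2 ** n) if m0 ^ key == c0]
    assert len(matching) == 1
```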

## Where do random bits come from?

The construction of private-key encryption schemes involves probabilistic algorithms. We simply assume that an unlimited supply of independent, unbiased random bits is available for these cryptographic algorithms. But in practice, this is a non-trivial issue, as the source of randomness must provide high-entropy data so as to accommodate cryptographically secure random bits.

In the perfectly secret scheme of one-time pads, the key generation algorithm $$\mathsf{Gen}$$ requires access to a source of randomness in order to choose the uniformly random key $$k \in \mathcal{K}$$. Practically, high-entropy data may be collected via physical input, or even written down by hand.

Theoretically, without external intervention, we have:

Conjecture 3.1. Pseudorandom generators exist.

Theorem 3.2. (Pseudorandom generator theorem) Pseudorandom generators exist if and only if one-way functions exist.

Pseudorandomness is also a basic construction in CPA-secure encryption algorithms ($$\mathsf{Enc}$$), e.g., in stream ciphers and block ciphers.

So what is an acceptable level of pseudorandomness, if we are not sure whether such generators theoretically exist? Intuitively, if one cannot distinguish between a “pseudorandom” string (generated by a PRG) and a truly random string (chosen according to the uniform distribution), we have confidence that the PRG is a good one. Various statistical tests have been designed for testing the randomness of PRGs.
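As a toy illustration, here is a sketch of the simplest such statistical test, the frequency (monobit) test: it derives a p-value from the normalized ±1 partial sum of the bit stream. Note that passing a single test means little: the alternating sequence 0101… passes it perfectly, which is why practical test suites run many complementary tests.

```python
from math import erfc, sqrt

def monobit_pvalue(bits):
    """Frequency (monobit) test: p-value for the hypothesis that the
    bits are uniform, from the normalized ±1 partial sum."""
    s = sum(1 if b else -1 for b in bits)
    return erfc(abs(s) / sqrt(2 * len(bits)))

balanced = [0, 1] * 5000           # perfectly balanced (but not random!)
biased = [1] * 9000 + [0] * 1000   # a heavily biased "generator"

assert monobit_pvalue(balanced) == 1.0   # erfc(0) = 1: no bias detected
assert monobit_pvalue(biased) < 0.01     # fails at the usual 1% cutoff
```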

## Pseudorandomness and IND-CPA

It holds true that:

Corollary 3.3. By redefining the key space, we can assume that any encryption scheme $$\Pi=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})$$ satisfies

1. $$\mathsf{Gen}$$ chooses a uniform key.
2. $$\mathsf{Enc}$$ is deterministic.

If so, why do we still need probabilistic $$\mathsf{Enc}$$ in CPA-secure encryptions? Can’t we just make $$\mathsf{Enc}$$ deterministic while still being CPA-secure?

The first thing to realize is that chosen-plaintext attacks are geared towards multiple encryptions (with the same secret key $$k$$), so when the adversary obtains a pair $$(m_0, c_0)$$ such that $$\Pr[C=c_0\,|\,M=m_0] = 1$$, the key is already leaked. (Recall that the adversary knows the deterministic algorithm $$\mathsf{Enc}_k$$, thus recovering $$k$$ from known $$m_0$$ and $$c_0$$ can be quite feasible; e.g., in a one-time pad, $$k = m_0 \oplus c_0$$.) The only way to get around this is to make $$\mathsf{Enc}_k$$ probabilistic (constructed from a pseudorandom function), such that an adversary cannot recover the key efficiently within polynomial time.

Note that perfect secrecy is not possible under CPA, since there is a small chance that the adversary recovers the key (by, for example, traversing an exponentially large lookup table of all random bits) and succeeds in the subsequent indistinguishability experiment with probability slightly (though only negligibly) higher than $$\frac{1}{2}$$.
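The standard construction hinted at above is $$\mathsf{Enc}_k(m) = \langle r, F_k(r) \oplus m \rangle$$ for fresh randomness $$r$$ and a pseudorandom function $$F$$. The Python sketch below uses HMAC-SHA256 as a heuristic stand-in for the PRF (an assumption for illustration only, not a vetted scheme):

```python
import hmac, hashlib, secrets

def prf(k: bytes, r: bytes, n: int) -> bytes:
    """Heuristic stand-in for a pseudorandom function F_k(r)."""
    return hmac.new(k, r, hashlib.sha256).digest()[:n]

def enc(k: bytes, m: bytes):
    r = secrets.token_bytes(16)          # fresh randomness per encryption
    return r, bytes(a ^ b for a, b in zip(prf(k, r, len(m)), m))

def dec(k: bytes, c):
    r, body = c
    return bytes(a ^ b for a, b in zip(prf(k, r, len(body)), body))

k = secrets.token_bytes(32)
m = b"attack at dawn"                    # ≤ 32 bytes for this toy PRF
assert dec(k, enc(k, m)) == m            # correctness

# The same plaintext encrypts to different ciphertexts each time,
# so repeated messages cannot be recognized by a CPA adversary.
assert enc(k, m) != enc(k, m)
```

Because $$r$$ is chosen fresh for every call, the same plaintext yields different ciphertexts, which is exactly what a deterministic $$\mathsf{Enc}$$ cannot provide.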

## Historical exploits of many-time pad

One-time pad is one of the most (provably) secure encryption schemes, and its secrecy does not rely on any computational hardness assumptions. However, it requires that $$|\mathcal{K}| \geq |\mathcal{M}|$$ (which in fact is a necessary condition for any perfectly secret scheme), thus its real-world use is limited.

The one-time key $$k$$ (uniformly chosen from the key space $$\mathcal{K}$$) may not simply be reused in multiple encryptions. Assuming that $$|\mathcal{K}| = |\mathcal{M}|$$, for encryptions of $$n$$ messages the effective message space expands to size $$|\mathcal{M}|^n$$ while the key space remains $$\mathcal{K}$$, thus we have $$|\mathcal{K}| < |\mathcal{M}|^n$$. Such a degraded scheme (many-time pad) is theoretically insecure and vulnerable to several practical cryptanalyses.
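The key-reuse failure is mechanical: XORing two ciphertexts produced under the same pad cancels the key entirely. A minimal Python sketch (with made-up plaintexts):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = bytes(range(32, 48))               # the *same* pad, mistakenly reused
m0 = b"MEET AT SIX     "                 # 16 bytes each
m1 = b"RETREAT AT DAWN "
c0, c1 = xor(m0, key), xor(m1, key)

# The key cancels out: c0 ⊕ c1 = m0 ⊕ m1, leaking plaintext structure
# (e.g., via crib dragging) without the adversary ever learning the key.
assert xor(c0, c1) == xor(m0, m1)
```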

A historical exploit of the vulnerability of many-time pad occurred in the VENONA project, where the U.S. Army’s Signal Intelligence Service (later the NSA) aimed at decrypting messages sent by the USSR intelligence agencies (KGB) over a span of 4 decades. As the KGB mistakenly reused some portions of their one-time key codebook, the SIS was able to break a good amount of the messages.1

Books:

J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2nd ed.

Papers:

C. E. Shannon, “Communication theory of secrecy systems,” Bell System Technical Journal, vol. 28, no. 4, pp. 656–715, 1949.

1. R. L. Benson, “The Venona story.” https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/publications/coldwar/assets/files/venona_story.pdf

]]>
The Probable Outcome tag:www.soimort.org,2017:/mst/2 2016-12-23T00:00:00+01:00 2016-12-23T00:00:00+01:00 Mort Yao A refresher of basic probability theory, which is just common knowledge but plays a supporting role in information theory, statistical methods, and consequently, computer science.

• Basic probability theory
• An experiment has various outcomes. The set of all possible outcomes constitutes the sample space of that experiment.
• Any measurable subset of the sample space $$\Omega$$ is known as an event.
• A probability measure is a real-valued function defined on a set of events $$\mathcal{F}$$ in a probability space $$(\Omega,\mathcal{F},\Pr)$$ that satisfies measure properties such as countable additivity. (See Kolmogorov’s axioms.)
• The union bound (Boole’s inequality) follows from the fact that a probability measure is σ-sub-additive.
• Events can be independent. The following conditions hold equivalently for any independent events:
• $$\Pr[A_1 \cap A_2] = \Pr[A_1] \cdot \Pr[A_2]$$
• $$\Pr[A_1|A_2] = \Pr[A_1]$$
• Bayes’ theorem and the law of total probability describe the basic properties of conditional probability.
• A random variable is a mapping that assigns a value to each outcome. Hence, we have a probability measure defined on the values of random variables, such as $$\Pr[X=x]$$.
• For discrete random variables, a probability mass function (pmf) determines a discrete probability distribution.
• For continuous random variables, a probability density function (pdf) determines a continuous probability distribution.
• Random variables can be uncorrelated. ($$\operatorname{Cov}(X,Y)=0 \iff \operatorname{E}[XY] = \operatorname{E}[X] \cdot \operatorname{E}[Y]$$.)
• Independent random variables are uncorrelated.
• However, uncorrelated random variables are not necessarily independent.
• A distribution can be presented using moments:
• Expectation (mean) $$\operatorname{E}[X]$$: first raw moment.
• Variance $$\operatorname{Var}(X)$$: second central moment.
• Skewness $$\operatorname{Skew}(X)$$: third standardized moment.
• Kurtosis $$\operatorname{Kurt}(X)$$: fourth standardized moment.
• For a bounded distribution of probability, the collection of all the moments (of all orders) uniquely determines the distribution.
• Some distributions, notably Cauchy distributions, do not have their moments defined.
• Concentration inequalities provide bounds on how a random variable deviates from some value (usually one of its moments).
• Markov’s inequality is the simplest and weakest probability bound.
• Chebyshev’s inequality provides an upper bound on the probability that a random variable deviates from its expectation.
• Chernoff bound is stronger than Markov’s inequality.
• Hoeffding’s inequality provides an upper bound on the probability that the sum of random variables deviates from its expectation. It’s also useful for analyzing the number of required samples needed to obtain a confidence interval.
• Some common discrete probability distributions:
• Bernoulli distribution. Special case of Binomial distribution: $$\text{B}(1,p)$$.
• Binomial distribution $$\text{B}(n,p)$$. Given number of draws $$n$$, the distribution of the number of successes.
• Geometric distribution $$\text{Geom}(p)$$. Special case of negative binomial distribution: $$\text{NB}(1,1-p)$$.
• Negative binomial distribution $$\text{NB}(r,p)$$. Given number of failures $$r$$, the distribution of the number of successes.
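As a worked example of the sample-size use of Hoeffding's inequality mentioned above: for a $$[0,1]$$-bounded variable, $$\Pr[|\overline{X}_n - \operatorname{E}[X]| \geq \varepsilon] \leq 2e^{-2n\varepsilon^2}$$, so $$n \geq \ln(2/\delta)/(2\varepsilon^2)$$ samples suffice for an $$\varepsilon$$-accurate estimate with confidence $$1-\delta$$. A small Python sketch:

```python
from math import ceil, log

def hoeffding_samples(eps: float, delta: float) -> int:
    """Samples of an i.i.d. [0,1]-bounded variable needed so that
    Pr[|X̄_n − E[X]| ≥ eps] ≤ delta, by Hoeffding's inequality:
    2·exp(−2n·eps²) ≤ delta  ⇒  n ≥ ln(2/δ) / (2ε²)."""
    return ceil(log(2 / delta) / (2 * eps**2))

# e.g. a ±5% estimate with 99% confidence:
n = hoeffding_samples(0.05, 0.01)
assert n == 1060
```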

## Probability measure, distribution and generalized function

Intuitively, probability is a measure of uncertainty. Mathematically, probability is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity (or simply, measure on probability space).

Typically, a probability density function (pdf) or a probability mass function (pmf) determines a distribution in the probability space.

Example 2.1. Consider the wave function of a particle: $\Psi(x,t)$ where $$x$$ is position and $$t$$ is time.

If the particle’s position is measured, its location cannot be determined but is described by a probability distribution: The probability that the particle is found in $$[x, x+\Delta x]$$ is $\Delta\Pr = |\Psi(x,t)|^2 \Delta x$

The square modulus of the wave function (which is real-valued, non-negative) $\left|\Psi(x, t)\right|^2 = {\Psi(x, t)}^{*}\Psi(x, t) = \rho(x, t)$ is interpreted as the pdf.

Since the particle must be found somewhere, we have the normalization condition: (by the assumption of unit measure) $\int\limits_{-\infty}^\infty |\Psi(x,t)|^2 dx = 1$

In analysis, distributions (also called generalized functions) expand the notion of function to objects whose derivatives may not exist in the classical sense. Thus, it is not uncommon that many probability distributions cannot be described using classical (differentiable) functions. The Dirac delta function $$\delta$$ (which is a generalized function) is often used to represent a discrete distribution, or a partially discrete, partially continuous distribution, using a pdf.

## Bayes’ theorem and common fallacies

Bayes’ theorem forms the basis for Bayesian inference, which is an important method of statistical inference that updates the probability for a hypothesis as more evidence or information becomes available.

Hypotheses can also be fallacies. In Bayesian inference, if one can assume that every event occurs independently and that the probability is identically distributed throughout repeated trials, it is easy to see that some common beliefs are mistaken.

Gambler’s fallacy (Monte Carlo fallacy). If an outcome occurs more frequently than normal during some period, it will happen less frequently in the future; contrariwise, if an outcome happens less frequently than normal during some period, it will happen more frequently in the future. This is presumed to be a means by which nature balances itself.

Gambler’s fallacy is indeed a fallacy when the probability of outcomes is known to be independently and identically distributed. Assuming that the future (the probability of event $$A_2$$) has no effect on the past (the probability of event $$A_1$$), we have $$\Pr[A_1|A_2] = \Pr[A_1]$$. From Bayes’ theorem, it holds true that $\Pr[A_2|A_1] = \Pr[A_2]$ That is, past events should not increase or decrease our confidence in a future event.

Hot-hand fallacy. A person who has experienced success with a seemingly random event has a greater chance of further success in additional attempts. That is, if an outcome occurs more frequently than normal during some period, it will also happen frequently in the future.

If psychological factors can be excluded, then the hot-hand fallacy is a fallacy caused by people’s confirmation bias. As with the gambler’s fallacy, if we cannot assume that the probability of outcomes is independently and identically distributed, we cannot simply conclude that this belief is mistaken.

Inverse gambler’s fallacy. If an unlikely outcome occurs, then the trials must have been repeated many times before.

Assuming that the past (the probability of event $$A_1$$) has no effect on the future (the probability of event $$A_2$$), we have $$\Pr[A_2|A_1] = \Pr[A_2]$$. From Bayes’ theorem, it holds true that $\Pr[A_1|A_2] = \Pr[A_1]$ That is, our confidence in $$A_1$$ should remain unchanged after we observe $$A_2$$.
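These fallacies can also be probed by simulation. In the Python sketch below (fixed seed), the empirical probability of heads immediately after a streak of five heads is statistically indistinguishable from the unconditional probability of heads, i.e., $$\Pr[A_2\,|\,A_1] \approx \Pr[A_2]$$:

```python
import random

random.seed(1)
flips = [random.randrange(2) for _ in range(1_000_000)]   # 1 = heads

# Heads immediately following a streak of five heads:
after_streak = [flips[i] for i in range(5, len(flips))
                if all(flips[i - 5:i])]

p_uncond = sum(flips) / len(flips)
p_cond = sum(after_streak) / len(after_streak)

# Independence: the streak provides no information about the next flip.
assert abs(p_uncond - 0.5) < 0.01
assert abs(p_cond - 0.5) < 0.015   # wider tolerance: fewer streak samples
```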

## LLN and Chebyshev’s inequality

Fallacies of hasty generalization and slothful induction (law of small numbers). Informal fallacies reaching an inductive generalization based on insufficient evidence, or denying a reasonable conclusion of an inductive argument.

Statistically saying, sampling from a small group can lead to misbeliefs that fail to hold for the entire population, if hypothesis testing is not carefully conducted.

Theorem 2.2. (Law of large numbers) Let $$X_1, \dots, X_n$$ be an infinite sequence of i.i.d. Lebesgue integrable random variables with fixed expectation $$\operatorname{E}[X_1] = \cdots = \operatorname{E}[X_n] = \mu$$. Define the sample average $\overline{X}_n = \frac{1}{n}(X_1 + \dots + X_n)$

1. (Weak law of large numbers; Khintchine’s law) The sample average converges in probability towards the expectation: $\lim_{n\to\infty} \Pr[|\overline{X}_n - \mu| > \varepsilon] = 0$
2. (Strong law of large numbers) The sample average converges almost surely to the expectation: $\Pr[\lim_{n\to\infty} \overline{X}_n = \mu] = 1$

Chebyshev’s inequality provides an upper bound on the probability that a random variable deviates from its expected value. Thus, it may be used as a proof for the weak law of large numbers.
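A quick empirical check of the weak law via Chebyshev's bound, using fair-die rolls (a Python sketch, stdlib only): the observed fraction of sample averages deviating from $$\mu = 3.5$$ by more than $$\varepsilon$$ stays below $$\sigma^2/(n\varepsilon^2)$$.

```python
import random

random.seed(0)
mu = 3.5
var = sum((k - mu) ** 2 for k in range(1, 7)) / 6   # fair die: σ² ≈ 2.9167

# Chebyshev: Pr[|X̄_n − μ| > ε] ≤ σ²/(n·ε²), the bound behind the weak LLN.
n, eps, trials = 1000, 0.2, 500
bound = var / (n * eps**2)                          # ≈ 0.073

hits = sum(
    abs(sum(random.randint(1, 6) for _ in range(n)) / n - mu) > eps
    for _ in range(trials)
)
assert hits / trials <= bound
```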

## How is mathematical expectation only “mathematical”?

The expected value of a random variable $$X$$: $\operatorname{E}[X] = \sum_{x \in \mathcal{X}} x \Pr[X=x]$ While it seemingly gives an estimate on how people would “expect” a random variable to take its value, it can sometimes lead to counterintuitive results, as shown by the following paradox.

St. Petersburg Paradox.1 A casino offers a game of chance for a gambler to flip a fair coin until it comes up tails. The initial stake starts at $$2$$ dollars and is doubled every time heads appears. The first time tails appears, the game ends and the gambler wins whatever is in the pot. Thus if the coin comes up tails the first time, the gambler wins $$2^1=2$$ dollars, and the game ends. If the coin comes up heads, the coin is flipped again. If the coin comes up tails the second time, the gambler wins $$2^2=4$$ dollars, and the game ends. If the coin comes up heads again, the coin is flipped again. If the coin comes up tails the third time, the gambler wins $$2^3=8$$ dollars, and the game ends. And so on. Eventually the gambler wins $$2^k$$ dollars, where $$k$$ is the number of coin flips until tails appears. (It is easy to see that $$k$$ follows a geometric distribution.) What would be a fair price to pay the casino for entering such a game? (Assume that there is no house edge.)

| $$k$$th coin flip | $$\Pr[\text{Tails}]$$ | Stake | Expected payoff |
|---|---|---|---|
| 1 | $$\frac{1}{2}$$ | 2 | 1 |
| 2 | $$\frac{1}{4}$$ | 4 | 1 |
| 3 | $$\frac{1}{8}$$ | 8 | 1 |
| 4 | $$\frac{1}{16}$$ | 16 | 1 |
| 5 | $$\frac{1}{32}$$ | 32 | 1 |
| $$k$$ | $$(1/2)^k$$ | $$2^k$$ | 1 |

The price should be made equal to the expected value that a gambler wins the stake, which is $\operatorname{E}[\text{Payoff}] = \sum_{k=1}^{+\infty} \left(\frac{1}{2}\right)^k \cdot 2^k = \sum_{k=1}^{+\infty} 1 = +\infty$

If a rational gambler pays to enter a game if and only if its average payoff is larger than its price, then he would pay any price to enter this game (since the expected payoff of this game is infinitely large). But in reality, few of us are willing to pay even tens of dollars to enter such a game. What went wrong? Furthermore, if mathematical expectation does not correctly reflect what people expect from a game, how do we quantify the “true” expectation?
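Simulation makes the paradox tangible: although $$\operatorname{E}[\text{Payoff}] = \infty$$, empirical averages over many games remain modest, growing only roughly logarithmically with the number of games. A Python sketch (fixed seed; the exact average varies by seed, so the assertion is deliberately loose):

```python
import random

random.seed(123)

def play() -> int:
    """One St. Petersburg game: flip until tails; win 2^k after k flips."""
    k = 1
    while random.randrange(2):   # 1 = heads, keep doubling the pot
        k += 1
    return 2 ** k

rounds = 100_000
avg = sum(play() for _ in range(rounds)) / rounds

# Despite the infinite expectation, the empirical average stays small,
# nowhere near a price a "rational" gambler should be willing to pay.
assert 2 <= avg < 1000
```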

The St. Petersburg paradox was initially stated by Nicolas Bernoulli in 1713. There are several proposed approaches for resolving the paradox, including the expected utility theory with the hypothesis of diminishing marginal utility, and the cumulative prospect theory. However, none of them is purely probability-theoretic, as they require the use of hypothesized economic/behavioral models.

## Bias of sample variance and Bessel’s correction

In probability theory, the variance of a random variable $$X$$ with mean $$\mu = \operatorname{E}[X]$$ is defined as $\operatorname{Var}(X) = \operatorname{E}[(X-\mu)^2]$ For a finite population $$X_1,\dots,X_N$$ with known mean $$\mu$$, this is computed as $\sigma^2 = \frac{1}{N} \sum_{i=1}^N (X_i-\mu)^2$

In statistics, when the sample variance is calculated to estimate the population variance and the population mean is unknown, Bessel’s correction2 (the use of $$N-1$$ instead of $$N$$) is often preferred: $s^2 = \frac{1}{N-1} \sum_{i=1}^N (X_i-\bar{X})^2$

A few remarks and caveats:

1. Bessel’s correction is only necessary when the population mean is unknown and estimated as the sample mean.
2. Without Bessel’s correction, the estimated variance would be biased: the biased sample variance $$s_n^2$$ tends to underestimate the population variance $$\sigma^2$$, whether the sample mean is smaller or larger than the population mean, because the sample mean is by construction the value that minimizes the sum of squared deviations within the sample.
3. Bessel’s correction does not yield an unbiased estimator of standard deviation, only variance and covariance.
4. The corrected estimator often has a larger mean squared error (MSE).
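Remark 2 can be verified exactly on a toy example: enumerate every equally likely i.i.d. sample of size 2 from a small population and average both estimators (a Python sketch; the population and sample size are arbitrary choices):

```python
from itertools import product

population = [0, 1, 2]
mu = sum(population) / len(population)                             # mean = 1
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # = 2/3

n = 2  # sample size; enumerate all 3^2 equally likely i.i.d. samples
biased, unbiased = [], []
for sample in product(population, repeat=n):
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    biased.append(ss / n)          # divide by N
    unbiased.append(ss / (n - 1))  # Bessel's correction: divide by N-1

mean_biased = sum(biased) / len(biased)        # 1/3: underestimates sigma2
mean_unbiased = sum(unbiased) / len(unbiased)  # 2/3: matches sigma2 exactly
```

Averaged over all possible samples, the corrected estimator hits $$\sigma^2$$ exactly, while the uncorrected one falls short by the factor $$\frac{N-1}{N}$$.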

Books:

M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis.

M. Baron, Probability and Statistics for Computer Scientists, 2nd ed.

Articles:

R. Martin, “The St. Petersburg paradox,” in The Stanford Encyclopedia of Philosophy, Summer 2014 ed., E. N. Zalta, Ed. Metaphysics Research Lab, Stanford University, 2014. https://plato.stanford.edu/archives/sum2014/entries/paradox-stpetersburg/

Remember My Last Tabs, File Manager (2016-12-08, updated 2016-12-13, Mort Yao)

It’s 2016, and I can’t believe that there is still no “Continue where you left off” option in most dominant GUI file managers (as far as I know)!

Yes, it bugs me when I can’t restore my last open tabs, and I want my old session back badly. Remembering last tabs, if I get the history right, was a feature first introduced by Google Chrome, and it soon became an indispensable part of my daily workflow. I’m a multitasker, but my laptop’s computing resources are very limited. Say I have a session in which I’m working on a homework report, with loads of folders, web pages and editor buffers open that easily fill up gigabytes of RAM; then I realize I need to compile something really hard-core, or maybe just want to take a short rest and do some random surfing on the web. I would certainly rather close all those memory-hungry processes for the time being, hoping that they could later continue with all the open tabs I left off.

It’s mainly four types of applications that account for so-called “work sessions” for me:

• Terminal emulator
• File manager
• Web browser
• Text editor

Terminals don’t take up a lot of memory, so I wouldn’t mind leaving them open. Typical browsers, including Chromium and Firefox, do support session resuming (and there are even extensions which allow you to save current tabs and recover them at any later time). Any decent text editor (or IDE) may also be configured to remember open tabs / sessions. File managers, however, fail to meet this basic productivity need.

## File managers: The unremembered UX

I’m on GNOME 3, but currently using the Caja file manager – ever since Nautilus 3.6 decided to remove two or three features I found important to me (compact mode, backspace navigation) and introduced an awkward, smug “search-whatever-shit-as-you-type” feature.

File managers I’ve tried so far:

• Nautilus (GNOME). As said, already rendered unusable for me.
• Pantheon. Like Nautilus, it doesn’t feature a compact mode either.
• Nemo (Cinnamon). Nope, segfaults too often.
• Caja (MATE). It’s OK, just what I’m using right now.
• Dolphin (KDE). OK, unless it’s from the foreign land of KDE.
• Konqueror (KDE). It’s both a web browser and a file manager, and it’s the only one I know that can save / restore open tabs. Unfortunately it has only limited file management functionality. (sufficient as a file viewer, though?)

Among all of the above, I settled on Caja, simply because there was no reasonably good alternative. Still, I’m wishing for something that can save session states for me. After doing a little research, I realized that:

1. There is no commonly adopted protocol addressing this issue. Not even on GNOME.
2. There is EggSMClient, but state saving is implemented at the X (desktop) session level and thus only available on the XSMP backend. It works when you log out of your desktop session and log back in, but not when you close the window and restart the application.
3. It is ultimately the application itself which must maintain its session states and restore them when required.

## A quick working patch for Caja

Let’s get into the technical details. In Caja (or other similarly GTK+/GLib-based file managers), one needs to implement:

• On the destroy callback of the main GtkObject, all last remaining session data (e.g., internal states about open tabs, windows) must be saved to disk. (after the main loop ends there’s no more chance to get this information back)
• On GUI initialization, read last session data (if exist) from disk, and reopen saved tabs as well as windows.
• On the event of changing state (e.g., creating or closing tab/window, repositioning tabs), session data are updated respectively and, optionally, saved to disk.
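The save/restore pattern above is not GTK+-specific; a minimal, hypothetical sketch in Python (a real file manager such as Caja would implement this in C against its own tab model, and the JSON format here is my own invention):

```python
import json
import os

def save_session(path, windows):
    """Destroy-callback step: persist all open tabs before the main loop ends."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        # `windows` is a list of windows, each a list of tab locations
        # in their on-screen (spatial) order.
        json.dump({"windows": windows}, f)

def restore_session(path):
    """Initialization step: reopen saved tabs, if a previous session exists."""
    try:
        with open(path) as f:
            return json.load(f)["windows"]
    except (FileNotFoundError, KeyError, ValueError):
        return []  # no (or corrupt) previous session: start fresh
```

Updating the file on every tab change (the third point) is then just a matter of calling the save step from the relevant signal handlers.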

With caja_application_get_session_data(), making a quick workaround that enables Caja to save and restore a session is fairly trivial labor; however, it seems Caja doesn’t record the correct (spatial) ordering of tabs in its session data, so I wouldn’t consider this a satisfying solution to the issue, and I have no intent to send such an incomplete patch to Caja. Nevertheless, it’s better than nothing, and, if the ordering of tabs really matters, it would be feasible to write a wrapper script that manipulates the XML file in $HOME/.config/caja/last-session.

And here goes the patch (applied to Caja 1.16.1; definitely UNWARRANTED).

Boilerplating Pandoc for Academic Writing (2016-11-17, Mort Yao)

For starters, this is how you might want to turn your well-written Markdown file (with common metadata fields like title, author and date) into a properly typeset PDF document:

$ pandoc src.md -o out.pdf

However, Markdown is not TeX. Not even close. Once you need to have some bleeding edge control over the typesetting outcome, or perhaps just a little refinement on its LaTeX templating, you’ll soon notice that Pandoc has its quirks and gotchas. I’ve been utilizing Pandoc in all my serious academic writing (incl. homework reports) for years, ever since I gave up on learning more about the overwhelmingly sophisticated TeX ecosystem and turned to something that “just works”. Pandoc fits my needs well. And when it doesn’t, there’s almost always a workaround that achieves the same thing neatly. And this is what this write-up is mostly about.

## Tweaking default.latex? Bad idea.

You could, of course, modify the default template (default.latex) provided by Pandoc, as long as you’re no stranger to LaTeX. In this way, you can achieve anything you want – in pure LaTeX.

## Separating header-includes

You might have already noticed that the subtitle field won’t display in the produced PDF file. As far as I can tell (as of Pandoc 1.18), this is the expected behavior. From the README:

To make subtitle work with other LaTeX document classes, you can add the following to header-includes:

\providecommand{\subtitle}[1]{%
\usepackage{titling}
\posttitle{%
\par\large#1\end{center}}
}

Unfortunately, this won’t work (until Issue #2139 is resolved), since Pandoc parses the header-includes metadata field as Markdown, and the bracketed [1] is misinterpreted as literal text rather than as part of a LaTeX control sequence. So the workaround is: instead of embedding header-includes as a metadata field in YAML, we should separate it into another file for this dedicated purpose (it’s simply raw LaTeX anyway), and include it using --include-in-header/-H:

$ pandoc -H header.tex default.yaml src.md -o out.pdf

Note that you can’t have two header-includes for one document, so the header-includes field specified in the YAML metadata will be overridden by the content of header.tex.

## Citing sources

While the Markdown syntax for citing is rather easy ([@id]), it takes effort to make things right, especially if you have a certain preferred citation format (APA, MLA, Chicago, IEEE, etc.). The suggestion is: use pandoc-citeproc. Once you have a list of references you’re interested in, you need two things to typeset those nicely in your document:

• A CSL (Citation Style Language) file (.csl), to specify the citation format you want to use.
• A BibTeX file (.bib), which is a list of all entries you might cite. Citation entries in BibTeX format may be found easily on the Internet, through academic search engines and databases; concatenate them one by one.

As part of the YAML metadata (assume you have ieee.csl and references.bib):

csl: ieee.csl
bibliography: references.bib

Using pandoc-citeproc as a filter, generate the document with citations:

$ pandoc --filter pandoc-citeproc -H header.tex default.yaml src.md -o out.pdf

The list of references is appended to the end of the document. It is often desirable to give the references an obvious title (“References”), start from a new page and avoid any further indentation, so the following comes in the end of the Markdown source:

\newpage
\setlength\parindent{0pt}

# References

## Putting it all together!

Basically, we need 5 files in total:

• For content:
• src.md (Markdown + possibly LaTeX mixed format): Main text.
• references.bib (BibTeX/BibLaTeX format): List of references.
• For presentation:
• default.yaml (YAML format): Format-related metadata.
• header.tex (LaTeX format): Content of header-includes; package imports and macro definitions.
• ieee.csl (CSL XML format): Citation style.

And one command:

\$ pandoc --filter pandoc-citeproc -H header.tex default.yaml src.md -o out.pdf

## Open question: Lightweight replacement for amsthm?

Pandoc doesn’t provide native support for amsthm (and I wonder if there will ever be). You can still have the same thing in Pandoc Markdown:

\newtheorem{definition}{Definition}

\begin{definition}
Man is a rational animal.
\end{definition}

However, everything in between \begin and \end will be treated as raw LaTeX, and the expressiveness of Markdown is lost there. More importantly, this is purely a LaTeX-specific thing, so there’s no way for Pandoc to convert this to HTML or any other format (unless you have a filter that does the trick). Consequently, I tend to write all definitions / theorems (lemmas, claims, corollaries, propositions…) in simple Markdown:

**Definition 1.** *Man is a rational animal.*

It does have some advantages over amsthm:

• Using amsthm, you cannot see the numbering of each theorem (definition, etc.) in the text editor (well, you can’t without a dedicated plugin at least). This is inconvenient when you need to refer to a prior one later. By numbering them explicitly, you can clearly see these ordinals in the Markdown source.
• It is perfectly valid Markdown, so it converts to any format as you wish (HTML, for example).

This also has some drawbacks compared to using amsthm, though:

• It doesn’t have theorem counters. You need to number things explicitly, manually. (Clearly you can’t have implicit numbering and explicit numbering at the same time, so here’s the trade-off.)
• It doesn’t have automatic formatting. That is, you could possibly get the style for a certain entry (plain, definition, remark) wrong.
• Semantically, they are not recognized as theorems, just normal text paragraphs. This is problematic if you want to prevent definitions and theorems from being indented, since there’s no way for LaTeX to tell them from a normal text.

Probably the best solution is to write a filter that converts any conventionally styled plain text like Definition 1 (and Lemma 2, Theorem 3, etc.) at the beginning of a paragraph to proper Markdown (for the HTML target) or the corresponding amsthm block (for the LaTeX target). Even better, it should be able to do cross-references accordingly (remember Lemma 10.42? Let’s put an anchored link on that!). This is yet to be done, but it would be very helpful to anyone who writes a lot of theorems and proofs and thus wants to avoid the kludge of mixing raw LaTeX with semantically evident Markdown.
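The core of such a filter is just pattern recognition on paragraph openers. A rough sketch (Python; a regex over the Markdown source rather than a real Pandoc filter, which would instead walk the JSON AST via pandocfilters — the keyword list is my own guess at a sensible convention):

```python
import re

# Theorem-like keywords conventionally used at the start of a paragraph.
KEYWORDS = r"(?:Definition|Lemma|Theorem|Claim|Corollary|Proposition|Remark)"
PATTERN = re.compile(rf"^({KEYWORDS} \d+(?:\.\d+)*)\.", flags=re.MULTILINE)

def mark_theorems(markdown: str) -> str:
    """Bold 'Theorem 3.'-style paragraph openers, in plain Pandoc Markdown."""
    return PATTERN.sub(r"**\1.**", markdown)
```

For the LaTeX target the substitution would emit an amsthm environment instead of bold markup, and a second pass over `Lemma 10.42`-style mentions could generate anchored cross-reference links.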


This is an expository reading summary of a selected CRYPTO 2015 paper I did as an assignment in KU’s Introduction to Modern Cryptography course. Adversarial-resilient Bloom filters are the counterparts of cryptographically secure hash functions in an adversarial setting, where adaptive adversaries that have access to a deterministic or non-deterministic query oracle may challenge the data structure in a way that intentionally increases the false positive rate of querying. As a preliminary result, this paper shows that the resistance of Bloom filters against computationally bounded adversaries requires that one-way functions exist; furthermore, such constructions are possible using pseudorandom permutations. I do find the proof a bit involved, but the notions of security introduced for Bloom filters are new and appealing (and I haven’t seen them anywhere else).

Original paper:

• M. Naor and E. Yogev, “Bloom filters in adversarial environments,” in Annual Cryptology Conference, 2015. [arXiv:1412.8356]

Abstract

A Bloom filter is a hash-based probabilistic data structure that is space-efficient for set membership querying, with a small probability of false positives. Naor and Yogev’s 2015 paper introduces the adversarial model and formally proposes a strong notion of security for Bloom filters, i.e., adversarial resilience, based on an adversarial game in a cryptographic setting. The paper also discusses the correspondence between adversarial-resilient Bloom filters and the assumption that one-way functions exist, which enables theoretical constructions using pseudorandom permutations. We believe that such an understanding will help design practical Bloom filters that are safe from known attacks in software systems.

# 1. Introduction

Probabilistic data structures are data structures that employ randomness in their designs to enable more efficient approaches to storing and querying data, compared to deterministic ones. Traditionally, the algorithmic probabilistic analysis of such data structures assumes the model where all inputs and queries are independent of the internal randomness of data structures. In this work, we consider an adversarial environment, where a computationally bounded adversary1 may adaptively choose inputs and queries with the intention of degrading the efficiency of the underlying data structure of some computer system. By introducing the adversarial model, we analyze the behavior of such data representations under the cryptographic notion of computational security against adversaries; furthermore, it enables us to construct more efficient, provably secure schemes of probabilistic data structures.

As a concrete example, a Bloom filter is a probabilistic data structure that holds a set $$S$$ of elements approximately, using significantly fewer bits of storage and allowing for faster access than a complete representation. As a trade-off between efficiency and preciseness, for any query of $$x \in S$$, a Bloom filter always outputs a yes-answer, and for any query of $$x \not\in S$$, it should output a yes-answer only with small probability. In other words, a no-answer given by a Bloom filter indicates unambiguously that $$x \not\in S$$, while a yes-answer indicates that $$x \in S$$ probably holds2, that is, it allows false positives. Ideally, the error probability that a Bloom filter returns a false positive should be as small as possible.
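For concreteness, here is a minimal textbook Bloom filter in Python (the classical construction, not the paper’s resilient one; the parameter values and SHA-256-based hashing are arbitrary choices for illustration):

```python
import hashlib

class BloomFilter:
    """A classic Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m_bits=1024, k_hashes=5):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity

    def _indexes(self, item):
        # Derive k independent-looking indexes by salting a single hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for j in self._indexes(item):
            self.bits[j] = 1

    def query(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[j] for j in self._indexes(item))
```

Completeness is immediate: every bit set by `add` stays set, so members always answer yes; a non-member answers yes only if all of its $$k$$ bits happen to collide with set bits.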

Addressing the commonly seen set membership problem, Bloom filters have been widely implemented in real-world applications, specifically as internal data representations for optimizing large-scale software systems. For example, Akamai’s CDN3 servers maintain in-memory Bloom filters to decide whether to look up the disk cache for a requested resource; a false positive of the Bloom filter causes the server to make an unnecessary disk lookup (a disk cache miss) at an expense of time and system workload, so if an attacker exploits the behavior of the Bloom filter, it is possible for them to cast queries that degrade the disk cache hit rate of the CDN servers and, consequently, perform a Denial-of-Service (DoS) attack. In another scenario, where Bitcoin clients apply Bloom filters in the Simplified Payment Verification (SPV) mode to increase the overall performance of wallet synchronization, an adversary may perform a DoS attack on an SPV node by learning from the responses of the Bloom filters they have access to.

As discussed above, the adversarial model addresses real security issues, so the necessity arises of defining security in adversarial environments and constructing provably secure Bloom filters. Essentially, it is desirable for a well-constructed Bloom filter to maintain its small error probability in an adversarial environment; we say that such a Bloom filter is adversarial resilient (or just resilient). Consider an adversarial game in which an adversary has oracle access to the Bloom filter and is allowed to make up to $$t$$ queries before outputting some $$x^*$$ (that has not been queried before) which it believes to be a false positive; if it is, the adversary wins the game. We say that a Bloom filter is $$(n,t,\varepsilon)$$-adversarial resilient if, when initialized over sets of size $$n$$, after $$t$$ queries the probability of $$x^*$$ being a false positive is at most $$\varepsilon$$. A Bloom filter that is resilient for any polynomial number of queries is said to be strongly resilient.

Clearly, a trivial construction of a strongly resilient Bloom filter would be a deterministic lookup table that stores $$S$$ precisely, so that there is no false positive which an adversary can find. However, such a construction does not take advantage of the space and time efficiency as a normal Bloom filter would do, since it stores every element in the memory. In the following, we consider only non-trivial Bloom filters, and we show that for a non-trivial Bloom filter to be adversarial-resilient, one-way functions must exist; that is, if one-way functions do not exist, then any non-trivial Bloom filter can be attacked with a non-negligible probability by an efficient adversary. Furthermore, under the assumption that one-way functions exist, a pseudorandom permutation (PRP) can be used to construct a strongly resilient Bloom filter which has a reasonable memory consumption.

The construction of a Bloom filter consists of two algorithms: an initialization algorithm that gets a set $$S$$ and outputs a memory-efficient representation of $$S$$; a query algorithm that gets a representation of $$S$$ and an $$x$$ to be checked and outputs $$1$$ if $$x \in S$$, otherwise $$0$$. Typically, the initialization algorithm is randomized but the query algorithm is deterministic, that is, a query operation does not amend the existing representation. We say that such a Bloom filter has a steady representation.4

# 2. Definitions

In the following model, we consider a universal set $$U$$ and a subset $$S \subset U$$ to be stored in a Bloom filter. We denote that $$u=|U|$$ and $$n=|S|$$.

Definition 1. (Steady-representation Bloom filter) Let $$\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)$$ be a pair of polynomial-time algorithms, where $$\mathbf{B}_1$$ is a randomized algorithm that gets as input a set $$S$$ and outputs a representation $$M$$, and $$\mathbf{B}_2$$ is a deterministic algorithm that gets as input a representation $$M$$ and a query element $$x \in U$$. We say that $$\mathbf{B}$$ is an $$(n,\varepsilon)$$-Bloom filter (with a steady representation) if for any set $$S \subset U$$ of size $$n$$ it holds that:

1. $$\forall x \in S, \Pr[\mathbf{B}_2(\mathbf{B}_1(S), x) = 1] = 1$$ (Completeness)

2. $$\forall x \not\in S, \Pr[\mathbf{B}_2(\mathbf{B}_1(S), x) = 1] \leq \varepsilon$$ (Soundness)

where the probability is taken over the randomness used by the algorithm $$\mathbf{B}_1$$.

Intuitively, the first property (completeness) says that for all elements in the set $$S$$, the Bloom filter is guaranteed to output a yes-answer correctly; the second property (soundness) gives the upper bound that the Bloom filter outputs a false positive, that is, the query algorithm returns $$1$$ when an element does not actually belong to the set $$S$$. Formally,

False positive and error rate. Given a representation $$M$$ of $$S$$, if $$x \not\in S$$ and $$\mathbf{B}_2(M,x)=1$$, we say that $$x$$ is a false positive. And we say that the probability bound $$\varepsilon$$ of outputting false positives is the error rate of $$\mathbf{B}$$.

In an adversarial environment, consider the following experiment for any Bloom filter $$\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)$$, adversary $$\mathcal{A}$$, value $$t$$ as the bound of the amount of queries which $$\mathcal{A}$$ can make, and value $$\lambda$$ as the security parameter.

The Bloom filter resilience challenge experiment $$\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda)$$:

1. $$M \leftarrow \mathbf{B}_1(1^\lambda,S)$$.5
2. $$x^* \leftarrow \mathcal{A}^{\mathbf{B}_2(M,\cdot)}(1^\lambda,S)$$, where $$\mathcal{A}$$ performs at most $$t$$ queries $$x_1,\dots,x_t$$ to the oracle $$\mathbf{B}_2(M,\cdot)$$. Note that $$\mathcal{A}$$ has only oracle access to the Bloom filter and cannot see the representation $$M$$.6
3. The output of the experiment is defined to be $$1$$, if $$x^* \not\in S \cup \{x_1,\dots,x_t\}$$ and $$\mathbf{B}_2(M,x^*)=1$$, and $$0$$ otherwise. If the output of the experiment is $$1$$, we say that $$\mathcal{A}$$ succeeds.

Definition 2. (Adversarial-resilient Bloom filter) Let $$\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)$$ be an $$(n,\varepsilon)$$-Bloom filter (with a steady representation). We say that $$\mathbf{B}$$ is an $$(n,t,\varepsilon)$$-adversarial resilient Bloom filter if for any set $$S$$ of size $$n$$, for all sufficiently large $$\lambda \in \mathbb{N}$$ and for all probabilistic polynomial-time adversaries $$\mathcal{A}$$, it holds that $\Pr[\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda) = 1] \leq \varepsilon$

where the probability is taken over the randomness used by the algorithm $$\mathbf{B}_1$$ and $$\mathcal{A}$$.

To define the non-triviality of a Bloom filter formally, notice that it is always desirable to minimize the memory use of the Bloom filter. Let $$\mathbf{B}$$ be an $$(n,\varepsilon)$$-Bloom filter that uses $$m$$ bits of memory. It is shown that the lower bound of $$m$$ is $$m \geq n \log \frac{1}{\varepsilon}$$. Thus, we define

Definition 3. (Minimal error) Let $$\mathbf{B}$$ be an $$(n,\varepsilon)$$-Bloom filter. We say that $$\varepsilon_0 = 2^{-\frac{m}{n}}$$ is the minimal error of $$\mathbf{B}$$.

As mentioned previously, a trivial construction of Bloom filters is a lookup table that stores $$S$$ precisely, in which case the memory use is $$m=\log \binom{u}{n} \approx n \log(\frac{u}{n})$$; by the bound $$m \geq n \log \frac{1}{\varepsilon}$$, a construction is therefore trivial if $$\varepsilon \leq \frac{n}{u}$$, as it then requires at least as much memory as an exact representation. On the other hand, if $$\varepsilon$$ is negligible in $$n$$ (e.g., when $$\varepsilon \leq \frac{n}{u}$$ and $$u$$ is super-polynomial in $$n$$), then every polynomial-time adversary has only a negligible probability of finding any false positive, so resilience holds trivially. Noticing that $$\varepsilon_0 \leq \varepsilon$$, we then define

Definition 4. (Non-trivial Bloom filter) Let $$\mathbf{B}$$ be an $$(n,\varepsilon)$$-Bloom filter and let $$\varepsilon_0$$ be the minimal error of $$\mathbf{B}$$. We say that $$\mathbf{B}$$ is non-trivial if there exists a constant $$c \geq 1$$ such that $$\varepsilon_0 > \max\{\frac{n}{u},\frac{1}{n^c}\}$$.

# 3. Resilient Bloom Filters and One-Way Functions

We now show that the existence of adversarial resilient Bloom filters depends on the existence of one-way functions, that is, if any non-trivial, strongly resilient Bloom filter exists, then one-way functions also exist.

Theorem 5. Let $$\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)$$ be any non-trivial Bloom filter (with a steady representation) of $$n$$ elements that uses $$m$$ bits of memory, and let $$\varepsilon_0$$ be the minimal error of $$\mathbf{B}$$. If $$\mathbf{B}$$ is $$(n,t,\varepsilon)$$-adversarial resilient for $$t=\mathcal{O}(\frac{m}{\varepsilon_0^2})$$, then one-way functions exist.

Proof. First we assume that one-way functions do not exist, then we show that we can construct a polynomial-time adversary $$\mathcal{A}$$ such that $\Pr[\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda) = 1] > \varepsilon$ given a fixed value $$\varepsilon$$. That is, $$\mathbf{B}$$ cannot be $$(n,t,\varepsilon)$$-adversarial resilient.

Define the following function $$f$$: $f(S,r,x_1,\dots,x_t) = (x_1,\dots,x_t,\mathbf{B}_2(M,x_1),\dots,\mathbf{B}_2(M,x_t))$ where $$S$$ is a set of size $$n$$, $$r$$ is the random string used by $$\mathbf{B}_1$$, $$M$$ is the representation of $$S$$ generated by $$\mathbf{B}_1$$ using randomness $$r$$, and $$t=\frac{200m}{\varepsilon_0}$$. Clearly, $$f$$ is polynomial-time computable.

Since $$f$$ is not a one-way function (under the assumption that one-way functions do not exist), there is also an algorithm that can invert $$f$$ efficiently. Thus we have,

Claim 6. Assume that one-way functions do not exist, there exists a polynomial-time algorithm $$\mathcal{A}$$ that inverts $$f$$ with a failure probability of at most $$\frac{1}{100}$$: $\Pr[f(\mathcal{A}(f(S,r,x_1,\dots,x_t))) \neq f(S,r,x_1,\dots,x_t)] < \frac{1}{100}$

Proof. Because $$f$$ is not a one-way function, there exists an algorithm $$\mathcal{A}'$$ such that $$\Pr[\mathsf{Invert}_{\mathcal{A}',f}(n) = 1] \geq \frac{1}{p(n)}$$, where $$p(n)$$ is polynomial in $$n$$. Construct an algorithm $$\mathcal{A}$$ that runs $$\mathcal{A}'$$ independently $$\lceil\frac{\log 100}{\log(p(n)) - \log(p(n)-1)}\rceil$$ times, so the total failure probability is $$\Pr[f(\mathcal{A}(f(S,r,x_1,\dots,x_t))) \neq f(S,r,x_1,\dots,x_t)] < \left(1-\frac{1}{p(n)}\right)^{\lceil\frac{\log 100}{\log(p(n)) - \log(p(n)-1)}\rceil} \leq \frac{1}{100}$$

Using $$\mathcal{A}$$, construct the following probabilistic polynomial-time algorithm $$\mathsf{Attack}$$:

The Algorithm $$\mathsf{Attack}$$

$$\mathsf{Attack}$$ is given oracle access to the query algorithm $$\mathbf{B}_2(M,\cdot)$$, and gets $$1^\lambda$$ as input.

1. For $$i \in \{1,\dots,t\}$$, sample $$x_i \in U$$ uniformly, and query $$y_i = \mathbf{B}_2(M,x_i)$$.
2. Run $$\mathcal{A}$$ (the inverter of $$f$$) and get $$(S',r',x_1,\dots,x_t) \leftarrow \mathcal{A}(x_1,\dots,x_t,y_1,\dots,y_t)$$.
3. Compute $$M' \overset{r'}{\leftarrow} \mathbf{B}_1(1^\lambda,S')$$, using $$r'$$ as the randomness bits in the initialization.
4. For $$k=1,\dots,\frac{100}{\varepsilon_0}$$, do:
1. Sample $$x^* \in U$$ uniformly.
2. If $$\mathbf{B}_2(M',x^*)=1$$ and $$x^* \not\in \{x_1,\dots,x_t\}$$, output $$x^*$$ and HALT.
5. Sample $$x^* \in U$$ uniformly, and output $$x^*$$.

Claim 7. Assume that $$\mathcal{A}$$ inverts $$f$$ successfully. For any representation $$M$$, the probability that there exists a representation $$M'$$ which agrees with $$M$$ on all queries ($$\mathbf{B}_2(M,x_i)=\mathbf{B}_2(M',x_i)$$ for all $$i \in \{1,\dots,t\}$$) yet disagrees with $$M$$ at rate $$\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] > \frac{\varepsilon_0}{100}$$ is at most $$\frac{1}{100}$$.

Proof. From the error rate of any $$x$$ and the independence of the choice of $$x_i$$, we get $\Pr[\forall i \in \{1,\dots,t\} : \mathbf{B}_2(M,x_i) = \mathbf{B}_2(M',x_i)] \leq \left(1 - \frac{\varepsilon_0}{100}\right)^t$

Since the Bloom filter uses $$m$$ bits of memory, there are $$2^m$$ possible representations as candidates for $$M'$$. Thus, by union bound, $\Pr[\exists M' \ \forall i \in \{1,\dots,t\} : \mathbf{B}_2(M,x_i) = \mathbf{B}_2(M',x_i)] \leq 2^m \left(1 - \frac{\varepsilon_0}{100}\right)^t \leq \frac{1}{100}$

Since $$\mathcal{A}$$ is assumed to invert $$f$$ successfully, it must output a representation $$M'$$ such that for $$i \in \{1,\dots,t\}$$, $$\mathbf{B}_2(M,x_i)=\mathbf{B}_2(M',x_i)$$. Therefore, the above bound holds.

Define $$\mu(M)=\Pr_{x \in U}[\mathbf{B}_2(M,x)=1]$$ as the positive rate over $$U$$, we now show that for almost all possible representations $$M$$ generated from set $$S$$ and randomness $$r$$, it holds true that $$\mu(M) > \frac{\varepsilon_0}{8}$$, with only a negligible probability of error:

Claim 8. $$\Pr_S[\exists r : \mu(M_r^S) \leq \frac{\varepsilon_0}{8}] \leq 2^{-n}$$.

Proof. Let $$\mathsf{BAD}$$ be the set of all sets $$S$$ such that there exists an $$r$$ such that $$\mu(M_r^S) \leq \frac{\varepsilon_0}{8}$$. Given $$S \in \mathsf{BAD}$$, let $$\hat{S}$$ be the set of all elements $$x$$ such that $$\mathbf{B}_2(M_r^S,x)=1$$, then $$|\hat{S}| \leq \frac{\varepsilon_0}{8} \cdot u$$. Notice that we can encode the set $$S$$ using the representation $$M_r^S$$ while specifying $$S$$ among all subsets of $$\hat{S}$$ of size $$n$$, and the number of encoding bits must be at least $$\log|\mathsf{BAD}|$$ (the number of bits required to index a set in $$\mathsf{BAD}$$): $\log|\mathsf{BAD}| \leq m + \log\binom{\frac{\varepsilon_0 u}{8}}{n} \leq m + n \log\left(\frac{\varepsilon_0 u}{8}\right) - n \log n + 2n \leq -n + \log\binom{u}{n}$ thus $$|\mathsf{BAD}| \leq 2^{-n}\binom{u}{n}$$. Since the number of possible sets $$S$$ is $$\binom{u}{n}$$, $$\Pr_S[\exists r : \mu(M_r^S) \leq \frac{\varepsilon_0}{8}] \leq 2^{-n}$$.

Claim 9. Assume that $$\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] \leq \frac{\varepsilon_0}{100}$$ and that $$\mu(M) > \frac{\varepsilon_0}{8}$$. The probability that $$\mathsf{Attack}$$ does not halt on Step 4 is at most $$\frac{1}{100}$$.

Proof. It follows directly from the assumptions that $$\mu(M') > \frac{\varepsilon_0}{8} - \frac{\varepsilon_0}{100} > \frac{\varepsilon_0}{10}$$. For convenience, let $$\mathcal{X} = \{x_1,\dots,x_t\}$$ and $$\hat{S'} = \{x : \mathbf{B}_2(M',x)=1\}$$. We have that

$E[|\hat{S'} \cap \mathcal{X}|] = t \cdot \mu(M') > \frac{200m}{\varepsilon_0} \cdot \frac{\varepsilon_0}{10} = 20m$

By Chernoff bound with a probability of at least $$(1-e^{-\Omega(m)})$$ we have that $$|\hat{S'} \cap \mathcal{X}| < 40m$$, $|\hat{S'} \backslash \mathcal{X}|=|\hat{S'}|-|\hat{S'} \cap \mathcal{X}| > |\hat{S'}| - 40m \geq \frac{\varepsilon_0 u}{10} - 40m$

Case 1. $$u=n^d$$ ($$d$$ is a constant). We show that $$|\hat{S'} \backslash \mathcal{X}| \geq 1$$, so that an exhaustive search over the universal set $$U$$ is efficient and guaranteed to find an element $$x^*$$ in $$\hat{S'} \backslash \mathcal{X}$$. Let $$c$$ be a constant such that $$\varepsilon_0 > \frac{1}{n^c}$$, $$\varepsilon_0 < \frac{1}{n^{c-1}}$$. We have $$\frac{u}{n} = n^{d-1} \geq \frac{1}{\varepsilon_0} > n^{c-1}$$, then $$d-c>1$$. Moreover, $$m \leq n \log\frac{u}{n} \leq nd \log n$$, thus, $|\hat{S'} \backslash \mathcal{X}| \geq \frac{\varepsilon_0 u}{10} - 40m \geq \frac{n^{d-c}}{10} - 40nd \log n > 1$

Case 2. $$u$$ is super-polynomial in $$n$$. We show that the fraction of $$|\hat{S'} \backslash \mathcal{X}|$$ is large enough so that sampling can find an $$x^*$$, with only a small failure probability. Since $$\frac{\varepsilon_0}{20}$$ is polynomial in $$\frac{1}{n}$$ but $$\frac{40m}{u} \leq \frac{40n \log u}{u}$$ is negligible, we get that $$\frac{\varepsilon_0}{20} > \frac{40m}{u}$$. It follows from $$\frac{|\hat{S'} \backslash \mathcal{X}|}{u} > \frac{\varepsilon_0}{10} - \frac{40m}{u}$$ that $$\frac{|\hat{S'} \backslash \mathcal{X}|}{u} > \frac{\varepsilon_0}{20}$$. Thus, the probability of sampling $$x \not\in \hat{S'} \backslash \mathcal{X}$$ in all $$k$$ attempts is bounded by $\left(1-\frac{\varepsilon_0}{20}\right)^k = \left(1-\frac{\varepsilon_0}{20}\right)^\frac{100}{\varepsilon_0} < \frac{1}{100}$

In both cases, the probability that $$\mathsf{Attack}$$ fails to find an $$x^*$$ and halts at Step 4 is less than $$\frac{1}{100}$$.

Claim 10. $$\Pr[\mathbf{B}_2(M',x^*)=1 \wedge \mathbf{B}_2(M,x^*)=0] \leq \frac{1}{100}$$

Proof. This follows from the assumption that $$\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] \leq \frac{\varepsilon}{100}$$.

Claims 6, 7, 8 and 10 cover all the cases in which $$\mathsf{Attack}$$ fails; each occurs only if the respective assumptions hold, so together they bound the failure probability from above. Taking a union bound, the total failure probability is at most $$\frac{4}{100}$$. Thus we have constructed a polynomial-time adversary $$\mathsf{Attack}$$ such that $\Pr[\mathsf{Challenge}_{\mathsf{Attack},\mathbf{B},t}(\lambda) = 1] \geq 1-\frac{4}{100} > \varepsilon$ so $$\mathbf{B}$$ cannot be $$(n,t,\varepsilon)$$-adversarial resilient. This contradiction shows that, under the assumption that one-way functions do not exist, such adversarial resilient Bloom filters do not exist either. By modus tollens, if non-trivial $$(n,t,\varepsilon)$$-adversarial resilient Bloom filters exist, then one-way functions exist.

In Theorem 5 we showed that the existence of adversarial resilient Bloom filters implies the existence of one-way functions. Conversely, assuming that one-way functions exist (and hence pseudorandom permutations exist), they can be used to construct non-trivial, strongly adversarial resilient Bloom filters. We have

Proposition 11. (Construction using pseudorandom permutations)7 Let $$\mathbf{B}$$ be an $$(n,\varepsilon)$$-Bloom filter using $$m$$ bits of memory. If pseudorandom permutations exist, then for any security parameter $$\lambda$$, there exists an $$(n,\varepsilon+\mathsf{negl}(\lambda))$$-strongly resilient Bloom filter that uses $$m'=m+\lambda$$ bits of memory.
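The idea of the construction is simple enough to sketch: store a $$\lambda$$-bit secret key alongside the original filter (accounting for the extra $$\lambda$$ bits of memory) and pass every element through the keyed permutation before insertion and query, so an adversary without the key effectively makes random queries against the underlying filter. Below is a minimal Python sketch under these assumptions; `SimpleBloomFilter` is a hypothetical textbook implementation, and keyed BLAKE2b merely stands in for the pseudorandom permutation (a real instantiation would use an actual PRP):

```python
import hashlib

class SimpleBloomFilter:
    """A plain (n, eps)-Bloom filter: m bits, k hash functions."""
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item: bytes) -> bool:
        return all(self.bits[p] for p in self._positions(item))

class ResilientBloomFilter:
    """Sketch of Proposition 11: apply a keyed permutation before the filter.

    The lambda-bit key accounts for the extra memory (m' = m + lambda);
    keyed BLAKE2b stands in for the pseudorandom permutation.
    """
    def __init__(self, m: int, k: int, key: bytes):
        self.key = key
        self.inner = SimpleBloomFilter(m, k)

    def _mask(self, item: bytes) -> bytes:
        return hashlib.blake2b(item, key=self.key, digest_size=16).digest()

    def insert(self, item: bytes) -> None:
        self.inner.insert(self._mask(item))

    def query(self, item: bytes) -> bool:
        return self.inner.query(self._mask(item))

bf = ResilientBloomFilter(m=1024, k=4, key=b"\x00" * 16)
bf.insert(b"alice")
assert bf.query(b"alice")  # an inserted element always answers yes
```

Without the key, an adversary cannot steer its queries toward false positives of the underlying filter any better than by querying at random, which is exactly the resilience the proposition claims (up to the $$\mathsf{negl}(\lambda)$$ term).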

Unsteady representation and computationally unbounded adversary. The above discussion about Bloom filters assumes that a steady representation is used, that is, the query algorithm $$\mathbf{B}_2$$ is deterministic. Some implementations allow $$\mathbf{B}_2$$ to change the internal representation, so querying can also be probabilistic. Further results regarding unsteady representations may be found in , with the ACD8 framework proposed in . Moreover, some results are shown to hold even for computationally unbounded adversaries.

Bloom filters and secrecy. Bloom filters, like hash functions, are not designed as encryption schemes. Thus, even adversarial resilient Bloom filters may leak considerable information in an unintended way. As a concrete example, in Bitcoin lightweight SPV clients, which rely on Bloom filters to store users’ Bitcoin addresses, an adversary can efficiently extract information about these addresses. See  for a discussion on this interesting scenario.

# References

 B. M. Maggs and R. K. Sitaraman, “Algorithmic nuggets in content delivery,” ACM SIGCOMM Computer Communication Review, vol. 45, no. 3, pp. 52–66, 2015.

 R. Benjamin and E. K. Yasmine, “Attacks on Bitcoin,” 2015.

 L. Carter, R. Floyd, J. Gill, G. Markowsky, and M. Wegman, “Exact and approximate membership testers,” in Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, 1978, pp. 59–65.

 J. Katz and Y. Lindell, “Introduction to modern cryptography,” 2014.

 M. Naor and E. Yogev, “Bloom filters in adversarial environments,” in Annual Cryptology Conference, 2015, pp. 565–584.

 M. Naor and G. N. Rothblum, “Learning to impersonate,” in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 649–656.

 A. Gervais, S. Capkun, G. O. Karame, and D. Gruber, “On the privacy provisions of Bloom filters in lightweight Bitcoin clients,” in Proceedings of the 30th Annual Computer Security Applications Conference, 2014, pp. 326–335.

1. We will only consider polynomial-time adversaries here.

2. For this reason, a yes-answer is also referred to as a maybe-answer in some texts.

3. A CDN (Content Delivery Network) is a globally distributed network of proxy servers deployed in multiple data centers, in order to serve cached content to end-users with high availability and high performance.

4. In this work, we will consider only steady representations.

5. The security parameter $$\lambda$$ is implied in the initialization algorithm $$\mathbf{B}_1$$, thus we denote it as $$\mathbf{B}_1(1^\lambda,S)$$.

6. To strengthen our definition, we assume that the adversary also gets the set $$S$$; the requirement is that the adversary can hardly find any false positive even when given $$S$$.

7. A proof of this can be found in .

8. Learning model of adaptively changing distributions (ACD).

]]>
The Decisional Hardness tag:www.soimort.org,2017:/mst/1 2016-11-01T00:00:00+01:00 2016-11-20T00:00:00+01:00 Mort Yao

(20 Nov 2016) Correction: P ≠ NP is not sufficient to imply that one-way functions exist. See P versus NP problem and one-way functions.

Intro. Starting from November, I’ll summarize my study notes on wiki into weekly blog posts. I always wanted to keep my study progress on track; I feel that it’s hard even to convince myself without visible footprints.

So here we have the first episode. (Hopefully it won’t be the last one)

Asymptotic notation is an important tool in analyzing the time/space efficiency of algorithms.

• Limit
• Formal definition of limit (the (ε, δ)-definition) in calculus. Note that limits involving infinity are closely related to asymptotic analysis. In addition to basic limit rules, L’Hôpital’s rule is also relevant.
• Asymptotic notation
• Introduction to the Bachmann–Landau notation family (among them are the most widely-used Big O notation and Big Theta notation).
• The master theorem is used to find asymptotic bounds for recurrences. This is particularly helpful when analyzing recursive algorithms (e.g., binary search, merge sort, tree traversal).
• Based on common orders of asymptotic running time using Big O notation, we can categorize algorithms into various time complexity classes (among them are P, DLOGTIME, SUBEXP and EXPTIME). Note that we have not formally defined the words “algorithm” and “complexity class” yet.
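As a worked example of the master theorem: merge sort satisfies the recurrence $$T(n) = 2T(n/2) + \Theta(n)$$, so $$a=2$$, $$b=2$$ and $$n^{\log_b a} = n^{\log_2 2} = n$$. Since $$f(n) = \Theta(n) = \Theta(n^{\log_b a})$$, the second case of the theorem applies: $T(n) = \Theta(n^{\log_b a} \log n) = \Theta(n \log n)$ Similarly, binary search satisfies $$T(n) = T(n/2) + \Theta(1)$$ with $$n^{\log_2 1} = n^0 = 1$$; again by the second case, $$T(n) = \Theta(\log n)$$.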

For decision problems, we now formally define the time complexity classes P and NP, and propose the hardness of NP-complete problems, which plays an indispensable role in the study of algorithm design and modern cryptography.

• Formal language
• Formal definition of language. We will revisit this when studying formal grammars like the context-free grammar and parsing techniques for compilers. For now, it suffices to know that binary strings are a common encoding for all kinds of problems (especially decision problems).
• Decidability
• Among all abstract problems, we are mostly interested in decision problems.
• The decidability of a language depends on whether there exists an algorithm that decides it.
• Reducibility
• Polynomial-time reduction is a commonly used technique that maps one language to another.
• What is a hard language for a complexity class; what is a complete language for a complexity class.
• Time complexity
• Encodings of concrete problems matter. Normally we would choose a “standard encoding” for our language of interest.
• Polynomial-time algorithms are considered to be efficient and languages which have polynomial-time algorithms that decide them are considered tractable.
• P is the time complexity class of all problems that are polynomial-time solvable.
• NP is the time complexity class of all problems that are polynomial-time verifiable.
• NP-completeness
• The set of languages that are complete for the complexity class NP, that is, the “hardest problems” in NP.
• NP-complete problems are central in answering the open question whether P = NP.
• We (informally) show that every NP problem is polynomial-time reducible to CIRCUIT-SAT, hence CIRCUIT-SAT is NP-complete.
• Other problems (SAT, 3-CNF-SAT, CLIQUE, VERTEX-COVER, HAM-CYCLE, TSP, SUBSET-SUM) are polynomial-time reducible from one to another, and are thus also shown to be NP-complete.
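To make one of these reductions concrete, here is a sketch in Python of the standard reduction from 3-CNF-SAT to CLIQUE (function names are illustrative): create a vertex for each literal occurrence, connect literals from different clauses that are not negations of each other, and ask for a clique of size equal to the number of clauses.

```python
from itertools import combinations

def sat_to_clique(clauses):
    """Polynomial-time reduction: CNF formula -> (vertices, edges, k).

    A clause is a tuple of nonzero ints; -v denotes the negation of v.
    The formula is satisfiable iff the graph has a clique of size
    k = len(clauses): pick one true literal per clause.
    """
    vertices = [(i, lit) for i, clause in enumerate(clauses)
                for lit in clause]
    edges = {(u, v) for u, v in combinations(vertices, 2)
             if u[0] != v[0] and u[1] != -v[1]}
    return vertices, edges, len(clauses)

def has_clique(vertices, edges, k):
    """Brute-force clique check (exponential; only for tiny instances)."""
    adjacent = lambda u, v: (u, v) in edges or (v, u) in edges
    return any(all(adjacent(u, v) for u, v in combinations(c, 2))
               for c in combinations(vertices, k))

# (x1 or x2 or x3) and (not x1 or not x2 or x3) is satisfiable (x3 = true)
V, E, k = sat_to_clique([(1, 2, 3), (-1, -2, 3)])
assert has_clique(V, E, k)

# (x1) and (not x1) is unsatisfiable, so no clique of size 2 exists
V2, E2, k2 = sat_to_clique([(1,), (-1,)])
assert not has_clique(V2, E2, k2)
```

Note that only the reduction `sat_to_clique` runs in polynomial time; the brute-force `has_clique` is there merely to check the equivalence on tiny instances.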

Computational hardness assumption P ≠ NP. Although it is still an open proposition, many believe that P ≠ NP. Notably, if P ≠ NP holds true,

1. If a decision problem is not polynomial-time solvable in the general case, we should strive to find approximation or randomized algorithms; exact algorithms cannot run in worst-case polynomial time and thus may not be efficient. This applies to optimization problems too.
2. One-way functions exist, which implies that pseudorandom generators and functions exist. Consequently, many cryptographic constructions (private-key encryption, MACs, etc.) are provably computationally secure.

(I stand corrected: There is no known proof that P ≠ NP implies the existence of one-way functions. Conversely, however, the existence of one-way functions does imply that P ≠ NP. There is an informal argument given by Peter Shor on StackExchange1, rephrased in the section P versus NP problem and one-way functions.)

Later we will cover the notion of security in cryptography, so here is a refresher of basic probability. (Probability is also used extensively in analyzing the behaviors of non-deterministic algorithms, hash functions, etc.)

• Probability
• An intuitive introduction to basic probability theory based on Kolmogorov’s axioms, including the union bound (Boole’s inequality) and its generalized form Bonferroni inequalities, the conditional probability and Bayes’ theorem. We will revisit the notion of probability space when coming to measure theory.
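The union bound is worth internalizing because it holds pointwise: the indicator of a union never exceeds the sum of the indicators, so the inequality survives any dependence between the events. A small simulation (the three events are chosen arbitrarily for illustration):

```python
import random

random.seed(0)
# Three overlapping events over the sample space {0, ..., 99}.
events = [lambda x: x % 2 == 0,   # probability 1/2
          lambda x: x % 3 == 0,   # probability 34/100
          lambda x: x < 20]       # probability 1/5
union_hits, each_hits = 0, [0, 0, 0]
for _ in range(100_000):
    x = random.randrange(100)
    union_hits += any(e(x) for e in events)
    for i, e in enumerate(events):
        each_hits[i] += e(x)

# Boole's inequality: Pr[A1 u A2 u A3] <= Pr[A1] + Pr[A2] + Pr[A3].
# It holds per sample, so the raw counts must satisfy it as well.
assert union_hits <= sum(each_hits)
```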

Plan for next week:

• (Algorithms) More involved NP-complete problems. Exact algorithms. Approximation algorithms. Probabilistic algorithms.
• (Cryptography) Information-theoretic/computational security (semantic security, IND, IND-CPA, IND-CCA). Private-key encryption. Message authentication codes. Hash functions. Theoretical constructions (one-way functions, pseudorandomness). Practical constructions (Feistel network, substitution-permutation network, DES, AES).

## P versus NP problem and one-way functions

Consider the following map: $f : (x, r) \to s$ where $$x$$ is an arbitrary bit string, $$r$$ is a string of random bits, and $$s$$ is an instance of a $$k$$-SAT problem having $$x$$ as a planted solution, while the randomness of $$r$$ determines uniquely which $$k$$-SAT problem to choose.

If we can invert the above function $$f$$ (in polynomial time), we must already have solved the corresponding $$k$$-SAT problem $$s$$ with a planted solution $$x$$. $$k$$-SAT problems are known to be NP-complete, and inverting such a function would be as hard as solving a $$k$$-SAT problem with a planted solution; that is, inverting $$f$$ at one point can be hard. Clearly, should a one-way function exist, inverting it is guaranteed to be no easier than inverting $$f$$.
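A toy version of this map can be sketched as follows (illustrative only; `plant_3sat` and its parameters are made up for this note, and genuine hardness would require a much more careful distribution over clauses): use the random bits $$r$$ to draw clauses uniformly, keeping only those satisfied by the planted assignment $$x$$.

```python
import random

def plant_3sat(x, r, num_clauses):
    """Generate a random 3-SAT instance with planted solution x.

    x: list of booleans (the planted assignment)
    r: integer seed standing in for the random bit string r
    Each clause is drawn uniformly among clauses satisfied by x,
    via rejection sampling (a random clause is satisfied w.p. 7/8).
    """
    rng = random.Random(r)
    n = len(x)
    clauses = []
    while len(clauses) < num_clauses:
        picked = rng.sample(range(n), 3)
        clause = [(v, rng.random() < 0.5) for v in picked]  # (var, negated?)
        if any(x[v] != neg for v, neg in clause):           # satisfied by x?
            clauses.append(clause)
    return clauses

def satisfies(assignment, clauses):
    return all(any(assignment[v] != neg for v, neg in clause)
               for clause in clauses)

x = [True, False, True, True, False]
instance = plant_3sat(x, r=42, num_clauses=20)
assert satisfies(x, instance)  # the planted x always satisfies the output
```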

So what does it mean if P ≠ NP? We know that the $$k$$-SAT problem is then hard to solve in the worst case, so the function $$f$$ can be made as hard to invert as solving a $$k$$-SAT problem in the worst case. However, we don’t know whether it’s possible to have a class $$\mathcal{S}$$ of $$k$$-SAT problems with planted solutions that are as hard as general-case $$k$$-SAT problems. If such a class $$\mathcal{S}$$ exists, then given any $$s \in \mathcal{S}$$, no probabilistic polynomial-time algorithm is able to recover $$x$$ with non-negligible probability, so we can conclude that $$f$$ is indeed a one-way function.

Problem 1.1. Does there exist a class $$\mathcal{S}$$ of $$k$$-SAT problems with planted solutions, such that every $$L \in \mathcal{S}$$ is NP-hard?

Conjecture 1.2. If $$\mathrm{P} \neq \mathrm{NP}$$, then one-way functions exist.

On the other hand, assume that $$f$$ is a one-way function (so that one-way functions do exist); this implies that the $$k$$-SAT problem is hard to solve (in the worst case) by a polynomial-time algorithm, thus we have P ≠ NP. By modus tollens, if P = NP, then no one-way function exists.

Theorem 1.3. If one-way functions exist, then $$\mathrm{P} \neq \mathrm{NP}$$.

Proof. (Sketch) Let $$f : \{0,1\}^*\to\{0,1\}^*$$ be a one-way function. There is a polynomial-time algorithm $$M_f$$ that computes $$y=f(x)$$ for all $$x$$; thus, there exists a polynomial-size circuit family that outputs $$y=f(x)$$ for all $$x$$.

Since $$f$$ is a one-way function, for every probabilistic polynomial-time algorithm $$\mathcal{A}$$ there is a negligible function $$\mathsf{negl}$$ such that $$\Pr[\mathsf{Invert}_{\mathcal{A},f}(n) = 1] \leq \mathsf{negl}(n)$$; hence no $$\mathcal{A}$$ can fully compute $$f^{-1}$$. But $$\mathcal{A}$$ could fully compute $$f^{-1}$$ if and only if it could solve the corresponding CIRCUIT-SAT problems of the circuit in all cases. Thus, there must exist CIRCUIT-SAT instances that cannot be decided by a polynomial-time algorithm, therefore $$\mathrm{P} \neq \mathrm{NP}$$.

Remark 1.4. If one can come up with a construction of a one-way function, or a proof that such functions exist, then it holds true that $$\mathrm{P} \neq \mathrm{NP}$$.

Books:

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed.

M. Sipser, Introduction to the Theory of Computation, 3rd ed.

J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2nd ed.

Papers:

 A. L. Selman, “A survey of one-way functions in complexity theory,” Mathematical Systems Theory, vol. 25, no. 3, pp. 203–221, 1992.

 M. Abadi, E. Allender, A. Broder, J. Feigenbaum, and L. A. Hemachandra, “On generating solved instances of computational problems,” in Proceedings on Advances in Cryptology, 1990, pp. 297–310.

]]>