\[\text{FL} \subsetneq \text{RL} \subsetneq \text{DCFL} \subsetneq \text{CFL} \subsetneq \text{CSL} \subsetneq \text{R} \subsetneq \text{U} \subsetneq \mathcal{P}(\mathcal{P}(\Sigma^*)) \]
Grammars provide a mathematical, recursive perspective for describing languages, while automata provide a physically implementable, iterative approach to recognizing languages (via finite states and possibly infinite stacks). A language is Turing-recognizable if and only if it has a well-defined (unrestricted) grammar, so that it is possible to construct some Turing machine that recognizes it, although such a grammar can sometimes be hard to formulate. Quite naturally, if we cannot even specify a grammar for a language \(L \subseteq \Sigma^*\) theoretically, it would be impossible to construct a Turing machine that recognizes it.
We consider mainly three types of decision problems concerning different computational models (generative grammars may be viewed as a form of machine, like automata, in the following setting):
| Model | Language | Acceptance problem | Emptiness problem | Equality problem |
|---|---|---|---|---|
| DFA (Deterministic finite automaton) | Regular language: decidable | \(A_\textsf{DFA}\): decidable | \(E_\textsf{DFA}\): decidable | \(EQ_\textsf{DFA}\): decidable |
| NFA (Nondeterministic finite automaton) | Regular language: decidable | \(A_\textsf{NFA}\): decidable | \(E_\textsf{NFA}\): decidable | \(EQ_\textsf{NFA}\): decidable |
| Regular expression | Regular language: decidable | \(A_\textsf{REX}\): decidable | \(E_\textsf{REX}\): decidable | \(EQ_\textsf{REX}\): decidable |
| CFG (Context-free grammar) | Context-free language: decidable | \(A_\textsf{CFG}\): decidable | \(E_\textsf{CFG}\): decidable | \(EQ_\textsf{CFG}\): undecidable |
| PDA (Pushdown automaton) | Context-free language: decidable | \(A_\textsf{PDA}\): decidable | \(E_\textsf{PDA}\): decidable | \(EQ_\textsf{PDA}\): undecidable |
| CSG (Context-sensitive grammar) | Context-sensitive language: decidable | \(A_\textsf{CSG}\): decidable | \(E_\textsf{CSG}\): undecidable | \(EQ_\textsf{CSG}\): undecidable |
| LBA (Linear bounded automaton) | Context-sensitive language: decidable | \(A_\textsf{LBA}\): decidable | \(E_\textsf{LBA}\): undecidable | \(EQ_\textsf{LBA}\): undecidable |
| Turing machine and equivalent Turing-complete models | Turing-recognizable language: may be decidable or undecidable | \(A_\textsf{TM}\): undecidable | \(E_\textsf{TM}\): undecidable | \(EQ_\textsf{TM}\): undecidable (not Turing-recognizable) |
The fact that \(A_\mathsf{TM}\) is undecidable implies that a Turing machine may not halt on some input \(w\). Consider the following problem (the halting problem [1]): \[\textit{HALT}_\textsf{TM} = \{ \langle M, w \rangle \mid M \text{ is a TM, and } M \text{ halts on input } w \}\]
Corollary 6.1. \(\textit{HALT}_\textsf{TM}\) is undecidable.
Proof. (By contradiction) Assume that there exists a Turing machine \(R\) that decides \(\textit{HALT}_\textsf{TM}\). We construct a Turing machine \(S\) that decides \(A_\mathsf{TM}\), using \(R\):
\(S =\) “On input \(\langle M, w \rangle\):
1. Run \(R\) on input \(\langle M, w \rangle\).
2. If \(R\) rejects, reject.
3. If \(R\) accepts (so we know that \(M\) halts on \(w\)), simulate \(M\) on \(w\) until it halts.
4. If \(M\) accepts, accept; otherwise, reject."
If \(R\) decides \(\textit{HALT}_\textsf{TM}\), then \(S\) decides \(A_\mathsf{TM}\) by construction. Since \(A_\mathsf{TM}\) is undecidable, such an \(R\) that decides \(\textit{HALT}_\textsf{TM}\) does not exist.
■
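The reduction above can be sketched as follows (a toy illustration: "machines" are Python callables, and the hypothetical decider \(R\) is stubbed out for a class of machines known to halt, since by the corollary a genuine \(R\) cannot exist):

```python
# Illustrative sketch of the reduction from A_TM to HALT_TM.
# "Machines" are Python callables returning True (accept) / False (reject).
# R is the *hypothetical* decider for HALT_TM; it cannot exist in general,
# so it is stubbed out here for toy machines that are known to halt.

def R(machine, w):
    # Stub: for the always-halting toy machines below, the honest answer
    # is simply True. A real R is impossible for arbitrary machines.
    return True

def S(machine, w):
    """Would decide A_TM, if only we were given a genuine R for HALT_TM."""
    if not R(machine, w):   # M loops on w, hence M does not accept w
        return False
    return machine(w)       # safe to simulate: M is known to halt on w

ends_in_a = lambda w: w.endswith("a")
```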
Apart from Turing's original halting problem, other undecidable problems of historical importance include the Post correspondence problem [2] and the busy beaver problem [3].
Some problems, such as \(EQ_\mathsf{TM}\), are not even Turing-recognizable. Another example is \(MIN_\mathsf{TM}\): \[MIN_\mathsf{TM} = \{ \langle M \rangle \mid M \text{ is a minimal Turing machine} \}\]
Theorem 6.2. \(MIN_\mathsf{TM}\) is not Turingrecognizable.
Proof. Assume that there exists an enumerator \(E\) that enumerates \(MIN_\mathsf{TM}\) (note that an enumerator is equivalent to a Turing machine). We construct a Turing machine \(C\) using \(E\):
\(C =\) “On input \(w\):
1. Obtain its own description \(\langle C \rangle\) (via the recursion theorem).
2. Run \(E\) until some machine \(D\) appears whose description is longer than that of \(C\).
3. Simulate \(D\) on \(w\)."
Since \(MIN_\textsf{TM}\) is an infinite set but the description of \(C\) is of finite length, the enumerator \(E\) must eventually output some \(D\) with a longer description than that of \(C\). As \(C\) simulates \(D\) in the last step, \(C\) is equivalent to \(D\). But then the description of \(C\) is shorter than that of \(D\), so \(D\) is not minimal and could not be an output of \(E\) (which enumerates only \(MIN_\mathsf{TM}\)). That is a contradiction. Therefore, such an enumerator \(E\) for \(MIN_\mathsf{TM}\) does not exist; that is, \(MIN_\mathsf{TM}\) is not Turing-recognizable.
■
The Turing-unrecognizability of \(MIN_\textsf{TM}\) implies that the Kolmogorov complexity (descriptive complexity) \(K(x)\) is not a computable function.
Per the Church-Turing thesis, a Turing machine (one finite automaton plus one infinite linear tape) defines the "limit of computation". Several hypothetical models (sometimes referred to as hypercomputation [4], as they are assumed to be able to solve non-Turing-computable problems) have been proposed for the sake of theoretical interest, such as Zeno machines and real computers:
Note that standard (qubit-based) quantum computers can be simulated in polynomial space (\(\text{BQP} \subseteq \text{PSPACE}\)), thus they are still a Turing-complete model of computation (i.e., quantum computers are not real computers or Zeno machines, and they do not break the Church-Turing thesis). Moreover, it has been proposed that the simulation of every (quantum) physical process in which only computable reals are present is actually a Turing-computable problem (the Church-Turing-Deutsch principle).
Finite languages. A finite language consists of a finite set of strings, thus it may be written as a regular expression of a finite union of those strings. It may be recognized by a time-independent combinational circuit, which can be viewed as a special form of acyclic finite automaton. (See also the blog post: What I Wish I Knew When Learning Boolean Circuits.) There are several unsolved problems concerning circuit complexity.
Generalized regular expression. A generalized regular expression is defined as \[R ::= a \mid \varepsilon \mid \emptyset \mid (R_1 \cup R_2) \mid (R_1 \cap R_2) \mid (R_1 \circ R_2) \mid (\lnot R_1) \mid (R_1^*)\] where \(a \in \Sigma\), and \(R_1\) and \(R_2\) are generalized regular expressions. Although it comes with two extra operators (intersection \(\cap\) and complementation \(\lnot\)), it actually has the same representational power as regular expressions, due to the fact that the class of regular languages is closed under intersection and complementation.
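The closure under intersection behind this fact is witnessed by the product construction: run two DFAs in lockstep and accept iff both accept. A minimal sketch (the dict-based DFA encoding is an illustrative assumption):

```python
# Sketch: closure of regular languages under intersection via the
# product construction -- simulate two DFAs simultaneously.

def product_accepts(d1, d2, w):
    """Accept w iff both DFAs accept it (simulates the product DFA)."""
    s1, s2 = d1["start"], d2["start"]
    for ch in w:
        s1 = d1["delta"][(s1, ch)]
        s2 = d2["delta"][(s2, ch)]
    return s1 in d1["accept"] and s2 in d2["accept"]

# Over {a, b}: "contains at least one a" and "contains at least one b".
has_a = {"start": 0, "accept": {1},
         "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}}
has_b = {"start": 0, "accept": {1},
         "delta": {(0, "b"): 1, (0, "a"): 0, (1, "a"): 1, (1, "b"): 1}}
```

Complementation is even simpler: swap accepting and non-accepting states of a DFA.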
Star-free languages. A star-free language is a regular language that can be described by a generalized regular expression without the Kleene star operation. Clearly, this class includes all finite languages. One example of an infinite star-free language is \(a^*\) over the alphabet \(\Sigma = \{a, b\}\), because \(a^* = \lnot((\lnot\emptyset) \circ b \circ (\lnot\emptyset))\), i.e., the complement of the set of strings containing a \(b\). On the other hand, \((a \circ a)^*\) is not star-free.
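Over \(\Sigma = \{a, b\}\), membership in \(a^*\) can indeed be tested as the complement of "contains a \(b\)"; a quick sanity check (an illustrative toy, where Python's `.*` merely stands in for \(\lnot\emptyset\), the set of all strings):

```python
import re

def in_a_star(s):
    """Membership in a* over {a,b}, phrased star-free:
    the complement of (all strings) b (all strings)."""
    return re.fullmatch(r".*b.*", s) is None
```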
(Unsolved) Generalized star height problem. Can all regular languages be expressed using generalized regular expressions with a limited nesting depth of Kleene stars (star height)? Moreover, is the minimum required star height a computable function? For example, \(((((a \circ a)^*)^*)\cdots)^*\) can be rewritten as \((a \circ a)^*\), so its minimum required star height is 1 (it is not star-free). Neither the general result nor the decidability of the minimum star height is yet known.
Linear bounded automata. LBAs provide an accurate model for real-world computers, which have only bounded memory. Given sufficient memory, an LBA can simulate another (smaller) LBA; a practical example is a virtual machine running under VirtualBox or QEMU.
(Unsolved) LBA problem. Is the class of languages accepted by LBAs equal to the class of languages accepted by deterministic LBAs? Or, in terms of complexity theory, is \(\text{SPACE}(n) = \text{NSPACE}(n)\)? We know that DFA and NFA accept the same class of languages, as do TM and NTM, while PDA and DPDA are not completely equivalent. However, it is still an open question whether LBA and DLBA are equivalent models. Note that it is known that \(\text{PSPACE} = \bigcup_k \text{SPACE}(n^k)\) is equal to \(\text{NPSPACE} = \bigcup_k \text{NSPACE}(n^k)\) by Savitch's theorem; that is, deterministic polynomially bounded automata and nondeterministic polynomially bounded automata are equivalent.
Books:
M. Sipser, Introduction to the Theory of Computation, 3rd ed.
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd ed.
Articles:
D. Zeilberger, “A 2Minute Proof of the 2ndMost Important Theorem of the 2nd Millennium,” http://www.math.rutgers.edu/~zeilberg/mamarim/mamarimTeX/halt.
Papers:
[1] A. M. Turing, "On computable numbers, with an application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, vol. 2, no. 1, pp. 230–265, 1937.
[2] E. L. Post, “A variant of a recursively unsolvable problem,” Bulletin of the American Mathematical Society, vol. 52, no. 4, pp. 264–268, 1946.
[3] T. Rado, “On noncomputable functions,” Bell System Technical Journal, vol. 41, no. 3, pp. 877–884, 1962.
[4] M. Davis, "The myth of hypercomputation," in Alan Turing: Life and Legacy of a Great Thinker, Springer, 2004, pp. 195–211.
While running `pacman -Syu` (so well as to dare any potential issues due to Arch's infamous instability) earlier today, weird things did happen: something other than `pacman` was eating up my CPU. I assumed it was Chromium in the first place, but it wasn't this time. It turned out to be `gnome-settings-daemon`, together with some other GUI processes running on GNOME, that were consistently hogging my CPU in turn, and together they made 100% (!). As I killed the seemingly problematic process, something else would insist on emerging at the top of `top` with similarly excessive CPU usage. I decided that this was an unusual situation and, without much further thought, forced a reboot while `pacman` was still running. Then I ended up with a corrupted system: the kernel didn't load during the early boot process, complaining about a missing `modules.devname` and that `/dev/sda11` (which is the root partition) could not be found:
```
Warning: /lib/modules/4.10.8-1-ARCH/modules.devname not found - ignoring
starting version 232
Waiting 10 seconds for device /dev/sda11 ...
ERROR: device '/dev/sda11' not found. Skipping fsck.
mount: you must specify the filesystem type
You are now being dropped into an emergency shell.
```
Without proper modules loaded, even the keyboard was not working, so I couldn't do anything in the emergency shell. But I wasn't completely out of luck: I have a dual-boot of Arch and Ubuntu, which is prepared just for recovery purposes like this. (Otherwise I would have had to make room for another live USB, which is a little bit inconvenient; I haven't used such rescueware in years.)
Recover an (unbootable) Arch system from another distro: This is mostly relevant if you can't boot into the system (due to a corrupted kernel upgrade); otherwise it's possible to just log in from a TTY (locally or via SSH) and perform the fix (e.g., in the case that it's just X, the display manager or the WM that fails to load, but the kernel boots well).
On the host system (e.g., Ubuntu), create a working chroot environment (assume that the root partition for Arch is `/dev/sda11`):

```
# TARGETDEV="/dev/sda11"
# TARGETDIR="/mnt/arch"
# mount $TARGETDEV $TARGETDIR
# mount -t proc /proc $TARGETDIR/proc
# mount --rbind /sys $TARGETDIR/sys
# mount --rbind /dev $TARGETDIR/dev
# cp /etc/hosts $TARGETDIR/etc
# cp /etc/resolv.conf $TARGETDIR/etc
# chroot $TARGETDIR rm /etc/mtab 2> /dev/null
# chroot $TARGETDIR ln -s /proc/mounts /etc/mtab
# chroot $TARGETDIR
```
Alternatively, use the `arch-chroot` script from an Arch bootstrap image:

```
# arch-chroot $TARGETDIR
```
Now that we have a chroot environment with access to the (currently broken) Arch system, continue with unfinished `pacman` upgrades and clean up any lock file if needed:

```
[/mnt/arch]# pacman -Syu
```
Or, if there’s any unwanted package upgrade that causes the system to fail, downgrade it.
The important thing to remember is that, while continuing, a previously failed transaction will not run its hooks anymore. Therefore it might be wiser to find out which exact packages `pacman` was trying to upgrade in the last transaction (from `/var/log/pacman.log`) and reinstall all of them, rather than running a complementing `pacman -Syu`. But if it's known which package is causing the problem (and in this very common case, `linux`), simply run the following to regenerate the initramfs for the new kernel:

```
[/mnt/arch]# mkinitcpio -p linux
```
And we are done. A few commands and our Arch system is now back and booting.
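As a sketch of the log-inspection step above (assuming the usual `pacman.log` line format `[ALPM] upgraded pkgname (old -> new)`; real logs may vary across pacman versions), the packages touched by a transaction could be extracted like this:

```python
import re

# Toy sketch: list packages recorded as upgraded in a pacman log excerpt.
# Assumes the common "[ALPM] upgraded name (old -> new)" line format.

def upgraded_packages(log_text):
    return re.findall(r"\[ALPM\] upgraded (\S+) ", log_text)

sample = """\
[2017-04-12 13:37] [ALPM] upgraded linux (4.10.6-1 -> 4.10.8-1)
[2017-04-12 13:37] [ALPM] upgraded glibc (2.25-1 -> 2.25-2)
"""
```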
Why the mess? Initramfs images need to be regenerated (from a corresponding mkinitcpio preset file) after each kernel update. Starting from Arch's package `linux 4.8.8-2`, the call to `mkinitcpio` is specified in a PostTransaction alpm hook, in place of an explicit `post_install()` command invoked from the `.install` script as before. (The move towards alpm hooks is said to be a fix for #51818, as two or more packages in one transaction may need the regeneration of initramfs images.)
```
[Trigger]
Type = File
Operation = Install
Operation = Upgrade
Target = boot/vmlinuz-%PKGBASE%
Target = usr/lib/initcpio/*

[Action]
Description = Updating %PKGBASE% initcpios
When = PostTransaction
Exec = /usr/bin/mkinitcpio -p %PKGBASE%
```
The faulty part about PostTransaction alpm hooks is that they run in a post-transaction manner; in the context of Arch Linux, a "transaction" means a complete run of `pacman`, regardless of how many packages it installs or upgrades. Such PostTransaction hooks will not run until all preceding operations succeed. Quoted from the CAVEATS section of the alpm-hooks(5) manual page:
“PostTransaction hooks will not run if the transaction fails to complete for any reason.”
This (kind of) misbehavior can lead to unintended aftermath in a failed system upgrade, as it does in the case of the `linux` package. Confusingly, a `pacman` "transaction" can be an arbitrary set of unrelated package updates; it could fail to complete at any point (e.g., `.install` script errors, the process getting accidentally killed, or even power loss), and ideally, it should remain safe when encountering such failures before commit. However, since the generation of the initramfs in the PostTransaction hook is actually an indispensable step for the newly upgraded `linux` package here (whether other packages complete their upgrades or not), it is not safe if a `pacman` "transaction" fails halfway: the PostTransaction hook won't run, and all you get is a halfway-upgraded system that doesn't boot (so you'll need to fix that through a fallback or alternative kernel). To see this problem more clearly, think of the following two scenarios:
A successful system upgrade:

```
pacman -Syu
  sanity check (for packages to upgrade)
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  start upgrading package A
  finish upgrading package A
  start upgrading package B
  finish upgrading package B
  complete transaction
  run post-transaction hook M
  done.

[packages L, A, B are upgraded; hook M is run; working system]
```
An (ungracefully) failing system upgrade:

```
pacman -Syu
  sanity check (for packages to upgrade)
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  start upgrading package A
  finish upgrading package A
  start upgrading package B
  crash!

[packages L, A are upgraded; hook M is not run; broken system]
```
How to perform a system upgrade safely? Although it ultimately depends on how we define a "safe system upgrade" and how much we want it, some robustness can be achieved at least. It's clear that the upgrade of package L and the run of hook M should be dealt with as one transaction; otherwise, treating M in a post-transaction way puts us at the risk that, if some other package fails to upgrade (or maybe the computer just crashes), we end up with only L upgraded but M not run (while it is desirable to have both or none).
A (gracefully) failing system upgrade:

```
  sanity check (for packages to upgrade)
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  run hook M
  complete transaction
  start upgrading package A
  finish upgrading package A
  start upgrading package B
  crash!

[packages L, A are upgraded; hook M is run; working system]
```
This is exactly the way things worked before (using the old-fashioned `post_install()`): it's less fragile under possible failures, although in order to fix #51818 (another package L2 also requires hook M to run), we'll need to manage the ordering of package upgrades properly:
```
  start upgrading package L (requires hook M to run)
  finish upgrading package L
  start upgrading package L2 (requires hook M to run)
  finish upgrading package L2
  run hook M
  complete transaction
  start upgrading package A
  ...
```
That is, L, L2 and M form one transaction. It is not allowed to have only L and L2, or only L, without a run of hook M. Other unrelated packages (A, B, ...) are prevented from causing such issues in the above scheme. There is still a possibility that the process crashes during the phase of L, L2 or M, but the chance would be much smaller, because we are working with a transaction consisting of the strongly coupled upgrades of L, L2 and M, not a whole `pacman` "transaction" consisting of easily hundreds of arbitrary system upgrades and post-transaction hooks. Even better, in case of such failures, we can redo uncommitted transactions anyway, so we don't have to rerun hooks manually:
```
  redo uncommitted transactions (upgrades & hooks)
  start upgrading package B
  ...
```
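A tiny simulation of the grouping idea (purely illustrative; the package names and the hook mapping are made up): packages that require the same hook are committed together with that hook as one sub-transaction, so a crash between sub-transactions can never separate an upgrade from its hook.

```python
# Toy model: each sub-transaction = (packages, hooks to run afterwards).
# A crash between sub-transactions never leaves a package upgraded
# without its required hook having run.

def plan(packages, hook_of):
    """Group packages by required hook; hookless ones go one per txn."""
    groups, rest = {}, []
    for p in packages:
        h = hook_of.get(p)
        if h:
            groups.setdefault(h, []).append(p)
        else:
            rest.append(p)
    txns = [(pkgs, [h]) for h, pkgs in groups.items()]
    txns += [([p], []) for p in rest]
    return txns

hook_of = {"linux": "mkinitcpio", "linux-lts": "mkinitcpio"}
txns = plan(["linux", "linux-lts", "glibc", "vim"], hook_of)
```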
I don't know if there's any software management tool implementing this fully recovering approach. Transactions need to be more fine-grained (`pacman`'s "everything is one transaction" is really a naïve semantics to have) and the ordering of upgrades is essential, in order to give the best robustness against possible crashes during a system upgrade, so it might not be as easily implementable as most package managers are (although still far less sophisticated than a real-world database). Or, with reasonable effort, if we cannot guarantee anything for a transaction, maybe a Windows-inspired "user-friendly" warning message like "Don't turn off your computer" is good enough. (At least you'll know when you're breaking shit: a transaction is not completed yet, some necessary hooks are not run, and you're on your own then.)
[1] ArchWiki, “Install from existing Linux”. https://wiki.archlinux.org/index.php/Install_from_existing_Linux
[2] ArchWiki, “Change root”. https://wiki.archlinux.org/index.php/change_root
[3] ArchWiki, “mkinitcpio”. https://wiki.archlinux.org/index.php/mkinitcpio
[4] alpm-hooks(5) Manual Page. https://www.archlinux.org/pacman/alpm-hooks.5.html
\[\frac{
\stackrel{\mathcal{E}_0}{\langle b, \sigma \rangle \downarrow \textbf{true}} \qquad
\stackrel{\mathcal{E}_1}{\langle c_0, \sigma \rangle \downarrow \sigma''} \qquad
\stackrel{\mathcal{E}_2}{\langle \textbf{while } b \textbf{ do } c_0, \sigma'' \rangle \downarrow \sigma'}
}{
\langle \textbf{while } b \textbf{ do } c_0, \sigma \rangle \downarrow \sigma'
}\]
I could just write it more neatly, intuitively, without all those hexes in magic backslashes and braces:
```
<b, sig> ! true                    -- E_0
<c_0, sig> ! sig''                 -- E_1
<while b do c_0, sig''> ! sig'     -- E_2
----------------------------------------
<while b do c_0, sig> ! sig'
```
But, of course, preserving the nicelooking outcome (from either TeX or MathJax).
Besides natural derivations, we also have a lot of evidently agreeable conventions for typesetting general mathematics: parentheses (brackets, braces) are almost always paired; “\left” and “\right” are very often desired, since \(\left[\frac{1}{2}\right]\) is a bit less awkward than \([\frac{1}{2}]\); when referring to a function name, “\operatorname” looks aesthetically better; etc. Furthermore, if we write down some math notation in plain text, clearly, “=>” has to be a “\(\Rightarrow\)” and “|->” has to be a “\(\mapsto\)”; “|-” means “\(\vdash\)” and “|=” means “\(\vDash\)”; a standalone letter “N” is just \(\mathbb{N}\); etc. Taking such matters into consideration, there could be some handy alternative syntax (I call it “lazybones’ syntax”) for typesetting formulas, at least for a specific field someone specializes in, where these conventions are consistent:
| | Typesetting outcome | TeX syntax | Lazybones’ syntax |
|---|---|---|---|
| Provability | \(\Gamma \vdash \varphi\) | `\Gamma \vdash \varphi` | `Gam \|- phi` |
| Validity | \(\Gamma \vDash \varphi\) | `\Gamma \vDash \varphi` | `Gam \|= phi` |
| Big-step semantics | \(\langle b_0 \land b_1, \sigma \rangle \downarrow t\) | `\langle b_0 \land b_1, \sigma \rangle \downarrow t` | `<b_0 && b_1, sig> ! t` |
| Small-step semantics | \(\sigma \vdash b_0 \land b_1 \to^* t\) | `\sigma \vdash b_0 \land b_1 \to^* t` | `sig \|- b_0 && b_1 ->* t` |
| Hoare logic | \(\vdash \{A\} \textbf{if } b \textbf{ then } c_0 \textbf{ else } c_1 \{B\}\) | `\vdash \{A\} \textbf{if } b \textbf{ then } c_0 \textbf{ else } c_1 \{B\}` | `\|- <[A]> if b then c_0 else c_1 <[B]>` |
| Denotational semantics | \(\mathcal{C}[\![X:=a]\!] = \lambda \sigma . \eta (\sigma[X \mapsto \mathcal{A}[\![a]\!] \sigma])\) | `\mathcal{C}[\![X:=a]\!] = \lambda \sigma . \eta (\sigma[X \mapsto \mathcal{A}[\![a]\!] \sigma])` | `C[[X:=a]] = lam sig . eta (sig[X \|-> A[[a]] sig])` |
For simplicity, we require that all such transformations are regular, i.e., “lazybones’ syntax” may be translated into plain TeX code using merely substitutions of regular expressions.
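For instance, a few of the token substitutions above are regular enough to be a chain of `re.sub` calls (an illustrative sketch; the rule set here is a tiny subset of what a real translator would need):

```python
import re

# Minimal lazybones-to-TeX translator (illustrative subset of rules).
# Order matters: translate "|->" before "|-" so prefixes don't clash.
RULES = [
    (r"\|->", r"\\mapsto "),
    (r"\|-", r"\\vdash "),
    (r"\|=", r"\\vDash "),
    (r"=>", r"\\Rightarrow "),
]

def lazybones_to_tex(s):
    for pattern, repl in RULES:
        s = re.sub(pattern, repl, s)
    return s
```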
As a basic example, let’s consider angle brackets, for which we desire to type simply “<” and “>” instead of “\langle” and “\rangle”. To avoid any ambiguity (with a normal less-than or greater-than sign), it is required that such brackets have no internal spacing; that is, we write “<x, y>” rather than “< x, y >”, but “1 < 2” instead of “1<2”. Once we fix the transformation rules, the implementation is quite straightforward in Haskell with `Text.Regex`, if we want a pandoc filter to handle these substitutions for everything embedded in TeX math mode:
```haskell
subLAngle s = subRegex (mkRegex "<([^[:blank:]])") s "\\langle \\1"
subRAngle s = subRegex (mkRegex "([^[:blank:]])>") s "\\1\\rangle "

semType m@(Math mathType s) = do
  let s' = subLAngle $ subRAngle s
  return $ Math mathType s'
semType x = return x
```
So, why bother implementing a pandoc filter rather than just defining some TeX macros? Two main reasons:
Surely you can write pandoc filters in any language you like (officially Haskell and Python). Nothing like plain TeX (despite the fact that it's Turing-complete): filters are easier to implement, unrestricted in the way they interact with structured documents, and come with a great deal of cool supporting libraries, even to blow up the earth. Although here in this case, we are only using some no-brainer pattern matching to release our fingers from heavy math typing.
A proofofconcept of this is in semtype.hs. While undocumented, all tricks it does should be clear from the comments and regular expressions themselves.
There seem to be some common misconceptions about “proof by contradiction”. Recently I came across Robert’s blog post on this issue, and couldn’t help reviewing it in my own way. I prefer a more formal treatment, nevertheless, without sacrificing its intuitive comprehensibility.
What is a “proof by contradiction”, exactly? When a classical mathematician (probably you) talks about proving by contradiction, they can mean either one of these two (syntactically) different things:

1. (Proof by contradiction) To prove \(P\): Assume \(\lnot P\), derive a contradiction \(\bot\), then conclude \(P\).
2. (Proof of negation) To prove \(\lnot P\): Assume \(P\), derive a contradiction \(\bot\), then conclude \(\lnot P\).
Both syntactic forms have some kind of reductio ad absurdum (reduction to / derivation of contradiction) argument. However, the first form provides an indirect proof for \(P\); this is what we call a genuine “proof by contradiction”. The second form provides a direct proof for \(\lnot P\) which is just the negation of \(P\); this is preferably called a “proof of negation”, as it’s not a proof of \(P\) itself, but rather a proof of the negation of \(P\).
But how are they any different? You may ask. In a classical setting, there is no semantic difference. Notice that in the first form (“proof by contradiction”), we could rewrite \(P\) as \(\lnot Q\), then we get

3. To prove \(\lnot Q\): Assume \(Q\), derive a contradiction \(\bot\), then conclude \(\lnot Q\).
which is just the second form, i.e., a “proof of negation”. Likewise, if we rewrite \(P\) as \(\lnot Q\) in the second form, we get

4. To prove \(Q\): Assume \(\lnot Q\), derive a contradiction \(\bot\), then conclude \(Q\).
which is just the first form, i.e., a “proof by contradiction”. That’s the very reason people often misconceive them – classically, a proof by contradiction and a proof of negation can be simply converted to the form of one another, without losing their semantic validity.
From a constructivist perspective, things are quite different. In the above rewritten forms, we introduced a new term \(\lnot Q := P\). For the rewrite in the 3rd form to be valid, the new assumption \(Q\) must be as strong as the original assumption \(\lnot P\), which is just \(\lnot \lnot Q\). Formally, there must be \(\lnot \lnot Q \implies Q\). For the rewrite in the 4th form to be valid, the new statement \(Q\) must be not any stronger than the original statement \(\lnot P\), so formally there must also be \(Q \implies \lnot \lnot Q\). In intuitionistic logic, although we can derive a “double negation introduction” (thus complete the rewrite in the 4th form), there is no way to derive a “double negation elimination” as required to get the 3rd form. So technically, while we can soundly rewrite from a “proof of negation” to a “proof by contradiction”, the other direction is impossible. Indeed, we must make a clear distinction between a “proof by contradiction” and a “proof of negation” here: Semantically, they are not even dual and should not be fused with each other.
Why is this distinction important? Because in intuitionistic logic, the second form (proof of negation) is a valid proof; the first form (proof by contradiction) is not. Take a look at the negation introduction rule: \[\lnot_\mathsf{I} : \frac{\Gamma, P \vdash \bot}{\Gamma \vdash \lnot P}\] which justifies the validity of “proof of negation”. However, there is no such rule saying that \[\mathsf{PBC} : \frac{\Gamma, \lnot P \vdash \bot}{\Gamma \vdash P}\] In classical logic, where a rule like \(\mathsf{PBC}\) is allowed, one can easily derive the double negation elimination which we begged for before: Given \(\Gamma \vdash \lnot \lnot P\), the only rule that ever introduces a negation is \(\lnot_\mathsf{I}\), so we must also have \(\Gamma, \lnot P \vdash \bot\). Then by \(\mathsf{PBC}\), we get a derivation of \(\Gamma \vdash P\), as desired. \[\mathsf{DNE} : \frac{\Gamma \vdash \lnot \lnot P}{\Gamma \vdash P}\] If we adopt \(\mathsf{PBC}\), then we will also have adopted \(\mathsf{DNE}\); if we have \(\mathsf{DNE}\), then it would be perfectly valid to rewrite a “proof by contradiction” into the form of a “proof of negation”, or the other way around, as is already shown before. Since constructivists do not want to accept rules like \(\mathsf{PBC}\) or \(\mathsf{DNE}\) at all, they claim that a “proof by contradiction” and a “proof of negation” are essentially different, in that the latter is a valid proof but the former is doubtful, while their distinction is purely syntactical for classical mathematicians as the semantic equivalence would be trivial with \(\mathsf{PBC}\) or \(\mathsf{DNE}\).
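The asymmetry can be made concrete in a proof assistant (a sketch in Lean 4; note that `Classical.byContradiction` is the only classical ingredient):

```lean
-- Proof of negation: intuitionistically valid, no axioms needed
-- (¬P unfolds to P → False).
theorem proof_of_negation {P : Prop} (h : P → False) : ¬P := h

-- Double negation introduction is also intuitionistic.
theorem dni {P : Prop} (p : P) : ¬¬P := fun np => np p

-- Proof by contradiction: requires classical logic.
theorem proof_by_contradiction {P : Prop} (h : ¬P → False) : P :=
  Classical.byContradiction h
```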
The rationale behind constructivists' choice of ruling out indirect proofs by rejecting derivation rules like \(\mathsf{PBC}\) and \(\mathsf{DNE}\) comes into view when talking about first-order theories, where existential quantifiers are used. Say, if we wish to prove that there exists some \(x\) with a property \(P\), we must specify an example \(t\) that makes this property hold: \[\exists_{\mathsf{I}_t} : \frac{\Gamma \vdash P[t/x]}{\Gamma \vdash \exists x : S.P}\]
(\(t\) is a term of sort \(S\))
This is something called a constructive proof, in the sense that in order to derive an existentialization, one must construct such a term explicitly, directly.
What if we allow \(\mathsf{PBC}\) in our proofs then? We are given enough power to utilize an alternate approach: Assume to the contrary that \(\lnot P\) holds for all \(x\) in \(S\); then derive a contradiction. Thus, by \(\mathsf{PBC}\), there must exist some \(x\) such that \(P\) holds (since \(\lnot (\exists x : S.P) \equiv \forall x : S.\lnot P\)). Formally, \[\frac{\Gamma, \forall x : S.\lnot P \vdash \bot}{\Gamma \vdash \exists x : S.P}\] Note that a term \(t\) is nowhere to be found in this form of proof. The downside of this classical approach to existence proofs is that it is non-constructive: even if you can derive a proof that some mathematical object exists, you can't claim that you necessarily know what it is, since you have not concretely constructed such an object. Its existence is just "principally provable", but not necessarily constructible or witnessable.
I would say it's too much of a philosophical choice between classical logic and intuitionistic logic, at least for old-school mathematicians who don't practically mechanize their proofs at all. But one with some logical maturity should be able to draw a semantic distinction between a "proof by contradiction" that \(\vdash P\) and a "proof of negation" that \(\vdash \lnot P\), bewaring of how their treatments can diverge in non-classical logic settings. It is still questionable to me whether every theorem provable in classical logic can be proved constructively; nevertheless, a constructive proof almost always makes more sense: If you claim that you have an apple, just show me the apple, instead of arguing to me sophistically that it can't be the case that you do not have an apple.
[1] Andrzej Filinski, “Course Notes for Semantics and Types, Lecture 1: Logic”.
[2] Robert Harper, “A ‘proof by contradiction’ is not a proof that ends with a contradiction”. https://existentialtype.wordpress.com/2017/03/04/a-proof-by-contradiction-is-not-a-proof-that-derives-a-contradiction/
[3] Andrej Bauer, “Proof of negation and proof by contradiction”. http://math.andrej.com/2010/03/29/proof-of-negation-and-proof-by-contradiction/
[4] Timothy Gowers, “When is proof by contradiction necessary?”. https://gowers.wordpress.com/2010/03/28/when-is-proof-by-contradiction-necessary/
[5] Terence Tao, “The ‘no self-defeating object’ argument”. https://terrytao.wordpress.com/2009/11/05/the-no-self-defeating-object-argument/
Still, I have thought of many fun things I could do with statistical learning: image classification and face recognition (I always want a photo management application to be like that! I take and collect so many photos each day), a CBIR tool (like “Search by Image” in Google Images, but working locally on my computer), a GIMP Script-Fu that does tricks like image synthesis via ConvNets, etc. The good part about applying learning methods in image processing is that, once you can extract feature descriptors, everything becomes statistically learnable (and you as a programmer don’t really need to have prior knowledge about what an object visually is: an apple, or a pen?).
Cryptography, from what I learned, is a differently fun story. For a perfectly secure encryption scheme: \[\Pr[M=m \mid C=c] = \Pr[M=m]\] that is, knowing the ciphertext \(c\) will not update any attacker's belief about whatever the underlying message \(m\) is. Even if you have a statistical learning model, it cannot learn anything from pure observations of ciphertexts. But this unbreakable level of security comes at a price: the key space must expose substantial entropy that is as high as the message space, thus the key length can be no shorter than the message length (given by Shannon's theorem). In practice, the messages we send do not usually have the highest entropy possible, and we can safely assume that the attackers' computational ability is bounded by polynomial-time algorithms; thus, we as cryptosystem designers need only make up schemes that are assumed to be unbreakable (i.e., breakable with only a negligible probability) by any polynomial-time attacker. As we don't yet know whether there actually are problems unsolvable in polynomial time (e.g., is P ≠ NP?), the proof of security eventually relies on some unproven computational hardness assumptions: one-way functions exist, integer factorization is hard, discrete logarithm is hard, etc. If one can construct a provably secure scheme, it is guaranteed that statistical cryptanalysis would be theoretically impossible within polynomial time (except for side-channel attacks); of course, if the hardness assumption we made is proved invalid, then nothing but the one-time pad can be secure.
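As a toy illustration of the one perfectly secret scheme mentioned above (the one-time pad: a uniformly random key as long as the message, never reused):

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """One-time pad: XOR with a uniformly random key of equal length."""
    assert len(key) == len(message)  # Shannon: key no shorter than message
    return bytes(m ^ k for m, k in zip(message, key))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))  # fresh uniform key, used only once
ct = otp_encrypt(msg, key)
pt = otp_encrypt(ct, key)            # XOR is its own inverse
```

Any plaintext of the same length is equally consistent with a given ciphertext (under some key), which is exactly why the ciphertext carries no information about the message.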
I might be writing one or two blog posts about cryptographic security from an information-theoretic perspective and some basic cryptanalysis of insecure schemes, but now is the time to move on with my unfinished courses about logic and programming. Before that, I feel that I should add a little bit of philosophy to my wiki so as to refresh my viewpoint and methodology. And here it is.
(Philosophy is a sophisticated, highly arguable subject, so pardon me if there’s any inconsistency with your textbook.)
We can talk about the philosophy of science (particularly, philosophy of mathematics and statistics) with the understanding of epistemology and logic: Are you a logician or a statistician? If a logician, does set theory or type theory suit you the best? If a statistician, are you a Bayesian or a frequentist?
(As a personally opinionated note, I often find myself subscribing to skepticism the most. But that doesn't mean that logical reasoning and statistical inference aren't useful to me; they are. Extremely. So as not to go too far with this subjective topic, I'll be focusing more on classical / modal logic in the next few weeks.)
Papers:
[1] E. L. Gettier, "Is justified true belief knowledge?" Analysis, vol. 23, no. 6, pp. 121–123, 1963.
That is, the discrete uniform distribution maximizes the entropy for a random string. Since \(|\mathcal{X}| = 2^\ell\), we have \(p(x) = 2^{-\ell}\) and \(\operatorname{H}(X) = -\sum_{x\in\mathcal{X}} 2^{-\ell} \log_2 2^{-\ell} = \ell\) (bits). We conclude that the information that can be represented in an \(\ell\)-bit string is at most \(\ell\) bits. Some practical results include
Relative entropy (KL divergence) quantifies how much information diverges between two sets of data. For data differencing, the KL divergence gives the minimum patch size that is needed to reconstruct target data (with distribution \(p(x)\)) given source data (with distribution \(q(x)\)).
In particular, if \(p(x) = q(x)\), which means that the two distributions are identical, we have \(\operatorname{D}_\mathrm{KL}(p\|q) = 0\). This follows our intuition that no information is gained or lost during data encoding/decoding. If \(p(x_0) = 0\) at some \(x=x_0\), we take \(p(x_0) \log \frac{p(x_0)}{q(x_0)} = 0\), to justify the fact that the target data is trivial to reconstruct at this point, no matter how much information \(q(x)\) contains. However, if \(q(x_0) = 0\) at some \(x=x_0\), we should take \(p(x_0) \log \frac{p(x_0)}{q(x_0)} = \infty\), so that the target data is impossible to reconstruct if we have only trivial \(q(x)\) at some point (unless \(p(x_0) = 0\) as well).
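These conventions are easy to check numerically. Below is a minimal sketch of discrete KL divergence in Python (the function name and the example distributions are mine, purely for illustration):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits, for discrete distributions given as lists."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # convention: 0 * log(0/q) = 0
        if qi == 0:
            return math.inf     # p puts mass where q has none
        d += pi * math.log2(pi / qi)
    return d

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, p))      # identical distributions: 0.0
print(kl_divergence(p, q))      # positive, by Gibbs' inequality
```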
Lemma 4.1. (Gibbs' inequality)^{2} The KL divergence is always nonnegative: \(\operatorname{D}_\mathrm{KL}(p\|q) \geq 0\).
Informally, Lemma 4.1 simply states that in order to reconstruct target data from source data, either more information (\(\operatorname{D}_\mathrm{KL}(p\|q) > 0\)) or no further information (\(\operatorname{D}_\mathrm{KL}(p\|q) = 0\)) is needed.
Theorem 4.2. Normal distribution \(\mathcal{N}(\mu,\sigma^2)\) maximizes the differential entropy for given mean \(\mu\) and variance \(\sigma^2\).
Proof.^{3} Let \(g(x)\) be the pdf of the normal distribution with mean \(\mu\) and variance \(\sigma^2\), and let \(f(x)\) be an arbitrary pdf with the same mean and variance.
Consider the KL divergence between \(f(x)\) and \(g(x)\). By Lemma 4.1 (Gibbs' inequality): \[\operatorname{D}_\mathrm{KL}(f\|g) = \int_{-\infty}^\infty f(x) \log \frac{f(x)}{g(x)}\, dx = \operatorname{H}(f,g) - \operatorname{H}(f) \geq 0\]
Notice that \[\begin{align*} \operatorname{H}(f,g) &= -\int_{-\infty}^\infty f(x) \log g(x)\, dx \\ &= -\int_{-\infty}^\infty f(x) \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right) dx \\ &= \frac{1}{2} \left( \log(2\pi\sigma^2) + 1 \right) \\ &= \operatorname{H}(g) \end{align*}\]Therefore, \[\operatorname{H}(g) \geq \operatorname{H}(f)\] That is, the distribution of \(g(x)\) (Gaussian) always has the maximum entropy.
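As a quick numeric sanity check of Theorem 4.2 (a sketch: we compare against a Laplace distribution chosen to have the same variance; the variable names are mine):

```python
import math

sigma2 = 1.0  # the fixed variance

# Differential entropy (in nats) of N(mu, sigma^2): (1/2) ln(2*pi*e*sigma^2).
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma2)

# A Laplace(0, b) distribution has variance 2*b^2 and entropy 1 + ln(2b);
# pick b so that its variance also equals sigma2.
b = math.sqrt(sigma2 / 2)
h_laplace = 1 + math.log(2 * b)

assert h_gauss > h_laplace  # the Gaussian has strictly larger entropy
```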
■
It is also possible to derive the normal distribution directly from the principle of maximum entropy, under the constraint that \(\int_{-\infty}^\infty (x-\mu)^2 f(x)\, dx = \sigma^2\).
The well-known central limit theorem (CLT), which states that the sum of independent random variables \(\{X_1,\dots,X_n\}\) tends toward a normal distribution, may be alternatively expressed as the monotonicity of the entropy of the normalized sum: \[\operatorname{H}\left( \frac{\sum_{i=1}^n X_i}{\sqrt{n}} \right)\] which is an increasing function of \(n\). [1]
Books:
T. M. Cover and J. A. Thomas. Elements of Information Theory, 2nd ed.
Papers:
[1] S. Artstein, K. Ball, F. Barthe, and A. Naor, "Solution of Shannon's problem on the monotonicity of entropy," Journal of the American Mathematical Society, vol. 17, no. 4, pp. 975–982, 2004.
Generally, pseudorandom generators used in probabilistic algorithms yield random bits according to the uniform distribution, so it is worth mentioning:
Cryptographic schemes are defined as tuples of deterministic or probabilistic algorithms:
A brief, formalized overview of some classical ciphers, and their security:
Lessons learned from these classical ciphers: While perfect secrecy is easy to achieve (one-time pads), designing practical cryptographic schemes (with shorter keys, and computationally hard to break) can be difficult.
The construction of private-key encryption schemes involves probabilistic algorithms. We simply assume that an unlimited supply of independent, unbiased random bits is available to these cryptographic algorithms. But in practice, this is a nontrivial issue, as the source of randomness must provide high-entropy data so as to yield cryptographically secure random bits.
In the perfectly secret scheme of one-time pads, the key generation algorithm \(\mathsf{Gen}\) requires access to a source of randomness in order to choose a uniformly random key \(k \in \mathcal{K}\). Practically, high-entropy data may be collected via physical input or even written fully by hand with human labor.
Theoretically, without external intervention, we have:
Conjecture 3.1. Pseudorandom generators exist.
Theorem 3.2. (Pseudorandom generator theorem) Pseudorandom generators exist if and only if oneway functions exist.
Pseudorandomness is also a basic building block in CPA-secure encryption algorithms (\(\mathsf{Enc}\)), e.g., in stream ciphers and block ciphers.
So what is an acceptable level of pseudorandomness, if we are not sure whether such generators theoretically exist? Intuitively, if one cannot distinguish between a “pseudorandom” string (generated by a PRG) and a truly random string (chosen according to the uniform distribution), we have confidence that the PRG is a good one. Various statistical tests have been designed for testing the randomness of PRGs.
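For instance, the frequency (monobit) test from the NIST statistical test suite reduces to a single \(p\)-value computation; a \(p\)-value near zero rejects the sequence as non-uniform. A minimal sketch (the function name and the deliberately biased generator are mine):

```python
import random
from math import erfc, sqrt

def monobit_test(bits):
    """Frequency (monobit) test: returns a p-value; a value near 0 means
    the sequence is unlikely to have come from a uniform source."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return erfc(abs(s) / sqrt(2 * n))

rng = random.Random(42)
good = [rng.getrandbits(1) for _ in range(10000)]
biased = [1 if rng.random() < 0.7 else 0 for _ in range(10000)]

print(monobit_test(good))           # should be far from 0
assert monobit_test(biased) < 0.01  # heavily biased: rejected
```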
It holds true that:
Corollary 3.3. By redefining the key space, we can assume that any encryption scheme \(\Pi=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})\) satisfies
If so, why do we still need a probabilistic \(\mathsf{Enc}\) in CPA-secure encryption? Can't we just make \(\mathsf{Enc}\) deterministic while still being CPA-secure?
The first thing to realize is that chosen-plaintext attacks are geared towards multiple encryptions (with the same secret key \(k\)), so when the adversary obtains a pair \((m_0, c_0)\) such that \(\Pr[C=c_0 \mid M=m_0] = 1\), the key is already leaked. (Recall that the adversary knows the deterministic algorithm \(\mathsf{Enc}_k\), thus recovering \(k\) from known \(m_0\) and \(c_0\) can be quite feasible; e.g., in a one-time pad, \(k = m_0 \oplus c_0\).) The only way to get around this is to make \(\mathsf{Enc}_k\) probabilistic (constructed from a pseudorandom function), such that an adversary cannot recover the key efficiently within polynomial time.
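The one-time-pad case is easy to demonstrate: under a deterministic \(\mathsf{Enc}\), a single known plaintext–ciphertext pair reveals the key. A minimal sketch (helper names are mine):

```python
import secrets

def otp_encrypt(key, message):
    """One-time pad: XOR the message with an equally long key."""
    return bytes(k ^ m for k, m in zip(key, message))

m0 = b"attack at dawn"
k = secrets.token_bytes(len(m0))  # uniformly random one-time key
c0 = otp_encrypt(k, m0)

# An adversary holding a single known pair (m0, c0) recovers the key:
recovered = bytes(m ^ c for m, c in zip(m0, c0))
assert recovered == k
```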
Note that perfect secrecy is not possible under CPA, since there is a small possibility that the adversary will reverse the key (by, for example, traversing an exponentially large lookup table of all random bits) and succeed in the further indistinguishability experiment with a slightly higher (but negligible) probability.
One-time pad is one of the most (provably) secure encryption schemes, and its secrecy does not rely on any computational hardness assumptions. However, it requires that \(|\mathcal{K}| \geq |\mathcal{M}|\) (which in fact is a necessary condition for any perfectly secret scheme), thus its real-world use is limited.
The one-time key \(k\) (chosen uniformly from the key space \(\mathcal{K}\)) may not simply be reused for multiple encryptions. Assume that \(|\mathcal{K}| = |\mathcal{M}|\); for encryptions of \(n\) messages, the message space expands to \(\mathcal{M}^n\) of size \(|\mathcal{M}|^n\), while the key space remains \(\mathcal{K}\), thus we have \(|\mathcal{K}| < |\mathcal{M}|^n\). Such a degraded scheme (a "many-time pad") is theoretically insecure and vulnerable to several practical cryptanalyses.
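The core weakness is that XORing two ciphertexts produced under the same key cancels the key out entirely. A small illustration (the reused key is a fixed byte here purely for demonstration; in a real pad it would be random):

```python
m1 = b"attack at dawn"
m2 = b"retreat by sea"
key = bytes([0x5A]) * len(m1)  # the (mis)reused key; fixed byte for demo only

c1 = bytes(k ^ m for k, m in zip(key, m1))
c2 = bytes(k ^ m for k, m in zip(key, m2))

# The key cancels out: c1 XOR c2 = m1 XOR m2, leaking plaintext structure
# (a zero byte reveals positions where the two messages agree).
xor = bytes(a ^ b for a, b in zip(c1, c2))
assert xor == bytes(a ^ b for a, b in zip(m1, m2))
```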
A historical exploit of the vulnerability of the many-time pad occurred in the VENONA project, in which the U.S. Army's Signal Intelligence Service (later the NSA) aimed at decrypting messages sent by USSR intelligence agencies (the KGB) over a span of four decades. As the KGB mistakenly reused some portions of their one-time key codebook, the SIS was able to break a good number of the messages.^{1}
Books:
J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2nd ed.
Papers:
[1] C. E. Shannon, "Communication theory of secrecy systems," Bell System Technical Journal, vol. 28, no. 4, pp. 656–715, 1949.
R. L. Benson, "The Venona story." https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/publications/coldwar/assets/files/venona_story.pdf
Intuitively, probability is a measure of uncertainty. Mathematically, probability is a realvalued function defined on a set of events in a probability space that satisfies measure properties such as countable additivity (or simply, measure on probability space).
Typically, a probability density function (pdf) or a probability mass function (pmf) determines a distribution in the probability space.
Example 2.1. Consider the wave function of a particle: \[\Psi(x,t)\] where \(x\) is position and \(t\) is time.
If the particle's position is measured, its location cannot be determined with certainty but is described by a probability distribution: the probability that the particle is found in \([x, x+\Delta x]\) is \[\Delta\Pr = |\Psi(x,t)|^2 \Delta x\]
The square modulus of the wave function (which is real-valued and nonnegative) \[\left|\Psi(x, t)\right|^2 = {\Psi(x, t)}^{*}\Psi(x, t) = \rho(x, t)\] is interpreted as the pdf.
Since the particle must be found somewhere, we have the normalization condition (by the assumption of unit measure): \[\int_{-\infty}^\infty |\Psi(x,t)|^2\, dx = 1\]
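Numerically, the normalization condition can be verified for any concrete wave function; the sketch below integrates \(|\Psi|^2\) for a Gaussian wave packet (so that \(|\Psi|^2\) is just the pdf of \(\mathcal{N}(0, s^2)\); the variable names are mine) with the trapezoidal rule:

```python
import math

# Gaussian wave packet at a fixed time: |Psi(x)|^2 is the pdf of N(0, s^2).
s = 1.0

def prob_density(x):
    return math.exp(-x * x / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)

# Trapezoidal integration of |Psi|^2 over [-10s, 10s].
n, a, b = 100_000, -10 * s, 10 * s
h = (b - a) / n
total = (prob_density(a) + prob_density(b)) / 2
total += sum(prob_density(a + i * h) for i in range(1, n))
total *= h

assert abs(total - 1.0) < 1e-6  # the normalization condition holds
```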
Distributions are also called generalized functions in analysis; they expand the notion of functions to objects whose derivatives may not exist in the classical sense. Thus, it is not uncommon that many probability distributions cannot be described using classical (differentiable) functions. The Dirac delta function \(\delta\) (which is a generalized function) is often used to represent a discrete distribution, or a partially discrete, partially continuous distribution, using a pdf.
Bayes’ theorem forms the basis for Bayesian inference, which is an important method of statistical inference that updates the probability for a hypothesis as more evidence or information becomes available.
Hypotheses can also be fallacies. In Bayesian inference, if one can make the assumption that every event occurs independently and that the probability is identically distributed throughout lasting trials, it is easy to see that some common beliefs are mistaken.
Gambler's fallacy (Monte Carlo fallacy). If an outcome occurs more frequently than normal during some period, it will happen less frequently in the future; contrariwise, if an outcome happens less frequently than normal during some period, it will happen more frequently in the future. This is presumed to be a means by which nature balances itself.
Gambler's fallacy is considered a fallacy when the outcomes are known to be independently, identically distributed. Assume that the future (the probability of event \(A_2\)) has no effect on the past (the probability of event \(A_1\)); then we have \(\Pr[A_1 \mid A_2] = \Pr[A_1]\). From Bayes' theorem, it holds true that \[\Pr[A_2 \mid A_1] = \Pr[A_2]\] That is, past events should not increase or decrease our confidence in a future event.
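A quick simulation (a sketch; variable names are mine) confirms that conditioning on a past fair-coin outcome leaves the estimate of a future outcome unchanged:

```python
import random

rng = random.Random(1)
# Each trial: (A1, A2) = (first flip is heads, second flip is heads).
trials = [(rng.random() < 0.5, rng.random() < 0.5) for _ in range(200_000)]

p_a2 = sum(a2 for _, a2 in trials) / len(trials)
after_heads = [a2 for a1, a2 in trials if a1]
p_a2_given_a1 = sum(after_heads) / len(after_heads)

# Independence: conditioning on the past outcome barely moves the estimate.
assert abs(p_a2_given_a1 - p_a2) < 0.01
```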
Hot-hand fallacy. A person who has experienced success with a seemingly random event has a greater chance of further success in additional attempts. That is, if an outcome occurs more frequently than normal during some period, it will also happen frequently in the future.
If psychological factors can be excluded, then the hot-hand fallacy is a fallacy caused by people's confirmation bias. As with the gambler's fallacy, if we can't assume that the outcomes are independently, identically distributed, we can't simply conclude that this belief is mistaken.
Inverse gambler’s fallacy. If an unlikely outcome occurs, then the trials must have been repeated many times before.
Assume that the past (the probability of event \(A_1\)) has no effect on the future (the probability of event \(A_2\)); then we have \(\Pr[A_2 \mid A_1] = \Pr[A_2]\). From Bayes' theorem, it holds true that \[\Pr[A_1 \mid A_2] = \Pr[A_1]\] That is, our confidence in \(A_1\) should remain unchanged after we observe \(A_2\).
Fallacies of hasty generalization and slothful induction (law of small numbers). Informal fallacies reaching an inductive generalization based on insufficient evidence, or denying a reasonable conclusion of an inductive argument.
Statistically speaking, sampling from a small group can lead to misbeliefs that fail to hold for the entire population, if hypothesis testing is not carefully conducted.
Theorem 2.2. (Law of large numbers) Let \(X_1, \dots, X_n\) be an infinite sequence of i.i.d. Lebesgue integrable random variables with fixed expectation \(\operatorname{E}[X_1] = \cdots = \operatorname{E}[X_n] = \mu\). Define the sample average \[\overline{X}_n = \frac{1}{n}(X_1 + \dots + X_n)\] Then \(\overline{X}_n\) converges to \(\mu\) as \(n \to \infty\) (in probability by the weak law; almost surely by the strong law).
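A numeric sketch of Theorem 2.2 (the die example and names are mine): the sample average of i.i.d. fair-die rolls drifts toward the expectation \(\mu = 3.5\) as \(n\) grows.

```python
import random

rng = random.Random(123)

def sample_average(n):
    """Average of n i.i.d. fair-die rolls; E[X] = 3.5."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

small, large = sample_average(100), sample_average(1_000_000)
# The deviation of the larger sample is (almost surely) much smaller.
print(abs(small - 3.5), abs(large - 3.5))
assert abs(large - 3.5) < 0.01
```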
Chebyshev’s inequality provides an upper bound on the probability that a random variable deviates from its expected value. Thus, it may be used as a proof for the weak law of large numbers.
The expected value of a random variable \(X\): \[\operatorname{E}[X] = \sum_{x \in \mathcal{X}} x \Pr[X=x]\] While it seemingly gives an estimate on how people would “expect” a random variable to take its value, it can sometimes lead to counterintuitive results, as shown by the following paradox.
St. Petersburg Paradox.^{1} A casino offers a game of chance in which a gambler flips a fair coin until it comes up tails. The initial stake starts at \(2\) dollars and is doubled every time heads appears. The first time tails appears, the game ends and the gambler wins whatever is in the pot. Thus if the coin comes up tails the first time, the gambler wins \(2^1=2\) dollars, and the game ends. If the coin comes up heads, the coin is flipped again. If the coin comes up tails the second time, the gambler wins \(2^2=4\) dollars, and the game ends. If the coin comes up heads again, the coin is flipped again. If the coin comes up tails the third time, the gambler wins \(2^3=8\) dollars, and the game ends. And so on. Eventually the gambler wins \(2^k\) dollars, where \(k\) is the number of coin flips until tails appears. (It is easy to see that \(k\) follows the geometric distribution.) What would be a fair price to pay the casino for entering such a game? (Assume that there is no house edge.)
| \(k\)th coin flip | \(\Pr[\text{Tails}]\) | Stake | Expected payoff |
|---|---|---|---|
| 1 | \(\frac{1}{2}\) | 2 | 1 |
| 2 | \(\frac{1}{4}\) | 4 | 1 |
| 3 | \(\frac{1}{8}\) | 8 | 1 |
| 4 | \(\frac{1}{16}\) | 16 | 1 |
| 5 | \(\frac{1}{32}\) | 32 | 1 |
| … | … | … | … |
| \(k\) | \((1/2)^k\) | \(2^k\) | 1 |
The price should be made equal to the expected value that a gambler wins the stake, which is \[\operatorname{E}[\text{Payoff}] = \sum_{k=1}^{+\infty} \left(\frac{1}{2}\right)^k \cdot 2^k = \sum_{k=1}^{+\infty} 1 = +\infty\]
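A short simulation (a sketch; function names are mine) makes the tension vivid: the expectation is infinite, yet sample averages over many rounds stay modest.

```python
import random

def play(rng):
    """One round: flip until tails; payoff is 2^k after k flips."""
    k = 1
    while rng.random() < 0.5:  # heads, flip again
        k += 1
    return 2 ** k

rng = random.Random(0)
rounds = 100_000
avg = sum(play(rng) for _ in range(rounds)) / rounds
# The theoretical expectation is infinite, yet the sample average over
# a hundred thousand rounds typically stays in the tens of dollars.
print(avg)
```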
If a rational gambler pays to enter a game if and only if its average payoff is larger than its price, then he would pay any price to enter this game (since the expected payoff of this game is infinitely large). But in reality, few of us are willing to pay even tens of dollars to enter such a game. What went wrong? Furthermore, if mathematical expectation does not correctly reflect what people expect from a game, how should we quantify the "true" expectation?
The St. Petersburg paradox was initially stated by Nicolas Bernoulli in 1713. There are several proposed approaches to resolving the paradox, including expected utility theory with the hypothesis of diminishing marginal utility [1], and cumulative prospect theory. However, none of them is purely probability-theoretic, as they require hypothesized economic/behavioral models.
In probability theory, the variance of a random variable \(X\) is defined as \[\operatorname{Var}(X) = \operatorname{E}[(X-\mu)^2] = \frac{1}{N} \sum_{i=1}^N (X_i-\bar{X})^2\] (the latter for a population of \(N\) equally likely values).
In statistics, when calculating the sample variance in order to estimate the population variance, and the population mean is unknown, Bessel's correction^{2} (use of \(N-1\) instead of \(N\)) is often preferred: \[s^2 = \frac{1}{N-1} \sum_{i=1}^N (X_i-\bar{X})^2\]
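A small Monte Carlo sketch (sampling from \(\mathcal{N}(0,1)\), true variance \(1\); names are mine) shows the bias that Bessel's correction removes:

```python
import random

rng = random.Random(7)

def average_estimates(n, reps):
    """Average the biased (1/N) and corrected (1/(N-1)) variance
    estimators over many small samples drawn from N(0, 1)."""
    biased = corrected = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        biased += ss / n
        corrected += ss / (n - 1)
    return biased / reps, corrected / reps

b, c = average_estimates(5, 20_000)
# E[biased] = (N-1)/N * sigma^2 = 0.8, while E[corrected] = 1.0.
assert b < c
assert abs(c - 1.0) < abs(b - 1.0)
```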
A few remarks and caveats:
Books:
M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
M. Baron, Probability and Statistics for Computer Scientists, 2nd ed.
Articles:
[1] R. Martin, "The St. Petersburg paradox," in The Stanford Encyclopedia of Philosophy, Summer 2014, E. N. Zalta, Ed. Metaphysics Research Lab, Stanford University, 2014. https://plato.stanford.edu/archives/sum2014/entries/paradox-stpetersburg/
Yes, it bugs me when I can't restore my last open tabs, and I want my old session back so badly. Remembering last tabs, if I get the history right, was a feature first introduced by Google Chrome, and it soon started to play an indispensable part in my daily workflow. I'm a multitasker, but the computing resources of my laptop are very limited. Say I have a session in which I am working on a homework report, with loads of folders, web pages and editor buffers open, which can easily fill up gigabytes of RAM; then I realize that I need to compile something really hardcore, or maybe just take a short rest and do some random surfing on the web. Certainly I would rather close all those engrossing processes for the time being, hoping that they could later continue with all the open tabs I left off.
Mainly four types of applications account for so-called "work sessions" for me:
Terminals don't take up a lot of memory, so I wouldn't mind leaving them open. Typical browsers, including Chromium and Firefox, do support session restoring (and there are even extensions that allow you to save current tabs and recover them at any later time). Any decent text editor (or IDE) may also be configured to remember open tabs / sessions. Average file managers, however, fail to meet my basic needs of productivity.
I'm on GNOME 3, but currently using the Caja file manager – ever since Nautilus 3.6 decided to remove two or three features I found important (compact mode, backspace navigation) and introduced an awkward, smug "search-whatever-shit-as-you-type" feature.
File managers I’ve tried so far:
Among all of the above, I settled on Caja, simply because there was no reasonably good alternative. Still, I'm wishing for something that can save session states for me. After doing a little research, I realized that:
Let's take the issue into more technical detail. On Caja (or other similarly GTK+/GLib-based file managers), one needs to implement the following: in the `destroy` callback of the main `GtkObject`, all last remaining session data (e.g., internal states about open tabs and windows) must be saved to disk (after the main loop ends there's no more chance to get this information back). With `caja_application_get_session_data()`, making a quick workaround that enables Caja to save and restore a session is somewhat trivial labor; however, it seems Caja doesn't record the correct (spatial) ordering of tabs in its session data – so I wouldn't consider this a satisfying solution to the issue, and I have no intent to send such an incomplete patch to Caja. Nevertheless, it's better than nothing, and, if the ordering of tabs really matters, it would be feasible to write a wrapper script that manipulates the XML file in `$HOME/.config/caja/last-session`.
And here goes the patch (applied to Caja 1.16.1; definitely UNWARRANTED):
`title`, `author` and `date`) into a properly typeset PDF document:
$ pandoc src.md -o out.pdf
However, Markdown is not TeX. Not even close. Once you need some bleeding-edge control over the typesetting outcome, or perhaps just a little refinement of its LaTeX templating, you'll soon notice that Pandoc has its quirks and gotchas. I've been using Pandoc for all my serious academic writing (incl. homework reports) for years, ever since I gave up on learning more about the overwhelmingly sophisticated TeX ecosystem and turned to something that "just works". Pandoc fits my needs well. And when it doesn't, there's almost always a workaround that achieves the same thing neatly. And this is what this write-up is mostly about.
`default.latex`? Bad idea.

You could, of course, modify the default template (`default.latex`) provided by Pandoc, as long as you're no stranger to LaTeX. In this way, you can achieve anything you want – in pure LaTeX.
$ pandoc --template mydefault.latex src.md -o out.pdf
There are, however, a few problems with this naïve approach:
I'm conservative about changing the templates. If it's a general issue that needs to be fixed in the default template, sending a pull request to pandoc-templates might be a better idea. Of course, if there's a certain submission format you have to stick with (e.g., given LaTeX templates for conference papers), then you'll have to fall back on your own template.
I wouldn’t claim that I know the best practice of using Pandoc, but there’s such a common idiom that cannot be overstressed: Separate presentation and content!
In the YAML front matter of `src.md` (the main Markdown file you're writing), put only things that matter to your potential readers:

---
title: Boilerplating Pandoc for Academic Writing
subtitle: or How I Learned to Stop Typesetting and Concentrate on the Math
author: Mort Yao
date: 17 November 2016
abstract: |
  Lorem ipsum dolor sit amet, consectetur adipiscing elit,
  sed do eiusmod tempor incididunt ut labore et dolore magna
  aliqua. Ut enim ad minim veniam, quis nostrud exercitation
  ullamco laboris nisi ut aliquip ex ea commodo consequat.
---
And in a separate YAML file (let's call it `default.yaml`), here goes the formatting stuff:

---
geometry: margin=1.5in
indent: true
header-includes: |
  \usepackage{tcolorbox}
  \newcommand\qed{\hfill\rule{1em}{1em}}
---
Above is my personal default, and it’s worth a few words to explain:
`geometry` is where you control the geometric settings of your document. For example, you may narrow down the page margin to `margin=1.5in`, and this is equivalent to raw LaTeX:

\usepackage[margin=1.5in]{geometry}
Set `indent` to any value other than `false` if paragraph indentation is desired. (And it is often desired in formal publications.)

`header-includes` is where you define your own macros, configure existing ones, or claim `\usepackage` in case you want to use a package not enabled by Pandoc (e.g., `tcolorbox`). Although you might as well define those in other places (e.g., in the content of a Markdown file), don't do that.

`\newcommand\qed{\hfill\rule{1em}{1em}}` is my favorite of all time. It doesn't require the `amsthm` package.

With a separate `default.yaml`, now here we are:
$ pandoc default.yaml src.md -o out.pdf
`header-includes`
You might have already noticed that the `subtitle` field won't display in the produced PDF file. As far as I'm concerned (in Pandoc 1.18), this is the expected behavior. See here in the README:
To make `subtitle` work with other LaTeX document classes, you can add the following to `header-includes`:

\providecommand{\subtitle}[1]{%
  \usepackage{titling}
  \posttitle{%
    \par\large#1\end{center}}
}
Unfortunately, this won't work (until Issue #2139 is resolved), since Pandoc parses the `header-includes` metadata field as Markdown, and the bracketed `[1]` is misinterpreted as literal text rather than as part of a LaTeX control sequence. So the workaround is: instead of embedding `header-includes` as a metadata field in YAML, we should separate it into another file for this dedicated purpose (it's simply raw LaTeX anyway), and include it using `--include-in-header`/`-H`:
$ pandoc -H header.tex default.yaml src.md -o out.pdf
Note that you can't have two `header-includes` for one document. So the `header-includes` field specified in the YAML metadata will be overridden by the content of `header.tex`.
While the Markdown syntax for citing is rather easy (`[@id]`), it takes effort to make things right, especially if you have a certain preferred citation format (APA, MLA, Chicago, IEEE, etc.).
The suggestion is: use pandoc-citeproc. Once you have a list of references you're interested in, you need two things to typeset them nicely in your document:

- A citation style file (`.csl`), to specify the citation format you want to use.
- A bibliography file (`.bib`), which is a list of all entries you might cite.
As part of the YAML metadata (assume you have `ieee.csl` and `references.bib`):
csl: ieee.csl
bibliography: references.bib
Using `pandoc-citeproc` as a filter, generate the document with citations:
$ pandoc --filter pandoc-citeproc -H header.tex default.yaml src.md -o out.pdf
The list of references is appended to the end of the document. It is often desirable to give the references an obvious title ("References"), start from a new page and avoid any further indentation, so the following comes at the end of the Markdown source:
\newpage
\setlength\parindent{0pt}
# References
Basically, we need 5 files in total:
- `src.md` (Markdown + possibly LaTeX mixed format): Main text.
- `references.bib` (BibTeX/BibLaTeX format): List of references.
- `default.yaml` (YAML format): Format-related metadata.
- `header.tex` (LaTeX format): Content of `header-includes`; package imports and macro definitions.
- `ieee.csl` (CSL XML format): Citation style.

And one command:
$ pandoc --filter pandoc-citeproc -H header.tex default.yaml src.md -o out.pdf
`amsthm`?

Pandoc doesn't provide native support for `amsthm` (and I wonder if there ever will be). You can still have the same thing in Pandoc Markdown:
\newtheorem{definition}{Definition}
\begin{definition}
Man is a rational animal.
\end{definition}
However, everything in between `\begin` and `\end` will be treated as raw LaTeX, and the expressiveness of Markdown is lost there. More importantly, this is purely a LaTeX-specific thing, so there's no way for Pandoc to convert it to HTML or any other format (unless you have a filter that does the trick). Consequently, I tend to write all definitions / theorems (lemmas, claims, corollaries, propositions…) in simple Markdown:
**Definition 1.** *Man is a rational animal.*
It does have some advantages over `amsthm`:

With `amsthm`, you cannot see the numbering of each theorem (definition, etc.) in the text editor (well, you can't without a dedicated plugin at least). This is inconvenient when you need to refer to a prior one later. By numbering them explicitly, you can clearly see these ordinals in the Markdown source.

This also has some drawbacks compared to using `amsthm`, though:
(Probably) The best solution is to write a filter that (conventionally) converts any plain text like `Definition 1` (and `Lemma 2`, `Theorem 3`, etc.) at the beginning of a paragraph to proper Markdown (for HTML targets) or the corresponding `amsthm` block (for LaTeX targets). Even better, it should be able to do cross-references accordingly (remember `Lemma 10.42`? Let's put an anchored link on that!). This is yet to be done, but it would be very helpful to someone who does a lot of theorems and proofs and wants to avoid the kludge of mixing raw LaTeX with semantically evident Markdown.
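As a starting point, the core rewriting rule of such a filter might look like the sketch below. It is pure string manipulation for illustration only (a real Pandoc filter would walk the document AST instead, e.g., via the pandocfilters package), and it drops the explicit number since `amsthm` renumbers automatically:

```python
import re

# Rewrites a leading "**Theorem 3.2.**"-style label into an amsthm
# environment; the explicit number is dropped since amsthm renumbers.
LABEL = re.compile(
    r'^\*\*(Definition|Lemma|Theorem|Corollary)\s+([\d.]+)\.\*\*\s*(.*)$',
    re.S)

def to_amsthm(paragraph):
    m = LABEL.match(paragraph)
    if not m:
        return paragraph  # leave ordinary paragraphs untouched
    env = m.group(1).lower()
    return "\\begin{%s}\n%s\n\\end{%s}" % (env, m.group(3), env)

print(to_amsthm("**Definition 1.** *Man is a rational animal.*"))
```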
This is an expository reading summary of a selected CRYPTO 2015 paper that I did as an assignment in KU's Introduction to Modern Cryptography course. Adversarial-resilient Bloom filters are the counterparts of cryptographically secure hash functions in an adversarial setting, where adaptive adversaries with access to a deterministic or non-deterministic query oracle may challenge the data structure in a way that intentionally increases the false positive rate of queries. As a preliminary result, the paper shows that the resilience of Bloom filters against computationally bounded adversaries requires that one-way functions exist; furthermore, such constructions are possible using pseudorandom permutations. I do find the proof a bit involved, but the notions of security introduced for Bloom filters are new and appealing (I haven't read them anywhere else).
Original paper:
Abstract
A Bloom filter is a hash-based probabilistic data structure that is space-efficient for set membership querying, with a small probability of false positives. Naor and Yogev's 2015 paper introduces the adversarial model and formally proposes a strong notion of security for Bloom filters, i.e., adversarial resilience, based on an adversarial game in a cryptographic setting. The paper also discusses the correspondence between adversarial-resilient Bloom filters and the open assumption that one-way functions exist, which enables theoretical constructions using pseudorandom permutations. We believe that such an understanding will help design practical Bloom filters that are safe from known attacks in software systems.
Probabilistic data structures are data structures that employ randomness in their designs to enable more efficient approaches to storing and querying data than deterministic ones. Traditionally, the algorithmic probabilistic analysis of such data structures assumes a model where all inputs and queries are independent of the internal randomness of the data structure. In this work, we consider an adversarial environment, where a computationally bounded adversary^{1} may adaptively choose inputs and queries with the intention of degrading the efficiency of the underlying data structure of some computer system. By introducing the adversarial model, we analyze the behavior of such data representations under the cryptographic notion of computational security against adversaries; furthermore, this enables us to construct more efficient, provably secure schemes for probabilistic data structures.
As a concrete example, a Bloom filter is a probabilistic data structure that holds a set \(S\) of elements approximately, using significantly fewer bits of storage and allowing for faster access than a complete representation. As a tradeoff between efficiency and preciseness, for any query of \(x \in S\), a Bloom filter always outputs a yesanswer, and for any query of \(x \not\in S\), it should output a yesanswer only with small probability. In other words, a noanswer given by a Bloom filter indicates unambiguously that \(x \not\in S\), while a yesanswer indicates that \(x \in S\) probably holds^{2}, that is, it allows false positives. Ideally, the error probability that a Bloom filter returns a false positive should be as small as possible.
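For reference, a textbook (non-resilient) Bloom filter fits in a few lines. This sketch uses \(k\) salted SHA-256 hashes over an \(m\)-bit array; the class name and parameters are arbitrary choices of mine:

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: an m-bit array and k salted hash functions."""

    def __init__(self, m=1024, k=5):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _indexes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(b"%d:%s" % (i, item)).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def query(self, item):
        # Never a false negative; false positives with small probability.
        return all(self.bits[idx] for idx in self._indexes(item))

bf = BloomFilter()
bf.add(b"alice")
assert bf.query(b"alice")    # a member always gets a yes-answer
print(bf.query(b"mallory"))  # a non-member: almost surely False here
```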
Addressing the commonly seen set membership problem, Bloom filters have been widely implemented in real-world applications, specifically as internal data representations for optimizing large-scale software systems. For example, Akamai's CDN^{3} servers maintain Bloom filters in memory to decide whether to look up the disk cache for a requested resource, and a false positive of the Bloom filter causes a cache miss, which means that the server has to make an unnecessary disk lookup at the expense of time and system workload; if an attacker exploits the behavior of the Bloom filter, it is possible for them to cast queries that degrade the disk cache hit rate of the CDN servers and, consequently, perform a Denial-of-Service (DoS) attack. [1] In another scenario, where Bitcoin clients apply Bloom filters in the Simplified Payment Verification (SPV) mode to increase the overall performance of wallet synchronization, an adversary may perform a DoS attack on an SPV node by learning from the responses of the Bloom filters they have access to. [2]
As discussed above, the adversarial model addresses real security issues, so the need arises to define security in adversarial environments and to construct provably secure Bloom filters. Essentially, a well-constructed Bloom filter should maintain its small error probability even in an adversarial environment; we say that such a Bloom filter is adversarial resilient (or just resilient). Consider an adversarial game in which an adversary has oracle access to the Bloom filter, is allowed to make \(t\) queries, and then outputs some \(x^*\) (not queried before) that it believes to be a false positive; if \(x^*\) is indeed a false positive, the adversary wins the game. We say that a Bloom filter is \((n,t,\varepsilon)\)-adversarial resilient if, when it is initialized over sets of size \(n\), then after \(t\) queries the probability of \(x^*\) being a false positive is at most \(\varepsilon\). A Bloom filter that is resilient for any polynomial number of queries is said to be strongly resilient.
Clearly, a trivial construction of a strongly resilient Bloom filter is a deterministic lookup table that stores \(S\) precisely, so that there is no false positive for an adversary to find. However, such a construction forgoes the space and time efficiency of a normal Bloom filter, since it stores every element in memory. In the following, we consider only nontrivial Bloom filters, and we show that for a nontrivial Bloom filter to be adversarial resilient, one-way functions must exist; that is, if one-way functions do not exist, then any nontrivial Bloom filter can be attacked with non-negligible probability by an efficient adversary. Furthermore, under the assumption that one-way functions exist, a pseudorandom permutation (PRP) can be used to construct a strongly resilient Bloom filter with reasonable memory consumption.
The construction of a Bloom filter consists of two algorithms: an initialization algorithm that gets a set \(S\) and outputs a memory-efficient representation of \(S\), and a query algorithm that gets a representation of \(S\) and an element \(x\) to be checked, outputting \(1\) if \(x \in S\) and \(0\) otherwise. Typically, the initialization algorithm is randomized but the query algorithm is deterministic; that is, a query does not amend the existing representation. We say that such a Bloom filter has a steady representation.^{4}
In the following model, we consider a universal set \(U\) and a subset \(S \subset U\) to be stored in a Bloom filter. We denote \(u=|U|\) and \(n=|S|\).
Definition 1. (Steady-representation Bloom filter) Let \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\) be a pair of polynomial-time algorithms, where \(\mathbf{B}_1\) is a randomized algorithm that gets as input a set \(S\) and outputs a representation \(M\), and \(\mathbf{B}_2\) is a deterministic algorithm that gets as input a representation \(M\) and a query element \(x \in U\). We say that \(\mathbf{B}\) is an \((n,\varepsilon)\)-Bloom filter (with a steady representation) if for any set \(S \subset U\) of size \(n\) it holds that:
\(\forall x \in S, \Pr[\mathbf{B}_2(\mathbf{B}_1(S), x) = 1] = 1\) (Completeness)
\(\forall x \not\in S, \Pr[\mathbf{B}_2(\mathbf{B}_1(S), x) = 1] \leq \varepsilon\) (Soundness)
where the probability is taken over the randomness used by the algorithm \(\mathbf{B}_1\).
Intuitively, the first property (completeness) says that for every element of the set \(S\), the Bloom filter is guaranteed to output a yes-answer; the second property (soundness) gives an upper bound on the probability that the Bloom filter outputs a false positive, that is, that the query algorithm returns \(1\) on an element that does not actually belong to \(S\). Formally,
False positive and error rate. Given a representation \(M\) of \(S\), if \(x \not\in S\) and \(\mathbf{B}_2(M,x)=1\), we say that \(x\) is a false positive. We call the bound \(\varepsilon\) on the probability of outputting a false positive the error rate of \(\mathbf{B}\).
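Definition 1 can be read as a two-function interface. The sketch below (an illustrative hash-based representation, not the formal construction) makes the roles concrete: \(\mathbf{B}_1\) is randomized, \(\mathbf{B}_2\) is deterministic and never modifies \(M\), and sampling non-members estimates the error rate \(\varepsilon\):

```python
import random

def B1(S, m=2048, k=3, seed=None):
    """Initialization: randomized; outputs a steady representation M of S."""
    rng = random.Random(seed)
    hash_seeds = [rng.getrandbits(32) for _ in range(k)]
    bits = [False] * m
    for x in S:
        for hs in hash_seeds:
            bits[hash((hs, x)) % m] = True
    # Immutable tuples: queries cannot amend the representation.
    return (tuple(bits), tuple(hash_seeds), m)

def B2(M, x):
    """Query: deterministic given M; returns 1 iff every probed bit is set."""
    bits, hash_seeds, m = M
    return 1 if all(bits[hash((hs, x)) % m] for hs in hash_seeds) else 0

def empirical_error(M, non_members):
    """Estimate the soundness bound epsilon on a sample of non-members."""
    return sum(B2(M, x) for x in non_members) / len(non_members)
```

Completeness is exact by construction; soundness is a statistical property, which the estimator above only approximates on a sample.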
In an adversarial environment, consider the following experiment for a Bloom filter \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\), an adversary \(\mathcal{A}\), a value \(t\) bounding the number of queries \(\mathcal{A}\) may make, and a security parameter \(\lambda\).
The Bloom filter resilience challenge experiment \(\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda)\):

1. A representation \(M \leftarrow \mathbf{B}_1(1^\lambda,S)\) is computed for the given set \(S\).
2. The adversary \(\mathcal{A}\) is given \(S\) and oracle access to \(\mathbf{B}_2(M,\cdot)\), and may make at most \(t\) queries.
3. \(\mathcal{A}\) outputs an element \(x^* \not\in S\) that it has not queried.
4. The output of the experiment is \(1\) if \(\mathbf{B}_2(M,x^*)=1\) (i.e., \(x^*\) is a false positive), and \(0\) otherwise.
Definition 2. (Adversarial-resilient Bloom filter) Let \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\) be an \((n,\varepsilon)\)-Bloom filter (with a steady representation). We say that \(\mathbf{B}\) is an \((n,t,\varepsilon)\)-adversarial resilient Bloom filter if for any set \(S\) of size \(n\), for all sufficiently large \(\lambda \in \mathbb{N}\) and for all probabilistic polynomial-time adversaries \(\mathcal{A}\), it holds that \[\Pr[\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda) = 1] \leq \varepsilon\]
where the probability is taken over the randomness used by the algorithm \(\mathbf{B}_1\) and \(\mathcal{A}\).
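The experiment can be sketched as a small harness. The toy `B1`/`B2` here are an illustrative hash-based filter (not the paper's construction), and `naive_adversary` is a hypothetical baseline attacker:

```python
import random

# A toy steady-representation Bloom filter (illustrative only).
def B1(S, m=512, k=3, seed=None):
    rng = random.Random(seed)
    hash_seeds = [rng.getrandbits(32) for _ in range(k)]
    bits = [False] * m
    for x in S:
        for hs in hash_seeds:
            bits[hash((hs, x)) % m] = True
    return (bits, hash_seeds, m)

def B2(M, x):
    bits, hash_seeds, m = M
    return 1 if all(bits[hash((hs, x)) % m] for hs in hash_seeds) else 0

def challenge(adversary, S, t, seed=None):
    """One run of Challenge_{A,B,t}: output 1 iff the adversary's
    fresh x* is a false positive of the hidden representation M."""
    M = B1(S, seed=seed)
    asked = []
    def oracle(x):
        asked.append(x)
        return B2(M, x)
    x_star = adversary(oracle, t)
    fresh = x_star not in asked and x_star not in S
    return 1 if fresh and B2(M, x_star) == 1 else 0

def naive_adversary(oracle, t):
    """Ignores its oracle answers; its win rate is just the error rate."""
    for i in range(t):
        oracle(10_000 + i)
    return random.randrange(100, 10_000)
```

A non-adaptive adversary like this one wins with probability roughly \(\varepsilon\); the point of the resilience definition is that no efficient adversary, however adaptive, should do noticeably better.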
To define the nontriviality of a Bloom filter formally, notice that it is always desirable to minimize the memory use of a Bloom filter. Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter that uses \(m\) bits of memory. It is known[3] that \(m \geq n \log \frac{1}{\varepsilon}\). Thus, we define
Definition 3. (Minimal error) Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter that uses \(m\) bits of memory. We say that \(\varepsilon_0 = 2^{-\frac{m}{n}}\) is the minimal error of \(\mathbf{B}\).
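A quick numeric reading of Definition 3 (the parameter values below are arbitrary examples):

```python
def minimal_error(m, n):
    """epsilon_0 = 2^(-m/n): rearranging the lower bound
    m >= n * log2(1/eps) gives eps >= 2^(-m/n), so no filter with
    m bits storing n elements can achieve a smaller error rate."""
    return 2 ** (-m / n)

# Storing n = 1000 elements in m = 10 * n = 10000 bits (10 bits per
# element) forces an error rate of at least 2^-10, just under 0.1%.
```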
As mentioned previously, a trivial construction of a Bloom filter is a lookup table that stores \(S\) precisely, which requires \(m=\log \binom{u}{n} \approx n \log\frac{u}{n}\) bits of memory; comparing with the bound \(m \geq n \log \frac{1}{\varepsilon}\), this much memory suffices whenever \(\varepsilon \leq \frac{n}{u}\), so a construction with such an error rate is trivial. On the other hand, if \(\varepsilon\) is negligible in \(n\) (as happens when \(u\) is superpolynomial in \(n\) and \(\varepsilon \leq \frac{n}{u}\)), then every polynomial-time adversary has only a negligible probability of finding any false positive, and resilience holds vacuously. Noting that \(\varepsilon_0 \leq \varepsilon\), we then define
Definition 4. (Nontrivial Bloom filter) Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter and let \(\varepsilon_0\) be the minimal error of \(\mathbf{B}\). We say that \(\mathbf{B}\) is nontrivial if there exists a constant \(c \geq 1\) such that \(\varepsilon_0 > \max\{\frac{n}{u},\frac{1}{n^c}\}\).
We now show that the existence of adversarial resilient Bloom filters depends on the existence of one-way functions; that is, if any nontrivial, strongly resilient Bloom filter exists, then one-way functions also exist.
Theorem 5. Let \(\mathbf{B}=(\mathbf{B}_1,\mathbf{B}_2)\) be any nontrivial Bloom filter (with a steady representation) of \(n\) elements that uses \(m\) bits of memory, and let \(\varepsilon_0\) be the minimal error of \(\mathbf{B}\). If \(\mathbf{B}\) is \((n,t,\varepsilon)\)-adversarial resilient for \(t=\mathcal{O}(\frac{m}{\varepsilon_0^2})\), then one-way functions exist.
Proof. Assume that one-way functions do not exist; we then construct a polynomial-time adversary \(\mathcal{A}\) such that \[\Pr[\mathsf{Challenge}_{\mathcal{A},\mathbf{B},t}(\lambda) = 1] > \varepsilon\] for the given value \(\varepsilon\). That is, \(\mathbf{B}\) cannot be \((n,t,\varepsilon)\)-adversarial resilient.
Define the following function \(f\): \[f(S,r,x_1,\dots,x_t) = (x_1,\dots,x_t,\mathbf{B}_2(M,x_1),\dots,\mathbf{B}_2(M,x_t))\] where \(S\) is a set of size \(n\), \(r\) is the string of random bits used by \(\mathbf{B}_1\), \(M\) is the representation of \(S\) generated by \(\mathbf{B}_1\) with randomness \(r\), and \(t=\frac{200m}{\varepsilon_0}\). Clearly, \(f\) is polynomial-time computable.
Since \(f\) is not a one-way function (under the assumption that one-way functions do not exist), there is an algorithm that inverts \(f\) efficiently. Thus we have,
Claim 6. Assume that one-way functions do not exist. Then there exists a polynomial-time algorithm \(\mathcal{A}\) that inverts \(f\) with a failure probability of at most \(\frac{1}{100}\): \[\Pr[f(\mathcal{A}(f(S,r,x_1,\dots,x_t))) \neq f(S,r,x_1,\dots,x_t)] < \frac{1}{100}\]
Proof. Because \(f\) is not a one-way function, there exists[4] a polynomial-time algorithm \(\mathcal{A}'\) such that \(\Pr[\mathsf{Invert}_{\mathcal{A}',f}(n) = 1] \geq \frac{1}{p(n)}\) for some polynomial \(p\). Construct an algorithm \(\mathcal{A}\) that runs \(\mathcal{A}'\) independently \(\lceil\frac{\log 100}{\log p(n) - \log(p(n)-1)}\rceil\) times, so the total failure probability is \[\Pr[f(\mathcal{A}(f(S,r,x_1,\dots,x_t))) \neq f(S,r,x_1,\dots,x_t)] < \left(1-\frac{1}{p(n)}\right)^{\lceil\frac{\log 100}{\log p(n) - \log(p(n)-1)}\rceil} \leq \frac{1}{100}\]
■
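The repetition count used in the proof above can be checked numerically; this helper is an illustrative sketch of the standard amplification-by-repetition argument:

```python
import math

def repetitions(p, target_fail=1 / 100):
    """Number of independent runs of an inverter succeeding with
    probability 1/p needed to push the overall failure probability
    below target_fail. Note that log(p) - log(p - 1) equals
    -log(1 - 1/p), matching the exponent in the proof."""
    return math.ceil(math.log(1 / target_fail) /
                     (math.log(p) - math.log(p - 1)))
```

For example, an inverter succeeding with probability \(\frac{1}{10}\) needs 44 independent runs to fail with probability at most \(\frac{1}{100}\).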
Using \(\mathcal{A}\), construct the following probabilistic polynomialtime algorithm \(\mathsf{Attack}\):
The Algorithm \(\mathsf{Attack}\)
\(\mathsf{Attack}\) is given oracle access to the query algorithm \(\mathbf{B}_2(M,\cdot)\), and gets \(1^\lambda\) as input.
1. For \(i \in \{1,\dots,t\}\), sample \(x_i \in U\) uniformly, and query \(y_i = \mathbf{B}_2(M,x_i)\).
2. Run \(\mathcal{A}\) (the inverter of \(f\)) to get \((S',r',x_1,\dots,x_t) \leftarrow \mathcal{A}(x_1,\dots,x_t,y_1,\dots,y_t)\).
3. Compute \(M' \overset{r'}{\leftarrow} \mathbf{B}_1(1^\lambda,S')\), using \(r'\) as the randomness in the initialization.
4. For \(k=1,\dots,\frac{100}{\varepsilon_0}\):
   - Sample \(x^* \in U\) uniformly.
   - If \(\mathbf{B}_2(M',x^*)=1\) and \(x^* \not\in \{x_1,\dots,x_t\}\), output \(x^*\) and halt.
5. Sample \(x^* \in U\) uniformly, and output \(x^*\).
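The steps above can be sketched in code. This is a structural sketch: `invert_f`, `B1`, and `B2` are assumed to be supplied (the inverter from Claim 6 and the filter's own algorithms), and the toy components in the usage below are illustrative:

```python
import random

def attack(oracle, universe, invert_f, B1, B2, t, eps0):
    """The Attack adversary (sketch). oracle is B2(M, .) for the real,
    hidden representation M; invert_f recovers a consistent (S', r')."""
    # Step 1: query t uniform elements.
    xs = [random.choice(universe) for _ in range(t)]
    ys = [oracle(x) for x in xs]
    # Step 2: invert f on the observed query/answer transcript.
    S_prime, r_prime = invert_f(xs, ys)
    # Step 3: rebuild a candidate representation from the recovered
    # set and randomness; it agrees with M on x_1..x_t.
    M_prime = B1(S_prime, seed=r_prime)
    # Step 4: search M' offline (no further oracle calls) for a
    # positive outside the queried set -- a likely false positive of M.
    for _ in range(int(100 / eps0)):
        x_star = random.choice(universe)
        if B2(M_prime, x_star) == 1 and x_star not in xs:
            return x_star
    # Step 5: fall back to a blind guess.
    return random.choice(universe)
```

With a perfect inverter (hypothetically plugged in), \(M'\) equals \(M\) and any positive of \(M'\) outside the transcript is a positive of \(M\) as well.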
Claim 7. For any representation \(M\), the probability that there exists a representation \(M'\) which agrees with \(M\) on all queries, i.e., \(\mathbf{B}_2(M,x_i)=\mathbf{B}_2(M',x_i)\) for all \(i \in \{1,\dots,t\}\), and yet satisfies \(\Pr_x[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] > \frac{\varepsilon_0}{100}\) is at most \(\frac{1}{100}\).
Proof. Fix any \(M'\) with \(\Pr_x[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] > \frac{\varepsilon_0}{100}\). Since the \(x_i\) are chosen independently and uniformly, \[\Pr[\forall i \in \{1,\dots,t\} : \mathbf{B}_2(M,x_i) = \mathbf{B}_2(M',x_i)] \leq \left(1 - \frac{\varepsilon_0}{100}\right)^t\]
Since the Bloom filter uses \(m\) bits of memory, there are at most \(2^m\) possible representations as candidates for \(M'\). Thus, by a union bound and using \(t=\frac{200m}{\varepsilon_0}\), \[\Pr[\exists M' \ \forall i \in \{1,\dots,t\} : \mathbf{B}_2(M,x_i) = \mathbf{B}_2(M',x_i)] \leq 2^m \left(1 - \frac{\varepsilon_0}{100}\right)^t \leq 2^m e^{-2m} \leq \frac{1}{100}\]
Since \(\mathcal{A}\) is assumed to invert \(f\) successfully, it must output a representation \(M'\) such that for \(i \in \{1,\dots,t\}\), \(\mathbf{B}_2(M,x_i)=\mathbf{B}_2(M',x_i)\). Therefore, the above bound holds.
■
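The union-bound inequality in the proof of Claim 7 can be sanity-checked numerically, working in log space to avoid overflow (the \(m\) and \(\varepsilon_0\) values below are arbitrary examples):

```python
import math

def log2_failure(m, eps0):
    """log2 of 2^m * (1 - eps0/100)^t with t = 200*m/eps0,
    the bound from the union-bound step."""
    t = 200 * m / eps0
    return m + t * math.log2(1 - eps0 / 100)
```

For any reasonable choice of parameters the exponent sits far below \(\log_2 \frac{1}{100}\), since \((1-\frac{\varepsilon_0}{100})^t \leq e^{-2m}\) shrinks much faster than \(2^m\) grows.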
Define \(\mu(M)=\Pr_{x \in U}[\mathbf{B}_2(M,x)=1]\), the positive rate of \(M\) over \(U\). We now show that, except with negligible probability over the choice of \(S\), every representation \(M\) generated from \(S\) and randomness \(r\) satisfies \(\mu(M) > \frac{\varepsilon_0}{8}\):
Claim 8. \(\Pr_S[\exists r : \mu(M_r^S) \leq \frac{\varepsilon_0}{8}] \leq 2^{-n}\).
Proof. Let \(\mathsf{BAD}\) be the set of all sets \(S\) for which there exists an \(r\) such that \(\mu(M_r^S) \leq \frac{\varepsilon_0}{8}\). Given \(S \in \mathsf{BAD}\) with such an \(r\), let \(\hat{S}\) be the set of all elements \(x\) such that \(\mathbf{B}_2(M_r^S,x)=1\); then \(|\hat{S}| \leq \frac{\varepsilon_0}{8} \cdot u\). Notice that we can encode \(S\) by the representation \(M_r^S\) together with the index of \(S\) among all subsets of \(\hat{S}\) of size \(n\), so the number of encoding bits must be at least \(\log|\mathsf{BAD}|\): \[\log|\mathsf{BAD}| \leq m + \log\binom{\frac{\varepsilon_0 u}{8}}{n} \leq m + n \log\left(\frac{\varepsilon_0 u}{8}\right) - n \log n + 2n \leq -n + \log\binom{u}{n}\] thus \(|\mathsf{BAD}| \leq 2^{-n}\binom{u}{n}\). Since the number of sets \(S\) is \(\binom{u}{n}\), \(\Pr_S[\exists r : \mu(M_r^S) \leq \frac{\varepsilon_0}{8}] \leq 2^{-n}\).
■
Claim 9. Assume that \(\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] \leq \frac{\varepsilon_0}{100}\) and that \(\mu(M) > \frac{\varepsilon_0}{8}\). The probability that \(\mathsf{Attack}\) does not halt on Step 4 is at most \(\frac{1}{100}\).
Proof. From the assumptions it follows directly that \(\mu(M') > \frac{\varepsilon_0}{8} - \frac{\varepsilon_0}{100} > \frac{\varepsilon_0}{10}\). For convenience, let \(\mathcal{X} = \{x_1,\dots,x_t\}\) and \(\hat{S'} = \{x : \mathbf{B}_2(M',x)=1\}\). We have that
\[\mathbb{E}[|\hat{S'} \cap \mathcal{X}|] = t \cdot \mu(M') > \frac{200m}{\varepsilon_0} \cdot \frac{\varepsilon_0}{10} = 20m\]
By a Chernoff bound, with probability at least \(1-e^{-\Omega(m)}\) we have \(|\hat{S'} \cap \mathcal{X}| < 40m\), and hence \[|\hat{S'} \setminus \mathcal{X}| = |\hat{S'}| - |\hat{S'} \cap \mathcal{X}| > |\hat{S'}| - 40m \geq \frac{\varepsilon_0 u}{10} - 40m\]
Case 1. \(u=n^d\) for a constant \(d\). We show that \(|\hat{S'} \setminus \mathcal{X}| \geq 1\), so that an exhaustive search over the universal set \(U\) is efficient and guaranteed to find an element \(x^*\) in \(\hat{S'} \setminus \mathcal{X}\). Let \(c\) be a constant such that \(\frac{1}{n^c} < \varepsilon_0 < \frac{1}{n^{c-1}}\). We have \(\frac{u}{n} = n^{d-1} \geq \frac{1}{\varepsilon_0} > n^{c-1}\), then \(d-c>1\). Moreover, \(m \leq n \log\frac{u}{n} \leq nd \log n\); thus, \[|\hat{S'} \setminus \mathcal{X}| \geq \frac{\varepsilon_0 u}{10} - 40m \geq \frac{n^{d-c}}{10} - 40nd \log n > 1\]
Case 2. \(u\) is superpolynomial in \(n\). We show that \(\hat{S'} \setminus \mathcal{X}\) occupies a large enough fraction of \(U\) that sampling finds an \(x^*\) with only a small failure probability. Since \(\frac{\varepsilon_0}{20}\) is polynomial in \(\frac{1}{n}\) while \(\frac{40m}{u} \leq \frac{40n \log u}{u}\) is negligible, we get \(\frac{\varepsilon_0}{20} > \frac{40m}{u}\). It follows from \(\frac{|\hat{S'} \setminus \mathcal{X}|}{u} > \frac{\varepsilon_0}{10} - \frac{40m}{u}\) that \(\frac{|\hat{S'} \setminus \mathcal{X}|}{u} > \frac{\varepsilon_0}{20}\). Thus, the probability of sampling \(x \not\in \hat{S'} \setminus \mathcal{X}\) in all \(k\) attempts is bounded by \[\left(1-\frac{\varepsilon_0}{20}\right)^k = \left(1-\frac{\varepsilon_0}{20}\right)^\frac{100}{\varepsilon_0} < \frac{1}{100}\]
In both cases, the probability that \(\mathsf{Attack}\) fails to find \(x^*\) and halt on Step 4 is less than \(\frac{1}{100}\).
■
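The final sampling bound in Case 2 can be checked numerically (the \(\varepsilon_0\) values below are arbitrary examples):

```python
def sampling_failure(eps0):
    """(1 - eps0/20)^(100/eps0): the probability that all k = 100/eps0
    uniform samples miss a set of density eps0/20. As eps0 -> 0 this
    tends to e^-5 (about 0.0067), below the claimed 1/100."""
    return (1 - eps0 / 20) ** (100 / eps0)
```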
Claim 10. \(\Pr[\mathbf{B}_2(M',x^*)=1 \wedge \mathbf{B}_2(M,x^*)=0] \leq \frac{1}{100}\)
Proof. This follows from the assumption that \(\Pr[\mathbf{B}_2(M,x) \neq \mathbf{B}_2(M',x)] \leq \frac{\varepsilon_0}{100}\).
■
Claims 6, 7, 9 and 10 each bound a failure mode of \(\mathsf{Attack}\) by \(\frac{1}{100}\), and Claim 8 contributes a further \(2^{-n}\); together they cover all cases in which \(\mathsf{Attack}\) fails. Taking a union bound, the total failure probability is at most \(\frac{4}{100} + 2^{-n}\). Thus, we have constructed a polynomial-time adversary \(\mathsf{Attack}\) such that \[\Pr[\mathsf{Challenge}_{\mathsf{Attack},\mathbf{B},t}(\lambda) = 1] \geq 1-\frac{4}{100}-2^{-n} > \varepsilon\] therefore \(\mathbf{B}\) cannot be \((n,t,\varepsilon)\)-adversarial resilient; that is, under the assumption that one-way functions do not exist, no such adversarial resilient Bloom filter exists. By contraposition, if nontrivial \((n,t,\varepsilon)\)-adversarial resilient Bloom filters exist, then one-way functions exist.
■
In Theorem 5 we showed that the existence of adversarial resilient Bloom filters implies that one-way functions exist. Conversely, under the assumption that one-way functions exist, pseudorandom permutations can be used to construct nontrivial, strongly adversarial resilient Bloom filters. We have
Proposition 11. (Construction using pseudorandom permutations)^{7} Let \(\mathbf{B}\) be an \((n,\varepsilon)\)-Bloom filter using \(m\) bits of memory. If pseudorandom permutations exist, then for any security parameter \(\lambda\), there exists an \((n,\varepsilon+\mathsf{negl}(\lambda))\)-strongly resilient Bloom filter that uses \(m'=m+\lambda\) bits of memory.
Unsteady representation and computationally unbounded adversaries. The discussion above assumes a steady representation, that is, a deterministic query algorithm \(\mathbf{B}_2\). Some implementations allow \(\mathbf{B}_2\) to modify the internal representation, so querying can also be probabilistic. Further results on unsteady representations may be found in [5], building on the ACD^{8} framework proposed in [6]. Moreover, some results hold even against computationally unbounded adversaries.
Bloom filters and secrecy. Bloom filters, like hash functions, are not designed as encryption schemes. Thus, even adversarial resilient Bloom filters may leak considerable information in unintended ways. As a concrete example, in Bitcoin lightweight SPV clients, which rely on Bloom filters to store users' Bitcoin addresses, an adversary can efficiently extract information about these addresses. See [7] for a discussion of this scenario.
[1] B. M. Maggs and R. K. Sitaraman, “Algorithmic nuggets in content delivery,” ACM SIGCOMM Computer Communication Review, vol. 45, no. 3, pp. 52–66, 2015.
[2] R. Benjamin and E. K. Yasmine, “Attacks on Bitcoin,” 2015.
[3] L. Carter, R. Floyd, J. Gill, G. Markowsky, and M. Wegman, “Exact and approximate membership testers,” in Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, 1978, pp. 59–65.
[4] J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2014.
[5] M. Naor and E. Yogev, “Bloom filters in adversarial environments,” in Annual Cryptology Conference, 2015, pp. 565–584.
[6] M. Naor and G. N. Rothblum, “Learning to impersonate,” in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 649–656.
[7] A. Gervais, S. Capkun, G. O. Karame, and D. Gruber, “On the privacy provisions of Bloom filters in lightweight Bitcoin clients,” in Proceedings of the 30th Annual Computer Security Applications Conference, 2014, pp. 326–335.
We will only consider polynomialtime adversaries here.↩
For this reason, a yes-answer is also referred to as a maybe-answer in some texts.↩
A CDN (Content Delivery Network) is a globally distributed network of proxy servers deployed in multiple data centers, in order to serve cached content to end-users with high availability and high performance.↩
In this work, we will consider only steady representations.↩
The security parameter \(\lambda\) is implied in the initialization algorithm \(\mathbf{B}_1\), thus we denote it as \(\mathbf{B}_1(1^\lambda,S)\).↩
To strengthen our definition, we assume that the adversary is also given the set \(S\); the requirement is that the adversary can hardly find any false positive even given \(S\).↩
A proof of this can be found in [5].↩
Learning model of adaptively changing distributions (ACD).↩
(20 Nov 2016) Correction: P ≠ NP is not sufficient to imply that one-way functions exist. See the section P versus NP problem and one-way functions.
Intro. Starting from November, I’ll summarize my study notes from my wiki into weekly blog posts. I’ve always wanted to keep my study progress on track; I feel it’s hard to convince even myself without visible footprints.
So here we have the first episode. (Hopefully it won’t be the last one)
Asymptotic notation is an important tool in analyzing the time/space efficiency of algorithms.
For decision problems, we now formally define the time complexity classes P and NP, and introduce the hardness of NP-complete problems, which plays an indispensable role in the study of algorithm design and modern cryptography.
Computational hardness assumption P ≠ NP. Although it remains an open problem, many believe that P ≠ NP. Notably, if P ≠ NP holds true, then one-way functions exist.
(I stand corrected: there is no known proof that P ≠ NP implies the existence of one-way functions. Conversely, however, the existence of one-way functions does imply that P ≠ NP. There is an informal argument given by Peter Shor on StackExchange^{1}, rephrased in the section P versus NP problem and one-way functions.)
Later we will cover the notion of security in cryptography, so here is a refresher on basic probability. (Probability is also used considerably in analyzing the behavior of randomized algorithms, hash functions, etc.)
Plan for next week:
Consider the following map: \[f : (x, r) \to s\] where \(x\) is an arbitrary bit string, \(r\) is a string of random bits, and \(s\) is an instance of a \(k\)-SAT problem having \(x\) as a planted solution, while the randomness of \(r\) uniquely determines which \(k\)-SAT instance is chosen.
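As an illustration, here is a toy version of \(f\) for \(k = 3\): given a planted assignment \(x\) and randomness \(r\) (used as a seed), it emits a random 3-CNF that \(x\) satisfies. The helper names are hypothetical, and generating genuinely hard planted distributions is exactly the open issue this section discusses:

```python
import random

def planted_3sat(x, n_clauses, r):
    """f(x, r) -> s: a random 3-CNF instance with x planted as a
    satisfying assignment. A literal is a pair (var, sign): it is
    true under assignment a iff bool(a[var]) == sign."""
    rng = random.Random(r)            # r drives every random choice
    clauses = []
    for _ in range(n_clauses):
        vs = rng.sample(range(len(x)), 3)
        clause = [(v, rng.random() < 0.5) for v in vs]
        i = rng.randrange(3)          # force one literal to agree with
        clause[i] = (clause[i][0], bool(x[clause[i][0]]))  # x: planting
        clauses.append(clause)
    return clauses

def satisfies(a, clauses):
    return all(any(bool(a[v]) == sign for v, sign in c) for c in clauses)
```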
If we can invert the above function \(f\) (in polynomial time), we must already have solved the corresponding \(k\)-SAT instance \(s\) with the planted solution \(x\). \(k\)-SAT is known to be NP-complete, and inverting such a function is as hard as solving a \(k\)-SAT instance with a planted solution; that is, inverting \(f\) can be hard at some points. Clearly, should we have a one-way function, then inverting it is guaranteed to be no easier than inverting \(f\).
So what does it mean if P ≠ NP? We know that \(k\)-SAT is hard to solve in the worst case, so the function \(f\) can be made as hard to invert as solving a worst-case \(k\)-SAT instance. However, we don’t know whether there is a class \(\mathcal{S}\) of \(k\)-SAT instances with planted solutions that are as hard as general \(k\)-SAT instances. If such a class \(\mathcal{S}\) exists, then given any \(s \in \mathcal{S}\), no probabilistic polynomial-time algorithm can recover \(x\) with non-negligible probability, so we can conclude that \(f\) is indeed a one-way function. [1]
Problem 1.1. Does there exist a class \(\mathcal{S}\) of \(k\)-SAT problems with planted solutions, such that every \(L \in \mathcal{S}\) is NP-hard?
Conjecture 1.2. If \(\mathrm{P} \neq \mathrm{NP}\), then oneway functions exist.
On the other hand, assume that \(f\) is a one-way function, so that one-way functions do exist; then \(k\)-SAT is hard to solve (in the worst case) by any polynomial-time algorithm, and thus P ≠ NP. By modus tollens, if P = NP, then no one-way function exists. [2]
Theorem 1.3. If oneway functions exist, then \(\mathrm{P} \neq \mathrm{NP}\).
Proof. (Sketch) Let \(f : \{0,1\}^*\to\{0,1\}^*\) be a one-way function. There is a polynomial-time algorithm \(M_f\) that computes \(y=f(x)\) for all \(x\); thus, there exists a polynomial-size circuit family that computes \(f\).
Since \(f\) is a one-way function, for every probabilistic polynomial-time algorithm \(\mathcal{A}\) there is a negligible function \(\mathsf{negl}\) such that \(\Pr[\mathsf{Invert}_{\mathcal{A},f}(n) = 1] \leq \mathsf{negl}(n)\); hence no \(\mathcal{A}\) can fully compute \(f^{-1}\). But \(\mathcal{A}\) fully computes \(f^{-1}\) if and only if it solves the corresponding CIRCUIT-SAT problems of the circuit in all cases. Thus, there must exist some CIRCUIT-SAT problems that cannot be decided by a polynomial-time algorithm; since CIRCUIT-SAT is in NP, it follows that \(\mathrm{P} \neq \mathrm{NP}\).
■
Remark 1.4. If one can come up with a construction of a one-way function, or a proof that such functions exist, then it holds true that \(\mathrm{P} \neq \mathrm{NP}\).
Books:
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed.
M. Sipser, Introduction to the Theory of Computation, 3rd ed.
J. Katz and Y. Lindell, Introduction to Modern Cryptography, 2nd ed.
Papers:
[1] A. L. Selman, “A survey of oneway functions in complexity theory,” Mathematical Systems Theory, vol. 25, no. 3, pp. 203–221, 1992.
[2] M. Abadi, E. Allender, A. Broder, J. Feigenbaum, and L. A. Hemachandra, “On generating solved instances of computational problems,” in Proceedings on advances in cryptology, 1990, pp. 297–310.