← back home

Simulating Time in Square-Root Space

Below is a slightly modified version of my Final Report from CS254B at Stanford. If you have the chance to take the CS254 series I totally recommend - It's incredibly well taught, and highlights a lot of very beautiful (and recent!) results from complexity theory.

Introduction

Hopcroft, Paul, and Valiant’s result (Hopcroft, Paul, and Valiant 1977) showed that any time-\(t(n)\) Turing machine can be solved in space \(O(\frac{t(n)}{\log t(n)})\). This result was achieved by splitting computation into smaller blocks, only keeping track of block content which was strictly necessary in the computation. But beyond pebbling strategies, HPV remained state of the art for 50 years.

The \(\mathsf{TREE-EVALUATION}\) problem introduced by Cook et al. (S. A. Cook et al. 2010) has been of great interest as a problem potentially separating \(\mathrm{P}\) from \(\mathrm{L}\). The polynomial time algorithm is quite clear, but it is an open question to whether there exists a log-space algorithm. The Problem statement is as follows: Suppose we have a tree of height at most \(h\) with each interior node having between 2 and \(d\) children. Each of these interior node describes a function from the children to the node, more concreteley for an interior node with \(t\) children, the function will take \(tb\) bits to \(b\) bits. We assume every leaf has variables with bit length \(b\).

Cook and Mertz initially showed that \(\mathsf{TREE-EVALUATION}\) can be solved in \(O(\log^2 n / \log \log n)\) space. In their 2024 paper, Cook and Mertz improve this result to \(O(\log n \cdot \log \log n)\). (J. Cook and Mertz 2024) They do this using register programs, saving values as node values are computed recursively.

The tree evaluation problem turns out to be very powerful. With its newly designed space-efficient algorithm, Williams found a reduction to show that any \(t(n)\) Turing machine can be converted to an instance of the tree evaluation problem. Through this reduction, Williams was able to improve on Hopcroft-Paul-Valiant’s result of simulating generic \(t(n)\) time TMs. Formally, Cook and Mertz showed:

Theorem 1 (Cook-Mertz 2024). \(\mathsf{TREE-EVALUATION}\) on trees with bit length \(b\), \(d\) children, and max height \(h\) can be computed in \(O(d \cdot b + h \log (d \cdot b))\) space.

Cook-Mertz’s algorithm uses catalytic space: a small set of polynomial registers is loaded with intermediate computations, where each node value is encoded as a polynomial evaluation. The properties of roots of unity in fields are used to essentially use occupied space by applying invertible operation on the filled registers.

Reducing to square root space simulation

Now we show Williams’s clever reduction that allows simulation in square root space. The main result is as follows:

Theorem 2 (Williams 2025 (Williams 2025)). For every function \(t(n)\) such that \(t(n) \geq n\), \[\begin{aligned} \mathrm{TIME}(t(n)) \subseteq \mathrm{SPACE}(\sqrt{t(n)\log(t(n))})\end{aligned}\] [wil25]

We introduce a few definitions. Williams’s result utilizes the notion of block-respecting Turing Machines introduced by Hopcroft, Paul, and Valiant in 1975:

Definition 3 (Block-respecting TMs (Hopcroft, Paul, and Valiant 1977)). We say that a time-\(t(n)\) Turing machine \(M'\) with block length \(b(n)\) is block-respecting if for all inputs of length \(n\), the computation can be partitioned into \(O(t(n)/b(n))\) time blocks and each tape can be partitioned into \(O(t(n)/b(n))\) contiguous tape blocks of length \(b(n)\) such that each tape head remains within a single tape block for the entire time block.

A key lemma in Hopcroft, Paul, and Valiant’s proof is that any TM can be made block-respecting with constant overhead. We will use this result as well, and given any TM \(M\), we will refer to the block-respecting modification as \(M'\). This new TM has \(p = \ell + 1\) when \(M\) has \(\ell\) tapes, and has \(B = O(t(n) / b(n))\) tape and time blocks.

Computation Graphs

Similar to HPV, we are interested in using computation graphs to explicitly model which blocks are needed at any given time. We modify the computation graph from HPV to fit the \(\mathsf{TREE-EVALUATION}\) problem. We refer to the computation graph as \(G_{M',x}\) for input \(x\). It has the set of nodes \[\begin{aligned} S = \{ (h, i), (h, 0, i) : h \in [p], i \in [B] \}\end{aligned}\] Each \((h, i)\) corresponds to the \(i\)th tape block of tape \(h\), with \((h, 0, i)\) corresponding to the block at time 0, i.e. the initial configuration. Each node has a label \(m_{(h, i)}\) which is given by: \[\begin{aligned} m_{(h, i)} = \begin{cases} 1 & \text{if head $h$ moves one block right after time block $i$} \\ -1 & \text{if head $h$ moves one block left after time block $i$} \\ 0 & \text{head $h$ stays in the same tape block after time block $i$} \end{cases}\end{aligned}\] And the edges between nodes are defined as follows. For a pair \((h', i), (h, j)\), there is a edge from \((h', i)\) to \((h, j)\) if either \(j = i + 1\) or if while \(M'\) is running time block \(j\), the tape block accessed by \(h'\) during time block \(i\) is not accessed again until time block \(j\). This results in a fan-in degree of at most \(2p\).

A naive encoding of the graph where we store edges (\(\log B\) bits each) requires \(O(B \log B)\) bits, resulting from the \(B\) nodes. However, we can exploit the structure of the computation graph to reduce this to \(O(B)\) bits. Let block(\(h, i) \in [B]\) be the index of the tape block of tape \(h\) at time block \(i\). We have that block(\(h, 1)\) is 1 since the tape starts at block 1. For \(i > 1\), we have that \[\begin{aligned} \text{block}(h, i) = 1 + \sum_{j = 1}^{i - 1} m_{(h, j)}\end{aligned}\] That is, with access to \(m_{(h, i)}\) we are able to recover the relation between block numbers. Since the existence of an edge is completely determined by the values of block\((h, i)\) and block\((h', j)\), we can thus recover the computation graph in \(O(B)\) bits from the \(B\) values of block\((h, i)\). That is, We are able to compute this block value in \(O (\log t(n))\) additional space, since we iterate over the \(m_{(h, i)}\) values, and \(block(h, i)\) is at most \(B = O(t(n))\). Since checking if \(j = i + 1\) is also in \(O(\log t(n))\) space, checking the existence of an edge is also in \(O(\log t(n))\) space.

Node values

Now we will consider the value that each node takes.

Definition 4. For \((h, 0, i)\), we say content(\(h, 0, i\)) is blank when \(h > 1\) and is the content on the \(i\)th tape block of the first tape, subset of the input when \(h = 1\). For \((h, i)\), we say content(\(h, i\)) is simply the content of the tape block of tape \(h\) at time block \(i\). With a reasonable encoding, we can encode each content in \(O(b(n))\) bits.

But the key question still remains: how do we actually compute our graph \(G_{M', x}\)? We have no guarantee on the movement of tape heads for arbitrary inputs of length \(n\).

The \(\mathsf{TREE-EVALUATION}\) problem allows us to approach the search through a brute force guess. We assume \(t(n)\) is not explicitly constructable, but this is actually fine. We start with our guess of \(t(n)\) of \(n\). For our choice of \(b(n)\), we have that \(B = O(t(n) / b(n))\). We know that given our \(t(n)\) guess what size the graph must be, so we can construct potential graph encodings \(G'\). For every guess of \(G'\) we check consistency with \(G_{M', x}\) through functions in a \(\mathsf{TREE-EVALUATION}\) instance \(R_{G'}\):

If every possible \(G'\) fails for a given \(t(n)\) guess, then we simply increment by one and then try again. Eventually \(t(n)\) will be correct and the algorithm will terminate when the consistency checks all pass.

Constructing the Tree Evaluation Instance \(R_{G'}\)

Given a potential encoding \(G'\) of the computation graph, we describe how to build the \(\mathsf{TREE-EVALUATION}\) instance \(R_{G'}\), along with the functions we utilize at each non-leaf node. The tree is implicit in that we never fully store \(R_{G'}\) in memory, but are able to traverse it on demand.

The encoding \(G'\) describes a DAG on the node set \(S\). To produce a tree, we unroll \(G'\) by tracing all root-to-leaf paths:

This unrolling is exponential in the worst case: a DAG of height \(h\) and fan-in \(d\) unrolls into a tree of up to \(d^h\) leaves. The resulting tree has up to \(2^{O(t/b)}\) nodes, but this is acceptable since we never store the tree in memory.

Each internal tree-node \(v\) corresponding to \((h, j) \in S\) is labeled with a function \[F_v : \{0, 1\}^{d \cdot b(n)} \;\to\; \{0, 1\}^{b(n)} \cup \{FAIL\}\] that takes the contents of \(v\)’s children (each a \(b(n)\)-bit string) and produces either the new content content\((h, j)\) or the special value FAIL. Concretely, \(F_v\) operates as follows:

  1. Read inputs. The children of \(v\) correspond to in-neighbors of \((h, j)\) in \(G'\). By construction, these provide:

  2. Check input consistency. If any child’s content is FAIL, immediately output FAIL.

  3. Simulate \(b(n)\) steps of \(M'\). Starting from the state and head positions at the end of the previous time block, run \(M'\) for \(b(n)\) steps. This is how we validate the consistency of \(G'\) with \(G_{M', x}\).

  4. Verify head movement matches \(m_{(h, j)}\). During the simulation, track which tape block head \(h\) ends up in. Compare against the prediction encoded in \(G'\): \[\text{actual movement of } h = m_{(h, j)} \;?\] If not, output FAIL.

  5. Output. If all checks pass, output content(h, j): the state of \(M'\) after these \(b(n)\) steps, the new head position of tape \(h\), and the (possibly updated) contents of tape \(h\)’s active block during this time step.

The function \(F_v\) uses \(O(b(n))\) time and \(O(b(n))\) space to compute its output, since simulating \(b(n)\) steps of a multitape TM and managing \(O(1)\) tape blocks each of size \(b(n)\) both fit in this budget.

Leaf functions

At a leaf corresponding to source node \((h', 0, k)\), the function is trivial: output content\((h', 0, k)\).

Leaf functions use \(O(b(n) + \log n)\) space; the \(\log n\) is for indexing into the input tape.

Why FAIL works

The FAIL value is a designated string disjoint from valid \(b(n)\)-bit contents. Since every internal \(F_v\) immediately outputs FAIL whenever any child outputs FAIL (step 2 above), a single inconsistency at any tree-node makes the entire root evaluation output FAIL. Combined with the head-movement check at step 4, this gives the two-case soundness/completeness:

Space complexity

Summarizing, \(R_{G'}\) is a \(\mathsf{TREE-EVALUATION}\) instance with: \[\text{height } h = B + 1 = O(t(n)/b(n)), \quad \text{fan-in } d \leq 2p = O(1), \quad \text{bit-length } b = O(b(n)).\] The Cook–Mertz algorithm from Theorem Theorem 1 evaluates such instances in space \[O\!\left(d \cdot b + h \log(d \cdot b)\right) = O\!\left(b(n) + \frac{t(n)}{b(n)} \log b(n)\right).\] Adding the \(O(t(n)/b(n))\) bits for storing the current \(G'\) encoding and the \(O(\log t(n))\) bits for the outer \(T\) counter, the total space of the simulation is \[S(n) = O\!\left(b(n) + \frac{t(n)}{b(n)} \log b(n)\right).\] Setting \(b(n) = \Theta(\sqrt{t(n) \log t(n)})\) gives us our desired result \[S(n) = O\!\left(\sqrt{t(n) \log t(n)}\right),\] completing the proof of Theorem [wil25].

Conclusion

The \(\mathsf{TREE-EVALUATION}\) problem is a problem which was first raised as a problem to seperate \(P\) from \(L\). Significant progress has been made towards improving the space complexity of \(\mathsf{TREE-EVALUATION}\), but it’s still an open problem if \(\mathsf{TREE-EVALUATION}\) is in \(\mathrm{L}\). In fact,

Corollary 5. If \(\mathsf{TREE-EVALUATION}\) is in \(\mathrm{L}\), then \(\mathrm{TIME}(t(n)) \subseteq \mathrm{SPACE}(\sqrt{t(n)}).\)

Williams notes that if his reduction to the \(\mathsf{TREE-EVALUATION}\) problem was discovered before Cook and Mertz’s 2024 result, then there would have been a barrier adopted within the research community. Hopcroft-Paul-Valiant has been a long-standing result, and Williams believes that the community would have believed that \(\mathsf{TREE-EVALUATION}\) could not be improved in space complexity.

There are some more open problems raised by Williams. Can this proccedure be applied recursively to show that \(\mathrm{TIME}(t) \subseteq \mathrm{SPACE}(t^\varepsilon)\) for all \(\varepsilon > 0\)? This is not obvious from the structure of the \(\mathsf{TREE-EVALUATION}\) problem, as Cook-Mertz requires computing low degree extension. Showing this inclusion would imply \(\mathrm{P}\neq \mathrm{PSPACE}\).

References

Cook, James, and Ian Mertz. 2024. “Tree Evaluation Is in Space o (Log n · Log Log n).” In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 1268–78. STOC 2024. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3618260.3649664.
Cook, Stephen A., Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. 2010. “Pebbles and Branching Programs for Tree Evaluation.” CoRR abs/1005.2642. http://arxiv.org/abs/1005.2642.
Hopcroft, John, Wolfgang Paul, and Leslie Valiant. 1977. “On Time Versus Space.” J. ACM 24 (2): 332–37. https://doi.org/10.1145/322003.322015.
Williams, R. Ryan. 2025. “Simulating Time with Square-Root Space.” https://arxiv.org/abs/2502.17779.