housecarpenter

The difference between key-value containers and functions

I made a proper blog post for the first time in over a year! Thus ends my longest-ever blogging hiatus (at least since 2014 when I properly started blogging; I did make a couple of posts back in 2011 but never got into it).

Here's the link:

Reversing a Linked List in Place

I fixed a bug today which was mildly interesting. It was caused by a local variable in a function having the same name as an unrelated imported module which was referenced earlier in the function. So the code was of this shape:

import x

...

def f():
    x.g()

    ...

    x = a

When g was called, an error occurred with the message "local variable x referenced before assignment". Now when I was debugging this, all I saw at first was the x.g() line (since that's the line the error occurred at), and I scrolled up, trying to find where the x variable had been defined, eventually finding that x was the name of the module. This left me rather confused, since I couldn't imagine how the module name x was getting interpreted as the name of a local variable when it hadn't been reassigned at any point earlier in the code.

Of course, the answer is that the later reassignment was responsible. Python (like most programming languages, I believe) allocates local variables on the call stack, so it has to know what local variables there are as it calls the function, before it executes the function body. So it effectively does an initial pass over the body looking for assignments, and notices the x = a line and rebinds the name x to a local variable, before it executes the x.g() line.

Now I see why C89 requires you to put all variable declarations at the start of a function! OK, I don't know if this was ever the actual rationale behind that restriction, but it does seem like a good reason for it. The C89 restriction makes it clear that local variable names have to be valid across the whole body of a function, not just from the line on which they are declared onwards.

Realized today that you can write the parse function from my earlier post in a much simpler way using iterators and recursion:

Expr = t.Union[str, t.Tuple['Expr']]

def parse(tokens: t.Iterable[str]) -> t.Iterator[Expr]:
    """Parse an iterable of Lisp tokens.

    Unmatched parentheses will be ignored.

    >>> tuple(parse(lex('a (b (c d))')))
    ('a', ('b', ('c', 'd')))
    >>> tuple(parse(lex('((a b) c (d e))')))
    ((('a', 'b'), 'c', ('d', 'e')),)
    >>> tuple(parse(lex('(a (b)')))
    (('a', ('b',)),)
    >>> tuple(parse(lex('a)(')))
    ('a',)
    """
    for token in tokens:
        if token == ')':
            return
        elif token == '(':
            yield tuple(parse(tokens))
        else:
            yield token

Some day, I would like to install a new tool at work without literally every step of the process going wrong somehow.

I'm proud of this pair of little S-expression-parsing functions (nonrecursive!):

from typing import Iterable, Iterator, List, Union
import string

def lex(code: str) -> Iterator[str]:
    """Lexes a string of Lisp code, returning an iterator over its tokens.

    >>> list(lex(''))
    []
    >>> list(lex('()ab cde'))
    ['(', ')', 'ab', 'cde']
    >>> list(lex('ab ( (b ))'))
    ['ab', '(', '(', 'b', ')', ')']
    """
    token: List[str] = []

    for c in code:
        if c == '(':
            if token:
                yield ''.join(token)
                token.clear()
            yield c
        elif c == ')':
            if token:
                yield ''.join(token)
                token.clear()
            yield c
        elif c in string.whitespace:
            if token:
                yield ''.join(token)
                token.clear()
        else:
            token.append(c)

    if token:
        yield ''.join(token)

class UnmatchedParenthesis(Exception):
    pass

Expression = Union[str, List['Expression']]

def parse(tokens: Iterable[str]) -> List[Expression]:
    """Parses an iterable of Lisp tokens, returning an abstract syntax tree.

    >>> parse(lex(''))
    []
    >>> parse(lex('a ((b) (c d) (e)) f'))
    ['a', [['b'], ['c', 'd'], ['e']], 'f']
    >>> parse(lex('(a)'))
    [['a']]
    >>> parse(lex('('))
    Traceback (most recent call last):
        ...
    UnmatchedParenthesis: 1 unmatched opening parenthesis
    >>> parse(lex('(((a)'))
    Traceback (most recent call last):
        ...
    UnmatchedParenthesis: 2 unmatched opening parentheses
    >>> parse(lex(')'))
    Traceback (most recent call last):
        ...
    UnmatchedParenthesis: unmatched closing parenthesis
    """
    tree: List[Expression] = []
    parent_nodes: List[List[Expression]] = [tree]

    for token in tokens:
        if token == '(':
            subtree: List[Expression] = []
            parent_nodes[-1].append(subtree)
            parent_nodes.append(subtree)
        elif token == ')':
            if len(parent_nodes) <= 1:
                raise UnmatchedParenthesis('unmatched closing parenthesis')
            parent_nodes.pop()
        else:
            parent_nodes[-1].append(token)
 
    if len(parent_nodes) > 1:
        unmatched_count: int = len(parent_nodes) - 1
        ending: str = (
            'is' if unmatched_count == 1
            else 'es'
        )
        raise UnmatchedParenthesis(
            f'{unmatched_count} unmatched opening parenthes{ending}'
        )

    return tree

if __name__ == '__main__':
    import doctest
    doctest.testmod()

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Entries tagged with programming

(no subject)

Reversing a linked list in place

(no subject)

(no subject)

(no subject)

S-expression parsing code snippet

Profile

April 2019

Syndicate

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags