housecarpenter: A drawing of a woodlouse. (woodlouse)
I made a proper blog post for the first time in over a year! Thus ends my longest-ever blogging hiatus (at least since 2014 when I properly started blogging; I did make a couple of posts back in 2011 but never got into it).

Here's the link:

Reversing a Linked List in Place

housecarpenter: A drawing of Bob Dylan singing into a microphone with white make-up on his face. (Default)
I fixed a bug today which was mildly interesting. It was caused by a local variable in a function having the same name as an unrelated imported module which was referenced earlier in the function. So the code was of this shape:

import x

...

def f():
    x.g()

    ...

    x = a


When g was called, an error occurred with the message "local variable x referenced before assignment". Now when I was debugging this, all I saw at first was the x.g() line (since that's the line the error occurred at), and I scrolled up, trying to find where the x variable had been defined, eventually finding that x was the name of the module. This left me rather confused, since I couldn't imagine how the module name x was getting interpreted as the name of a local variable when it hadn't been reassigned at any point earlier in the code.

Of course, the answer is that the later reassignment was responsible. Python (like most programming languages, I believe) allocates local variables on the call stack, so it has to know what local variables there are as it calls the function, before it executes the function body. So it effectively does an initial pass over the body looking for assignments, and notices the x = a line and rebinds the name x to a local variable, before it executes the x.g() line.

Now I see why C89 requires you to put all variable declarations at the start of a function! OK, I don't know if this was ever the actual rationale behind that restriction, but it does seem like a good reason for it. The C89 restriction makes it clear that local variable names have to be valid across the whole body of a function, not just from the line on which they are declared onwards.
housecarpenter: A drawing of Bob Dylan singing into a microphone with white make-up on his face. (Default)
Realized today that you can write the parse function from my earlier post in a much simpler way using iterators and recursion:

Expr = t.Union[str, t.Tuple['Expr']]

def parse(tokens: t.Iterable[str]) -> t.Iterator[Expr]:
    """Parse an iterable of Lisp tokens.

    Unmatched parentheses will be ignored.

    >>> tuple(parse(lex('a (b (c d))')))
    ('a', ('b', ('c', 'd')))
    >>> tuple(parse(lex('((a b) c (d e))')))
    ((('a', 'b'), 'c', ('d', 'e')),)
    >>> tuple(parse(lex('(a (b)')))
    (('a', ('b',)),)
    >>> tuple(parse(lex('a)(')))
    ('a',)
    """
    for token in tokens:
        if token == ')':
            return
        elif token == '(':
            yield tuple(parse(tokens))
        else:
            yield token
housecarpenter: A drawing of Bob Dylan singing into a microphone with white make-up on his face. (Default)
Some day, I would like to install a new tool at work without literally every step of the process going wrong somehow.
housecarpenter: A drawing of Bob Dylan singing into a microphone with white make-up on his face. (Default)
I'm proud of this pair of little S-expression-parsing functions (nonrecursive!):

from typing import Iterable, Iterator, List, Union
import string

def lex(code: str) -> Iterator[str]:
    """Lexes a string of Lisp code, returning an iterator over its tokens.

    >>> list(lex(''))
    []
    >>> list(lex('()ab cde'))
    ['(', ')', 'ab', 'cde']
    >>> list(lex('ab ( (b ))'))
    ['ab', '(', '(', 'b', ')', ')']
    """
    token: List[str] = []

    for c in code:
        if c == '(':
            if token:
                yield ''.join(token)
                token.clear()
            yield c
        elif c == ')':
            if token:
                yield ''.join(token)
                token.clear()
            yield c
        elif c in string.whitespace:
            if token:
                yield ''.join(token)
                token.clear()
        else:
            token.append(c)

    if token:
        yield ''.join(token)

class UnmatchedParenthesis(Exception):
    pass

Expression = Union[str, List['Expression']]

def parse(tokens: Iterable[str]) -> List[Expression]:
    """Parses an iterable of Lisp tokens, returning an abstract syntax tree.

    >>> parse(lex(''))
    []
    >>> parse(lex('a ((b) (c d) (e)) f'))
    ['a', [['b'], ['c', 'd'], ['e']], 'f']
    >>> parse(lex('(a)'))
    [['a']]
    >>> parse(lex('('))
    Traceback (most recent call last):
        ...
    UnmatchedParenthesis: 1 unmatched opening parenthesis
    >>> parse(lex('(((a)'))
    Traceback (most recent call last):
        ...
    UnmatchedParenthesis: 2 unmatched opening parentheses
    >>> parse(lex(')'))
    Traceback (most recent call last):
        ...
    UnmatchedParenthesis: unmatched closing parenthesis
    """
    tree: List[Expression] = []
    parent_nodes: List[List[Expression]] = [tree]

    for token in tokens:
        if token == '(':
            subtree: List[Expression] = []
            parent_nodes[-1].append(subtree)
            parent_nodes.append(subtree)
        elif token == ')':
            if len(parent_nodes) <= 1:
                raise UnmatchedParenthesis('unmatched closing parenthesis')
            parent_nodes.pop()
        else:
            parent_nodes[-1].append(token)
 
    if len(parent_nodes) > 1:
        unmatched_count: int = len(parent_nodes) - 1
        ending: str = (
            'is' if unmatched_count == 1
            else 'es'
        )
        raise UnmatchedParenthesis(
            f'{unmatched_count} unmatched opening parenthes{ending}'
        )

    return tree

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Profile

housecarpenter: A drawing of Bob Dylan singing into a microphone with white make-up on his face. (Default)
housecarpenter

April 2019

S M T W T F S
 123456
78910111213
14151617181920
21222324252627
2829 30    

Syndicate

RSS Atom

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Dec. 15th, 2025 02:24 am
Powered by Dreamwidth Studios