Writing a JSON Parser

May 18, 2020   

Notes on Writing a simple JSON Parser: https://notes.eatonphil.com/writing-a-simple-json-parser.html

  • JSON is pretty easy to parse

Parsing is often broken up into two stages

  • lexical analysis
  • syntactic analysis

Lexical Analysis breaks source input into the simplest decomposable elements (tokens)

Syntactic Analysis, often simply called parsing, receives the list of tokens and tries to find patterns in them.
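To make the two stages concrete, here is a small illustration of what each one might produce for the same input. The token list is written by hand and its `(kind, value)` tuple format is an assumption made for these notes, not something the linked article prescribes. The nested result comes from Python's standard `json` module, used here only to show the shape a parser should end up with.

```python
import json

source = '{"foo": [1, {"bar": 2}]}'

# Stage 1 (lexical analysis): a flat, one-dimensional list of tokens.
# Hand-written for illustration; the (kind, value) shape is an assumption.
tokens = [
    ("punct", "{"), ("string", "foo"), ("punct", ":"),
    ("punct", "["), ("number", 1), ("punct", ","),
    ("punct", "{"), ("string", "bar"), ("punct", ":"), ("number", 2),
    ("punct", "}"), ("punct", "]"), ("punct", "}"),
]

# Stage 2 (syntactic analysis): nested Python objects.
# json.loads stands in for the parser to show the target structure.
nested = json.loads(source)
```

The token list has no nesting at all; the structure only appears once the parser has matched brackets and braces.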

Lexical Analysis

  • input string is broken into tokens
  • comments and whitespace are often discarded
  • A simple lexical analyzer might iterate over all the characters in an input string non-recursively
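The points above can be sketched as a single non-recursive loop over the input. This is a minimal sketch, not the linked article's code: the function name `lex`, the `(kind, value)` token shape, and the constants are assumptions chosen for these notes. It also simplifies strings by not handling escape sequences such as `\"`.

```python
import re

WHITESPACE = " \t\n\r"
PUNCTUATION = "{}[],:"
LITERALS = {"true": True, "false": False, "null": None}
# JSON number grammar: optional minus, integer part, optional fraction/exponent.
NUMBER_RE = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def lex(text):
    """Turn a JSON string into a flat list of (kind, value) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in WHITESPACE:
            i += 1  # whitespace is discarded
        elif ch in PUNCTUATION:
            tokens.append(("punct", ch))
            i += 1
        elif ch == '"':
            # Simplification: no escape handling, so the string ends
            # at the next double quote.
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated string")
            tokens.append(("string", text[i + 1:end]))
            i = end + 1
        else:
            for word, value in LITERALS.items():
                if text.startswith(word, i):
                    tokens.append(("literal", value))
                    i += len(word)
                    break
            else:
                # Not a literal, so it must be a number or an error.
                match = NUMBER_RE.match(text, i)
                if match is None:
                    raise ValueError(f"unexpected character {ch!r} at position {i}")
                raw = match.group()
                is_float = any(c in raw for c in ".eE")
                tokens.append(("number", float(raw) if is_float else int(raw)))
                i = match.end()
    return tokens


example_tokens = lex('{"a": [1, 2.5, true, null]}')
```

Tagging each token with a kind keeps a string containing `{` distinct from the punctuation character `{`, which a plain list of Python values could not do.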

Syntactic Analysis

  • iterate over a one-dimensional list of tokens and match groups of tokens to pieces of the language according to the definition of the language

Call lex to get the tokens, then call the parser on those tokens.

  • A key difference between this lexer and parser is that the lexer returns a one-dimensional array of tokens, whereas the parser is defined recursively. Since JSON is a data format and not a language, this parser returns just the needed data structures instead of a syntax tree.

A JSON parser iterates over the tokens received from a call to lex and tries to match them to objects, lists, or plain values.

Parsers are often defined recursively and return a recursive, tree-like object. Since JSON is a data serialization format and not a language, the parser should produce objects in Python rather than a syntax tree.
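The recursive structure described above can be sketched as three small functions that call each other. This is a sketch under assumptions, not the linked article's implementation: the names `parse`, `parse_array`, `parse_object`, and `peek` are made up for these notes, and tokens are assumed to be `(kind, value)` tuples where punctuation has the kind `"punct"`. The token list in the usage example is written by hand so the block runs on its own.

```python
def peek(tokens, pos):
    """Return tokens[pos], or fail clearly if the input ended early."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    return tokens[pos]


def parse(tokens, pos=0):
    """Parse one JSON value at tokens[pos]; return (value, next_pos)."""
    kind, value = peek(tokens, pos)
    if kind != "punct":
        return value, pos + 1  # plain value: string, number, or literal
    if value == "[":
        return parse_array(tokens, pos + 1)
    if value == "{":
        return parse_object(tokens, pos + 1)
    raise ValueError(f"unexpected token {value!r}")


def parse_array(tokens, pos):
    """Parse array items after the opening '['."""
    items = []
    if peek(tokens, pos) == ("punct", "]"):
        return items, pos + 1
    while True:
        item, pos = parse(tokens, pos)  # recursion handles nesting
        items.append(item)
        if peek(tokens, pos) == ("punct", "]"):
            return items, pos + 1
        if peek(tokens, pos) != ("punct", ","):
            raise ValueError("expected ',' or ']' in array")
        pos += 1


def parse_object(tokens, pos):
    """Parse key/value pairs after the opening '{'."""
    obj = {}
    if peek(tokens, pos) == ("punct", "}"):
        return obj, pos + 1
    while True:
        kind, key = peek(tokens, pos)
        if kind != "string":
            raise ValueError("expected a string key in object")
        if peek(tokens, pos + 1) != ("punct", ":"):
            raise ValueError("expected ':' after object key")
        value, pos = parse(tokens, pos + 2)
        obj[key] = value
        if peek(tokens, pos) == ("punct", "}"):
            return obj, pos + 1
        if peek(tokens, pos) != ("punct", ","):
            raise ValueError("expected ',' or '}' in object")
        pos += 1


# Hand-written tokens for: {"a": [1, {"b": null}], "c": true}
tokens = [
    ("punct", "{"), ("string", "a"), ("punct", ":"),
    ("punct", "["), ("number", 1), ("punct", ","),
    ("punct", "{"), ("string", "b"), ("punct", ":"), ("literal", None),
    ("punct", "}"), ("punct", "]"), ("punct", ","),
    ("string", "c"), ("punct", ":"), ("literal", True), ("punct", "}"),
]
value, next_pos = parse(tokens)
```

Each function returns the position of the next unread token along with the value, so the caller can continue from where the nested call stopped. The result is ordinary Python dicts, lists, and scalars rather than a syntax tree.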