The sophia language interpreter is build around my previous experience building a markdown to html compiler. I made use of the visitor design pattern in this implementation and made use of it in the sophia interpreter. The sophia interpreter is inspired by my compiler design university course.
The interpretation is split up in the following three abstract steps:
Step | Description |
---|---|
Lexical analyisis | Converts the character input to a stream of token with meanings |
Syntactical analyisis | Converts the tokenstream to an abstract syntax tree, includes semantic analyisis |
Evaluation | Walks the tree and evaluates each node |
The following chapters will explain each step with an example.
The first step for interpreting a given sophia expression is to scan the characters for token. For the sake of simplicity we will use the most basic expression i can think of:
(println "hello world")
The above statement can be roughly translated into the following list of token:
Token type | Line index | Line | Raw value |
---|---|---|---|
LEFT_BRACE |
0 | 0 | ( |
IDENT |
1 | 0 | println |
STRING |
7 | 0 | hello world |
RIGHT_BRACE |
18 | 0 | ) |
The lexer specifically distinguishes between strings, operators, keywords, booleans, identifiers and floating point integers.
This list is passed to the next interpretation step - the abstract syntax tree creation.
The parser checks if the expression starts with a token of type LEFT_BRACE
,
if it does the parser proceeds to check for an operator or a keyword, such as
put
or +-*/%
or if
. If the token matches the parser checks if the
following token are valid arguments for the previously matched operator. If
so the last step for an expression to be valid is the RIGHT_BRACE
token.
With our example:
LEFT_BRACE -> valid expression start, proceed
println -> valid function, proceed
"hello world" -> valid argument for operator, proceed
RIGHT_BRACE -> valid expression end, return ast
The resulting abstract syntax tree of our before defined expression expressed using JSON would look like:
{
"println": {
"children": [
{
"string": {
"value": "hello world"
}
}
]
}
}
As said before the evaluation step is realised using the visitor pattern, which enables the evaluation of each node. Consider the following expression definition for a sophia string:
// core/expr/str.go
package expr
import "sophia/core/token"
type String struct {
Token token.Token
}
func (s *String) GetToken() token.Token {
return s.Token
}
func (s *String) Eval() any {
return s.Token.Raw
}
By attaching the GetToken
and the Eval
methods to the String
structure,
the Node
structure is implemented:
// core/expr/node.go
package expr
import "sophia/core/token"
type Node interface {
GetToken() token.Token
Eval() any
}
Therefore a string can be accepted as a value for all other expressions which
accept structures of type Node
as values for their attributes.
Some Nodes accept children of all types, other operations, such as arithmetics
should and can not accept arguments of all types. To fix this the expr
package contains the expr.castPanicIfNotType
function, which does exactly
what its name implies - it panics if the given argument is not of the generic
type specified. It’s used extensively in functions such as expr.Add.Eval()
:
// core/expr/add.go
package expr
import "sophia/core/token"
type Add struct {
Token token.Token
Children []Node
}
func (a *Add) GetToken() token.Token {
return a.Token
}
func (a *Add) Eval() any {
if len(a.Children) == 0 {
return 0.0
}
res := 0.0
for i, c := range a.Children {
if i == 0 {
res = castPanicIfNotType[float64](c.Eval(), a.Token)
} else {
res += castPanicIfNotType[float64](c.Eval(), a.Token)
}
}
return res
}
The first children is the starting point to add all other children onto. The
expr.castPanicIfNotType
function is used to check if the return value of
evaluating an expression can be added onto an other value. The second argument
is used for indicating the operation which caused the runtime panic in the
error message.
Error handling is a big topic for the developer experience. The error handling of the three different interpreter steps is different, therefore each stage has its own following chapter.
If the lexer is invoked with an empty input it exits and returns the following message:
$ echo " " | sophia
error: Unexpected end of file
at: /home/teo/stdin:1:1:
1|
| ^
Source possibly empty
Syntax errors found, skipping remaining interpreter stages. (parsing and evaluation)
If the lexer encounters an unknown character it tries to scan the resulting input to provide the user with all possible errors:
$ sophia -exp '(= ? "unclosed string )'
error: Unknown character
at: cli:1:2:
1| (= ? "unclosed string )
| ^
Unexpected "="
error: Unknown character
at: cli:1:4:
1| (= ? "unclosed string )
| ^
Unexpected "?"
error: Unterminated string
at: cli:1:6:
1| (= ? "unclosed string )
| ^^^^^^^^^^^^^^^^^^
Consider closing the string via "
Syntax errors found, skipping remaining interpreter stages. (parsing and evaluation)
Providing the lexer with input containing more than one line, the lexer shows context for the currently found error:
$ cat err.phia
;; this input contains two errors
(println " ?)
(? "test")
$ sophia err.phia
error: Unterminated string
at: /home/teo/err.phia:2:7:
1| ;; this input contains two errors
2| (println " ?)
| ^^^^
3| (? "test")
4|
Consider closing the string via "
error: Unknown character
at: /home/teo/err.phia:3:2:
1| ;; this input contains two errors
2| (println " ?)
3| (? "test")
| ^
4|
Unexpected "?"
Syntax errors found, skipping remaining interpreter stages. (parsing and evaluation)
Before creating the ast the interpreter checks if the expression matches our
languages grammar, for instance in the aforementioned expression we could’ve
put a second keyword after put
, which would not match our language grammar:
(println put)
$ sophia -exp '(println put)'
error: Undefined variable
at: cli:1:10:
1| (println put)
| ^^^
Variable "put" is not defined.
Trying to define a function without an identifier as its name results in the following error:
$ sophia -exp '(fun (_) (println))'
error: Type error
at: cli:1:7:
1| (fun (_) (println))
| ^
Expected the first argument for function definition to be of type IDENT, got "PARAM".
Semantic errors found, skipping remaining interpreter stages. (evaluation)
Omitting the parameters out, results in:
$ sophia -exp '(fun square (* n n))'
error: Type error
at: cli:1:14:
1| (fun square (* n n))
| ^
Expected the second argument for function definition to be of type PARAM, got "MUL".
Semantic errors found, skipping remaining interpreter stages. (evaluation)
Using an undefined variable or function:
$ sophia -exp '(println a)'
rror: Undefined variable
at: cli:1:6:
1| (println a)
| ^
Variable "a" is not defined.
$ sophia -exp '(square 12)'
error: Undefined function
at: cli:1:2:
1| (square 12)
| ^^^^^^
Function "square" not defined
Iterating of anything which is not a container:
$ sophia -exp '(for (_ i) "test" (println "iteration" i))'
error: Invalid iterator
at: cli:1:13:
1| (for (_ i) "test" (println "iteration" i))
| ^^^^
expected container or upper bound for iteration, got: string
Trying to use a string in addition:
$ sophia -exp '(+ "test" 1)'
error: Type error
at: cli:1:2:
1| (+ "test" 1)
| ^
Incompatiable types string and float64
Sophia contains a massive amount of error handling and edge cases, the above are the most prominent cases.