forked from blynn/compiler
-
-
Notifications
You must be signed in to change notification settings - Fork 0
/
type.lhs
275 lines (207 loc) · 11 KB
/
type.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
= Types and Typeclasses =
Types enable humans and computers to reason about our programs. The halting
problem is not a problem, because types can prove code must terminate. They can
do even more and prove a program terminates with the right answer: for example,
we can prove a given function correctly sorts a list.
Types can automate programming, namely,
https://www.youtube.com/watch?reload=9&v=mOtKD7ml0NU[a human supplies the type
of a desired function and the computer programs itself].
Types can be lightweight. Indeed, they can be invisible. A compiler can
use 'type inference' to type-check a program completely free of any type
annotations. However, it's good to throw a few type annotations in the source,
as they are a form of documentation that is especially reliable because the
compiler ensures they stay in sync with the code.
Therefore, we ought to add types to our language. We mimic Haskell, with at
least one deliberate difference:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf[let is not generalized]. We only generalize top-level definitions.
== Typically ==
We shamelessly lift code from https://web.cecs.pdx.edu/~mpj/thih/thih.pdf[Mark
P. Jones, 'Typing Haskell in Haskell']. Our version is simpler because we lack
support for mutual recursion and pattern matching.
Since we're using the Scott encoding, from a data type declaration:
------------------------------------------------------------------------
data Adt a b c = Foo a | Bar | Baz b c
------------------------------------------------------------------------
we generate types for the data constructors:
------------------------------------------------------------------------
("Foo", a -> Adt a b c)
("Bar", Adt a b c)
("Baz", b -> c -> Adt a b c)
------------------------------------------------------------------------
Along with:
------------------------------------------------------------------------
("|Foo|Bar|Baz", Adt a b c -> (a -> x) -> x -> (b -> c -> x) -> x)
------------------------------------------------------------------------
which represents the type of `case` in:
------------------------------------------------------------------------
case x of
Foo a -> f a
Bar -> g
Baz b c -> h b c
------------------------------------------------------------------------
The `case` keyword is replaced with the identity combinator
during compilation:
------------------------------------------------------------------------
I x (\a -> f a) (\ -> g) (\b c -> h b c)
------------------------------------------------------------------------
Our type checker is missing several features, such as kind checking and
rejecting duplicate definitions.
++++++++++
<script>
function hideshow(s) {
var x = document.getElementById(s);
if (x.style.display === "none") {
x.style.display = "block";
} else {
x.style.display = "none";
}
}
</script>
<p><a onclick='hideshow("typically");'>▶ Toggle Source</a></p>
<div id='typically' style='display:none'>
++++++++++
------------------------------------------------------------------------
include::typically.hs[]
------------------------------------------------------------------------
++++++++++
</div>
++++++++++
== Classy ==
In the worst case, types are a burden, and force us to wrestle with the
compiler. We twist our code this way and that, and add eye-watering type
annotations until it finally compiles.
In contrast, well-designed types do more with less. Haskell's type system not
only enables easy type inference, but also enables 'typeclasses', a syntax
sugar for principled overloading. By bestowing Prolog-like powers to the type
checker, the compiler can predictably generate tedious code so humans can
ignore irrelevant details.
Again, 'Typing Haskell in Haskell' provides some background. Since we're
generating code as well as checking types, we also need techniques described in
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.3952&rep=rep1&type=pdf[John
Peterson and Mark Jones, 'Implementing Type Classes'].
We choose the dictionary approach. A dictionary is a record of functions that
is implicitly passed around. For example, if we infer the function `foo` has
type:
------------------------------------------------------------------------
foo :: Eq a => Show a => a -> String
------------------------------------------------------------------------
then we may imagine our compiler turning fat arrows into thin arrows:
------------------------------------------------------------------------
foo :: Eq a -> Show a -> a -> String
------------------------------------------------------------------------
Our compiler then seeks dictionaries that fit the two new arguments of types
`Eq a` and `Show a`, and inserts them into the syntax tree.
With this in mind, we modify the type inference functions so they return a
syntax tree along with its type. Most of the time, they just return the input
syntax tree unchanged, but if type constraints are inferred, then we create a
`Proof` node for each constraint, and apply the syntax tree to these new nodes.
In our example, if `t` is the syntax tree of `foo`, then our type inference
function would change it to `A (A t (Proof "Eq a")) (Proof "Show a")`.
Here, we're using strings to represent constraints for legibiity; in reality,
we have a dedicated data type to hold constraints, though later on, we
do in fact turn them into strings when generating variable names.
We call such a node a `Proof` because it's a cute short word, and we think of a
dictionary as proof that a certain constraint is satisfied. Peterson and Jones
instead write "Placeholder".
Typeclass methods are included in the above. For example, while processing
the expression:
------------------------------------------------------------------------
(==) (2+2) 5
------------------------------------------------------------------------
we infer that `(==)` has type `Eq a => a -> a -> Bool`, so we modify the
syntax tree to:
------------------------------------------------------------------------
(select-==) (Proof "Eq a") (2+2) 5
------------------------------------------------------------------------
After type unification, we learn `a` is `Int`:
------------------------------------------------------------------------
(select-==) (Proof "Eq Int") (2+2) 5
------------------------------------------------------------------------
The next phase constructs records of functions to be used as proofs. We
loosely follow 'Typing Haskell in Haskell' once more, and search for instances
that match a given constraint. A matching instance may create more constraints.
We walk through how our compiler finds a proof for:
------------------------------------------------------------------------
Proof "Eq [[Int]]"
------------------------------------------------------------------------
Our compiler finds an instance match: `Eq a => Eq [a]`, so it rewrites the
above as:
------------------------------------------------------------------------
(V "Eq [a]") (Proof "Eq [Int]")
------------------------------------------------------------------------
The "Eq [a]" string is taken verbatim from an instance declaration, while the
"Eq [Int]" is the result of a type substitution found during unification
on "Eq a".
Our compiler recursively seeks an instance match for the new `Proof`. Again it
finds `Eq a => Eq [a]`, so the next rewrite yields:
------------------------------------------------------------------------
(V "Eq [a]") ((V "Eq [a]") (Proof "Eq Int"))
------------------------------------------------------------------------
and again it recursively looks for an instance match. It finds the `Eq Int`
instance, and we have:
------------------------------------------------------------------------
(V "Eq [a]") ((V "Eq [a]") (V "Eq Int"))
------------------------------------------------------------------------
This works, because our compiler has previously processed all class and
instance declarations, and has prepared the symbol table to map "Eq Int"
to a record of functions for integer equality testing, and "Eq [a]" to a
function that takes a "Eq a" and returns a record of functions for equality
testing on lists of type `a`.
Overloading complicates our handling of recursion. For example,
each occurrence of `(==)` in:
------------------------------------------------------------------------
instance Eq (Int, String) where (x, y) == (z, w) = x == z && y == w
------------------------------------------------------------------------
refers to a distinct function, so introducing the `Y` combinator here is
incorrect. We should only look for recursion after type inference expands
them to `select-== Dict-Eq-Int` and `select-== Dict-Eq-String`, and we
look for recursion at the level of dictionaries.
Among the many deficiencies in our compiler: we lack support for class
contexts; our code allows instances to stomp over one another; our algorithm
for finding proofs may not terminate.
Without garbage collection, this compiler requires unreasonable amounts of
memory.
++++++++++
<p><a onclick='hideshow("classy");'>▶ Toggle Source</a></p>
<div id='classy' style='display:none'>
++++++++++
------------------------------------------------------------------------
include::classy.hs[]
------------------------------------------------------------------------
++++++++++
</div>
++++++++++
== Barely ==
Our compilers are becoming noticeably slower. The main culprit is the type
unification algorithm we copied from the paper. It is elegant and simple, but
also grossly inefficient. The pain is exacerbated by long string constants,
which expand to a chain of cons calls before type checking.
Fortunately, we can easily defer expansion so it takes place after type
checking. This alleviates enough of the suffering that we'll leave improving
unification for another time.
This change is a good opportunity to tidy up. In particular, we eliminate the
murky handling of the `R` data constructor: before type checking, it
represented integer constants, while after type checking, its field was to be
passed verbatim to the next phase. Now, data types clear up the picture.
We immediately take advantage of the neater code and add a few rewrite rules
to cut down the number of combinators.
To make this incarnation of our compiler more substantial, we knuckle down
and implement a `Map` based on
https://pdfs.semanticscholar.org/0284/0ad5ab57dd13e388800094e52a3069df23cd.pdf[Adams
trees]. Again, we mostly copy code from a paper. The running time improves
slightly after replacing the association list used in the last compilation
step with `Map`.
We choose the BB-2.5 variant, based on the benchmarks in the paper, though
it is troubling that `Data.Map` chose BB-3 trees.
We also move the assembler from C to Haskell, that is, our compiler now outputs
bare machine code rather than ION assembly.
++++++++++
<p><a onclick='hideshow("barely");'>▶ Toggle Source</a></p>
<div id='barely' style='display:none'>
++++++++++
------------------------------------------------------------------------
include::barely.hs[]
------------------------------------------------------------------------
++++++++++
</div>
++++++++++