A place to be (re)educated in Newspeak

Wednesday, May 12, 2010

Patterns of Dynamic Typechecking

Dynamic type tests are ubiquitous in programming. Regardless of whether the programming language is statically or dynamically typed, object-oriented or functional, the need to establish what sort of value you are dealing with at run time is always there.

In object oriented languages constructs like Java’s instanceOf (or C# typeOf, or Smalltalk isKindOf:) serve this need - as do casts.

Some functional languages put a large emphasis on being “completely statically typed”. In fact, these languages rely heavily on dynamic typechecking in the guise of pattern matching. Somewhere in the middle lie constructs like typecase.

In most languages, constructs for dynamic typechecking undermine data abstraction. They allow one to test for the implementation type of a value. Checking whether an object is an instance of a particular class, or pattern matching against a specific datatype constructor suffer from this problem. In contrast, testing whether a value supports an interface is harmless. Dynamically typed languages don’t support that however.

An old trick is to add methods that support these kind of type queries. Where you might, in Java, define a class A that implements an interface T, you would simply define a method isT on A, that returns true. The problem is that you then need to define a version of isT that returns false for all other types that you might possibly encounter. To be safe, you’d add isT to Object. This is monkey patching of the highest order, and readers of this blog know my view “People don’t let Primitive Primates Program”.

However, there is an out. Define the default version of doesNotUnderstand: to check if the method name is of the form isX, where X has the form of a type identifier. If so, return false. Now all you need to do is define isT for the classes that actually support the T interface. Credit to Peter Ahe for proposing this. It’s in the current Newspeak implementation.

Pattern matching is much more general than isT however; using isT is equivalent to a simple nullary pattern - except for this annoying issue with data abstraction.

Recently, we’ve seen approaches that overcome the data abstraction problem: Scala and F# introduce pattern matching mechanism that allow you to preserve data abstraction. It was Scala’s influence, combined with demand from users at Cadence, that prompted me to examine the idea of supporting pattern matching in Newspeak.

One can’t just go graft a pattern matching construct on to Newspeak however. The only operations allowed in Newspeak are message sends. If we can’t express the matching constructs via messages, they cannot go into the language. This led me to the idea of pattern literals: a special syntax for patterns, that denotes a pattern object. We already have special syntax for number objects, string objects, closure objects etc., so adding pattern objects fits in quite naturally. The nice thing about such an approach is that patterns are first class values. Patterns can be abstracted over in methods, stored in slots and data structures, serialized to disk or what have you. You can’t do that with a pattern in ML or Haskell.

So what exactly would pattern matching in Newspeak look like? Well, Felix Geller has recently completed his masters thesis at HPI Potsdam on just that question. Felix has implemented an experimental language extension supporting pattern literals and integrated it into the Newspeak IDE. The new language features should be available in the next Newspeak refresh (coming to a computer near you, summer 2010).

In the new extension (NS3) patterns are matched against objects via a protocol very similar to Scala’s extractors, thereby preserving data abstraction. However, there is no special construct for pattern matching. Instead, complex patterns can be composed out simpler ones using combinators.

Of course, a quick scan of the literature shows that the functional community got there first. Mark Tullsen proposed first class patterns with combinators for Haskell a decade ago, and there are several other papers dealing with the idea. However, having a proposal does not guarantee that a feature is in fact included in a language; in practice, none of the popular functional languages supports first class patterns.

First class patterns combined with data abstraction are a potent combination. We mean to use them as first class queries that can be sent to objects of various sorts - ordinary in-memory collections, or databases, or logic programs. This is reminiscent of LINQ, except that the queries are first class so they can be abstracted over. You can write code that takes queries as parameters, accepts queries as input from the user, serializes and deserializes queries to persistent storage (allowing you to do meta-queries) and so on. And the beauty of a language like Newspeak is that all this requires but a fraction of the machinery you need in a mainstream setting.

Felix’s thesis will soon be available as a technical report. I am reluctant to steal his thunder before that, but once the report is out, I'll put out a follow up post that illustrates how this works and why it is neat.

17 comments:

Eric said...

First-class pattern matching is indeed very potent and I think it's worth adding a link to Barry Jay's pattern calculus page here: http://www-staff.it.uts.edu.au/~cbj/patterns

chaetal said...

Functional Pattern Matching for Squeak http://map.squeak.org/package/3772b420-ba02-4fbd-ae30-8eadfc323b7b

Paul said...

Minor nitpick: in C#, the 'is' keyword is the closest equivalent of instanceOf. The typeof keyword resolves a type name into a System.Type instance and is often used as part of a runtime check, but in those cases it is accompanied by a call to .GetType(), which is the actual mechanism for determining a run-time type.

Neal Gafter said...

LINQ can abstract over queries, too, using expression trees.

Michael Greenberg said...

F#'s active patterns are a form of programmable pattern, which are as first-class as functions are in that language.

Basically, a total active pattern is a function that returns an element of a defined-on-the-fly disjoint union (a polymorphic variant); a partial active pattern is a function that returns an option. Don Syme had a paper in ICFP07.

Gilad Bracha said...

Neal: My point is that the conversion from closures to expression trees is done statically (by the compiler), the literal queries cannot be abstracted over. Yes, you can build a tree by hand etc. but that's not the same.

Thanks to everyone else for the pointers. We'll need those when we publish, and it will be interesting to compare.

思翰 said...

很棒的分享~如有打擾之處,敬請原諒!

Alex Cruise said...

Would you consider Scala's composable PartialFunctions to constitute "first-class pattern matching", or something resembling it?

scala> val div: PartialFunction[(Int,Int),Int] = {
| case (x,y) if y != 0 => x / y
| }
div: PartialFunction[(Int, Int),Int] =

scala> div.isDefinedAt((1,0))
res3: Boolean = false

scala> div.isDefinedAt((1,1))
res4: Boolean = true

scala> div(1,1)
res5: Int = 1

scala> div(1,0)
scala.MatchError: (1,0)
at $anonfun$1.apply(:5)
...

scala> val prettify: PartialFunction[(Int,Int),String] = {
| case (_,y) if y == 0 => "Sorry, you can't do that"
| }
prettify: PartialFunction[(Int, Int),String] =

scala> div orElse prettify
res7: PartialFunction[(Int, Int),Any] =

scala> res7(1,1)
res8: Any = 1

scala> res7(1,0)
res9: Any = Sorry, you can't do that

Gilad Bracha said...

Alex,

If you squint hard enough, you can see a resemblance. But I'd still say no, unless you can actually use a parameter or a variable directly anywhere you write a literal pattern (which you can do in F#, AFIK).

When I visited EPFL a couple of weeks ago, I brought up the topic of first class patterns briefly, but we were too short of time to really compare :-(

Alex Cruise said...
This comment has been removed by the author.
Alex Cruise said...

Thanks Gilad!

Oops, in my deleted comment I claimed that Scala extractors had no nominal type, and yet could be passed as parameters and stored as variables. It's apparent to me in retrospect that, due to the lack of a nominal type, they can't be used in a method that has received them as a parameter, and they can only be used as a variable if the variable has the precise type of the extractor.

Ryan said...

Smalltalk-71 had some notions of pattern matching, but I don't think a functioning implementation was ever produced:

http://gagne.homedns.org/~tgagne/contrib/EarlyHistoryST.html

Petar Tomicic said...

I was intrigued with similarity of virtual machine's msil, bytecode and assembler execution. My thoughts changed. Java reflection caouses, ofcourse, assembler's dreams about similar. Gilad Bracha and David Ungar: Mirrors: Design Principles for Meta-level Facilities of Object-Oriented Programming Languages. How mirror programming would look in MASM? Sometimes I understand programming from a different perspective, like I am not a beginner. Tomcat's source codes, or from Glassfish, JVM and compiler I hope too soon I will be able to read and learn every msil, assembler and bytecode cold. Old intention's video games designs dreams found a sw design and architecture. Abstract relations of virtual machines I want to study while they execute. Like with reversed engineering is with what with I would like to create with a programming language with a new reflection language with. Change of the programming's language ideas is making a fast chip and lots of memory found in a different conditions of a small quantities of resources. With the people joining the edge of the science, on this way, I hope that I'm. I would like to describe in which way the computer science will grow. Some people say that the difference between Java and Java Script is that one is compiled and other interpreted. They learned that on university a few years ago. I am fascinated that the assembler is being interpreted like some new script language as well, like Java Script, and example is the virtual machine of NTs. I would describe that with a language from the book that someone once told me: "You can't dance about the architecture", but Harry and Sally doesn't think so. Harry and Sally is one building designed by the same architects that designed some parts of MIT. The speed of chip and largness of memory offers "programming about programming" so it is possible becaouse of that to to find programming of some imagionary programming language with help of the scientific methods. I appologise if You find this uninteresting. Also, towards the SQL i "have feelings" becouse I am thinking about how in the data base to fit a big ammount of informations about the design patterns used inside, for example, some open source project, if it is easy to make analytic recognition of functional use of those design patterns inside the source code and inbetween (self)relations, to which it may be easy to come with steps, and elaborate and reap. When if "the mirror" of inter-pattern's relations which were used would be metamorphosable, with it would be created an interface of laboratory for experimenting with such a "changeable mirror programming language". Further, it would be interesting then, to observe execution of that same already elaborated program from inside (O.S.) of the frame of such a "spectator" that I am describing, with help, of conditionally called "sondes" putted in the O.S. for detection of dynamics as for example is apperiance and disapperiance of events "independent" from their couses, watching the rythm and goodly known consequences.

Barend Venter said...
This comment has been removed by the author.
Barend Venter said...

What about existential quantification in glasgow haskell? For example, with the data type:

data Bar = One | forall s. (Foo s) => Two s

If you try to pattern match over Bar, and expose a Two, the implementation type of the value of Two is still hidden; all you know is that it is a member of the Foo class.

Does this preserve the level of abstraction you want?

(First version had a typo).

Gilad Bracha said...

Barend,

a. Existentials aren't part of standard Haskell AFIK.
b. Consequently, I am not sure that I know enough details to give a definitive answer. But
c. That won't stop me from trying.

There are two issues here - data abstraction and abstracting over patterns themselves. I don't see how the existentials do much for the latter. For the former, this is certainly helpful. However, type classes and interfaces are not really interchangeable (see William Cook's soon-to-be-classic essay in OOPSLA 2010). Each has its own strengths and weaknesses. If I have to take one to a desert island, it would be interfaces.

Sandro Magi said...

In fact, these languages rely heavily on dynamic typechecking in the guise of pattern matching.

Pattern matching on sum types is to type checking as method dispatch is to pattern matching on sum types: each can encode the other because they share similar structure, but they are not "the same".

Pattern matching on sum types matches on values, not types, and thus closer to if-statements than it is to type checking.