It seems appropriate in this, my last installment in the (take...) interview series, to go back to the beginning and catch up with one of my early interviewees. Close to two years ago I interviewed a high-school Clojurian by the name of Anthony Grimes -- my favorite of the series. I wanted to catch up with Anthony again to see what life held for him since graduation, and possibly to get a glimpse into what the future holds.
What have you been working on since I last interviewed you?
Oh wow. Lots of stuff!
One significant project is https://www.refheap.com. It's a pastebin site written in Clojure. It uses the wonderful Pygments syntax highlighter for generating syntax highlighted code and aims to be on par with the feature offering of Gist while not using git for storing pastes and sporting a (hopefully) slightly more pleasant interface. There is still a ways to go, of course. We want to support Markdown (among other markup languages) rendering, revisions, diffs of revisions, etc. There is plenty to do and issues for almost all of it, so if anybody reading this is feeling frisky and opensourcy, take a look at them.
To go with refheap, I wrote https://github.com/Raynes/refheap.el for pasting from Emacs, https://github.com/Raynes/refheap.vim for pasting from Vim (uses Ruby), and a little Ruby gem for pasting from the command line: https://github.com/Raynes/refh. And, of course, API client libraries for refheap in Ruby and Clojure with more to come.
Of my projects, I admire refheap in particular because it can attract a wide variety of contributors. For example, William Manire has been contributing significantly to the Javascript portion of the site. As a side effect, he has been teaching me Javascript!
Some other stuff...
There is also Tentacles, my Github API client lib. Lots of people seem to like it. At least one person (hi Phil!) uses it so he doesn't have to use Github's upload file interface.
Also, Conch, a little library for shelling it. It is useful because, while clojure.java.shell is a very useful and well-written library, it (purposely) covers up all of the Java process API. Sometimes you don't want that though, and Conch exists for those times.
Finally, I've been trying to help out with Noir as much as possible. Mostly just reviewing and pulling pull requests and keeping a steady stream of beta releases going. Chris is really busy these days, so he needs all the help he can get with non-Light Table projects.
I think that's enough horn tooting. Check out my Github page if I'm just so irresistible that you must see all my projects.
How is the book coming?
Slowly. I've had a lot on my plate at work and haven't had a lot of time to work on it, unfortunately. I hope to be able to pick things up soon.
Can you explain Flatland for the uninitiated?
It's pretty simple. I work at Geni and a lot of our code is open source. We just adore modularity and we very often recognize patterns and split things out into standalone libraries. Flatland is our open source home. Things like Jiraph, our graph database, Useful, our utility library, and protobuf, our Google Protobuf library and accompanying lein plugin, among other things, are put there. You might note that lazybot, clojail, and irclj are all there as well. We decided to put those there because 1) we use them in our company IRC channels and 2) Alan Malloy and I both work on them and maintain so it makes sense for us both to have administrator access to the projects.
tl;dr: It's Geni's open source Github organization.
What was going through your mind when presenting to ~300 people at the Conj?
It's hard to recall. At first, it was something between "Just do it. Breath. Now talk. Clear your throat. Breath. Talk." and absolutely nothing at all. I actually anticipated being unable to talk and put lots of notes in my slides so that when I got nervous during the talk I could mostly just read them and avoid having to think.
About 5 minutes in it was all fine though. Once everybody had laughed or chuckled at something I said, my hair slicked back and I grew sunglasses and knew I would be okay.
My favorite two moments of the entire conference (not taking away from the talks at all, this is entirely personal) was how, after my talk, Rich told me I was great in person and said I was awesome on twitter. Easily comparable to teenage girls getting a Justin Bieber autograph. Would totally do it again.
Will we see you at the next Conj?
With bells on. I might even submit a talk again. I'd do it under one condition: at some point in my talk, Michael Fogus must come to the stage with a top hat and cane and dance for precisely 45 seconds.
postscript
It's been a wild ride trying to track down interesting Clojure programmers over the past two years, but all good things must come to and end. From the beginning of the (take...) series I've tried to highlight all types of programmers. While it's definitely been wonderful talking to luminaries such as Rich Hickey, William Byrd, Daniel Spiewak and Martin Odersky, I've always tried to highlight the people using Clojure at work or during their free time to create, teach and inform. Undoubtably the computing industry as a whole would be much the poorer without the likes of Hickey and Odersky, but it's community that has allowed them to make a sustained impact. One of Clojure's greatest strengths is its community, and I feel an overwhelming joy to know that its members include amazing people like David Edgar Liebke, Phil Hagelberg, David Nolen, Brenton Ashworth, George Jahad, Justin Balthrop, Carin Meier, Ambrose Bonnaire-Sergeant, Sam Aaron, Arnoldo Jose Muller-Molina, Kevin Lynagh, Baishampayan Ghose, Colin Jones and Jim Crossley. Thanks to them, and all of the other brilliant Clojurians whom I was not fortunate enough to interview, for making the Clojure community one of the language's most awesome innovations.
I'm off to practice my dance steps.
:F
Jim Crossley and his team at Red Hat are attempting to create the programming language polyglot's dream. Their latest creation Immutant is billed as a Clojure application server built on JBoss, but as you'll discover below is aiming to integrate numerous Java.next languages. Immutant is an exciting project and looks destined to become something special.
What is the "elevator pitch" for Immutant?
Immutant is an application server for Clojure built atop JBoss AS7.
Our primary goal is a robust, scalable deployment platform for Clojure applications, with built-in services like messaging, caching, distributed transactions and clustering. We aim to encapsulate the accidental complexity associated with most non-trivial applications and allow the developer to focus on the essential complexity of his problem domain.
Immutant is an integrated stack of services. Instead of provisioning your servers with Jetty for web, RabbitMQ for messaging, and Memcached for caching, for example, you could simply install Immutant.
We currently have projects to support JRuby (TorqueBox) and Clojure (Immutant) apps, and we're looking at exposing the AS7 services to Scala, Python, and JavaScript as well. All of these projects can be overlaid on each other, creating an app server that provides services for multiple languages at once. This allows you to leverage the libraries and skills appropriate to any particular aspect of your application, regardless of the language, e.g. Rails used in the front-end web tier and perhaps Incanter used for back-end report generation.
Were there any barriers to using Clojure in the implementation of Immutant?
Quite the contrary, Clojure has enabled us to remove much of the boilerplate typically associated with JEE services, so we've been able to encapsulate the complex JEE API's and configuration within a small set of Clojure functions.
Clojure also inspired us to support more dynamic applications. It's not necessary to declare all the services required by your app in some sort of "deployment descriptor" for Immutant; those decisions can be deferred until runtime. We also let you expose the application running within Immutant via either Swank or nREPL, enabling a more interactive development experience than what folks typically expect from a JEE container.
So far, the biggest challenge has been working around Clojure's assumption that there is only one Clojure runtime per JVM - we wanted each application deployed within an Immutant to have its own runtime, and even its own Clojure version. To work around this, each application gets its own isolated ClassLoader and we do some classloader munging to give each application its own runtime.
We're aided in this by JBoss AS7's modular classloading system - it does a great job of isolating an application's dependencies from those of other applications or Immutant itself.
Why do you think that there are some who are mistrustful of "Clojure in the enterprise"?
Answering this question requires making broad assumptions about the fears and motivations of others, not to mention an agreed-upon definition of a pretty overloaded term... but what the hell! :)
Seriously, I don't think good developers are mistrustful of Clojure, though some may be justifiably leery of bloated application servers that are traditionally associated with enterprises. JBoss AS7 is not your father's app server: though obviously bigger than any single component of a home-grown stack, it's fast and light enough to not make you "stabby" while coding on your laptop.
But developers are usually only one minority amongst the decision-making stakeholders in an enterprise...
How can Immutant alleviate those doubts?
Oddly, the integrated, java-app-server-ness aspect of Immutant that gives some developers pause is the very thing that IT managers and sys-admins love. Immutant offers a great opportunity to "get Clojure in the enterprise" because it can deploy those Java legacy apps right alongside the newer Clojure/JRuby/etc ones, with minimal impact to the devops guys.
But truthfully, we don't really care about that. We're looking for ways to make development and deployment simple. And by simple, I mean no more complicated than what's required to address the problems within a specific domain.
And though it's true that forces acting on an enterprise often complect applications in ways that don't affect individuals or small teams, we think simpler development and deployment benefits everyone, whether you develop apps for/in an enterprise or not.
What is the state of Clojure in the broader Red Hat corporation?
Well, when we started Immutant almost a year ago, it was pretty much just me and Toby. ;)
We're part of a research team at Red Hat led by our "Director of Polyglot"[1], Bob McWhirter. We're fortunate to have recently added Charlie Nutter and Tom Enebo (the JRuby guys) to our team, and both are very interested in alternative JVM languages (including Clojure) and eager to apply their skills to our JVM-related projects, including TorqueBox, Immutant and OpenJDK, among others.
Red Hat is a very large company of very smart engineers on very long leashes, so I have little doubt others are using Clojure internally. And we have a solid mandate from our leadership to expose the excellent JBoss services through as many JVM-based languages as makes sense.
Do you think that there is a synergy between JRuby and Clojure in the enterprise?
Absolutely! And not just in the enterprise. The proliferation of web frameworks and tools for Ruby make it a great language for developing web UI's and/or services, whereas to this point Clojure has mostly been used to create some pretty incredible data analytics applications.
It's rare to find individual developers with enough expertise in both JRuby and Clojure to effectively combine frameworks and libraries from each. But we believe they're out there, and as the benefits of polyglot programming become more publicized, their numbers will increase. But in the meantime, we expect larger organizations to form teams of experts from both camps, each producing loosely-coupled applications that coordinate to solve domain-specific problems.
We hope Immutant will appeal to both individuals and teams by allowing those seemingly disparate applications to be easily developed and deployed to a single platform.
[1] Best job title ever!
Colin Jones is a software craftsman at 8th Light and Clojure veteran. His contributions to the Clojure ecosystem are far and wide, but tend toward helping newcomers to discover the language and provide tools for getting started as quickly as possible. In this installment of the (take ...) series Colin discusses koans, TDD, REPL-ing, and the Clojure Conj.
What are koans in the context of programming?
The basic idea is that they're exercises to help you learn a particular language or other technology. There's a Zen metaphor that ties things together, but the main thing is to have small, bite-sized chunks to guide folks in their learning without being overwhelming. I first heard about the idea in the EdgeCase Ruby Koans, and they in turn credit other sources (including The Little Lisper! with inspiration. So when Aaron Bedra started the Functional Koans project, I was excited to pitch in, and before I knew it, we had a bunch of people having fun working through the Clojure Koans, as their first exposure to Clojure.
Is there a place for TDD in the Clojure ecosystem?
Of course! I think Brian Marick has shown that most publicly with his library Midje, and related screencasts and conference talks, but there are a ton of other great testing tools out there as well: Speclj by my boss, Micah Martin, LazyTest, Expectations, clojure.test, etc.
I have heard TDD and deep thinking put into stark contrast with each other at times, but the people I respect in the TDD world certainly do a lot of deep thinking about simple design, so I think we sometimes get a bit of a false dichotomy.
I personally find a lot of benefits to unit testing - it's a way of making my thinking about problems concrete, and when I'm done, there's an automated system looking for assumptions that have become untrue.
What itch are you attempting to scratch with your latest project REPL-y?
OK, the 5-second version is that I want it to be easier to use a Clojure REPL, unix-style, at the shell. Projects like Russ Olsen's dejour were working towards similar goals, which I found exciting, so I got in touch with Russ to pitch in. As it happened, a lot of my ideas for improvement could be implemented orthogonally to dejour: handling Ctrl-C interruptions without bailing out of the process, basic code completion, in-REPL examples (via clojuredocs), and a truer readline-like interface (because let's face it: JLine 0.9.94 is broken). Hence REPL-y was born.
Down the line a bit, with some encouragement and help from Phil Hagelberg and the Leiningen team, it's now the default REPL frontend for the upcoming Leiningen 2, which is now in preview (nREPL is the backend). So at this point, REPL-y is wiring up of a bunch of great projects: Clojure, JLine2, clojure-complete, nREPL, and clojuredocs-client. There's a shell script to run REPL-y alone, but of course I expect the version in Leiningen to see a lot more use.
The project lives at https://github.com/trptcolin/reply (varying pronunciations are both acceptable and hilarious).
How can Clojure improve, in general?
Language-wise, I don't have too much to say. I know a lot of folks struggle with the various ways of pulling code in from other namespaces, so maybe there are ways to make those easier to use. My main interests have mostly been in helping people get over hurdles - which is of course a cause shared by a lot of people on the IRC channel and beyond. So, needless to say, I was very impressed by the fact that the planning for Clojure 1.4 included error messages, documentation, even potentially a command-line launcher app.
Tools-wise, I think things are moving in the right direction. Of course I mentioned the REPL stuff and Leiningen, and there are dozens of great new things coming out all the time.
I'd love to see clojure.org become a bit friendlier as an introduction
to Clojure, from the very first steps. I wonder if it'd be possible to have something like the Java Tutorials that's maintained
in an "official" way? There's so much incredibly good information on clojure.org - I refer to it often - but it can be intimidating, and things get out of date from time to time. I also think function findability can be hard in the clojure.core namespace from a REPL if you don't know what you're looking for, like if you want an operation on a particular data structure but don't remember the function name, so maybe a similar function to find-doc could do some nice grouping, like what's laid out on http://clojure.org/data_structures.
First impressions matter, and if we can hook people right at the start without compromising our principles, I think that's a really big win for the language and the community.
You've been to both Clojure conferences; was this years' offering different than the first?
I was pretty bummed about the lack of Clojure temporary tattoos this year, but you can't win them all.
Seriously, though, this conference is amazing. Certainly there were differences: last year's seemed a bit more language-focused, and this year's seemed more product- and library- focused. That's a pretty broad and unscientific impression, but suffice to say, both were awesome.
Some of the bonuses this year: an additional day of talks, bunches of informal sessions that sprung up in the conference space, and lots of great restaurants within walking distance. The talks, like last year, were fantastic. I suspect we'll see even more of these great stories about the things people are building as Clojure gets used in production more and more.
Plus, Overtone? Bad ass.
Last time, I blogged about Clojure's new reducers library. This time I'd like to look at the details of what constitutes a reducer, as well as some background about the library.
What's a Reducing Function?
The reducers library is built around transforming reducing functions. A reducing function is simply a binary function, akin to the one you might pass to reduce. While the two arguments might be treated symmetrically by the function, there is an implied semantic that distinguishes the arguments: the first argument is a result or accumulator that is being built up by the reduction, while the second is some new input value from the source being reduced. While reduce works from the 'left', that is neither a property nor promise of the reducing function, but one of reduce itself. So we'll say simply that a reducing fn has the shape:
(f result input) -> new-result
In addition, a reducing fn may be called with no args, and should then return an identity value for its operation.
Transforming Reducing Functions
A function that transforms a reducing fn simply takes one, and returns another one:
(xf reducing-fn) -> reducing-fn
Many of the core collection operations can be expressed in terms of such a transformation. Imagine if we were to define the cores of map, filter and mapcat in this way:
(defn mapping [f]
(fn [f1]
(fn [result input]
(f1 result (f input)))))
(defn filtering [pred]
(fn [f1]
(fn [result input]
(if (pred input)
(f1 result input)
result))))
(defn mapcatting [f]
(fn [f1]
(fn [result input]
(reduce f1 result (f input)))))
There are a few things to note:
- The functions consist only of the core logic of their operations
- That logic does not include any notion of collection, nor order
- filtering and kin can 'skip' inputs by simply returning the incoming result
- mapcatting and kin can produce more than one result per input by simply operating on result more than once
Using these directly is somewhat odd, because we are operating on the reducing operation rather than the collection:
(reduce + 0 (map inc [1 2 3 4]))
;;becomes
(reduce ((mapping inc) +) 0 [1 2 3 4])
Reducers
We expect map/filter etc to take and return logical collections. The premise of the reducers library is that the minimum definition of collection is something that is reducible. reduce ends up using a protocol (CollReduce) to ask the collection to reduce itself, so we can make reducible things by extending that protocol. Thus, given a collection and a reducing function transformer like those above, we can make a reducible with a function like this:
(defn reducer
([coll xf]
(reify
clojure.core.protocols/CollReduce
(coll-reduce [_ f1 init]
(clojure.core.protocols/coll-reduce coll (xf f1) init)))))
Now:
(reduce + 0 (map inc [1 2 3 4]))
;;becomes
(reduce + 0 (reducer [1 2 3 4] (mapping inc)))
That's better. It feels as if we have transformed the collection itself. Note:
- reducer ultimately asks the source collection to reduce itself
- reducer will work with any reducing function transformer
Another objective of the library is to support reducer-based code with the same shape as our current seq-based code. Getting there is easy:
(defn rmap [f coll]
(reducer coll (mapping f)))
(defn rfilter [pred coll]
(reducer coll (filtering pred)))
(defn rmapcat [f coll]
(reducer coll (mapcatting f)))
(reduce + 0 (rmap inc [1 2 3 4]))
;=> 14
(reduce + 0 (rfilter even? [1 2 3 4]))
;=> 6
(reduce + 0 (rmapcat range [1 2 3 4 5]))
;=> 20
From Reducible to (Parallel) Foldable
While it is an interesting exercise to find another fundamental way to define the core collection operations, the end result is not much different, just faster, certainly something a state-of-the-art compilation and type system (had we one) might do for us given sequence code. To stop here would be to completely miss the point of the library. These operations have different, fundamentally simpler semantics than their sequence-based counterparts.
How does one define parallel mapping/filtering/mapcatting etc? We already did! As long as the transformation itself doesn't care about order (e.g. as take does), then a reducer is as foldable as its source. As with reduce, fold bottoms out on a protocol (CollFold), and our reducer can extend that:
(defn folder
([coll xf]
(reify
;;extend CollReduce as before
CollFold
(coll-fold [_ n combinef reducef]
(coll-fold coll n combinef (xf reducef))))))
Note that:
- folder has the same requirements as reducer - collection + reducing function transformer
- when fold is applied to something that can't fold, it devolves to reduce
Thus the real definitions of reducers/map et al use folder (while take uses reducer):
(defn rmap [f coll]
(folder coll (mapping f)))
(defn rfilter [pred coll]
(folder coll (filtering pred)))
(defn rmapcat [f coll]
(folder coll (mapcatting f)))
Thus a wide variety of collection transformations can instead be expressed as reducing function transformations, and applied in both sequential and parallel contexts, across a wide variety of data structures.
The library deals with several other details, such as:
- the transformers all need a nullary arity that just delegates to the transformed reducing function
- the transformers support a ternary arity where 2 inputs are supplied per step, as occurs with reduce-kv and map sources
- all of the reducers are curried
These additions are all mechanical, and are handled by macros. It is my hope that the above will help illuminate the core logic underlying the library.
Background
Much prior work highlights the value of fold as a primary mechanism for collection manipulation, superior to iteration, although most of that work was done in the context of recursively defined functions on lists or sequences - i.e. fold implies foldl/foldr, and the results remain inherently sequential.
The two primary motivators for this library were the Haskell Iteratee library and Guy Steele's ICFP '09 talk.
Haskell Iteratees
The Haskell Enumerator/Iteratee library and its antecedents are an inspiring effort to disentangle the source of data and the operations that might apply to it, and one of the first I think to reify the role of the 'iteratee'. An enumerator makes successive calls to the iteratee to supply it items, decoupling the iteratee from the data source. But the iteratee is still driving in some sense, as it is in charge of signaling Done, and, it returns on each step the next iteratee to use, effectively dictating a single thread of control. One benefit is that even operations like take can be defined functionally, as they can encode their internal state in the 'next' iteratee returned. OTOH, and unlike reducers, the design wraps the result being built up in a new iteratee each step, with potential allocation overhead.
Being an automaton in a state, an iteratee is like a reified left fold, and thus inherently serial. So, while they form quite a nice substrate for the design of, e.g. parsers, iteratees are unsuitable for defining things like map/filter etc if one intends to be able to parallelize them.
Guy Steele's ICFP '09 talk
Organizing Functional Code for Parallel Execution or, foldl and foldr Considered Slightly Harmful
This talk boils down to - stop programming with streams, lists, generators etc if you intend to exploit parallelism, as does the reducers library.
Where reducers diverges from that talk is in the structure of the fork/join parallel computation. Rather than map+reduce, reducers uses reduce+combine. This reflects 2 considerations:
- It is accepted fork/join practice that at some point you stop splitting in half and handle the leaves 'sequentially'
- if the best way to do that at the top is reduce, why not at the bottom as well?
- map forces a result per input
You can see the awkwardness of the latter in the map/reduce-oriented definition of parallel filter in the talk, which must 'listify' items or return empty lists, creating a bunch of concatenation busy-work for the reducing step. Many other collection algorithms suffer similarly in their map/reduce-oriented implementations, having greater internal complexity and wrapping the results in collection representations, with corresponding creation of more garbage and reduction busy-work etc vs the reducing function transformer versions of same.
It is interesting that the accumulator style is not completely absent from the reducers design, in fact it is important to the characteristics just described. What has been abandoned are the single initial value and serial execution promises of foldl/r.
Summary
I hope this makes reducers easier to understand, use and define.
Rich
I'm happy to have pushed today the beginnings of a new Clojure library for higher-order manipulation of collections, based upon reduce and fold. Of course, Clojure already has Lisp's reduce, which corresponds to the traditional foldl of functional programming. reduce is based upon sequences, as are many of the core functions of Clojure, like map, filter etc. So, what could be better? It's a long story, so I'll give you the ending first:
- There is a new namespace: clojure.core.reducers
- It contains new versions of map, filter etc based upon transforming reducing functions - reducers
- It contains a new function, fold, which is a parallel reduce+combine
- fold uses fork/join when working with (the existing!) Clojure vectors and maps
- Your new parallel code has exactly the same shape as your existing seq-based code
- The reducers are composable
- Reducer implementations are primarily functional - no iterators
- The model uses regular data structures, not 'parallel collections' or other OO malarkey
- It's fast, and can become faster still
- This is work-in-progress
Basics
The story starts best at the bottom.
Clojure and other functional languages have a function called map that takes a function and a collection/list.
- What does it mean to map a function on a collection?
- What are the common signatures?
- Do they complect what to do with how to do it?
The classic recursive functional definition of map is to apply f to the first thing in the collection, then cons the result onto the result of mapping f on the rest of the collection. This definition includes plenty of 'how':
- How: mechanism - recursion
- How: order - sequentially
- How: laziness - (often) lazily
- How: representation - making a list/seq, or other concrete collection
Newer OO frameworks will often remove some of these problems by having map be a function of fn * Coll -> Coll for any type of Coll, removing the sequentiality but also losing the laziness, and they still specify a concrete collection result.
Semantically, and minimally, map means "apply-to-all" e.g. (map inc coll) means give me a (logical) collection where every item is one greater than it was in coll. But, map doesn't know how to navigate around every collection - the use of seqs/lists/iterators/streams etc forces a shared known representation. Nor does inc (or any function) know how to apply itself to every collection representation, else we could just say (inc coll).
The only thing that knows how to apply a function to a collection is the collection itself.
What is the generic gateway to a collection applying things to itself? In Clojure, it is (internal) reduce.
We now have a new super-generalized and minimal abstraction for collections - a collection is some set of things that, when given a function to apply to its contents, can do so and give you the result, i.e. a collection is (at minimum) reducible. In other words, you can call reduce on it.
Thus, core.reducers/map is a function of fn * reducible -> reducible. (Whereas core/map is a fn of fn * seqable -> seqable)
Now, how? If someone is going to ask the result of (map inc coll) to reduce itself with some function f, map must ultimately ask coll to do the job. Rather than pass coll f, map passes coll a new, transformed, reducing function that takes what coll supplies, calls inc on it, and then calls f on that.
(reduce + (r/map inc [1 2 3])) === (reduce (fn [ret x] (+ ret (inc x))) (+) [1 2 3])
i.e. the core work of map f looks like this:
(fn [f1]
(fn [ret v]
(f1 ret (f v))))
It takes a reducing function f1, and returns a new reducing function that calls f1 after applying f to its input.
Thus you can define map as a function of fn * reducible -> reducible by merely transforming the reducing function. Mapping is semantically a function of the function of one step of a reduction. This transformation is decomplected from both representation and order. We call functions such as this map, that take a reducible, and in turn return something reducible via transformation of the reducing function, reducers.
Now let's revisit the hows above...
- How: mechanism - functional transformation of reducing function
- How: order - doesn't know
- How: laziness - doesn't know
- How: representation - doesn't build anything
It is important to note that now, when (map f coll) is called nothing happens except the creation of a recipe for a new collection, a recipe that is itself reducible. No work is done yet to the contained elements and no concrete collection is produced.
The beautiful thing is that this 'transformation of reducing function' mechanism also works for many of the traditional seq functions, like filter, take, flatten etc. Note the fact that filter is (potentially) contractive, and flatten is (potentially) expansive per step - the mechanism is general and not limited to 1:1 transformations. And other reducer definitions are as pretty as map's - none of the imperativeness of iterators, or generators with yield.
Ok, So Where's My Cake?
If map doesn't do the work of mapping, but merely creates a recipe, when does the work get done? When you reduce its result:
(require '[clojure.core.reducers :as r])
(reduce + (r/filter even? (r/map inc [1 1 1 2])))
;=> 6
That should look familiar - it's the same named functions, applied in the same order, with the same arguments, producing the same result as the Clojure's seq-based fns. The difference is that, reduce being eager, and these reducers fns being out of the seq game, there's no per-step allocation overhead, so it's faster. Laziness is great when you need it, but when you don't you shouldn't have to pay for it.
The reducer fns are curried, and they can be easily composed:
;;red is a reducer awaiting a collection
(def red (comp (r/filter even?) (r/map inc)))
(reduce + (red [1 1 1 2]))
;=> 6
Thus reduction 'recipes' (reducers) are first class.
What if we want a collection result? It's good to know that into uses reduce:
(into [] (r/filter even? (r/map inc [1 1 1 2])))
;=> [2 2 2]
Note there are no intermediate collections produced.
And, of course, you don't always want a result of the same collection type:
(into #{} (r/filter even? (r/map inc [1 1 1 2])))
;=> #{2}
Simplicity is Opportunity
Decomplecting the core operations from representation and laziness has given us some speed, but what about the elimination of order? It should open the door to parallelism, but we are stuck with the semantics of reduce being foldl, i.e. it uses an accumulator and is fundamentally serial. We can parallelize reduction by using independent sub-reductions and combining their results, and the library defines a function that does just that: fold.
The primary signature of fold takes a combining function, a reducing function, and a collection and returns the result of combining the results of reducing subsegments of the collection, potentially in parallel. Obviously if the work is to occur in parallel, the functions must be associative, but they need not be commutative - fold preserves order. Note that there is no initial 'seed' or 'accumulator' value, as there may be with reduce and foldl. But, since the subsegments are themselves reduced (with reduce), it raises the question as to what supplies the seed values for those reductions?
The combining function (an associative binary fn) must have some 'identity' value, a value that, when combined with some X, yields X. 0 is an identity value for +, as is 1 for *. The combining fn must supply an identity value when called with no arguments (as do + and *). It will be called with no arguments to supply a seed for each leaf reduction. There is a fn (called monoid, shh!) to help you build such combining functions.
If no combining fn is supplied, the reducing fn is used. Simple folds look like reduces:
(r/fold + [1 2 3 4])
;=> 10
But by promising less (i.e. not promising stepwise reduction from left or right) fold can do more - run in parallel. It does this when the collection is amenable to parallel subdivision. Ideal candidates are data structures built from trees. Clojure vectors and maps are trees, and have parallel implementations of fold based upon the ForkJoin framework.
What if the underlying collection is not amenable (e.g. is a sequence)? fold just devolves into reduce, producing the same semantic, if not physical, result.
There's a tremendous amount you can accomplish with this reduce+combine strategy, especially when you consider that the map, filter etc reducers will not constitute independent layers of parallel jobs - they just transform the reducing fn working on the leaves.
You can have a look at the cat function included in the library for an interesting example of a combining fn. cat quickly gathers up the fold results, forming a binary tree with the reductions as leaves. It returns a highly abstract, yet now quite useful 'collection' that is just counted, reducible, foldable and seqable.
Oh yeah, perf. Don't be surprised to see things become 2-3X faster, or more with more cores.
More Opportunity (i.e. Work)
As much fun as this is, there's still more fun to be had by those so inclined:
- There are more seq fns that could become reducer fns
- Given multiple iterable sources, we should be able to build a multi-reducible, recovering the multi-input capabilities of map.
- Arrays, arraylists, strings etc are all amenable to parallel fold.
- fork/join-based vector fold is 14 lines, so these are not difficult.
- Those IFn.LLL, DDD etc primitive-taking function interfaces can now spring to life.
- We should be able to build primitive-transmitting reducer function pipelines.
- We'd then need to look for and use them in the reductions of arrays and vectors of primitives
- Internal reduce solves the lazily dangling open resource problem, a problem solved similarly by Haskell's enumerators and iteratees. (Note that unlike iteratees, reducers do not allocate wrappers per step)
- We need reducible I/O sources.
Summary
By adopting an alternative view of collections as reducible, rather than seqable things, we can get a complementary set of fundamental operations that tradeoff laziness for parallelism, while retaining the same high-level, functional programming model. Because the two models retain the same shape, we can easily choose whichever is appropriate for the task at hand.
Follow Up
See the follow up blog post for more details about what constitutes a reducer, as well as some background about the library.
Rich

