I met Arnoldo Jose Muller-Molina for the first time at the second Clojure Conj, where he presented a fascinating talk entitled "Hacking the Human Genome using Clojure and Similarity Search". I enjoyed his engaging speaking style, his grasp of deep topics, and his ability to explain them clearly. In this installment of the (take ...) series we discuss the power of Lisp in general and Clojure specifically, starting a company built on Clojure, and the current barriers to entry for Clojure in the data sciences.
How did you discover Clojure?
I learned about Clojure on Hacker News. HN is the first site I open when I wake up in the morning. Someone linked a post written by a long-time Lisper saying many good things about Clojure, so I decided to give it a try.
The following three characteristics were instrumental in my decision to adopt Clojure:
Lisp (macro system): You can write programs that write programs.
JVM ecosystem: Years of programs built for the JVM and also years of effort put into the JVM itself make a very robust and complete platform. You can fully harness the power of the JVM from Clojure.
Many data science tools (Hadoop, Cassandra, etc.) are already built in Java. If you are doing big data projects, then it makes a lot of sense to employ a JVM-based language.
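The first point, programs that write programs, can be illustrated with a minimal sketch. The `unless` macro below is a hypothetical example written for this illustration, not something from the talk or from clojure.core (which already provides the similar `when-not`). It receives unevaluated code and returns new code built around `if`.

```clojure
;; Hypothetical macro for illustration: runs body only when test is falsey.
;; The macro runs at compile time and returns the code that will be executed.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

;; Inspect the code the macro writes, without running it.
(macroexpand-1 '(unless false :ran))
;; => (if false nil (do :ran))

;; Use it like any built-in control structure.
(unless false :ran)
;; => :ran
```

Because the expansion is ordinary Clojure data, `macroexpand-1` can show exactly what program the macro wrote.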
Are you pushing for wider adoption of Clojure on your team?
I am in a transition period right now. I am starting a data science company, simMachines, that will focus on providing solutions that turn data into money and time by using similarity functions, as they are easier for non-technical people to understand. I will be using Clojure for data analysis, and people who join simMachines will be using Clojure too. In addition, I will be teaching a "Big Data" Clojure course starting in January, using the book The Joy of Clojure. I will make my students handle very large biological datasets with Clojure!
What are your plans for using Clojure in the context of your work?
Widely. There is a strong preference towards Perl, Python, Matlab and R. The Java language is not that popular, I would say. The fact that Java itself hasn't penetrated is already something that could hinder Clojure's adoption. Perhaps we need to work further on Incanter's plotting facilities so that we reach the customization capabilities of ggplot2.
People in life sciences are very visual, and the way of communicating complex ideas is with very elaborate illustrations, graphs and plots. People could actually choose a language based on this alone, because in the end it helps a lot to explain complex ideas.
What are the problems that Clojure is suited to helping solve in your field?
Clojure is exceedingly good at handling data. You can parse complex raw files in a very declarative way. You can then extract useful bits of your stream, create records and then pass them on to an arbitrarily complex pipeline of transformations. In OO languages, you need to write iterators to traverse large files, whereas in Clojure you declaratively chain functions that efficiently handle your data with lazy evaluation. With Clojure, you can focus on the problem and forget about error-prone details like iterators and while-loop counters. After you have processed data with Clojure, you will not want to go back to your previous programming language.
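The pipeline style described above might look like the following minimal sketch. The tab-separated format, the `Reading` record, and the function names are all assumptions made up for this illustration. They are not taken from the talk or from any simMachines code.

```clojure
(require '[clojure.string :as str])

;; Hypothetical record for one parsed line; field names are illustrative.
(defrecord Reading [id value])

(defn parse-line
  "Turns a tab-separated line such as \"g1\\t0.5\" into a Reading."
  [line]
  (let [[id value] (str/split line #"\t")]
    (->Reading id (Double/parseDouble value))))

(defn high-readings
  "Lazily skips blank and comment lines, parses the rest into records,
  and keeps only readings whose value exceeds threshold."
  [threshold lines]
  (->> lines
       (remove str/blank?)
       (remove #(str/starts-with? % "#"))
       (map parse-line)
       (filter #(> (:value %) threshold))))

;; Small in-memory sample standing in for a large file.
(def sample-lines ["# id\tvalue" "g1\t0.5" "" "g2\t2.5" "g3\t7.0"])

(mapv :id (high-readings 1.0 sample-lines))
;; => ["g2" "g3"]
```

Each step returns a lazy sequence, so no explicit iterator or loop counter appears. For a real file, the same function could be given `(line-seq rdr)` inside `(with-open [rdr (clojure.java.io/reader path)] ...)`, as long as the results are fully realized (for example with `doall`) before the reader closes.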