Interview: Eigen Developers on 2.0 Release

Recently Eigen 2.0 was released. You might already have heard about Eigen: it is a small but very high-performance maths library which has its roots in KDE. Below, the two core developers are interviewed about it.

OK, let's start out with the basics. Could you introduce yourselves?

Benoit: I'm a Mathematics postdoc at the University of Toronto, coming from France. I'm working on C*-algebras, and teaching linear algebra. In addition to Eigen, I contribute to a few other free software projects, mostly within KDE. In the past I also contributed to Avogadro and a little bit to Open Babel.

Gael: I'm a French researcher in Computer Graphics currently working at INRIA Bordeaux. In particular, my research interests include real-time rendering and surface representations. My major contribution in the open-source world is by far what I did in Eigen, but I also contributed a bit to vcglib and MeshLab.

What is Eigen?

Benoit and Gael: Eigen is a free C++ template maths library mainly focused on vectors, matrices, and linear algebra. It is a self-contained library covering a very broad range of use cases. For example, it covers both dense and sparse objects, and in the dense case, it covers both fixed-size and dynamic-size objects. Moreover it provides linear algebra algorithms, a geometry framework, etc. It has a very nice API for C++ programmers, and it aims for very high performance.

What drove you to create Eigen and Eigen 2?

Benoit: Eigen 1 was a very small project, 2500 LOC and just a few months of development. Its creation in 2006 was driven by the then-simple needs of some KDE and KOffice apps. Although these needs were simple, they were already very diverse because KDE is a large meta project, and the existing libraries were too specialised to cover them all. However it quickly turned out that we had underestimated KDE's needs, and Eigen 1 was insufficient. So in 2007, I started developing Eigen 2. The aim was to finally cover all the needs of KDE and KOffice apps - a goal that, in retrospect, was very ambitious and will only be reached with Eigen 2.1. After an initial experiment with TVMET's code, I decided to restart from scratch in August 2007 and quickly got a working implementation of expression templates. However, this early Eigen 2 was very small. Development speed really picked up when Gael joined in early 2008.

Gael: A bit more than a year ago, I became tired of going back and forth between my own fixed-size vector/matrix classes and more generic linear algebra packages. So, I started looking at the other existing solutions without being excited by any of them. Since I have been using KDE for about 9 years, I was really curious to know what the KDE folks did in this area. At that time, it was exactly the start of Eigen 2, which looked promising, but the fact that it was based on TVMET puzzled me. Eventually, Benoît had the great idea to restart the development of Eigen 2 from scratch, and after one or two months he came up with a very lean design. Moreover his vision and feature plan for Eigen 2 exactly matched my own, and being part of the KDE community was exciting too! At the beginning, I naively thought that after one or two months the job would be done! Instead, we started playing with exciting stuff like explicit vectorisation, efficient matrix products, sparse matrices, etc.

Many people are familiar with other linear algebra and matrix libraries, including BLAS/LAPACK, Intel's Math Kernel Library, and Apple's vecLib framework. Can you explain how Eigen is different, besides being written in C++?

Benoit and Gael: Giving a fair answer to that question would require a thorough comparison to all existing libraries, which is obviously out of the scope of this interview. Search for "C++ matrix library" to get an idea. For us, the most important criteria include:

  • generality: we need many different kinds of matrices: fixed-size, dynamic-size dense, sparse. For example, BLAS and LAPACK handle only dynamic-size dense matrices. Even MKL and vecLib have only limited support for fixed-size matrices.
  • performance: with optimisations for fixed-size matrices, vectorisation, lazy evaluation, cache-friendly algorithms... See the benchmarks on our wiki.
  • ease of use: we have a C++ API that's neat even by the high standards of KDE developers, good documentation, and lots of convenience features...
  • license policy: Eigen is LGPL, there is no licensing issue.

What projects are using Eigen right now?

Benoit and Gael: Eigen is already used in a wide range of applications, within KDE and beyond.

It is very interesting and motivating to see how many projects have already switched, or are going to switch, to Eigen, proving that Eigen fills a real gap.

What are some goals of Eigen moving forward? What kind of help do you need? What are some new features we might see in Eigen 2.1?

Benoit and Gael:

Our goals for 2.1 are:

  • Finish stabilising the API (the API guarantee is only partial in 2.0)
  • Complete the Sparse module. One goal is to make it good enough for Step in KDE 4.3 and Krita.
  • Make sure all dense decompositions are world-class (LU is already good, though we have improvements in mind; SVD is being completely rewritten; etc.)
  • Make fixed-size specialisations of algorithms
  • Optimize the case of complex numbers
  • Vectorise more operations (e.g., exp() or sin() or log())
  • Investigate optionally using new GCC 4.4 hotness: per-function optimisation flags

We need a lot of help with all that; more details can be found in our To-do. If new contributors join the team, in the longer term we could see some new modules covering statistics, fast Fourier transforms, non-linear optimisation, etc.

We also welcome testing, especially on exotic (read: non-GNU or non-x86/x86-64) platforms. All bug reports are much appreciated.

How does it benefit Eigen to be part of KDE?

Benoit: In many, many ways!

  • KDE initially provided a long list of use cases, so we had a clear picture of what was needed.
  • The surrounding KDE community is where the first people interested in Eigen came from.
  • Still today, we occasionally receive help from various KDE contributors. For example, just because we are in kdesupport, Alexander Neundorf reviewed our CMake code.
  • The KDE SVN repository works well and working in such a huge repository is a guarantee that possible issues (such as preserving history across a SCM change) will be handled for us.
  • Having my blog aggregated on Planet KDE means that my Eigen posts have much more impact.
  • We also benefit a lot from having our users forum at the excellent forum.kde.org.

Eigen is being developed in KDE's subversion repository. How does it benefit KDE that Eigen is developed in kdesupport, rather than being an external dependency?

Benoit: It allows KDE to track the development version of Eigen much faster. Whenever a feature or a fix is added to Eigen, KDE can use it right away without waiting for a release. Of course, we still have to make releases of Eigen for each release of KDE.

Another thing is that it makes it easier to build KDE: one less dependency. And whenever KDE developers need matrix maths, they don't need to wonder for a long time what library to use; they can rely on Eigen right away since it is in kdesupport. Of course, this is only a valid argument if Eigen is actually a better choice for KDE than the alternatives, but we're convinced it is :)

For a variant of this interview, see Mac Research.

Comments

In general this is true, but here I'm not sure.
AFAIK Eigen consists completely/almost completely of headers, I'm not sure there is actually an installed compiled library.
I.e. all the logic is in the headers, and the code is generated (AFAIK) from the templates at build time and linked into the resulting binaries.
We just had this discussion for the template classes and Qt 4.5 being LGPL.
Maybe you need some special exception so it is really legal to use Eigen in closed software.


Alex

I don't know if it's enough, but in the linked interview they said they use LGPL version 3 exactly for this reason.

I remember that Gael even contacted the FSF in order to ensure that. The LGPL v2.1 and earlier would effectively be identical to the GPL in the case of a header-only library, since they required separation of the application from the LGPL library's binary code, which is possible when using shared libs but not when headers are used. LGPLv3 solved this issue.

LGPL version 3 is not so bad as long as we use it right, I guess :-)

First of all, indeed Eigen consists 100% of headers; there is no binary library to link to. Yes, the LGPL 2.1 has a big problem with that case, and yes, the LGPL 3 fixes that problem. See the FAQ [1] and the variant of the interview [3].

We contacted the FSF first 2 years ago [2] about the problems of the LGPL 2.1, and then again after the LGPL 3 came out to confirm that it fixes this issue.

Note, this is exactly why I hope Qt goes LGPL3. Incidentally, the case of Eigen was picked up by an FSF guy on his blog recently [4] as he raised the Qt LGPL issue.

[1] http://eigen.tuxfamily.org/index.php?title=FAQ#Licensing
[2] http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2008/02/msg000...
[3] http://www.macresearch.org/interview-eigen-matrix-library
[4] http://lab.obsethryl.eu/content/lgpl-21-qt-45-and-c-templates

The API examples are indeed very, very nice. I'm impressed that you put in support for both column-major and row-major ordering. The flexible initializers, geometric transforms, etc. are also very distinctive compared to the older libraries I've used before.

One thing that I've sometimes wanted in a matrix library is the option of memory protection. That is, every once in a while I have a bug :-) and end up accessing an element that doesn't exist (even if it's within the range of memory allocation, e.g., m(6,0) in a 6-by-4 column-major matrix). Checking bounds on every access would, of course, be a big performance hit, but having a compile-time flag that turns on checking just during debugging seems like it could be a big help. Valgrind is great for catching many bugs, but it's not obvious to me that it solves all of the problems. Or is there a better way to do this?

Anyway, it looks like a terrific library. Thanks for your contributions!

Thank you for the kind words.
About your request: bounds checking is the default behavior in debug mode (debug mode is disabled by compiling with -DNDEBUG).

Sweet! I'm going to have to give it a try when I next have a relevant coding project.

Thanks.

I concur that this is a very nice project. What I like most about it is that, in conjunction with the GCC intrinsic functions, it greatly reduces the distance between the code and the hardware. While it works, the programming interface is certainly not simple. I have to wonder why we are implementing matrix operations in C++, which was clearly never designed for them, rather than simply using FORTRAN 95, which is designed for them and has a simple programming interface for matrix operations. Ultimately, as hardware develops, new language constructs and compiler designs are going to be needed to minimise the distance between the code and the hardware (specifically, the elimination of function calls for things implemented in hardware). And currently, this is the answer to why we don't simply use FORTRAN 95: the GNU Fortran compiler is not really a good implementation of FORTRAN 95, and it appears to me that implementing the features planned for FORTRAN 200x will require a new compiler design. I wait to see what the future brings, but for now Eigen2 appears to be the fastest way to do matrix arithmetic.

Fortran is painful to the point of being useless for pretty much anything besides math, and math is not the only purpose of programming. Eigen may be slower than well-coded Fortran (I haven't seen any benchmarks comparing them, but I am assuming it is true), but if you want to write an app of which matrix math is only one part, Eigen makes perfect sense.

This is not the only domain in which we are seeing a shift from special-purpose to general-purpose languages - see Python vs. MATLAB, or Maya's MEL script. Many people are willing to put up with a loss of performance for a gain in generality.

Some Fortran programmers always assume Fortran to be faster than anything you can achieve with C++, but they're living in the past.



1) The only core language feature that gives Fortran an advantage over C/C++, is that it natively supports arithmetic expressions on arrays. However, in C++ it is possible to achieve the same thing (and much, much more along the way) by the technique of expression templates. For a long time that was mostly a theoretical possibility as C++ compilers were not good enough to get good performance from that, but Eigen shows an implementation of expression templates that works very well on various recent compilers.



2) Fortran has a rich collection of scientific libraries that used to be the best available (the reference BLAS and LAPACK...). However, first of all, that's not a language feature, so we're comparing apples and oranges; and second, Eigen beats every BLAS for level 1 and 2 operations (see the benchmark page on our wiki), with only a few percent of the number of lines of code - and that is thanks to the kind of generic metaprogramming (templates) that only C++ allows.

No, I said that Eigen2 was faster. However, in general, a procedural language is faster than an OO language.

Re #1: I think that if you look into it you will find that FORTRAN 95 has many advantages over C. This is best illustrated in the book The F Programming Language.

I have no doubt that Eigen2 is faster than FORTRAN compiled with the GFortran compiler. There are three issues:

1) Eigen2 is not a high-level language. It is more like a meta-assembler. Programs written with a meta-assembler are, in general, faster than those written in a high-level language.

2) The GFortran compiler sucks. IIUC, the G95 compiler is better.

3) To optimize matrix arithmetic written in a high-level language would require a dedicated compiler -- not just a front end.

The fact that FORTRAN 95 supports matrix arithmetic and FORTRAN 200X will directly support threads means that, with a good compiler, it would be faster than C/C++.

An interesting thought I had was whether it would be possible to add matrix arithmetic to C/C++ (actually the arithmetic is in C) by overloading "+", "-" & "*". It certainly would make the programming easier.

KSU257. I don't know where you got all the information you posted, but pretty much all of it is very far from the truth.

The statement about procedural vs. OO performance is an over-generalization. Eigen2 is good proof that C++ can compete (and win) not only with high-level procedural languages, but with hand-coded assembler (GotoBLAS). But again, you can't really do direct comparisons, because the programming techniques are different.

Eigen2 is in C++, not in C, so whatever advantages F95 has over C are irrelevant.

1) Eigen2 is a library, not a language. It is written in a high-level language.
2) Gfortran is a good compiler. No, you don't understand it correctly: G95 is not better than Gfortran. Performance-wise, g95 lags well behind gfortran, which is a big deal, because Fortran has to be fast - otherwise, what's the point? Compiler diagnostic capabilities are almost on par. You can see for yourself: http://polyhedron.com/compare0html
One advantage g95 has at the moment is support for co-arrays. But that module is a closed-source shareware product. Not that I am against closed source in general, but that's clearly a disadvantage for many projects.
3) Just look at the Eigen benchmarks and see what a high-level language can do.

Fortran has supported array operations since F90. In fact F95 brought nothing new in this respect. F95 did add some useful language features, but those have nothing to do with array operations.
There is no such thing as F200x. There is F2003 that added OO features to Fortran and standardized C interoperability, but no compiler supports all of F2003 yet. There is F2008 that adds co-arrays (not threads), but who knows how long it will take to implement F2008 - maybe 10 more years, maybe more. To optimize all that will take even longer.

Already now you can overload arithmetic operators in C++ and use them to manipulate arrays. What more do you want?

Are you familiar with the current FORTRAN 95? It is a general-purpose language that is a great improvement over C. The thing is that there is no OO in FORTRAN 95 (you have to go to 2003 for that), if you consider that to be painful... And FORTRAN 200X will be a great improvement, since it will directly support multiple threads. We need to face the fact that C/C++ are becoming obsolete because they will not scale well to multiple-core processors.

Fortran (any standard) is not an improvement over C. Never was, never will be. It's a totally different language with its own history, design and applications.
Why would you do OO in Fortran if there are plenty of other OO languages?
See above, there is no such thing as F200x, but there is C++0x. You'd be surprised, but one can do threads in C! And in C++!!! There is more than one way to work with threads efficiently in C++.

"We need to face the fact that C/C++ are becoming obsolete because they will not scale well to multiple core processors." This is not a fact. This is just wrong. I have no idea where you got it from...

I consider myself one of the beta testers, and I have to thank both Benoit and Gael for two things:



(1) for the excellent user support, which is of commercial premium quality. They opened a user forum and both reply within two days (upper bound). Bugs are fixed within hours, and requested code snippets are delivered as well.



(2) for the excellent API.
I am accustomed to different frameworks (such as BLAS/LAPACK, Arpack, SuiteSparse, MUMPS, METIS, etc.) for which we required our own linalg framework that provided the interfaces for these libs. Recently we tried uBlas, but uBlas suffers from bad performance (it is not only about the solver interfaces, but also basic operations like vector+vector, element access, and matrix assembly). Furthermore, uBlas emphasises compatibility with consistent iterators and container algorithms (like the STL), which produces some overhead. Moreover, this hides the mathematics one originally wanted to implement.



With Eigen2 things became very different. The math is in the foreground, the iterators are fast and simple, and most operations are REALLY fast! For almost every action where I required external libs before, Eigen2 offers at least a basic implementation next to interfaces (and even bindings) to those libraries.



The interfacing with other libs is (at least for the dense part, which I can speak to) very well thought out. All data storage conforms to standards and can be accessed via C arrays in order to export the data to external libs. But this is not all: one can even interpret any C array as an Eigen2 object by applying an Eigen2::Map, which operates directly on the given data. By doing so, one can easily port a project to Eigen2, since one can use Eigen2 concurrently with the original linalg library.



Thank you!

Thanks for the positive comment and beta testing!