Jan 9, 2012

Due primarily to R's relative popularity for this kind of work, there are far more books available for R than for Python, at least on using the respective languages for data analysis and other numerical applications. There are fewer still on the various third-party packages available for Python. So it was with more than a little excitement that I tracked down a copy of the NumPy 1.5 Beginner's Guide (Packt Publishing), written by Ivan Idris [1].

One of the unavoidable issues with writing books about software is that for even moderately well-maintained packages, the release version has changed by the time the book is published; this is the case for Idris' book, as NumPy has reached version 1.6 (and many users work from the current codebase on GitHub). Don't let this deter you, however, as the major functionality changes little from version to version, particularly for point releases.

The book begins with thorough, visual installation instructions for the three major platforms (sorry, OS/2 users!). Though the NumPy website includes decent install instructions, this is a welcome chapter, particularly for new users, because it is well organized and easy to follow. The instructions for building from source may not be sufficient for every setup, however: on OS X in particular, some extra configuration is sometimes needed, depending on which version of Xcode (and hence which compilers) is installed.

In addition to installation guidance, the author points to several sources of third-party help with NumPy, which is useful because one's questions inevitably outstrip any book's ability to answer them all. It's a good choice of recommended resources, too: mailing lists, IRC and, critically, Stack Overflow.

The catch-phrase of Packt's Beginner's Guide series is "Learn by doing: less theory, more results." True to its word, this guide is extremely hands-on, consisting mostly of step-by-step "Time for Action" recipes for performing particular atomic tasks with NumPy. For the most part, this philosophy works to the reader's benefit by providing a wealth of useful guides to NumPy's core functionality. In places, however, the book suffers from a lack of context. For example, what is a Bessel function, and why would I need one? How do I interpret the plot I just generated? In my view, some background could be provided without fear of the book being mistaken for a textbook.

Though the book is technically a beginner's guide, it states up front that knowledge of Python is assumed. This is evident early on: the reader is taken from the kiddie pool of the installation guide directly to the deep end of vector operations. The early chapters illustrate how closely NumPy works with other scientific programming packages, most notably Matplotlib and IPython. This includes a very condensed IPython tutorial, which is a boon, because everyone doing scientific computing with Python should be using IPython as their default shell.

The early chapters do a solid job of covering NumPy fundamentals, and by fundamentals we are, of course, talking about ndarrays. The introduction to array data types jumps right into creating custom types, a truly useful topic for those of us applying NumPy to real data. Overall, the coverage of array methods is deeper than in other resources I have encountered.
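For readers who haven't met custom data types, a minimal sketch of the idea follows. The record layout (`name`, `count`, `price`) and its values are made up for illustration and are not taken from the book; passing a list of `(field, type)` tuples to `np.dtype` is one standard way to define such a structured type.

```python
import numpy as np

# Hypothetical record layout: a string of up to 40 characters, a 32-bit
# integer and a 64-bit float per element.
item = np.dtype([("name", "U40"), ("count", np.int32), ("price", np.float64)])

# Each tuple becomes one record of the structured array.
inventory = np.array([("widget", 42, 3.14), ("gadget", 13, 2.72)], dtype=item)

# Individual fields are accessed by name and behave like ordinary arrays.
print(inventory["name"])          # ['widget' 'gadget']
print(inventory["price"].sum())   # total of the price column
print(inventory[0]["count"])      # 42
```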

One of the nice things about the book is that you are almost guaranteed to learn something new, nearly irrespective of your level of expertise. For as long as I've been using NumPy, I have tended to use a particular subset of its functionality, related to the sort of work I do; this book is more comprehensive than that. Who knew you could limit the values of an array to a given range with ndarray.clip()? Not me. Some of the included workflows should be useful to a lot of users, such as inserting arbitrary values into a sorted array so that it remains sorted.
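Both tricks fit in a few lines. In this sketch the arrays and bounds are arbitrary; `clip()` bounds values to a closed interval, and the sorted-insertion workflow is shown here with `searchsorted()` plus `np.insert()`, which may or may not be the exact recipe the book uses.

```python
import numpy as np

a = np.arange(10)

# clip() bounds every element to the closed interval [2, 7]:
# anything below 2 becomes 2, anything above 7 becomes 7.
clipped = a.clip(2, 7)
print(clipped.tolist())  # [2, 2, 2, 3, 4, 5, 6, 7, 7, 7]

# Sorted insertion: searchsorted() finds the indices where the new values
# would have to go to keep the order, and np.insert() places them there.
sorted_arr = np.array([0, 1, 2, 3, 4])
new_values = [-2, 7]
positions = sorted_arr.searchsorted(new_values)
merged = np.insert(sorted_arr, positions, new_values)
print(positions.tolist())  # [0, 5]
print(merged.tolist())     # [-2, 0, 1, 2, 3, 4, 7]
```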

A big take-home lesson from the book, and one likely to impress the novice, is the manifold advantage of working with whole arrays rather than looping over lists, whether by using ufuncs or by vectorizing scalar functions. It's not only a performance boon; from a development standpoint, it's also much easier to read and maintain.
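A small comparison of the loop, ufunc and `np.vectorize` styles follows; the functions are toy examples, not taken from the book. One caveat worth knowing: NumPy's own documentation describes `np.vectorize` as a convenience rather than a performance tool, since it still calls the Python function once per element, so the speed advantage comes mainly from true ufuncs.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Loop style: iterate over a Python list one element at a time.
loop_result = [v * v + 1.0 for v in x.tolist()]

# Array style: * and + on ndarrays call the ufuncs np.multiply and np.add,
# which do the looping in compiled code.
array_result = x * x + 1.0

# np.vectorize lets a scalar Python function accept arrays. It mainly helps
# readability: the function is still called once per element.
def step(v):
    return 1.0 if v >= 0.5 else 0.0

vstep = np.vectorize(step)
steps = vstep(x)

print(np.allclose(loop_result, array_result))  # True
print(steps.tolist())  # [0.0, 0.0, 1.0, 1.0, 1.0]
```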

My only pedantic criticism of the book is the somewhat awkward English in places: "Get into Terms with Commonly Used Functions" doesn't sound quite right, and "Have a go hero" is an odd little phrase, used to encourage hands-on programming, that seems to have come straight from "Go, Diego, Go!". These, along with a few mistakes in function usage, suggest that another round of editing was in order.

Idris makes heavy use of financial applications for his examples. This is fine, I suppose, but it may be heavy going for readers from other fields, given finance's tendency to use specialized terminology. A broader range of motivating examples would have both shown the breadth of NumPy's potential applications and helped the book appeal to a wider audience.

Though full coverage of a package as jammed with functionality as NumPy is impossible, this guide covers a reasonable cross-section: users are introduced to calculating and drawing waveforms, fitting polynomials, matrix functions and manipulation, statistical distributions, sorting algorithms and (you guessed it) financial functions.
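As an example of one of these topics, polynomial fitting can be done with NumPy's long-standing `np.polyfit`. The sketch below fits a cubic to noise-free synthetic samples (a made-up curve, not an example from the book), so the recovered coefficients should match the ones used to generate the data.

```python
import numpy as np

# Made-up, noise-free samples of the cubic y = 2x^3 - x + 0.5.
x = np.linspace(-2.0, 2.0, 21)
y = 2.0 * x**3 - x + 0.5

# Least-squares fit of a degree-3 polynomial; coefficients are returned
# highest power first, so we expect roughly [2, 0, -1, 0.5].
coeffs = np.polyfit(x, y, 3)

# Evaluate the fitted polynomial at the sample points and differentiate it.
fitted = np.polyval(coeffs, x)
deriv = np.polyder(coeffs)  # roughly [6, 0, -1], i.e. 6x^2 - 1

print(coeffs)
print(deriv)
```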

It was very nice (particularly given the aforementioned constraints) to see a chapter on testing, though it is only an overview of the key testing functions. This is a good example of where a little background would have made a big difference: new users rarely appreciate the importance of testing, particularly unit testing, so some motivation is needed to make the concept stick and to ensure that new NumPy users put it into practice.
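To give a flavor of what the testing functions look like in use, here is a minimal sketch built on `numpy.testing`. The `normalize()` function is hypothetical, invented just to have something to test; a runner such as pytest would collect `test_normalize` automatically, but calling it directly works too.

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_equal


def normalize(v):
    # Hypothetical function under test: scale a vector to unit length.
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def test_normalize():
    result = normalize([3.0, 4.0])
    # Floating-point results are compared with a tolerance, not with ==.
    assert_allclose(result, [0.6, 0.8], rtol=1e-12)
    # Exact results can be compared element by element.
    assert_array_equal(np.sign(result), [1.0, 1.0])


test_normalize()
print("all tests passed")
```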

Despite my criticisms, I would recommend the NumPy 1.5 Beginner's Guide without hesitation to readers unfamiliar with NumPy (provided they already have some Python chops). NumPy is the most important Python package for mathematics, statistics and other numerical computation, and there is currently no better introduction for novices in print. Now that I have a copy of my own, it will be nice to keep it close at hand as a reminder of how much power is contained in the NumPy code base.

  1. Disclaimer: I was given a complimentary electronic copy of the book by the publisher in exchange for writing a review.
