Jul 15, 2011

One of the highlights of SciPy 2011 was a demonstration of the upcoming release of IPython 0.11 (which is likely to become version 1.0, eventually). Fernando Pérez calls IPython a "tool for manipulating namespaces", but this dry, technical definition belies the productivity-boosting power of the project. I have been using IPython for a few years as a replacement for the standard Python shell because it affords users easier access to debugging tools, command history and the underlying file system. Version 0.11 adds a variety of enhancements that further improve its utility and usability. A full account of its many features can be found in the IPython documentation, but I want to highlight a couple of recent innovations that users might not be aware of.

Parallel Processing

It's probably impossible to buy a new computer in 2011 that does not have multiple cores, and yet most everyday scientific computing does not take advantage of parallel processing. Some have the impression that, because of the Global Interpreter Lock (GIL), Python is not a useful parallel processing platform. Though the GIL does prevent true concurrency for threads within a single process, it can be side-stepped entirely by running multiple processes, as the standard library's multiprocessing module does. Beyond that, there is a large suite of Python packages that provide multiprocessing in various forms, from simply taking advantage of multiple cores on a single computer to computing on clusters, grids and clouds. IPython not only provides Python users with multiprocessing capability, but also allows for monitoring, debugging and interacting with multiple processes.
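To see why the GIL is not a dead end for CPU-bound work, here is a minimal sketch using only the standard library (nothing IPython-specific): each worker in the pool is a separate interpreter process with its own GIL, so the computation can occupy all of the cores at once. The square function is just a stand-in for real work.

    from multiprocessing import Pool

    def square(x):
        # stand-in for a CPU-bound computation
        return x ** 2

    if __name__ == '__main__':
        # four worker processes, each a separate interpreter with its own GIL
        pool = Pool(processes=4)
        print(pool.map(square, range(10)))   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
        pool.close()
        pool.join()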

IPython facilitates multiprocessing through the use of "engines", which are instances of Python that are started as needed by IPython. These engines, in turn, are governed by a controller process that allows them to be addressed by the user, either directly or through a load-balanced interface that automatically assigns work to whichever engines are available.

Using parallel computing via IPython requires one additional step: instantiating the engines that will carry out the multiprocessing tasks. Before starting IPython, run ipcluster from the terminal, as follows:

ipcluster start --n=4

where the n argument specifies the number of engines to start. My Intel Core i5 machine has 4 cores available, so I chose to start an engine on each. IPython now has access to all of these engines via a Client object:

In [1]: from IPython.parallel import Client

In [2]: c = Client()

In [3]: c.ids
Out[3]: [0, 1, 2, 3]

We can now multiprocess, using all four engines. As a simple example, consider running a set of linear regressions in parallel. The following code simulates some data that we can analyze:

In [5]: %paste
    import numpy as np

    # generate fake data: a quadratic trend plus Gaussian noise
    nsample = 100
    x = np.linspace(0, 10, nsample)
    # design matrix with intercept, linear and quadratic columns
    X = np.c_[np.ones(len(x)), x, x**2]
    beta = np.array([1, 0.1, 10])
    y = np.dot(X, beta) + np.random.normal(size=nsample)
## -- End pasted text --

We now have a response variable that was generated as a quadratic function of a predictor variable, with some Gaussian noise added. Let's use the scikits.statsmodels package to estimate some alternative models, in parallel.

In [6]: import scikits.statsmodels.api as sm

In [7]: model_set = [sm.OLS(y, X[:,0]), sm.OLS(y, X[:,:-1]),
   ...: sm.OLS(y, X), sm.OLS(y, np.c_[X, x**3])]

This series of models is presented in order of increasing complexity: an intercept-only model, a simple linear model, a quadratic polynomial model (the generating model) and a cubic polynomial model. We can now fit the models in parallel and calculate the AIC (Akaike information criterion) for each; the AIC balances goodness of fit against model complexity, with lower values indicating a better model.

In [8]: view = c.load_balanced_view()

In [9]: rc = view.map(lambda m: m.fit().aic, model_set)

In [10]: rc[:]
Out[10]: 
[1427.6221389537438,
 1153.7084062595984,
 319.70035715389247,
 321.67688530260961]

Needless to say, it would be excessive to fit a handful of ordinary least squares regression models in parallel, but this shows how simple it is to run computing tasks in parallel using IPython. Reassuringly, the quadratic model, the one that actually generated the data, comes out with the lowest AIC. But this barely scratches the surface, so I strongly encourage you to read the parallel computing section of the IPython docs.
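For finer control than the load-balanced interface provides, the engines can also be addressed directly. A minimal sketch, continuing the session above and assuming the same four-engine cluster: the slice syntax on the Client gives a direct view of all engines, map_sync chunks a list across them, and the dictionary syntax pushes and pulls variables to and from each engine's namespace.

In [11]: dview = c[:]                # a DirectView addressing all four engines

In [12]: dview.map_sync(lambda n: n**2, range(8))
Out[12]: [0, 1, 4, 9, 16, 25, 36, 49]

In [13]: dview['a'] = 5              # push a variable to every engine

In [14]: dview['a']                  # ...and pull it back from each of them
Out[14]: [5, 5, 5, 5]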

Qt Console

Another slick enhancement in IPython 0.11 is the introduction of a Qt-based console, made possible by a ZeroMQ messaging layer that allows IPython to run as two processes: a kernel that executes code and a front-end that handles interaction. If you are willing and able to install Qt and one of the Python bindings for it (either PyQt or PySide), which can be a non-trivial task for Mac users, you can take advantage of the new Qt console. Launching it is a matter of providing a qtconsole argument:

ipython qtconsole

This spawns a window containing the IPython console that you would otherwise have expected in the terminal. An immediate advantage of the Qt console is the automatic availability of tooltips, which pop up whenever the opening parenthesis of a class or function is typed:

Not impressed yet? How about the ability to load remote Python scripts via IPython magic? Here is a PyMC model, loaded directly from a GitHub gist:

Any file ending in .py, local or remote, can be imported this way. Not only does this bring the script into the console, but the code can be edited prior to execution. This is really, really nice. How many of us find snippets of code from various places on the internet, which we then copy and modify for our own work? That task just got a little easier.
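The magic in question is %loadpy, which accepts either a local path or a URL and drops the script's source into the input area for editing before you run it. A sketch of the usage (the gist address below is a hypothetical placeholder, not a real file):

In [1]: # hypothetical gist URL, shown for illustration only
In [2]: %loadpy https://gist.github.com/raw/1234567/model.py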

Perhaps the nicest new feature is the ability to embed matplotlib figures directly into the console. The easiest way to enable this is to include a --pylab=inline argument when starting IPython. This places figures generated by matplotlib into the console, just below the call that produced them, rather than in a separate window:
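For a concrete sense of what the screenshot shows: with the console started in pylab mode, the numpy and matplotlib names are pre-imported, so a session as simple as the following renders the sine curve inline, directly beneath the second prompt (a minimal sketch, assuming matplotlib is installed):

In [1]: x = linspace(0, 2*pi, 100)   # pylab mode pre-imports numpy names

In [2]: plot(x, sin(x))              # figure appears inline below this line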

It reminds me a little of Mathematica. What is really nice is that the entire contents of the console, figures included, can then be exported to XML or HTML via a contextual menu option:

This saves a document, with figures and all, that can be displayed in a web browser:

One of the improvements afforded by the two-process model implemented in IPython 0.11 is that the kernel can interact with multiple consoles. When a console is launched, you will notice a message from the kernel that provides information for additional remote connections:

[503] 11:21:41 fonnescj: $ ipython qtconsole --pylab=inline
[IPKernelApp] To connect another client to this kernel, use:
[IPKernelApp] --existing --shell=54639 --iopub=54640 --stdin=54641 --hb=54642

By default, the kernel only listens for local connections, but adding external interfaces is as simple as providing the IP address to IPython on startup. For now, I can attach an additional console to the already-running kernel by pasting the flags provided above into the ipython call that starts the second console. For example, in the screenshot below, I defined a variable b=10 in the bottom console, then attached a new console to the kernel; you can see that b (along with all the other objects in the namespace) is available to the second console.
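Concretely, the second console is started by passing along exactly the flags the kernel printed at startup (the port numbers here are the ones from the message above; yours will differ):

ipython qtconsole --existing --shell=54639 --iopub=54640 --stdin=54641 --hb=54642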

Of course, there is much, much more to tell, but I will leave it at that for now, and encourage you to grab a build or clone the source code from the IPython GitHub repo, and use the online documentation to guide you through the trove of goodies that IPython has to offer. While the team received no shortage of accolades at SciPy 2011 for their amazing work, I'd like to add my thanks to the IPython developers for their ongoing commitment to improving the usability of Python!
