We will do this workshop in Jupyter notebooks
In this workshop, we will learn how to use ImgLib2, BigDataViewer, the N5-API, and Apache Spark for lazy evaluation data processing workflows. We will work through all examples in interactive Jupyter notebooks. For this, we will need to create an environment that provides a Jupyter notebook server, a fast Java kernel, and a few other dependencies. If you don’t have conda installed yet, please install it now by following the conda installation instructions.
Now, we can create an environment:
conda create -c conda-forge -n i2k2024-lazy python=3
and activate it:
conda activate i2k2024-lazy
Now, let’s install the Blosc compression library, the IJava Jupyter kernel, and a modern version of OpenJDK:
conda install conda-forge::python-blosc
conda install conda-forge::ijava
conda install conda-forge::openjdk
Now, check out the repository with the code examples for our workshop:
git clone https://github.com/saalfeldlab/i2k2024-lazy-workshop
This repository includes the notebooks for our workshop and renders them into a blog with the Quarto publishing system. You do not need to run Quarto for this workshop, but you may find it exciting to use the same structure for your own experiments. You will find the notebooks as posts in the repository:
cd i2k2024-lazy-workshop/posts
Here, start your Jupyter notebook server and open the first example:
jupyter notebook
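Once the server is running, open the first notebook in your browser. If you would like a quick sanity check that the IJava kernel can resolve Maven dependencies (optional; the notebooks themselves are the reference), a cell along these lines should work. The %maven line is IJava’s magic for fetching Maven artifacts, and the ImgLib2 version is only an illustration, so check the workshop notebooks for the coordinates they actually use:

%maven net.imglib2:imglib2:6.2.0

import net.imglib2.img.array.ArrayImgs;

// create a small in-memory 8-bit image and print its dimensions
var img = ArrayImgs.unsignedBytes(64, 64);
System.out.println(img.numDimensions() + "D image: " + img.dimension(0) + " x " + img.dimension(1));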
PS: The IJava kernel uses Java’s JShell tool, so in a production environment you can use JShell and Maven to execute your code. To do so, declare the dependencies in a pom.xml file and start JShell with:
mvn com.github.johnpoth:jshell-maven-plugin:1.3:run
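For reference, a minimal pom.xml for such a setup could look roughly like the sketch below. The project coordinates are placeholders and the dependency versions are illustrative only; the pom.xml in the workshop repository is the authoritative reference:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>lazy-workshop-sandbox</artifactId>
  <version>0.1.0-SNAPSHOT</version>

  <dependencies>
    <!-- versions are examples only; check the workshop repository for the pinned versions -->
    <dependency>
      <groupId>net.imglib2</groupId>
      <artifactId>imglib2</artifactId>
      <version>6.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.janelia.saalfeldlab</groupId>
      <artifactId>n5</artifactId>
      <version>3.2.0</version>
    </dependency>
  </dependencies>
</project>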