Using deep learning to listen for whales

January 10, 2014 | categories: Python, Biology, Programming, Bioacoustics, Machine Learning

Since recent breakthroughs in the fields of speech recognition and computer vision, neural networks have been getting a lot of attention again. Particularly impressive were Krizhevsky et al.'s seminal results at the ILSVRC 2012 workshop, which showed that neural nets are able to outperform conventional image recognition systems by a large margin; results that shook up the entire field. [1]

Krizhevsky's winning model is a convolutional neural network (convnet), a type of neural net that exploits spatial correlations in 2-d input. Convnets can have hundreds of thousands of neurons (activation units) and millions of connections between them, many more than could previously be learned effectively. This is possible because convnets share weights between connections, and thus vastly reduce the number of parameters that need to be learned; they essentially learn a number of layers of convolution matrices that they apply to their input in order to find high-level, discriminative features.
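
To make the weight sharing idea concrete, here's a minimal numpy sketch of a single 2-d convolution. This is my illustration, not Krizhevsky's code; real convnets add multiple channels, non-linearities, and pooling on top:

import numpy as np

def conv2d(image, kernel):
    # Slide one shared kernel over the input ("valid" mode). Strictly
    # speaking this is cross-correlation, but for learned kernels the
    # distinction doesn't matter.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)         # a tiny stand-in input
kernel = np.array([[1., 0., -1.],    # 3x3 vertical edge detector:
                   [1., 0., -1.],    # just 9 parameters, reused at
                   [1., 0., -1.]])   # all 36 output positions
features = conv2d(image, kernel)     # shape (6, 6)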

https://danielnouri.org/media/deep-learning-whales-krizhevsky-lsvrc-2012-predictions.jpg

Figure 1: Example predictions of ILSVRC 2012 winner; eight images with their true label and the net's top five predictions below. (source)

Many papers have since followed up on Krizhevsky's work and some were able to improve upon the original results. But while most attention went into the problem of using convnets to do image recognition, in this article I will describe how I was able to successfully apply convnets to a rather different domain, namely that of underwater bioacoustics, where sounds of different animal species are detected and classified.

My work on this topic began with last year's Kaggle Whale Detection Challenge, which asked competitors to classify two-second audio recordings, some of which had a certain call of a specific whale on them, and others didn't. The whale in question was the North Atlantic Right Whale (NARW), a species that's sadly nearly extinct, with fewer than 400 individuals estimated to still exist. Believing that this could be a very interesting and meaningful way to test my freshly acquired knowledge of convolutional neural networks, I entered the challenge early, and was able to reach a pretty remarkable Area Under Curve (AUC) score of roughly 97% only two days into the competition. [2]

https://danielnouri.org/media/deep-learning-whales-leaderboard.png

Figure 2: The Kaggle leaderboard after two days into the competition.

The trick to my early success was that I framed the problem of finding the whale sound patterns as an image recognition problem, by turning each of the two-second sound clips into a spectrogram. Spectrograms are essentially 2-d arrays with amplitude as a function of time and frequency. This allowed me to use standard convnet architectures quite similar to those Krizhevsky had used when working with the CIFAR-10 image dataset. One of the few architectural differences stemmed from the fact that CIFAR-10 uses RGB images as input, while my spectrograms have a single real-valued number per pixel, not unlike gray-scale images.

https://danielnouri.org/media/deep-learning-whales-spectrogram.jpg

Figure 3: Spectrogram containing a right whale up-call.
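
For the curious, here's roughly how such a spectrogram can be computed with SciPy. The file name and windowing parameters below are made up for illustration, not the settings I actually used:

import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read('clip.wav')   # a two-second mono clip
freqs, times, spec = spectrogram(
    samples.astype(float), fs=rate,
    nperseg=256, noverlap=192)             # window size/overlap: assumptions
spec = np.log1p(spec)   # log scaling brings out faint, far-away calls
# 'spec' is now a 2-d array of amplitude over frequency and time, and
# can be fed to a convnet much like a gray-scale image.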

Spurred by my success, I registered for the International Workshop on Detection, Classification, Localization, and Density Estimation (DCLDE) of Marine Mammals using Passive Acoustics, in St Andrews. A world-wide community of scientists meets every two years at this workshop to discuss the latest developments in using passive acoustics (listening for sounds) to detect and track marine mammals.

That there was such a breadth of research around this topic was entirely new to me, and it was fascinating to learn about. Another thing I've since learned is that this research effort is sadly dwarfed by the massive amounts of underwater noise that humans produce today, through shipping, oil exploration, and military sonar. And this noise severely affects the lives of animals for which "listening is as important as seeing is for humans – they communicate, locate food, and navigate using sound."

The talk that I gave at the DCLDE 2013 workshop was well received. In it, I explained how my method relied on little to no problem-specific human engineering, and could therefore be easily adapted to detect and classify all sorts of marine mammal sounds, not just right whale up-calls.

At DCLDE, the execution speed of detection algorithms was frequently quoted as being x times faster than real-time, with x often being a fairly low number around 1 to 10. My GPU-powered implementation turned out to be on the faster side here: on my workstation, it detects and classifies sounds 700x faster than real-time, which means it runs detections on one year of audio recordings in roughly twelve hours, using only a single NVIDIA GTX 580 graphics card.

In terms of accuracy, it was somewhat hard to get an idea of which of the algorithms presented really worked better than others. There were two reasons for this: inconsistent use of reliable metrics such as AUC and of cross-validation, and the lack of standard datasets that everyone could test their algorithms against. [3]
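
For reference, AUC measures how well a detector ranks positives above negatives, independently of any particular detection threshold. A toy example using scikit-learn, with made-up scores rather than competition data:

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]                # 1 = whale call present
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]  # the detector's scores
# Prints ~0.89: the fraction of (call, noise) pairs ranked correctly.
# 1.0 would be a perfect ranking, 0.5 no better than chance.
print(roc_auc_score(y_true, y_score))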

However, it should be mentioned that good datasets are a bit tricky to come by in this field. The nature of hydrophone recordings is that the signal you're listening for could be generated a few meters away, or many kilometers, and therefore be very faint. Plus, recordings often contain a lot of ambient noise coming from cargo ships, offshore drilling, hydrophone cable flutter, and the like. The effect is that it's often hard even for a human expert to tell whether a particular sound is a vocalization of the mammal they're looking for, or just noise. Thus, analysts will often label segments as unsure, and two analysts will sometimes even give conflicting labels to the same sound.

Four NARW up-calls that are easy to detect.
(Here's a much messier example. And some more fascinating recordings of marine mammals.)

This leads to a situation where people tend to ignore noisy sounds altogether, since if you consider them, predictions become difficult to verify manually, and good training examples harder to collect. But more importantly, when you ignore sounds with a bad signal-to-noise ratio (SNR), your algorithms will have an easier time learning the right patterns, too, and they will make fewer mistakes. As it turns out, noise is often more of a problem for algorithms than it is for human specialists.

The approach of ignoring sounds with a bad SNR seems fine until you're in a situation where you've put a lot of effort into collecting recordings that then turn out to be unusually noisy, and adjusting your model's detection threshold yields either way too many false positive detections or too many missed calls.

One of the very nice people I met at DCLDE was Holger Klinck from Oregon State University. He wanted to try out my convnet on one of his lab's "very messy" recordings. Some material that his group at OSU had collected at five sites near Iceland and Greenland in 2007 and 2008 had unusually high levels of noise in it, and their detection algorithms had performed less than optimally there.

https://danielnouri.org/media/deep-learning-whales-osu-iceland-detections.png

Figure 4: "Locations of passive acoustic moorings near Iceland and southern Greenland (black spots), and the number of right whale upcalls detected per day in late 2007 at the five sites." Taken from [4]. Note the very low number of calls detected at the CE and SE sites.

I was rather amazed when, a few weeks later, I had a hard disk from OSU in my hands containing in total many years of hydrophone recordings from two sites near Iceland and four locations on the Scotian Shelf. I dusted off the model that I had used for the Kaggle Whale Detection Challenge and quite confidently started running detections on the recordings. That's when I was in for a surprise: the predictions my shiny 97% model made were all really lousy! A great many obvious non-whale noises were wrongly detected as calls. How was that possible?

To solve this puzzle, I had to understand that the Kaggle Whale Detection Challenge's train and test datasets had a strong selection bias in them. The tens of thousands of examples that I had used to train my model for the challenge were unrepresentative of the full variety of whale sounds and, in particular, of the many similar-sounding non-whale sounds out there. That's because the Kaggle challenge's examples were collected by use of a two-stage pipeline, where an automated detector would first pick out likely candidates from the recording, and only then would a human analyst label them as true or false. I realized that what we were building in the Kaggle challenge was a classifier that worked well only if it had a certain detector running in front of it to take care of the initial pass of detection. My neural net had thus never seen during training anything like the sounds that it now mistook for whale calls.

If I wanted my convnet to be usable by itself, on continuous audio recordings, and independently of this other detector, I would have to train it with a more balanced training set. And so I ditched most of the training examples I had, started out with only a few hundred, and trained a new model with them. As expected, training with only a few examples left me with a pretty weak model that would overfit and make lots of obvious mistakes. But this allowed me to pick out the worst mistakes, label them correctly, and feed them back into the system as training examples. And then repeat. (A process that Olivier Grisel later told me amounts to active learning.)
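
Here's a toy sketch of that loop, with a linear scikit-learn model standing in for the convnet and synthetic features standing in for spectrograms. In reality the "labeling" step was me listening to the sounds, not a lookup into known labels:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_pool = rng.randn(2000, 20) + np.repeat([[0.], [1.]], 1000, axis=0)
y_pool = np.repeat([0, 1], 1000)          # stands in for human judgment

labeled = set(rng.choice(len(X_pool), 20, replace=False))
model = LogisticRegression()

for round_ in range(5):
    idx = sorted(labeled)
    model.fit(X_pool[idx], y_pool[idx])   # train on what we have so far
    proba = model.predict_proba(X_pool)[:, 1]
    # Find the most confidently wrong predictions, "label" them
    # (a human would listen to these), and feed them back in.
    mistakes = np.abs(proba - y_pool)
    worst = np.argsort(mistakes)[::-1][:20]
    labeled.update(worst.tolist())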

Many (quite enjoyable) hours of listening to underwater sounds later, I had collected some 2000 training examples this way, some of which were already pretty tricky to verify. And luckily, the newly trained model started to make pretty good predictions. When I sent my results back to Holger, he said that, yes, the patterns I'd found were very similar to those that his group had found for the Scotian Shelf sites!

https://danielnouri.org/media/deep-learning-whales-my-scotian-shelf-detections.png

Figure 5: Number of right whale up-call detections per hour at two sites on the Scotian Shelf, detected by the convnet. The numbers and seasonal pattern match with what Mellinger et al. reported in [5].

The OSU team had used a three-stage detection process to produce their numbers: humans verified, in phases two (broadly) and three (in more detail), the detections that the algorithm came up with in phase one. My detection results, in contrast, came straight out of the algorithm.

A case-by-case comparison still needs to happen, but the similarity of the overall call patterns suggests that the convnet reaches comparable performance, without the need for human analysts to be part of the detection pipeline, making it potentially much more time-efficient to use in practice.

What's even more exciting is that the neural net was able to find right whale up-calls at the problematic SE site near Iceland, where previously no up-calls could be detected due to high noise levels.

https://danielnouri.org/media/deep-learning-whales-my-iceland-detections.png

Figure 6: NARW up-call detections per day at sites SW and SE near Iceland, detected by the convnet. The patterns at the SW site match roughly with what was reported in [4], while no calls could be identified previously at the SE site (cf. Figure 4).

Another thing we're currently looking into is whether the relatively small but constant number of calls that the convnet detected during the winter season corresponds to real calls, or to false positives. Right whales are not known to stay that far north at that time of the year, so proving that these calls are real would be significant news for people studying the migration routes of these whales.

(Comments also on Hacker News.)

[1] For a more detailed history and recent developments around neural nets, see this article in Nature: "Computer science: The learning machines".
[2] See the mention of my results in this Wired article: "Wanted: Right Whale Caller ID".
[3] For a comparison of machine learning algorithms in use, see: Mellinger DK, et al. 2007. An overview of fixed passive acoustic observation methods for cetaceans. Oceanography 20:36–45.
[4] Mellinger DK, et al. 2011. Confirmation of right whales near a historic whaling ground east of Southern Greenland. Biol. Lett. 7:411–413.
[5] Mellinger DK, et al. 2007. Seasonal occurrence of North Atlantic right whale (Eubalaena glacialis) vocalizations at two sites on the Scotian Shelf. Mar. Mamm. Sci. 23:856–867.

Kotti Zidanca sprint report

July 31, 2013 | categories: Python, Web, Pyramid, Programming, Kotti

Big up to Termitnjak for organizing the amazing Zidanca sprint held last week. The venue for our get-together in south-eastern Slovenia was one of the most beautiful places I've ever sprinted at. Despite that, and the gallons of fine wine that were poured, we still managed to get quite a few things done. Here is a summary.

http://www.coactivate.org/projects/zidanca-sprint-2013/project-home/lokve2.JPG

kotti_tinymce: new version

Vanč <ferewuz> worked on upgrading Kotti's WYSIWYG editor TinyMCE to version 4.0.2. This new version features a much nicer looking user interface, which fits a lot better with the overall Kotti style. It also works better on smaller displays, as the editor will now scale down along with the browser window's width. Our existing image-upload and document-linking pop-ups were updated to work with the new version. You can take it for a test drive on our demo site.

deform_bootstrap: tabbed forms

Natan <nightmarebadger> worked on adding tabbed forms support to deform_bootstrap. With this change, each nested (Mapping)Schema in your Colander schema will be rendered inside its own tab. As an example, consider this Client schema:

import colander

class Person(colander.Schema):
    name = colander.SchemaNode(
        colander.String(),
        title='Name',
        )

class Car(colander.Schema):
    horsepower = colander.SchemaNode(
        colander.Integer(),
        title='Horsepower',
        )

class Client(colander.Schema):
    person = Person(title='Person data')
    car = Car(title='Car stuffs')

A deform_bootstrap form using the Client schema will render with two tabs 'Person data' and 'Car stuffs'.
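
To round the example off, here's a hedged sketch of how you might render a form for that schema; it assumes deform_bootstrap's templates and widgets are already set up for your application:

import deform

schema = Client(title='Client')
form = deform.Form(schema, buttons=('save',))
html = form.render()  # the nested schemas come out as the two tabs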

kotti_multilingual: translated content

kotti_multilingual was started by Andreas <disko> earlier this year. It includes basic support for language roots and switching between languages. The sprinters decided to use this package as the starting point for adding more advanced content translation support.

Before getting down to coding, we first had a round of discussion with Domen <iElectric>, Ramon <bloodbare>, Jure <ibi> and me. In particular, we discussed how plone.multilingual handles translations and 'language independent fields' and compared it to the venerable LinguaPlone. We decided that we'd use the LinguaPlone model, where translations derive from a so-called canonical document. This is the only document that holds the actual data for any of the language independent fields, and where all translated documents refer to when they look up those fields.

This is still a work in progress, but so far we've managed to h̶a̶c̶k̶ convince SQLAlchemy to return the value of a language independent attribute from the canonical document, if it exists.
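
To give an idea of the mechanism, here's a much-simplified, hypothetical sketch in plain Python (the real implementation works through SQLAlchemy, and these class and attribute names are made up): a descriptor that reads language independent attributes from the canonical document whenever one exists.

class LanguageIndependent(object):
    """Descriptor that prefers the canonical document's value."""

    def __init__(self, name):
        self.name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if obj.canonical is not None:       # a translation: delegate
            return getattr(obj.canonical, self.name)
        return getattr(obj, self.name)      # the canonical document

    def __set__(self, obj, value):
        setattr(obj, self.name, value)

class Document(object):
    body = LanguageIndependent('body')

    def __init__(self, body=None, canonical=None):
        self.canonical = canonical
        self.body = body

canonical = Document(body='Shared data')
translation = Document(canonical=canonical)
print(translation.body)  # -> 'Shared data', read from the canonical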

We've also implemented linking between translations using a separate translations table. This, too, works quite similarly to how LinguaPlone does it when you configure it to use one root folder per language (e.g. /en, /de, and so on). For locating the destination folder for new translations, we look up the parent document's translation, and put the new translation in there. Thus, a translation of /en/food/mexican will automagically be created at /de/food/mexican, given that /de/food is already a translation of /en/food.
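
And a similarly hypothetical sketch of that folder lookup, using plain path strings where the real code works with Kotti content objects:

def destination_folder(source_path, target_language, translations):
    # 'translations' maps (path, language) -> translated path
    parent = source_path.rsplit('/', 1)[0]
    return translations[(parent, target_language)]

translations = {('/en/food', 'de'): '/de/food'}
print(destination_folder('/en/food/mexican', 'de', translations))
# -> '/de/food': the new translation is then created there as
#    '/de/food/mexican'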

Versioning support

Andreas pointed us to the SQLAlchemy Versioned Objects example. He had used it in a Pyramid project and it had worked very well for him. So Vanč took this up on the last day of the sprint and implemented a prototype of history support for Kotti.

Vanč reports that his prototype saves the old version of a document anytime you edit it, and keeps all the old versions. Next on his list is adding a user interface for viewing older versions (there's merely a list right now), and adding options for deciding which content types to version and how many old versions to keep. Other things to work on are a separate button for when you want to save a new version, so that small changes can be made without creating a new version. And possibly a comment field to describe what the changes in your new version are about.

Finishing the tutorial

Natan took on the important task of finishing the developer tutorial. Previously, the tutorial introduced concepts such as setting up a new project, configuring Kotti, adding Poll and Choice content types, and simple views. But it did not have forms to actually vote on the poll. This is now being added.

During the sprint, we also discussed which important topics the docs don't yet cover well enough. We decided we should add more documentation on users and security, workflows, and a more advanced forms example.


libblas and liblapack issues and speed, with SciPy and Ubuntu

December 19, 2012 | categories: Python, Programming

The problem

You've built from source a package that uses the linear algebra libraries BLAS and LAPACK, but now you're getting:

libatlas.so.3: cannot open shared object file: No such file or directory

In my case, I was trying to install SciPy from source using pip install scipy, but then ran into this issue when I tried to use it. I also ran into this other problem:

ImportError: scipy/linalg/clapack.so: undefined symbol: clapack_sgesv

To fix it

  1. Make sure you've got libatlas3-base installed:

    sudo apt-get install libatlas3-base
    
  2. Make sure you're using the right implementation of libblas.so.3, that is, the one that libatlas3-base provides. This implementation lives in /usr/lib/atlas-base/atlas/libblas.so.3.

    To do this, run the following command and select the Atlas implementation as your system default:

    sudo update-alternatives --config libblas.so.3
    

    An example of what this might produce, and what number you might need to enter:

    Selection      Path                                     Priority   Status
    ------------------------------------------------------------
    * 0            /usr/lib/openblas-base/libopenblas.so.0   40        auto mode
      1            /usr/lib/atlas-base/atlas/libblas.so.3    35        manual mode
      2            /usr/lib/libblas/libblas.so.3             10        manual mode
      3            /usr/lib/openblas-base/libopenblas.so.0   40        manual mode
    
     Press enter to keep the current choice[*], or type selection number: 1
    
  3. Use liblapack.so.3 from /usr/lib/atlas-base/atlas/liblapack.so.3 as your default:

    sudo update-alternatives --config liblapack.so.3
    

That's it. Now you shouldn't be getting the aforementioned errors anymore.
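
A quick way to verify the fix is to exercise exactly the code path that failed before; the import and the LAPACK-backed solve below should now both succeed:

import numpy as np
from scipy import linalg   # used to fail with "undefined symbol"

a = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])
print(linalg.solve(a, b))  # calls into LAPACK's gesv; prints [ 2.  3.]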

More information

The Debian Wiki has an overview of linear algebra libraries available, and it also describes how to use update-alternatives to switch between BLAS and LAPACK implementations.

Speed it up! Build an optimized Atlas for your architecture.

The README.Debian file of libatlas3-base explains:

Building your own optimized packages of Atlas is straightforward. Just get the sources of the package and its build-dependencies:

# apt-get source atlas
# apt-get build-dep atlas
# apt-get install devscripts dpkg-dev dch

and type the following from the atlas source subdir:

# fakeroot debian/rules custom

it should produce a package called:

../libatlas3-base_*.deb

which is optimized for the architecture Atlas has been built on. Then install the package using dpkg -i.

If you, like me, get this "classical" error when building Atlas:

VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS

then you'll need to read this: Your install dies with "unable to get timings in tolerance".

Lastly, you might need to remove any existing libopenblas installation from your system for the building of the Debian package to finish successfully:

sudo aptitude purge libopenblas-{base,dev}

Use apt-get to install Python dependencies for Travis CI

November 23, 2012 | categories: Python, Programming

Travis CI is used by more and more open source Python projects for their continuous testing. scikit-learn is the latest project to adopt it.

With Travis, usually you'll use pip or setuptools to get your project's Python dependencies installed before running the test script. Here's the minimal .travis.yml example from the Travis docs for Python, one that will install dependencies using pip before it runs nosetests:

language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"
# command to install dependencies
install: "pip install -r requirements.txt --use-mirrors"
# command to run tests
script: nosetests

There are, however, some dependencies for which this is problematic. Two of those are numpy and scipy, which contain a lot of C code that, with the method just discussed, needs to be compiled every time you run the Travis tests.

Travis allows you to install system packages through apt-get, which is quite cool. And there are already binary python-numpy and python-scipy packages in Ubuntu, so why not use them?

The problem is that simply installing them via apt-get does not work for the same reason it doesn't work when you do this locally: The default virtualenv that Travis sets up for you to run the tests in is isolated from the system packages, so it won't see those globally installed numpy and scipy packages.

The solution is to use virtualenv with the --system-site-packages option, which allows you to also import packages from the global site-packages directory.

How it works

Add these lines to your Travis configuration to use a virtualenv with --system-site-packages:

virtualenv:
  system_site_packages: true

You can thus install Python packages via apt-get in the before_install section, and use them in your virtualenv:

before_install:
 - sudo apt-get install -qq python-numpy python-scipy
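
If you want to convince yourself inside the build that the system packages are actually being picked up, a tiny check along these lines (my own illustration, not part of any project's setup) does the trick:

import numpy
import scipy

print(numpy.__version__)
print(scipy.__version__)
# With system_site_packages enabled, numpy should be imported from the
# system location, e.g. /usr/lib/python2.7/dist-packages:
print(numpy.__file__)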

A real-world use of this approach can be found in nolearn.


python-mode gone wrong

November 16, 2012 | categories: Python, Emacs, Programming

tl;dr

If you recently upgraded Emacs' python-mode to version 6.0 or higher, and it feels wrong (indentation seems broken, the menu is confusing), downgrade to version 5. In Debian or Ubuntu you can do it like so:

sudo aptitude install python-central pymacs
wget http://launchpadlibrarian.net/22053597/python-mode_5.1.0-1_all.deb
sudo dpkg -i python-mode_5.1.0-1_all.deb

Broken hearts

Something terrible happened. My beloved python-mode for Emacs has been infested with featuritis, and it doesn't look like it's coming back.

Yesterday I upgraded my work laptop to Xubuntu 12.10. That was after it turned out that I'm not manly enough to install Debian Wheezy proper. (What the f--k is wrong with your WLAN support, Debian!!)

With Xubuntu and a few tweaks you can luckily ignore all the less fortunate decisions that Ubuntu has made recently. No Unity means no Dash means no broken privacy out of the box.

Synaptic and GDebi are quickly installed to replace the ad-ridden and sluggish Ubuntu Software Center. And packages rock again and are fast. Also, I've come to really like Xfce; but I digress...

So far so good. Then I installed Emacs and python-mode and started editing my first Python file. Which was when I fell into deep shock. A few things felt just plain wrong. In fact, so wrong that I was sure I had the wrong mode active (namely the other python.el, which ships with Emacs). For hours and hours I searched for a way to disable Emacs' own Python mode and enable the "good one", and tried some hacks, but it wouldn't work. Because really, I had been using the right mode all along; python-mode had just changed so much that I couldn't recognize it anymore.

Firstly, the menu entries are completely different. Instead of the two entries "IM-Python" and "Python" (which I hardly ever used), there are now five different menus called "PyShell", "PyEdit", "PyExec", "PyTools" and "Outline", each with a shitload of entries inside: mostly either trivial things, or stuff that I have no idea what it's good for.

Take an example: the newer python-mode added skeletons, so that instead of writing a for statement yourself, you can call up a macro that will ask you what to put into the name and expression parts of the statement, and then write it for you. Yes, friends, imagine that instead of writing for foo in baz:, you can now call up this insanely useful macro, and it will ask you to enter foo, and then you hit enter, and then it asks for baz, enter, and then it will write the line for foo in baz:.

What the f--k?

Surely, you could say I can ignore all of this seemingly useless functionality. After all, it might be useful to someone (it's NOT!!). But it still leaves a very bad taste in my mouth. I feel alarmed that my beloved python-mode seems to have taken a step in a very bad direction. And I wonder: with all these added features, is it going to be as well-maintained? Is it going to make similar decisions in the future, ones that will affect me more? (This is software that I rely on for my daily work.)

And it looks like my fears are already confirmed. It turns out python-mode gets indentation wrong, arguably the most fundamental feature of any Python editing mode, or has at any rate changed the way it works. As an example, if you hit enter where the X is below, it will indent four characters:

class Bar:
    def foo(self):
        pass

Xclass Baz:

So that you'll end up with:

class Bar:
    def foo(self):
        pass

    class Baz:

(Where what you really wanted was to put a blank line between them, or maybe add a new method to Bar.)

So what I did after I realized what was going on was to downgrade to python-mode 5.1, and finally my blood pressure went back to normal.

I'd be interested to hear other people's experiences with python-mode 6, I wonder if I'm alone.

Screenshots

Here are the two versions with their respective menus.

python-mode version 5

Simple is better than complex


python-mode version 6

F--k simple, let's do a million features because we can

