Classifiers

BirdNET and HawkEars are computer algorithms, commonly referred to as classifiers or recognizers, that have been trained to interpret spectrograms of audio recordings and identify the bird species (and a few other acoustic taxa) vocalizing in those recordings.

BirdNET is a worldwide bioacoustic classifier trained to classify thousands of species, while HawkEars is a regional classifier trained to classify hundreds of species, mostly from Canada and the northern US. Their algorithms also differ, which makes each more or less accurate for different species and recording conditions. Think of them like two different hammers: both are meant to strike nails, but the size and shape of each hammer changes the result when striking different nails. HawkEars will often be the preferred choice if you are working with acoustic communities in the region it was trained for (Canada and the northern US). A comparison of the two classifiers within the HawkEars region is available here (https://www.sciencedirect.com/science/article/pii/S1574954125001311).

BirdNET and HawkEars scan every 3-second window of a spectrogram and give a confidence score for every species in their model for each window. These scores are not probabilities, even though they are reported on a 0–1 scale for readability. A score of 0.5 for an American Robin should therefore not be read as a 50% chance of being correct, but it is generally more likely to be correct than a score of 0.2 for an American Robin. Each model is different, and some species’ calls are more complex than others, so scores for different species, or for the same species from different models, should not be treated as equivalent. More information on interpreting score thresholds can be found in Wood and Kahl 2024 (https://link.springer.com/article/10.1007/s10336-024-02144-5).
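
To make the window arithmetic concrete, here is a minimal Python sketch. It assumes consecutive, non-overlapping 3-second windows and keeps a final partial window; the classifiers themselves may use overlapping windows or treat the last chunk differently. The function names and the 1,000-species model size are hypothetical, chosen only to show how quickly per-window, per-species scores add up.

```python
import math

WINDOW_S = 3.0  # window length, in seconds, as described in the text


def window_bounds(duration_s: float, window_s: float = WINDOW_S) -> list[tuple[float, float]]:
    """(start, end) of consecutive, non-overlapping windows covering a recording.

    Assumption: no overlap, and a final partial window is kept but clipped to
    the recording length. BirdNET and HawkEars may handle overlap and the last
    chunk differently.
    """
    duration_s = float(duration_s)
    n_windows = math.ceil(duration_s / window_s)
    return [(i * window_s, min((i + 1) * window_s, duration_s)) for i in range(n_windows)]


def raw_score_count(duration_s: float, n_species: int) -> int:
    """Every window receives one score per species in the model."""
    return len(window_bounds(duration_s)) * n_species


# Hypothetical example: a 10-minute recording and a 1,000-species model.
print(len(window_bounds(600)))     # 200 windows
print(raw_score_count(600, 1000))  # 200000 scores before any threshold
print(window_bounds(7)[-1])        # (6.0, 7.0), a clipped final window
```

Nearly all of those scores are very low, which is why only scores above a minimum threshold are reported.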

Because the classifiers generate a score for every species in their model for each 3-second window, most of those scores are very low, indicating that the species is not actually present in that window. A minimum threshold is therefore set to determine which scores are reported.

WildTrax reports only scores above a minimum threshold of 0.2 for BirdNET and 0.3 for HawkEars (users can set these higher), since most detections scoring below those thresholds are incorrect and including all of those false positives would overwhelm the true positives. Even scores between 0.2 and 0.6 are likely to be false positives for many species. Every species is a little different, and different recording conditions can produce different scores for the same species, so WildTrax keeps a relatively low minimum threshold for users who need to explore low-scoring detections for their specific purposes.
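
The reporting rule can be sketched as a simple filter. In this minimal Python example, `Detection`, `reported`, and the sample scores are hypothetical; only the two minimum thresholds come from the text. Treating the cutoff as inclusive (`>=`) is an assumption, and a user-supplied threshold is only allowed to raise the cutoff, never lower it.

```python
from dataclasses import dataclass

# Minimum score thresholds WildTrax reports, as stated in the text.
MIN_THRESHOLD = {"BirdNET": 0.2, "HawkEars": 0.3}


@dataclass
class Detection:  # hypothetical record, not a WildTrax data type
    classifier: str  # "BirdNET" or "HawkEars"
    species: str
    start_s: float   # start of the 3-second window
    score: float     # 0-1 confidence score, not a probability


def reported(detections, user_threshold=None):
    """Keep detections at or above the classifier's effective cutoff.

    The cutoff is the classifier minimum, raised to user_threshold if one is
    given and it is higher. Inclusive comparison is an assumption.
    """
    kept = []
    for d in detections:
        cutoff = MIN_THRESHOLD[d.classifier]
        if user_threshold is not None:
            cutoff = max(cutoff, user_threshold)
        if d.score >= cutoff:
            kept.append(d)
    return kept


detections = [
    Detection("BirdNET", "American Robin", 0.0, 0.25),   # kept: 0.25 >= 0.2
    Detection("HawkEars", "American Robin", 0.0, 0.25),  # dropped: below 0.3
    Detection("HawkEars", "American Robin", 3.0, 0.70),  # kept
]
print(len(reported(detections)))       # 2
print(len(reported(detections, 0.5)))  # 1
```

Note that the same score of 0.25 passes for BirdNET but not for HawkEars, reflecting the different scaling of the two models.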

The choice of threshold is a trade-off between the number of false positive detections and the number of missed detections (i.e., false negatives). Where that balance should sit, and therefore which threshold to choose, depends on how the data will be used. Users should validate a subset of their data to understand the precision and recall of the models for their species of interest. More information on the evaluation process can be found in Wood and Kahl 2024 (https://link.springer.com/article/10.1007/s10336-024-02144-5), and the wildRtrax R package includes tools to support evaluation and threshold selection (https://rdrr.io/github/ABbiodiversity/wildRtrax/f/vignettes/classifiers-tutorial.Rmd).
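
As a rough illustration of what validating a subset involves, the sketch below computes precision and recall at a few thresholds from hand-checked detections of one species. It is a generic calculation, not the wildRtrax implementation. The data are hypothetical, and recall assumes a listener has annotated every true vocalization in the same recordings, counted in the same units (e.g., species per window) as the detections.

```python
def precision_recall(validated, n_true_total, thresholds):
    """Precision and recall at each threshold from a validated subset.

    validated: (score, is_correct) pairs for classifier detections a person
        has checked.
    n_true_total: true vocalizations a listener found in the same recordings,
        needed for recall (assumed to include every correct detection).
    Returns {threshold: (precision, recall)}; precision is None when no
    detection reaches the threshold.
    """
    results = {}
    for t in thresholds:
        passing = [ok for score, ok in validated if score >= t]
        tp = sum(passing)  # True counts as 1
        precision = tp / len(passing) if passing else None
        recall = tp / n_true_total if n_true_total else None
        results[t] = (precision, recall)
    return results


# Hypothetical validated detections for one species: (score, correct?)
validated = [(0.95, True), (0.81, True), (0.66, False), (0.55, True),
             (0.42, False), (0.31, True), (0.24, False)]
# Hypothetical: a listener found 5 true vocalizations in the same recordings.
for t, (p, r) in precision_recall(validated, 5, [0.2, 0.5, 0.8]).items():
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
```

Raising the threshold from 0.2 to 0.8 in this made-up example raises precision from 0.57 to 1.00 while recall falls from 0.80 to 0.40, which is the trade-off described above.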

BirdNET and HawkEars scale their scores differently, and HawkEars generates more detections with intermediate scores (0.1–0.6) than BirdNET. WildTrax therefore uses a higher minimum threshold for HawkEars than for BirdNET to screen out some of those intermediate scores. For both classifiers, the user interface can get quite busy with false positives when scores below 0.5 are shown, so the recommended minimum threshold when viewing a spectrogram is 0.5. Users who want lower thresholds can go as low as 0.3 for HawkEars or 0.2 for BirdNET. The classifier report outputs a CSV with all classifier detections down to those lower thresholds, for users interested in exploring precision and recall curves for particular species and classifiers. Users who need thresholds lower than those available in WildTrax should consider running the classifiers on their data directly, outside of WildTrax.
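
The CSV report lends itself to this kind of exploration. The sketch below counts detections per classifier and species at 0.3 and at the recommended viewing threshold of 0.5. The column names (`classifier`, `species`, `score`) and the rows are hypothetical placeholders, not the actual WildTrax report format, so adjust them to match the file you download.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a classifier report; column names are placeholders,
# not the actual WildTrax report schema.
REPORT = """classifier,species,score
HawkEars,Ovenbird,0.92
HawkEars,Ovenbird,0.41
HawkEars,Swainson's Thrush,0.35
BirdNET,Ovenbird,0.22
BirdNET,Ovenbird,0.77
"""


def counts_above(rows, threshold):
    """Number of detections per (classifier, species) at or above threshold."""
    return Counter(
        (r["classifier"], r["species"])
        for r in rows
        if float(r["score"]) >= threshold
    )


# With a real report, replace io.StringIO(REPORT) with open("your_report.csv").
rows = list(csv.DictReader(io.StringIO(REPORT)))
for t in (0.3, 0.5):
    print(t, dict(counts_above(rows, t)))
```

Repeating the count over a finer grid of thresholds, and pairing it with validated detections, is one way to build the precision and recall curves mentioned above.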

BirdNET and HawkEars both offer several filters, beyond the minimum confidence threshold, that adjust the results they generate. WildTrax uses only their location filters, to limit results to species that are geographically relevant. These filters use eBird data to determine whether a species is expected in the area where the recording was made. For example, classifier detections of California Quail (a west coast species) will not appear in a recording from the Maritimes: there are no records of the species there, so any detections the classifiers generate are likely to be false positives and can be ignored. Users who want unfiltered results should consider running the classifiers on their recordings themselves, so that they can configure the filters for their specific purposes.
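
Conceptually, a location filter drops detections of species that are not expected where the recording was made. The sketch below uses a fixed, hypothetical species list in place of the eBird-based filters. The real filters are more involved (BirdNET's, for example, estimates the expected species from location and week of the year), so treat this only as an illustration of the effect.

```python
def location_filter(detections, expected_species):
    """Drop detections of species not in the expected set for the location."""
    return [d for d in detections if d["species"] in expected_species]


# Hypothetical stand-in for an eBird-derived list of species expected at a
# Maritimes recording site.
expected_maritimes = {"American Robin", "Ovenbird", "White-throated Sparrow"}

detections = [
    {"species": "American Robin", "score": 0.81},
    {"species": "California Quail", "score": 0.64},  # not expected here
    {"species": "Ovenbird", "score": 0.55},
]
kept = location_filter(detections, expected_maritimes)
print([d["species"] for d in kept])  # ['American Robin', 'Ovenbird']
```

The California Quail detection is removed regardless of its score, which is the behaviour described for the Maritimes example.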

The BirdNET and HawkEars classifiers are by no means perfect. They have been extensively trained using advanced machine learning methods, but they are not the equivalent of a trained human, who can recognize patterns and differences in sounds much better than even the most advanced computer algorithms. Overlapping calls, abnormal calls, and quiet calls are particularly difficult for the classifiers to detect and identify, because their spectrogram patterns may not match the patterns the classifiers were trained to detect. Likewise, species with highly variable calls are more likely to be missed or confused, because their calls are harder to match to the patterns learned in training.

Just as the classifiers sometimes miss a species that a human can detect, they can also report species that are not present in the recording. The classifiers provide a score for every species they are trained on, and the score threshold is intended to eliminate species that score low because they are not actually present. However, because classification is imperfect, the output often still contains false positives. How you deal with those false positives depends on the intended application for your data. If you are looking for a common species and detect dozens of its vocalizations in every recording, one or two incorrect detections are probably not a big deal. If you want to avoid manually verifying every detection, you can verify a small proportion of them to estimate a false positive rate and incorporate that rate into a variety of statistical models, such as occupancy models. On the other hand, if you are looking for a rare species that calls only occasionally and the classifier regularly gives false positives, you may want to manually verify all of your classifier detections to ensure your dataset is interpreted accurately.
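
One simple way to summarize a verified sample is the observed false positive rate with a confidence interval. This Python sketch uses the Wilson score interval, a standard choice for proportions. It assumes the verified detections are a random sample of the detections for one species at one threshold, and the counts are hypothetical. Feeding the rate into an occupancy model is beyond this example.

```python
import math


def false_positive_rate(n_verified, n_false, z=1.96):
    """Observed false positive rate with an approximate Wilson score interval.

    n_verified: detections checked by a person (assumed a random sample).
    n_false: how many of those were wrong.
    z: normal quantile; 1.96 gives roughly a 95% interval.
    """
    if n_verified <= 0 or not 0 <= n_false <= n_verified:
        raise ValueError("need 0 <= n_false <= n_verified and n_verified > 0")
    p = n_false / n_verified
    denom = 1 + z**2 / n_verified
    centre = (p + z**2 / (2 * n_verified)) / denom
    half = z * math.sqrt(p * (1 - p) / n_verified + z**2 / (4 * n_verified**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical: 12 of 150 randomly sampled detections were wrong.
rate, low, high = false_positive_rate(150, 12)
print(f"false positive rate {rate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Because scores behave differently for each species and model, a separate estimate is usually needed for each species, classifier, and threshold you analyze.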

WildTrax and ABMI do not own or maintain these classifiers; they were developed by programmers and biologists who made them freely available for others to use. WildTrax runs the classifiers and makes their results available in the user interface so that users can perform all of their acoustic data processing in one platform, without needing external programs to run the classifiers. The developers continue to update their models to improve accuracy and add new species. WildTrax cannot keep up with every update, but it periodically updates to a newer model version when it offers significant improvements for our users. The code repository for BirdNET can be found here (https://github.com/birdnet-team/BirdNET-Analyzer) and for HawkEars here (https://github.com/jhuus/HawkEars).