Frederick Kistner, Ph.D. Candidate, Karlsruhe Institute of Technology, Germany.
Figure 1: Eurasian otters feeding by a river in Portugal.
In previous posts we have learned a lot about the use cases and advantages of the Footprint Identification Technique (FIT) as a noninvasive tool in wildlife monitoring. When the preconditions are met, FIT achieves excellent classification rates for individuals.
In this post, I’m going to share my experiences using FIT to monitor Eurasian otters. This species has some unusual characteristics that have greatly influenced the path of my research!
FIT uses a customized statistical model built on hundreds of measurements, such as distances, angles and areas, extracted from footprint images. For most species, one of the preconditions is finding a continuous trail (an unbroken series of prints) with at least five to seven good footprints of the same foot (e.g. the left front foot) to be analyzed. Classification rates tend to decrease significantly when fewer prints per trail are fed into the model.
So what can you do when you work with a semi-aquatic species that tends to leave broken trails because individuals constantly leave and re-enter the water? What if several trails cross on top of each other, making it hard to tell which print belongs to which trail? What if prints are rubbed out by a dragging tail?
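To make the trail precondition concrete, here is a minimal sketch of how one might check which trails have enough prints of the same foot. It is not part of the FIT software: the record layout (`trail`, `foot`, `image`), the function name and the file names are hypothetical, and the threshold of five is simply the lower end of the range quoted above.

```python
from collections import defaultdict

MIN_PRINTS = 5  # assumption: lower end of the "five to seven" range


def usable_trails(prints, min_prints=MIN_PRINTS):
    """Group prints by (trail, foot) and keep only groups with enough prints."""
    groups = defaultdict(list)
    for p in prints:
        groups[(p["trail"], p["foot"])].append(p["image"])
    return {key: imgs for key, imgs in groups.items() if len(imgs) >= min_prints}


# Hypothetical field data: trail T1 has six left front prints, trail T2 only three.
prints = (
    [{"trail": "T1", "foot": "left_front", "image": f"t1_lf_{i}.jpg"} for i in range(6)]
    + [{"trail": "T2", "foot": "left_front", "image": f"t2_lf_{i}.jpg"} for i in range(3)]
)

result = usable_trails(prints)
print(sorted(result))  # only T1 meets the threshold
```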
A few years ago, I started to develop the first FIT algorithms for the Eurasian otter (Lutra lutra) and achieved excellent classification results (90%) for footprints of known captive animals, where I had control over the substrate and how the animals accessed it. However, once I tested these algorithms on a free-ranging otter population in southwest Portugal, I ran into the difficulties mentioned above.
Although data from my captive animals showed that individual Eurasian otters can easily be distinguished by their footprints, in the field it was much more difficult to find enough footprints in a trail to cover the variability in gait, substrate and moisture!
Figure 2: Example of two beautiful otter trails running parallel on a riverbank of the Aljezur River in southwest Portugal.
So, what are possible strategies to deal with this issue? Often more than enough footprints are found at a single location, just not in a continuous trail. One could assume that all the footprints at one location are most likely from one individual, as the literature often describes otters as mainly solitary, territorial animals. Unfortunately, this is an unreliable assumption and a recipe for disaster, especially if your heat-sensing camera trap failed to trigger because of the otter's well-insulated fur, for probably the 100th time in your field campaign!
Figure 3: Example of a sand preparation that, even though it contains many otter footprints, is difficult to analyze with FIT because too few footprints can be assigned to clearly identifiable trails.
Another option is to include imperfect footprints in your analysis. After looking at thousands of footprints from the same species, you develop a sense for where certain features of a footprint are supposed to be, even when they are barely visible or not visible at all in the image. So, with enough expertise you can more or less guess missing features in an image and still get good classification results. This, however, introduces a strong personal bias, which makes it difficult to compare and reproduce results across observers. So this is not an ideal solution either.
Figure 4: Example of a perfect right front otter footprint and a more frequently found imperfect print with a barely visible thumbprint (red upper circle) and a missing heel pad.
In order to standardize the approach, you could think of an incomplete footprint image as a data frame with missing values. Missing observations occur almost anywhere data is created, so perhaps there are options for dealing with them that we could adopt from other disciplines? There is in fact a wide variety of methods (for example, imputation methods) available in almost any statistical analytics software, such as the mice package in R or the excellent missing value imputation features implemented in JMP, which use modern machine learning algorithms to model your missing features.
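To illustrate the general idea of imputation, here is a minimal sketch of a k-nearest-neighbour fill-in written in plain Python. It is not the method used in FIT, mice or JMP; it is a deliberately simple stand-in, and the three measurement columns and their values are made up for illustration. A missing measurement is replaced by the average of that measurement in the k complete prints that look most similar on the features that could be measured.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest complete rows.

    Distance is Euclidean and uses only the features observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return filled


# Hypothetical measurements per print: [toe distance, toe angle, heel pad area].
rows = [
    [10.0, 20.0, 30.0],
    [11.0, 21.0, 33.0],
    [30.0, 40.0, 90.0],
    [10.5, 20.5, None],  # heel pad not visible in this print
]

filled = knn_impute(rows, k=2)
print(filled[3])  # heel pad area is filled from the two most similar prints
```

In this toy example the incomplete print is closest to the first two prints, so its missing area is filled with their average. Real imputation tools are more sophisticated, but the principle of borrowing information from similar, complete observations is the same.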
Therefore, one of the topics I am investigating in my current PhD research is whether some of these proven off-the-shelf imputation methods can be implemented in FIT. I do this in cooperation with Markus Pauly, a professor of technical statistics at TU Dortmund, and his group, and with Sky Alibhai and Zoe Jewell of WildTrack, the creators of FIT.
Even though this is still under development, first results indicate that off-the-shelf imputation methods based on machine learning algorithms can in fact not only standardize the use of incomplete images, but also increase classification accuracy compared with model runs that used only perfect (but fewer) images within a trail.
Figure 5: Classification accuracy for a total of 10 trails from a test set of 5 individual otters, for various imputation strategies. Deleting images with missing features always led to the worst classification rate within these trails. Multi norm imputation implemented in JMP software led to the best results, with almost no loss in classification accuracy.
If the final results agree with the promising findings of these trials, this will be implemented in the FIT workflow. This would not only increase the total number of trails that meet the requirements for a robust FIT analysis; it could also open FIT to less experienced observers, as only clearly visible landmarks would then need to be marked. As this issue occurs for many species, I will try to find a general strategy to improve the handling of missing values within the FIT framework.
As a general outcome, I learned a few things that go beyond FIT. There is a whole variety of strategies for dealing with incomplete data that go beyond simple median/mean imputation. Modern statistical software like JMP has more advanced methods for missing data imputation that are amazingly easy to apply, even without a statistical background. So, removing all records with missing values by default might be an unnecessary pruning of your dataset that could decrease your data quality.
In my opinion, it is always worth investigating how different strategies for managing missing values affect your results. This can allow you to get more out of your datasets and reach conservation solutions faster!
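The cost of default deletion is easy to see in a small sketch. The example below compares dropping every incomplete print with the simplest baseline, median imputation, on one hypothetical trail of six prints. The two measurement columns and all values are invented, and median imputation stands in here only as the simple reference point mentioned above, not as a recommended method.

```python
from statistics import median


def listwise_delete(rows):
    """Drop every row that has at least one missing value."""
    return [r for r in rows if None not in r]


def median_impute(rows):
    """Replace each missing value with the median of the observed values in its column.

    Raises statistics.StatisticsError if a column has no observed values at all.
    """
    n_features = len(rows[0])
    medians = [
        median([r[j] for r in rows if r[j] is not None]) for j in range(n_features)
    ]
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


# Hypothetical trail of six prints: [pad width, toe angle]; None marks a missing feature.
trail = [
    [12.0, 48.0],
    [12.4, None],
    [11.8, 47.0],
    [None, 49.0],
    [12.2, 48.5],
    [12.1, None],
]

deleted = listwise_delete(trail)
imputed = median_impute(trail)
print(len(deleted), len(imputed))  # prints kept under each strategy
```

With a five-print minimum per trail, deletion leaves this trail with too few prints to analyze, while imputation keeps all six. Whether the imputed values are good enough for classification is a separate question that has to be tested, which is exactly what comparing strategies is for.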