# Quantifying leafhopper damage with automated supervised classification

As part of my fieldwork in China, I collected harvested tea leaves that were damaged by the tea green leafhopper. I want to quantify the amount of leafhopper damage for each harvest. I was able to find several solutions for quantifying holes in leaves or even damage to leaf margins, but typical leafhopper damage is just tiny brown spots on the undersides of leaves. I did find some tutorials on using ImageJ to analyze diseased area on leaves, but found that the leafhopper damage spots were too small and too similar in color to undamaged leaves for these tools to work reliably and be automated.

Last year I piloted a method to quantify leafhopper damage on scanned images of tea leaves with the help of a Tufts undergraduate, Maxwell Turpin. We ended up getting the most success using a supervised classification algorithm implemented by the trainable WEKA segmentation plugin in FIJI (which stands for “FIJI is just Image J”). This semester, another Tufts undergraduate, Michelle Mu, worked on refining this approach, automating it, and applying it to the hundreds of images I obtained over the summer as part of my research.

## Supervised pixel classification

Just to clarify, my goal here is not image classification—that is, I’m not trying to classify leaves into categories like “undamaged”, “medium damaged”, “high damage”, but rather trying to classify individual pixels in the image as being damaged or undamaged leaf tissue (or background).

In short, after selecting some pixels representative of damaged leaf, undamaged leaf, and background (regions of interest, or ROIs), the WEKA plugin trains a random forest algorithm using data from various transformations of the pixels in the ROIs. Then, I can apply the algorithm to other images and extract data in the form of numbers of pixels classified in each category.

## WEKA segmentation tips

The documentation on the WEKA segmentation plugin is fairly detailed, so I won’t go into great detail on how to use it, rather focus on some things I learned specific to this project.

### Creating a training stack

I had hundreds of images to classify, so obviously it made sense to train a classifier on a subset of leaves. We started by taking my leaf scans, which contained dozens of leaves, and making images of individual leaves. You can do this in any number of image manipulation software, but we found it easiest using Preview, the default image and PDF viewer on OS X. You just select a leaf with the rectangle selection tool, copy with cmd + c, and create a new image with cmd + n, then save with cmd + s.

We then chose a random subset of 15 leaves to use as a training set. Why 15? At the time, we were using a regular desktop computer with 8 GB of RAM, and using the WEKA plugin with an image stack any larger than that caused it to crash. Fortunately, because the leaves were all scanned in a uniform way and leaf color didn’t vary too much, 15 leaves was suitable for a training set. If you have more RAM at your disposal, feel free to train on more leaves.

### Training and applying a classifier

We trained the classifier on three classes, background, damaged leaf, and undamaged leaf. For background and undamaged leaf, we found it was really important to focus on leaf edges, making sure to include shadows in the background class and lighter green leaf margins in the undamaged class. Without doing this, the classifier would consistently mis-classify shadows and edges as either damaged or background, respectively. This was also an iterative process and took several rounds of selecting ROIs, training a classifier, viewing results, and adding more ROIs. Once we were satisfied with our classifier, we saved it and applied it to all of our images in stacks of 20 on a computer with 32 GB of RAM. The results created by the WEKA plugin are stacks of three color images (because we used 3 classes). Getting numerical results turned out to be another problem.

### Exporting results

Exporting results in a numeric format turned out to be a lot more difficult than we thought it would. The manual way of doing this through the FIJI menus is Analyze > Histogram which opens a histogram window, then clicking “list” to get a results window with the number of pixels in each class for that image, then copying and pasting into Excel. This was far too labor intensive and error-prone to be appropriate for hundreds of images. We needed a better way, which led us to FIJI macros.

Building a macro turned out to be relatively painless, even though neither Michelle nor I had any experience coding in any language other than R. Through a combination of forum posts and using the documentation for the ImageJ macro language as a reference, we were able to create a macro that opens results stacks (three-color images) and exports a text file containing the number of pixels in each class for each image in the stack.

//ImageJ macro for exporting numerical results from classified image stacks
inpath = getDirectory("Analyzed Stacks");
File.makeDirectory(inpath + "//Results//");
//outpath = getDirectory("Results");
//for some reason using getDirectory() twice screws things up.
//My solution is to just create a results folder in the folder with the analyzed stacks
//But then you get an error at the end when the script tries to open that folder.
files = getFileList(inpath);
for(j = 0; j < lengthOf(files); j++){
open(files[j]);
title = getTitle();
for (n = 1; n <= nSlices(); n++) { //loop through slices
showProgress(n, nSlices); //this just adds a progress bar
setSlice(n); //set which slice
getStatistics(area, mean, min, max, std, histogram); //this gets the number of pixels
for (i=0; i<histogram.length; i++) {
setResult("Value", i, i);
setResult("Leaf." + n, i, histogram[i]); //adds a column for each slice called "Count[slicenumber]"
}
saveAs("results", inpath + "//Results//" + title + ".txt"); //saves results table as text file
}
close();
run("Clear Results");
}

### Importing Results

The results files then get read into R and tidied using a relatively simple script.

#packages you will need
library(dplyr)
library(tidyr)
library(stringr)

#Get the filenames of all the results files
filenames <- list.files("Results/") #might need to change path

#Create paths to those files
filepaths <- paste0("Results/", filenames)

#make a list to eventually contain all the data files.
raw.list <- as.list(filenames)
names(raw.list) <- filenames
raw.list

#for loop for reading in every file into an element of the list.  There is probably a faster way to do this with purrr::map
for(i in 1:length(filenames)){   #loop through all files
raw.list[[i]] <- read_tsv(filepaths[i])[1:2, -1] #read only the first two rows and NOT the first column into the list
}
raw.list #now contains multiple data frames.

At this point, we have created a list of data frames, one for each image stack. Since our image stacks were split up just to make analysis possible with limited RAM, we want to merge the results back together now.

I also couldn’t figure out how to rename the classes in the Image J macro (they appear as numeric labels “0”, “1”, and “2”), so we took this opportunity to rename them in our R script.

raw.data <- raw.list %>%
bind_rows(.id = "File") %>%
mutate(Value = ifelse(Value == 0, "damaged", "undamaged") %>% as.factor()) %>%
#converts 0's to "damaged" and anything else to "undamaged", then converts Value to a factor
rename(Type = "Value")
#renames the "Value" column "Type"

raw.data.2 <- raw.data %>%
gather(-File, -Type, key = LeafID, value = Pixels) %>%  #gathers all the data into three columns
spread(key = Type, value = Pixels)

And finished! With all the image stacks analyzed using our classifier, numeric results exported using our custom macro, and then read into R and tidied using our R script, we have data ready for statistical analysis!

###### Scientific Programmer & Educator

I’m a Scientific Programmer & Educator at the University of Arizona in the CCT Data Science group.