What is diversity?

Diversity is a widely used concept across many scientific disciplines, from mathematics to biology and ecology. In each of these areas, diversity measures the range and distribution of certain features within a given population. It is considered a key attribute that can vary over time, shaped by interactions within and between populations and modified by environmental factors. A quick dip into the literature on diversity reveals a bewildering range of measures, each of which seeks to characterize the diversity of a sample or community by a single number.
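As an illustration of reducing a community to a single number, the sketch below computes the Shannon index, one widely used ecological measure chosen here only as an example. It assumes a community is given as a list of species abundance counts, and the function names are hypothetical. The two example communities contain the same four species, so they have the same richness, but the more evenly balanced one receives the higher score.

```python
import math

def species_richness(abundances):
    # Richness: the number of species actually present (nonzero abundance).
    return sum(1 for a in abundances if a > 0)

def shannon_index(abundances):
    # Shannon index H = -sum(p_i * ln p_i), where p_i is the relative
    # abundance of species i; absent species (p_i = 0) contribute nothing.
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

even = [25, 25, 25, 25]   # four equally abundant species
skewed = [85, 5, 5, 5]    # the same four species, one of them dominant

print(species_richness(even), species_richness(skewed))  # 4 4
print(round(shannon_index(even), 3))    # 1.386 (= ln 4)
print(round(shannon_index(skewed), 3))  # 0.588
```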

From a mathematical standpoint, diversities can be viewed as generalizations of metrics. A metric assigns a distance to each pair of points; for example, the Euclidean distance can be regarded as a diversity measure for a pair of points. The idea extends to collections of more than two points: a function that assigns a non-negative score to every finite set of points is called a diversity if it satisfies a small set of conditions that generalize the metric axioms, most notably the triangle inequality. In ecology, diversity measures often take into account two key factors: species richness (the number of species) and evenness (how equally abundant the species are). Some ecological definitions of diversity may not satisfy all the conditions of the mathematical definition of diversities as generalized metrics.
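The conditions on a diversity can be made concrete. In one standard formulation (due to Bryant and Tupper), a function δ on the finite subsets of a set X is a diversity if (D1) δ(A) ≥ 0, with δ(A) = 0 exactly when A has at most one element, and (D2) δ(A ∪ C) ≤ δ(A ∪ B) + δ(B ∪ C) whenever B is non-empty. Restricted to single points A = {a}, B = {b}, C = {c}, condition (D2) is the ordinary triangle inequality. The Python sketch below checks both conditions by brute force on a small point set; the function names and the points are illustrative assumptions. It confirms that the diameter (largest pairwise Euclidean distance) is a diversity, while the squared diameter is not, because squared distances can violate the triangle inequality.

```python
import itertools
import math

def diameter_diversity(points):
    # Largest Euclidean distance between any two points; sets with fewer
    # than two points get diversity 0.
    pairs = itertools.combinations(points, 2)
    return max((math.dist(p, q) for p, q in pairs), default=0.0)

def squared_diameter(points):
    # Squared diameter: a candidate score that fails condition (D2).
    return diameter_diversity(points) ** 2

def all_subsets(xs):
    # Every subset of xs (including the empty set), as frozensets.
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in itertools.combinations(xs, r)]

def is_diversity(delta, xs, tol=1e-12):
    """Brute-force check of the diversity conditions on the finite set xs:
    (D1) delta(A) >= 0, and delta(A) == 0 exactly when |A| <= 1;
    (D2) delta(A | C) <= delta(A | B) + delta(B | C) for every nonempty B."""
    subsets = all_subsets(xs)
    for a in subsets:
        value = delta(a)
        if value < 0 or (value == 0) != (len(a) <= 1):
            return False
    for a in subsets:
        for b in subsets:
            if not b:
                continue  # (D2) is only required for nonempty B
            for c in subsets:
                if delta(a | c) > delta(a | b) + delta(b | c) + tol:
                    return False
    return True

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
print(is_diversity(diameter_diversity, points))  # True
print(is_diversity(squared_diameter, points))    # False
```

The squared diameter fails on the three collinear points: the set {(0, 0), (2, 0)} scores 4, while each of the two sets that pass through (1, 0) scores only 1.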

In my Ph.D. research, I explore a specific diversity measure defined on split systems. In this context, a split is a division of a finite set of species, genes, or individuals into two non-empty, disjoint parts, and a collection of such splits is called a split system. The ultimate goal is to identify a subset of individuals of a given size with maximum diversity. However, maximizing diversity is NP-hard in general, except for data with certain special structures. For instance, if the relationships among a group of species are described by a tree, a diversity measure such as Phylogenetic Diversity can be maximized efficiently. It is worth noting, however, that most data sets are not tree-like. You can read more about the limitations of current methods for maximizing diversity in non-tree-like data in my paper.
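To make splits and diversity maximization concrete, the Python sketch below uses one common way of defining a diversity on a weighted split system, often called split diversity: a subset scores the total weight of the splits that separate it, meaning splits with members of the subset on both sides. It serves only as an illustration and is not necessarily the measure studied in the research described above. When the splits are compatible, so that they come from a tree, split diversity coincides with Phylogenetic Diversity on that unrooted tree. The representation of a split as a pair (one side, weight), the taxa, the weights, and the function names are all hypothetical. The search is exhaustive because, as noted above, no efficient exact method is available in general.

```python
from itertools import combinations

def split_diversity(subset, splits):
    # Sum the weights of the splits that separate the subset. Each split is
    # given by one of its sides; the other side is the complement in the
    # taxon set, so the split separates the subset exactly when the subset
    # meets this side but is not contained in it.
    subset = set(subset)
    total = 0.0
    for side, weight in splits:
        inside = subset & side
        if inside and inside != subset:
            total += weight
    return total

def max_diversity_subset(taxa, splits, k):
    # Exhaustive search over all k-subsets: valid for any split system, but
    # the number of candidates, C(n, k), grows combinatorially with n.
    return max(combinations(sorted(taxa), k),
               key=lambda s: split_diversity(s, splits))

taxa = {"a", "b", "c", "d", "e"}
# Splits of the unrooted tree ((a, b), c, (d, e)), one side per split.
tree_splits = [
    (frozenset("a"), 1.0), (frozenset("b"), 2.0), (frozenset("c"), 1.0),
    (frozenset("d"), 3.0), (frozenset("e"), 1.0),    # pendant edges
    (frozenset("ab"), 2.0), (frozenset("de"), 1.0),  # internal edges
]
best = max_diversity_subset(taxa, tree_splits, 2)
print(best, split_diversity(best, tree_splits))  # ('b', 'd') 8.0

# Adding the split {a, c} | {b, d, e}, which conflicts with {a, b} | {c, d, e},
# gives a split system that no single tree can display.
network_splits = tree_splits + [(frozenset("ac"), 1.5)]
best = max_diversity_subset(taxa, network_splits, 2)
print(best, split_diversity(best, network_splits))  # ('a', 'd') 8.5
```

For tree-like split systems, the exhaustive search could be replaced by a greedy strategy, which is known to be optimal for Phylogenetic Diversity on trees; for general split systems no such shortcut is guaranteed.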