2. Multidimensional Scaling Analysis#
Multidimensional scaling analysis (MDS) provides a visual representation of similarity between samples that have multiple variables. MDS is built on the idea of a distance matrix that quantifies the dissimilarity between samples.
2.1. Euclidean Distance#
The Euclidean distance between two points is how far apart they are when plotted as \(x\), \(y\) coordinates.
For three variables, the distance between two samples (\(i\) and \(j\)) can be visualized in three-dimensional space.
\(d_{ij} =||X_i-X_j|| = \sqrt{(x_i-x_j)^2 + (y_i-y_j)^2 + (z_i-z_j)^2}\)
where \(X_i = (x_{i}, y_{i}, z_{i})\) and \(X_j = (x_{j}, y_{j}, z_{j})\).
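A quick numerical check of this formula, using made-up coordinates and SciPy's built-in euclidean function:
import numpy as np
from scipy.spatial import distance
# two samples described by three variables (x, y, z); values are made up for illustration
Xi = np.array([1.0, 2.0, 3.0])
Xj = np.array([4.0, 6.0, 3.0])
d = np.sqrt(np.sum((Xi - Xj)**2))   # apply the formula directly
print(d)                            # 5.0
print(distance.euclidean(Xi, Xj))   # same result from scipy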
2.2. Bray-Curtis Distance#
The Bray-Curtis distance is often used for identifying differences in community composition based on abundance. If \(u\) and \(v\) represent two different samples of counts of different groups, the Bray-Curtis distance is:
\(d = \frac{\sum_i |u_i-v_i|}{\sum_i (u_i+v_i)}\)
If all elements of \(u\) and \(v\) are non-negative, then \(0 \le d \le 1\): \(d = 0\) for identical samples and \(d = 1\) for samples that share no groups.
Example#
In this example, \(u\), \(v\) and \(q\) are three different samples, in which four different groups are counted.
import numpy as np
from scipy.spatial import distance
u = [415,200,310,411]
v = [615,100,330,203]
q = [614,101,331,202]
data = np.array([u,v,q])
print(data)
[[415 200 310 411]
 [615 100 330 203]
 [614 101 331 202]]
data.T # transposed view, where each column is a sample (note that pdist below expects samples as rows, so data is used directly)
array([[415, 615, 614],
       [200, 100, 101],
       [310, 330, 331],
       [411, 203, 202]])
# compute pairwise Bray-Curtis distances between samples (rows of data)
dist = distance.pdist(data, 'braycurtis')
dist
array([0.20433437, 0.20433437, 0.00160256])
# represent distances as a matrix
distmatrix = distance.squareform(dist)
distmatrix
array([[0.        , 0.20433437, 0.20433437],
       [0.20433437, 0.        , 0.00160256],
       [0.20433437, 0.00160256, 0.        ]])
2.3. Other measures of distance#
A variety of different distance measures can be computed in Python with the scipy.spatial.distance module; the available metrics are documented at:
https://docs.scipy.org/doc/scipy/reference/spatial.distance.html
For an excellent resource on the applications of different distance calculations in ecology, including appropriate measures for binary (presence/absence) data, see A Primer of Ecological Statistics by Gotelli and Ellison.
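For example, scipy.spatial.distance.pdist accepts many metrics by name. A short sketch using the same counts as above (the threshold used to create the presence/absence data is made up purely for illustration):
import numpy as np
from scipy.spatial import distance
data = np.array([[415, 200, 310, 411],
                 [615, 100, 330, 203],
                 [614, 101, 331, 202]])
# the same pdist call accepts many different metrics by name
print(distance.pdist(data, 'euclidean'))
print(distance.pdist(data, 'cityblock'))
# for binary (presence/absence) data, Jaccard distance is one common choice
presence = data > 300   # made-up threshold, just to create presence/absence data
print(distance.pdist(presence, 'jaccard'))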
2.4. Types of multidimensional scaling analysis#
- Classical MDS 
  - Also known as Torgerson MDS or principal coordinate analysis (PCoA) 
  - The distance matrix is converted to a similarity matrix. Once this is done, the same steps as PCA are performed: compute eigenvectors and eigenvalues 
  - Equivalent to PCA when Euclidean distances are used 
  - Steps (see the sketch after this list): 
    - create a data matrix 
    - compute a dissimilarity matrix, D, with elements \(d_{ij}\) 
    - transform the dissimilarity matrix: \(d^*_{ij} = -\frac{1}{2}d^2_{ij}\) 
    - center the transformed matrix: \(\delta^*_{ij} = d^*_{ij}-\bar{d}^*_{i}-\bar{d}^*_{j}+\bar{d}^*\) 
    - compute the eigenvectors and eigenvalues 
    - if the dissimilarity index is Euclidean distance, this is mathematically equivalent to PCA 
- Non-metric (iterative) MDS 
  - preserves the rank order of distances 
  - minimizes a "stress" criterion 
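A minimal sketch of the classical MDS (PCoA) steps with NumPy, applied to the Bray-Curtis distances from the example above:
import numpy as np
from scipy.spatial import distance

# samples from the Bray-Curtis example above (rows are samples)
data = np.array([[415, 200, 310, 411],
                 [615, 100, 330, 203],
                 [614, 101, 331, 202]])
D = distance.squareform(distance.pdist(data, 'braycurtis'))

n = D.shape[0]
dstar = -0.5 * D**2                  # transform the dissimilarity matrix
J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
B = J @ dstar @ J                    # double-centered matrix

# eigenvectors and eigenvalues of the symmetric matrix B
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]    # sort from largest to smallest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# keep axes with positive eigenvalues and scale eigenvectors to get coordinates
keep = eigvals > 1e-12
coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
print(coords)  # each row gives a sample's position along the principal coordinate axes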
 

2.5. Summary of comparison of PCA and MDS#
- PCA - based on Euclidean distances; good for data without strong skew and without outliers 
- PCoA - use when other distance measures are more appropriate; equivalent to PCA when Euclidean distances are used 
- Non-metric multidimensional scaling (NMDS) - preserves the rank order of distances rather than their actual values (similar to many non-parametric statistics). One reason for using this type of analysis is that it is less sensitive to outliers. 
2.6. ANOSIM and PermANOVA#
- determine whether groups of samples are significantly different 
- test whether distances WITHIN groups are smaller than distances BETWEEN groups 
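One way to run these tests in Python is with scikit-bio's permanova and anosim functions. A minimal sketch, where the counts, sample labels, and group labels are made up for illustration:
import numpy as np
from scipy.spatial import distance
from skbio import DistanceMatrix
from skbio.stats.distance import permanova, anosim

# made-up counts: rows are samples, columns are the groups being counted
counts = np.array([[415, 200, 310, 411],
                   [430, 190, 305, 400],
                   [402, 210, 320, 415],
                   [615, 100, 330, 203],
                   [614, 101, 331, 202],
                   [600, 110, 340, 210]])
ids = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']   # made-up sample labels
grouping = ['siteA', 'siteA', 'siteA', 'siteB', 'siteB', 'siteB']

# Bray-Curtis distances packaged as a scikit-bio DistanceMatrix
dm = DistanceMatrix(distance.squareform(distance.pdist(counts, 'braycurtis')), ids)

print(permanova(dm, grouping, permutations=999))  # pseudo-F statistic and permutation p-value
print(anosim(dm, grouping, permutations=999))     # R statistic and permutation p-value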
2.7. Implementing MDS in Python#
- Compute distance matrices 
- Includes PCA functions 
- Includes PCoA (metric MDS based on distance matrix) 
- Does not include non-metric MDS 
- Includes PermANOVA for assessing statistically significant differences 
- Computes distance matrices 
- Includes general MDS function (metric or non-metric) 
- Includes related analyses such as linear discriminant analysis 
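As one concrete option, here is a minimal sketch of non-metric MDS using scikit-learn's MDS class on a precomputed Bray-Curtis distance matrix (setting metric=True instead gives a metric MDS fit):
import numpy as np
from scipy.spatial import distance
from sklearn.manifold import MDS

# the three samples from the Bray-Curtis example above
data = np.array([[415, 200, 310, 411],
                 [615, 100, 330, 203],
                 [614, 101, 331, 202]])
D = distance.squareform(distance.pdist(data, 'braycurtis'))

# non-metric MDS on a precomputed distance matrix
mds = MDS(n_components=2, metric=False, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(D)
print(coords)       # 2-D coordinates for plotting the samples
print(mds.stress_)  # final value of the stress criterion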
