pymds.DistanceMatrix

class pymds.DistanceMatrix(path_or_array_like, na_values='*')

A distance matrix.

Parameters:
  • path_or_array_like (str or array-like) – If str, path to a CSV file containing the distance matrix. If array-like, the distance matrix itself. Must be square.
  • na_values (str) – String that represents NaN (missing values) in the CSV file.

Notes

Negative distances are converted to 0.
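
The constraints above (a square matrix, negative distances converted to 0) can be sketched in plain Python. This is only an illustration of the documented behaviour, not pymds's own implementation: clean_distance_matrix is a hypothetical helper, and it sticks to the standard library so it runs without pymds installed.

```python
import math


def clean_distance_matrix(rows):
    """Hypothetical helper: check squareness and clamp negatives to 0."""
    m = len(rows)
    if any(len(row) != m for row in rows):
        raise ValueError("distance matrix must be square")
    # NaN compares False against 0, so missing values pass through unchanged.
    return [[0.0 if value < 0 else float(value) for value in row]
            for row in rows]


cleaned = clean_distance_matrix(
    [[0, -0.5, 2], [-0.5, 0, math.nan], [2, math.nan, 0]])
```

Here the two -0.5 entries become 0.0, while the NaN entries are left as they are.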

optimize(start=None, n=2)

Run multidimensional scaling on this distance matrix.

Parameters:
  • start (None or array-like) – Starting coordinates. If start=None, random starting coordinates are used. If array-like, must have shape [m * n, ], where m is the number of samples and n the number of dimensions.
  • n (int) – Number of dimensions to embed samples in.

Examples

>>> import pandas as pd
>>> from pymds import DistanceMatrix
>>> dist = pd.DataFrame({
...    'a': [0.0, 1.0, 2.0],
...    'b': [1.0, 0.0, 3 ** 0.5],
...    'c': [2.0, 3 ** 0.5, 0.0]} , index=['a', 'b', 'c'])
>>> dm = DistanceMatrix(dist)
>>> pro = dm.optimize(n=2)
>>> pro.coords.shape
(3, 2)
>>> type(pro)
<class 'pymds.mds.Projection'>

Returns: pymds.Projection
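
To make the flat [m * n, ] shape of start concrete, here is a minimal sketch of how such a vector could map onto m samples in n dimensions, and of a stress function an optimizer might minimize. Both the row-major layout and the sum-of-squared-differences objective are assumptions made for illustration; unflatten and stress are hypothetical names, not part of pymds.

```python
import math


def unflatten(flat, m, n):
    """Reshape a flat vector of length m * n into m rows of n coordinates."""
    if len(flat) != m * n:
        raise ValueError("expected %d values, got %d" % (m * n, len(flat)))
    return [list(flat[i * n:(i + 1) * n]) for i in range(m)]


def stress(flat, distances, n):
    """Assumed objective: squared error between embedded and target distances."""
    m = len(distances)
    coords = unflatten(flat, m, n)
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            embedded = math.dist(coords[i], coords[j])
            total += (embedded - distances[i][j]) ** 2
    return total


root3 = 3 ** 0.5
distances = [[0.0, 1.0, 2.0], [1.0, 0.0, root3], [2.0, root3, 0.0]]
# a=(1, 0), b=(0, 0), c=(0, sqrt(3)) reproduces the example distances.
exact = [1.0, 0.0, 0.0, 0.0, 0.0, root3]
print(stress(exact, distances, n=2) < 1e-12)
```

The distances are the ones used in the example above; they form a right triangle, so a two-dimensional embedding with essentially zero stress exists.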

optimize_batch(batchsize=10, returns='best', paralell=True)

Run multiple optimizations using different starting coordinates.

Parameters:
  • batchsize (int) – Number of optimizations to run.
  • returns (str) – If 'all', return the results of all optimizations, ordered by stress, ascending. If 'best', return the projection with the lowest stress.
  • paralell (bool) – If True, run optimizations in parallel. The keyword is spelled as in the signature above.

Examples

>>> import pandas as pd
>>> from pymds import DistanceMatrix
>>> dist = pd.DataFrame({
...    'a': [0.0, 1.0, 2.0],
...    'b': [1.0, 0.0, 3 ** 0.5],
...    'c': [2.0, 3 ** 0.5, 0.0]} , index=['a', 'b', 'c'])
>>> dm = DistanceMatrix(dist)
>>> batch = dm.optimize_batch(batchsize=3, returns='all')
>>> len(batch)
3
>>> type(batch[0])
<class 'pymds.mds.Projection'>

Returns:
  • list – Length batchsize, containing instances of pymds.Projection, sorted by stress, ascending (if returns='all').
  • pymds.Projection – The projection with the lowest stress (if returns='best').

Return type: list or pymds.Projection
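
The relationship between returns='all' and returns='best' can be sketched as a generic restart loop. This is an illustration under assumptions, not pymds's implementation: run_batch and fake_optimize are hypothetical names, the runs are sequential (parallel execution is left out), and each result is a plain dict rather than a pymds.Projection.

```python
import random


def run_batch(optimize_once, batchsize=10, returns="best"):
    """Run optimize_once batchsize times and rank the results by stress."""
    if returns not in ("best", "all"):
        raise ValueError("returns must be 'best' or 'all'")
    results = sorted((optimize_once() for _ in range(batchsize)),
                     key=lambda result: result["stress"])
    # 'all' gives every result, lowest stress first; 'best' gives only the first.
    return results if returns == "all" else results[0]


def fake_optimize():
    """Stand-in for one optimization run; the stress here is just random."""
    return {"stress": random.random()}


random.seed(0)
ranked = run_batch(fake_optimize, batchsize=3, returns="all")
random.seed(0)
best = run_batch(fake_optimize, batchsize=3, returns="best")
```

With the same seed, best is the first element of ranked, which is the sense in which 'best' is a shortcut for taking the head of the 'all' list.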

pymds.Projection

class pymds.Projection(coords)

Samples embedded in n dimensions.

Parameters: coords (pandas.DataFrame) – Coordinates of the projection.

coords

pandas.DataFrame – Coordinates of the projection.

stress

float – Residual error of multidimensional scaling. Only available if the projection was generated using Projection.from_optimize_result().

classmethod from_optimize_result(result, n, m, index=None)

Construct a Projection from the output of an optimization.

Parameters:
Returns: pymds.Projection

orient_to(other, index=None, inplace=False, scaling=False)

Orient this Projection to another dataset.

Orient this projection to match another dataset using Procrustes superimposition, which applies reflection, rotation and translation. Scaling is optional.
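
As a rough illustration of the idea, the sketch below superimposes one set of 2-D points onto another using reflection, rotation and translation. It is not pymds's implementation: it is restricted to two dimensions, where the best rotation angle has a closed form (general implementations typically use a singular value decomposition), and orient_2d and its helpers are hypothetical names.

```python
import math


def centroid(points):
    m = len(points)
    return (sum(p[0] for p in points) / m, sum(p[1] for p in points) / m)


def best_rotation(src, dst):
    """Rotate centred src onto centred dst; return (rotated, residual)."""
    a = sum(x0 * y0 + x1 * y1 for (x0, x1), (y0, y1) in zip(src, dst))
    b = sum(x0 * y1 - x1 * y0 for (x0, x1), (y0, y1) in zip(src, dst))
    theta = math.atan2(b, a)  # angle that maximizes the summed dot products
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x0 - s * x1, s * x0 + c * x1) for x0, x1 in src]
    residual = sum((r0 - y0) ** 2 + (r1 - y1) ** 2
                   for (r0, r1), (y0, y1) in zip(rotated, dst))
    return rotated, residual


def orient_2d(points, target):
    """Superimpose points onto target (no scaling)."""
    px, py = centroid(points)
    tx, ty = centroid(target)
    src = [(x - px, y - py) for x, y in points]
    dst = [(x - tx, y - ty) for x, y in target]
    mirrored = [(x, -y) for x, y in src]
    # Try with and without a reflection and keep the better fit.
    rotated, _ = min(best_rotation(src, dst), best_rotation(mirrored, dst),
                     key=lambda pair: pair[1])
    # Translate onto the centroid of the target.
    return [(x + tx, y + ty) for x, y in rotated]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
# Swap the axes (a reflection combined with a rotation), then translate.
target = [(y + 10.0, x - 5.0) for x, y in points]
oriented = orient_2d(points, target)
```

Because target is an exact rigid transformation of points, oriented should coincide with target up to floating-point error.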

Parameters:
  • other (pymds.Projection, pandas.DataFrame or array-like) – The other dataset to orient this projection to. If other is an instance of pymds.Projection or pandas.DataFrame, then other must have indexes in common with this projection. If array-like, then other must have the same dimensions as self.coords.
  • index (list-like or None) – If other is an instance of pandas.DataFrame or pymds.Projection, then orient this projection to other using only the samples in index.
  • inplace (bool) – If True, update the coordinates of this projection in place. Otherwise, return a new instance of pymds.Projection.
  • scaling (bool) – Allow scaling. (Not implemented yet).

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from pymds import Projection
>>> array = np.random.randn(10, 2)
>>> pro = Projection(pd.DataFrame(array))
>>> # Flip left-right, rotate 90 deg and translate
>>> other = np.fliplr(array)
>>> other = np.dot(other, np.array([[0, -1], [1, 0]]))
>>> other += np.array([10, -5])
>>> oriented = pro.orient_to(other)
>>> np.abs(oriented.coords.values - other).sum() < 1e-6
True

Returns: The oriented projection, if inplace=False.
Return type: pymds.Projection

plot(**kwds)

Plot the coordinates in the first two dimensions of the projection.

Removes axis and tick labels, and sets the grid spacing to 1 unit. One way to display the grid is to use Seaborn's 'whitegrid' style, as in the example below.

Parameters: **kwds – Passed to pandas.DataFrame.plot.scatter().

Examples

>>> from pymds import DistanceMatrix
>>> import pandas as pd
>>> import seaborn as sns
>>> sns.set_style('whitegrid')
>>> dist = pd.DataFrame({
...    'a': [0.0, 1.0, 2.0],
...    'b': [1.0, 0.0, 3 ** 0.5],
...    'c': [2.0, 3 ** 0.5, 0.0]} , index=['a', 'b', 'c'])
>>> dm = DistanceMatrix(dist)
>>> pro = dm.optimize()
>>> ax = pro.plot(c='black', s=50, edgecolor='white')

Returns: matplotlib.axes.Axes

plot_lines_to(other, index=None, **kwds)

Plot lines from samples shared between this projection and another dataset.

Parameters:

Examples

>>> import numpy as np
>>> from pymds import Projection
>>> pro = Projection(np.random.randn(50, 2))
>>> R = np.array([[0, -1], [1, 0]])
>>> other = np.dot(pro.coords, R)  # Rotate 90 deg
>>> ax = pro.plot(c='black', edgecolor='white', zorder=20)
>>> ax = pro.plot_lines_to(other, linewidths=0.3)

Returns: matplotlib.axes.Axes