Analysis

class MolearnAnalysis[source]

This class provides methods dedicated to the quality analysis of a trained model.
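
The methods documented below are typically chained in a fixed order: load a network, register a dataset, then query errors and scan the latent space. The following is a minimal sketch of that order, not an official recipe. The helper name `summarise_model` and the dataset key `"training"` are hypothetical, and `analysis`, `network` and `data` are assumed to be a MolearnAnalysis instance, a trained model and a PDBData object created elsewhere.

```python
def summarise_model(analysis, network, data, samples=64):
    """Run a basic reconstruction check on one dataset.

    `analysis` is expected to behave like a MolearnAnalysis instance.
    `network` and `data` are prepared elsewhere (outside this sketch).
    """
    analysis.set_network(network)
    analysis.set_dataset("training", data)  # "training" is an arbitrary label

    # Per-structure RMSD between inputs and their encoded-decoded counterparts.
    rmsd = analysis.get_error("training", align=True)

    # Define a latent-space grid bounded by the dataset, then scan it.
    analysis.setup_grid(samples=samples, bounds_from="training")
    rmsd_surface, z_drift_surface, xvals, yvals = analysis.scan_error()

    return {
        "rmsd": rmsd,
        "rmsd_surface": rmsd_surface,
        "z_drift_surface": z_drift_surface,
        "xvals": xvals,
        "yvals": yvals,
    }
```

Only calls whose signatures appear in this reference are used; anything beyond them (for example how the network is trained or how the data object is built) is deliberately left out.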

generate(crd)[source]

Generate a collection of protein conformations, given coordinates in the latent space.

Parameters:

crd (numpy.array) – coordinates in the latent space, as a (Nx2) array

Returns:

collection of protein conformations in the Cartesian space (NxMx3, where M is the number of atoms in the protein)

get_all_dope_score(tensor, refine=True)[source]

Calculate DOPE score of an ensemble of atom coordinates.

Parameters:
  • tensor

  • refine (bool) – if True, return DOPE score of input and output structure after refinement

get_all_ramachandran_score(tensor)[source]

Calculate Ramachandran score of an ensemble of atomic coordinates.

Parameters:

tensor

get_dataset(key)[source]
Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset

get_decoded(key)[source]
Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset

get_dope(key, refine=True, **kwargs)[source]
Parameters:
  • key (str) – key pointing to a dataset previously loaded with set_dataset

  • refine (bool) – if True, refine structures before calculating DOPE score

Returns:

dictionary containing DOPE score of dataset, and its decoded counterpart

get_encoded(key)[source]
Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset

Returns:

array containing the encoding in latent space of dataset associated with key

get_error(key, align=True)[source]

Calculate the reconstruction error of a dataset encoded and decoded by a trained neural network.

Parameters:
  • key (str) – key pointing to a dataset previously loaded with set_dataset

  • align (bool) – if True, the RMSD will be calculated by finding the optimal alignment between structures

Returns:

1D array containing the RMSD between input structures and their encoded-decoded counterparts
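
For reference, the RMSD reported for each structure can be illustrated with a dependency-free sketch. This is not the library's implementation: it assumes conformations are given as plain lists of (x, y, z) tuples, and it covers only the unaligned case. With align=True an optimal superposition would be found first, which is omitted here.

```python
import math

def rmsd(structure_a, structure_b):
    """RMSD between two conformations given as lists of (x, y, z) tuples."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must contain the same number of atoms")
    # Sum of squared coordinate differences over all atoms and axes.
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(structure_a, structure_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(structure_a))

original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoded = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom shifted by 1 along z
print(rmsd(original, decoded))  # 1.0
```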

get_ramachandran(key)[source]
Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset

num_trainable_params()[source]
Returns:

number of trainable parameters in the neural network previously loaded with set_network

reference_dope_score(frame)[source]
Parameters:

frame (numpy.array) – array with shape [1, N, 3] with Cartesian coordinates of atoms

Returns:

DOPE score

scan_custom(fct, params, key)[source]

Generate a surface coloured as a function of a user-defined function.

Parameters:
  • fct – function taking atomic coordinates as input, an optional list of parameters, and returning a single value.

  • params (list) – parameters to be passed to the function fct. If no parameter is needed, pass an empty list.

  • key (str) – name of the dataset generated by this function scan

Returns:
  • latent space NxN surface, evaluated according to input function

  • x-axis values

  • y-axis values
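
As an illustration of the kind of function that could be passed as fct, here is a small sketch measuring the distance between two atoms. The name `atom_distance` is hypothetical, and two things are assumptions rather than documented behaviour: that the coordinates can be indexed as a sequence of (x, y, z) triplets, and that params reaches the function as a single list.

```python
import math

def atom_distance(coords, params):
    """Distance between two atoms whose indices are given in params.

    coords: sequence of (x, y, z) triplets (assumed layout).
    params: list [i, j] with the indices of the two atoms.
    """
    i, j = params
    return math.dist(coords[i], coords[j])

# Stand-alone check on a toy two-atom structure.
toy = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(atom_distance(toy, [0, 1]))  # 5.0
```

Under these assumptions a scan would look like `scan_custom(atom_distance, [0, 1], "dist_0_1")`, called on a MolearnAnalysis instance, with the key chosen freely.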

scan_dope(key=None, refine=True, **kwargs)[source]

Calculate DOPE score on a grid sampling the latent space. Requires a grid system to be defined via a prior call to setup_grid.

Parameters:
  • key (str) – label for the DOPE score surface (default is DOPE_unrefined or DOPE_refined, depending on refine)

  • refine (bool) – if True, structures generated will be energy minimised before DOPE scoring

Returns:
  • DOPE score latent space NxN surface

  • x-axis values

  • y-axis values

scan_error(s_key='Network_RMSD', z_key='Network_z_drift')[source]

Calculate RMSD and z-drift on a grid sampling the latent space. Requires a grid system to be defined via a prior call to setup_grid.

Parameters:
  • s_key (str) – label for RMSD dataset

  • z_key (str) – label for z-drift dataset

Returns:
  • input-to-decoded RMSD latent space NxN surface

  • z-drift latent space NxN surface

  • x-axis values

  • y-axis values

scan_error_from_target(key, index=None, align=True)[source]

Calculate the landscape of RMSD against a single target structure. The target should be a previously loaded dataset containing a single conformation, or one conformation selected from a larger dataset via index.

Parameters:
  • key (str) – key pointing to a dataset previously loaded with set_dataset

  • index (int) – index of conformation to be selected from dataset containing multiple conformations.

  • align (bool) – if True, structures generated from the grid are aligned to the target prior to RMSD calculation.

Returns:
  • RMSD latent space NxN surface

  • x-axis values

  • y-axis values

scan_ramachandran()[source]

Calculate Ramachandran scores on a grid sampling the latent space. Requires a grid system to be defined via a prior call to setup_grid. Saves four surfaces in memory, with keys ‘Ramachandran_favored’, ‘Ramachandran_allowed’, ‘Ramachandran_outliers’, and ‘Ramachandran_total’.

Returns:
  • Ramachandran_favoured latent space NxN surface (ratio of residues in favourable conformation)

  • x-axis values

  • y-axis values

set_dataset(key, data, atomselect='*')[source]
Parameters:
  • data – PDBData object containing atomic coordinates

  • key (str) – label to be associated with data

  • atomselect (list/str) – list of atom names to load, or ‘*’ to indicate that all atoms are loaded.

set_decoded(key, structures)[source]
Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset

set_encoded(key, coords)[source]
Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset

set_network(network)[source]
Parameters:

network – a trained neural network defined in molearn.models

setup_grid(samples=64, bounds_from=None, bounds=None, padding=0.1)[source]

Define a NxN point grid regularly sampling the latent space.

Parameters:
  • samples (int) – grid size (build a samples x samples grid)

  • bounds_from (str/list) – Name(s) of datasets to use as reference, either as single string, a list of strings, or ‘all’

  • bounds (tuple/list) – tuple (xmin, xmax, ymin, ymax) or None

  • padding (float) – define size of extra spacing around boundary conditions (as ratio of axis dimensions)
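
To make the interplay of bounds, samples and padding concrete, here is a dependency-free sketch of how the grid axes could be derived. It is an illustration under stated assumptions, not the library's code: the helper name `grid_axes` is hypothetical, and padding is assumed to be added on both sides of each axis as a fraction of that axis range.

```python
def grid_axes(bounds, samples=64, padding=0.1):
    """Return (xvals, yvals) for a samples x samples grid.

    bounds: tuple (xmin, xmax, ymin, ymax).
    padding: extra margin on each side, as a ratio of the axis range (assumed).
    """
    xmin, xmax, ymin, ymax = bounds
    pad_x = padding * (xmax - xmin)
    pad_y = padding * (ymax - ymin)

    def linspace(lo, hi, n):
        # n regularly spaced values from lo to hi, both included.
        if n == 1:
            return [lo]
        step = (hi - lo) / (n - 1)
        return [lo + k * step for k in range(n)]

    xvals = linspace(xmin - pad_x, xmax + pad_x, samples)
    yvals = linspace(ymin - pad_y, ymax + pad_y, samples)
    return xvals, yvals

xvals, yvals = grid_axes((0.0, 1.0, 0.0, 2.0), samples=5, padding=0.1)
print(len(xvals), len(yvals))  # 5 5
```

With these inputs the x axis spans roughly -0.1 to 1.1 and the y axis roughly -0.2 to 2.2, so the padded grid extends slightly beyond the data it was derived from.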

class MolearnGUI(MA=None)[source]

This class produces an interactive visualisation for data stored in a MolearnAnalysis object, viewable within a Jupyter notebook.

Parameters:

MA – Either MolearnAnalysis instance, or None (default). If None an empty GUI will be produced.

get_path(idx_start, idx_end, landscape, xvals, yvals, smooth=3)[source]

Find the shortest path between two points on a weighted grid.

Parameters:
  • idx_start (int) – index on a 2D grid, as start point for a path

  • idx_end (int) – index on a 2D grid, as end point for a path

  • landscape (numpy.array) – 2D grid

  • xvals (numpy.array) – x-axis values, to yield actual coordinates

  • yvals (numpy.array) – y-axis values, to yield actual coordinates

  • smooth (int) – size of kernel for running average (must be >=1, default 3)

Returns:

array of 2D coordinates each with an associated value on landscape
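
The idea of a shortest path on a weighted grid can be sketched with Dijkstra's algorithm. The search algorithm actually used by get_path is not stated in this reference, so the following is a swapped-in illustration only. The helper name `shortest_path` is hypothetical, cells are addressed as (row, col) tuples for readability, moves are limited to the four direct neighbours, and the smoothing and conversion to xvals/yvals coordinates are omitted.

```python
import heapq

def shortest_path(landscape, start, end):
    """Dijkstra's algorithm on a 2D grid of non-negative weights.

    The cost of a step is the weight of the cell being entered.
    Returns the list of (row, col) cells from start to end, inclusive.
    """
    rows, cols = len(landscape), len(landscape[0])
    best = {start: 0.0}      # cheapest known cost to reach each cell
    previous = {}            # back-pointers used to rebuild the path
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == end:
            break
        if cost > best[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + landscape[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    if end not in best:
        raise ValueError("end is not reachable from start")
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]

# A high-cost ridge (9) in the middle column forces a detour via the bottom row.
toy_landscape = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
print(shortest_path(toy_landscape, (0, 0), (0, 2)))
```

On this toy landscape the path avoids the expensive cells, which mirrors how a path across an error or energy surface would favour low-valued regions of the latent space.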

get_path_aggregate(crd, landscape, xvals, yvals, input_is_index=False)[source]

Create a chain of shortest paths via given waypoints.

Parameters:
  • crd (numpy.array) – waypoints coordinates (Nx2 array)

  • landscape (numpy.array) – 2D grid

  • xvals (numpy.array) – x-axis values, to yield actual coordinates

  • yvals (numpy.array) – y-axis values, to yield actual coordinates

  • input_is_index (bool) – if False, crd is assumed to contain actual coordinates; otherwise, it is assumed to contain graph indices

Returns:

array of 2D coordinates each with an associated value on landscape

oversample(crd, pts=10)[source]

Add extra equally spaced points between a list of points.

Parameters:
  • crd (numpy.array) – Nx2 numpy array with latent space coordinates

  • pts (int) – number of extra points to add in each interval

Returns:

Mx2 numpy array, with M>=N.
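
The effect of oversampling can be illustrated with plain linear interpolation. This sketch is an assumption about the behaviour described above, not the library's code: the helper name `oversample_points` is hypothetical, points are plain (x, y) tuples, and the original points are assumed to be kept, giving M = N + (N - 1) * pts.

```python
def oversample_points(points, pts=10):
    """Insert `pts` equally spaced points inside each interval of a polyline."""
    if not points:
        return []
    dense = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # k = 0 reproduces the interval start; k = 1..pts are the new points.
        for k in range(pts + 1):
            t = k / (pts + 1)
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    dense.append(tuple(points[-1]))  # close the polyline with the last point
    return dense

waypoints = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)]
print(oversample_points(waypoints, pts=1))
# [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 1.0), (4.0, 2.0)]
```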