Analysis¶

class MolearnAnalysis[source]¶

This class provides methods dedicated to the quality analysis of a trained model.

generate(crd)[source]¶

Generate a collection of protein conformations, given coordinates in the latent space.

Parameters: crd (numpy.array) – coordinates in the latent space, as an (Nx2) array
Returns: collection of protein conformations in the Cartesian space (NxMx3, where M is the number of atoms in the protein)

get_all_dope_score(tensor, refine=True)[source]¶

Calculate DOPE score of an ensemble of atom coordinates.

Parameters:

tensor – ensemble of atomic coordinates to be scored
refine (bool) – if True, return DOPE score of input and output structure after refinement

get_all_ramachandran_score(tensor)[source]¶

Calculate Ramachandran score of an ensemble of atomic coordinates.

Parameters: tensor – ensemble of atomic coordinates to be scored

get_dataset(key)[source]¶

Parameters: key (str) – key pointing to a dataset previously loaded with set_dataset

get_decoded(key)[source]¶

Parameters: key (str) – key pointing to a dataset previously loaded with set_dataset

get_dope(key, refine=True, **kwargs)[source]¶

Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset
refine (bool) – if True, refine structures before calculating DOPE score

Returns:

dictionary containing DOPE score of dataset, and its decoded counterpart

get_encoded(key)[source]¶

Parameters: key (str) – key pointing to a dataset previously loaded with set_dataset
Returns: array containing the latent space encoding of the dataset associated with key

get_error(key, align=True)[source]¶

Calculate the reconstruction error of a dataset encoded and decoded by a trained neural network.

Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset
align (bool) – if True, the RMSD will be calculated by finding the optimal alignment between structures

Returns:

1D array containing the RMSD between input structures and their encoded-decoded counterparts
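The quantity returned here can be illustrated with a minimal, dependency-free sketch of the unaligned case (align=False). Plain lists of (x, y, z) tuples stand in for the arrays used by the library, and the helper names `rmsd` and `reconstruction_error` are hypothetical, not part of molearn.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as lists of (x, y, z) atoms.

    No superposition is performed, so this corresponds to align=False.
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def reconstruction_error(inputs, decoded):
    """One RMSD per structure, mirroring the 1D array described above."""
    return [rmsd(x, y) for x, y in zip(inputs, decoded)]


# Hypothetical data: one two-atom structure and a copy shifted by 1 along z.
original = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
rebuilt = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
errors = reconstruction_error(original, rebuilt)
```

In this example the two structures differ only by a translation, so an optimal alignment (align=True) would be expected to reduce the error to zero, whereas the unaligned value is non-zero.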

get_ramachandran(key)[source]¶

Parameters: key (str) – key pointing to a dataset previously loaded with set_dataset

num_trainable_params()[source]¶

Returns: number of trainable parameters in the neural network previously loaded with set_network

reference_dope_score(frame)[source]¶

Parameters: frame (numpy.array) – array with shape [1, N, 3] with Cartesian coordinates of atoms
Returns: DOPE score

scan_custom(fct, params, key)[source]¶

Generate a surface coloured as a function of a user-defined function.

Parameters:

fct – function taking atomic coordinates as input, an optional list of parameters, and returning a single value.
params (list) – parameters to be passed to function fct. If no parameter is needed, pass an empty list.
key (str) – name of the dataset generated by this function scan

Returns:

latent space NxN surface, evaluated according to input function
x-axis values
y-axis values
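As an illustration of the kind of function that fits the description of fct, the sketch below computes a radius of gyration for one conformation. The name `radius_of_gyration`, the `scale` parameter, the use of plain (x, y, z) tuples, and the assumption that params is unpacked positionally after the coordinates are all hypothetical; the exact calling convention and array shape used by scan_custom should be checked against the library source.

```python
import math


def radius_of_gyration(coords, scale=1.0):
    """Return a single value for one conformation (list of (x, y, z) atoms).

    `scale` is a hypothetical extra parameter, used to show how a params
    list could carry additional arguments.
    """
    n = len(coords)
    centre = [sum(atom[i] for atom in coords) / n for i in range(3)]
    squared = sum((atom[i] - centre[i]) ** 2 for atom in coords for i in range(3))
    return scale * math.sqrt(squared / n)


# Assumption: extra parameters are passed positionally after the coordinates.
params = [1.0]
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
value = radius_of_gyration(square, *params)
```

A function of this shape returns one scalar per structure, which is what allows the result to be laid out as a surface over the latent space grid.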

scan_dope(key=None, refine=True, **kwargs)[source]¶

Calculate DOPE score on a grid sampling the latent space. Requires a grid system to be defined via a prior call to setup_grid.

Parameters:

key (str) – label for the DOPE score surface (default is DOPE_unrefined or DOPE_refined, depending on refine)
refine (bool) – if True, structures generated will be energy minimised before DOPE scoring

Returns:

DOPE score latent space NxN surface
x-axis values
y-axis values

scan_error(s_key='Network_RMSD', z_key='Network_z_drift')[source]¶

Calculate RMSD and z-drift on a grid sampling the latent space. Requires a grid system to be defined via a prior call to setup_grid.

Parameters:

s_key (str) – label for RMSD dataset
z_key (str) – label for z-drift dataset

Returns:

input-to-decoded RMSD latent space NxN surface
z-drift latent space NxN surface
x-axis values
y-axis values

scan_error_from_target(key, index=None, align=True)[source]¶

Calculate the landscape of RMSD versus a single target structure. The target should be a previously loaded dataset containing a single conformation.

Parameters:

key (str) – key pointing to a dataset previously loaded with set_dataset
index (int) – index of conformation to be selected from dataset containing multiple conformations.
align (bool) – if True, structures generated from the grid are aligned to the target prior to RMSD calculation.

Returns:

RMSD latent space NxN surface
x-axis values
y-axis values

scan_ramachandran()[source]¶

Calculate Ramachandran scores on a grid sampling the latent space. Requires a grid system to be defined via a prior call to setup_grid. Saves four surfaces in memory, with keys ‘Ramachandran_favored’, ‘Ramachandran_allowed’, ‘Ramachandran_outliers’, and ‘Ramachandran_total’.

Returns:

Ramachandran_favored latent space NxN surface (ratio of residues in favourable conformation)
x-axis values
y-axis values

set_dataset(key, data, atomselect='*')[source]¶

Parameters:

data – PDBData object containing atomic coordinates
key (str) – label to be associated with data
atomselect (list/str) – list of atom names to load, or ‘*’ to indicate that all atoms are loaded.

set_decoded(key, structures)[source]¶

Parameters: key (str) – key pointing to a dataset previously loaded with set_dataset

set_encoded(key, coords)[source]¶

Parameters: key (str) – key pointing to a dataset previously loaded with set_dataset

set_network(network)[source]¶

Parameters: network – a trained neural network defined in molearn.models

setup_grid(samples=64, bounds_from=None, bounds=None, padding=0.1)[source]¶

Define an NxN point grid regularly sampling the latent space.

Parameters:

samples (int) – grid size (build a samples x samples grid)
bounds_from (str/list) – Name(s) of datasets to use as reference, either as single string, a list of strings, or ‘all’
bounds (tuple/list) – tuple (xmin, xmax, ymin, ymax) or None
padding (float) – define size of extra spacing around boundary conditions (as ratio of axis dimensions)
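The interplay between bounds, padding and samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes the padding is applied on both sides of each axis, that both end points are included in the sampling, and that grid points are ordered row by row. The helper names `padded_bounds`, `regular_axis` and `make_grid` are hypothetical.

```python
def padded_bounds(points, padding=0.1):
    """Bounding box of 2D latent coordinates, enlarged by a ratio of each axis span."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    pad_x = padding * (max(xs) - min(xs))
    pad_y = padding * (max(ys) - min(ys))
    return (min(xs) - pad_x, max(xs) + pad_x, min(ys) - pad_y, max(ys) + pad_y)


def regular_axis(lo, hi, samples):
    """`samples` equally spaced values from lo to hi inclusive (samples >= 2)."""
    step = (hi - lo) / (samples - 1)
    return [lo + i * step for i in range(samples)]


def make_grid(bounds, samples=64):
    """Return x-axis values, y-axis values and the samples x samples grid points."""
    xmin, xmax, ymin, ymax = bounds
    xvals = regular_axis(xmin, xmax, samples)
    yvals = regular_axis(ymin, ymax, samples)
    grid = [(x, y) for y in yvals for x in xvals]
    return xvals, yvals, grid


# Hypothetical encoded dataset used as the reference for the bounds.
encoded = [(0.0, 0.0), (10.0, 5.0), (4.0, 2.0)]
bounds = padded_bounds(encoded, padding=0.1)
xvals, yvals, grid = make_grid(bounds, samples=5)
```

Deriving the bounds from encoded datasets corresponds to the bounds_from option, while passing the tuple directly corresponds to bounds.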

class MolearnGUI(MA=None)[source]¶

This class produces an interactive visualisation for data stored in a MolearnAnalysis object, viewable within a Jupyter notebook.

Parameters: MA – Either a MolearnAnalysis instance, or None (default). If None, an empty GUI will be produced.

get_path(idx_start, idx_end, landscape, xvals, yvals, smooth=3)[source]¶

Find the shortest path between two points on a weighted grid.

Parameters:

idx_start (int) – index on a 2D grid, as start point for a path
idx_end (int) – index on a 2D grid, as end point for a path
landscape (numpy.array) – 2D grid
xvals (numpy.array) – x-axis values, to yield actual coordinates
yvals (numpy.array) – y-axis values, to yield actual coordinates
smooth (int) – size of kernel for running average (must be >=1, default 3)

Returns:

array of 2D coordinates, each with an associated value on the landscape
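To make the idea of a shortest path on a weighted grid concrete, here is a minimal sketch using Dijkstra's algorithm with 4-connected neighbours. The choice of algorithm, the connectivity, the use of (row, column) tuples instead of integer indices, the mapping of rows to y and columns to x, and the omission of the smoothing step are all assumptions made for illustration; the function names are hypothetical and the library may proceed differently.

```python
import heapq


def shortest_path(landscape, start, end):
    """Dijkstra on a 2D grid of non-negative weights.

    Moving into a cell costs that cell's value; cells are (row, col) tuples.
    """
    rows, cols = len(landscape), len(landscape[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == end:
            break
        if cost > best.get(cell, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + landscape[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    if end not in best:
        raise ValueError("end point is not reachable from start point")
    # Walk back from the end point to recover the path.
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


def to_coordinates(path, xvals, yvals):
    """Convert (row, col) grid cells into actual (x, y) coordinates."""
    return [(xvals[c], yvals[r]) for r, c in path]


# Hypothetical landscape: a high-cost ridge in the middle column.
landscape = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
path = shortest_path(landscape, (0, 0), (0, 2))
coords = to_coordinates(path, [0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
```

In this example the path avoids the ridge and goes around it through the bottom row, even though the direct route crosses fewer cells.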

get_path_aggregate(crd, landscape, xvals, yvals, input_is_index=False)[source]¶

Create a chain of shortest paths via given waypoints.

Parameters:

crd (numpy.array) – waypoint coordinates (Nx2 array)
landscape (numpy.array) – 2D grid
xvals (numpy.array) – x-axis values, to yield actual coordinates
yvals (numpy.array) – y-axis values, to yield actual coordinates
input_is_index (bool) – if False, assume crd contains actual coordinates, graph indexing otherwise

Returns:

array of 2D coordinates, each with an associated value on the landscape
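Chaining paths through waypoints can be sketched independently of how each individual segment is found. In the hypothetical example below, `chain_paths` accepts any path-finding callable, and `l_shaped_path` is only a stand-in for a real shortest-path search. Dropping the repeated waypoint where two segments meet is an assumption, not confirmed behaviour of the library.

```python
def l_shaped_path(start, end):
    """Stand-in path finder: walk along the row axis first, then the column axis."""
    (r, c), (r1, c1) = start, end
    path = [(r, c)]
    while r != r1:
        r += 1 if r1 > r else -1
        path.append((r, c))
    while c != c1:
        c += 1 if c1 > c else -1
        path.append((r, c))
    return path


def chain_paths(waypoints, find_path):
    """Concatenate the paths between consecutive waypoints into one path."""
    full = []
    for start, end in zip(waypoints, waypoints[1:]):
        segment = find_path(start, end)
        if full:
            segment = segment[1:]  # the shared waypoint is already in `full`
        full.extend(segment)
    return full


# Hypothetical waypoints given as (row, col) grid indices.
waypoints = [(0, 0), (2, 0), (2, 2)]
route = chain_paths(waypoints, l_shaped_path)
```

Passing the path finder as an argument keeps the chaining logic separate from the search itself, so the same function works with any segment-level algorithm.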

oversample(crd, pts=10)[source]¶

Add extra equally spaced points between a list of points.

Parameters:

crd (numpy.array) – Nx2 numpy array with latent space coordinates
pts (int) – number of extra points to add in each interval

Returns:

Mx2 numpy array, with M>=N.
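A minimal sketch of this kind of oversampling, assuming linear interpolation between consecutive points and that the original points are kept in the output. Plain lists of (x, y) tuples stand in for the Nx2 array, and the name `oversample_sketch` is hypothetical.

```python
def oversample_sketch(crd, pts=10):
    """Insert `pts` equally spaced points in each interval between consecutive points."""
    if len(crd) < 2:
        return [tuple(p) for p in crd]
    out = []
    for (x0, y0), (x1, y1) in zip(crd, crd[1:]):
        # k = 0 emits the interval's start point, k = 1..pts the new points.
        for k in range(pts + 1):
            t = k / (pts + 1)
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(tuple(crd[-1]))  # close the path with the final original point
    return out


# Hypothetical waypoints in latent space.
waypoints = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
dense = oversample_sketch(waypoints, pts=3)
```

Under these assumptions, N input points with pts extra points per interval yield M = N + (N - 1) * pts output points, which is consistent with M >= N.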
