Analysis
- class MolearnAnalysis
This class provides methods dedicated to the quality analysis of a trained model.
- generate(crd)
Generate a collection of protein conformations, given coordinates in the latent space.
- Parameters:
crd (numpy.array) – coordinates in the latent space, as a (Nx2) array
- Returns:
collection of protein conformations in the Cartesian space (NxMx3, where M is the number of atoms in the protein)
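generate expects an (Nx2) array of latent coordinates. As a minimal sketch (the helper `interpolate_latent` and the chosen start and end points are hypothetical, not part of molearn), such an array can be built by linear interpolation between two latent points, for example to trace a straight transition path:

```python
import numpy as np

def interpolate_latent(start, end, n_points):
    """Return an (n_points x 2) array of evenly spaced latent coordinates
    from start to end, both endpoints included."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    t = np.linspace(0.0, 1.0, n_points)[:, None]  # column of interpolation weights
    return (1.0 - t) * start + t * end

crd = interpolate_latent([0.0, 0.0], [1.0, 2.0], 5)
print(crd.shape)  # (5, 2)
# crd has the (Nx2) layout documented for generate(), which should return
# an (N x M x 3) array of conformations.
```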
- get_all_dope_score(tensor, refine=True)
Calculate the DOPE score of an ensemble of atomic coordinates.
- Parameters:
tensor – ensemble of atomic coordinates
refine (bool) – if True, return the DOPE scores of the input structures and of the output structures after refinement
- get_all_ramachandran_score(tensor)
Calculate the Ramachandran score of an ensemble of atomic coordinates.
- Parameters:
tensor – ensemble of atomic coordinates
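Ramachandran scores are derived from the backbone dihedral angles φ and ψ of each residue. The scoring step itself is not shown here; as background only, the sketch below computes the dihedral angle defined by four atom positions. The function `dihedral` is illustrative and is not molearn's implementation.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees) defined by four points, in the range [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)        # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1        # projection of b0 perpendicular to b1
    w = b2 - np.dot(b2, b1) * b1        # projection of b2 perpendicular to b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 1.0])
cis = dihedral([1.0, 0.0, 0.0], p1, p2, [1.0, 0.0, 1.0])     # approximately 0
trans = dihedral([1.0, 0.0, 0.0], p1, p2, [-1.0, 0.0, 1.0])  # magnitude 180
```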
- get_dataset(key)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
- get_decoded(key)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
- get_dope(key, refine=True, **kwargs)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
refine (bool) – if True, refine structures before calculating DOPE score
- Returns:
dictionary containing DOPE score of dataset, and its decoded counterpart
- get_encoded(key)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
- Returns:
array containing the latent-space encoding of the dataset associated with key
- get_error(key, align=True)
Calculate the reconstruction error of a dataset encoded and decoded by a trained neural network.
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
align (bool) – if True, the RMSD will be calculated by finding the optimal alignment between structures
- Returns:
1D array containing the RMSD between input structures and their encoded-decoded counterparts
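The effect of align can be illustrated with a minimal NumPy sketch of RMSD after optimal superposition. It uses the Kabsch algorithm, which is an assumption here: molearn's actual implementation may differ (for instance, it may operate on PyTorch tensors). The function `rmsd` is hypothetical.

```python
import numpy as np

def rmsd(a, b, align=True):
    """RMSD between two (M x 3) coordinate arrays. If align is True, b is first
    optimally superposed onto a with the Kabsch algorithm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if align:
        a = a - a.mean(axis=0)               # remove translation
        b = b - b.mean(axis=0)
        h = b.T @ a                          # 3x3 covariance matrix
        u, _, vt = np.linalg.svd(h)
        d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
        rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
        b = b @ rot.T                        # apply the optimal rotation to b
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Demo: a rotated and translated copy has (near) zero RMSD once aligned
rng = np.random.default_rng(0)
a = rng.normal(size=(10, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
b = a @ rot_z.T + np.array([1.0, 2.0, 3.0])
aligned = rmsd(a, b)
unaligned = rmsd(a, b, align=False)
```

Applying such a function to each input structure and its encoded-decoded counterpart would yield a 1D array like the one get_error is documented to return.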
- get_ramachandran(key)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
- num_trainable_params()
- Returns:
number of trainable parameters in the neural network previously loaded with set_network
- reference_dope_score(frame)
- Parameters:
frame (numpy.array) – array with shape [1, N, 3] with Cartesian coordinates of atoms
- Returns:
DOPE score
- scan_custom(fct, params, key)
Generate a latent-space surface coloured according to a user-defined function.
- Parameters:
fct – function taking atomic coordinates as input, an optional list of parameters, and returning a single value.
params (list) – parameters to be passed to fct. If no parameter is needed, pass an empty list.
key (str) – name of the dataset generated by this scan
- Returns:
latent space NxN surface, evaluated according to the input function
x-axis values
y-axis values
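As an illustration of a function that could be passed as fct, the sketch below measures the distance between two atoms. It assumes, without confirmation from this page, that fct is called as fct(coords, *params) with coords shaped (1, M, 3) like the frame argument of reference_dope_score. The function name and atom indices are hypothetical.

```python
import numpy as np

def atom_distance(coords, i=0, j=-1):
    """Distance between atoms i and j in a (1 x M x 3) coordinate array.
    Assumes params are unpacked as extra positional arguments (i, j)."""
    xyz = np.asarray(coords, dtype=float).reshape(-1, 3)  # flatten to (M x 3)
    return float(np.linalg.norm(xyz[i] - xyz[j]))

frame = np.array([[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]]])
params = [0, 1]  # hypothetical atom indices passed through the params list
print(atom_distance(frame, *params))  # 5.0
# Hypothetical call on an analysis object:
# analysis.scan_custom(atom_distance, [0, 1], "distance_0_1")
```

If scan_custom instead passes the parameter list as a single argument, the signature would need to become atom_distance(coords, params).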
- scan_dope(key=None, refine=True, **kwargs)
Calculate the DOPE score on a grid sampling the latent space. Requires a grid to be defined via a prior call to setup_grid.
- Parameters:
key (str) – label for the DOPE score surface (defaults to DOPE_unrefined or DOPE_refined, depending on refine)
refine (bool) – if True, structures generated will be energy minimised before DOPE scoring
- Returns:
DOPE score latent space NxN surface
x-axis values
y-axis values
- scan_error(s_key='Network_RMSD', z_key='Network_z_drift')
Calculate RMSD and z-drift on a grid sampling the latent space. Requires a grid to be defined via a prior call to setup_grid.
- Parameters:
s_key (str) – label for RMSD dataset
z_key (str) – label for z-drift dataset
- Returns:
input-to-decoded RMSD latent space NxN surface
z-drift latent space NxN surface
x-axis values
y-axis values
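This page does not spell out how the two surfaces are computed. The sketch below shows one plausible interpretation, labelled as an assumption: each grid point z is decoded into a structure, re-encoded, and decoded again. The RMSD compares the two decoded structures (without alignment), and the z-drift is the latent distance between z and its re-encoded position. The toy linear encode/decode functions are hypothetical stand-ins for a trained network.

```python
import numpy as np

def scan_error_sketch(grid_z, encode, decode):
    """Per latent point: decode, re-encode, decode again.
    Returns (rmsd, z_drift) arrays of length N for an (N x 2) grid_z."""
    structs = decode(grid_z)           # (N, M, 3)
    z_again = encode(structs)          # (N, 2)
    structs_again = decode(z_again)    # (N, M, 3)
    sq = np.sum((structs - structs_again) ** 2, axis=2)  # per-atom squared error
    rmsd_vals = np.sqrt(np.mean(sq, axis=1))
    drift_vals = np.linalg.norm(grid_z - z_again, axis=1)
    return rmsd_vals, drift_vals

# Toy linear "network" (hypothetical): decode maps 2D latent points to M atoms
M = 4
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3 * M))
W_pinv = np.linalg.pinv(W)

def decode(z):
    return (np.asarray(z) @ W).reshape(len(z), M, 3)

def encode(structs):
    return np.asarray(structs).reshape(len(structs), -1) @ W_pinv

grid_z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# A self-consistent encoder/decoder pair gives zero RMSD and zero drift
rmsd_vals, drift_vals = scan_error_sketch(grid_z, encode, decode)
# An encoder that shrinks latent coordinates produces non-zero drift
lossy_rmsd, lossy_drift = scan_error_sketch(grid_z, lambda s: 0.5 * encode(s), decode)
```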
- scan_error_from_target(key, index=None, align=True)
Calculate a landscape of RMSD against a single target structure. The target should be a previously loaded dataset containing a single conformation; if the dataset contains multiple conformations, select one with index.
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
index (int) – index of the conformation to select from a dataset containing multiple conformations
align (bool) – if True, structures generated from the grid are aligned to the target prior to RMSD calculation
- Returns:
RMSD latent space NxN surface
x-axis values
y-axis values
- scan_ramachandran()
Calculate Ramachandran scores on a grid sampling the latent space. Requires a grid to be defined via a prior call to setup_grid. Saves four surfaces in memory, with keys ‘Ramachandran_favored’, ‘Ramachandran_allowed’, ‘Ramachandran_outliers’, and ‘Ramachandran_total’.
- Returns:
Ramachandran_favored latent space NxN surface (ratio of residues in favourable conformation)
x-axis values
y-axis values
- set_dataset(key, data, atomselect='*')
- Parameters:
key (str) – label to be associated with data
data – PDBData object containing atomic coordinates
atomselect (list/str) – list of atom names to load, or ‘*’ to indicate that all atoms are loaded.
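The atomselect argument accepts either a list of atom names or '*'. A minimal sketch of that selection logic on a plain list of atom names follows; the helper `select_atoms` is hypothetical, and the real filtering is done internally by molearn on PDBData objects.

```python
import numpy as np

def select_atoms(atom_names, atomselect="*"):
    """Return indices of atoms to keep: all atoms for '*', otherwise those
    whose name appears in atomselect (a list of names or a single name)."""
    if atomselect == "*":
        return np.arange(len(atom_names))
    if isinstance(atomselect, str):
        atomselect = [atomselect]  # treat a single name as a one-element list
    wanted = set(atomselect)
    return np.array([i for i, name in enumerate(atom_names) if name in wanted], dtype=int)

names = ["N", "CA", "C", "O", "CB", "N", "CA", "C", "O"]
print(select_atoms(names, ["CA"]))  # [1 6]
```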
- set_decoded(key, structures)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
- set_encoded(key, coords)
- Parameters:
key (str) – key pointing to a dataset previously loaded with set_dataset
- set_network(network)
- Parameters:
network – a trained neural network defined in molearn.models
- setup_grid(samples=64, bounds_from=None, bounds=None, padding=0.1)
Define a NxN point grid regularly sampling the latent space.
- Parameters:
samples (int) – grid size (build a samples x samples grid)
bounds_from (str/list) – Name(s) of datasets to use as reference, either as single string, a list of strings, or ‘all’
bounds (tuple/list) – tuple (xmin, xmax, ymin, ymax) or None
padding (float) – size of the extra spacing added around the boundaries (as a ratio of axis dimensions)
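A minimal sketch of how a padded samples x samples grid could be derived from reference latent coordinates, assuming padding is applied as a fraction of each axis range on both sides (molearn's exact convention may differ). The function `make_grid` is hypothetical and takes an encoded (Nx2) array directly rather than dataset names.

```python
import numpy as np

def make_grid(encoded, samples=64, padding=0.1):
    """Build a samples x samples grid covering the bounding box of encoded
    (an N x 2 array of latent coordinates), enlarged by padding on each side."""
    encoded = np.asarray(encoded, dtype=float)
    xmin, ymin = encoded.min(axis=0)
    xmax, ymax = encoded.max(axis=0)
    xpad = (xmax - xmin) * padding
    ypad = (ymax - ymin) * padding
    xvals = np.linspace(xmin - xpad, xmax + xpad, samples)
    yvals = np.linspace(ymin - ypad, ymax + ypad, samples)
    xx, yy = np.meshgrid(xvals, yvals)  # rows follow y, columns follow x
    # Flatten to a (samples*samples x 2) array, the Nx2 layout used by generate()
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    return grid, xvals, yvals

grid, xvals, yvals = make_grid([[0.0, 0.0], [10.0, 5.0]], samples=3, padding=0.1)
```

With this layout, per-point results can be reshaped with values.reshape(samples, samples) so that the row index follows yvals and the column index follows xvals.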
- class MolearnGUI(MA=None)
This class produces an interactive visualisation for data stored in a MolearnAnalysis object, viewable within a Jupyter notebook.
- Parameters:
MA – either a MolearnAnalysis instance, or None (default). If None, an empty GUI will be produced.
- get_path(idx_start, idx_end, landscape, xvals, yvals, smooth=3)
Find the shortest path between two points on a weighted grid.
- Parameters:
idx_start (int) – index on a 2D grid, as start point for a path
idx_end (int) – index on a 2D grid, as end point for a path
landscape (numpy.array) – 2D grid
xvals (numpy.array) – x-axis values, to yield actual coordinates
yvals (numpy.array) – y-axis values, to yield actual coordinates
smooth (int) – size of kernel for running average (must be >=1, default 3)
- Returns:
array of 2D coordinates, each with an associated value on landscape
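A minimal sketch of a shortest path between two flat indices on a weighted 2D grid, using Dijkstra's algorithm over 4-connected neighbours where entering a cell costs that cell's landscape value. This cost model is an assumption for illustration, not necessarily the one get_path uses; the function name is hypothetical, the running-average smoothing controlled by smooth is omitted, and landscape values must be non-negative for Dijkstra to be correct.

```python
import heapq
import numpy as np

def grid_shortest_path(landscape, idx_start, idx_end):
    """Dijkstra shortest path between two flat indices on a 2D grid.
    Moving into a cell costs that cell's (non-negative) landscape value;
    neighbours are the 4 adjacent cells. Returns the list of flat indices."""
    landscape = np.asarray(landscape, dtype=float)
    nrows, ncols = landscape.shape
    dist = {idx_start: 0.0}
    prev = {}
    heap = [(0.0, idx_start)]
    visited = set()
    while heap:
        d, idx = heapq.heappop(heap)
        if idx in visited:
            continue
        visited.add(idx)
        if idx == idx_end:
            break
        r, c = divmod(idx, ncols)  # flat index -> (row, column)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols:
                nidx = nr * ncols + nc
                nd = d + landscape[nr, nc]
                if nd < dist.get(nidx, float("inf")):
                    dist[nidx] = nd
                    prev[nidx] = idx
                    heapq.heappush(heap, (nd, nidx))
    # Walk back from the end point to recover the path
    path = [idx_end]
    while path[-1] != idx_start:
        path.append(prev[path[-1]])
    return path[::-1]

# A costly centre cell forces the path around the edge of a 3x3 grid
landscape = np.array([[1, 1, 1], [1, 100, 1], [1, 1, 1]])
path = grid_shortest_path(landscape, 0, 8)
```

Assuming rows follow yvals and columns follow xvals, each flat index i maps to the latent point (xvals[i % ncols], yvals[i // ncols]).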
- get_path_aggregate(crd, landscape, xvals, yvals, input_is_index=False)
Create a chain of shortest paths via given waypoints.
- Parameters:
crd (numpy.array) – waypoints coordinates (Nx2 array)
landscape (numpy.array) – 2D grid
xvals (numpy.array) – x-axis values, to yield actual coordinates
yvals (numpy.array) – y-axis values, to yield actual coordinates
input_is_index (bool) – if False, crd is assumed to contain actual coordinates; if True, it is assumed to contain grid indices
- Returns:
array of 2D coordinates, each with an associated value on landscape
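When input_is_index is False, waypoints given as latent coordinates must first be mapped onto grid indices, and the shortest paths between consecutive waypoints chained together. A minimal sketch of both steps follows; the helper names and the layout (rows follow yvals, columns follow xvals, nearest node chosen per axis) are assumptions, not molearn's documented behaviour.

```python
import numpy as np

def coords_to_flat_indices(crd, xvals, yvals):
    """Map (N x 2) latent coordinates to flat indices of the nearest grid nodes,
    assuming grid rows follow yvals and columns follow xvals."""
    crd = np.asarray(crd, dtype=float)
    xvals = np.asarray(xvals, dtype=float)
    yvals = np.asarray(yvals, dtype=float)
    # Nearest node along each axis, found independently
    cols = np.abs(crd[:, 0][:, None] - xvals[None, :]).argmin(axis=1)
    rows = np.abs(crd[:, 1][:, None] - yvals[None, :]).argmin(axis=1)
    return rows * len(xvals) + cols

def chain_segments(segments):
    """Concatenate consecutive path segments (lists of flat indices), dropping
    the joint that ends one segment and starts the next."""
    chained = list(segments[0])
    for seg in segments[1:]:
        chained.extend(seg[1:])
    return chained

xvals = np.linspace(0.0, 1.0, 5)  # 0, 0.25, 0.5, 0.75, 1
yvals = np.linspace(0.0, 2.0, 3)  # 0, 1, 2
waypoints = np.array([[0.1, 0.2], [0.74, 1.6], [1.0, 2.0]])
idx = coords_to_flat_indices(waypoints, xvals, yvals)
print(idx.tolist())  # [0, 13, 14]
```

Each consecutive pair of indices would then be passed to a shortest-path routine, and the resulting segments joined with chain_segments.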