WaterNetworkAnalysis

WaterNetworkAnalysis functions

`WaterNetworkAnalysis.align_trajectory`	Align the trajectory.
`WaterNetworkAnalysis.align_and_extract_waters`	Align and extract waters from a trajectory.
`WaterNetworkAnalysis.extract_waters_from_trajectory`	Extract waters for clustering analysis.
`WaterNetworkAnalysis.read_results_and_make_pdb`	Read results from files and generate a pdb file.
`WaterNetworkAnalysis.make_results_pdb_MDA`	Generate pdb file with clustering results.
`WaterNetworkAnalysis.get_selection_string_from_resnums`	Return selection string for given residue ids.
`WaterNetworkAnalysis.get_center_of_selection`	Compute centre of selection with MDAnalysis.
`WaterNetworkAnalysis.calculate_oxygen_density_map`	Generate oxygen density maps.

WaterNetworkAnalysis: module for preparing raw trajectories for the analysis of conserved waters with ConservedWaterSearch.

WaterNetworkAnalysis.align_and_extract_waters(center_for_water_selection: np.ndarray, trajectory: str, aligned_trajectory_filename: str, align_target_file_name: str, topology: str | None = None, every: int = 1, align_mode: str = 'mda', align_target: int | None = -1, align_selection: str = 'protein', probis_exec: str | None = None, dist: float = 12.0, SOL: str = 'SOL', OW: str = 'OW', HW: str = 'HW') → tuple[np.ndarray]

Align and extract waters from a trajectory.

Aligns the trajectory first and then extracts water molecules for further water clustering analysis. If the trajectory has already been aligned, use extract_waters_from_trajectory() instead to extract the water molecules for water clustering analysis.

Parameters:

center_for_water_selection (np.ndarray) – Coordinates around which all water molecules inside a radius dist will be selected for water clustering analysis.
trajectory (str) – File name of the trajectory from which waters will be extracted.
aligned_trajectory_filename (str) – File name to which aligned trajectory will be saved.
align_target_file_name (str) – File name for saving the align target (usually a pdb file) if align_target is an int. If align_target is None, the align target is read from this file instead!
topology (str | None, optional) – Topology file name. Defaults to None.
every (int, optional) – Stride between snapshots used for alignment: every n-th snapshot is taken, with n = every; every = 1 uses all snapshots. Defaults to 1.
align_mode (str, optional) – Align algorithm to use. “mda” uses MDAnalysis while “probis” uses the probis algorithm. Defaults to “mda”.
align_target (int | None, optional) – Align target. If None, the align target is read from align_target_file_name. If an integer is given, that snapshot of the trajectory is used as the align target; -1 uses the last snapshot. Defaults to -1.
align_selection (str, optional) – Selection to align to. Defaults to “protein”.
probis_exec (str | None, optional) – Location of the probis executable, used only when align_mode is “probis”. If None, it is downloaded from the internet. Defaults to None.
dist (float, optional) – Radius around center_for_water_selection to be used for extraction of water molecules. Defaults to 12.0.
SOL (str, optional) – Residue name for waters. Defaults to “SOL”.
OW (str, optional) – Name of the oxygen atom. Defaults to “OW”.
HW (str, optional) – Name of the hydrogen atom. Defaults to “HW”.

Returns:

Coordinates of the oxygen atoms, the first hydrogen atoms and the second hydrogen atoms in three separate numpy arrays. Each row of each array holds the coordinates of a single water molecule.

Return type:

tuple[np.ndarray, np.ndarray]

Example:

# Generate water coordinates for clustering analysis from an unaligned trajectory
from WaterNetworkAnalysis import (
    align_and_extract_waters, get_center_of_selection, get_selection_string_from_resnums,
)
resids = [8, 12, 143, 144]
selection = get_selection_string_from_resnums(resids)
center = get_center_of_selection(selection, "trajectory.xtc", topology="topology.tpr")
water_coordinates = align_and_extract_waters(
    center,
    trajectory="trajectory.xtc",
    aligned_trajectory_filename="aligned_trj.xtc",
    align_target_file_name="aligned.pdb",
    topology="topology.tpr",
    every=1,
    align_mode="mda",
    align_target=0,
    align_selection="protein",
    dist=10.0,
)
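To illustrate how center_for_water_selection and dist interact, the pure-Python sketch below keeps every water whose oxygen lies within dist of the centre and returns oxygen, first-hydrogen and second-hydrogen coordinates row by row. It is a hypothetical helper (select_waters_within is not part of WaterNetworkAnalysis, which performs the selection with MDAnalysis), and it assumes that the distance criterion is applied to the oxygen atom only.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]
Water = tuple[Vec3, Vec3, Vec3]  # (O, H1, H2) coordinates of one molecule


def select_waters_within(
    center: Vec3, waters: Sequence[Water], dist: float = 12.0
) -> tuple[list[Vec3], list[Vec3], list[Vec3]]:
    """Hypothetical helper: keep waters whose oxygen is within `dist` of `center`.

    Row i of each returned list belongs to the same water molecule.
    """
    oxygens: list[Vec3] = []
    first_h: list[Vec3] = []
    second_h: list[Vec3] = []
    for o, h1, h2 in waters:
        # Assumption: only the oxygen position decides whether a water is kept.
        if math.dist(center, o) <= dist:
            oxygens.append(o)
            first_h.append(h1)
            second_h.append(h2)
    return oxygens, first_h, second_h


# One water near the centre and one 20 Å away from it.
waters = [
    ((0.0, 0.0, 1.0), (0.0, 0.8, 1.6), (0.0, -0.8, 1.6)),
    ((20.0, 0.0, 0.0), (20.0, 0.8, 0.6), (20.0, -0.8, 0.6)),
]
O, H1, H2 = select_waters_within((0.0, 0.0, 0.0), waters, dist=12.0)
print(O)  # [(0.0, 0.0, 1.0)]
```

Keeping three parallel lists mirrors the row-per-molecule layout described under Returns.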

WaterNetworkAnalysis.align_trajectory(trajectory: str, output_trj_file: str, align_target_file_name: str, topology: str | None = None, every: int = 1, align_mode: str = 'mda', align_target: int | None = -1, align_selection: str = 'protein', probis_exec: str | None = None) → None

Align the trajectory.

Before running water clustering to identify conserved water molecules, the trajectory should be aligned. Alignment can be done with MDAnalysis or with the probis algorithm; by default the whole protein is aligned. The align reference can be chosen in one of two ways: set align_target to a snapshot index, in which case that snapshot is saved to align_target_file_name, OR set align_target to None, in which case the structure in align_target_file_name is read and used as the align target.
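The two ways of choosing the reference amount to a small branch. The function below is a hypothetical, pure-Python illustration of that decision only; it does not read or write any files, and treating -1 as the last snapshot follows the align_target description.

```python
from typing import Optional


def describe_align_reference(align_target: Optional[int], align_target_file_name: str) -> str:
    """Hypothetical helper describing which reference structure would be used."""
    if align_target is None:
        # The reference is read from an existing structure file.
        return f"read reference from {align_target_file_name}"
    # Otherwise a snapshot of the trajectory becomes the reference and is saved.
    snapshot = "last snapshot" if align_target == -1 else f"snapshot {align_target}"
    return f"use {snapshot} as reference and save it to {align_target_file_name}"


print(describe_align_reference(None, "aligned.pdb"))
print(describe_align_reference(0, "aligned.pdb"))
```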

The trajectory or topology should contain information on bond topology for alignment. Supported topology file types:

DATA DMS GSD MMTF MOL2 PARMED PDB ENT PSF TOP PRMTOP PARM7 TPR TXYZ ARC XML XPDB

Alternatively, the whole trajectory can be provided in one of the file types listed above.
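Before starting a long alignment it can help to check that the topology file is one of the listed types. The sketch below is a heuristic that assumes the type is inferred from the file extension (MDAnalysis guesses formats this way unless a format is given explicitly). Note that PARMED refers to ParmEd structure objects rather than to a file extension, so in practice it will rarely match here.

```python
from pathlib import Path

# Topology types listed above as carrying bond information.
BOND_TOPOLOGY_TYPES = {
    "DATA", "DMS", "GSD", "MMTF", "MOL2", "PARMED", "PDB", "ENT", "PSF",
    "TOP", "PRMTOP", "PARM7", "TPR", "TXYZ", "ARC", "XML", "XPDB",
}


def has_bond_topology_type(filename: str) -> bool:
    """Return True if the file extension matches one of the listed types."""
    extension = Path(filename).suffix.lstrip(".").upper()
    return extension in BOND_TOPOLOGY_TYPES


print(has_bond_topology_type("topology.tpr"))    # True
print(has_bond_topology_type("trajectory.xtc"))  # False
```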

Parameters:

trajectory (str) – File name containing unaligned trajectory.
output_trj_file (str) – Output file name for the aligned trajectory.
align_target_file_name (str) – File name for saving the align target (usually a pdb file) if align_target is an int. If align_target is None, the align target is read from this file instead.
topology (str | None, optional) – Topology file name. Defaults to None.
every (int, optional) – Stride between snapshots used for alignment: every n-th snapshot is taken, with n = every; every = 1 uses all snapshots. Defaults to 1.
align_mode (str, optional) – Align algorithm to use. “mda” uses MDAnalysis while “probis” uses the probis algorithm. Defaults to “mda”.
align_target (int | None, optional) – Align target. If None, the align target is read from align_target_file_name. If an integer is given, that snapshot of the trajectory is used as the align target; -1 uses the last snapshot. Defaults to -1.
align_selection (str, optional) – Selection to align to. Defaults to “protein”.
probis_exec (str | None, optional) – Location of the probis executable, used only when align_mode is “probis”. If None, it is downloaded from the internet. Defaults to None.

Example:

# Align the trajectory and save it to a file
from WaterNetworkAnalysis import align_trajectory

align_trajectory(
    trajectory="trajectory.xtc",
    output_trj_file="aligned_trajectory.xtc",
    align_target_file_name="aligned.pdb",
    align_mode="mda",
    align_target=0,
    align_selection="protein",
    topology="topology.tpr",
)
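The every and align_target parameters are easiest to read as index arithmetic. The sketch below is a hypothetical illustration that assumes every acts as a slicing stride over snapshot indices and that align_target is a Python-style index into the full (not subsampled) trajectory, so that -1 resolves to the last snapshot; the actual implementation may resolve these differently.

```python
def frames_used(n_frames: int, every: int = 1) -> list[int]:
    """Indices of the snapshots kept when only every `every`-th one is used."""
    if every < 1:
        raise ValueError("every must be a positive integer")
    return list(range(0, n_frames, every))


def reference_frame(n_frames: int, align_target: int = -1) -> int:
    """Resolve align_target to a snapshot index (negative values count from the end)."""
    # Indexing a range raises IndexError for out-of-range targets.
    return range(n_frames)[align_target]


print(frames_used(10, every=3))  # [0, 3, 6, 9]
print(reference_frame(10))       # 9
```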

WaterNetworkAnalysis.calculate_oxygen_density_map(selection_center: np.ndarray, trajectory: str, topology: str | None = None, dist: float = 12.0, delta: float = 0.4, every: int = 1, SOL: str | None = None, OW: str | None = None, output_name: str = 'water.dx') → Density

Generate oxygen density maps.

Generate oxygen density maps using MDAnalysis.

Parameters:

selection_center (np.ndarray) – Center of the selection around which waters will be selected.
trajectory (str) – Trajectory file name.
topology (str | None, optional) – Topology file name, if available. Defaults to None.
dist (float, optional) – Radius around the selection center within which oxygen atoms will be selected. Defaults to 12.0.
delta (float, optional) – Bin size of the density map in Angstroms. Defaults to 0.4.
every (int, optional) – Stride between snapshots: every n-th snapshot is used, with n = every; every = 1 uses all snapshots. Defaults to 1.
SOL (str | None, optional) – Residue name of the water residue. If None, it is determined automatically. Defaults to None.
OW (str | None, optional) – Name of the oxygen atom. If None, it is determined automatically. Defaults to None.
output_name (str, optional) – Name of the output file; it should end with ‘.dx’. Defaults to “water.dx”.

Returns:

MDAnalysis Density object containing the density map.

Return type:

Density

Example:

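# Generate a water oxygen density map near the active site
from WaterNetworkAnalysis import (
    calculate_oxygen_density_map, get_center_of_selection, get_selection_string_from_resnums,
)
resids = [8, 12, 143, 144]
selection = get_selection_string_from_resnums(resids)
center = get_center_of_selection(selection, trajectory="trajectory.pdb")
density = calculate_oxygen_density_map(center, trajectory="trajectory.pdb")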
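At its core, an oxygen density map is a three-dimensional histogram of oxygen positions. The hypothetical sketch below only shows how dist and delta shape such a grid: it assumes a cube of half-width dist centred on selection_center, split into cells of edge delta, and it returns raw counts per cell. A real density map, such as the Density object described under Returns, is additionally normalised (for example by the number of frames and the cell volume), and its grid edges may be chosen differently.

```python
import math
from collections import Counter

Vec3 = tuple[float, float, float]


def count_oxygens_per_cell(
    oxygens: list[Vec3], center: Vec3, dist: float = 12.0, delta: float = 0.4
) -> Counter:
    """Count oxygen positions per cubic cell of edge `delta` around `center`.

    The grid spans [center - dist, center + dist) along each axis;
    positions outside it are ignored.
    """
    cells_per_axis = math.ceil(2 * dist / delta)
    counts: Counter = Counter()
    for position in oxygens:
        # Integer cell index along x, y and z, measured from the grid corner.
        cell = tuple(
            math.floor((p - (c - dist)) / delta) for p, c in zip(position, center)
        )
        if all(0 <= i < cells_per_axis for i in cell):
            counts[cell] += 1
    return counts


oxygens = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (-0.9, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(count_oxygens_per_cell(oxygens, (0.0, 0.0, 0.0), dist=1.0, delta=0.5))
```

Under this layout the defaults dist = 12.0 and delta = 0.4 give 24 / 0.4 = 60 cells along each axis, so halving delta multiplies the number of cells by eight.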

WaterNetworkAnalysis.extract_waters_from_trajectory(selection_center: np.ndarray, trajectory: str, topology: str | None = None, dist: float = 12.0, every: int = 1, SOL: str | None = None, OW: str | None = None, HW: str | None = None, extract_only_O: bool = False, save_file: str | None = None) → tuple[np.ndarray, np.ndarray]

Extract waters for clustering analysis.

Calculates water (oxygen and hydrogen) coordinates with MDAnalysis for all waters within dist of selection_center in the aligned trajectory, for further use in water clustering. The trajectory must already be aligned.

Parameters:

selection_center (np.ndarray) – coordinates of selection center around which waters will be selected.
trajectory (str) – Trajectory file name.
topology (str | None, optional) – Topology file name. Defaults to None.
dist (float, optional) – Distance around the center of selection inside which water molecules will be sampled. Defaults to 12.0.
every (int, optional) – Stride between snapshots: every n-th snapshot is used, with n = every; every = 1 uses all snapshots. Defaults to 1.
SOL (str | None, optional) – Residue name of the water residue. If None, it is determined automatically. Defaults to None.
OW (str | None, optional) – Name of the oxygen atom. If None, it is determined automatically. Defaults to None.
HW (str | None, optional) – Name of the hydrogen atom in water. The names checked are the provided name and the name with 1 or 2 appended. If None, it is determined automatically. Defaults to None.
extract_only_O (bool, optional) – If True, only oxygen atom positions are extracted. Defaults to False.
save_file (str | None, optional) – File to which the coordinates will be saved. If None, nothing is saved. Defaults to None.

Returns:

Numpy arrays with the xyz coordinates of the oxygens and a combined array of the first and second hydrogen coordinates. If extract_only_O is True, only the oxygen coordinates are returned.

Return type:

tuple[np.ndarray, np.ndarray]

Example:

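# Generate water coordinates for clustering analysis (trajectory already aligned)
from WaterNetworkAnalysis import (
    extract_waters_from_trajectory, get_center_of_selection, get_selection_string_from_resnums,
)
resids = [8, 12, 143, 144]
selection = get_selection_string_from_resnums(resids)
center = get_center_of_selection(selection, "trajectory.xtc", topology="topology.tpr")
coordO, coordH = extract_waters_from_trajectory(
    center, trajectory="trajectory.xtc", topology="topology.tpr"
)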
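The HW description above implies that, for HW = "HW", the atom names HW, HW1 and HW2 are all treated as water hydrogens. The sketch below spells this out and shows one way the names could be combined into an MDAnalysis selection string; both helper names are hypothetical, and the selection WaterNetworkAnalysis builds internally may differ.

```python
def hydrogen_name_candidates(HW: str = "HW") -> list[str]:
    """Atom names treated as water hydrogens: HW itself, HW + "1" and HW + "2"."""
    return [HW, HW + "1", HW + "2"]


def water_hydrogen_selection(SOL: str = "SOL", HW: str = "HW") -> str:
    """Hypothetical MDAnalysis selection for the hydrogens of water residues."""
    # MDAnalysis accepts several atom names after a single "name" keyword.
    return f"resname {SOL} and name {' '.join(hydrogen_name_candidates(HW))}"


print(hydrogen_name_candidates())  # ['HW', 'HW1', 'HW2']
print(water_hydrogen_selection())  # resname SOL and name HW HW1 HW2
```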

WaterNetworkAnalysis.get_center_of_selection(selection: str, trajectory: str, topology: str | None = None) → np.ndarray

Compute centre of selection with MDAnalysis.

Calculates the xyz coordinates of the centre of the selection using MDAnalysis.
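The description does not state whether the centre is mass-weighted. The pure-Python sketch below assumes the unweighted centre of geometry, the quantity MDAnalysis provides as AtomGroup.center_of_geometry(); center_of_geometry here is a hypothetical stand-alone helper that works on plain coordinate tuples.

```python
Vec3 = tuple[float, float, float]


def center_of_geometry(coords: list[Vec3]) -> Vec3:
    """Unweighted mean position of the selected atoms."""
    if not coords:
        raise ValueError("the selection contains no atoms")
    n = len(coords)
    # zip(*coords) groups all x values, then all y values, then all z values.
    x, y, z = (sum(axis) / n for axis in zip(*coords))
    return (x, y, z)


print(center_of_geometry([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]))  # (1.0, 2.0, 3.0)
```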

Parameters:

selection (str) – Selection string for MDAnalysis.
trajectory (str) – Trajectory file name; any format accepted by MDAnalysis should work.
topology (str | None, optional) – Topology file name. Defaults to None.

Returns:

Array containing the coordinates of the centre of the selection.

Return type:

np.ndarray

Example:

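# Find the centre of the active site defined by residue ids
from WaterNetworkAnalysis import get_center_of_selection, get_selection_string_from_resnums
resids = [8, 12, 143, 144]
selection = get_selection_string_from_resnums(resids)
center = get_center_of_selection(selection, "trajectory.xtc", topology="topology.tpr")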

WaterNetworkAnalysis.get_selection_string_from_resnums(resids: list[int], selection_type: str = 'MDA') → str

Return selection string for given residue ids.

Returns the selection command string for different programs for the given list of amino acid residue IDs.

Parameters:

resids (list[int]) – List of amino acid residue ids.
selection_type (str, optional) – Selection language of the target program. Options: “MDA”, “PYMOL”, “PROBIS” and “NGLVIEW”. Defaults to “MDA”.
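For orientation, the sketch below builds selection strings for two of the supported programs from the same residue list. It is a hypothetical re-implementation, not the library's code: it uses standard MDAnalysis ("resid 8 12") and PyMOL ("resi 8+12") syntax, omits PROBIS and NGLVIEW, and the strings returned by get_selection_string_from_resnums() may be formatted differently.

```python
def selection_from_resids(resids: list[int], selection_type: str = "MDA") -> str:
    """Hypothetical selection-string builder for MDAnalysis and PyMOL."""
    ids = [str(r) for r in resids]
    if selection_type == "MDA":
        # MDAnalysis accepts several residue ids after one "resid" keyword.
        return "resid " + " ".join(ids)
    if selection_type == "PYMOL":
        # PyMOL joins multiple residue identifiers with "+".
        return "resi " + "+".join(ids)
    raise ValueError(f"selection_type not covered by this sketch: {selection_type}")


resids = [8, 12, 143, 144]
print(selection_from_resids(resids))           # resid 8 12 143 144
print(selection_from_resids(resids, "PYMOL"))  # resi 8+12+143+144
```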

Returns:

selection command in form of a string

Return type:

str

Example:

# List of residue ids
from WaterNetworkAnalysis import get_selection_string_from_resnums
resids = [8, 12, 143, 144]
# Print the PyMOL selection string
print(get_selection_string_from_resnums(resids, selection_type="PYMOL"))

WaterNetworkAnalysis.make_results_pdb_MDA(water_type: list[str], waterO: np.ndarray, waterH1: np.ndarray, waterH2: np.ndarray, output_fname: str, protein_file: str | None = None, ligand_name: str | None = None, mode: str = 'SOL') → None

Generate pdb file with clustering results.

The water molecules determined by the clustering procedure are written to a pdb file, which can also contain the protein and the ligand. Waters can be labelled according to their hydrogen orientations (FCW for fully conserved, HCW for half-conserved and WCW for weakly conserved); see mode. The pdb file is built with MDAnalysis. The first four arguments can be read from the results file with cws.utils.read_results() or taken directly from the cws.water_clustering.WaterClustering class.

Parameters:

water_type (list[str]) – List of water types.
waterO (np.ndarray) – Numpy array containing the coordinates of the conserved waters’ oxygens.
waterH1 (np.ndarray) – Numpy array containing the coordinates of the conserved waters’ first hydrogens.
waterH2 (np.ndarray) – Numpy array containing the coordinates of the conserved waters’ second hydrogens.
output_fname (str) – Name of the output pdb file. Must end in ‘.pdb’.
protein_file (str | None, optional) – File which contains the protein and ligand. It should be aligned in the same way as the trajectory used to calculate the conserved waters. If None, no protein is saved. Defaults to None.
ligand_name (str | None, optional) – Residue name of the ligand. If None, no ligand is saved. Defaults to None.
mode (str, optional) –
Mode in which the conserved waters are saved. Options:
- “SOL” - default mode. Saves the water molecules as SOL so that visualisation software recognises them as waters. No distinction is made between the different types of conserved waters.
- “cathegorise” - categorises the waters by hydrogen orientation into fully conserved (FCW), half-conserved (HCW) and weakly conserved (WCW) waters. Visualisers then no longer recognise the waters as water/SOL, but the output is useful for interpreting the results.

Example:

# Generate pdb results file
make_results_pdb_MDA(
    *cws.utils.read_results(),
    output_fname = 'results.pdb',
    mode = 'cathegorise'
)
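The two mode values differ mainly in how each conserved water is named in the output. The sketch below is a hypothetical illustration that assumes "cathegorise" uses the water type (FCW, HCW or WCW) as the residue name while "SOL" names every water SOL; it does not write a pdb file and is not the library's implementation.

```python
def water_residue_names(water_type: list[str], mode: str = "SOL") -> list[str]:
    """Residue name assigned to each conserved water for the given mode."""
    if mode == "SOL":
        # A single standard name, so visualisers recognise the waters as solvent.
        return ["SOL"] * len(water_type)
    if mode == "cathegorise":
        # Assumption: the conservation type itself becomes the residue name.
        return list(water_type)
    raise ValueError(f"unknown mode: {mode!r}")


types = ["FCW", "HCW", "WCW"]
print(water_residue_names(types))                 # ['SOL', 'SOL', 'SOL']
print(water_residue_names(types, "cathegorise"))  # ['FCW', 'HCW', 'WCW']
```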

WaterNetworkAnalysis.read_results_and_make_pdb(fname: str = 'Clustering_results.dat', typefname: str = 'Type_Clustering_results.dat', protein_file: str = 'aligned.pdb', ligand_name: str | None = None, output_fname: str | None = None, mode: str = 'SOL') → None

Read results from files and generate a pdb file.

Parameters:

fname (str, optional) – File name with clustering results - coordinates of water molecules. Defaults to “Clustering_results.dat”.
typefname (str, optional) – File name which contains the types of each water molecule. Defaults to “Type_Clustering_results.dat”.
protein_file (str, optional) – File containing the reference structure to which the trajectory has been aligned. Defaults to “aligned.pdb”.
ligand_name (str | None, optional) – Residue name of the ligand. If None, no ligand is extracted and visualised/saved in the pdb file. Defaults to None.
output_fname (str | None, optional) – Name of the output file (pdb preferred). Defaults to None.
mode (str, optional) –
Mode in which the conserved waters are saved. Options:
- “SOL” - default mode. Saves the water molecules as SOL so that visualisation software recognises them as waters. No distinction is made between the different types of conserved waters.
- “cathegorise” - categorises the waters by hydrogen orientation into fully conserved (FCW), half-conserved (HCW) and weakly conserved (WCW) waters. Visualisers then no longer recognise the waters as water/SOL, but the output is useful for interpreting the results.

Example:

# Generate pdb results file
read_results_and_make_pdb(
    fname = 'Results.dat',
    typefname = 'TypeResults.dat',
    protein_file = 'aligned_protein.pdb',
    ligand_name = 'UBY',
    output_fname = 'results.pdb',
    mode = 'cathegorise',
)