Skip to content

Commit

Permalink
improved functionality
Browse files Browse the repository at this point in the history
  • Loading branch information
ASLeonard committed Apr 24, 2019
1 parent f24bfab commit de55308
Show file tree
Hide file tree
Showing 8 changed files with 146 additions and 90 deletions.
73 changes: 48 additions & 25 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,41 +3,71 @@
[![Build Status](https://travis-ci.org/ASLeonard/polyomino_interfaces.svg?branch=master)](https://travis-ci.org/ASLeonard/polyomino_interfaces)
![License Badge](https://img.shields.io/github/license/ASLeonard/polyomino_interfaces.svg?style=flat)

Polyomimo lattice self-assembly model with binary string interfaces.
Code repository for the generalised polyomimo lattice self-assembly model with binary string interfaces.



### Core modules
The majority of polyomino core features are found in the `polyomino_core' submodule, designed to provide a single, consistent set of definitions in all models.
Assembly algorithms, phenotype classifications, and selection dynamics can be found within this repository.
A direct link to the current core development is [here](https://github.com/ASLeonard/polyomino_core).

### Installation
Building the simulator requires a c++ compiler that has general support for c++17. It has been tested with g++7 and greater, and clang++6 and greater. The compiling itself is taken care of automatically through the makefile. The recommended steps are as follows
```
Building the simulator requires a c++ compiler that has general support for c++17. It has been tested with g++7 and greater, and clang++6 and greater. The compiling itself is taken care of through the makefile. The recommended steps are as follows
```shell
git clone --recurse-submodules https://github.com/ASLeonard/polyomino_interfaces
cd polyomino_interfaces
make
```
At which point the simulation program is in `/bin/ProteinEvolution`.
Errors at this stage probably indicate the compiler (or the CXX environment variable) is not modern enough.
At which point the simulation program is callable at `./bin/ProteinEvolution`.
Errors at this stage probably indicate the compiler (set through the CXX environment variable) is not modern enough.

Several compiler flags can be added depenending on what is available on the user's system if desired, like g++ has support for multi-threading (-fopenmp) and link-time optimization (-flto).
Several compiler flags can be added depending on what is available on the user's system if desired, like g++ has support for multi-threading (-fopenmp) and link-time optimization (-flto), but are not required to be used.

#### Python requirements
The analysis and plotting has been tested with the following versions, although many older/newer versions are likely to work. These are common packages, but not always installed by default.
+ python (3.6.5)
The analysis and plotting is mostly handled in python3, and should be useable with any python version 3.6+ (tested on 3.6.5).
Some common additional packages have been used, but are not always installed by default. Many older/newer versions are likely to work, but versions used during development were:
+ Numpy (1.16.2)
+ Matplotlib (3.03)
+ SciPy (1.2.1) (only necessary for scripts within **polyomino_core**, not used by default)

#### Testing
The install, c++, and python can be tested after making, by calling `make test`. This will run an evolution simulation with default parameters, analyse the generated files, save the analysis, and then erase all the generated files and analysis.
The installation, c++ program, and python can be tested after making, by calling `make test`. This will run an evolution simulation with default parameters, analyse the generated files, save the analysis, and then erase all the generated files and analysis. Note this erases generated files by pattern matching, so will erase any generated txt or pkl files within the root directory.

#### Directory layout
The main folders of interest are bin/ and scripts/, although a more curious user can modify the c++ in the other folders.

### Simulations
The code is split into two main parts. The polyomino evolutions are written in c++, while the analysis is in python3. The c++ code, in the executable **ProteinEvolution**, can be run directly, with parameters mostly detailed by calling `./bin/ProteinEvolution -H`. There is one main parameter that is hardcoded in the c++, the interface length, found in `includes/interface_model.hpp` as `using interface_type = ...`. Details on how to change it are described there.

+ bin/
+ exectuable
+ build/ (internal compiling only)
+ scripts/
A simple example has been provided below to automatically generate, analyse, and plot some data. The default parameters are sufficient to generate some interesting data, and can and should be modified to run over more simulations and different parameters. They can be found in `scripts/interface_analysis.py`, as a dictionary within the `runEvolutionSequence()` method.

The files are generated in the calling directory unless the default parameter for file_path is changed, and can be large (>100s of MB) for larger choices of population size and longer simulations.

#### Example usage
A very minimal example can be achieved from the root directory with
```shell
python3 scripts/interface_analysis.py [optional parameter for number of simulations, e.g. 10]
python3 scripts/interface_plotting.py
```

### Plotting
Several visualising methods have been included to interpret the generated data, in the `interface_plotting` file. They mostly involve loading simulation results, called through e.g. `loadPickle(params...)`, and passing those to the plotting functions. Each method has comments explaining how to use in more detail.

After generating data through the first line of the minimal example, you can plot the data more selective by calling different plots and options via
```python
data=loadPickle(.6875,25,1,5)

calculatePathwaySucess(data)
#or
plotEvolution(data)
#or
plotPhaseSpace(data)
```
Where various method default parameters have been commented on in the files.

### Directory layout
The main folder of interest is scripts/, although a more curious user can modify the c++ in the other folders. In the main directory there is also the makefile, which can be modified with the previously mentioned compiler flags.

+ scripts/ (main directory for user)
+ utility methods
+ analysis module
+ plotting module
Expand All @@ -50,15 +80,8 @@ The main folders of interest are bin/ and scripts/, although a more curious user
+ c++ headers
+ scripts/
+ generic polyomino plotting

### Simulations
The code is split into two main parts. The polyomino evolutions are written in c++, while the analysis is in python3. The c++ code, in the executable **ProteinEvolution**, can be run directly, with parameters mostly detailed by calling the `ProteinEvolution -H` feature.

However, the easiest way is to use the main method in the _interface\_analysis_ file, which calls `runEvolutionSequence`. The default parameters are sufficient to generate some interesting data, and can be modified if desired.

The files are generated in the calling directory unless the default parameter for file_path is changed, and can be large (>100s of mB) for larger choices of population size and longer simulations.

### Plotting
Several visualising methods have been included to interpret the generated data, in the _interface\_plotting_ file. They mostly involve loading simulation results, called through e.g. `loadPickle(params...)`, and passing those to the plotting functions. Each method has comments explaining how to use in more detail.
+ bin/
+ executable
+ build/ (internal compiling only)


17 changes: 9 additions & 8 deletions includes/interface_model.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,21 @@
#include "core_evolution.hpp"

//using interfaces of 64 bits length, can use any width of unsigned integer
//i.e. can alternatively use uint8_t, uint16_t, uint32_t
//[UNTESTED] potentially could use further width types, like the GNU specific "unsigned __int128"
using interface_type = uint64_t;

//set genotype to be a vector of these elements
constexpr uint8_t interface_size=CHAR_BIT*sizeof(interface_type);
using BGenotype = std::vector<interface_type>;

//parameters used
namespace simulation_params
{
extern uint8_t model_type,n_tiles,samming_threshold;
extern double temperature,binding_threshold,mu_prob;
}

//extension of base assembly (in polyomino_core/include/core_genotype)
class InterfaceAssembly : public PolyominoAssembly<InterfaceAssembly> {

Expand All @@ -32,14 +41,6 @@ class InterfaceAssembly : public PolyominoAssembly<InterfaceAssembly> {
static void Mutation(BGenotype& genotype);
};

//parameters used
namespace simulation_params
{
extern uint8_t model_type,n_tiles,samming_threshold;
extern uint16_t dissociation_time;
extern double temperature,binding_threshold,mu_prob;
}


namespace interface_model
{
Expand Down
12 changes: 6 additions & 6 deletions makefile
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,12 @@ test:
@echo "Running basic test to check c++ and python3 integration\nRunning default generation and analysis"
@python3 scripts/interface_analysis.py
@echo "Test seems successful, cleaning up generated files"
@rm PIDs_Run0.txt
@rm Strengths_Run0.txt
@rm PhenotypeTable_Run0.txt
@rm Selections_Run0.txt
@rm Y0.6875T10.0Mu1.0F5.0.pkl
@rm Mu1.0Y0.6875T10.0F5.0O0.pkl
@rm PIDs_Run*.txt
@rm Strengths_Run*.txt
@rm PhenotypeTable_Run*.txt
@rm Selections_Run*.txt
@rm Y*.pkl
@rm Mu*.pkl


#Compile
Expand Down
25 changes: 14 additions & 11 deletions scripts/interface_analysis.py
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ def __valid_descendent(gen_idx,kid):
return True

##helper method to identify where a tree branches into a new ancestry
def __addBranch():
def __addBranch():
if g_idx:
if len(bond_ref)<len(interactions[g_idx-1,p_idx].bonds):
return False;
Expand Down Expand Up @@ -205,7 +205,7 @@ def __addBranch():

##if not equal, found a new transition, add branch at this point
elif not np.array_equal(phenotypes[g_idx-1,p_idx],pid_ref):
if not __addBranch():
if not __addBranch():
return None
bond_ref=interactions[g_idx-1,p_idx].bonds
pid_ref=phenotypes[g_idx-1,p_idx]
Expand Down Expand Up @@ -247,7 +247,7 @@ def __addBranch():
if not np.array_equal(pid_c,pid_d):
if np.count_nonzero(evo_stage[g_idx]>=len(interactions[g_idx,c_idx].bonds))>POPULATION_FIXATION:
continue
##only consider it failed if it "should" have survived based on fitness advantage
##only consider it failed if it "should" have survived based on fitness advantage
if not __growDescendentTree(Tree(pid_c,interactions[g_idx,c_idx].bonds,(-1,-1),g_idx,[[c_idx]]),SURVIVAL_DEPTH):
failed_jumps[tuple(tuple(_) for _ in (pid_c,pid_d))]+=1

Expand Down Expand Up @@ -329,15 +329,17 @@ def analyseHomogeneousPopulation(run,temperature):
return np.asarray(param_trajectory)


def runEvolutionSequence():
##main wrapper function, uses hardcoded parameters given below to run an evolution simulation (from c++ files), and then analyse results in python3
##results are then written into a pickle or npz file depending on parameters, and saved for later plotting/printing
def runEvolutionSequence(runs=1):

##change parameters here for data generation, meanings for parameter flags can be found in the src/interface_simulation.cpp file at the end

##change parameters here for data generation

##defaults well-suited for generating evolution of fixed system
default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 400, 'B' : 20, 'X': .5, 'F': 5, 'A' : 1, 'D' : 1, 'J': 5, 'M': 1, 'Y' : .6875, 'T': 10, 'O' : 100, 'G' : 10}
default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 500, 'B' : 25, 'X': 0, 'F': 5, 'A' : 1, 'D' : runs, 'J': 5, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 0, 'G' : 0}

##defaults for dynamic fitness landscape
#default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 600, 'B' : 150, 'X': 0, 'F': 1, 'A' : 2, 'D' : 1, 'J': 1, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 200, 'G' : 10}
##defaults for dynamic fitness landscape
#default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 600, 'B' : 150, 'X': 0, 'F': 1, 'A' : 2, 'D' : 1, 'J': 1, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 200, 'G' : 10}

##helper method to stringify parameters
def generateParameterString():
Expand Down Expand Up @@ -386,13 +388,14 @@ def collateByMode(mode,run_range, params):

elif mode == 2:
np.savez_compressed(file_base,collateNPZs(*params,runs=run_range))

else:
print('unknown mode request, set parameter \'A\'')

if __name__ == '__main__':
try:
runEvolutionSequence()
##by default run two simulations, otherwise use user input
runEvolutionSequence(2 if len(sys.argv)!=2 else int(sys.argv[1]))
except Exception as e:
print(e)
else:
Expand Down
33 changes: 23 additions & 10 deletions scripts/interface_methods.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,7 @@ def __repr__(self):
def loadSelectionHistory(run):
selections=[]
for line in open(BASE_PATH.format('Selections',run)):
converted=[int(i) for i in line.split()]
selections.append(converted)
selections.append([int(i) for i in line.split()])
return np.array(selections,np.uint16)

def loadPIDHistory(run):
Expand Down Expand Up @@ -110,25 +109,34 @@ def __init__(self,S_star,T,mu,gamma,data):
print("Clearing out 10s")
del form[(10,0)]


def __repr__(self):
return r'Data struct for S*={:.3f},T={:.3g},mu={:.2g},gamma={:.2g}'.format(self.S_star,self.T,self.mu,self.gamma)


##primary method to load pickle data
##parameters:
## S_star (float): critical interaction strength
## t (float): temperature
## mu (float): mutation rate
## gamma (float): fitness punishment value
def loadPickle(S_star,t,mu,gamma):
with open('Y{}T{}Mu{}F{}.pkl'.format(S_star,*(float(i) for i in (t,mu,gamma))), 'rb') as f:
return EvoData(S_star,t,mu,gamma,load(f))


##primary method to load numpyzip data
##same parameters as above
def loadNPZ(S_star,t,mu,gamma):
return np.load('Y{}T{}Mu{}F{}.npz'.format(S_star,*(float(i) for i in (t,mu,gamma))), 'rb')['arr_0']

#################
## RANDOM WALK ##
#################

##internal method for calculating Markov chain
def RandomWalk(I_size=64,n_steps=1000,phi=0.5,S_star=0.6,analytic=False):
s_hats=np.linspace(0,1,I_size+1)
N_states=int(I_size*(1-S_star))+1

##find normalised eigenvector for Markov matrix
def __getSteadyStates(val):
rows=[[1-phi,phi*(1-val[0])]+[0]*(N_states-2)]
for i in range(1,N_states-1):
Expand All @@ -138,14 +146,17 @@ def __getSteadyStates(val):
eigval,eigvec=LA.eig(matrix)
ve=eigvec.T[np.argmax(eigval)]
return max(eigval),ve/sum(ve)


##return matrix calculation
if analytic:
analytic_states=__getSteadyStates(s_hats[-N_states:])[1]
return sum(s_hats[-N_states:]*analytic_states)


##otherwise simulate directly
states=np.array([1]+[0]*(N_states-1),dtype=float)
progressive_states=[sum(s_hats[-N_states:]*states)]

##simulate random walk on strength states
def __updateStates(states,val):
states_updating=states.copy()
for i in range(states.shape[0]):
Expand All @@ -156,6 +167,7 @@ def __updateStates(states,val):
states_updating[i]+=states[i+1]*phi*val[i+1]
return states_updating/sum(states_updating)

##walk for n_steps, and then return state evolutions
for _ in range(n_steps):
states=__updateStates(states,s_hats[-N_states:])
progressive_states.append(sum(s_hats[-N_states:]*states))
Expand All @@ -165,6 +177,7 @@ def __updateStates(states,val):
## PHASE SPACE##
################

##all various methods for calculated steps
def HetTet(a,b):
return 1/(1+2*b)*(2*b+2/(a+2)*(1+a*1/(a+2)))

Expand Down Expand Up @@ -197,7 +210,8 @@ def seed2():
return seed2()
else:
return .5*seed1()+.5*seed2()


##internal method to calculate phase space coordinates for transitions
def calcTransitionParams(evo_strs,transitions,T,S_star):
param_dict={}

Expand Down Expand Up @@ -225,8 +239,7 @@ def calcTransitionParams(evo_strs,transitions,T,S_star):
param_dict[(1,(16,0),(8,0))]=(S_star/evo_strs[(8,0)][(2,2)][-1])**T
param_dict[(0,(16,0),(8,0))]=(evo_strs[(16,0)][(4,4)][0]/evo_strs[(16,0)][(3,4)][0])**T
param_dict[(0,(16,0),(16,0))]=(evo_strs[(16,0)][(4,4)][-1]/evo_strs[(16,0)][(3,4)][-1])**T



if (12,0) in evo_strs:
if (4,1) in transitions[(12,0)]:
if (2,0) in transitions[(4,1)]:
Expand Down
Loading

0 comments on commit de55308

Please sign in to comment.