improved functionality

ASLeonard · Apr 24, 2019 · de55308 · de55308
1 parent f24bfab
commit de55308
Show file tree

Hide file tree

Showing 8 changed files with 146 additions and 90 deletions.
diff --git a/README.md b/README.md
@@ -3,41 +3,71 @@
 [![Build Status](https://travis-ci.org/ASLeonard/polyomino_interfaces.svg?branch=master)](https://travis-ci.org/ASLeonard/polyomino_interfaces)
 ![License Badge](https://img.shields.io/github/license/ASLeonard/polyomino_interfaces.svg?style=flat)
 
-Polyomimo lattice self-assembly model with binary string interfaces.
+Code repository for the generalised polyomimo lattice self-assembly model with binary string interfaces.
+
+
 
 ### Core modules
 The majority of polyomino core features are found in the `polyomino_core' submodule, designed to provide a single, consistent set of definitions in all models.
+Assembly algorithms, phenotype classifications, and selection dynamics can be found within this repository. 
 A direct link to the current core development is [here](https://github.com/ASLeonard/polyomino_core).
 
 ### Installation
-Building the simulator requires a c++ compiler that has general support for c++17. It has been tested with g++7 and greater, and clang++6 and greater. The compiling itself is taken care of automatically through the makefile. The recommended steps are as follows
-```
+Building the simulator requires a c++ compiler that has general support for c++17. It has been tested with g++7 and greater, and clang++6 and greater. The compiling itself is taken care of through the makefile. The recommended steps are as follows
+```shell
 git clone --recurse-submodules https://github.com/ASLeonard/polyomino_interfaces
 cd polyomino_interfaces
 make
 ```
-At which point the simulation program is in `/bin/ProteinEvolution`.
-Errors at this stage probably indicate the compiler (or the CXX environment variable) is not modern enough.
+At which point the simulation program is callable at `./bin/ProteinEvolution`.
+Errors at this stage probably indicate the compiler (set through the CXX environment variable) is not modern enough.
 
-Several compiler flags can be added depenending on what is available on the user's system if desired, like g++ has support for multi-threading (-fopenmp) and link-time optimization (-flto).
+Several compiler flags can be added depending on what is available on the user's system if desired, like g++ has support for multi-threading (-fopenmp) and link-time optimization (-flto), but are not required to be used.
 
 #### Python requirements
-The analysis and plotting has been tested with the following versions, although many older/newer versions are likely to work. These are common packages, but not always installed by default.
-+ python (3.6.5)
+The analysis and plotting is mostly handled in python3, and should be useable with any python version 3.6+ (tested on 3.6.5).
+Some common additional packages have been used, but are not always installed by default. Many older/newer versions are likely to work, but versions used during development were:
 + Numpy (1.16.2)
 + Matplotlib (3.03)
 + SciPy (1.2.1) (only necessary for scripts within **polyomino_core**, not used by default)
 
 #### Testing
-The install, c++, and python can be tested after making, by calling `make test`. This will run an evolution simulation with default parameters, analyse the generated files, save the analysis, and then erase all the generated files and analysis.
+The installation, c++ program, and python can be tested after making, by calling `make test`. This will run an evolution simulation with default parameters, analyse the generated files, save the analysis, and then erase all the generated files and analysis. Note this erases generated files by pattern matching, so will erase any generated txt or pkl files within the root directory.
 
-#### Directory layout
-The main folders of interest are bin/ and scripts/, although a more curious user can modify the c++ in the other folders.
+
+### Simulations
+The code is split into two main parts. The polyomino evolutions are written in c++, while the analysis is in python3. The c++ code, in the executable **ProteinEvolution**, can be run directly, with parameters mostly detailed by calling `./bin/ProteinEvolution -H`. There is one main parameter that is hardcoded in the c++, the interface length, found in `includes/interface_model.hpp` as `using interface_type = ...`. Details on how to change it are described there.
 
-+ bin/
-  + exectuable
-+ build/ (internal compiling only)
-+ scripts/
+A simple example has been provided below to automatically generate, analyse, and plot some data. The default parameters are sufficient to generate some interesting data, and can and should be modified to run over more simulations and different parameters. They can be found in `scripts/interface_analysis.py`, as a dictionary within the `runEvolutionSequence()` method.
+
+The files are generated in the calling directory unless the default parameter for file_path is changed, and can be large (>100s of MB) for larger choices of population size and longer simulations.
+
+#### Example usage
+A very minimal example can be achieved from the root directory with 
+```shell
+python3 scripts/interface_analysis.py [optional parameter for number of simulations, e.g. 10]
+python3 scripts/interface_plotting.py
+```
+
+### Plotting
+Several visualising methods have been included to interpret the generated data, in the `interface_plotting` file. They mostly involve loading simulation results, called through e.g. `loadPickle(params...)`, and passing those to the plotting functions. Each method has comments explaining how to use in more detail.
+
+After generating data through the first line of the minimal example, you can plot the data more selective by calling different plots and options via
+```python
+data=loadPickle(.6875,25,1,5)
+
+calculatePathwaySucess(data)
+#or
+plotEvolution(data)
+#or
+plotPhaseSpace(data)
+```
+Where various method default parameters have been commented on in the files.
+
+### Directory layout
+The main folder of interest is scripts/, although a more curious user can modify the c++ in the other folders. In the main directory there is also the makefile, which can be modified with the previously mentioned compiler flags.
+
++ scripts/ (main directory for user)
   + utility methods
   + analysis module
   + plotting module
@@ -50,15 +80,8 @@ The main folders of interest are bin/ and scripts/, although a more curious user
     + c++ headers
   + scripts/
     + generic polyomino plotting
-
-### Simulations
-The code is split into two main parts. The polyomino evolutions are written in c++, while the analysis is in python3. The c++ code, in the executable **ProteinEvolution**, can be run directly, with parameters mostly detailed by calling the `ProteinEvolution -H` feature.
-
-However, the easiest way is to use the main method in the _interface\_analysis_ file, which calls `runEvolutionSequence`. The default parameters are sufficient to generate some interesting data, and can be modified if desired.
-
-The files are generated in the calling directory unless the default parameter for file_path is changed, and can be large (>100s of mB) for larger choices of population size and longer simulations.
-
-### Plotting
-Several visualising methods have been included to interpret the generated data, in the _interface\_plotting_ file. They mostly involve loading simulation results, called through e.g. `loadPickle(params...)`, and passing those to the plotting functions. Each method has comments explaining how to use in more detail.
++ bin/
+  + executable
++ build/ (internal compiling only)
 
 
diff --git a/includes/interface_model.hpp b/includes/interface_model.hpp
@@ -6,12 +6,21 @@
 #include "core_evolution.hpp"
 
 //using interfaces of 64 bits length, can use any width of unsigned integer
+//i.e. can alternatively use uint8_t, uint16_t, uint32_t
+//[UNTESTED] potentially could use further width types, like the GNU specific "unsigned __int128" 
 using interface_type = uint64_t;
 
 //set genotype to be a vector of these elements
 constexpr uint8_t interface_size=CHAR_BIT*sizeof(interface_type);
 using BGenotype = std::vector<interface_type>;
 
+//parameters used
+namespace simulation_params
+{
+  extern uint8_t model_type,n_tiles,samming_threshold;
+  extern double temperature,binding_threshold,mu_prob;
+}
+
 //extension of base assembly (in polyomino_core/include/core_genotype)
 class InterfaceAssembly : public PolyominoAssembly<InterfaceAssembly> {
 
@@ -32,14 +41,6 @@ class InterfaceAssembly : public PolyominoAssembly<InterfaceAssembly> {
   static void Mutation(BGenotype& genotype);  
 };
 
-//parameters used
-namespace simulation_params
-{
-  extern uint8_t model_type,n_tiles,samming_threshold;
-  extern uint16_t dissociation_time;
-  extern double temperature,binding_threshold,mu_prob;
-}
-
 
 namespace interface_model
 {

diff --git a/makefile b/makefile
@@ -53,12 +53,12 @@ test:
 	@echo "Running basic test to check c++ and python3 integration\nRunning default generation and analysis"
 	@python3 scripts/interface_analysis.py
 	@echo "Test seems successful, cleaning up generated files"
-	@rm PIDs_Run0.txt
-	@rm Strengths_Run0.txt
-	@rm PhenotypeTable_Run0.txt
-	@rm Selections_Run0.txt
-	@rm Y0.6875T10.0Mu1.0F5.0.pkl
-	@rm Mu1.0Y0.6875T10.0F5.0O0.pkl
+	@rm PIDs_Run*.txt
+	@rm Strengths_Run*.txt
+	@rm PhenotypeTable_Run*.txt
+	@rm Selections_Run*.txt
+	@rm Y*.pkl
+	@rm Mu*.pkl
 
 
 #Compile

diff --git a/scripts/interface_analysis.py b/scripts/interface_analysis.py
@@ -158,7 +158,7 @@ def __valid_descendent(gen_idx,kid):
                     return True
 
      ##helper method to identify where a tree branches into a new ancestry
-     def __addBranch():        
+     def __addBranch():
           if g_idx:
                if len(bond_ref)<len(interactions[g_idx-1,p_idx].bonds):
                     return False;
@@ -205,7 +205,7 @@ def __addBranch():
 
                ##if not equal, found a new transition, add branch at this point
                elif not np.array_equal(phenotypes[g_idx-1,p_idx],pid_ref):
-                    if not __addBranch():                         
+                    if not __addBranch():
                          return None
                     bond_ref=interactions[g_idx-1,p_idx].bonds
                     pid_ref=phenotypes[g_idx-1,p_idx]
@@ -247,7 +247,7 @@ def __addBranch():
           if not np.array_equal(pid_c,pid_d):
                if np.count_nonzero(evo_stage[g_idx]>=len(interactions[g_idx,c_idx].bonds))>POPULATION_FIXATION:
                     continue
-               ##only consider it failed if it "should" have survived based on fitness advantage 
+               ##only consider it failed if it "should" have survived based on fitness advantage
                if not __growDescendentTree(Tree(pid_c,interactions[g_idx,c_idx].bonds,(-1,-1),g_idx,[[c_idx]]),SURVIVAL_DEPTH):
                     failed_jumps[tuple(tuple(_) for _ in (pid_c,pid_d))]+=1
 
@@ -329,15 +329,17 @@ def analyseHomogeneousPopulation(run,temperature):
      return np.asarray(param_trajectory)
 
 
-def runEvolutionSequence():
+##main wrapper function, uses hardcoded parameters given below to run an evolution simulation (from c++ files), and then analyse results in python3
+##results are then written into a pickle or npz file depending on parameters, and saved for later plotting/printing
+def runEvolutionSequence(runs=1):
+
+     ##change parameters here for data generation, meanings for parameter flags can be found in the src/interface_simulation.cpp file at the end
 
-     ##change parameters here for data generation
-
      ##defaults well-suited for generating evolution of fixed system
-     default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 400, 'B' : 20, 'X': .5, 'F': 5, 'A' : 1, 'D' : 1, 'J': 5, 'M': 1, 'Y' : .6875, 'T': 10, 'O' : 100, 'G' : 10} 
+     default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 500, 'B' : 25, 'X': 0, 'F': 5, 'A' : 1, 'D' : runs, 'J': 5, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 0, 'G' : 0}
 
-     ##defaults for dynamic fitness landscape 
-     #default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 600, 'B' : 150, 'X': 0, 'F': 1, 'A' : 2, 'D' : 1, 'J': 1, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 200, 'G' : 10} 
+     ##defaults for dynamic fitness landscape
+     #default_parameters={'file_path' : 'bin/', 'N' : 2, 'P' : 100, 'K' : 600, 'B' : 150, 'X': 0, 'F': 1, 'A' : 2, 'D' : 1, 'J': 1, 'M': 1, 'Y' : .6875, 'T': 25, 'O' : 200, 'G' : 10}
 
      ##helper method to stringify parameters
      def generateParameterString():
@@ -386,13 +388,14 @@ def collateByMode(mode,run_range, params):
 
      elif mode == 2:
           np.savez_compressed(file_base,collateNPZs(*params,runs=run_range))
-          
+
      else:
           print('unknown mode request, set parameter \'A\'')
 
 if __name__ == '__main__':
      try:
-          runEvolutionSequence()
+          ##by default run two simulations, otherwise use user input
+          runEvolutionSequence(2 if len(sys.argv)!=2 else int(sys.argv[1]))
      except Exception as e:
           print(e)
      else:

diff --git a/scripts/interface_methods.py b/scripts/interface_methods.py
@@ -43,8 +43,7 @@ def __repr__(self):
 def loadSelectionHistory(run):
      selections=[]
      for line in open(BASE_PATH.format('Selections',run)):
-          converted=[int(i) for i in line.split()]
-          selections.append(converted)
+          selections.append([int(i) for i in line.split()])
      return np.array(selections,np.uint16)
 
 def loadPIDHistory(run):
@@ -110,25 +109,34 @@ def __init__(self,S_star,T,mu,gamma,data):
                     print("Clearing out 10s")
                     del form[(10,0)]
 
-
      def __repr__(self):
           return r'Data struct for S*={:.3f},T={:.3g},mu={:.2g},gamma={:.2g}'.format(self.S_star,self.T,self.mu,self.gamma)
-
+
+##primary method to load pickle data
+##parameters:
+## S_star (float): critical interaction strength
+## t (float): temperature
+## mu (float): mutation rate
+## gamma (float): fitness punishment value
 def loadPickle(S_star,t,mu,gamma):
      with open('Y{}T{}Mu{}F{}.pkl'.format(S_star,*(float(i) for i in (t,mu,gamma))), 'rb') as f:
           return EvoData(S_star,t,mu,gamma,load(f))
 
-
+##primary method to load numpyzip data
+##same parameters as above
 def loadNPZ(S_star,t,mu,gamma):
      return np.load('Y{}T{}Mu{}F{}.npz'.format(S_star,*(float(i) for i in (t,mu,gamma))), 'rb')['arr_0']
 
 #################
 ## RANDOM WALK ##
 #################
+
+##internal method for calculating Markov chain
 def RandomWalk(I_size=64,n_steps=1000,phi=0.5,S_star=0.6,analytic=False):
      s_hats=np.linspace(0,1,I_size+1)
      N_states=int(I_size*(1-S_star))+1
 
+     ##find normalised eigenvector for Markov matrix
      def __getSteadyStates(val):
           rows=[[1-phi,phi*(1-val[0])]+[0]*(N_states-2)]
           for i in range(1,N_states-1):
@@ -138,14 +146,17 @@ def __getSteadyStates(val):
           eigval,eigvec=LA.eig(matrix)
           ve=eigvec.T[np.argmax(eigval)]
           return max(eigval),ve/sum(ve)
-
+
+     ##return matrix calculation
      if analytic:
           analytic_states=__getSteadyStates(s_hats[-N_states:])[1]
           return sum(s_hats[-N_states:]*analytic_states)
-
+
+     ##otherwise simulate directly
      states=np.array([1]+[0]*(N_states-1),dtype=float)
      progressive_states=[sum(s_hats[-N_states:]*states)]
 
+     ##simulate random walk on strength states
      def __updateStates(states,val):
           states_updating=states.copy()
           for i in range(states.shape[0]):
@@ -156,6 +167,7 @@ def __updateStates(states,val):
                     states_updating[i]+=states[i+1]*phi*val[i+1]
           return states_updating/sum(states_updating)
 
+     ##walk for n_steps, and then return state evolutions
      for _ in range(n_steps):
           states=__updateStates(states,s_hats[-N_states:])
           progressive_states.append(sum(s_hats[-N_states:]*states))
@@ -165,6 +177,7 @@ def __updateStates(states,val):
 ## PHASE SPACE##
 ################
 
+##all various methods for calculated steps 
 def HetTet(a,b):
      return 1/(1+2*b)*(2*b+2/(a+2)*(1+a*1/(a+2)))
 
@@ -197,7 +210,8 @@ def seed2():
           return seed2()
      else:
            return .5*seed1()+.5*seed2()
-
+
+##internal method to calculate phase space coordinates for transitions     
 def calcTransitionParams(evo_strs,transitions,T,S_star):
      param_dict={}
 
@@ -225,8 +239,7 @@ def calcTransitionParams(evo_strs,transitions,T,S_star):
                     param_dict[(1,(16,0),(8,0))]=(S_star/evo_strs[(8,0)][(2,2)][-1])**T
                param_dict[(0,(16,0),(8,0))]=(evo_strs[(16,0)][(4,4)][0]/evo_strs[(16,0)][(3,4)][0])**T
                param_dict[(0,(16,0),(16,0))]=(evo_strs[(16,0)][(4,4)][-1]/evo_strs[(16,0)][(3,4)][-1])**T
-
-
+
      if (12,0) in evo_strs:
           if (4,1) in transitions[(12,0)]:
                if (2,0) in transitions[(4,1)]: