Adding Directionality into MCF #8

6 changes: 6 additions & 0 deletions .gitignore
@@ -105,3 +105,9 @@ venv.bak/

# data files
referencePathways/reactome/*.gOut

# output files
*.sif

# .DS_store
*.DS_Store
17 changes: 17 additions & 0 deletions README.md
@@ -8,6 +8,18 @@ More details of the algorithm can be found in:
Chris S Magnano, Anthony Gitter.
*npj Systems Biology and Applications*, 7:12, 2021.

## Edge Handling
The code handles both undirected and directed edges. When a directed edge and an equivalent undirected edge both appear, the directed edge takes priority. When the same edge appears more than once, the higher edge weight is kept.

## Input Format Example
The input should be formatted as follows, with columns for node1, node2, rank, and direction:
```
A B 0.9 U
B A 0.1 D
...
```
In this format, "U" represents an undirected edge, and "D" represents a directed edge.
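
The edge-handling rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the repository's code: the function name `merge_edges` and its return shape are hypothetical, and it works on the raw weights from the file rather than on the integer costs handed to the solver.

```python
def merge_edges(lines):
    """Collapse duplicate edges from 'node1 node2 weight direction' lines.

    Returns (directed, undirected), two dicts mapping a node pair to a weight.
    Hypothetical helper for illustration only.
    """
    directed = {}    # (node1, node2) -> weight; order is significant
    undirected = {}  # sorted (node, node) pair -> weight

    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) != 4:
            raise ValueError(f"expected 4 columns, got {len(tokens)}: {line!r}")
        node1, node2, direction = tokens[0], tokens[1], tokens[3]
        weight = float(tokens[2])
        edge = (node1, node2)
        key = tuple(sorted(edge))

        if direction == "D":
            # A directed edge replaces any equivalent undirected edge.
            undirected.pop(key, None)
            # Among duplicate directed edges, keep the higher weight.
            directed[edge] = max(weight, directed.get(edge, weight))
        elif direction == "U":
            # Drop the undirected edge if either orientation is already directed.
            if edge in directed or edge[::-1] in directed:
                continue
            undirected[key] = max(weight, undirected.get(key, weight))
        else:
            raise ValueError(f"unknown direction {direction!r}")

    return directed, undirected


directed, undirected = merge_edges(["A B 0.9 U", "B A 0.1 D"])
print(directed, undirected)
```

For the two-line example above, the directed `B A` edge displaces the undirected `A B` edge even though its weight is lower, because direction takes priority over weight.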

## Dependencies

Google's [OR-Tools library](https://developers.google.com/optimization/flow/mincostflow) is required to run this script.
@@ -32,3 +44,8 @@ Python 3 is required to run this script
> --output Prefix for all output files.
>
> --capacity The amount of flow which can pass through a single edge.

## Testing
`python test_minCostFlow.py`

The script runs two series of graphs, the 'graph series' and the 'test series'. The graph series checks the code's correctness; except for internal tiebreaking by the solver, each result is deterministic. The test series verifies that the code behaves appropriately on distinct edge cases. The expected results are in graphs/correct_outputs.txt for the graph series and tests/correct_outputs.txt for the test series.
53 changes: 53 additions & 0 deletions graphs/correct_outputs.txt
@@ -0,0 +1,53 @@
The graph series is used to check the code's correctness. Each result is deterministic.

graph1:
A B D
B D D

graph2:
A B D
B D D

graph3:
A C D
C D D

graph4:
A B D
B D D

graph5:
A B U
B D U

graph6:
A B U
B D U

graph7:
A B U
B D U

graph8:
A B U
B D U

graph9:
A B D
B D U

graph10:
A B D
B D D

graph11:
A B U
B D U

graph12:
A B D
B D D

graph13:
A B U
B D U
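
The listing above has a simple shape: a `graphN:` header followed by one `node1 node2 direction` row per edge. A minimal sketch of a parser for that layout follows. The function name `parse_expected` is hypothetical and not part of this pull request, and the sketch assumes that the order of rows within a graph is not significant.

```python
def parse_expected(text):
    """Parse 'name:' headers followed by 'node1 node2 direction' rows.

    Returns a dict mapping each name to a set of (node1, node2, direction)
    tuples. Lines before the first header, such as a description, are ignored.
    """
    expected = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and len(line.split()) == 1:
            # A single token ending in ':' starts a new graph section.
            current = line[:-1]
            expected[current] = set()
        elif current is not None:
            parts = line.split()
            if len(parts) == 3:
                expected[current].add(tuple(parts))
    return expected


sample = "graph1:\nA B D\nB D D\n\ngraph5:\nA B U\nB D U\n"
print(parse_expected(sample))
```

Storing each graph's rows as a set lets a produced `.sif` file be compared with the expected rows without depending on the order in which the solver reports arcs.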
8 changes: 4 additions & 4 deletions graphs/graph1/edges.txt
@@ -1,4 +1,4 @@
A B 0.9
A C 0.1
B D 0.9
C D 0.1
A B 0.9 D
A C 0.1 D
B D 0.9 D
C D 0.1 D
4 changes: 4 additions & 0 deletions graphs/graph10/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 D
A C 0.1 U
B D 0.9 D
C D 0.1 U
1 change: 1 addition & 0 deletions graphs/graph10/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph10/targets.txt
@@ -0,0 +1 @@
D
4 changes: 4 additions & 0 deletions graphs/graph11/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 U
A C 0.1 D
B D 0.9 U
C D 0.1 D
1 change: 1 addition & 0 deletions graphs/graph11/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph11/targets.txt
@@ -0,0 +1 @@
D
4 changes: 4 additions & 0 deletions graphs/graph12/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 D
A C 0.1 D
B D 0.8 D
C D 0.2 D
1 change: 1 addition & 0 deletions graphs/graph12/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph12/targets.txt
@@ -0,0 +1 @@
D
4 changes: 4 additions & 0 deletions graphs/graph13/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 U
A C 0.1 U
B D 0.8 U
C D 0.2 U
1 change: 1 addition & 0 deletions graphs/graph13/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph13/targets.txt
@@ -0,0 +1 @@
D
10 changes: 5 additions & 5 deletions graphs/graph2/edges.txt
@@ -1,5 +1,5 @@
A B 0.9
A C 0.1
B D 0.9
C D 0.1
A D 0.8
A B 0.9 D
A C 0.1 D
B D 0.9 D
C D 0.1 D
A D 0.8 D
8 changes: 4 additions & 4 deletions graphs/graph3/edges.txt
@@ -1,4 +1,4 @@
A B 0.9
A C 0.1
B D 0.1
C D 0.9
A B 0.9 D
A C 0.1 D
B D 0.1 D
C D 0.9 D
8 changes: 4 additions & 4 deletions graphs/graph4/edges.txt
@@ -1,4 +1,4 @@
A B 0.9
A C 0.9
B D 0.9
C D 0.9
A B 0.9 D
A C 0.9 D
B D 0.9 D
C D 0.9 D
4 changes: 4 additions & 0 deletions graphs/graph5/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 U
A C 0.1 U
B D 0.9 U
C D 0.1 U
1 change: 1 addition & 0 deletions graphs/graph5/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph5/targets.txt
@@ -0,0 +1 @@
D
5 changes: 5 additions & 0 deletions graphs/graph6/edges.txt
@@ -0,0 +1,5 @@
A B 0.9 U
A C 0.1 U
B D 0.9 U
C D 0.1 U
A D 0.8 U
1 change: 1 addition & 0 deletions graphs/graph6/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph6/targets.txt
@@ -0,0 +1 @@
D
4 changes: 4 additions & 0 deletions graphs/graph7/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 U
A C 0.1 U
B D 0.1 U
C D 0.9 U
1 change: 1 addition & 0 deletions graphs/graph7/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph7/targets.txt
@@ -0,0 +1 @@
D
4 changes: 4 additions & 0 deletions graphs/graph8/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 U
A C 0.9 U
B D 0.9 U
C D 0.9 U
1 change: 1 addition & 0 deletions graphs/graph8/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph8/targets.txt
@@ -0,0 +1 @@
D
4 changes: 4 additions & 0 deletions graphs/graph9/edges.txt
@@ -0,0 +1,4 @@
A B 0.9 D
A C 0.1 U
B D 0.9 U
C D 0.1 D
1 change: 1 addition & 0 deletions graphs/graph9/sources.txt
@@ -0,0 +1 @@
A
1 change: 1 addition & 0 deletions graphs/graph9/targets.txt
@@ -0,0 +1 @@
D
67 changes: 59 additions & 8 deletions minCostFlow.py
@@ -11,6 +11,10 @@
import argparse
from ortools.graph.python.min_cost_flow import SimpleMinCostFlow

# (node1, node2) : weight
directed_dict = dict()
undirected_dict = dict()

def parse_nodes(node_file):
    ''' Parse a list of sources or targets and return a set '''
    with open(node_file) as node_f:
@@ -26,13 +30,17 @@ def construct_digraph(edges_file, cap):
    capacity of 1.
    '''
    G = SimpleMinCostFlow()
    idDict = dict() #Hold names to number ids
    idDict = dict() # Hold names to number ids
    curID = 0
    default_capacity = int(cap)

    with open(edges_file) as edges_f:
        for line in edges_f:
            tokens = line.strip().split()

            if len(tokens) != 4:
                raise ValueError(f"Each row in the edges file {edges_file} should contain 4 values to define an edge. Currently a row has {len(tokens)} values.")

            node1 = tokens[0]
            if not node1 in idDict:
                idDict[node1] = curID
@@ -41,10 +49,42 @@ def construct_digraph(edges_file, cap):
            if not node2 in idDict:
                idDict[node2] = curID
                curID += 1
            #Google's solver can only handle int weights, so round to the 100th
            w = int((1-(float(tokens[2])))*100)
            G.add_arc_with_capacity_and_unit_cost(idDict[node1],idDict[node2], default_capacity, int(w))
            G.add_arc_with_capacity_and_unit_cost(idDict[node2],idDict[node1], default_capacity, int(w))
            # Google's solver can only handle int weights, so round to the 100th
            w = int((1-(float(tokens[2])))*100) # the lower the weight in tokens[2], the higher the cost
            d = tokens[3]
            edge = (node1, node2)
            sorted_edge = tuple(sorted(edge, reverse=False)) # undirected edges are stored under their sorted node pair
            sorted_edge_reverse = tuple(sorted(edge, reverse=True))

            if d == "D":
                if edge in directed_dict:
                    if w < directed_dict[edge]: # w is a cost: lower cost means higher weight, so keep the lower cost
                        directed_dict[edge] = w
                elif sorted_edge in undirected_dict: # prioritize directed edges over undirected edges
                    del undirected_dict[sorted_edge]
                    directed_dict[edge] = w
                else: # edge not in directed_dict
                    directed_dict[edge] = w

            elif d == "U":
                # add the edge to undirected_dict only if neither orientation is already in directed_dict
                # edge may equal sorted_edge, so the reverse orientation (sorted_edge_reverse) is checked as well
                if edge not in directed_dict and sorted_edge not in directed_dict and sorted_edge_reverse not in directed_dict and sorted_edge not in undirected_dict:
                    undirected_dict[sorted_edge] = w
                elif sorted_edge in undirected_dict:
                    if w < undirected_dict[sorted_edge]: # w is a cost: lower cost means higher weight, so keep the lower cost
                        undirected_dict[sorted_edge] = w
            else:
                raise ValueError(f"Cannot add edge: d = {d}")


    # go through and add the edges from directed_dict and undirected_dict to G
    for key, value in directed_dict.items():
        G.add_arc_with_capacity_and_unit_cost(idDict[key[0]],idDict[key[1]], default_capacity, int(value))
    for key, value in undirected_dict.items():
        G.add_arc_with_capacity_and_unit_cost(idDict[key[0]],idDict[key[1]], default_capacity, int(value))
        G.add_arc_with_capacity_and_unit_cost(idDict[key[1]],idDict[key[0]], default_capacity, int(value))

    idDict["maxID"] = curID
    return G,idDict
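
A note on the weight-to-cost conversion in this hunk: the comment describes rounding to hundredths, while `int()` truncates toward zero. In binary floating point, `(1 - 0.9) * 100` typically evaluates to slightly less than 10, so truncation can give a cost one lower than rounding would. The sketch below only illustrates that difference. Both function names are hypothetical, and `cost_rounded` is a possible alternative, not what the pull request does; switching to it could change which paths tie and therefore the expected outputs.

```python
def cost_truncated(weight, scale=100):
    """Same arithmetic as the diff: int() truncates toward zero."""
    return int((1 - float(weight)) * scale)


def cost_rounded(weight, scale=100):
    """Hypothetical alternative: round to the nearest integer instead."""
    return round((1 - float(weight)) * scale)


for weight in ("0.9", "0.8", "0.5", "0.1"):
    print(weight, cost_truncated(weight), cost_rounded(weight))
```

Either conversion preserves the intent that a higher weight gives a lower cost; the two differ only where the floating-point product lands just below an integer.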

@@ -87,8 +127,9 @@ def write_output_to_sif(G,out_file_name,idDict):
    names = {v: k for k, v in idDict.items()}
    numE = 0
    for i in range(G.num_arcs()):
        node1 = names[G.head(i)]
        node2 = names[G.tail(i)]
        node1 = names[G.tail(i)]
        node2 = names[G.head(i)]

        flow = G.flow(i)
        if flow <= 0:
            continue
@@ -97,7 +138,17 @@ def write_output_to_sif(G,out_file_name,idDict):
        if node2 in ["source","target"]:
            continue
        numE+=1
        out_file.write(node1+"\t"+node2+"\n")

        edge = (node1, node2)
        sorted_edge = tuple(sorted(edge))

        if edge in directed_dict:
            out_file.write(edge[0]+"\t"+edge[1]+"\t"+"D"+"\n")
        elif sorted_edge in undirected_dict:
            out_file.write(sorted_edge[0]+"\t"+sorted_edge[1]+"\t"+"U"+"\n")
        else:
            raise KeyError(f"edge {edge} is not in the dicts")

    print("Final network had %d edges" % numE)
    out_file.close()
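
The labeling step in this hunk can be read in isolation. An undirected edge is stored once under its sorted node pair but added to the solver as two arcs, so flow on either arc maps back to the same output row. The sketch below mirrors that lookup with plain dicts; the function name `label_arc` is hypothetical, and the dicts are passed in as arguments rather than read from module-level state.

```python
def label_arc(tail, head, directed, undirected):
    """Return the (node1, node2, direction) row for an arc that carries flow."""
    edge = (tail, head)
    if edge in directed:
        # Directed edges keep their original orientation.
        return (tail, head, "D")
    key = tuple(sorted(edge))
    if key in undirected:
        # Undirected edges are reported under their sorted node pair.
        return (key[0], key[1], "U")
    raise KeyError(f"edge {edge} is not in either dict")


directed = {("A", "B"): 10}
undirected = {("B", "D"): 10}
print(label_arc("A", "B", directed, undirected))
print(label_arc("D", "B", directed, undirected))
```

One consequence of this lookup is that an undirected edge is always written in sorted order, whatever direction the flow took across it.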

34 changes: 34 additions & 0 deletions test_minCostFlow.py
@@ -0,0 +1,34 @@
import subprocess

command = "python"
script = "minCostFlow.py"

print("TEST SERIES")
for i in range(1, 8):
    print("test: ", i)
    args = [
        "--edges_file", f"tests/test{i}/edges.txt",
        "--sources_file", f"tests/test{i}/sources.txt",
        "--targets_file", f"tests/test{i}/targets.txt",
        "--output", f"test{i}"
    ]
    cmd = [command, script] + args

    # Run the command
    subprocess.run(cmd)


print("\nGRAPHS SERIES")
for i in range(1, 14):
    print("graph: ", i)
    args = [
        "--edges_file", f"graphs/graph{i}/edges.txt",
        "--sources_file", f"graphs/graph{i}/sources.txt",
        "--targets_file", f"graphs/graph{i}/targets.txt",
        "--output", f"graph{i}"
    ]
    cmd = [command, script] + args

    # Run the command
    subprocess.run(cmd)
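
The runner launches each graph but does not look at the exit status, so a crash in one run would not stop the series. A minimal sketch of one way to tighten this follows, assuming the same directory layout. The helper names `build_command` and `run_series` are hypothetical; `sys.executable` is used so the child process runs under the same interpreter, and `check=True` makes `subprocess.run` raise `CalledProcessError` on a non-zero exit.

```python
import subprocess
import sys


def build_command(folder, name, index, script="minCostFlow.py"):
    """Build the argument list for one graph, e.g. folder='graphs', name='graph'."""
    base = f"{folder}/{name}{index}"
    return [
        sys.executable, script,
        "--edges_file", f"{base}/edges.txt",
        "--sources_file", f"{base}/sources.txt",
        "--targets_file", f"{base}/targets.txt",
        "--output", f"{name}{index}",
    ]


def run_series(folder, name, first, last):
    """Run graphs first..last inclusive, stopping at the first failure."""
    for index in range(first, last + 1):
        print(f"{name}: {index}")
        subprocess.run(build_command(folder, name, index), check=True)


if __name__ == "__main__":
    run_series("tests", "test", 1, 7)
    run_series("graphs", "graph", 1, 13)
```

This still only confirms that each run exits cleanly; comparing the written `.sif` files against the expected outputs would be a separate step.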