Code for the paper published at ICLR 2025.
Designing mechanically efficient geometry for architectural structures like shells, towers, and bridges is an expensive iterative process. Existing techniques for solving such inverse problems rely on traditional optimization methods, which are slow and computationally expensive, limiting iteration speed and design exploration. Neural networks would seem to offer a solution via data-driven amortized optimization, but they often require extensive fine-tuning and cannot ensure that important design criteria, such as mechanical integrity, are met.
In this work, we combine neural networks with a differentiable mechanics simulator to develop a model that accelerates the solution of shape approximation problems for architectural structures represented as bar systems. This model explicitly guarantees compliance with mechanical constraints while generating designs that closely match target geometries. We validate our approach in two tasks, the design of masonry shells and cable-net towers. Our model achieves better accuracy and generalization than fully neural alternatives, and comparable accuracy to direct optimization but in real time, enabling fast and reliable design exploration. We further demonstrate its advantages by integrating it into 3D modeling software and fabricating a physical prototype. Our work opens up new opportunities for accelerated mechanical design enhanced by neural networks for the built environment.
- Pretrained models
- Repository structure
- Installation
- Configuration
- Training
- Testing
- Visualization
- Direct optimization
- Predict then optimize
- Citation
- Contact
Want to skip training? We got you 💪🏽 Our trained model weights are publicly available at this link. Once downloaded, you can test the models at inference time and display their predictions.
This repository contains two folders with the meat of our work: `src` and `scripts`.
The first folder, `src`, defines all the code infrastructure we need to build, train, serialize, and visualize our model and the baselines.
The second one, `scripts`, groups a list of routines to execute the code in `src` and, more importantly, to reproduce our experiments at inference time.
With the scripts, you can even tessellate and 3D print your own masonry vault from one of our model predictions if you fancy!
We only support installation on a CPU. Our paper does not use any GPUs. Crazy, right? 🪄
Create a new Anaconda environment with Python 3.9 and then activate it:
conda create -n neural python=3.9
conda activate neural
- Install the required Conda dependencies:
conda install -c conda-forge compas==1.17.10 compas_view2==0.7.0
- Install the package and its pip dependencies:
pip install -e .
The `-e .` flag installs the package in "editable" mode, which means changes to the source code take effect immediately without reinstalling.
If you need additional development tools (testing, formatting, etc.), are interested in making data plots, or want to generate bricks for a shell, follow these steps:
- Install the necessary and additional Conda dependencies:
conda install -c conda-forge compas==1.17.10 compas_view2==0.7.0 compas_cgal==0.5.0
- Install the package with development dependencies:
pip install -e ".[dev]"
Our work focuses on two structural design tasks: compression-only shells and cable-net towers.
We thus create a `.yml` file with all the configuration hyperparameters for each task.
The files are stored in the `scripts` folder as `bezier.yml` and `tower.yml` for the first and the second task, respectively.
The hyperparameters exposed in the configuration files range from choosing a data generator to prescribing the model architecture and the optimization scheme. We'll be tweaking them to steer the experiments we run.
An advantage of our work is that we only need to define target shapes, without pairing them with ground-truth force density labels, as a fully supervised setting would require.
Our model figures these labels out automatically.
This allows us to generate a dataset of target shapes on the fly at train time by specifying a `generator` configuration and a random `seed` to create pseudo-random keys.
The target shell shapes are parametrized by a square Bezier patch. The generator hyperparameters are listed next (an illustrative config sketch follows the list):
- `name`: The name of the generator to instantiate. One of `bezier_symmetric_double`, `bezier_symmetric`, `bezier_asymmetric`, and `bezier_lerp`. The first two options constrain the shapes to be symmetric along two axes or one axis, respectively. The third option does not enforce symmetry. The last option, `bezier_lerp`, linearly interpolates a batch of doubly-symmetric and asymmetric shapes (i.e., shapes generated by `bezier_symmetric_double` and `bezier_asymmetric`).
- `num_points`: The number of control points on one side of the grid that parametrizes a Bezier patch.
- `bounds`: Specifies how to wiggle the control points of the patch on a `num_points x num_points` grid. The option `pillow` only moves the internal control point up and down, while `dome` additionally jitters the two control points on the boundary in and out. `saddle` extends `dome` by also letting one of the control points on the boundary move up and down.
- `num_uv`: The number of spans to evaluate on the Bezier patch along the u and v directions. A value of `10`, for example, results in a `10x10` grid of target points. These are the points to be matched during training.
- `size`: The length of the sides of the patch. It defines the scale of the task.
- `lerp_factor`: A scalar factor in [0, 1] to interpolate between two target surfaces. Only used by `bezier_lerp`.
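To make this concrete, here is an illustrative sketch of what the generator portion of `bezier.yml` might look like. The key names come from the list above, but the exact nesting in the shipped file may differ, and the numeric values are placeholders:

```yaml
generator:
  name: bezier_symmetric_double  # or bezier_symmetric, bezier_asymmetric, bezier_lerp
  num_points: 4                  # control points per side of the grid (example value)
  bounds: dome                   # pillow, dome, or saddle
  num_uv: 10                     # evaluates a 10x10 grid of target points
  size: 10.0                     # side length of the patch (example value)
  lerp_factor: 0.5               # only read by bezier_lerp (example value)
```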
The target tower shapes are described in turn by a vertical sequence of planar circles.
The tower rings are deformed and rotated depending on the generator `name` and `bounds`. The hyperparameters are (again, a config sketch follows the list):
- `name`: The generator name. Use `tower_ellipse` to make target shapes with elliptical rings, and `tower_circles` to keep the rings as circles.
- `bounds`: Either `straight` or `twisted`. The former only scales the rings on the plane at random. The latter scales and rotates the rings at random.
- `height`: The tower height.
- `radius`: The starting radius of all the generated circles.
- `num_sides`: The number of segments to discretize each circle with.
- `num_levels`: The number of circles to create along the tower's height, spaced equidistantly.
- `num_rings`: The number of circles to be morphed during training. Must be greater than 2, since two of these rings sit, by default, at the top and bottom of the tower.
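As before, here is a hypothetical sketch of the generator portion of `tower.yml`. The field names follow the list above, while the nesting and the numeric values are illustrative only:

```yaml
generator:
  name: tower_circles  # or tower_ellipse
  bounds: twisted      # straight or twisted
  height: 10.0         # tower height (example value)
  radius: 1.0          # starting radius of every circle (example value)
  num_sides: 16        # segments per circle (example value)
  num_levels: 10       # equidistant circles along the height (example value)
  num_rings: 4         # rings morphed during training; must be greater than 2
```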
We specify the architecture of a model in the configuration file; the architecture, for the most part, resembles an autoencoder. The configuration scheme is the same for either task.
Our experiments use multilayer perceptrons (MLPs) for the encoder that maps shapes to simulation parameters, although we are by no means restricted to them. An MLP also serves as the decoder in our fully neural baselines. We deliberately employ one of the simplest possible neural networks, the MLP, to quantify the benefits of embedding a physics simulator in a neural network for large-scale mechanical design tasks. This sets a baseline we can build upon with beefier architectures like graph neural networks, transformers, and beyond.
The encoder hyperparameters are:
- `shift`: The lower bound shift in the output of the last layer of the encoder. This is what we call `tau` in the paper.
- `hidden_layer_size`: The width of every fully connected hidden layer. We restrict the size to `256` in all the experiments.
- `hidden_layer_num`: The number of hidden layers, output layer included.
- `activation_fn_name`: The name of the activation function after each hidden layer. We typically resort to `elu`.
- `final_activation_fn_name`: The activation function name after the output layer. We use `softplus` to ensure a strictly positive output, as needed by the simulator decoder.
The neural decoder's setup mirrors the encoder's, except for the `include_params_xl` flag.
If set to `True`, the decoder expects the latents and boundary conditions as inputs.
Otherwise, it only decodes the latents.
We fix this hyperparameter to `True` in the paper.
For the simulator, the force density method (FDM), the only hyperparameter is `load`, which sets the magnitude of a vertical area load applied to the structures in the direction of gravity (hello, Isaac Newton! 🍎).
If this value is nonzero, the model converts the area load into point loads to be compatible with our physics simulator.
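Pulling the encoder, decoder, and simulator hyperparameters together, a model block in the configuration file could look roughly like the sketch below. The key names match the descriptions above, but the nesting, the `fdm` block name, and the `shift`, `hidden_layer_num`, and `load` values are illustrative rather than taken from the paper:

```yaml
encoder:
  shift: 1.0                          # tau in the paper (example value)
  hidden_layer_size: 256              # width used in all experiments
  hidden_layer_num: 5                 # hidden layers, output layer included (example value)
  activation_fn_name: elu
  final_activation_fn_name: softplus  # strictly positive output for the simulator
decoder:                              # only used by the fully neural baselines
  hidden_layer_size: 256
  hidden_layer_num: 5
  activation_fn_name: elu
  include_params_xl: true             # feed boundary conditions alongside the latents
fdm:
  load: 1.0                           # vertical area load, converted to point loads if nonzero (example value)
```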
The training setup is also defined in the configuration file of the task, including the `loss` function to optimize for, the `optimizer` that updates the model parameters, and the `training` schedule that pretty much allocates the compute budget.
The `loss` function is the sum of multiple terms, chiefly a shape loss and a physics loss, as we explain in the paper.
The file allows for more refined control over the scaling of each loss term (see the sketch after the list):
- `include`: Whether or not to include the loss term during training. If set to `False`, the value of the loss term is not calculated, saving some computational resources. By default, `include=True`.
- `weight`: The scalar weight of the loss term used for calibrating model performance, called `kappa` in the paper. It is particularly useful to tune the scale of the physics loss when training the PINN baseline. By default, `weight=1.0`.
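For instance, a loss block with a shape term and a physics (residual) term could be laid out as in the sketch below. The term names and nesting are illustrative; only `include` and `weight` behave as just described:

```yaml
loss:
  shape:          # distance to the target geometry
    include: true
    weight: 1.0
  residual:       # physics loss; activating it turns the autoencoder into the PINN baseline
    include: true
    weight: 1.0   # kappa in the paper
```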
The `optimizer` hyperparameters are:
- `name`: The name of the gradient-based optimizer. We currently support `adam` and `sgd` from the `optax` library, but only use `adam` in the paper.
- `learning_rate`: The constant learning rate. The rate is fixed; we omit schedulers because it is more elegant.
- `clip_norm`: The global norm for gradient clipping. If set to `0.0`, gradient clipping is disabled.
And for the `training` routine (a combined sketch of the optimizer and training blocks follows the list):
- `steps`: The number of optimization steps for model training (i.e., the number of times the model parameters are updated). We mostly train the models for `10000` steps.
- `batch_size`: The batch size of the input data.
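A matching optimizer and training block might read as follows; `adam` and the `10000` steps mirror what we describe above, while the remaining values are placeholders:

```yaml
optimizer:
  name: adam           # adam or sgd, via optax
  learning_rate: 0.001 # constant rate, no schedulers (placeholder value)
  clip_norm: 0.0       # 0.0 disables gradient clipping
training:
  steps: 10000         # number of parameter updates
  batch_size: 64       # placeholder value
```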
After setting up the config files, it's time to make that CPU go brrrrr.
Execute the `train.py` script from your terminal:
python train.py <model_name> <task_name>
Where `task_name` is either `bezier` for the shells task or `tower` for the towers task.
Task-specific configuration details are given in the paper.
The `model_name` is where things get interesting.
In summary:
- Ours: `formfinder`
- NN and PINN baselines: `autoencoder`
If `autoencoder` is trained with the `residual` loss (i.e., the physics loss is included and active), the model becomes a PINN baseline and is internally renamed `autoencoder_pinn` (sorry, naming is hard).
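For example, `python train.py formfinder bezier` trains our model on the shells task, and `python train.py autoencoder tower` trains the neural baseline on the towers task.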
We invite you to check the docstring of the `train.py` script to see all the input options.
These let you warmstart training from an existing pretrained model, checkpoint every so often, and plot and export the loss history for your inspection.
A note on hyperparameter tuning: we utilized WandB to run hyperparameter sweeps. The sweeps are handled by the `sweep.py` script in tandem with the `sweep_bezier.yml` or `sweep_tower.yml` files, depending on the task. The structure of these sweep files mimics that of the configuration files described herein. We trust you'll find your way around them if you really want to fiddle with them.
To evaluate the trained models on a test batch, run:
python predict.py <model_name> <task_name> --batch_size=<batch_size> --seed=<test_seed>
We set `--batch_size=100` during inference to match what we do in the paper.
The test set is created by a generator that follows the same configuration as the train set, except for the random seed.
We set `test_seed` to `90` in the `bezier` task and to `92` in the `tower` task.
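For example, `python predict.py formfinder bezier --batch_size=100 --seed=90` evaluates our model on the paper's test batch for the shells task.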
Feel free to specify other seed values to test the model on different test datasets.
An image is worth more than a thousand words, or in this case, more than a thousand numbers in a JAX array.
You can visualize the prediction a model makes, either ours or the baselines, with a dedicated script that lets you take control over the style of the rendered prediction:
python visualize.py <model_name> <task_name> --shape_index=<shape_index> --seed=<test_seed>
The shape to display is selected by passing its index within the batch via the `<shape_index>` argument.
Check out the docstring of `visualize.py` for the nitty-gritty details of how to control color palettes, linewidths, and arrow scales for making pretty pictures.
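For instance, `python visualize.py formfinder tower --shape_index=0 --seed=92` renders the first shape of our model's test predictions for the towers task (the index `0` is just an example).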
So far we've only discussed how to create neural models for shape-matching tasks. Direct gradient-based optimization is another baseline that merits its own section, as it is the ground truth in traditional design optimization in structural engineering. Take an optimizer for a ride via:
python optimize.py <optimizer_name> <task_name> --batch_size=<batch_size> --seed=<test_seed> --blow=<blow> --bup=<bup> --param_init=<param_init> --maxiter=<maxiter> --tol=<tol>
We support two constrained gradient-based algorithms as implemented in `jaxopt`.
Select one of them through `optimizer_name`:
- `slsqp`: The sequential least squares quadratic programming algorithm.
- `lbfgsb`: The limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm.
The algorithms support box constraints on the simulation parameters.
We take advantage of this feature to constrain their value to a specific sign and to a range of reasonable values, depending on the task.
The lower box constraint is equivalent to the effect that `tau` has on the decoder's output in the paper: it prescribes a minimum output value.
Both optimizers run for at most `maxiter=5000` iterations and stop early if they hit the convergence tolerance of `tol=1e-6`.
We're in the business of local optimization, so the initialization affects convergence.
You can pick between two initialization schemes with `param_init` that respect the force density signs of a task (compression or tension):
- If specified as a scalar, it determines the starting constant value of all the simulation parameters.
- If set to `None`, the initialization samples starting parameters between `blow` and `bup` from a uniform distribution.
In the shells task, we apply `slsqp`, set `blow=0.0` and `bup=20.0`, and use `param_init=None`.
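Put together, a shells run along those lines would look like `python optimize.py slsqp bezier --batch_size=100 --seed=90 --blow=0.0 --bup=20.0 --maxiter=5000 --tol=1e-6` (we omit `--param_init` here; check the script's docstring for how to pass it explicitly).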
In contrast, the towers task uses `lbfgsb` and `blow=1.0` to match the value of `tau` we used for this task in the paper.
The towers task is more nuanced because we explore three different initialization schemes:
- Randomized: `param_init=None`
- Expert: `param_init=1.0`

The third initialization type relies on the predictions of a pre-trained model and, to use it, we need to invoke a different script. See Predict then optimize below.
There is enormous potential in combining neural networks with traditional optimization techniques to expedite mechanical design. An opportunity in this space is to leverage the prediction made by one of our models and refine that prediction with direct optimization to unlock the best-performing designs.
Our key to open a tiny (very tiny) door into this space is the `predict_optimize.py` script:
python predict_optimize.py <model_name> <optimizer_name> <task_name> --batch_size=<batch_size> --seed=<test_seed> --blow=<blow> --bup=<bup> --maxiter=<maxiter> --tol=<tol>
What is different from the `optimize.py` script is that you now have to specify the name of a trained model via `model_name`.
Its predictions will warmstart the optimization, replacing any of the `param_init` schemes described earlier.
The rest of the inputs work the same way as in `optimize.py`.
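For example, `python predict_optimize.py formfinder lbfgsb tower --batch_size=100 --seed=92 --blow=1.0 --maxiter=5000 --tol=1e-6` warmstarts the towers optimization with our model's predictions (pick a `--bup` that suits your task; we leave it out here).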
Consider citing our paper if this work was helpful to your research. Don't worry, it's free.
@inproceedings{
pastrana_2025_diffmechanics,
title={Real-time design of architectural structures with differentiable mechanics and neural networks},
author={Rafael Pastrana and Eder Medina and Isabel M. de Oliveira and Sigrid Adriaenssens and Ryan P. Adams},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=Tpjq66xwTq}
}
Reach out! If you have questions or find bugs in our code, please open an issue on GitHub or email the authors at arpastrana@princeton.edu.