new builds are not picking up new commits #157

Open
bennlich opened this issue May 4, 2019 · 10 comments

@bennlich
Collaborator

bennlich commented May 4, 2019

First clue: retrieve_ip was still deleting itself even though eenblam pushed a commit months ago to leave it in place. Just saw this on a node flashed with https://builds.sudomesh.org/sudowrt-firmware/latest/2019-04-28/ar71xx/openwrt-ar71xx-generic-mynet-n600-squashfs-factory-ede0ea.bin

I currently don't understand how this is possible, unless the image we're flashing somehow does not have @eenblam 's changes from 0e6d123.
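
One quick way to see which commit a published image claims to contain is to read the short hash off the end of its filename. This is a minimal sketch, assuming (it is not confirmed in this thread) that the trailing `-ede0ea` in the image name is an abbreviated git commit. `build_suffix` is a hypothetical helper, not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper: print the trailing "-<hex>" suffix of a firmware image name.
# Assumption: that suffix is the abbreviated git commit the image was built from.
build_suffix() {
    name=${1##*/}        # strip any directory or URL prefix
    name=${name%.bin}    # strip the .bin extension
    suffix=${name##*-}   # keep what follows the last hyphen
    case $suffix in
        ''|*[!0-9a-f]*) return 1 ;;   # empty, or not a lowercase hex string
    esac
    printf '%s\n' "$suffix"
}

build_suffix "openwrt-ar71xx-generic-mynet-n600-squashfs-factory-ede0ea.bin"
```

In a checkout of the firmware repo, `git merge-base --is-ancestor 0e6d123 <suffix>` exits 0 only if the commit named by that suffix includes 0e6d123.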

bennlich changed the title from "retrieve_ip still disappearing from /opt/mesh after autoconf" to "auto-builds are not picking up new commits" May 6, 2019
@bennlich
Collaborator Author

bennlich commented May 6, 2019

It looks like the build has been broken since last October :-/

Following the readme, I was able to complete a build in a local docker container. My built image reproduced the bug above--it did not have any of the latest commits, just like the latest images built by travis and deployed to our build server.

As far as I can tell, the most recent commit in any published firmware build is 619e263.

The problem appears to be that the sudowrt firmware image files in the sudowrt-firmware docker image are not getting cleaned out during the build step. There is probably an openwrt clean command we can run to properly clean out the build directories or force the files to be overwritten.

bennlich changed the title from "auto-builds are not picking up new commits" to "new builds are not picking up new commits" May 6, 2019
@bennlich
Collaborator Author

bennlich commented May 6, 2019

Oh. Maybe the bug is simply that we are running build_pre without ever running build: 67a8b12. I'll check tomorrow.

@bennlich
Collaborator Author

I verified that build does properly build the firmware including all the files you'd expect. It takes ~20 minutes on my machine.

I imagine there is a way to write a build-lite script that just copies the files into the image instead of rebuilding a bunch of tooling that is already built, but build_pre is apparently not the way.

Next steps are:

  1. make the travis job use build instead of build_pre (hopefully this doesn't time out travis)
  2. look for another way to speed up the build (maybe ask on openwrt mailing list or look through the docs or the makefile)

@ghost

ghost commented May 10, 2019

If there were a way to indicate that files have not changed, the compiling/file-generation step could be skipped and the build could just cp -r the files from/to where they need to go.
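
That idea can be sketched as a change check: fingerprint the input files, and skip the expensive step when the fingerprint matches the one recorded after the last build. This is only a sketch, not something the repo currently does. `tree_digest`, `needs_rebuild`, `record_build`, and the stamp file are all hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers: detect whether a directory tree changed since the last build.

# Print one fingerprint covering every regular file under directory $1.
tree_digest() {
    (
        cd "$1" || exit 1
        # Sort so the result does not depend on directory traversal order.
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            cksum "$f"    # prints: CRC, size in bytes, path
        done | cksum
    )
}

# Succeed (exit 0) when tree $1 differs from the fingerprint saved in file $2,
# or when no fingerprint has been saved yet.
needs_rebuild() {
    [ ! -f "$2" ] || [ "$(tree_digest "$1")" != "$(cat "$2")" ]
}

# Save the current fingerprint of tree $1 into file $2 after a successful build.
record_build() {
    tree_digest "$1" > "$2"
}
```

A caller would run the full build only when `needs_rebuild files /tmp/files.stamp` succeeds, then call `record_build files /tmp/files.stamp`. The stamp file has to live outside the tree being fingerprinted. `cksum` is a CRC, which is enough to notice edits but is not a cryptographic hash.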

@paidforby

@bennlich you are on the right track. This appears to be caused by a mistake on my part.
Looking at entrypoint.sh line 16, we run:

time ./build_pre $ARCH

However, older, presumably working commits had

time ./build_only $ARCH

One problem: build_only no longer exists in the current repo.
Rummaging through the commit history, you'll find that build_only appears to have been a script that called an openwrt_rebuilder function from build_lib. openwrt_rebuilder also no longer exists. IIRC, I realized that this rebuilder function was mostly a copy of the openwrt_builder function, which explains why I got rid of it. Yet that does not explain why I got rid of build_only, or why I would replace build_only with build_pre, which would seem to serve the exact inverse function. I don't know if running build instead of build_pre will have the desired effect. It may be best to reintroduce build_only and make sure it works as intended.

Sorry about that confusion, it seems like I just got mixed up about the build process somewhere in all my commits.
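
A mix-up like this is easy to check for mechanically. As a rough sketch, the hypothetical helper below (`invoked_build_scripts` is not part of the repo) prints which `./build*` scripts an entrypoint file invokes, assuming they are called in the `./build_pre $ARCH` form shown above.

```shell
#!/bin/sh
# Hypothetical check: print the build script(s) that an entrypoint file invokes.
# Matches invocations of the form "./build..." such as "time ./build_pre $ARCH".
invoked_build_scripts() {
    grep -o '\./build[A-Za-z_-]*' "$1" | sed 's|^\./||'
}
```

To find the commits where the call changed, `git log -S build_only -- entrypoint.sh` lists the commits that added or removed that string in entrypoint.sh.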

@wrought
Contributor

wrought commented Dec 3, 2019

I'm confused about what the issue is that requires bringing build_only back and why build won't work, or build_clean for that matter.

@paidforby

The idea of build_only is that it would only build the firmware, not the entire openwrt builder toolchain. It could then be used in conjunction with build_pre, which would only fetch dependencies and build the openwrt builder toolchain. All of this is related to work I did in #137 to reduce openwrt build times and to deploy builds using travisCI. This comment describes the changes I made to the build process fairly well: #137 (comment)

What I realized, IIRC, is that the issue with build times wasn't an openwrt issue, but an issue with how we were managing our docker containers. That is, if you build your images using the same computer (or docker container) over and over again, the first build takes ~40mins but each build after that only takes a few minutes, depending on the specs of your machine. What we previously suggested was spinning up a new docker container every time you wanted to build the firmware. Instead, I suggested that we provide an "already baked" docker image in which someone (i.e. @paidforby) had already run a build once.
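
The difference between the two workflows can be sketched with plain docker commands: create the container once, then restart that same container for later builds so that its build directories persist. This is only a sketch. The function `run_build_container` is hypothetical, and the container and image names are passed in rather than assumed.

```shell
#!/bin/sh
# Hypothetical wrapper: reuse one build container instead of creating a new one each time.
# $1 = container name (chosen by the caller), $2 = image to create it from.
run_build_container() {
    name=$1
    image=$2
    # `docker container inspect` exits non-zero when no container has this name.
    if docker container inspect "$name" >/dev/null 2>&1; then
        # The container exists: restart it and attach, keeping earlier build output.
        docker start -a "$name"
    else
        # First run: create the container under a fixed name so later runs can find it.
        docker run --name "$name" "$image"
    fi
}
```

Calling it twice with the same name creates the container the first time and reuses it the second time. That corresponds to the "first build is slow, later builds are fast" behavior described above.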

That being said, I think build might work fine, but you might lose the reduced build times. If builds take longer than a few minutes, you can easily revive build_only, which, judging by the commit history, looked like this: https://github.com/sudomesh/sudowrt-firmware/blob/530a792b2a0d8c9034c2ba71b25fc45c666671a7/build_only

@paidforby

So I've been messing around with this a little. It's all starting to come back.

The reason you should not run build_only (when it just ran openwrt_builder) is that it would not pull in changes to openwrt packages and configs. That is why I created openwrt_rebuilder, which took parts of the full build script in an attempt to only rebuild the images and do nothing else. What I realized is that openwrt_rebuilder was not that different from the full build script, so I got rid of it.

I also discovered an error in the docker image itself. For some reason, I had only run build_pre in the image, rather than the full build. I fixed this in a new docker image here. This image also fixes some outdated packages that were baked into it.

You can see that I slightly improved build times with my last few commits, https://travis-ci.org/sudomesh/sudowrt-firmware/builds

Also someone should test the latest builds (3934f7 is the most recent) http://builds.sudomesh.org/sudowrt-firmware/latest/2019-12-04/ar71xx/ to see if they have pulled in changes and effectively resolved this issue (I would check myself but I don't have an N600 on hand).

@bennlich
Collaborator Author

Thank you for reliving this again @paidforby ! And for documenting all your thinking--it's super helpful! 🐬 🐬 🐬

So, to recap, if I understand correctly:

  1. You think you fixed the "new builds are not picking up new commits" issue by replacing build_pre with build_only and openwrt_rebuilder in the docker image entrypoint.sh (3934f71 and 135e4b1)
  2. We now need to verify that the latest build actually works on an N600 and that the build is actually picking up new commits (i.e. that this issue is really truly fixed)

Follow-ups:

A) Do you think there's any more work to be done to speed up the builds?

> What I realized is that openwrt_rebuilder was not that different from the full build script, so I got rid of it.

Z) I'm now wondering what the difference is between openwrt_rebuilder and build.
Is there any documentation about the build, build_pre, etc. scripts online, or does one just need to read the scripts? Did we write those or did openwrt people?

@paidforby

Good recap @bennlich.

To address your questions,

A) I'm sure there are tricks for further optimizing OpenWrt builds, though I'm pretty sure the limits on build times on TravisCI are mostly bandwidth and computing power. Downloading the 7.7GB pre-baked OpenWrt builder image takes at least 7mins, while the build takes 13mins. However, if I download the same image and execute the same entrypoint on my laptop, it only takes 1 minute and 11 seconds to build. As long as TravisCI builds complete before timing out, it doesn't matter how long they take. And I think 1 minute local builds are more than adequate.

Z) I think you, @bennlich, have a grasp of how the build process works, but it definitely can be a little confusing since there have been so many cooks in the kitchen, so to speak. For posterity's sake, I'll give a quick history of our build script development and a breakdown of the current files and the thought process behind it all (maybe this is something that should be in the README?)

TL;DR,

  • openwrt_rebuilder is a function in build_lib that re-runs parts of the openwrt_builder and openwrt_build_configure functions.
  • build is a shell script that calls functions from build_lib
  • build, build_pre, and all the functions in build_lib are our original creations; they take inspiration from sources such as this and this.

Long version (skip to the end for a description of files)

I'm not 100% sure how it started, but I believe it began with two bash scripts, one called build and one called build_extender-node. build essentially took instructions from OpenWrt such as these, scriptified them, and made them specific to the needs of sudowrt. This script would then be run locally on a machine that had all the necessary dependencies installed, a la the legacy instructions. build_extender-node could then be run on the same machine, which had already built the home node firmware, and it would build quickly (this knowledge of the old build process is actually what gave me the idea of how to speed up the builds). This is how things were built for almost 4 years.

In January 2017, we introduced Docker into the equation. Using a Dockerfile, you now did not need to install a bunch of dependencies on your personal laptop. The Dockerfile would get a plain version of ubuntu 14.04 and then would install all the necessary dependencies. This also created the need for entrypoint.sh, which would execute the build process inside of the docker container. This worked pretty smoothly but it always built the firmware from scratch in a new container every time, which meant it always took 40mins to build. In this way, we entered into a dark age in which some of us (i.e. myself) believed that a firmware build should always take 40mins and had no idea that they could ever take less than 2mins.

Then, in November 2017, we introduced a little bash script called build_lib. This was no more than a refactor of build. It took the functions from build and put them in a generic build_lib script that could be reused for different parts and types of builds. This was part of an effort in #116 to create an image that did not need an internet connection to build the firmware. In this way, we created multiple possible scripts for starting builds, e.g. build_pre, which would run all the functions that build runs except openwrt_builder, and build_only, which would only run openwrt_builder.

Also, around this time, TravisCI was finally correctly set up to run passing builds (it existed in the repo since 2015, but was never working?). In December 2017, Travis only built the docker image, i.e. executed the Dockerfile (which ran build_pre), and did not run the docker container, i.e. run entrypoint.sh (which ran build_only). It then committed the resulting docker image to docker hub.

Finally, in August 2018, I reapproached some old issues like #116 and #111 with the idea of making Travis more useful. This led us to our current point, where Travis actually runs the whole build by first downloading a completed build image and re-running the build, instead of only running build_pre.

To recap by listing files by approximate creation date, with a description of their current use,

  • build - a shell script that calls the following functions from build_lib: openwrt_build_configure, openwrt_buildprep, and openwrt_builder.
  • build_extender-node - a legacy script for building firmware to flash on Ubiquiti radios (Nanostations, Nanobeams, etc).
  • Dockerfile - docker script that runs when you execute docker build and is baked into the docker hub image. Its last steps are to execute build for the ar71xx architecture and set the docker entrypoint to entrypoint.sh.
  • entrypoint.sh - shell script that runs when you execute docker run or docker start. It adds the git commit to the openwrt banner, copies any updates in the git repo files directory to the firmware build files directory, executes a timed build_only, and copies binaries into an easier to access directory that is also linked from the docker container to your local computer.
  • build_lib - shell script that contains a library of functions such as openwrt_build_configure, openwrt_buildprep, and openwrt_builder.
  • build_pre - shell script that runs the following functions from build_lib: openwrt_build_configure and openwrt_buildprep.
  • build_only - shell script that runs the openwrt_rebuilder function from build_lib.
  • build_clean - same as build but deletes the binaries that are built, i.e. cleans the docker container.
  • create_build_machine.sh - shell script to create a build machine on a remote server (such as a digital ocean droplet). Status unknown.
  • auto_build - meant to be set as a cronjob on a remote build machine. Status unknown.
  • .travis.yml - runs CI script on Travis, pulls in pre-baked docker hub image, executes entrypoint.sh, and deploys the newly built images to http://builds.sudomesh.org/sudowrt-firmware/latest/ using deploy_key.
  • deploy_key.enc - encrypted deployment key for pushing images to builds.sudomesh.org
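
The relationship between build and build_pre can be sketched with stub functions. The function names come from the descriptions above, but the bodies are placeholders that only print what they stand for. The real implementations live in build_lib and are not reproduced here. The wrappers `run_build` and `run_build_pre` are hypothetical stand-ins for the build and build_pre scripts, and the call order is assumed to match the order in which the functions are listed.

```shell
#!/bin/sh
# Stubs standing in for the build_lib functions named above (placeholders, not the real code).
openwrt_build_configure() { echo "configure $1"; }
openwrt_buildprep()       { echo "prep $1"; }
openwrt_builder()         { echo "build images $1"; }

# Stand-in for build: configure, prepare, then build the images.
run_build() {
    openwrt_build_configure "$1" && openwrt_buildprep "$1" && openwrt_builder "$1"
}

# Stand-in for build_pre: the same steps without openwrt_builder,
# so it never reaches the image-building step.
run_build_pre() {
    openwrt_build_configure "$1" && openwrt_buildprep "$1"
}

run_build_pre ar71xx   # prints "configure ar71xx" then "prep ar71xx"
```

Seen this way, an entrypoint that calls only build_pre never reaches the step that produces new images. That is consistent with the symptom reported at the top of this issue.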

Hopefully, this history and information is of use to someone. I just came to the realization that I may be the only person involved who remembers/knows any of this, so I figured I would dump my brain here.
