Commit 33548fc

Add post about .cargo directories
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
1 parent 1e25675 commit 33548fc

1 file changed

+148
-0
lines changed

_posts/2023-01-24-cargo-dirs.md

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
---
title: ".cargo directories explained"
date: 2023-01-24
author: Daniel Bevenius
---

This post takes a closer look at the `.cargo` directory, with a focus on the
`git` and `registry` subdirectories.

## .cargo/git directory
If we do a listing of this directory we will find two subdirectories, namely
`db` and `checkouts`.

If we list the contents of either of those directories we will see that a hash
is appended to every directory name. For example:
```
~/.cargo/git/db/sigstore-rs-874f7064c0c10336/
```
This is a hash of the URL of the git repository. To verify this there is a
[command line tool](https://github.com/trustification/source-distributed#print-git-project-hash)
that can be used:
```console
$ cargo r --quiet --bin project-hash -- -u https://github.com/sigstore/sigstore-rs.git
https://github.com/sigstore/sigstore-rs.git: 874f7064c0c10336
```
The printed hash matches the suffix of the directory name above.
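
To make the naming convention concrete, here is a minimal sketch that splits
such a directory name into the repository name and the hash suffix. The function
name `split_name_and_hash` is made up for this post, and the check for exactly
16 hexadecimal digits is an assumption based on the two hashes shown in this
post, not something taken from Cargo's source:

```rust
/// Splits a directory name such as `sigstore-rs-874f7064c0c10336` into
/// the name part and the hash suffix.
///
/// Assumption: the suffix is 16 hexadecimal digits, as in the examples
/// in this post. Returns `None` if the name does not have that shape.
fn split_name_and_hash(dir_name: &str) -> Option<(&str, &str)> {
    // Split at the last '-', since the name itself may contain dashes.
    let (name, hash) = dir_name.rsplit_once('-')?;
    let looks_like_hash = hash.len() == 16 && hash.chars().all(|c| c.is_ascii_hexdigit());
    if looks_like_hash && !name.is_empty() {
        Some((name, hash))
    } else {
        None
    }
}

fn main() {
    let dir_name = "sigstore-rs-874f7064c0c10336";
    match split_name_and_hash(dir_name) {
        Some((name, hash)) => println!("name: {}, hash: {}", name, hash),
        None => println!("{} does not end in a hash", dir_name),
    }
}
```

Splitting at the last dash matters here because repository names such as
`sigstore-rs` contain dashes themselves.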

The directories in `.cargo/git/db` are bare git repositories. The directories
in `.cargo/git/checkouts` hold the checked-out revisions, with one subdirectory
for each revision (short hash) used by Cargo.

## .cargo/registry directory
Dependencies downloaded from crates.io are stored locally in
`~/.cargo/registry`:
```console
$ ls ~/.cargo/registry/
cache CACHEDIR.TAG index src
```
There can be multiple registries, each with its own directory under `index`:
```console
$ ls ~/.cargo/registry/index/
github.com-1ecc6299db9ec823
```
Now this was a little confusing to me as I did not expect a github.com directory
here. It turns out that Cargo communicates with a registry through a git
repository which is called the index. For crates.io this repository is hosted
on GitHub: https://github.com/rust-lang/crates.io-index.

Let's clone this index and take a look at it:
```console
$ git clone https://github.com/rust-lang/crates.io-index.git
$ cd crates.io-index/
```
If we list the contents of this directory we will see a number of subdirectories
whose names are one or two characters long (letters, digits, or symbols). There
is also a `config.json` file.

Now, notice that this index does not contain any crates:
```console
$ find . -name '*.crate' | wc -l
0
```
Instead, what the index stores is a list of versions for all known packages.
Each crate has a single file, and that file contains one entry (one line of
JSON) for each version.

Let's take a look at the `drg` crate:
```console
$ cat 3/d/drg
{"name":"drg","vers":"0.1.0","deps":[],"cksum":"c6bfa8b0b1bcd485d5f783e77faf13ba9453e7ab78991936e50d6cfdca23d647","features":{},"yanked":true}
{"name":"drg","vers":"0.2.1","deps":[{"name":"anyhow","req":"^1.0","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"chrono","req":"^0.4","features":["serde"],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"clap","req":"^2.33.3","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"oauth2","req":"^3.0","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"qstring","req":"^0.7.2","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"reqwest","req":"^0.11","features":["blocking","json"],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"serde","req":"^1.0","features":["derive"],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"serde_json","req":"^1.0","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"strum","req":"^0.20","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"strum_macros","req":"^0.20","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"tempfile","req":"^3.2.0","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"tiny_http","req":"^0.8.0","features":[],"optional":false,"default_features":true,"target":null,"kind":"normal"},{"name":"url","req":"^2.2.1","features":["serde"],"optional":false,"default_features":true,"target":null,"kind":"normal"}],"cksum":"cfb067bfabd64c3b4732a3afd2b9a757a88120f6dac6400eae5b865732be0404","features":{},"yanked":false}
...
```
Notice that there are three directories named `1`, `2`, and `3`. These are for
crates that have one, two, or three characters in their name. Under `3` there
is one more level named after the first character of the crate name, which is
why `drg` above is found at `3/d/drg`.

For crates with longer names, the first directory matches the first two
characters of the crate name, and the subdirectory under it matches the
following two characters.
For example, if we want to find the `drogue-device` crate, we would look in
`dr` as the first directory, and then in `og` as the subdirectory:
```console
$ cat ./dr/og/drogue-device | jq
{
  "name": "drogue-device",
  "vers": "0.0.0",
  "deps": [],
  "cksum": "2acc1a9827b5cd933ebef9824415789012f5202b6bcacddaae2c214486ac996a",
  "features": {},
  "yanked": false
}
```
When a new version of this crate is released, a new entry (line) is added to
this file.
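
The directory layout described above can be sketched as a small function that
maps a crate name to its path inside the index. This is only an illustration:
the function name `index_path` is made up for this post, and it assumes ASCII
crate names and that names are lowercased in the path:

```rust
/// Returns the path of a crate's file inside the index, relative to the
/// root of the index repository.
///
/// Assumptions: the name is ASCII (so byte slicing is safe) and the path
/// uses the lowercased name. Returns `None` for empty or non-ASCII names.
fn index_path(crate_name: &str) -> Option<String> {
    if crate_name.is_empty() || !crate_name.is_ascii() {
        return None;
    }
    let name = crate_name.to_ascii_lowercase();
    let path = match name.len() {
        // One and two character names live directly under `1` and `2`.
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        // Three character names get one extra level: the first character.
        3 => format!("3/{}/{}", &name[..1], name),
        // Longer names use the first two and the following two characters.
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}

fn main() {
    for name in ["drg", "drogue-device"] {
        match index_path(name) {
            Some(path) => println!("{} -> {}", name, path),
            None => println!("{} -> (not a valid name for this sketch)", name),
        }
    }
}
```

For the two crates used in this post this gives `3/d/drg` and
`dr/og/drogue-device`, which are the paths passed to `cat` above.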

Updates to the index are fairly cheap: they amount to a normal `git fetch`
followed by a fast-forward.

Alright, so we now have an efficient way to look up a crate version and its
dependencies, but we haven't seen any crates yet. This is where the file
`config.json` comes into play:
```console
$ cat config.json
{
  "dl": "https://crates.io/api/v1/crates",
  "api": "https://crates.io"
}
```
`dl` stands for `download` and is the URL that can be used to download a
specific crate to the `.cargo/registry/cache` directory.

We can do this manually using the value of `dl`:
```console
$ curl -v -L https://crates.io/api/v1/crates/drg/0.1.0/download --output drg-0.1.0.crate
```
And we should then be able to list the content of this crate:
```console
$ tar tvf drg-0.1.0.crate
-rw-r--r-- 0/0 74 2021-03-18 15:57 drg-0.1.0/.cargo_vcs_info.json
-rw-r--r-- 110147/110147 8 2021-03-18 15:55 drg-0.1.0/.gitignore
-rw-r--r-- 0/0 134 2021-03-18 15:57 drg-0.1.0/Cargo.lock
-rw-r--r-- 0/0 754 2021-03-18 15:57 drg-0.1.0/Cargo.toml
-rw-r--r-- 110147/110147 327 2021-03-18 15:56 drg-0.1.0/Cargo.toml.orig
-rw-r--r-- 110147/110147 45 2021-03-18 15:55 drg-0.1.0/src/main.rs
```
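
The URL used with `curl` above follows a simple pattern, which can be sketched
as below. The function name `download_url` is made up for this post, and the
sketch assumes that the `dl` value contains no template markers, which is the
case for the `config.json` shown above. In that case the crate name, the
version, and `download` are appended as path segments:

```rust
/// Builds the download URL for a crate version from the `dl` value in
/// the index's `config.json`.
///
/// Assumption: `dl` is a plain base URL without template markers and
/// without a trailing slash, as in the crates.io config shown above.
fn download_url(dl: &str, name: &str, version: &str) -> String {
    format!("{}/{}/{}/download", dl, name, version)
}

fn main() {
    let dl = "https://crates.io/api/v1/crates";
    println!("{}", download_url(dl, "drg", "0.1.0"));
}
```

Running this prints the same URL that was passed to `curl` above.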
Cargo downloads crates to the `.cargo/registry/cache` directory, which only
contains the downloaded crates, that is the `.crate` compressed tar files.
These never change for a given version so they don't have to be downloaded
again.

The `src` directory is where the downloaded crates in the cache directory are
unpacked:
```console
$ ls ~/.cargo/registry/src/
github.com-1ecc6299db9ec823
```

The hash appended is a hash of the identifier of the crate registry, in this
case `crates.io`. To verify this there is a
[command line tool](https://github.com/trustification/source-distributed#print-cargo-index-hash)
that can be used:
```console
$ cargo r --quiet --bin index-dir-hash
crates-io: 1ecc6299db9ec823
```
The printed hash matches the suffix of the directory name above.
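
Putting the pieces together, the locations of a downloaded `.crate` file and of
its unpacked sources can be sketched as follows. The function names and the
`/home/user/.cargo` path are made up for this post. The sketch also assumes that
`cache` has one subdirectory per registry, just like `index` and `src`, and that
entries are named `<name>-<version>`, matching the `drg-0.1.0/` prefix in the
`tar` listing above:

```rust
use std::path::{Path, PathBuf};

/// Path of the downloaded `.crate` file for a crate version.
/// Assumption: `cache` contains one subdirectory per registry.
fn crate_file_path(cargo_home: &Path, registry_dir: &str, name: &str, version: &str) -> PathBuf {
    cargo_home
        .join("registry")
        .join("cache")
        .join(registry_dir)
        .join(format!("{}-{}.crate", name, version))
}

/// Path of the directory where the same crate version is unpacked.
fn unpacked_dir_path(cargo_home: &Path, registry_dir: &str, name: &str, version: &str) -> PathBuf {
    cargo_home
        .join("registry")
        .join("src")
        .join(registry_dir)
        .join(format!("{}-{}", name, version))
}

fn main() {
    // Hypothetical home directory, used only for illustration.
    let cargo_home = Path::new("/home/user/.cargo");
    let registry_dir = "github.com-1ecc6299db9ec823";
    println!("{}", crate_file_path(cargo_home, registry_dir, "drg", "0.1.0").display());
    println!("{}", unpacked_dir_path(cargo_home, registry_dir, "drg", "0.1.0").display());
}
```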

Hopefully this post clarifies what some of the directories under the `.cargo`
directory are used for.
