Allow file name substitutions when generating symcache files #422
Comments
Hmm, I just saw the following remark in issue 330:
So maybe we need an auxiliary data file anyway, to store the hg revision and the sha-512 hashes.
I think a callback solution would be the most flexible here, and also completely outside symbolic.
+1 on the callback, though getting that across the Python API will be some extra fun; it's probably worth doing. As for whether this API should exist if you consider symcaches to be purely cached info: that's an interesting question. I think there's no problem adding it anyway. It's up to you to regenerate the same symcache if you're regenerating it, and given the purpose of the symcache, it makes sense that it has all the info ready to use; needing a post-processing step there would not be great. That's also my view for #354: I see no problem with including the PE name in the symcache. With a callback, you'd still need some means to store the transformations for that specific DWARF/PDB file if you want to be able to regenerate the symcache, I guess, so it doesn't make the auxiliary file entirely obsolete for you if you need reproducibility.
Thanks for the feedback! So then the next step would be to decide on an API. I'd need help with that. For example, should the substitution callback only be specified when generating a symcache, or should the callback be a property of the Object, so that the substitution is also applied when function info is queried from the object without going through a symcache? (And then generating the symcache would probably respect the substitution automatically.)
So there is precedent for doing it on the object for MachO BCSymbolMap files: https://docs.rs/symbolic/8.3.0/symbolic/debuginfo/macho/struct.MachObject.html#method.load_symbolmap. I think the same reasoning applies here and doing it on the object level is probably preferred. I guess this one applies to all objects though, and not just to one kind.
As it happens, Firefox PDB files already include this information in the Given that you wrote a parser for this stream I suspect you know this already. :) Since the format is based around variable definitions and substitution, it seems like it would be straightforward to also define a variable that you could evaluate to get a URL to link directly to a specific line in an HTML view of a source file. In the past I had pondered the idea of generating functionally-equivalent data for Linux/macOS binaries and inserting it into the debug info as an additional section, but never quite got around to it. For a simple proof of concept you could just generate debuginfod aims to provide similar functionality, along with functionality that overlaps Microsoft-style symbol servers, but AFAICT it provides source file contents wholesale, not a canonical URL from which you could fetch the source. This is probably due to Linux distros' habit of patching upstreams and not keeping the sources they actually build packages from in source control anywhere. Symbolic has some support for debuginfod, but not support for its source file API.
Do note that I invented the generated-sources scheme out of whole cloth, so I don't think it's worthwhile to bake support for it directly into |
The canonical URL is |
True! There's a small but surmountable problem here, though, and that's the fact that the Firefox PDBs don't contain mappings for generated source files in the srcsrv stream. The stream only has entries for hg.mozilla.org files, not for files stored in S3.
I suppose! But the raw source URL would be very much sufficient for my purposes. We can still detect known URLs later on in the pipeline, and map them to known "web viewer" URLs.
It actually doesn't sound terrible. It has a bit of an ad-hoc home-grown feel to it, so we'd need to document it in an easily accessible place.
Yeah, that sounds mostly reasonable. It does mean that the file list needs to be enumerated twice: once when creating this mapping, and then another time when generating a symcache from the augmented debug info file. The "callback" proposal further up in this issue would only have one traversal. But that's probably not a big deal anyway.
You'll never guess who punted on implementing that, choosing to file a bug so someone could fix it later: |
I would like to request the capability to specify a file path substitution map when calling Object.make_symcache.
Feature motivation
At Mozilla, our symbol pipeline for the in-progress Tecken symbolication service rewrite is currently as follows:
Unfortunately, the .sym file discards inline stack information, so we'd like to switch to the following instead:
(and we'd also keep generating a .sym file, because we still need it for crash report stack unwinding)
However, there's one crucial bit in the .sym file generation pipeline that this direct approach would lose: filename substitution. Here are some examples of the file substitution we do during .sym file generation:
This substitution effectively creates permalinks for all file paths. It allows us to look up the exact file contents long after the symbol file was processed.
It would be great if there was a way to create a symcache with these substitutions applied.
Specifics
Our current .sym file generation is controlled by a Python script, so it would be best if we could use the symbolic Python API for this. For example, make_symcache on Object seems like it would need to accept some kind of substitution map. Something like the following could work:
(This is based on a similar capability in mozilla/dump_syms which is currently unused.)
The trickiest part here is the {DIGEST} part: The digest is the sha-512 hash of the file contents. So, for any paths that match a substitution rule that uses {DIGEST}, we would need to read the file at that path from disk during symcache generation.
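To make the {DIGEST} requirement concrete, here is a minimal sketch of how a regex-based substitution map could be evaluated. It is not the symbolic API and not Mozilla's actual rule set: the RULES patterns, the URL shapes, the {REV} placeholder, and the function names substitute_path and sha512_of_file are all assumptions made up for illustration. Only the idea that {DIGEST} is the sha-512 hash of the file contents comes from the text above.

```python
import hashlib
import re

# Hypothetical rules: (regex pattern, replacement template).
# The paths and URL shapes are invented for illustration only.
RULES = [
    (r"^/builds/worker/checkouts/gecko/(?P<path>.*)$",
     "hg:example.org/repo:{path}:{REV}"),
    (r"^/builds/worker/workspace/obj-build/(?P<path>.*)$",
     "s3:example-bucket:{DIGEST}/{path}"),
]


def sha512_of_file(path):
    """Return the hex sha-512 digest of the file at `path`, read in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def substitute_path(path, rules, rev):
    """Apply the first matching rule to `path`; return `path` unchanged if none match."""
    for pattern, template in rules:
        m = re.match(pattern, path)
        if m is None:
            continue
        fields = dict(m.groupdict(), REV=rev)
        # Only touch the disk when the rule actually asks for a digest.
        if "{DIGEST}" in template:
            fields["DIGEST"] = sha512_of_file(path)
        return template.format(**fields)
    return path
```

In this sketch, a rule that uses {DIGEST} forces a file read at substitution time, which is why the source files would have to be present on disk while the symcache is being generated.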
Alternatives
These questions explore whether the substitution really needs to happen during symcache generation.
Could the substitution be applied at a different time? Maybe before or after symcache generation?
I think it's not practical to do it before: You'd need to substitute inside the PDB file or inside the DWARF information of an ELF or Mach-O file.
Maybe it could be done after, if there was an API to transform an unsubstituted symcache into a substituted symcache.
Could you use an unsubstituted symcache and fix up the paths after each address lookup?
This might be possible if we have an extra artifact on the side, which contains the following information:
But it would be better if the symcache file was self-sufficient and we wouldn't need to keep track of an extra file along with it.
Couldn't the substitution map just be a callback instead?
Making the user supply a callback function instead of a regex map might be simpler in some ways. But it could be annoying to wire up through the Python API.
However, it would have the following advantages:
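As a rough illustration of the callback alternative, the sketch below applies a user-supplied function to each file path in a single traversal and caches the result per distinct path. Everything here is hypothetical: remap_paths, the (address, path, line) record shape, and the example callback are assumptions for illustration and do not reflect how symbolic represents line records or how a real callback would be wired through its Python bindings.

```python
def remap_paths(records, callback):
    """Rewrite the path in each (address, path, line) record using `callback`.

    Each distinct path is passed to `callback` only once, so an expensive
    callback (for example, one that hashes file contents) is not repeated.
    """
    cache = {}
    remapped = []
    for address, path, line in records:
        if path not in cache:
            cache[path] = callback(path)
        remapped.append((address, cache[path], line))
    return remapped


def example_callback(path):
    """Hypothetical callback: rewrite one made-up local prefix, pass others through."""
    prefix = "/builds/worker/checkouts/gecko/"
    if path.startswith(prefix):
        return "hg:example.org/repo:" + path[len(prefix):]
    return path
```

Because the callback is ordinary user code, the caller decides how paths are matched and whether any files are read, and none of that logic has to live inside the library.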