Container special type #340

Open
KOLANICH opened this issue Feb 2, 2018 · 3 comments

Comments

@KOLANICH

KOLANICH commented Feb 2, 2018

Some formats are containers. The simplest example is an archive: it contains files of any type. We may want to indicate that a chunk of memory holds data in some arbitrary format which should itself be parsed. This could be useful for applications such as a Kaitai-powered binwalk that parses everything in the file.

So I propose adding a special built-in type whose instantiation passes control to an algorithm that tries to match the blob against all the signatures (see #225) in the library. If a signature matches, it tries to parse the blob; if the parse succeeds (including passing all the checks, #81), it assumes the format has been guessed correctly.
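The match-then-parse-then-accept flow described above can be sketched roughly as follows. This is only an illustration in plain Python, not Kaitai Struct runtime code: `Candidate`, `ValidationError`, `autodetect` and the toy `parse_png` are hypothetical names, and the sketch assumes that every signature sits at offset 0 and that a parser signals failed checks by raising an exception.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ValidationError(Exception):
    """Raised by a (hypothetical) parser when the data fails its checks."""


@dataclass(frozen=True)
class Candidate:
    name: str                          # format name, e.g. "png"
    signature: bytes                   # magic bytes expected at offset 0
    parse: Callable[[bytes], object]   # raises ValidationError on bad data


def autodetect(blob: bytes, candidates: Sequence[Candidate]):
    """Return (format name, parsed object), or (None, blob) if nothing fits."""
    for cand in candidates:
        if not blob.startswith(cand.signature):
            continue                   # signature does not match
        try:
            return cand.name, cand.parse(blob)
        except ValidationError:
            continue                   # signature matched, but checks failed
    return None, blob                  # unknown format: keep it as a raw blob


def parse_png(data: bytes) -> dict:
    # Toy stand-in for a parser that would be generated from a .ksy spec.
    if len(data) < 16:
        raise ValidationError("too short to hold an IHDR chunk header")
    return {"ihdr_len": int.from_bytes(data[8:12], "big")}


CANDIDATES = [Candidate("png", b"\x89PNG\r\n\x1a\n", parse_png)]
```

With this sketch, a blob whose signature matches but whose checks fail falls through to the next candidate and ends up as an unparsed blob, which mirrors the "containers are just blobs" fallback.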

Obviously it will require RTTI in C++.

I don't know a good name for this kind of type. Maybe container or signature_matcher, but I don't really like either of them.

@GreyCat
Member

GreyCat commented Feb 2, 2018

An interesting idea! My proposals for type name would be type: autodetect or type: auto.

A practical implementation is probably pretty far away, though. Not only are "signature only" checks and a general format validation framework needed, but you'll also need to build and maintain some sort of repository of "autodetectable file formats", I guess, and include them all in any parser that uses this feature. Also, there should be some way to actually disable this "deep auto-detect" parsing, as the vast majority of people who research container formats are not interested in parsing the JPEG/PNG/MP3/whatever files inside; they're perfectly ok with exporting them as is (and using them later with standalone software). Last but not least, it's probably worth implementing full lazy parsing first: #133.

@KOLANICH
Author

KOLANICH commented Feb 3, 2018

The repository of autodetectable formats is kaitai_struct_formats (or any other of the user's choice). If this feature is enabled, the compiler scans the repository, finds all the signatures, builds a mapping signature->ksy file, generates a finite automaton for matching signatures and emits the code into a separate file. This behavior should be disabled by default, and of course there should be a hook to replace it with something that scans a directory for modules at runtime. If this feature is disabled, the containers are just blobs.
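To make the signature->ksy mapping and the matching automaton a bit more concrete, here is a minimal sketch that uses a byte trie as the simplest kind of finite automaton. It is not what the compiler does or would emit. `TrieNode`, `build_trie`, `match` and the `SIGNATURES` table are hypothetical, the mapping is written by hand rather than collected from a repository, and all signatures are assumed to start at offset 0.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.spec: Optional[str] = None   # .ksy path if a signature ends here


def build_trie(signatures: Dict[bytes, str]) -> TrieNode:
    """Build a byte trie from a signature -> .ksy file mapping."""
    root = TrieNode()
    for magic, spec in signatures.items():
        node = root
        for byte in magic:
            node = node.children.setdefault(byte, TrieNode())
        node.spec = spec
    return root


def match(root: TrieNode, blob: bytes) -> Optional[str]:
    """Return the spec of the longest signature that is a prefix of blob."""
    node, best = root, None
    for byte in blob:
        node = node.children.get(byte)
        if node is None:
            break                         # no signature continues this way
        if node.spec is not None:
            best = node.spec              # remember the longest match so far
    return best


# Illustrative, hand-written mapping (an assumption, not scanned from a repo).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png.ksy",
    b"PK\x03\x04": "archive/zip.ksy",
    b"GIF87a": "image/gif.ksy",
    b"GIF89a": "image/gif.ksy",
}
TRIE = build_trie(SIGNATURES)
```

For example, `match(TRIE, b"PK\x03\x04...")` selects the ZIP spec, while a blob matching no signature yields `None` and would stay a plain blob. A runtime hook that scans a directory could rebuild the same table and call `build_trie` again.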

To disable autodetection of some formats, we can use #339.

I have thought about auto ... but it is a keyword in C++ (I know about the difference in case) and IMHO is too short.

@GreyCat
Member

GreyCat commented Feb 3, 2018

but it is a keyword in C++

So, that's perfect :) We don't need to generate anything named "auto" for that.

and IMHO is too short.

That's totally ok too :) Keywords and predefined types shouldn't be too long.
