Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checksumming, other constraints and asserts validation and exceptions #81

Open
KOLANICH opened this issue Jan 13, 2017 · 14 comments
Open

Comments

@KOLANICH
Copy link

KOLANICH commented Jan 13, 2017

http://download.intel.com/support/motherboards/desktop/sb/pnpbiosspecificationv10a.pdf

Checksum - Each Expansion Header is checksummed individually. This allows the software that wishes
to make use of an expansion header the ability to determine if the expansion header is valid. The system software can determine if the expansion header is valid by performing a Checksum operation. The method for validating the checksum is to add up Length bytes, including the Checksum field, into an 8-bit value. A resulting sum of zero indicates a valid checksum operation.

So we need a way to verify checksums.

the proposed way is if: checksum_checksum_type(....) == expected_value
where checksum_checksum_type calculates checksum, in the case from quote it sums every byte in the structure modulo 256 and we check if: checksum_sum() == 0.

@GreyCat
Copy link
Member

GreyCat commented Jan 14, 2017

Calculating checksums is a fairly complex issue, and, to be fair, it is not needed for the primary task that KS performs, i.e. parsing. Calculating checksums, of course, would be most useful for #27.

As far as I see it, there are several subproblems with this one:

  • We need to be able to identify addresses / ranges for multiple-bytes operations; this should be some sort of very flexible syntax that would allow to quickly address "whole structure", "whole structure minus current (checksum) field", "structure X, bytes ranges [r1..r2] and [r3..r4]", etc.
  • We need to specify checksum operation itself:
    • For "standard" algorithms (CRC, various hashes), it's better to just have some sort of dictionary that will result in accessing native (quick) target language methods => but we need to gather information on which standard algorithms are implemented in which languages' stdlibs and how to access them
    • For non-standard algorithms (like Java's hashCode), we need to implement array processing functions (something like map, each, etc) in expression language
  • As for parsing support, we need to be able to trigger some sort of exception or warning when something wrong is encountered

@KOLANICH note that if is already used for conditional parsing. More likely we'd use something like check or guard or something like that.

@KOLANICH
Copy link
Author

KOLANICH commented Jan 15, 2017

@KOLANICH note that if is already used for conditional parsing. More likely we'd use something like check or guard or something like that.

how about throw?

throw: #```throw``` block is proposed to be an ```instances``` block bound to the generated exception object but with one-time (at the time of creation) access to all the local variables.
  if: checksum_checksum_type(....) != expected_value
  result: # the member name in exception object
    type: u1 # the type of member
    enum: error # the values are from the ```error``` enum. 
    value: 'error::it_happened_again' # the value set to a member. Essentially it is the same that a ```value``` in instances, but we should be able to use visible variables here.
........
enums:
  error:
    1: it_happenned_again
    2: that_happenned_again

@ghost
Copy link

ghost commented Jan 31, 2017

My issue is that I have a function for checksumming, offered by the protocol developers. It expects the pointer to the parsed structure and its size in bytes. Then it does some operations on it and returns the checksum. Something like:

uint8_t CalcCRC(uint8_t *buf, uint8_t PktSize)
{
    uint8_t bytes_amount = PktSize - 1;
    do
    {
        //here it does something with the data bytewise 
        bytes_amount--;
    } while (bytes_amount);
    return *crc;
}

All I need to do is to give the CalcCRC the pointer to the structure, but since kaitai structure contains vectors and strings, the function does not receive the initial package that way.

What I do now is I read the stream into some dummy array and give the CalcCRC pointer to it. If the sum is okay, I seek PktSize bytes back and read the data again, but into kaitai structure that time.

I thought maybe there could be a more elegant solution.

@GreyCat
Copy link
Member

GreyCat commented Jan 31, 2017

@MariaChakchurina I believe, you can do something like that:

my_struct_t my_struct(&ks);

my_struct._io()->seek(0);
std::string buf = my_struct._io()->read_bytes_full();
uint8_t crc = CalcCRC(&buf[0], buf.size());
if (crc != my_struct.crc_field()) {
    // throw some exception or something
}

It's not a huge difference, but at least you don't have to derive CRC field from your structure manually.

@ghost
Copy link

ghost commented Feb 2, 2017

I have multiple datagrams in a stream so I can't move the pointer to its beginning, but want to return data_size bytes back, so I better do:

    d._io()->seek(d._io()->pos() - data_size);
    std::string buf = d._io()->read_bytes(data_size);
    uint8_t crc = CalcCRC((uint8_t*)&buf[0], data_size);

Just letting you know;)

@GreyCat
Copy link
Member

GreyCat commented Feb 2, 2017

Can't you do one substream per datagram then? It should be much more convenient.

@ghost
Copy link

ghost commented Feb 3, 2017

I found #33 and #44, but still don't understand how I can do that. I really need a kaitai wiki.

@GreyCat
Copy link
Member

GreyCat commented Feb 4, 2017

Um, #33 seems to have nothing to do with this topic, and #44 is planned future, which will optimize things a little, but you can still use substreams right now.

Generally, when you do:

seq:
  - id: packet
    type: some_datagram_type

you don't create a substream, i.e. stream for the some_datagram_type will be reused, and it won't start reading from 0.

However, if you'll do something like that:

seq:
  - id: packet
    type: some_datagram_type
    size: 100

=> it will create a substream in some_datagram_type. It will start from 0, and is guaranteed to have exactly 100 bytes in it (or whatever you'll specify). If you'll access _io for some_datagram_type object, you'll get access only to these 100 bytes, starting from 0.

@ghost
Copy link

ghost commented Feb 6, 2017

Excuse me, by 33 I meant that one https://github.com/kaitai-io/kaitai_struct_compiler/issues/33. Thank you, now I got it.

@GreyCat
Copy link
Member

GreyCat commented Jul 16, 2017

@KOLANICH Just for the record, I don't understand most of this proposal:

how about throw?

throw:
  if: checksum_checksum_type(....) != expected_value
  result: 
    type: u1
    enum: error
    value: it_happened_again

What is type? What is enum? That is value? If it throws an exception, then, by definition, it can't set an attribute to anything, it just stops execution there and unwinds the stack.

@KOLANICH
Copy link
Author

KOLANICH commented Jul 17, 2017 via email

@koczkatamas
Copy link
Member

koczkatamas commented Jul 17, 2017

You cannot throw arbitrary objects in several supported languages, so with this approach we should create a distinct exception class (according to the result type) on-the-fly which would make the parameters obsolate (as the class would identify the exception by itself).

What came into my mind regarding exceptions:

  1. we could store this (as reference) into the exception and/or parse every object as in debug mode: even in the case of an exception, the object would still be accessible (so postprocessing would be still possible even after an exception)

  2. we should make exceptions distinct, so the receiver can identify exactly which (ksy / compiled code) line/field/property/etc caused the exception (if he/she wants):

    • use string identifier, it's advantage is that it's easier to implement, but we have to make class specific and distinct inside the class (so if the exception name is invalid_magic in the PNG class then we have to create a string png.invalid_magic for example). Of course the "PNG" name can conflict so it would be better to use concrete namespace suggested in Ksy modularization / external ksy inclusion (discussion) #71 like io.kaitai.image.png.invalid_magic.
    • use enum: in this way every distinct exception would have a distinct enum value, but if we don't want put these enums into the exception object as a generic type (eg. object), then we should create an exception class per normal class. This has the advantage that these exceptions can be caught class-specifically at a higher level and the enum value can be checked without casting. This way the this reference could be specific (and not only object).
    • use parametric types suggested by Parametric types #192.

Of course even if we create specific exceptions (option 2 & 3) those should be inherited from a common exception class which is defined in the runtime. So every Kaitai exception could be caught at once if needed.

@KOLANICH
Copy link
Author

KOLANICH commented Jul 17, 2017 via email

@KOLANICH
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants