Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Rational Number type #164

Open
robUx4 opened this issue Aug 9, 2017 · 23 comments
Open

Add a Rational Number type #164

robUx4 opened this issue Aug 9, 2017 · 23 comments
Labels
format addition To consider in future EBML versions
Milestone

Comments

@robUx4
Copy link
Contributor

robUx4 commented Aug 9, 2017

Since some Matroska elements would be better as rational numbers we need a way to store them. I think a type would be better than using two EBML Signed Integers. The bits in the VDATA would be divided in half. The first part (signed) for the numerator, the second part (signed?) for the denominator.

@robUx4 robUx4 added the format addition To consider in future EBML versions label Aug 9, 2017
@retokromer
Copy link
Contributor

Caveat: I didn’t check where this would be useful.

Generally speaking, in my personal experience, it’s however easier to implement the rational number type by two regular integers.

@dericed
Copy link
Contributor

dericed commented Aug 9, 2017

How do you understand the boundary between the signed integers? Advantage to store num and den as a single value?

@robUx4
Copy link
Contributor Author

robUx4 commented Aug 19, 2017

  • It might use less space, although it depends on the length of the ID and the value itself.
  • when reading the value the parser doesn't have to keep the first value in a state before using the next value
  • if a rational element can be found more than one (a list of values) it's not possible to do it with 2 elements, or you need to wrap them in a master element

@robUx4
Copy link
Contributor Author

robUx4 commented Oct 11, 2020

  • you can't have one value and not the other. You have both or none.

@hubblec4
Copy link

Is the rational type only for unsigned numbers or do we need a signed rational type too?

@robUx4
Copy link
Contributor Author

robUx4 commented Mar 28, 2021

The first part (signed) for the numerator, the second part (signed?) for the denominator.

Signed (which includes unsigned apart from very large values)

@mbunkus
Copy link
Contributor

mbunkus commented Mar 28, 2021

If we really need such a field[1], I propose the following:

  • Starts with the regular EBML ID & element size fields; let's call this size field element_data_size (in bytes)
  • Following is a second size field encoded the same way the element size field is encoded; let's call this field numerator_size (in bytes) and the size of this field itself numerator_size_size (in bytes)
  • Following is an unsigned integer number of length numerator_size bytes; this is the numerator
  • Following is a signed integer number of length element_data_size - numerator_size_size - numerator_size; this is the denominator

Restrictions:

  • the value of the denominator MUST NOT be 0
  • the value of numerator_size_size + numerator_size MUST be less than element_data_size

Rationale:

  • Why make the denominator signed, not the numerator? Wild guess here: the denominator will stay constant (or almost constant) for most well-known applications such as video timestamps (e.g. the denominator staying at exactly 30 * 1.000.000.000/1001 or something like that to represent 29.975 frames per second). The numerator usually grows, so let's give it that one bit more to grow into.
  • If we need a rational type, using a packed structure does indeed save a lot of space. The only other possible method of implementation would be a master element with two integer children, one signed, one unsigned. That's quite a lot of useless overhead: two more IDs (at least two bytes), one useless size field (the one of the master element). And it introduces a number of corner cases such as what if an element is missing? How to specify default values for nested elements? Etc.

[1] Not sure if we really need this data type given the smaple-accurate calculation method you specified the other day.

@robUx4
Copy link
Contributor Author

robUx4 commented Apr 4, 2021

I'd rather have [num_size][num_value][den_size][den_value]. In the case the rationale value is equal to an integer we just use [num_size][num_value] and assume the denominator is 1.

We do need a way to specify the default attribute values, like we have a format for floats. I suppose a string with "numerator/denominator" would work. Maybe make the denominator optional if we allow it, in which case such a default value would be "numerator".

@mbunkus
Copy link
Contributor

mbunkus commented Apr 4, 2021

We always have the size field for the full element. There's really no need for three size fields as we can always calculate the third size from the other two. It would be a waste of space, a significant one if we used a rational element in something like a hypothetical BlockV3 structure.

I agree about default values. Leaving out /denominator simply means that denominator is 1.

@robUx4
Copy link
Contributor Author

robUx4 commented Apr 4, 2021

OK my bad, I thought what you proposed was [num_size][den_size][num_value][den_value].

I agree with [num_size][num_value][den_value], the [den_value] part being optional.

@mbunkus
Copy link
Contributor

mbunkus commented Apr 4, 2021

Great!

@hubblec4
Copy link

hubblec4 commented Apr 4, 2021

I'm also agree with what @mbunkus proposed.
But one thing I would change: that is the size-field for the numerator [num_size].
This size-field could be a "normal" size integer not a VINT, which makes parsing faster.

Reason:
The size of the numerator(and also the size of denominator) is limited to 8, that means the length of the size-field is always 1.

[EBML-ID] [element size-field as VINT] [numerator: "normal" size-field as unsigend integer] [numerator value] [denominator value]

@robUx4
Copy link
Contributor Author

robUx4 commented Apr 5, 2021

Yes, given the limitation of the width of the numerator a regular unsigned 1-octet integer is easier for everyone.

@robUx4
Copy link
Contributor Author

robUx4 commented Mar 25, 2023

BTW we will probably need signed rational values. One way to encode the negative value (we should only allow one of the two values) could be to use the same encoding as the EBML-lacing in Matroska, see ietf-wg-cellar/matroska-specification#726 (comment)

@mbunkus
Copy link
Contributor

mbunkus commented Mar 25, 2023

In my original proposal the denominator is a signed integer already. Not sure what you're missing there.

Taking @hubblec4's suggestion into account, I revise my original proposal to:

  • Starts with the regular EBML ID & element size fields; let's call this size field element_data_size (in bytes)
  • Following is a one-byte long second size field, an unsigned integer specifying the length of the following numerator field; let's call this field numerator_size (in bytes)
  • Following is a numerator_size bytes long unsigned integer number in big-endian byte order; this is the numerator
  • If numerator_size is less than element_data_size: following is a element_data_size - numerator_size bytes long signed integer number; this is the denominator. The signed integer is stored in two's complement notation with the leftmost bit being the sign bit. If numerator_size equals element_data_size, there is no denominator field and the denominator's value is 1.

Restrictions:

  • the value of the denominator MUST NOT be 0
  • the value of numerator_size MUST be less than or equal to element_data_size

Rationale:

  • Why make the denominator signed, not the numerator? Wild guess here: the denominator will stay constant (or almost constant) for most well-known applications such as video timestamps (e.g. the denominator staying at exactly 30 * 1.000.000.000/1001 or something like that to represent 29.975 frames per second). The numerator usually grows, so let's give it that one bit more to grow into.
  • As a rational type might be used in each block, having a small encoding is a big plus — hence a packed structure instead of an EBML Master element with two children, one signed, one unsigned. That's quite a lot of useless overhead: two more IDs (at least two bytes), one useless size field (the one of the master element). And it introduces a number of corner cases such as what if an element is missing? How to specify default values for nested elements? Etc.

@retokromer
Copy link
Contributor

In my original proposal the denominator is a signed integer already.

@mbunkus Out of curiosity: Is there a reason why you chose the denominator rather than the numerator?

@mbunkus
Copy link
Contributor

mbunkus commented Mar 25, 2023

I actually explain my reasoning in the first bullet point of "Rationale".

@retokromer
Copy link
Contributor

@mbunkus Indeed, I saw it. I was asking, because I’m not convinced. I actually did the contrary (back in the 1980s and 1990s) when I implemented a few compilers. My reasoning, back then (!), was about rounding errors and processor cycles.

@mbunkus
Copy link
Contributor

mbunkus commented Mar 25, 2023

The rationals in media containers are somewhat unusual in that for most use cases their denominator remains constant.

I don't know whether having one more bit to play with in the numerator actually makes any noticeable difference wrt. file sizes. On the other hand, I'm even less convinced that making the numerator signed instead of the denominator has any measurable difference wrt. to processing time, given that a) reading an unsigned integer requires fewer instructions & b) that this is a media format — any potential miniscule gains while parsing the container are dwarfed by the time required for decoding the content anyway.

@JeromeMartinez
Copy link

I don't know whether having one more bit to play with in the numerator actually makes any noticeable difference wrt.

Still good to get IMO. The parser is in charge of rounding errors and actually could internally move the sign at the numerator if it wishes.

@retokromer
Copy link
Contributor

The rationals in media containers are somewhat unusual in that for most use cases their denominator remains constant.

That could be a reason, yes. Thank you!

@hubblec4
Copy link

hubblec4 commented Mar 26, 2023

a) reading an unsigned integer requires fewer instructions

For me is that the best argument that the numerator should be an unsigned integer.

And that's the reason why I want to speak about the third way for a signed rational number.
In mathematics it is possible/usual to set the sign before the rational number in the height of the fraction bar.
That means both numbers(numerator and denominator) are unsigned values.

Now we need only 1 bit to signal if a rational number is negative/signed or not.
My propose was(with @mbunkus words):

Following is a one-byte long second size field, an unsigned integer specifying the length of the following numerator field; let's call this field numerator_size (in bytes)

The numerator_size value has a range from 1 to 8.
That means only 3 Bits are used and we could use the first Bit for the sign option.
If the first Bit is set to 1, than is the rational number negative.

For example a rational number value: -"180/1"
The denominator has the default value "1" and can be omitted to keep this example simple.
The length of "180" is one byte. -> numerator_size in Bits 0-0-0-0-0-0-0-1
Now we set the sign Bit -> 1-0-0-0-0-0-0-1
It is more or less the same like a VINT with a length of one byte.

The word numerator_size have to changed because there are more information in this Byte now.
Maybe sign_and_numerator_size.

That was only some basis thoughts and I hope I'm not wasting your time.

@robUx4
Copy link
Contributor Author

robUx4 commented Mar 27, 2023

In my original proposal the denominator is a signed integer already. Not sure what you're missing there.

I did not check. I just added a reference to an existing (efficient) EBML-like encoding for signed values, so we don't forget about it.

Looking at your proposal it seems easier to use than this (no need code needed for those not supporting EBML lacing in Matroska).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format addition To consider in future EBML versions
Projects
None yet
Development

No branches or pull requests

6 participants