-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizes opt* functions #337
Conversation
Tested on a 2.1 Gb json file on an i7-620 with an ssd. This is in a real-life application. I would assume that micro-benchmarks would yield a gain of at least 20x. All in all, this is great, thanks! |
Is the 870s the original JSONObject implementation with no pre-checks, or the modified one you linked in the issue that uses the |
Sorry for the delay No changes origin/master: 870s |
* Add support to JSONArray and JSONObject to optionally get raw number values * Add support to JSONArray and JSONObject to optionally get float values
Thanks @TheMatthew . This PR should be feature complete now. I'll update the description to reflect all the changes. |
Tests updated and PR description updated to match testing changes. |
Nice, performance improved and code made more readable. Accepted pending 3 day timer. |
@johnjaylward |
@guidomedina it's easy to code review without erasing history. Please stop spaming |
1st time I have been told to "stop spamming" on Github in my 5 years here, I only made 3 comments, valid comments so not sure what you mean. It is easy to code review if using a diff between branches, maybe you are new to contributing but most projects I have contributed will ask you to squash the commits, it is a valid request, maybe not this project. |
The comments on squashing commits are not relevant to the pull request nor helpful to the review. This is not a discussion board for the merits or pitfalls of rebasing. |
JSONObject.java
Outdated
@@ -578,10 +581,10 @@ public double getDouble(String key) throws JSONException { | |||
Object object = this.get(key); | |||
try { | |||
return object instanceof Number ? ((Number) object).doubleValue() | |||
: Double.parseDouble((String) object); | |||
: new BigDecimal((String) object).doubleValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing a BigDecimal
to parse a double
will add too much overhead for no reason, if you try to represent a double
from a String
it will stay as is, if it was indeed a double
then nothing will be gained or lost, you are only adding more GC overhead.
Same thing for your other BigDecimal
introduction when working with float
or double
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Double.parse
will throw an exception if the number can not be represented as a double.
From the javadocs:
double java.lang.Double.parseDouble(String s) throws NumberFormatException
Returns a new double initialized to the value represented by the specified String, as performed by the valueOf method of class Double.
Parameters: s the string to be parsed.
Returns:the double value represented by the string argument.
Throws: NullPointerException - if the string is null
NumberFormatException - if the string does not contain a parsable double.
Since:1.2
See Also:java.lang.Double.valueOf(String)
Double.valueOf
seems to do something closer to what you are trying to accomplish, but it would be the same amount of garbage collection.
new BigDecimal(String)
will always parse the value as long as it's a valid decimal number (i.e. not "abc"). The doubleValue
call will then do a narrowing conversion which may change the value to -Inf or +Inf.
Lastly, this change would be out of scope for this PR. the get*
methods were not modified here, only the opt*. If you feel we should evaluate it further, feel free to open another pull request with the changes and notes to performance impacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new BigDecimal(...)
and Double.parseDouble(...)
have the same behavior, an input that fails for parseDouble
will also fail with new BigDecimal, the JavaDoc have different wording but the end result will be the same.
As for relevance, you are trying to make opt* operations faster and yet you are introducing an expensive and unneeded instantiation when the return value is a native (small d)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc is what to go by. If someone is running on a different VM with a different implementation, they may not function the same.
The BigDecimal conversion is not that expensive. I'd like to see some stats on it before making a change like that. We have no guarantees on the input. Just because someone asks for an int
does not mean I would want to use Integer.parse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not "introducing" the big decimal conversion. It was there before the refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then int
typecast would be wrong for my proposed shortcut but it can be easily mitigated by checking if the double value is between Integer.MIN_VALUE
and Integer.MAX_VALUE
and if it isn't throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's an opt* function. The entire point is that it does not throw an exception. I'm not disagreeing that the parsing performance can be improved. But I think it needs more than a cursory glance and wishful thinking.
This PR is focusing strictly on the request made in #337 which was to mitigate the Exception handling impact.
Any parsing impacts should be in a new PR after this one is merged.
Any narrowing coercion has the potential to change the number. This is a given and also why I added the optNumber call to the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Narrowing will not change the double for long, maybe for int, but then it is the same for BigDecimal calls, the reason I commented into this particular change is because of what I'm seeing:
- : Double.parseDouble((String) object);
+ : new BigDecimal((String) object).doubleValue();
Am I reading this incorrectly? I'm seeing Double.parseDouble(...)
changed to new BigDecimal(...)
in this commit.
Narrowing will cause loss always, but the same will happen with BigDecimal
hence my point of not bothering with such overhead at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check that. If I did change it, I'll likely change it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I must have changed that by accident when I was testing stuff and didn't notice. I'll revert that change back and update the respective optDouble and optFloat calls that I copied that to.
As for the narrowing, it is handled differently between BigDecimal and Double. Also, not all Long values fit in a double. see the test case changes @ stleary/JSON-Java-unit-test#71 that I updated today. A Double only has 53 bits for it's mantissa which will not let any value over 2^53 be represented correctly.
* Don't call toString when we know it's a string, just cast
* Updates optNumber to be smarter about which object it uses to parse strings
@stleary I added the optFloat/optNumber, but no associated getFloat/getNumber. Should I add them in for posterity? |
@johnjaylward Sure, in general when an opt* method is added, a corresponding get*method is a good idea. |
* Extracts the stringToNumber logic that the optNumber method uses to reuse it between classes * Fixes -0 issue with optNumber/getNumber
@TheMatthew can you retest the latest commit of this? I want to ensure the new pull out of the number parsing functions didn't adversely affect the performance of your test case. |
return val.indexOf('.') > -1 || val.indexOf('e') > -1 | ||
|| val.indexOf('E') > -1 || "-0".equals(val); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice "quick and dirty" function, what happens if running on a server with a different locale?
Anyway the following example can help:
static final char MINUS_SIGN;
static final char DECIMAL_SEPARATOR;
static {
final DecimalFormatSymbols currentLocaleSymbols = DecimalFormatSymbols.getInstance();
MINUS_SIGN = currentLocaleSymbols.getMinusSign();
DECIMAL_SEPARATOR = currentLocaleSymbols.getDecimalSeparator();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I'm not sure what should be done if the decimal separator for the server locale is a comma, not sure how the parse function will behave, might not be worth the effort in supporting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not looking for any number, we are looking for JSON numbers, which are not locale dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found it, ignore this completely, FloatingDecimal
line 198 & 224:
46 = dot
if(var4 != 46) {
break;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from BigDecimal:
} else if (c == '.') { // have dot
// have dot
if (dot) // two dots
throw new NumberFormatException();
dot = true;
Double.parse is similar.
* prevents erasure of stack trace for rethrown exceptions
Sorry, I was in a huge rush, I will retest. |
Before this patch AKA current git head: 870 s |
Performance still looks good. @johnjaylward let me know when you are ready to proceed with the pull request. |
Ok, I started profiling. the code in this commit looks moderately tight. Some low hanging fruits I saw: This brings the execution time down from 126 s to 115. Then if you have an array let's say called "CHARS_TO_APPEND" which has 256 values, Like this:
and replace the while loop in JSONTokener#nextValue() with
you reach 113s. Here is a quick profile I did. It can be opened with visualvm 1.3.9 . Hope this helps. edit: looking at the examples of acceleration really shows how close this is to optimal. at this point optimizing if statements yields a performance change. that is typically a sign that the low hanging fruits are almost all gone. |
For transparency purposes I've shared this github benchmark https://github.com/TheMatthew/json-benchmark it's under apache so it should be compatible for any copy-pasted tests Initial results: org.json.simple: 140s, |
@stleary I agree, this still looks good. Let's call this one done. I just pulled in the changes recently merged to master into this PR. @TheMatthew Lets take that optimization to a separate PR after this one is merged. |
Adjustments to tests for stleary/JSON-java#337
Adjustments to the opt* functions in reference to #334
Notable changes:
optFloat(key, default)
andoptFloat(key)
optNumber(key, default)
andoptNumber(key)
getNumber
andgetFloat
Test speedup over the old implementation is 870s -> 120s for a 2.1 GB file (about 14% of the time; or 86% time saved). See #337 (comment) and #337 (comment)
What problem does this code solve?
This is a performance enhancement, not a bug fix. Some code is refactored to prevent exceptions from being thrown. Type coercion is also standardized between all the opt* number functions.
Risks
low
Changes to the API?
New methods added for
optFloat
andoptNumber
.Will this require a new release?
No, can be rolled into the next release
Should the documentation be updated?
No
Does it break the unit tests?
Broke one unit tests for BigDecimal->BigInteger conversion due to updated coercion rules. Tests has been updated and new tests added for new methods
optFloat
andoptNumber
. See stleary/JSON-Java-unit-test#71 for full changes.Was any code refactored in this commit?
Yes
Review status
ACCEPTED, Starting 3 day timer for comments