Optimizes opt* functions #337

johnjaylward · 2017-05-16T23:40:49Z

Adjustments to the opt* functions in reference to #334

Notable changes:

New API functions for optFloat(key, default) and optFloat(key)
New API functions for optNumber(key, default) and optNumber(key)
New API functions for getNumber and getFloat
Update all opt* functions that relied on the get* function throwing an exception to no longer rely on that.

Test speedup over the old implementation is 870s -> 120s for a 2.1 GB file (about 14% of the time; or 86% time saved). See #337 (comment) and #337 (comment)

What problem does this code solve?
This is a performance enhancement, not a bug fix. Some code is refactored to prevent exceptions from being thrown. Type coercion is also standardized between all the opt* number functions.

Risks
low

Changes to the API?
New methods added for optFloat and optNumber.

Will this require a new release?
No, can be rolled into the next release

Should the documentation be updated?
No

Does it break the unit tests?
Broke one unit tests for BigDecimal->BigInteger conversion due to updated coercion rules. Tests has been updated and new tests added for new methods optFloat and optNumber. See stleary/JSON-Java-unit-test#71 for full changes.

Was any code refactored in this commit?
Yes

Review status
ACCEPTED, Starting 3 day timer for comments

TheMatthew · 2017-05-17T14:17:10Z

Tested on a 2.1 Gb json file on an i7-620 with an ssd.
Before parsing with lots of extra logic: 870 s
After: 120 s

This is in a real-life application. I would assume that micro-benchmarks would yield a gain of at least 20x.

All in all, this is great, thanks!

johnjaylward · 2017-05-17T14:38:25Z

Is the 870s the original JSONObject implementation with no pre-checks, or the modified one you linked in the issue that uses the has(key) calls? I'd just like to verify what we are looking at.

TheMatthew · 2017-05-17T15:22:03Z

Sorry for the delay

No changes origin/master: 870s
has(key): 122s
your change: 120s

* Add support to JSONArray and JSONObject to optionally get raw number values * Add support to JSONArray and JSONObject to optionally get float values

johnjaylward · 2017-05-17T15:35:49Z

Thanks @TheMatthew . This PR should be feature complete now. I'll update the description to reflect all the changes.

johnjaylward · 2017-05-17T16:37:33Z

Tests updated and PR description updated to match testing changes.

stleary · 2017-05-18T05:01:06Z

Nice, performance improved and code made more readable. Accepted pending 3 day timer.

guidomedina · 2017-05-18T11:41:16Z

@johnjaylward
Do you know how to squash your commits? will make it much easier now and in the future to code review.

johnjaylward · 2017-05-18T13:08:23Z

@guidomedina it's easy to code review without erasing history. Please stop spaming

guidomedina · 2017-05-18T14:01:34Z

1st time I have been told to "stop spamming" on Github in my 5 years here, I only made 3 comments, valid comments so not sure what you mean.

It is easy to code review if using a diff between branches, maybe you are new to contributing but most projects I have contributed will ask you to squash the commits, it is a valid request, maybe not this project.

johnjaylward · 2017-05-18T14:04:30Z

The comments on squashing commits are not relevant to the pull request nor helpful to the review. This is not a discussion board for the merits or pitfalls of rebasing.

guidomedina · 2017-05-18T14:35:01Z

JSONObject.java

@@ -578,10 +581,10 @@ public double getDouble(String key) throws JSONException {
        Object object = this.get(key);
        try {
            return object instanceof Number ? ((Number) object).doubleValue()
-                    : Double.parseDouble((String) object);
+                    : new BigDecimal((String) object).doubleValue();


Introducing a BigDecimal to parse a double will add too much overhead for no reason, if you try to represent a double from a String it will stay as is, if it was indeed a double then nothing will be gained or lost, you are only adding more GC overhead.

Same thing for your other BigDecimal introduction when working with float or double

No, Double.parse will throw an exception if the number can not be represented as a double.
From the javadocs:

double java.lang.Double.parseDouble(String s) throws NumberFormatException

Returns a new double initialized to the value represented by the specified String, as performed by the valueOf method of class Double.
Parameters: s the string to be parsed.
Returns:the double value represented by the string argument.
Throws: NullPointerException - if the string is null
NumberFormatException - if the string does not contain a parsable double.

Since:1.2
See Also:java.lang.Double.valueOf(String)

Double.valueOf seems to do something closer to what you are trying to accomplish, but it would be the same amount of garbage collection.

new BigDecimal(String) will always parse the value as long as it's a valid decimal number (i.e. not "abc"). The doubleValue call will then do a narrowing conversion which may change the value to -Inf or +Inf.

Lastly, this change would be out of scope for this PR. the get* methods were not modified here, only the opt*. If you feel we should evaluate it further, feel free to open another pull request with the changes and notes to performance impacts.

new BigDecimal(...) and Double.parseDouble(...) have the same behavior, an input that fails for parseDouble will also fail with new BigDecimal, the JavaDoc have different wording but the end result will be the same.

As for relevance, you are trying to make opt* operations faster and yet you are introducing an expensive and unneeded instantiation when the return value is a native (small d)

The JavaDoc is what to go by. If someone is running on a different VM with a different implementation, they may not function the same.

The BigDecimal conversion is not that expensive. I'd like to see some stats on it before making a change like that. We have no guarantees on the input. Just because someone asks for an int does not mean I would want to use Integer.parse.

Also, I'm not "introducing" the big decimal conversion. It was there before the refactor.

but then int typecast would be wrong for my proposed shortcut but it can be easily mitigated by checking if the double value is between Integer.MIN_VALUE and Integer.MAX_VALUE and if it isn't throw an exception.

No, it's an opt* function. The entire point is that it does not throw an exception. I'm not disagreeing that the parsing performance can be improved. But I think it needs more than a cursory glance and wishful thinking.

This PR is focusing strictly on the request made in #337 which was to mitigate the Exception handling impact.

Any parsing impacts should be in a new PR after this one is merged.

Any narrowing coercion has the potential to change the number. This is a given and also why I added the optNumber call to the API.

Narrowing will not change the double for long, maybe for int, but then it is the same for BigDecimal calls, the reason I commented into this particular change is because of what I'm seeing:

- : Double.parseDouble((String) object); + : new BigDecimal((String) object).doubleValue();

Am I reading this incorrectly? I'm seeing Double.parseDouble(...) changed to new BigDecimal(...) in this commit.

Narrowing will cause loss always, but the same will happen with BigDecimal hence my point of not bothering with such overhead at all.

let me check that. If I did change it, I'll likely change it back.

Yes, I must have changed that by accident when I was testing stuff and didn't notice. I'll revert that change back and update the respective optDouble and optFloat calls that I copied that to.

As for the narrowing, it is handled differently between BigDecimal and Double. Also, not all Long values fit in a double. see the test case changes @ stleary/JSON-Java-unit-test#71 that I updated today. A Double only has 53 bits for it's mantissa which will not let any value over 2^53 be represented correctly.

* Don't call toString when we know it's a string, just cast

* Updates optNumber to be smarter about which object it uses to parse strings

johnjaylward · 2017-05-18T17:20:26Z

@stleary I added the optFloat/optNumber, but no associated getFloat/getNumber. Should I add them in for posterity?

stleary · 2017-05-18T18:05:35Z

@johnjaylward Sure, in general when an opt* method is added, a corresponding get*method is a good idea.

* Extracts the stringToNumber logic that the optNumber method uses to reuse it between classes * Fixes -0 issue with optNumber/getNumber

johnjaylward · 2017-05-18T23:52:56Z

@TheMatthew can you retest the latest commit of this? I want to ensure the new pull out of the number parsing functions didn't adversely affect the performance of your test case.

guidomedina · 2017-05-19T11:46:07Z

JSONObject.java

+        return val.indexOf('.') > -1 || val.indexOf('e') > -1
+                || val.indexOf('E') > -1 || "-0".equals(val);
+    }
+


Nice "quick and dirty" function, what happens if running on a server with a different locale?

Anyway the following example can help:

static final char MINUS_SIGN; static final char DECIMAL_SEPARATOR; static { final DecimalFormatSymbols currentLocaleSymbols = DecimalFormatSymbols.getInstance(); MINUS_SIGN = currentLocaleSymbols.getMinusSign(); DECIMAL_SEPARATOR = currentLocaleSymbols.getDecimalSeparator(); }

Though I'm not sure what should be done if the decimal separator for the server locale is a comma, not sure how the parse function will behave, might not be worth the effort in supporting this.

We are not looking for any number, we are looking for JSON numbers, which are not locale dependent.

Found it, ignore this completely, FloatingDecimal line 198 & 224:

46 = dot

if(var4 != 46) { break; }

from BigDecimal:

} else if (c == '.') { // have dot // have dot if (dot) // two dots throw new NumberFormatException(); dot = true;

Double.parse is similar.

* prevents erasure of stack trace for rethrown exceptions

TheMatthew · 2017-05-21T02:29:12Z

Sorry, I was in a huge rush, I will retest.

TheMatthew · 2017-05-21T02:58:40Z

Before this patch AKA current git head: 870 s
My suggested simple patch: 122 s
@johnjaylward 's first patch: 120 s
@johnjaylward 's current patch: 126 s

stleary · 2017-05-21T04:00:03Z

Performance still looks good. @johnjaylward let me know when you are ready to proceed with the pull request.

TheMatthew · 2017-05-21T04:25:38Z

Ok, I started profiling. the code in this commit looks moderately tight.

Some low hanging fruits I saw:
JSONObject.keys() should probably never be used, rather it would be better to do an enhanced for loop on JSONObject#keySet()

This brings the execution time down from 126 s to 115.

Then if you have an array let's say called "CHARS_TO_APPEND" which has 256 values, Like this:

    private static final boolean CHARS_TO_APPEND[]= new boolean[256];
static{
	Arrays.fill(CHARS_TO_APPEND, true);
	for(int i = 0; i < ' '; i++){
		CHARS_TO_APPEND[i]=false;
	}
	CHARS_TO_APPEND[',']=false;
	CHARS_TO_APPEND[':']=false;
	CHARS_TO_APPEND[']']=false;
	CHARS_TO_APPEND['}']=false;
	CHARS_TO_APPEND['/']=false;
	CHARS_TO_APPEND['\\']=false;
	CHARS_TO_APPEND['"']=false;
	CHARS_TO_APPEND['[']=false;
	CHARS_TO_APPEND['{']=false;
	CHARS_TO_APPEND[';']=false;
	CHARS_TO_APPEND['=']=false;
	CHARS_TO_APPEND['#']=false;
}

and replace the while loop in JSONTokener#nextValue() with

while (CHARS_TO_APPEND[c]) {

you reach 113s.

Here is a quick profile I did. It can be opened with visualvm 1.3.9 .
JSON-java.nps.zip
The jist of it is that most of your time is now spent in the tokener, which is good and the reader, which is excellent.

Hope this helps.

edit: looking at the examples of acceleration really shows how close this is to optimal. at this point optimizing if statements yields a performance change. that is typically a sign that the low hanging fruits are almost all gone.

TheMatthew · 2017-05-21T14:35:05Z

For transparency purposes I've shared this github benchmark https://github.com/TheMatthew/json-benchmark

it's under apache so it should be compatible for any copy-pasted tests

Initial results:

org.json.simple: 140s,
org.json: 870s
after the patch: 126s

johnjaylward · 2017-05-21T17:13:52Z

@stleary I agree, this still looks good. Let's call this one done. I just pulled in the changes recently merged to master into this PR.

@TheMatthew Lets take that optimization to a separate PR after this one is merged.

Adjustments to tests for stleary/JSON-java#337

adjustments to opt methods in reference to stleary#334

a8d4e47

stleary mentioned this pull request May 17, 2017

Adjustments to tests for https://github.com/stleary/JSON-java/pull/337 stleary/JSON-Java-unit-test#71

Merged

Support for float to BigDecimal in optBigDecimal

bd4b180

John J. Aylward added 3 commits May 17, 2017 11:29

* Update opt* methods for JSONArray

c46774c

* Add support to JSONArray and JSONObject to optionally get raw number values * Add support to JSONArray and JSONObject to optionally get float values

grr, forgot to save changes on last commit

fcdb867

fixes for javadoc

0c7bd72

guidomedina reviewed May 18, 2017

View reviewed changes

stleary mentioned this pull request May 18, 2017

Opt performance enhancement #334

Closed

John J. Aylward added 2 commits May 18, 2017 11:41

* Prevent exceptions in cases where the value is not a string.

382f62e

* Don't call toString when we know it's a string, just cast

* reverts changes to getDouble and related optDouble and optFloat

c28a2bd

* Updates optNumber to be smarter about which object it uses to parse strings

John J. Aylward added 3 commits May 18, 2017 14:24

* Adds methods getNUmber and getFloat to JSONArray and JSONObject

1ab5260

* Extracts the stringToNumber logic that the optNumber method uses to reuse it between classes * Fixes -0 issue with optNumber/getNumber

correct string check for JSONObject optBigDecimal and optBigInteger

a7f8ff2

updates the getNumber/optNumber to not return invalid Doubles

849b392

guidomedina reviewed May 19, 2017

View reviewed changes

* Missed JSONArray optFloat and optDouble for the revert

04d6e83

* prevents erasure of stack trace for rethrown exceptions

Merge branch 'master' of github.com:stleary/JSON-java into OptimizeOpt

757b6ed

TheMatthew mentioned this pull request May 22, 2017

Performance & Code Style, avoid passing iterators around #340

Closed

stleary merged commit fbd2be7 into stleary:master May 23, 2017

stleary added a commit to stleary/JSON-Java-unit-test that referenced this pull request May 23, 2017

Merge pull request #71 from johnjaylward/OptimizeOpt

93ca7b1

Adjustments to tests for stleary/JSON-java#337

johnjaylward deleted the OptimizeOpt branch July 7, 2017 16:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimizes opt* functions #337

Optimizes opt* functions #337

johnjaylward commented May 16, 2017 •

edited

Loading

TheMatthew commented May 17, 2017 •

edited

Loading

johnjaylward commented May 17, 2017

TheMatthew commented May 17, 2017

johnjaylward commented May 17, 2017

johnjaylward commented May 17, 2017

stleary commented May 18, 2017

guidomedina commented May 18, 2017

johnjaylward commented May 18, 2017

guidomedina commented May 18, 2017 •

edited

Loading

johnjaylward commented May 18, 2017

guidomedina May 18, 2017

johnjaylward May 18, 2017 •

edited

Loading

guidomedina May 18, 2017

johnjaylward May 18, 2017

johnjaylward May 18, 2017

guidomedina May 18, 2017

johnjaylward May 18, 2017

guidomedina May 18, 2017

johnjaylward May 18, 2017

johnjaylward May 18, 2017

johnjaylward commented May 18, 2017

stleary commented May 18, 2017

johnjaylward commented May 18, 2017

guidomedina May 19, 2017 •

edited

Loading

guidomedina May 19, 2017 •

edited

Loading

johnjaylward May 19, 2017

guidomedina May 19, 2017 •

edited

Loading

johnjaylward May 19, 2017

TheMatthew commented May 21, 2017

TheMatthew commented May 21, 2017

stleary commented May 21, 2017

TheMatthew commented May 21, 2017 •

edited

Loading

TheMatthew commented May 21, 2017

johnjaylward commented May 21, 2017

Optimizes opt* functions #337

Optimizes opt* functions #337

Conversation

johnjaylward commented May 16, 2017 • edited Loading

TheMatthew commented May 17, 2017 • edited Loading

johnjaylward commented May 17, 2017

TheMatthew commented May 17, 2017

johnjaylward commented May 17, 2017

johnjaylward commented May 17, 2017

stleary commented May 18, 2017

guidomedina commented May 18, 2017

johnjaylward commented May 18, 2017

guidomedina commented May 18, 2017 • edited Loading

johnjaylward commented May 18, 2017

Choose a reason for hiding this comment

johnjaylward May 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnjaylward commented May 18, 2017

stleary commented May 18, 2017

johnjaylward commented May 18, 2017

guidomedina May 19, 2017 • edited Loading

Choose a reason for hiding this comment

guidomedina May 19, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guidomedina May 19, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheMatthew commented May 21, 2017

TheMatthew commented May 21, 2017

stleary commented May 21, 2017

TheMatthew commented May 21, 2017 • edited Loading

TheMatthew commented May 21, 2017

johnjaylward commented May 21, 2017

johnjaylward commented May 16, 2017 •

edited

Loading

TheMatthew commented May 17, 2017 •

edited

Loading

guidomedina commented May 18, 2017 •

edited

Loading

johnjaylward May 18, 2017 •

edited

Loading

guidomedina May 19, 2017 •

edited

Loading

guidomedina May 19, 2017 •

edited

Loading

guidomedina May 19, 2017 •

edited

Loading

TheMatthew commented May 21, 2017 •

edited

Loading