
new benchmark module #2768

Closed
wants to merge 14 commits into from

Conversation

fikin
Contributor

@fikin fikin commented May 25, 2019

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

This module offers methods that time various ESP operations. Some of the timing methods come from other people, and some are new. Right now it has methods covering CPU and memory use, some OS functions, some GPIO operations, and even methods to test timer1 performance in all possible combinations.

The module is not an operations module per se, but rather something of a testing and learning nature. One would use it mainly to evaluate performance aspects of the ESP. Being compiled as a Lua module makes it easy and convenient to use.

Note: right now the code is based on the pwm2 branch, as I'm timing both exclusive and shared use of timer1 (the exclusive API comes with the pwm2 commits).

@fikin fikin changed the base branch from master to dev May 25, 2019 10:27
@fikin fikin changed the title Benchmark new benchmark module May 25, 2019
@marcelstoer
Member

You need to rebase this on dev once #2747 (pwm2) is merged to get a clean commit history.

@marcelstoer
Member

I don't know what you based your implementation on but the commit history is (still) a jungle. I tried to rebase it on the current dev but I get merge conflicts all over the place. I suggest you extract the relevant changes, run git reset --hard dev, then re-apply them, and then force push this branch. The PR will update here accordingly.

@fikin fikin reopened this May 26, 2019
@fikin
Contributor Author

fikin commented May 26, 2019

Sorry for the hassle. I think it should be OK now.

@marcelstoer marcelstoer added this to the Next release milestone Jun 22, 2019
@devsaurus
Member

Build fails now since #2841 was merged:

In file included from benchmark.c:9:0:
../../sdk-overrides/include/c_types.h:5:2: error: #error "Please do not use c_types.h, use <stdint.h> instead"
 #error "Please do not use c_types.h, use <stdint.h> instead"

Please rebase onto current dev and adapt.

@TerryE
Collaborator

TerryE commented Jul 25, 2019

@fikin Nikolay, in principle this is a great idea, and I personally thank you for doing this work. Even so, our process is to raise an issue to discuss, scope and agree a broad implementation strategy (the whats and hows) before, and in parallel with, raising the PR.

I missed this when first raised -- my apologies -- I'll take a look over the weekend and then give you my review comments.

@TerryE TerryE self-requested a review July 25, 2019 21:26
@fikin
Contributor Author

fikin commented Jul 26, 2019

@TerryE : no problem to reshape it on my side. This module was simply an attempt to salvage some otherwise tooling code.

@fikin fikin closed this Jul 26, 2019
@fikin fikin reopened this Jul 26, 2019
@fikin
Contributor Author

fikin commented Jul 26, 2019

@devsaurus : right now I'm traveling abroad. First feasible timeslot to look into it is next Tuesday.

@marcelstoer marcelstoer removed this from the Next release milestone Jul 27, 2019
@devsaurus
Member

No problem.

@fikin
Contributor Author

fikin commented Aug 12, 2019

The import errors are fixed now.

@fikin
Contributor Author

fikin commented Sep 9, 2019

@TerryE : would you find some time to post some feedback about this module?

@TerryE
Collaborator

TerryE commented Sep 12, 2019

> @fikin Nikolay in principle this a great idea, and I personally thank you for doing this work.

Nikolay, I still hold that a benchmarking suite is a great idea. However, my main reaction, having gone through the code (and the assembler generated with the -S option), is that we need a higher-level discussion of our aims and objectives in doing this exercise at the very basic level. I see little to be gained in adding this in its current form.

  • What should we be benchmarking? NodeMCU is a Lua environment, so most of our Lua developers are really interested in Lua performance, not in low-level machine-code performance: what can I, as a developer, do to run efficient Lua code? OK, I agree that there might be a smaller group of C coders who are interested in Xtensa machine-instruction-level performance, but even then, trying to understand performance measures at this level is extremely complicated.

  • You need to understand how the ESP implementation of the Xtensa architecture and the gcc optimiser / code generator interact. The Xtensa architectures, like the ARM ones, are highly customisable by the H/W developer (Espressif in this case) in terms of which CPU features are implemented in silicon (Google search). Even so, the ESP implementation is still pipelined, so how long a single instruction takes depends on the context: there are three instruction memory ports, for ROM0, IRAM and ICACHE (IROM), and instructions can be pipelined from all three. Our code is executed from the slowest (IROM0), and the H/W has to precache it into ICACHE at around 18-20 clocks per word fetched in the case of a cache miss; so at 80MHz the flash fetch bandwidth is nearer 8Mb/s, though code loops and pre-cached hot code will run at nearer full clocked bandwidth.

  • Some instructions are fully implemented in H/W: 32×32 multiply is H/W-assisted (there is a 16×16→32-bit H/W multiply, and a normalised-shift instruction to assist F/P add, sub and multiply), but any divide and modulus operations are slow. The H/W can only do word fetches from flash, so 8-bit data access to flash is done through a S/W exception handler, and this is slow.

  • On the compiler side, the optimiser will often inline short functions even if you haven't explicitly requested this. If a static function is only used once then the compiler pretty much always inlines it.

But as I say, what Lua application programmers really want to know is "how can I speed up my Lua applications" and this is usually a long way behind "how can I free up more RAM?"

The base set of tests, as described in the referenced "The timing testing code" article, is flawed but still a useful reference; however, I see no advantage in wrapping it in a Lua-callable library.

Collaborator

@TerryE TerryE left a comment


See my general comment. A lot further discussion and work needed.

@fikin
Contributor Author

fikin commented Sep 23, 2019

what should we benchmark

My working assumption for the moment is that benchmarking serves the module developer rather than the casual Lua user; hence this module mainly covers platform functionality. This is also reflected in the PR notes.

If I have to project the future, I'd imagine this module as available in the dev branch only, where people needing a cross-reference, or who are learning, can compile and use it, and extend it if new functionality comes along.

But now that you mention larger-scope Lua-space testing, I'm kind of intrigued. Personally I don't have any ideas popping to mind, but if you have some, please do share them.

current platform tests

You're right that most of these tests are not truly informative for experienced developers, except perhaps the interrupt ones (the NodeMCU platform seems to change enough that this behaviour is worth tracking). But by providing them, a reference is established to another published benchmark, which makes comparisons and outcomes much more reliable to interpret. At least that was my thinking when I opted to add them.

about xtensa

Sure, I'm learning it as I go ;) and thanks for the help. I'll spend some time in the next few days reviewing how the memory hints are used (I largely copied the code rather than analysing it) and then open up some more on it.

@TerryE
Collaborator

TerryE commented Sep 25, 2019

Nikolay, you'll find my email address in commits in the git log. If you wish, ping me and we can set up an email channel. Here are a few useful references that are quite hard to find:

Even if we are targeting the C-module developer, the benchmarks as proposed are misleading because of cached vs non-cached issues. Espressif don't give details of the cache design, and Cadence (like ARM) provide a range of selection options to their licensed SoC manufacturers (see the ISA), so we don't even know, for example, whether the cache is direct-mapped or two-way associative; but certainly IROM1 cache-miss execution is at least an order of magnitude slower than cache-hit, ROM or IRAM1 execution.

Hence you really need to run the code out of IRAM1 when instruction timing is absolutely essential, but IRAM is such a scarce resource that we should only put absolutely essential code there. For example, I2C master is sloppy-clock tolerant, so its drivers don't need to run in IRAM1; caching works well enough in practice.

When it comes to C-module development, one of the trade-offs you have to make is when to do things in C and when to use Lua. Changing C code requires a complete Xtensa toolchain, which is a major hurdle for most developers. Lua code is easily tractable by Lua developers, so you will see a trend of only coding functionality in C when there is a compelling reason to. Without an understanding of comparative timing, it is difficult for developers to make this trade-off.

@fikin
Contributor Author

fikin commented Jan 26, 2020

@TerryE thanks for the links above.

I've been sitting on this pull request for some time now without a clear idea of where to take it from here. Basically, I can't find a good enough way to make it beneficial as a general-purpose module.

So my thinking is to close it and keep its codebase as a dedicated branch in my repo. If someone needs ready infrastructure to add or run some timing C tests, they can pull it on top of the dev branch from there.

@fikin fikin closed this Jan 26, 2020