
new benchmark module #2768

Closed
wants to merge 14 commits into from

Conversation

fikin
Contributor

@fikin fikin commented May 25, 2019

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

This module offers methods that time various ESP operations. Some of the timing methods come from other people, and some are new. Right now it has methods covering CPU and memory use, some OS functions, some GPIO operations, and even methods to test timer1 performance in all possible combinations.

The module is not an operations module per se, but rather something of a testing and learning nature. One would use it mainly to evaluate performance aspects of the ESP. Being compiled as a Lua module makes it easy and convenient to use.

Note: right now the code is based on the pwm2 branch, as I'm timing both exclusive and shared use of timer1 (the exclusive API comes with the pwm2 commits).

@fikin fikin changed the base branch from master to dev May 25, 2019 10:27
@fikin fikin changed the title Benchmark new benchmark module May 25, 2019
@marcelstoer
Member

You need to rebase this on dev once #2747 (pwm2) is merged to get a clean commit history.

@marcelstoer
Member

I don't know what you based your implementation on but the commit history is (still) a jungle. I tried to rebase it on the current dev but I get merge conflicts all over the place. I suggest you extract the relevant changes, run git reset --hard dev, then re-apply them, and then force push this branch. The PR will update here accordingly.

@fikin fikin reopened this May 26, 2019
@fikin
Contributor Author

fikin commented May 26, 2019

Sorry for the hassle. I think it should be OK now.

@marcelstoer marcelstoer added this to the Next release milestone Jun 22, 2019
@devsaurus
Member

Build fails now since #2841 was merged:

In file included from benchmark.c:9:0:
../../sdk-overrides/include/c_types.h:5:2: error: #error "Please do not use c_types.h, use <stdint.h> instead"
 #error "Please do not use c_types.h, use <stdint.h> instead"

Please rebase onto current dev and adapt.

@TerryE
Collaborator

TerryE commented Jul 25, 2019

@fikin Nikolay, in principle this is a great idea, and I personally thank you for doing this work. Even so, our process is to raise an issue to discuss, scope and agree a broad implementation strategy (the whats and hows) before, and in parallel with, raising the PR.

I missed this when first raised -- my apologies -- I'll take a look over the weekend and then give you my review comments.

@TerryE TerryE self-requested a review July 25, 2019 21:26
@fikin
Contributor Author

fikin commented Jul 26, 2019

@TerryE : no problem to reshape it on my side. This module was simply an attempt to salvage some otherwise tooling code.

@fikin fikin closed this Jul 26, 2019
@fikin fikin reopened this Jul 26, 2019
@fikin
Contributor Author

fikin commented Jul 26, 2019

@devsaurus : right now I'm traveling abroad. First feasible timeslot to look into it is next Tuesday.

@marcelstoer marcelstoer removed this from the Next release milestone Jul 27, 2019
@devsaurus
Member

No problem.

@fikin
Contributor Author

fikin commented Aug 12, 2019

The import errors are fixed now.

@fikin
Contributor Author

fikin commented Sep 9, 2019

@TerryE : would you find some time to post some feedback about this module?

@TerryE
Collaborator

TerryE commented Sep 12, 2019

> @fikin Nikolay in principle this a great idea, and I personally thank you for doing this work.

Nikolay, I still hold that a benchmarking suite is a great idea. However, my main reaction, having gone through the code (and the assembler generated with the -S option), is that we need a higher-level discussion of our aims and objectives in doing this exercise at the very basic level. I see little to be gained in adding this in its current form.

  • What should we be benchmarking? NodeMCU is a Lua environment, so most of our Lua developers are really interested in Lua performance, not in low-level machine-code performance: what can I, as a developer, do to run efficient Lua code? OK, I agree that there might be a smaller group of C coders who are interested in Xtensa machine-instruction-level performance, but even then, trying to understand performance measures at this level is extremely complicated.

  • You need to understand how the ESP implementation of the Xtensa architecture and the gcc optimiser / code generator interact. The Xtensa architectures, like the ARM ones, are highly customisable by the H/W developer (Espressif in this case) in terms of which CPU features are implemented in silicon (Google search). Even so, the ESP implementation is still pipelined, so how long a single instruction takes depends on the context: there are three instruction memory ports, for ROM0, IRAM and ICACHE (IROM), and instructions can be pipelined from all three. Our code is executed from the slowest (IROM0), and the H/W has to precache it into ICACHE at around 18-20 clocks per word fetched in the case of a cache miss; so at 80MHz the flash fetch bandwidth is nearer 8Mb/s, though code loops and pre-cached hot code will run at nearer full clocked bandwidth.

  • Some instructions are fully implemented in H/W: 32×32 multiply is H/W-assisted (there is a 16×16→32-bit H/W multiply, and a normalised-shift instruction to assist F/P add, sub and multiply), but any divide and modulus operations are slow. The H/W can only do word fetches from flash, so 8-bit data access to flash is done through a S/W exception handler, and this is slow.

  • On the compiler side, the optimiser will often inline short functions even if you haven't explicitly requested this. If a static function is only used once then the compiler pretty much always inlines it.

But as I say, what Lua application programmers really want to know is "how can I speed up my Lua applications" and this is usually a long way behind "how can I free up more RAM?"

The base set of tests, as described in the referenced "The timing testing code" article, is flawed but still a useful reference; however, I see no advantage in wrapping it in a Lua-callable library.

Collaborator

@TerryE TerryE left a comment


See my general comment. A lot further discussion and work needed.

@fikin
Contributor Author

fikin commented Sep 23, 2019

what should we benchmark

My working assumption for the moment is that benchmarking serves the module developer rather than the casual Lua user; hence this module mainly covers platform functionality. This is also reflected in the PR notes.

If I have to project the future, I'd imagine this module as available in the dev branch only, where people needing a cross-reference, or who are learning, can compile and use it, and extend it if new functionality comes along.

But now that you mention larger-scope Lua-space testing, I'm kind of intrigued. Personally I don't have any ideas popping to mind, but if you have some, please do share them.

current platform tests

You're right that most of these tests are not truly informative for experienced developers, except perhaps the interrupt ones (the NodeMCU platform seems to change enough that this behaviour is worth tracking). But by providing them, a reference is established to another published benchmark, which makes comparisons and outcomes much more reliable to interpret. At least that was my thinking when I opted to add them.

about xtensa

Sure, I'm learning it as I go ;) and thanks for the help. I'll spend some time in the next few days reviewing how the memory hints are used (I largely copied the code rather than analysing it) and then open up some more on it.

@TerryE
Collaborator

TerryE commented Sep 25, 2019

Nikolay, you'll find my email address in commits in the git log. If you wish, ping me and we can set up an email channel. Here are a few useful references that are quite hard to find:

Even if we are targeting the C-module developer, the benchmarks as proposed are misleading because of cached vs non-cached issues. Espressif don't give details of the cache design, and Cadence (like ARM) provide a range of selection options to their licensed SoC manufacturers (see the ISA), so we don't even know, for example, whether the cache is direct-mapped or two-way associative; but certainly IROM1 cache-miss execution is at least an order of magnitude slower than cache-hit, ROM or IRAM1 execution.

Hence you really need to run the code out of IRAM1 when instruction timing is absolutely essential, but IRAM is such a scarce resource that we should only put absolutely essential code there. For example, I2C master is sloppy-clock tolerant, so its drivers don't need to run in IRAM1; caching works well enough in practice.

When it comes to C-module development, one of the trade-offs you have to make is when to do things in C and when to use Lua. Changing C code requires a complete Xtensa toolchain, which is a major hurdle for most developers. Lua code is easily tractable by Lua developers, so you will see a trend of only coding functionality in C when there is a compelling reason to. Without an understanding of comparative timing, it is difficult for developers to make this trade-off.

@fikin
Contributor Author

fikin commented Jan 26, 2020

@TerryE thanks for the links above.

I've been sitting on this pull request for some time now without a clear idea of where to take it from here. Basically, I can't find a good enough way to make it beneficial as a general-purpose module.

So my thinking is to close it and keep its codebase as a dedicated branch in my repo. If someone needs ready infrastructure to add or run some timing C tests, they can pull it on top of the dev branch from there.

@fikin fikin closed this Jan 26, 2020