Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose cpu ccount registrer as platform and lua_node functions #2906

Merged
merged 6 commits into from
Jan 4, 2020

Conversation

fikin
Copy link
Contributor

@fikin fikin commented Sep 9, 2019

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

Since I've used time duration with microsecond precision in c code, I kind of missed it in lua.
This request is exposing CCOUNT register value as new lua method to "node" module.
And also declares it as c function in "plarform.h" to allow for future removal of duplicated declarations.

I'll appreciate some feedback, especially if these are not appropriate places for such method.

docs/modules/node.md Outdated Show resolved Hide resolved
@TerryE
Copy link
Collaborator

TerryE commented Sep 12, 2019

Nikolay, IMO the ccount register is useful if you are developing exceptionally time critical C code libraries, and such developers should be aware of how to use it. But for the existing Lua applications developers, I feel that the existing tmr.now() which returns the ccount normalised to microseconds to be quite good enough.

For example:

a={}
for j=1,5 do 
  local t0=tmr.now()
  for i=1,10 do 
    local t1=tmr.now()
    a[i]=t1-t0
    t0=t1
  end
  print(table.concat(a,'  '))
end

160  216  37  122  28  121  32  28  27  121
41  28  27  27  27  27  28  27  27  27
41  27  27  28  27  27  27  27  28  27
41  27  27  28  27  27  27  27  28  27
41  27  27  28  27  27  27  27  28  27

OK, this is running under Lua 5.3 (which is what is on the Wemos chip plugged into my laptop at the moment). But the core 8 Lua VM opcode for loop (lines 4-7) is clocking in at around 27uSec or just over 3uSec per opcode.

	10	[4]	LOADK    	5 -2	; 1
	11	[4]	LOADK    	6 -6	; 10
	12	[4]	LOADK    	7 -2	; 1
	13	[4]	FORPREP  	5 7	; to 21

	14	[5]	GETTABUP 	9 0 -4	; _ENV "tmr"
	15	[5]	GETTABLE 	9 9 -5	; "now"
	16	[5]	CALL     	9 1 2
	17	[6]	GETTABUP 	10 0 -1	; _ENV "a"
	18	[6]	SUB      	11 9 4
	19	[6]	SETTABLE 	10 8 11
	20	[7]	MOVE     	4 9
	21	[4]	FORLOOP  	5 -8	; to 14

(The hickups on the first iteration are as the VM grows the table).

@TerryE
Copy link
Collaborator

TerryE commented Sep 17, 2019

And here are the results with the int32 / float (32-bit) build: the size and runtime advantages of the Int build but with the flexibility to do single precision FP calcs if you need to.

167  156  97  89  27  91  26  27  26  94
28  34  26  25  26  26  26  26  26  26
24  26  26  26  26  26  25  26  26  26
24  26  26  26  25  26  26  26  26  26
24  26  26  26  26  26  26  26  25  26
> return math.pi
3.141593
> return 10^39
Infinity

😄

@marcelstoer marcelstoer added this to the Next release milestone Dec 28, 2019
@marcelstoer
Copy link
Member

@TerryE should your valuable feedback be seen as a nay wrt to merging this? I don't have any objections to adding this new function.

@TerryE
Copy link
Collaborator

TerryE commented Dec 30, 2019

Marcel, the ccount register is perhaps useful in examining extremely time critical C code, but you would do this with a C macro that generates an inline asm instruction. At a Lua level the overhead of doing the setup and then API lua_call means that this function will give meaningless results over and above the existing tmr.now(). Nikoay has done some excellent work, but on this one a definite 👎 from me. Sorry.

@marcelstoer
Copy link
Member

You're definitely right about the Lua overhead distorting the results. Feel free to close the PR. Would be interesting to see a side-by-side comparison of tmr.now() and node.ccount() based calculations for sample operations to see how much they deviate. Maybe @fikin could provide those numbers?

@fikin
Copy link
Contributor Author

fikin commented Dec 30, 2019

side by side both ways using wemos, latest dev-branch 32-float build:

a={}
for j=1,5 do 
  local t0=tmr.now()
  for i=1,10 do 
    local t1=tmr.now()
    a[i]=t1-t0
    t0=t1
  end
  print(table.concat(a,'  '))
end
119  1258  712  710  93  715  93  98  93  717
111  129  89  90  89  90  89  89  90  89
112  121  89  89  90  89  89  90  89  90
108  124  90  89  90  89  89  90  89  89
112  121  89  89  90  89  89  90  89  90
'
a={}
for j=1,5 do 
  local t0=node.ccount()
  for i=1,10 do 
    local t1=node.ccount()
    a[i]=t1-t0
    t0=t1
  end
  print(table.concat(a,'  '))
end
9246  99727  56637  56755  7467  57165  7468  7792  7467  57813
9573  9672  7146  7155  7146  7146  7146  7155  7146  7146
8939  9986  7147  7155  7146  7146  7146  7156  7146  7146
9254  9672  7146  7155  7146  7146  7146  7155  7146  7146
8939  9986  7147  7155  7146  7146  7147  7155  7146  7146

ccount values are 80MHZ counters. they have to be multiplied with 0.0125 to get them in us.
i.e.

112.15  1265.6875  711.75  709.1875  93.8125  710.6625  93.8125  97.875  93.8125  717.75
116.2  141.0375  89.8125  89.925  89.8125  89.8125  89.8125  89.925  89.8125  89.8125
120.1375  133.1875  89.8125  89.925  89.8125  89.8  89.8125  89.925  89.8125  89.8125
116.2  137.125  89.8  89.925  89.8125  89.8125  89.8125  89.925  89.8125  89.8125
120.1375  133.1875  89.8  89.925  89.8125  89.8125  89.8125  89.925  89.8125  89.8125

@fikin
Copy link
Contributor Author

fikin commented Dec 30, 2019

@TerryE : i think your view on this is a bit too narrow seeing it as shootout between ccount and tmr.now only.

if i look from that angle alone, i'd add that ccount is not lua-portable function and being there only would extend the tech debt without clear benefit.

these are all fair and square but i see some other angles as well.

ccount function allows to compare c and lua duration results directly, no need of metrics conversion. for c-module development i think this is great.

and not to forget, ccount arithmetic doesn't suffer from 31-bit rollover of tmr.now.
i.e. on ccount rollover:

print(node.ccount()-node.ccount())
1

;)

@TerryE
Copy link
Collaborator

TerryE commented Dec 30, 2019

Nikolay, my vote is a weak -1, so I am happy to be outvoted on this. That being said, if we do add a ccount function then it certainly doesn't belong in node IMO. The purpose of node is provide a Lua callable interface to certain OS-style services provided by the SDK; it is not a catch-all for odd functions. ccount is a timing service just like now() so if it goes anywhere then it should go in tmr, in my view.

@fikin
Copy link
Contributor Author

fikin commented Jan 4, 2020

i've moved the method to tmr package as suggested by terry.

benchmarking lua functions with it:

function timeIt(fnc, cnt)
   local function loopIt(f2)
     local t0 = tmr.ccount()
     for i=1,cnt do
       f2()
     end
     local t1 = tmr.ccount()
     return math.ceil((t1-t0)/cnt)
   end
   assert(type(fnc) == "function", "function to test missing")
   cnt = cnt or 1000
   local emptyTime = loopIt(function()end)
   local deltaCPUTicks = math.abs(loopIt(fnc) - emptyTime)
   local deltaUS = math.ceil(deltaCPUTicks/node.getcpufreq())
   return deltaCPUTicks, deltaUS
end

print( timeIt(function() end) )
6	1
print( timeIt(function() tmr.ccount() end) )
6274	79
print( timeIt(function() tmr.now() end) )
6325	80

same logic works just fine with tmr.now() instead, without the extra precision of ccount of course.

@fikin
Copy link
Contributor Author

fikin commented Jan 4, 2020

@marcelstoer if there is no other person sharing my taste in cpu timing ;), i think we can close this request and forget about it. i'll keep it in a branch of my own rather than push it to master...

@marcelstoer
Copy link
Member

I'm actually fine to expose this. It's up to the user to use or not use it. As Terry only gave a weak -1 I'll merge it. Thanks for your efforts.

@marcelstoer marcelstoer merged commit 1c83f02 into nodemcu:dev Jan 4, 2020
@fikin fikin deleted the ccount branch January 8, 2020 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants