RFC - Sorting out the input handling #2787

TerryE · 2019-06-08T20:26:05Z

Another case of "I am currently doing my Lua 5.3 port and there are a number of issues on which I would like to give a heads-up / get feedback from the contributors." This one relates to how we integrate input handing into lua.c.

I have never really taken a close look at this bit of lua.c + user_main.c uart.c before, but it is really all a bit of a botch -- the sort of thing that might have evolved if you were hacking an application together against a deadline rather than designing it properly -- which was the case for the early versions 1.x of Lua51, but now I am trying to srt out Lua53 for the long term so it's time to design it properly, IMO.

Architecturally, the standard src/lua.c assumes that an underlying OS will provide some form of API for a synchronous readline() function that handles stuff like terminal echo; the Lua interpreter sits in the doREPL() function looping around this readline call. This approach doesn't map onto an event-driven task-orientated system, where the uart driver or posts a receive events for each line received which must then be handled and an atomic task. Hence this doREPL() logic must be inverted, so that the interactive processing is all triggered from a line-in callback.

Note that the way the Lua interpreter does chunking is a bit like a sledge hammer. Each source chunk is assembled from sequence of successive input lines. The entire chunk is recompiled each time an extra line is added .

If it is successful then compiled chunk is executed and the line list flushed.
If the compile raises an <eof> error then the chunk is assumed incomplete and the interpreter waits for another line.
Any other error is reported and the line list flushed.

The means that if a 20 line chunk is received then the 1st line compiles a 1-line chunk, the 2nd a 2-line chunk and so on so the 20th line compiles a 20 line chunk. Hence the per-line the compiler overhead goes uniformly as the chunk size increases. This isn't really an issue if the lines are really being typed interactively. However, if some IDE such as Esplorer or even if you are just pasting a chunk of code into PuTTY, then it is quite easy to get to the point where the input overruns the ability of the interpreter to flush input buffer.

Another complication in the case of NodeMCU is that there is nothing stopping other application callbacks firing whilst you are typing and thus threading other Lua processing tasks in between the lines in a chunk.

Given then everything outside the app/lua or app/lua53 directories is going to be shared then its going to be simpler for me to packport some of this tidy-up into app/lua/lua.c so that we can keep the rest common.

In terms of input tasks, I want to split the emptying of the fixed size input FIFO which should be short duration high-priority task from the activities relating to processing it, in the case of compiling, the input lines should be queued at high priority and should run at low priority. IMO, the best way to do this is to introduce a Lua string array to as a buffer of complete input lines. The UART event/readline function will be posted at high-priority and fills this line array. The interactive loop the processor-intensive compilation task will empty it at low priority. This high / low split priority spilt will largely mitigate some of the overrun issues can can occur during startup and for burst input.

The text was updated successfully, but these errors were encountered:

jmattsson · 2019-06-09T00:53:49Z

Off the top of my head I can't remember whether the esp32 port uses the same approach, or whether I got away with the regular stdio setup there.

TerryE · 2019-06-09T04:41:13Z

@jmattsson, I want this lua53 code base to be used for both branches, so I have taken a separate copy of this branch and will do a three way merge at some point. I'll post more later. Having my morning tea in bed at the mo. 😂

TerryE · 2019-06-09T09:47:44Z

On thing that we (at least I) have always taken as given is that the Lua VM just works as-is and dovetails into the NodeMCU task oriented paradigm. But I have never questioned why and how. We have three layers of execution:

Task master. This is a C layer called by the various event call backs. The first is the user_main startup which creates the L0 newstate(), and initialises the Lua stack, G(L) and L. The Lua API here runs at call level 0. The interpreter and any callbacks use the L0 and stack initialised by startup. These execute the Lua function layer via one of the lua_call() variants.
Lua layer. The function call frames use the Lua stack and a linked list of CallInfo call frames. This grows or shrinks during execution, but always returns control to the C task master at level 0. These functions will in turn call the C library modules.
C library modules. This is the set in app/modules plus the core app/lua/*lib.c ones. These in general don't directly callback to the Lua layer, but instead post even-based callbacks that subsequently execute when scheduled at the task master layer. (IIRS, the sjson module is the only one to do so.)

In general the Lua call interface handles stack cleanup on exit, so this works well as long as the individual task masters run atomically and are single threaded, and that the stack is balanced and doesn't have any leaking growth. IMO there are two weaknesses here:

The task layer isn't executed through the Lua call mechanism so if error paths leave data on the stack then the stack will just grow continually.
In general, the task layer doesn't pcall the Lua code, so any Lua exceptions will be uncaught and panic the ESP.

In fact the Lua interpreter polling readline uses this first feature to build up the Lua chunk from line to line.

In my view both of these are highly undesirable. Ideally we should just wrap the task callback code in a pcall, but that is going to be another bulk editing exercise.

PS. Given that the task code itself rarely throws an error, and nearly all tasks immediately exit, it might be better to bulk edit all CBs to include a LUA_TASK_PROLOG(), which declares and sets L, pushes an error handler, and notes the stack top. The last line in the task code would be a LUA_TASK_CALL_AND_EXIT(n) which would do the lua_pcall() with the error handler and print the error / call a catchall Lua function on error, lua_settop() the stack if necessary, possibly kick off a Lua GC sweep above some heap amber level and return from the routine. This would be a straightforward bulk edit for perhaps 95% of CBs in the modules folder.

HHHartmann · 2019-06-11T13:59:20Z

Returning to the original issue I would also sort out the node.input () issue which does not collect input until it is compiled correctly but fails on partial commands which breaks all telnet and serial forwarder examples out there including our own.
Also there could be even more input streams via UART2 or bluetooth which should work the same way.
The latter two could be handled by a read and node.input loop.
Maybe it would be good to have separate input buffers for both allowing to still enter commands via serial if other input goes crazy.
It might also be nice to allow multiple buffers for node.input to allow multiple command streams. But that also might be overdone for or Hardware.

TerryE · 2019-06-11T20:44:21Z

Gregor, we are a little constrained by the interpreter architecture and the way that it defines a "chunk": that is the smallest number of complete lines that does not raise an incomplete error when parsed (parser hit end of stream when still looking for lexical components), and also that we can have multiple source streams. The current implementation is a botch.

The Lua VM should also be in a well determined state when doing this parsing, again which it isn't for node.input() requests as this is called in the middle of a Lua execution stack. And there is nothing to stop UART and node.input() lines being interleaved which creates more mess.

I also need to be careful not to waste too many RAM resources whilst doing all of this.

Let me do the first cut and I'll post it as a PR

TerryE · 2019-06-11T22:23:49Z

The standard Lua interpreter sits at call level 0 looping around the readline() and sending output to printf(). However, NodeMCU has two input sources: UART0, and the running application through node.input() -- and as you say possibly more sources in the future such as bluetooth, etc..

We also want to be able to redirect output (both stdout and stderr) to an output listener as well as sending it to UART0.

The standard interpreter accumulates lines on the Lua stack, but this model does not work with multiple input sources. What I have done so far is to create a lineQ resource which is hung off a Lua registry entry, and this:

is LUA_NOREF when empty
points to a string containing the line when Q is has one entry
points to an array of lines when multiple lines are queued.

This may seem a little convoluted, but under normal circumstances the Q will bounce between empty and 1 entry, so this strategy avoids needing to create Lua array resources unless necessary. However, the Q is filled by a high priority task which then posts a low priority task to do the compile, so burst input will cause the queue to grow.

Another advantage of this approach is that we can simply maintain two Qs instead of 1 (one for each of the API and the UART), and build the chunk in the Q.

We can also use a similar approach to tidy up output handling -- which currently breaks the Lua architectural rules as the NodeMCU function runs in a single task, but can generate many field-level printf()s which typically need to go through the net callback interface and multiple tasks to send the content back over TCP/IP. I used the marshaling in my Lua telnet example to try to get around this, but this is really just putting more sticking plaster over a botch.

The correct approach is to do is either direct output to the UART or to buffer output, but to handle this buffering properly in the VM. The output from a single task will be buffered, and the handler to clear the buffer will be posted as a separate task to run on completion of the outputing task. If the application is redirecting output then the application tasks and the logging tasks will interleave within normal SDK task priorities.

node.output()

Syntax

node.output(function(array), [serial_debug,] [number/end_char] )

Parameters

output_fn(array) a function accept every output in an array, and can send the output to a socket (or maybe a file). If the function is omitted then normal output to the UART is restored.
serial_debug 1 output also show in serial. 0: no serial output.
number/end_char if a single char string then this delimits the records; if a number then the output is aggregated into this maximum record size.

Returns

nil

Example

local function tonet(q)
  if type(q) == 'table' and #q > 0 then
    sk:send(table.remove(q,1), tonet)
end
node.output(tonet, 1, 1400)  -- serial also get the Lua output.

Note that even though the (maximum) buffer size can be specified, the output task is still scheduled to run after every application task that does at least one print. So this redirection Q is flushed on the completion of each callback. Taking the example above if single application CB generates 2,000 bytes output, then the tonet() CB will get invoked twice once with #q[1] == 1400 and once with (the next) #q[1] == 600, and in fact there is nothing to stop the application itself firing another CB which can add to the Q array between these calls.

TerryE · 2019-06-15T10:48:04Z

As I mentioned about, I've decided to back-port a lot of these changes into Lua 5.1. My reasoning is that I want to keep everything outside of app/lua and app/lua53 common, so it is just easier to unify the interface from Lua to the rest of the firmware. (It also means that I can stress test all this independent of all of the other Lua 5.3 stuff.) An upside is that there are a number of wish list items that have come up from time to time are going to be available in the current Lua 5.1 as a side effect:

Buffering of input now works properly. So I can use a terminal program to connect to UART0 and paste in a 200 line program and no overrun! This plus file.putcontents() makes initialising the FS a doddle.
Sensible marshaling of node.output() takes place.
Stderr messages now get passed to the node.output() handler
I am going to add a node.atpanic(function) CB handler.

I've been using Node.js and Node-RED for my own Home Automation work, I am struck by the architectural similarities between the Node.js and NodeMCU execution models. The non-blocking atomic task architecture is common to both. Since this is core to NodeMCU, I've started to use the tasking model within some VM aspects, and it really does make like a lot easier.

Alfrederson · 2019-06-15T14:21:59Z

@TerryE does that buffering part mean you could stream input from another task too? I have a queue for receiving received data, once I get a \n I put the partial string inside the lua_Load thing, I wonder if this is why any function that ends up calling vfs_open() triggers a wdt reset

TerryE · 2019-06-15T19:16:24Z

I have a queue for receiving received data, once I get a \n I put the partial string inside the lua_Load thing

IMO, that's the correct way to invoke Lua code; the node.input() is really for simple terminal emulation. Remember that strings terminating in \n are necessarily complete Lua chunks, for example:

a=someFunc(

TerryE · 2019-06-16T22:36:16Z

I have also added the complimentary output stuff, and especially a new pipe module that is a C variant of @nwf Nathaniel's lua fifo module, but using a userdata aggregator to avoid excessive string GC. It is really a flexible way of piping byte-stream data between separate tasks. So in the case of terminal output, we can use this to separate all despooling to network only CBs:

do 
  local p  = pipe.create()
  local fp = p:reader(1400)
  local running = true
  
  local function despooler() -- Upvals: fp, sk, despooler, running
    local buf = fp()
    if buf then
      sk:send(buf, despooler)
    elseif not running then 
      p, fp = nil
      -- you might also want to close/dispose sk here as well
    end
  end
      
  local function spool_rec(rec) -- Upvals: p, despooler
    if #p == 0 then
      node.task.post(node.task.HIGH_PRIORITY, despooler)
    end
    p:write(rec)
  end
  
  function close_output()  -- Upvals: p, fp, running
    node.output()
    if #p then
      running = false
    else
      p, fp = nil
    end    
  end
  
  function spool_output()  -- Upvals: spool_rec
    node.output(spool_rec)
  end
end

Here, the despooler CBs interleave as separate tasks between the application tasks. All of the marshalling and blocking of the network sends is handled through the p:write() and function returned by p:reader()

TerryE · 2019-06-17T12:20:13Z

Sometimes you go to sleep pleased that a new piece of functionality like my pipe implementation is working well -- only to wake up in the morning to the realisation that you could have used a neater strategy. Time to recode, I think.

TerryE · 2019-06-18T22:45:18Z

OK, the new version is all working. Since it's a new module and won't break existing code, then I'll push and commit it tomorrow. @nwf perhaps you might want to have a play.

TerryE · 2019-06-19T13:01:16Z

Coming back to this thought of putting the NodeMCU and Node.js architectures along side each other, really the main difference between standard Lua and NodeMCU Lua (whether on the ESP8266 or the ESP32 variants) is that the standard lua.c is a pure interactive wrapper around stdin / stdout whereas NodeMCU is structured much more like Node.js in that it decomposes an application into small non-blocking tasks, each of which runs to completion.

However the current NodeMCU lua.c and user_main.c framework sort of bolts this on as an afterthought, which complicates the implementation, IMO. We can make this structure a simpler and a lot more robust by integrating the node.task.post() mechanism into the core Lua API as lua_posttask() and making use of this in a number of ways. I would go so far as to say that we should continue to support but deprecate the non-Lua version of the task handler except for ISRs.

Let me do the PR but this is one that will need careful review and comment before we commit it.

jmattsson · 2019-06-26T03:46:34Z

Having just tried using node.input() on the ESP32 with no success, I'm wondering why we even have the function in the first place. What is it meant to offer that a print(str) loadstring(str)() does not (and in a much less fraught manner)? I don't think "interleaving of input characters" as a feature in this case... :D

TerryE · 2019-06-26T05:42:35Z

Usecases such as in COAP where the string is supposed to be a complete Lua chunk should definitely use a load(string) implementation. The node.input() only valid use IMO is for stdin redirection of the interactive loop, such as in telnet implementations. And after thinking about this I agree that there is little point in having multiple streams.

We are getting there 😊😬

TerryE · 2019-09-12T23:09:18Z

Fixed in #2836

TerryE assigned TerryE and jmattsson Jun 8, 2019

TerryE added the Lua 5.3 label Jun 15, 2019

TerryE mentioned this issue Jun 19, 2019

Baseline version of Pipe Library #2797

Merged

4 tasks

TerryE mentioned this issue Jul 2, 2019

Major PRs in the run up to Lua53 #2816

Closed

TerryE mentioned this issue Jul 18, 2019

Lua 5.1 to 5.3 realignement phase 1 #2836

Merged

4 tasks

TerryE closed this as completed Sep 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC - Sorting out the input handling #2787

RFC - Sorting out the input handling #2787

TerryE commented Jun 8, 2019

jmattsson commented Jun 9, 2019

TerryE commented Jun 9, 2019

TerryE commented Jun 9, 2019 •

edited

Loading

HHHartmann commented Jun 11, 2019

TerryE commented Jun 11, 2019 •

edited

Loading

TerryE commented Jun 11, 2019

TerryE commented Jun 15, 2019 •

edited

Loading

Alfrederson commented Jun 15, 2019

TerryE commented Jun 15, 2019

TerryE commented Jun 16, 2019

TerryE commented Jun 17, 2019

TerryE commented Jun 18, 2019

TerryE commented Jun 19, 2019

jmattsson commented Jun 26, 2019

TerryE commented Jun 26, 2019

TerryE commented Sep 12, 2019

RFC - Sorting out the input handling #2787

RFC - Sorting out the input handling #2787

Comments

TerryE commented Jun 8, 2019

jmattsson commented Jun 9, 2019

TerryE commented Jun 9, 2019

TerryE commented Jun 9, 2019 • edited Loading

HHHartmann commented Jun 11, 2019

TerryE commented Jun 11, 2019 • edited Loading

TerryE commented Jun 11, 2019

node.output()

Syntax

Parameters

Returns

Example

TerryE commented Jun 15, 2019 • edited Loading

Alfrederson commented Jun 15, 2019

TerryE commented Jun 15, 2019

TerryE commented Jun 16, 2019

TerryE commented Jun 17, 2019

TerryE commented Jun 18, 2019

TerryE commented Jun 19, 2019

jmattsson commented Jun 26, 2019

TerryE commented Jun 26, 2019

TerryE commented Sep 12, 2019

TerryE commented Jun 9, 2019 •

edited

Loading

TerryE commented Jun 11, 2019 •

edited

Loading

TerryE commented Jun 15, 2019 •

edited

Loading