Lua 5.1 to 5.3 realignement phase 1 #2836

Merged: 5 commits, Sep 7, 2019


TerryE
Collaborator

@TerryE TerryE commented Jul 18, 2019

See #2033, #2787 #2802 #2803 #2808 and #2823

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well.
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

Overview

My aim here is to make the app/lua53 interface with the platform and other libraries as clean as possible, and since all of the other directories are going to be shared, it is just simpler to backport this tidying into the app/lua code hierarchy. This PR makes the first tranche of these changes to achieve this alignment.

This has ended up being a pretty large change, and it does change the feel of the Lua interpreter:

  • You can now paste 100s of lines into either the UART or telnet interface without data overrun. (The 4Kb string limit hasn't been changed.)
  • Error reporting is more robust and works with telnet. The hook to remove CB panics is included (but not yet deployed in all modules).
  • Telnet works as expected, with the VM facilitating the necessary field marshalling.

Detailed changes

  • lua/lua.c. Strip out all dead code left over from the eLua and core versions but not used in NodeMCU (e.g. standard Lua supports command line arguments, but we don't use them, so there's no point in having all of these argument decoding functions). Tidy up how startup and the input loop are handled using the task interface. (Pasting bulk content into PuTTY etc. no longer overruns the input buffer.)

  • app/lua/lnodemcu.c. The NodeMCU extensions to the Lua API are now located in 3 API files: lflash.c for LFS (luaN_*() calls), lrotables.c for ROTables (luaF_*() calls) and lnodemcu.c for other extensions (for other luaN_*() calls). The current items in this last category include Lua task handling and catch-all panic handling. The new luaN_call() variant establishes a PANIC call handler that uses luaN_posttask() to send the error traceback to the stdout pipe. Note that the error reporter defaults to using the base print function to send the error to stdout, but I plan to add a node.setatpanic() API call in a subsequent PR to allow production systems to establish an alternative logging function for such panic errors.

  • lua/lua.h. Add entries for luaN_posttask() and luaN_call() as these functions are used as part of the core Lua VM. Add new "any" variants lua_isanyfunction(L,n) and lua_isanytable(L,n) to hide the type test differences between the Lua 5.1 and 5.3 APIs. (Note that these are used in pipe.c and uart.c, but will also be adopted in a later PR for all modules.)

  • lua/lbaselib.c. Stderr-type errors now use the base print function, and this calls c_puts() rather than dbg_printf(), as the latter is really only for debugging and bypasses node.output() redirection. (It was the earlier change to use dbg_printf() that broke telnet.)

  • Task Library. This functionality has now been moved into the platform core. See RFC Why have different coding standards for ESP32 and EP8266 modules? #2811 (comment) for background to this. Note that I have temporarily retained a task.h header that remaps the old interface to the platform calling conventions, so that modules using the task interface don't need recoding, except that the modules that use the task interface have the #include "task.h" hoisted in the include order. The luaX_ functions now include the Lua call interface.

  • app/module/pipe.c. Add the ability to bind a Lua CB function at pipe.create() to read and empty the pipe.

  • app/module/node.c. The node.output() function now takes a pipe reader as an argument. The field output writes to a stdout pipe and the output reader is now executed as a separate task. This means that node.output() works fine from the interactive session, and the stdin and stdout pipes handle all of the field marshalling automatically.

  • Make for lua now includes -Wall plus minor fixes to lua files to remove -Wall warnings. The one non-trivial change here is that app/platform/vfs.h previously used static declarations in this header, which generated compiler warnings, when these should have been static inline. Plus the lflash.c changes also picked up by Johny on the esp32 branch. Also includes fixing some operator precedence issues caused by the lack of extra guard parentheses in some define macros in driver/rotary.c and driver/spi.c.

  • Move the driver file receive.c to input.c plus enhancements, and ditto for its header. The old file didn't actually do the UART receive handling as this had been moved into lua.c. I've moved the receive handling back and, since it does all of the line input handling for UART0, I decided to rename it to input.c as this better reflects its purpose.

  • The driver file uart.c has had its initialisation binding split into two parts: uart_init() which is called from user_main.c and uart_init_task() which is called from the input.c driver as part of Lua startup to enable input delivery.

  • .gdbinit. Some minor enhancements to this remote GDB macro set for API developers.
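
The operator-precedence point about unguarded define macros can be illustrated with a small sketch. The macro and function names below are hypothetical and are not taken from driver/rotary.c or driver/spi.c, and the factor of 80 is an arbitrary assumption; the sketch only shows the class of bug that guard parentheses prevent.

```c
#include <assert.h>

/* Hypothetical macros for illustration only; 80 cycles per microsecond is assumed. */
#define CYCLES_UNGUARDED(us)  us * 80
#define CYCLES_GUARDED(us)    ((us) * 80)

/* Expands to base_us + extra_us * 80, so only extra_us is scaled. */
int delay_cycles_buggy(int base_us, int extra_us) {
  return CYCLES_UNGUARDED(base_us + extra_us);
}

/* Expands to ((base_us + extra_us) * 80), which is what was intended. */
int delay_cycles(int base_us, int extra_us) {
  return CYCLES_GUARDED(base_us + extra_us);
}
```

With arguments 2 and 3, the guarded version yields (2 + 3) * 80, while the unguarded one silently computes 2 + 3 * 80. Neither form triggers a compiler diagnostic by default, which is why this kind of issue tends to surface only during review.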

Notes

This is a large change because of all of the interdependencies. I suggest we flush out the pending PRs that we want to do in the next master drop, and merge this immediately after.

@TerryE
Collaborator Author

TerryE commented Jul 19, 2019

And the Pipe version of the telnet server. This one is actually usable. I was pasting the following to test out the marshalling

r=debug.getregistry() 
function list(name, t)
  if #name < 15 and (name == '_G' or t ~= _G) then
    print(name,t)
    for k,v in pairs(t) do
      local name = name..'.'..k
      if type(v):sub(-5) == 'table' and name:sub(1,8) ~= '_G.r.std' then
        list(name, v)
      else
        print(name,v)
      end
    end
  end
end
list("_G",_G)

It kept running out of memory, until I twigged that it was also enumerating r.stdin and r.stdout. The pipes support the tostring intrinsic for debugging, so r.stdout[2] contains the string representation of the first stdout pipe UData: a recursive chain reaction, hence the name:sub(1,8) ~= '_G.r.std' guard. Phew!!

Member

@jmattsson jmattsson left a comment

Phew! That's a lot of changes!

Overall, this looks really good! There are a few serious questions I have (in the comments above), but for the most part the comments are for grunt-work cleanup.

The input handling of course looks different on the esp32, but there's nothing here that makes me concerned that it'll cause issues on that side. A bit of integration work and platform adjustments, but should be pretty easy (famous last words, I know).


/**DEBUG**/extern void dbg_printf(const char *fmt, ...) __attribute__ ((format (printf, 1, 2)));

static void input_handler(platform_task_param_t flag, uint8 priority);
Member

task_prio_t ?


/*
** input_handler at high-priority is a system post task used to process pending Rx
** data on UART0. The flag is used as a latch to stops the interrupt handler posting
Member

s/stops/stop/

int lua_main (void);
static bool input_readline(void);

static void input_handler(platform_task_param_t flag, uint8 priority) {
Member

task_prio_t

lua_main();
return;
}
ins.input_sig_flag = flag & 0x1;
Member

I'd quite prefer a symbolic name here. Even if the lack of one is my fault so far >.>

Collaborator Author

It's now a 0/1 field, so this is one case where I think that a symbolic name isn't really needed.

static void input_handler(platform_task_param_t flag, uint8 priority) {
(void) priority;
if (!ins.data) {
lua_main();
Member

What's the advantage of doing the init here, rather than when establishing this task handler?

** Note that pipes also support the undocumented length and tostring operators
** for debugging puposes, so if p is a pipe then #p[1] gives the effective
** length of pipe slot 1 and printing p[1] gives its contents
** Note that pipe tables also support the undocumented length and tostring
Member

Looks like it's no longer undocumented! ;)

Collaborator Author

Documented as in readthedocs general documentation. Anyone reading this is digging around in the pipe internals anyway.

#define CB_QUIESCENT 4
/*
** Note that nothing precludes the Lua CB function from itself writing to the
** pipe and in this case this routine will call itself recursively.
Member

Recursively, or merely retasking itself?

Collaborator Author

Recursively vs retasking. I leave it to you to propose alternative wording.

The real point here was that node.output() used to be a minefield as it was really breaking the VM architectural assumptions in that the print base function was never intended to embed a Lua CB, so doing a print inside this Lua CB code could trash the whole runtime environment and PANIC the processor.

Because the output interface now uses pipes and the tasking mechanism we now comply with the architecture and there is actually nothing stopping the output CB doing a print.

lua_rawgeti(L, LUA_REGISTRYINDEX, uart_receive_rf);
lua_pushlstring(L, buf, len);
lua_call(L, 1, 0);
return !run_input;
luaN_call(L, 1, 0, 0);
Member

So here for example, I believe if the call fails we've just littered the stack with an error message.

Collaborator Author

This actually touches on a more systemic issue in that all CB action routines should always balance the stack from entry to return but we don't have any guidance / quality check / coding pattern to ensure this and failure to do so leads to resource leakage that will ultimately exhaust memory. We need to address this wider issue, IMO.

However as to this specific point, we need to decide the stack rules for luaN_call(). IMO we have two options:

  1. It follows the lua_pcall() convention of returning 1 argument and an error status if the called routine throws an error, or
  2. It mimics the lua_call() convention of returning nresults arguments (which might be nil in the case of an error) and an error status.

IIRC, there is only one use in the modules library where the library routine accepts any result from the Lua CB, and all that typically happens is that the wrapper function returns control to the SDK on return from the CB. I think that we should go for ease of migration to luaN_call() and that adopting option 2 will best achieve this.

Member

My vote would be for #2 as well. If there are any instances where a module is actually interested in the error message, they can use lua_pcall() for that. For everyone else, I think the simplicity of lua_call() behaviour will give us the most robust outcome.

{
need_len = ( uint16_t )luaL_checkinteger( L, stack );
data_len = luaL_checkinteger( L, stack );
luaL_argcheck(L, data_len >= 0 && data_len <= 255, stack, "wrong arg range");
Member

Not LUA_MAXINPUT?

Collaborator Author

Dunno. Ask Zeroday why he used 255. All I did was to recode the existing validation logic to use luaL_argcheck() whilst getting rid of some warnings.

The actual UART input buffer is allocated with input_setup() and this is called in lua.c using LUA_MAXINPUT so I agree that this would be a better symbolic value. So updated.

if (lua_type(L, stack) == LUA_TFUNCTION || lua_type(L, stack) == LUA_TLIGHTFUNCTION){
if ( lua_isnumber(L, stack+1) ){
run = lua_tointeger(L, stack+1);
if (lua_isfunction(L, stack) || lua_islightfunction(L, stack)) {
Member

I believe you added a nice lua_isanyfunction(L, stack) macro above.

Collaborator Author

Yup, but I'll be doing all of these as a separate sweep.

Member

Noted. And thumbs up.

@jmattsson
Member

Oh, and Travis is unhappy with you. :)

@TerryE
Collaborator Author

TerryE commented Jul 19, 2019

> Oh, and Travis is unhappy with you.

@jmattsson, you can blame @marcelstoer and @galjonsfigur and #2790 for this one 😄 This does a lint-style check of all our Lua examples. This is a great idea, IMO, but we haven't as yet done the hard bit of going through all of the examples and fixing the issues, so it is supposed to be optional for now. For some reason, after I rebaselined against dev, it is running in this PR. The Travis fail arises from these checks failing.

As to your other comments, I will go through them and do a separate consolidated reply for most as this will probably be the most readable for you and others.

@galjonsfigur
Member

@TerryE I just looked at the Travis log and lint check does not break the build - it only spills lots of warnings, but because of true in this line it won't break the build. The real error is here in the build log:

lflash.c: In function 'flashBlock':
lflash.c:124:3: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'const void *' [-Werror=format=]
   NODE_DBG("flashBlock((%04x),%08x,%04x)\n", curOffset,b,size);
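
The -Werror=format failure above arises because %x expects an unsigned int while the third argument is a const void *. One possible way to make the argument match the format is sketched below. This is not necessarily the fix that was committed; format_block_args is a hypothetical helper that uses snprintf so it can run on a host, and NODE_DBG itself is not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the same three values as the failing NODE_DBG call. */
int format_block_args(char *out, size_t outlen,
                      unsigned curOffset, const void *b, unsigned size) {
  /* Convert the pointer to an integer explicitly so the argument matches %08x.
   * This assumes a 32-bit target; on a 64-bit host the cast discards the upper bits. */
  unsigned addr = (unsigned)(uintptr_t)b;
  return snprintf(out, outlen, "flashBlock((%04x),%08x,%04x)\n",
                  curOffset, addr, size);
}
```

An alternative is to keep the pointer type and switch the conversion to %p with a (void *) cast, at the cost of an implementation-defined output format.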

@TerryE
Collaborator Author

TerryE commented Jul 19, 2019

Reasoning behind some of these changes:

  • Tasking. We added the task interface to Lua quite late on, but the more that I use it and node.js I realise that NodeMCU Lua should fully embrace the tasking paradigm. As node.js demonstrates this isn't really an SDK vs RTOS issue, but is core to the NodeMCU execution model. The SDK vs RTOS aspect simply dictates whether the next event scheduler is provided as an external (SDK) service or is part of the platform environment (under RTOS).
    We need the tasking interface to be available before the Lua environment has had a chance to initialise itself and any ROM-based libraries, so it makes sense to hoist this code into the app/platform code set, hence the rename to the platform_xxx "namespace". Since we already have other namespaces such as vfs_ as part of platform, there would be no harm in retaining the task_ prefix and simply moving the service into the hierarchy. This includes decisions like renaming types like platform_task_param_t.
    One thing that I want the platform headers to do is to fully encapsulate any links to SDK or RTOS headers, so that we handle these within the two platform layers and not in #ifdef-bracketed include sets in our code. Reviewers' thoughts, please.
  • stdin and stdout pipes are an example of how we now use tasking. The UART ISR posts a high priority task to request the UART buffer to be emptied. This task is handled by the input driver code, which buffers whole input lines and then does a stdin:write() to stash them in the stdin pipe. The pipe is read by a low priority pipe reader task internal to lua.c which processes one (extra) Lua line per invocation. Since the stashing is done at a higher priority, it is done in preference to starting to process the next line. What this means in practice is that it is really difficult to overflow the Lua input: you can just cut and paste large blocks of code into the UART or telnet without source data drop. For me this makes using tools like Esplorer obsolete. I also don't bother with SPIFFSimg any more, since it is just as easy to have a block of source Lua containing file.putcontents('file1.lua', [=[ ... ]=]) chunks and paste this into the interpreter to set up the FS.
    Likewise the core print function / output_redirect writes to the stdout pipe and this is emptied by a CB task which reads the pipe.
  • luaN_call() uses this task interface as well. Like lua_pcall() it doesn't throw an error and so always returns. Internally it establishes a traceback error handler which catches any error traceback as a string and then posts a separate task to print this out / send it to the stdout pipe (or in future to an application-selectable handler if the app wants to log these errors separately). So no more PANIC reboots for modules which use luaN_call() to call their Lua CBs, just a proper error traceback or the ability to log this separately.
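
The priority argument in the stdin/stdout bullet above can be sketched as a toy two-level task queue. Everything here is hypothetical (post_task, run_one_task, the queue layout and the demo tasks are not the platform task API); it only models the idea that a pending high-priority "stash input" task always runs before a pending low-priority "process one Lua line" task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-level task queue; names are illustrative, not the platform API. */
enum { PRIO_LOW = 0, PRIO_HIGH = 1, PRIO_COUNT = 2, QUEUE_LEN = 8 };

typedef void (*task_fn)(int param);
typedef struct { task_fn fn; int param; } task_t;
typedef struct { task_t items[QUEUE_LEN]; size_t head, count; } queue_t;

static queue_t queues[PRIO_COUNT];

/* Post a task; returns 0 on success, -1 if that priority's queue is full. */
int post_task(int prio, task_fn fn, int param) {
  queue_t *q = &queues[prio];
  if (q->count == QUEUE_LEN) return -1;
  size_t tail = (q->head + q->count) % QUEUE_LEN;
  q->items[tail] = (task_t){ fn, param };
  q->count++;
  return 0;
}

/* Run one pending task, always preferring the higher priority.
 * Returns 1 if a task ran, 0 if all queues were empty. */
int run_one_task(void) {
  for (int prio = PRIO_COUNT - 1; prio >= 0; prio--) {
    queue_t *q = &queues[prio];
    if (q->count > 0) {
      task_t t = q->items[q->head];
      q->head = (q->head + 1) % QUEUE_LEN;
      q->count--;
      t.fn(t.param);
      return 1;
    }
  }
  return 0;
}

/* Demo tasks record the order in which they ran. */
char trace[16];
size_t trace_len;

/* Models the high-priority task that stashes a buffered input line. */
void stash_line(int n)   { (void)n; trace[trace_len++] = 'S'; }

/* Models the low-priority reader that processes one Lua line. */
void process_line(int n) { (void)n; trace[trace_len++] = 'P'; }
```

In this model, posting one process_line task followed by two stash_line tasks and then draining with run_one_task() runs both stash tasks first, which is the property that makes input hard to overrun while a line is waiting to be processed.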

I'll fix the minor review points that you also picked up. Thanks :)

@jmattsson
Member

When Travis is happy, I'll be happy. This is a very nice (and large) improvement - thanks for all the work Terry!

@TerryE
Collaborator Author

TerryE commented Jul 20, 2019

Thanks Jonny for all of the feedback. I've tried to action most of your comments except for a few where I've given my reasons. As I say in the title, this is only phase 1. We've got a few more steps to go. However now that you've taken off #2838, I'll be pausing further work here until #2838 is landed and then I can rebase this against the new dev.

However, as a general comment, we've had 3 committers contributing to this review and one other monitoring on the sidelines, so unless any of the others chip in with objections, I now view this as broadly acceptable as a general canon of work, so I will round this off and merge it in immediately after the next master drop.

@devsaurus
Member

devsaurus commented Jul 20, 2019

These changes look good and are a significant step forward. Thanks a lot Terry!
Haven't had the time to actually test drive this PR though.

Can't formally approve atm with the android client here 😐

Member

@devsaurus devsaurus left a comment

👍

@TerryE
Collaborator Author

TerryE commented Jul 23, 2019

I've just rebaselined against the current dev, hence the forced push. Not ready for dev merge yet. More testing needed and I need to do master drop anyway.

@TerryE
Collaborator Author

TerryE commented Jul 24, 2019

@jmattsson @devsaurus @nwf, I have another big batch of changes pending tidying up the modules. I suspect that it is going to be another couple of weeks before we eventually do the next master drops and are positioned to merge this PR. That will mean another month before we also review and do the module changes as well. Would there be any advantage also to rolling in the module changes into this PR? That way we can drop a couple of weeks from the timeline.

@marcelstoer
Member

> I suspect that it is going to be another couple of weeks before we eventually do the next master drops

Why? Most issues in https://github.com/nodemcu/nodemcu-firmware/milestone/13 are either completed or close to completion. If it's any help we can drop some PRs from the milestone and snap to master next week. What's your preference?

@TerryE
Collaborator Author

TerryE commented Jul 24, 2019

Marcel, whenever you, Gregor and Nathaniel can make the cut ... but if these are joined, then I can roll on the prep work. It's really up to the reviewers as to whether they think this rolled-up patch will be too big to review.

@jmattsson
Member

I'd prefer a second PR for the next round of clean-up, it'd be a lot easier to review. Nothing stopping you from just branching off from this PR branch and continuing work though.

@TerryE
Collaborator Author

TerryE commented Jul 25, 2019

OK will do it that way. Just have to rebase twice.

@jmattsson
Member

Might not need any more rebasing. This one is looking all green at the moment, so unless there's something conflicting going in soon it should just be a couple of squash-n-merges :)

@TerryE
Collaborator Author

TerryE commented Jul 26, 2019

I've got my dev-rotable-opts rebased against this PR and everything seems to be working fine, though there is little point in raising this as a PR until this one is merged, as I will want to rebase against the current dev before pushing the new PR.

@TerryE TerryE merged commit fff9f95 into nodemcu:dev Sep 7, 2019
@TerryE TerryE deleted the dev-new-lua.c branch September 7, 2019 09:54
@marcelstoer marcelstoer added this to the Next release milestone Sep 10, 2019
@marcelstoer
Member

@TerryE can you please update or close all the referenced issues if necessary.

@TerryE
Collaborator Author

TerryE commented Sep 12, 2019

> @TerryE can you please update or close all the referenced issues if necessary.

On my TODO list 😄

@TerryE
Collaborator Author

TerryE commented Sep 12, 2019

Done. #2808 carried forward to next PR. All others are closed.
