Skip to content

Replace non-functional 64-bit long int CMS updaters with int64 types#3803

Open
BsAtHome wants to merge 2 commits intoLinuxCNC:masterfrom
BsAtHome:fix_cms-int64
Open

Replace non-functional 64-bit long int CMS updaters with int64 types#3803
BsAtHome wants to merge 2 commits intoLinuxCNC:masterfrom
BsAtHome:fix_cms-int64

Conversation

@BsAtHome
Copy link
Contributor

This fixes #3770 by removing the [unsigned] long CMS updater functions and replacing them with [u]int64_t updater functions using tirpc's xdr_[u]int64_t() mappings (available on all Debians from at least 10). The changes fix 64-bit communication and have been tested to be functional.

The PR also fixes the ascii update functions. They seem to be unused in LCNC, but should be just as functional. [At first glance, there may be several buffer overflows in any and all of the ascii updater functions. At least some existing checks look fishy. But that must be for another PR if these ever are gonna be used.]

@BsAtHome BsAtHome changed the title Replace non-functional 64-bit long int CMS updaters int64 types Replace non-functional 64-bit long int CMS updaters with int64 types Feb 13, 2026
@andypugh
Copy link
Collaborator

Thanks.
Possibly we should get rid of all long and long long as they have become somewhat ambiguous.

@BsAtHome
Copy link
Contributor Author

Possibly we should get rid of all long and long long as they have become somewhat ambiguous.

The updater code no longer has any "long" types specified ("long long" was never there).
Well, except for "long double", but that is typecast down to double anyway.

The problem, of course, is the ambiguity what value sizeof(int) actually has with any compiler and how other libraries are compiled (like tirpc's xdr_*() functions). But fixing this generally would involve a lot mote than a simple change because we'd have to review and evaluate all variables in the RCS_STAT_MSG derived messages (see emc/nml_intf/emc_nml.hh).

I'd be in favour of only using [u]int{8,16,32,64}_t types from stdint, but that is quite a bit of work. And then, I'm not sure NML might survive that long after the discussions we've had about the problems it gives anyway. Some updates and changes might be reusable, but that needs to be evaluated.

@BsAtHome
Copy link
Contributor Author

BTW, it might be a good idea to change the integer updaters to:

    CMS_STATUS update(int8_t &x);
    CMS_STATUS update(uint8_t &x);
    CMS_STATUS update(int16_t &x);
    CMS_STATUS update(uint16_t &x);
    CMS_STATUS update(int32_t &x);
    CMS_STATUS update(uint32_t &x);
    CMS_STATUS update(int64_t &x);
    CMS_STATUS update(uint64_t &x);

The compiler should automatically map to the right size for whatever the caller actually uses, be it int, long int, short int, etc..
However, such a change can wait a moment because it needs some testing before it can go in. The [u]int64_t change (this PR) is rather trivial because the "long int" updaters were never used and didn't function anyway. This PR makes them actually work and there is code in the pipeline that depends on it working.

@BsAtHome
Copy link
Contributor Author

Well then, that was easy. Usually you need to pay tribute, offer goats and pigs for lunch, bribe computers to look the other way and ingest reasonably large quantities of stimulants when changing deep down in a stack like this. This time, it all went smooth.

All updaters now use [u]int{8,16,32,64}_t types. The only required addition was one updater with char * for strings, but that inline-maps to int8_t *. Works without any problems on all (VM) machines tested using Debian 10(clang), 11, 12 and 13 as well as Fedora 43.

@rmu75
Copy link
Collaborator

rmu75 commented Feb 14, 2026

I wonder if you figured out what the heck is going on with https://github.com/LinuxCNC/linuxcnc/blob/master/src/libnml/cms/cms_aup.cc#L167 and the neutral size factor?

It uses sizeof of datatypes to calculate required string length in conjunction with neutral_size_factor (which seems to be hardcoded to 4) and therefore requires way too large message buffers. Shouldn't hurt much in principle but I wonder what I am missing?

@rmu75
Copy link
Collaborator

rmu75 commented Feb 14, 2026

looks good to me.

@BsAtHome
Copy link
Contributor Author

You are right, the buffer checks are wrong. I'm reasonably sure that both ascii updaters (cms_aup and cms_dup) are non-functional due to artificial range constraints in integers and contain buffer overflows. And then, I don't think anybody ever tested them.

That is why I added a warning to both constructors in this patch that will be printed when you try to use them. Maybe they should be removed entirely because there is no point in using them and only generate disadvantages.

@rmu75
Copy link
Collaborator

rmu75 commented Feb 14, 2026

I didn't want to touch that stuff because once you start... you end up replacing it with more ad-hoc stuff. It would probably make sense to use something that is actually maintained.

I rebased my zeromq-exploration branch to current master and am currently working on replacing NML in emccanon / interplist, but I still have to figure out some things and am not sure if flatbuffers is the right library to use for serialization.

@BsAtHome
Copy link
Contributor Author

[BTW, I think we should move this part of the discussion to a separate issue.]

It would probably make sense to use something that is actually maintained.

Well said. But it does require work either way ;-)

I rebased my zeromq-exploration branch to current master and am currently working on replacing NML in emccanon / interplist, but I still have to figure out some things and am not sure if flatbuffers is the right library to use for serialization.

Yeah, about flatbuffers, I'm not sure you need serialization like this, unless you need sophisticated cross-machine/architecture compatibility. I guess that 99.9% of all use would involve IPC or shared memory between the processes which run on the same machine anyway. The final 0.1% can use a remote shell (like linuxcncrsh) to control the machine. Any other-way control (LCNC->device) can, for example, use an Ethernet connected I/O card for what may be located remotely.

So, essentially, all (relevant) communication is within the scope of one machine and could therefore be binary, fixed format and simply packetized. The whole idea and ability of running NML via TCP has always freaked me out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

on 64bit architectures, xdr_long silently truncates 64bit values to 32bit or 0

3 participants