FEX: Switch serialization to use mmap instead of write #5177
base: main
Conversation
A significant proportion of time with the prior approach was spent in the unbuffered write calls issued once per code page/entrypoint, taking upwards of 15 seconds to serialize a 700MB P3R cache. Switching to an mmap approach reduces the CPU time there to less than a second, leaving I/O to dominate.

A two-pass approach is used:
1. Calculate the required size for the cache file and issue a callback to the frontend to map a cache file of that size.
2. Write the cache data into the buffer provided by that callback.
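To make the two-pass flow concrete, here is a minimal sketch of the idea, not FEX's actual code: the `CacheEntry` layout and function names are hypothetical stand-ins, and it assumes a POSIX frontend that backs the mapping with `ftruncate` + `mmap`.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <vector>

struct CacheEntry {            // hypothetical entry layout
  uint64_t GuestRIP;
  std::vector<uint8_t> Code;
};

// Pass 1: compute the total file size without writing anything.
size_t CalculateSize(const std::vector<CacheEntry>& Entries) {
  size_t Size = 0;
  for (const auto& E : Entries) {
    Size += sizeof(E.GuestRIP) + sizeof(uint64_t) + E.Code.size();
  }
  return Size;
}

// Pass 2: serialize directly into the mapped file, no syscall per entry.
void WriteData(uint8_t* Cursor, const std::vector<CacheEntry>& Entries) {
  for (const auto& E : Entries) {
    memcpy(Cursor, &E.GuestRIP, sizeof(E.GuestRIP));
    Cursor += sizeof(E.GuestRIP);
    const uint64_t CodeSize = E.Code.size();
    memcpy(Cursor, &CodeSize, sizeof(CodeSize));
    Cursor += sizeof(CodeSize);
    memcpy(Cursor, E.Code.data(), E.Code.size());
    Cursor += E.Code.size();
  }
}

bool SerializeCache(int FD, const std::vector<CacheEntry>& Entries) {
  const size_t Size = CalculateSize(Entries);  // assumes a non-empty cache
  // Grow the file to its final size up front so the shared mapping is backed.
  if (ftruncate(FD, Size) != 0) return false;
  void* Mapping = mmap(nullptr, Size, PROT_READ | PROT_WRITE, MAP_SHARED, FD, 0);
  if (Mapping == MAP_FAILED) return false;
  WriteData(static_cast<uint8_t*>(Mapping), Entries);
  // Dirty pages are flushed lazily by the kernel; no per-entry write() calls.
  return munmap(Mapping, Size) == 0;
}
```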
But yeah, good find - the core issue here is the number of small-size writes.
Hmm, I was wanting to avoid just doubling the memory usage for no reason, especially when mmap is basically free. We could buffer some of the writes and not the code buffer write, but mmap felt like a nicer catch-all, future-proof solution. I could split the lambda out into a separate function? The nested lambda isn't inherent to the approach; the main reason for the templates is that I thought it was kinda cool to reuse the same expression to calculate the size as to write the data out, but a templated static function could do exactly the same. With that, I suppose the only thing left would be the mapping callback, which I feel isn't that obtuse? Up to you, but when it allows a technically better solution that's easier to extend in the future for cache changes, I lean towards it. (It also means running out of disk space is safely handled, which is kinda nice.)
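For illustration, the "same expression for both passes" trick being described could be written roughly like this; a hedged sketch in which `Pass`, `Emitter`, and `SerializeHeader` are invented names, not the PR's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class Pass { CalculateSize, Write };

template <Pass P>
struct Emitter {
  uint8_t* Cursor = nullptr; // only consulted during the Write pass
  size_t Size = 0;

  template <typename T>
  void Emit(const T& Value) {
    if constexpr (P == Pass::Write) {
      memcpy(Cursor, &Value, sizeof(T));
      Cursor += sizeof(T);
    }
    Size += sizeof(T);
  }
};

// The layout logic is written once and instantiated for both passes.
template <Pass P>
void SerializeHeader(Emitter<P>& Out, uint32_t Version, uint64_t EntryCount) {
  Out.Emit(Version);
  Out.Emit(EntryCount);
}

// Example: pass 1 counts the bytes needed; after mapping a file of that
// size, pass 2 reuses the exact same layout code to write into it.
size_t RequiredSize(uint64_t EntryCount) {
  Emitter<Pass::CalculateSize> Counter;
  SerializeHeader(Counter, 1u, EntryCount);
  return Counter.Size;
}

void WriteHeader(uint8_t* MappedBase, uint64_t EntryCount) {
  Emitter<Pass::Write> Writer{MappedBase};
  SerializeHeader(Writer, 1u, EntryCount);
}
```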
Yes, we don't need to reduce it to a single write, but just eliminate the egregious high-frequency writes for the BlockList. Could either be one medium-size allocation per write loop or a more generic "WriteBuffered" utility that performs sector-sized buffering. The former will add negligible memory overhead, whereas for the latter it depends on the typical size of inner code page data, which you can probably estimate better than me.
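For what it's worth, the "WriteBuffered" alternative being floated could look something like this sketch; the class name and the 64 KiB buffer size are assumptions, and error handling is elided:

```cpp
#include <unistd.h>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

class BufferedWriter {
  static constexpr size_t BufferSize = 64 * 1024; // well above sector size
  int FD;
  std::array<uint8_t, BufferSize> Buffer;
  size_t Used = 0;

public:
  explicit BufferedWriter(int FD_) : FD{FD_} {}
  ~BufferedWriter() { Flush(); }

  void Write(const void* Data, size_t Size) {
    // Large payloads (e.g. the code buffer) bypass the buffer entirely.
    if (Size >= BufferSize) {
      Flush();
      (void)write(FD, Data, Size); // error handling elided in this sketch
      return;
    }
    if (Used + Size > BufferSize) Flush();
    memcpy(Buffer.data() + Used, Data, Size);
    Used += Size;
  }

  void Flush() {
    if (Used) {
      (void)write(FD, Buffer.data(), Used);
      Used = 0;
    }
  }
};
```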
Indeed, the nested template lambda was just one particularly inappropriate element here, given how simple the underlying problem is. Still, as you mentioned, without this quasi-metaprogramming we'd need to duplicate the logic to some extent. And even then it's still writing data through magic mmapped memory, which most people would stumble over at least once. None of that is intractable for a reader to figure out (or maybe I'm lacking the imagination to see how simple it could be without the lambdas), but given that simple buffering addresses 99% of the performance concern here, I don't find it a compelling approach from a maintainability perspective.
I thought about this; mmap means we can apply relocations without a temporary vector too, since the copy comes for free. If we can avoid a 700MB memcpy, the slight increase in complexity is definitely justified. Can you test this on Linux? I'll refactor to split the lambda out into a function when I somehow find the time.
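As a sketch of the relocation point: because the serialized bytes already live in the file-backed mapping, fixups can be patched in place after a single copy rather than staged through a temporary vector. The `Relocation` type here is a hypothetical stand-in, not FEX's actual fixup format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Relocation {
  size_t Offset;  // byte offset into the serialized code
  int64_t Addend; // delta to apply at that location
};

void CopyAndRelocate(uint8_t* Mapped, const uint8_t* Code, size_t CodeSize,
                     const std::vector<Relocation>& Relocs) {
  // Single copy straight into the file-backed mapping...
  memcpy(Mapped, Code, CodeSize);
  // ...then patch fixups in place; no intermediate buffer needed.
  for (const auto& R : Relocs) {
    int64_t Value;
    memcpy(&Value, Mapped + R.Offset, sizeof(Value));
    Value += R.Addend;
    memcpy(Mapped + R.Offset, &Value, sizeof(Value));
  }
}
```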
What's the specific plan here? mmap will equally allocate memory behind the scenes, or worse yet, if pages get flushed to disk they'll need to be refetched when applying the relocations.
Have you done any Linux smoke testing on your end yet? I'm not terribly opposed to the idea of using mmap, but as mentioned, metaprogramming and nested lambdas are overly complex means of implementing this. If it can be done without them, I can give it another look.
mmap avoids the additional copy through the page cache that write would involve. If the pages get flushed to disk, sure, perf will be worse, but the kernel wouldn't be doing that unless we are memory constrained anyway, in which case that's better than crashing.
No, I don't have any Linux code caching environment set up. If things trivially work on your side after just applying the patch, it would save time on both sides.
Yeah, I mean, sure; as mentioned, I'll split the lambda out into a nested struct with some templating for the same. Should hopefully be easier to read while still avoiding duplication.
CC: @neobrain, needs Linux testing