Wednesday, March 12, 2025

Don’t lose sleep over 32 bits

CK: Good morning. Even though I've been quiet for a few days, there's been significant progress on the Whorld project. I've just been working on things that aren't worth bothering you about. But that just changed.

Over the last month or so, with your help, Whorld has successfully been ported to a multi-threaded architecture, with the rendering done entirely from a dedicated worker thread, also known as the render thread, the renderer, or just the "engine." This was drastic surgery, and while most of the consequences were foreseeable, I also expected some surprises, and today I got one: it is non-trivial for the main thread to know the render engine's target size.

The render engine's OnResize handler stores the target size whenever the render window changes, by calling m_pD2DDeviceContext->GetSize(). This method returns the window size in floating-point Device-Independent Pixels (DIPs). OnResize stores the size in a protected D2D1_SIZE_F member variable called m_szTarget. Because the member variable is protected, there's no access outside the render thread class, and no way for the main thread to read the window size. I could create an accessor that allows the main thread to read the size, but that would be inviting problems. It would probably work fine in 64-bit, but it wouldn't be reliable in 32-bit. In 64-bit, the compiler would probably emit a qword move for D2D1_SIZE_F, effectively making access to the size atomic. I say "probably" because it's not guaranteed, but for any given compilation, whether the compiler emitted a qword write can of course be checked by reading the assembly listing. Whereas in 32-bit, the compiler would definitely emit two 32-bit moves, in which case access is non-atomic and it becomes possible that the main thread will read a corrupt half-written value, with unknown but likely unhappy consequences.

So what to do? The options as I see it are:

  1. Restrict Whorld to being a 64-bit application.
  2. Craft a more complicated accessor that uses InterlockedExchange64 to ensure that access to the target size is atomic.

The problem could also conceivably be solved by adding a "get frame size" command to the engine's list of commands, but I feel that this is overkill to say the least. In order to get the size, the main thread would have to push a command to the render thread's command queue, and then wait an indeterminate length of time for the render thread--which might be busy--to return the size, by posting a Windows message back to the main thread. Waiting for that message greatly complicates the main thread's code, the whole approach stinks, and that's why I'm excluding it from my list of options above.

So which of my two options makes more sense in your opinion? And are there other options I'm not seeing? This also raises a secondary question: why should I care about a 32-bit release? Whorld V2 is designed around the capabilities of Direct2D 1.1, and that unavoidably ties Whorld to relatively recent versions of Windows. I'm targeting Windows 10, Windows 8 should in theory work, and Windows 7 could possibly work depending on what service pack and patches are installed, but definitely no further back than that. So based on that, is it fair to expect or even stipulate 64-bit Windows?

I don't mind "taking the high road" and in fact it's a point of pride. I'm a professional programmer who specializes in multi-threading. I expect complex problems and enjoy the challenge of finding safe solutions. But I don't want to waste time on problems that needn't exist, as I already have a long list of unavoidable problems. But if you say, "Chris, it's important to do this right," I'll tend to agree, in which case I expect we'll use InterlockedExchange64. I'm unsure exactly how that method works in 32-bit--perhaps you can enlighten me?--but it doesn't matter: the point is that it exists, and it will make access to the frame size atomic.

Over to you!

GPT: Here’s my short answer up front: in 2025, it’s completely reasonable to make Whorld 64-bit only, especially if you’re already tied to relatively modern Direct2D features. That alone largely solves the atomicity problem: on 64-bit builds, an aligned 8-byte store/load of your D2D1_SIZE_F is basically atomic. If you want to remain truly “portable” to 32-bit, or simply prefer a bulletproof approach, then a small amount of locking or an Interlocked trick (like InterlockedCompareExchange64) is the robust fix. Read on for the details and trade-offs.


1. Why 64-Bit Exclusivity Is a Viable Choice

  1. Modern Windows is Predominantly 64-Bit

    • Windows 10 and 11 come preinstalled as 64-bit on virtually all new PCs. True 32-bit systems are increasingly rare.
    • If Whorld V2 already sets a baseline of “Windows 8 or 10,” then the large majority of those installs are 64-bit.
  2. Direct2D 1.1 Already Ties You to Newer Systems

    • You’re not aiming for Windows XP or Vista anyway, so you’re already skipping the set of users who might have 32-bit OS on older hardware.
    • The 32-bit holdouts that remain are often extremely resource-limited or specialized. Are they your target market?
  3. Simplifies Code

    • If you only build for x64, then a D2D1_SIZE_F (8 bytes: width float + height float) is typically stored/loaded atomically in a single instruction—no partial writes.
    • You don’t need to fight with InterlockedExchange64 or special packaging to keep it safe on 32-bit.

For these reasons, many modern indie devs just drop 32-bit. That’s not laziness, it’s pragmatic: you focus on typical Windows 10/11 x64 systems and reduce friction.

Caveat: The C++ standard doesn’t guarantee that a plain 8-byte struct copy is atomic, even on x64, and a compiler is free to use multiple instructions. In practice, Visual C++ on x64 typically uses a single 8-byte move for an 8-byte struct, and you can confirm that for a given build by checking the disassembly. Note too that D2D1_SIZE_F is two floats, so its natural alignment is only 4 bytes; a single move is only reliably atomic if the member is 8-byte aligned (for example via alignas(8)).

If you want a full language-level guarantee, you’d still store the data as std::atomic<uint64_t> or similar. But in the Real World™, on x64, an 8-byte aligned store is effectively atomic in MSVC, and devs have relied on that for years.
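
The std::atomic<uint64_t> route mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: SizeF is a hypothetical stand-in for D2D1_SIZE_F so the code builds without the Direct2D headers, and AtomicSize with its Store and Load methods is an invented name, not something in Whorld.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for D2D1_SIZE_F (two 32-bit floats).
struct SizeF { float width; float height; };
static_assert(sizeof(SizeF) == sizeof(std::uint64_t), "SizeF must pack into 64 bits");

// Hypothetical holder: the render thread calls Store, any other thread calls Load.
class AtomicSize {
public:
    void Store(SizeF size) {
        std::uint64_t bits;
        std::memcpy(&bits, &size, sizeof bits);   // copy the bytes; avoids union type punning
        m_bits.store(bits, std::memory_order_release);
    }
    SizeF Load() const {
        const std::uint64_t bits = m_bits.load(std::memory_order_acquire);
        SizeF size;
        std::memcpy(&size, &bits, sizeof size);
        return size;
    }
private:
    std::atomic<std::uint64_t> m_bits{0};   // all-zero bits decode to width = height = 0.0f
};
```

With this shape, OnResize would call Store and the main thread's accessor would call Load. Whether the atomic is lock-free on a given target can be checked with std::atomic<std::uint64_t>::is_lock_free().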


2. If You Must Support 32-Bit

(A) Use a Simple Lock

  • Add a small critical section or mutex to protect reads/writes of m_szTarget.
  • The overhead for a single lock/unlock once in a while is negligible.
  • e.g.:

    // Inside the renderer:
    {
        std::lock_guard<std::mutex> lock(m_sizeMutex);
        m_szTarget = newSize;  // safe
    }
    
    // Inside the main thread accessor:
    D2D1_SIZE_F CRenderEngine::GetTargetSize()
    {
        std::lock_guard<std::mutex> lock(m_sizeMutex);
        return m_szTarget;
    }
    
  • Pros: Extremely straightforward, guaranteed safe on any architecture.

  • Cons: You do have the overhead of a lock, but for an occasional call it’s trivial.

(B) Interlocked Trick with 64-Bit as an Integer

  • Store m_szTarget as a uint64_t or LONG64 behind the scenes. Then interpret it as two floats. Something like:

    union SizeAtomic
    {
        struct { float width, height; };
        LONG64 combined; // or int64_t
    };
    
    SizeAtomic m_atomicSize;
    
  • On the writer side (render thread), you do:

    SizeAtomic oldValue, newValue;
    newValue.width = someWidth;   // build the new value once, outside the loop
    newValue.height = someHeight;
    do {
        // A plain read may tear on 32-bit, but then the compare below fails and we retry.
        oldValue.combined = m_atomicSize.combined;
    } while (InterlockedCompareExchange64(&m_atomicSize.combined,
                                          newValue.combined,
                                          oldValue.combined) != oldValue.combined);
    

    Or, since the render thread is the only writer and the new size doesn’t depend on the old one, you can skip the loop and do a single InterlockedExchange64(&m_atomicSize.combined, newValue.combined).

  • On the reader side (main thread):

    SizeAtomic current;
    current.combined = InterlockedCompareExchange64(&m_atomicSize.combined, 0, 0);
    float w = current.width;
    float h = current.height;
    
  • Pros: No locks, purely atomic ops.
  • Cons: You have to store floats in a union that includes a 64-bit integer representation. You also must ensure the alignment is correct: the 64-bit Interlocked functions require their target to be aligned on an 8-byte boundary, so it’s worth declaring the union with alignas(8) rather than assuming it. And it’s more code complexity than a small lock.
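
For reference, here is a minimal sketch of the single-exchange variant of approach (B), assuming one writer (the render thread) and any number of readers. SizeF and InterlockedSize are hypothetical names, memcpy replaces the union to sidestep type punning, and the #else branch holds stand-ins built on GCC/Clang builtins purely so the sketch also compiles off Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins (assumption) so the sketch also builds off Windows with GCC or Clang.
typedef long long LONG64;
inline LONG64 InterlockedExchange64(volatile LONG64* target, LONG64 value) {
    return __atomic_exchange_n(target, value, __ATOMIC_SEQ_CST);
}
inline LONG64 InterlockedCompareExchange64(volatile LONG64* dest, LONG64 exchange, LONG64 comperand) {
    __atomic_compare_exchange_n(dest, &comperand, exchange, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return comperand;   // the value found in *dest, whether or not the swap happened
}
#endif

// Hypothetical stand-in for D2D1_SIZE_F (two 32-bit floats).
struct SizeF { float width; float height; };
static_assert(sizeof(SizeF) == sizeof(LONG64), "SizeF must pack into 64 bits");

class InterlockedSize {
public:
    InterlockedSize() {
        // The 64-bit Interlocked functions require an 8-byte-aligned target.
        assert(reinterpret_cast<std::uintptr_t>(&m_bits) % 8 == 0);
    }
    // Render thread: one unconditional atomic swap, no retry loop needed.
    void Store(SizeF size) {
        LONG64 bits;
        std::memcpy(&bits, &size, sizeof bits);
        InterlockedExchange64(&m_bits, bits);
    }
    // Main thread: comparing with 0 and exchanging 0 never changes the value,
    // but returns it atomically.
    SizeF Load() {
        const LONG64 bits = InterlockedCompareExchange64(&m_bits, 0, 0);
        SizeF size;
        std::memcpy(&size, &bits, sizeof size);
        return size;
    }
private:
    alignas(8) volatile LONG64 m_bits = 0;
};
```

The design choice here is that the writer never needs the old value, so a plain exchange replaces the compare-exchange loop on the write side.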

(C) “Close Enough” Approach

Sometimes for a mere “window size,” approximate might be fine. If the main thread reads a half-updated float once in a blue moon, it’s not a big deal—worst case, it sees a glitchy size for a single frame. But I gather you prefer correctness. So we’ll skip that idea.


3. Is 32-Bit Still Relevant?

Direct2D 1.1 was introduced on Windows 8 / the Platform Update for Windows 7. Realistically, that user base is mostly on 64-bit. The only scenario you might care about is a user who has Windows 8+ but is stuck on 32-bit for some reason—maybe a super low-end tablet or older netbook. But that’s quite niche, and likely not your target audience for a geometry-based visual app.

In other words: If a user has a recent enough OS to run Direct2D 1.1, the odds that they also have 32-bit Windows are small. So yes, it’s quite defensible to say “Whorld V2 is 64-bit only.” In fact, most bigger commercial apps on Windows that rely on newer GPU features are shipping 64-bit only nowadays (e.g., many creative or music production tools).


4. My Recommendation

  1. Strongly Consider 64-Bit Only

    • You remove a bunch of complexity around atomic updates, and you gain a bigger address space, better performance for math, etc.
    • The real-world user base that insists on 32-bit is tiny, especially for an art/music/visual tool.
  2. If You Insist on 32-Bit

    • Just do the quick fix: store m_szTarget behind a lock or use an Interlocked union approach. A single std::mutex or CRITICAL_SECTION around a read or write is not expensive at all.
  3. Don’t Bother with a Round-Trip Command

    • As you said, it’s overkill to push a command “get size” to the renderer and wait for the reply. That approach is only necessary if you need a truly up-to-date, in-sync value in a real-time sense. But for a simple “Let me see the window size,” it’s not worth the overhead or complexity.

5. About InterlockedExchange64 on 32-Bit

  • On x64, InterlockedExchange64 maps to a single locked exchange instruction, so the 64-bit write can’t be torn.
  • 32-bit x86 has no 64-bit exchange instruction, so there the Windows SDK headers build it from a retry loop around InterlockedCompareExchange64, which in turn maps to the lock cmpxchg8b instruction. No OS lock or kernel call is involved.
  • Either way the operation is atomic, provided the target variable is aligned on a 64-bit boundary, which the documentation requires.
  • So it’s safe to rely on it for your 64-bit struct, and performance is fine for occasional usage.
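
To make the 32-bit behavior less mysterious: a 64-bit atomic exchange can be built from a compare-exchange retry loop, which is the general shape such fallbacks take where no single exchange instruction is available. The sketch below shows that idea with std::atomic rather than the Windows API; the function name is hypothetical and this is an illustration, not the SDK's actual source.

```cpp
#include <atomic>
#include <cstdint>

// Illustration only: a 64-bit atomic exchange built from a compare-exchange
// retry loop. Returns the value that was replaced.
inline std::int64_t ExchangeViaCompareExchange(std::atomic<std::int64_t>& target,
                                               std::int64_t value) {
    // A first guess at the current value; it does not need to be accurate.
    std::int64_t expected = target.load(std::memory_order_relaxed);
    // If target still equals expected, value is swapped in and the loop ends.
    // Otherwise compare_exchange_weak refreshes expected and we try again.
    while (!target.compare_exchange_weak(expected, value)) {
    }
    return expected;
}
```

The loop only repeats when another thread changes the value between the read and the compare-exchange, so with a single occasional writer it almost always completes on the first pass.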

Conclusion

If you don’t have a marketing or distribution reason to keep a 32-bit build, I’d absolutely lean on the 64-bit only route. It spares you the trouble, and modern Windows usage patterns let you get away with it. If you do want that 32-bit build for legacy or wide compatibility, then either:

  1. Use a simple lock (the easiest path: a tiny std::mutex or CRITICAL_SECTION around the read/writes), or
  2. Use InterlockedExchange64 or InterlockedCompareExchange64 with a union to store your floats.

Either is correct. It’s mostly a question of whether you want the overhead of a small lock vs. the overhead/complexity of the atomic union approach. For just a single variable, a lock is straightforward. For “bare-metal concurrency geeks,” the atomic union can be an interesting exercise.

So:

  • Option A: 64-bit only, rely on the near-certain fact that aligned 8-byte writes are atomic in MSVC x64.
  • Option B: remain 32-bit capable and do a quick interlocked or lock-based approach.
  • Don’t do the command queue hack just to read the window size—definitely overkill.

Given your scenario, I’d likely take the simpler path: declare 64-bit as the official release and not lose sleep over missing 32-bit. But if you want the multi-threading robust for 32-bit anyway, do a small lock or the union approach. Either is fine.
