CK: I’m making steady progress on the Whorld feature set. Today I got snapshots working. A “snapshot” is a Whorld-specific binary file format that contains all the necessary information to reconstruct a Whorld frame. It’s essentially Whorld’s equivalent of a vector-based format. It’s incredibly compact: a typical snapshot is between 10K and 50K bytes. The size varies with the number of “rings” in the drawing. Internally, a snapshot consists of a fixed-length portion containing various drawing state data, followed by a VLA (Variable Length Array) having one RING element for each of the drawing’s rings.
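The layout described above (a fixed-length portion followed by one RING element per ring) can be sketched roughly as follows. This is an illustration only: SnapshotHeader, Ring, and every field name are hypothetical placeholders, not Whorld's actual structures, and the real drawing state and RING are certainly richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-length drawing state (placeholder fields, not Whorld's).
struct SnapshotHeader {
    std::uint32_t ringCount;   // number of Ring elements that follow
    float zoom;                // placeholder drawing-state field
    float hue;                 // placeholder drawing-state field
};

// Hypothetical per-ring record (placeholder fields, not Whorld's RING).
struct Ring {
    float radius;
    float rotation;
    std::uint32_t color;
};

// Serialize as: fixed header, then one Ring per ring, back to back.
std::vector<std::uint8_t> packSnapshot(SnapshotHeader hdr, const std::vector<Ring>& rings)
{
    assert(rings.size() <= UINT32_MAX);
    hdr.ringCount = static_cast<std::uint32_t>(rings.size());
    std::vector<std::uint8_t> buf(sizeof(hdr) + rings.size() * sizeof(Ring));
    std::memcpy(buf.data(), &hdr, sizeof(hdr));
    if (!rings.empty())
        std::memcpy(buf.data() + sizeof(hdr), rings.data(), rings.size() * sizeof(Ring));
    return buf;
}

// Inverse of packSnapshot; returns false if the buffer is truncated.
bool unpackSnapshot(const std::vector<std::uint8_t>& buf, SnapshotHeader& hdr, std::vector<Ring>& rings)
{
    if (buf.size() < sizeof(hdr))
        return false;
    std::memcpy(&hdr, buf.data(), sizeof(hdr));
    const std::size_t need = sizeof(hdr) + static_cast<std::size_t>(hdr.ringCount) * sizeof(Ring);
    if (buf.size() < need)
        return false;
    rings.resize(hdr.ringCount);
    if (hdr.ringCount != 0)
        std::memcpy(rings.data(), buf.data() + sizeof(hdr), hdr.ringCount * sizeof(Ring));
    return true;
}
```

The snapshot size grows linearly with the ring count, which is why the file size varies from drawing to drawing.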
The advantage of snapshots—compared to the File/Export command, which exports the drawing as a PNG file—is that snapshots are resolution-independent. So a snapshot can subsequently be reloaded and exported at a much higher resolution than was being used when the snapshot was “taken” (captured).
In the original Whorld, it was also possible to record a “movie” consisting of a series of snapshots, one per displayed frame. The snapshot movie is resolution-independent; no pixels are recorded. The snapshot movie could subsequently be reloaded and exported as video at any desired resolution. The snapshot movie could also be viewed without exporting it, since the snapshots are easily redrawn as frames. And, it was possible to export only a subset (range) of the frames.
I hope to implement the above-described movie recording capability in Whorld V2, but with some important differences, as follows:
- Export image sequence only: V2 will only export a snapshot movie as a numbered image sequence. The primary advantage of an image sequence is that it keeps video compression completely out of Whorld’s code. Even today, Windows video compression is a bloated hellscape of obscurity, and will never be as flexible or satisfactory as ffmpeg. Not to defend Microsoft, but this is partly due to IP issues that ffmpeg’s developers sidestep.
- Reconsider snapshot storage: I would like to revisit how exactly the app stores the snapshots during recording. The render thread is responsible for recording the movie, so storing each frame needs to be very fast, otherwise we’ll fall behind the display frequency and rendering will be disrupted.
In my view, the most fundamental decision is whether to store the snapshots in a file, or in memory. To decide which option makes the most sense, we need some capacity analysis. The original Whorld app recorded the snapshot movie to a file, but it also was running at a slower frame rate, typically 25 Hz.
A typical patch with about 200 rings requires around 40K bytes per frame. At 60 Hz, that’s about 2.4 MB per second (call it 2.5 MB/s to be safe), 144 MB per minute, or 8.64 GB per hour. Agreed?
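To double-check that arithmetic, here is a small sketch. It assumes "40K" means 40,000 bytes and uses decimal megabytes and gigabytes; the names RecordingRate and recordingRate are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Data rates for a recording, in bytes (decimal units assumed).
struct RecordingRate {
    std::uint64_t bytesPerSecond;
    std::uint64_t bytesPerMinute;
    std::uint64_t bytesPerHour;
};

// One snapshot of bytesPerFrame is stored for every displayed frame.
constexpr RecordingRate recordingRate(std::uint64_t bytesPerFrame, std::uint64_t framesPerSecond)
{
    return { bytesPerFrame * framesPerSecond,
             bytesPerFrame * framesPerSecond * 60,
             bytesPerFrame * framesPerSecond * 3600 };
}
```

With 40,000 bytes per frame at 60 Hz this gives 2,400,000 bytes per second, 144,000,000 per minute, and 8,640,000,000 per hour, so the per-minute and per-hour figures correspond to 2.4 MB/s, and 2.5 MB/s is a slightly conservative rounding.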
Ideally the user should be able to record an entire VJ performance, which could potentially last for hours. That goal seems incompatible with recording to memory. A typical target system might have only 16 GB of memory, so we’d be out of memory in less than two hours of recording.
Let’s examine the file option, which will be much less constrained in terms of maximum recording size. According to Google, a typical hard disk write speed is 30-150 MB/s (in 2024). We need 2.5 MB/s, so we’re well within capacity there.
So based on that simple analysis, I’m leaning towards writing the frames to a file, just as Whorld V1 did. The question then is: what type of file access will perform best? There are many options in Windows, but based on my experience with another app of mine (WaveShop, which recorded audio), I think we’ll get the best performance using CreateFile. Do you concur?
CreateFile gives us some powerful and flexible options that other file types don’t. In particular, I think we should consider this sentence from MSDN: “When FILE_FLAG_NO_BUFFERING is combined with FILE_FLAG_OVERLAPPED, the flags give maximum asynchronous performance, because the I/O does not rely on the synchronous operations of the memory manager.”
Our top priority is that the write operation should be as fast as possible. Is it reasonable in your view to use asynchronous (overlapped) writes for this application? On the surface, it seems ideal, but I know from experience that it brings many complications. In particular, it’s necessary to keep the data buffer and the OVERLAPPED structure allocated and unmodified until the write completes. So in practice I suspect we wouldn’t be able to do better than overlapping by one frame. But this could still be a big win. Let’s compare the cases:
- Synchronous write: The render thread finishes rendering the frame. Before presenting the frame to the swap chain, it stores the snapshot data to a buffer, and then writes that buffer to the snapshot movie file. If the file write takes too long, the render thread calls Present too late and misses the vertical retrace, causing a glitch in the displayed output. Disaster!
- Asynchronous write: As before, the render thread finishes rendering the frame, stores the snapshot data to a buffer, and then writes it to the file. But in this case, the write function returns almost instantly. Now the render thread calls Present, and while it’s blocked in Present waiting for vertical retrace, the OS writes to the file asynchronously. When Present unblocks, the render thread begins drawing the next frame, and again, the write continues asynchronously. Only when the next frame is completely drawn do we need to wait for the previous frame’s write to complete.
This sure sounds like a significant optimization. But, overlapped I/O incurs OS overhead. The question we must answer: is the increased overhead of overlapped I/O justified by the potential performance gain from asynchronous writing? In order to determine that, we need a basis for comparison. So it seems to me that the first step is to set up CreateFile for regular synchronous I/O, and then have Whorld record a snapshot movie, and while it does so, accumulate statistics—minimum, maximum, average, and standard deviation—on how long the file writes are taking. If it turns out that writing 40K is consistently too fast to be worth optimizing, we can just go with that, and avoid all the additional development time and complexity risk of using overlapped I/O.
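The statistics could be accumulated without storing every sample. A minimal sketch, assuming durations are fed in as milliseconds and using Welford's running-variance method (my choice for illustration, not something taken from Whorld); the class name WriteStats is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Running min / max / mean / standard deviation of write durations.
class WriteStats {
public:
    // Add one measured write duration, in milliseconds.
    void add(double ms)
    {
        ++m_count;
        m_min = std::min(m_min, ms);
        m_max = std::max(m_max, ms);
        const double delta = ms - m_mean;
        m_mean += delta / static_cast<double>(m_count);
        m_m2 += delta * (ms - m_mean);   // Welford update of squared deviations
    }
    std::uint64_t count() const { return m_count; }
    double min() const { return m_count ? m_min : 0.0; }
    double max() const { return m_count ? m_max : 0.0; }
    double mean() const { return m_mean; }
    // Sample standard deviation; zero until there are at least two samples.
    double stddev() const
    {
        return m_count > 1 ? std::sqrt(m_m2 / static_cast<double>(m_count - 1)) : 0.0;
    }
private:
    std::uint64_t m_count = 0;
    double m_min = std::numeric_limits<double>::infinity();
    double m_max = -std::numeric_limits<double>::infinity();
    double m_mean = 0.0;
    double m_m2 = 0.0;
};
```

Each add() is a handful of arithmetic operations, so calling it once per frame on the render thread should not noticeably disturb the measurement itself.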
Your thoughts?
ChatGPT: Weighing write strategies
I’m thinking through comparing synchronous and asynchronous snapshot recording methods. Synchronous write to a file seems most viable, using 2.5 MB/s.
ChatGPT: Short Answer: It’s perfectly reasonable to start with a regular synchronous file write (CreateFile in normal mode) and measure how long it takes to write ~40 KB per frame. If it’s consistently small (well under the ~16 ms budget at 60 Hz), you can skip the complexity of overlapped I/O. If it’s borderline, then you can explore overlapped (asynchronous) writes with a small ring buffer of snapshots. Yes, CreateFile is the standard approach for best performance/flexibility (especially if you consider flags like FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED).
Below are the detailed considerations:
1) Writing to Disk vs Memory: The 2.5 MB/s Rate
A) Disk Should Easily Handle ~2.5 MB/s
- Modern HDDs do 30–150 MB/s; SSDs are far faster.
- 2.5 MB/s is typically trivial. Even random writes at 40 KB chunks might be okay, but the best-case scenario is a decently contiguous file on disk.
B) Why Not Store in Memory?
- As you said, 8.64 GB/hour is feasible on disk, but not always feasible in memory if the user records for hours. You’d risk using all your 16 GB of system RAM. So disk is the correct solution.
2) Synchronous Write vs Overlapped I/O
A) Synchronous Write Approach
- Render thread finishes frame, obtains the ~40 KB snapshot data, calls WriteFile(...) (synchronous).
- If the OS can write it quickly, your thread returns in time for the next Present call.
- If the disk is momentarily slow, the write call blocks, the render thread might miss its vertical retrace scheduling → stutter in visuals.
Pros:
- Very simple code.
- No concurrency overhead, no buffer management.
Cons:
- Any disk stall >16 ms can cause a missed frame.
B) Overlapped (Asynchronous) Approach
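- Render thread obtains the 40 KB snapshot, calls WriteFile with an OVERLAPPED structure. If the write is queued successfully, the call returns immediately: WriteFile returns FALSE and GetLastError reports ERROR_IO_PENDING, which in this case means "in progress", not failure.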
- Meanwhile, the OS writes data to disk in parallel while your thread continues.
- You must keep that 40 KB buffer + OVERLAPPED struct valid until the write completes. Possibly use a ring buffer with a few entries so you can keep writing new frames without stalling.
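- Before reusing that same buffer for the next snapshot, you wait on the previous write’s completion event (or call GetOverlappedResult with bWait=TRUE, which blocks until the write finishes). Alternatively, poll GetOverlappedResult with bWait=FALSE; while the write is still pending it fails with ERROR_IO_INCOMPLETE.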
Pros:
- If the disk is normally fast enough, the thread never or rarely stalls.
- Rare slow writes might get hidden behind the next frame’s rendering/present.
Cons:
- More complex code: ring buffer of snapshot memory, overlapped data structures, etc.
- If the disk can’t keep up for many consecutive frames, eventually your ring buffer fills and you stall anyway.
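The ring-buffer bookkeeping can be sketched separately from the I/O itself. The following is a portable illustration of the slot logic only; it deliberately contains no Win32 calls, and SlotRing with its member names is hypothetical. In a real implementation each slot would also own a snapshot buffer, an OVERLAPPED structure, and a completion event.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Bookkeeping only: tracks which ring slots have a write in flight.
template <std::size_t N>
class SlotRing {
public:
    // Index of the slot the next frame would use.
    std::size_t next() const { return m_next; }

    // True if the next slot's previous write has not completed yet. The
    // caller must then wait for that write before reusing the slot's buffer.
    bool mustWait() const { return m_pending[m_next]; }

    // Mark the next slot as in flight and advance; returns the slot used.
    std::size_t submit()
    {
        assert(!m_pending[m_next]);      // caller must have waited first
        const std::size_t slot = m_next;
        m_pending[slot] = true;
        m_next = (m_next + 1) % N;
        return slot;
    }

    // Called once the write issued for this slot has completed.
    void complete(std::size_t slot) { m_pending[slot] = false; }

private:
    std::array<bool, N> m_pending{};     // all slots start idle
    std::size_t m_next = 0;
};
```

With N slots the render thread can run up to N frames ahead of the disk. Once every slot is pending, mustWait() reports true and the thread stalls, which is the "ring buffer fills" case noted above.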
3) Do a Measurement First
Your plan to “do synchronous I/O and gather min/avg/max time” is excellent:
- Set up a test scenario: 60 Hz rendering, 40 KB per frame => ~2.5 MB/s.
- Log each WriteFile call’s duration (endTick - startTick).
- If you see typical times well under 1 ms, you’re safe with synchronous.
- If you see occasional spikes of 50 ms or 200 ms, that might cause dropped frames. Then overlapped is more appealing.
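As a rough sketch of that measurement, the helpers below time a single call and count how many logged durations exceed a threshold such as the frame budget. This uses portable std::chrono rather than a Windows timer, the callable is a stand-in for the per-frame WriteFile call, and all three function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Frame period in milliseconds for a refresh rate (about 16.67 ms at 60 Hz).
inline double frameBudgetMs(double refreshHz)
{
    return 1000.0 / refreshHz;
}

// Time one call of writeFn (stand-in for the per-frame write), in milliseconds.
template <typename Fn>
double timeWriteMs(Fn&& writeFn)
{
    const auto start = std::chrono::steady_clock::now();
    writeFn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Count logged durations that exceed a threshold, such as the frame budget.
inline std::size_t countOver(const std::vector<double>& durationsMs, double thresholdMs)
{
    std::size_t n = 0;
    for (double d : durationsMs)
        if (d > thresholdMs)
            ++n;
    return n;
}
```

A nonzero count against the frame budget is the signal that synchronous writes are putting frames at risk. Note that the true budget is smaller than the full frame period, since rendering itself uses part of it.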
Typical Realities
- On an SSD or a lightly loaded system, 40 KB writes are often <1 ms, especially if the OS does caching.
- On an HDD with other processes or random seeks, you might see occasional large stalls.
4) FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED
A) FILE_FLAG_NO_BUFFERING
- Bypasses the OS file cache, reading/writing directly to/from disk. That requires sector-aligned buffer addresses, file offsets, and transfer sizes. This can improve or degrade performance, depending on the disk usage.
- Usually used in high-throughput servers or certain real-time apps. For your case, you might get better performance from letting the OS cache the writes in memory (the default), especially for many small ~40 KB writes.
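To make the alignment constraint concrete, here is a small sketch of the rounding involved. The sector sizes used below are just common values for illustration; real code would query the volume's sector size (for example via the bytes-per-sector value reported by GetDiskFreeSpace). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round size up to the next multiple of sectorSize (sectorSize must be nonzero).
constexpr std::size_t roundUpToSector(std::size_t size, std::size_t sectorSize)
{
    return (size + sectorSize - 1) / sectorSize * sectorSize;
}

// True if a buffer address is a multiple of sectorSize, as unbuffered I/O requires.
inline bool isSectorAligned(const void* p, std::size_t sectorSize)
{
    return reinterpret_cast<std::uintptr_t>(p) % sectorSize == 0;
}
```

A 40,000-byte snapshot would have to be written as 40,448 bytes with 512-byte sectors, or 40,960 bytes with 4,096-byte sectors. Because snapshots vary in length, the file would then contain padding that the reader has to skip, which is part of the extra complexity.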
B) Overlapped I/O
- If you do overlapped, you can combine it with or without NO_BUFFERING.
- If you do NO_BUFFERING, each write must be multiple-of-sector-size and aligned, which is more complexity.
- If you do overlapped with default buffering, you can just do 40 KB, the OS caches it, returns quickly, then flushes to disk asynchronously. That might be enough to avoid stutters.
Conclusion: Overlapped + default buffering is typically simpler. NO_BUFFERING might not help for small writes unless you’re extremely performance-critical and prepared to handle alignment constraints.
5) Implementation Approach
- Try Synchronous:
  - CreateFile with normal flags, do WriteFile each frame.
  - Measure durations. If it’s always <2 or 3 ms, likely no problem at 60 Hz.
  - If you see random 100 ms stalls, you either accept the occasional stutter or move to asynchronous.
- If Stalls:
  - Then do overlapped. Maintain a ring of, say, 3–5 snapshot buffers. Each time you finish a frame, pick the next buffer, do WriteFile overlapped, queue the completion. If you loop back around to a buffer that’s still in progress, you stall until that I/O completes.
  - Typically, you’ll never stall unless the disk truly can’t keep up with average throughput.
- (Optional) More advanced:
  - If your throughput is borderline, consider chunking multiple frames into one larger write or compressing the snapshots in memory. But 2.5 MB/s is so small that it’s likely your disk can handle it with minimal overhead.
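The chunking idea could look something like the sketch below, which collects several frames and hands them to a sink as one larger write. This is an illustration under assumptions: FrameBatcher is a hypothetical name, and the sink callback stands in for whatever actually performs the file write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Collects frames and emits them as one larger write per batch.
class FrameBatcher {
public:
    using Sink = std::function<void(const std::uint8_t*, std::size_t)>;

    FrameBatcher(std::size_t framesPerBatch, Sink sink)
        : m_framesPerBatch(framesPerBatch), m_sink(std::move(sink)) {}

    // Append one frame; emits a single write once the batch is full.
    void addFrame(const std::uint8_t* data, std::size_t size)
    {
        m_buf.insert(m_buf.end(), data, data + size);
        if (++m_frames >= m_framesPerBatch)
            flush();
    }

    // Emit whatever is buffered (call this when recording stops).
    void flush()
    {
        if (!m_buf.empty())
            m_sink(m_buf.data(), m_buf.size());
        m_buf.clear();
        m_frames = 0;
    }

private:
    std::size_t m_framesPerBatch;
    Sink m_sink;
    std::vector<std::uint8_t> m_buf;
    std::size_t m_frames = 0;
};
```

The trade-off is that fewer, larger writes reduce per-call overhead, but the frame that triggers the flush pays for the whole batch, so the worst-case write time on the render thread may get longer rather than shorter.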
6) Memory Mapped Files?
You could also consider a memory-mapped file approach for writing. But for streaming large amounts of data, typical direct WriteFile calls are simpler and more predictable. The memory map might do some lazy writes or cause page faults. So not necessarily better for real-time streaming.
Conclusion
Your plan is exactly right:
- Start with synchronous WriteFile in normal buffered mode.
- Log your write durations.
- If they’re consistently short (likely on modern hardware), done—no complexity needed.
- If you see stutters or big outliers, consider overlapped writes with a ring buffer.
- CreateFile is indeed the standard method. FILE_FLAG_NO_BUFFERING can be beneficial in special contexts, but for a small 40 KB repeated write at 2.5 MB/s, the OS caching usually helps.
Hence you can proceed with confidence, measure, and only adopt asynchronous I/O if the real data demands it.