Monday, April 24, 2006

more optimizations

In Draw:
replaced CRect::OffsetRect in Bounds calc with inline
replaced CRect::PtInRect in inner loop with inline (BIG difference)
only calculate iorg if making Curves
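
A minimal sketch of what the inline replacements might look like. Rect, Point, PtInRectInline and OffsetRectInline are hypothetical stand-ins, not the actual Whorld code. The containment rule matches Win32 PtInRect, where the left and top edges count as inside and the right and bottom edges count as outside.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Win32 RECT and POINT structures.
struct Rect { long left, top, right, bottom; };
struct Point { long x, y; };

// Same containment rule as Win32 PtInRect: left and top edges are inside,
// right and bottom edges are outside.
inline bool PtInRectInline(const Rect& r, const Point& p)
{
    // Debug-only sanity check; compiles away when NDEBUG is defined.
    assert(r.left <= r.right && r.top <= r.bottom);
    return p.x >= r.left && p.x < r.right
        && p.y >= r.top && p.y < r.bottom;
}

// Same effect as CRect::OffsetRect: shift all four edges by (dx, dy).
inline void OffsetRectInline(Rect& r, long dx, long dy)
{
    r.left += dx;
    r.right += dx;
    r.top += dy;
    r.bottom += dy;
}
```

Both functions are small enough that a compiler can expand them in place, so the inner loop needs no function call per point.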

benchmarks:
hot rod, 1024 x 768, maximized (but not full screen)
default patch, fill & outline, speed = 20 "curve fill bench.whl"
total Draw time for 1000 frames at 25 FPS, in seconds
average ring count = 193

v1.1.01   v1.4.04
-----------------
12.643    12.831
12.630    12.641
12.629    12.639
12.629    12.830
12.638    12.651
12.633    12.638
12.637    12.833
12.636    12.645
12.630    12.840
          12.643
          12.633
          12.671
          12.644
          12.639

AVG       AVG
12.634    12.698

1.4.04 has a noticeably larger deviation: most of its samples cluster around 12.64, but roughly every fourth sample lands near 12.83. No idea why! Overall, though, the results are encouraging. Going by the averages, 1.4.04 is 64 microseconds slower per frame, which translates to an extra 1.6 milliseconds per second at 25 FPS. Going by the worst case, 1.4.04 is 210 microseconds slower per frame, i.e. an extra 5.25 milliseconds per second (about half a percent). Neither difference is likely to be significant.
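
The arithmetic behind those figures, as a small sketch. The function names are made up for illustration; the inputs are the average and worst-case totals from the table.

```cpp
#include <cassert>

// Extra time per frame in microseconds, given two totals in seconds
// measured over the same number of frames.
inline double ExtraMicrosPerFrame(double oldTotalSec, double newTotalSec, int frames)
{
    assert(frames > 0);
    return (newTotalSec - oldTotalSec) / frames * 1e6;
}

// Extra milliseconds spent per second of running time at a given frame rate.
inline double ExtraMillisPerSecond(double extraMicrosPerFrame, double fps)
{
    return extraMicrosPerFrame * fps / 1000.0;
}
```

With the averages, ExtraMicrosPerFrame(12.634, 12.698, 1000) comes to about 64, and ExtraMillisPerSecond(64, 25) to about 1.6. With the worst case (12.63 vs. 12.84) the same calls give about 210 and 5.25.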

Same exact test but without fill (still on hot rod):
v1.1.01 v1.4.04
---------------
0.428 0.413
0.428 0.413

For the non-fill case, 1.4.04 is FASTER! Awesome. Presumably inlining PtInRect in the innermost loop made the big difference. Let's see.

1.4.04 no fill, using CRect::PtInRect:
0.447
0.448

Yup, CRect::PtInRect was bad stuff. Not sure whether it was the function call or inefficiency within PtInRect itself, or maybe both.

An interesting question: why is the difference worse with fill/outline? Almost all of the added code gets executed regardless of draw mode. The only exceptions are the Convex test to decide rp.Color vs. PrevColor, and the (Curve || PrevCurve) test. Surely these can't account for 64..210 microseconds per frame?

per-ring curve decision benchmarks

Bottom line: The per-ring version (1.4.03) is very slightly faster than 1.4.02. The data shows a consistent improvement of between 5 and 10 microseconds per frame. This is the opposite of the expected result. Perhaps moving the curvature test into the main loop allowed the compiler to better optimize the initial pass (for trail)? It could also be a change in cache behavior.

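#include "benchmark.h"
float sum;  // total Draw time so far, in seconds
int cnt;    // number of frames timed so far
void CWhorldView::Draw(HDC dc)
{
    CBenchmark b;   // timer for this call
    .
    .
    .
    sum += b.Elapsed();
    cnt++;
    if (cnt == 1000) {  // report once, after 1000 frames
        CString s;
        s.Format("%d %f %f\n", cnt, sum, sum / cnt);
        AfxMessageBox(s);
    }
}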
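
CBenchmark comes from benchmark.h, which isn't shown here. For repeating the measurement without it, a portable stand-in could look roughly like this. StopWatch and FrameStats are hypothetical names, and std::chrono::steady_clock is an assumption, not necessarily what CBenchmark uses internally.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for CBenchmark: timing starts at construction.
class StopWatch {
public:
    StopWatch() : m_Start(std::chrono::steady_clock::now()) {}

    // Seconds elapsed since construction.
    double Elapsed() const
    {
        std::chrono::duration<double> d =
            std::chrono::steady_clock::now() - m_Start;
        return d.count();
    }

private:
    std::chrono::steady_clock::time_point m_Start;
};

// Accumulates per-frame timings, like the sum and cnt globals.
struct FrameStats {
    double sum = 0;  // total time in seconds
    int cnt = 0;     // number of samples

    void Add(double seconds)
    {
        sum += seconds;
        cnt++;
    }

    // Average time per frame in seconds.
    double Mean() const
    {
        assert(cnt > 0);
        return sum / cnt;
    }
};
```

Usage would be to construct a StopWatch at the top of the function being measured and pass its Elapsed() value to FrameStats::Add at the bottom.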

total time (sum) in seconds for 1000 frames

default patch
1.4.02 1.4.03
--------------
1.448 1.436
1.448 1.436
1.442 1.435
1.442 1.439
1.446 1.440

default patch, speed and canvas scale at max
1.4.02 1.4.03
--------------
5.400 5.390
5.396 5.390

help changes for 1.4

add Odd Curve and Even Curve to parameters (done)
move Canvas Scale and Hue Loop Length from Options/General to Master (done)
change ReadFromPatch to Patch Mode and expand as needed (done)
update keyboard accelerators (done)

Sunday, April 16, 2006

using multimedia timer instead of windows timer

We can't use a multimedia timer all the time (as was suggested on the MFC forum), because it significantly increases CPU usage, e.g. 75% vs. 33%. Windows task-switching overhead is the most likely culprit, especially since using a custom timer thread instead of a multimedia timer produces identical behavior.

We could use a multimedia timer only during non-client modal states, but there's still a visible glitch, due to the phase difference between the windows timer and the multimedia timer. The difference varies from 0 to 1 timer periods, and I can't see any obvious way to avoid it. It's better than doing nothing, but it may have other side effects, so the SendMessage technique from the April 11 entry may still be the best shot.
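
To make the phase difference concrete, here is a rough sketch. It assumes both timers run at the same period and that their start times are known, which is a simplification; PhaseOffset is a hypothetical helper, not part of the app.

```cpp
#include <cassert>
#include <cmath>

// Delay in seconds from a tick of timer A to the next tick of timer B,
// assuming both timers tick at the same period. Nominally in [0, period).
inline double PhaseOffset(double startA, double startB, double period)
{
    assert(period > 0);
    double phase = std::fmod(startB - startA, period);
    if (phase < 0)
        phase += period;  // fmod keeps the sign of its first argument
    return phase;
}
```

At 25 FPS the period is 0.040 seconds, so a multimedia timer started 10 ms after a windows timer tick gives PhaseOffset(0.0, 0.010, 0.040), roughly 0.010. Since the start time isn't controlled, the offset can land anywhere in the period, which is the 0 to 1 period glitch.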

Friday, April 14, 2006

corrupt Mirror, Origin, Drawmode in patches

The first cases appeared on 10/08/2005. Many of the "frosty" patches had it, and the corrupt data propagated from them to other patches via hybridization. All patches were fixed today. I can't replicate the behavior, nor can I find any obvious cause in the current code.

Tuesday, April 11, 2006

closing aux frame displays file save dialog

In CMainFrame::DetachView, must remove aux view from document.

GetDoc()->RemoveView(m_AuxView); // remove aux view from our document

prevent non-client clicks from pausing app

The following works, provided the "Show window contents while dragging" system property is unchecked. The only side effects are a) left-clicking on the menu bar moves the cursor to the center of the menu bar (strange, but not really a problem), and b) close happens on button down instead of button up.

void CPersistDlg::OnNcLButtonDown(UINT nHitTest, CPoint point)
{
    switch (nHitTest) {
    case HTCLOSE:   // close on button down
        SendMessage(WM_SYSCOMMAND, SC_CLOSE, 0);
        break;
    case HTCAPTION: // start a move via the system command
        SendMessage(WM_SYSCOMMAND, SC_MOVE, 0);
        break;
    default:
        CDialog::OnNcLButtonDown(nHitTest, point);
        break;
    }
}

void CPersistDlg::OnNcRButtonDown(UINT nHitTest, CPoint point)
{
    switch (nHitTest) {
    case HTCAPTION:
    case HTSYSMENU:
        // wParam is the window handle, lParam packs the screen coordinates
        SendMessage(WM_CONTEXTMENU, (WPARAM)m_hWnd, MAKELPARAM(point.x, point.y));
        break;
    default:
        CDialog::OnNcRButtonDown(nHitTest, point);
        break;
    }
}

Wednesday, April 05, 2006

benchmarks for drawing AVI frames


test video: "Movie_051112214725 comp.avi"
Release mode
Master Rings = 0
Window size: 632 x 459
# samples: 1000 (40 seconds)
averages (times in seconds)

Dell / Windows 2000
---------------------
using BitBlt, SRCCOPY
total 0.032182 <- i.e. avg duration of DrawAviFrame
get frame 0.013667 (42.5%)
create bitmap 0.009647 (30.0%)
blit 0.008469 (26.3%)
misc 0.000400 (1.2%)

using StretchBlt, SRCCOPY
total 0.047449 <- exceeds timer period! 47% worse than BitBlt
get frame 0.013653 (28.8%)
create bitmap 0.009758 (20.6%)
blit 0.023656 (49.9%)
misc 0.000383 (0.8%)

using StretchBlt, SRCINVERT
total 0.059013 <- 148% of timer period! 24% worse than SRCCOPY
get frame 0.013646 (23.1%)
create bitmap 0.010457 (17.7%)
blit 0.034508 (58.5%)
misc 0.000402 (0.7%)

Hotrod / XP
---------------------
using BitBlt, SRCCOPY
total 0.004478 <- i.e. avg duration of DrawAviFrame
get frame 0.002263 (50.5%)
create bitmap 0.001238 (27.6%)
blit 0.000881 (19.7%)
misc 0.000096 (2.1%)

using StretchBlt, SRCCOPY
total 0.006912 <- 54% worse than BitBlt
get frame 0.002257 (32.7%)
create bitmap 0.001240 (17.9%)
blit 0.003314 (48.0%)
misc 0.000100 (1.4%)

using StretchBlt, SRCINVERT
total 0.008200 <- 20% of timer period, 19% worse than SRCCOPY
get frame 0.002267 (27.6%)
create bitmap 0.001239 (15.1%)
blit 0.004592 (56.0%)
misc 0.000102 (1.2%)

Conclusion:
Hotrod is 7.2 times faster, but StretchBlt mode is bad stuff.
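
The comparisons in the tables can be re-derived with a few helpers. This is only a sketch with hypothetical function names. The 25 FPS rate (40 ms timer period) is inferred from 1000 samples taking 40 seconds.

```cpp
#include <cassert>

// How much slower the candidate is than the baseline, in percent.
inline double PercentWorse(double baselineSec, double candidateSec)
{
    assert(baselineSec > 0);
    return (candidateSec / baselineSec - 1.0) * 100.0;
}

// Frame duration as a percentage of the timer period at the given rate.
inline double PercentOfPeriod(double durationSec, double fps)
{
    return durationSec * fps * 100.0;
}

// How many times faster the fast machine is than the slow one.
inline double SpeedRatio(double slowSec, double fastSec)
{
    assert(fastSec > 0);
    return slowSec / fastSec;
}
```

For example, PercentWorse(0.032182, 0.047449) is a bit over 47, PercentOfPeriod(0.059013, 25) is just under 148, and SpeedRatio(0.032182, 0.004478) is about 7.2.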

#include "benchmark.h"
double blitsum;     // time spent in the blit
double createsum;   // time spent creating the bitmap
double getfrmsum;   // time spent getting the frame
double totsum;      // total time in DrawAviFrame
int samps;          // number of frames sampled
...
totsum += b.Elapsed();
samps++;
if (samps == 1000) {
    CString s;
    // convert the sums to per-frame averages
    double total = totsum / samps;
    double getfrm = getfrmsum / samps;
    double create = createsum / samps;
    double blit = blitsum / samps;
    // whatever isn't accounted for by the three timed phases
    double misc = total - (getfrm + create + blit);
    s.Format("total\t%f\ngetfrm\t%f (%.1f%%)\ncreate\t%f (%.1f%%)\nblit\t%f (%.1f%%)\nmisc\t%f (%.1f%%)",
        total,
        getfrm, getfrm / total * 100,
        create, create / total * 100,
        blit, blit / total * 100,
        misc, misc / total * 100);
    AfxMessageBox(s);
}