FFFF323-src

Category: OpenGL
Development tool: Visual C++
File size: 124KB
Downloads: 1
Upload date: 2008-10-29 22:22:52
Uploader: lidawei
Description: A GPU programming example for Visual C++; very helpful for learning GPU programming.

File list:
FFFF323-src (0, 2006-03-18)
FFFF323-src\calcPixelRow.s (3283, 2006-02-27)
FFFF323-src\calcPixelRowR8000.s (5289, 2006-02-26)
FFFF323-src\extensions.cpp (14475, 2006-03-18)
FFFF323-src\extensions.h (7211, 2006-03-02)
FFFF323-src\FFFF.xcode (0, 2006-03-18)
FFFF323-src\FFFF.xcodeproj (0, 2006-03-18)
FFFF323-src\FFFF.xcodeproj\DDT.mode1 (38982, 2006-03-09)
FFFF323-src\FFFF.xcodeproj\DDT.pbxuser (17948, 2006-03-09)
FFFF323-src\FFFF.xcodeproj\project.pbxproj (19224, 2006-03-09)
FFFF323-src\FFFF.xcodeproj\skortze.mode1 (38559, 2005-08-02)
FFFF323-src\FFFF.xcodeproj\skortze.pbxuser (7109, 2005-08-02)
FFFF323-src\FFFF.xcodeproj\stevenkortze.mode1 (38543, 2005-08-03)
FFFF323-src\FFFF.xcodeproj\stevenkortze.pbxuser (5807, 2005-08-03)
FFFF323-src\FFFF.xcode\DDT.mode1 (36349, 2004-12-22)
FFFF323-src\FFFF.xcode\DDT.pbxuser (26726, 2004-12-22)
FFFF323-src\FFFF.xcode\project.pbxproj (13226, 2004-12-22)
FFFF323-src\FFFF3.cpp (71990, 2006-03-09)
FFFF323-src\FFFF3.sln (869, 2006-02-27)
FFFF323-src\FFFF3.vcproj (6919, 2006-02-28)
FFFF323-src\FragmentProgram.cpp (251, 2003-02-14)
FFFF323-src\FragmentProgram.h (333, 2003-02-14)
FFFF323-src\FragmentProgramARB10.cpp (11332, 2006-03-01)
FFFF323-src\FragmentProgramARB10.h (984, 2006-02-17)
FFFF323-src\glati.h (72967, 2004-12-22)
FFFF323-src\glext.h (153806, 2005-07-31)
FFFF323-src\GPUProgram.cpp (203, 2003-02-15)
FFFF323-src\GPUProgram.h (1357, 2006-03-01)
FFFF323-src\Makefile.irix (1701, 2006-03-01)
FFFF323-src\Makefile.linux (1688, 2006-03-02)
FFFF323-src\PixelBuffer.cpp (2522, 2006-02-17)
FFFF323-src\PixelBuffer.h (875, 2004-12-16)
FFFF323-src\RowData.h (216, 2002-04-05)
FFFF323-src\RowInfo.h (158, 2002-09-28)
FFFF323-src\VertexProgram.cpp (241, 2004-12-15)
FFFF323-src\VertexProgram.h (325, 2003-02-14)
FFFF323-src\VertexProgramATI.cpp (8964, 2006-03-02)
FFFF323-src\VertexProgramATI.h (762, 2005-07-31)
FFFF323-src\VertexProgramNV.cpp (4116, 2006-03-02)
FFFF323-src\VertexProgramNV.h (868, 2003-02-15)
... ...

=========================================================
FFFF - Fast Floating Fractal Fun v3.2.3
Author: Daniele Paccaloni (daniele.paccaloni@dylogic.com)
Project page: http://www.sourceforge.net/projects/ffff
=========================================================

FFFF is the fastest mandelbrot generator. It uses brute force: every pixel you see has been calculated.
FFFF is GNU GPL OpenSource.

HISTORY
-------

v3.2.3
- SGI IRIX support. Requires IRIX v6.5 with one or more MIPS4 ***bit CPUs. Be sure to install glut to run/compile (get it from the SGI freeware CD or web site).
- There are 2 new code paths for MIPS4 processors:
  1) Code path for generic R1x000 CPU (single pixel).
  2) Code path for R8000 dual FPUs (dual pixel), also used by default on R1x000.
- To compile on IRIX you need MIPSpro, the glut library, and gmake (use "gmake CFG=release"). If you want to test with gcc, be my guest. Note that you need IRIX v6.5 because FFFF wants UNIX pthreads.
- SSE code has been optimized by Peter Kankowski (MOV-ADD method). The SSE2 code was also optimized to mirror Peter's SSE version. The new code is faster on Intel (on AMD there seems to be no visible improvement).
- Fragment Program is now supported on OS X, thanks to Steven Kortze.
- Linux support, just because you really want it. Thanks to Richard Rauch for the port hints. The port runs fine on Fedora Core 2; sorry, I have no time to test on the other zillions of Linux versions. GPU is only supported as Fragment Programs (no Vertex Programs yet). The gcc FPU asm has been disabled because it does not work as expected. You want it ? You fix it.
- Mac x86 machines should now be supported, thanks to Steven Kortze. The gcc FPU asm has been disabled because it does not work as expected.
- Mac code now compiles as a universal binary (PowerPC and x86).
- Fixed detection of the number of available CPUs.
- Fixed incompatibility with 3DNow! (cmove is not supported on older CPUs), thanks to falco[SCT].
- Fixed OpenGL pixel centering problem with some boards (thanks to Richard Rauch).
- Fixed x86 FPU routine which left the FP stack trashed on return and then propagated denormals (so slow on Intel).
- Fixed OpenGL arrays ext detection bug which caused them to be disabled with OpenGL 2.0 drivers.
- Projects for different platforms reorganized and ported to the latest IDEs.

v3.2.2
- At last, Apple Macintosh OS X support.
- There are 2 new code paths:
  1) Code path for PowerPC FPU.
  2) Code path for PowerPC AltiVec.
- Please note that I could not get the OpenGL extensions working on the Mac. This means that the Mac version does not yet support GPU acceleration. Any help on this issue is greatly appreciated.
- Scrambled the code in such a way that you were wrong when you were saying that the previous version was a mess. At last we have total chaos in the source code too !
- Any suggestion is welcome since I had never touched a Mac before coding this version ;)

v3.2.1
- Optimized Fragment Program on ATI R3xx processors (thanks to Benjamin Lipchak of ATI). Now we reach 1 GigaIters/sec on a plain Radeon 9700 (40% speed increase). Also, the new scheduler can fit up to 30 iterations per pixel on an R300 !
  *NOTE* If you have problems with the new code, please update your ATI board drivers.
- Added a zoom-reset key (press 'r'). Many users complained of having to close and reload the program (slow HD ? :).

v3.2.0
- Added GPU OpenGL ARB Fragment Program calculation (couldn't yet get less than 3 instructions per iter on the ATI R300) [DP].
- Added a decent mouse center zoom feature [DP].
- Benchmarking fixes [DP].
- Minor fixes [DP].

v3.1.2
- Optimized SSE calculation code [GB].
- Added SSE2 support (DualPixel, double precision) [DP].
- Calculation mode keys have changed (see help text), press '0' for GPU calc [DP].
- Minor fixes [DP].

v3.1.1
- Fixed a *stupid* bug that caused crashing of benchmarks on Intel CPUs (sorry !!) [DP].
- Fixed a *stupid* bug that caused crashing of benchmarks on systems without a GPU (sorry again !!) [DP].
- Documented the new orbit draw mode (press o and move the mouse in single buffer mode) [DP].

v3.1.0
- Added AMD 3DNow! dual pixel calculation [GB].
- Added Radeon GPU pixel calculation using GL_VERTEX_SHADER_EXT (see notes). Maybe it works on other GPUs, tell me ! [DP].
- Added experimental GPU benchmarking (does not work on my Radeon 9000..) [DP].
- Slightly improved speed of SSE2 quad pixel calculation by constant data alignment on *** bytes boundaries [DP].
- Slightly improved speed of the FPU code.
- Code restructured to resemble an object oriented program (not quite yet :).
- Fixed minor bugs [DP, thanks for your bug reports].

v3.0.0
Reborn. In 1994 FFFF v2.0 was the first program to live in TrueColorFractals-Land, and with the fastest mandel algorithm available. It made users scream with joy and sorrow, due to the beauty of the images and the incompatibility problems caused by its raw coding (100% machine code, 5 KB executable). Now it comes back after 8 years - version 3.0.0 - back at the top of technology, speed, and... hopefully compatibility :)

INTRO
-----

Thanks to the GPU optimized code, to the "QuadPixel" algorithm running on all CPUs supporting SSE instructions (P3, AthlonXP or better) or AltiVec instructions (PowerPC G4 or better), and to the multiprocessor support, FFFF claims to be the fastest Mandelbrot generator. On a fast machine you should get realtime zooms for a while. Those of you having a CPU that doesn't support SSE will get a fast dual pixel 3DNow! routine. For archaic machines, there's still the classic monopixel FPU algorithm.

If it seems slow:
1) You know OpenGL ! You need a fast board (or new driver). We are drawing all image dots, one by one, using GL_POINTS.
2) OpenGL rendering can be the bottleneck! Try zooming in a bit, increase maxiters and think again.
3) There are no zoom/move tricks.
   All frames you see are entirely calculated. It's *not* fake or guessed.
4) Use a huge window for final renderings only. It's not Quake !

SSE/AltiVec calculation:
QuadPixel calculation means computing 4 pixels at the same time using SSE instructions. The algorithm does not access memory while calculating all 4 pixels (nice tricks, please look at the src !). Forgive me if the PowerPC code is not yet optimized.

SSE2 calculation:
DualPixel calculation means computing 2 pixels at the same time using SSE2 instructions. The algorithm does not access memory while calculating both pixels (nice tricks, please look at the src !). What will you gain using SSE2 instead of SSE ?
  Speed: NO. It's just half that of SSE.
  Precision: YES. You will be able to zoom deeper than with SSE.
So, use SSE2 only when the zoom level renders artifacts using SSE.

3DNow! calculation:
Thanks to Gérard Basler, we have a DualPixel 3DNow! calculation routine. 3DNow! is single precision, just like SSE, but using dual pixel loops.

FPU calculation (x86):
Based on that of FFFF v2.0, it's a bit faster but I haven't had time for a pipeline tuning analysis. I haven't yet had time to check whether the AMandel routine is faster (thanks to Amichai Rothman), but from a raw comparison of the two programs I'd say that FFFF is 25% faster on my Athlon.

FPU calculation (PowerPC):
Nothing special. This is the first code I wrote on the PowerPC. Forgive me if it is not yet optimized. A dual-pixel code path should be added for fair comparisons against MIPS.

FPU calculation (MIPS):
Comes in two flavours:
1) Generic single-pixel calculation for MIPS4 ISA (*** bit) R1x000. Nothing special, and quite straightforward.
2) R8000 optimized dual pixel (R8010 FPU) calculation. This code uses 2x unroll to reach 25% of the theoretical peak of the R8010.
   Alas, the dual FPUs should really deliver 50%, which is the R8010 real-world peak (when not using MADDs only). It seems we cannot hide the *huge* 4-cycle ADD/SUB latencies in the short mandel loop. A future quad-pixel version with 4x unroll should reach 50%. The R8010 is a great FPU, but the lame 4-cycle latency for ADD/SUB is a showstopper. So far we only have 7.5 MegaIters/sec on my R8000 75MHz.

GPU Vertex Program calculation:
If you've got an OpenGL board with a GPU supporting Vertex Programs, please check the experimental GPU (Graphics Processing Unit) calculation routine based on that of Erik Lindholm of nVidia (mine was almost identical: I've included Erik's one but with my coloring calc). It runs fine on GeForce3, GeForce4, Radeon 9x00, nForce (SW emulated... colors are different!).
Please note:
-1) This feature is NOT supported on the Mac (could not get these OpenGL extensions working).
0) Uses standard Vertex Programs, so it can run on older boards too.
1) Does not use branches nor early exit.
2) Coloring the pixel is tricky, so the normal palette won't be available (btw, I love that coloring).
3) Maxiters (max possible iteration limit) is 60 on the GeForce 3/4, 125 on the Radeon R300 (beta drivers).
4) You'll see CPU usage even when calculating with the GPU; please note that we must still send data to it !
5) Yes, many GPUs are SLOWER than P4s or Athlons.
6) If the program hangs using more than 12 iters on the Radeon, you should get the latest drivers from ATI (current beta drivers are OK!).

GPU Fragment Program calculation:
If you've got an OpenGL 1.3 board with a GPU supporting ARB Fragment Programs, please check the experimental GPU (Graphics Processing Unit) calculation routine. It runs fine on GeForce3, GeForce4, Radeon 9x00, nForce (SW emulated... colors are different!).
Please note:
0) Needs GPU floating point Fragment Program support (currently the ATI R300, don't know if it works on the NV30).
1) Does not use branches nor early exit.
2) Coloring the pixel is tricky, so the normal palette won't be available (btw, I love that coloring).
3) Maxiters with HW native (not emulated) is 28 on the R300, much more on the NV30.
4) You shouldn't see any CPU usage while calculating with Fragment Programs.
5) A good GPU is much FASTER than current P4 or Athlon when using Fragment Programs.

FEATURES
--------

- 100% asm optimized AltiVec quad pixel calc (calculates 4 pixels simultaneously at single precision)
- 100% asm optimized SSE quad pixel calc (calculates 4 pixels simultaneously at single precision)
- 100% asm optimized SSE2 dual pixel calc (calculates 2 pixels simultaneously at double precision)
- 100% asm optimized 3DNow! dual pixel calc (calculates 2 pixels simultaneously at single precision)
- 100% asm optimized x86 FPU per pixel calc (fast FPU at double precision)
- 100% asm optimized PowerPC FPU per pixel calc (fast FPU at double precision)
- 100% asm optimized MIPS R1x000 FPU per pixel calc (fast FPU at double precision)
- 100% asm optimized MIPS R8000 FPU dual pixel calc (fast FPU at double precision)
- 100% GPU asm (Fragment Program) calc (experimental GPU rendering), tested on ATI and nVidia cards.
- 100% GPU asm (Vertex Program) calc (experimental GPU rendering), tested on ATI and nVidia cards.
- Benchmarking of CPU and GPU (experimental) performance (test your pc floating point performance)
- OpenGL support (realtime zoom with anti-flicker double buffer support)
- OpenGL 1.1 vertex arrays support for fillrate improvement
- Multiprocessor support, up to [insert your number here] CPUs :)

INSTRUCTIONS
------------

Left mouse button:   Zoom in
Right mouse button:  Zoom out
Middle mouse button: Move

Keys (depends on platform, check console after running the program):
1: Lame FPU computation, C code.
2: Fast FPU computation, 100% machine code.
3: Quadfast SSE computation, 100% machine code.
4: Dualfast SSE2 computation (double precision), 100% machine code.
5: Dualfast 3DNow computation, 100% machine code.
9: Experimental GPU Fragment Program computation (OpenGL 1.3 ARB only)!
0: Experimental GPU Vertex Program computation (nVidia or ATI cards only)!
d: Toggle double/single buffer (you flicker haters :). May not work on some systems.
+,-: Inc/Dec max iters (press shift for +/-20).
/,*: Rotate palette (please toggle double buffer if this does not work).
h: Shows this help.
o: Draw orbits in realtime (single buffered mode only; not supported in GPU mode; keep 'o' pressed and move the mouse).
r: Reset zoom position.
b: Speed benchmark in current mode (resets max iters to 40). See result in the console.
Please be sure that CAPSLOCK is off.

HINTS
-----

- The program starts in single buffering mode. Please press "d" if you have a fast system and want to enjoy the realtime zoom.
- The SSE/3DNow!/AltiVec/GPU routines are fast but use single float precision. When you run out of zoom precision (you'll know when you see it), switch to SSE2, or to FPU mode if you haven't got SSE2 (zoom up to 10^15 times).
- Please note: the current image is never cached. If you touch something or move/raise/resize the window, FFFF will recalculate the entire image.
- You can always get realtime interactive render rates by shrinking the window size :) When you get to the interesting zone, render full screen.
- On deep zooms with high iterations, please switch to single buffer mode to see the rendering in progress. Otherwise you won't see anything till the render is complete ! Use double buffering only at interactive rates.
- Don't touch that lever ! :)
- When you are lost in the dark, all alone, press the '1' key. If it doesn't work, close the program and restart.

HOW CAN YOU COMPILE THE CODE ON YOUR [put any exotic name here] PLATFORM ?
--------------------------------------------------------------------------

If you can't figure it out yourself, just don't bother ;)

COMING SOON
-----------

1) You will get a color editor.
   For now, the AutoColorizer AI algorithm chooses the color YOU would have chosen !! :) Ahem, I haven't had time to code the color editor. Since the program is OpenSource, I'd like one of you to code it !
2) Zoom with autoswitch to the best calc mode for the specified zoom level.
3) Multiprocessor benchmarking (thanks to Jean-Philippe Perois).

REQUIREMENTS
------------

I tested it on:
OS:   Win2000 / XP / Mac OS X 10.4.x / IRIX 6.5
CPU:  1xP3@800 / 2xP3@866 / 1xAthlonXP@1433 / 2xAthlonXP@1433 / 1xP4@2533 / 1xP4@3400_prescott / 1xP4@3000_HT / 2xCodeDuo@1800 / 2xG5 / 1xR8000 / 1xR12000
SVGA: GeForce6800 / GeForceFX / GeForce3 / GeForce4 / nForce420 / ATI Rage M3 / ATI Radeon 9x00 / ATI Radeon X1600 / Matrox G550

Should run on most configurations; please report problems or success on different configs, especially on the OS X and IRIX platforms.

THANKS TO
---------

[See FFFF3.cpp source code].

BYE
---

Feel free to contact me (daniele.paccaloni@dylogic.com). Please report any bug (using the project home page on SourceForge if possible). Send me X if you want X support. And use FFFF as a benchmark, too (press that 'b' button and post the results) !

Daniele Paccaloni

BENCHMARK EXAMPLES
------------------

[IRIX: SGI Octane2 1xR12000 @400MHz]
FFFF v3.2.3 BENCHMARK (Using 1 CPU, no render)
size: 500*500  maxiters: 9999  rangex: -2.00 to 1.00  rangey: -1.50 to 1.50
[4f] Vector benchmark: Not supported.
[2d] R8000 dual FPU benchmark: 6.680 sec 63.084 MegaIters/sec
[2f] 3DNow! benchmark: Not supported.
[1d] FPU ASM benchmark: 8.496 sec 49.597 MegaIters/sec
[1d] FPU C benchmark: 10.676 sec 39.472 MegaIters/sec
[4?] GPU VertexProgram benchmark (beta! maxiters=10) on VPRO/A/32:
     Required extension (GL_EXT_vertex_shader) not supported.
     Not supported.
[4?] GPU FragmentProgram benchmark (beta! maxiters=20) on VPRO/A/32: Not supported.

[LINUX: HP Pavilion zd7000 1xP4HT Northwood @3000MHz]
FFFF v3.2.3 BENCHMARK (Using 1 CPU, no render)
size: 500*500  maxiters: 9999  rangex: -2.00 to 1.00  rangey: -1.50 to 1.50
[4f] SSE benchmark: 0.795 sec 529.242 MegaIters/sec
[2d] SSE2 benchmark: 1.511 sec 278.245 MegaIters/sec
[2f] 3DNow! benchmark: Not supported.
[1d] FPU ASM benchmark: 0.000 sec 0.000 MegaIters/sec
[1d] FPU C benchmark: 3.588 sec 117.181 MegaIters/sec
[4?] GPU VertexProgram benchmark (beta! maxiters=10) on GeForce FX Go5700/PCI/SSE2:
     3.041 sec 1***.424 MegaIters/sec
[4?] GPU FragmentProgram benchmark (beta! maxiters=20) on GeForce FX Go5700/PCI/SSE2:
     Maximum number of FP ALU instructions: 1024
     Maximum number of FP native params: 1024
     FP is hardware native (63 ALU instructions).
     6.845 sec 146.083 MegaIters/sec

[WIN32: Daniele's "acme" 2xAthlonXP @1480MHz]
FFFF v3.2.3 BENCHMARK (Using 1 CPU, no render)
size: 500*500  maxiters: 9999  rangex: -2.00 to 1.00  rangey: -1.50 to 1.50
[4f] SSE benchmark: 1.625 sec 258.7*** MegaIters/sec
[2d] SSE2 benchmark: Not supported.
[2f] 3DNow! benchmark: 2.572 sec 163.438 MegaIters/sec
[1d] FPU ASM benchmark: 4.250 sec ***.941 MegaIters/sec
[1d] FPU C benchmark: 5.343 sec 78.701 MegaIters/sec
[4?] GPU VertexProgram benchmark (beta! maxiters=10) on GeForce 6800 Ultra/AGP/SSE/3DNOW!:
     0.844 sec 592.450 MegaIters/sec
[4?] GPU FragmentProgram benchmark (beta! maxiters=20) on GeForce 6800 Ultra/AGP/SSE/3DNOW!:
     Maximum number of FP ALU instructions: 4096
     Maximum number of FP native params: 1024
     FP is hardware native (63 ALU instructions).
     0.410 sec 2440.431 MegaIters/sec
