UnitZeroOne

A blog written by Ralph Hauwert, freelance developer, specialized in realtime visualisation, 3D and application development.

Flash 10, Massive amounts of 3D particles with Alchemy (source included).

Pushing around 300.000+ 3D particles in realtime, on screen, using Flash? No problem, if you are using Adobe Alchemy and Pixel Bender to compile and run your code!

During my session "professionally pushing pixels" at FITC Amsterdam this year, amongst other things, I talked about how to best utilize parts of the Flash Player to get top performance. This is one of the examples I showed. What you are seeing in this example is 300.000+ particles being 3D transformed, projected and drawn to 2D. And it does so at quite a good framerate (well, it'll depend on your machine too).

So, how do we achieve this? The answer is a combination of Pixel Bender and Alchemy.

First, let's look at the 3D transformation and projection.

Flash 10 has a number of native features for 3D transformation and projection, built around Vector, Vector3D, Matrix3D, PerspectiveProjection and so on. Although these features are great, we can't easily use them in combination with Alchemy. I'll explain why later; for now, let's look at an alternative method to do the projection.

Where oh where in the Flash Player do we have a method of doing very fast math? The answer is Pixel Bender! Although Pixel Bender is normally used for image manipulation, you can make it do any type of number crunching that can be executed in parallel and without loops.

To rotate and project our 3D data, we use Pixel Bender in "ShaderJob" mode. When Pixel Bender is used in image-based mode, it operates at 8 bits per channel. Thankfully, when used with a ShaderJob, it allows 32 bits of precision per channel for the data processing. Since 8-bit precision wouldn't be enough for this example, we use a ShaderJob.
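To get a feel for why 8 bits per channel falls short, here is a minimal C sketch (not part of the original source) that pushes a coordinate through one 8-bit channel and back. It assumes the coordinate is normalised to the 0..1 range; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store a value from [0, 1] in one 8-bit channel and read it back.
   Assumption: coordinates are normalised to [0, 1] before storage. */
static float roundtrip_8bit(float v)
{
    assert(v >= 0.0f && v <= 1.0f);
    uint8_t channel = (uint8_t)(v * 255.0f + 0.5f); /* round to nearest step */
    return (float)channel / 255.0f;
}

/* Worst-case position error, in pixels, when a coordinate spanning
   `extent` pixels is squeezed through a single 8-bit channel. */
static float worst_error_pixels(float extent)
{
    assert(extent >= 0.0f);
    return 0.5f * extent / 255.0f;
}
```

Under these assumptions only 256 distinct positions survive per axis, so a coordinate spread over 1020 pixels can land up to 2 pixels away from where it should be. A 32-bit float channel avoids that rounding step entirely.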

The VertexProjector Pixel Bender kernel, included with the source, is a simple way of transforming and projecting vertices (representing particles, in this case) in 3D space. We feed this kernel a ByteArray of x, y, z triplets and execute the ShaderJob. It then returns the data as a ByteArray, in px, py, pz format.
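The Pixel Bender kernel itself is in the source download and is not reproduced here. As a rough idea of the per-vertex math involved, the following C sketch rotates each x, y, z triplet about the Y axis and applies a simple perspective divide. The function name, the choice of a single Y rotation, and the focal_length and center parameters are assumptions for illustration, not the kernel's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate each (x, y, z) triplet about the Y axis, then apply a simple
   perspective divide. `in` and `out` are flat arrays holding 3 floats
   per vertex, mirroring the x,y,z -> px,py,pz layout described above.
   cos_a and sin_a are the cosine and sine of the rotation angle; the
   caller computes them once per frame since they are the same for
   every vertex. Points must stay in front of the camera, that is
   rotated z > -focal_length, or the divide blows up. */
static void project_vertices(const float *in, float *out, size_t count,
                             float cos_a, float sin_a, float focal_length,
                             float center_x, float center_y)
{
    assert(in != NULL && out != NULL && focal_length > 0.0f);
    for (size_t i = 0; i < count; ++i) {
        const float x = in[3 * i];
        const float y = in[3 * i + 1];
        const float z = in[3 * i + 2];

        /* Rotation about the Y axis. */
        const float rx = x * cos_a + z * sin_a;
        const float rz = z * cos_a - x * sin_a;

        /* Perspective: points further away shrink towards the center. */
        const float scale = focal_length / (focal_length + rz);
        out[3 * i]     = center_x + rx * scale;
        out[3 * i + 1] = center_y + y * scale;
        out[3 * i + 2] = rz; /* depth kept for later use */
    }
}
```

Every vertex is handled independently and the loop body contains no branches or inner loops, which is the shape of work the post describes as suitable for Pixel Bender.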

Drawing things to screen.

Now that we have all the 2D projected 3D data, we need to draw things to screen, and we have to do so as quickly as possible. This step is traditionally called rasterization. In AS3, you're most likely to use setPixel when drawing on a per-pixel basis. Doing so in a loop for 300.000 pixels turns out to be very slow. The solution would be to optimize that loop as much as possible, either by writing your own bytecode, or maybe by writing your own post-processor for your code before you compile. But we don't have to, since Adobe Alchemy exists.

As you can read in my earlier post about Adobe Alchemy, I openly questioned why it was so speedy compared to regularly compiled ActionScript 3 code. Although the answer is rather complex, the combination of C-based code, the LLVM compiler and "Alchemy Virtual Memory" is the basis of it. The large difference between Alchemy-compiled code and regularly compiled ActionScript can be further explained by the regular AS3 compiler not doing any optimisation. This example shows off those performance increases.

One thing to worry about when using Alchemy in your ActionScript projects is marshalling. You can read Branden Hall's post on Alchemy for more info on that. Since we wouldn't be able to marshal 300.000 vertices from a Vector.<Number> in AS3 to our Alchemy code, we need to find a better solution. This is exactly why we are using Pixel Bender and, moreover, the ByteArray data.

It is possible to manipulate the memory Alchemy uses at runtime. This memory is represented as an AS3 ByteArray object. If we write our data directly to this memory block and read it back from there, no marshalling is needed. Not everything can be done this way, but for some things it can be very useful, for instance for moving large blocks of data, like images and ByteArrays of coordinates.

Getting all these 3D particles to screen is simply one inner loop. While we would normally call setPixel for that, in Alchemy code we don't have that luxury. Instead, we write directly to our screen buffer memory, which is represented as a set of ints. Here, one more problem comes into play: endianness, which defines the byte ordering for a set of data. Alchemy uses little-endian byte order for its internal memory representation. Specifically, it uses a small class called LEByteArray. This class extends ByteArray and ensures no changes are made to the endianness of the memory. Makes sense, since otherwise your code would blow up.
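As an illustration of that inner loop, here is a minimal C sketch that plots projected points straight into a buffer of 32-bit ints. It is not the code from the download: the function names, the px, py, pz float layout and the clipping rule are assumptions. The byte swap helper assumes that whoever reads the buffer expects each pixel's bytes in A, R, G, B order, which is not how a little-endian machine stores the int 0xAARRGGBB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes of a 0xAARRGGBB value. Stored as an int in
   little-endian memory, the result lays its bytes out as A, R, G, B.
   Assumption: the consumer of the buffer wants that byte order. */
static uint32_t to_argb_byte_order(uint32_t argb)
{
    return ((argb & 0x000000FFu) << 24) |
           ((argb & 0x0000FF00u) << 8)  |
           ((argb & 0x00FF0000u) >> 8)  |
           ((argb & 0xFF000000u) >> 24);
}

/* Plot `count` projected points (3 floats each: px, py, pz) into a
   width x height buffer of 32-bit pixels. Points outside the screen
   are skipped. Returns the number of points actually drawn. */
static size_t plot_particles(const float *projected, size_t count,
                             uint32_t *screen, int width, int height,
                             uint32_t color)
{
    assert(projected != NULL && screen != NULL && width > 0 && height > 0);
    size_t drawn = 0;
    for (size_t i = 0; i < count; ++i) {
        const float fx = projected[3 * i];
        const float fy = projected[3 * i + 1];

        /* Clip before converting, so out-of-range floats never reach the cast. */
        if (!(fx >= 0.0f && fx < (float)width &&
              fy >= 0.0f && fy < (float)height)) {
            continue;
        }
        const size_t px = (size_t)fx;
        const size_t py = (size_t)fy;
        screen[py * (size_t)width + px] = color; /* one int write per particle */
        ++drawn;
    }
    return drawn;
}
```

In this sketch the color would be converted once, outside the loop, with to_argb_byte_order, so each particle costs a bounds check and a single int store.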

Writing to the screen is then a piece of cake. We take the Alchemy-processed data from its memory and write it to a BitmapData using the formerly much less usable setPixels() command. It's amazing to see how fast this is.

Look at the example here, and download the full source code here. As you can see from the example, the difference between doing this with regular ActionScript versus Alchemy approaches a fivefold speed increase.

Thanks to Keith Peters, for providing me with the 3D Strange Attractor code! And additional thanks to Mr.Doob for the stats object.

In the future I'll be posting more demos of the technology. Amongst these there will be one application for the future version of Papervision3D, PapervisionX.

Related posts:

  1. Another scream on Flash, Alchemy Memory and compilers.
  2. More play with Alchemy : Lookup table effects.

67 Comments, Comment or Ping

  1. you rock Ralph, incredibly impressive demo!

  2. oh my god…
    that’s freaking sweet! :D

    wonderful job

  3. nicely done. demonstrates the advantage of Alchemy really well.

  4. Ralph you own! :)
    nice job. Keep it up!

  5. Great research! More more! :D

  6. Awsome! Totally Awsome man.

  7. Nice! And there I thought i was being clever gettign 40k particles at ~20fps using shaders and flash 10: http://www.mikecann.co.uk/?p=384

    Hahaha! Ill have to update my little project with the things learnt here!

  8. Very nice Ralph, we need about 10x more particles for 1920×1600 :)

  9. Can’t wait to see first papervisionX demos :D

  10. Thanx Ralph!

    You keep on Rocking! We keep on reading ;)

  11. excellent work (as usual), ralph!

    one question — why not use the Vector methods of BitmapData in the non-alchemy branch, instead of get/setPixel? i’m sure it still wouldn’t touch the speed of alchemy, but it would almost certainly be faster to avoid the method call overhead…

    looking forward to seeing these improvements make their way into PVX!

  12. @Erisco : you are right, that would have been better. But in this case I just added the as version at the last minute, to show at least a halfway comparison of it. Would love to see anyone optimizing a actionscript version to the bone.

  13. here I see 4 fps alchemy version vs 2 fps as3. well, itdoes make it x2 faster, I guess :)

  14. Makc : what are your machine specs ? Double / single core ? Mem / speed ?

  15. Ralph, I’m gonna be using this technique on http://www.peternitsch.net/blog/?p=139. You’ve saved me a ton of work. ;)

  16. @peternitsch great work man! Looking forward. Let me know if you need a hand.

  17. piv 3ghz 1gb ram

  18. I’d check your flash player and machine install then. I got an IV 3.2 ghz here and it does 15 to 20 fps.

  19. Great job. I would like to contribute to such projects. How about 3D dynamic fractals?

  20. Alex : the source code is fully open and downloadable. For dynamic fractals, I’d move the point generation code to alchemy. 300.000 is probably a bit high then too, due to the iterative nature of most fractal generation.

  21. yonatan

    Very impressive! and great news for my fractal fetish. Check this one out: http://www.zozuar.org/pub/sierpinski-pusher/Main.swf

    Code is at http://www.zozuar.org/pub/sierpinski-pusher/src.tar.gz – it’s a bit slower than yours (does some z-buffering) and needs to be linked with papervision cause I was too lazy to make my spinner use native matrices.

    keep up the good work:)

  22. Incorporated your marshaling technique and went from 5,000 to 30,000 moving particles. Awesome stuff. http://www.peternitsch.net/blog/?p=166

  23. wow very cool stuff cant wait for PapervisionX.

  24. Impressive demo Ralph, even on my meager MacBook Air. I’m curious how well haXe would compare, as it gives access to the “Alchemy Virtual Memory” and does some optimizations, though I don’t know how that compares to Alchemy’s LLVM implementation.

    See:
    http://lab.polygonal.de/2009/03/14/a-little-alchemy-in-hx3ds/

  25. FrankPepermans

    Many thanks for this, was able to implement Pixel Bender and Alchemy in a similar way in my current project thanks to your explanation.

    Now Pixel Bender is the new bottleneck, let’s hope the next Flash release will improve on that :)

  26. oh my good!KICK ME!!!

  27. Impressive stuff! Alchemy can really improve performance. I did an old school plasma effect with it:
    http://www.rozengain.com/blog/2009/04/02/alchemy-experiment-incredibly-fast-plasma/
    The difference is amazing :-)

  28. Awesome!!!!

    I’ve done several Flash 3D particle things too (check my website), but I had to quit at around 10,000 particles. Thanks for posting this example – I think I’ll have to give Alchemy and Pixelbender a try!

  29. Woo! Good work man!

  30. Terry Corbet

    01. Many thanks for a good written explanation, for complete source code, and for code that is nicely formatted and easy to follow.

    02. If you have any inclination to extend this example, I have made some minor, incremental changes [which I would be happy to provide] for the primary purpose of testing my understanding of the subtle interactions between the various pieces of code. As a side benefit, this also helps to increase one’s understanding of the throughput that can be expected. On this 4-core, 2.4GHz XP workstation, I can easily get greater than 30FPS even after introducing the time-consuming change of re-initializing tBuffer in each rasterize step.

    03. The primary surprise from your helpful tutorial is to find that no costly code — trigonometric functions – has been offloaded from the Flash Player execution stack to the Alchemy stack. Without asking you to give away all your trade secrets, can you make any comments about whether or not the Pre-10 3D engines will be able to utilize Alchemy to improve the performance of their central code for projecting whatever sorts of polygons they have in their core architecture onto the screen?

    04. Perhaps you chose this example for this very reason, and it is a wonderful example, but is there anything similar you might show using Alchemy and/or PixelBender to address Z-sorting issues?

  31. @Terry Corbet

    01. Thanks, good to hear you found it useful.

    02. The interactions between pieces of code and how they could work together more optimized are greatly dependent on what you want to do. In this case, this is one possible path to follow, but there are other, better optimized options out there. (Like using a linked list in C code, etc).

    03. In 10, projectVectors and transformVectors is really fast. Doing huge chunks of the same type of transformation using pixelbender seems to be similarly speedy. Doing that in the Alchemy code might slow things down (for instance, using math.h sin() is damn slow, but there would be some work arounds, inlining code using the __asm op.) PapervisionX is going to use some of these tricks, but the engine architecture would allow you to do a full alchemy version too.

    04. Well, the sorting can be easily fixed using alchemy and a z-buffer next to the pixelplotting routine. I think someone did one.

  32. Hi Ralph, this work has been inspiration for me for the past few months, and rekindled my fascination of strange attractors + optimizing as3 – anyways cutting to the chase I’ve managed to replicate this with pure haXe and the results are pretty sweet:

    http://webr3.org/blog/haxe/flash-10-massive-amounts-of-3d-particles-with-haxe/

    regards!

  33. Hi there !
    Cngrats for everything I see that is so inspiring !!
    Well I’m really in the way to play with alchemy but I have a silly question before :
    where can I get the cmodule.pixelpusher.CLibInit class ????
    Do I need to downlaod other specific libraries (like such at the alchemy adobe lab page) ???

    thanks !!!

  34. @benjamin : you need to put the SWC path in for the compiler. Depending on what you use (Flash, Flex, or just plain compiler), this path changes. At least the classes are in the SWC.

  35. wow, that’s actually… well unbelievable.
    300 000 in a brawoser window…. no way.
    well it seems it works, even in this crappy Internet cafe machine…

  36. Nikos

    Wow just the techy hardcore blog I was looking for :)

  1. PapervisionX - Blog - Mar 21st, 2009
