Sunday, April 13, 2014

C++11 Variadic Templates

In recent news, I've been revisiting my programming roots and tinkering with some of the latest C++11 has to offer. There are many new awesome features, including variadic templates. These templates give you the ability to define an arbitrary list of types in your template parameters. This removes the need to define N template parameters using specialization. Consider the following naive example tcontainer:

Besides being horrific to look at and a pain to maintain, it limits the implementer to, in our case 4, types. This type of implementation always reminds me the first time I realized that the Action<T> was implemented with 16 variations. Yikes.


Now there is a syntax that allows us to compress the N types down to a single "Packed Parameter" using the splat (...) syntax. Here is a very basic example:

Note that we are essentially dequeing the first element (T) and recursively (kind of) passing the rest of the arguments until there is only one argument left. The other wonderful thing about templates is if any T passed to the print method doesn't have an overloaded stream operator <<, then the error is compile-time. If we were to call this method, here's a very simplified view of how it might expand:

A moment ago, I mentioned that the call to print<T,Ts...> was kind of recursive. While it certainly has recursive appearance and qualities, the compiler is going to treat each variation as its own method. The most important aspect of meta-programming to remember is that templates are resolved at compile time. Personally, I'm constantly having to remind myself that template resolution must be known ahead of time.

Container (Tuple)

Let's take a look at another example, this time using a class/struct. For all intents and purposes, you could really call this implementation a basic tuple. The std::tuple is a much more complete design, but it can be quite intimidating to look at for learning purposes. So, let's create a basic container class which will store a list of Ts... instances. Just like in our previous example, we'll need to create two basic implementations:

  1. Link Implementation: The purpose of this implementation is to extract the front T node, and pass the remaining Ts... onto the next link.  

  2. Terminator Implementation: The purpose of this implementation is to halt the "recursion." In other words, we want to continue to sub-class our container<class T, class... Ts> implementation until we have one remaining T instance. This implementation specializes the container to do just that.  

Now, let's create a factory method so we can let type inference do it's thing.

Access value Fields

Now that we can create an instance of our container, you'll immediately notice that if you print out the value field, it will output the std::string instance in the first element of our container creation ("hello"). Since our container<T, Ts...> class uses the same field name (value) for each sub-class, then in order to extract a specific value of type T, we'll need pick the subclass that matches our desired value, cast our container, and then access the value field. This is somewhat difficult to explain accurately, so here's an example to clarify:

Indexed Approach

Based on the above code to extract the desired value from our container, it seems that an implementer would have to know a lot about the implementation details of the container in order to actually use it. Let's provide an additional utility that will allow us to specify an index representing which type we'd like to access. We'll need to create two more template classes such that we can dive into the subclasses, and reduce the index until we hit 0. Just like our terminator class for container, we'll also need a specialized template for the 0-case.

Here again, we're using the same pattern as we did in the first print example and with our container<T, Ts...> class, which is the use of "recursive" access. The termination with this case is dependent on the value of int index, which is decreased by 1 each depth. The idea is that once the index is 0, then the typedefs will point to the correct container type and T-type.

The last step into turning this into a generic utility for use with our container<T, Ts...> class is to create a get method that takes int index, class C template parameters, define the return type as typename container_index::type, and then perform the container_type lookup and static_cast in the method body and return the value. This will look something like this:


Now that we have our get method, we can use it to extract the specific values. Also, remember that templates are a compile-time construct, so incorrectly assuming a type will yield a compile-time error (a helpful one at that!"). Here's the example use:

Final Thoughts

Hopefully this walk-through has proven helpful if you are new to exploring the newer C++11 features. It's extremely important to note that I left out a lot of "conventional" C++ for the sake of clarity. I wouldn't suggest using the container<T, Ts...> for anything other than a guide. As mentioned previously, you should use std::tuple for serious use.

Admittedly, I still struggle a lot with C++ semantics, especially as it pertains to the newer features. This blog was a result of my struggle to find a clear set of examples for variadic templates, at least such that I felt comfortable using them. Currently, my plan was to use this feature to represent shader inputs, so if you needed to add a new input, the added T could propagate to any implementations, rather than be required to add support for the new type anywhere it was needed. Point being, don't limit the use of this amazing feature to an arbitrary type/value list!

All the code from the examples can be downloaded here: variadic_templates.tar.gz.

The specific version of the compiler I used for these examples:

I'm also nearly positive that Visual Studio 2013 will also compile without a problem.

Thursday, November 1, 2012

A Heap of Voxels

I'm a little late to the Voxel engine party, and it may not be the "cool" thing to do anymore, but it's definitely an interesting topic, so perhaps there is still some buzz floating around. I've been playing around with procedural mesh generation in Unity3D and thought I'd have a go at building a little Voxel "engine." I will make references to MineCraft (how could I not?), but consider claims relative to C#, Mono, and Unity3D; not Java.

During the initial world generation sandbox phase. 2.1 million dynamically generated triangles running at 80 fps. A testament to Unity's ability to make it rain.

Chunk It Up

When generating procedural environments or "worlds," I found one of the major issues to be temporary object creation. I'm not talking about wasteful code or unnecessary data copying; I'm talking about data that is absolutely required to generate the environment. For example, in a game like Minecraft, the world is generated as the player's position moves towards a location they have not traveled before. This data is then encoded and stored on the local disk. Should the player leave and return to that particular part in the world, the data would be loaded from disk instead of generated. Regardless of how the data is provided (generated or loaded), we can be sure that:

  • The amount of data per "chunk" is identical once it reaches application memory 
  • There are many more "chunks." Meaning, more memory is required. 

The process from generated/loaded "chunk data" to a rendered mesh in the environment involves the creation of locally scoped storage, copying, and potentially, many different threads containing their own stack and added overhead. As this data is pulled through the render pipeline, it's organized into data structures which, optimally yield the most efficient access. Most of the time, this means there's at least one occurrence of added overhead. It's up to the developer to determine whether or not that overhead is worth the gain from using such a data structure. That is the game you play when building this sort of application (most applications, really!), and this post is about my first struggle with finding that balance.

Thread it to Death

Sure of myself, my first approach to generating/building the environment was to create a pool of consumer threads and a pool of producer threads, and use a blocking priority queue for flow control. The producers would generate segments of the world as fast as they could and queue the chunk data in the priority queue (prioritized on the chunk position relative to the player position). If a consumer thread was available, it would grab the data and generate all of the geometry needed to create the Mesh. Once the consumer had generated the geometry for the mesh, it would dump that data back into a "special" queue where the main thread could dequeue the geometry and generate the mesh.

Sounds fast huh? Yeah, I wouldn't know... I received a heap error after a few seconds of execution.

WAT? I suppose it's better than "Unknown Error"

I spent a lot of time tweaking the thread counts, re-using previously allocated memory, switching a few classes to structs to reduce heap usage, and I could not figure out why the application was dying. I read a Unity forum post concerning the error I was experiencing, and all I could really conclude is that I was creating and storing too many things at once. Perhaps "small arrays?"

Garbage Man

Of course, after years of being bitten in the ass by nagging concurrency issues, I started there. Low and behold, switching to a single background thread would avoid the horrible heap error, but the performance was miserable. On the bright side, at least, at that point, I could fire up the Unity Profiler, and see that the state of the garbage collector was constantly in "Oh Shit" mode. The problem was right in front of my face the entire time, and it was the simplest phase of the entire pipeline: Geometry -> Mesh

Screenshot when I thought I was at 60 fps, but moments later, it took a nose dive.

Thinking back to my first approach, having each Thread generate a part of the world isn't really an over-the-top use of concurrency. It's not like I was creating hundreds of threads. I was simply using a pool of 3-4 consumer threads and that's it! Even after I switched to using single background thread, the garbage collector was still sucking wind! Why?

For the record, using a multi-threaded generation approach is problematic for other reasons as well. Specifically, when attempting to correctly implement an on-the-fly occlusion culling algorithm by comparing data in neighboring chunks. That's an entirely different topic, but worth noting here.

Geometry isn't Trash

Geometry generation from chunk data, explained simply:
  1. Figure out which faces aren't completely surrounded by other faces.
  2. Add the face vertices, normals, UVs, and triangles (ordered vertex indices) for that face to the respective buffers. 
  3. Create new Mesh instance and set vertices, normals, and UVs to the data in the buffer. 

Because the size of the buffer is unknown, I decided to use a List<Vector3> for the vertices, a List<Vector3> for the normals, a List<Vector2> for the UVs, and a List<int> for the triangles. The class looked like this: 

After I've already lead into the problem a bit, you can see where this is going. At the core of the problem, I was creating one of these ChunkMesh classes for each Chunk, it collected all of the face data, then at Mesh generation, copies each data set into an array and dumps it on mesh. Once it's on the Mesh, the ChunkMesh instace is no longer needed and is thrown away (awaiting garbage collection). The more you create, the more data has to be hauled to the dump, spread that same problem across multiple threads, and you're in for a real nasty result.

Update: I failed to mention this, but the ToMesh() method was called in a MonoBehaviour#Update(), which was explicitly pointed out in the post on the Unity forums as being a big "no no." That was the key to tracking down this issue in the first place.

Once the issue was resolved, Mesh generation was pleasantly speedy.

In closing, I've found that the best solution is to create a single buffer, limit it to the max vertices allowed for a single mesh (65536), and simply write the face data into the buffer, and copy it out, but retain the reference to that buffer so it can be reused. There is some light lifting involved with the buffering in terms of tracking the current vertex count and then using Array.Copy() to move the data around. The *big* tradeoff is that you shift the memory management responsibility to the Mesh cleanup, which Unity handles beautifully via the Destroy() method.

Using a single buffer also means that you're limited to single thread access as well. Refusing to see the light, I did try creating a ThreadLocal implementation and experimented with different thread counts, and while this resulted in a successful world generation (no more heap error), it was really CPU intensive, and seemed to take away a bit from the Unity performance.

Unity has proven to be an extremely dependable platform for building games. Most of the issues I have encountered were instances where I was trying to solve a problem that Unity had already solved (and solved 100x better than I ever could). While there are certainly more than a handful of circumstances where multithreading can be used, a single background job and/or intelligent use of Coroutines are usually more than enough to achieve *incredible* performance for any style of game.

Sunday, April 22, 2012

After Two Years: AS3 Vector.<T> Review

Let me start off initially by saying that this post is pretty much a rant with examples. On a given day, I complain a lot, mostly about complainers (not including myself), objectively stupid people, or arrogant people/organizations who make stupid choices (the mythical programming unicorns to which all evil code emerges).

Concerning objectively stupid people, If you wrote a blog post about "Why Adobe Killed Flash," you are stupid. There is a decline in Flash's popularity and all of a sudden, you think you're the Nostradamus of technology. No one cares that you lost your contracting job making banner ads for Viagra.

Concerning the evil mythical programming unicorns, this will be my topic for today. More specifically, the Flash Platform's Vector.<T> class and the mindless semantic differences between it and the Array class. I suppose this is a fairly out of date topic, being as the Vector.<T> class appeared in the Flash Player 10.0 release. However, I have not noticed any changes or improvements to the APIs since the 10.0 release, and thus the complaining commences:

Just a note, that I use the "shorthand" instantiation of both Array and Vector. If you're unfamilar with this syntax you can read more here, or here's a quick rundown:



The conversion from Array to Vector.<T> is done via a top-level conversion method, similar to XML(xml:String). This is a good choice in my opinion. This is done like so:


The conversion from Vector to Array doesn't exist as a utility in AS3. Um, ok. So, I guess we write our own. Since we want to be able to convert any T-type Vector, let's write a static helper function that accepts a * parameter:

There are problems with this, most notably:
  • No typing on the vector object, so it's going to execute substantially slower.
  • There's no type checking at compile time, so we could pass anything to this method and end up with unexpected run-time errors.

No typing on the vector object
The first bullet point could be solved if we get rid of the generic helper method, and just in-lined the conversion via an anonymous function or loop.

I personally like the anonymous function, just to make it clear that we're executing a conversion function on the Vector input and getting an Array output, but it might seem too "cute" for co-workers, so you may find work a more friendly place if you just use a for loop.

Unexpected run-time errors
The second bullet point, we can "solve" in a few ways as well. Let's say that my application has decent error handling, which requires a semi-intelligent use of Error objects when expected exceptions occur. Since the point is to avoid unexpected run-time errors, let's validate our input, and throw a more reasonable error, which can be caught and handled:

Using these new helpers, let's validate our original helper method before performing the iteration:


Why does it have to be so complex? Why is it so simple to convert an Array to a Vector, but not the other way around? The reason I have a problem with the lack of conversion is due to the nature of Vector and Array. Vectors are not only nice type-friendly collection objects, but they're also are very fast during iteration. They're also unlike Array in that they're strictly indexed (in order). Because of this, certain operations like element removal via splice() and shift() are slower using a Vector compared to with an Array. This makes sense because the Vector has the additional overhead of maintaining index order internally.

So, if I'm executing some sort of filtering logic on a collection of elements, it's typically best for me to stick with Array. However, in terms of creating a user-friendly API, I prefer strongly typed Vector structures.

What I am left with is either the decision to just stick with Array and avoid Vector, or implement two way conversions (which we've already seen is a bit of a headache). Perhaps there are better ways to go about this. Let's continue.



Array + Array concatenation is fairly simple, and probably what anyone would expect:

The concat() method will also accept multiple Array objects:


Vector + Vector concatenation is also fairly straightforward, and works similarly to Array:

And just like Array's concat(), we can specify multiple Vector.<T> objects to concatenate:

Pretty much the same, right? Let's go back to Array.


Array's concat() also let's you concatenate individual elements:

You can also use a combination of individual elements and Array objects:


For some reason, whoever wrote the Vector concat() method decided he/she would subtly rip-off the entire AS3 community by silently killing the ability to pass individual elements to the concat() method.

So, clearly you are only allowed to pass Vector objects with the same T-type to the concat() method. This seems wasteful:

We have to write three lines of code to create, copy, and append for Vector, while this can be done with 2 lines (preserving the original) with Array.


Again, we have a very similar set of APIs on two collection classes, and while the semantics of both seem identical (even after reading the documentation), they have subtle, yet annoying, differences.



I'm certainly a fan of the native array sorting that can be done in flash. It's very easy to use, and much faster than a homemade AS3 sorting function. Let's talk about the two sorting methods available on Array:

Custom Sorting Function
The sort() method allows you to pass a few different types of parameters, which you can read about in the AS3 Array sort() documentation, but for the sake of time, let's use the custom sort function method:


Not surprisingly, the Vector class has a sort() method as well, and would you believe that it takes the exact same parameters as Array's sort. At least, from what I can read in the ASDocs and from all my tests, they take the same type of arguments.

Alas, it seems the same lazy asshole who wrote the Vector concat() must have also copy and pasted the sort() method from Array because according to the Vector ASDocs, here's how I should use the sort options with the sort() method.

Of course! Now it's ok to pass Array data to a Vector method!


Being able to use a custom sort() function on Array a huge plus since the there may not always be a clear way to sort a collection of elements. Even though the guts of the sort() method run natively, if you use a custom sorting function, it has to be called on N elements, and obviously, the code execution isn't native. Either way, sort(compare) is still fast, and to make it possible to do a semi-custom sort, the Array APIs added an elegant and simple hook for sorting objects using their properties: sortOn():

From an AS3 perspective, this is such a powerful tool. Native speed, custom property hooks, and basic sorting strategies all implemented in one line of code that's immediately easy to understand.


Vector does not have the method: sortOn(). This forces us to use the sort() method, or manually swap things in AS3. I just don't understand!

Final Thoughts

I've been using the Vector class since it's addition to the flash APIs for the Flash Player 10 release, and I've definitely seen how much speed improvement you can get by simply just swapping in Vector for Array. Let's also not forgot how great it is to finally be able to associate a single type with a collection of elements. Let me also commend the Flash Platform team for taking a non-existing generics standard, and cutting a few corners to bring developers something that "was as close as possible" at the time.

However, in the past few years, the only real reason (unfortunately) I have used Vector is for type clarity, and because it didn't really matter whether I used Array or Vector. To me, Vector.<int> is a lot less confusing to look at than just plain ole Array. In most of the flash games we've written at Electrotank, we just don't use a lot of super heavy iteration like you'd see in the "300,000 pixel pushing particles demo," and we quickly noticed that swapping in Vector for every occurrance of Array mindlessly was NOT the way to go. In fact, the conversion problem I mentioned earlier will turn your code into spaghetti very quickly, and many times, we saw a slower performance (especially using splice() and shift()).

Of course, I don't cover all of the differences between Array and Vector in this article, as I've chosen to focus on the two that effect me the most. In my opinion, using a faster, strongly typed data structure shouldn't be a burden on the developer. In fact, I may go as far as to say that you should be able to pass a Vector anywhere you can pass an Array. At the minimum, a native conversion from Vector -> Array is needed.

My disappointment is in the lack of effort that has been applied to the Vector class in the past few years. The subtle differences between Vector and Array make them far too difficult to use together. Collections are supposed to be available to a developer to use for different circumstances, and in most cases (like Java for example), collections can be easily converted into another. As a product, you end up with well organized, high performing code.

Unfortunately, the Flash Platform didn't see it the same way.

The claims I've made in this post are based on my general experience with AS3 and the Flash Player 10 and higher. I'm a bit over the top at times, some of the views expressed may be over exadurated, so take it all with a grain of salt. It'd be unfair for me to dish out comments like this if I wasn't prepared to accept corrections or disagreement, so please feel free to set me straight, or disagree!

Saturday, May 21, 2011

JavaRay: Spot Lights

I've finally gathered enough time to starting writing some more JavaRay code, so I went ahead and finished the Spot Light implementation...

Here's a scene with two planes, a specular sphere, and 3 spot lights:

Here's the change-set:

The project is coming along quite nicely and it hasn't lost it's "fun" quality yet, but I'm looking for some ways to optimize. If you have some suggestions, drop me a line!

Matt Bolt

Monday, April 25, 2011

JavaRay Updates...

Just committed a fairly large update which you can see/read here:

I did manage to fix the issue where reflections were showing up on all sides of a scene object:

Next, I'm going to be working on some new types of objects, lights, and externally loaded scenes.

Feel free to fork the project and contribute:

-Matt Bolt

Sunday, April 17, 2011

JavaRay: Ray Tracing in Java

I've been tinkering around lately with Java, and trying to catch up on the last 8 years I've missed. I was first introduced to Java in 2002 at UNC-CH, but it was nothing like the Java today.

One thing I still find very interesting is that while Java's performance is exceptional, and the language itself is flexible and OO-centric, you don't see many games written in Java (besides the highly successful title, MineCraft). Perhaps my perspective is blurred, but nonetheless; I didn't understand why.

At Clemson University in 2004, I took a Computer Science course from a professor named Mike Westall in which we built a ray tracer in C. It was a very basic single-threaded application that took a "scene configuration" and image dimensions as parameters, then output a PPM file.

I thought re-writing this project in Java would be fun, so I started working on it this weekend, and here's what I have so far:

It supports both ambient and diffuse lighting. I plan on adding:
- specular lighting
- multithreading
- custom scene configuration

Made a few updates tonight including multi-threaded rendering (blog post to come) and specular lighting, although there is a bug in the specular lighting that I can't figure out. Here's the latest:

Anyone have ideas about why the objects reflect on the *other* side of the sphere? It should only be the objects between the sphere and eye point. Looks as if there's some incorrect calculation concerning the normal, but I've been unable to locate it. :(

Wednesday, January 19, 2011

Dusty Les Paul

So, I decided to pick up the old Les Paul during my lunch break today - Surprisingly, I can still play... kinda.