We are going to implement and study parallel versions of the Catmull-Clark subdivision algorithm, using CUDA and running on NVIDIA GTX 480 cards.
We will use starter code from a pre-existing Computer Graphics (15-462) project. This starter code includes a Makefile that links against the OpenGL libraries, as well as files with vector, matrix, camera, and scene classes and implementations.
Our main reference will be this paper:
Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces
The main goal of the project is to obtain high-quality visual results as quickly as possible, with an acceptable frame rate and level of accuracy.
The challenges to such an implementation are as follows:
a) Are there ways to use shared memory on the GPU to minimize memory latency and to reduce data transfer between the CPU and GPU?
b) What data structures are most appropriate for representing the mesh, keeping in mind considerations such as ease of access and construction (ideally in parallel), memory usage, etc.?
c) How much computation should be placed on the GPU and CPU, respectively?
Plan to achieve:
1) Develop a working parallel implementation of the Catmull-Clark algorithm, based on the aforementioned paper.
2) Match the performance results obtained in the paper and identify bottlenecks that limit speedup.
Hope to achieve:
1) Improve upon the implementation described in the paper by removing high-latency operations and optimizing aggressively.
2) Study a variety of other approaches, based on different data structures (simple arrays, STL maps, half-edge meshes, quadtrees) and approximation techniques (bicubic patches, feature-adaptive subdivision, shaders, Bézier patches, etc.).
3) Identify ways to handle extraordinary vertices, cracks, T-junctions, and other abnormal mesh constructs efficiently, without compromising visual results or performance.
We plan to use OpenGL and CUDA in our final product. We will run our code on the GTX 480 cards in the Gates 5th floor cluster. Given the amount of computation required by the algorithm and the scope for parallelism, a GPU/CUDA platform is a sound choice.
Week 1: (April 4-11) We initially were doing research on a variety of topics related to natural language processing, convex optimization, and machine learning. However, we abandoned that project concept when we realized that the amount of material we would have to assimilate, combined with the difficulties of working with a complex code base, would make it infeasible to deliver the project on time.
We will quickly survey the area and determine which implementations of Catmull-Clark are best suited to the GPU.
Week 2: (April 11-18) We need to have a serial implementation of the Catmull-Clark algorithm ready in the first few days of this week. We will perform basic tests on increasingly complex geometries and analyze which steps of the algorithm are most time-consuming.
We will also start working on a CUDA version of the code, which we hope to have working by the weekend. This will be our 'benchmark' implementation while we investigate other possibilities.
Week 3: (April 18-25) Having implementations that faithfully adhere to the paper's algorithm, we will start exploring the other data structures and approximation techniques mentioned above.
We'll try to match the paper's performance numbers (which are real-time) on the meshes specified; given that the GPUs used in the paper were GTX 280s, this replication may be a problem, but we'll do our best.
The paper also notes that its implementation had 'not been aggressively optimized' and suffered from several incoherences.
Week 4: (April 25-May 2) Study for the midterm, continue working on alternative approaches, and measure their performance. We want to identify the precise reasons why certain approaches perform better than others.
Week 5: (May 2-9) Finish documentation, compile results and images, complete our final report with appropriate data analysis, and prepare the final presentation.