CUDA Application Design and Development
Published: 2019-06-09


 

Author: Rob Farber

Published by Elsevier Inc

 

Foreword

Arguably, for any language to be successful, it must be surrounded by an ecosystem of powerful compilers, performance and correctness tools, and optimized libraries. --Jeffrey S. Vetter

 

Preface 

CUDA (Compute Unified Device Architecture)
"harness those tens of thousands of threads of execution" (I like this verb.)

 

Book organization

Chapter 1. Introduces basic CUDA concepts and the tools needed to build and debug CUDA applications. Simple examples are provided that demonstrate both the Thrust C++ and C runtime APIs. Three simple rules for high-performance GPU programming are introduced.
Chapter 2. Using only techniques introduced in Chapter 1, this chapter provides a complete, general-purpose machine-learning and optimization framework that can run 341 times faster than a single core of a conventional processor. Core concepts in machine learning and numerical optimization are also covered, which will be of interest to those who desire the domain knowledge as well as the ability to program GPUs.

Chapter 3. Profiling is the focus of this chapter, as it is an essential skill in high-performance programming. The CUDA profiling tools are introduced and applied to the real-world example from Chapter 2. Some surprising bottlenecks in the Thrust API are uncovered. Introductory data-mining techniques are discussed and data-mining functors for both Principal Components Analysis and Nonlinear Principal Components Analysis are provided, so this chapter should be of interest to users as well as programmers.

Chapter 4. The CUDA execution model is the topic of this chapter. Anyone who wishes to get peak performance from a GPU must
understand the concepts covered in this chapter. Examples and profiling output are provided to help understand both what the GPU is doing
and how to use the existing tools to see what is happening. 

Chapter 5. CUDA provides several types of memory on the GPU. Each type of memory is discussed, along with the advantages and disadvantages.
Chapter 6. With a performance difference of over three orders of magnitude between the fastest and slowest GPU memory, efficiently using memory on the GPU is the only path to high performance. This chapter discusses techniques and provides profiler output to help you understand and monitor how efficiently your applications use memory. A general functor-based example is provided to teach how to write your own generic methods like the Thrust API.
Chapter 7. GPUs provide multiple forms of parallelism, including multiple GPUs, asynchronous kernel execution, and a Unified Virtual
Address (UVA) space. This chapter provides examples and profiler output to understand and utilize all forms of GPU parallelism.
Chapter 8. CUDA has matured to become a viable platform for all application development for both GPU and multicore processors. Pathways
to multiple CUDA backends are discussed, and examples and profiler output to effectively run in heterogeneous multi-GPU environments are
provided. CUDA libraries and how to interface CUDA and GPU computing with other high-level languages like Python, Java, R, and FORTRAN are
covered. 

Chapter 9. With the focus on the use of CUDA to accelerate computational tasks, it is easy to forget that GPU technology is also a splendid platform for visualization. This chapter discusses primitive restart and how it can dramatically accelerate visualization and gaming applications. A complete working example is provided that allows the reader to create and fly around in a 3D world. Profiler output is used to demonstrate why primitive restart is so fast. The teaching framework from this chapter is extended to work with live video streams in Chapter 12.
Chapter 10. To teach scalability, as well as performance, the example from Chapter 3 is extended to use MPI (Message Passing Interface). A variant of this example code has demonstrated near-linear scalability to 500 GPGPUs (with a peak of over 500,000 single-precision gigaflops) and delivered over one-third petaflop (10^15 floating-point operations per second) using 60,000 x86 processing cores.
Chapter 11. No book can cover all aspects of the CUDA tidal wave. This is a survey chapter that points the way to other projects that provide free
working source code for a variety of techniques, including Support Vector Machines (SVM), Multi-Dimensional Scaling (MDS), mutual
information, force-directed graph layout, molecular modeling, and others. Knowledge of these projects—and how to interface with other
high-level languages, as discussed in Chapter 8—will help you mature as a CUDA developer.
Chapter 12. A working real-time video streaming example for vision recognition based on the visualization framework in Chapter 9 is
provided. All that is needed is an inexpensive webcam or a video file so that you too can work with real-time vision recognition. This example
was designed for teaching, so it is easy to modify. Robotics, augmented reality games, and data fusion for heads-up displays are obvious
extensions to the working example and technology discussion in this chapter.
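
As a side note on the "functor-based" generic methods mentioned for Chapter 6, here is a minimal host-only C++ sketch of the idea. It is not the book's code and it does not use Thrust or CUDA. The names `Square`, `Plus`, `transform_reduce_sketch`, and `sum_of_squares` are hypothetical and chosen only for illustration. The point it tries to show is that passing small function objects as template parameters lets one generic routine be reused for many computations, which is the general shape of algorithms such as Thrust's `transform_reduce`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical functor: squares its argument (applied to each element).
struct Square {
    double operator()(double x) const { return x * x; }
};

// Hypothetical functor: combines two partial results by addition.
struct Plus {
    double operator()(double a, double b) const { return a + b; }
};

// Generic transform-reduce (sequential, host-only sketch).
// Applies `unary` to every element and folds the results with `binary`,
// starting from `init`. Because the functor types are template parameters,
// the compiler can inline the calls instead of going through pointers.
template <typename T, typename UnaryOp, typename BinaryOp>
T transform_reduce_sketch(const std::vector<T>& data, T init,
                          UnaryOp unary, BinaryOp binary) {
    T acc = init;
    for (std::size_t i = 0; i < data.size(); ++i) {
        acc = binary(acc, unary(data[i]));
    }
    return acc;
}

// Convenience wrapper: sum of squares of the input values.
double sum_of_squares(const std::vector<double>& v) {
    return transform_reduce_sketch(v, 0.0, Square(), Plus());
}
```

Calling `sum_of_squares` on the values 1, 2, 3, 4 returns 30. Swapping in a different unary or binary functor changes the computation without touching the loop; a GPU version would presumably run that loop in parallel across threads, which this sketch does not attempt.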

Reposted from: https://www.cnblogs.com/JohnShao/archive/2012/10/29/2745575.html
