Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
Record Type: Electronic resources : Monograph/item
Title: Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
Author: Lee, Shin-Ying.
Published: Ann Arbor : ProQuest Dissertations & Theses, 2017.
Description: 161 p.
Notes: Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
Contained By: Dissertation Abstracts International, 79-01B(E).
Subject: Computer engineering.
Online resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617259
ISBN: 9780355159783
LDR  03629nmm a2200349 4500
001  2126984
005  20171128112459.5
008  180830s2017 ||||||||||||||||| ||eng d
020     $a 9780355159783
035     $a (MiAaPQ)AAI10617259
035     $a AAI10617259
040     $a MiAaPQ $c MiAaPQ
100  1  $a Lee, Shin-Ying. $3 555496
245  10 $a Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
260  1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2017
300     $a 161 p.
500     $a Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
500     $a Adviser: Carole-Jean Wu.
502     $a Thesis (Ph.D.)--Arizona State University, 2017.
520     $a With their massive multithreading capability, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPU). However, offloading computation to GPUs does not always yield a good performance improvement, mainly because of three inefficiencies in modern GPU and system architectures.
520     $a First, not all parallel threads carry a uniform amount of work, so the GPU's computational capacity is not fully utilized; this causes a sub-optimal performance problem called warp criticality. To mitigate warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts the critical warp and accelerates its execution by allocating it larger execution time slices and additional cache resources. The evaluation shows that with CAWA, GPUs achieve an average speedup of 1.23x.
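The warp-acceleration idea described in this abstract can be illustrated with a toy scheduler. This is a hypothetical sketch, not the dissertation's implementation: the prediction heuristic (most remaining work), the slice sizes, and the `schedule` function are all illustrative assumptions.

```python
# Toy criticality-aware warp scheduler: each round, the warp predicted
# critical (largest remaining work) gets a larger execution time slice,
# so the slowest warp no longer dictates total execution time.

def schedule(warps, base_slice=1, critical_bonus=2):
    """Run all warps to completion; return the number of rounds taken.

    warps: dict mapping warp id -> remaining instructions.
    The predicted critical warp executes base_slice + critical_bonus
    instructions per round; every other warp executes base_slice.
    """
    work = dict(warps)
    rounds = 0
    while work:
        critical = max(work, key=work.get)   # criticality prediction
        for wid in list(work):
            bonus = critical_bonus if wid == critical else 0
            work[wid] -= base_slice + bonus
            if work[wid] <= 0:
                del work[wid]                # warp finished
        rounds += 1
    return rounds
```

With one long warp and two short ones, e.g. `schedule({0: 10, 1: 4, 2: 4})`, accelerating the critical warp finishes in fewer rounds than plain round-robin (`critical_bonus=0`), mirroring the speedup mechanism the abstract describes.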
520     $a Second, the shared cache storage in GPUs is often insufficient to accommodate the demands of the large number of concurrent threads. As a result, cache thrashing is common in GPU cache memories, particularly the L1 data caches. To alleviate this contention and thrashing, I develop an instruction-aware Control Loop Based Adaptive Bypassing algorithm, called Ctrl-C. Ctrl-C learns cache reuse behavior and bypasses a portion of memory requests with the help of feedback control loops. The evaluation shows that Ctrl-C effectively improves GPU cache utilization and achieves an average speedup of 1.42x for cache-sensitive GPGPU workloads.
520     $a Finally, in a heterogeneous system, GPU workloads and the co-located processes running on the host chip multiprocessor (CMP) can contend for memory resources at multiple levels, resulting in significant performance degradation. To maximize system throughput and balance the performance degradation of all co-located applications, I design a scalable performance degradation predictor specifically for heterogeneous systems, called HeteroPDP. HeteroPDP predicts application execution time and schedules OpenCL workloads onto different devices according to the optimization goal. The evaluation shows that HeteroPDP improves system fairness from 24% to 65% when an OpenCL application is co-located with other processes, and gains an additional 50% speedup compared with always offloading the OpenCL workload to the GPU.
520     $a In summary, this dissertation aims to provide insights for future microarchitecture and system architecture designs by identifying, analyzing, and addressing three critical performance problems in modern GPUs.
590     $a School code: 0010.
650  4  $a Computer engineering. $3 621879
650  4  $a Computer science. $3 523869
650  4  $a Electrical engineering. $3 649834
690     $a 0464
690     $a 0984
690     $a 0544
710  2  $a Arizona State University. $b Computer Engineering. $3 3289092
773  0  $t Dissertation Abstracts International $g 79-01B(E).
790     $a 0010
791     $a Ph.D.
792     $a 2017
793     $a English
856  40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617259
Items (1 record):
Inventory Number: W9337589
Location Name: 電子資源 (Electronic resources)
Item Class: 01.外借(書)_YB
Material Type: 電子書 (E-book)
Call Number: EB
Usage Class: 一般使用 (Normal)
Loan Status: On shelf
No. of Reservations: 0