\documentclass[11pt]{article}
\usepackage{mhchem}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{textcomp}
\usepackage{epsfig}
\usepackage{hyperref}
\usepackage{hyphenat}
\usepackage[noabbrev, capitalize]{cleveref} % hyperref must be loaded first
\usepackage[
  detect-weight=true,
  per=slash,
  detect-family=true,
  separate-uncertainty=true]{siunitx}

\usepackage[dvipsnames]{xcolor}
\usepackage{upquote}

\usepackage[framemethod=tikz]{mdframed}
\usepackage{adjustbox}
\usepackage{listings}
\usepackage{xparse}
\NewDocumentCommand{\codeword}{v}{%
  \texttt{\textcolor{blue}{#1}}%
}
\lstset{language=C++,keywordstyle={\bfseries \color{blue}}}

\usepackage{forest}

\begin{document}
\title{Moving window deconvolution implementation using a GPU}
\author{Nam H. Tran \\ Boston University}
\date{\today}
\maketitle

\section{Introduction}%
\label{sec:introduction}
The purpose of this work is to assess the feasibility and performance of using
a GPU to process raw waveforms from an HPGe detector. The expected result of the
MWD algorithm is shown in \cref{fig:mwdInputOutput}. The input is a
\num{250000}-sample waveform taken at the preamplifier output of the HPGe
detector. The MWD algorithm transforms each step in the input waveform into a
flat-top peak whose height is proportional to the charge deposited in the
detector.

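
For reference, one common formulation of the three MWD stages is sketched
below. It is an assumption about the variant used, not a transcription of the
attached code: $x[n]$ denotes the sampled preamplifier signal, $\tau$ the
preamplifier decay constant in samples, $M$ the deconvolution window length,
and $L \le M$ the averaging length, none of which are specified in this note.
Samples with negative indices are taken as zero.
\[
  A[n] = x[n] + (1 - \alpha) \sum_{k=0}^{n-1} x[k],
  \qquad \alpha = e^{-1/\tau},
\]
\[
  D[n] = A[n] - A[n - M],
\]
\[
  T[n] = \frac{1}{L} \sum_{k=0}^{L-1} D[n - k].
\]
Under these assumptions an exponentially decaying step of height $h$ becomes a
constant level $h$ in $A[n]$, a rectangle of width $M$ in $D[n]$, and a
trapezoid in $T[n]$ whose flat top has height $h$.
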
\begin{figure}[tbp]
  \centering
  \includegraphics[width=0.90\linewidth]{figs/mwdInputOutput}
  \caption{MWD algorithm: input waveform on the left, and expected output on
  the right.}%
  \label{fig:mwdInputOutput}
\end{figure}

\section{Setup}%
\label{sec:set_up}
\subsection{Hardware}%
\label{sub:hardware}
Two consumer-grade computers are used in this study:
\begin{itemize}
  \item PC 1\@:
    \begin{itemize}
      \item CPU\@: AMD Ryzen 5 2400G, running at \SI{3.60}{\giga\hertz},
        maximum frequency \SI{3.90}{\giga\hertz}
      \item GPU\@: GeForce GTX 1060, \SI{6}{GB} of GDDR5 memory,
        maximum frequency \SI{1.70}{\giga\hertz}
    \end{itemize}
  \item PC 2\@:
    \begin{itemize}
      \item CPU\@: Intel Core i5\hyp{}4590, running at \SI{3.30}{\giga\hertz},
        maximum frequency \SI{3.70}{\giga\hertz}
      \item GPU\@: GeForce GTX 1650, \SI{4}{GB} of GDDR5 memory,
        maximum frequency \SI{1.70}{\giga\hertz}
    \end{itemize}
\end{itemize}

\subsection{Software}%
\label{sub:software}
The two computers run different Linux distributions:
\begin{itemize}
  \item PC 1\@: CentOS 7.2
    \begin{itemize}
      \item gcc
      \item CUDA
    \end{itemize}
  \item PC 2\@: Debian 10.2
    \begin{itemize}
      \item gcc (Debian 8.3.0\hyp{}6) 8.3.0
      \item nvcc V9.2.148, CUDA 10.1, driver version 418.74
    \end{itemize}
\end{itemize}

\section{Implementations}%
\label{sec:implementations}
There are two implementations of the MWD algorithm:
\begin{itemize}
  \item C++ implementation: performs all calculations on the CPU. It is
    used to verify the accuracy of the CUDA code and serves as a performance
    benchmark.
  \item CUDA implementation: offloads the digital pulse processing onto
    the GPU; the CPU handles only input/output tasks.
\end{itemize}

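
The verification and benchmarking roles of the CPU code can be illustrated
with two small helpers. This is only a sketch under stated assumptions: the
names \codeword{MaxAbsDifference} and \codeword{ElapsedMilliseconds} are
hypothetical and do not come from the attached code, and single-precision
samples are assumed.

\begin{lstlisting}
#include <chrono>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: largest absolute difference between a reference
// waveform (e.g. CPU output) and a candidate (e.g. GPU output).
// Returns infinity when the lengths differ.
double MaxAbsDifference(const std::vector<float>& reference,
                        const std::vector<float>& candidate) {
  if (reference.size() != candidate.size()) {
    return std::numeric_limits<double>::infinity();
  }
  double worst = 0.0;
  for (std::size_t i = 0; i < reference.size(); ++i) {
    const double diff = std::fabs(static_cast<double>(reference[i]) -
                                  static_cast<double>(candidate[i]));
    if (diff > worst) worst = diff;
  }
  return worst;
}

// Hypothetical helper: wall-clock time of one call, in milliseconds.
template <typename Callable>
double ElapsedMilliseconds(Callable&& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
\end{lstlisting}

A comparison would then accept the GPU result when
\codeword{MaxAbsDifference} stays below a tolerance chosen for the sample
precision. Any GPU work timed this way must be synchronized before the clock
is stopped, since kernel launches return before the kernel finishes.
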
\subsection{C++ code}%
\label{sub:c_code}
This implementation represents waveforms as raw arrays wrapped in a
\codeword{struct}; pointers are managed manually. Three methods,
\codeword{Deconvolute}, \codeword{OffsetDifferentiate}, and
\codeword{MovingAverage}, correspond to the three stages of the MWD algorithm.

The related files are:

\begin{forest}
  for tree={
    font=\ttfamily,
    grow'=0,
    child anchor=west,
    parent anchor=south,
    anchor=west,
    calign=first,
    edge path={
      \noexpand\path [draw, \forestoption{edge}]
      (!u.south west) +(7.5pt,0) |- node[fill,inner sep=1.25pt] {} (.child anchor)\forestoption{edge label};
    },
    before typesetting nodes={
      if n=1
        {insert before={[,phantom]}}
        {}
    },
    fit=band,
    before computing xy={l=15pt},
  }
  [mwd
    [mwd.c]
    [srcs
      [vector.h]
      [vector.c]
      [algo.h]
      [algo.c]
    ]
  ]
\end{forest}

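
The three CPU stages can be sketched as follows. This is a minimal
illustration, not the attached code: it uses \codeword{std::vector} instead
of a manually managed raw array, and the parameters \codeword{tau} (decay
constant in samples), \codeword{M} (window length), and \codeword{L}
(averaging length) are assumed names. Samples before the start of the
waveform are treated as zero.

\begin{lstlisting}
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative container; the attached code wraps a raw array instead.
struct Waveform {
  std::vector<double> samples;
};

// Stage 1: undo the exponential decay of the preamplifier.
// out[n] = in[n] + (1 - alpha) * sum(in[0..n-1]), alpha = exp(-1/tau).
Waveform Deconvolute(const Waveform& in, double tau) {
  const double alpha = std::exp(-1.0 / tau);
  Waveform out;
  out.samples.resize(in.samples.size());
  double runningSum = 0.0;  // sum of in.samples[0..n-1]
  for (std::size_t n = 0; n < in.samples.size(); ++n) {
    out.samples[n] = in.samples[n] + (1.0 - alpha) * runningSum;
    runningSum += in.samples[n];
  }
  return out;
}

// Stage 2: subtract the sample M positions earlier (zero if none).
Waveform OffsetDifferentiate(const Waveform& in, std::size_t M) {
  Waveform out;
  out.samples.resize(in.samples.size());
  for (std::size_t n = 0; n < in.samples.size(); ++n) {
    const double delayed = (n >= M) ? in.samples[n - M] : 0.0;
    out.samples[n] = in.samples[n] - delayed;
  }
  return out;
}

// Stage 3: causal moving average over the last L samples.
Waveform MovingAverage(const Waveform& in, std::size_t L) {
  Waveform out;
  out.samples.resize(in.samples.size());
  if (L == 0) return out;  // degenerate window: output stays zero
  double windowSum = 0.0;
  for (std::size_t n = 0; n < in.samples.size(); ++n) {
    windowSum += in.samples[n];
    if (n >= L) windowSum -= in.samples[n - L];
    out.samples[n] = windowSum / static_cast<double>(L);
  }
  return out;
}
\end{lstlisting}

Chaining the stages as
\codeword{MovingAverage(OffsetDifferentiate(Deconvolute(w, tau), M), L)}
turns an exponentially decaying step of unit height into a trapezoid with a
flat top of height one, provided \codeword{L} does not exceed \codeword{M}.
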
\subsection{CUDA code}%
\label{sub:cuda_code}
The CUDA code implements three GPU functions, \codeword{gpuDeconvolute},
\codeword{gpuOffsetDifferentiate}, and \codeword{gpuMovingAverage}, which
replace the three C++ methods of the other implementation. There are also
helpers for moving data between main memory and GPU memory, error checking,
and timing. The I/O part is the same as in the C++ version.

The related files are:

\begin{forest}
  for tree={
    font=\ttfamily,
    grow'=0,
    child anchor=west,
    parent anchor=south,
    anchor=west,
    calign=first,
    edge path={
      \noexpand\path [draw, \forestoption{edge}]
      (!u.south west) +(7.5pt,0) |- node[fill,inner sep=1.25pt] {} (.child anchor)\forestoption{edge label};
    },
    before typesetting nodes={
      if n=1
        {insert before={[,phantom]}}
        {}
    },
    fit=band,
    before computing xy={l=15pt},
  }
  [mwd
    [gmwd.cu]
    [srcs
      [gpuAlgo.cu]
      [gpuAlgo.h]
      [gpuTimer.h]
      [gpuUtils.h]
      [prefixScan.cu]
      [prefixScan.h]
    ]
  ]
\end{forest}

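
The file name \codeword{prefixScan.cu} suggests that a prefix sum (scan) is
used on the GPU, plausibly for the running sum inside the deconvolution
stage, which is the only part of the algorithm where each output depends on
all earlier samples. That reading is an assumption, and the sketch below is
not the author's kernel. It shows, in plain C++ on the CPU, the standard
three-phase blocked scan that GPU implementations commonly follow; the name
\codeword{BlockedInclusiveScan} and the \codeword{blockSize} parameter are
hypothetical.

\begin{lstlisting}
#include <algorithm>
#include <cstddef>
#include <vector>

// Inclusive prefix sum computed block by block. Phases 1 and 3 touch
// each block independently, so on a GPU every block could be handled
// by its own thread block; only phase 2 is sequential, and it runs
// over the (much shorter) list of block totals.
std::vector<double> BlockedInclusiveScan(const std::vector<double>& in,
                                         std::size_t blockSize) {
  const std::size_t n = in.size();
  std::vector<double> out(n);
  if (n == 0) return out;
  blockSize = std::max<std::size_t>(blockSize, 1);
  const std::size_t numBlocks = (n + blockSize - 1) / blockSize;
  std::vector<double> blockOffsets(numBlocks);

  // Phase 1: scan inside each block and record the block total.
  for (std::size_t b = 0; b < numBlocks; ++b) {
    const std::size_t begin = b * blockSize;
    const std::size_t end = std::min(begin + blockSize, n);
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
      sum += in[i];
      out[i] = sum;
    }
    blockOffsets[b] = sum;
  }

  // Phase 2: exclusive scan of the totals gives each block's offset.
  double offset = 0.0;
  for (std::size_t b = 0; b < numBlocks; ++b) {
    const double total = blockOffsets[b];
    blockOffsets[b] = offset;
    offset += total;
  }

  // Phase 3: add the offset of each block to all of its elements.
  for (std::size_t b = 0; b < numBlocks; ++b) {
    const std::size_t begin = b * blockSize;
    const std::size_t end = std::min(begin + blockSize, n);
    for (std::size_t i = begin; i < end; ++i) {
      out[i] += blockOffsets[b];
    }
  }
  return out;
}
\end{lstlisting}

The result is mathematically the same as a single sequential running sum.
With floating-point samples the different summation order can change the
last bits, which is one reason to compare CPU and GPU output with a
tolerance rather than for exact equality.
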
\section{Results}%
\label{sec:results}

\section{Code}%
\label{sec:code}
A tarball of the code is attached as \codeword{mwd.tar.gz}.

\end{document}