completed mwd report
This commit is contained in:
BIN
mwd_gpu/figs/amdNormTimeCost_full.png
Normal file
BIN
mwd_gpu/figs/amdNormTimeCost_full.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 48 KiB |
BIN
mwd_gpu/figs/amdTimeCost_full.png
Normal file
BIN
mwd_gpu/figs/amdTimeCost_full.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 48 KiB |
Binary file not shown.
|
Before Width: | Height: | Size: 51 KiB After Width: | Height: | Size: 49 KiB |
@@ -55,38 +55,21 @@ detector.
|
|||||||
\label{sec:set_up}
|
\label{sec:set_up}
|
||||||
\subsection{Hardware}%
|
\subsection{Hardware}%
|
||||||
\label{sub:hardware}
|
\label{sub:hardware}
|
||||||
There are two consumer computers used in this study:
|
A consumer computer is used in this study:
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item PC 1\@:
|
\item CPU\@: AMD Ryzen 5 2400G, running at \SI{3.60}{\giga\hertz},
|
||||||
\begin{itemize}
|
maximum frequency \SI{3.90}{\giga\hertz}
|
||||||
\item CPU\@: AMD Ryzen 5 2400G, running at \SI{3.60}{\giga\hertz},
|
\item RAM\@: \SI{16}{GB} DDR4 3000
|
||||||
maximum frequency \SI{3.90}{\giga\hertz}
|
\item GPU\@: GeForce GTX 1060, DDR5 memory \SI{6}{GB},
|
||||||
\item GPU\@: GeForce GTX 1060, DDR5 memory \SI{6}{GB},
|
maximum frequency \SI{1.70}{\giga\hertz}
|
||||||
maximum frequency \SI{1.70}{\giga\hertz}
|
|
||||||
\end{itemize}
|
|
||||||
\item PC 2\@:
|
|
||||||
\begin{itemize}
|
|
||||||
\item CPU\@: Intel Core i5\hyp{}4590 CPU, running at \SI{3.30}{\giga\hertz},
|
|
||||||
maximum frequency \SI{3.70}{\giga\hertz}
|
|
||||||
\item GPU\@: GeForce GTX 1650, DDR5 memory \SI{4}{GB},
|
|
||||||
maximum frequency \SI{1.70}{\giga\hertz}
|
|
||||||
\end{itemize}
|
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
\subsection{Software}%
|
\subsection{Software}%
|
||||||
\label{sub:software}
|
\label{sub:software}
|
||||||
The computers run two different versions of Linux:
|
The OS is Debian 10.2, with following compilers:
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item PC 1\@: CentOS 7.2
|
\item gcc (Debian 8.3.0\hyp{}6) 8.3.0
|
||||||
\begin{itemize}
|
\item nvcc V9.2.148, CUDA 10.1, driver version 418.74
|
||||||
\item gcc
|
|
||||||
\item CUDA
|
|
||||||
\end{itemize}
|
|
||||||
\item PC 2\@: Debian 10.2
|
|
||||||
\begin{itemize}
|
|
||||||
\item gcc (Debian 8.3.0\hyp{}6) 8.3.0
|
|
||||||
\item nvcc V9.2.148, CUDA 10.1, driver version 418.74
|
|
||||||
\end{itemize}
|
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
\section{Implementations}%
|
\section{Implementations}%
|
||||||
@@ -104,7 +87,8 @@ There are two implementations of the MWD algorithm:
|
|||||||
This implementation uses raw array wrapped in a \codeword{struct} to represent
|
This implementation uses raw array wrapped in a \codeword{struct} to represent
|
||||||
waveforms, pointers are managed manually. There are 3 methods
|
waveforms, pointers are managed manually. There are 3 methods
|
||||||
\codeword{Deconvolute}, \codeword{OffsetDifferentiate}, and
|
\codeword{Deconvolute}, \codeword{OffsetDifferentiate}, and
|
||||||
\codeword{MovingAverage} corresponds to 3 stages of the MWD algorithm.
|
\codeword{MovingAverage} corresponds to 3 stages of the MWD algorithm. The code
|
||||||
|
is run on a single thread.
|
||||||
|
|
||||||
The related files are:
|
The related files are:
|
||||||
|
|
||||||
@@ -182,12 +166,44 @@ Related files are:
|
|||||||
]
|
]
|
||||||
\end{forest}
|
\end{forest}
|
||||||
|
|
||||||
|
\subsection{Benchmarking}%
|
||||||
|
\label{sub:benchmarking}
|
||||||
|
Several tests are done with different waveform lengths from
|
||||||
|
\numrange{246}{250000} to show performance of these two implementations. When
|
||||||
|
a waveform length less than that of the original input is requested, successive
|
||||||
|
sub\hyp{}waveforms from the original are fed into the algorithms. At each
|
||||||
|
waveform length, the calculations are repeated from \numrange{1000}{3000}
|
||||||
|
times, time costs of I/O and each MWD stages are recorded.
|
||||||
|
|
||||||
\section{Results}%
|
\section{Results}%
|
||||||
\label{sec:results}
|
\label{sec:results}
|
||||||
|
Average time costs of the two implementations as functions of waveform length are
|
||||||
|
shown in \cref{fig:amdTimeCost_full}. The CPU is much quicker the GPU when data
|
||||||
|
size is small, but the CPU time grows linearly much faster than the GPU time.
|
||||||
|
For waveforms of about \num{10000}-sample long, the two codes take roughly the
|
||||||
|
same time to complete. The GPU is about 30\% faster than the CPU when
|
||||||
|
processing \num{250000}-sample long waveforms. The growing rates are shown more
|
||||||
|
clearly on \cref{fig:amdNormTimeCost_full}, the CPU time cost grows about twice as
|
||||||
|
fast as the GPU time cost does.
|
||||||
|
|
||||||
|
\begin{figure}[tbp]
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=0.9\linewidth]{figs/amdTimeCost_full.png}
|
||||||
|
\caption{Time costs of C++ (blue) and CUDA (red) codes as functions of
|
||||||
|
waveform length.}%
|
||||||
|
\label{fig:amdTimeCost_full}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
\begin{figure}[tbp]
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=0.9\linewidth]{figs/amdNormTimeCost_full.png}
|
||||||
|
\caption{Normalized time costs of C++ (blue) and CUDA (red) codes as
|
||||||
|
functions of waveform length.}%
|
||||||
|
\label{fig:amdNormTimeCost_full}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
\section{Code}%
|
\section{Code}%
|
||||||
\label{sec:code}
|
\label{sec:code}
|
||||||
A tarball of the code is attached as \codeword{mwd.tar.gz}
|
A tarball of the code is attached as \codeword{mwd.tar.gz}
|
||||||
|
|
||||||
|
|
||||||
\end{document}
|
\end{document}
|
||||||
|
|||||||
Reference in New Issue
Block a user