GPU Database — A Complete Introduction
What is a GPU Database?
A GPU database uses graphics processing units (GPUs) to perform database operations. A GPU is a programmable processor designed to quickly render high resolution images and video. Because GPUs can perform parallel operations on multiple sets of data, they are now commonly adopted for non-graphical uses. A GPU database uses GPU computation power to analyze massive amounts of information and return results in milliseconds.
A Brief History of GPU Databases
GPUs — graphics processing units — were originally built to process video graphics. It didn’t take long for enterprising people to realize that GPUs had massively parallel capabilities for analyzing volumes of data. GPUs have thousands of cores that can solve difficult computational problems with great speed.
The initial idea for a GPU database was to enhance database operations by running compute workloads on the graphics shaders of GPUs. When NVIDIA launched CUDA in 2007, GPU computing quickly took hold in high performance computing (HPC) and scientific workloads. After CUDA, academic researchers started developing the first GPU-accelerated databases.
More recently, GPUs emerged as the primary enabler of deep learning. Commercial vendors building first-class compute and analytics on top of these GPUs have proliferated. OmniSci (now HEAVY.AI) launched the world’s first open source GPU database and SQL engine in 2017.
How Do GPU Databases Work?
GPU databases use standard drivers and SQL to query data. While on-premises deployments are a good option for large enterprises, most GPU database engines run in the cloud. Because an individual GPU contains so much computing power, scaling a GPU database simply requires adding more GPUs to a server rather than adding more servers. Up to 100 TB of raw data can be stored and queried in a GPU database on a standard server.
GPUs greatly accelerate operations that can be parallelized. A GPU database can return results for complex queries in milliseconds even as the dataset grows to millions or billions of records. GPU SQL databases execute SQL natively, which yields large performance gains at big data scale.
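To make "operations that can be parallelized" concrete, here is a minimal sketch of how a filtered aggregate (roughly `SELECT SUM(v) FROM t WHERE v > threshold`) can be split into independent partitions and then combined. The function names and the four-worker setting are hypothetical, chosen only for illustration. Python threads stand in for GPU cores here, so the sketch shows the decomposition and not the actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk, threshold):
    """Filter and sum one partition; no other partition is needed."""
    return sum(v for v in chunk if v > threshold)


def parallel_filtered_sum(values, threshold, workers=4):
    """Split the column into chunks, aggregate each one, then combine."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_sum(c, threshold), chunks))
    # The final reduction is cheap: one value per partition.
    return sum(partials)


values = list(range(1, 1001))  # a toy "column" holding 1..1000
total = parallel_filtered_sum(values, threshold=500)
print(total)  # sum of 501..1000 = 375250
```

Because each partition is processed without reference to the others, the same pattern can spread across thousands of cores, which is the property GPU databases rely on.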
GPU Database vs CPU Database
CPUs (central processing units) steadily became 30 to 40 percent faster every year into the early 2000s. Today, CPU processing power continues to grow by 15 to 20 percent per year. Yet CPUs are not keeping up with the explosive growth of data from multiple sources. CPUs contain only 2 to 8 cores per processor, and they tend to process data in series.
GPUs have a different architecture and processing paradigm than CPUs. This allows for more than 50 percent growth in performance each year, which gives GPUs an advantage in keeping up with today’s big data demands. GPU-based databases achieve orders-of-magnitude speedups and price-performance gains over CPU-based analytic technologies.
A CPU server might have 10 to 30 very fast cores. But a GPU server can have as many as 40,000 cores. Individual CPU cores are faster and smarter than individual GPU cores, but the sheer number of GPU cores, and the massive amount of parallelism that they offer, more than makes up the difference.
The growth of GPU speed is exceeding the rate of data growth, which is why GPUs have promise to deliver next generation platforms for accelerating analytics at scale. GPUs are also vastly outstripping CPUs in terms of memory bandwidth.
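The annual growth rates quoted in this section compound over time, which is where the widening gap comes from. The short calculation below is an illustrative sketch only: it assumes a flat 20 percent per year for CPUs and 50 percent per year for GPUs (the upper and lower ends of the figures above), and the function name and five-year horizon are hypothetical.

```python
def relative_performance(annual_growth, years):
    """Performance relative to today after compounding annual growth."""
    return (1 + annual_growth) ** years


years = 5
cpu = relative_performance(0.20, years)  # 1.2 ** 5, about 2.49x
gpu = relative_performance(0.50, years)  # 1.5 ** 5, about 7.59x
gap = gpu / cpu                          # 1.25 ** 5, about 3.05x

print(f"CPU: {cpu:.2f}x, GPU: {gpu:.2f}x, GPU/CPU gap: {gap:.2f}x")
```

Under these assumptions, a GPU that matches a CPU today would be roughly three times faster after five years.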
Benefits of Accelerated Databases
Accelerated databases provide significant benefits compared to mainstream databases, especially when it comes to repetitive queries on massive amounts of data.
The power of accelerated databases makes it easier to work with extremely large data sets or extremely fast data streams from sources such as the Internet of Things, clickstreams and business transactions.
The superior compute power of GPUs gives the GPU database a clear speed advantage for complex queries. This is important when patterns and insights need to be discovered in real time, for immediate action.
Benefits of GPU-accelerated databases include:
- Light IT footprint: Thousands of GPU processing cores and high bandwidth memory can fit on a single card, which means a dozen GPU-accelerated servers can nearly match the performance of a CPU cluster with 1,000 servers.
- Innovation velocity: The rate of efficiency improvement in GPUs is twice that of CPUs.
- Analytic efficiency: The ability to process data in real time.
Bigger Data, Better Insights
NVIDIA Showcases the Power of OmniSci (now HEAVY.AI) at GTC
We want to accelerate data scientists' work, by giving them the instrument of their science, so they can accomplish their life's work as quickly as possible.
- Jensen Huang, CEO, NVIDIA
When to Use a GPU Database?
The ability to do real-time data exploration and ingest more data faster makes analytical databases powered by GPUs attractive to data scientists and engineers working on machine learning algorithms. Speedy queries can reduce a data scientist’s wait time from hours to minutes, or even from minutes to milliseconds. This can also result in reduced IT costs and energy consumption.
Database acceleration is essential for visual analytics and machine learning support. A GPU database leverages supercomputing infrastructure to deliver SQL queries across billions of records in milliseconds. Beyond preparing sales or user reports, a GPU database can visualize high velocity data to help determine tactical decisions in real time.
What is an Open Source GPU Database?
Many databases are open source by design, which allows them to support a broad range of data analytics environments. Open source designs for analytics databases include the following features:
- Connectors to simplify integration with the most popular open source frameworks.
- Drivers for Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) that enable seamless integration with existing visualization and business intelligence tools.
- APIs to enable bindings with commonly used programming languages.
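As a rough illustration of what querying through a standard driver and language binding looks like, the sketch below uses Python's built-in `sqlite3` module purely as a stand-in. It is not a GPU database, and the `calls` table and its columns are hypothetical. The point is only the shape of the workflow: connect, send SQL, fetch rows. A GPU database exposed through a comparable driver would be queried in much the same way.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for any SQL engine.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of call records.
cur.execute("CREATE TABLE calls (region TEXT, dropped INTEGER)")
cur.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("north", 0), ("north", 1), ("south", 0), ("south", 0), ("south", 1)],
)

# A standard SQL aggregate, sent through the driver as plain text.
cur.execute(
    "SELECT region, COUNT(*) AS total, SUM(dropped) AS dropped "
    "FROM calls GROUP BY region ORDER BY region"
)
rows = cur.fetchall()
conn.close()

print(rows)  # [('north', 2, 1), ('south', 3, 1)]
```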
GPU Database Use Cases
Telco Data Analytics — A major telco provider uses a HEAVY.AI GPU database to process billions of rows of telecommunication data in real time, which allows the carrier to monitor network performance and maintain its high standards of network reliability.
Advertising — Simulmedia uses a GPU database to analyze billions of weekly television viewing records to inform advertising decisions.
Mobile Location Services — Skyhook, a mobile positioning and location provider, uses a GPU database to help run up to 10 billion transactions daily as it leverages Wi-Fi, cellular and other sensor data to refine user and device locations.
Research — The Spatiotemporal Innovation Center uses a GPU database to advance spatiotemporal thinking, computing and engineering. From disaster response to energy sustainability, solving global challenges requires an understanding of how phenomena are linked in space and time.
Which Technologies Interconnect With GPU Analytics?
GPU databases solve many of the scale, speed and interactivity limitations of earlier analytic tools, but they do not require a complete replacement of other databases, BI or GIS analysis systems. In fact, the open source nature of the HeavyDB analytic database makes it easy to integrate into existing ecosystems, in ways like the following:
- As a hyper-fast SQL engine to accelerate existing analytic processes.
- For real-time visualization of streaming data.
- To build custom apps and visualizations that benefit from server side rendering.
- To improve performance of traditional business intelligence platforms.
- To help data scientists visually explore large datasets for feature engineering and training of machine learning models.
- To move data through an end-to-end data science workflow, across many technologies, with the lowest possible latency by leveraging the open source Apache Arrow framework.
History of GPUs
1950s — MIT built a Navy flight simulator called the Whirlwind in 1951 that attempted 3D virtualization by displaying real-time text and graphics on a video terminal for the first time. It also pioneered the use of magnetic-core memory, which made digital computing more practical.
1970s — The first 3D graphics system was created to carry information from a central processor to a display screen. RCA built a video chip called the Pixie in 1976, which output a video signal. The graphics hardware for the Namco Galaxian arcade system in 1979 supported RGB color and tilemap backgrounds.
1980s — IBM displayed video in its PCs in 1981. Intel’s Video Graphics Controller Multimodule Board could display eight colors in 1983. ATI Technologies was founded in 1985 and went on to produce the Wonder line of graphics boards.
1990s — Application programming interfaces (APIs) significantly improved the level of integration of video cards. The modern GPU was created in 1995 with the first 3D add-in cards. By 1997, NVIDIA held 25 percent of the graphics market. The advent of 32-bit operating systems and lower cost personal computers helped enterprises move beyond 2D and non-PC architectures. NVIDIA announced “the world’s first GPU” in 1999 with the launch of GeForce 256.
2000s — ATI and NVIDIA competed to add more features to their graphics cards. NVIDIA released CUDA in 2007, which became a widely adopted programming model for GPU computing. HEAVY.AI launched the world’s first open source GPU database and SQL engine in 2017.