Apache Spark

Using NVIDIA GPU-Accelerated XGBoost and Apache Spark to Reduce Training Time and Cost

This blog shows how to use XGBoost with Spark notebooks, and walks through the setup steps necessary to take advantage of NVIDIA GPUs to significantly reduce training time and cost.


We illustrate the benefits of GPU acceleration with a real-world use case from NVIDIA's GeForce NOW team and show you how to enable it in your own notebooks.

About XGBoost 

XGBoost is an open source library that provides a gradient boosting framework usable from many programming languages (Python, Java, R, Scala, C++ and more). XGBoost can run on a single machine or on multiple machines under several different distributed processing frameworks (Apache Hadoop, Apache Spark, Apache Flink). XGBoost models can be trained on both CPUs and GPUs. However, data scientists on the GeForce NOW team ran into significant challenges with cost and training time when using CPU-based XGBoost.

GeForce NOW Use Case 

GeForce NOW is NVIDIA's cloud-based game-streaming service, delivering real-time gameplay straight from the cloud to laptops, desktops, SHIELD TVs, and Android devices. Network traffic latency issues can affect a gamer's user experience. GeForce NOW uses an XGBoost model to predict the network quality of numerous internet transit providers so that a gamer's network traffic can be routed through the transit vendor with the highest predicted network quality. XGBoost models are trained using gaming session network metrics for every internet service provider. GeForce NOW generates billions of network traffic events per day, consisting of structured and unstructured data. NVIDIA's big data platform merges data from different sources and generates a network traffic data record for each gaming session, which is used as training data.

Because network traffic varies dramatically over the course of a day, the prediction model must be re-trained frequently with the most recent GeForce NOW data. Given a myriad of features and large datasets, NVIDIA GeForce NOW data scientists rely on hyperparameter search to build highly accurate models. For a dataset of many millions of rows and a non-trivial number of features, CPU model training with Hyperopt takes more than 20 hours on a single AWS r5.4xlarge CPU instance. Even with a scale-out approach using two CPU server instances, training still takes 6 hours, with spiraling infrastructure costs.
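As a rough illustration of this kind of hyperparameter search, the following is a minimal sketch of tuning an XGBoost regressor with Hyperopt. The search space, parameter ranges, and the load_training_data helper are hypothetical and are not the GeForce NOW team's actual configuration.

# Minimal Hyperopt sketch for tuning an XGBoost regressor (illustrative only).
# The search space and the load_training_data helper are hypothetical.
import numpy as np
import xgboost
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_training_data()  # hypothetical helper returning features and labels
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2)

search_space = {
    'max_depth': hp.choice('max_depth', list(range(4, 12))),
    'learning_rate': hp.loguniform('learning_rate', -5, 0),
    'subsample': hp.uniform('subsample', 0.5, 1.0),
}

def objective(params):
    model = xgboost.XGBRegressor(objective='reg:squarederror',
                                 tree_method='hist',  # 'gpu_hist' for GPU training
                                 **params)
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_valid, model.predict(X_valid)))
    return {'loss': rmse, 'status': STATUS_OK}

best = fmin(fn=objective, space=search_space, algo=tpe.suggest,
            max_evals=50, trials=Trials())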

Unleashing the Power of NVIDIA GPU-accelerated XGBoost 

A recent NVIDIA developer blog illustrated the significant benefits of GPU-accelerated XGBoost model training. NVIDIA data scientists followed a similar approach to achieve a 22x speedup and 8x cost savings compared to CPU-based XGBoost. As illustrated in Figure 1, a GeForce NOW production network traffic dataset with 40 million rows and 32 features required just 18 minutes on GPU for training, compared to 3.2 hours (191 minutes) on CPU. In addition, the right-hand side of Figure 1 compares CPU cluster costs and GPU cluster costs, which include both AWS instance and Databricks runtime costs.

Watch this space to learn about new data science use cases that leverage GPUs and Apache Spark 3.0 on Databricks 7.x ML runtimes.

Regarding model performance, the trained XGBoost models were compared on four different metrics:

Root mean squared error 

Mean absolute error

Mean absolute percentage error

Correlation coefficient 

The NVIDIA GPU-based XGBoost model had similar accuracy across all of these metrics.
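For reference, here is a minimal sketch of how these four metrics can be computed with NumPy and scikit-learn; y_true and y_pred are assumed to be arrays of actual and predicted values, and this is not the team's actual evaluation code.

# Minimal sketch of the four comparison metrics (y_true and y_pred are
# assumed arrays of actual and predicted values).
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))         # root mean squared error
mae = mean_absolute_error(y_true, y_pred)                   # mean absolute error
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100    # mean absolute percentage error
corr = np.corrcoef(y_true, y_pred)[0, 1]                    # correlation coefficient
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  MAPE={mape:.2f}%  corr={corr:.4f}")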

Now that we have seen the performance and cost savings, we will next cover the setup and best practices for running a sample XGBoost notebook on a Databricks GPU cluster.

Fast Start on NVIDIA GPU-accelerated XGBoost on Databricks 

Databricks supports XGBoost on several ML runtimes. Here is a well-written user guide for running XGBoost on a single node and on multiple nodes.

To run XGBoost on GPU, you just need to make the following changes: 

Set up a Spark cluster with GPU instances (instead of CPU instances) 

Alter your XGBoost training code to switch the 'tree_method' parameter from 'hist' to 'gpu_hist'

Set up data loading (a minimal sketch is shown after this list)
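For the data loading step, a minimal sketch might look like the following. The Parquet path and the label column name are placeholders, and for very large datasets you may prefer a distributed loading approach instead of collecting to the driver.

# Minimal data-loading sketch for the single-node XGBoost Python API.
# The Parquet path and the 'label' column name are hypothetical placeholders.
train_sdf = spark.read.parquet("/path/to/network_traffic_training_data")
train_pdf = train_sdf.toPandas()   # collects to the driver; use with care on very large data

y_train = train_pdf["label"]
X_train = train_pdf.drop(columns=["label"])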

Set Up NVIDIA GPU Cluster for XGBoost Training 

To run NVIDIA GPU-based XGBoost training, you need to set up your Spark cluster with GPUs and the proper Databricks ML runtime.

We used a p2.xlarge (61.0 GB memory, 1 GPU, 1.22 DBU) instance for the driver node and two p3.2xlarge (61.0 GB memory, 1 GPU, 4.15 DBU) instances for the worker nodes.

We picked 6.3 ML (includes Apache Spark 2.4.4, GPU, Scala 2.11) as our Databricks runtime version. Any Databricks ML runtime with GPUs should work for running XGBoost on Databricks. 
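If you prefer to script cluster creation rather than use the UI, the following is an illustrative sketch using the Databricks Clusters REST API. The workspace URL, access token, and the exact GPU ML runtime version string are assumptions and should be replaced with your own values.

# Illustrative sketch of creating a similar GPU cluster via the Databricks
# Clusters REST API (2.0). The workspace URL, access token, and the runtime
# version string are assumptions; creating the cluster in the UI works just as well.
import requests

payload = {
    "cluster_name": "xgboost-gpu-training",
    "spark_version": "6.3.x-gpu-ml-scala2.11",   # assumed key for the 6.3 ML GPU runtime
    "driver_node_type_id": "p2.xlarge",
    "node_type_id": "p3.2xlarge",
    "num_workers": 2,
}
resp = requests.post(
    "https://<your-workspace>.cloud.databricks.com/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])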

Code Change on 'tree_method' Parameter 

After starting the cluster, in your XGBoost notebook you need to change the tree_method parameter from 'hist' to 'gpu_hist'.

For CPU-based training: 

xgb_reg = xgboost.XGBRegressor(objective='reg:squarederror', ..., tree_method='hist') 

For GPU-based training: 

xgb_reg = xgboost.XGBRegressor(objective='reg:squarederror', ..., tree_method='gpu_hist') 
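No other changes to the training code are required. As a quick end-to-end sanity check, here is a minimal sketch with synthetic placeholder data (not GeForce NOW data) showing that training and prediction proceed exactly as in the CPU case:

# Minimal end-to-end sketch with synthetic data: the only GPU-specific change
# is tree_method='gpu_hist'.
import numpy as np
import xgboost

rng = np.random.default_rng(0)
X = rng.random((10_000, 32))                       # 32 features, mirroring the dataset above
y = X @ rng.random(32) + 0.1 * rng.normal(size=10_000)

xgb_reg = xgboost.XGBRegressor(objective='reg:squarederror', tree_method='gpu_hist')
xgb_reg.fit(X, y)
preds = xgb_reg.predict(X)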

Getting Started with GPU Model Training 

NVIDIA's GPU-accelerated XGBoost helped GeForce NOW meet the service-level goal of training the model every eight hours, and significantly reduced costs. Switching from CPU-based XGBoost to a GPU-accelerated version was straightforward. If you're also struggling with long training times or high training costs, we encourage you to give it a try.
