Sorting on GPU

In the previous article I introduced the concept of the Oracle data computation on GPU. As an example a simple number addition was provided. But probably the example is too simple to be interested. This article is more complicated and it shows how to manipulate data arrays. The target is to get an Oracle integer array from the test table, to download this array to the GPU, to perform an array sorting on the GPU and to provide a sorted array to the user. Let’s see.

The main PL/SQL code is

function fnc_call_external_array (l_collection in typ_sum) return typ_sum
is external
language C
name "array_sort"
library external_c_lib
with context
parameters (context, l_collection by reference OCIColl, l_collection indicator short);
function fnc_load_array (l_cursor in sys_refcursor) return typ_sum pipelined
g_row_limit simple_integer:=1000000;
l_collection typ_sum;
ll_collection typ_sum;
cursor ll_cursor is select column_value from table(pkg_fcbmath.fnc_call_external_array(l_collection));
    fetch l_cursor bulk collect into l_collection limit g_row_limit;
    pragma inline(fnc_call_external_array,'YES');
    open ll_cursor;
       fetch ll_cursor bulk collect into ll_collection limit g_row_limit;
       for i in 1 .. ll_collection.count loop pipe row(ll_collection(i)); end loop;
       exit when ll_cursor%notfound;
    end loop;
    close ll_cursor;.
    exit when l_cursor%notfound;
 end loop;
 close l_cursor;.
end fnc_load_array;

C/C++ code looks like

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sort.h>
#include <oci.h>

extern "C" OCIColl *array_sort(OCIExtProcContext *p_context, OCIColl *p_collection)
    sb4 arraySize = 0;
    OCIEnv *p_ocienv = (OCIEnv *)0;
    OCISvcCtx *p_ocisvcctx = (OCISvcCtx *)0;
    OCIError *p_ocierror = (OCIError *)0;
    OCINumber *p_ocinum = (OCINumber *)0;
    sb4 index = 0;
    short size_of_element=sizeof(int);
    boolean exists;
if ((OCIExtProcGetEnv(p_context,&p_ocienv,&p_ocisvcctx,&p_ocierror)!=OCI_SUCCESS)||(OCICollSize(p_ocienv,p_ocierror,p_collection,&arraySize)!=OCI_SUCCESS))
    {return NULL;}

    thrust::host_vector<int> h_vector(arraySize,0)

for (index=0; index < arraySize; index++) {
     OCICollGetElem(p_ocienv, p_ocierror, p_collection, index, &exists, (void**)&p_ocinum, 0);
     OCINumberToInt(p_ocierror, p_ocinum, size_of_element, OCI_NUMBER_SIGNED,(dvoid *)&h_vector[index]);

    thrust::device_vector<int> d_vector=h_vector;

for (index=0; index < arraySize; index++) {
    OCINumberFromInt(p_ocierror, &h_vector[index], size_of_element, OCI_NUMBER_SIGNED,p_ocinum);
    OCICollAssignElem(p_ocienv, p_ocierror, index, (const void *)p_ocinum, (dvoid *)0, p_collection);
return p_collection;

And the final SQL statement to test all above is

select column_value from table(PKG_FCBMATH.FNC_LOAD_ARRAY(cursor(select FLOAT_VALUE from test_schema.TBL_FLOAT_VALUES_PART)));

You do not need to copy/paste all the code above if you want to try it. Just download an archive file. Please follow readme.txt file. In case of C/C++ compilation problems please refer to the Prerequisites section of the previous article

Test results
A test host equipped with Intel Core2 Duo (2.33GHz, 2 CPU cores), Nvidia GeForce 8800 GS GPU, Oracle 12c SE2, Oracle Linux 6.8. The test table contained 789896 integer number rows. The test statement was run in Oracle SQL*Plus console utility. 19 seconds were spent to perform CPU only sorting in descending order with full data console output. 22 seconds were spent to perform GPU sorting in descending order with full data console output.

Questions? Propositions? Comments? Let me know what you think via email


3 thoughts on “Sorting on GPU”

  1. Hello! This post couldn’t be written any better! Reading through this post reminds me of my old room mate! He always kept chatting about this. I will forward this write-up to him. Fairly certain he will have a good read. Thanks for sharing!

Comments are closed.