...making Linux just a little more fun!

Writing PostgreSQL Functions in C, Part Two

By Ron Peterson

Introduction

In my last article, I introduced the basic framework for creating your own PostgreSQL function in C. In this article, I'd like to expand on that introduction. I'll introduce:

I'm also going to eschew the use of the PostgreSQL extension building infrastructure I used last time, in order to illustrate the details of how PostgreSQL shared object files are built in Linux.

The same prerequisites as in my previous article still apply. All of the code presented here can be downloaded as a single tarball if you would prefer to avoid typing practice (and the consequent frustration of debugging typos, rather than code.)


The End

Before we begin, let's look at what we want to accomplish. Let's say we'd like to create a set of PostgreSQL functions that implement the features of Mark Galassi's excellent GNU Scientific Library. Let's pick one of the library's functions, gsl_complex_add, and see what we need to do to create a corresponding PostgreSQL function. When we're finished, we'll be able to write SQL statements like this:

> select gsl_complex_add( ROW( 3.2e4, -3.2 ), ROW( 4.1, 4.245e-3 ) );

   gsl_complex_add   
---------------------
 (32004.1,-3.195755)

I think it's appropriate to represent complex numbers in PostgreSQL as tuples, where the real and imaginary components get passed around together as a pair. Think of a tuple as a structure in C. The tuple concept jibes with the way we're taught to think about these things in other domains. We'll be using PostgreSQL's CREATE TYPE statement to define the composite type we use as follows:

DROP FUNCTION gsl_complex_add ( __complex, __complex );
DROP TYPE __complex;

CREATE TYPE __complex AS ( r float, i float );

CREATE OR REPLACE FUNCTION
  gsl_complex_add( __complex, __complex )
RETURNS
  __complex
AS
  'example.so', 'c_complex_add'
LANGUAGE
  C
STRICT
IMMUTABLE;

The Stuff in the Middle

OK, so now that we know what we would like to do, let's look at how we get there. I'll dump all of the code on you at one time, and follow up by trying to explain how it works. I won't spend too much time repeating what I say in the code comments though, because that would be redundant, just like this sentence.

// example.c:

// PostgreSQL includes
#include "postgres.h"
#include "fmgr.h"
// Tuple building functions and macros
#include "access/heapam.h"
#include "funcapi.h"

#include <string.h>

// GNU Scientific Library headers
#include <gsl/gsl_complex.h>
#include <gsl/gsl_complex_math.h>

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

// forward declaration to keep compiler happy
Datum c_complex_add( PG_FUNCTION_ARGS );

PG_FUNCTION_INFO_V1( c_complex_add );
Datum
c_complex_add( PG_FUNCTION_ARGS )
{
   // input variables
   HeapTupleHeader   lt, rt;

   bool           isNull;
   int            tuplen;
   bool           *nulls;

   // things we need to deal with constructing our composite type
   TupleDesc         tupdesc;
   Datum             values[2];
   HeapTuple         tuple;

   // See PostgreSQL Manual section 33.9.2 for base types in C language
   // functions, which tells us that our sql 'float' (aka 'double
   // precision') is a 'float8 *' in PostgreSQL C code.
   float8                *tmp;

   // defined by GSL library
   gsl_complex           l, r, ret;

   // Get arguments.  If we declare our function as STRICT, then
   // this check is superfluous.
   if( PG_ARGISNULL(0) ||
       PG_ARGISNULL(1) )
   {
      PG_RETURN_NULL();
   }

   // Get components of first complex number
   //// get the tuple
   lt = PG_GETARG_HEAPTUPLEHEADER(0);
   ////// get the first element of the tuple
   tmp = (float8*)GetAttributeByNum( lt, 1, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_REAL( &l, *tmp );
   ////// get the second element of the tuple
   tmp = (float8*)GetAttributeByNum( lt, 2, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_IMAG( &l, *tmp );

   // Get components of second complex number
   rt = PG_GETARG_HEAPTUPLEHEADER(1);
   tmp = (float8*)GetAttributeByNum( rt, 1, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_REAL( &r, *tmp );
   tmp = (float8*)GetAttributeByNum( rt, 2, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_IMAG( &r, *tmp );

   // Example of how to print informational debugging statements from
   // your PostgreSQL module.  Remember to set minimum log error
   // levels appropriately in postgresql.conf, or you might not
   // see any output.
   ereport( INFO,
            ( errcode( ERRCODE_SUCCESSFUL_COMPLETION ),
              errmsg( "tmp: %e\n", *tmp )));

   // call our GSL library function
   ret = gsl_complex_add( l, r );

   // Now we need to convert this value into a PostgreSQL composite
   // type.

   if( get_call_result_type( fcinfo, NULL, &tupdesc ) != TYPEFUNC_COMPOSITE )
      ereport( ERROR,
              ( errcode( ERRCODE_FEATURE_NOT_SUPPORTED ),
                errmsg( "function returning record called in context "
                      "that cannot accept type record" )));

   // Use BlessTupleDesc if working with Datums.  Use
   // TupleDescGetAttInMetadata if working with C strings (official
   // 8.2 docs section 33.9.9 shows usage)
   BlessTupleDesc( tupdesc );

   // WARNING: Architecture specific code!
   // GSL uses double representation of complex numbers, which
   // on x86 is eight bytes.  
   // Float8GetDatum defined in postgres.h.
   values[0] = Float8GetDatum( GSL_REAL( ret ) );
   values[1] = Float8GetDatum( GSL_IMAG( ret ) );

   tuplen = tupdesc->natts;
   nulls = palloc( tuplen * sizeof( bool ) );

   // build tuple from datum array
   tuple = heap_form_tuple( tupdesc, values, nulls );

   pfree( nulls );

   // A float8 datum palloc's space, so if we free them too soon,
   // their values will be corrupted (so don't pfree here, let
   // PostgreSQL take care of it.)
   // pfree(values);
   
   PG_RETURN_DATUM( HeapTupleGetDatum( tuple ) );
}

Wow, those comments are so illustrative, I think the article is almost finished! Alright, I'll try to explicate a few of the finer points. After all, that's what I don't get paid for.

There's nothing much new going on here relative to my last article until we see the declaration of our HeapTupleHeader variables lt and rt (for "left tuple" and "right tuple".) We're not taking simple data types as arguments here, we're taking tuple arguments that we defined with our CREATE TYPE statement. Each of our tuples have two double precision components, representing our complex number's real and imaginary components.

First, we read our tuple arguments in to rt and lt, using the PG_GETARG_HEAPTUPLEHEADER macro. Then we pick the component values out of our tuple using the GetAttributeByNum function. Refer to the Base Types in C Language Functions section of the manual (33.9.2) for information about how to represent PostgreSQL data types in your C code. In our case, this table tells us that our double precision (aka "float") values in SQL are represented in PostgreSQL C code as "float8 *".

It so happens that our GSL library's complex number functions expect "double" values as input, which on the x86 Linux platform I'm running, are conveniently eight bytes, and map directly to the float8 values used by PostgreSQL. Pay close attention here, because if your data types don't map properly, you'll get a headache.

We then use the GSL library's GSL_SET_REAL and GSL_SET_IMAG macros to construct complex number representations that we can pass to the gsl_complex_add function. We convert the data that GSL understands back into a form that PostgreSQL understands by using the Float8GetDatum function. You can see the set of other typical C type to Datum conversion functions in postgres.h.

To create the tuple we'd like to return, we first construct an array of datum values in our "values" variable. The heap_formtuple function converts this array into a PostgreSQL tuple, which the HeapTupleGetDatum function converts into a datum form we can return with PG_RETURN_DATUM.

If we were working with C strings, we would probably do things a bit differently. I'm not going to illustrate how that works, because The Fine Manual already includes a nice example. Note that the example in the manual is also illustrating how to return a set of tuples, which we are not concerning ourselves with here.

Note the ereport( INFO ... ) function in the middle of our code. I find this function very handy for printing debugging information to the SQL console while I'm developing new code. You can see how this works if you leave this uncommented when you compile and install this code.


Shake and Bake

It's time to turn this code into something we can use. Instead of using the PGXS infrastructure as I did in my last article, we'll get under the hood. It's not only educational to see how to build a shared module, but creating your own Makefile also gives you a little more latitude to tweak your build options just the way you like. It might also make it easier for you to handle building projects with lots of dependencies.

Here's a simple Makefile to illustrate how we build our shared object file. In real life, I'd probably use some automatic variables and such, but I don't want to obfuscate the basic build process with Makefile arcana. The pg_config command is your friend, and will help you ascertain where the include files and such are installed on your system. Building the shared object file is a simple matter of first building a position independent (the -fpic flag) object file, and then linking against all required libraries using the -shared flag to build the shared object file. This is all detailed in section 33.9.6 of the manual, which also includes instructions for other architectures besides Linux.

INCLUDEDIRS := -I.
INCLUDEDIRS += -I$(shell pg_config --includedir-server)
INCLUDEDIRS += -I$(shell pg_config --includedir)
# If you are using shared libraries, make sure this location can be
# found at runtime (see /etc/ld.so.conf and ldconfig command).
LIBDIR = -L$(shell pg_config --libdir)
# This is where the shared object should be installed
LIBINSTALL = $(shell pg_config --pkglibdir)

example.so: example.c Makefile
			gcc -fpic -o example.o -c example.c $(INCLUDEDIRS)
			gcc -shared -o example.so example.o $(LIBDIR) -lpq -lgsl -lgslcblas -lm
			cp example.so $(LIBINSTALL)

The Makefile copies the shared object file into the PostgreSQL library directory, so that we can execute the SQL I showed you at the beginning of this article to create our __complex composite type and our gsl_complex_add function. Just fire up psql as a user with permissions to do such things, and then type '\i example.sql' to do so. And that brings us to...


The Beginning

Well, we started at the end, so I guess that means we're finished. As you can see, once you have grasped the basic framework, you have the whole world of C library functions available for you to use directly within PostgreSQL. This gives you all of the attendant advantages of working within a transactional database system. I hope you find this prospect interesting enough to port some intriguing libraries into PostgreSQL, because Lord knows I certainly don't have time to do it all myself. :)

Happy hacking. And a special thanks to the PostgreSQL coding gurus who made this fantastic database in the first place.


Resources

Talkback: Discuss this article with The Answer Gang


Bio picture

Ron Peterson is a Network & Systems Manager at Mount Holyoke College in the happy hills of western Massachusetts. He enjoys lecturing his three small children about the maleficent influence of proprietary media codecs while they watch Homestar Runner cartoons together.


Copyright © 2007, Ron Peterson. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 142 of Linux Gazette, September 2007

Tux