MapReduce is a very popular software framework, introduced by Google (TM) in 2004. It is a large topic, and it is not possible to cover all of...
In the first part of this article we created a job and a database connection, and defined the flow in Kettle. In the second part we’ll see how Kettle manages...
In this article, I am going to upgrade a Greenplum cluster from version 4.0 to 4.1 using `gpmigrator`. `gpmigrator` is a utility shipped with Greenplum Community Edition whose purpose is...
Recently I showed you how to import data from a CSV file into a Greenplum database using Talend Community Edition. In this article I’m going to perform...
I’m going to demonstrate how to use dblink in Greenplum 4.0.4.0. What’s dblink? dblink is a PostgreSQL contrib module that allows you to execute queries on another...
In the first part of this tutorial, we set up all the connections required to create the job; now we can proceed with the data import. Let’s drag and drop...
In this article I am going to show you how to write PL/Java functions in Greenplum. I assume that you have a working Greenplum (or Greenplum Community Edition) at your...
When working with databases, one of the most common tasks is loading data from one or more CSV files. Several tools are available for this. Some are...
[*MADlib*](http://madlib.net) is an open-source library for scalable in-database analytics that targets PostgreSQL and Greenplum databases. MADlib version 0.2beta needs to be installed properly to follow this article, so...
Greenplum Community Edition is available in different flavours, including a VMware virtual machine based on CentOS with all the tools and documentation already installed. This allows you to...