
GeoKettle Quickstart

GeoKettle is a “spatially-enabled” version of Pentaho Data Integration (also known as Kettle). It is a powerful, metadata-driven spatial ETL (Extract, Transform and Load) tool dedicated to the integration of different data sources for building and updating geospatial databases, data warehouses and web services.

GeoKettle supports the Extraction of data from data sources; the Transformation of that data to correct errors, clean it, change its structure and make it compliant with standards; and the Loading of the transformed data into a target database management system (DBMS), GIS file or geospatial web service. It is particularly useful for automating complex and repetitive data processing without writing custom code, converting between data formats, migrating data between databases and feeding data into databases.

This Quick Start describes how to:

  • Load an existing data transformation
  • Create a new data transformation

Start GeoKettle

  1. Choose Spatial Tools ‣ GeoKettle from the Geospatial start menu
  2. Wait a few moments while the application starts up
  3. The following dialog appears. Fill in the repository information, or simply click the “No repository” button to enter the GeoKettle workbench
../../_images/geokettle_welcome.png

Workbench

As illustrated in the following screenshot, the Workbench window is composed of different panels.

../../_images/geokettle_workbench.png

The left part acts as a catalog of all the steps that can make up a data transformation. The right part of the workbench is the area where the transformation itself is designed, run and debugged.

The contents of these panels will be described further as we demonstrate their use.

Loading an existing transformation

To load an existing transformation, select File ‣ Open. Browse to the transformation samples subdirectory /opt/geokettle/samples/transformations/geokettle, then select one of the available sample transformations and click OK. GeoKettle transformations are stored in files with the extension *.ktr.
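
Because a *.ktr file is plain XML, its steps and hops can also be inspected outside the workbench. The following is a minimal sketch, assuming the usual Kettle layout in which each step element carries a name and a type and each hop under the order element carries a from and a to. The inline document is a trimmed, hypothetical sample rather than one of the shipped files, and the step type identifiers in it are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical .ktr content; real files contain many more elements.
SAMPLE_KTR = """
<transformation>
  <info><name>demo</name></info>
  <order>
    <hop><from>Table Input</from><to>Add sequence</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Table Input</name><type>TableInput</type></step>
  <step><name>Add sequence</name><type>Sequence</type></step>
</transformation>
"""


def summarize_ktr(xml_text):
    """Return (steps, hops) found in a .ktr-style XML document."""
    root = ET.fromstring(xml_text)
    # Each step is described by its display name and its type identifier.
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    # Each hop links a source step name to a target step name.
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return steps, hops


steps, hops = summarize_ktr(SAMPLE_KTR)
print(steps)
print(hops)
```

To try this on a real file, read its contents and pass the text to summarize_ktr; element names that differ from the assumed layout will simply yield empty lists.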

The following picture shows the sample “intersection” transformation.

../../_images/geokettle_intersection_transformation.png

A description of the transformation and optional directives can be seen in the yellow tooltip area.

Before starting the transformation, you need to specify which shapefile to use. To do so, double click on each of the “GIS file input” steps to open the following dialog.

../../_images/geokettle_shapefile_input_step.png

Enter the name of your shapefile, including the *.shp extension, or leave the field as is to use the sample dataset, then click OK.

You are now ready to start the transformation. To do so, simply hit the play button in the toolbar above your transformation.

Creating a new data transformation

Launch GeoKettle and open the workbench as described for loading an existing transformation (see previous section).

To create a new transformation, select File ‣ New ‣ Transformation. To give the transformation a name, save it with File ‣ Save as....

As shown in the following picture, all available steps are listed by category in the left area of the workbench. Expand any category to see its available steps.

../../_images/geokettle_your_transformation.png

To add a new step to the transformation, drag it from the Steps panel to the transformation panel. You can then configure the newly added step by double clicking on it.

Hops

A hop, represented as an arrow between two steps, defines the data flow between those steps. As shown in the following picture, adding a hop from Table Input to Add sequence means that the output of Table Input is sent to the Add sequence step for further processing.

../../_images/geokettle_hop.png
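
The idea of a hop can be illustrated outside GeoKettle with a small conceptual sketch. It is not GeoKettle code: the two functions below are hypothetical stand-ins for the Table Input and Add sequence steps, the rows are made-up sample data, and the hop is modelled simply as passing one step's row stream to the next.

```python
def table_input():
    """Stand-in for a Table Input step: emits rows as dictionaries."""
    for name in ["Quebec", "Montreal", "Sherbrooke"]:
        yield {"city": name}


def add_sequence(rows, field="id", start=1):
    """Stand-in for an Add sequence step: adds an incrementing counter to each row."""
    for value, row in enumerate(rows, start=start):
        yield {**row, field: value}


# The "hop": the output stream of one step becomes the input stream of the next.
result = list(add_sequence(table_input()))
print(result)
```

Each row leaves the first function and is immediately picked up by the second, which mirrors how rows travel along the arrow between two steps.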

To create a new hop, select 2 steps, right click on one of them and select New hop. Another way of doing it is to press and hold Ctrl while selecting the 2 steps.

Any hop can be edited at any time by double clicking on it or right clicking on it and selecting Edit hop in the popup menu.

Setting up the transformation

Most of the steps in a transformation need to be configured before they can be used. Double click on any step to open a dialog in which you can view and set each required parameter value.

Running a transformation

When you execute a transformation, a new panel appears below the one where the transformation is designed. This panel, called the Execution Results panel, contains information about the data flow through all the steps involved in the transformation.

The Step Metrics tab (shown in the next figure) is displayed first. This tab gives general information about the transformation’s data flow, such as the number of rows read, written, input and output by each step. The Active column indicates whether the step is started, running, finished, aborted, etc. The Time column shows the time elapsed since the step was started, and the Speed column shows the average speed of the step in rows per second.

../../_images/geokettle_running_transformation.png
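
As a rough illustration of how a value in rows per second relates to the row counts and the elapsed time, here is a minimal sketch. The function name is hypothetical, and the exact row counter and rounding that GeoKettle uses for the Speed column are not covered by this quickstart, so the calculation is an assumption.

```python
def step_speed(rows_processed, elapsed_seconds):
    """Average throughput in rows per second; 0.0 when no time has elapsed."""
    if elapsed_seconds <= 0:
        return 0.0
    return rows_processed / elapsed_seconds


# Example: a step that handled 12000 rows in 4 seconds.
speed = step_speed(12000, 4.0)
print(speed)
```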

Previewing a transformation

Executing a transformation may produce errors, which are reported in the Execution Results panel (see next figure). In that case, review the content of the Logging tab, which usually contains useful information about the source and cause of the error. Then modify the parameters of the faulty step and restart the transformation.

../../_images/geokettle_transformation_fail.png

To help find the source of an error, you can also preview the results of a transformation at an earlier step in the workflow. To do so, right click on the step and select Preview in the popup menu that appears. This shows, in both tabular and map form, what the data looks like at that point in the overall process without executing the whole transformation.

Things to Try

Here are some additional challenges for you to try:

  1. Explore the diversity of all the steps that GeoKettle provides
  2. Try the GeoKettle debugger in order to debug a faulty transformation
  3. Try to build a transformation with your own data

What Next?

Take a look at the GeoKettle user and developer documentation and the tutorials available on the project wiki. Do not hesitate to ask for help on the Spatialytics forum.
