Geocoding Using SSIS

In this article I want to discuss the concept of GeoCoding.
 

Introduction

Geocoding is the mechanism or I should say process via which one can find associated Geographical Coordinates of a location. Each coordinate is represented by a pair. One is termed as Latitude and another Longitude. Latitude is an angle that varies from 0 to 90 degrees. Lines of constant latitude, or parallels, run east–west as circles parallel to the equator. On the contrary Longitude runs from north to south. Latitude is used together with longitude to specify the precise location of features on the surface of the Earth. So how does it looks like? Let's say we need to know the Geographical coordinates of CIA Headquarters (always wanted to be an agent...Jason Bourne :D ), of which the address is:
 
1000 Colonial Farm Rd, Mc Lean, VA 22101, United States
 
When we Google it we find that Latitude of CIA HQ is 38.9512066 and Longitude is -77.1512908
 
So basically a geographical coordinate is the point where a Latitude and a Longitude intersect.
Popular search engine giant "Google" has created some APIs for its users to help them out in such requirements. In our example we are going to use one of these. So before you start using these APIs, I would like you to go through Google Geocoding API, and read about their usage policy.
Enough of the chit chat, now lets do some espionage :)

Scripts

We would need some tables for our source and destination.
Let's assume that CIA is sending some addresses and we need to geocode them.

DDL & DML

Source

GO

IF OBJECT_ID('ForumGeoCode_ArticleSource') IS NOT NULL

DROP TABLE ForumGeoCode_ArticleSource

CREATE TABLE ForumGeoCode_ArticleSource(

FGC_S_ID INT IDENTITY PRIMARY KEY,

FGC_Area VARCHAR(200) NOT NULL,

FGC_City VARCHAR(50) NOT NULL)

Destination

GO

IF OBJECT_ID('ForumGeoCode_ArticleDestination') IS NOT NULL

DROP TABLE ForumGeoCode_ArticleDestination

CREATE TABLE ForumGeoCode_ArticleDestination(

FGC_D_ID INT IDENTITY PRIMARY KEY,

FGC_AreaCity VARCHAR(200) NOT NULL,

FGC_Latitude NUMERIC(18,15) NOT NULL DEFAULT 0,

FGC_Longitude NUMERIC(18,15) NOT NULL DEFAULT 0,

FGC_URL VARCHAR(MAX) NOT NULL)

Insert into Source

GO

INSERT ForumGeoCode_ArticleSource

SELECT '1000 Colonial Farm Rd, Mc Lean','Virginia',GETDATE() UNION

SELECT 'Vatva','Ahmedabad',GETDATE() UNION

SELECT 'Dharavi Road','Mumbai',GETDATE() UNION

SELECT 'Green Park Extension','Delhi',GETDATE() UNION

SELECT 'R K Puram Sector-4','Delhi',GETDATE()

 

Extraction & Load

The backend is done, we have created the objects now time to use them. Now we will be using SSIS for rest of the process.
 
We need:
 

1.

Connection Manager

Create a OLEDB connection manager which points to the database you have created these objects in.

2.

OLEDB Source

Pointing to the source we have created above. Here in Data Access Mode, instead of table we have to choose SQL command* with below SQL statement:

SELECT  DISTINCT FGC_Area+', '+FGC_City AS Source,
    CONVERT(VARCHAR(18),'0') AS Latitude,
    CONVERT(VARCHAR(18),'0') AS Longitude,
    'http://maps.googleapis.com/maps/api/geocode/xml?address='
    +FGC_Area+' ,'+FGC_City+'&sensor=false' AS GeoCodeURL
FROM ForumGeoCode_ArticleSource
WHERE  FGC_DateStamp>=DATEADD(DD,-1,GETDATE())
    AND FGC_Area+' ,'+FGC_City NOT IN 
    (SELECT FGC_AreaCity FROM ForumGeoCode_ArticleDestination)

Connect the Precedence Constraint to Script Component

3.

Script Component##

Choose Visual Studio C# as the language and in the input columns option choose all the columns as ReadWrite

Connect the Precedence Constraint to OLEDB Destination

4.

OLEDB Destination

Choose the destination object create above and Map the columns by name.

 
*We have chosen SQL Command as the Data access mode because we are assuming that this will be a daily schedule, we don't want to pick all the records daily just those which are added today and yesterday. This decreases the load if there are bulk entries happening in the source. We can also have a flag to check, but I am going with this right now.
 

##Transformation

 
In the script component we have chosen C# as our language. We'll go step by step in this phase.

Step 1. Edit the Script.
Step 2. Add a class named GeoCoding.cs to the project.
Step 3. Add following references if not added:
    a)Microsft.SqlServer.DTSPipelineWrap
    b)Microsft.SqlServer.DTSRunTimeWrap
    c)Microsft.SqlServer.TxScript
    d)System.Data
    e)System.Web
    f)System.Xml
Step 4.Open the GeoCoding class and add below script:
 
using System;

using System.Collections.Generic;

using System.Text;

using System.Data;

using System.Net;

using System.Web;

using System.Xml;

using System.Threading;

namespace himspace

{

public interface IGeoLocation

{

string latitude { get; set; }

string longitude { get; set; }

string geocodeurl { get; set; }

}

public struct GeoLocation : IGeoLocation

{

private string _latitude;

private string _longitude;

private string _geocodeurl;

public GeoLocation(string latitude,string longitude,string geocodeurl)

{

_latitude = latitude;

_longitude = longitude;

_geocodeurl = geocodeurl;

}

public string latitude

{

get { return _latitude; }

set { _latitude = value; }

}

public string longitude

{

get { return _longitude; }

set { _longitude = value; }

}

public string geocodeurl

{

get { return _geocodeurl; }

set { _geocodeurl = value; }

}

}

public class GeoCode

{

const string _googleUri= "http://maps.googleapis.com/maps/api/geocode/xml?address=";

private static Uri GetGeoCodeURI(string Source)

{

Source = HttpUtility.UrlEncode(Source);

string uri = String.Format("{0}{1} &sensor=false", _googleUri, Source);

return new Uri(uri);

}

public static GeoLocation GetCoordinates(string Source)

{

WebClient wc = new WebClient();

Uri uri = GetGeoCodeURI(Source);

WebProxy proxyObj = new WebProxy("ip and port here");

proxyObj.Credentials = new System.Net.NetworkCredential("username","password");

wc.Proxy = proxyObj;

try

{

string geoCodeInfo = wc.DownloadString(uri);

XmlDocument xmlDoc = new XmlDocument();

xmlDoc.LoadXml(geoCodeInfo);

string checkURL;

checkURL = "";

string status = xmlDoc.DocumentElement.SelectSingleNode("status").InnerText;

double geolat = 0.0;

double geolong = 0.0;

XmlNodeList nodeCol = xmlDoc.DocumentElement.SelectNodes("result");

foreach (XmlNode node in nodeCol)

{

Thread.Sleep(1000);

if (checkURL == uri.ToString())

break;

else

{

geolat= Convert.ToDouble(node.SelectSingleNode("geometry/location/lat").InnerText, System.Globalization.CultureInfo.InvariantCulture);

geolong= Convert.ToDouble(node.SelectSingleNode ("geometry/location/lng").InnerText, System.Globalization.CultureInfo.InvariantCulture);

checkURL = uri.ToString();

}

}

string geolatString;

string geolongString;

geolatString= geolat.ToString();

geolongString = geolong.ToString();

return new GeoLocation(geolatString, geolongString, uri.ToString());

}

catch

{

return new GeoLocation("0.0", "0.0", "http://");

}

}

}

}

Step 5. Open the main class add below code to
public override void Input0_ProcessInputRow(Input0Buffer Row) and also add himspace as a namespace

himspace.GeoLocation geolocation = himspace.GeoCode.GetCoordinates(Row.Source);

Row.Latitude = geolocation.latitude.ToString();

Row.Longitude = geolocation.longitude.ToString();

Row.GeoCodeURL = geolocation.geocodeurl;

Implementation

Now we have GeoCodes for these locations. Now we'll try finding out the approximate distance between these locations. For that we'll create a procedure in SQL Server.For distance we'll use a simple mathematical formulae,which will give us the direct distance between two points irrespective of the streets, one ways, highways, tolls. So this distance is not your drive path.(But definitely shorter than that)
 

Procedure Script

GO

CREATE PROCEDURE ForumGeoCode_Distance

@Src SMALLINT,

@Dest SMALLINT

AS

DECLARE @SRCLat NUMERIC(18,15)

DECLARE @SRCLng NUMERIC(18,15)

DECLARE @DestLat NUMERIC(18,15)

DECLARE @DestLng NUMERIC(18,15)

DECLARE @Distance VARCHAR(20)

DECLARE @SrcAC VARCHAR(100)

DECLARE @DestAC VARCHAR(100)

SET @SRCLat=(SELECT FGC_Latitude FROM ForumGeoCode_ArticleDestination WHERE FGC_D_ID=@Src)

SET @SRCLng=(SELECT FGC_Longitude FROM ForumGeoCode_ArticleDestination WHERE FGC_D_ID=@Src)

SET @DestLat=(SELECT FGC_Latitude FROM ForumGeoCode_ArticleDestination WHERE FGC_D_ID=@Dest)

SET @DestLng=(SELECT FGC_Longitude FROM ForumGeoCode_ArticleDestination WHERE FGC_D_ID=@Dest)

SET @SrcAC=(SELECT FGC_AreaCity FROM ForumGeoCode_ArticleDestination WHERE FGC_D_ID=@Src)

SET @DestAC=(SELECT FGC_AreaCity FROM ForumGeoCode_ArticleDestination WHERE FGC_D_ID=@Dest)

SET @Distance =(SELECT ACOS((SIN(RADIANS(@SRCLat))*SIN(RADIANS(@DestLat)))+ (COS(RADIANS(@SRCLat))*COS(RADIANS(@DestLat))* COS(RADIANS(@DestLng)-RADIANS(@SRCLng))))*6378.137)

PRINT('DISTANCE BETWEEN "'+@SrcAC+'" AND "'+@DestAC+'":'+@Distance+' KM')

GO

Execution

EXEC ForumGeoCode_Distance 2,3

Check Google Maps

 
Note: This formula only calculates distance when both the coordinates are on the same side of the GMT(Prime Meridian).

Conclusion

Using the Google direction API we can find out the coordinates of any location. These coordinates are subject to an error rate of 15 meters only if the location is correctly identified by the Google servers. If not then API will parse only a part of the location address and calculates the coordinates. In this case coordinates may differ by a huge percentage of error.
 
The distance calculated by the formula is aerial distance, irrespective of FAA rules and regulations.
One can use this solution in many ways. The geocoded locations can be used for a Map representation using PowerView or any other third party mapping tool.
A dynamic report can be generated using PPS which maps the approximate distance between locations.
Travelling salesman problem can be solved to certain extent. And many more...

References

1. Wiki
2. Movable Type Scripts
3. System.Xml Namespace


See Also