Visualizing the Indian Railways in 3D

IDC IIT Bombay India

This post describes the concept, process, insights and extensions of a geovisualization project done as part of the Interactive Data Viz course at IDC.

In the end, I had this 3D visualization.

Concept(s)

The concept was to visualize the number of trains that originate from every station on the Indian Railways network. This would allow users to compare the counts across stations and draw conclusions based on their locations. The goal was to make the viz insightful, and it had to at least support William Cleveland’s three visual operations, i.e. detection, assembly and estimation. These were covered during the course. One can also look at this presentation by John Rauser, which covers very interesting aspects of the perception of data; Cleveland’s model is mentioned on slide 115. My idea was to place 3D bars at the locations of the stations, such that the height of each bar would represent the number of trains originating there.

In short, these would be the parameters of the visualization (based on the description tags used by Isabel Meirelles in Design for Information):

I also wondered whether we could make a ‘coarse’ contour map that ‘encodes’ the ease of reaching a place. This would be based on an index derived from the number of trains passing through a station, their stoppage times and the time of day. The contour would be drawn inverted, so that a place with fewer passing trains sits higher because it is more difficult to reach. This, I feel, would give the contour a physical meaning too. Imagine someone sitting on the ground: it is super difficult to reach station A because only 1 train passes through it, while it is easy to reach station B because 10 trains pass through it. So ‘A’ would be shown higher than ‘B’. Exploring it in VR would be the icing on the cake. I could not figure out the tools to implement this in the short time we had.

Process

The process involved:

The weapons of choice were: R, Python, Mapbox and Google Sheets.

Sourcing and cleaning data

The data was provided by data.gov.in in CSV format. I used this source of R code for reference.

Deriving useful data

First, I read the data into R using RStudio.

library(readr)
timetable <- read_csv("C:/Users/my pc/Downloads/isl_wise_train_detail_03082015_v1.csv")

Next, I took the subset of rows where the station code equals the destination station code. This keeps the row in which each train reaches its destination.

timetable.srs.dest <- subset(timetable, timetable$`station Code` == timetable$`Destination station Code`)

Using the dplyr library, I grouped the rows by source station and counted them. Then I wrote the result out as a CSV file.

library(dplyr)
traingp<-group_by(timetable.srs.dest,`source Station Name`)
trainsfromStation<-summarise(traingp,count=n())
trainsfromStation
arrange(trainsfromStation,desc(count))
write.csv(trainsfromStation,"NumbOfTrainsoriginating.csv")

I also did some other exploratory analysis, such as counting the trains passing through each station.

trainflux<-group_by(timetable,`Station Name`)
trainflux
trainfluxCount<-summarise(trainflux,count=n())
trainfluxCount
arrange(trainfluxCount,desc(count))
arrange(trainfluxCount,count)
write.csv(trainfluxCount,"trainsstoppingperstation.csv")

I also computed halt times and average halt times per station. This would give an idea of how busy a station is. It was not as trivial as it seems, thanks to my n00by R skills. First, all source and destination stations were removed using Distance != 0. Then, because of the funny way the times were written, I had to remove the ‘single’ quotes from all time vectors. I used gsub for that. Then I had to convert the character strings to the POSIXlt date-time format, which enables time arithmetic with difftime. Using difftime, I made a new vector, Halt Time, with a resolution of one minute. I am told these steps are just a few clicks in Excel.

# Keep only rows with a non-zero Distance
trainNoSource=subset(timetable,timetable$Distance!=0)
# Strip the single quotes that surround the time strings
trainNoSource$`Arrival time`<- gsub("'",'',trainNoSource$`Arrival time`)
trainNoSource$`Departure time`<- gsub("'",'',trainNoSource$`Departure time`)
# Parse the strings into POSIXlt so that time differences can be computed
trainNoSource$`Arrival time`<-strptime(trainNoSource$`Arrival time`,format = ("%H:%M:%S"))
trainNoSource$`Departure time`<-strptime(trainNoSource$`Departure time`,format = ("%H:%M:%S"))
# Halt time = departure minus arrival, in minutes
trainNoSource$`Halt Time`<-difftime(trainNoSource$`Departure time`,trainNoSource$`Arrival time`,units="mins")

Next, I removed trains that had halt times less than 0 and grouped them by stations. I removed Arrival and Departure vectors. I then studied the average halt times.

trainNoSource=subset(trainNoSource,trainNoSource$`Halt Time`>=0)
trainNoSource$`Arrival time`<-NULL
trainNoSource$`Departure time`<-NULL
trainNoSource_grpstation<-group_by(trainNoSource,`Station Name`)
# Use the bare column name so that sum() is evaluated per station, not over the whole table
trainNoSource_haltTimes<-summarise(trainNoSource_grpstation,count=n(),totalhalt=sum(`Halt Time`))
arrange(trainNoSource_haltTimes,desc(count))
arrange(trainNoSource_haltTimes,desc(totalhalt))
trainNoSource_haltTimes$`avgstop`<-trainNoSource_haltTimes$totalhalt/trainNoSource_haltTimes$'count'
arrange(trainNoSource_haltTimes,desc(avgstop))
write.csv(trainNoSource_haltTimes,"avgHaltTimeperstation.csv")

Converting that data into geojson

The CSV file was uploaded to Google Sheets. I used the awesome tables Google Sheets plugin to convert station codes to lat and long values.

Unfortunately, there is very little documentation on extrusion in Mapbox: just one example, which took me quite some time to get working. From the example GeoJSON, it seems that height, base height and color properties are required. I wanted fewer complications, so I went with a single color. I structured the Google Sheets file as below.

Then I used geojson.io to convert the CSV to a GeoJSON file. One can also use Mapbox Studio for the same.
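For readers who prefer a script over geojson.io, here is a minimal Python sketch of the same kind of conversion. It is not the tool I used. The column names (`station`, `count`, `lat`, `lon`) and the sample rows are made up for illustration, so adjust them to match your own sheet.

```python
import csv
import io
import json


def csv_to_point_features(csv_text, lat_col="lat", lon_col="lon"):
    """Turn CSV rows into a GeoJSON FeatureCollection of Point features.

    lat_col and lon_col are assumed column names, not the ones from my sheet.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # The remaining columns become properties (kept as strings)
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows, only to show the shape of the output
sample = "station,count,lat,lon\nExample A,12,19.0,72.8\nExample B,3,28.6,77.2\n"
collection = csv_to_point_features(sample)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: GeoJSON expects longitude first, which is easy to get wrong when the sheet lists latitude first.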

This might seem enough, but

Little did I know

The extrusion property can be applied only to polygons. How do you convert a point to a polygon? This might not be the most relevant question to ask at 3:45 am. But unfortunately, it was! Python to the rescue, thanks to some previous work.

In times when, ‘I wrote a script’ has become as common as ‘I took a bath’, I wrote a script.

The script simply takes the points and converts each one into a square of a fixed size. The GeoJSON spec sheet was helpful. The process was to take the point as the center of the square and then compute the corners in a cyclic manner.
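The idea can be sketched in a few lines of Python. This is a reconstruction of the approach described above, not the original script. The function name `point_to_square` and the half-side of 0.05 degrees are assumptions chosen for illustration.

```python
def point_to_square(feature, half_side_deg=0.05):
    """Replace a GeoJSON Point feature with a square Polygon centered on it.

    half_side_deg is an assumed size in degrees, not the value I used.
    """
    lon, lat = feature["geometry"]["coordinates"][:2]
    d = half_side_deg
    # Walk the corners in counterclockwise order and repeat the first corner
    # at the end, because a GeoJSON linear ring must be closed.
    ring = [
        [lon - d, lat - d],  # south-west
        [lon + d, lat - d],  # south-east
        [lon + d, lat + d],  # north-east
        [lon - d, lat + d],  # north-west
        [lon - d, lat - d],  # back to south-west
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": dict(feature.get("properties", {})),
    }


# Made-up station, only to show the shape of the output
station = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [72.0, 19.0]},
    "properties": {"station": "Example A"},
}
square = point_to_square(station, half_side_deg=0.5)
print(square["geometry"]["coordinates"][0])
```

Since the offsets are in degrees, the shape is only approximately square on the ground and gets narrower away from the equator. For bars on a country-scale map that is usually acceptable.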

Presentation on a map

By 6 am, I had all the polygons on the map in Mapbox Studio. For the extrusion height, I multiplied the number of trains by 7000, as this made the bars much easier to detect visually and helped estimation. I later used the Mapbox example and modified it to add tooltips too.
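As a rough sketch, the scaling step could look like this in Python. Only the factor of 7000 comes from this project. The property names `height`, `base_height` and `color`, the `count` key and the color value are assumptions modeled on the three properties the Mapbox example seemed to need.

```python
def add_extrusion_properties(feature, count_key="count", scale=7000, color="#e55e5e"):
    """Return a copy of the feature with properties for a 3D extrusion layer.

    Property names and the color are assumptions; scale=7000 is the factor
    mentioned in the text.
    """
    props = dict(feature.get("properties", {}))
    # The train count may arrive as a string when it comes from a CSV
    props["height"] = float(props[count_key]) * scale
    props["base_height"] = 0  # bars start at ground level
    props["color"] = color    # a single color for every bar
    return {**feature, "properties": props}


# Made-up polygon feature, only to show the added properties
bar = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[71.5, 18.5], [72.5, 18.5], [72.5, 19.5], [71.5, 19.5], [71.5, 18.5]]],
    },
    "properties": {"station": "Example A", "count": "12"},
}
extruded = add_extrusion_properties(bar)
print(extruded["properties"])
```

A linear scale keeps bar heights proportional to train counts, which is what makes estimating one bar against another meaningful.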

The result is hosted here. The project is in this repo. Feel free to correct me if I am wrong.

Insights

Some insights we had during our class discussion were:

Extensions

Hope this helps!

Cheers,

Rohit