Indian Water Quality Visualization Using Heatmaps


Recently I downloaded a file of Indian water quality data from Kaggle (https://www.kaggle.com/venkatramakrishnan/india-water-quality-data). I decided to explore data visualization packages to learn how to make neat and tidy visualizations using heatmaps. The file has the following structure:

water.quality<-readRDS("water.quality.rds")
head(water.quality)
##       State.Name     District.Name     Block.Name  Panchayat.Name
## 1 ANDHRA PRADESH EAST GODAVARI(04) PRATHIPADU(10)   GOKAVARAM(04)
## 2 ANDHRA PRADESH EAST GODAVARI(04) PRATHIPADU(10)   GOKAVARAM(04)
## 3 ANDHRA PRADESH EAST GODAVARI(04) PRATHIPADU(10) GAJJANAPUDI(06)
## 4 ANDHRA PRADESH EAST GODAVARI(04) PRATHIPADU(10) GAJJANAPUDI(06)
## 5 ANDHRA PRADESH EAST GODAVARI(04) PRATHIPADU(10)  CHINTALURU(10)
## 6 ANDHRA PRADESH EAST GODAVARI(04) PRATHIPADU(10)       ELURU(16)
##               Village.Name                      Habitation.Name
## 1           VANTHADA(014 )           VANTHADA(0404410014010400)
## 2     PANDAVULAPALEM(022 )     PANDAVULAPALEM(0404410022010400)
## 3         G. KOTHURU(023 )         G. KOTHURU(0404410023010600)
## 4        GAJJANAPUDI(029 )        GAJJANAPUDI(0404410029010600)
## 5         CHINTALURU(028 )         CHINTALURU(0404410028011000)
## 6 P. JAGANNADHAPURAM(035 ) P. JAGANNADHAPURAM(0404410035011600)
##   Quality.Parameter     Year
## 1          Salinity 1/4/2009
## 2          Fluoride 1/4/2009
## 3          Salinity 1/4/2009
## 4          Salinity 1/4/2009
## 5          Salinity 1/4/2009
## 6          Fluoride 1/4/2009
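The raw columns mix names with administrative codes in parentheses and store the date as text. The author's own wrangling lives in the scripts sourced later in this post; the function below is only a hypothetical sketch of one way to tidy these columns in base R. It assumes the column names printed above and that the year is the last "/"-separated field of Year; clean_water_quality() and the output column names are made up for illustration.

```r
# Hypothetical cleaning step (not the author's code): tidy the raw columns
# shown by head(water.quality). Assumes the column names printed above.
clean_water_quality <- function(df) {
  # Drop the trailing code in parentheses, e.g. "VANTHADA(014 )" -> "VANTHADA"
  strip_code <- function(x) trimws(sub("\\(.*\\)\\s*$", "", as.character(x)))
  data.frame(
    StateName        = as.character(df$State.Name),
    PanchayatName    = strip_code(df$Panchayat.Name),
    VillageName      = strip_code(df$Village.Name),
    QualityParameter = factor(df$Quality.Parameter),
    # "1/4/2009" -> 2009: keep only the text after the last "/"
    Year             = as.integer(sub(".*/", "", as.character(df$Year))),
    stringsAsFactors = FALSE
  )
}

# Usage on the first two rows of the output above, typed in by hand
raw <- data.frame(
  State.Name        = c("ANDHRA PRADESH", "ANDHRA PRADESH"),
  Panchayat.Name    = c("GOKAVARAM(04)", "GOKAVARAM(04)"),
  Village.Name      = c("VANTHADA(014 )", "PANDAVULAPALEM(022 )"),
  Quality.Parameter = c("Salinity", "Fluoride"),
  Year              = c("1/4/2009", "1/4/2009"),
  stringsAsFactors  = FALSE
)
wq <- clean_water_quality(raw)
print(wq)
```

Taking only the last field of Year sidesteps the question of whether "1/4/2009" is day/month or month/day, since only the year is needed for the heatmap.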

I decided to organize the work in two R scripts: one for data preparation and one for data visualization. My first attempt was to use an advanced heatmap function (heatmap.2, from the gplots package) that I found in a blog post by Joseph Rickert (http://www.r-bloggers.com/r-for-more-powerful-clustering). Because heatmap.2 clusters the data, it requires a matrix of at least 2 rows by 2 columns. I added five custom fields, one for each chemical found in the samples. Because heatmaps require a numeric matrix, I had to prepare the data so that the occurrence of a chemical was marked as 1 (present) or 0 (absent). I gave up on using all the data, because the object generated by the scale function was 116 GB in size, and worked with samples of state names instead. The resulting graphics were cluttered, and I was not able to add the year to the visualization.
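The scripts do not show exactly which object grew to 116 GB, but a back-of-the-envelope calculation suggests why clustering every row becomes impractical. By default heatmap.2 builds its row dendrogram from dist() on the rows, and a dist object stores the n(n-1)/2 pairwise distances as 8-byte doubles, so memory grows with the square of the number of rows. The helper dist_bytes() below is hypothetical and not part of the author's scripts; it ignores the small overhead of attributes.

```r
# Hypothetical helper: approximate memory (in bytes) of a dist object for n rows.
# A dist object stores n * (n - 1) / 2 distances, each an 8-byte double;
# the attribute overhead is ignored.
dist_bytes <- function(n) 8 * n * (n - 1) / 2

dist_bytes(1000) / 1e6  # 3.996 (MB)
dist_bytes(1e5) / 1e9   # 39.9996 (GB)
```

Going from 1,000 to 100,000 rows multiplies the distance storage by roughly 10,000, which is why sampling (as done here with state names) or aggregating before clustering is usually necessary.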

source("water-quality-data-preparation-v1.R")
source("water-quality-exploration-v1.R")
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
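A minimal sketch of the 0/1 preparation described above, assuming the data are in long format with one row per detected parameter. The villages V1 to V4, the parameter Iron and all detections are made up for illustration. It uses base R's stats::heatmap() instead of gplots::heatmap.2 so that it runs without extra packages; both take a numeric matrix and cluster its rows and columns.

```r
# Made-up long-format data: one row per quality parameter detected in a village
samples <- data.frame(
  Village   = c("V1", "V1", "V2", "V3", "V3", "V4"),
  Parameter = c("Salinity", "Fluoride", "Salinity", "Fluoride", "Iron", "Iron"),
  stringsAsFactors = FALSE
)

# Cross-tabulate village x parameter (number of detections per cell)
counts <- table(samples$Village, samples$Parameter)

# Turn the counts into a plain numeric matrix: 1 = present, 0 = absent
presence <- matrix(as.numeric(counts > 0),
                   nrow = nrow(counts),
                   dimnames = dimnames(counts))

# heatmap() clusters rows and columns, so it needs at least 2 rows and 2 columns;
# scale = "none" keeps the 0/1 values instead of standardising them
heatmap(presence, scale = "none", col = c("white", "steelblue"),
        xlab = "Quality parameter", ylab = "Village")
```

With gplots installed, heatmap.2(presence, scale = "none", trace = "none") should draw an equivalent plot; trace = "none" switches off the trace lines that heatmap.2 draws by default.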

I then searched my collection of R-bloggers posts and found a set of R visualization recipes posted by Dikesh Jariwala (https://www.r-bloggers.com/7-visualizations-you-should-learn-in-r/), including code for plotting heatmaps with ggplot2. I adapted the code to a discrete variable and obtained a neater, tidier heatmap.

library(devtools)
library(ggplot2)
water.quality1 <- readRDS("water.quality1.rds")
# Load wq1_df() from GitHub: it wrangles the raw data into the data frame the plot is drawn from
source("https://github.com/rventuradiaz/indian-water-quality/raw/master/water-quality-data-preparation-v4.R")
wq_df <- wq1_df(water.quality1) # apply the function to obtain the wrangled data frame
# Plot the heatmap: one tile per year and village, coloured by quality parameter,
# with one panel per panchayat
ggplot(wq_df, aes(Year, VillageName)) +
  geom_raster(aes(fill = QualityParameter)) +
  labs(title = "Heat Map", x = "Year", y = "Village Name") +
  scale_fill_discrete(name = "QualityParameter") +
  facet_grid(. ~ PanchayatName) +  # the 'facets =' argument is deprecated in current ggplot2
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 5),
        strip.text.x = element_text(angle = 90, vjust = 0.5, size = 8))
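Because wq1_df() and water.quality1.rds are not reproduced here, the sketch below draws the same kind of plot from a small made-up data frame. It assumes the wrangled frame has the columns Year, VillageName, PanchayatName and QualityParameter (inferred from the aes() and facet_grid() calls above); the villages V1 to V3, panchayats P1 and P2, and all values are invented.

```r
library(ggplot2)

# Hypothetical toy frame with the columns the plot above expects; values are made up
toy <- data.frame(
  Year             = factor(c(2009, 2010, 2011, 2009, 2010, 2009, 2011)),
  VillageName      = c("V1", "V1", "V1", "V2", "V2", "V3", "V3"),
  PanchayatName    = c("P1", "P1", "P1", "P1", "P1", "P2", "P2"),
  QualityParameter = c("Salinity", "Fluoride", "Salinity", "Iron",
                       "Iron", "Fluoride", "Salinity"),
  stringsAsFactors = FALSE
)

p <- ggplot(toy, aes(Year, VillageName)) +
  # One tile per (Year, VillageName); a character fill gives a discrete colour scale
  geom_raster(aes(fill = QualityParameter)) +
  labs(title = "Heat Map", x = "Year", y = "Village Name") +
  scale_fill_discrete(name = "Quality parameter") +
  # One column of panels per panchayat
  facet_grid(. ~ PanchayatName) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

print(p)
```

Converting Year to a factor keeps the x axis discrete, so each year gets its own column of tiles. If a village reports several parameters in the same year, the tiles overlap and only one of them is visible, so the wrangling step needs to reduce the data to one value per cell.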

Source data: Indian Water Quality (see https://www.kaggle.com/venkatramakrishnan/india-water-quality-data)
Reference: R for more powerful clustering (see http://www.r-bloggers.com/r-for-more-powerful-clustering/)

