Database dependencies graph visualization

Database dependencies graph visualization

The journey from JSON file to graph format, and visualization with Gephi

Rafael Ventura



To enable Business Intelligence team visualising legacy database table dependencies on designing a migration process


As this an exploratory study, R was chosen to read, process and transform a JSON file with database table dependencies into a file with a graph format.


1.- Read the JSON file

To read the JSON file, I used the rjon library to read from a text file having the data in a JSON structure.

json_file <- "jsonTV.txt"
json_data <- fromJSON(file=json_file)

2.- Transform JSON structure into a data frame

For transforming JSON structure into a data frame, I developed a function populateData.

populateDFData <- function(json_data,df_data) {
  rowIndex <- 0
  for (i in 1:length(json_data)) {

  lengthImport <- length(json_data[[i]]$imports)
  if (lengthImport>0) {
    for (j in 1:lengthImport) {
      rowIndex <- rowIndex+1
      import <- json_data[[i]]$imports
    } } else{
      rowIndex <- rowIndex + 1

I declare a data frame whose number of rows were calculated using a function calculateLength, that defines a record for each occurrence of the tuple (name, import), where name is the table and import its dependencies.

calculateLength <- function(jsonObject) {
  dfLength <- 0
  for (i in 1:length(jsonObject)) {
    importSize <- length(jsonObject[[i]]$import)
    if (importSize == 0) {incrm <- 1} else {incrm<-importSize}
    dfLength <- dfLength + incrm
dfLength <- calculateLength(json_data) 
df_data <- data.frame(name=rep("",dfLength), 
df_data$name<- as.character(df_data$name)
df_data$size<- as.character(df_data$size)
df_data$imports<- as.character(df_data$imports)

Data frame is populated

df_data <- populateDFData(json_data = json_data, 
                          df_data = df_data)

3.- Export data to file with graph format

As we wanted to visualise the graph with Gephi, data input needs to contain headers named Source and Target. A data frame declaration is necessary. Besides, we need to filter only tables containing the expression “AMS” and dependencies with one or more tables. Hence, it was necessary to define two filter conditions. A graph data frame was defined using the function, and this object was exported to a graphml file using the function write.graph.

yaml {r} library("igraph") gephi_df <- data.frame(Source=rep("",nrow(df_data)), Target=rep("",nrow(df_data)), Label=rep("",nrow(df_data))) gephi_df$Source <- df_data$imports gephi_df$Target <- df_data$name gephi_df$Label <- df_data$name gephi_df.filter1 <- grepl("AMS",gephi_df$Target) gephi_df.filter2 <- !grepl("<NA>",gephi_df$Source) gephi_df.export <- gephi_df[gephi_df.filter1&gephi_df.filter2,] gephi_df.g <-[,c("Source", "Target")]) write.graph(gephi_df.g, file="tvia-objects-dependencies.graphml", format = "graphml")

Add a Comment

Your email address will not be published. Required fields are marked *