3

I am storing the result of a PostgreSQL query as a data.frame in R. One of the "columns" is an integer[] array type. In R, this is imported as a character string by the RPostgreSQL package.

How do I convert the character string type into a numeric list type column (or as separate numeric columns) in my data.frame?

connect and get data

require(RPostgreSQL)
drv = dbDriver("PostgreSQL")
con = dbConnect(drv, host = ..., post =..., dbname =..., user=..., password=...)
df = dbGetQuery(con, query_string)
dbDisconnect(con)

minimum working example

library(dplyr)
# randomized arrays of 10 numbers
set.seed(10)
df = data.frame(id = c(1:10)) %>%
  mutate(arrcol = lapply(id, function(X) sample(1:99, 10, replace=T)),
         arrcol = gsub("c(","{{",arrcol,fixed=T),
         arrcol = gsub(")","}}",arrcol,fixed=T))

remove brackets

df$arrcol = gsub(fixed=T, "{", "", df$arrcol)
df$arrcol = gsub(fixed=T, "}", "", df$arrcol)

convert to numeric list

# Attempt 1: 
df$arrcol = as.numeric(df$arrcol)
# Error: (list) object cannot be coerced to type 'double'

# Attempt 2:
df$arrcol = lapply(df$arrcol, 
                   function(x) strsplit(x, ",", fixed=T))
# no error, but now the data appears to be stored as a list of character lists: 
# arrcol[1]: list(c("1", "2", "3", "4", "5",...

# Attempt 3:
df$arrcol = lapply(df$arrcol, 
                   function(x) as.numeric(
                     unlist(
                       strsplit(x, ",", fixed=T))
                   )
)
# this one seems to work
1

1 Answer 1

1

My own best answer:

df$numcol = gsub(fixed=T, "{", "", df$arrcol)
df$numcol = gsub(fixed=T, "}", "", df$numcol)

df$numcol <- lapply(df$numcol, 
                    function(x) as.numeric(
                      unlist(
                        strsplit(x = x, split = ",", fixed=T)
                      )
                    )
)

[updated to perform all steps in one go]

df$numcol <- lapply(df$arrcol, 
                    function(x) as.numeric(
                      unlist(
                        strsplit(
                          x = gsub("[{]|[}]", "", x), 
                          split = ",", fixed=T))))

or, equivalently:

df$numcol <- lapply(df$arrcol, 
                    function(x) as.numeric(
                      strsplit(
                          x = gsub("[{]|[}]", "", x), 
                          split = ",", fixed=T)[[1]]
                    )
)

Alternatively, (as long as each of the arrays has the same length) you could use this trick (Splitting a dataframe string column into multiple different columns) to parse the string into separate columns. Note that read.table is clever enough to recognize each of the new variables as integers.

newdf = read.table(text = df$arrcol, header = F, sep = ",")

Additionally, you can easily append those as their own columns onto the original data.frame:

df = cbind(df, newdf)

or, knowing how many new columns will be produced:

df[,3:101] <- read.table(text = gsub("[{]|[}]", "", df$arrcol), 
                         header = F, sep = ",")
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.