Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

Hi I'm trying to import CSV data into a spatially enabled Postgres database. The data is available here. I'm unsure where I went wrong and any help is greatly appreciated! What I'm attempting to do is visualize that data with D3.js and maybe display a heat density of the most libraries per town or something to that degree.

File: lib/tasks/import_incidents_csv.rake

require 'csv'

namespace :import_incidents_csv do

  task :create_incidents => :environment do

    csv_text = File.read('/home/mgmacri/data/PublicLibraryBranchLocations.csv')
    csv = CSV.parse(csv_text, :headers => true)

    csv.each do |row|
      row = row.to_hash.with_indifferent_access
      Moulding.create!(row.to_hash.symbolize_keys)
    end

  end

end


user@server:/spatial_project$: rake import_incidents_csv:create_incidents --trace
** Invoke import_incidents_csv:create_incidents (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import_incidents_csv:create_incidents
rake aborted!
invalid byte sequence in UTF-8
/usr/lib/ruby/1.9.1/csv.rb:1855:in `sub!'
/usr/lib/ruby/1.9.1/csv.rb:1855:in `block in shift'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `loop'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `shift'
/usr/lib/ruby/1.9.1/csv.rb:1791:in `each'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `read'
/usr/lib/ruby/1.9.1/csv.rb:1379:in `parse'
/home/mgmacri/rails/mymap/lib/tasks/import_incidents_csv.rake:8:in `block (2 levels) in                                     
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `call'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `block in execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:166:in `block in         invoke_with_call_chain'
/usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:159:in `invoke_with_call_chain'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:152:in `invoke'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:143:in `invoke_task'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block (2 levels)     in top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block in   top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:110:in `run_with_threads'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:95:in `top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:73:in `block in run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:160:in  `standard_exception_handling'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:70:in `run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:19:in `load'
/usr/local/bin/rake:19:in `<main>'
Tasks: TOP => import_incidents_csv:create_incidents
share|improve this question

2 Answers

up vote 1 down vote accepted

Using postgresql's native CSV import is a couple of orders of magnitude faster than using Ruby's CSV API, and may also avoid the same encoding issues.

For example:

namespace :import_incidents_csv do
  task :create_incidents => :environment do
    ActiveRecord::Base.connection.execute "COPY moulding (name, state, postcode, lat, long) FROM '/home/mgmacri/data/PublicLibraryBranchLocations.csv' DELIMITER ',' CSV;"
  end
end

More info: http://www.postgresql.org/docs/9.2/static/sql-copy.html

share|improve this answer

Excel encodes the file into ISO-8859-1 and not in UTF-8. So tell Ruby to open the file with read only in ISO-8859-1

file=File.open("input_file", "r:ISO-8859-1")
share|improve this answer
Excel uses an ANSI codepage by default, not an ISO charset. These aren't quite the same thing. For example, ISO-8859-1 is similar but not identical to cp1252 (otherwise known as Windows-1252). It's best to use the correct codepage rather than guess at a close-enough encoding - or better yet, save your Excel sheets in UTF-8 using OpenOffice and retain your sanity. Note that Excel will use a different codepage on different systems; eg a central european users will probably send you cp1251 text. See stackoverflow.com/questions/508558/… – Craig Ringer Mar 21 at 1:04
1  
See also en.wikipedia.org/wiki/Windows-1252 which explains among other things that Windows-1252 isn't even ANSI-standard even though Windows calls it an "ANSI" codepage. – Craig Ringer Mar 21 at 1:10
@Craig: Thanks for the explanation. – Sagar Bommidi Mar 21 at 4:05

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.