I've been doing some self-directed learning with weather data in Scala, because the data is free, big, and feeds into lots of other things I would like to do. Almost immediately I ran into trouble representing data from a text file with more than 22 columns.
What is the idiomatic way of handling data with more than 22 columns? I've been trying to read in the NWS-USAF-NAVY station list from NOAA, which has 32 pieces of information per line.
My initial inclination was to use case classes, but the most straightforward way of defining them:
/*
first goal is to be able to read the inventory of WBAN stations from NOAA at
ftp.ncdc.noaa.gov/pub/data/inventories/WBAN.TXT
formats are listed in:
ftp://ftp.ncdc.noaa.gov/pub/data/inventories/WBAN-FMT.TXT
although I don't think that file is completely right
*/
case class WBAN(
  CoopStationID: Option[String],   // 01 - 06 Coop Station Id
  ClimateDivision: Option[String], // 08 - 09 Climate Division
  WBANStationID: Option[String],   // 11 - 15 WBAN Station Id
  WMOStationID: Option[String],    // 17 - 21 WMO Station Id
  FAALOCID: Option[String],        // 23 - 26 FAA LOC ID
  // and so on, for 32 elements!
)
is not permitted, because Scala does not allow case classes with more than 22 fields: the compiler-generated code relies on tuples, which stop at 22 elements.
Nested case classes seem like a possible solution, so instead of having one field for each of the items listed by NOAA, related items such as latitude, longitude, and elevation could be grouped and nested:
// class representing a latitude or longitude's information
case class DMS(
  Degrees: Int,
  Minutes: Int,
  Seconds: Int
)
// class combining a lat lon with elevation data
case class LatLonElevation(
  Latitude: DMS,
  Longitude: DMS,
  LatLonPrecision: String,
  ElevationGround: Option[Int],
  Elevation: Option[Int],
  ElevationTypeCode: Option[Int]
)
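To make the nesting idea concrete, here is a minimal sketch of how one such group could be filled straight from a fixed-width line. It only covers the five identifier columns quoted earlier; `StationIds`, `field`, and `parse` are names I made up for illustration, and the column ranges are taken from the comments above, which I have not fully verified against the file.

```scala
// Hypothetical group for the five identifier columns listed in WBAN-FMT.TXT.
case class StationIds(
  coopStationId: Option[String],   // columns 01 - 06
  climateDivision: Option[String], // columns 08 - 09
  wbanStationId: Option[String],   // columns 11 - 15
  wmoStationId: Option[String],    // columns 17 - 21
  faaLocId: Option[String]         // columns 23 - 26
)

object StationIds {
  // Slice 1-based, inclusive columns; blank or missing fields become None.
  private def field(line: String, from: Int, to: Int): Option[String] = {
    val raw = line.slice(from - 1, to).trim
    if (raw.isEmpty) None else Some(raw)
  }

  // Build the group from one raw line of the station list.
  def parse(line: String): StationIds =
    StationIds(
      field(line, 1, 6),
      field(line, 8, 9),
      field(line, 11, 15),
      field(line, 17, 21),
      field(line, 23, 26)
    )
}
```

The top-level record would then hold a handful of groups like this one (identifiers, location, and so on), each well under 22 fields.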
Or do you put it into a map with a vector in each value?
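For comparison, this is roughly what I mean by the map approach: a sketch in which the column layout is written down once and drives the parsing. `WbanLayout`, `layout`, `parseLine`, and `columns` are hypothetical names, and only the five fields quoted above are listed.

```scala
object WbanLayout {
  // One (name, firstColumn, lastColumn) entry per field; columns are 1-based, inclusive.
  // Only the five fields quoted above are listed; the rest would follow the same pattern.
  val layout: Vector[(String, Int, Int)] = Vector(
    ("CoopStationID", 1, 6),
    ("ClimateDivision", 8, 9),
    ("WBANStationID", 11, 15),
    ("WMOStationID", 17, 21),
    ("FAALOCID", 23, 26)
  )

  // Parse one fixed-width line into a field-name -> value map, with None for blanks.
  def parseLine(line: String): Map[String, Option[String]] =
    layout.map { case (name, from, to) =>
      val raw = line.slice(from - 1, to).trim
      name -> (if (raw.isEmpty) None else Some(raw))
    }.toMap

  // Column-oriented view: each field name maps to the vector of its values across all lines.
  def columns(lines: Seq[String]): Map[String, Vector[Option[String]]] = {
    val rows = lines.map(parseLine).toVector
    layout.map { case (name, _, _) => name -> rows.map(_(name)) }.toMap
  }
}
```

This avoids the 22-field limit entirely, but every value is an untyped `Option[String]` and a misspelled field name only fails at runtime.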
It seems like there should be a succinct way of doing this, but when I implemented it I ended up restating the same information about each field in several different forms, which was extremely ugly. Is there some way to import this kind of data using Slick or another library, or will that have the same limitation as case classes? As an aside, is it better to use a lazy val, a Future, or some other library to handle the connection?
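On the aside, this is the kind of thing I have in mind for the Future variant. It is only a sketch: `StationLoader` and `loadLines` are hypothetical names, and the caller decides how the `Source` is opened (file, URL, or string). A lazy val would instead defer the read until first access and then block the calling thread.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.io.Source

object StationLoader {
  // `open` is by-name, so the source is only opened once the Future body runs.
  // All lines are read eagerly inside the Future and the source is always closed.
  def loadLines(open: => Source): Future[Vector[String]] = Future {
    val source = open
    try source.getLines().toVector
    finally source.close()
  }
}
```

A call would look like `StationLoader.loadLines(Source.fromFile("WBAN.TXT"))`, assuming the inventory has already been downloaded to a local file of that name.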