Warning: lots of code. If it's too much, please focus primarily on channel.h and channel.hpp.
This is my first time posting on Code Review, so I apologize in advance if anything is unusual or could have been written better.
Context
The program in which I am using this data structure is a DAQ-to-CSV conversion tool. Put simply, the goal is to read from binary and write to text. Apart from a small header that describes its contents, the DAQ file is structured using "frames" and "channels". Each frame contains a number of channels, and each channel has a fixed number of items per frame. For example, I can have three channels, A, B, and C, where A contains 3 items per frame, B contains 1 item per frame, and C contains 20 items per frame.
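To make the layout concrete, here is a minimal sketch of one way a single channel could be stored. It assumes every stored frame contributes exactly the same number of values (3 for channel A above). FlatChannel, push_frame, frame_count, and at are hypothetical names used for illustration and are not part of the posted code. The point is that item "offset" of the k-th stored frame sits at index k * items + offset of one contiguous vector. This is the same arithmetic Channel::datum uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the posted Channel: all items of one channel
// for all stored frames live back to back in a single std::vector<T>.
template <typename T>
struct FlatChannel {
    std::size_t items_per_frame;  // fixed per channel (assumed > 0)
    std::vector<T> values;        // frame-major: frame 0 items, frame 1 items, ...

    // Append one frame's worth of items; every frame must have the same count.
    void push_frame(std::vector<T> const& frame_items) {
        assert(frame_items.size() == items_per_frame);
        values.insert(values.end(), frame_items.begin(), frame_items.end());
    }

    std::size_t frame_count() const {
        assert(items_per_frame > 0);
        return values.size() / items_per_frame;
    }

    // Item `offset` of the k-th *stored* frame (not frame number k).
    T const& at(std::size_t k, std::size_t offset) const {
        assert(offset < items_per_frame);
        return values.at(k * items_per_frame + offset);
    }
};
```

For channel A, two stored frames {1, 2, 3} and {4, 5, 6} occupy indices 0 to 5, and at(1, 0) reads index 3.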
There are three things to consider. First, not every channel is guaranteed to appear in every frame. Using the example above, frame 1 might contain data for channels A, B, and C, while frame 2 contains data only for channels A and C.
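Because a channel can be missing from some frames, the k-th block of values in a channel is not necessarily the block for frame k. Below is a minimal sketch of the lookup. It assumes, as the posted Channel does with m_frames, that the frame numbers a channel actually received are kept in increasing order in a std::vector<int>. The frame is found with std::lower_bound, and an exact match is required. stored_index is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical helper: `frames` lists the frame numbers this channel has data
// for, in increasing order. Returns k such that `frame` is the k-th stored frame.
inline std::size_t stored_index(std::vector<int> const& frames, int frame) {
    // Debug-only sanity check (O(n), compiled out with NDEBUG):
    // std::lower_bound requires sorted input.
    assert(std::is_sorted(frames.begin(), frames.end()));

    auto const it = std::lower_bound(frames.begin(), frames.end(), frame);
    // lower_bound returns the first element >= frame, i.e. the *next* frame
    // when `frame` is absent, so an exact match must be checked explicitly.
    if (it == frames.end() || *it != frame)
        throw std::out_of_range("channel has no data for this frame");
    return static_cast<std::size_t>(std::distance(frames.begin(), it));
}
```

Suppose channel B is present in frames 1, 3, and 4. Then stored_index(frames, 3) is 1. A request for frame 2 throws, rather than silently returning the block for frame 3.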
Second, each channel has its own arithmetic type, indicated in the header by a char. For example, channel A may contain data of type 'd' (double), B of type 'i' (int), and C of type 'c' (char). This means that even the type of the items being read is not known until runtime.
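The type is only known once the header's type character has been read, so that character has to be turned into a type tag at run time. Below is a minimal sketch, assuming the five codes listed in data.h. parse_type is a hypothetical helper, and Type simply mirrors Data::Type so that the block compiles on its own. Validating the character up front means an unknown code fails while the header is being parsed. It does not have to wait for the default branch of a later switch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Mirrors Data::Type from data.h, redeclared so this sketch is self-contained.
enum struct Type : char { Double = 'd', Float = 'f', Int = 'i', Short = 's', Char = 'c' };

// Hypothetical helper: map the header's type character to Type.
// A bare static_cast<Type>(code) would accept any char without complaint.
inline Type parse_type(char code) {
    switch (code) {
        case 'd': return Type::Double;
        case 'f': return Type::Float;
        case 'i': return Type::Int;
        case 's': return Type::Short;
        case 'c': return Type::Char;
    }
    throw std::invalid_argument(std::string("unknown type code '") + code + "'");
}
```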
Third, all this data must be read and stored within the program so that the user can selectively choose which channels and data to print, as well as create their own channels from existing ones. This requires storage that is efficient in both space and speed. A typical DAQ file is 300 MB to 400 MB, with 65,000+ frames, 250+ channels, and channels holding upward of 600 items. For perspective, in one case I wrote all items from a DAQ file to CSV, and Excel reported more than 2.5 million cells.
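To put these figures in perspective, here is a rough upper-bound calculation for a single channel. It assumes the channel holds 600 items in every one of 65,000 frames. It also assumes an 8-byte double (IEEE 754 binary64, as on common platforms). channel_bytes is a hypothetical helper, and the numbers are arithmetic on the figures above, not measurements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed to store one channel contiguously.
constexpr std::size_t channel_bytes(std::size_t frames, std::size_t items,
                                    std::size_t bytes_per_item) {
    return frames * items * bytes_per_item;
}

// Upper bound for one 600-item channel present in all 65,000 frames.
// 8 bytes per double is an assumption (true for IEEE 754 binary64).
static_assert(channel_bytes(65000, 600, 8) == 312000000u, "about 312 MB as double");
static_assert(channel_bytes(65000, 600, sizeof(char)) == 39000000u, "about 39 MB as char");
```

DataT<T> stores each channel in its native type rather than widening everything to double. That is what keeps a 'c' channel at one eighth of the double-sized footprint under these assumptions.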
Code
This is the data structure (class Channel) I designed for this task. I have included the relevant files and dependencies. I've also included a sample main.cpp to illustrate how Channel is intended to be used.
The code uses templates to handle the variable arithmetic types, smart pointers and polymorphic structs to store the template struct, an internal type (Data::Type) in combination with static asserts and exceptions for type safety, and C++14 return type deduction and generic lambdas to access data. Please let me know what you think. Thank you very much!
main.cpp
#include "channel.h"

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // User-defined channels.
    auto c1 = Channel{Channel::create<int >("ints :")};
    auto c2 = Channel{Channel::create<char>("chars :")};

    // Sample data.
    auto v1 = std::vector<int>{1, 2, 3};
    auto v2 = std::vector<char>{'a', 'b', 'c'};

    // Push-back method for user-defined channels.
    // User-defined channels can only have one item per frame.
    // Non-arithmetic types are rejected at compile time; an arithmetic type
    // that does not match the channel's internal type is caught at run time
    // (an exception is thrown).
    for (auto i = std::size_t{}; i < v1.size(); ++i) c1.push_back(static_cast<int>(i), v1[i]);
    for (auto i = std::size_t{}; i < v2.size(); ++i) c2.push_back(static_cast<int>(i), v2[i]);

    // Push-back method for DAQ channels with multiple items.
    // During conversion, size of vector can range from 1 to 600+.
    // (Illustration only: c1 and c2 were created with one item per frame, so
    // the lookups below are correct only because frame 4 is the last frame pushed.)
    c1.push_back(4, v1);
    c2.push_back(4, v2);

    auto const print = [](auto const& data) {
        for (auto datum : data) std::cout << datum << ' ';
        std::cout << std::endl;
    };

    // Access data using generic lambdas.
    // This provides a type-safe way to access the templated vector (or an
    // individual datum by frame and index) without the user needing to know
    // the underlying type or write casts.
    c1.data(print);
    c2.data(print);

    // Access data by frame and index (for channels with multiple items).
    // Channels should have a consistent number of items per frame, no duplicate
    // frame numbers, and frames in increasing order (std::lower_bound relies on it).
    c1.datum(2, 0, [](auto datum) { std::cout << datum << " == 3\n"; });
    c2.datum(4, 2, [](auto datum) { std::cout << datum << " == c\n"; });
}
record.h
#ifndef RECORD_H
#define RECORD_H

#include "data.h"

#include <cstdint>
#include <string>

struct Record {
    std::int32_t id;                            // ID in DAQ file.
    std::int32_t items;                         // Number of items per frame.
    std::string  name = std::string(53, '\0');  // Name of channel.
    std::int16_t rate;                          // Rate of updates.
    Data::Type   type;                          // Type of data.
    std::int32_t varlen;                        // Variable length (?)
};

#endif
data.h
#ifndef DATA_H
#define DATA_H

#include <stdexcept>
#include <type_traits>
#include <vector>

// Abstract base class for DataT<T>.
struct Data {
    // Used to store internal type information in class Channel.
    enum struct Type : char {
          Double = 'd'
        , Float  = 'f'
        , Int    = 'i'
        , Short  = 's'
        , Char   = 'c'
    };

    // Returns the Data::Type corresponding to T. The is_same checks are evaluated
    // at run time, so an arithmetic type without a mapping (e.g. long) throws
    // instead of failing to compile.
    template <typename T> static Type get_type() {
        static_assert(std::is_arithmetic<T>::value, "type T must be arithmetic");
        if (std::is_same<T, double>::value) return Data::Type::Double;
        if (std::is_same<T, float >::value) return Data::Type::Float;
        if (std::is_same<T, int   >::value) return Data::Type::Int;
        if (std::is_same<T, short >::value) return Data::Type::Short;
        if (std::is_same<T, char  >::value) return Data::Type::Char;
        throw std::logic_error("invalid type T in Data::get_type<T>()");
    }

    // Mark class as abstract.
    virtual ~Data() = 0;
};

// A pure virtual destructor must still be defined.
inline Data::~Data() = default;

// Concrete derived class used to store data in class Channel.
template <typename T> struct DataT : Data { std::vector<T> data; };

#endif
channel.h
#ifndef CHANNEL_H
#define CHANNEL_H

#include "data.h"
#include "record.h"

#include <algorithm>    // std::lower_bound
#include <cstddef>      // std::size_t
#include <iterator>     // std::distance
#include <memory>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>      // std::move
#include <vector>

class Channel {
    std::unique_ptr<Data> m_data;
    std::vector<int>      m_frames;
    Record                m_record;

    // Private constructor.
    Channel(std::unique_ptr<Data> data, Record record)
        : m_data(std::move(data)), m_record(std::move(record)) { }

    // Private data interface function.
    template <typename Function> auto get_data(Function func);

public:
    // Disable default construction and copying. Enable moves only.
    Channel() = delete;
    Channel(Channel const&) = delete;
    Channel& operator=(Channel const&) = delete;
    Channel(Channel&&) = default;
    // Must be declared explicitly: a user-declared copy assignment
    // suppresses the implicit move assignment.
    Channel& operator=(Channel&&) = default;

    // Factory Method Pattern.
    template <typename T> static Channel create(std::string name);
    static Channel create(Record record) {
        switch (record.type) {
            case Data::Type::Double : return {std::make_unique<DataT<double>>(), std::move(record)};
            case Data::Type::Float  : return {std::make_unique<DataT<float >>(), std::move(record)};
            case Data::Type::Int    : return {std::make_unique<DataT<int   >>(), std::move(record)};
            case Data::Type::Short  : return {std::make_unique<DataT<short >>(), std::move(record)};
            case Data::Type::Char   : return {std::make_unique<DataT<char  >>(), std::move(record)};
            default                 : throw std::invalid_argument("invalid Data::Type in Channel::create(Record)");
        }
    }

    // Channel interface functions.
    decltype(m_frames) const& frames() const { return m_frames; }
    decltype(m_record) const& record() const { return m_record; }
    std::size_t size() const { return m_frames.size(); }

    // Data interface functions.
    template <typename T> decltype(DataT<T>::data) const& data() const;
    template <typename T> T const& datum(int frame, int offset) const;
    template <typename Function> auto data(Function func) const;
    template <typename Function> auto datum(int frame, int offset, Function func) const;

    // Record interface functions.
    decltype(Record::id)     const& id()     const { return m_record.id; }
    decltype(Record::items)  const& items()  const { return m_record.items; }
    decltype(Record::name)   const& name()   const { return m_record.name; }
    decltype(Record::rate)   const& rate()   const { return m_record.rate; }
    decltype(Record::type)   const& type()   const { return m_record.type; }
    decltype(Record::varlen) const& varlen() const { return m_record.varlen; }

    // Channel initialization functions. The catch-all takes the same (int, T)
    // arguments as the real overloads, so SFINAE routes a non-arithmetic T to it
    // and its static_assert prints a descriptive error message.
    template <typename T> auto push_back(int, T) -> std::enable_if_t<!std::is_arithmetic<T>::value>;
    template <typename T> auto push_back(int frame, T datum) -> std::enable_if_t< std::is_arithmetic<T>::value>;
    template <typename T> auto push_back(int frame, std::vector<T> const& data) -> std::enable_if_t< std::is_arithmetic<T>::value>;

private:
    // Enforce type match between internal type and template type deduction.
    template <typename T> void check_type() const;
};

#include "channel.hpp"

#endif
channel.hpp
/** Included by channel.h. **/

template <typename Function> auto Channel::get_data(Function func) {
    switch (m_record.type) {
        case Data::Type::Double : return func(static_cast<DataT<double>*>(m_data.get())->data);
        case Data::Type::Float  : return func(static_cast<DataT<float >*>(m_data.get())->data);
        case Data::Type::Int    : return func(static_cast<DataT<int   >*>(m_data.get())->data);
        case Data::Type::Short  : return func(static_cast<DataT<short >*>(m_data.get())->data);
        case Data::Type::Char   : return func(static_cast<DataT<char  >*>(m_data.get())->data);
        default                 : throw std::logic_error("invalid Data::Type set in Channel");
    }
}

template <typename T> Channel Channel::create(std::string name) {
    static_assert(std::is_arithmetic<T>::value, "type T must be arithmetic.");
    auto record = Record{0, 1, std::move(name), 1, Data::get_type<T>(), 0};
    return {std::make_unique<DataT<T>>(), std::move(record)};
}

template <typename T> decltype(DataT<T>::data) const& Channel::data() const try {
    check_type<T>();
    return static_cast<DataT<T> const*>(m_data.get())->data;
} catch (std::invalid_argument const& e) {
    throw std::invalid_argument(std::string(e.what()).append(" in Channel::data<T>()"));
}

template <typename T> T const& Channel::datum(int frame, int offset) const try {
    check_type<T>();
    // m_frames must be in increasing order for std::lower_bound. lower_bound returns
    // the *next* frame when `frame` is absent, so require an exact match.
    auto const it = std::lower_bound(m_frames.begin(), m_frames.end(), frame);
    if (it == m_frames.end() || *it != frame)
        throw std::out_of_range("frame not found in Channel::datum<T>(int, int)");
    auto const index = std::distance(m_frames.begin(), it) * m_record.items + offset;
    // at() turns a bad offset into std::out_of_range instead of undefined behaviour.
    return static_cast<DataT<T> const*>(m_data.get())->data.at(static_cast<std::size_t>(index));
} catch (std::invalid_argument const& e) {
    throw std::invalid_argument(std::string(e.what()).append(" in Channel::datum<T>(int, int)"));
}

template <typename Function> auto Channel::data(Function func) const {
    switch (m_record.type) {
        case Data::Type::Double : return func(static_cast<DataT<double> const*>(m_data.get())->data);
        case Data::Type::Float  : return func(static_cast<DataT<float > const*>(m_data.get())->data);
        case Data::Type::Int    : return func(static_cast<DataT<int   > const*>(m_data.get())->data);
        case Data::Type::Short  : return func(static_cast<DataT<short > const*>(m_data.get())->data);
        case Data::Type::Char   : return func(static_cast<DataT<char  > const*>(m_data.get())->data);
        default                 : throw std::logic_error("invalid Data::Type set in Channel");
    }
}

template <typename Function> auto Channel::datum(int frame, int offset, Function func) const {
    // Same frame lookup and bounds checking as datum<T>(int, int).
    auto const it = std::lower_bound(m_frames.begin(), m_frames.end(), frame);
    if (it == m_frames.end() || *it != frame)
        throw std::out_of_range("frame not found in Channel::datum(int, int, Function)");
    auto const i = static_cast<std::size_t>(std::distance(m_frames.begin(), it) * m_record.items + offset);
    switch (m_record.type) {
        case Data::Type::Double : return func(static_cast<DataT<double> const*>(m_data.get())->data.at(i));
        case Data::Type::Float  : return func(static_cast<DataT<float > const*>(m_data.get())->data.at(i));
        case Data::Type::Int    : return func(static_cast<DataT<int   > const*>(m_data.get())->data.at(i));
        case Data::Type::Short  : return func(static_cast<DataT<short > const*>(m_data.get())->data.at(i));
        case Data::Type::Char   : return func(static_cast<DataT<char  > const*>(m_data.get())->data.at(i));
        default                 : throw std::logic_error("invalid Data::Type set in Channel");
    }
}

// SFINAE catch-all. Selected only for a non-arithmetic T, so the static_assert
// fires with a descriptive message instead of a long overload-resolution error.
template <typename T> auto Channel::push_back(int, T)
    -> std::enable_if_t<!std::is_arithmetic<T>::value> {
    static_assert(std::is_arithmetic<T>::value, "type T must be arithmetic");
}

template <typename T> auto Channel::push_back(int frame, T datum)
    -> std::enable_if_t<std::is_arithmetic<T>::value> try {
    this->check_type<T>();
    m_frames.emplace_back(frame);
    this->get_data([&](auto& values) { values.push_back(datum); });
} catch (std::invalid_argument const& e) {
    throw std::invalid_argument(std::string(e.what()).append(" in Channel::push_back<T>(int, T)"));
}

template <typename T> auto Channel::push_back(int frame, std::vector<T> const& data)
    -> std::enable_if_t<std::is_arithmetic<T>::value> try {
    this->check_type<T>();
    m_frames.emplace_back(frame);
    this->get_data([&](auto& values) { values.insert(values.end(), data.cbegin(), data.cend()); });
} catch (std::invalid_argument const& e) {
    throw std::invalid_argument(std::string(e.what()).append(" in Channel::push_back<T>(int, std::vector<T>)"));
}

template <typename T> void Channel::check_type() const {
    if (m_record.type != Data::get_type<T>()) throw std::invalid_argument("type T does not match internal type in Channel");
}
void* worked, but was dangerous and difficult to work with. Explicit casts (template <typename T> std::vector<T> const& data() const) worked better, but it was too easy to bad-cast. That is how I ended up with the current solution: contiguous and type safe. – EraZ3712 Jun 24 '16 at 13:25
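The difference between an unchecked cast and the current approach can be shown in isolation. Below is a minimal sketch, assuming only two stored types. Tag, Base, Holder, tag_of, and values_as are hypothetical stand-ins for Data::Type, Data, DataT<T>, Data::get_type<T>(), and Channel::data<T>(). The stored tag is compared before the static_cast. Asking for the wrong T therefore throws instead of invoking undefined behaviour.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Stand-ins for Data::Type / Data / DataT<T>, reduced to two types.
enum struct Tag : char { Double = 'd', Int = 'i' };
struct Base { virtual ~Base() = default; };
template <typename T> struct Holder : Base { std::vector<T> values; };

// Stand-in for Data::get_type<T>(), restricted to the two supported types.
template <typename T> Tag tag_of() {
    static_assert(std::is_same<T, double>::value || std::is_same<T, int>::value,
                  "unsupported T");
    return std::is_same<T, double>::value ? Tag::Double : Tag::Int;
}

// Checked typed access in the spirit of Channel::data<T>(): an unchecked
// static_cast (or a cast through void*) with the wrong T is undefined
// behaviour, so the stored tag is verified first.
template <typename T> std::vector<T> const& values_as(Tag stored, Base const& base) {
    if (stored != tag_of<T>())
        throw std::invalid_argument("requested T does not match stored type");
    return static_cast<Holder<T> const&>(base).values;
}
```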