0
\$\begingroup\$

At first, I had Database to contain std::list<Student> students, but because there are over 2000 students, searching by students should not be done in linear time. However, searching for students can be by name or by student number. To enable both searches in logarithmic time, I decided to give Database two maps that hold the same 2000+ students. Inserting and removing a student will have to update both maps. This doubles the time, but I figured that \$2*O(logN)\$ is still much better than \$O(N)\$. If there are to be k different search criteria later on, then \$k*O(logN)\$ is still better than \$O(N)\$ up to, say, k = 20 or so. Agree?

Space complexity is a different matter though. Is the doubling of memory usage bad design here, for 2000+ students? Tripling? Quadrupling?

Is there a better solution altogether?

#include <iostream>
#include <string>
#include <memory>
#include <map>

class Student {
    std::string surname, firstName;
    long studentNumber;
public:
    std::string getSurname() const {return surname;}
    std::string getFirstName() const {return firstName;}
    std::string fullName() const {return firstName + ' ' + surname;}
    long getStudentNumber() const {return studentNumber;}
};

class Database {
    std::multimap<std::string, std::shared_ptr<Student>> mapByName;
    std::map<long, std::shared_ptr<Student>> mapByStudentNumber;
public:
    void insertStudent (const std::shared_ptr<Student>& p) {
        mapByName.emplace (getName(p), p);
        mapByStudentNumber.emplace (p->getStudentNumber(), p);
        // 2 insertions needed.
    }
    inline void removeStudentByStudentNumber();
    inline void removeStudentByName();
private:
    void remove (const std::shared_ptr<Student>& p) {
        mapByName.erase(getName(p));
        mapByStudentNumber.erase(p->getStudentNumber());
        // 2 removals needed.
    }
    std::string getName (const std::shared_ptr<Student>& p) const {return p->fullName();}
};

inline void Database::removeStudentByStudentNumber() {
    long studentNumber;
    std::cout << "Student number? ";
    std::cin >> studentNumber;
    const auto it = mapByStudentNumber.find(studentNumber);
    if (it == mapByStudentNumber.end()) {
        std::cout << "No student with that student number exists.\n\n";
        return;
    }
    remove(it->second);
}

inline void Database::removeStudentByName() {
    // Similar to above
}

int main() {}
\$\endgroup\$
1
  • \$\begingroup\$ About now you probably want to take a look at Boost multi_index. \$\endgroup\$ Commented Feb 8, 2016 at 7:14

3 Answers 3

1
\$\begingroup\$

Algorithmic time analysis

If there are to be k different search criteria later on, then \$k∗O(log N)\$ is still better than \$O(N)\$ up to, say, k = 20 or so. Agree?

Sure, that's what \$O\$ notation means. A few issues though.

You'd normally say \$O(k* \log N)\$. Or just \$O(\log N)\$, as the number of search criteria is not variable in the data.

It matters whether \$k\$ grows with \$N\$. If you have to have a new search criterion for each entry in the database, then \$O(k * \log N)\$ is \$O(N * \log N)\$ which is of course worse than \$O(N)\$.

Worst, best, and average case time complexities are theoretical constructs. They mean that for some sufficiently large value of \$N\$, an \$O(\log N)\$ algorithm will beat an \$O(N)\$ algorithm. In practice, your particular value of \$N\$ may or may not be large enough for your choice of algorithms. The only way to say with reasonable surety is to do timing tests both ways.

Space complexity

Doubling the space should not be an issue, but you shouldn't need to double the space unless you duplicate the student records. Since you are using pointers, you are only doubling the index space, not the total space.

If space is at a premium, you may want to make some searches require a full scan while others have prebuilt indexes.

An alternative for student numbers

When I was in school, my student number had six digits. The first two were the last two digits of my graduation year. The third was always a 0. The other three were a sequence based on enrollment order. If your student numbers have a similar scheme, you could store the records in five arrays for a four year school. Put each typical year in a different array and put the extra years in a fifth array or a smaller map. Since arrays are more space efficient than maps, this should reduce the space required even if the array has missing values. It also gives you constant time access to students in the year-based arrays.

Bug

        mapByName.erase(getName(p));

This removes every element with that name. So if two students have the same name and one drops, you delete both.

        const auto pair = mapByName.equal_range(getName(p));
        for (auto it = pair.first; it != pair.second; ++it) {
            if (it->second->getStudentNumber() == p->getStudentNumber()) {
                mapByName.erase(it); break;
            }
        } 

Updated to use tested code from comment.

\$\endgroup\$
1
  • \$\begingroup\$ const auto pair = mapByName.equal_range(getName(p)); for (auto it = pair.first; it != pair.second; ++it) { if (it->second->getStudentNumber() == p->getStudentNumber()) { mapByName.erase(it); break; } } Thanks for pointing out the bug. One fix I've tested with is the above. \$\endgroup\$ Commented Feb 8, 2016 at 1:50
1
\$\begingroup\$

This code is not complete so difficult to give a full review, but here are my thoughts.

Student

Put public methods first

This point is debatable, but I generally think it's better to have public methods before private variables. This is the only thing users of your class should care about, so make it easier to find.

Use variable name forename

forename is a better variable name than firstName, and fits better with surname (as apposed to lastName).

Type alias std::string

I think it would be clearer to type alias std::string for the names, if you make this public, the database class can use this alias to make the map definition clearer too:

using NameType = std::string;

Add missing constructors

Not sure if these were left out intentionally, but a simple forwarding constructor here will suffice:

template <typename T1, typename T2>
Student(T1&& forename, T2&& surname, long studentNumber);

Make fullName a non-member

It's generally preferable to make your class interface as minimal as possible, and you really gain nothing here by having this as a member method, so make it a non-member:

Student::NameType getFullName(const Student& student);

Return fields by const &

You are making a copy on every call to getFirstName and getSurname, better to return a const reference - the caller can always make a copy if needed.

Database

Don't make Database responsible for reading user input

Your remove functions should not read anything from the user, it obscures the interface and purpose of the class. If you need this functionality, put it into a non-member method:

long getStudentNumberToRemoveFromUser();

Consider a better design

I'd say your design is flawed; you shouldn't be removing Student objects by name (not even full name), as you could have duplicates. Presumably the student number is unique, so this should be the only identifier used for removal. Instead you could offer a public method to gain access for student numbers. So you just have:

std::vector<Student> getMatchingStudents(Student::NameType surname forename, Student::NameType surname) const;
bool remove(const Student& student);

Where the return bool indicates if the record was removed.

Consider Boost.Bimap

Boost.Bimap already solves the two way access problem you are having.

Remove shared_ptr

Assuming you don't want to use Boost.Bimap.

There's really no need to use shared_ptr here; Database has complete control over the lifetime of the Student objects. Introducing the shared_ptr only adds code complexity, additional memory consumption, and confusion. Instead you can either store everything by value (completely feasible for just 2000 records), or store values in mapByStudentNumber and store a raw pointer, or std::reference_wrapper<const Student::NameType in mapByName.

Use better variable names

There's some debate about how to best name maps (xToY, ys, xToYMap etc), but I don't like mapByName and mapByStudentNumber. I'd probably go with namesToNumbers and numbersToNames.

\$\endgroup\$
1
  • \$\begingroup\$ For removing a student by name (Database::removeStudentByName()), I was going to use std::multimap::equal_range to output all students that have the same first and last name (if there is more than one), then the user can choose from those which student specifically to remove. \$\endgroup\$ Commented Feb 8, 2016 at 1:40
1
\$\begingroup\$

I see some things that may help you improve your code.

Create a constructor for Student

I'd suggest that Student should have a constructor. Something like this:

Student(std::string last, std::string first, long num) :
    surname{last},
    firstName{first},
    studentNumber{num}
{}

Eliminate spurious functions

I'd be inclined to eliminate the getName() function. The current code uses this:

mapByName.emplace (getName(p), p);

But it seems to me that it's just as easy to write this:

mapByName.emplace (p->fullName(), p);

Separate I/O from processing

The removeStudentByStudentNumber() does output, input and then actually does the removal. Better would be to separate those functions. The member function should only remove the object; I/O should be separate. This suggests that the removal function itself should accept the number as input and return a bool indicating success or not.

\$\endgroup\$

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.