Sign up ×
Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

We are planning to build a web based custom business analytics tool using PHP and MySQL/PostgreSQL. In our application, we are planning to upload customer’s data using a Excel sheet and these data will be displayed on custom build interactive dashboard.

I have few questions before I start working on this application

1) For the mass data upload from excel, which db is suitable MySQL or PostgreSQL?

2) At some point is it possible that multiple customers can start data upload at same time ?. To deal with such scenario is it advisable to create a separate db for each customer or to use a single db for application(single tenancy vs multi-tenancy). Please tell me pros and cons if any for each scenario?

3) How it would affect my dashboard performance, if I go with multiple databases design and I join tables across multiple databases?

Thanks

share|improve this question
    
In common usage the phrase "multi-tenant" refers to having all your customers' data in one database. Giving each customer their own database would be a single-tenant design. –  Michael Green Jul 27 '14 at 12:26
    
A MySQL "database" is actually called "schema" in other DBMS. Generally you can join between schemas but not databases. –  Neil McGuigan Jan 8 at 22:07

2 Answers 2

For a data warehousing/analytics type application (DW/OLAP), I would go with PostgreSQL. It has set operators, windowing functions (also known as analytic functions) and common table expressions. You will most likely have to implement some or all of these in your own code with MySQL with the attendant possibility of bugs. MySQL is more suited to read-heavy OLTP type applications, whereas PostgreSQL is better (IMHO) for OLAP work.

With respect to "multi-tenancy" - why not use the same database for all and simply identify different customers by a customer_id - or (depending on the number of customers) have separate tables for each customer and non-customer specific data in other tables? How many customers do you have and how much data do you expect to be storing/analysing in the next 10 years? Modern RDBMSs can store large amounts of data and retrieve quickly (with suitable disk configuration), but I would favour PostgreSQL for large databases.

Sharding might be interesting if you only occasionally query across customers and most of your analysis is done per customer - you could set up a table (or even schema) per customer (on different disks). Again, knowledge of data volumes would help here.

Furthermore, with respect to multi-tenancy, see my question to jynus about querying across servers - or even different schemas within the same server.

I take jynus' point about "holy wars" - this post is just my own opinion about your particular question - were I to be asked about a different scenario, I might well recommend MySQL.

[EDIT]

What, exactly, is "my dashboard performance"?

share|improve this answer
    
And as I predicted, a broad opinion "this is better for this, this is better for that" that shouldn't be in a Q&A site. –  jynus Jul 26 '14 at 12:25
    
In fairness, it's better than many. The OP mentions 2 specific servers, a given app (Excel) and dev tool (PHP - even if not very relevant) and also mentions particular scenarios. It appears that dba.se is fairly liberal in the posts that are acceptable - although what exactly is and isn't OK doesn't appear to be cast in stone. –  Vérace Jul 26 '14 at 12:58

1) Any modern RDBMS system will allow you to import and work with Excel-like files (usually, we use them as CSV files, for which both databases have import support: COPY and LOAD DATA).

If you had to choose, worry about in-house technology knowledge and advanced features, but nothing here suggest any impossible tasks. Please be wary of people promoting holy wars among database vendors.

2 and 3) All modern RDBMS allow for transactional, concurrent writes of different rows in a single table. Your design concerns should be directed about estimated write load/data size and its implications on performance, like -Do you need partitioning/sharding? Those questions are usually asked when you have to deal with huge numbers. Make sure you are not falling into the premature optimization anti-pattern.

There is a difference regarding joins, as schemas/databases on MySQL are mostly a logical categorization, and there is effectively no difference in performance between joing tables in the same or different databases inside the same instance. In Postgres, while it is tecnically possible (dblink, etc.), it is not an optimal action by design -different schemas are "different entities"- obviously you do not put tables that you plan to join on different schemas knowing that (the same way that you do not put them on different instances).

share|improve this answer
    
Could you explain how MySQL performs queries across schemas/databases? What do you mean when you say that "schemas/databases on MySQL are mostly a logical categorization"? How are transactions handled in the case of updates between schemas? What is the distinction between MySQL's and PostgreSQL's way of dealing with cross-schema (or cross-database) joins? For example, if I have my customer data for the last month on 2 different schemas (not changing - or updated rarely) - why is PostgreSQL any different from MySQL? How is MySQL better in this case? Any refs/URLs appreciated. –  Vérace Jul 26 '14 at 10:53
    
Feel free to create a new question and I will answer you in exchange for internet magical points. :-) This requires more than 500 characters. I am not saying MySQL is better, I am saying that he should not be doing that in the first place. –  jynus Jul 26 '14 at 12:22
    
@Vérace For links, in particular MySQL/InnoDB internals, check: blog.jcole.us/innodb and percona.com/live/london-2013/sessions/… , but MySQL is more than InnoDB. –  jynus Jul 26 '14 at 12:28
    
Thanks - will read, and if necessary post a question. Will leave comment here for followup should you choose to pursue. –  Vérace Jul 26 '14 at 12:31
    
"In Postgres ... you do not put tables that you plan to join on different schemas" - of course you do. And there is also no performance penalty in joining table between different schemas. A schema is not an "entity" it is a namespace. –  a_horse_with_no_name Jan 8 at 22:19

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.