I am trying to understand the differences between these three kinds of file system at a very basic level:
- Distributed FS: HDFS
- Parallel FS: Lustre
- Traditional FS: ext4, ext3, NTFS, FAT, etc.
I want to know the basic conceptual differences between these three kinds of file system. Most of my knowledge is of traditional file systems, i.e. the ext3/ext4 superblock, inodes, etc. In particular:
- If an MPI-based job (np=8) reads or writes a file A on the file system, how does the file access mechanism differ in each of these three cases?
- How is a file stored in each case? Is file A split across multiple disks, or are redundant copies of file A kept on storage? A simpler scenario: if multiple users open a Word document and then save it, how does the write-back/synchronisation differ across these three cases?
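To make the two questions above concrete, here is a toy sketch in Python of what I have in mind. It is only my mental model, not how HDFS or Lustre actually place data: the function names, the round-robin striping rule, and the rotate-by-one replica rule are all assumptions made up for illustration.

```python
def rank_byte_range(file_size, nprocs, rank):
    """Contiguous byte range [start, end) of the file that one rank would access.

    Assumes the job simply divides the file evenly among the ranks;
    the first (file_size % nprocs) ranks get one extra byte each.
    """
    base, extra = divmod(file_size, nprocs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def striped_disk(offset, stripe_size, ndisks):
    """Disk index holding a byte offset if the file is split round-robin
    across disks in fixed-size stripes (hypothetical 'split' layout)."""
    return (offset // stripe_size) % ndisks


def replicated_disks(block_index, ndisks, replicas):
    """Disk indices holding copies of one block if every block is stored
    `replicas` times on consecutive disks (hypothetical 'redundant copies' layout)."""
    return [(block_index + r) % ndisks for r in range(replicas)]


if __name__ == "__main__":
    # 8 ranks sharing an 800-byte file stored on 4 disks with 64-byte stripes.
    for rank in range(8):
        start, end = rank_byte_range(800, 8, rank)
        print(rank, (start, end), "first byte on disk", striped_disk(start, 64, 4))
```

In the "split" case each rank may end up talking to a different disk, while in the "redundant copies" case any of several disks could serve the same block. What I want to understand is which of these pictures, if either, matches each kind of file system.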
So far I have formed the following picture:
- In a local file system, the storage is physically mounted on the server/node.
- In a parallel file system, a disk is shared (mounted) on multiple nodes.
- In a distributed file system, multiple nodes each have their own local storage, but all of them are kept synchronized by some mechanism.
Suppose A and B are workstations and C and D are disks:
- If C is physically mounted on A and formatted as ext4, then it is a traditional file system.
- If C is physically mounted on a storage server Z and network mounted (NFS) on both A and B, then this is a cluster FS.
- If C is physically mounted on A and network mounted on B, and D is physically mounted on B and network mounted on A, then this gives rise to a distributed FS.
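The three scenarios above can be written down as a small Python sketch. This only encodes my own classification rule, which may well be wrong; the `classify` function and the layout of the `mounts` dictionary are made up for illustration.

```python
def classify(mounts, workstations):
    """Classify a setup according to my (possibly wrong) rule of thumb.

    mounts: dict mapping disk name -> {"local": host, "network": set of hosts}
    workstations: set of hosts where users run their programs
    """
    hosts_with_disks = {m["local"] for m in mounts.values()}
    any_network_mount = any(m["network"] for m in mounts.values())

    if not any_network_mount:
        # Every disk is only used by the host it is attached to.
        return "traditional"
    if hosts_with_disks.isdisjoint(workstations):
        # Disks live on dedicated storage servers; workstations only network-mount them.
        return "cluster"
    # Workstations hold disks themselves and export them to each other.
    return "distributed"


workstations = {"A", "B"}
scenario_1 = {"C": {"local": "A", "network": set()}}
scenario_2 = {"C": {"local": "Z", "network": {"A", "B"}}}
scenario_3 = {
    "C": {"local": "A", "network": {"B"}},
    "D": {"local": "B", "network": {"A"}},
}

for name, scenario in [("1", scenario_1), ("2", scenario_2), ("3", scenario_3)]:
    print("scenario", name, "->", classify(scenario, workstations))
```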
I understand these concepts are probably wrong. Some answers state that in parallel file systems metadata and data are kept on separate servers; I would also like to understand how metadata is managed in distributed file systems.
I understand that the question is quite lengthy, but I am trying to put it in terms that are as simple as possible.