hashFS: Applying Hashing to Optimize File Systems for Small File Reads
Abstract
Today's file systems typically require multiple disk accesses for a single file read. In the worst case, when none of the needed data is already cached, the metadata for each component of the file path must be read from disk; once the file's metadata has been obtained, an additional disk access is needed to read the actual file data. For a target scenario consisting almost exclusively of reads of small files, as is typical in many Web 2.0 workloads, this behavior severely impacts read performance. In this paper, we propose a new file system approach that computes the expected location of a file by applying a hash function to its path. Additionally, file metadata is stored together with the actual file data. Together, these characteristics allow a file to be read with a single disk access. We implement the approach as an extension of the ext2 file system while remaining largely compatible with POSIX semantics. The results show random read performance that is nearly independent of the organization and size of the file set and of the available cache size, whereas the performance of standard file systems depends heavily on these parameters.