Available online at www.prace-ri.eu

Partnership for Advanced Computing in Europe

Dealing with Small Files in HPC Environments: Automatic Loop-Back Mounting of Disk Images

Marcin Krotkiewski*, University of Oslo/SIGMA2

* Corresponding author. E-mail address: marcin.krotkiewski@usit.uio.no

Abstract

Processing of large numbers (hundreds of thousands) of small files (i.e., up to a few KB) is notoriously problematic for all modern parallel file systems. While modern storage solutions provide high and scalable bandwidth through parallel storage servers connected with a high-speed network, accessing small files is sequential and latency-bound. Paradoxically, the performance of file access is worse than if the files were stored on a local hard drive. We present a generic solution for large-scale HPC facilities that improves the performance of workflows dealing with large numbers of small files. The files are saved inside a single large file containing a disk image, similarly to an archive. When needed, the image is mounted through the Unix loop-back device, and the contents of the image are available to the user in the form of a usual directory tree. Since mounting of disks under Unix often requires super-user privileges, security concerns and possible ways to address them are considered. A complete Python implementation of the image creation, mounting, and unmounting framework is presented. Seamless integration into HPC environments managed by SLURM is discussed using the example of read-only software modules created by administrators, and user-created disk images with read-only application input data. Finally, results of performance benchmarks carried out on the Abel supercomputer facility in Oslo, Norway, are shown.

1. Introduction

Modern storage solutions for HPC environments strive to provide scalable IO performance for supercomputers with thousands of compute nodes. This goal is challenging considering the large variety of applications and their CPU and IO characteristics. When it comes to IO, some jobs require large bandwidth, while others may perform many meta-data operations, random IO access, and short reads. On top of that, modern HPC facilities serve hundreds of users and often support a large application base. It is common that parallel network file systems are used to host home directories, which often contain large numbers of small files, and to provide centralized software and development stacks to the compute nodes, in addition to providing compute jobs with space for input/output data.

Most modern storage architectures are based on specialized server nodes interconnected with high-speed, low-latency networks. Dedicated storage servers provide clients with high total network bandwidth and scalable access to the back-end disk arrays. On the other hand, meta-data servers deal with file names, modification dates, and operations like file creation, deletion, opening, closing, and locking. Communication with the storage infrastructure is usually implemented in the form of a network file system (FS), e.g., BeeGFS [1], Lustre [2], GlusterFS [3], OrangeFS [4]. The role of such a global parallel FS is to provide fast, parallel storage, to share the IO resources fairly among the clients, and to assure data integrity and a consistent data state across thousands of independent clients that can read and write data at the same time.
Depending on the characteristics of the software run by an HPC facility, the number of server nodes and their configuration can be adjusted to provide the best overall storage performance in a cost-effective manner. This is of course important from the perspective of the system administrators, but more importantly from the users' perspective.
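To make the loop-back workflow outlined in the abstract more concrete, the following is a minimal Python sketch of image creation, mounting, and unmounting. It is not the implementation presented in this paper: it assumes an ext4 image, the standard dd, mkfs.ext4, mount, and umount tools, super-user privileges for the mount and unmount steps, and purely illustrative paths and function names.

#!/usr/bin/env python3
# Minimal sketch (not the paper's implementation) of creating, loop-back
# mounting, and unmounting an ext4 disk image. Assumes the standard
# dd/mkfs.ext4/mount/umount tools and that the mount/unmount steps run
# with sufficient (super-user) privileges. Paths and names are illustrative.
import os
import subprocess

def create_image(image_path, size_mb):
    # Allocate a sparse file of the requested size and format it as ext4.
    subprocess.run(["dd", "if=/dev/zero", "of=" + image_path,
                    "bs=1M", "count=0", "seek=" + str(size_mb)], check=True)
    subprocess.run(["mkfs.ext4", "-q", "-F", image_path], check=True)

def mount_image(image_path, mount_point):
    # Attach the image to a loop device and mount it at mount_point.
    subprocess.run(["mount", "-o", "loop", image_path, mount_point], check=True)

def unmount_image(mount_point):
    # Unmount the image; the loop device is released automatically.
    subprocess.run(["umount", mount_point], check=True)

if __name__ == "__main__":
    image = "/tmp/smallfiles.img"       # hypothetical image location
    mnt = "/tmp/smallfiles"             # hypothetical mount point
    os.makedirs(mnt, exist_ok=True)
    create_image(image, size_mb=1024)   # 1 GB image
    mount_image(image, mnt)             # files now appear as a usual directory tree
    # ... populate or read files under mnt ...
    unmount_image(mnt)

In such a sketch, only the mount and unmount calls touch privileged operations; these are exactly the steps whose security implications the paper discusses when integrating the mechanism into a multi-user HPC environment.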