There is a sequential file with
The file is sorted in pieces called runs (a run is a sequence of file elements already sorted in a given order): the initial runs are merged, producing fewer, longer runs. This is repeated until a single run as long as the entire file remains. The disk files used in merge sort are called "tapes" because they are processed sequentially. Sorting takes place in phases. Each phase consists of a distribution step and a merging step.
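As an illustration of what a run is, the following minimal sketch splits a sequence into maximal non-decreasing runs. It assumes plain integers as sort keys and an in-memory list instead of a tape; the function name `split_into_runs` is hypothetical and not taken from the program.

```python
def split_into_runs(keys):
    """Split a sequence into maximal non-decreasing runs (illustrative only)."""
    runs = []
    for key in keys:
        if runs and runs[-1][-1] <= key:
            # The key continues the current run.
            runs[-1].append(key)
        else:
            # The order is broken (or this is the first key): start a new run.
            runs.append([key])
    return runs


print(split_into_runs([3, 7, 9, 2, 5, 1, 8, 8, 4]))
# [[3, 7, 9], [2, 5], [1, 8, 8], [4]]
```

A file that is already sorted consists of a single run, which is the stop condition of the sorting process.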
The program sorts the file using the natural merging method in the 2+1 scheme. This means that 3 tapes are used: two for distribution and one for merging. Tapes are implemented as disk files. A record is a set of int numbers (at most 15, drawn from a limited range). Since the int type has a size of 4 bytes, the maximum (and average) record size (R) is 60 bytes. To ensure that every block read or written contains only whole records, it was assumed that each record, regardless of its length, takes 60 bytes. The variable record length is still preserved in the logical layer of the program. The maximum block size (B) was assumed to be 480 bytes. The blocking factor b, expressing the average number of records that fit in a block, is 8 (according to the formula b = B / R = 480 / 60).
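The 2+1 scheme can be sketched in memory as follows. This is a simplified illustration, not the program's code: Python lists stand in for the three tapes, plain integers stand in for records, block I/O is ignored, and all function names are hypothetical.

```python
def distribute(source):
    """Copy runs from the source tape alternately onto two tapes."""
    tapes = ([], [])
    current = 0
    previous = None
    for key in source:
        if previous is not None and key < previous:
            current = 1 - current  # the run ended, switch the target tape
        tapes[current].append(key)
        previous = key
    return tapes


def run_end(tape, start):
    """Return the index just past the run that begins at `start`."""
    end = start + 1
    while end < len(tape) and tape[end - 1] <= tape[end]:
        end += 1
    return end


def merge_tapes(tape_a, tape_b):
    """Merge the two tapes run by run onto a single output tape."""
    out = []
    i = j = 0
    while i < len(tape_a) and j < len(tape_b):
        end_a = run_end(tape_a, i)
        end_b = run_end(tape_b, j)
        # Merge one run from each tape into one longer run.
        while i < end_a and j < end_b:
            if tape_a[i] <= tape_b[j]:
                out.append(tape_a[i])
                i += 1
            else:
                out.append(tape_b[j])
                j += 1
        out.extend(tape_a[i:end_a])
        out.extend(tape_b[j:end_b])
        i, j = end_a, end_b
    # Copy whatever runs are left on the longer tape.
    out.extend(tape_a[i:])
    out.extend(tape_b[j:])
    return out


def natural_merge_sort(keys):
    """Return (sorted keys, number of phases) for the 2+1 scheme."""
    tape = list(keys)
    phases = 0
    while True:
        tape_a, tape_b = distribute(tape)
        if not tape_b:  # everything landed on one tape: a single run
            return tape_a, phases
        tape = merge_tapes(tape_a, tape_b)
        phases += 1


print(natural_merge_sort([5, 1, 4, 2, 3]))  # ([1, 2, 3, 4, 5], 2)
```

In this sketch, the final distribution that only detects the single remaining run is not counted as a phase. Whether the program counts phases the same way is an assumption.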
The program includes the following classes:
Tape – a class representing a tape that is physically a disk file. It contains methods e.g. for adding and getting records.
Block – a class representing a disk block in which records are stored.
Record – a class representing a variable-length record that is a set of numbers.
DiskOperationsHandler – a class containing methods that support operations performed on disk files.
FileSorter – a class containing methods for handling the sorting process (distribution, merging) and the program itself, e.g. displaying menus and statistics after sorting.
Each record is stored on a separate line. Because a constant record size of 60 bytes is assumed, records containing fewer than 15 numbers are padded in the file with None values.
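The padding rule can be illustrated with a short sketch. The exact on-disk layout is not given here, so the space separator and the helper names `encode_record` and `decode_record` are assumptions made only for this example.

```python
MAX_NUMBERS = 15  # 15 numbers * 4 bytes = 60 bytes per record


def encode_record(numbers):
    """Build one file line holding exactly MAX_NUMBERS values, padded with None."""
    if len(numbers) > MAX_NUMBERS:
        raise ValueError("a record holds at most 15 numbers")
    padded = list(numbers) + [None] * (MAX_NUMBERS - len(numbers))
    # Assumed layout: values separated by single spaces.
    return " ".join(str(value) for value in padded)


def decode_record(line):
    """Recover the variable-length record by dropping the None padding."""
    return [int(token) for token in line.split() if token != "None"]


line = encode_record([4, -2, 17])
print(line)
print(decode_record(line))  # [4, -2, 17]
```

Decoding drops the padding, which is how the variable record length can be preserved in the logical layer while every stored record occupies the same number of slots.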
The following Python libraries were used in the project:
tabulate
math
matplotlib
os