Skip to content

bartoszLesniewski/files-sorting

Repository files navigation

Files sorting

Problem

There is a sequential file with $N$ records with keys $x_1$, $x_2$ , ..., $x_N$ . Operating memory (RAM) can simultaneously hold only $m \ll N$ records. Order the file in such a way that $x_1 \le x_2 \le ... \le x_N$.

General strategy ("tape merge pattern")

Sort the file with pieces called runs (a run - sequence of elements of the file already sorted in a given order): initial runs are merged, creating a fewer number of longer runs . Act that way iteratively until you get one run as long as the entire file. Disk files used in merge sort are called "tapes" due to sequentiality in processing of them. Sorting takes place in phases. Each phase consists of distribution and merging.

Implementation

The program sorts the file using the natural merging method in the 2+1 scheme. This means that 3 tapes are used - two of them are used for distribution and one for merging. Tapes are realized as disk files. The record is a set of int numbers in the range $[-100, 100]$, which may contain from one to a maximum of 15 numbers and is ordered by the sum of the numbers from the set. Assuming that the int type has a size of 4 bytes, the maximum (and average) record size (R) is 60 bytes. To ensure that the read or written block contains only full records, it was assumed that that each record, regardless of its length, takes 60 bytes. The variable record length is still preserved in the logical layer of the program. The maximum block size (B) was assumed to be 480 bytes. The blocking factor b, expressing the average number of records that fit in a block, is 8 (according to the formula $b = \frac{B}{R}$).

The program includes the following classes:

  • Tape – a class representing a tape that is physically a disk file. It contains methods e.g. for adding and getting records.
  • Block – a class representing a disk block in which records are stored.
  • Record – a class representing a variable-length record that is a set of numbers.
  • DiskOperationsHandler – a class containing methods that support operations performed on disk files
  • FileSorter – a class containing methods for handling the sorting process (distribution, merging) and the program itself, e.g. displaying menus and statistics after sorting.

Format of a file with records (tape)

Each record is stored in a separate line. Due to the assumption of a constant record size of 60 bytes, records containing less than 15 numbers are padded in the file with None values.

Used Python libraries

The following python libraries were used in the project:

  • tabulate
  • math
  • matplotlib
  • os

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages