Given a 10 GB file of integers and RAM that can hold only 4 GB of values at a time, how would you sort the integers in the file?

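Divide the file into 3 parts (3.5 GB + 3.5 GB + 3 GB; each part must be smaller than 4 GB) and sort each chunk with any O(n log n) sorting algorithm. Since each chunk is smaller than 4 GB it fits in memory, so any in-place sorting algorithm works. Write each sorted chunk back to disk. Once every chunk is sorted, bring the first 1 GB of each chunk into memory, so they occupy 3 GB (1+1+1), and combine this 3 GB of data with a 3-way merge (the merge step of merge sort), using the remaining 1 GB as an output buffer.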
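The chunk-sorting phase can be sketched roughly as follows. This is only an illustration under assumptions that are not in the original question: the file is assumed to be text with one integer per line and no blank lines, and the names make_sorted_runs, max_values and run_dir are hypothetical. Here max_values stands in for "as many integers as fit in RAM".

```python
import os
from itertools import islice


def make_sorted_runs(input_path, max_values, run_dir):
    """Split the input into sorted chunk files and return their paths.

    Assumes one integer per line (an assumption, not part of the question).
    """
    run_paths = []
    with open(input_path) as src:
        while True:
            # Read at most max_values integers: one chunk that fits in memory.
            chunk = [int(line) for line in islice(src, max_values)]
            if not chunk:
                break
            chunk.sort()  # in-place O(n log n) sort of the in-memory chunk
            run_path = os.path.join(run_dir, f"run_{len(run_paths)}.txt")
            with open(run_path, "w") as out:
                out.writelines(f"{value}\n" for value in chunk)
            run_paths.append(run_path)
    return run_paths
```

Each returned file is one sorted chunk on disk, ready to be merged with the others.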

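While executing the merge, repeatedly select the minimum element among the three in-memory parts and append it to the remaining 1 GB, which serves as the output buffer. Each time a number is selected, bring in the first number from the not-yet-loaded remainder of the corresponding chunk to replace it. Whenever the output buffer fills up, write the sorted 1 GB to secondary storage, then continue until all chunks are exhausted.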
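A minimal sketch of this merge step, using Python's heapq module as the min-heap. It simplifies the description above: each sorted chunk is modelled as a plain iterable (standing in for buffered reads from disk), the heap holds only the current head of each chunk, and write_block is a hypothetical callback that represents writing a full output buffer to secondary storage. The names merge_runs and out_buffer_size are likewise made up for illustration.

```python
import heapq


def merge_runs(runs, out_buffer_size, write_block):
    """K-way merge of sorted iterables, flushing output in fixed-size blocks."""
    iterators = [iter(run) for run in runs]

    # Seed the heap with the first element of every non-empty chunk.
    heap = []
    for index, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, index))
    heapq.heapify(heap)

    out_buffer = []
    while heap:
        # Select the minimum element and add it to the output buffer.
        value, index = heapq.heappop(heap)
        out_buffer.append(value)
        if len(out_buffer) == out_buffer_size:
            write_block(out_buffer)  # buffer is full: write it out
            out_buffer = []
        # Replace the selected element with the next one from the same chunk.
        replacement = next(iterators[index], None)
        if replacement is not None:
            heapq.heappush(heap, (replacement, index))

    if out_buffer:
        write_block(out_buffer)  # flush whatever is left at the end
```

For example, calling merge_runs([[1, 4, 7], [2, 5], [3, 6, 8, 9]], 4, blocks.append) with blocks = [] leaves blocks equal to [[1, 2, 3, 4], [5, 6, 7, 8], [9]].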

Example:

Let's say we divide the file into three chunks A, B, C, and after sorting them individually the contents of these parts are: A{12,18,20,25,33,35,37}, B{8,13,14,40,41,45,47}, C{2,15,50,70,72,75,78}.
Now suppose that in memory we have {12,18,20,25}, {8,13,14,40}, {2,15,50,70} from A, B, C respectively. We first select 2 from C's part, since it is the minimum, put it in the remaining 1 GB, and replace 2 in memory with the next element of chunk C, i.e. insert 72 into C's part in memory. Next we select 8 and replace it with 41, and so on. To find the minimum efficiently at each step, we maintain a min-heap.
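The selection order in this example can be checked with a short script. It uses the standard library's heapq.merge, which performs a heap-based merge of already-sorted inputs; tagging each value with its chunk name is just an illustrative device to show where each minimum came from, and everything here is held in memory rather than streamed from disk.

```python
import heapq

chunks = {
    "A": [12, 18, 20, 25, 33, 35, 37],
    "B": [8, 13, 14, 40, 41, 45, 47],
    "C": [2, 15, 50, 70, 72, 75, 78],
}

# Pair every value with the name of the chunk it belongs to.
tagged = [[(value, name) for value in values] for name, values in chunks.items()]

# heapq.merge lazily yields the smallest remaining head among the sorted inputs.
merged = list(heapq.merge(*tagged))

print(merged[:5])
# [(2, 'C'), (8, 'B'), (12, 'A'), (13, 'B'), (14, 'B')]
```

The first two picks are 2 from C and 8 from B, matching the walkthrough above.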

source:careercup.com
