Envisioned Solution




We have been tasked with designing a product that preemptively categorizes data uploaded to Spectrum Protect into its correct storage tier, avoiding unnecessary overhead costs. We have created an end-to-end application that ingests file metadata and uses a machine learning framework to build a model that classifies data into its appropriate storage tier. The metadata is extracted from users' stored data and collected from the IBM Spectrum Protect server. Our application reads features extracted from this metadata (potentially petabytes at a time) and uses them as the training set for a model that can accurately assign incoming data to its correct storage tier. This addresses our sponsor's problem: instead of policies being manually configured to demote data from hot to cold storage, the model categorizes data automatically and preemptively, which avoids unnecessary overhead costs and saves backup administrators' work hours for IBM. As it stands, we have created a product that satisfies all of our client's requirements, and we hope it can be integrated into the next generation of Spectrum Protect products that serve all of their users.
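
The train-then-classify flow described above can be illustrated with a minimal sketch. It is not our actual model or framework: the feature names (days since last access, accesses per month), the sample values, the tier labels, and the nearest-centroid technique are all assumptions chosen only to show how metadata features could map to a storage tier.

```python
from collections import defaultdict
from math import dist

# Hypothetical training set (illustrative values, not real Spectrum Protect data).
# Each sample is ((days_since_last_access, accesses_per_month), tier_label).
TRAINING_SET = [
    ((1.0, 40.0), "hot"),
    ((3.0, 25.0), "hot"),
    ((200.0, 0.0), "cold"),
    ((365.0, 1.0), "cold"),
]


def train(samples):
    """Compute one centroid (mean feature vector) per storage tier."""
    grouped = defaultdict(list)
    for features, tier in samples:
        grouped[tier].append(features)
    return {
        tier: tuple(sum(column) / len(rows) for column in zip(*rows))
        for tier, rows in grouped.items()
    }


def classify(model, features):
    """Return the tier whose centroid is closest to the given features."""
    return min(model, key=lambda tier: dist(model[tier], features))


model = train(TRAINING_SET)
print(classify(model, (2.0, 30.0)))   # recently and frequently accessed file
print(classify(model, (300.0, 0.0)))  # file untouched for most of a year
```

In this sketch the features are used unscaled, so the feature with the largest numeric range dominates the distance. A real pipeline would likely normalize the features and use a more capable model from a machine learning framework.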