This report analyzes storage usage using two datasets, which contain the cost factors and storage volumes on which the analysis is based. The key metric is the Cost Factor, from which the money paid to the outsourcer can be calculated. Costs are relative and are baselined at 1, the value assigned to the service “Storage, DAS, Onsite, Copper”. Every other cost is expressed as an index relative to this baseline.
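To make the index concrete, the sketch below turns a reported quantity into a spend figure. The price per GB for the baseline service (`BASELINE_PRICE_PER_GB`) and the function name `estimated_cost` are assumptions made for illustration; the datasets contain only relative cost factors, not actual prices. The factor 6.44 is the highest Cost Factor in the dataset.

```python
# Hypothetical price per GB for the baseline service
# "Storage, DAS, Onsite, Copper" (cost factor 1). Not part of the datasets.
BASELINE_PRICE_PER_GB = 0.10


def estimated_cost(quantity_gb: float, cost_factor: float,
                   baseline_price: float = BASELINE_PRICE_PER_GB) -> float:
    """Return the estimated spend for a quantity billed at a relative cost factor."""
    return quantity_gb * cost_factor * baseline_price


# 500 GB on the baseline service versus 500 GB at the highest factor (6.44).
baseline_spend = estimated_cost(500, 1.0)
premium_spend = estimated_cost(500, 6.44)
print(round(baseline_spend, 2), round(premium_spend, 2))
```

Because the cost factor is a pure ratio, the ratio of the two spend figures equals the factor itself whatever baseline price is assumed.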

About the data:

Cost Factors Dataset:

This dataset contains three variables with 22 observations.

  • Storage.Service
  • Unit
  • Cost.Factor

The Unit variable has three categories: GB Used, GB Installed and GB Allocated.

Storage Volumes Dataset:

This dataset contains six variables with 12926 observations.

  • Report.Site
  • Import.12NC
  • Report.Service
  • Report.Qty
  • Report.Unit
  • Date

Summaries of both datasets are provided below. They show the characteristics of every variable in the datasets.

Summary of “Cost Factors” dataset

##                      Storage.Service           Unit     Cost.Factor   
##  Storage, Archive, Off Site  : 1     GB Allocated: 1   Min.   :1.000  
##  Storage, DAS                : 1     GB Installed:11   1st Qu.:1.780  
##  Storage, DAS, Onsite, Bronze: 1     GB Used     :10   Median :2.080  
##  Storage, DAS, Onsite, Copper: 1                       Mean   :2.839  
##  Storage, DAS, Onsite, Silver: 1                       3rd Qu.:4.000  
##  Storage, DAS, Remote, Bronze: 1                       Max.   :6.440  
##  (Other)                     :16

Summary of “Storage Volumes” dataset

##   Report.Site            Import.12NC  
##  NL-066 :2127   432-227-851-080:3323  
##  GB-074 :1936   432-227-851-087:2945  
##  NL-004 :1311   432-227-851-075:1652  
##  SG-017 : 967   432-227-851-224:1625  
##  US-549 : 952   432-227-851-076:1099  
##  US-300 : 710   432-227-851-086: 808  
##  (Other):4923   (Other)        :1474  
##                            Report.Service   Report.Qty    
##  Storage, NAS, Remote, Silver     :3323   Min.   : -7156  
##  Storage, Utility, Central, Bronze:2945   1st Qu.:     0  
##  Storage, DAS, Remote, Silver     :1652   Median :    90  
##  Storage, SAN/NAS, Low Cost       :1625   Mean   :  1565  
##  Storage, DAS, Remote, Bronze     :1099   3rd Qu.:   500  
##  Storage, Utility, Central, Gold  : 808   Max.   :743243  
##  (Other)                          :1474                   
##        Report.Unit        Date     
##  GB Allocated:   8   Aug-12 :1758  
##  GB installed:7163   Jul-12 :1731  
##  GB used     :3946   May-12 :1673  
##  GB Used     :1809   Apr-12 :1648  
##                      Mar-12 :1567  
##                      Feb-12 :1535  
##                      (Other):3014
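The Report.Unit column in the summary above lists “GB used” and “GB Used” as separate levels, and its “GB installed” is capitalized differently from the “GB Installed” of the Cost Factors dataset. The sketch below shows one way to reconcile the labels before grouping or joining. It assumes that a case-insensitive match is what is wanted; the helper name `normalize_unit` is hypothetical. The counts are copied from the summary.

```python
from collections import Counter

# Level counts for Report.Unit, copied from the summary above.
unit_counts = {
    "GB Allocated": 8,
    "GB installed": 7163,
    "GB used": 3946,
    "GB Used": 1809,
}


def normalize_unit(label: str) -> str:
    """Map a unit label to a canonical form, e.g. 'GB used' -> 'GB Used'."""
    prefix, _, rest = label.strip().partition(" ")
    return f"{prefix.upper()} {rest.strip().capitalize()}"


# Merge the counts of labels that differ only in capitalization.
merged = Counter()
for label, count in unit_counts.items():
    merged[normalize_unit(label)] += count

print(dict(merged))
```

After merging, three categories remain, matching the three unit categories of the Cost Factors dataset.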

Exploratory analysis of Cost Factors Dataset

There are 22 observations in the Cost Factors dataset. The most expensive storage service is “Storage, Utility, Central, Gold”. The top five services are shown here with their cost relative to the “Storage, DAS, Onsite, Copper” plan, which has an index of 1. The most expensive plan is more than six times as costly as the baseline plan (Storage, DAS, Onsite, Copper).
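A ranking like this one can be sketched as follows. Only two entries come from the report (the baseline at 1.00 and the maximum of 6.44 for the most expensive service); the remaining names and values, as well as the function name `top_services`, are hypothetical placeholders.

```python
# Two entries come from the report (baseline 1.00 and maximum 6.44);
# "Service A" to "Service E" are hypothetical placeholders.
cost_factors = {
    "Storage, DAS, Onsite, Copper": 1.00,
    "Storage, Utility, Central, Gold": 6.44,
    "Service A": 4.00,
    "Service B": 2.08,
    "Service C": 1.78,
    "Service D": 5.10,
    "Service E": 3.20,
}

BASELINE = "Storage, DAS, Onsite, Copper"


def top_services(factors, n=5):
    """Return the n most expensive services with their cost relative to the baseline."""
    base = factors[BASELINE]
    ranked = sorted(factors.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / base) for name, value in ranked[:n]]


top_five = top_services(cost_factors)
for name, ratio in top_five:
    print(f"{name}: {ratio:.2f}x baseline")
```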

The Cost Factors dataset is further analyzed by the variable “Unit”. It is evident that “GB Used” units have the highest cost, followed by “GB Installed” units. “GB Allocated” has only one observation in the dataset, which is “Storage, DAS”.


Boxplots are another useful way to show the distribution of a variable, because they make the median and quartiles of the Cost Factor easy to read. In the following plot, the median (the thick horizontal line) of Cost Factor is just above 2, which means that about 50% of the services have a cost factor of roughly 2 or less. The third quartile is 4, which indicates that 75% of the services are at or below this value. A service with a Cost Factor above 4 is therefore highly priced relative to the rest of the dataset.
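The quantities a boxplot draws can be computed directly. The sketch below is a minimal illustration using Python's `statistics.quantiles`; the sample values are hypothetical and are not the real cost factors, and the function name `five_number_summary` is an assumption.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, the three quartiles and the maximum of the values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


# Hypothetical cost factors, for illustration only (not the real dataset).
sample = [1.0, 1.5, 2.0, 2.5, 4.0, 6.44]
summary = five_number_summary(sample)

# Share of services at or below the median; about one half by construction.
share_below_median = sum(v <= summary["median"] for v in sample) / len(sample)
print(summary, share_below_median)
```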

The second plot shows boxplots for each unit. Here, we can see that the median value for “GB Allocated” and “GB Installed” is close to 2.00, whereas the median value for “GB Used” services rises to more than 4. Services billed in “GB Used” therefore cost the company more; they appear to be about twice as costly as services of the other unit types.
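The per-unit comparison amounts to grouping the cost factors by unit and taking the median of each group. A minimal sketch follows; the rows are hypothetical values chosen only to mirror the pattern described, and `median_by_unit` is an assumed helper name.

```python
import statistics
from collections import defaultdict

# Hypothetical (unit, cost factor) rows, for illustration only.
rows = [
    ("GB Installed", 1.0),
    ("GB Installed", 2.0),
    ("GB Installed", 3.0),
    ("GB Used", 3.5),
    ("GB Used", 4.5),
    ("GB Allocated", 2.0),
]


def median_by_unit(records):
    """Group cost factors by unit and return the median of each group."""
    groups = defaultdict(list)
    for unit, factor in records:
        groups[unit].append(factor)
    return {unit: statistics.median(values) for unit, values in groups.items()}


medians = median_by_unit(rows)
ratio = medians["GB Used"] / medians["GB Installed"]
print(medians, ratio)
```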

Exploratory Analysis of Storage Volumes Dataset

Service usage pattern (most subscribed services)

The following figure shows how often each storage service is used, measured by the number of times the service appears in the dataset. It is evident from the figure that “Storage, NAS, Remote, Silver” appears most often (3323 observations), and that it belongs to the “GB installed” category.

Note: Only the top 12 plans are shown here, as the rest have negligible counts.
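Counting how often each service appears and keeping the most frequent ones can be sketched with `collections.Counter`. The list of rows below is hypothetical and far shorter than the real dataset; `TOP_N` reflects the 12 plans shown in the figure.

```python
from collections import Counter

# Hypothetical Report.Service values, one per row of the dataset.
services = [
    "Storage, NAS, Remote, Silver",
    "Storage, Utility, Central, Bronze",
    "Storage, NAS, Remote, Silver",
    "Storage, DAS, Remote, Silver",
    "Storage, NAS, Remote, Silver",
    "Storage, Utility, Central, Bronze",
]

TOP_N = 12  # the figure shows the 12 most frequent plans

counts = Counter(services)
# most_common returns at most TOP_N (service, count) pairs, highest count first.
most_frequent = counts.most_common(TOP_N)
print(most_frequent)
```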

Data usage pattern by Service (Most used services)

Data usage was then totalled for each service. Here, it is clear that “Storage, SAN/NAS, Low Cost” is the most used service. It is of unit type “GB used”.

Comparing this with the previous graph and with the cost factor analysis, we find that the services with the highest data usage are the ones with comparatively low cost. Another notable observation is that the unit type “GB Allocated” has only one service but, in terms of data usage, it is second only to the top service (“Storage, SAN/NAS, Low Cost”).
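A ranking by data usage, as opposed to row count, sums the reported quantity per service. The sketch below assumes that usage is the sum of Report.Qty, which the report does not state explicitly; the rows are hypothetical and `total_quantity_by_service` is an assumed helper name.

```python
from collections import defaultdict

# Hypothetical (Report.Service, Report.Qty) rows, for illustration only.
rows = [
    ("Storage, SAN/NAS, Low Cost", 500.0),
    ("Storage, SAN/NAS, Low Cost", 250.0),
    ("Storage, DAS", 400.0),
    ("Storage, NAS, Remote, Silver", 0.0),
]


def total_quantity_by_service(records):
    """Sum the quantity per service and sort from highest to lowest total."""
    totals = defaultdict(float)
    for service, qty in records:
        totals[service] += qty
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = total_quantity_by_service(rows)
print(ranking)
```

A service can rank high by row count and low by total quantity, which is why the two figures are worth reading side by side.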

Four services report a total data usage of zero:

  • Storage, NAS, Remote, Copper
  • Storage, NAS, Remote, Silver
  • Storage, SAN, Remote, Bronze
  • Storage, SAN, Remote, Gold

All of these belong to the “GB installed” category.
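Services with zero total usage can be found by summing the reported quantity per service and keeping the totals that equal zero. This is a minimal sketch under the assumption that usage is the sum of Report.Qty; the rows are hypothetical and `zero_usage_services` is an assumed helper name.

```python
from collections import defaultdict

# Hypothetical (Report.Service, Report.Qty) rows, for illustration only.
rows = [
    ("Storage, NAS, Remote, Copper", 0.0),
    ("Storage, NAS, Remote, Copper", 0.0),
    ("Storage, SAN, Remote, Gold", 0.0),
    ("Storage, DAS, Remote, Silver", 120.0),
]


def zero_usage_services(records):
    """Return the sorted names of services whose summed quantity is zero."""
    totals = defaultdict(float)
    for service, qty in records:
        totals[service] += qty
    return sorted(name for name, total in totals.items() if total == 0)


zero_services = zero_usage_services(rows)
print(zero_services)
```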