## Data Center Schedule Backup Optimization Formulation

In Figure 1, we show multiple data centers connected to a network. In this scenario, data center $$DC_s$$ needs to perform a bulk transfer of data to one or more remote data centers, $$DC_r$$, within prescribed time limits. In particular, there are $$Q_s$$ bytes of data to be transferred between $$DC_s$$ and the set of possible remote data centers $${\bf DC}_r(s)$$, and the data transfer must start by time $$Start_s$$ and end before time $$End_s$$.
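As a sketch of the problem data, one transfer request could be represented by a small container whose fields mirror the symbols above ($$Q_s$$, $$Start_s$$, $$End_s$$, $${\bf DC}_r(s)$$). The class name and all values here are illustrative, not part of the formulation:

```python
from dataclasses import dataclass

# Hypothetical container for one bulk-transfer request; field names
# mirror the symbols in the text (Q_s, Start_s, End_s, DC_r(s)).
@dataclass
class TransferRequest:
    source: str          # DC_s, the source data center
    remotes: list[str]   # candidate remote data centers DC_r(s)
    q_bytes: float       # Q_s, total bytes to move
    start: float         # Start_s, earliest start time (hours)
    end: float           # End_s, deadline (hours)

# Illustrative request: DC0 must move 48 units within a 24-hour window.
req = TransferRequest("DC0", ["DC3", "DC4"], q_bytes=48.0, start=0.0, end=24.0)
print(req.source, len(req.remotes))
```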

Let $$rate_t^{sr}$$ denote the rate at which $$DC_s$$ transfers data to $$DC_r$$ during time period $$t = 0,1,\ldots,T$$. Here we divide the interval over which scheduling is to occur, e.g., 24 hours, into a fixed number of equally spaced periods, each of length $$\tau$$. The requirement that all the data must be transferred can be expressed as $\begin{equation}\sum\limits_{r \in {\bf DC}_r(s)} \tau \sum\limits_t rate_t^{sr} = Q_s \quad \forall s \end{equation}$
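A minimal numerical sketch of this completeness constraint, assuming a single source, two remotes, and illustrative values for $$\tau$$, $$T$$, and $$Q_s$$ (none of these numbers come from the text):

```python
import numpy as np

# Illustrative parameters: a 24-hour horizon split into T = 8 periods
# of length tau = 3 hours, with two candidate remote data centers.
tau = 3.0      # period length (hours)
T = 8          # number of periods
Q_s = 48.0     # data to transfer from DC_s (in convenient units)

# rate[r, t] = transfer rate from DC_s to remote r during period t.
# Here we pick a flat schedule that exactly meets the requirement.
n_remotes = 2
rate = np.full((n_remotes, T), Q_s / (n_remotes * T * tau))

# The constraint: tau * (sum over remotes and periods of rate) == Q_s.
total_transferred = tau * rate.sum()
print(total_transferred)  # equals Q_s = 48.0
```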

Suppose that each remote data center imposes a storage cost of $$storageCost_r$$ regardless of when the transfer starts or stops; then a simple linear objective function could be: $\begin{equation}\alpha \sum\limits_r storageCost_r \, \tau \sum\limits_{s,t} rate_t^{sr} + \sum\limits_{s,r,t} networkCost(rate_t^{sr},t) \end{equation}$ where the parameter $$\alpha \ge 0$$ can be used to emphasize or de-emphasize storage costs relative to networking costs. A more realistic objective function could include a sum of (non-linear) monotonically decreasing functions of the transfer rates to encourage faster transfers, as suggested in .
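With the linear objective above, the problem is an ordinary LP. A minimal sketch using `scipy.optimize.linprog` on a tiny illustrative instance (one source, two remotes, four periods; `networkCost` simplified to a fixed per-unit price per remote and period, and a uniform per-period rate cap standing in for link capacity — all assumptions, not the paper's formulation):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative instance: one source DC, R = 2 remotes, T = 4 periods.
tau, T, R = 1.0, 4, 2
Q_s = 10.0
alpha = 1.0
storage_cost = np.array([2.0, 1.0])          # storageCost_r per unit at each remote
net_cost = np.array([[1.0, 3.0, 1.0, 3.0],   # per-unit network price to remote 0, by period
                     [2.0, 2.0, 2.0, 2.0]])  # per-unit network price to remote 1
cap = 2.0                                    # per-period rate cap (link capacity proxy)

# Decision vector x = rate[r, t], flattened row-major (length R*T).
# Objective: alpha * storage + network, both linear in the rates.
c = (alpha * storage_cost[:, None] * tau + net_cost * tau).ravel()

# Completeness constraint: tau * sum of all rates == Q_s.
A_eq = np.full((1, R * T), tau)
b_eq = [Q_s]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, cap)] * (R * T))
rates = res.x.reshape(R, T)
print(res.status, round(res.fun, 3))  # 0 30.0: fills only the cheapest slots
```

The solver routes all traffic through the periods and remotes whose combined storage-plus-network price is lowest, which is exactly the behavior the $$\alpha$$ knob trades off.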

To fully specify this network/compute optimization problem we need to include the network costs and constraints from one of our two different multi-commodity flow formulations.

The resulting problem formulation had 672 variables and 1251 constraints. Typical solve times across different $$\alpha$$ values and different link capacities were around 0.043 seconds, with data loading times around 0.15 seconds. In our first demonstration optimization we reduced the link capacities to 25 units and set $$\alpha = 100$$. With these parameters, data center costs were emphasized over network costs, though with the reduced capacities more "backup time periods" were required to meet the requirements. With the problem info dialog open, go to the path table to see the solution paths and the time periods in which they are used. Sorting first on "time" and then on "source" in the path table gives a convenient ordering of the paths.

In our second demonstration optimization we reduced the link capacities to 15 units and set $$\alpha = 1$$ to emphasize network costs over data center costs. The path table shows the solution paths under these parameters. Notice how the lower-cost (given) paths are used as much as possible. If you sort by "time" in the path table and look at time period 4, you will see DC0 communicating with DC3 over two different paths, with the bulk of the traffic on the lower-cost path. In addition, during the same time period DC1 communicates with both DC4 and DC6 over the lower cost of the input paths.