Literature Review of Related Works on NAI (Network-Application Integration)
Recently, my research interest has shifted toward NAI (Network-Application Integration), which we believe will become an important direction for networking systems in the next few years. NAI is not a completely new topic, but emerging application service architectures (e.g., CDN, MEC, live video streaming, cloud gaming) and networking architectures (e.g., programmable data planes, SD-WAN, Segment Routing, New IP) mean that previous works may no longer fit well.
To better understand the new challenges in this direction, we conduct this literature review. For each existing work, we try to answer the following questions:
- What is its design space?
- In its design, what information must be provided by applications?
- In its design, what information must be exposed by the network?
- What are its potential issues?
- If possible, how easily can it be extended to other use cases?
- In which cases does it not work well, and why?
Related Works
As a starting point, we collect related works published at CoNEXT, SOSP, NSDI, and SIGCOMM from 2007 to 2019.
The current literature review covers the papers in the following list; the list may grow in the future if we find more highly related works.
@inproceedings{Footprint,
author = {Hongqiang Harry Liu and Raajay Viswanathan and Matt Calder and Aditya Akella and Ratul Mahajan and Jitendra Padhye and Ming Zhang},
editor = {Katerina J. Argyraki and Rebecca Isaacs},
title = {Efficiently Delivering Online Services over Integrated Infrastructure},
booktitle = {13th {USENIX} Symposium on Networked Systems Design and Implementation, {NSDI} 2016, Santa Clara, CA, USA, March 16-18, 2016},
pages = {77--90},
publisher = {{USENIX} Association},
year = {2016},
url = {https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/liu},
series = {NSDI '16},
}
@inproceedings{E2E,
abstract = {Conventional wisdom states that to improve quality of experience (QoE), web service providers should reduce the median or other percentiles of server-side delays. This work shows that doing so can be inefficient due to user heterogeneity in how the delays impact QoE. From the perspective of QoE, the sensitivity of a request to delays can vary greatly even among identical requests arriving at the service, because they differ in the wide-area network latency experienced prior to arriving at the service. In other words, saving 50ms of server-side delay affects different users differently. This paper presents E2E, the first resource allocation system that embraces user heterogeneity to allocate server-side resources in a QoE-aware manner. Exploiting this heterogeneity faces a unique challenge: unlike other application-level properties of a web request (e.g., a user's subscription type), the QoE sensitivity of a request to server-side delays cannot be pre-determined, as it depends on the delays themselves, which are determined by the resource allocation decisions and the incoming requests. This circular dependence makes the problem computationally difficult. We make three contributions: (1) a case for exploiting user heterogeneity to improve QoE, based on end-to-end traces from Microsoft's cloud-scale production web framework, as well as a user study on Amazon MTurk; (2) a novel resource allocation policy that addresses the circular dependence mentioned above; and (3) an efficient system implementation with almost negligible overhead. We applied E2E to two open-source systems: replica selection in Cassandra and message scheduling in RabbitMQ. Using traces and our testbed deployments, we show that E2E can increase QoE (e.g., duration of user engagement) by 28{\%}, or serve 40{\%} more concurrent requests without any drop in QoE.},
address = {New York, NY, USA},
author = {Zhang, Xu and Sen, Siddhartha and Kurniawan, Daniar and Gunawi, Haryadi and Jiang, Junchen},
booktitle = {Proceedings of the 2019 Conference of the ACM Special Interest Group on Data Communication},
doi = {10.1145/3341302.3342089},
isbn = {9781450359566},
keywords = {NAI,Quality of Experience,Resource Allocation,Web Services},
month = {aug},
pages = {289--302},
publisher = {Association for Computing Machinery},
title = {{E2E: Embracing User Heterogeneity to Improve Quality of Experience on the Web}},
url = {https://dl.acm.org/doi/10.1145/3341302.3342089},
year = {2019},
series = {SIGCOMM '19},
}
@inproceedings{Taiji,
abstract = {We present Taiji, a new system for managing user traffic for large-scale Internet services that accomplishes two goals: 1) balancing the utilization of data centers and 2) minimizing network latency of user requests. Taiji models edge-to-datacenter traffic routing as an assignment problem-assigning traffic objects at the edge to the data centers to satisfy service-level objectives. Taiji uses a constraint optimization solver to generate an optimal routing table that specifies the fractions of traffic each edge node will distribute to different data centers. Taiji continuously adjusts the routing table to accommodate the dynamics of user traffic and failure events that reduce capacity. Taiji leverages connections among users to selectively route traffic of highly-connected users to the same data centers based on fractions in the routing table. This routing strategy, which we term connection-aware routing, allows us to reduce query load on our backend storage by 17{\%}. Taiji has been used in production at Facebook for more than four years and routes global traffic in a user-aware manner for several large-scale product services across dozens of edge nodes and data centers.},
address = {New York, NY, USA},
author = {Chou, David and Xu, Tianyin and Veeraraghavan, Kaushik and Newell, Andrew and Margulis, Sonia and Xiao, Lin and Mauri Ruiz, Pol and Meza, Justin and Ha, Kiryong and Padmanabha, Shruti and Cole, Kevin and Perelman, Dmitri},
booktitle = {Proceedings of the 27th ACM Symposium on Operating Systems Principles},
isbn = {9781450368735},
keywords = {NAI},
publisher = {ACM},
title = {{Taiji: Managing Global User Traffic for Large-Scale Internet Services at the Edge}},
url = {https://doi.org/10.1145/3341301.3359655},
year = {2019},
series = {SOSP '19}
}
@inproceedings{VDX,
author = {Mukerjee, Matthew K. and Bozkurt, Ilker Nadi and Ray, Devdeep and Maggs, Bruce M. and Seshan, Srinivasan and Zhang, Hui},
title = {{Redesigning CDN-Broker Interactions for Improved Content Delivery}},
year = {2017},
isbn = {9781450354226},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3143361.3143366},
doi = {10.1145/3143361.3143366},
abstract = {Various trends are reshaping Internet video delivery: exponential growth in video traffic, rising expectations of high video quality of experience (QoE), and the proliferation of varied content delivery network (CDN) deployments (e.g., cloud computing-based, content provider-owned datacenters, and ISP-owned CDNs). More fundamentally though, content providers are shifting delivery from a single CDN to multiple CDNs, through the use of a content broker. Brokers have been shown to invalidate many traditional delivery assumptions (e.g., shifting traffic invalidates short- and long-term traffic prediction) by not communicating their decisions with CDNs. In this work, we analyze these problems using data from a CDN and a broker. We examine the design space of potential solutions, finding that a marketplace design (inspired by advertising exchanges) potentially provides interesting tradeoffs. A marketplace allows all CDNs to profit on video delivery through fine-grained pricing and optimization, where CDNs learn risk-adverse bidding strategies to aid in traffic prediction. We implement a marketplace-based system (which we dub Video Delivery eXchange or VDX) in CDN and broker data-driven simulation, finding significant improvements in cost and data-path distance.},
booktitle = {Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies},
pages = {68--80},
numpages = {13},
keywords = {CDNs, content brokers, interfaces, content delivery},
location = {Incheon, Republic of Korea},
series = {CoNEXT '17}
}
@inproceedings{Akamai2018,
author = {Wohlfart, Florian and Chatzis, Nikolaos and Dabanoglu, Caglar and Carle, Georg and Willinger, Walter},
title = {{Leveraging Interconnections for Performance: The Serving Infrastructure of a Large CDN}},
year = {2018},
isbn = {9781450355674},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3230543.3230576},
doi = {10.1145/3230543.3230576},
abstract = {Today's large content providers (CP) are busy building out their service infrastructures or "peering edges" to satisfy the insatiable demand for content created by an ever-expanding Internet edge. One component of these serving infrastructures that features prominently in this build-out is their connectivity fabric; i.e., the set of all Internet interconnections that content has to traverse en route from the CP's various "deployments" or "serving sites" to end users. However, these connectivity fabrics have received little attention in the past and remain largely ill-understood. In this paper, we describe the results of an in-depth study of the connectivity fabric of Akamai. Our study reveals that Akamai's connectivity fabric consists of some 6,100 different "explicit" peerings (i.e., Akamai is one of the two involved peers) and about 28,500 different "implicit" peerings (i.e., Akamai is neither of the two peers). Our work contributes to a better understanding of real-world serving infrastructures by providing an original account of implicit peerings and demonstrating the performance benefits that Akamai can reap from leveraging its rich connectivity fabric for serving its customers' content to end users.},
booktitle = {Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication},
pages = {206--220},
numpages = {15},
keywords = {content providers, content delivery networks, peering},
location = {Budapest, Hungary},
series = {SIGCOMM '18}
}
@inproceedings{Timecard,
author = {Ravindranath, Lenin and Padhye, Jitendra and Mahajan, Ratul and Balakrishnan, Hari},
title = {{Timecard: Controlling User-Perceived Delays in Server-Based Mobile Applications}},
year = {2013},
isbn = {9781450323888},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2517349.2522717},
doi = {10.1145/2517349.2522717},
abstract = {Providing consistent response times to users of mobile applications is challenging because there are several variable delays between the start of a user's request and the completion of the response. These delays include location lookup, sensor data acquisition, radio wake-up, network transmissions, and processing on both the client and server. To allow applications to achieve consistent response times in the face of these variable delays, this paper presents the design, implementation, and evaluation of the Timecard system. Timecard provides two abstractions: the first returns the time elapsed since the user started the request, and the second returns an estimate of the time it would take to transmit the response from the server to the client and process the response at the client. With these abstractions, the server can adapt its processing time to control the end-to-end delay for the request. Implementing these abstractions requires Timecard to track delays across multiple asynchronous activities, handle time skew between client and server, and estimate network transfer times. Experiments with Timecard incorporated into two mobile applications show that the end-to-end delay is within 50 ms of the target delay of 1200 ms over 90% of the time.},
booktitle = {Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles},
pages = {85--100},
numpages = {16},
location = {Farminton, Pennsylvania},
series = {SOSP '13}
}
@inproceedings{PVM,
author = {Naous, Jad and Walfish, Michael and Nicolosi, Antonio and Mazi\`{e}res, David and Miller, Michael and Seehra, Arun},
title = {{Verifying and Enforcing Network Paths with Icing}},
year = {2011},
isbn = {9781450310413},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2079296.2079326},
doi = {10.1145/2079296.2079326},
abstract = {We describe a new networking primitive, called a Path Verification Mechanism (pvm). There has been much recent work about how senders and receivers express policies about the paths that their packets take. For instance, a company might want fine-grained control over which providers carry which traffic between its branch offices, or a receiver may want traffic sent to it to travel through an intrusion detection service.While the ability to express policies has been well-studied, the ability to enforce policies has not. The core challenge is: if we assume an adversarial, decentralized, and high-speed environment, then when a packet arrives at a node, how can the node be sure that the packet followed an approved path? Our solution, icing, incorporates an optimized cryptographic construction that is compact, and requires negligible configuration state and no PKI. We demonstrate icing's plausibility with a NetFPGA hardware implementation. At 93% more costly than an IP router on the same platform, its cost is significant but affordable. Indeed, our evaluation suggests that icing can scale to backbone speeds.},
booktitle = {Proceedings of the Seventh COnference on Emerging Networking EXperiments and Technologies},
articleno = {30},
numpages = {12},
keywords = {path enforcement, NetFPGA, default-off, consent},
location = {Tokyo, Japan},
series = {CoNEXT '11}
}
@inproceedings{CIPT,
author = {Stanojevic, Rade and Castro, Ignacio and Gorinsky, Sergey},
title = {{CIPT: Using Tuangou to Reduce IP Transit Costs}},
year = {2011},
isbn = {9781450310413},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/2079296.2079313},
doi = {10.1145/2079296.2079313},
abstract = {A majority of ISPs (Internet Service Providers) support connectivity to the entire Internet by transiting their traffic via other providers. Although the transit prices per Mbps decline steadily, the overall transit costs of these ISPs remain high or even increase, due to the traffic growth. The discontent of the ISPs with the high transit costs has yielded notable innovations such as peering, content distribution networks, multicast, and peer-to-peer localization. While the above solutions tackle the problem by reducing the transit traffic, this paper explores a novel approach that reduces the transit costs without altering the traffic. In the proposed CIPT (Cooperative IP Transit), multiple ISPs cooperate to jointly purchase IP (Internet Protocol) transit in bulk. The aggregate transit costs decrease due to the economies-of-scale effect of typical subadditive pricing as well as burstable billing: not all ISPs transit their peak traffic during the same period. To distribute the aggregate savings among the CIPT partners, we propose Shapley-value sharing of the CIPT transit costs. Using public data about IP traffic of 264 ISPs and transit prices, we quantitatively evaluate CIPT and show that significant savings can be achieved, both in relative and absolute terms. We also discuss the organizational embodiment, relationship with transit providers, traffic confidentiality, and other aspects of CIPT.},
booktitle = {Proceedings of the Seventh COnference on Emerging Networking EXperiments and Technologies},
articleno = {17},
numpages = {12},
keywords = {group buying, cost sharing, burstable billing, Shapley value, network economics},
location = {Tokyo, Japan},
series = {CoNEXT '11}
}
@inproceedings{Wiser,
author = {Mahajan, Ratul and Wetherall, David and Anderson, Thomas},
title = {{Mutually Controlled Routing with Independent ISPs}},
year = {2007},
publisher = {USENIX Association},
address = {USA},
url = {https://dl.acm.org/doi/abs/10.5555/1973430.1973456},
abstract = {We present Wiser, an Internet routing protocol that enables ISPs to jointly control routing in a way that produces efficient end-to-end paths even when they act in their own interests. Wiser is a simple extension of BGP, uses only existing peering contracts for monetary exchange, and can be incrementally deployed. Each ISP selects paths in a way that presents a compromise between its own considerations and those of other ISPs. Done over many routes, this allows each ISP to improve its situation by its own optimization criteria compared to the use of BGP today. We evaluate Wiser using a router-level prototype and simulation on measured ISP topologies. We find that, unlike Internet routing today, Wiser consistently finds routes that are close in efficiency to that of global optimization for metrics such as path length. We further show that the overhead of Wiser is similar to that of BGP in terms of routing messages and computation.},
booktitle = {Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation},
pages = {26},
numpages = {1},
location = {Cambridge, MA},
series = {NSDI'07}
}
Review Cards
Wiser
High-level Goal: design a practical interdomain routing protocol that enables ISPs to jointly control routing and compute good end-to-end paths while acting in their own interest.
| Symbol | Description |
|---|---|
| $i$ | An ISP |
| $P$ | All possible end-to-end paths |
| $e_i(p)$ | Expected external cost for ISP $i$ along path $p$ |
| $t_p$ | The rate of traffic carried along path $p$ |
| $c_i(p)$ | The internal cost of path $p$ for ISP $i$ |
Basic Idea: Each ISP estimates its own external cost and advertises to its neighbors the total cost to reach the destination; each router then selects the lowest-cost path, as sketched below.
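Using the notation above, and glossing over details of the paper such as how costs are normalized between neighboring ISPs, the per-destination selection rule can be sketched as

$$\mathrm{cost}_i(p) = c_i(p) + e_i(p), \qquad p^{*} = \arg\min_{p \in P} \mathrm{cost}_i(p),$$

where $\mathrm{cost}_i(p)$ is the total cost that ISP $i$ advertises upstream for path $p$.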
Potential Extension to NAI: ISPs that serve their own CDNs could compute their internal costs based on their CDN loads.
Limitation: Lowest-cost routing per destination may lead to congestion.
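To make the advertise-and-select loop above concrete, the following is a minimal Python sketch. It is only an illustration of the idea in this card, not Wiser's actual mechanism (which extends BGP, normalizes advertised costs between neighbors, and works within existing peering contracts); the topology, cost values, and names are made-up assumptions.

```python
# Minimal sketch of cost advertisement and lowest-cost path selection.
# Illustration only: the topology and internal costs below are invented.

# internal cost each ISP assigns to carrying traffic toward the destination
INTERNAL_COST = {"A": 2, "B": 1, "C": 3, "D": 1}

# neighbor relationships (which ISPs an ISP can forward to / receive advertisements from)
NEIGHBORS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

DEST = "D"  # the destination ISP


def compute_advertised_costs():
    """Iteratively compute the total cost each ISP advertises for reaching DEST.

    Each ISP picks the neighbor with the lowest advertised cost, adds its own
    internal cost, and advertises the sum upstream (a Bellman-Ford-style loop).
    """
    advertised = {isp: float("inf") for isp in INTERNAL_COST}
    advertised[DEST] = INTERNAL_COST[DEST]
    next_hop = {DEST: None}

    for _ in range(len(INTERNAL_COST)):  # enough rounds to converge
        for isp, nbrs in NEIGHBORS.items():
            if isp == DEST or not nbrs:
                continue
            best_nbr = min(nbrs, key=lambda n: advertised[n])
            candidate = INTERNAL_COST[isp] + advertised[best_nbr]
            if candidate < advertised[isp]:
                advertised[isp] = candidate
                next_hop[isp] = best_nbr
    return advertised, next_hop


if __name__ == "__main__":
    costs, hops = compute_advertised_costs()
    for isp in sorted(costs):
        print(f"{isp}: advertised cost {costs[isp]}, next hop {hops.get(isp)}")
```

Running such a per-destination computation also illustrates the limitation noted above: because every ISP independently converges on the single lowest-cost path, traffic toward a popular destination can concentrate on that path and congest it.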