||Applications such as Internet-TV and software distribution often require multicast for their delivery. As of today, IP multicast still experiences difficulty in commercial deployment. To address it, researchers have shifted their focus to "application-level multicast" (ALM), where the multicast-related functionalities are moved to end-hosts. ALM builds an overlay topology consisting of end-to-end unicast connections between end-hosts. In this thesis, we first describe the basic mechanism of several representative ALM protocols, namely, Narada, Delaunay Triangulation, NICE and Scribe. Using Internet-like topologies, we then compare their performance in terms of network stress and stretch. Delaunay Triangulation protocol is scalable in terms of group size. However, it suffers from several weaknesses, including inaccurate host location estimation, high network resource consumption, unlimited fanout of a host and single point of failure. To address these weaknesses, we propose to use Global Network Positioning for host location estimation, aggregate long delay link to reduce their usages and limit the fanout of a host, and maintain connectivity by means of control messaging among them. We show that our improved DT, as compared to the original DT protocol, can substantially reduce network stress, stretch and network resource usages while meeting the processing capability of the hosts in the network. We also propose a distributed ALM protocol for applications such as stock quotes, where users may hop from one multicast group to another frequently, even through the total pool of end-hosts in the system may be more static. Our protocol efficiently maintains multiple multicast trees for these dynamic subsets of end-hosts, and hence is called subset-ALM (SALM). SALM first builds a mesh based on the improved Delaunay Triangulation (DT) consisting of all end-hosts for control messaging. The mesh supports continuous addition and deletion of end-hosts in the system, and is used to efficiently guide the construction of dynamic overlay trees. Our constructed trees not only have low maintenance cost, but also provide QoS by delegating paths and limiting end-host fanouts. We study two ways of tree construction depending on whether or not the tree branches are embedded in the mesh edges (embedded versus bypass trees). We show that a bypass tree in general performs better, and SALM achieves low costs in terms of network stress and stretch, even for large multicast groups.