Crossbars are frequently used as the switching fabric of high-performance packet switches (IP routers, ATM switches, Ethernet switches). The performance, functionality, and scalability (in terms of line rate and/or number of ports) of these switches depend directly on the arbitration/scheduling algorithm, which must retrieve the state information of the input queues, compute a (pseudo-)optimal matching, and configure the crossbar accordingly, all within one packet cycle. As a result, designing schedulers for high line rates and a large number of links is a very challenging task. In particular, the scheduler should achieve high performance under any traffic pattern, and it should lend itself well to a hardware implementation. Hence, an integrated method that carefully considers both the algorithmic design and the circuit design of the scheduler should be adopted. In this thesis, we undertook this task by proposing, evaluating, and designing a crossbar scheduler that is efficient under any traffic pattern and is designed to operate at high line rates with a large number of ports. Specifically, we have proposed a scheduling algorithm that outperforms state-of-the-art related algorithms in this area. We then architected its design through a novel pipelining technique that yields a substantial performance improvement without requiring any additional hardware. Finally, we have implemented a 256-port scheduler using circuit techniques such as parallelizing most of the logic; dynamic logic and pass-transistor logic are used so that it can operate at an OC-192 (10 Gb/s) rate while minimizing the layout area. We have carefully evaluated our scheduling algorithm and its design. Through extensive simulations, we have demonstrated that our algorithm performs considerably better than existing algorithms in terms of average delay and throughput under a variety of traffic scenarios.
We have designed and laid out one pipeline stage of the scheduler in TSMC 0.25 μm CMOS technology. HSPICE simulation shows that the scheduler pipeline stage can run within a 12 ns clock period. The scheduler can therefore run three to four iterations per packet cycle, which satisfies the requirement of the OC-192 line rate. We believe that, with the novel pipelining scheme, four iterations already suffice for the scheduler to achieve its best matching.
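The relation between the 12 ns clock period and the three-to-four-iteration figure can be illustrated with a short calculation. The sketch below is only a back-of-the-envelope check under assumptions that the abstract does not state: the cell sizes (53 and 64 bytes) are hypothetical choices, the line rate is rounded to 10 Gb/s, and one scheduling iteration is assumed to take exactly one clock period. The function names are illustrative only.

```python
import math

CLOCK_PERIOD_NS = 12.0  # pipeline-stage clock period quoted from the HSPICE result


def cell_time_ns(cell_bytes, line_rate_gbps=10.0):
    """Time to transmit one fixed-size cell, in ns (bits / (Gb/s) gives ns)."""
    return cell_bytes * 8 / line_rate_gbps


def iterations_per_cell(cell_bytes, line_rate_gbps=10.0, clock_ns=CLOCK_PERIOD_NS):
    """Whole scheduling iterations that fit in one cell time.

    Assumption: one iteration occupies exactly one clock period.
    """
    return math.floor(cell_time_ns(cell_bytes, line_rate_gbps) / clock_ns)


if __name__ == "__main__":
    # Cell sizes are assumed for illustration; they are not taken from the thesis.
    for size in (53, 64):
        print(size, cell_time_ns(size), iterations_per_cell(size))
```

Under these assumptions, a 53-byte cell leaves room for three iterations and a 64-byte cell for four, which is consistent with the range stated above.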