Ostinato and Intel® DPDK 10G Data Rates Report on Intel® CPU

Ostinato performance report for a 10G link: comparison between libpcap and DPDK performed on an Intel hardware platform.

Published in: Technology


  1. OSTINATO AND INTEL® DPDK 10G DATA RATES REPORT ON INTEL CPU
     plvision.eu
  2. WHAT IS OSTINATO?
     Ostinato is a software traffic generator and analyzer based on libpcap. Main features:
     • Open-source and cross-platform
     • 'Transmit', 'capture' and 'statistics' functions are implemented using a threading model
     • Shows real-time port receive/transmit rates
  3. WHAT IS INTEL® DPDK?
     Intel® DPDK (Data Plane Development Kit) is a user-space software library for high-speed packet processing. It relies on:
     • Polling instead of interrupt handling
     • Lock-free structures to avoid kernel synchronization
     • DMA transfer of packets and packet descriptors between hardware and user-space memory, to avoid kernel-to-user-space copying
     • Huge pages to avoid TLB misses
     • SSE for copying data efficiently
     • CPU affinity to minimize the frequency of preemption
     • NUMA awareness to speed up memory access
     • Transmitting/receiving packets in batch mode to use the PCI bus efficiently
     • Memory pre-allocation for fixed-size packets
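
The polling and batch-mode ideas listed on this slide can be illustrated with a toy model. The sketch below is not DPDK code and does not use any DPDK API: the `deque` stands in for a receive ring, and the names `rx_burst` and `poll_until_drained` are hypothetical. The burst size of 32 is the figure quoted in the Test 1 conclusion.

```python
from collections import deque

BURST_SIZE = 32  # burst size quoted in the report; everything else here is illustrative


def rx_burst(ring, max_pkts=BURST_SIZE):
    """Take up to max_pkts packets from the ring without ever blocking."""
    burst = []
    while ring and len(burst) < max_pkts:
        burst.append(ring.popleft())
    return burst


def poll_until_drained(ring):
    """Poll the ring in bursts until it is empty.

    Returns (packets_handled, poll_calls). A real poll-mode loop would keep
    spinning on an empty ring instead of returning.
    """
    handled = 0
    polls = 0
    while ring:
        burst = rx_burst(ring)
        polls += 1
        handled += len(burst)
    return handled, polls


if __name__ == "__main__":
    fake_ring = deque(range(100))  # 100 placeholder packets
    print(poll_until_drained(fake_ring))
```

Handling packets one burst at a time means the per-call overhead is paid once per batch rather than once per packet, which is the effect the slide attributes to batch mode.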
  4. OSTINATO AND INTEL® DPDK
     • Ostinato source code contains a proof-of-concept branch that supports Intel® DPDK
     • PLVision's engineers have enabled Intel® DPDK support in Ostinato, with all its functionality preserved*
     * PLVision is working on making it globally available in the Ostinato repository
  5. TESTBED
     • DUT: 2 ports are connected with a loopback; send, capture, and gather statistics
     • CPU: Intel i5-4440 @ 3.10 GHz, 4 cores, 8 GB RAM
     • NIC: Intel X540-T2, two 10G ports, attached using a PCI Express 2.1 x8 bus
     • Host: Ubuntu 12.04.4 Desktop, kernel 3.11.0
  6. HOW INTEL® DPDK IS USED: BINDING THREADS TO CPU CORES
     • Main core (0): gathers statistics, manages UI
     • First core (1): TX / RX / Capture
     • Second core (2): TX / RX / Capture
     • Third core (3): idle
     • Ports: 10GE Port 1 and 10GE Port 2, handled by the TX / RX / Capture cores
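
As a rough illustration of binding work to a core, the sketch below pins the caller to a single core with Python's `os.sched_setaffinity`, which is available on Linux only. It is not how Ostinato or DPDK does it (DPDK manages its own per-core threads); the `CORE_ROLES` table and the `pin_to_core` helper are hypothetical, and the port assigned to each worker core is an assumption, since the slide does not state it.

```python
import os

# Core roles as listed on the slide. Which port each worker core serves
# is an assumption made for this example.
CORE_ROLES = {
    0: "main: statistics and UI",
    1: "worker: TX / RX / capture (assumed: port 1)",
    2: "worker: TX / RX / capture (assumed: port 2)",
    3: "idle",
}


def pin_to_core(core):
    """Pin the caller to `core`; return True on success, False if not possible."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without this call (for example, not Linux)
    if core not in os.sched_getaffinity(0):
        return False  # core does not exist or is not allowed for this process
    os.sched_setaffinity(0, {core})  # pid 0 means "the caller"
    return True


if __name__ == "__main__":
    for core, role in CORE_ROLES.items():
        print(f"core {core}: {role}")
```

Pinning keeps a worker from being migrated or preempted by the scheduler's load balancing, which is the benefit the report associates with CPU affinity.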
  7. DATA RATE CALCULATION: 64 B PACKETS ON 10G LINK
     • Packet body size: 64 bytes
     • Layer 1 frame size: packet body 64 bytes + preamble 8 bytes + inter-packet gap 12 bytes = 84 bytes = 672 bits
     • Packet rate on 10G link: 10 billion bit/s / 672 bits = 14880950 pkts/s
     • Maximum data rate: 14880950 * (64 * 8) / 1 billion = 7.61 Gbit/s
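
The same arithmetic can be repeated for the other packet sizes used in the tests. The sketch below is a minimal version of the slide's calculation; the function name `line_rate` and the constant names are chosen for this example only.

```python
PREAMBLE_BYTES = 8           # preamble, as on the slide
INTER_PACKET_GAP_BYTES = 12  # inter-packet gap, as on the slide
LINK_BPS = 10_000_000_000    # 10G link


def line_rate(packet_bytes, link_bps=LINK_BPS):
    """Return (packets per second, data rate in Gbit/s) at full line rate."""
    wire_bits = (packet_bytes + PREAMBLE_BYTES + INTER_PACKET_GAP_BYTES) * 8
    pps = link_bps / wire_bits
    data_gbps = pps * packet_bytes * 8 / 1e9
    return pps, data_gbps


if __name__ == "__main__":
    for size in (64, 128, 256, 512):
        pps, gbps = line_rate(size)
        print(f"{size:>4} B: {pps:,.0f} pkts/s, {gbps:.3f} Gbit/s")
```

The results agree with the theoretical values quoted in the charts (7.61, 8.64, 9.27, 9.62 Gbit/s) to within rounding of the last digit.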
  8. TEST 1: SINGLE PORT TX
     Test PC with Intel® NIC: Port 1 TX, Port 2 RX
  9. LIBPCAP TX (1 PORT): STATS
     Data rate (Gbit/s) by packet size (bytes):
     • 64 B: Intel CPU 0.4, theoretical highest data rate 7.6
     • 128 B: Intel CPU 0.86, theoretical highest data rate 8.6
     • 256 B: Intel CPU 1.57, theoretical highest data rate 9.3
     • 512 B: Intel CPU 3.17, theoretical highest data rate 9.6
     Theoretical max throughput is not reached. A possible reason is insufficient CPU performance.
  10. INTEL® DPDK TX (1 PORT): STATS
     Data rate (Gbit/s) by packet size (bytes):
     • 64 B: Intel CPU 7.61, theoretical highest data rate 7.61
     • 128 B: Intel CPU 8.64, theoretical highest data rate 8.64
     • 256 B: Intel CPU 9.27, theoretical highest data rate 9.27
     • 512 B: Intel CPU 9.62, theoretical highest data rate 9.62
     Observations:
     • Theoretical max throughput is reached.
     • The Intel CPU is fast enough with DPDK, but not with libpcap.
  11. INTEL® DPDK VS LIBPCAP TX (1 PORT): IMPROVEMENTS
     Data rate improvement factor by packet size:
     • 64 B: x19
     • 128 B: x10
     • 256 B: x6
     • 512 B: x3
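
These factors follow from dividing the DPDK data rates by the libpcap data rates shown on the two single-port charts (slides 9 and 10). A minimal sketch of that division, with dictionary and function names chosen for this example:

```python
# Data rates in Gbit/s, keyed by packet size in bytes, from slides 9 and 10.
LIBPCAP_GBPS = {64: 0.4, 128: 0.86, 256: 1.57, 512: 3.17}
DPDK_GBPS = {64: 7.61, 128: 8.64, 256: 9.27, 512: 9.62}


def improvement_factors(baseline, improved):
    """Return improved/baseline for every packet size in the baseline."""
    return {size: improved[size] / baseline[size] for size in baseline}


factors = improvement_factors(LIBPCAP_GBPS, DPDK_GBPS)

if __name__ == "__main__":
    for size, factor in factors.items():
        print(f"{size:>4} B: x{factor:.1f}")
```

Rounded to whole numbers, the ratios give the factors listed on the slide.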
  12. TEST 1: CONCLUSION
     DPDK shows considerable improvement:
     • x19 improvement on 64 B packets, where the number of packets is huge (14.8M per second)
     • DPDK avoids the overhead of the Linux network stack and of copying between kernel and user space
     • DPDK works in batch mode (bursts of 32 packets)
     For large packets, the improvement is less substantial:
     • Fewer packets are needed to fill the line rate
     • The CPU effort to create a small packet or a large one is almost the same, but filling the line rate with small packets takes many more of them and therefore more CPU effort
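
The point about CPU effort can be made concrete by estimating how many CPU cycles are available per packet at full line rate. The sketch below assumes one 3.10 GHz core (the testbed CPU's nominal clock) serving one 10G port and ignores frequency scaling; the function name `cycles_per_packet` is chosen for this example.

```python
CPU_HZ = 3.10e9          # nominal clock of the testbed CPU
LINK_BPS = 10e9          # 10G link
OVERHEAD_BYTES = 8 + 12  # preamble + inter-packet gap


def cycles_per_packet(packet_bytes, link_bps=LINK_BPS, cpu_hz=CPU_HZ):
    """CPU cycles one core can spend on each packet at full line rate."""
    wire_bits = (packet_bytes + OVERHEAD_BYTES) * 8
    seconds_per_packet = wire_bits / link_bps
    return seconds_per_packet * cpu_hz


if __name__ == "__main__":
    for size in (64, 128, 256, 512):
        print(f"{size:>4} B: about {cycles_per_packet(size):.0f} cycles per packet")
```

Under these assumptions a 64 B packet leaves roughly 208 cycles per packet, while a 512 B packet leaves more than six times as many, which is consistent with small packets being the hardest case.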
  13. TEST 2: PORT 1 AND 2, BOTH TX AND RX
     Test PC with Intel® NIC: Port 1 TX/RX, Port 2 TX/RX
  14. LIBPCAP TX (2 PORTS): STATS
     Data rate (Gbit/s) for packet sizes 64, 128, 256 and 512 bytes:
     • Intel CPU (Port 1): 0.20, 0.40, 0.70 (only three of the four data labels are present)
     • Intel CPU (Port 2): 0.40, 0.70, 1.50 (only three of the four data labels are present)
     • Theoretical highest data rate: 7.61, 8.64, 9.27, 9.62
     Observations:
     • Different speeds between port 1 and port 2 on the Intel CPU
       o The Linux scheduler is beyond our control, and load balancing between cores might not be efficient
     • Traffic degradation is observed compared to single-port traffic usage by Ostinato
       o RX processing is evidently affecting performance
  15. DPDK TX (2 PORTS): STATS
     Data rate (Gbit/s) by packet size (bytes):
     • 64 B: Port 1 4.30, Port 2 4.30, theoretical highest data rate 7.61
     • 128 B: Port 1 8.60, Port 2 8.60, theoretical highest data rate 8.64
     • 256 B: Port 1 9.20, Port 2 9.20, theoretical highest data rate 9.27
     • 512 B: Port 1 9.60, Port 2 9.60, theoretical highest data rate 9.62
     Theoretical max throughput is reached on both ports only from 128 B packets upward, because each core is responsible for both transmitting and receiving packets.
  16. DPDK VS LIBPCAP TX (2 PORTS): STATS
     Speed improvement factor by packet size:
     • 64 B: x21
     • 128 B: x21
     • 256 B: x13
     • 512 B: x6
  17. TEST 3: PORT 1 TX AND PORT 2 RX + CAPTURE
     Test PC with Intel® NIC: Port 1 TX, Port 2 RX/Capture
  18. LIBPCAP TX (RX + Capture): STATS
     Data rate (Gbit/s) for packet sizes 64, 128, 256 and 512 bytes:
     • Port 1 (TX): 0.36, 0.75, 1.24 (only three of the four data labels are present)
     • Port 2 (RX + Capture): 0.75, 1.24, 2.37 (only three of the four data labels are present)
     • Theoretical highest data rate: 7.61, 8.64, 9.27, 9.62
     Observations:
     • Different speeds between port 1 and port 2 on the Intel CPU
       o The Linux scheduler is beyond our control, and load balancing between cores might not be efficient
     • Traffic degradation is observed compared to single-port traffic usage by Ostinato
       o RX processing is evidently affecting performance
  19. DPDK TX (RX + Capture): STATS
     Data rate (Gbit/s) by packet size (bytes):
     • 64 B: Port 1 (TX) 5.37, Port 2 (RX + Capture) 5.37, theoretical highest data rate 7.61
     • 128 B: Port 1 (TX) 7.98, Port 2 (RX + Capture) 7.98, theoretical highest data rate 8.64
     • 256 B: Port 1 (TX) 9.21, Port 2 (RX + Capture) 9.21, theoretical highest data rate 9.27
     • 512 B: Port 1 (TX) 9.42, Port 2 (RX + Capture) 9.42, theoretical highest data rate 9.62
     Theoretical max throughput is approached on both ports only from 256 B packets upward, since the second core is occupied with both receiving and capturing.
  20. DPDK VS LIBPCAP (RX + Capture): STATS
     Speed improvement factor by packet size:
     • 64 B: x14
     • 128 B: x10
     • 256 B: x7
     • 512 B: x4
  21. ACHIEVEMENTS
     • 10G full-rate TX at any packet size
     • 10G full-rate TX/RX at packet sizes from 128 B
     • 10G full-rate RX/Capture at packet sizes from 256 B
  22. CONCLUSIONS
     Observations:
     • Only 4.30 Gbit/s was achieved on 64 B packets when both TX and RX are enabled on a port
     • Only 5.37 Gbit/s was achieved on 64 B packets when both RX and capture are enabled on a port
     Why DPDK provides considerable speed improvements:
     • Avoids the latency of copying through kernel space
     • Replaces interrupts with polling
     • Uses hardware-optimized features (batch mode, SSE, NUMA)
     Notes:
     • Real CPU core usage cannot be observed: it is always 100% regardless of traffic rate and packet size, because DPDK polls continuously instead of waiting for interrupts
  23. REFERENCES
     • DPDK overview from Intel
     • Ostinato website
  24. FIND OUT MORE
     Visit our blog for more tech stuff: plvision.eu
     See also: similar comparison for 1G configuration.
  25. THANK YOU FOR WATCHING!
     Email: [email protected]
     Skype: sales.plvision
     http://plvision.eu/
