Maximizing Your SPARC T4 Oracle Solaris Application Performance
 


In this presentation, learn how Oracle Solaris customers and ISV partners have reached peak performance on Oracle’s new SPARC T4 servers and engineered systems with Oracle Solaris Studio. Learn about the latest Oracle Solaris Studio development tools for analyzing, reporting, and improving runtime performance, such as:

• Autoparallelizing, high-performance compilers

• Performance Analyzer (used to find performance hotspots)

• Thread Analyzer (to expose data races and deadlocks)

• Code Analyzer (used to discover latent memory corruption issues)

Explore the ways developers have been taking advantage of the full potential of the SPARC T4 multicore architecture and Oracle Solaris 11.

    Maximizing Your SPARC T4 Oracle Solaris Application Performance Presentation Transcript

    • Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
    • Maximizing Your SPARC T4 Oracle Solaris Application Performance
      Darryl Gove, Senior Principal Software Engineer
    • Program Agenda
      § Hardware
      § Correctness
      § Performance
      § Parallelism
    • More Information
      § Download, technical articles and more: http://oracle.com/goto/solarisstudio
      OpenWorld Sessions
      § Mon, Oct 1, 10:45 - 11:45 AM: Maximizing Your SPARC T4 Oracle Solaris Application Performance, CON 6382 (Marriott Marquis - Golden Gate)
      § Mon, Oct 1, 3:15 - 4:15 PM: Technical Panel: Developing High Performance Applications on Oracle Solaris, CON 7196 (Marriott Marquis - Golden Gate)
      Hands-on Lab
      § Wed, Oct 3, 1:15 - 2:15 PM: Develop C/C++ Applications for the Cloud with Oracle Tuxedo and Oracle Solaris Studio, HOL 10276 (Marriott Marquis - Salon 5/6)
      JavaOne Sessions
      § Mon, Oct 1, 8:30 - 9:30 AM: Mixed-Language Development: Leveraging Native Code from Java, CON 6714 (Hilton San Francisco - Continental Ballroom 6)
      § Tues, Oct 2, 1:00 - 2:00 PM: Take Performance Tuning of Your Enterprise Java Applications to the Next Level, CON 10213 (Hilton San Francisco - Continental Ballroom 6)
    • Oracle Solaris Studio
      § Compiler Suite
        ● C, C++ Compilers utilize advanced code generation technology to optimize apps for highest performance on SPARC & x86
        ● Fortran Compiler optimizes compute intensive app performance
        ● Debugger ensures app stability with event handling & multi-thread support
        ● Performance Library maximizes compute-intensive app performance using advanced numeric solver libraries
      § Analysis Suite
        ● Performance Analyzer provides unparalleled insight into your app, allowing you to identify bottlenecks and improve performance by orders of magnitude
        ● New Code Analyzer ensures app reliability by detecting app vulnerabilities, including memory leaks and memory access violations
        ● Thread Analyzer simplifies complex parallel programming errors by detecting hard to pinpoint race and deadlock conditions
      § Integrated Development Environment increases developer efficiency
    • Oracle Solaris Studio 12.3 Highlights
      § Accelerate Performance
        ● 3x faster code on SPARC T4 than GCC; 40% faster than Sun Studio 12
        ● 1.5x faster code on Intel x86 than GCC; 20% faster than Sun Studio 12
      § Gain Extreme Observability
        ● New Code Analyzer for more reliable applications; reports common coding & memory access errors faster than competitive alternatives
        ● Enhanced Performance Analyzer with system-wide performance analysis
      § Improve Productivity
        ● Remote access to Solaris Studio tools from local desktop (Oracle Solaris, Linux, Microsoft Windows, Mac)
        ● Streamlined Oracle DB application development
        ● Simplify Oracle Tuxedo development with IDE plug-in
        ● IPS distribution on Solaris 11 for simplified management
        ● 20% faster compile time
    • SPARC T4 Hardware
    • SPARC T4 - Overview
      § Not like T1 to T3 (only shares the T-series name)
      § Single thread performance
      § Multithread throughput
    • SPARC T4 - Details
      § 1 to 4 chips per system
      § 8 cores per chip
        ● Dual issue
        ● Out-of-order
      § 8 threads per core
      § 3.0 GHz clock
        ● 48 B (3.0 GHz * 8 * 2) instructions / sec / chip
    • SPARC T4 - Capacity
      § Chip capacity: 48 B instructions / sec
      § For fully active threads:
        ● Single thread: 6 B instructions / sec
        ● Each of eight threads: 0.75 B instructions / sec
      § Threads rarely fully active:
        ● I/O wait
        ● Processor stall (fetch from memory = 300-400 cycles)
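The capacity figures on this slide follow from simple arithmetic, which the sketch below reproduces. The clock, core count, issue width, and threads per core come from the slides; the macro and function names are illustrative, and the even split between fully active threads is the slide's simplification rather than a measured result.

```c
#include <assert.h>

/* Figures taken from the slides; all names here are illustrative. */
#define T4_CLOCK_GHZ        3.0
#define T4_CORES_PER_CHIP   8
#define T4_ISSUE_WIDTH      2
#define T4_THREADS_PER_CORE 8

/* Peak instruction rate of one dual-issue core, in billions per second. */
double core_peak_binstr(void)
{
    return T4_CLOCK_GHZ * T4_ISSUE_WIDTH;
}

/* Peak instruction rate of a whole chip, in billions per second. */
double chip_peak_binstr(void)
{
    return core_peak_binstr() * T4_CORES_PER_CHIP;
}

/* Per-thread share when `active_threads` threads on one core are all
 * fully active, assuming the core's issue slots are divided evenly. */
double per_thread_binstr(int active_threads)
{
    assert(active_threads >= 1 && active_threads <= T4_THREADS_PER_CORE);
    return core_peak_binstr() / active_threads;
}
```

Calling `chip_peak_binstr()` gives the chip capacity, while `per_thread_binstr(1)` and `per_thread_binstr(8)` give the single-thread and eight-thread figures quoted on the slide.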
    • Developing for T4
      § Make it correct
      § Remove obvious performance issues
      § Make it scale (correctly)
    • Application Correctness
    • Debug information
      § Always use -g
      § No optimisation flags:
        ● Full debug
        ● Lower performance
      § Optimised binaries:
        ● Best effort debug
        ● No/minimal performance impact
      § Debug what you ship!
    • Automatic Error Detection
      § Static/compile time error detection
        ● Code Analyzer
      § Dynamic/runtime memory access error detection
        ● Discover
    • Code Analyzer
      § Static analysis for common coding errors
        ● Uninitialised variables, etc.
      § Compile with:
        ● -xanalyze=code
      § View results with:
        ● code-analyzer <a.out>
    • Code Analyzer - example output
    • Memory Error Detection - discover
      § Common memory allocation and use errors:
        ● Uninitialised memory
        ● Access past bounds
        ● Memory leaks
      § Usage:
        ● discover <a.out>
        ● <a.out>
        ● Default = html output
    • Example of discover
        $ ./a.out
        ERROR 1 (ABR): reading memory beyond array bounds at address 0xffbff278 (8 bytes) on the stack at:
            average() + 0x228 <disc.c:8>
                6: for (int i=1; i<=len; i++)
                7: {
                8:=> total+=array[i];
                9: }
            _start() + 0xd8
        ...
        double array[20];
        ...
        printf(" Average = %f\n", average(array,20) );
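The ABR report above points at a loop that runs from 1 to `len` inclusive, so its last iteration reads `array[len]`, one element past the end of a `len`-element array. A minimal corrected sketch follows; the slide shows only the loop, so the function signature and the final division are assumptions about how `average()` might be written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction of average(): only the loop appears on the
 * slide. Indexing from 0 to len-1 keeps every access inside the array. */
double average(const double *array, size_t len)
{
    assert(array != NULL && len > 0);
    double total = 0.0;
    for (size_t i = 0; i < len; i++) {
        total += array[i];
    }
    return total / (double)len;
}
```

With the loop bounds corrected, the function reads exactly `len` elements, and the first element is no longer skipped.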
    • Application Performance
    • Optimisation - the Basics
      § No optimisation flags == no optimisation
      § Good optimisation: -O
      § Advanced optimisations:
        ● Guided by profile of application
        ● Knowledge of deployment systems
    • Profiling
      § Profiling with the performance analyzer
        ● collect <a.out>
        ● collect -P <pid>
        ● analyzer test.1.er
      § Report generation with spot
        ● spot <a.out>
        ● spot -P <pid>
    • Performance Analyzer
      § Demo
    • Aggressive Optimisation
      § One stop flag: -fast
      § Enables multiple optimisations
        ● Build machine = deployment machine
        ● Floating point simplification and optimisation
        ● Pointers to different types do not alias
        ● Function inlining
      § Investigate performance gain
    • Profile Drives Flag Selection: Floating Point
      § Significant time in floating point computation:
        ● Floating point simplification
        ● -fsimple=2
      § Significant time in floating point library code:
        ● Optimised floating point libraries
        ● -xlibmopt, -xlibmil
      § Use FP optimisations if performance improves and FP optimisations are acceptable
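Whether floating point optimisations are "acceptable" depends on how sensitive the code is to evaluation order, because floating point addition is not associative. The sketch below is not taken from the slides and does not claim to show what any particular flag does to this code. It only illustrates how the same values summed in two different orders can give different results, which is the kind of change aggressive floating point simplification may introduce. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the elements from first to last. */
double sum_forward(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}

/* Sum the same elements from last to first. */
double sum_backward(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double total = 0.0;
    for (size_t i = n; i-- > 0; ) {
        total += x[i];
    }
    return total;
}
```

For the input `{1e20, -1e20, 1.0}` the forward sum cancels the large terms first and keeps the 1.0, while the backward sum absorbs the 1.0 into -1e20 before the cancellation and loses it. If a difference of that kind matters to the application, the results should be checked before and after enabling the optimisation.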
    • Profile Drives Flag Selection: Flat profile
      § Many hot small functions
        ● At least -xO4 optimisation level
        ● -xipo for cross-file optimisations
      § Conditional code or inlining
        ● Profile feedback
        ● -xprofile=collect:
        ● Training run of application
        ● -xprofile=use:
    • Profile Drives Flag Selection: Pointers
      § Pointers inhibit compiler optimisations
      § Compiler needs more information
      § restrict qualified pointers in C
        ● Localised action
      § Flags:
        ● -xrestrict (restrict qualified pointers passed into functions)
        ● -xalias_level=std [C]
        ● -xalias_level=compatible [C++]
        ● Actions at file level
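As a small illustration of the "localised action", the sketch below uses the C99 `restrict` qualifier on two pointer parameters. The function itself is hypothetical and not from the slides. The qualifier is a promise by the programmer that `dst` and `src` do not overlap, which gives the compiler the extra information it needs to reorder or vectorise the loads and stores. Calling the function with overlapping arrays breaks that promise and is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] += k * src[i] for n elements.
 * `restrict` asserts that dst and src refer to non-overlapping memory. */
void scaled_add(double *restrict dst, const double *restrict src,
                double k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

Qualifying individual parameters affects only this function, whereas the flags listed on the slide apply the equivalent assumption to a whole file.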
    • Processor Specific Optimisations
      § Default: -xtarget=generic often good enough
      § T4 has useful instructions
        ● Compare and branch
        ● Floating point multiply add
      § One stop flag: -xtarget=T4
      § Schedules for T4, uses entire T4 instruction set
      § Only runs on T4 (or later) processors
    • SPARC Instruction Sets
    • Multi-threaded Applications
    • Multi-thread or Multi-process
      § Multiprocess (throughput):
        ● Isolation
        ● Independence
        ● Large virtual memory footprint
        ● Potentially high synchronisation costs
      § Multithread (latency):
        ● Low synchronisation costs
        ● Minimal memory footprint
    • Multi-threaded Application Development
      § POSIX threads (C11, C++11)
        ● Low level: Great control, significant complexity
      § OpenMP
        ● High abstraction: Easy to use, flexible
      § Automatic parallelisation
        ● Trivial to use: -xautopar -xreduction
        ● Works best for loop-intensive code (typically FP)
    • OpenMP Parallel For
      § Distributes iterations across CPUs
        #pragma omp parallel for
        for (int i=0; i<length; i++)
        {
          // Do work
        }
    • OpenMP Tasks
      § Distributes work across CPUs
        for (int i=0; i<length; i++)
        {
          #pragma omp task
          {
            // Do work for task ‘i’
          }
        }
    • Parallel Program Correctness
      § Distributes work across CPUs
        int total=0;
        #pragma omp parallel for
        for (int i=0; i<length; i++)
        {
          total += i;
        }
      § Data race: Multiple threads updating the same variable
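One standard way to remove this data race is an OpenMP reduction clause, sketched below. Each thread accumulates into its own private copy of `total`, and the copies are combined when the loop ends. The function wrapper is illustrative, and `long` is used instead of the slide's `int` only to leave more headroom for large `length` values.

```c
#include <assert.h>

/* Sum 0 + 1 + ... + (length-1) without a data race. */
long sum_indices(int length)
{
    assert(length >= 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < length; i++) {
        total += i;
    }
    return total;
}
```

Alternatives such as `#pragma omp atomic` or a critical section would also be correct, but they serialise every update, so a reduction usually scales better.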
    • Thread Analyzer
      § Instrument application
        ● Compiler flag: -xinstrument=datarace
        ● Binary instrumentation: discover -i datarace <a.out>
      § Gather data:
        ● collect -r on <a.out>
      § View data:
        ● tha tha.1.er
    • Thread Analyzer - Example
      § Demo
    • Scaling to Many Threads
      § Minimise serial code
        ● Amdahl’s Law
      § Minimise lock contention
      § Minimise writes of shared data
      § Evenly distribute work
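Amdahl's Law gives the best-case speedup when a fraction p of the runtime can be spread over n threads: 1 / ((1 - p) + p / n). The sketch below evaluates it; the function name is illustrative, and the formula ignores lock contention and uneven work distribution, so real scaling is typically worse.

```c
#include <assert.h>

/* Best-case speedup on `threads` threads when `parallel_fraction` of the
 * runtime parallelises perfectly and the remainder stays serial. */
double amdahl_speedup(double parallel_fraction, int threads)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads >= 1);
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / threads);
}
```

A T4 chip offers 64 hardware threads (8 cores with 8 threads each), yet code that is 90% parallel cannot get more than roughly a 9x speedup on them, because the remaining 10% of serial code dominates.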
    • Scaling to Many Threads
      § Demo
    • Limits of Performance
      § Threads
        ● vmstat
      § Instruction Issue Width
        ● pgstat / cputrack / cpustat / ripc
      § Bandwidth
        ● busstat / bw
    • Conclusion: Optimising for T4
      § Step 1: Profile and remove inefficient code
      § Step 2: Explore benefits of increased optimisation
      § Step 3: Identify opportunities for parallelisation
      § Step 4: Profile and tune parallel code
      § Step 5: Watch for hitting hardware limits