Most Common Customer Support Questions
Host Troubleshooting Wizard
Our Host Troubleshooting Wizard helps you quickly resolve common Driver and ULP technical problems.
Production Rollout Checklist
This Production Rollout Checklist describes the steps needed to verify a site installation before rolling out to production. Please consider all the information below carefully while planning your grid production deployment.
Quick Troubleshooting Guide
Sometimes a little helpful direction is just what you need to get through a tough installation problem. The Quick Troubleshooting Guide provides helpful instructions on how to isolate problems.
Grid Director Quick Help
| RcvSwitchRelay errors: are they a problem? | |
| No. This counter increases with every multicast ARP, and you should ignore it. | |
| Which errors do we use to verify the fabric? | |
| Counter | Importance |
| Width | All links should be running at 4X. Any links reporting 1X are bad and need to be repaired. |
| SymbolError | Can increase without a significant problem present. |
| LinkErrorRecovery | Increasing LinkErrorRecovery errors may indicate a bad link. |
| LinkDowned | Indicates the number of times the port has gone down (usually for valid reasons). |
| PortRcvErrors | This counter should not be increasing. An increasing number indicates a bad link. |
| PortRcvRemotePhysicalErrors | Indicates that a problem is occurring ELSEWHERE in the fabric and that this port received a packet that was intentionally corrupted by another switch in the fabric. |
| PortRcvSwitchRelayErrors | Does not indicate a problem. |
| PortXmitDiscards | May indicate that HOQ or another parameter should be tweaked. |
| PortXmitConstraintErrors | May indicate that a parameter should be tweaked. |
| PortRcvConstraintErrors | May indicate that a parameter should be tweaked. |
| LocalLinkIntegrityErrors | This counter should not be increasing. An increasing number indicates a bad link. |
| ExcessiveBufferOverrunErrors | May indicate that a parameter should be tweaked. |
| Status | If the value reads “IB-Timeout”, the link is in Initialize mode and failed to negotiate a logical link. |
| VL15Dropped | This counter increasing in small increments is not seen as a problem. |
|
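The counter guidance above can be applied mechanically by comparing two snapshots of a port's counters. The following is a minimal Python sketch, not a Voltaire tool: the function name `check_port` and the dictionary layout are hypothetical, and only the counter names and their interpretation come from the table.

```python
# Counter groupings follow the table above. The function and the
# dict-based snapshot layout are assumptions made for illustration.
BAD_LINK = {"LinkErrorRecovery", "PortRcvErrors", "LocalLinkIntegrityErrors"}
TUNING = {"PortXmitDiscards", "PortXmitConstraintErrors",
          "PortRcvConstraintErrors", "ExcessiveBufferOverrunErrors"}
REMOTE = {"PortRcvRemotePhysicalErrors"}
# SymbolError, LinkDowned, PortRcvSwitchRelayErrors and VL15Dropped are
# deliberately not checked: the table treats them as informational.


def check_port(before, after):
    """Compare two counter snapshots of one port and return findings."""
    findings = []
    width = after.get("Width")
    if width is not None and width != "4X":
        findings.append(f"Width is {width}: expected 4X, link needs repair")
    if after.get("Status") == "IB-Timeout":
        findings.append("Status is IB-Timeout: link failed to negotiate a logical link")
    for name in sorted(BAD_LINK | TUNING | REMOTE):
        # Only an increase between the two snapshots is significant.
        delta = int(after.get(name, 0)) - int(before.get(name, 0))
        if delta <= 0:
            continue
        if name in BAD_LINK:
            findings.append(f"{name} +{delta}: possible bad link")
        elif name in TUNING:
            findings.append(f"{name} +{delta}: a parameter may need tuning")
        else:
            findings.append(f"{name} +{delta}: problem is elsewhere in the fabric")
    return findings


if __name__ == "__main__":
    before = {"Width": "4X", "PortRcvErrors": 0, "PortRcvSwitchRelayErrors": 10}
    after = {"Width": "1X", "PortRcvErrors": 5, "PortRcvSwitchRelayErrors": 500}
    for finding in check_port(before, after):
        print(finding)
```

In this sketch a growing PortRcvSwitchRelayErrors value produces no finding, matching the advice to ignore that counter.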
| How to upgrade ASIC firmware | |
Note: you may also upgrade the firmware on specific boards.
|
| How to get the port counter CSV file | |
|
|
| How to upgrade software | |
|
| How to Export Logs | |
|
In the latest switch code (3.4), the export-logs CLI command combines all logs into a single tar file and sends it to the local FTP. To run export-logs:
|
| How to disable ports | |
|
| How to reset all 1x links in one command | |
|
| How to reconfigure routing to restore fabric balance | |
| After servicing boards, removing or restoring a large number of links, or rebooting many nodes, it is recommended to run the sm-info sm-initiate-fabric-reconfiguration set command to regain fabric routing balance. |
GridStack Quick Help
| GridStack 3.x How to Upgrade | |
|
|
| GridStack 4.x - How to install | |
|
|
| GridStack 4.x How to Configure IPoIB interface | |
|
|
| Voltaire MPI - how to run a quick test | |
Examples:
mpirun_ssh -np 2 host host ./hello_world
Hello world! I'm 0 of 2 on hydra5.support.voltaire.com
Cleaning up all processes ... done.
mpirun_ssh -np 2 host host ./cpi
Process 0 of 2 on hydra5.support.voltaire.com
Process 1 of 2 on hydra6.support.voltaire.com
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000386
Cleaning up all processes ... done. |
|
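To judge whether the cpi output above is plausible, it helps to know what such a test typically computes. The sketch below assumes that ./cpi is the classic pi example shipped with MPICH-derived MPI packages, which this page does not confirm: a midpoint-rule integration of 4/(1+x^2) over [0, 1], with the intervals dealt out round-robin to the processes. It uses plain Python with simulated ranks instead of real MPI calls, and the names `partial_sum` and `estimate_pi` are hypothetical.

```python
import math


def partial_sum(rank, size, n):
    """Midpoint-rule contribution of one simulated process.

    Process `rank` handles intervals rank+1, rank+1+size, ... up to n,
    which mirrors the round-robin split assumed for cpi.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)          # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(n=10000, size=2):
    """Add up the partial sums, standing in for the MPI reduction step."""
    return sum(partial_sum(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    pi = estimate_pi()
    print(f"pi is approximately {pi:.16f}, Error is {abs(pi - math.pi):.16f}")
```

For this integrand the midpoint rule overestimates pi by about h^2/12, so with 10,000 intervals the expected error is roughly 8.3e-10, which is consistent with the error reported in the sample output. A result in that range from both hosts suggests the MPI job ran correctly.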