Network & Security VMUG Community

Cross-Site NSX Troubleshooting

  • 1.  Cross-Site NSX Troubleshooting

    Posted 01-28-2020 12:42 PM

    Hey guys,

    Hoping someone could provide some assistance! 

    Here is my (general) setup:
    Two sites (Site A, Site B)
    Each site has vCenter
    NSX Primary Manager at Site A, NSX Secondary Manager at Site B
    Lowest MTU on the path is 4000; some devices are configured at 9000, but most at 4000
    No local egress, layer 2 circuit joining both sites
    All firewalls (DFW, UDLR, ESG) set to allow any/any/all
    ESG resides at Site A -- OSPF connectivity down to UDLR and up to Fortigate physical edge
    Universal switches, UDLR spanning across both sites
    RDS servers at both sites living inside the NSX environment

    Here is my issue:
    On RDS A, I can access resources without issues

    On RDS B, I cannot access any services. Telnet connects fine to the service ports (80, 443, etc.), but HTTPS requests time out: the connection opens, but no commands go through.

    When I run a Wireshark capture, here is a sample of what I get:

    [Image: Wireshark capture]

    Sure looks like a lot of errors. I'm kinda new to this, so please forgive me if you think my issue lies outside of networking, but any input is welcome!



    ------------------------------
    Michael Tomlinson
    System Administrator
    Ottawa ON
    ------------------------------


  • 2.  RE: Cross-Site NSX Troubleshooting
    Best Answer

    Posted 01-29-2020 12:37 PM
    Michael,

    You're not necessarily seeing any errors - those are just annotations Wireshark adds when it can't account for everything it sees: a gap in the sequence numbers it captured (TCP Previous segment not captured), or oddball behavior that doesn't point to this problem (TCP Dup ACK). Both of those together typically mean that there was a pre-existing TCP flow before you started the capture.
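    To make those two labels less mysterious, here is a deliberately simplified Python model of how a capture tool might assign them for one direction of a TCP flow. It is not Wireshark's implementation: the `Segment` record and `annotate` function are hypothetical, and the Dup ACK rule here ignores the window-size and SYN/FIN/RST checks that Wireshark also applies.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One captured TCP segment, all from the same sender (hypothetical record)."""
    seq: int          # sequence number of the first payload byte
    ack: int          # acknowledgment number carried by the segment
    payload_len: int  # TCP payload length in bytes


def annotate(segments: list[Segment]) -> list[str]:
    """Roughly label segments the way Wireshark's TCP analysis would (simplified)."""
    labels = []
    next_seq = None  # next sequence number we expect from this sender
    last_ack = None  # ack number seen on the previous segment
    for seg in segments:
        label = ""
        if next_seq is not None and seg.seq > next_seq:
            # Bytes between next_seq and seg.seq never reached the capture.
            label = "TCP Previous segment not captured"
        elif seg.payload_len == 0 and seg.ack == last_ack:
            # An empty segment repeating the previous ack number.
            label = "TCP Dup ACK"
        labels.append(label)
        end = seg.seq + seg.payload_len
        next_seq = end if next_seq is None else max(next_seq, end)
        last_ack = seg.ack
    return labels


flow = [Segment(1, 500, 100), Segment(101, 500, 0), Segment(301, 500, 50)]
print(annotate(flow))
```

    Neither label by itself says a packet was dropped on the wire; both only describe what the capture point did or did not see.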

    Let's start with what you ARE seeing:
     - RDS Server (assuming 10.151.1.5) sends a SYN packet to the destination - indicating that DFW is letting the traffic OUT
     - Destination (10.150.0.15) responds with a SYN-ACK about 1 ms later. This says these two resources are close. Is this the same site, or another one?
      - It also says that DFW allowed two-way traffic
      - It also says that you're able to reach the destination asset
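    The SYN to SYN-ACK timing used above can be read straight off the capture timestamps. A minimal sketch, assuming the packets have already been exported into a hypothetical `Packet` record (timestamp in seconds, addresses, and a simple flag string); ports are ignored for brevity, and the addresses are the ones from this capture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    time: float  # capture timestamp in seconds
    src: str
    dst: str
    flags: str   # hypothetical shorthand: "S" = SYN, "SA" = SYN-ACK


def handshake_rtt_ms(packets: list[Packet]) -> Optional[float]:
    """Milliseconds between the first SYN and its matching SYN-ACK, or None."""
    syn = next((p for p in packets if p.flags == "S"), None)
    if syn is None:
        return None
    synack = next(
        (p for p in packets
         if p.flags == "SA" and p.src == syn.dst and p.dst == syn.src and p.time >= syn.time),
        None,
    )
    if synack is None:
        return None
    return (synack.time - syn.time) * 1000.0


capture = [
    Packet(0.0000, "10.151.1.5", "10.150.0.15", "S"),
    Packet(0.0010, "10.150.0.15", "10.151.1.5", "SA"),
]
print(f"SYN -> SYN-ACK: {handshake_rtt_ms(capture):.1f} ms")
```

    This measures the round trip from wherever the capture was taken, so it hints at distance but doesn't prove which site the server lives in.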

    It will sound a bit rote, but this looks an awful lot like one host's VTEP may not have jumbo MTU enabled somewhere. What I'd recommend next is to try a ping with a full-sized payload (by default, ping sends only a small payload, 32 bytes on Windows), and to set the DO NOT FRAGMENT bit:
     
    ping -f -l 1472 {destination server address}


    This will help you "sound out" the inside MTU. A 1472-byte payload plus 28 bytes of ICMP and IP headers is exactly 1500 bytes, so if 1472 works, you can send full-size packets. If 1422 or lower is the largest size that works (1472 minus the 50 bytes of VXLAN overhead), a downlink port probably has a mismatched MTU. On the physical network side, many hardware platforms require every downlink port to be explicitly configured for jumbo MTU - and every Layer 3 interface as well.
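    The arithmetic behind 1472 and 1422, plus a binary search that automates the "sounding out", can be sketched in Python. The 50-byte VXLAN figure assumes an untagged inner Ethernet frame over IPv4. `probe` is a hypothetical callback you would write yourself (for example, a wrapper around `ping -f -l <size> <host>` that returns True on a reply), and the search assumes that if one size gets through, every smaller size does too.

```python
from typing import Callable

IPV4_HEADER = 20
ICMP_HEADER = 8
# Bytes VXLAN adds on top of the inner IP packet: outer IPv4 (20) + UDP (8)
# + VXLAN header (8) + the inner Ethernet header (14) it carries.
VXLAN_OVERHEAD = 20 + 8 + 8 + 14


def max_ping_payload(ip_mtu: int) -> int:
    """Largest `ping -f -l` payload that fits in one IPv4 packet of this MTU."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


def payload_through_vxlan(underlay_mtu: int, guest_mtu: int = 1500) -> int:
    """Largest DF ping payload a guest can send when the overlay rides on underlay_mtu."""
    inner_mtu = min(guest_mtu, underlay_mtu - VXLAN_OVERHEAD)
    return max_ping_payload(inner_mtu)


def find_largest_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest size for which probe(size) succeeds.

    Assumes probe(low) succeeds and that success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print(max_ping_payload(1500))       # full-size guest packet
print(payload_through_vxlan(1500))  # underlay still at 1500
print(payload_through_vxlan(4000))  # underlay at the 4000 in this setup
```

    A binary search needs about 11 pings to cover 0 to 1472, instead of stepping down one size at a time.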

    Another quick way to test it would be to vMotion RDS B to the same host as RDS A.



    ------------------------------
    Nicholas Schmidt
    Network Administrator
    General Communication, Inc.
    Anchorage AK
    ------------------------------



  • 3.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 01:22 PM
    Michael,
    I think your issue may be with TLS. The client at 10.151.1.5 is trying to connect to the server at 10.150.0.15 using TLS 1.0. Is the server at 10.150.0.15 configured to accept TLS 1.0? I don't see a TLS 1.0 reply (a ServerHello) in the Wireshark trace.

    Here's a link on how to check the settings:

    https://docs.microsoft.com/en-us/windows-server/security/tls/tls-registry-settings

    From a security standpoint, I would recommend using only TLS 1.2. Web browsers are ending support for TLS 1.0 and TLS 1.1 in March 2020.
    https://www.godaddy.com/garage/browser-support-tls-10-11/
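    To check from the client side which versions the server at 10.150.0.15 will actually negotiate, Python's standard `ssl` module can pin a connection to a single protocol version. This is a sketch, not a complete scanner: certificate checks are disabled because only the handshake matters here, and the helper names are made up. Many current OpenSSL builds also refuse TLS 1.0 on the client side, so a `False` for `ssl.TLSVersion.TLSv1` may reflect your local library rather than the server.

```python
import socket
import ssl


def make_pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Client context that will only negotiate exactly `version`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Only the handshake matters here, so skip certificate checks.
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def server_accepts(host: str, port: int, version: ssl.TLSVersion, timeout: float = 5.0) -> bool:
    """True if a handshake pinned to `version` completes against host:port."""
    ctx = make_pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake because the socket is already connected.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers ssl.SSLError, refused connections and timeouts
        return False
```

    Run from RDS B, comparing `server_accepts("10.150.0.15", 443, ssl.TLSVersion.TLSv1)` with the same call using `ssl.TLSVersion.TLSv1_2` should show whether the server only answers one of them.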



    ------------------------------
    Tony Centofanti
    Sr. Infrastructure Engineer
    Columbus Regional Airport Authority
    Columbus OH
    ------------------------------



  • 4.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:11 PM
    Hi Tony!
    Thanks for the response. The issue was an upstream QoS policy.

    ------------------------------
    Michael Tomlinson
    System Administrator
    Bell Canada
    Ottawa ON
    ------------------------------



  • 5.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 01:38 PM
    Michael,

    I'm not sure where your prior response ended up on this site, so...

    This is good to know! One thing I'd like to point out, since you're using NSX-V: VMware is moving away from that product and will eventually shut it down.
    It's supported until 2022 now (https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/support/product-lifecycle-matrix.pdf), but I wouldn't expect any new features. There are two concerns I see that your organization may want to address:

    • There are no true UDLR/multi-site capabilities in NSX-T: https://communities.vmware.com/docs/DOC-39405
      • From the looks of it, you have only one point of egress, so it may be a non-issue for you, but having and using that capability and then losing it may be...unpleasant. I'd recommend working with your VMware team on upgrade planning soon, if you haven't already.
    • With NSX-T, the Edge appliances have their own VTEPs. This means that the path through your upstream QoS policy *should* carry at least a 1600-byte MTU at some point before upgrading.

    HTH,

    ------------------------------
    Nicholas Schmidt
    Network Administrator
    AK
    ------------------------------



  • 6.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:13 PM

    Wow, thanks for the information. I did not realize NSX-V was being discontinued; we're just now demo'ing this concept in our lab as part of an evergreen effort for our current production environment.

    Unfortunately, I'm using VMUG licensing for the service. Hopefully I can get some trial licenses for NSX-T out of my VMware reps :)


    Thanks again



    ------------------------------
    Michael Tomlinson
    System Administrator
    Bell Canada
    Ottawa ON
    ------------------------------



  • 7.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:20 PM
    NSX-T and NSX-V consume the same license, all you need to do is ask your SE for the download link. I'm sure they'll be happy to send it your way!

    Let us know how you do! It's going to be very different at a fundamental level - does your group have much BGP experience?

    ------------------------------
    Nicholas Schmidt
    Engineer
    AK
    ------------------------------



  • 8.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:27 PM
    BGP knowledge is good enough. I've configured both eBGP and iBGP w/ route reflectors in previous roles.

    Does NSX-T rely on BGP as an underlay, as opposed to OSPF?

    ------------------------------
    Michael Tomlinson
    System Administrator
    Bell Canada
    Ottawa ON
    ------------------------------



  • 9.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:34 PM
    NSX-T only supports BGP as a dynamic routing protocol and has the full suite of BGP capabilities. It's pretty impressive.

    I'd recommend just using eBGP with a private (reserved) ASN for every transport zone - it makes things much easier. Later on you can explore topics like AS-path preservation and taking over all internal routing with it, like the Borg...
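    The private-use ASN ranges are defined in RFC 6996: 64512-65534, plus 4200000000-4294967294 for 4-byte ASNs. A small helper like the one below, with hypothetical transport zone names and a starting ASN chosen only for illustration, can hand out one private ASN per zone and refuse anything outside those ranges.

```python
# Private-use ASN ranges from RFC 6996 (range end is exclusive).
PRIVATE_2BYTE = range(64512, 65535)            # 64512-65534
PRIVATE_4BYTE = range(4200000000, 4294967295)  # 4200000000-4294967294


def is_private_asn(asn: int) -> bool:
    return asn in PRIVATE_2BYTE or asn in PRIVATE_4BYTE


def assign_asns(zones: list[str], start: int = 65001) -> dict[str, int]:
    """Give each (hypothetical) transport zone its own consecutive private ASN."""
    plan = {}
    for offset, zone in enumerate(zones):
        asn = start + offset
        if not is_private_asn(asn):
            raise ValueError(f"ASN {asn} for {zone} is outside the private-use ranges")
        plan[zone] = asn
    return plan


print(assign_asns(["tz-site-a", "tz-site-b"]))
```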

    The other thing NSX-T can do well that will help with such deployments is automatic route aggregation - no TC changes when you add new VN-Segments.
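    The idea behind route aggregation can be shown with Python's `ipaddress` module: adjacent segments collapse into one summary, and a new segment that falls inside an already-advertised summary needs no upstream change. This illustrates the concept only, not how NSX-T implements it; the prefixes are made-up examples.

```python
import ipaddress


def summarize(segments: list[str]) -> list[str]:
    """Collapse segment prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(s) for s in segments]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


def covered_by(summary: str, new_segment: str) -> bool:
    """True if new_segment already falls inside an advertised summary."""
    return ipaddress.ip_network(new_segment).subnet_of(ipaddress.ip_network(summary))


print(summarize(["10.151.0.0/24", "10.151.1.0/24"]))  # ['10.151.0.0/23']
print(covered_by("10.151.0.0/16", "10.151.42.0/24"))  # True
```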

    Also, make sure that you have 4 NIC ports available - NSX-T requires its own virtual switch (the N-VDS).

    ------------------------------
    Nicholas Schmidt
    Engineer
    AK
    ------------------------------



  • 10.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:02 PM
    Reads like a firewall issue on the physical FW for one of the sites.

    ------------------------------
    Paul Mancuso,
    Technologist Director
    VMware
    Palo Alto CA
    ------------------------------



  • 11.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:11 PM
    Hi Paul!
    Thanks for the response. The issue was an upstream QoS policy.

    ------------------------------
    Michael Tomlinson
    System Administrator
    Bell Canada
    Ottawa ON
    ------------------------------



  • 12.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 02:53 PM
    Edited by Paul Mancuso 01-29-2020 02:53 PM
    I was responding to the original message and missed the threaded discussion. I am glad it is resolved.

    ------------------------------
    Paul Mancuso,
    Technologist Director
    VMware
    Palo Alto CA
    ------------------------------



  • 13.  RE: Cross-Site NSX Troubleshooting

    Posted 01-29-2020 08:11 PM

    Is it possible your Fortigate firewall is blocking TLS 1.0? I have seen this in my environment.

    You may need to disable TLS 1.0 on your server and use TLS 1.2 instead. Check the TLS version in the handshake on the connection that is working and see which version it uses.
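    In the capture, the version to compare is the one inside the ClientHello and ServerHello, not the record header: clients commonly put TLS 1.0 in the record header even when offering TLS 1.2. The sketch below pulls that field out of the raw bytes of a handshake record; the function name is hypothetical. One caveat: TLS 1.3 hellos still carry the TLS 1.2 value in this field and signal 1.3 in the supported_versions extension, which this sketch does not parse.

```python
TLS_VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",  # also used by TLS 1.3 hellos (see supported_versions)
}

HANDSHAKE = 0x16
CLIENT_HELLO = 0x01
SERVER_HELLO = 0x02


def hello_version(record: bytes) -> tuple[str, str]:
    """Return (message type, version) from the start of a TLS handshake record.

    Layout: record header = content type (1) + version (2) + length (2),
    then handshake header = msg type (1) + length (3), then the hello's 2-byte version.
    """
    if len(record) < 11 or record[0] != HANDSHAKE:
        raise ValueError("not a TLS handshake record")
    msg_type = record[5]
    if msg_type not in (CLIENT_HELLO, SERVER_HELLO):
        raise ValueError("not a ClientHello or ServerHello")
    version = int.from_bytes(record[9:11], "big")
    kind = "ClientHello" if msg_type == CLIENT_HELLO else "ServerHello"
    return kind, TLS_VERSIONS.get(version, hex(version))


# First 11 bytes of a ClientHello: record header says 1.0, the hello offers 1.2.
print(hello_version(bytes.fromhex("16030100a5010000a10303")))
```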



    ------------------------------
    BMartin

    ------------------------------