Validating an XMPP JID with Python?

February 10, 2024

What is the correct way to validate an XMPP JID? The syntax is described here, but I don't really understand it. Also, it seems pretty complicated, so using a library to do it would seem like a good idea.

Solution 1:

First off, the current best reference for JIDs is RFC 6122.
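
RFC 6122 gives a JID the overall shape [ localpart "@" ] domainpart [ "/" resourcepart ], where the localpart is what the code below calls the node. As a minimal sketch of only the structural split, the three parts can be separated with str.partition. The names split_jid and JidParts are hypothetical, and no character, length, or stringprep checks are done here, so this is not a validator.

```python
from typing import NamedTuple, Optional


class JidParts(NamedTuple):
    localpart: Optional[str]     # None when the JID has no "@"
    domainpart: str
    resourcepart: Optional[str]  # None when the JID has no "/"


def split_jid(jid: str) -> JidParts:
    """Split jid into localpart, domainpart and resourcepart (hypothetical helper).

    Only the structural split is done: the resourcepart starts after the
    first "/", and the localpart ends at the first "@" before that "/".
    Nothing is validated or normalized.
    """
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.partition("@")
    if not at:
        # No "@": the whole bare JID is the domainpart.
        local, domain = None, bare
    return JidParts(local, domain, resource if slash else None)


if __name__ == "__main__":
    print(split_jid("juliet@example.com/balcony"))
    print(split_jid("example.com"))
    print(split_jid("romeo@example.net/orchard/with@sign"))
```

The last call shows that a resourcepart may itself contain "/" and "@", because only the first "/" ends the domainpart. Rejecting empty, over-long, or badly formed parts is the job of a real validator.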

I was just going to give you the regex here, but got a little carried away and implemented all of the spec:

import re
import socket
import stringprep
import sys
import encodings.idna
from unicodedata import ucd_3_2_0  # stringprep and IDNA use Unicode 3.2 data

# These ASCII characters (everything except letters, digits and "-") aren't
# allowed in domain names that are used in XMPP.
BAD_DOMAIN_ASCII = "".join(chr(c) for c in [*range(0x00, 0x2d),
                                            0x2e, 0x2f,
                                            *range(0x3a, 0x41),
                                            *range(0x5b, 0x61),
                                            *range(0x7b, 0x80)])

# Check bi-directional character validity (RFC 3454, section 6).
def bidi(chars):
    randal = [stringprep.in_table_d1(c) for c in chars]
    if any(randal):
        # There is a RandALCat char in the string. Must perform further tests:
        # 1) The characters in section 5.8 MUST be prohibited.
        #    This is table C.8, which was already checked.
        # 2) If a string contains any RandALCat character, the string
        #    MUST NOT contain any LCat character.
        if any(stringprep.in_table_d2(c) for c in chars):
            raise UnicodeError("Violation of BIDI requirement 2")

        # 3) If a string contains any RandALCat character, a RandALCat
        #    character MUST be the first character of the string, and a
        #    RandALCat character MUST be the last character of the string.
        if not randal[0] or not randal[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")

# Nodeprep profile, applied to the node (localpart).
def nodeprep(node):
    # Map table B.1 characters to nothing, case fold with table B.2, then NFKC.
    chars = "".join(stringprep.map_table_b2(c) for c in node
                    if not stringprep.in_table_b1(c))
    chars = ucd_3_2_0.normalize("NFKC", chars)
    for c in chars:
        if (stringprep.in_table_c11(c) or
            stringprep.in_table_c12(c) or
            stringprep.in_table_c21(c) or
            stringprep.in_table_c22(c) or
            stringprep.in_table_c3(c) or
            stringprep.in_table_c4(c) or
            stringprep.in_table_c5(c) or
            stringprep.in_table_c6(c) or
            stringprep.in_table_c7(c) or
            stringprep.in_table_c8(c) or
            stringprep.in_table_c9(c) or
            c in "\"&'/:<>@"):
            raise UnicodeError("Invalid node character")

    bidi(chars)

    return chars

# Resourceprep profile: no case folding, and ASCII space plus "\"&'/:<>@"
# are allowed.
def resourceprep(res):
    # Map table B.1 characters to nothing, then NFKC.
    chars = "".join(c for c in res if not stringprep.in_table_b1(c))
    chars = ucd_3_2_0.normalize("NFKC", chars)
    for c in chars:
        if (stringprep.in_table_c12(c) or
            stringprep.in_table_c21(c) or
            stringprep.in_table_c22(c) or
            stringprep.in_table_c3(c) or
            stringprep.in_table_c4(c) or
            stringprep.in_table_c5(c) or
            stringprep.in_table_c6(c) or
            stringprep.in_table_c7(c) or
            stringprep.in_table_c8(c) or
            stringprep.in_table_c9(c)):
            raise UnicodeError("Invalid resource character")

    bidi(chars)

    return chars

def parse_jid(jid):
    # First pass: split into node, domain and resource.
    m = re.fullmatch(r"(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?", jid)
    if not m:
        return False

    node, domain, resource = m.groups()
    if domain.startswith("[") and domain.endswith("]"):
        # IPv6 address? RFC 6122 requires IPv6 literals to be in brackets.
        try:
            socket.inet_pton(socket.AF_INET6, domain[1:-1])
        except (OSError, ValueError):
            return False
    else:
        try:
            # IPv4 address?
            socket.inet_pton(socket.AF_INET, domain)
        except (OSError, ValueError):
            # Domain name: check each label.
            dom = []
            for label in domain.split("."):
                try:
                    label = encodings.idna.nameprep(label)
                    encodings.idna.ToASCII(label)
                except UnicodeError:
                    return False
                # UseSTD3ASCIIRules is set, but Python's nameprep doesn't enforce it.
                # a) Verify the absence of non-LDH ASCII code points.
                if any(c in BAD_DOMAIN_ASCII for c in label):
                    return False
                # b) Verify the absence of leading and trailing hyphen-minus.
                if label[0] == "-" or label[-1] == "-":
                    return False
                dom.append(label)
            domain = ".".join(dom)
    try:
        if node is not None:
            node = nodeprep(node)
        if resource is not None:
            resource = resourceprep(resource)
    except UnicodeError:
        return False
    return node, domain, resource

if __name__ == "__main__":
    results = parse_jid(sys.argv[1])
    if not results:
        print("FAIL")
    else:
        print(results)
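
The two profiles in the code above differ mostly in their mapping step. This standalone sketch uses only the standard library; fold_localpart and map_resourcepart are hypothetical helpers that reproduce just the mapping and NFKC steps of nodeprep and resourceprep, without the prohibition or bidi checks. It shows that localparts compare case-insensitively while resourceparts do not, and why the domain check needs its own BAD_DOMAIN_ASCII test: encodings.idna.ToASCII is documented as assuming UseSTD3ASCIIRules is false, so it accepts an ASCII label containing "_".

```python
import encodings.idna
import stringprep
from unicodedata import ucd_3_2_0  # stringprep tables are defined against Unicode 3.2


def fold_localpart(s: str) -> str:
    """Mapping step of Nodeprep only: drop table B.1 chars, case fold with B.2, NFKC."""
    mapped = "".join(stringprep.map_table_b2(c) for c in s
                     if not stringprep.in_table_b1(c))
    return ucd_3_2_0.normalize("NFKC", mapped)


def map_resourcepart(s: str) -> str:
    """Mapping step of Resourceprep only: drop table B.1 chars, NFKC, no case folding."""
    mapped = "".join(c for c in s if not stringprep.in_table_b1(c))
    return ucd_3_2_0.normalize("NFKC", mapped)


if __name__ == "__main__":
    # Localparts compare case-insensitively after Nodeprep mapping...
    print(fold_localpart("Juliet") == fold_localpart("juliet"))
    # ...and table B.2 also expands characters such as "ß" to "ss".
    print(fold_localpart("Straße"))
    # Resourceparts keep their case.
    print(map_resourcepart("Balcony") == map_resourcepart("balcony"))
    # U+00AD SOFT HYPHEN is in table B.1 ("commonly mapped to nothing").
    print(map_resourcepart("bal\u00adcony"))
    # No STD3 ASCII rules: a label containing "_" passes ToASCII unchanged.
    print(encodings.idna.ToASCII("foo_bar"))
```

Because folding can change a string's length ("ß" becomes "ss"), comparisons between JIDs should be made on the prepared parts rather than on the raw input.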

Yes, this is a lot of work. There are good reasons for all of it, but we're hoping to simplify it somewhat in the future if the PRECIS working group bears fruit.
