Validating an XMPP JID with Python?
What is the correct way to validate an XMPP JID? The syntax is described here:, but I don't really understand it. Also, it seems pretty complicated, so using a library to do it wou
Solution 1:
First off, the current best reference for JIDs is RFC 6122.
I was just going to give you the regex in here, but got a little carried away, and implemented all of the spec:
import re
import sys
import socket
import encodings.idna
import stringprep

# These characters aren't allowed in domain names that are used
# in XMPP
BAD_DOMAIN_ASCII = "".join([chr(c) for c in list(range(0, 0x2d)) +
                            [0x2e, 0x2f] +
                            list(range(0x3a, 0x41)) +
                            list(range(0x5b, 0x61)) +
                            list(range(0x7b, 0x80))])


# check bi-directional character validity
def bidi(chars):
    RandAL = [stringprep.in_table_d1(c) for c in chars]
    if any(RandAL):
        # There is a RandAL char in the string. Must perform further
        # tests:
        # 1) The characters in section 5.8 MUST be prohibited.
        #    This is table C.8, which was already checked
        # 2) If a string contains any RandALCat character, the string
        #    MUST NOT contain any LCat character.
        if any(stringprep.in_table_d2(c) for c in chars):
            raise UnicodeError("Violation of BIDI requirement 2")
        # 3) If a string contains any RandALCat character, a
        #    RandALCat character MUST be the first character of the
        #    string, and a RandALCat character MUST be the last
        #    character of the string.
        if not RandAL[0] or not RandAL[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")


def nodeprep(u):
    chars = list(u)
    i = 0
    while i < len(chars):
        c = chars[i]
        # map to nothing
        if stringprep.in_table_b1(c):
            del chars[i]
        else:
            # case fold
            chars[i] = stringprep.map_table_b2(c)
            i += 1
    # NFKC
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars))
    for c in chars:
        if (stringprep.in_table_c11(c) or
                stringprep.in_table_c12(c) or
                stringprep.in_table_c21(c) or
                stringprep.in_table_c22(c) or
                stringprep.in_table_c3(c) or
                stringprep.in_table_c4(c) or
                stringprep.in_table_c5(c) or
                stringprep.in_table_c6(c) or
                stringprep.in_table_c7(c) or
                stringprep.in_table_c8(c) or
                stringprep.in_table_c9(c) or
                c in "\"&'/:<>@"):
            raise UnicodeError("Invalid node character")
    bidi(chars)
    return chars


def resourceprep(res):
    chars = list(res)
    i = 0
    while i < len(chars):
        c = chars[i]
        # map to nothing
        if stringprep.in_table_b1(c):
            del chars[i]
        else:
            i += 1
    # NFKC
    chars = stringprep.unicodedata.normalize("NFKC", "".join(chars))
    for c in chars:
        if (stringprep.in_table_c12(c) or
                stringprep.in_table_c21(c) or
                stringprep.in_table_c22(c) or
                stringprep.in_table_c3(c) or
                stringprep.in_table_c4(c) or
                stringprep.in_table_c5(c) or
                stringprep.in_table_c6(c) or
                stringprep.in_table_c7(c) or
                stringprep.in_table_c8(c) or
                stringprep.in_table_c9(c)):
            raise UnicodeError("Invalid resource character")
    bidi(chars)
    return chars


def parse_jid(jid):
    # first pass: split into optional node, domain, optional resource
    m = re.match("^(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?$", jid)
    if not m:
        return False
    (node, domain, resource) = m.groups()
    try:
        # ipv4 address?
        socket.inet_pton(socket.AF_INET, domain)
    except (socket.error, ValueError):
        # ipv6 address?
        try:
            socket.inet_pton(socket.AF_INET6, domain)
        except (socket.error, ValueError):
            # domain name
            dom = []
            for label in domain.split("."):
                try:
                    label = encodings.idna.nameprep(label)
                    encodings.idna.ToASCII(label)
                except UnicodeError:
                    return False
                # UseSTD3ASCIIRules is set, but Python's nameprep doesn't enforce it.
                # a) Verify the absence of non-LDH ASCII code points
                #    (the characters listed in BAD_DOMAIN_ASCII)
                for c in label:
                    if c in BAD_DOMAIN_ASCII:
                        return False
                # b) Verify the absence of leading and trailing hyphen-minus
                if label[0] == "-" or label[-1] == "-":
                    return False
                dom.append(label)
            domain = ".".join(dom)
    try:
        if node is not None:
            node = nodeprep(node)
        if resource is not None:
            resource = resourceprep(resource)
    except UnicodeError:
        return False
    return node, domain, resource


if __name__ == "__main__":
    results = parse_jid(sys.argv[1])
    if not results:
        print("FAIL")
    else:
        print(results)
Yes, this is a lot of work. There are good reasons for all of it, but we're hoping to simplify it somewhat in the future if the PRECIS working group bears fruit.
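To get a feel for why the regex on its own isn't enough, here is a small sketch that separates the two concerns. It reuses the first-pass pattern from parse_jid, and then applies only the "map to nothing" and case-folding step from nodeprep. The names JID_RE, split_jid and casefold_node are made up for this illustration; they are not part of any library, and neither helper is a complete validator.

```python
import re
import stringprep

# Same first-pass pattern as in parse_jid: [node@]domain[/resource]
JID_RE = re.compile(
    "^(?:([^\"&'/:<>@]{1,1023})@)?([^/@]{1,1023})(?:/(.{1,1023}))?$"
)


def split_jid(jid):
    """Hypothetical helper: return (node, domain, resource) or None.

    Only the overall shape is checked. No stringprep, IDNA or
    IP-address validation happens here.
    """
    m = JID_RE.match(jid)
    return m.groups() if m else None


def casefold_node(node):
    """Hypothetical helper: only the B.1/B.2 mapping step of nodeprep.

    Characters in table B.1 are dropped, everything else is case
    folded with table B.2. Normalization and the prohibited-character
    checks are deliberately left out.
    """
    return "".join(
        stringprep.map_table_b2(c)
        for c in node
        if not stringprep.in_table_b1(c)
    )


print(split_jid("juliet@example.com/balcony"))  # ('juliet', 'example.com', 'balcony')
print(split_jid("example.com"))                 # (None, 'example.com', None)
print(split_jid("@example.com"))                # None
print(casefold_node("Juliet"))                  # juliet
```

The regex accepts "Juliet@example.com" and "juliet@example.com" as two different strings, and it is the stringprep step that makes them compare equal. That is the kind of thing the extra work buys you.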