[Tutor] regex question

[Tutor] regex question

Khalid Al-Ghamdi
hi all,

I'm trying to extract the domain in the following string. Why doesn't my pattern (patt) work:

>>> redata
'Tue Jan 14 00:43:21 2020::[hidden email]::1578951801-6-10 Sat Jul 31 15:17:39 1993::[hidden email]::744121059-5-6 Mon Sep 21 20:22:37 1987::[hidden email]::559243357-6-7 Fri Aug  2 07:15:23 1991::[hidden email]::681106523-4-9 Mon Mar 18 19:59:47 2024::[hidden email]::1710781187-6-7 '
>>> patt=r'\w+\.\w{3}(?<=@)'
>>> re.findall(patt,redata)
[]

This pattern works, but the first should too, shouldn't it?

>>> patt=r'\w+\.\w{3}'

_______________________________________________
Tutor maillist  -  [hidden email]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] regex question

Peter Otten
Khalid Al-Ghamdi wrote:

> I'm trying to extract the domain in the following string. Why doesn't my
> pattern (patt) work:
>
> >>> redata
> 'Tue Jan 14 00:43:21 2020::[hidden email]::1578951801-6-10 Sat Jul
> 31 15:17:39 1993::[hidden email]::744121059-5-6 Mon Sep 21 20:22:37
> 1987::[hidden email]::559243357-6-7 Fri Aug  2 07:15:23
> 1991::[hidden email]::681106523-4-9 Mon Mar 18 19:59:47
> 2024::[hidden email]::1710781187-6-7 '
> >>> patt=r'\w+\.\w{3}(?<=@)'
> >>> re.findall(patt,redata)
> []
>
> This pattern works but the first should, too. shouldn't it?

No. I think you want r'(?<=@)\w+\.\w{3}'.
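
To see why the position of the lookbehind matters, here is a minimal sketch. The addresses in sample are made up (the real ones are hidden in the archive), and the names at_end and at_start are only illustrative. A lookbehind is tested at the point where it appears in the pattern, so when it comes last it inspects the final character of the domain, which is a word character and can never be '@'.

```python
import re

# Hypothetical data in the same shape as redata; the real addresses are hidden.
sample = ('Tue Jan 14 00:43:21 2020::alice@example.com::1578951801-6-10 '
          'Sat Jul 31 15:17:39 1993::bob@mailhost.org::744121059-5-6 ')

# Lookbehind last: it is checked after the domain has been consumed, so the
# character just before that position is the last \w of the match, not '@'.
at_end = re.findall(r'\w+\.\w{3}(?<=@)', sample)

# Lookbehind first: it is checked where the match starts, right after '@'.
at_start = re.findall(r'(?<=@)\w+\.\w{3}', sample)

print(at_end)    # []
print(at_start)  # ['example.com', 'mailhost.org']
```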

How do you handle a domain like web.de, by the way?
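
The web.de question points at the \w{3} part, which demands exactly three word characters after the dot. A rough sketch of the effect follows, again with invented addresses. The looser character class is only one possible alternative; it assumes the domain consists of letters, digits, underscores, dots and hyphens and is followed by '::'.

```python
import re

# Hypothetical data in the same shape as redata; the real addresses are hidden.
sample = ('Sat Jul 31 15:17:39 1993::carol@web.de::744121059-5-6 '
          'Mon Sep 21 20:22:37 1987::dave@mail.example.co.uk::559243357-6-7 ')

# Exactly three word characters after the first dot: web.de is skipped
# and the longer domain is cut short.
strict = re.findall(r'(?<=@)\w+\.\w{3}', sample)

# Assumption: take every word character, dot or hyphen after '@';
# the match stops at the '::' separator.
loose = re.findall(r'(?<=@)[\w.-]+', sample)

print(strict)  # ['mail.exa']
print(loose)   # ['web.de', 'mail.example.co.uk']
```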


_______________________________________________
Tutor maillist  -  [hidden email]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Reply | Threaded
Open this post in threaded view
|

Re: [Tutor] regex question

Wayne Werner-2
On Fri, 6 Apr 2012, Khalid Al-Ghamdi wrote:

> hi all,
> I'm trying to extract the domain in the following string. Why doesn't my pattern (patt) work:
>
> >>> redata
> 'Tue Jan 14 00:43:21 2020::[hidden email]::1578951801-6-10 Sat Jul 31 15:17:39 1993::[hidden email]::744121059-5-6 Mon Sep 21 20:22:37
> 1987::[hidden email]::559243357-6-7 Fri Aug  2 07:15:23 1991::[hidden email]::681106523-4-9 Mon Mar 18 19:59:47 2024::[hidden email]::1710781187-6-7 '
> >>> patt=r'\w+\.\w{3}(?<=@)'
> >>> re.findall(patt,redata)
> []
>
> This pattern works but the first should, too. shouldn't it?

The all too familiar quote looks like it applies here: "Often programmers,
when faced with a problem, think 'Aha! I'll use a regex!'. Now you have
two problems."

It looks like you could easily split this string with redata.split('::')
and then look at every second element in the list and split *that* element
on the last '.' in the string.

With data as well-formed as this, regex is probably overkill.
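
A sketch of that split-based approach might look like the following. The sample string and its addresses are invented stand-ins for redata, and the variable names are mine. Splitting on the last '.' separates the top-level domain, as described above; to get the whole domain I am assuming you would take everything after the '@' instead.

```python
# Hypothetical data in the same shape as redata; the real addresses are hidden.
sample = ('Tue Jan 14 00:43:21 2020::alice@example.com::1578951801-6-10 '
          'Sat Jul 31 15:17:39 1993::bob@web.de::744121059-5-6 ')

# Splitting on '::' alternates between non-address and address fields,
# so every second element (starting at index 1) is an address.
fields = sample.split('::')
addresses = fields[1::2]

# Split each address on its last '.' to separate the top-level domain.
tld_parts = [addr.rsplit('.', 1) for addr in addresses]

# Assumption: the domain is everything after the '@'.
domains = [addr.rpartition('@')[2] for addr in addresses]

print(addresses)  # ['alice@example.com', 'bob@web.de']
print(tld_parts)  # [['alice@example', 'com'], ['bob@web', 'de']]
print(domains)    # ['example.com', 'web.de']
```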

HTH,
Wayne


Re: [Tutor] regex question

kaifeng jin
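I think you can do this:
domains = []
for field in redata.split('::'):
    # only the address fields contain an '@'
    if '@' in field:
        # keep the part after the '@', i.e. the domain
        domains.append(field.split('@')[1])

The list domains then contains all the domains.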

On 9 Apr 2012 at 5:26 AM, Wayne Werner <[hidden email]> wrote:
On Fri, 6 Apr 2012, Khalid Al-Ghamdi wrote:

hi all,
I'm trying to extract the domain in the following string. Why doesn't my pattern (patt) work:

>>> redata
'Tue Jan 14 00:43:21 [hidden email]::1578951801-6-10 Sat Jul 31 15:17:39 [hidden email]::744121059-5-6 Mon Sep 21 20:22:37
[hidden email]::559243357-6-7 Fri Aug  2 07:15:23 [hidden email]::681106523-4-9 Mon Mar 18 19:59:47 [hidden email]::1710781187-6-7 '
>>> patt=r'\w+\.\w{3}(?<=@)'
>>> re.findall(patt,redata)
[]

This pattern works but the first should, too. shouldn't it?

The all too familiar quote looks like it applies here: "Often programmers, when faced with a problem, think 'Aha! I'll use a regex!'. Now you have two problems."

It looks like you could easily split this string with redata.split('::') and then look at every second element in the list and split *that* element on the last '.' in the string.

With data as well-formed as this, regex is probably overkill.

HTH,
Wayne

--
Twitter: @zybest
Sina Weibo: @爱子悦
Setting up WordPress on OpenShift: http://blog-mking.rhcloud.com/

