Regular Expression

Palpandi
Hi All,

This is the case. To split "string2" from "string1_string2" I am using
re.split('_', "string1_string2", 1)[1].

It works fine for the string "string1_string2" and outputs "string2". But the problem is that if the string is "__string1_string2", the output is "_string1_string2". That is wrong.

How to fix this issue?

Regular Expression

Larry Martell
On Thu, Jun 4, 2015 at 9:36 AM, Palpandi <palpandi111 at gmail.com> wrote:
>
> Hi All,
>
> This is the case. To split "string2" from "string1_string2" I am using
> re.split('_', "string1_string2", 1)[1].
>
> It works fine for the string "string1_string2" and outputs "string2". But the problem is that if the string is "__string1_string2", the output is "_string1_string2". That is wrong.
>
> How to fix this issue?

"__string1_string2".split('_')[-1]

Regular Expression

Tim Chase-3
In reply to this post by Palpandi
On 2015-06-04 06:36, Palpandi wrote:
> This is the case. To split "string2" from "string1_string2" I am
> using re.split('_', "string1_string2", 1)[1].
>
> It works fine for the string "string1_string2" and outputs
> "string2". But the problem is that if the string is
> "__string1_string2", the output is "_string1_string2". That is
> wrong.

Why use regular expressions to split a string on a constant?

Try

  # Both inputs should yield the text after the last underscore.
  for text in [
      "string1_string2",
      "__string1_string2",
      ]:
    value = text.rsplit('_', 1)[-1]
    assert value == "string2"
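
To make the difference from the original call concrete, here is a minimal sketch, assuming the wanted piece is always whatever follows the last underscore. The names head and tail are only for illustration.

```python
s = "__string1_string2"

# rsplit works from the right, so maxsplit=1 cuts at the LAST underscore.
head, tail = s.rsplit("_", 1)
print(repr(head))  # '__string1'
print(repr(tail))  # 'string2'

# split with maxsplit=1 cuts at the FIRST underscore instead,
# which is what the original re.split call was doing.
print(s.split("_", 1))  # ['', '_string1_string2']
```

Because only one cut is made, everything to the left of the last underscore stays in one piece, leading underscores included.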

-tkc




Regular Expression

Steven D'Aprano-8
In reply to this post by Palpandi
On Thu, 4 Jun 2015 11:36 pm, Palpandi wrote:

> Hi All,
>
> This is the case. To split "string2" from "string1_string2" I am using
> re.split('_', "string1_string2", 1)[1].


There is absolutely no need to use the nuclear-powered bulldozer of regular
expressions to crack that tiny peanut. Strings have a perfectly useful
split method:

py> "string1_string2".split("_")
['string1', 'string2']


> It works fine for the string "string1_string2" and outputs "string2".
> But the problem is that if the string is "__string1_string2", the
> output is "_string1_string2". That is wrong.

No, the output is correct. You tell Python to split on the *first*
underscore only, which is exactly what Python does:

py> re.split('_', "__string1_string2", 1)
['', '_string1_string2']


> How to fix this issue?


Again, this is a small problem, and regular expressions are not needed. Just
strip the underscores off the left, then split:


py> s = "__string1_string2"
py> s.lstrip("_").split("_")
['string1', 'string2']
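
A short sketch of how this could be taken one step further to get just "string2", and of what happens with trailing underscores. The trailing-underscore input is an assumption for illustration, not something from the original question.

```python
s = "__string1_string2"

parts = s.lstrip("_").split("_")
print(parts)      # ['string1', 'string2']
print(parts[-1])  # string2

# lstrip only touches the left side, so trailing underscores still
# produce empty strings; strip("_") removes them on both sides.
t = "__string1_string2__"
print(t.lstrip("_").split("_"))  # ['string1', 'string2', '', '']
print(t.strip("_").split("_"))   # ['string1', 'string2']
```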





--
Steven


Regular Expression

Peter Otten
In reply to this post by Palpandi
Palpandi wrote:

> This is the case. To split "string2" from "string1_string2" I am using
> re.split('_', "string1_string2", 1)[1].
>
> It works fine for the string "string1_string2" and outputs "string2".
> But the problem is that if the string is "__string1_string2", the
> output is "_string1_string2". That is wrong.
>
> How to fix this issue?

Use str.rpartition():

>>> "one_two__three".rpartition("_")[-1]
'three'
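
For anyone who has not met rpartition before, a small sketch of what it returns. The second input, with no underscore at all, is an assumed edge case that the original question did not mention.

```python
# rpartition always returns a 3-tuple: (before, separator, after),
# cutting at the LAST occurrence of the separator.
with_sep = "one_two__three".rpartition("_")
print(with_sep)  # ('one_two_', '_', 'three')

# When the separator is absent, the whole string lands in the last slot,
# so [-1] still returns the input unchanged.
without_sep = "three".rpartition("_")
print(without_sep)  # ('', '', 'three')
```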



Regular Expression

Laura Creighton-2
In reply to this post by Palpandi
In a message of Thu, 04 Jun 2015 06:36:29 -0700, Palpandi writes:
>Hi All,
>
>This is the case. To split "string2" from "string1_string2" I am using
>re.split('_', "string1_string2", 1)

And you shouldn't be.  The third argument, 1, is maxsplit: it says to stop after one split.

>It works fine for the string "string1_string2" and outputs "string2". But the problem is that if the string is "__string1_string2", the output is "_string1_string2". That is wrong.
>
>How to fix this issue?

Depends on what you want.

Approach #1 - just use the string method, forget re, because you do not
need it.

>>> "__string1_string2".split("_")
['', '', 'string1', 'string2']
>>> "_string1_string2__".split("_")
['', 'string1', 'string2', '', '']

Approach #2 -- use re but with a fixed string (probably a bad idea,
you should be using approach 1 instead if you have a fixed string)

>>> re.split('_', "__string1_string2")
['', '', 'string1', 'string2']
>>> re.split('_', "__string1_string2__")
['', '', 'string1', 'string2', '', '']

Approach #3 - you have a real pattern you want to use, and the example
you posted to the list is a lot simpler than what you really want to do.
OK, in this case we will match 'one or more underscores' as an
example.

>>> p = re.compile('_+')
>>> p.split("__string1_string2")
['', 'string1', 'string2']
>>> p.split("__string1__string2__")
['', 'string1', 'string2', '']
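
If the empty strings at the ends are not wanted, one possible follow-up is sketched below, assuming the separator is a run of one or more underscores (written '_+' here). On Python 3.7 and later, re.split also splits on empty matches, so a pattern that can match the empty string, such as '_*', would split between every character; '_+' avoids that.

```python
import re

s = "__string1__string2__"

# Split on runs of one or more underscores, then drop the empty strings
# produced by leading and trailing separators.
fields = [f for f in re.split("_+", s) if f]
print(fields)  # ['string1', 'string2']

# The same idea from the other direction: collect runs of non-underscores.
found = re.findall("[^_]+", s)
print(found)  # ['string1', 'string2']
```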

Laura