• 0

C# Is my regex wrong?


Question

Well first off, I know it's wrong. I'm trying to translate my regex knowledge from PHP to C#, and it's not quite working the way I want it to.

In PHP, if I wanted to grab everything in a string between a pair of tags, I'd write an expression along the lines of: "<mytagname>(.*)</mytagname>"

and it would literally grab everything. In C# I write the same thing:

Regex divSpecsReg = new Regex("<div id=\"myTag\">.*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection specMatches = divSpecsReg.Matches(pData);

When I loop through the matches, all it grabs is <div id="myTag"> and not the rest. The document spans multiple lines.


7 answers to this question


  • 0

Right, but when I do that it still stops after <div id="myTag">\r in the document I pass in, even though a whole set of lines follows that character and I have the Multiline flag set in the options. The next character would have been a \n. I want it to match everything between the <div></div> tags, spanning multiple lines, and to be greedy (which, to my knowledge, regex is by default unless you specify a lazy .*? quantifier).
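
The behaviour described here (the match ending on \r, just before the \n) can be reproduced with a short sketch. The sample string and the class and method names below are made up for illustration; only the pattern and the flags come from the question. In .NET, . accepts \r but never \n by default, and the Multiline flag does not change that, so a greedy .* still ends at the first line break:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotStopsAtNewline
{
    // Hypothetical sample standing in for the real page data (pData in the question).
    public const string Sample =
        "<div id=\"myTag\">\r\nfirst line\r\nsecond line\r\n</div>";

    // Returns the text of the first match of the question's pattern under the given options.
    public static string FirstMatch(RegexOptions options)
    {
        var divSpecsReg = new Regex("<div id=\"myTag\">.*", options);
        return divSpecsReg.Match(Sample).Value;
    }

    public static void Main()
    {
        string withoutFlag = FirstMatch(RegexOptions.IgnoreCase);
        string withMultiline = FirstMatch(RegexOptions.IgnoreCase | RegexOptions.Multiline);

        // '.' consumes the '\r' but refuses the '\n', so the match ends on '\r'.
        Console.WriteLine(withoutFlag == "<div id=\"myTag\">\r");
        // Multiline makes no difference to what '.' matches.
        Console.WriteLine(withMultiline == withoutFlag);
    }
}
```

Both lines print True, so greediness is not the issue: .* takes as much as it is allowed to, and it is simply not allowed to take \n.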

Link to comment
Share on other sites

  • 0

Change RegexOptions.Multiline to RegexOptions.Singleline.
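
A minimal sketch of that change, assuming the goal is the text between the opening tag and the first closing </div>. The capture group, the lazy .*?, the class and method names, and the sample input are illustrative additions, not part of the original code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineFix
{
    // Returns the text between <div id="myTag"> and the first </div> after it,
    // or an empty string when the tag is not found.
    public static string ExtractDivContents(string pData)
    {
        var divSpecsReg = new Regex(
            "<div id=\"myTag\">(.*?)</div>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        Match match = divSpecsReg.Match(pData);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical input standing in for the scraped page.
        string pData =
            "<html>\r\n<div id=\"myTag\">\r\nspec one\r\nspec two\r\n</div>\r\n<div>other</div>\r\n</html>";

        // Prints the two spec lines, including the surrounding line breaks.
        Console.WriteLine(ExtractDivContents(pData));
    }
}
```

The lazy quantifier is a deliberate choice here: with Singleline set, a greedy .* would run on to the last </div> in the document. Neither form copes with <div> elements nested inside the target, so this only suits markup whose shape you already know.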


  • 0

Thanks, but why did that work? I thought Multiline mode was supposed to support matching over multiple lines? But it works now :)

I'm not sure. I just happen to do a lot of screen scraping and knew exactly what your problem was because I've been there before. :)


  • 0

http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions%28v=VS.71%29.aspx

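This is why. Multiline changes ^ and $ so that they match at the beginning and end of every line, instead of only at the beginning and end of the entire string. It has no effect on the . character. Singleline changes . so that it matches every character, including \n; by default . matches everything except \n.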
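
The two flags can be compared side by side with a small sketch. The patterns, inputs, and names below are made up for illustration; the sample uses bare \n line breaks because in .NET $ matches before \n, not before \r:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineVersusSingleline
{
    // Counts matches of ^\w+$, i.e. spans made only of word characters
    // that sit between a ^ position and a $ position.
    public static int CountWholeLineWords(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"^\w+$", options).Count;
    }

    // Reports whether '.' is allowed to consume the character between 'a' and 'b'.
    public static bool DotCrosses(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, "a.b", options);
    }

    public static void Main()
    {
        string lines = "one\ntwo\nthree";

        // Multiline only affects the anchors ^ and $.
        Console.WriteLine(CountWholeLineWords(lines, RegexOptions.None));       // 0
        Console.WriteLine(CountWholeLineWords(lines, RegexOptions.Multiline));  // 3
        Console.WriteLine(CountWholeLineWords(lines, RegexOptions.Singleline)); // 0

        // Singleline only affects the dot.
        Console.WriteLine(DotCrosses("a\nb", RegexOptions.None));       // False
        Console.WriteLine(DotCrosses("a\nb", RegexOptions.Multiline));  // False
        Console.WriteLine(DotCrosses("a\nb", RegexOptions.Singleline)); // True
    }
}
```

The flags are independent, so they can be combined with | when a pattern needs both per-line anchors and a dot that crosses line breaks.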

