Namespaces:

  • Regex, Match, MatchCollection = using System.Text.RegularExpressions;
  • WebClient, WebUtility = using System.Net;
  • List<T> = using System.Collections.Generic;

Here’s a very simple example of using Regex for web scraping (note: I left out validation, try/catch, and the like for brevity):


private string ExecuteURL(string url)
{
    // WebClient is IDisposable, so release it once the page is downloaded.
    using (WebClient w = new WebClient())
    {
        return w.DownloadString(url);
    }
}

private List<CurrencyItem> LoadCurrencies()
{
    List<CurrencyItem> list = new List<CurrencyItem>();
    string result = ExecuteURL("https://www.google.com/finance/converter");

    // Each match is one <option> element: group 1 = currency code, group 2 = display name.
    MatchCollection m1 = Regex.Matches(result, @"<option value=\""(.*?)\"">(.*?)</option>", RegexOptions.Singleline);

    foreach (Match m in m1)
    {
        // HtmlDecode turns entities such as &#8364; into the characters they stand for.
        list.Add(new CurrencyItem(m.Groups[1].Value, WebUtility.HtmlDecode(m.Groups[2].Value)));
    }

    return list;
}

public class CurrencyItem
{
    public string Name { get; set; }
    public string Code { get; set; }

    public CurrencyItem(string _code, string _name)
    {
        Name = _name;
        Code = _code;
    }
}
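Because the live page can change or stop responding, it can help to test the parsing step on its own. Below is a minimal sketch, assuming a hypothetical `CurrencyParser.Parse` helper (not part of the original code) that applies the same regex to a hard-coded HTML string instead of a downloaded one. `CurrencyItem` is redeclared so the snippet compiles by itself.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public class CurrencyItem
{
    public string Name { get; set; }
    public string Code { get; set; }

    public CurrencyItem(string code, string name)
    {
        Name = name;
        Code = code;
    }
}

// Hypothetical helper: the parsing half of LoadCurrencies without the download,
// so it can be exercised against a fixed HTML string.
public static class CurrencyParser
{
    public static List<CurrencyItem> Parse(string html)
    {
        var list = new List<CurrencyItem>();
        MatchCollection matches = Regex.Matches(
            html, @"<option value=\""(.*?)\"">(.*?)</option>", RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            // Group 1 = the value attribute; group 2 = the display text, with HTML entities decoded.
            list.Add(new CurrencyItem(m.Groups[1].Value, WebUtility.HtmlDecode(m.Groups[2].Value)));
        }
        return list;
    }
}
```

For example, parsing `<option value="EUR">Euro (&#8364;)</option>` should yield an item with Code "EUR" and Name "Euro (€)", which is why the original code runs the display text through WebUtility.HtmlDecode.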


In the sample above, we call Google’s currency converter page because we want to list each country’s currency. The page returns HTML that contains a list of lines like this:

  • <option value="USD">United States ($)</option>

From each line, I want to extract:

  • “USD”
  • “United States ($)”

The regex I used is:

  • @"<option value=\""(.*?)\"">(.*?)</option>"
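The escaping in the regex above is easy to misread. In a C# verbatim string (`@"..."`), `""` stands for one double quote and a backslash is literal, so the regex engine receives `<option value=\"(.*?)\">(.*?)</option>`, where `\"` simply matches a `"`. The sketch below, with a made-up `PatternDemo` class, checks that the verbatim and regular string literals produce the same pattern, and that a version without the backslash matches the same text.

```csharp
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // The pattern exactly as written in LoadCurrencies (C# verbatim string).
    public const string Verbatim = @"<option value=\""(.*?)\"">(.*?)</option>";

    // The same pattern as a regular string literal: \\ is one backslash, \" is one double quote.
    public const string Regular = "<option value=\\\"(.*?)\\\">(.*?)</option>";

    // Equivalent pattern without the regex-level backslash: " needs no escaping in a regex.
    public const string NoBackslash = @"<option value=""(.*?)"">(.*?)</option>";

    // Returns the first captured currency code, or null when the pattern does not match.
    public static string FirstCode(string pattern, string html)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Either form works; the version without the backslash is arguably easier to read.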

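The above regex means:

  • .*? == match any characters, but as few as possible (lazy), so each match stops at the first closing quote or the first </option> instead of running on to a later one
  • () == capture group: we want only what’s inside these parentheses
  • \"" == inside a C# verbatim string "" is one double quote, so the regex sees \", which matches a literal "
  • RegexOptions.Singleline == lets . match newline characters too
  • The rest is literal text that pins the match to the exact line we want to extract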
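To see why the lazy `.*?` matters, the sketch below (a hypothetical `LazyVsGreedy` class) runs the lazy pattern and a greedy `.*` version against two options on the same line. The sample HTML is made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LazyVsGreedy
{
    // Two options on one line, as they might appear in downloaded HTML.
    public const string Html =
        "<option value=\"USD\">United States ($)</option><option value=\"EUR\">Euro</option>";

    // Lazy: each .*? stops at the nearest "> or </option>.
    public const string Lazy = @"<option value=""(.*?)"">(.*?)</option>";

    // Greedy: each .* runs to the last "> or </option> it can still reach.
    public const string Greedy = @"<option value=""(.*)"">(.*)</option>";

    public static int CountMatches(string pattern) => Regex.Matches(Html, pattern).Count;

    // Display text (group 2) of the first match.
    public static string FirstName(string pattern) => Regex.Match(Html, pattern).Groups[2].Value;
}
```

The lazy pattern finds two matches, the first named "United States ($)". The greedy pattern swallows both options into one match whose group 2 is just "Euro". With RegexOptions.Singleline, a greedy `.*` could even span the whole page, because `.` then crosses line breaks.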

For each Match in the collection, we have:

  • m.Groups[0].Value == the entire matched line
  • m.Groups[1].Value == the text inside the first pair of parentheses (USD)
  • m.Groups[2].Value == the text inside the second pair of parentheses (United States ($))
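The group numbering can be checked on the sample line. The sketch below uses an illustrative `GroupDemo` class and also shows .NET named groups (`(?<code>...)`) as an optional alternative to numeric indexes. The group names `code` and `name` are my own choice, not from the original code.

```csharp
using System.Text.RegularExpressions;

public static class GroupDemo
{
    public const string Line = "<option value=\"USD\">United States ($)</option>";

    // Same pattern as the article, written with "" instead of \"".
    public static string[] NumberedGroups()
    {
        Match m = Regex.Match(Line, @"<option value=""(.*?)"">(.*?)</option>");
        // Groups[0] is always the entire match; numbered capture groups start at 1.
        return new[] { m.Groups[0].Value, m.Groups[1].Value, m.Groups[2].Value };
    }

    // Alternative: named groups, so the code reads m.Groups["code"] instead of a bare index.
    public static (string Code, string Name) NamedGroups()
    {
        Match m = Regex.Match(Line, @"<option value=""(?<code>.*?)"">(?<name>.*?)</option>");
        return (m.Groups["code"].Value, m.Groups["name"].Value);
    }
}
```

Named groups keep the code readable if the pattern later gains extra parentheses that would shift the numbering.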

...and that’s it! You can now bind the returned list to a control such as a ComboBox or a DataGridView.