Using HtmlAgilityPack to GET and POST web forms - Part 2

In my previous post I showed how to post an HTML form using HtmlAgilityPack. My FormElementCollection class only worked with simple <input> tags such as text and hidden fields. In this post I extend the class to handle checkboxes, radio buttons, drop-downs and textareas.

To review, this is the FormElementCollection from the last post:

public class FormElementCollection : Dictionary<string, string>
{
    public FormElementCollection(HtmlDocument htmlDoc)
    {
        var inputs = htmlDoc.DocumentNode.Descendants("input");
        foreach (var element in inputs)
        {
            string name = element.GetAttributeValue("name", "undefined");
            string value = element.GetAttributeValue("value", "");
            if (!name.Equals("undefined")) Add(name, value);
        }
    }

    public string AssemblePostPayload()
    {
        // Not changing. Same as before
    }
}
The constructor parses the HtmlDocument to collect all <input> elements. We just have to extend it to pick up the other element types as well:
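// System.Linq is needed for the LINQ call in AddMenuElement further down
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class FormElementCollection : Dictionary<string, string>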
{
    public FormElementCollection(HtmlDocument htmlDoc)
    {
        var inputs = htmlDoc.DocumentNode.Descendants("input");
        foreach (var element in inputs)
        {
            AddInputElement(element); 
        }

        var menus = htmlDoc.DocumentNode.Descendants("select");
        foreach (var element in menus)
        {
            AddMenuElement(element); 
        }

        var textareas = htmlDoc.DocumentNode.Descendants("textarea");
        foreach (var element in textareas)
        {
            AddTextareaElement(element);
        }
    }
}
The AddInputElement method parses the <input> nodes. It looks at the "type" attribute so that checkboxes and radio buttons get special handling.
private void AddInputElement(HtmlNode element)
{
    string name = element.GetAttributeValue("name", "");
    string value = element.GetAttributeValue("value", "");
    string type = element.GetAttributeValue("type", "");

    // Inputs without a name cannot be posted, so skip them
    if (string.IsNullOrEmpty(name)) return;

    switch (type.ToLowerInvariant())
    {
        case "checkbox":
        case "radio":
            // Register the name once, then keep the value of the checked element
            if (!ContainsKey(name)) Add(name, "");
            string isChecked = element.GetAttributeValue("checked", "unchecked");
            if (!isChecked.Equals("unchecked")) this[name] = value;
            break;
        default:
            // The indexer overwrites; Add() would throw if a name appears twice
            this[name] = value;
            break;
    }
}
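
To make the checkbox and radio branch concrete, here is a minimal standalone sketch of the same rule applied to a small piece of markup. The class name RadioGroupSketch and the sample HTML are hypothetical and not part of FormElementCollection; the sketch only assumes that the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical demo class, not part of FormElementCollection
public static class RadioGroupSketch
{
    // Applies the checkbox/radio rule from AddInputElement to a raw HTML string
    public static Dictionary<string, string> Collect(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new Dictionary<string, string>();
        foreach (var input in doc.DocumentNode.Descendants("input"))
        {
            string name = input.GetAttributeValue("name", "");
            string type = input.GetAttributeValue("type", "").ToLowerInvariant();
            if (string.IsNullOrEmpty(name)) continue;
            if (type != "radio" && type != "checkbox") continue;

            // Every member of a group registers the shared name once
            if (!result.ContainsKey(name)) result[name] = "";

            // Only a member carrying the "checked" attribute contributes its value
            if (input.Attributes.Contains("checked"))
                result[name] = input.GetAttributeValue("value", "");
        }
        return result;
    }

    public static void Main()
    {
        const string html =
            "<div>" +
            "<input type=\"radio\" name=\"gender\" value=\"male\">" +
            "<input type=\"radio\" name=\"gender\" value=\"female\" checked>" +
            "<input type=\"checkbox\" name=\"newsletter\" value=\"yes\">" +
            "</div>";

        foreach (var pair in Collect(html))
            Console.WriteLine(pair.Key + " = \"" + pair.Value + "\"");
    }
}
```

With this markup the two radio buttons collapse into a single "gender" key holding the checked value, while the unchecked "newsletter" checkbox is registered with an empty string so that it can still be set before posting.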
The AddMenuElement method parses the dropdown elements (the <select> nodes).
private void AddMenuElement(HtmlNode element)
{
    string name = element.GetAttributeValue("name", "");
    var options = element.Descendants("option");

    if (string.IsNullOrEmpty(name)) return;

    // choose the first option as default; First() would throw on an empty <select>
    var firstOp = options.FirstOrDefault();
    if (firstOp == null) return;

    this[name] = GetOptionValue(firstOp);

    // check if any option is selected
    foreach (var option in options)
    {
        string selected = option.GetAttributeValue("selected", "notSelected");
        if (!selected.Equals("notSelected")) this[name] = GetOptionValue(option);
    }
}

// Falls back to the text following the <option> tag when there is no value attribute.
// NextSibling can be null, so it is checked before reading InnerText.
private static string GetOptionValue(HtmlNode option)
{
    string text = option.NextSibling == null ? "" : option.NextSibling.InnerText;
    return option.GetAttributeValue("value", text);
}
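
A note on why the code above reads the option text from NextSibling: as far as I can tell, the HtmlAgilityPack builds this post was written against treat <option> as an empty element, so the label ends up in the text node that follows the tag rather than inside it. Other builds, or code that calls HtmlNode.ElementsFlags.Remove("option"), may keep the text as a child instead. The sketch below is one way to read the label under either behavior; OptionTextSketch and its Read method are hypothetical names, not part of the library or of FormElementCollection.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack or FormElementCollection
public static class OptionTextSketch
{
    // Returns the visible label of an <option>, wherever the parser put it
    public static string Read(HtmlNode option)
    {
        // Case 1: the parser kept the text inside the <option> node
        string inner = option.InnerText.Trim();
        if (inner.Length > 0) return inner;

        // Case 2: the parser treated <option> as empty, so the text follows it
        HtmlNode next = option.NextSibling;
        return next == null ? "" : next.InnerText.Trim();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<select name=\"state\">" +
            "<option selected>Washington</option>" +
            "<option>Oregon</option>" +
            "</select>");

        foreach (var option in doc.DocumentNode.Descendants("option"))
            Console.WriteLine(Read(option));
    }
}
```

Checking InnerText first and falling back to NextSibling keeps the helper independent of how the installed version parses <option>, at the cost of one extra null check.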
And finally, the AddTextareaElement method parses the <textarea> nodes.
private void AddTextareaElement(HtmlNode element)
{
    string name = element.GetAttributeValue("name", "");
    if (string.IsNullOrEmpty(name)) return;
    Add(name, element.InnerText);
}
Now we can use this to post data for different kinds of form elements:
BrowserSession b = new BrowserSession();
b.Get("http://my.site.com/myForm.aspx");
b.FormElements["NameTextBox"] = "my name";
b.FormElements["MaleFemaleRadioButton"] = "male";
b.FormElements["CommentTextArea"] = "blah blah ...";
b.FormElements["StateDropDown"] = "WA";
string response = b.Post("http://my.site.com/myForm.aspx");
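
The body of AssemblePostPayload is in the previous post and is not reproduced here. For orientation only, the sketch below shows one plausible way a name/value collection like this could be turned into a request body, assuming the form is posted as application/x-www-form-urlencoded. PayloadSketch and Encode are hypothetical names, and the real method may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper; the real AssemblePostPayload lives in the previous post
public static class PayloadSketch
{
    // Joins name=value pairs with '&', URL-encoding both the name and the value
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(
            f => WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    public static void Main()
    {
        var fields = new Dictionary<string, string>
        {
            { "NameTextBox", "my name" },
            { "StateDropDown", "WA" }
        };
        Console.WriteLine(Encode(fields));
    }
}
```

Encoding both sides matters because values such as "my name" contain characters (here a space) that cannot appear raw in a form body.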

5 comments:

Anonymous said...

Hey, thanks for sharing the BrowserSession and FormElementCollection code :)

I'm trying to log in to Facebook and am having some trouble. The first post to the login form seems to work fine: the login succeeds, and the saved cookie is similar to what I get when I log in from my browser.

Assuming I'm now logged in, using the same BrowserSession object for subsequent Gets doesn't work: I get nothing back, even though it (seemingly) correctly adds the saved cookie to the new request.

Would you have any clues on what I'm doing wrong? I've also tried to set the hidden input elements on their login page to no avail.

Anonymous said...

Here's what happens: I run this code

b.Get("http://www.facebook.com/login.php");
b.FormElements["email"] = "my@email.com";
b.FormElements["pass"] = "mypass";
b.FormElements["charset_test"] = @"€,´,€,´,水,Д,Є";
b.FormElements["lsd"] = "qDhIH";
b.FormElements["trynum"] = "1";
b.FormElements["session_key_only"] = "0";
b.FormElements["persistent_inputcheckbox"] = "1";
mainDoc = b.Post("https://login.facebook.com/login.php?login_attempt=1");

I also had to set the request.UserAgent to (for instance) "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.6.30 Version/10.61" in AddPostData so that facebook recognized me as a supported browser.

Both the first Get above and the subsequent Post work fine, and I get the correct documents back: first the login doc, then my Facebook homepage as a logged-in user. So far all is good.

Then I try to go to another page inside my homepage, for example my profile page. Using the same BrowserSession object as before (b), I call b.Get("http://www.facebook.com/profile.php");

Unfortunately I now get an empty document back. What am I doing wrong?

Anonymous said...

Hi,
to get rid of the error I had to add:
using System.Linq;
and then it worked.
Thanks Abi.

Guy Fomi said...

Thank you Rohit for this nice article...

Please tell me: I would like to post data without getting the FormElements first, because I get the data in JSON format. I want to generate the FormElements when I post the data to the server to upload some images.

Rohit said...

Hi Guy, you can instantiate your own FormElements collection. E.g.

BrowserSession b = new BrowserSession();
b.Get("http://my.site.com/login.aspx");
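// assumes FormElementCollection also has a parameterless constructor (the version shown in the post only takes an HtmlDocument)
b.FormElements = new FormElementCollection();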
b.FormElements.Add("loginTextBox","username");
b.FormElements.Add("passwordTextBox","password");
string response = b.Post("http://my.site.com/login.aspx");

Hope that helps.

I am a programmer based in Seattle, WA. This is a space where I put notes from my programming experience, reading and training. To browse the list of all articles on this site please go here. You can contact me at rohit@rohit.cc.