Add a textbox on top. Size around 450, 20
Add a button on the right of the textbox. Text: Navigate
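Add another textbox below the first one and make it large: set its Multiline property to True (a single-line textbox cannot be resized vertically), then size it to around 500, 220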
Add a webbrowser on the right of the second textbox, resize it - make it small. Size: 115, 55
Double click on the button, add this:
WebBrowser1.Navigate(TextBox1.Text)
TextBox2.Clear()
Double click on the webbrowser, event should be: WebBrowser1.DocumentCompleted. Add inside:
TextBox2.Text = WebBrowser1.Url.ToString & vbNewLine & WebBrowser1.DocumentTitle
- navigate to a website. This gives us the url and title of the current website.
Change the above code into this:
TextBox2.Text = WebBrowser1.Document.Url.ToString & vbNewLine & WebBrowser1.Document.Title & vbNewLine
TextBox2.Text = TextBox2.Text & WebBrowser1.Document.Domain
- that's another way to get the URL and site title, plus the domain name
Replace the above code with this:
TextBox2.Text = WebBrowser1.DocumentText
- after the page loads completely, you will get the source code of the website.
We could have used a StreamReader to do this instead. At the top of the code file, above "Public Class Form1", add this import:
Imports System.IO
Change the webbrowser code into this:
Dim bb As New StreamReader(WebBrowser1.DocumentStream)
TextBox2.Text = bb.ReadToEnd()
- we store the website's stream of data into a streamreader ("bb") and then we read the streamreader into our textbox.
The complete code so far:
Imports System.IO

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Load the address typed into the first textbox and empty the output box.
        WebBrowser1.Navigate(TextBox1.Text)
        TextBox2.Clear()
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' Read the page's stream of data and show it as text.
        Dim bb As New StreamReader(WebBrowser1.DocumentStream)
        TextBox2.Text = bb.ReadToEnd()
    End Sub
End Class
If you open a website in your regular browser, you should be able to choose "View Page Source" after right-clicking on the page. Inside the source code you will notice a lot of HTML elements, each identified by its "tag name", which the source viewer usually highlights in a distinct color (often pink), like: meta, link, img, div etc. Google: "Html element list"
Each element can have attributes that describe it, like: alt, class, default, size, name, src. An attribute is usually followed by an equals sign and quotation marks which hold the value of the attribute.
Let's get all the hyperlinks present on a chosen site:
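A minimal sketch of the DocumentCompleted handler for this step, modelled on the loops in the complete listing further down. It assumes the WebBrowser1 and TextBox2 controls created above and goes inside Public Class Form1; reading the "href" attribute of each <a> element is an assumption about which attribute is wanted here.

```vbnet
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    ' Walk every <a> element of the loaded page.
    For Each i As HtmlElement In WebBrowser1.Document.GetElementsByTagName("a")
        ' Append the link target, one per line ("href" is an assumed choice).
        TextBox2.Text = TextBox2.Text & i.GetAttribute("href") & vbNewLine
    Next
End Sub
```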
- instead of writing: WebBrowser1.Document.GetElementsByTagName("a") we could have written: WebBrowser1.Document.Links
Change the previous code into this:
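A minimal sketch of this step, assuming the same WebBrowser1 and TextBox2 controls and the same handler inside Public Class Form1; only the tag name and the attribute differ from a hyperlink loop.

```vbnet
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    ' Walk every <img> element of the loaded page.
    For Each i As HtmlElement In WebBrowser1.Document.GetElementsByTagName("img")
        ' "src" holds the address of the image file.
        TextBox2.Text = TextBox2.Text & i.GetAttribute("src") & vbNewLine
    Next
End Sub
```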
- this will give us all the image links from a site.
Instead of: WebBrowser1.Document.GetElementsByTagName("img") we could have used: WebBrowser1.Document.Images
You can change the tag name inside the "For Each" loop as you like and then search for the desired attribute by changing the name passed to the GetAttribute method.
Change the code again:
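A minimal sketch of this step, assuming the same controls and handler as before. It loops over WebBrowser1.Document.All and keeps only the elements whose "src" attribute is not empty, as in the complete listing further down.

```vbnet
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    ' Walk every element of the page, whatever its tag name.
    For Each i As HtmlElement In WebBrowser1.Document.All
        ' GetAttribute returns an empty string when the attribute is absent,
        ' so this check skips elements that have no "src".
        If i.GetAttribute("src").Length > 0 Then
            TextBox2.Text = TextBox2.Text & i.GetAttribute("src") & vbNewLine
        End If
    Next
End Sub
```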
- this might give you more results for the "src" attribute than you would have gotten by only searching the <img> tag name. That's because the web page can have this attribute describing other elements as well, like <script>.
WebBrowser1.Document.All - considers all the elements on the web page.
The "if" statement is needed in order to avoid all the empty lines in the results.
If you want to find more attributes during one single search just add another "if" statement with another attribute inside the "For Each" loop.
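As a sketch of two independent checks, assuming the same controls and handler as before; "alt" as the second attribute is only an example choice, borrowed from the complete listing further down.

```vbnet
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    For Each i As HtmlElement In WebBrowser1.Document.All
        ' First check: list the element's "src" if it has one.
        If i.GetAttribute("src").Length > 0 Then
            TextBox2.Text = TextBox2.Text & i.GetAttribute("src") & vbNewLine
        End If
        ' Second, independent check: list "alt" whether or not "src" was found.
        If i.GetAttribute("alt").Length > 0 Then
            TextBox2.Text = TextBox2.Text & i.GetAttribute("alt") & vbNewLine
        End If
    Next
End Sub
```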
If you wanted to find another attribute only after the first one was found, then you'd insert the second "if" statement inside the previous one:
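A sketch of the nested form, using the same "src" and "alt" attributes as the complete listing further down and assuming the same controls and handler as before.

```vbnet
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
    For Each i As HtmlElement In WebBrowser1.Document.All
        If i.GetAttribute("src").Length > 0 Then
            TextBox2.Text = TextBox2.Text & i.GetAttribute("src") & vbNewLine
            ' "alt" is only looked at for elements that also have a "src".
            If i.GetAttribute("alt").Length > 0 Then
                TextBox2.Text = TextBox2.Text & i.GetAttribute("alt") & vbNewLine
            End If
        End If
    Next
End Sub
```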
Some websites keep loading content after the main page has finished, for example pages built from several frames, or an image slideshow that reloads its own frame. Each of these loads can fire the "WebBrowser1.DocumentCompleted" event again, so the code in the subroutine might run several times and repeat its output.
Above the Button1 sub, in the declaration area, add:
Dim dd As Integer
Inside the Button1 sub add:
dd = 0
Put the whole code of the WebBrowser sub inside this "If" statement:
If dd = 0 Then
End If
Also add this below the "If" statement:
dd = 1
- additional page loads will not be processed by the sub code now, until the button is clicked again.
The complete code:
Public Class Form1
    ' 0 = the next DocumentCompleted event is processed, 1 = it has already been processed.
    Dim dd As Integer

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        WebBrowser1.Navigate(TextBox1.Text)
        TextBox2.Clear()
        dd = 0 ' allow the next page load to be processed
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        If dd = 0 Then
            For Each i As HtmlElement In WebBrowser1.Document.All
                ' Skip elements without a "src" attribute.
                If i.GetAttribute("src").Length > 0 Then
                    TextBox2.Text = TextBox2.Text & i.GetAttribute("src") & vbNewLine
                    ' Only look for "alt" once "src" has been found.
                    If i.GetAttribute("alt").Length > 0 Then
                        TextBox2.Text = TextBox2.Text & i.GetAttribute("alt") & vbNewLine
                    End If
                End If
            Next
        End If
        dd = 1 ' ignore further DocumentCompleted events until the button is clicked again
    End Sub
End Class
You can suppress script error notifications from your webbrowser if you ever encounter them.
Click on the WebBrowser control in your main form -> go to properties -> find: "ScriptErrorsSuppressed" -> change to: True
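The same property can also be set from code. A minimal sketch, assuming the form is still named Form1 and the control WebBrowser1; putting the line in the form's Load handler is just one possible place for it.

```vbnet
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    ' Same effect as setting ScriptErrorsSuppressed to True in the properties window.
    WebBrowser1.ScriptErrorsSuppressed = True
End Sub
```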
WebBrowser data:
MsgBox(WebBrowser1.Location.ToString) - location is relative to the client area of the main form.
MsgBox(WebBrowser1.Size.ToString) - size of the webbrowser in pixels
MsgBox(WebBrowser1.ClientRectangle.ToString) - webbrowser client area; its location is relative to the webbrowser itself, so it is always 0,0