NAME
    WWW::Mechanize::Firefox - use Firefox as if it were WWW::Mechanize

SYNOPSIS
      use WWW::Mechanize::Firefox;
      my $mech = WWW::Mechanize::Firefox->new();
      $mech->get('http://google.com');

      $mech->eval_in_page('alert("Hello Firefox")');
      my $png = $mech->content_as_png();

    This module will let you automate Firefox through the MozRepl plugin.
    You need to have installed that plugin in your Firefox.

    For more examples see WWW::Mechanize::Firefox::Examples.

CONSTRUCTOR and CONFIGURATION
  `$mech->new( %args )'
      use WWW::Mechanize::Firefox;
      my $mech = WWW::Mechanize::Firefox->new();

    Creates a new instance and connects it to Firefox.

    Note that Firefox must have the `mozrepl' extension installed and
    enabled.

    The following options are recognized:

    * `tab' - regex for the title of the tab to reuse. If no matching tab
      is found, the constructor dies.

      If you pass in the string `current', the currently active tab will
      be used instead.

      If you pass in a MozRepl::RemoteObject instance, this will be used
      as the new tab. This is convenient if you have an existing tab in
      Firefox as object already, for example created through
      Firefox::Application`->addTab()'.

    * `create' - will create a new tab if no existing tab matching the
      criteria given in `tab' can be found.

    * `activate' - make the tab the active tab

    * `launch' - name of the program to launch if we can't connect to it
      on the first try.

    * `frames' - an array reference of ids of subframes to include when
      searching for elements on a page.

      If you want to always search through all frames, just pass `1'.
      This is the default.

      To prevent searching through frames, pass

        frames => 0

      To whitelist frames to be searched, pass the list of frame
      selectors:

        frames => ['#content_frame']

    * `autodie' - whether web failures are converted into fatal Perl
      errors. See the `autodie' accessor. True by default to make error
      checking easier.

      To make errors non-fatal, pass

        autodie => 0

      in the constructor.

    * `agent' - the name of the User Agent to use.
      This overrides how Firefox identifies itself.

    * `log' - array reference to log levels, passed through to
      MozRepl::RemoteObject

    * `bufsize' - Net::Telnet buffer size, if the default of 1MB is not
      enough

    * `events' - the set of default Javascript events to listen for while
      waiting for a reply. In fact, WWW::Mechanize::Firefox will almost
      always wait until a 'DOMContentLoaded' or 'load' event. 'pagehide'
      events will tell it for what frames to wait.

      The default set is

        'DOMContentLoaded','load',
        'pageshow',
        'pagehide',
        'error','abort','stop',

    * `app' - a premade Firefox::Application

    * `repl' - a premade MozRepl::RemoteObject instance or a connection
      string suitable for initializing one

    * `use_queue' - whether to use the command queueing of
      MozRepl::RemoteObject. Default is 1.

    * `js_JSON' - whether to use native JSON encoder of Firefox

        js_JSON => 'native', # force using the native JSON encoder

      The default is to autodetect whether a native JSON encoder is
      available and whether the transport is UTF-8 safe.

    * `pre_events' - the events that are sent to an input field before
      its value is changed. By default this is `[focus]'.

    * `post_events' - the events that are sent to an input field after
      its value is changed. By default this is `[blur, change]'.

  `$mech->agent( $product_id );'
      $mech->agent('wonderbot/JS 1.0');

    Set the product token that is used to identify the user agent on the
    network. The agent value is sent as the "User-Agent" header in the
    requests. The default is whatever Firefox uses.

    To reset the user agent to the Firefox default, pass an empty string:

      $mech->agent('');

  `$mech->autodie( [$state] )'
      $mech->autodie(0);

    Accessor to get/set whether warnings become fatal.

  `$mech->events()'
      $mech->events( ['load'] );

    Sets or gets the set of Javascript events that WWW::Mechanize::Firefox
    will wait for after requesting a new page. Returns an array reference.

    Changing the set of events will most likely make
    WWW::Mechanize::Firefox stall while waiting for a response.
    This method is special to WWW::Mechanize::Firefox.

  `$mech->on_event()'
      $mech->on_event(1); # prints every page load event

      # or give it a callback
      $mech->on_event(sub {
          my ($ev) = @_;
          warn "Page loaded with $ev->{name} event";
      });

    Gets/sets the notification handler for the Javascript event that
    finished a page load. Set it to `1' to output via `warn', or a code
    reference to call it with the event.

    This method is special to WWW::Mechanize::Firefox.

  `$mech->cookies()'
      my $cookie_jar = $mech->cookies();

    Returns a HTTP::Cookies object that was initialized from the live
    Firefox instance.

    Note: `->set_cookie' is not yet implemented, and neither is saving
    the cookie jar.

JAVASCRIPT METHODS
  `$mech->allow( %options )'
    Enables or disables browser features for the current tab. The
    following options are recognized:

    * `plugins' - Whether to allow plugin execution.

    * `javascript' - Whether to allow Javascript execution.

    * `metaredirects' - Attribute stating if refresh based redirects can
      be allowed.

    * `frames', `subframes' - Attribute stating if it should allow
      subframes (framesets/iframes) or not.

    * `images' - Attribute stating whether or not images should be
      loaded.

    Options not listed remain unchanged.

    Disable Javascript:

      $mech->allow( javascript => 0 );

  `$mech->js_errors()'
      print $_->{message}
          for $mech->js_errors();

    An interface to the Javascript Error Console.

    Returns the list of errors in the JEC.

    Maybe this should be called `js_messages' or `js_console_messages'
    instead.

  `$mech->clear_js_errors()'
      $mech->clear_js_errors();

    Clears all Javascript messages from the console.

  `$mech->eval_in_page( $str [, $env [, $document]] )'
  `$mech->eval( $str [, $env [, $document]] )'
      my ($value, $type) = $mech->eval( '2+2' );

    Evaluates the given Javascript fragment in the context of the web
    page. Returns a pair of value and Javascript type.

    This allows access to variables and functions declared "globally" on
    the web page.
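    Because the call returns both a value and its Javascript type, a
    caller can inspect the type before touching the value. The following
    is a minimal sketch of that idea, not part of the module: the helper
    name `eval_simple' is hypothetical, `$mech' is assumed to be a
    connected WWW::Mechanize::Firefox object, and the type strings are
    assumed to be the names that Javascript's `typeof' reports.

```perl
use strict;
use warnings;

# Javascript type names treated as plain data in this sketch.
# Assumption: the returned type matches what Javascript's typeof reports.
my %SIMPLE_TYPE = map { $_ => 1 } qw(string number boolean);

# Hypothetical helper: evaluate $js in the page and hand back the value
# only when its reported type is one of the simple types above.
sub eval_simple {
    my ($mech, $js) = @_;

    # eval_in_page returns a (value, type) pair
    my ($value, $type) = $mech->eval_in_page($js);

    die "Refusing to use result of type '" . ( $type // 'undefined' ) . "'"
        unless defined $type && $SIMPLE_TYPE{$type};

    return $value;
}
```

    With a live object this would be called as
    `eval_simple( $mech, 'document.title' )'. Anything that comes back
    as an object or function makes the helper die instead of returning
    it.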
    The returned result needs to be treated with extreme care because it
    might lead to Javascript execution in the context of your application
    instead of the context of the webpage. This should be evident for
    functions and complex data structures like objects. When working with
    results from untrusted sources, you can only safely use simple types
    like `string'.

    If you want to modify the environment the code is run under, pass in
    a hash reference as the second parameter. All keys will be inserted
    into the `this' object as well as `this.window'. Also, complex data
    structures are only supported if they contain no objects. If you need
    finer control, you'll have to write the Javascript yourself.

    This method is special to WWW::Mechanize::Firefox.

    Also, using this method opens a potential security risk as the
    returned values can be objects and using these objects can execute
    malicious code in the context of the Firefox application.

  `$mech->unsafe_page_property_access( ELEMENT )'
    Allows you unsafe access to properties of the current page. Using
    such properties is an incredibly bad idea.

    This is why the function `die's. If you really want to use this
    function, edit the source code.

UI METHODS
    See also Firefox::Application for how to add more than one tab and
    how to manipulate windows and tabs.

  `$mech->application()'
      my $ff = $mech->application();

    Returns the Firefox::Application object for manipulating more parts
    of the Firefox UI and application.

  `$mech->autoclose_tab'
      $mech->autoclose_tab( 0 ); # keep tab open after program end

    Set whether to close the tab associated with the instance.

  `$mech->tab()'
    Gets the object that represents the Firefox tab used by
    WWW::Mechanize::Firefox.

    This method is special to WWW::Mechanize::Firefox.

  `$mech->make_progress_listener( %callbacks )'
      my $eventlistener = $mech->make_progress_listener(
          onStateChange => \&onStateChange,
      );

    Creates an unconnected `nsIWebProgressListener' interface which calls
    the Perl subroutines you pass in.

    Returns a handle.
    Once the handle gets released, all callbacks will get stopped. Also,
    all Perl callbacks will get deregistered from the Javascript bridge,
    so make sure not to use the same callback in different progress
    listeners at the same time. The sender may still call your callbacks.

  `$mech->progress_listener( $source, %callbacks )'
      my $eventlistener = $mech->progress_listener(
          $browser,
          onLocationChange => \&onLocationChange,
      );

    Sets up the callbacks for the `nsIWebProgressListener' interface to
    be the Perl subroutines you pass in.

    `$source' needs to support `.addProgressListener' and
    `.removeProgressListener'.

    Returns a handle. Once the handle gets released, all callbacks will
    get stopped. Also, all Perl callbacks will get deregistered from the
    Javascript bridge, so make sure not to use the same callback in
    different progress listeners at the same time.

  `$mech->repl()'
      my ($value,$type) = $mech->repl->expr('2+2');

    Gets the MozRepl::RemoteObject instance that is used.

    This method is special to WWW::Mechanize::Firefox.

  `$mech->highlight_node( @nodes )'
      my @links = $mech->selector('a');
      $mech->highlight_node(@links);

    Convenience method that marks all nodes in the arguments with

      background: red;
      border: solid black 1px;
      display: block; /* if the element was display: none before */

    This is convenient if you need visual verification that you've got
    the right nodes.

    There currently is no way to restore the nodes to their original
    visual state except reloading the page.

NAVIGATION METHODS
  `$mech->get( $url, %options )'
      $mech->get( $url, ':content_file' => $tempfile );

    Retrieves the URL `$url' into the tab.

    It returns a faked HTTP::Response object for interface compatibility
    with WWW::Mechanize.

    Recognized options:

    * `:content_file' - filename to store the data in

    * `no_cache' - if true, bypass the browser cache

    * `synchronize' - wait until all elements have loaded

      The default is to wait until all elements have loaded.
      You can switch this off by passing

        synchronize => 0

      for example if you want to manually poll for an element that
      appears fairly early during the load of a complex page.

  `$mech->get_local( $filename , %options )'
      $mech->get_local('test.html');

    Shorthand method to construct the appropriate `file://' URI and load
    it into Firefox. Relative paths will be interpreted as relative to
    `$0'.

    This method accepts the same options as `->get()'.

    This method is special to WWW::Mechanize::Firefox but could also
    exist in WWW::Mechanize through a plugin.

  `$mech->post( $url, %options )'
      $mech->post( 'http://example.com',
          params => { param => "Hello World" },
          headers => {
            "Content-Type" => 'application/x-www-form-urlencoded',
          },
          charset => 'utf-8',
      );

    Sends a POST request to `$url'.

    A `Content-Length' header will be automatically calculated if it is
    not given.

    The following options are recognized:

    * `headers' - a hash of HTTP headers to send. If not given, the
      content type will be generated automatically.

    * `data' - the raw data to send, if you've encoded it already.

  `$mech->add_header( $name => $value, ... )'
      $mech->add_header(
          'X-WWW-Mechanize-Firefox' => "I'm using it",
          Encoding => 'text/klingon',
      );

    This method sets up custom headers that will be sent with every
    HTTP(S) request that Firefox makes.

    Using multiple instances of WWW::Mechanize::Firefox objects with the
    same application together with changed request headers will most
    likely have weird effects. So don't do that.

    Note that currently, we only support one value per header.

    Some versions of Firefox don't work with the method that is used to
    set the custom headers. Please see `t/60-mech-custom-headers.t' for
    the exact versions where the implemented mechanism doesn't work.
    Roughly, this is for versions 17 to 24 of Firefox.

  `$mech->delete_header( $name , $name2... )'
      $mech->delete_header( 'User-Agent' );

    Removes HTTP headers from the agent's list of special headers.
    Note that Firefox may still send a header with its default value.

  `$mech->reset_headers'
      $mech->reset_headers();

    Removes all custom headers and makes Firefox send its defaults again.

  `$mech->synchronize( $event, $callback )'
    Wraps a synchronization semaphore around the callback and waits until
    the event `$event' fires on the browser. If you want to wait for one
    of multiple events to occur, pass an array reference as the first
    parameter.

    Usually, you want to use it like this:

      my $l = $mech->xpath('//a[@onclick]', single => 1);
      $mech->synchronize('DOMFrameContentLoaded', sub {
          $mech->click( $l );
      });

    It is necessary to synchronize with the browser whenever a click
    performs an action that takes longer and fires an event on the
    browser object.

    The `DOMFrameContentLoaded' event is fired by Firefox when the whole
    DOM and all `iframe's have been loaded. If your document doesn't have
    frames, use the `DOMContentLoaded' event instead.

    If you leave out `$event', the value of `->events()' will be used
    instead.

  `$mech->res()' / `$mech->response(%options)'
      my $response = $mech->response(headers => 0);

    Returns the current response as a HTTP::Response object.

    The `headers' option tells the module whether to fetch the headers
    from Firefox or not. This is mainly an internal optimization hack.

  `$mech->success()'
      $mech->get('http://google.com');
      print "Yay"
          if $mech->success();

    Returns a boolean telling whether the last request was successful. If
    there hasn't been an operation yet, returns false.

    This is a convenience function that wraps `$mech->res->is_success'.

  `$mech->status()'
      $mech->get('http://google.com');
      print $mech->status();
      # 200

    Returns the HTTP status code of the response. This is a 3-digit
    number like 200 for OK, 404 for not found, and so on.

  `$mech->reload( [$bypass_cache] )'
      $mech->reload();

    Reloads the current page. If `$bypass_cache' is a true value, the
    browser is not allowed to use a cached page.
    This is the difference between pressing `F5' (cached) and `shift-F5'
    (uncached).

    Returns the (new) response.

  `$mech->back( [$synchronize] )'
      $mech->back();

    Goes one page back in the page history.

    Returns the (new) response.

  `$mech->forward( [$synchronize] )'
      $mech->forward();

    Goes one page forward in the page history.

    Returns the (new) response.

  `$mech->uri()'
      print "We are at " . $mech->uri;

    Returns the current document URI.

CONTENT METHODS
  `$mech->document()'
    Returns the DOM document object.

    This is WWW::Mechanize::Firefox specific.

  `$mech->docshell()'
      my $ds = $mech->docshell;

    Returns the `docShell' Javascript object associated with the tab.

    This is WWW::Mechanize::Firefox specific.

  `$mech->content( %options )'
      print $mech->content;
      print $mech->content( format => 'html' ); # default
      print $mech->content( format => 'text' ); # identical to ->text

    This always returns the content as a Unicode string. It tries to
    decode the raw content according to its input encoding.

    This currently only works for HTML pages, not for images etc.

    Recognized options:

    * `document' - the document to use.

      Default is `$self->document'.

    * `format' - the format of the content to return

      The allowed values are `html' and `text'. The default is `html'.

  `$mech->text()'
    Returns the text of the current HTML content. If the content isn't
    HTML, $mech will die.

  `$mech->content_encoding()'
      print "The content is encoded as ", $mech->content_encoding;

    Returns the encoding that the content is in. This can be used to
    convert the content from UTF-8 back to its native encoding.

  `$mech->update_html( $html )'
      $mech->update_html($html);

    Writes `$html' into the current document. This is mostly implemented
    as a convenience method for HTML::Display::MozRepl.

  `$mech->save_content( $localname [, $resource_directory] [, %options ] )'
      $mech->get('http://google.com');
      $mech->save_content('google search page','google search page files');

    Saves the URL of the current page to the given filename.
    The URL will be fetched from the cache if possible, avoiding
    unnecessary network traffic.

    If `$resource_directory' is given, the whole page will be saved. All
    CSS, subframes and images will be saved into that directory, while
    the page HTML itself will still be saved in the file pointed to by
    `$localname'.

    Returns a `nsIWebBrowserPersist' object through which you can cancel
    the download by calling its `->cancelSave' method. Also, you can poll
    the download status through the `->{currentState}' property.

    If you need to set persist flags pass the unsigned long value in the
    `persist' option.

      $mech->get('http://zombisoft.com');
      $mech->save_content('Zombisoft','zombisoft-resource-files',
                          "persist" => 512 | 2048);

    A list of flags and their values can be found at
    https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIWebBrowserPersist.

    If you are interested in the intermediate download progress, create a
    ProgressListener through `$mech->progress_listener' and pass it in
    the `progress' option.

    The download will continue in the background. It will not show up in
    the Download Manager.

  `$mech->save_url( $url, $localname, [%options] )'
      $mech->save_url('http://google.com','google_index.html');

    Saves the given URL to the given filename. The URL will be fetched
    from the cache if possible, avoiding unnecessary network traffic.

    If you are interested in the intermediate download progress, create a
    ProgressListener through `$mech->progress_listener' and pass it in
    the `progress' option. The download will continue in the background.
    It will also not show up in the Download Manager.

    If the `progress' option is not passed in, `->save_url' will only
    return after the download has finished.

    Returns a `nsIWebBrowserPersist' object through which you can cancel
    the download by calling its `->cancelSave' method. Also, you can poll
    the download status through the `->{currentState}' property.

  `$mech->base()'
      print $mech->base;

    Returns the URL base for the current page.
    The base is either specified through a `base' tag or is the current
    URL.

    This method is specific to WWW::Mechanize::Firefox.

  `$mech->content_type()'
  `$mech->ct()'
      print $mech->content_type;

    Returns the content type of the currently loaded document.

  `$mech->is_html()'
      print $mech->is_html();

    Returns true/false on whether our content is HTML, according to the
    HTTP headers.

  `$mech->title()'
      print "We are on page " . $mech->title;

    Returns the current document title.

EXTRACTION METHODS
  `$mech->links()'
      print $_->text . " -> " . $_->url . "\n"
          for $mech->links;

    Returns all links in the document as WWW::Mechanize::Link objects.

    Currently accepts no parameters. See `->xpath' or `->selector' when
    you want more control.

  `$mech->find_link_dom( %options )'
      print $_->{innerHTML} . "\n"
          for $mech->find_link_dom( text_contains => 'CPAN' );

    A method to find links, like WWW::Mechanize's `->find_link' method.
    This method returns DOM objects from Firefox instead of
    WWW::Mechanize::Link objects.

    Note that Firefox might have reordered the links or frame links in
    the document so the absolute numbers passed via `n' might not be the
    same between WWW::Mechanize and WWW::Mechanize::Firefox.

    Returns the DOM object as MozRepl::RemoteObject::Instance.

    The supported options are:

    * `text' and `text_contains' and `text_regex'

      Match the text of the link as a complete string, substring or
      regular expression.

      Matching as a complete string or substring is a bit faster, as it
      is done in the XPath engine of Firefox.

    * `id' and `id_contains' and `id_regex'

      Matches the `id' attribute of the link completely or as part

    * `name' and `name_contains' and `name_regex'

      Matches the `name' attribute of the link

    * `url' and `url_regex'

      Matches the URL attribute of the link (`href', `src' or `content').

    * `class' - the `class' attribute of the link

    * `n' - the (1-based) index. Defaults to returning the first link.

    * `single' - If true, ensure that only one element is found.
      Otherwise croak or carp, depending on the `autodie' parameter.

    * `one' - If true, ensure that at least one element is found.
      Otherwise croak or carp, depending on the `autodie' parameter.

    The method `croak's if no link is found. If the `single' option is
    true, it also `croak's when more than one link is found.

  `$mech->find_link( %options )'
      print $_->text . "\n"
          for $mech->find_link( text_contains => 'CPAN' );

    A method quite similar to WWW::Mechanize's method. The options are
    documented in `->find_link_dom'.

    Returns a WWW::Mechanize::Link object.

    By default, this does not look through child frames.

  `$mech->find_all_links( %options )'
      print $_->text . "\n"
          for $mech->find_all_links( text_regex => qr/google/i );

    Finds all links in the document. The options are documented in
    `->find_link_dom'.

    Returns them as list or an array reference, depending on context.

    By default, this does not look through child frames.

  `$mech->find_all_links_dom( %options )'
      print $_->{innerHTML} . "\n"
          for $mech->find_all_links_dom( text_regex => qr/google/i );

    Finds all matching linky DOM nodes in the document. The options are
    documented in `->find_link_dom'.

    Returns them as list or an array reference, depending on context.

    By default, this does not look through child frames.

  `$mech->follow_link( $link )'
  `$mech->follow_link( %options )'
      $mech->follow_link( xpath => '//a[text() = "Click here!"]' );

    Follows the given link. Takes the same parameters that
    `find_link_dom' uses. In addition, `synchronize' can be passed to
    (not) force waiting for a new page to be loaded.

    Note that `->follow_link' will only try to follow link-like things
    like `A' tags.
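    As a rough sketch of how the link methods above can be combined, the
    hypothetical helper `follow_first_matching_link' below (not part of
    the module) searches by link text with `->find_all_links' and, if
    anything matches, follows the first hit with `->follow_link'. It
    assumes `$mech' is a connected WWW::Mechanize::Firefox object and
    uses only the `text_contains' and `n' options documented for
    `->find_link_dom'.

```perl
use strict;
use warnings;

# Hypothetical helper: follow the first link whose text contains $needle.
# $mech is assumed to be a WWW::Mechanize::Firefox object, or anything
# else that offers the same find_all_links / follow_link methods.
sub follow_first_matching_link {
    my ($mech, $needle) = @_;

    # In list context this returns every matching link
    my @links = $mech->find_all_links( text_contains => $needle );
    return 0 unless @links;    # nothing matched, so nothing is followed

    # Same search options as find_link_dom; n is the 1-based index
    $mech->follow_link( text_contains => $needle, n => 1 );

    return scalar @links;      # number of links that matched
}
```

    Checking for matches first avoids the `croak' that the find methods
    raise when no link is found and `autodie' is on. With a live object
    the call would look like `follow_first_matching_link( $mech, 'CPAN' )'.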
  `$mech->xpath( $query, %options )'
      my $link = $mech->xpath('//a[@id="clickme"]', one => 1);
      # croaks if no link is found

      my @para = $mech->xpath('//p');
      # Collects all paragraphs

      my @para_text = $mech->xpath('//p/text()', type => $mech->xpathResult('STRING_TYPE'));
      # Collects all paragraphs as text

    Runs an XPath query in Firefox against the current document.

    If you need more information about the returned results, use the
    `->xpathEx()' function.

    The options allow the following keys:

    * `document' - document in which the query is to be executed. Use
      this to search a node within a specific subframe of
      `$mech->document'.

    * `frames' - if true, search all documents in all frames and iframes.
      This may or may not conflict with `node'. This will default to the
      `frames' setting of the WWW::Mechanize::Firefox object.

    * `node' - node relative to which the query is to be executed. Note
      that you will have to use a relative XPath expression as well. Use

        .//foo

      instead of

        //foo

    * `single' - If true, ensure that only one element is found.
      Otherwise croak or carp, depending on the `autodie' parameter.

    * `one' - If true, ensure that at least one element is found.
      Otherwise croak or carp, depending on the `autodie' parameter.

    * `maybe' - If true, ensure that at most one element is found.
      Otherwise croak or carp, depending on the `autodie' parameter.

    * `all' - If true, return all elements found. This is the default.
      You can use this option if you want to use `->xpath' in scalar
      context to count the number of matched elements, as it will
      otherwise emit a warning for each usage in scalar context without
      any of the above restricting options.

    * `any' - no error is raised, no matter if an item is found or not.

    * `type' - force the return type of the query.

        type => $mech->xpathResult('ORDERED_NODE_SNAPSHOT_TYPE'),

      WWW::Mechanize::Firefox tries a best effort in giving you the
      appropriate result of your query, be it a DOM node or a string or a
      number.
      In the case you need to restrict the return type, you can pass
      this in.

      The allowed strings are documented in the MDN. Interesting types
      are

        ANY_TYPE     (default, uses whatever things the query returns)
        STRING_TYPE
        NUMBER_TYPE
        ORDERED_NODE_SNAPSHOT_TYPE

    Returns the matched results.

    You can pass in a list of queries as an array reference for the first
    parameter. The result will then be the list of all elements matching
    any of the queries.

    This is a method that is not implemented in WWW::Mechanize.

    In the long run, this should go into a general plugin for
    WWW::Mechanize.

  `$mech->xpathEx( $query, %options )'
      my @links = $mech->xpathEx('//a[@id="clickme"]');

    Runs an XPath query in Firefox against a document. Returns a list of
    found elements. Each element in the result has the following
    properties:

    * `resultType' - the type of the result. The numerical value of
      `$mech->xpathResult()'.

    * `resultSize' - the number of elements in this result. This is 1 for
      atomic results like strings or numbers, and the number of elements
      for nodesets.

    * `result' - the best result available. This is the nodeset or the
      text or number, depending on the query.

  `$mech->selector( $css_selector, %options )'
      my @text = $mech->selector('p.content');

    Returns all nodes matching the given CSS selector. If `$css_selector'
    is an array reference, it returns all nodes matched by any of the CSS
    selectors in the array.

    This takes the same options that `->xpath' does.

    In the long run, this should go into a general plugin for
    WWW::Mechanize.

  `$mech->by_id( $id, %options )'
      my @text = $mech->by_id('_foo:bar');

    Returns all nodes matching the given ids. If `$id' is an array
    reference, it returns all nodes matched by any of the ids in the
    array.

    This method is equivalent to calling `->xpath' :

      $self->xpath(qq{//*[\@id="$_"]}, %options)

    It is convenient when your element ids get mistaken for CSS
    selectors.
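    To make the equivalence above concrete, here is a small pure-Perl
    sketch of the string that ends up as the XPath query. The helpers
    `xpath_for_id' and `xpaths_for_ids' are hypothetical names used for
    illustration only and are not taken from the module; unlike a real
    implementation, this sketch simply refuses ids that contain a double
    quote instead of escaping them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize::Firefox): build the
# XPath expression given above as the equivalent of ->by_id.
sub xpath_for_id {
    my ($id) = @_;

    # This sketch does not escape double quotes, so reject such ids.
    die "id must not contain a double quote: $id" if $id =~ /"/;

    return qq{//*[\@id="$id"]};
}

# Accepts a single id or an array reference of ids, mirroring ->by_id.
sub xpaths_for_ids {
    my ($ids) = @_;
    $ids = [$ids] unless ref $ids eq 'ARRAY';
    return map { xpath_for_id($_) } @$ids;
}

# An id such as '_foo:bar' contains a colon, which a CSS selector would
# read as the start of a pseudo-class; inside the XPath it is plain text.
print "$_\n" for xpaths_for_ids( [ '_foo:bar', 'main.content' ] );
```

    The resulting strings could be handed to `->xpath' as an array
    reference, since `->xpath' accepts a list of queries as its first
    parameter.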
  `$mech->click( $name [,$x ,$y] )'
      $mech->click( 'go' );
      $mech->click({ xpath => '//button[@name="go"]' });

    Has the effect of clicking a button (or other element) on the current
    form. The first argument is the `name' of the button to be clicked.
    The second and third arguments (optional) allow you to specify the
    (x,y) coordinates of the click.

    If there is only one button on the form, `$mech->click()' with no
    arguments simply clicks that one button.

    If you pass in a hash reference instead of a name, the following keys
    are recognized:

    * `selector' - Find the element to click by the CSS selector

    * `xpath' - Find the element to click by the XPath query

    * `dom' - Click on the passed DOM element

      You can use this to click on arbitrary page elements. There is no
      convenient way to pass x/y co-ordinates with this method.

    * `id' - Click on the element with the given id

      This is useful if your document ids contain characters that look
      like CSS selector syntax. It is equivalent to

        xpath => qq{//*[\@id="$id"]}

    * `synchronize' - Synchronize the click (default is 1)

      Synchronizing means that WWW::Mechanize::Firefox will wait until
      one of the events listed in `events' is fired. You want to switch
      it off when there will be no HTTP response or DOM event fired, for
      example for clicks that only modify the DOM.

      You can pass in a scalar that is a false value to not wait for any
      kind of event.

      Passing in an array reference will use the array elements as
      Javascript events to wait for.

      Passing in any other true value will use the value of `->events' as
      the list of events to wait for.

    Returns a HTTP::Response object.

    As a deviation from the WWW::Mechanize API, you can also pass a hash
    reference as the first parameter. In it, you can specify the
    parameters to search much like for the `find_link' calls.

    Note: Currently, clicking on images with the `ismap' attribute does
    not trigger the move to the new URL. A workaround is to program the
    new URL into your script.

  `$mech->click_button( ... )'
      $mech->click_button( name => 'go' );
      $mech->click_button( input => $mybutton );

    Has the effect of clicking a button on the current form by specifying
    its name, value, or index. Its arguments are a list of key/value
    pairs. Exactly one of name, number, input or value must be specified
    in the keys.

    * `name' - name of the button

    * `value' - value of the button

    * `input' - DOM node

    * `id' - id of the button

    * `number' - number of the button

    If you find yourself wanting to specify a button through its
    `selector' or `xpath', consider using `->click' instead.

FORM METHODS
  `$mech->current_form()'
      print $mech->current_form->{name};

    Returns the current form.

    This method is incompatible with WWW::Mechanize. It returns the DOM `