3. Customize Robots

In some cases, the actions generated automatically by recording or by step-by-step creation may not be what you want. For example, in the “saved-abstracts.txt” file saved previously by the “pubmed” task, a record looks like this:

 

Note that the authors are broken into multiple lines.

A robot action created by the robot recorder or the robot designer can be customized to change its default behavior. Right-click a robot action in the action list box of the control panel to show a pop-up menu:

Click the ‘Test’ menu item in the pop-up menu to check the action's behavior, for example, whether it navigates to the correct URL or submits the correct form.

 

Click the ‘Property …’ menu item in the pop-up menu to view and modify the action properties. Different actions have different properties, which are explained in Section 3.1.

 

You can select multiple actions in the action list box and choose ‘Move Up’ or ‘Move Down’ from the pop-up menu to change their order, or ‘Delete’ to remove them.

 

You can use the ‘Copy’ and ‘Paste’ menu items in the pop-up menu to duplicate actions across robot tasks and across robots.

3.1. Action Properties

Click the ‘Property …’ menu item in the pop-up menu to view and modify an action's properties. Each action type has its own set of properties, described below.

1) Go to URL

The property page looks like:

The Url can be a string such as ‘http://mail.yahoo.com/’, as in the example. It can also be taken from a Variable or an Expression, such as ‘http://mail’ + ‘.yahoo.com’.

 

            Make sure to click the [Modify] button after your change.

 

2) A Click

The property page looks like:

 

            “A Click” is similar to “A Link”, which can be chosen from the drop-down list. “A Click” is more general in that it can click links that run JavaScript. However, “A Link” can be executed more robustly from API programs. The property page of “A Link” looks like:

 

The “Query” link tests whether the target query locates the correct link. To use the test correctly, navigate to the target page before opening the property page.

 

            “Target query” is an HTQL expression that locates the link to click. The HTQL expression can also be taken from a Variable or an Expression.

 

            “Wait navigation?” specifies whether to wait for navigation after the click.

 

            The “Link tag” specifies the HTML tag of the target link.

 

            Make sure to click the [Modify] button after your change.

 

3) A List of Links

The property page looks like:

 

The “Query” link tests whether the target query locates the correct list of links. To use the test correctly, navigate to the target page before opening the property page.

 

            “Target query” is an HTQL expression that locates the list of links to click. The HTQL expression can also be taken from a Variable or an Expression.

 

            “Field Index for Links” specifies the column of the table returned by the target query that holds the list of links. If the field index is 0, the action becomes ‘Take Table’.

 

            Make sure to click the [Modify] button after your change.
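The following Python sketch illustrates how a field index might pick the link column out of the rows returned by a target query. It is a hypothetical model, not the robot's implementation: it assumes the query result is a list of rows, that field indices are 1-based, and that 0 or an empty value switches to ‘Take Table’ behavior, as the text states. The name `select_links` is illustrative only.

```python
from typing import List, Optional, Sequence, Union

Row = Sequence[str]

def select_links(rows: List[Row],
                 field_index: Optional[int]) -> Union[List[str], List[List[str]]]:
    """Hypothetical model of the "Field Index for Links" property.

    field_index is assumed to be 1-based. A value of 0 or None (left empty)
    mimics 'Take Table' and returns every row unchanged.
    """
    if not field_index:
        # 'Take Table' behaviour: keep whole rows.
        return [list(row) for row in rows]
    col = field_index - 1
    # 'A List of Links' behaviour: keep only the link column,
    # skipping rows that are too short or have an empty cell.
    return [row[col] for row in rows if len(row) > col and row[col]]

rows = [["Paper A", "http://example.com/a"],
        ["Paper B", "http://example.com/b"]]
print(select_links(rows, 2))  # ['http://example.com/a', 'http://example.com/b']
print(select_links(rows, 0))  # both rows unchanged
```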

 

4) Take Data

The property page looks like:

 

The “Query” link tests whether the target query extracts the correct data. To use the test correctly, navigate to the target page before opening the property page.

 

            “Target query” is an HTQL expression that extracts the data. The HTQL expression can also be taken from a Variable or an Expression.

 

            You can have the robot wait until the target data appears, or until it disappears (this works only when the page refreshes itself). “Wait time” specifies the wait interval in seconds.

 

            Make sure to click the [Modify] button after your change.
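The wait option behaves like a polling loop. This Python sketch assumes a hypothetical `read_data()` callable that returns the currently extracted text (or an empty value), and it adds a `max_checks` limit that the text does not mention, so the loop cannot run forever.

```python
import time
from typing import Callable, Optional

def wait_for_data(read_data: Callable[[], Optional[str]],
                  until_shown: bool = True,
                  wait_time: float = 1.0,
                  max_checks: int = 10,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_data() every wait_time seconds (hypothetical model).

    until_shown=True waits for the data to appear; False waits for it to
    disappear, which only makes sense on a page that refreshes itself.
    Returns True if the condition is met within max_checks polls.
    """
    for attempt in range(max_checks):
        present = bool(read_data())
        if present == until_shown:
            return True
        if attempt < max_checks - 1:
            sleep(wait_time)  # "Wait time" interval, in seconds
    return False

# Simulated page that shows the data on the third check.
snapshots = iter([None, "", "Total: 42"])
print(wait_for_data(lambda: next(snapshots), wait_time=0.0))  # True
```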

 

5) Take Table

‘Take Table’ is the same as “A List of Links” with “Field Index for Links” set to 0 or left empty. Example page:

 

The “Query” link tests whether the target query locates the correct table. To use the test correctly, navigate to the target page before opening the property page.

 

            “Target query” is an HTQL expression that locates the table to extract. The HTQL expression can also be taken from a Variable or an Expression.

 

            “Field Index for Links” specifies the column of the table returned by the target query that holds the list of links. If the field index is 0 or left empty, the action is treated as ‘Take Table’.

 

            Make sure to click the [Modify] button after your change.

 

6) Submit a Form

The property page looks like:

 

            The “values” link brings up the form value window from the control panel (click the [OK] button in the form value window to close it).

 

            “Form values” specifies the values to be filled in the target form. They can be given as a “Variable” or an “Expression”. Form values can also be drawn from databases, which will be explained later.

 

            The “Form” link tests whether the target form is located correctly. To use the test correctly, navigate to the target page before opening the property page.

 

            “Form location HTQL” is an HTQL expression that locates the form to be submitted.

 

            “Submit button HTQL” is an HTQL expression that locates the form's submit button. The special expression ‘-none-’ tells the robot not to submit the form (only to fill in the form values).

 

            “To match form action” tells the robot to submit the form only if the form's action matches the specified URL.

 

            Make sure to click the [Modify] button after your change.
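The rules for ‘-none-’ and “To match form action” can be summarized as a small decision function. This Python sketch is hypothetical: form fields are modeled as a dictionary, value names that do not exist in the form are ignored, and the action URL match is assumed to be an exact string comparison, which the text does not specify. The function names are illustrative only.

```python
from typing import Dict

def fill_form(fields: Dict[str, str], values: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the form fields with the given form values filled in.

    Names that do not exist in the form are ignored (an assumption).
    """
    filled = dict(fields)
    for name, value in values.items():
        if name in filled:
            filled[name] = value
    return filled

def should_submit(submit_button_htql: str,
                  form_action: str,
                  match_form_action: str = "") -> bool:
    """Decide whether to submit after filling (hypothetical model)."""
    if submit_button_htql.strip() == "-none-":
        return False  # '-none-': fill the values only
    if match_form_action and form_action != match_form_action:
        return False  # action URL does not match the required one
    return True

form = fill_form({"term": "", "db": "pubmed"}, {"term": "asthma"})
print(form)  # {'term': 'asthma', 'db': 'pubmed'}
print(should_submit("-none-", "http://example.com/search"))  # False
```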

 

7) Logon Form

‘Logon Form’ is the same as ‘Submit a Form’.

 

8) Open a Frame

The property page looks like:

 

The Url specifies an HTQL expression to locate the frame.  The Url can also be taken from a Variable or an Expression, such as ‘http://mail’ + ‘.yahoo.com’.

 

            Make sure to click the [Modify] button after your change.

 

9) Send Email

To be explained.

 

10) A Schedule

The property page looks like:

           

            Click “View Schedule” to modify the schedule. The schedule page looks like:

            To add a new schedule, click the [Insert] button.  To modify existing schedules, check the schedules and click the [Modify] button.  To delete existing schedules, check the schedules to be deleted and click the [Delete] button.  The insert and modify page looks like:

 

           Set the Type, Interval, Base, and Action as desired and click the [Modify] button to confirm the change.

 

            Leave the Session attribute empty to use the default browser. Give a Session name if you want the robot to launch a dedicated browser for the scheduled task.
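The text does not define the Type, Interval, and Base fields precisely. Assuming a periodic schedule in which Base is the first run time and Interval is the gap between runs (an interpretation, not documented behavior), the next run time could be computed as in this Python sketch; `next_run` is a hypothetical helper and the Type and Action fields are not modeled.

```python
from datetime import datetime, timedelta

def next_run(base: datetime, interval: timedelta, now: datetime) -> datetime:
    """Next run strictly after `now` for runs at base, base+interval, ..."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < base:
        return base  # the schedule has not started yet
    periods = (now - base) // interval + 1  # timedelta // timedelta -> int
    return base + periods * interval

base = datetime(2024, 1, 1, 8, 0)
print(next_run(base, timedelta(days=1), datetime(2024, 1, 3, 9, 30)))
# 2024-01-04 08:00:00
```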

 

3.2. Action Variables

Action variables can be used to calculate data and to test action conditions. Each action has the following default variables (case insensitive):

Tuple: For “A List of Links”, Tuple is the index of the link the robot is currently acting on. For “Take Table”, Tuple is the index of the row the robot is currently extracting. Tuple ranges from 1 to N, the total number of tuples. Tuple=0 means that the robot has completed the action.

ErrorCode: Reflects the status of the current action. ErrorCode<0 means the action is in error; ErrorCode=0 means success; ErrorCode=1 means the robot ignores the current tuple and tries the next one; ErrorCode=2 means the robot ends the current action regardless of how many tuples remain unprocessed; ErrorCode=3 means the robot pauses navigation on the current tuple; and ErrorCode=4 means the robot retries the current tuple.

ActionName: The name of the current action's type. The values are:

a) Go to URL: ActionName=‘URL’;

b) A Click: ActionName=‘Click’;

c) A List of Links: ActionName=‘Table’;

d) Take Data: ActionName=‘Item’;

e) Take Table: ActionName=‘Table’;

f) Submit a Form: ActionName=‘Form’;

g) Logon Form: ActionName=‘Logon’;

h) Open a Frame: ActionName=‘Frame’;

i) Send Email: ActionName=‘Email’;

j) A Schedule: ActionName=‘Schedule’.
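To show how Tuple and ErrorCode interact, here is a hypothetical Python model of one action walking its tuples. The `act` callback stands in for whatever the action does on a tuple and returns an ErrorCode. Stopping on a negative ErrorCode, the retry limit, and reporting Tuple=0 after ErrorCode=2 are assumptions of the sketch, not documented behavior.

```python
from typing import Callable, List, Tuple

# ErrorCode values as described in the text.
SUCCESS, IGNORE_TUPLE, END_ACTION, PAUSE, RETRY = 0, 1, 2, 3, 4

def run_tuples(values: List[str],
               act: Callable[[int, str], int],
               max_retries: int = 2) -> Tuple[List[str], int]:
    """Walk tuples 1..N, reacting to the ErrorCode returned by act().

    Returns (log, tuple): tuple is 0 if the action finished, otherwise the
    index of the tuple on which the robot stopped.
    """
    log: List[str] = []
    tuple_no = 1                       # Tuple ranges from 1 to N
    retries = 0
    while tuple_no <= len(values):
        code = act(tuple_no, values[tuple_no - 1])
        if code == RETRY:
            if retries < max_retries:  # retry limit is an assumption
                retries += 1
                log.append(f"retry {tuple_no}")
                continue
            code = -1                  # treat exhausted retries as an error
        retries = 0
        if code < 0:                   # assumed: an error stops the action
            log.append(f"error {tuple_no}")
            return log, tuple_no
        if code == END_ACTION:
            log.append(f"end {tuple_no}")
            return log, 0
        if code == PAUSE:
            log.append(f"pause {tuple_no}")
            return log, tuple_no
        log.append(("ignore " if code == IGNORE_TUPLE else "ok ") + str(tuple_no))
        tuple_no += 1
    return log, 0                      # Tuple=0: action completed
```

For example, an `act` that returns 0, 1, and then 2 on the first three of four tuples produces the log `ok 1`, `ignore 2`, `end 3`, and the fourth tuple is never processed.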

 

Users can also create their own variables; user-created variables are case sensitive. There are two ways to create variables: from the action menu and from action events. To create variables from the action menu, right-click an action in the list box of the control panel. The pop-up menu shows:

 

Choose the “Name Variable …” from the pop-up menu to name the data extracted by the action. 

 

For a “Take Data” action, the “Name Variable …” dialog looks like:

 

Give the extracted data a name and choose a corresponding transformation type for it. Then press the [OK] button, as shown:

For “Take Table” or “A List of Links” actions, the “Name Variable …” page is shown in the browser window as below (navigating to a sample page before opening this page lets you see a sample table from that page):

 

Give names under the Name column and choose the corresponding transformations to declare variables. Press the [Update] button to confirm the change.
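Conceptually, declaring variables assigns a name and a transformation to each column of the extracted data. In this Python sketch the transformation names (`none`, `trim`, `number`) are hypothetical placeholders rather than the robot's actual transformation types, and columns are assumed to be 1-based.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical transformation types, keyed by name.
TRANSFORMS: Dict[str, Callable[[str], object]] = {
    "none": lambda s: s,
    "trim": lambda s: s.strip(),
    "number": lambda s: float(s.replace(",", "")),
}

def name_columns(row: Sequence[str],
                 declarations: List[Tuple[int, str, str]]) -> Dict[str, object]:
    """Map (1-based column, variable name, transformation) onto one row."""
    return {name: TRANSFORMS[kind](row[col - 1])
            for col, name, kind in declarations}

row = ["  Smith J, Lee K  ", "1,024"]
print(name_columns(row, [(1, "authors", "trim"), (2, "citations", "number")]))
# {'authors': 'Smith J, Lee K', 'citations': 1024.0}
```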

 

Variables can also be created from action events, as explained in Section 3.3, Action Events.

 

3.3. Action Events

Each action fires a number of events.  You can create variables, calculate data, and test conditions from these events.  The list of available events, in the order they are fired from the action, is:

a) Before page: Before the page is opened;

b) Read page: The page has just been opened;

c) Page loaded: The page has been loaded into memory;

d) Before each tuple: Before each tuple of the page is processed;

e) Read tuple: A tuple is read from the page;

f) Check tuple: The tuple has been read;

g) Before action: An action is about to be taken on the current tuple;

h) After action: The action on the current tuple is completed;

i) After each tuple: After each tuple is processed;

j) After page: All tuples on the page have been processed;

k) Completed: The current action has been completely executed.
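Read together, the event list describes page-level events wrapped around a repeated block of per-tuple events. The Python sketch below reproduces that order for one page with a given number of tuples; it only models the sequence and is not the robot's event dispatcher.

```python
from typing import List

PAGE_EVENTS = ["Before page", "Read page", "Page loaded"]
TUPLE_EVENTS = ["Before each tuple", "Read tuple", "Check tuple",
                "Before action", "After action", "After each tuple"]
END_EVENTS = ["After page", "Completed"]

def event_sequence(n_tuples: int) -> List[str]:
    """Order in which events fire for a single page with n_tuples tuples."""
    return PAGE_EVENTS + TUPLE_EVENTS * n_tuples + END_EVENTS

for name in event_sequence(1):
    print(name)
```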

 

To view action events, right-click an action in the action list box of the control panel and choose “Events …” from the pop-up menu, as shown in the following figure.

 

 

The event page is shown in the browser window like:

 

In the event page, you can [Insert], [Modify], and [Delete] events. For example, in the previous figure, a ‘Before action’ event is defined and the condition ‘Eval>1e-150’ is tested. If the condition is satisfied, the event returns a status that finishes the action (End this page).

 

A variable can be created in events like:

 

In the above figure, an “After page” event is defined, and the variable “mp3list” is assigned the expression “mp3list+mp3_link+‘<br>’”, where “mp3list” and “mp3_link” are existing variables and ‘<br>’ is a string. The expression concatenates the two variables and the string into a new string and assigns it to the variable “mp3list”.
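The assignment in that event is ordinary string concatenation. In Python terms, with the variables kept in a dictionary and two hypothetical link values standing in for what each page would provide, repeated “After page” events build up the list like this:

```python
from typing import Dict

def after_page(variables: Dict[str, str]) -> None:
    # mp3list = mp3list + mp3_link + '<br>'
    variables["mp3list"] = variables["mp3list"] + variables["mp3_link"] + "<br>"

variables = {"mp3list": ""}
for link in ["song1.mp3", "song2.mp3"]:  # hypothetical values, one per page
    variables["mp3_link"] = link
    after_page(variables)

print(variables["mp3list"])  # song1.mp3<br>song2.mp3<br>
```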

 

In events, you can also manipulate the Web page that the action is acting on. The following figure shows an example.

 

In the above figure, a “Before each tuple” event is defined. In this event, the setAttribute() function sets the “target” attribute of the first form in the target page to an empty value. This clears the form's target attribute, so submitting the form does not open a new browser window.

3.4. Repeat Actions

An action can be designed to repeat on a sequence of Web pages, for example to check the same Web page repeatedly or to follow [Next] links continuously. Right-click an action in the action list box and choose “Repeat Property …” from the pop-up menu:

 

The repeat action dialog looks like:

 

To repeat the action on the next page, browse to an example Web page before opening the repeat dialog. Mark the repeat item in the browser window: move the mouse over the [Next] link, press the left button, drag the link slightly, and release the button. Then check “Repeat this action” and select the “Next Page” radio button in the repeat dialog. Press the [OK] button to confirm the change.
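Repeating on the next page amounts to a loop that processes a page, finds its [Next] link, and continues until there is none. In this hypothetical Python sketch, `fetch`, `extract`, and `find_next` stand in for the browser and the marked repeat item, and the visited set and `max_pages` guard are additions of the sketch to avoid endless loops.

```python
from typing import Any, Callable, List, Optional, Set

def repeat_on_next_pages(start_url: str,
                         fetch: Callable[[str], Any],
                         extract: Callable[[Any], List[str]],
                         find_next: Callable[[Any], Optional[str]],
                         max_pages: int = 50) -> List[str]:
    """Run the action on start_url and every following [Next] page."""
    results: List[str] = []
    visited: Set[str] = set()
    url: Optional[str] = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        page = fetch(url)
        results.extend(extract(page))  # the action's work on this page
        url = find_next(page)          # None when there is no [Next] link
    return results

site = {"p1": {"items": ["a", "b"], "next": "p2"},
        "p2": {"items": ["c"], "next": None}}
print(repeat_on_next_pages("p1", site.__getitem__,
                           lambda p: p["items"], lambda p: p["next"]))
# ['a', 'b', 'c']
```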

 

If you check the “Do action on new item (monitor)” check box, the robot repeats the action only when the data extracted from the current page has changed.
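Monitor mode can be modeled as comparing each new extraction with the previous one and acting only on a change. This Python sketch compares SHA-256 digests of the extracted text; using a hash is a choice of the sketch, and the robot may compare the data differently. `ChangeMonitor` and `monitor` are hypothetical names.

```python
import hashlib
from typing import Callable, Iterable, List, Optional

class ChangeMonitor:
    """Remembers the last extracted data and reports whether it changed."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def changed(self, data: str) -> bool:
        digest = hashlib.sha256(data.encode("utf-8")).hexdigest()
        is_new = digest != self._last
        self._last = digest
        return is_new

def monitor(snapshots: Iterable[str], act: Callable[[str], None]) -> None:
    """Call act() only for snapshots that differ from the previous one."""
    watcher = ChangeMonitor()
    for data in snapshots:
        if watcher.changed(data):
            act(data)

seen: List[str] = []
monitor(["v1", "v1", "v2", "v2", "v1"], seen.append)
print(seen)  # ['v1', 'v2', 'v1']
```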