Saturday, August 10, 2013

Functional HtmlUnit testing in Python using Requests for Humans

Testing HTML at the HTTP level, with server-side requests through HtmlUnit or http-requests, is a good way to speed up testing. In the Ajax era, the web throws up more challenges for test automation; a better way is to test the XHR requests separately rather than trying to cover every combination in browser-based UI automation.

First, let us look at a GET/POST request with HtmlUnit:

//Create a WebRequest for POST, use HttpMethod.GET for GET requests.
WebRequest request = new WebRequest(new URL("http://yoururl.com"), HttpMethod.POST);
request.setRequestParameters(new ArrayList<NameValuePair>());

//Add the params using ArrayList
request.getRequestParameters().add(new NameValuePair("param1","value1"));

//Send the request to the server using getPage.
HtmlPage page = webClient.getPage(request);

//Get the response as a String.
String html= page.getWebResponse().getContentAsString();

//Cookies can be accessed using the following code.
CookieManager cm = webClient.getCookieManager();
Set<Cookie> cookies = cm.getCookies();
for (Cookie cookie : cookies)
{
     System.out.println(cookie.getName() + " " + cookie.getValue());
}

So, how do we do this in Python using http-requests (Requests: HTTP for Humans)?

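import sys
import requests

#First, create a session using requests.Session
s=requests.Session()
#To print the url along with the params, pass a verbose config instead
#(this works only on older 0.x releases of requests; later releases dropped the config argument)
s=requests.Session(config={'verbose':sys.stderr})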

#Create a dict for params to be sent
url = 'https://yoururlhere.com'
params={'param1':'value1', 'param2':'value2'}

#Call the url & params
reply=s.post(url, params=params)

#if you want to send the params in the request body not in query string use the data option
reply=s.post(url,data=params)

#GET request using s.get; the params go in the query string
reply=s.get(url,params=params)

#get the redirected url by accessing .url
print reply.url

#text or content will give the response
print reply.text
print reply.content

#Accessing cookie is by s.cookies
print s.cookies

#convert cookie into dict by dict_from_cookiejar
cookdict=requests.utils.dict_from_cookiejar(s.cookies)
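
To see the difference between sending fields in the query string and in the request body without touching a server, here is a minimal sketch. It uses only the Python 3 standard library as a stand-in for Requests, so it is not the Requests API itself; the helper name build_request is made up for illustration and the URL is the placeholder used above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, fields, in_body):
    """Build (but do not send) a request carrying `fields`.

    in_body=True  -> form-encoded body, comparable to data=... in Requests
    in_body=False -> query string, comparable to params=... in Requests
    """
    encoded = urlencode(fields)
    if in_body:
        # A urllib Request that carries data defaults to the POST method.
        return Request(url, data=encoded.encode("ascii"))
    return Request(url + "?" + encoded)


fields = {"param1": "value1", "param2": "value2"}
as_query = build_request("https://yoururlhere.com/", fields, in_body=False)
as_body = build_request("https://yoururlhere.com/", fields, in_body=True)

# Nothing is sent over the network; we only inspect what would be sent.
print(as_query.get_method(), as_query.full_url)
print(as_body.get_method(), as_body.full_url, as_body.data)
```

Inspecting the request before it is sent is a cheap way to confirm that a test puts its fields where the server expects them.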

For HtmlUnit you may need to set the following options, depending on your needs; the options are self-explanatory.


WebClient webClient = new WebClient();
webClient.setHTMLParserListener(HTMLParserListener.LOG_REPORTER);
webClient.setRedirectEnabled(true);
webClient.setJavaScriptEnabled(true);                                            
webClient.setThrowExceptionOnScriptError(false);            
webClient.setThrowExceptionOnFailingStatusCode(false); 
webClient.setUseInsecureSSL(true);
webClient.setCssEnabled(true);
webClient.setAppletEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());


Thanks to http-requests and HtmlUnit, two amazing frameworks that make testing easier and faster.

Happy Testing.

Wednesday, June 26, 2013

Code for converting Tamil Virtual University's Kambaramayanam into e-book format

The Tamil epic Kambaramayanam is not available in e-book format. However, it is available from Tamil Virtual University as HTML pages. A friend of mine, Iyyappan, who wanted to study Kambaramayanam, thought it would be helpful to have it as an e-book, preferably one readable on a mobile. He asked me to lend a helping hand.

I downloaded the HTML pages from the TVU site using Python http-requests. After downloading came the big issue: the pages and their content are not organized, and there are no ids or structure to help convert the content into the desired format. I initially tried to convert them and gave up; I just formatted the HTML using BeautifulSoup and combined all the pages. So the content is now in e-book format, but not good for reading.

Here is the code for removing the extra html elements

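import urllib2
from BeautifulSoup import BeautifulSoup  #BeautifulSoup 3; with bs4 use: from bs4 import BeautifulSoup

print ''
print ''
print ''

for i in range(1,796+1):
     inputfilename='filename'+str(i)+'.html'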
     data=urllib2.urlopen(inputfilename)
     soup = BeautifulSoup(data)
     data=soup.prettify()
     soup = BeautifulSoup(data)
     ti=soup.findAll(attrs={'class':'link'})
     for t in ti:
          t.extract()
     ti=soup.findAll(attrs={'class':'thead'})
     for t in ti:
          t.extract()    
     hr=soup.findAll('hr')
     for h in hr:
          h.extract()
     hr=soup.findAll('form')
     for h in hr:
          h.extract()
     hr=soup.findAll('head')
     for h in hr:
          h.extract()
     hr=soup.findAll('input')
     for h in hr:
          h.extract()    
     print soup.prettify()
print '\n'
print '\n'
print '\n'

Now I needed a way to format this content. Instead of wrestling with the content inside the HTML tags, I decided to extract the content, convert it into text, and then process it.

So I removed the remaining unwanted text, such as page numbers and headings, and extracted the text using the following code:

inputfilename='file.html'
data=urllib2.urlopen(inputfilename)
soup = BeautifulSoup(data)
data=soup.prettify()
soup = BeautifulSoup(data)
ti=soup.findAll(attrs={'class':'pno'})
for t in ti:
     t.extract()    
ti=soup.findAll(attrs={'class':'subhead'})
for t in ti:
     t.extract()    
lines=[]
for s in soup(text=True):
     #print s
     s=s.strip().replace('\t','')
     #if s!='.':
     #     print s
     print s
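
The same idea (drop the elements with unwanted classes, keep the remaining text) can be sketched with only the Python 3 standard library. This is a rough stand-in for the BeautifulSoup code above, not a replacement for it: the class name TextExtractor and the sample markup are made up, and it assumes reasonably well-formed HTML where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextExtractor(HTMLParser):
    """Collect text, skipping any element whose class is in skip_classes."""

    def __init__(self, skip_classes):
        super().__init__()
        self.skip_classes = set(skip_classes)
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or self.skip_classes.intersection(classes):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip().replace("\t", "")
        if text and not self.skip_depth:
            self.lines.append(text)


parser = TextExtractor(["pno", "subhead"])
parser.feed('<div><span class="pno">12</span><p>poem line</p>'
            '<span class="subhead"><b>Heading</b></span><hr><p>explanation</p></div>')
print(parser.lines)
```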

Then I found a pattern in the text that made the conversion possible: I used the poem numbers to find the poems. If the poem numbers start at 3000, the next poem is 3001, and so on. The explanation of each poem comes right after the poem, so it is easy to find too. Removing the newlines and formatting the poem is also easy. I think the same could be done directly on the HTML, but I didn't see that pattern there. I could try that for the remaining chapters.
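
The numbering idea can be sketched in a few lines of Python 3. This is a simplified illustration, not the script I actually ran: the function name split_poems and the sample lines are made up, and it assumes each poem number sits on a line of its own.

```python
import re

# A header line holds only a number, optionally followed by a dot.
NUMBER_LINE = re.compile(r"^\s*(\d+)\s*\.?\s*$")


def split_poems(lines, start):
    """Group lines into {poem_number: [lines]} using sequential numbering.

    A line starts a new poem only if it holds exactly the next expected
    number, so stray digits inside the text are not mistaken for headers.
    """
    poems = {}
    expected = start
    current = None
    for raw in lines:
        line = raw.strip()
        match = NUMBER_LINE.match(line)
        if match and int(match.group(1)) == expected:
            current = expected
            poems[current] = []
            expected += 1
        elif current is not None and line:
            poems[current].append(line)
    return poems


sample = [
    "3000.\n", "    first line\n", "    second line\n", "\n",
    "explanation of poem 3000\n",
    "3001.\n", "    another poem\n",
]
poems = split_poems(sample, 3000)
print(sorted(poems))
```

Matching only the next expected number is what keeps a number mentioned inside an explanation from being treated as the start of a poem.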

Find the code below,

import re

A=open('file.txt')
E=A.readlines()
A.close()

startcount=6060
emptyline=0
state=True
value=True
linevalue=0
i=0
isurai=True
for line in range(0,len(E)):
     #print E[line].strip()
     p = re.compile(r'\d')
     if p.match(E[line].replace('.','')) and E[line].replace('.','').find(str(startcount))!=-1:
          print E[line],
          #print E[line+1],
          #print E[line+2],
          if not isurai:
               break
          for i in range(1,10,2):
               if E[line+1+i]=='\n':
                    break
               if E[line+2+i].startswith('    '):
                    #print line,'a',i,i+1
                    print E[line+1+i].replace('    ','').replace('\n','')+E[line+2+i].replace('    ',''),
               else:
                    #print line,i,i+1
                    print E[line+i+1].replace('    ','').replace('\n','')
                    print E[line+i+2].replace('    ','').replace('\n','')
          if i<4:
               break
          #print 'iiiiiiiiii',i
          startcount=startcount+1
          isurai=False
          print '\n',
     else:
          printline=''
          printline1=''
          if i==9 or i==5:
               isurai=True
               for j in range(i,150):
                    if p.match(E[line+j].replace('.','')) and E[line+j].replace('.','').find(str(startcount))!=-1:
                         break
                    #if p.match(E[line+j]):
                    #     printline1=E[line+j+1].replace(' ','')
                    #else:    
                    printline=printline+' '+E[line+j].replace('\n',' ').replace(' ','')
               print printline.strip(),'\n\n',
               #print printline1.strip(),
               i=0

After this, I need to manually check for missing poems and content, and also check the poem numbers. Because the text was manually typed and not proofread, the poem numbers are not in order. I had to fix those by hand and rerun the script every time.
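
A small check like the following could point out where the numbering breaks before rerunning the script. It is only a sketch in Python 3: the function name find_numbering_problems and the sample numbers are made up for illustration.

```python
def find_numbering_problems(numbers):
    """Return (position, found, expected) for every break in the sequence."""
    problems = []
    for position in range(1, len(numbers)):
        expected = numbers[position - 1] + 1
        if numbers[position] != expected:
            problems.append((position, numbers[position], expected))
    return problems


# Hypothetical poem numbers as they might appear in the extracted text.
found = [6060, 6061, 6063, 6064, 6046, 6066]
for position, number, expected in find_numbering_problems(found):
    print("entry %d: found %d, expected %d" % (position, number, expected))
```

A single mistyped number shows up twice, once where the sequence leaves the expected value and once where it returns, which makes typos easy to tell apart from genuinely missing poems.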

Another manual task is to separate the subheadings from the explanations. Next is the code to convert the text to HTML format along with a table of contents.

import re

A=open('file.txt')
E=A.readlines()
A.close()
startcount=4740
print """




"""
co=[]
for e in range(0,len(E)):
     if E[e].find('\t')!=-1 and E[e].find('(')==-1:
          if E[e].find('.')!=-1:
               print '      '
               print E[e]
               print '\n'
               print ' '
               print E[e+2]
               print '\n'
               cc=''+E[e].strip()+''
               co.append(cc)
               continue
          else:
               print ' '
               print E[e]
               print '\n'
               cc=''+E[e].strip()+''
               co.append(cc)
               continue
     else:
          p = re.compile(r'\d')
          #if p.match(E[e].replace('.','')):
          #     print E[e]
          #print E[e].replace('.','').find(str(startcount))!=-1
          if p.match(E[e].replace('.','')) and E[e].replace('.','').find(str(startcount))!=-1:
               #print '-----------'
               print ' '
               print ''
               print ''

               print ' '
               print ''
               print ''
               print E[e]
               print '\n'
               print '\n'
               print ''
               for i in range(1,20):
                    if E[e+i]!='\n':
                         print E[e+i]+''
                    else:
                         break
               print '\n'
               print '\n'
               print '\n'
               print '\n'
               print '\n'
               print '\n'
               print ' '
               print ''
               print ''
               if E[e+i+1].find('\t')==-1 and len(E[e+i+1])<250:
                    print len(E[e+i+1])
                    break
               print E[e+i+1]+'\n'
               i=0
               print '\n'
               print '\n'
               print '\n'
               #print ' '
               startcount=startcount+1
print """





"""
for cc in co:
     if len(cc)>28:
          if cc.find('padalam_sub_')!=-1:
               print '      ',cc,'\n'
          else:
               print '\n',cc,'\n\n'
print """





"""
print """


     """

The final manual step is to move the TOC from the bottom to the top: just cut and paste. All done.
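
The cut and paste could probably be avoided by collecting the headings first and writing the table of contents before the content. Here is a minimal sketch of that idea in Python 3; the function name build_page, the section_N id scheme and the sample headings are all made up for illustration and are not what the script above produces.

```python
from html import escape


def build_page(sections):
    """sections: list of (heading, body_text) pairs.

    Returns one HTML string with the table of contents placed
    before the content it links to.
    """
    toc = []
    body = []
    for index, (heading, text) in enumerate(sections):
        anchor = "section_%d" % index  # hypothetical id scheme
        toc.append('<li><a href="#%s">%s</a></li>' % (anchor, escape(heading)))
        body.append('<h2 id="%s">%s</h2>' % (anchor, escape(heading)))
        body.append("<p>%s</p>" % escape(text))
    parts = ["<html><body>", "<ul>"] + toc + ["</ul>"] + body + ["</body></html>"]
    return "\n".join(parts)


page = build_page([("Padalam 1", "poem text"), ("Padalam 2", "more text")])
print(page)
```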

Used python libs,

Requests: HTTP for Humans


Beautiful Soup


Find the formatted html in my public dropbox folder,

https://www.dropbox.com/sh/yy9lq619z299fp5/zqZuKIF74H

Download the files whose names contain _formatted:

https://www.dropbox.com/sh/yy9lq619z299fp5/dAuP0o-KGF/arayanya_formatted.html
https://www.dropbox.com/sh/yy9lq619z299fp5/RkzrW3EiE3/ayodhya_formatted.html
https://www.dropbox.com/sh/yy9lq619z299fp5/hzZvSIdNzn/kikinada_formatted.html
https://www.dropbox.com/sh/yy9lq619z299fp5/aD_91lnW90/sundra_formatted.html


Of the 7 chapters, only four have been converted. The remaining three still need to be converted; I hope to finish that work in the coming week.

Thursday, June 6, 2013

Automation vs Product Code - What is the difference?

What are you doing? Do you write code?

That is one of the questions faced by testers who write automation code or scripts. What is the difference?

Well, a simple explanation would be that automation is interaction with the browser. That won't help, because most of the things a tester uses are based on interaction between two systems. So what is the difference? Why is it so hard to keep automation code running?

Automation code needs 100% code coverage. In other words, every line must work, or the flow of the test script breaks. When that happens, automation tests become brittle. Compare this with product code, where coverage is less than 100%. The figures quoted are 30%, 40%, 80% and so on, but always below 100%.

When the traditional method of coding is applied to automation, the result is endless writing and rewriting of automation scripts. Automation testers end up in a state where they work hard, then work harder, yet produce no output or useful work, leading to the conclusion that automation is not going to work.

Another difference is how automation handles changes in the UI. When the UI changes, that particular change needs to be handled in the script, otherwise the test fails. The question is, how do product teams handle these kinds of changes? If we analyze that, it is clear that the traditional model of code should be applied here too, by separating the UI handles from the script flow. (Clue: MVC)

So, we end up using the traditional model where we shouldn't, when we should be using a model that can be easily modified without causing too much trouble.

In the Tirukkural, one of the oldest books of law, it is said (my translation):

Doing what shouldn't be done brings bad results
Not doing what should be done brings bad results

Some people have tried to solve this dilemma using methods such as BDD, or acceptance testing tools such as Fit or Selenese.

Even though BDD is mainly used for product coding, it fits easily into automation testing. Instead of writing scripts or lines of code, automation testers write the flow, which is executed via a framework or by other means. That flow can be understood by developers, marketing people, etc.

Apart from using BDD, separating the code from the UI selectors can make the tests more stable. One can go a step further and separate the data from the code too, so that the code flow, the test data and the UI element selectors live in separate places, making it much easier to write tests and to find out why tests are brittle. This is needed because BDD alone can't make life easier for testers.
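
To make the separation concrete, here is a rough sketch in Python. Everything in it is hypothetical: the selector strings, the test data, and the RecordingDriver class, which stands in for a real browser driver and only records what it is asked to do.

```python
# UI selectors live in one place (hypothetical names and CSS selectors).
SELECTORS = {
    "username": "input[name='user']",
    "password": "input[name='pass']",
    "submit": "button[type='submit']",
}

# Test data lives in another.
LOGIN_DATA = {"username": "tester", "password": "secret"}


class RecordingDriver:
    """Stand-in for a real browser driver; it only records the calls."""

    def __init__(self):
        self.actions = []

    def type(self, selector, text):
        self.actions.append(("type", selector, text))

    def click(self, selector):
        self.actions.append(("click", selector))


def login_flow(driver, selectors, data):
    """The flow refers to elements and values by logical name only."""
    driver.type(selectors["username"], data["username"])
    driver.type(selectors["password"], data["password"])
    driver.click(selectors["submit"])


driver = RecordingDriver()
login_flow(driver, SELECTORS, LOGIN_DATA)
print(driver.actions)
```

When the UI changes, only SELECTORS needs editing; when the test data changes, only LOGIN_DATA does. The flow itself stays untouched.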

Monday, March 18, 2013

Learning Selenium/WebDriver

Recently, I have been facing one question a lot.

How do I learn Selenium/Webdriver?

That question is asked by my friends, friends of friends and co-workers who want to learn Selenium but don't know where to start. I tell them it is easy: go to youtube.com and search for WebDriver talks in Google Tech Talks, Google Test Automation Conference, Selenium Conf, etc. You will find videos and presentations by the creators of Selenium.

After repeating this a lot of times, I thought, OK, let me put those video links somewhere and send them along, because a lot of people don't know who created WebDriver or who the committers to WebDriver are. So here are the links. I will keep updating this whenever I find time or new things.

First, Selenium IDE from SeleniumConf


GTAC 2011: WebDriver





Automating Your Browser Based Testing Using WebDriver 


Se Builder


Web performance testing using Webdriver



Apart from those sources, you can also look into these places to find Selenium related things.

Selenium Official HQ http://seleniumhq.org/
Selenium Blog http://seleniumhq.wordpress.com/

Sunday, July 15, 2012

Managing PDF/Documents using Benubird PDF

Keeping books, presentations and articles in PDF is the best way to read and use them across platforms. However, once you have thousands of them, they become difficult to manage. Here is an account of my efforts to find a way on my own and how I stumbled upon a good product to manage them.

When I started to use PDF books a few years ago, I had few files, and organizing them into folders was easy. Then Google Desktop came along and made life easy. But it had other issues, such as re-indexing when moving folders or systems, and of course the security vulnerabilities. So I decided to go for a primitive solution: keep the name, title, author, description and location of the PDFs in a spreadsheet. I did some tinkering so that I could open a PDF from the spreadsheet. This too has issues: if I send a file to someone else, they have to do all this too.

The ideal solution is to have those properties in the PDF itself, so that extracting, sorting, etc. become easier. Alas, it took a long time for me to realize this. So, how to do that? First I needed to know how many files already have those properties.

I started to search for Python PDF packages and found that pyPDF is good at extracting information from PDFs. But if the PDF is larger than 15MB, it hangs. Apart from that, editing the properties of a PDF is only possible by creating a new PDF with those properties and the content. This means I would end up with duplicate files and would need to check them manually before deleting. I tried other tools too, but it was the same story. Even the famous ExifTool creates a new file.

So the next thing was to search for a PDF manager. The first hit was Mendeley. It is actually for managing your research papers and making citations easy, not for other purposes. So it doesn't have any option to edit the title in the PDF; you can only edit it within Mendeley, not outside of it. That is not what I am looking for, but it comes close.

Next, I searched for alternatives to Mendeley. After reviewing a few apps, I found Benubird PDF. To my surprise, it offers easy editing of PDF properties, and the app writes them to the PDF. But don't try this with a PDF that is open in Acrobat: Benubird will overwrite the PDF, leaving only one page. Ideally it should throw a warning, but it is okay as of now. It has other nice features such as watched folders, smart lists, lists, etc. The UI shows filename, title, author, subject, tags, size, type, path, etc.

I have transferred all of my spreadsheet data to the PDFs via Benubird. I hope it will work fine.

Other links if you want to check
PDFMiner

pdfrw 

pdflib

Monday, April 23, 2012

Logging Selenium WebDriver


With Selenium WebDriver version 2.15.0, logging was added to help with debugging. Logging is enabled by default; just add a logger to get the logs.

Setting the log level is also easy. Set it in the logger or use this code:

((RemoteWebDriver) driver).setLogLevel(Level.ALL);

This helps in debugging; however, in the latest version there is a change in the log output. In 2.15 the output is as shown below:

Jan 13, 2012 5:42:00 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [null, newSession {"desiredCapabilities":"Capabilities [{platform=WINDOWS, ensureCleanSession=true, browserName=internet explorer, version=}]"}]
Jan 13, 2012 5:42:16 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: recv failed
Jan 13, 2012 5:42:16 AM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: Retrying request
Jan 13, 2012 5:42:16 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, implicitlyWait {"ms":300000}]
Jan 13, 2012 5:42:17 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, get {"url":"https://test.com/"}]
Jan 13, 2012 5:42:54 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, getCurrentWindowHandle {}]
Jan 13, 2012 5:42:54 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, getCurrentWindowHandle {}]
353b31e2-046e-403f-96da-4f520163d341
Jan 13, 2012 5:42:54 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, switchToFrame {"id":0}]
Jan 13, 2012 5:42:54 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, findElement {"using":"name","value":"a"}]
Jan 13, 2012 5:43:18 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, findElement {"using":"name","value":"b"}]
Jan 13, 2012 5:43:18 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, sendKeysToElement {"id":"e9a6a8d3-301f-49a7-85fc-977f35414095","value":["test"]}]
Jan 13, 2012 5:43:30 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, findElement {"using":"name","value":"p"}]
Jan 13, 2012 5:43:33 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, sendKeysToElement {"id":"c9373957-5fcd-4a63-999f-bf54fa1cc9fd","value":["test"]}]
Jan 13, 2012 5:43:40 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, submitElement {"id":"c9373957-5fcd-4a63-999f-bf54fa1cc9fd"}]
Jan 13, 2012 5:43:59 AM org.openqa.selenium.remote.RemoteWebDriver execute
INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, switchToWindow {"name":"353b31e2-046e-403f-96da-4f520163d341"}]



However, in 2.20 the output looks like this:

Apr 6, 2012 12:17:02 PM java.util.logging.LogManager$RootLogger log
INFO: Logging of firefox driver is enabled
Apr 6, 2012 12:17:02 PM java.util.logging.LogManager$RootLogger log
INFO: Logging of firefox driver is enabled
Tabsv2.conf
Apr 6, 2012 12:17:02 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: get
Apr 6, 2012 12:17:02 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: getCurrentWindowHandle
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: getCurrentWindowHandle
{339289c1-afc6-4352-8ba5-b8e5f5c9997c}
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: switchToFrame
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: findElement
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: findElement
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: sendKeysToElement
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: findElement
Apr 6, 2012 12:17:04 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: sendKeysToElement
Apr 6, 2012 12:17:05 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: submitElement
Apr 6, 2012 12:17:05 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: switchToWindow
Apr 6, 2012 12:17:05 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: get
Apr 6, 2012 12:17:13 PM org.openqa.selenium.remote.RemoteWebDriver log
FINE: Executing: getCurrentWindowHandle
{339289c1-afc6-4352-8ba5-b8e5f5c9997c}

Notice the difference. Earlier, the values passed and the text to be typed were all printed in the logs. Now, that has changed.
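
To show what the older format carries that the newer one does not, here is a small Python sketch that pulls the session id, command name and parameters out of a 2.15-style line. The regular expression is my own guess at the format based on the sample output above, not something taken from Selenium, and the function name parse_command is made up.

```python
import json
import re

# Matches lines such as:
# INFO: Executing: [<session id>, findElement {"using":"name","value":"a"}]
EXECUTING = re.compile(r"^INFO: Executing: \[([^,\s]+), (\w+) (\{.*\})\]$")


def parse_command(line):
    """Return (session_id, command_name, params), or None if no match."""
    match = EXECUTING.match(line.strip())
    if match is None:
        return None
    session_id, name, raw_params = match.groups()
    return session_id, name, json.loads(raw_params)


old_style = ('INFO: Executing: [e36c6b06-ab81-4daf-b99a-1af8cbc138cd, '
             'findElement {"using":"name","value":"a"}]')
new_style = "FINE: Executing: findElement"

print(parse_command(old_style))
print(parse_command(new_style))
```

The 2.20-style line has nothing after the command name, so there are no parameters left to recover from it.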


Why? Because the implementation changed. Earlier, this code was used to print the logs in RemoteWebDriver.java:

try {
      log(sessionId, command.toString(), command, When.BEFORE);
      response = executor.execute(command);
      if (response == null) {
        log(sessionId, command.toString (), command, When.AFTER);
        return null;
      }


Now it has changed to this:

try {
      log(sessionId, command.getName(), command, When.BEFORE);
      response = executor.execute(command);
      if (response == null) {
        log(sessionId, command.getName(), command, When.AFTER);
        return null;
      }


So I changed those lines in the source, compiled, and replaced the class in the jar file. Now the logs look like they did in 2.15.0.


Happy Testing.

Thursday, March 8, 2012

Locate anything with jQuery in Selenium

One of the major issues in automating XHTML or Ajax-based pages is locating elements that don't have an id/name. Developers have no need to add them, and testers can't go and enforce an id/name for every element.

The first shortcut testers take is XPath locators. Yes, that is easy and can work in any browser, but the pitfall is fragility: adding a row to the page can make the XPath locator break. This issue can be solved by using jQuery's :has and :contains selectors.

Let us see an example.

See the screenshot below.
Code for this HTML can be found here: http://pastebin.com/6ndmTDST

In that page, there is no id, name or any other CSS attribute to locate by, except the input and a tags.

What is the locator for the first checkbox?

jQuery(':has(input):contains("AA"):last input')


What is the locator for EB in the first row, if EB and 1EA both have the value EB, as in the image below?

jQuery(':has(:contains("EB")):contains("AA"):last :contains("EB"):last')

How does it work?


:has locates all the elements which contain the element we are looking for. Then the first :contains, together with :last, narrows that down to the last td/tr/div/span in which both elements are located. This removes the need to juggle parent, child and adjacent calculations. Once we have the row, it is easy to filter the element we are looking for.
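
The mechanism can be imitated outside the browser to see why :last lands on the row. The sketch below uses Python's xml.etree.ElementTree on a made-up two-row table; it is an emulation of the selector logic, not jQuery itself. The data-demo attributes exist only so the result can be checked, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the page; data-demo is only there to verify the result.
PAGE = """
<table>
  <tr><td><input type="checkbox" data-demo="first"/></td><td>AA</td><td>EB</td></tr>
  <tr><td><input type="checkbox" data-demo="second"/></td><td>BB</td><td>EB</td></tr>
</table>
"""


def has_descendant(element, tag):
    # Element.iter() also yields the element itself, so exclude it.
    return any(child is not element for child in element.iter(tag))


def contains_text(element, text):
    return text in "".join(element.itertext())


def last_has_contains(root, tag, text):
    """Rough equivalent of jQuery(':has(tag):contains("text"):last')."""
    matches = [
        element
        for element in root.iter()  # document order, ancestors first
        if has_descendant(element, tag) and contains_text(element, text)
    ]
    return matches[-1] if matches else None


root = ET.fromstring(PAGE)
row = last_has_contains(root, "input", "AA")
checkbox = row.find(".//input")
print(row.tag, checkbox.get("data-demo"))
```

Both the table and the first row satisfy the two filters, but because ancestors come first in document order, taking the last match picks the innermost container, which is the row.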


You can locate pretty much anything using these two in combination. Next time you think some element can't be located without going for XPath, think again.


Now comes the next question: WebDriver, aka Selenium 2, doesn't support jQuery selectors; it only supports CSS selectors. How can I use these locators in Selenium? Well, you can inject jQuery into the page using the JavaScript executor. Code for that can be found elsewhere, or look at the Selenium/WebDriver users list.
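
As a rough sketch of the idea in Python: Selenium's Python WebDriver offers execute_script(script, *args), so a jQuery selector can be passed in as an argument. The helper name find_by_jquery is made up, the sketch assumes jQuery is already loaded in the page, and FakeDriver is a test double that only records the call so the example runs without a browser.

```python
FIND_WITH_JQUERY = "return jQuery(arguments[0]).get(0);"


def find_by_jquery(driver, selector):
    """Return the first element matched by a jQuery selector.

    Assumes jQuery is already present in the page and that `driver`
    provides execute_script(script, *args).
    """
    return driver.execute_script(FIND_WITH_JQUERY, selector)


class FakeDriver:
    """Test double that records the script instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))
        return "element-for:" + args[0]


driver = FakeDriver()
result = find_by_jquery(driver, ':has(input):contains("AA"):last input')
print(result)
```

Passing the selector as an argument, rather than pasting it into the script text, avoids having to escape the quotes inside :contains("AA").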

Note: You'll notice that there is no jQuery in the HTML code. I used FireQuery.


Happy Automated Testing.
