Saturday, August 10, 2013

Functional HtmlUnit-style testing in Python using Requests (HTTP for Humans)

Testing at the HTTP level with HtmlUnit or python-requests, by issuing server-side requests directly, is a good way to improve the speed of testing. In the Ajax era, testing the web throws up more challenges for automation; a better approach is to test the XHR requests separately rather than trying to cover every combination through browser-based UI automation.
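To make that concrete, here is a minimal sketch of such an XHR-level check in Python with Requests; the URL, parameters, and response fields below are placeholders, not from any real application.

import requests

# Hit the XHR endpoint directly instead of driving the browser.
# The URL and params are placeholders.
reply = requests.get('https://yoururl.com/api/search',
                     params={'q': 'selenium', 'page': 1})

# Basic functional assertions on the server-side response.
assert reply.status_code == 200
data = reply.json()                 # works when the endpoint returns JSON
assert 'results' in data
print 'XHR check passed, got', len(data['results']), 'results'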

First, let us look at how to make a GET/POST request with HtmlUnit.

//Create a WebRequest for POST, use HttpMethod.GET for GET requests.
WebRequest request = new WebRequest(new URL("http://yoururl.com"), HttpMethod.POST);
request.setRequestParameters(new ArrayList<NameValuePair>());

//Add the params using ArrayList
request.getRequestParameters().add(new NameValuePair("param1","value1"));

//Send the request to the server using getPage.
HtmlPage page = webClient.getPage(request);

//Get the response as a String.
String html= page.getWebResponse().getContentAsString();
//Cookies can be accessed using the following code.
CookieManager cookieManager = webClient.getCookieManager();
Set<Cookie> cookies = cookieManager.getCookies();
for (Cookie cookie : cookies) {
     System.out.println(cookie.getName() + " " + cookie.getValue());
}
So, how do we do the same thing in Python using Requests (HTTP for Humans)?

import requests
import sys

#First, create a session using requests.Session
s=requests.Session()
#If you need to print the URL along with the params, enable verbose logging
#(note: the config argument comes from the older 0.x releases of Requests)
s=requests.Session(config={'verbose':sys.stderr})

#Create a dict for the params to be sent
url = 'https://yoururlhere.com'
params={'param1':'value1', 'param2':'value2'}

#POST the params; with params= they are sent in the query string
reply=s.post(url, params=params)

#If you want to send the params in the request body instead of the query string, use data=
reply=s.post(url, data=params)

#GET request using s.get; for a GET the params go in the query string
reply=s.get(url, params=params)

#get the redirected url by accessing .url
print reply.url

#.text (unicode) or .content (raw bytes) give the response body
print reply.text
print reply.content

#Cookies can be accessed via s.cookies
print s.cookies

#convert cookie into dict by dict_from_cookiejar
cookdict=requests.utils.dict_from_cookiejar(s.cookies)
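That dict can then be used directly in test assertions; a small sketch, assuming the application sets a hypothetical cookie named 'sessionid':

# 'sessionid' is a hypothetical cookie name; substitute whatever your app sets.
for name, value in cookdict.items():
    print name, value

assert 'sessionid' in cookdict, 'expected the login to set a session cookie'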

For HtmlUnit, you may need to set the following options depending on your needs; they are self-explanatory.


WebClient webClient = new WebClient();
webClient.setHTMLParserListener(HTMLParserListener.LOG_REPORTER);
webClient.setRedirectEnabled(true);
webClient.setJavaScriptEnabled(true);                                            
webClient.setThrowExceptionOnScriptError(false);            
webClient.setThrowExceptionOnFailingStatusCode(false); 
webClient.setUseInsecureSSL(true);
webClient.setCssEnabled(true);
webClient.setAppletEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());


Thanks to the Requests and HtmlUnit teams for these amazing frameworks, which make testing easier and faster.

Happy Testing.

Wednesday, June 26, 2013

Code for converting Tamil Virtual University's Kambaramayanam into e-book format

The Tamil epic Kambaramayanam is not available in e-book format. However, it is available on the Tamil Virtual University site as HTML pages. A friend of mine, Iyyappan, who wanted to study Kambaramayanam, thought it would be helpful if it were available in an e-book format, preferably for mobile. He asked me to lend a helping hand.

I downloaded the HTML pages from the TVU site using python http-requests. After downloading came the big issue: these HTML pages and their content are not organized, and there is no id or structure to convert the content into the desired format. So initially I tried to convert it, gave up, and just formatted the HTML using BeautifulSoup and combined all the pages. Now the content is in e-book form, but not good for reading.
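A minimal sketch of that download step with Requests, using a placeholder URL pattern rather than the real TVU address:

import requests

# Placeholder URL pattern; the real TVU page addresses are not shown here.
base = 'http://example.com/kamba/page%d.html'

for i in range(1, 796 + 1):
    reply = requests.get(base % i)
    with open('filename%d.html' % i, 'wb') as f:   # matches the filenames used below
        f.write(reply.content)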

Here is the code for removing the extra HTML elements:

from bs4 import BeautifulSoup   # or "from BeautifulSoup import BeautifulSoup" for BeautifulSoup 3

# NOTE: the blog engine has stripped the literal tag text out of these print
# statements; they emitted the opening HTML wrapper for the combined file.
print ''
print ''
print ''

for i in range(1,796+1):
     inputfilename='filename'+str(i)+'.html'
     data=open(inputfilename)   # read the downloaded local copy of the page
     soup = BeautifulSoup(data)
     data=soup.prettify()
     soup = BeautifulSoup(data)
     ti=soup.findAll(attrs={'class':'link'})
     for t in ti:
          t.extract()
     ti=soup.findAll(attrs={'class':'thead'})
     for t in ti:
          t.extract()    
     hr=soup.findAll('hr')
     for h in hr:
          h.extract()
     hr=soup.findAll('form')
     for h in hr:
          h.extract()
     hr=soup.findAll('head')
     for h in hr:
          h.extract()
     hr=soup.findAll('input')
     for h in hr:
          h.extract()    
     print soup.prettify()
# Closing wrapper tags (again, the literal tag text was stripped by the blog).
print ''
print ''
print ''
Now I needed to find a way to format this content. Instead of trying to wrestle with the content inside the HTML tags, I decided to just extract the content, convert it into text, and then process it.

So I removed the remaining unwanted text, such as page numbers and headings, and extracted the plain text using the following code:

inputfilename='file.html'
data=open(inputfilename)   # the combined HTML produced by the previous step
soup = BeautifulSoup(data)
data=soup.prettify()
soup = BeautifulSoup(data)
ti=soup.findAll(attrs={'class':'pno'})
for t in ti:
     t.extract()    
ti=soup.findAll(attrs={'class':'subhead'})
for t in ti:
     t.extract()    
lines=[]
for s in soup(text=True):
     #print s
     s=s.strip().replace('\t','')
     #if s!='.':
     #     print s
     print s

Then I found a pattern in the text to drive the conversion. I used the poem numbers to locate the poems: if the poem numbers start at 3000, the next poem is at 3001, and so on. The explanation of each poem comes right after the poem, so that is easy to find too, and removing the newlines and formatting the poem is straightforward. In hindsight, this could probably have been done from the HTML as well, but I didn't see that pattern in the HTML. I may try that for the remaining chapters.
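Before the full script below, a tiny illustration of the poem-number check described above (the numbers are just examples, and the helper name is mine, not from the original script):

import re

def looks_like_poem_number(line, expected):
    # A poem line starts with digits (e.g. "3000.") and contains the
    # running number we are currently expecting.
    stripped = line.replace('.', '')
    return bool(re.match(r'\d', stripped)) and str(expected) in stripped

print looks_like_poem_number('3000.', 3000)        # True
print looks_like_poem_number('some heading', 3000) # False
print looks_like_poem_number('3001.', 3000)        # False, that is the next poem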

Find the code below,

import re

A=open('file.txt')
E=A.readlines()
A.close()

startcount=6060
emptyline=0
state=True
value=True
linevalue=0
i=0
isurai=True
for line in range(0,len(E)):
     #print E[line].strip()
     p = re.compile(r'\d')
     if p.match(E[line].replace('.','')) and E[line].replace('.','').find(str(startcount))!=-1:
          print E[line],
          #print E[line+1],
          #print E[line+2],
          if not isurai:
               break
          for i in range(1,10,2):
               if E[line+1+i]=='\n':
                    break
               if E[line+2+i].startswith('    '):
                    #print line,'a',i,i+1
                    print E[line+1+i].replace('    ','').replace('\n','')+E[line+2+i].replace('    ',''),
               else:
                    #print line,i,i+1
                    print E[line+i+1].replace('    ','').replace('\n','')
                    print E[line+i+2].replace('    ','').replace('\n','')
          if i<4:
               break
          #print 'iiiiiiiiii',i
          startcount=startcount+1
          isurai=False
          print '\n',
     else:
          printline=''
          printline1=''
          if i==9 or i==5:
               isurai=True
               for j in range(i,150):
                    if p.match(E[line+j].replace('.','')) and E[line+j].replace('.','').find(str(startcount))!=-1:
                         break
                    #if p.match(E[line+j]):
                    #     printline1=E[line+j+1].replace(' ','')
                    #else:    
                    printline=printline+' '+E[line+j].replace('\n',' ').replace(' ','')
               print printline.strip(),'\n\n',
               #print printline1.strip(),
               i=0

After this, I needed to manually check for missing poems or content, and also check the poem numbers. Because the source was manually typed and not proofread, the poem numbers are not always in order; I had to fix them by hand and re-run the script every time.

Another piece of manual work was separating the sub-headings from the explanations. Below is the code to convert the text back to HTML, along with a table of contents.

import re

A=open('file.txt')
E=A.readlines()
A.close()
startcount=4740

# NOTE: the blog engine has stripped the literal HTML markup out of the print
# statements in this script, so only the structure is preserved here; the
# comments note where the stripped tags went.
print """..."""                                             # HTML header (markup stripped)
co=[]
for e in range(0,len(E)):
     if E[e].find('\t')!=-1 and E[e].find('(')==-1:
          if E[e].find('.')!=-1:
               # Numbered heading: print it (plus the line two below it)
               # wrapped in heading markup, and remember an anchor for the TOC.
               print E[e]                                   # heading markup stripped
               print E[e+2]
               cc=''+E[e].strip()+''                        # anchor markup stripped
               co.append(cc)
               continue
          else:
               # Sub-heading: the stripped anchor name contained 'padalam_sub_',
               # which the TOC loop below uses to tell the entry types apart.
               print E[e]                                   # sub-heading markup stripped
               cc=''+E[e].strip()+''                        # anchor markup stripped
               co.append(cc)
               continue
     else:
          p = re.compile(r'\d')
          if p.match(E[e].replace('.','')) and E[e].replace('.','').find(str(startcount))!=-1:
               # Poem block: print the poem number, the poem lines up to the
               # next blank line, and then the explanation that follows.
               print E[e]                                   # poem-number markup stripped
               for i in range(1,20):
                    if E[e+i]!='\n':
                         print E[e+i]                       # poem-line markup stripped
                    else:
                         break
               if E[e+i+1].find('\t')==-1 and len(E[e+i+1])<250:
                    print len(E[e+i+1])
                    break
               print E[e+i+1]                               # explanation markup stripped
               i=0
               startcount=startcount+1
print """..."""                                             # markup between the body and the TOC (stripped)
# Emit the table of contents collected in co.
for cc in co:
     if len(cc)>28:
          if cc.find('padalam_sub_')!=-1:
               print '      ',cc                            # sub-heading TOC entry (markup stripped)
          else:
               print cc                                     # chapter TOC entry (markup stripped)
print """..."""                                             # HTML footer (markup stripped)

The final piece of manual work is to move the TOC from the bottom to the top of the file. Just cut and paste. All done.

Python libraries used:

Requests: HTTP for Humans


Beautiful Soup


Find the formatted HTML in my public Dropbox folder:

https://www.dropbox.com/sh/yy9lq619z299fp5/zqZuKIF74H

Download the files whose names contain _formatted:

https://www.dropbox.com/sh/yy9lq619z299fp5/dAuP0o-KGF/arayanya_formatted.html
https://www.dropbox.com/sh/yy9lq619z299fp5/RkzrW3EiE3/ayodhya_formatted.html
https://www.dropbox.com/sh/yy9lq619z299fp5/hzZvSIdNzn/kikinada_formatted.html
https://www.dropbox.com/sh/yy9lq619z299fp5/aD_91lnW90/sundra_formatted.html


Of the seven chapters, only four are converted so far. The remaining three still need to be converted; I hope to finish that work in the coming week.

Thursday, June 6, 2013

Automation vs Product Code - What is the difference?

What are you doing? Do you write code?

That is one of the questions faced by testers who write automation code or scripts. What is the difference?

Well, a simple explanation would be that automation is interaction with the browser, but that doesn't help much, because most of the things a tester uses are based on interaction between two systems. So what is the difference? Why is it so hard to keep automation code running?

Automation code needs 100% code coverage. In other words, every line has to work, or the flow of the test script breaks, and when that happens the automation tests become brittle. In product code, by comparison, coverage is less than 100%; you hear figures like 30%, 40%, or 80%, but it is always below 100%.

When the traditional method of coding is applied to automation, the result is an endless cycle of writing and rewriting automation scripts. Automation testers end up in a state where they work hard, then work harder, yet no useful output is produced, which leads to the conclusion that automation is not going to work.

Another difference is how automation handles changes in the UI. When the UI changes, that particular change needs to be handled in the script, otherwise the test fails. The question is: how do product teams handle this kind of change? If we analyze that, it becomes clear that the traditional model of structuring code should be applied here too, by separating the UI handles from the script flow (clue: MVC).

So we end up applying the traditional model where we shouldn't, and not applying it where we should: we need a structure that can be modified easily without causing too much trouble.

The Tirukkural, one of the oldest books of law, says (my translation):

Doing what shouldn't be done brings bad results
Not doing what should be done brings bad results

People have tried to solve this dilemma using methods such as BDD, or acceptance-testing tools such as FIT or Selenese.

Even though BDD itself is mainly used for product code, it fits easily into automation testing. Instead of writing scripts or lines of code, automation testers write the flow, which is then executed via a framework or by other means. That flow can be understood by developers, marketing people, and so on.

Apart from using BDD, separating the code from the UI selectors can make the tests more stable. One can go a step further and separate the data from the code too, so the test flow, the test data, and the UI element selectors each live in separate places, making it much easier to write tests and to find out why tests are brittle. This is needed because BDD alone can't make life easier for testers. A rough sketch of this separation follows below.
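As an illustration only (the element ids, URL, and test data here are made up, and this is not tied to any particular framework), a minimal Python WebDriver sketch of keeping the selectors, the test data, and the flow in separate places:

from selenium import webdriver

# UI selectors live in one place (a page object)...
class LoginPage(object):
    USERNAME_ID = 'username'        # made-up element ids
    PASSWORD_ID = 'password'
    SUBMIT_ID = 'login-button'

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        self.driver.find_element_by_id(self.USERNAME_ID).send_keys(username)
        self.driver.find_element_by_id(self.PASSWORD_ID).send_keys(password)
        self.driver.find_element_by_id(self.SUBMIT_ID).click()

# ...the test data in another...
TEST_USERS = [('alice', 'secret'), ('bob', 'hunter2')]

# ...and the test flow only talks to the page object.
driver = webdriver.Firefox()
for user, password in TEST_USERS:
    driver.get('http://yoururl.com/login')   # placeholder URL
    LoginPage(driver).login(user, password)
driver.quit()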

Monday, March 18, 2013

Learning Selenium/WebDriver

Recently, I have been facing one question a lot of times.

How do I learn Selenium/Webdriver?

That question is asked by my friends, friends of friends, and co-workers who want to learn Selenium but don't know where to start. I tell them it is easy: go to youtube.com, search for WebDriver talks from Google Tech Talks, the Google Test Automation Conference, Selenium Conf, etc., and you will find videos and presentations by the creators of Selenium.

After repeating this a lot of times, I thought, OK, let me put those video links somewhere and send that to them, because many of them don't know who created WebDriver or who its committers are. So here are the links. I will keep updating this whenever I find time or new material.

First, Selenium IDE from SeleniumConf

GTAC 2011: WebDriver

Automating Your Browser Based Testing Using WebDriver

Se Builder

Web performance testing using Webdriver


Apart from those sources, you can also look at these places to find Selenium-related material.

Selenium Official HQ http://seleniumhq.org/
Selenium Blog http://seleniumhq.wordpress.com/
