I thought I would post a recent conversation I had about analysing testing metrics. The Ross Report collates software testing benchmarks for Australian organisations. My discussion with Michael De Robertis, Test Manager at ISoft, highlighted a mutual interest in testing metrics and the deeper questions that arise as you start to peel back the layers.
Hi Kelvin.
Thanks a lot for your detailed response.
Yes, please feel free to post this on your blog, and you’re welcome to include my name/title/company.
From my personal perspective, it is a breath of fresh air to have a good, constructive dialogue with like-minded people who have a deep understanding of Testing and who want to challenge the boundaries of what it all means.
One of the things I love about testing is that despite having 15 uninterrupted years in the profession I still have so much to learn, and good philosophical discussions about Testing still inspire me.
Whilst I can’t speak for regions outside Canberra (and, to a lesser degree, Sydney, where I used to live), I am deeply concerned about the general quality of Testing professionals out there, and this is perhaps my biggest driver for having a good, mature Benchmarking process, which I refine and improve religiously.
I am seeing all types of Testers out there: those who are purely politically driven and insist on every “i” in a requirement being dotted; those who don’t understand business priorities or how to manage risk or plan; those who only want to break systems and drive wedges between testing and development; those who lose motivation quickly; and those who don’t understand why solid process needs to accompany what we do.
I am seeing these qualities in “certified” testers (both new and experienced) and in graduate IT professionals, many of whom have not studied a single subject on testing. In the final year of my Electronic Engineering degree, testing was a big deal. I even majored in subjects on testing (through what was then called the Australian Centre for Test and Evaluation, ACTE). Unfortunately, however, I don’t see a passion for testing driving the Test industry as it did when I began my career. Instead, it seems to be purely market driven (i.e. we need more testers, even if they are no good at it).
Apologies for the drawn-out background. In my view, this is where publications such as the Ross Report add real value, because the Testing industry needs proper guidance through skills validation programmes complemented by independent evaluations (benchmarks) of how well the industry performs.
Please see below, where I have responded to your follow-up questions.
Thanks again.
Regards,
Hi Michael,
Thanks for your comments and feedback on the Ross Report. I really appreciate your email, as it gives us valuable feedback on the way the industry benchmark is being used and what improvements and additional information users are seeking.
First of all, apologies for not responding sooner. It has been on my long to-do list for a while, and I was waiting until I had time to give it some real attention.
I will add some responses inline with your feedback below.
Also, would it be possible for me to post your comments and my responses on my blog? I found this discussion quite interesting, and I would like to share it with others.
Cheers
Kelvin
Hi there.
Can you please forward this email to someone who is familiar with the content of the Ross Report?
I found the report very comprehensive and particularly useful for comparing against our own benchmark… love the metrics! :)
I must say I am a numbers person myself. I love looking at what the data shows and using it to gain a deeper understanding of the testing process. I also hold the view that "without evidence, you are just another person with an opinion".
What I did find, though, was that the more you look at the results, the more questions open up. You tend to ask: how did the number emerge, what does it mean, what influences it, how do I improve it... And I think that is the point you make below. Certainly, after scratching the surface, I want to know more.
Getting the balance right in the survey is hard. To understand things better I want to ask more questions, but the trade-off is that participation will fall off if respondents find the survey too onerous. What we probably need to do is periodically run other, more detailed surveys that drill down, so long as participants don't then suffer "survey fatigue"! Anyhow, building the "Goldilocks" survey that is just right is a challenge.
Compiling the Ross Report turned into a much larger job than I anticipated. We spent well over 500 hours on it, with multiple people involved. It was a big job, but next time we want to go deeper and wider (more respondents).
We are just about to kick off the 2012 Ross Report, so your feedback is very timely.
In my personal view, there were three areas that I was hoping someone could comment on further (purely for the benefit of comparing philosophies of benchmarking).
1) More proof of Automation Tool effectiveness – The report rightly separates automation and performance testing from manual testing; however, I couldn’t find measurements that specifically reported on whether automation testing (for example) improved regression fail rates or resulted in bugs being found. All I could see was that organisations were happy with their choice, but not the supporting metrics showing why. I also couldn’t find metrics on the downtime needed for training, setup, configuration, execution, and maintenance of those tools.
Very good point. I think the industry is still uncertain about the real benefits it hopes to obtain from automation. Few organisations measure their effort, tests automated/executed, defects found, etc. The economics behind automation ROI expectations seem to be poorly understood, as do the metrics needed to demonstrate them.
There are open questions around automation metrics, for instance whether we should measure the reduction in manual testing. We often count the execution savings from automated tests that would perhaps never have been chosen for execution if they were manual. We probably still expend a lot of effort running tests that shouldn't be continued, as they are not providing much insight.
There were some metrics on coverage and regression re-execution specific to automation, and these revealed some surprising insights: many organisations are poor at re-executing automated tests. Obviously this requires more in-depth follow-up! There were also some metrics in relation to
Your comment about defect discovery rates from automation is also pertinent. I think this is a very important metric, and it could perhaps be combined with effort to derive defects found per hour of automation. On page 49 you should see that we report only 6% of regression tests failing per test cycle. Regression tests have a lower defect yield than new feature or defect correction tests. I would expect this to apply to automation for many organisations too, as in most organisations automation is focussed on regression testing.
I think your suggestion to extend the defect detection and testing effort metrics from general testing to look specifically at automation is a good one, and one we will look at.
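As a rough illustration of the defects-per-hour idea above (a minimal sketch, not a metric the Ross Report currently collects), automation results could be recorded per regression cycle and rolled up like this. The `TestCycle` record, its field names, and all of the numbers are hypothetical; effort is assumed to cover setup, execution, maintenance and failure analysis.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TestCycle:
    """Hypothetical record of one automated regression cycle."""
    tests_executed: int
    tests_failed: int
    defects_found: int   # distinct defects raised, not failing tests
    effort_hours: float  # assumed: setup + execution + maintenance + analysis


def fail_rate(cycle: TestCycle) -> float:
    """Fraction of executed tests that failed in one cycle."""
    return cycle.tests_failed / cycle.tests_executed


def defects_per_hour(cycles: list[TestCycle]) -> float:
    """Total defects found divided by total automation effort."""
    total_hours = sum(c.effort_hours for c in cycles)
    if total_hours <= 0:
        raise ValueError("total effort must be positive")
    return sum(c.defects_found for c in cycles) / total_hours


# Hypothetical figures for two regression cycles
cycles = [
    TestCycle(tests_executed=200, tests_failed=12, defects_found=3, effort_hours=10.0),
    TestCycle(tests_executed=200, tests_failed=10, defects_found=2, effort_hours=10.0),
]
print([fail_rate(c) for c in cycles])  # [0.06, 0.05]
print(defects_per_hour(cycles))        # 0.25
```

Keeping defects found separate from failed tests matters here, since several failing tests can trace back to one defect, and counting only execution time would flatter automation by hiding its maintenance cost.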
2) Skills validation as opposed to Skills Certification – There is still a view amongst many testing professionals that existing certification programs add no value other than to say that you have the certification. Although I have certification, I lean towards this view because I personally feel that certification could do more to prove that testing professionals have worked the hours, gained the experience, delivered the services, and been endorsed by former and current peers. Hence I think the report would add value if it considered skills qualification and validation rather than certification alone.
I agree, certification falls short in a number of areas in terms of skills validation. See http://kelvinross.blogspot.com/2010/10/certification-is-failing-do-we-need.html and http://dorothygraham.blogspot.com/2011/02/part-3-certification-schemes-do-not.html.
I think that to go beyond the certifications organisations' staff are attaining, we would need some measurement of the competencies of the staff. I am just not sure how we could measure this from an organisational perspective. It is pretty hard to assess the competency of individuals; as for doing it in a meaningful way for organisations, hmmm, I need to think about that a bit more.
We tried to assess where staff are sourced from in terms of career entry (e.g. from the business, development, etc.) and the impact of staff attributes (e.g. tools, techniques, theory, etc.).
In other work we have undertaken skills assessments, including of our own consulting staff, scoring skills across different skill categories and trying to roll those scores up into an organisational profile score to target improvements. I say trying, because getting a precise skill profile was a challenge. While this profile is important to the individual organisation, I am not sure how an industry average of it would be beneficial, nor how we would measure individual organisations in a consistent way.
I also think that if we took an organisational norm, it would probably look similar to surveying individuals about their skills and competencies across a large sample. It is also possibly easier to do it at an individual level. What it might not show us is which skill areas are in most demand and where the most significant shortfalls are.
That being said, what you say about skills validation is definitely correct. Could you perhaps give me some more suggestions on what you would like to see here from a benchmarking point of view?
You are right, I’m not sure how you would present this in benchmark/survey form, and I’m not sure how respondents would react. However, looking at this as a real day-to-day issue, my problem is that I cannot be confident that a Tester is fit for the job on the basis of certification alone. I recall the early 90s when Cisco Certification was a big deal. Now it isn’t, and everyone has it. In my view Testing is heading down the same path, and therefore I rely on my own benchmarking to substitute “validated skills” for “certification”.
I think your report almost has the structure in place to do this in the Perceptions section. For example, instead of only a Test Manager answering a question about their perception of something, why not have a section where a number of disciplines are required to answer (i.e. Developers, Dev Managers, Project Managers, BAs, etc.)? I obtain this feedback from my peers at least once per year. Perhaps respondents might not do so if their team culture is not solid, I don’t know.
In terms of actual certification, I think something like CBAP (for business analysis / systems engineering) would be a good model.
3) Respondent Profile & the influence of their Methodology on the report – I couldn’t relate to the average team size being up to 25. In my 15-year testing career spanning small and large organisations, I can recall only two occasions where test teams were this big (let alone the whole project team), and even then they were subdivided and very independent of each other. I was therefore slightly concerned that the report leaned towards the perspective of more strictly controlled, less progressive teams running older methodologies. In fact, your Methodology section seems to confirm this, given that Waterfall appeared to be one of the most prominent methodologies used.
Team sizes vary dramatically: some organisations have 1-5 members, the norm is around 11-25, and several have 100+. Note, though, that team size refers to the organisation's overall testing team size, not the testing team size for a single project. Some of the 100+ organisations I know have over 500 testers, and some have 700-750.
When we looked at the Project Landscape, it was common to have 10-20 projects per annum. A 6-month project duration was also common. This paints a rough picture that, on average, a project probably has 2 or 3 testers. Larger projects with longer durations may have perhaps 5-10. We didn't specifically ask about the average number of testers on a project, though. As you point out, that could change a lot based on methodology, architecture, team composition, etc.
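For what it's worth, the rough 2-3 testers figure can be reproduced with a back-of-the-envelope calculation. This is a sketch under loudly stated assumptions, not survey data: it assumes projects are spread evenly through the year and each tester works on one project at a time, and the team size and project count combinations below are hypothetical picks from the ranges mentioned above (team norm 11-25, 10-20 projects per annum, 6-month duration).

```python
def avg_testers_per_project(team_size: int, projects_per_year: int,
                            duration_months: float) -> float:
    """Rough average testers per project.

    Assumes (hypothetically) that projects are spread evenly through the
    year and every tester is assigned to exactly one running project.
    """
    # Average number of projects running at any one time
    concurrent_projects = projects_per_year * duration_months / 12
    return team_size / concurrent_projects


# Hypothetical (team size, projects per year) pairs, 6-month projects
for team, projects in [(11, 10), (15, 15), (25, 20)]:
    print(team, projects, f"{avg_testers_per_project(team, projects, 6):.1f}")
# prints:
# 11 10 2.2
# 15 15 2.0
# 25 20 2.5
```

The key step is converting projects per year into projects running concurrently; a team of 15 spread across 7.5 concurrent projects averages 2 testers each, which is where the rough picture comes from.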
I think the survey does have a bias towards "testing aware" companies, as we are less likely to have contact with "testing ignorant" companies. Also, as you point out, it is likely biased towards larger organisations. I think that in benchmarking yourself, you need to think in terms of benchmarking against the testing industry, which wouldn't include "testing ignorant/weak" organisations.
Waterfall is still the most common, but Agile adoption has been increasing dramatically in recent times. Waterfall is still prevalent in large teams/organisations; however, we are finding that understanding is improving and the barriers to Agile adoption are coming down. What I still see in organisations is a misconception of what constitutes "Agile", which is why we tried to drill down into some of the Agile practices. We need to go further again in the next survey.
Some other areas that didn’t appear to be covered which might add value in my opinion are:
1) Exploring the skill sets of testing professionals more deeply. Examples:
a. Are they more closely aligned with the business side (scope & analysis) or with the development/technical side (design, tools, infrastructure, etc.)?
b. What classifies an Automation or a Performance Test Specialist? Is it a Tester who can capture/playback and make basic adjustments to automation scripts, or are they at the other end of the spectrum and are fully competent in reading/writing in scripting and programming languages?
I think this relates to your point 2 above.
2) More detail about the value added by test professionals (in addition to perceptions of testing). Examples:
a. How quickly can they adapt to the job at hand?
b. What is their contribution to process improvements, finding important bugs, and identifying new features?
I think we should be able to look at some of these in future perception questions. We just need to phrase them as perception questions that produce qualified results, and where the benchmark is likely to provide comparable, actionable results that organisations can measure their own position against.
As discussed earlier, perhaps we need to split out a survey for individuals to get their bottom-up input into the testing approach, alongside the top-down perception from the organisational view.
3) More information about how Test Managers estimate. Example:
a. If a Test Manager has a wealth of metrics at their disposal, what are the best ones to use when estimating for future projects, and what are their reasons?
Michael, is this something that you wanted surveyed, or something that needed some further discussion?
In my experience, people tend to use rules of thumb at the early stage, supported by % of project budget and developer/tester ratios (two metrics we did survey). That is when they vary the budget at all; for many it is simply what they had last time. As more detail comes in, they tend to move towards a work breakdown, and that's when overheads such as test management, test environments, etc. must be factored in (again, the distribution of test effort metrics provides some guidance here). I guess we could survey whether people use other techniques such as function points (or test points, etc.) and other estimation approaches.
If you think we should survey this further, some examples of what you would like to see would help us.
Good question. I probably have to think about this one in more detail, but basically, I am constantly challenging myself to get the perfect estimate, in spite of scope creep, changing priorities, higher-than-expected bug rates, etc. My view is that these ‘unplanned’ events should always be expected and therefore planned for. However, this should be done with a proven technique rather than simply doubling the time required (for example). I have detailed methods which I use and which consider these events, but I’m not sure whether others do likewise, particularly in organisations where they have to stick to a fixed number of deliverables or timeframes (even if quality is not up to scratch). In my group, we react according to the need, and therefore the estimation process needs to have lots of options. Stressful? Yes. Good end result? Also yes!
Hence I’m after guidance which tells me that my estimation methods are good or not good (etc). From a mathematical perspective, I would think this is achievable.
4) Post Deployment and measurement of success - How accurately was testing implemented against the plan, what criteria were used (and by whom) to determine whether the project was a success, and do these criteria ever change depending on the priorities? Example:
a. If the test plan could not be implemented accurately (due to other project issues outside our control) AND the project ran late (due to a larger number of bugs needing to be fixed) AND the client was very happy with the end result, is this considered a failure or a success? In my industry (safety critical), I would be happy with this result.
Yes, your point here is quite right. Completing a project on schedule is not the only measure of success. Finding defects late in a project is always a huge frustration, and I have experienced first-hand projects steamrolling into production despite the testing evidence, in some cases costing hundreds of millions of dollars. So success could even be a project cancelled because of test evidence.
The Standish study doesn't reflect that point. I think IT portfolio management should be like share portfolio management: some investments will not achieve the expected return, and it is better to abandon them than to stick with them.
I guess we (as an industry) need to think about measuring success more effectively. Quality ultimately needs to include the end user/customer perception. While in testing we tend to focus on pre-deployment metrics, it is the post-deployment metrics that really count! It is as if we are measuring the inputs to quality, when we need to focus more on the outputs of quality.
Here we could look at specific issues, like you suggest above. It might be worth understanding:
- for projects delayed by testing, did post-deployment quality improve? I.e. are testing results beneficial in influencing project schedules?
- do project plans effectively use testing results to measure plan milestone completion?
- and so on
This is also something for our benchmark team to focus on for 2012, and an area where your suggestions are very welcome.