Research Methods Supercourse

Publish Soon, Publish Often, a Guide to Scientific Publications


Ask a question: "BA Supercourse Help Desk"

Answer a question: "BA Supercourse Help Desk". Please include the particular question in your e-mail together with your answer to it. Thank you for your answer! The direct e-mail address for questions and answers is



Expert Answers

2017 Questions    
Q26 by Madzime
August 27, 2017
I need to validate a questionnaire for its eventual establishment as a research instrument in the field. What steps do I need to take to validate it?

A26 by Nicolas Padilla, August 28, 2017

It depends on the type of answers. For scaled answers, you may use Cronbach's alpha, which requires a single administration.
If the answers are binary, use Cohen's kappa. The questionnaire is applied three times to the same person: twice by the same researcher and a third time by another researcher. With this, we measure intra- and inter-observer repeatability.
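As a rough illustration of the two statistics mentioned, here is a minimal pure-Python sketch (not taken from the answer). It assumes scaled answers are stored as one list of scores per item, with respondents in the same order in every list, and that the two administrations being compared are two equal-length lists of binary answers. The function names and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Internal consistency of a scale.

    items: one list per questionnaire item, each holding the scores of the
    same respondents in the same order (needs >= 2 items and >= 2 respondents).
    """
    k = len(items)
    n_respondents = len(items[0])
    # Total score of each respondent across all items.
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two administrations (or two observers)."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each administration's marginal proportions.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    # Undefined (division by zero) if both always give the same single answer.
    return (observed - expected) / (1 - expected)


# Hypothetical data: 3 items answered by 5 respondents on a 1-5 scale.
items = [[4, 5, 3, 2, 4], [4, 4, 3, 2, 5], [5, 5, 2, 1, 4]]
print(round(cronbach_alpha(items), 2))  # 0.92

# Same 8 respondents, first and second administration (1 = yes, 0 = no).
first = [1, 1, 0, 0, 1, 0, 1, 1]
second = [1, 1, 0, 1, 1, 0, 1, 0]
print(round(cohen_kappa(first, second), 2))  # 0.47
```

Kappa compares two sets of answers at a time, so with three administrations you would compare the two by the same researcher (intra-observer) and then one of them with the other researcher's (inter-observer).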

A26 by Faina Linkov, August 28, 2017
These two links will be helpful:
Q25 by Hani
August 10, 2017
What are the situations where it would be valid or recommended to conduct a survey using a mixture of paper-based and online versions? How would the response rate be calculated in such a case?

A25 by Nicolas Padilla, August 11, 2017

The questionnaire type depends on the nature of the questions. I prefer online for sensitive questions and paper for other questions. It also depends on financial resources.
2016 Questions    
Q24 by Avalli, November 2, 2016 I need to develop village maps of diabetes risk factors. The risk factors will be assessed through a household survey of individuals. To develop GIS maps of risk factors based on the survey, what should the household sample size be in the villages? Will a random selection of 25% of households in each village suffice to develop maps classifying risk as high, moderate and low?

A24 by Nicolas Padilla, August 11, 2017

It depends on the sample size. The sample can be drawn by clusters (conglomerates); in that case, add 50% to the sample size to keep the precision and power, because of the design effect. A complex sampling scheme needs a larger sample.
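A minimal sketch of the arithmetic, assuming the usual formula for estimating a proportion under simple random sampling, n = z^2 p (1 - p) / d^2, and reading "add 50%" as a design effect of 1.5. The function name and inputs are illustrative only; the answer itself does not give a formula.

```python
import math


def sample_size_proportion(p, precision, z=1.96, design_effect=1.0):
    """Sample size to estimate a proportion p within +/- precision.

    design_effect=1.5 corresponds to "add 50%" for a cluster sample.
    """
    n_srs = z ** 2 * p * (1 - p) / precision ** 2  # simple random sampling
    return math.ceil(n_srs * design_effect)


# Hypothetical inputs: expected prevalence 50%, precision +/- 5%.
print(sample_size_proportion(0.5, 0.05))                     # 385
print(sample_size_proportion(0.5, 0.05, design_effect=1.5))  # 577
```

If a separate estimate is wanted for each village, this number applies per village, so whether a fixed 25% of households is enough depends on how many households each village has.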
Q23 by Teresa, November 1, 2016 What is the quickest and best form of recruitment for a qualitative study, which requires 12 face-to-face interviews?

A23 by Nicolas Padilla, November 01, 2016

It depends on the study design.
In qualitative research, all selection of participants is non-probabilistic (non-random).
Q22 by Karen, March 30, 2016 I am required to conduct an information needs analysis to determine the needs of a school community (the majority of the children come from a poor background). Which data collection tools and sampling method or methods can be used?

A22 by Nicolas Padilla, March 31, 2016
To determine the needs of a school, a cross-sectional design is needed. The sampling method can be random selection (systematic, or simple random using random numbers). The tools to collect information depend on which needs you think are important; they may concern the nutrition of school-age children, obesity or health care.
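The two random selection methods mentioned can be sketched as follows, assuming a list (sampling frame) of pupils is available. The roster, the sample size and the function names are hypothetical.

```python
import random


def simple_random_sample(roster, n, seed=None):
    """Draw n distinct pupils at random (the random-number method)."""
    rng = random.Random(seed)
    return rng.sample(roster, n)


def systematic_sample(roster, n, seed=None):
    """Take every k-th pupil after a random start, with k = len(roster) // n."""
    rng = random.Random(seed)
    k = len(roster) // n      # sampling interval (assumes n <= len(roster))
    start = rng.randrange(k)  # random start between 0 and k - 1
    return [roster[start + i * k] for i in range(n)]


# Hypothetical school roster of 500 pupil IDs; sample 50 of them.
roster = [f"pupil-{i:03d}" for i in range(1, 501)]
print(simple_random_sample(roster, 50, seed=1)[:5])
print(systematic_sample(roster, 50, seed=1)[:5])
```

The systematic version is convenient when pupils are listed class by class, because it spreads the sample evenly through the list.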

by Ronald LaPorte, March 31, 2016
I wanted to add a few more pragmatic comments. It is always best to start with an existing survey that has been used frequently and is directly comparable with other centers. I typically recommend a survey from WHO, FAO, NHANES, MONICA, etc. If you want to add additional questions, that is fine. One reason for this is that articles are often turned down if you say that you used the "Buckle" survey that you developed to measure smoking, in contrast to the WHO smoking survey used in 50 countries and with 10,000 people. You can easily search to find surveys that have stood the test of time. This also eliminates the need to do reliability testing, as you do not need to test a survey that has been used 20,000 times. I would expect such surveys to have been used in your country and others. Do a Google Scholar search, identify some of the authors, and simply write to them to get their advice. When I was starting out I did this all the time.

I would write a MOO (methods of operation). This describes exactly what you are planning to do, the hypotheses you plan to test, the data you will collect, and how you plan to analyze your data. It can be changed as the project evolves, but it serves as a template. As you plan this survey, you will find that various people will come to you and say "why don't you collect this?", "what about this?", "what about a sample of adults?", etc. You have to be very careful not to allow project drift. Project drift is when you start out with a project designed for one set of objectives, and then other objectives are piled on. Having a MOO helps to prevent this.

I always think that it is important to pilot test before going into a much broader survey. Even by testing it on 40 students, you can see the problems you will run into with the much larger sample, and you can modify your protocol. I always believe the shorter the survey, the better. I personally like one-page surveys if you can do this.

Once you have a MOO written, you can have scientists external to your center review it. As you can see, experienced researchers consider it very important to iron out issues before going into the field. This eliminates many problems.
2015 Questions    
Q21 by Aliaa, November 17, 2015 What are the best statistical analysis methods for evaluating experiments with a small sample size of experimental animals? I am working on a chronic experiment in which I am facing a high mortality rate. I started with a large number of rats, but unfortunately the number of animals surviving to the end was small. I hope you can guide me to the most appropriate statistical method for analyzing my work.

A21 by Nicolas Padilla, November 24, 2015
Your statistical analysis depends on what you want to measure.
It also depends on the type of variables you are measuring.

Survival? Kaplan-Meier curve.

Do you have two groups, exposed and non-exposed, and are your variables quantitative? Student's t-test for two independent means.

Are your variables categorical, and do you have two groups with follow-up? Risk ratios.

Are your variables categorical, and do you have two groups without follow-up? Chi-squared test and odds ratio.

If you have a small sample size (for example, fewer than 50) or your quantitative variables are not normally distributed, you can use the Wilcoxon rank-sum (Mann-Whitney) test.
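Two of the suggestions above need no special software and can be sketched in pure Python: the Kaplan-Meier estimate (animals still alive at the end of follow-up are treated as censored) and the risk and odds ratios from a 2x2 table. The table layout (a, b, c, d), the function names and the data are assumptions for illustration; the t-test, chi-squared and Wilcoxon tests are better run in a statistics package.

```python
def kaplan_meier(times, died):
    """Kaplan-Meier survival estimate.

    times: follow-up time of each animal; died: True if the animal died at
    that time, False if it was censored (still alive when follow-up ended).
    Returns (time, survival probability) pairs at each time with a death.
    """
    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all animals with the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


def risk_ratio(a, b, c, d):
    """2x2 table with follow-up: exposed a events / b none, unexposed c / d."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Same 2x2 layout; used when there is no follow-up (e.g. case-control)."""
    return (a * d) / (b * c)


# Hypothetical data: weeks of follow-up for 6 rats (False = censored).
weeks = [2, 3, 3, 5, 8, 8]
died = [True, True, False, True, False, False]
print(kaplan_meier(weeks, died))
print(risk_ratio(10, 10, 5, 15), odds_ratio(10, 10, 5, 15))  # 2.0 3.0
```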
Q20 by Afiamaa
March 28, 2015
What are methodological assumptions?

A20 by Faina Linkov, March 28, 2015
That's a tough question as different types of study methodologies generally have different underlying assumptions.
Overall, most investigators would make assumptions about the underlying distribution of the data, study heterogeneity, and the characteristics of the targeted population.
This is a good article describing case-control and cohort studies.
You may comment to clarify the question.
Q19 by Aliaa, February 11, 2015 What are the best conditions and the best prophylactic therapy to protect experimental rats in a long-term (2-month) or chronic diabetes study using streptozotocin (STZ)? Is it suitable to use antimicrobial therapy as a prophylactic during such experiments? Is there any reference that addresses this issue?

A19 by Ronald LaPorte, February 12, 2015
Thank you very much for your question. I am a diabetes researcher, but sadly I do not work with animal models. When I have an important question like this, I search Google Scholar for others who have published in the area. Then select 5-10 of them and write to them posing your question. Typically a few will respond, as scientists like to help colleagues. You can then continue to ask that person questions. If no one responds, write to another 10. You will find someone who can help. Good luck!
2014 Questions    
Q18 by Joel Samson Ruvugo, March 31, 2014 I need support on how to publish a literature review and what procedures to follow.

A18 by Ronald LaPorte, March 31, 2014
Writing a literature review can be a daunting task. Luckily, in most areas where one would need to write a literature review, you can find examples. Search first on your topic in Google Scholar, and in Google itself. From these you can identify review articles that provide information. It is best to do a systematic review. Do a search on systematic reviews and this will provide guidelines. You might also consider a meta-analysis. We have a wonderful lecture on meta-analysis in the Supercourse.
I personally like to do reviews with tables that describe the literature. For example, a table titled "Studies examining the relationship of physical activity (PA) to bone density (BD)" could have these columns:
Author | Year | Population | Type of study | Relationship between PA and BD | Conclusions | Comment
With tables like this, one can immediately see an overview of the area. The text would then describe the area.
If you have not written a review before, it is good to find a mentor. You can find mentors at your university, and you can also find people who have published in the area and ask them to mentor you. In general, it will be difficult to get a full professor to help. However, if you find an assistant professor who wants to help, contact them.
I would also search YouTube for "how to write a review". I have found that many of the points presented on YouTube are very good.
The end of the review will typically describe the types of research that can be done in the future.
It is best initially to do reviews in areas that are specific to where you live. For example, a review on the epidemiology of sand pneumonia would be of interest to many people in and out of Saudi Arabia.
After your article is written, search around to find experts who could review it.
Q17 by Naresh T Chauhan, March 31, 2014 I am helping one of my students conduct a study on factors delaying the diagnosis of breast cancer in a tertiary care center. Kindly guide me on how to proceed. After reviewing the literature, I found that most researchers took newly diagnosed cases from a hospital over a particular period. Is there any other way to select the subjects for this study? The study will be done in three special centers that diagnose and treat cancer patients.

A17 by Nicolas Padilla, March 31, 2014

It depends on the study design.
One option: cases (breast cancer with delayed diagnosis, for example with metastasis) and controls (breast cancer with early diagnosis). Then ask the subjects, or review the registry, to find out why the diagnosis was delayed (because of the subject, the hospital, the health system, etc.).

A17 by Jay M. Fleisher, March 31, 2014

The control group selected is problematic (case-control methodology).

It ignores length bias and lead-time bias.

If one compares a group diagnosed (Dx) with early-stage cancer (Ca) with a group diagnosed at a late stage, one ignores the aggressiveness of the individual cancers.

For example, one person could be diagnosed with an early-stage cancer and survive, say, 8 years, while another person could be diagnosed at the same stage and live only one year.

The latter case has a more aggressive form of tumor.

I am unsure what the outcome of the proposed study is. Is it survival? If so, factors such as stage and grade have to be taken into account, and the cohort has to be followed for at least 5 years. This puts us into a prospective cohort design, which is costly. The proposed study is more complex than it seems. One needs to search Google for both lead-time bias and length bias to understand them better.
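A toy numerical illustration of the two biases named above, with invented ages and durations: lead time makes survival from diagnosis look longer even when death is not delayed, and a single screening round finds slow-growing tumours more often because they stay detectable longer. This is a sketch of the concepts, not an analysis method; the function names are hypothetical.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Apparent survival time, measured from the date of diagnosis."""
    return age_at_death - age_at_diagnosis


def length_bias_share(screen_time=50, short_phase=1, long_phase=5, horizon=100):
    """Share of slow-growing tumours among those found by one screening round.

    One fast and one slow tumour start at every time 0..horizon-1, so the
    population is 50% slow. A tumour is detectable from its start until
    start + phase (fast tumours have a short detectable phase).
    """
    found = {"fast": 0, "slow": 0}
    for start in range(horizon):
        for kind, phase in (("fast", short_phase), ("slow", long_phase)):
            if start <= screen_time < start + phase:
                found[kind] += 1
    return found["slow"] / (found["fast"] + found["slow"])


# Lead time: same woman, same age at death, diagnosis 3 years earlier.
print(survival_from_diagnosis(57, 60), survival_from_diagnosis(54, 60))  # 3 6
# Length bias: slow tumours are over-represented among screen-detected cases.
print(round(length_bias_share(), 2))  # 0.83
```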


A17 by Ronald LaPorte, March 31, 2014

A difficulty for me is that it is not really clear what your hypothesis is and what you want to test. I typically have my students outline the hypothesis first, before defining the population. You appear to want to look at those who are diagnosed late compared with those diagnosed early. There is a large literature on this, which you should review. Do a search in Google Scholar.
You can set up a study by identifying those who come in late compared with those who come in early. You need to define operationally what "late" is and what "early" is. You could look at all women coming in during a certain period of time and do a case-control study. There are many different types of surveys that you can use; use something that has already been used. You have to define your questions first as to what factors might be associated with a delay in coming in; this will then define your survey. It would also be good to contact people worldwide who have done research in the area. It is an important area, but you need to do a little more homework on how to set up the design.
Q16 by Mohammad Asif Alokozai, March 07, 2014 We have conducted an EPI coverage survey and analyzed the data. The survey is a WHO 30×7 cluster survey, and the clusters were selected using PPS (probability proportional to size). Non-response was estimated at 20 percent, so we had 30 clusters in each province, with 210 interviews to be done per province. However, some provinces obtained fewer interviews (e.g. 159, 166 or 169). Do these data need reweighting during analysis? To find the coverage proportions using Stata, what would be the best command to use?

A16 by Eman Eltahlawy, March 07, 2014

If you obtained 159, 166 or 169 interviews, we must look for the reasons why we did not get the full sample size. If the team took all the children in the cluster and they were fewer than planned, you do not need to reweight. But if there were difficulties such as insecurity or refusal, you should try again to approach the area and get what you missed; if that is difficult, reweight these areas only.

In a cluster sample, if a cluster turns out smaller than expected in some area, you must review why.

First, if the cluster includes all the children in the area and they are few in number, as in some villages in the desert, you do not need to reweight.

Second, if there was a refusal to participate or a security problem at the time of visiting the cluster, you may revisit the cluster to obtain the rest of the target.

Third, if there are major difficulties in reaching those clusters again, you should reweight those clusters.
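The answer does not give a formula, so the following is only a sketch of one simple way to reweight, under the assumption that a cluster with a shortfall due to refusal or insecurity gets the weight planned / achieved interviews (7 per cluster in a 30×7 design), while clusters that simply had fewer eligible children keep weight 1. The function and data are hypothetical. In Stata, weights of this kind would typically be declared with `svyset` before using the `svy:` estimation commands.

```python
def weighted_coverage(clusters, planned=7):
    """Coverage proportion with a simple non-response adjustment.

    clusters: (vaccinated, interviewed, reweigh) per cluster, where reweigh
    is True only if the shortfall was due to refusal/insecurity, not to the
    cluster genuinely having fewer eligible children.
    """
    numerator = denominator = 0.0
    for vaccinated, interviewed, reweigh in clusters:
        weight = planned / interviewed if reweigh else 1.0
        numerator += weight * vaccinated
        denominator += weight * interviewed
    return numerator / denominator


# Hypothetical: one complete cluster, one with 2 of 7 interviews (refusals),
# and one small cluster where all 4 eligible children were interviewed.
data = [(6, 7, False), (1, 2, True), (4, 4, False)]
print(round(weighted_coverage(data), 3))  # 0.75
```

Without the adjustment the same data give 11/13 (about 0.85), because the refusal cluster's low coverage would count for only 2 children instead of 7.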

A16 by Nicolas Padilla, March 07, 2014

First, you should know why there were fewer surveys than expected.
Second, in cluster sampling all eligible people in the cluster should be included, perhaps including those under 18 years.

Comments by Mohammad Asif Alokozai, March 09, 2014
Yes, the reason for not getting the complete number of interviews in some clusters was the small number of children in these clusters, not refusal or security. Security was a problem, but we recruited interviewers from the local area who were familiar with the context and local traditions. As for the lower total number of interviews per province: although more interviews were actually conducted, some participants were older or younger than the age criterion set for the survey (children aged 12 to 23 months).
So these were the reasons for the lower number of interviews.

Q15 by Dr Rajeev Rao Eashwari, March 02, 2014 I am a medical doctor in charge of eHealth services for a province in South Africa. I am planning a province-wide assessment of the telehealth needs of rural health practitioners.

I am struggling to find a survey tool for a telehealth needs analysis, with the aim of holding focus group interviews.
Please advise.
A15 by Faina Linkov, March 03, 2014

There was an article published on telehealth needs assessment a few years ago; it can be found here.
I wanted to emphasize that it is important not just to assess the needs but also to keep in mind the capabilities of the country whose needs you are trying to assess. Professionals may say that they need an extremely high level of expensive technology, but it is important to remember that we need to look at what can be done with even limited technology.
2013 Questions    
Q1 2013 by Nabil D Sulaiman, May 19, 2013 What is the best sampling frame for a national diabetes prevalence study in the absence of an updated GHS sample?


A1 by Mohamed E. Salem, May 25, 2013


In the absence of a General Household Survey (GHS) sample, the most accurate and easy alternative is to use the readily available geographical information of the country to create your own nationally representative sample. The country map, including its geographical information, will be used as the sampling frame. Geographical Information System (GIS) applications will help you divide your map into representative clusters of households. ArcGIS is one of the applications that could be used for this exercise.

Here is a YouTube link for an ArcGIS tutorial

Here is an alternative YouTube link (in Arabic); the sound is not very good

The program will enable you to use your country map as a sampling frame. Using the geographical information available on the map, you can select the clusters of households. Within each selected cluster, you can draw a systematic random sample using the walk-through method.
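The answer does not say how the clusters should be chosen from the map. One common option, assumed here, is systematic selection with probability proportional to size (PPS) from a list of areas and their household counts, for example exported from the GIS layer. The function name and counts are hypothetical, and the walk-through sampling of households inside each selected cluster is not shown.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n_clusters, seed=None):
    """Select clusters with probability proportional to size (systematic PPS).

    sizes: number of households in each area (the sampling frame).
    Returns the index of the area hit by each of the n_clusters selection
    points; an area larger than the interval can be selected more than once.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_clusters
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + i * interval for i in range(n_clusters)]
    # Area i covers the household numbers [cumulative[i-1], cumulative[i]).
    return [min(bisect_right(cumulative, p), len(sizes) - 1) for p in points]


# Hypothetical frame: households per enumeration area taken from the map.
households = [120, 45, 300, 80, 210, 95, 150, 60]
print(pps_systematic(households, 3, seed=7))
```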

A1 by Nabil D Sulaiman, May 26, 2013

Various sampling approaches could be explored to find the best possible sampling frame for the rapidly changing expatriate population in the Gulf region, namely the National Census (GHS), the Water and Electricity Register, the Telephone Register and the National ID.

Based on representativeness and feasibility, we in the UAE have adopted a novel sampling methodology involving systematic random sampling through Preventive Medicine Departments (PMDs), which all expatriate adults in the UAE are legally required to attend every 2-3 years to renew their residency visas. The PMD is a single place where recruiters, interviewers, nurses and phlebotomists are available. All staff working on the study were nominated and trained for it. Blood samples for both the study and the visa renewal were collected at the same visit.
Q2 by Abdelrahim Mutwakel Gaffar, May 19, 2013 I am conducting an evaluation study for a project using a pre-intervention/post-intervention design.
What is the best statistical test to examine the change due to the intervention?
A2 by Sami AR AL-Dubai, June 6, 2013

You can use SPSS. The statistical test depends on the type of dependent variable you want to test. If the dependent variable is normally distributed, you can use the "paired-samples t-test". If it is not normally distributed, you can use the non-parametric alternative, "2 related samples" (Wilcoxon signed-rank test).

by Shacara Johnson, June 5, 2013

This would be considered a repeated-measures or paired study design, in which you are interested in observing the change produced by an intervention in the same group of subjects. The statistical test to use is the paired t-test: you find the mean and standard deviation of the differences between the before and after measurements, and then use the t-distribution for a single mean to test whether the mean difference is significant at the chosen significance level.
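The calculation described above can be sketched in a few lines of Python. The sketch returns only the t statistic and its degrees of freedom (n - 1); the p-value would then be read from a t table or statistics software. The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for pre/post measurements."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical scores of the same 4 subjects before and after the intervention.
t, df = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
print(round(t, 3), df)  # 4.899 3
```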

by Mohammad Babaeeian moghaddam, June 5, 2013

If we have a repeated-measures design with pre-post measures
and the variables are normally distributed
(as assessed in SPSS), we can use a paired t-test.
If the data are not normally distributed,
we can use the Wilcoxon non-parametric test.
These two tests are available in all standard statistical
packages (SPSS, SAS and others).
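For the non-parametric alternative, a minimal sketch of the Wilcoxon signed-rank test: zero differences are dropped, absolute differences are ranked (ties get average ranks), and the ranks are summed by sign. The p-value here uses a large-sample normal approximation without tie correction, which is an assumption of this sketch; for small samples, exact tables or software should be used. Names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """W+, W- and a large-sample two-sided p-value (normal approximation)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p


w_plus, w_minus, p = wilcoxon_signed_rank([10, 12, 14, 16, 20], [12, 13, 17, 18, 19])
print(w_plus, w_minus)  # 13.5 1.5
```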

by Nicolas Padilla, June 6, 2013 

If the variable is quantitative, compute the differences (measure 1 - measure 2), then the mean of the differences, and then apply Student's t (small sample, fewer than 50) or Z (sample of more than 50).

A2 by Rami H. AL Rifai, June 9, 2013

The idea is: "Tell me the type of your dependent variable, and I will tell you what type of test you could use."

There are two main types of variables:
1- Continuous, like blood pressure measurements.
2- Discrete, like Yes/No variables.

Discrete variables can be dichotomous (two categories) or polytomous (three or more categories).

However, in pre-post intervention studies, the tests to use when the intervention was carried out on the same group differ from those used when it was carried out on different groups, such as control and intervention groups.

For example, if your dependent variable is continuous and your intervention was carried out on the same group, you have to use the paired t-test to detect whether there was a difference due to the intervention. But before that, you have to test the assumption that the dependent variable is normally distributed; if it is not, you have to use the non-parametric test.

A2 by Jay M. Fleisher, June 10, 2013 

If your data are continuous and normally distributed, you can use a paired
t-test procedure. This would apply if you can create an overall index across all the questions.

If your data are categorical, you can use McNemar's test. This would be
applicable if you are looking at individual questions.
Remember that your data are paired.

by Mohamed E. Salem, June 13, 2013

If you want to measure the impact of an intervention, the simplest tests are:
1) The paired-samples t-test, if your indicator is measured quantitatively (blood sugar level, BMI, etc.).
See this link on how to perform a paired-samples t-test using SPSS

2) The McNemar test, if your indicator is measured qualitatively (diseased or not, complicated or not, etc.).
See this link on how to use McNemar
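McNemar's test uses only the discordant pairs, that is, subjects whose answer changed between before and after. A minimal sketch, assuming b and c are those two counts; the p-value uses the fact that a chi-square with 1 degree of freedom is the square of a standard normal. Names and data are hypothetical.

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar chi-square (1 df) for paired yes/no data.

    b: pairs "yes" before and "no" after; c: "no" before and "yes" after.
    Pairs that did not change do not enter the test.
    """
    diff = max(abs(b - c) - 1, 0) if continuity else abs(b - c)
    chi2 = diff ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical: 10 subjects changed from "diseased" to "not", 2 the other way.
chi2, p = mcnemar(10, 2)
print(round(chi2, 3), round(p, 3))
```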

My suggestion is to improve your study design. A pre-post intervention design is weak if you want to attribute the change (improvement) to your intervention. Including a control group will give more strength to your results. Random assignment of participants to the intervention and control groups will make your study even stronger, and blinding would be ideal if it is applicable.
I am attaching a link to study designs
If you use one of the above designs, the analysis should be more in-depth, using double-difference analysis, regression models and non-equivalent group designs in the case of quasi-experimental designs.
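The double-difference idea reduces to simple arithmetic on the four group means: the change in the intervention group minus the change in the control group. A minimal sketch with invented means; it gives only the point estimate. In a regression model with group, period and group × period terms, the interaction coefficient corresponds to this same quantity and also comes with a standard error.

```python
def difference_in_differences(pre_int, post_int, pre_ctrl, post_ctrl):
    """Change in the intervention group minus change in the control group."""
    return (post_int - pre_int) - (post_ctrl - pre_ctrl)


# Hypothetical mean knowledge scores (%), before and after the programme.
print(difference_in_differences(50, 70, 50, 55))  # 15
```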

Q3 by Saad Tai, June 2, 2013
I am interested in conducting a KAP study on HIV among medical doctors in Pakistan. Where can I get a questionnaire, or how can I make my own questionnaire?

A3 by Deena Alasfoor, June 7, 2013

Guidelines on how to do a KAP study are at the following link:

The research questions depend on the context and on how you want to use the information. In general, it is important that each knowledge question is followed by an attitude question and a practice question that help you in deciding on a course of action or intervention. As a researcher, you need to identify your questions based on the context and use the KAP method to explore them.

I hope this is helpful.
A3 by Bruce G. Weniger, June 8, 2013

Search the medical literature for well-written, high-quality reports in competitive journals of studies with research questions and methods similar to those you wish to employ. Many journals already make questionnaires, protocols and other study-related documents available as optional online downloads of “supplementary” material for published “printed” reports. For example, the supplementary material to this study is available in the “Tools” section at the right of its webpage, which provides a URL link to the additional files:

If the questionnaire is not posted in this way, you can email or write to the paper’s author(s), explaining that you would like to perform a similar study in your own population, using a similar questionnaire for comparability. Ask whether the author(s) will provide you with the questionnaire to adapt for your own study. Offer to acknowledge their assistance in your future paper, and to cite their work if it is relevant to what you eventually do and find.

A3 by Shacara Johnson, June 8, 2013

You can access information on KAP (Knowledge, Attitudes, and Practice) survey instruments using the World Health Organization’s website or conduct an internet search for HIV KAP surveys in Pakistan. You might also want to search publications by fellow researchers who conducted similar research among HIV care providers in Pakistan and contact them about collaborating or asking for permission to utilize their instrument for your work.

There are several references to HIV KAP instruments pertaining to the Eastern Mediterranean region (which includes Pakistan), on topics ranging from behavioral surveillance of risk factors to country-level results of KAP implementation. If you cannot identify a current instrument being used in Pakistan, you might search for other KAP instruments used in a similar setting, which you can modify for what you want to examine among health care providers in Pakistan. Two examples from the WHO site that may provide points of consideration while constructing your instrument are: (HIV surveillance) and (A guide for developing KAP for control, which may be useful as you develop your instrument for HIV).

Q4 by Murtada Osman, June 11, 2013 How do I select a journal for publication?

A4 by Deena Alasfoor, June 14, 2013

Selecting a suitable journal for publication is one of the most difficult tasks for researchers. The impact factor, the interests of the journal, the value of your manuscript and your experience in publishing all count. Obviously, you would want to aim at the journals with the highest impact factors. However, it is very hard to publish in these unless your manuscript is of great importance and you have collaborators who have published in that area before. First, select the journals that might interest you; these will probably be the ones you cite. If your publication is context-free, you might have a better chance of publishing in an international high-impact-factor journal. If your manuscript presents, for example, national survey results, you may want to go to a national or regional journal. Read the authors' guide carefully and make sure that your paper matches the journal's subjects of interest. Then, if you have several such journals, try the one with the highest impact factor first and, if rejected, go to the next lower one until your paper is accepted. This could happen the first time, but it could also take several attempts before you find a journal that accepts your publication. Sometimes the topic has already been discussed enough and the editors do not see that your publication adds anything new to existing knowledge. Do not get disappointed; keep trying. Good luck!

by Eugene Shubnikov, June 11, 2013

I recommend that you study the Supercourse lecture "Publishing research articles: a look on the inside" by Sholpan Askarova. For your first paper, please consider sending your article to journals that do not have the highest impact factors, as Sholpan recommends.

Q5 by Hanan Abdulghafur Khalil (through Facebook), June 23, 2013

May I ask about the required sample size for a pilot study and pretesting, and whether the results should be mentioned briefly after the study is completed?
A5 by Eman Eltahlawy (through Facebook), June 23, 2013

Thanks, Dr Hanan, for your question. It depends on your needs: to test the language of the questionnaire, the methodology and logistics in the field, and the time needed to complete the questionnaire.


A5 by Eugene Shubnikov, June 23, 2013

Dear Hanan, I recommend that you study the lectures on the Introductory page, especially the "Sample size and Statistical power" lecture. Thank you for your question!

A5 by Fatma Hassan, June 23, 2013 (Facebook)

Baker (1994) found that 10-20% of the sample size of the actual study is a reasonable number of participants to enroll in a pilot study. Another rule of thumb is to take 30 or more patients to estimate a parameter (Browne, 1999). Yes, the results of the pilot should be reported, preferably in the methodology section, together with the details of any modifications made to the questionnaire based on the pilot.

A5 by Nicolas Padilla , June 23, 2013 (Facebook)

In a pilot study, you need about 10-20% of the sample size needed for the larger study.

A5 by Jay Fleisher, June 23, 2013 (Facebook)

Sample size calculations basically deal with the difference you expect to detect and the probability of a false-positive result you are willing to accept (alpha), together with the desired statistical power. There are many free sample size calculators on the Web.

Q6 by Andrey Kuznetzov (through Facebook), July 08, 2013

Is there any standard R function for calculating the variance of a probability distribution (not the sample variance)? Thanks in advance.

A6 by Jay Fleisher, July 21, 2013
I think the question pertains to the software package R. There are many distributions besides the normal distribution.

I think the question is how to find, using R, the variance of a certain probability distribution.

I attach a brief description of what I think this question means.
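The attachment mentioned in the answer is not reproduced here. As a language-neutral illustration (written in Python rather than R, and not a reference to any R function), the variance of a discrete probability distribution follows from its probability mass function as E[X^2] - (E[X])^2, summed over the support; for continuous distributions the sums become integrals. The helper names are hypothetical.

```python
from math import comb


def distribution_variance(support, pmf):
    """Variance of a discrete distribution: E[X^2] - (E[X])^2 (not a sample variance)."""
    probs = [pmf(x) for x in support]
    mean = sum(p * x for p, x in zip(probs, support))
    second_moment = sum(p * x * x for p, x in zip(probs, support))
    return second_moment - mean ** 2


def binomial_pmf(n, p):
    """Probability mass function of a Binomial(n, p) distribution."""
    return lambda k: comb(n, k) * p ** k * (1 - p) ** (n - k)


# Binomial(n = 10, p = 0.3): the theoretical variance n*p*(1-p) is 2.1.
print(round(distribution_variance(range(11), binomial_pmf(10, 0.3)), 6))  # 2.1
```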



Q7 by Mohammad Babaeeian moghaddam, July 19, 2013 Animal bites are an important problem in my city and I want to investigate them. How can I design the study (which study design can I use)? Are any questionnaires available for such studies?

A7 by Nicolas Padilla, July 20, 2013 (Facebook)
First, do you mean animals such as dogs, for example? If you want to know the burden of animal bites, it is better to use a cross-sectional design; if you want to know the risk factors for animal bites, it is better to use a case-control design.
Q8 by Shatabdi Goon, August 28, 2013 If I lost the survey data that I analyzed in SPSS, how can I do the statistical analysis without the data (I want to evaluate the p-value)? Is it possible to get the p-value directly from the results?

A8 by Nicolas Padilla, August 29, 2013

You can use Epidat (free statistical software from the Xunta de Galicia and PAHO) with tabulated data, for example.

A8 by Eman Eltahlawy, September 4, 2013

You can use Epicalc 2000 on tabulated data to evaluate the p-value; it is an easy program and is available free online.

A8 by Mohamed E. Salem, September 5, 2013

You can use SPSS: organize your data as a 2×2 table with one row for each combination (0 1, 1 1, 0 0, 1 0) and put the count for each combination in a third column. Then go to Data > Weight Cases and weight your data by the count column.
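If the counts of the 2×2 table are still available (for example from a printed crosstab), the Pearson chi-square and its p-value can also be recomputed by hand. A minimal sketch, assuming a 2×2 table with 1 degree of freedom and no continuity correction; the approximation is unreliable when expected counts are small. Names and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) from 2x2 counts.

    Layout:            outcome yes  outcome no
            exposed         a           b
            unexposed       c           d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical counts read off a printed SPSS crosstab.
chi2, p = chi_square_2x2(10, 20, 30, 40)
print(round(chi2, 3), round(p, 3))
```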
Q9 by Nagah Selim, September 23, 2013
How can I calculate the sample size if I would like to assess the prevalence of depression, anxiety and stress among school children? According to the literature, the prevalences are 48%, 38% and 47%, respectively, and the prevalence also differs between males and females.
A9 by Javier Muñiz, September 23, 2013
1.- Your study aims at estimating a proportion in the population.
2.- You have to consider:
            a.- How do I select the participants?: Sampling procedure
            b.- How many participants should I select: Sample size (related to “a” to some extent).
3.- Assuming simple random sampling in “a”, the sample size depends on:
a.- Size of the population to which you want to infer the proportion that you will find in your study (sample). Surprisingly, it is not very important (except for very small populations).
b.- Any idea of what you expect to find? (48%, 38% and 47% in your case). 50% is the most demanding assumption (the one that results in the largest sample size). Use 50% and you will be safe (your pre-study estimates are very close to 50%).
c.- What precision do I need? Or, how wide do I want the confidence interval of my estimate? A wide confidence interval is less precise than a narrow one. The narrower the confidence interval desired when presenting the results (better precision), the bigger the sample size.
Below is the output of a program that I use (EPIDAT, developed by the Xunta de Galicia, Spain, and PAHO/OPS):
Sample size and precision to estimate a population proportion
      Size of the population:  20000
         Expected proportion:  50.0%
            Confidence level:  95.0%
               Design effect:  1.0
Precision (%)     Sample size
-------------     -----------
        1.000            6489
        2.000            2144
        3.000            1014
        4.000             583
        5.000             377
        6.000             264
        7.000             195
        8.000             149
        9.000             118
       10.000              96
What does this mean? For example, if you choose to aim at a precision of 2% (IT IS YOUR DECISION AS INVESTIGATOR) for the whole study, you will aim to include 2144 participants. At the end of the study you will be able to say: The prevalence of depression among school children is 50%, with a 95% confidence interval of 48-52% (maybe it is not exactly 50%, but you will be pretty sure that the proportion in the population is somewhere between 48% and 52%). When considering subgroups, your precision will decrease because of smaller sample sizes available (for example, you may have around 1000 boys and 1000 girls and the corresponding precision in these subgroups will be around 3%).
NOTE: We have assumed simple random sampling (not always feasible when studying kids in schools). If another design is chosen, this may affect the sample size (bigger samples will be needed, at least in theory).
There are plenty of free programs available to compute the sample size for different study designs. I recommend EPIDAT 3.1 because it also has some other very useful procedures for tabulated data. There is a more recent version (4.0), but it still does not completely replace the previous one.
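The EPIDAT table can be reproduced by assuming the usual formula n0 = z²p(1 - p)/d², followed by the finite population correction n = n0/(1 + n0/N) and rounding up. This reconstruction matches every row shown, but it is not EPIDAT's documented code. A minimal Python sketch with hypothetical function names; the second function inverts the formula to show the precision a subgroup of about 1000 children would give, assuming a hypothetical subgroup population of 10000.

```python
import math
from statistics import NormalDist


def z_value(confidence=0.95):
    """Two-sided critical value of the standard normal (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_proportion(p, precision, population, confidence=0.95):
    """Sample size to estimate a proportion p within +/- precision (absolute),
    assuming simple random sampling and a finite population of given size."""
    z = z_value(confidence)
    n0 = z ** 2 * p * (1 - p) / precision ** 2  # infinite-population size
    n = n0 / (1 + n0 / population)              # finite population correction
    return math.ceil(n)


def precision_for_sample(p, n, population, confidence=0.95):
    """Half-width of the confidence interval achieved with n participants
    (the inverse of sample_size_proportion before rounding)."""
    z = z_value(confidence)
    return z * math.sqrt(p * (1 - p) * (1 / n - 1 / population))


for k in range(1, 11):
    print(f"{k:5.1f}%  {sample_size_proportion(0.5, k / 100, 20000):6d}")

# Hypothetical subgroup: about 1000 boys out of a population of 10000 boys
print(f"subgroup precision: {100 * precision_for_sample(0.5, 1000, 10000):.1f}%")
```

With these assumptions the subgroup precision comes out at about 2.9%, in line with the "around 3%" quoted in the answer.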

A9 by Abu Zar, September 23, 2013  
The sample size, in this case, refers to the number of children to be included in the survey.
Step 1: Base Sample-size Calculation
The appropriate sample size for a population-based survey is determined largely by three factors: (i) the estimated prevalence of the variable of interest – chronic malnutrition in this instance, (ii) the desired level of confidence and (iii) the acceptable margin of error.
For a survey design based on a simple random sample, the sample size required can be calculated according to the following formula.
$$n = \frac{t^2 \times p(1-p)}{m^2}$$
where
n = required sample size
t = confidence level at 95% (standard value of 1.96)
p = estimated prevalence of malnutrition in the project area
m = margin of error at 5% (standard value of 0.05)
In the Al Haouz project in Morocco, it has been estimated that roughly 30% (0.3) of the children in the project area suffer from chronic malnutrition. This figure has been taken from national statistics on malnutrition in rural areas. Use of the standard values listed above provides the following calculation.
$$n = \frac{1.96^2 \times 0.3(1-0.3)}{0.05^2} = \frac{3.8416 \times 0.21}{0.0025} = 322.69 \approx 323$$
Step 2: Design Effect
The anthropometric survey is designed as a cluster sample (a representative selection of villages), not a simple random sample. To correct for the difference in design, the sample size is multiplied by the design effect (D).
The design effect is generally assumed to be 2 for nutrition surveys using cluster-sampling methodology.
n x D = 323 x 2 = 646
Step 3: Contingency
The sample is further increased by 5% to account for contingencies such as non-response or recording error.
+ 5% = 646 x 1.05 = 678.3 ≈ 678
Step 4: Distribution of Observations
Finally, the result is rounded up to the nearest number that divides evenly among the clusters (30 villages) to be surveyed.
Thirty is the standard number of clusters established by the WHO Expanded Programme of Immunization (EPI Cluster Surveys). There is no statistically necessary reason to maintain exactly 30 clusters, and the number can be adjusted if there is a compelling motive for doing so.
Final Sample Size: N = 690 children
The final sample size (N) is then divided by the number of clusters (30) to determine the number of observations per cluster.
N ÷ no. clusters = 690 ÷ 30 = 23 children per village
General Rule: Standardized Sample Sizes for Nutrition Surveys
The following table provides the recommended sample size for various estimated levels of malnutrition, incorporating standard values for confidence level and margin of error. The final sample size includes the contingency percentage and is rounded to match well with a 30-cluster survey.
Columns: p (est. % malnutrition); n (base sample size); n x D (n x design effect); N (final sample size)
Rows (p): 0.2 (20%); 0.25 (25%); 0.3 (30%); 0.35 (35%); 0.4 (40%); 0.45 (45%); 0.5 (50%)
If it is not possible to find an estimated prevalence of malnutrition for the project area, the recommended action is to set the sample size at 810.
When in doubt, set the sample size at 810.
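The four steps and the rows of the standardized table can be computed directly. A minimal Python sketch, assuming the base sample size is rounded up at Step 1 and that Step 4 rounds up to the next multiple of the number of clusters (which gives the 690 and 810 quoted above). The function name is hypothetical.

```python
import math


def cluster_survey_sample_size(p, t=1.96, m=0.05, design_effect=2,
                               contingency=0.05, clusters=30):
    """Return (base n, n x D, final N, children per cluster) for prevalence p."""
    base_n = math.ceil(t ** 2 * p * (1 - p) / m ** 2)    # Step 1: base sample size
    n_x_d = base_n * design_effect                       # Step 2: design effect
    with_contingency = n_x_d * (1 + contingency)         # Step 3: +5% contingency
    per_cluster = math.ceil(with_contingency / clusters)  # Step 4: equal clusters
    return base_n, n_x_d, per_cluster * clusters, per_cluster


for p in (0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5):
    base_n, n_x_d, final_n, per_cluster = cluster_survey_sample_size(p)
    print(f"{p:.2f}  n={base_n}  n x D={n_x_d}  N={final_n}  ({per_cluster} per village)")
```

Changing `clusters` shows how the final N shifts when a different number of villages is surveyed.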
See also my lecture on Sample Size Estimation -

Q10 by Nagah Selim, September 23, 2013
If I would like to study the prevalence of a disease among clients attending a PHCC, and I have a rough estimate of the monthly attendees, can I use this number as the total population for calculating the sample size?
Suppose I calculate the sample size, distribute it proportionally among the health centers, and take all the attendees during a specified period of time (say 2 months) for data collection.
What do we call this from the sampling-methodology point of view? Could we consider this as the whole population?
A10 by Jay Fleisher, September 23, 2013

If we assume alpha = 0.05 and a margin of error of 5%, the required sample sizes are:

For a prevalence of 48%, n = 384
                    38%, n = 363
                    43%, n = 377

This is for an unstratified analysis, because we don't have information on the difference between males and females. One should add about 20% for non-responders, if applicable.
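These figures follow from n = z²p(1 - p)/m² with z ≈ 1.96, no finite population correction, and rounding up. A minimal Python sketch (hypothetical helper names) that reproduces them and adds the suggested 20% for non-response. The question lists 47% rather than 43%; for 47% the same formula gives n = 383.

```python
import math
from statistics import NormalDist


def sample_size_for_prevalence(p, margin=0.05, alpha=0.05):
    """n = z^2 p (1 - p) / margin^2, rounded up (infinite population)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def inflate_for_nonresponse(n, extra=0.20):
    """Add a fixed percentage to allow for non-responders."""
    return math.ceil(n * (1 + extra))


for p in (0.48, 0.38, 0.43, 0.47):
    n = sample_size_for_prevalence(p)
    print(f"{p:.0%}: n = {n}, with 20% added = {inflate_for_nonresponse(n)}")
```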

I have added a link (see attachment) that explains how to do it and calculates the sample size for you. One can vary alpha and the margin of error to get different estimates.

As for the second question, if I understand it correctly, the answer is no: one can't assume it is the whole population. What you have is a sample of the people who attend the PHCC. Thus the inference will be to the clinic.

See a link:

Q11 by Naresh Chauhan, September 29, 2013
What sampling technique should be used to draw samples from an urban slum to learn about residents' behaviour regarding particular health problems and service utilization?
A11 by Jay Fleisher, October 06, 2013

The following steps should be taken to ensure unbiased sampling:

1. Define the population you want to sample. The inference of any analysis will go to this population.
2. Define your basic unit of sampling: are you going to sample homes, individuals, etc.?
3. When steps 1 and 2 are completed, sample RANDOMLY.
4. Conduct the analysis.
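Step 3 can be done with any random-number generator once a complete sampling frame exists. A minimal Python sketch, assuming steps 1 and 2 have produced a list of every household in the slum; the household IDs, the frame size and the helper name are hypothetical. Fixing the seed makes the draw reproducible so it can be documented and audited.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal probability."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the sampling frame")
    rng = random.Random(seed)  # independent generator; a fixed seed gives a repeatable draw
    return rng.sample(frame, n)


# Hypothetical frame: 500 enumerated households
households = [f"HH-{i:04d}" for i in range(1, 501)]
selected = draw_simple_random_sample(households, 50, seed=2013)
print(sorted(selected)[:5])
```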
Q12 by Mary Mwangome, October 02, 2013

I am planning to analyse a cohort database for the effect of half-dose prophylaxis with drug X on deaths. Drug X is a prophylactic medication.
The cohort is about 6 years old. Patients received half-dose X daily; then, 1 year ago, the dosing of drug X in this cohort was changed to full dose to conform to national recommendations.

My interest is in finding out whether half dose is equivalent or inferior to full dose in preventing deaths.
My outcome of interest is mortality, though I will have to deal with loss-to-follow-up rates of up to 30%. This outcome was chosen because morbidity data have not been collected in a standardized way, and prospective collection has cost implications.
The prescription of X was also not very standardized, and adherence was not measured.
These data quality issues are what make me think of an ecological study: maybe comparing death rates over time by the amount of drug X dispensed, by the number of drug X prescriptions in a year, or by other parameters.

The question I want to answer is very important, but for ethical reasons I was not allowed to compare the doses prospectively (leave older patients on half dose and start new patients on full dose).

My question now:

1.       I am thinking ahead to how I would interpret the results. I am not sure how I would go about that if I choose the ecological design. I am also thinking about the time lag between the dose change and when to expect an effect on deaths, given that the outcome is not a very common one.

2.       I don’t know whether Stata can help me handle this kind of analysis, including controlling for various other variables, possibly at the group level as well.

3.       If you have smarter ideas on how I can approach this question using the data I have, kindly help me.

A12 by Jay Fleisher, October 06, 2013

There are sampling issues in this study design, but I don’t think they’re fatal. If I understand the design, you have a situation where the patients act as their own controls with respect to dosage. Thus you have paired data. I would break the pairing and run separate analyses on each dose. I would try logistic regression on each dose. This would control for any covariates you have.

In other words, you can use mortality as your outcome variable and dose 1 plus the covariates for dose 1 as your independent variables, and do the same analysis separately for dose 2. Then compare the odds ratios for each dose, along with the 95% confidence intervals and p values that the logistic regression will provide. If the confidence intervals overlap, the data give no clear evidence that dose has an effect. As for the 30% lost to follow-up, I think a 70% follow-up rate is acceptable. Stata and SAS can do this easily. You would, of course, have to report the weaknesses in the design. The “wash-out” period is a concern, since there was none. I would give it a try anyway. My opinion is that if your results show a clinically significant difference with the higher dosage, you have an answer. If they do not, then you have another answer.
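Overlapping 95% confidence intervals are a conservative guide: two intervals can overlap even when the odds ratios differ significantly. A more direct check, sketched here in Python, is a z-test on the difference between the two log odds ratios, with each standard error recovered from its reported 95% CI. This is a swapped-in technique, not part of the answer above, and it assumes the two estimates are independent, which holds only approximately once the pairing has been broken as suggested. The function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_se_from_ci(lower, upper, z=Z95):
    """Standard error of ln(OR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_odds_ratios(or1, ci1, or2, ci2):
    """z-test for the difference between two ln(OR)s, treating the estimates
    as independent. Returns (ratio of ORs, z statistic, two-sided p value)."""
    diff = math.log(or1) - math.log(or2)
    se_diff = math.sqrt(log_se_from_ci(*ci1) ** 2 + log_se_from_ci(*ci2) ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p


# Hypothetical results: half-dose period vs full-dose period
ratio, z, p = compare_odds_ratios(1.8, (1.1, 2.9), 1.0, (0.6, 1.7))
print(f"ratio of ORs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```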

A12 by Faina Linkov, October 06, 2013

Main points for the answer:
1. Loss to follow-up of 30% over 6 years is very good and typical for studies of this caliber.
2. Survival analysis might provide a good approach for some of the data analysis.
3. Stata and SAS can both do the analysis.
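Survival analysis handles the 30% loss to follow-up by treating those patients as censored at their last contact instead of dropping them. A minimal sketch of the Kaplan-Meier estimator in plain Python, for illustration only (in practice Stata's `stset`/`sts` commands or SAS `PROC LIFETEST` would be used); the function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up time for each patient (e.g. months on the drug)
    events: 1 if the patient died at that time, 0 if censored
            (lost to follow-up or still alive at the end of the study)
    Returns a list of (time, survival probability) at each death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at time t still count as at risk at t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical data: months of follow-up and death indicator for 6 patients
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"month {t}: S = {s:.3f}")
```

Running the estimator separately for the half-dose and full-dose periods gives two curves that can be compared, for example with a log-rank test.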


Q13 by Zafar Fatmi, December 05, 2013
I am trying to analyze time-series data on air pollution and cardiovascular diseases. I want to use a Generalized Additive Model (GAM) for the analysis, but I am unable to find any help in this regard.
I have to adjust for weather variables, age and gender.
Please provide some guidelines and help.
A13 by Faina Linkov, December 14, 2013

Perhaps an answer to question 13 is this guideline -
Q14 by Mohammad Babaeeian moghaddam, December 05, 2013

See Question 7 and Answer 7 first.
Animal bites (dog bites, or bites by pets such as a home dog) are an important problem in my city, and I want to investigate their causes (factors that make animals angry so that they attack their owners). How can I design the study (what study design can I use)? Is a questionnaire available for this study?
A14 by Nicolas Padilla, December 6, 2013

You can use a cross-sectional design. Find homes with dogs or other domestic pets and ask about bites and risk factors (e.g. people beating the pet, diseases of the pet, etc.). Classify the homes as with or without dog bites and as with or without the risk factors, and then calculate the odds ratio and the attributable fraction in the exposed.
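Once the homes are classified, the odds ratio and the attributable fraction in the exposed come from a 2x2 table. A minimal sketch, assuming the odds ratio is used as an approximation of the risk ratio (reasonable when bites are uncommon), so that AFe = (OR - 1)/OR, and using Woolf's logit method for the 95% CI. The function name and counts are hypothetical.

```python
import math


def odds_ratio_and_afe(a, b, c, d, z=1.96):
    """2x2 table of homes:
                      bitten   not bitten
        exposed         a          b
        not exposed     c          d
    Returns the OR, its Woolf 95% CI, and the attributable fraction in the
    exposed, AFe = (OR - 1) / OR (negative if the factor is protective)."""
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: Woolf's method needs all cells > 0")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    afe = (odds_ratio - 1) / odds_ratio
    return odds_ratio, (lower, upper), afe


# Hypothetical survey: exposure = pet is beaten at home
or_, (lo, hi), afe = odds_ratio_and_afe(20, 80, 10, 90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f}), AFe = {afe:.1%}")
```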