It is important to practise on some real data sets. One useful resource is StatLib. Here we analyse the TUMOR data set contributed by Terry Therneau. One reason for choosing this data set is that it is small enough for easy handling (n=86). The purpose is purely computer practice, not to examine the quality or findings of the study.

The bladder tumor data file contains 8 variables (variable names in parentheses): treatment group (group), follow-up time (futime), pre-treatment number of tumors (number), largest pre-treatment tumor size (size), and times to first, second, third, and fourth recurrences. Only time to first recurrence is analysed in this practice. We used a word processor to edit the raw file so that each row represents one subject and contains only 5 values (the last 3 recurrence times were removed). If the fifth value was blank (meaning no recurrence), we replaced the blank with a dot (.), preceded by a space. All comments and descriptions in the file were also removed. The file was saved as c:\data\tumor.dat (a plain text/ASCII file).
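The same editing could also be scripted rather than done by hand. The following is only a minimal Python sketch under stated assumptions: it assumes each raw line holds whitespace-separated values for one subject, that recurrence times are filled in from left to right (so a subject with no recurrence has just 4 values), and that comment lines have already been removed. The function names `clean_row` and `clean_lines` and the sample rows are made up for illustration; the sample rows are not taken from the TUMOR file.

```python
def clean_row(raw_line):
    """Return group, futime, number, size and first recurrence as one line.

    Assumption: values are separated by whitespace, and a missing first
    recurrence simply means the line has only 4 values.
    """
    values = raw_line.split()
    if len(values) < 4:
        raise ValueError("expected at least 4 values, got: %r" % raw_line)
    # Use a dot for a missing first recurrence time.
    first_recurrence = values[4] if len(values) > 4 else "."
    # Joining with single spaces puts a space before the dot.
    return " ".join(values[:4] + [first_recurrence])


def clean_lines(raw_lines):
    """Clean every non-blank line, dropping recurrences 2 to 4."""
    return [clean_row(line) for line in raw_lines if line.strip()]


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample = ["1 10 1 3", "1 18 2 1 5 12"]
    for row in clean_lines(sample):
        print(row)
```

To process a real file, the lines read from the raw file would be passed to `clean_lines` and the result written out one row per line, for example to c:\data\tumor.dat. If the raw file uses fixed column positions rather than whitespace-separated values, the splitting step would need to be changed accordingly.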