1 # Copyright (c) 1999-2007 Gary Strangman; All Rights Reserved.
2 #
3 # Permission is hereby granted, free of charge, to any person obtaining a copy
4 # of this software and associated documentation files (the "Software"), to deal
5 # in the Software without restriction, including without limitation the rights
6 # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 # copies of the Software, and to permit persons to whom the Software is
8 # furnished to do so, subject to the following conditions:
9 #
10 # The above copyright notice and this permission notice shall be included in
11 # all copies or substantial portions of the Software.
12 #
13 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19 # THE SOFTWARE.
20 #
21 # Comments and/or additions are welcome (send e-mail to:
22 # strang@nmr.mgh.harvard.edu).
23 #
24 """
25 stats.py module
26 
27 (Requires pstat.py module.)
28 
29 #################################################
30 ####### Written by: Gary Strangman ###########
31 ####### Last modified: Dec 18, 2007 ###########
32 #################################################
33 
34 A collection of basic statistical functions for python. The function
35 names appear below.
36 
37 IMPORTANT: There are really *3* sets of functions. The first set has an 'l'
38 prefix, which can be used with list or tuple arguments. The second set has
39 an 'a' prefix, which can accept NumPy array arguments. These latter
40 functions are defined only when NumPy is available on the system. The third
41 type has NO prefix (i.e., has the name that appears below). Functions of
42 this set are members of a "Dispatch" class, c/o David Ascher. This class
43 allows different functions to be called depending on the type of the passed
44 arguments. Thus, stats.mean is a member of the Dispatch class and
45 stats.mean(range(20)) will call stats.lmean(range(20)) while
46 stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
47 This is a handy way to keep consistent function names when different
48 argument types require different functions to be called. Having
49 implemented the Dispatch class, however, means that to get info on
50 a given function, you must use the REAL function name ... that is
51 "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
52 while "print stats.mean.__doc__" will print the doc for the Dispatch
53 class. NUMPY FUNCTIONS ('a' prefix) generally have more argument options
54 but should otherwise be consistent with the corresponding list functions.
55 
56 Disclaimers: The function list is obviously incomplete and, worse, the
57 functions are not optimized. All functions have been tested (some more
58 so than others), but they are far from bulletproof. Thus, as with any
59 free software, no warranty or guarantee is expressed or implied. :-) A
60 few extra functions that don't appear in the list below can be found by
61 interested treasure-hunters. These functions don't necessarily have
62 both list and array versions but were deemed useful.
63 
64 CENTRAL TENDENCY:  geometricmean
65                    harmonicmean
66                    mean
67                    median
68                    medianscore
69                    mode
70 
71 MOMENTS:  moment
72           variation
73           skew
74           kurtosis
75           skewtest   (for Numpy arrays only)
76           kurtosistest (for Numpy arrays only)
77           normaltest (for Numpy arrays only)
78 
79 ALTERED VERSIONS:  tmean  (for Numpy arrays only)
80                    tvar   (for Numpy arrays only)
81                    tmin   (for Numpy arrays only)
82                    tmax   (for Numpy arrays only)
83                    tstdev (for Numpy arrays only)
84                    tsem   (for Numpy arrays only)
85                    describe
86 
87 FREQUENCY STATS:  itemfreq
88                   scoreatpercentile
89                   percentileofscore
90                   histogram
91                   cumfreq
92                   relfreq
93 
94 VARIABILITY:  obrientransform
95               samplevar
96               samplestdev
97               signaltonoise (for Numpy arrays only)
98               var
99               stdev
100               sterr
101               sem
102               z
103               zs
104               zmap (for Numpy arrays only)
105 
106 TRIMMING FCNS:  threshold (for Numpy arrays only)
107                 trimboth
108                 trim1
109                 round (round all vals to 'n' decimals; Numpy only)
110 
111 CORRELATION FCNS:  covariance (for Numpy arrays only)
112                    correlation (for Numpy arrays only)
113                    paired
114                    pearsonr
115                    spearmanr
116                    pointbiserialr
117                    kendalltau
118                    linregress
119 
120 INFERENTIAL STATS:  ttest_1samp
121                     ttest_ind
122                     ttest_rel
123                     chisquare
124                     ks_2samp
125                     mannwhitneyu
126                     ranksums
127                     wilcoxont
128                     kruskalwallish
129                     friedmanchisquare
130 
131 PROBABILITY CALCS:  chisqprob
132                     erfcc
133                     zprob
134                     ksprob
135                     fprob
136                     betacf
137                     gammln
138                     betai
139 
140 ANOVA FUNCTIONS:  F_oneway
141                   F_value
142 
143 SUPPORT FUNCTIONS:  writecc
144                     incr
145                     sign  (for Numpy arrays only)
146                     sum
147                     cumsum
148                     ss
149                     summult
150                     sumdiffsquared
151                     square_of_sums
152                     shellsort
153                     rankdata
154                     outputpairedstats
155                     findwithin
156 """
157 ## CHANGE LOG:
158 ## ===========
159 ## 07-11-26 ... conversion for numpy started
160 ## 07-05-16 ... added Lin's Concordance Correlation Coefficient (alincc) and acov
161 ## 05-08-21 ... added "Dice's coefficient"
162 ## 04-10-26 ... added ap2t(), an ugly fcn for converting p-vals to T-vals
163 ## 04-04-03 ... added amasslinregress() function to do regression on N-D arrays
164 ## 03-01-03 ... CHANGED VERSION TO 0.6
165 ## fixed atsem() to properly handle limits=None case
166 ## improved histogram and median functions (estbinwidth) and
167 ## fixed atvar() function (wrong answers for neg numbers?!?)
168 ## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
169 ## 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
170 ## 00-12-28 ... removed aanova() to separate module, fixed licensing to
171 ## match Python License, fixed doc string & imports
172 ## 00-04-13 ... pulled all "global" statements, except from aanova()
173 ## added/fixed lots of documentation, removed io.py dependency
174 ## changed to version 0.5
175 ## 99-11-13 ... added asign() function
176 ## 99-11-01 ... changed version to 0.4 ... enough incremental changes now
177 ## 99-10-25 ... added acovariance and acorrelation functions
178 ## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
179 ## added aglm function (crude, but will be improved)
180 ## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
181 ## all handle lists of 'dimension's and keepdims
182 ## REMOVED ar0, ar2, ar3, ar4 and replaced them with around
183 ## reinserted fixes for abetai to avoid math overflows
184 ## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
185 ## handle multi-dimensional arrays (whew!)
186 ## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
187 ## added anormaltest per same reference
188 ## re-wrote azprob to calc arrays of probs all at once
189 ## 99-08-22 ... edited attest_ind printing section so arrays could be rounded
190 ## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
191 ## short/byte arrays (mean of #s btw 100-300 = -150??)
192 ## 99-08-09 ... fixed asum so that the None case works for Byte arrays
193 ## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
194 ## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
195 ## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
196 ## 04/11/99 ... added asignaltonoise, athreshold functions, changed all
197 ## max/min in array section to N.maximum/N.minimum,
198 ## fixed square_of_sums to prevent integer overflow
199 ## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
200 ## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
201 ## 02/28/99 ... Fixed aobrientransform to return an array rather than a list
202 ## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
203 ## 01/13/99 ... CHANGED TO VERSION 0.3
204 ## fixed bug in a/lmannwhitneyu p-value calculation
205 ## 12/31/98 ... fixed variable-name bug in ldescribe
206 ## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
207 ## 12/16/98 ... changed amedianscore to return float (not array) for 1 score
208 ## 12/14/98 ... added atmin and atmax functions
209 ## removed umath from import line (not needed)
210 ## l/ageometricmean modified to reduce chance of overflows (take
211 ## nth root first, then multiply)
212 ## 12/07/98 ... added __version__variable (now 0.2)
213 ## removed all 'stats.' from anova() fcn
214 ## 12/06/98 ... changed those functions (except shellsort) that altered
215 ## arguments in-place ... cumsum, ranksort, ...
216 ## updated (and fixed some) doc-strings
217 ## 12/01/98 ... added anova() function (requires NumPy)
218 ## incorporated Dispatch class
219 ## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
220 ## added 'asum' function (added functionality to N.add.reduce)
221 ## fixed both moment and amoment (two errors)
222 ## changed name of skewness and askewness to skew and askew
223 ## fixed (a)histogram (which sometimes counted points <lowerlimit)
224 
225 import pstat # required 3rd party module
226 import math, string, copy # required python modules
227 from types import *
228 
229 __version__ = 0.6
230 
231 ############# DISPATCH CODE ##############
232 
233 
234 class Dispatch:
235  """
236 The Dispatch class, care of David Ascher, allows different functions to
237 be called depending on the argument types. This way, there can be one
238 function name regardless of the argument type. To access function doc
239 in stats.py module, prefix the function with an 'l' or 'a' for list or
240 array arguments, respectively. That is, print stats.lmean.__doc__ or
241 print stats.amean.__doc__ or whatever.
242 """
243 
244  def __init__(self, *tuples):
245  self._dispatch = {}
246  for func, types in tuples:
247  for t in types:
248  if t in self._dispatch.keys():
249  raise ValueError, "can't have two dispatches on "+str(t)
250  self._dispatch[t] = func
251  self._types = self._dispatch.keys()
252 
253  def __call__(self, arg1, *args, **kw):
254  if type(arg1) not in self._types:
255  raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
256  return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
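# Illustrative sketch (the actual definitions are not shown in this excerpt):
# the prefix-less names described in the module docstring are built by
# wrapping the 'l' and 'a' versions in Dispatch instances, along the lines of
#
#   mean = Dispatch( (lmean, (ListType, TupleType)), (amean, (ArrayType,)) )
#
# so that mean([1,2,3]) routes to lmean while mean of an array routes to
# amean. The exact array type object depends on the Numeric/numpy version.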
257 
258 
259 ##########################################################################
260 ######################## LIST-BASED FUNCTIONS ########################
261 ##########################################################################
262 
263 ### Define these regardless
264 
265 ####################################
266 ####### CENTRAL TENDENCY #########
267 ####################################
268 
269 def lgeometricmean (inlist):
270  """
271 Calculates the geometric mean of the values in the passed list.
272 That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list.
273 
274 Usage: lgeometricmean(inlist)
275 """
276  mult = 1.0
277  one_over_n = 1.0/len(inlist)
278  for item in inlist:
279  mult = mult * pow(item,one_over_n)
280  return mult
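# Example (illustrative): lgeometricmean([2.0, 8.0]) returns 4.0, the square
# root of 2*8. Taking each n-th root before multiplying, as above, reduces
# the chance of overflow for long lists of large values.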
281 
282 
283 def lharmonicmean (inlist):
284  """
285 Calculates the harmonic mean of the values in the passed list.
286 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list.
287 
288 Usage: lharmonicmean(inlist)
289 """
290  sum = 0
291  for item in inlist:
292  sum = sum + 1.0/item
293  return len(inlist) / sum
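# Example (illustrative): lharmonicmean([1.0, 2.0, 4.0]) returns
# 3/(1/1 + 1/2 + 1/4) = 3/1.75, roughly 1.714.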
294 
295 
296 def lmean (inlist):
297  """
298 Returns the arithmetic mean of the values in the passed list.
299 Assumes a '1D' list, but will function on the 1st dim of an array(!).
300 
301 Usage: lmean(inlist)
302 """
303  sum = 0
304  for item in inlist:
305  sum = sum + item
306  return sum/float(len(inlist))
307 
308 
309 def lmedian (inlist,numbins=1000):
310  """
311 Returns the computed median value of a list of numbers, given the
312 number of bins to use for the histogram (more bins brings the computed value
313 closer to the median score, default number of bins = 1000). See G.W.
314 Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.
315 
316 Usage: lmedian (inlist, numbins=1000)
317 """
318  (hist, smallest, binsize, extras) = histogram(inlist,numbins,[min(inlist),max(inlist)]) # make histog
319  cumhist = cumsum(hist) # make cumulative histogram
320  for i in range(len(cumhist)): # get 1st(!) index holding 50%ile score
321  if cumhist[i]>=len(inlist)/2.0:
322  cfbin = i
323  break
324  LRL = smallest + binsize*cfbin # get lower real limit of that bin
325  cfbelow = cumhist[cfbin-1]
326  freq = float(hist[cfbin]) # frequency IN the 50%ile bin
327  median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize # median formula
328  return median
329 
330 
331 def lmedianscore (inlist):
332  """
333 Returns the 'middle' score of the passed list. If there is an even
334 number of scores, the mean of the 2 middle scores is returned.
335 
336 Usage: lmedianscore(inlist)
337 """
338 
339  newlist = copy.deepcopy(inlist)
340  newlist.sort()
341  if len(newlist) % 2 == 0: # if even number of scores, average middle 2
342  index = len(newlist)/2 # integer division correct
343  median = float(newlist[index] + newlist[index-1]) /2
344  else:
345  index = len(newlist)/2 # int division gives mid value when counting from 0
346  median = newlist[index]
347  return median
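# Example (illustrative): lmedianscore([7, 1, 3, 4]) sorts to [1, 3, 4, 7]
# and returns (3+4)/2 = 3.5; lmedianscore([7, 1, 3]) returns 3.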
348 
349 
350 def lmode(inlist):
351  """
352 Returns a list of the modal (most common) score(s) in the passed
353 list. If there is more than one such score, all are returned. The
354 bin-count for the mode(s) is also returned.
355 
356 Usage: lmode(inlist)
357 Returns: bin-count for mode(s), a list of modal value(s)
358 """
359 
360  scores = pstat.unique(inlist)
361  scores.sort()
362  freq = []
363  for item in scores:
364  freq.append(inlist.count(item))
365  maxfreq = max(freq)
366  mode = []
367  stillmore = 1
368  while stillmore:
369  try:
370  indx = freq.index(maxfreq)
371  mode.append(scores[indx])
372  del freq[indx]
373  del scores[indx]
374  except ValueError:
375  stillmore=0
376  return maxfreq, mode
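# Example (illustrative): lmode([1, 2, 2, 3, 3]) returns (2, [2, 3]) -- both
# 2 and 3 occur twice, so both modal values are reported with their count.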
377 
378 
379 ####################################
380 ############ MOMENTS #############
381 ####################################
382 
383 def lmoment(inlist,moment=1):
384  """
385 Calculates the nth moment about the mean for a sample (defaults to
386 the 1st moment). Used to calculate coefficients of skewness and kurtosis.
387 
388 Usage: lmoment(inlist,moment=1)
389 Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
390 """
391  if moment == 1:
392  return 0.0
393  else:
394  mn = mean(inlist)
395  n = len(inlist)
396  s = 0
397  for x in inlist:
398  s = s + (x-mn)**moment
399  return s/float(n)
400 
401 
402 def lvariation(inlist):
403  """
404 Returns the coefficient of variation, as defined in CRC Standard
405 Probability and Statistics, p.6.
406 
407 Usage: lvariation(inlist)
408 """
409  return 100.0*samplestdev(inlist)/float(mean(inlist))
410 
411 
412 def lskew(inlist):
413  """
414 Returns the skewness of a distribution, as defined in Numerical
415 Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)
416 
417 Usage: lskew(inlist)
418 """
419  return moment(inlist,3)/pow(moment(inlist,2),1.5)
420 
421 
422 def lkurtosis(inlist):
423  """
424 Returns the kurtosis of a distribution, as defined in Numerical
425 Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)
426 
427 Usage: lkurtosis(inlist)
428 """
429  return moment(inlist,4)/pow(moment(inlist,2),2.0)
430 
431 
432 def ldescribe(inlist):
433  """
434 Returns some descriptive statistics of the passed list (assumed to be 1D).
435 
436 Usage: ldescribe(inlist)
437 Returns: n, mean, standard deviation, skew, kurtosis
438 """
439  n = len(inlist)
440  mm = (min(inlist),max(inlist))
441  m = mean(inlist)
442  sd = stdev(inlist)
443  sk = skew(inlist)
444  kurt = kurtosis(inlist)
445  return n, mm, m, sd, sk, kurt
446 
447 
448 ####################################
449 ####### FREQUENCY STATS ##########
450 ####################################
451 
452 def litemfreq(inlist):
453  """
454 Returns a list of pairs. Each pair consists of one of the scores in inlist
455 and its frequency count. Assumes a 1D list is passed.
456 
457 Usage: litemfreq(inlist)
458 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
459 """
460  scores = pstat.unique(inlist)
461  scores.sort()
462  freq = []
463  for item in scores:
464  freq.append(inlist.count(item))
465  return pstat.abut(scores, freq)
466 
467 
468 def lscoreatpercentile (inlist, percent):
469  """
470 Returns the score at a given percentile relative to the distribution
471 given by inlist.
472 
473 Usage: lscoreatpercentile(inlist,percent)
474 """
475  if percent > 1:
476  print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
477  percent = percent / 100.0
478  targetcf = percent*len(inlist)
479  h, lrl, binsize, extras = histogram(inlist)
480  cumhist = cumsum(copy.deepcopy(h))
481  for i in range(len(cumhist)):
482  if cumhist[i] >= targetcf:
483  break
484  score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
485  return score
486 
487 
488 def lpercentileofscore (inlist, score,histbins=10,defaultlimits=None):
489  """
490 Returns the percentile value of a score relative to the distribution
491 given by inlist. Formula depends on the values used to histogram the data(!).
492 
493 Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
494 """
495 
496  h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
497  cumhist = cumsum(copy.deepcopy(h))
498  i = int((score - lrl)/float(binsize))
499  pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
500  return pct
501 
502 
503 def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
504  """
505 Returns (i) a list of histogram bin counts, (ii) the smallest value
506 of the histogram binning, and (iii) the bin width (the last 2 are not
507 necessarily integers). Default number of bins is 10. If no sequence object
508 is given for defaultreallimits, the routine picks (usually non-pretty) bins
509 spanning all the numbers in the inlist.
510 
511 Usage: lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
512 Returns: list of bin values, lowerreallimit, binsize, extrapoints
513 """
514  if (defaultreallimits <> None):
515  if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd
516  lowerreallimit = defaultreallimits
517  upperreallimit = 1.000001 * max(inlist)
518  else: # assume both limits given
519  lowerreallimit = defaultreallimits[0]
520  upperreallimit = defaultreallimits[1]
521  binsize = (upperreallimit-lowerreallimit)/float(numbins)
522  else: # no limits given for histogram, both must be calc'd
523  estbinwidth=(max(inlist)-min(inlist))/float(numbins) +1e-6 #1=>cover all
524  binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
525  lowerreallimit = min(inlist) - binsize/2 #lower real limit,1st bin
526  bins = [0]*(numbins)
527  extrapoints = 0
528  for num in inlist:
529  try:
530  if (num-lowerreallimit) < 0:
531  extrapoints = extrapoints + 1
532  else:
533  bintoincrement = int((num-lowerreallimit)/float(binsize))
534  bins[bintoincrement] = bins[bintoincrement] + 1
535  except:
536  extrapoints = extrapoints + 1
537  if (extrapoints > 0 and printextras == 1):
538  print '\nPoints outside given histogram range =',extrapoints
539  return (bins, lowerreallimit, binsize, extrapoints)
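# Example (illustrative): lhistogram([1, 2, 3, 4], numbins=2,
# defaultreallimits=(0, 5)) returns ([2, 2], 0, 2.5, 0): two counts per bin,
# a lower real limit of 0, a bin width of 2.5, and no out-of-range points.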
540 
541 
542 def lcumfreq(inlist,numbins=10,defaultreallimits=None):
543  """
544 Returns a cumulative frequency histogram, using the histogram function.
545 
546 Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None)
547 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
548 """
549  h,l,b,e = histogram(inlist,numbins,defaultreallimits)
550  cumhist = cumsum(copy.deepcopy(h))
551  return cumhist,l,b,e
552 
553 
554 def lrelfreq(inlist,numbins=10,defaultreallimits=None):
555  """
556 Returns a relative frequency histogram, using the histogram function.
557 
558 Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None)
559 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
560 """
561  h,l,b,e = histogram(inlist,numbins,defaultreallimits)
562  for i in range(len(h)):
563  h[i] = h[i]/float(len(inlist))
564  return h,l,b,e
565 
566 
567 ####################################
568 ##### VARIABILITY FUNCTIONS ######
569 ####################################
570 
571 def lobrientransform(*args):
572  """
573 Computes a transform on input data (any number of columns). Used to
574 test for homogeneity of variance prior to running one-way stats. From
575 Maxwell and Delaney, p.112.
576 
577 Usage: lobrientransform(*args)
578 Returns: transformed data for use in an ANOVA
579 """
580  TINY = 1e-10
581  k = len(args)
582  n = [0.0]*k
583  v = [0.0]*k
584  m = [0.0]*k
585  nargs = []
586  for i in range(k):
587  nargs.append(copy.deepcopy(args[i]))
588  n[i] = float(len(nargs[i]))
589  v[i] = var(nargs[i])
590  m[i] = mean(nargs[i])
591  for j in range(k):
592  for i in range(int(n[j])): # n[j] is stored as a float; range() needs an int
593  t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
594  t2 = 0.5*v[j]*(n[j]-1.0)
595  t3 = (n[j]-1.0)*(n[j]-2.0)
596  nargs[j][i] = (t1-t2) / float(t3)
597  check = 1
598  for j in range(k):
599  if v[j] - mean(nargs[j]) > TINY:
600  check = 0
601  if check <> 1:
602  raise ValueError, 'Problem in obrientransform.'
603  else:
604  return nargs
605 
606 
607 def lsamplevar (inlist):
608  """
609 Returns the variance of the values in the passed list using
610 N for the denominator (i.e., DESCRIBES the sample variance only).
611 
612 Usage: lsamplevar(inlist)
613 """
614  n = len(inlist)
615  mn = mean(inlist)
616  deviations = []
617  for item in inlist:
618  deviations.append(item-mn)
619  return ss(deviations)/float(n)
620 
621 
622 def lsamplestdev (inlist):
623  """
624 Returns the standard deviation of the values in the passed list using
625 N for the denominator (i.e., DESCRIBES the sample stdev only).
626 
627 Usage: lsamplestdev(inlist)
628 """
629  return math.sqrt(samplevar(inlist))
630 
631 
632 def lcov (x,y, keepdims=0):
633  """
634 Returns the estimated covariance of the values in the two passed
635 lists, using N-1 in the denominator (i.e., to estimate the population
636 covariance). The two lists must be of equal length. The keepdims
637 argument is accepted only for symmetry with the array version and is
638 ignored here.
639 
640 Usage: lcov(x,y,keepdims=0)
641 """
642 
643  n = len(x)
644  xmn = mean(x)
645  ymn = mean(y)
646  xdeviations = [0]*len(x)
647  ydeviations = [0]*len(y)
648  for i in range(len(x)):
649  xdeviations[i] = x[i] - xmn
650  ydeviations[i] = y[i] - ymn
651  ss = 0.0
652  for i in range(len(xdeviations)):
653  ss = ss + xdeviations[i]*ydeviations[i]
654  return ss/float(n-1)
655 
656 
657 def lvar (inlist):
658  """
659 Returns the variance of the values in the passed list using N-1
660 for the denominator (i.e., for estimating population variance).
661 
662 Usage: lvar(inlist)
663 """
664  n = len(inlist)
665  mn = mean(inlist)
666  deviations = [0]*len(inlist)
667  for i in range(len(inlist)):
668  deviations[i] = inlist[i] - mn
669  return ss(deviations)/float(n-1)
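# Example (illustrative): for [1, 2, 3, 4, 5] the squared deviations from the
# mean sum to 10, so lvar() returns 10/4 = 2.5 (N-1 denominator) while
# lsamplevar() returns 10/5 = 2.0 (N denominator).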
670 
671 
672 def lstdev (inlist):
673  """
674 Returns the standard deviation of the values in the passed list
675 using N-1 in the denominator (i.e., to estimate population stdev).
676 
677 Usage: lstdev(inlist)
678 """
679  return math.sqrt(var(inlist))
680 
681 
682 def lsterr(inlist):
683  """
684 Returns the standard error of the values in the passed list using N-1
685 in the denominator (i.e., to estimate population standard error).
686 
687 Usage: lsterr(inlist)
688 """
689  return stdev(inlist) / float(math.sqrt(len(inlist)))
690 
691 
692 def lsem (inlist):
693  """
694 Returns the estimated standard error of the mean (sx-bar) of the
695 values in the passed list. sem = stdev / sqrt(n)
696 
697 Usage: lsem(inlist)
698 """
699  sd = stdev(inlist)
700  n = len(inlist)
701  return sd/math.sqrt(n)
702 
703 
704 def lz (inlist, score):
705  """
706 Returns the z-score for a given input score, given that score and the
707 list from which that score came. Not appropriate for population calculations.
708 
709 Usage: lz(inlist, score)
710 """
711  z = (score-mean(inlist))/samplestdev(inlist)
712  return z
713 
714 
715 def lzs (inlist):
716  """
717 Returns a list of z-scores, one for each score in the passed list.
718 
719 Usage: lzs(inlist)
720 """
721  zscores = []
722  for item in inlist:
723  zscores.append(z(inlist,item))
724  return zscores
725 
726 
727 ####################################
728 ####### TRIMMING FUNCTIONS #######
729 ####################################
730 
731 def ltrimboth (l,proportiontocut):
732  """
733 Slices off the passed proportion of items from BOTH ends of the passed
734 list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
735 10% of scores). Assumes list is sorted by magnitude. Slices off LESS if
736 proportion results in a non-integer slice index (i.e., conservatively
737 slices off proportiontocut).
738 
739 Usage: ltrimboth (l,proportiontocut)
740 Returns: trimmed version of list l
741 """
742  lowercut = int(proportiontocut*len(l))
743  uppercut = len(l) - lowercut
744  return l[lowercut:uppercut]
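# Example (illustrative): ltrimboth(range(10), 0.1) drops one score from each
# end of the (already sorted) list and returns [1, 2, 3, 4, 5, 6, 7, 8].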
745 
746 
747 def ltrim1 (l,proportiontocut,tail='right'):
748  """
749 Slices off the passed proportion of items from ONE end of the passed
750 list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
751 10% of scores). Slices off LESS if proportion results in a non-integer
752 slice index (i.e., conservatively slices off proportiontocut).
753 
754 Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left'
755 Returns: trimmed version of list l
756 """
757  if tail == 'right':
758  lowercut = 0
759  uppercut = len(l) - int(proportiontocut*len(l))
760  elif tail == 'left':
761  lowercut = int(proportiontocut*len(l))
762  uppercut = len(l)
763  return l[lowercut:uppercut]
764 
765 
766 ####################################
767 ##### CORRELATION FUNCTIONS ######
768 ####################################
769 
770 def lpaired(x,y):
771  """
772 Interactively determines the type of data and then runs the
773 appropriate statistic for paired group data.
774 
775 Usage: lpaired(x,y)
776 Returns: appropriate statistic name, value, and probability
777 """
778  samples = ''
779  while samples not in ['i','r','I','R','c','C']:
780  print '\nIndependent or related samples, or correlation (i,r,c): ',
781  samples = raw_input()
782 
783  if samples in ['i','I','r','R']:
784  print '\nComparing variances ...',
785 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
786  r = obrientransform(x,y)
787  f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
788  if p<0.05:
789  vartype='unequal, p='+str(round(p,4))
790  else:
791  vartype='equal'
792  print vartype
793  if samples in ['i','I']:
794  if vartype[0]=='e':
795  t,p = ttest_ind(x,y,0)
796  print '\nIndependent samples t-test: ', round(t,4),round(p,4)
797  else:
798  if len(x)>20 or len(y)>20:
799  z,p = ranksums(x,y)
800  print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
801  else:
802  u,p = mannwhitneyu(x,y)
803  print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
804 
805  else: # RELATED SAMPLES
806  if vartype[0]=='e':
807  t,p = ttest_rel(x,y,0)
808  print '\nRelated samples t-test: ', round(t,4),round(p,4)
809  else:
810  t,p = ranksums(x,y)
811  print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
812  else: # CORRELATION ANALYSIS
813  corrtype = ''
814  while corrtype not in ['c','C','r','R','d','D']:
815  print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
816  corrtype = raw_input()
817  if corrtype in ['c','C']:
818  m,b,r,p,see = linregress(x,y)
819  print '\nLinear regression for continuous variables ...'
820  lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
821  pstat.printcc(lol)
822  elif corrtype in ['r','R']:
823  r,p = spearmanr(x,y)
824  print '\nCorrelation for ranked variables ...'
825  print "Spearman's r: ",round(r,4),round(p,4)
826  else: # DICHOTOMOUS
827  r,p = pointbiserialr(x,y)
828  print '\nAssuming x contains a dichotomous variable ...'
829  print 'Point Biserial r: ',round(r,4),round(p,4)
830  print '\n\n'
831  return None
832 
833 
834 def lpearsonr(x,y):
835  """
836 Calculates a Pearson correlation coefficient and the associated
837 probability value. Taken from Heiman's Basic Statistics for the Behav.
838 Sci (2nd), p.195.
839 
840 Usage: lpearsonr(x,y) where x and y are equal-length lists
841 Returns: Pearson's r value, two-tailed p-value
842 """
843  TINY = 1.0e-30
844  if len(x) <> len(y):
845  raise ValueError, 'Input values not paired in pearsonr. Aborting.'
846  n = len(x)
847  x = map(float,x)
848  y = map(float,y)
849  xmean = mean(x)
850  ymean = mean(y)
851  r_num = n*(summult(x,y)) - sum(x)*sum(y)
852  r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
853  r = (r_num / r_den) # denominator already a float
854  df = n-2
855  t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
856  prob = betai(0.5*df,0.5,df/float(df+t*t))
857  return r, prob
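# Example (illustrative): lpearsonr([1, 2, 3, 4], [2, 4, 6, 8]) returns
# r = 1.0 (the lists are perfectly linearly related) and a two-tailed p
# value near 0.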
858 
859 
860 def llincc(x,y):
861  """
862 Calculates Lin's concordance correlation coefficient.
863 
864 Usage: llincc(x,y) where x, y are equal-length lists
865 Returns: Lin's CC
866 """
867  covar = lcov(x,y)*(len(x)-1)/float(len(x)) # correct denom to n
868  xvar = lvar(x)*(len(x)-1)/float(len(x)) # correct denom to n
869  yvar = lvar(y)*(len(y)-1)/float(len(y)) # correct denom to n
870  lincc = (2 * covar) / ((xvar+yvar) +((mean(x)-mean(y))**2))
871  return lincc
872 
873 
874 def lspearmanr(x,y):
875  """
876 Calculates a Spearman rank-order correlation coefficient. Taken
877 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
878 
879 Usage: lspearmanr(x,y) where x and y are equal-length lists
880 Returns: Spearman's r, two-tailed p-value
881 """
882  TINY = 1e-30
883  if len(x) <> len(y):
884  raise ValueError, 'Input values not paired in spearmanr. Aborting.'
885  n = len(x)
886  rankx = rankdata(x)
887  ranky = rankdata(y)
888  dsq = sumdiffsquared(rankx,ranky)
889  rs = 1 - 6*dsq / float(n*(n**2-1))
890  t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
891  df = n-2
892  probrs = betai(0.5*df,0.5,df/(df+t*t)) # t already a float
893 # probability values for rs are from part 2 of the spearman function in
894 # Numerical Recipes, p.510. They are close to tables, but not exact. (?)
895  return rs, probrs
896 
897 
898 def lpointbiserialr(x,y):
899  """
900 Calculates a point-biserial correlation coefficient and the associated
901 probability value. Taken from Heiman's Basic Statistics for the Behav.
902 Sci (1st), p.194.
903 
904 Usage: lpointbiserialr(x,y) where x,y are equal-length lists
905 Returns: Point-biserial r, two-tailed p-value
906 """
907  TINY = 1e-30
908  if len(x) <> len(y):
909  raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.'
910  data = pstat.abut(x,y)
911  categories = pstat.unique(x)
912  if len(categories) <> 2:
913  raise ValueError, "Exactly 2 categories required for pointbiserialr()."
914  else: # there are 2 categories, continue
915  codemap = pstat.abut(categories,range(2))
916  recoded = pstat.recode(data,codemap,0)
917  x = pstat.linexand(data,0,categories[0])
918  y = pstat.linexand(data,0,categories[1])
919  xmean = mean(pstat.colex(x,1))
920  ymean = mean(pstat.colex(y,1))
921  n = len(data)
922  adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
923  rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
924  df = n-2
925  t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
926  prob = betai(0.5*df,0.5,df/(df+t*t)) # t already a float
927  return rpb, prob
928 
929 
930 def lkendalltau(x,y):
931  """
932 Calculates Kendall's tau ... correlation of ordinal data. Adapted
933 from function kendl1 in Numerical Recipes. Needs good test-routine.@@@
934 
935 Usage: lkendalltau(x,y)
936 Returns: Kendall's tau, two-tailed p-value
937 """
938  n1 = 0
939  n2 = 0
940  iss = 0
941  for j in range(len(x)-1):
942  for k in range(j+1,len(y)): # compare each pair once, per kendl1 (k = j+1 .. n)
943  a1 = x[j] - x[k]
944  a2 = y[j] - y[k]
945  aa = a1 * a2
946  if (aa): # neither list has a tie
947  n1 = n1 + 1
948  n2 = n2 + 1
949  if aa > 0:
950  iss = iss + 1
951  else:
952  iss = iss -1
953  else:
954  if (a1):
955  n1 = n1 + 1
956  else:
957  n2 = n2 + 1
958  tau = iss / math.sqrt(n1*n2)
959  svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
960  z = tau / math.sqrt(svar)
961  prob = erfcc(abs(z)/1.4142136)
962  return tau, prob
963 
964 
965 def llinregress(x,y):
966  """
967 Calculates a regression line on x,y pairs.
968 
969 Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates
970 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
971 """
972  TINY = 1.0e-20
973  if len(x) <> len(y):
974  raise ValueError, 'Input values not paired in linregress. Aborting.'
975  n = len(x)
976  x = map(float,x)
977  y = map(float,y)
978  xmean = mean(x)
979  ymean = mean(y)
980  r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
981  r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
982  r = r_num / r_den
983  z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
984  df = n-2
985  t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
986  prob = betai(0.5*df,0.5,df/(df+t*t))
987  slope = r_num / float(n*ss(x) - square_of_sums(x))
988  intercept = ymean - slope*xmean
989  sterrest = math.sqrt(1-r*r)*samplestdev(y)
990  return slope, intercept, r, prob, sterrest
991 
992 
993 ####################################
994 ##### INFERENTIAL STATISTICS #####
995 ####################################
996 
997 def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
998  """
999 Calculates the t-obtained for a single-sample t-test on ONE group
1000 of scores a, against a given population mean. If printit=1, results are printed
1001 to the screen. If printit='filename', the results are output to 'filename'
1002 using the given writemode (default=append). Returns t-value, and prob.
1003 
1004 Usage: lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
1005 Returns: t-value, two-tailed prob
1006 """
1007  x = mean(a)
1008  v = var(a)
1009  n = len(a)
1010  df = n-1
1011  svar = ((n-1)*v)/float(df)
1012  t = (x-popmean)/math.sqrt(svar*(1.0/n))
1013  prob = betai(0.5*df,0.5,float(df)/(df+t*t))
1014 
1015  if printit <> 0:
1016  statname = 'Single-sample T-test.'
1017  outputpairedstats(printit,writemode,
1018  'Population','--',popmean,0,0,0,
1019  name,n,x,v,min(a),max(a),
1020  statname,t,prob)
1021  return t,prob
1022 
1023 
1024 def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
1025  """
1026 Calculates the t-obtained T-test on TWO INDEPENDENT samples of
1027 scores a, and b. From Numerical Recipes, p.483. If printit=1, results
1028 are printed to the screen. If printit='filename', the results are output
1029 to 'filename' using the given writemode (default=append). Returns t-value,
1030 and prob.
1031 
1032 Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
1033 Returns: t-value, two-tailed prob
1034 """
1035  x1 = mean(a)
1036  x2 = mean(b)
1037  v1 = stdev(a)**2
1038  v2 = stdev(b)**2
1039  n1 = len(a)
1040  n2 = len(b)
1041  df = n1+n2-2
1042  svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
1043  t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
1044  prob = betai(0.5*df,0.5,df/(df+t*t))
1045 
1046  if printit <> 0:
1047  statname = 'Independent samples T-test.'
1048  outputpairedstats(printit,writemode,
1049  name1,n1,x1,v1,min(a),max(a),
1050  name2,n2,x2,v2,min(b),max(b),
1051  statname,t,prob)
1052  return t,prob
1053 
1054 
1055 def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
1056  """
1057 Calculates the t-obtained T-test on TWO RELATED samples of scores,
1058 a and b. From Numerical Recipes, p.483. If printit=1, results are
1059 printed to the screen. If printit='filename', the results are output to
1060 'filename' using the given writemode (default=append). Returns t-value,
1061 and prob.
1062 
1063 Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
1064 Returns: t-value, two-tailed prob
1065 """
1066  if len(a)<>len(b):
1067  raise ValueError, 'Unequal length lists in ttest_rel.'
1068  x1 = mean(a)
1069  x2 = mean(b)
1070  v1 = var(a)
1071  v2 = var(b)
1072  n = len(a)
1073  cov = 0
1074  for i in range(len(a)):
1075  cov = cov + (a[i]-x1) * (b[i]-x2)
1076  df = n-1
1077  cov = cov / float(df)
1078  sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
1079  t = (x1-x2)/sd
1080  prob = betai(0.5*df,0.5,df/(df+t*t))
1081 
1082  if printit <> 0:
1083  statname = 'Related samples T-test.'
1084  outputpairedstats(printit,writemode,
1085  name1,n,x1,v1,min(a),max(a),
1086  name2,n,x2,v2,min(b),max(b),
1087  statname,t,prob)
1088  return t, prob
1089 
1090 
1091 def lchisquare(f_obs,f_exp=None):
1092  """
1093 Calculates a one-way chi square for list of observed frequencies and returns
1094 the result. If no expected frequencies are given, the total N is assumed to
1095 be equally distributed across all groups.
1096 
1097 Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq.
1098 Returns: chisquare-statistic, associated p-value
1099 """
1100  k = len(f_obs) # number of groups
1101  if f_exp == None:
1102  f_exp = [sum(f_obs)/float(k)] * len(f_obs) # create k bins with = freq.
1103  chisq = 0
1104  for i in range(len(f_obs)):
1105  chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
1106  return chisq, chisqprob(chisq, k-1)
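# Example (illustrative): lchisquare([10, 10, 10]) tests against equal
# expected frequencies of 10 per cell and returns chisq = 0.0 with an
# associated p of 1.0 (df = k-1 = 2).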
1107 
1108 
1109 def lks_2samp (data1,data2):
1110  """
1111 Computes the Kolmogorov-Smirnov statistic on 2 samples. From
1112 Numerical Recipes in C, page 493.
1113 
1114 Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions
1115 Returns: KS D-value, associated p-value
1116 """
1117  j1 = 0
1118  j2 = 0
1119  fn1 = 0.0
1120  fn2 = 0.0
1121  n1 = len(data1)
1122  n2 = len(data2)
1123  en1 = n1
1124  en2 = n2
1125  d = 0.0
1126  data1.sort()
1127  data2.sort()
1128  while j1 < n1 and j2 < n2:
1129  d1=data1[j1]
1130  d2=data2[j2]
1131  if d1 <= d2:
1132  fn1 = (j1)/float(en1)
1133  j1 = j1 + 1
1134  if d2 <= d1:
1135  fn2 = (j2)/float(en2)
1136  j2 = j2 + 1
1137  dt = (fn2-fn1)
1138  if math.fabs(dt) > math.fabs(d):
1139  d = dt
1140  try:
1141  en = math.sqrt(en1*en2/float(en1+en2))
1142  prob = ksprob((en+0.12+0.11/en)*abs(d))
1143  except:
1144  prob = 1.0
1145  return d, prob
1146 
1147 
1148 def lmannwhitneyu(x,y):
1149  """
1150 Calculates a Mann-Whitney U statistic on the provided scores and
1151 returns the result. Use only when the n in each condition is < 20 and
1152 you have 2 independent samples of ranks. NOTE: Mann-Whitney U is
1153 significant if the u-obtained is LESS THAN or equal to the critical
1154 value of U found in the tables. Equivalent to Kruskal-Wallis H with
1155 just 2 groups.
1156 
1157 Usage: lmannwhitneyu(x,y)
1158 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
1159 """
1160  n1 = len(x)
1161  n2 = len(y)
1162  ranked = rankdata(x+y)
1163  rankx = ranked[0:n1] # get the x-ranks
1164  ranky = ranked[n1:] # the rest are y-ranks
1165  u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x
1166  u2 = n1*n2 - u1 # remainder is U for y
1167  bigu = max(u1,u2)
1168  smallu = min(u1,u2)
1169  T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores
1170  if T == 0:
1171  raise ValueError, 'All numbers are identical in lmannwhitneyu'
1172  sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
1173  z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc
1174  return smallu, 1.0 - zprob(z)
1175 
1176 
1177 def ltiecorrect(rankvals):
1178  """
1179 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See
1180 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
1181 New York: McGraw-Hill. Code adapted from |Stat rankind.c code.
1182 
1183 Usage: ltiecorrect(rankvals)
1184 Returns: T correction factor for U or H
1185 """
1186  sorted,posn = shellsort(rankvals)
1187  n = len(sorted)
1188  T = 0.0
1189  i = 0
1190  while (i<n-1):
1191  if sorted[i] == sorted[i+1]:
1192  nties = 1
1193  while (i<n-1) and (sorted[i] == sorted[i+1]):
1194  nties = nties +1
1195  i = i +1
1196  T = T + nties**3 - nties
1197  i = i+1
1198  T = T / float(n**3-n)
1199  return 1.0 - T
1200 
1201 
1202 def lranksums(x,y):
1203  """
1204 Calculates the rank sums statistic on the provided scores and
1205 returns the result. Use only when the n in each condition is > 20 and you
1206 have 2 independent samples of ranks.
1207 
1208 Usage: lranksums(x,y)
1209 Returns: a z-statistic, two-tailed p-value
1210 """
1211  n1 = len(x)
1212  n2 = len(y)
1213  alldata = x+y
1214  ranked = rankdata(alldata)
1215  x = ranked[:n1]
1216  y = ranked[n1:]
1217  s = sum(x)
1218  expected = n1*(n1+n2+1) / 2.0
1219  z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
1220  prob = 2*(1.0 -zprob(abs(z)))
1221  return z, prob
1222 
1223 
1224 def lwilcoxont(x,y):
1225  """
1226 Calculates the Wilcoxon T-test for related samples and returns the
1227 result. A non-parametric T-test.
1228 
1229 Usage: lwilcoxont(x,y)
1230 Returns: a t-statistic, two-tail probability estimate
1231 """
1232  if len(x) <> len(y):
1233  raise ValueError, 'Unequal N in wilcoxont. Aborting.'
1234  d=[]
1235  for i in range(len(x)):
1236  diff = x[i] - y[i]
1237  if diff <> 0:
1238  d.append(diff)
1239  count = len(d)
1240  absd = map(abs,d)
1241  absranked = rankdata(absd)
1242  r_plus = 0.0
1243  r_minus = 0.0
1244  for i in range(len(absd)):
1245  if d[i] < 0:
1246  r_minus = r_minus + absranked[i]
1247  else:
1248  r_plus = r_plus + absranked[i]
1249  wt = min(r_plus, r_minus)
1250  mn = count * (count+1) * 0.25
1251  se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
1252  z = math.fabs(wt-mn) / se
1253  prob = 2*(1.0 -zprob(abs(z)))
1254  return wt, prob
1255 
1256 
1257 def lkruskalwallish(*args):
1258  """
1259 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
1260 groups, requiring at least 5 subjects in each group. This function
1261 calculates the Kruskal-Wallis H-test for 3 or more independent samples
1262 and returns the result.
1263 
1264 Usage: lkruskalwallish(*args)
1265 Returns: H-statistic (corrected for ties), associated p-value
1266 """
1267  args = list(args)
1268  n = [0]*len(args)
1269  all = []
1270  n = map(len,args)
1271  for i in range(len(args)):
1272  all = all + args[i]
1273  ranked = rankdata(all)
1274  T = tiecorrect(ranked)
1275  for i in range(len(args)):
1276  args[i] = ranked[0:n[i]]
1277  del ranked[0:n[i]]
1278  rsums = []
1279  for i in range(len(args)):
1280  rsums.append(sum(args[i])**2)
1281  rsums[i] = rsums[i] / float(n[i])
1282  ssbn = sum(rsums)
1283  totaln = sum(n)
1284  h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
1285  df = len(args) - 1
1286  if T == 0:
1287  raise ValueError, 'All numbers are identical in lkruskalwallish'
1288  h = h / float(T)
1289  return h, chisqprob(h,df)
1290 
1291 
1292 def lfriedmanchisquare(*args):
1293  """
1294 Friedman Chi-Square is a non-parametric, one-way within-subjects
1295 ANOVA. This function calculates the Friedman Chi-square test for repeated
1296 measures and returns the result, along with the associated probability
1297 value. It assumes 3 or more repeated measures. Only 3 levels requires a
1298 minimum of 10 subjects in the study. Four levels requires 5 subjects per
1299 level(??).
1300 
1301 Usage: lfriedmanchisquare(*args)
1302 Returns: chi-square statistic, associated p-value
1303 """
1304  k = len(args)
1305  if k < 3:
1306  raise ValueError, 'Less than 3 levels. Friedman test not appropriate.'
1307  n = len(args[0])
1308  data = apply(pstat.abut,tuple(args))
1309  for i in range(len(data)):
1310  data[i] = rankdata(data[i])
1311  ssbn = 0
1312  for i in range(k):
1313  ssbn = ssbn + sum(pstat.colex(data,i))**2 # sum the RANKS in treatment column i
1314  chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
1315  return chisq, chisqprob(chisq,k-1)
1316 
1317 
1318 ####################################
1319 #### PROBABILITY CALCULATIONS ####
1320 ####################################
1321 
1322 def lchisqprob(chisq,df):
1323  """
1324 Returns the (1-tailed) probability value associated with the provided
1325 chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.
1326 
1327 Usage: lchisqprob(chisq,df)
1328 """
1329  BIG = 20.0
1330  def ex(x):
1331  BIG = 20.0
1332  if x < -BIG:
1333  return 0.0
1334  else:
1335  return math.exp(x)
1336 
1337  if chisq <=0 or df < 1:
1338  return 1.0
1339  a = 0.5 * chisq
1340  if df%2 == 0:
1341  even = 1
1342  else:
1343  even = 0
1344  if df > 1:
1345  y = ex(-a)
1346  if even:
1347  s = y
1348  else:
1349  s = 2.0 * zprob(-math.sqrt(chisq))
1350  if (df > 2):
1351  chisq = 0.5 * (df - 1.0)
1352  if even:
1353  z = 1.0
1354  else:
1355  z = 0.5
1356  if a > BIG:
1357  if even:
1358  e = 0.0
1359  else:
1360  e = math.log(math.sqrt(math.pi))
1361  c = math.log(a)
1362  while (z <= chisq):
1363  e = math.log(z) + e
1364  s = s + ex(c*z-a-e)
1365  z = z + 1.0
1366  return s
1367  else:
1368  if even:
1369  e = 1.0
1370  else:
1371  e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
1372  c = 0.0
1373  while (z <= chisq):
1374  e = e * (a/float(z))
1375  c = c + e
1376  z = z + 1.0
1377  return (c*y+s)
1378  else:
1379  return s
1380 
1381 
1382 def lerfcc(x):
1383  """
1384 Returns the complementary error function erfc(x) with fractional
1385 error everywhere less than 1.2e-7. Adapted from Numerical Recipes.
1386 
1387 Usage: lerfcc(x)
1388 """
1389  z = abs(x)
1390  t = 1.0 / (1.0+0.5*z)
1391  ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
1392  if x >= 0:
1393  return ans
1394  else:
1395  return 2.0 - ans
1396 
1397 
1398 def lzprob(z):
1399  """
1400 Returns the area under the normal curve 'to the left of' the given z value.
1401 Thus,
1402  for z<0, zprob(z) = 1-tail probability
1403  for z>0, 1.0-zprob(z) = 1-tail probability
1404  for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
1405 Adapted from z.c in Gary Perlman's |Stat.
1406 
1407 Usage: lzprob(z)
1408 """
1409  Z_MAX = 6.0 # maximum meaningful z-value
1410  if z == 0.0:
1411  x = 0.0
1412  else:
1413  y = 0.5 * math.fabs(z)
1414  if y >= (Z_MAX*0.5):
1415  x = 1.0
1416  elif (y < 1.0):
1417  w = y*y
1418  x = ((((((((0.000124818987 * w
1419  -0.001075204047) * w +0.005198775019) * w
1420  -0.019198292004) * w +0.059054035642) * w
1421  -0.151968751364) * w +0.319152932694) * w
1422  -0.531923007300) * w +0.797884560593) * y * 2.0
1423  else:
1424  y = y - 2.0
1425  x = (((((((((((((-0.000045255659 * y
1426  +0.000152529290) * y -0.000019538132) * y
1427  -0.000676904986) * y +0.001390604284) * y
1428  -0.000794620820) * y -0.002034254874) * y
1429  +0.006549791214) * y -0.010557625006) * y
1430  +0.011630447319) * y -0.009279453341) * y
1431  +0.005353579108) * y -0.002141268741) * y
1432  +0.000535310849) * y +0.999936657524
1433  if z > 0.0:
1434  prob = ((x+1.0)*0.5)
1435  else:
1436  prob = ((1.0-x)*0.5)
1437  return prob
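# Example (illustrative): lzprob(0.0) returns 0.5 and lzprob(1.96) returns
# approximately 0.975, matching the standard normal CDF.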
1438 
1439 
1440 def lksprob(alam):
1441  """
1442 Computes a Kolmogorov-Smirnov test significance level. Adapted from
1443 Numerical Recipes.
1444 
1445 Usage: lksprob(alam)
1446 """
1447  fac = 2.0
1448  sum = 0.0
1449  termbf = 0.0
1450  a2 = -2.0*alam*alam
1451  for j in range(1,201):
1452  term = fac*math.exp(a2*j*j)
1453  sum = sum + term
1454  if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
1455  return sum
1456  fac = -fac
1457  termbf = math.fabs(term)
1458  return 1.0 # Get here only if fails to converge; was 0.0!!
1459 
1460 
1461 def lfprob (dfnum, dfden, F):
1462  """
1463 Returns the (1-tailed) significance level (p-value) of an F
1464 statistic given the degrees of freedom for the numerator (dfR-dfF) and
1465 the degrees of freedom for the denominator (dfF).
1466 
1467 Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
1468 """
1469  p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
1470  return p
1471 
1472 
1473 def lbetacf(a,b,x):
1474  """
1475 This function evaluates the continued fraction form of the incomplete
1476 Beta function, betai. (Adapted from: Numerical Recipes in C.)
1477 
1478 Usage: lbetacf(a,b,x)
1479 """
1480  ITMAX = 200
1481  EPS = 3.0e-7
1482 
1483  bm = az = am = 1.0
1484  qab = a+b
1485  qap = a+1.0
1486  qam = a-1.0
1487  bz = 1.0-qab*x/qap
1488  for i in range(ITMAX+1):
1489  em = float(i+1)
1490  tem = em + em
1491  d = em*(b-em)*x/((qam+tem)*(a+tem))
1492  ap = az + d*am
1493  bp = bz+d*bm
1494  d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
1495  app = ap+d*az
1496  bpp = bp+d*bz
1497  aold = az
1498  am = ap/bpp
1499  bm = bp/bpp
1500  az = app/bpp
1501  bz = 1.0
1502  if (abs(az-aold)<(EPS*abs(az))):
1503  return az
1504  print 'a or b too big, or ITMAX too small in Betacf.'
1505 
1506 
1507 def lgammln(xx):
1508  """
1509 Returns the natural log of the gamma function of xx.
1510  Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
1511 (Adapted from: Numerical Recipes in C.)
1512 
1513 Usage: lgammln(xx)
1514 """
1515 
1516  coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
1517  0.120858003e-2, -0.536382e-5]
1518  x = xx - 1.0
1519  tmp = x + 5.5
1520  tmp = tmp - (x+0.5)*math.log(tmp)
1521  ser = 1.0
1522  for j in range(len(coeff)):
1523  x = x + 1
1524  ser = ser + coeff[j]/x
1525  return -tmp + math.log(2.50662827465*ser)
1526 
1527 
1528 def lbetai(a,b,x):
1529  """
1530 Returns the incomplete beta function:
1531 
1532  I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
1533 
1534 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
1535 function of a. The continued fraction formulation is implemented here,
1536 using the betacf function. (Adapted from: Numerical Recipes in C.)
1537 
1538 Usage: lbetai(a,b,x)
1539 """
1540  if (x<0.0 or x>1.0):
1541  raise ValueError, 'Bad x in lbetai'
1542  if (x==0.0 or x==1.0):
1543  bt = 0.0
1544  else:
1545  bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b*
1546  math.log(1.0-x))
1547  if (x<(a+1.0)/(a+b+2.0)):
1548  return bt*betacf(a,b,x)/float(a)
1549  else:
1550  return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1551 
1552 
1553 ####################################
1554 ####### ANOVA CALCULATIONS #######
1555 ####################################
1556 
1557 def lF_oneway(*lists):
1558  """
1559 Performs a 1-way ANOVA, returning an F-value and probability given
1560 any number of groups. From Heiman, pp.394-7.
1561 
1562 Usage: F_oneway(*lists) where *lists is any number of lists, one per
1563  treatment group
1564 Returns: F value, one-tailed p-value
1565 """
1566  a = len(lists) # ANOVA on 'a' groups, each in its own list
1567  means = [0]*a
1568  vars = [0]*a
1569  ns = [0]*a
1570  alldata = []
1571  # the list-based helpers (mean, var, ss, square_of_sums) are used here
1572  # so that this list version works without NumPy
1573  means = map(mean,lists)
1574  vars = map(var,lists)
1575  ns = map(len,lists)
1576  for i in range(len(lists)):
1577  alldata = alldata + lists[i]
1578  bign = len(alldata)
1579  sstot = ss(alldata)-(square_of_sums(alldata)/float(bign))
1580  ssbn = 0
1581  for list in lists:
1582  ssbn = ssbn + square_of_sums(list)/float(len(list))
1583  ssbn = ssbn - (square_of_sums(alldata)/float(bign))
1584  sswn = sstot-ssbn
1585  dfbn = a-1
1586  dfwn = bign - a
1587  msb = ssbn/float(dfbn)
1588  msw = sswn/float(dfwn)
1589  f = msb/msw
1590  prob = fprob(dfbn,dfwn,f)
1591  return f, prob
1592 
1593 
1594 def lF_value (ER,EF,dfnum,dfden):
1595  """
1596 Returns an F-statistic given the following:
1597  ER = error associated with the null hypothesis (the Restricted model)
1598  EF = error associated with the alternate hypothesis (the Full model)
1599  dfR-dfF = degrees of freedom of the numerator
1600  dfF = degrees of freedom associated with the denominator/Full model
1601 
1602 Usage: lF_value(ER,EF,dfnum,dfden)
1603 """
1604  return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
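# Example (illustrative): lF_value(100.0, 50.0, 2, 10) returns
# ((100-50)/2) / (50/10) = 25/5 = 5.0.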
1605 
1606 
1607 ####################################
1608 ######## SUPPORT FUNCTIONS #######
1609 ####################################
1610 
1611 def writecc (listoflists,file,writetype='w',extra=2):
1612  """
1613 Writes a list of lists to a file in columns, customized by the max
1614 size of items within the columns (max size of items in col, +2 characters)
1615 to the specified file. File-overwrite is the default.
1616 
1617 Usage: writecc (listoflists,file,writetype='w',extra=2)
1618 Returns: None
1619 """
1620  if type(listoflists[0]) not in [ListType,TupleType]:
1621  listoflists = [listoflists]
1622  outfile = open(file,writetype)
1623  rowstokill = []
1624  list2print = copy.deepcopy(listoflists)
1625  for i in range(len(listoflists)):
1626  if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes':
1627  rowstokill = rowstokill + [i]
1628  rowstokill.reverse()
1629  for row in rowstokill:
1630  del list2print[row]
1631  maxsize = [0]*len(list2print[0])
1632  for col in range(len(list2print[0])):
1633  items = pstat.colex(list2print,col)
1634  items = map(pstat.makestr,items)
1635  maxsize[col] = max(map(len,items)) + extra
1636  for row in listoflists:
1637  if row == ['\n'] or row == '\n':
1638  outfile.write('\n')
1639  elif row == ['dashes'] or row == 'dashes':
1640  dashes = [0]*len(maxsize)
1641  for j in range(len(maxsize)):
1642  dashes[j] = '-'*(maxsize[j]-2)
1643  outfile.write(pstat.lineincustcols(dashes,maxsize))
1644  else:
1645  outfile.write(pstat.lineincustcols(row,maxsize))
1646  outfile.write('\n')
1647  outfile.close()
1648  return None
1649 
1650 
1651 def lincr(l,cap): # to increment a list up to a max-list of 'cap'
1652  """
1653 Simulate a counting system from an n-dimensional list.
1654 
1655 Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n
1656 Returns: next set of values for list l, OR -1 (if overflow)
1657 """
1658  l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap)
1659  for i in range(len(l)):
1660  if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done
1661  l[i] = 0
1662  l[i+1] = l[i+1] + 1
1663  elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished
1664  l = -1
1665  return l
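# Example (illustrative): with cap=[1, 1], repeated calls count
# [0,0] -> [1,0] -> [0,1] -> [1,1], and the next lincr() call returns -1
# to signal overflow.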
1666 
1667 
1668 def lsum (inlist):
1669  """
1670 Returns the sum of the items in the passed list.
1671 
1672 Usage: lsum(inlist)
1673 """
1674  s = 0
1675  for item in inlist:
1676  s = s + item
1677  return s
1678 
1679 
1680 def lcumsum (inlist):
1681  """
1682 Returns a list consisting of the cumulative sum of the items in the
1683 passed list.
1684 
1685 Usage: lcumsum(inlist)
1686 """
1687  newlist = copy.deepcopy(inlist)
1688  for i in range(1,len(newlist)):
1689  newlist[i] = newlist[i] + newlist[i-1]
1690  return newlist
1691 
1692 
1693 def lss(inlist):
1694  """
1695 Squares each value in the passed list, adds up these squares and
1696 returns the result.
1697 
1698 Usage: lss(inlist)
1699 """
1700  ss = 0
1701  for item in inlist:
1702  ss = ss + item*item
1703  return ss
1704 
1705 
1706 def lsummult (list1,list2):
1707  """
1708 Multiplies elements in list1 and list2, element by element, and
1709 returns the sum of all resulting multiplications. Must provide equal
1710 length lists.
1711 
1712 Usage: lsummult(list1,list2)
1713 """
1714  if len(list1) <> len(list2):
1715  raise ValueError, "Lists not equal length in summult."
1716  s = 0
1717  for item1,item2 in pstat.abut(list1,list2):
1718  s = s + item1*item2
1719  return s
1720 
1721 
1722 def lsumdiffsquared(x,y):
1723  """
1724 Takes pairwise differences of the values in lists x and y, squares
1725 these differences, and returns the sum of these squares.
1726 
1727 Usage: lsumdiffsquared(x,y)
1728 Returns: sum[(x[i]-y[i])**2]
1729 """
1730  sds = 0
1731  for i in range(len(x)):
1732  sds = sds + (x[i]-y[i])**2
1733  return sds
1734 
1735 
1736 def lsquare_of_sums(inlist):
1737  """
1738 Adds the values in the passed list, squares the sum, and returns
1739 the result.
1740 
1741 Usage: lsquare_of_sums(inlist)
1742 Returns: sum(inlist[i])**2
1743 """
1744  s = sum(inlist)
1745  return float(s)*s
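# Example (illustrative): for [1, 2, 3], lss() returns 1+4+9 = 14 while
# lsquare_of_sums() returns (1+2+3)**2 = 36.0.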
1746 
1747 
1748 def lshellsort(inlist):
1749  """
1750 Shellsort algorithm. Sorts a 1D-list.
1751 
1752 Usage: lshellsort(inlist)
1753 Returns: sorted-inlist, sorting-index-vector (for original list)
1754 """
1755  n = len(inlist)
1756  svec = copy.deepcopy(inlist)
1757  ivec = range(n)
1758  gap = n/2 # integer division needed
1759  while gap >0:
1760  for i in range(gap,n):
1761  for j in range(i-gap,-1,-gap):
1762  while j>=0 and svec[j]>svec[j+gap]:
1763  temp = svec[j]
1764  svec[j] = svec[j+gap]
1765  svec[j+gap] = temp
1766  itemp = ivec[j]
1767  ivec[j] = ivec[j+gap]
1768  ivec[j+gap] = itemp
1769  gap = gap / 2 # integer division needed
1770 # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]]
1771  return svec, ivec
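
# Illustrative example (not part of the original module): the index vector maps
# back to the original list, i.e. svec[i] == inlist[ivec[i]].
#   lshellsort([3,1,2]) -> ([1, 2, 3], [1, 2, 0])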
1772 
1773 
1774 def lrankdata(inlist):
1775  """
1776 Ranks the data in inlist, dealing with ties appropriately. Assumes
1777 a 1D inlist. Adapted from Gary Perlman's |Stat ranksort.
1778 
1779 Usage: lrankdata(inlist)
1780 Returns: a list of length equal to inlist, containing rank scores
1781 """
1782  n = len(inlist)
1783  svec, ivec = shellsort(inlist)
1784  sumranks = 0
1785  dupcount = 0
1786  newlist = [0]*n
1787  for i in range(n):
1788  sumranks = sumranks + i
1789  dupcount = dupcount + 1
1790  if i==n-1 or svec[i] <> svec[i+1]:
1791  averank = sumranks / float(dupcount) + 1
1792  for j in range(i-dupcount+1,i+1):
1793  newlist[ivec[j]] = averank
1794  sumranks = 0
1795  dupcount = 0
1796  return newlist
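
# Illustrative example (not part of the original module): tied scores share the
# average of the ranks they occupy.
#   lrankdata([3,1,4,1]) -> [3.0, 1.5, 4.0, 1.5]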
1797 
1798 
1799 def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
1800  """
1801 Prints, or writes to a file, stats for two groups, using the name, n,
1802 mean, sterr, min and max for each group, as well as the statistic name,
1803 its value, and the associated p-value.
1804 
1805 Usage: outputpairedstats(fname,writemode,
1806  name1,n1,mean1,stderr1,min1,max1,
1807  name2,n2,mean2,stderr2,min2,max2,
1808  statname,stat,prob)
1809 Returns: None
1810 """
1811  suffix = '' # for *s after the p-value
1812  try:
1813  x = prob.shape
1814  prob = prob[0]
1815  except:
1816  pass
1817  if prob < 0.001: suffix = ' ***'
1818  elif prob < 0.01: suffix = ' **'
1819  elif prob < 0.05: suffix = ' *'
1820  title = [['Name','N','Mean','SD','Min','Max']]
1821  lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1],
1822  [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]]
1823  if type(fname)<>StringType or len(fname)==0:
1824  print
1825  print statname
1826  print
1827  pstat.printcc(lofl)
1828  print
1829  try:
1830  if stat.shape == ():
1831  stat = stat[0]
1832  if prob.shape == ():
1833  prob = prob[0]
1834  except:
1835  pass
1836  print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix
1837  print
1838  else:
1839  file = open(fname,writemode)
1840  file.write('\n'+statname+'\n\n')
1841  file.close()
1842  writecc(lofl,fname,'a')
1843  file = open(fname,'a')
1844  try:
1845  if stat.shape == ():
1846  stat = stat[0]
1847  if prob.shape == ():
1848  prob = prob[0]
1849  except:
1850  pass
1851  file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n']))
1852  file.close()
1853  return None
1854 
1855 
1856 def lfindwithin (data):
1857  """
1858 Returns an integer representing a binary vector, where 1=within-
1859 subject factor, 0=between. Input equals the entire data 2D list (i.e.,
1860 column 0=random factor, column -1=measured values); those two columns are skipped.
1861 Note: input data is in |Stat format ... a list of lists ("2D list") with
1862 one row per measured value, first column=subject identifier, last column=
1863 score, one in-between column per factor (these columns contain level
1864 designations on each factor). See also stats.anova.__doc__.
1865 
1866 Usage: lfindwithin(data) data in |Stat format
1867 """
1868 
1869  numfact = len(data[0])-1
1870  withinvec = 0
1871  for col in range(1,numfact):
1872  examplelevel = pstat.unique(pstat.colex(data,col))[0]
1873  rows = pstat.linexand(data,col,examplelevel) # get 1 level of this factor
1874  factsubjs = pstat.unique(pstat.colex(rows,0))
1875  allsubjs = pstat.unique(pstat.colex(data,0))
1876  if len(factsubjs) == len(allsubjs): # all subjects appear at this level => within-subjects factor
1877  withinvec = withinvec + (1 << col)
1878  return withinvec
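
# Illustrative note (not part of the original module): the result is a bit mask
# over factor columns; e.g. if the factors in columns 1 and 2 are both
# within-subjects, lfindwithin returns (1 << 1) + (1 << 2) == 6.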
1879 
1880 
1881 #########################################################
1882 #########################################################
1883 ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS #########
1884 #########################################################
1885 #########################################################
1886 
1887 ## CENTRAL TENDENCY:
1888 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), )
1889 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), )
1890 mean = Dispatch ( (lmean, (ListType, TupleType)), )
1891 median = Dispatch ( (lmedian, (ListType, TupleType)), )
1892 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), )
1893 mode = Dispatch ( (lmode, (ListType, TupleType)), )
1894 
1895 ## MOMENTS:
1896 moment = Dispatch ( (lmoment, (ListType, TupleType)), )
1897 variation = Dispatch ( (lvariation, (ListType, TupleType)), )
1898 skew = Dispatch ( (lskew, (ListType, TupleType)), )
1899 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), )
1900 describe = Dispatch ( (ldescribe, (ListType, TupleType)), )
1901 
1902 ## FREQUENCY STATISTICS:
1903 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), )
1904 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), )
1905 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), )
1906 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), )
1907 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), )
1908 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), )
1909 
1910 ## VARIABILITY:
1911 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), )
1912 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), )
1913 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), )
1914 var = Dispatch ( (lvar, (ListType, TupleType)), )
1915 stdev = Dispatch ( (lstdev, (ListType, TupleType)), )
1916 sterr = Dispatch ( (lsterr, (ListType, TupleType)), )
1917 sem = Dispatch ( (lsem, (ListType, TupleType)), )
1918 z = Dispatch ( (lz, (ListType, TupleType)), )
1919 zs = Dispatch ( (lzs, (ListType, TupleType)), )
1920 
1921 ## TRIMMING FCNS:
1922 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), )
1923 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), )
1924 
1925 ## CORRELATION FCNS:
1926 paired = Dispatch ( (lpaired, (ListType, TupleType)), )
1927 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), )
1928 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), )
1929 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), )
1930 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), )
1931 linregress = Dispatch ( (llinregress, (ListType, TupleType)), )
1932 
1933 ## INFERENTIAL STATS:
1934 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), )
1935 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), )
1936 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), )
1937 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), )
1938 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), )
1939 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), )
1940 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), )
1941 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), )
1942 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), )
1943 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), )
1944 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), )
1945 
1946 ## PROBABILITY CALCS:
1947 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), )
1948 zprob = Dispatch ( (lzprob, (IntType, FloatType)), )
1949 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), )
1950 fprob = Dispatch ( (lfprob, (IntType, FloatType)), )
1951 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), )
1952 betai = Dispatch ( (lbetai, (IntType, FloatType)), )
1953 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), )
1954 gammln = Dispatch ( (lgammln, (IntType, FloatType)), )
1955 
1956 ## ANOVA FUNCTIONS:
1957 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), )
1958 F_value = Dispatch ( (lF_value, (ListType, TupleType)), )
1959 
1960 ## SUPPORT FUNCTIONS:
1961 incr = Dispatch ( (lincr, (ListType, TupleType)), )
1962 sum = Dispatch ( (lsum, (ListType, TupleType)), )
1963 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), )
1964 ss = Dispatch ( (lss, (ListType, TupleType)), )
1965 summult = Dispatch ( (lsummult, (ListType, TupleType)), )
1966 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), )
1967 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), )
1968 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), )
1969 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), )
1970 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), )
1971 
1972 
1973 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1974 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1975 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1976 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1977 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1978 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1979 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1980 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1981 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1982 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1983 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1984 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1985 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1986 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1987 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1988 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1989 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1990 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1991 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS ===============
1992 
1993 try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE
1994  import numpy as N
1995  import numpy.linalg as LA
1996 
1997 
1998 #####################################
1999 ######## ACENTRAL TENDENCY ########
2000 #####################################
2001 
2002  def ageometricmean (inarray,dimension=None,keepdims=0):
2003  """
2004 Calculates the geometric mean of the values in the passed array.
2005 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in
2006 the passed array. Use dimension=None to flatten array first. REMEMBER: if
2007 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2008 if dimension is a sequence, it collapses over all specified dimensions. If
2009 keepdims is set to 1, the resulting array will have as many dimensions as
2010 inarray, with only 1 'level' per dim that was collapsed over.
2011 
2012 Usage: ageometricmean(inarray,dimension=None,keepdims=0)
2013 Returns: geometric mean computed over dim(s) listed in dimension
2014 """
2015  inarray = N.array(inarray,N.float_)
2016  if dimension == None:
2017  inarray = N.ravel(inarray)
2018  size = len(inarray)
2019  mult = N.power(inarray,1.0/size)
2020  mult = N.multiply.reduce(mult)
2021  elif type(dimension) in [IntType,FloatType]:
2022  size = inarray.shape[dimension]
2023  mult = N.power(inarray,1.0/size)
2024  mult = N.multiply.reduce(mult,dimension)
2025  if keepdims == 1:
2026  shp = list(inarray.shape)
2027  shp[dimension] = 1
2028  mult = N.reshape(mult,shp)
2029  else: # must be a SEQUENCE of dims to average over
2030  dims = list(dimension)
2031  dims.sort()
2032  dims.reverse()
2033  size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.float_)
2034  mult = N.power(inarray,1.0/size)
2035  for dim in dims:
2036  mult = N.multiply.reduce(mult,dim)
2037  if keepdims == 1:
2038  shp = list(inarray.shape)
2039  for dim in dims:
2040  shp[dim] = 1
2041  mult = N.reshape(mult,shp)
2042  return mult
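
# Illustrative example (not part of the original module; N is numpy as imported above):
#   ageometricmean(N.array([1., 2., 4.])) -> 2.0   ((1*2*4)**(1/3.))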
2043 
2044 
2045  def aharmonicmean (inarray,dimension=None,keepdims=0):
2046  """
2047 Calculates the harmonic mean of the values in the passed array.
2048 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in
2049 the passed array. Use dimension=None to flatten array first. REMEMBER: if
2050 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2051 if dimension is a sequence, it collapses over all specified dimensions. If
2052 keepdims is set to 1, the resulting array will have as many dimensions as
2053 inarray, with only 1 'level' per dim that was collapsed over.
2054 
2055 Usage: aharmonicmean(inarray,dimension=None,keepdims=0)
2056 Returns: harmonic mean computed over dim(s) in dimension
2057 """
2058  inarray = inarray.astype(N.float_)
2059  if dimension == None:
2060  inarray = N.ravel(inarray)
2061  size = len(inarray)
2062  s = N.add.reduce(1.0 / inarray)
2063  elif type(dimension) in [IntType,FloatType]:
2064  size = float(inarray.shape[dimension])
2065  s = N.add.reduce(1.0/inarray, dimension)
2066  if keepdims == 1:
2067  shp = list(inarray.shape)
2068  shp[dimension] = 1
2069  s = N.reshape(s,shp)
2070  else: # must be a SEQUENCE of dims to average over
2071  dims = list(dimension)
2072  dims.sort()
2073  nondims = []
2074  for i in range(len(inarray.shape)):
2075  if i not in dims:
2076  nondims.append(i)
2077  tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first
2078  idx = [0] *len(nondims)
2079  if idx == []:
2080  size = len(N.ravel(inarray))
2081  s = asum(1.0 / inarray)
2082  if keepdims == 1:
2083  s = N.reshape([s],N.ones(len(inarray.shape)))
2084  else:
2085  idx[0] = -1
2086  loopcap = N.array(tinarray.shape[0:len(nondims)]) -1
2087  s = N.zeros(loopcap+1,N.float_)
2088  while incr(idx,loopcap) <> -1:
2089  s[idx] = asum(1.0/tinarray[idx])
2090  size = N.multiply.reduce(N.take(inarray.shape,dims))
2091  if keepdims == 1:
2092  shp = list(inarray.shape)
2093  for dim in dims:
2094  shp[dim] = 1
2095  s = N.reshape(s,shp)
2096  return size / s
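
# Illustrative example (not part of the original module):
#   aharmonicmean(N.array([1., 2., 4.])) -> 1.714...   (3 / (1 + 1/2. + 1/4.))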
2097 
2098 
2099  def amean (inarray,dimension=None,keepdims=0):
2100  """
2101 Calculates the arithmetic mean of the values in the passed array.
2102 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the
2103 passed array. Use dimension=None to flatten array first. REMEMBER: if
2104 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2105 if dimension is a sequence, it collapses over all specified dimensions. If
2106 keepdims is set to 1, the resulting array will have as many dimensions as
2107 inarray, with only 1 'level' per dim that was collapsed over.
2108 
2109 Usage: amean(inarray,dimension=None,keepdims=0)
2110 Returns: arithmetic mean calculated over dim(s) in dimension
2111 """
2112  if inarray.dtype in [N.int_, N.short,N.ubyte]:
2113  inarray = inarray.astype(N.float_)
2114  if dimension == None:
2115  inarray = N.ravel(inarray)
2116  sum = N.add.reduce(inarray)
2117  denom = float(len(inarray))
2118  elif type(dimension) in [IntType,FloatType]:
2119  sum = asum(inarray,dimension)
2120  denom = float(inarray.shape[dimension])
2121  if keepdims == 1:
2122  shp = list(inarray.shape)
2123  shp[dimension] = 1
2124  sum = N.reshape(sum,shp)
2125  else: # must be a TUPLE of dims to average over
2126  dims = list(dimension)
2127  dims.sort()
2128  dims.reverse()
2129  sum = inarray *1.0
2130  for dim in dims:
2131  sum = N.add.reduce(sum,dim)
2132  denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.float_)
2133  if keepdims == 1:
2134  shp = list(inarray.shape)
2135  for dim in dims:
2136  shp[dim] = 1
2137  sum = N.reshape(sum,shp)
2138  return sum/denom
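
# Illustrative examples (not part of the original module):
#   a = N.array([[1., 2., 3.], [4., 5., 6.]])
#   amean(a)                      -> 3.5                (flattened)
#   amean(a, dimension=0)         -> [2.5, 3.5, 4.5]    (collapsed over rows)
#   amean(a, 0, keepdims=1).shape -> (1, 3)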
2139 
2140 
2141  def amedian (inarray,numbins=1000):
2142  """
2143 Calculates the COMPUTED median value of an array of numbers, given the
2144 number of bins to use for the histogram (more bins approaches finding the
2145 precise median value of the array; default number of bins = 1000). From
2146 G.W. Heiman's Basic Stats, or CRC Probability & Statistics.
2147 NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first).
2148 
2149 Usage: amedian(inarray,numbins=1000)
2150 Returns: median calculated over ALL values in inarray
2151 """
2152  inarray = N.ravel(inarray)
2153  (hist, smallest, binsize, extras) = ahistogram(inarray,numbins,[min(inarray),max(inarray)])
2154  cumhist = N.cumsum(hist) # make cumulative histogram
2155  otherbins = N.greater_equal(cumhist,len(inarray)/2.0)
2156  otherbins = list(otherbins) # list of 0/1s, 1s start at median bin
2157  cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score
2158  LRL = smallest + binsize*cfbin # get lower real limit of that bin
2159  cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin
2160  freq = hist[cfbin] # frequency IN the 50%ile bin
2161  median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN
2162  return median
2163 
2164 
2165  def amedianscore (inarray,dimension=None):
2166  """
2167 Returns the 'middle' score of the passed array. If there is an even
2168 number of scores, the mean of the 2 middle scores is returned. Can function
2169 with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can
2170 be None, to pre-flatten the array, or else dimension must equal 0).
2171 
2172 Usage: amedianscore(inarray,dimension=None)
2173 Returns: 'middle' score of the array, or the mean of the 2 middle scores
2174 """
2175  if dimension == None:
2176  inarray = N.ravel(inarray)
2177  dimension = 0
2178  inarray = N.sort(inarray,dimension)
2179  if inarray.shape[dimension] % 2 == 0: # if even number of elements
2180  indx = inarray.shape[dimension]/2 # integer division correct
2181  median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0
2182  else:
2183  indx = inarray.shape[dimension] / 2 # integer division correct
2184  median = N.take(inarray,[indx],dimension)
2185  if median.shape == (1,):
2186  median = median[0]
2187  return median
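
# Illustrative examples (not part of the original module):
#   amedianscore(N.array([7, 1, 5, 3])) -> 4.0   (mean of the two middle scores)
#   amedianscore(N.array([7, 1, 5]))    -> 5     (the single middle score)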
2188 
2189 
2190  def amode(a, dimension=None):
2191  """
2192 Returns an array of the modal (most common) score in the passed array.
2193 If there is more than one such score, ONLY THE FIRST is returned.
2194 The bin-count for the modal values is also returned. Operates on whole
2195 array (dimension=None), or on a given dimension.
2196 
2197 Usage: amode(a, dimension=None)
2198 Returns: array of bin-counts for mode(s), array of corresponding modal values
2199 """
2200 
2201  if dimension == None:
2202  a = N.ravel(a)
2203  dimension = 0
2204  scores = pstat.aunique(N.ravel(a)) # get ALL unique values
2205  testshape = list(a.shape)
2206  testshape[dimension] = 1
2207  oldmostfreq = N.zeros(testshape)
2208  oldcounts = N.zeros(testshape)
2209  for score in scores:
2210  template = N.equal(a,score)
2211  counts = asum(template,dimension,1)
2212  mostfrequent = N.where(counts>oldcounts,score,oldmostfreq)
2213  oldcounts = N.where(counts>oldcounts,counts,oldcounts)
2214  oldmostfreq = mostfrequent
2215  return oldcounts, mostfrequent
2216 
2217 
2218  def atmean(a,limits=None,inclusive=(1,1)):
2219  """
2220 Returns the arithmetic mean of all values in an array, ignoring values
2221 strictly outside the sequence passed to 'limits'. Note: either limit
2222 in the sequence, or the value of limits itself, can be set to None. The
2223 inclusive list/tuple determines whether the lower and upper limiting bounds
2224 (respectively) are open/exclusive (0) or closed/inclusive (1).
2225 
2226 Usage: atmean(a,limits=None,inclusive=(1,1))
2227 """
2228  if a.dtype in [N.int_, N.short,N.ubyte]:
2229  a = a.astype(N.float_)
2230  if limits == None:
2231  return mean(a)
2232  assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atmean"
2233  if inclusive[0]: lowerfcn = N.greater_equal
2234  else: lowerfcn = N.greater
2235  if inclusive[1]: upperfcn = N.less_equal
2236  else: upperfcn = N.less
2237  if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2238  raise ValueError, "No array values within given limits (atmean)."
2239  elif limits[0]==None and limits[1]<>None:
2240  mask = upperfcn(a,limits[1])
2241  elif limits[0]<>None and limits[1]==None:
2242  mask = lowerfcn(a,limits[0])
2243  elif limits[0]<>None and limits[1]<>None:
2244  mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2245  s = float(N.add.reduce(N.ravel(a*mask)))
2246  n = float(N.add.reduce(N.ravel(mask)))
2247  return s/n
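
# Illustrative example (not part of the original module): only values inside the
# (closed) limits contribute to the mean.
#   atmean(N.arange(1., 11.), (3, 7)) -> 5.0   (mean of 3,4,5,6,7)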
2248 
2249 
2250  def atvar(a,limits=None,inclusive=(1,1)):
2251  """
2252 Returns the sample variance of values in an array, (i.e., using N-1),
2253 ignoring values strictly outside the sequence passed to 'limits'.
2254 Note: either limit in the sequence, or the value of limits itself,
2255 can be set to None. The inclusive list/tuple determines whether the lower
2256 and upper limiting bounds (respectively) are open/exclusive (0) or
2257 closed/inclusive (1). ASSUMES A FLAT ARRAY (OR ELSE PREFLATTENS).
2258 
2259 Usage: atvar(a,limits=None,inclusive=(1,1))
2260 """
2261  a = a.astype(N.float_)
2262  if limits == None or limits == [None,None]:
2263  return avar(a)
2264  assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atvar"
2265  if inclusive[0]: lowerfcn = N.greater_equal
2266  else: lowerfcn = N.greater
2267  if inclusive[1]: upperfcn = N.less_equal
2268  else: upperfcn = N.less
2269  if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2270  raise ValueError, "No array values within given limits (atvar)."
2271  elif limits[0]==None and limits[1]<>None:
2272  mask = upperfcn(a,limits[1])
2273  elif limits[0]<>None and limits[1]==None:
2274  mask = lowerfcn(a,limits[0])
2275  elif limits[0]<>None and limits[1]<>None:
2276  mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2277 
2278  a = N.compress(mask,a) # squish out excluded values
2279  return avar(a)
2280 
2281 
2282  def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
2283  """
2284 Returns the minimum value of a, along dimension, including only values greater
2285 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None,
2286 all values in the array are used.
2287 
2288 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1)
2289 """
2290  if inclusive: lowerfcn = N.greater_equal # include the limiting value itself when inclusive=1 (matches atmean/atvar)
2291  else: lowerfcn = N.greater
2292  if dimension == None:
2293  a = N.ravel(a)
2294  dimension = 0
2295  if lowerlimit == None:
2296  lowerlimit = N.minimum.reduce(N.ravel(a))-11
2297  biggest = N.maximum.reduce(N.ravel(a))
2298  ta = N.where(lowerfcn(a,lowerlimit),a,biggest)
2299  return N.minimum.reduce(ta,dimension)
2300 
2301 
2302  def atmax(a,upperlimit,dimension=None,inclusive=1):
2303  """
2304 Returns the maximum value of a, along dimension, including only values less
2305 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None,
2306 a limit larger than the max value in the array is used.
2307 
2308 Usage: atmax(a,upperlimit,dimension=None,inclusive=1)
2309 """
2310  if inclusive: upperfcn = N.less_equal # include the limiting value itself when inclusive=1 (matches atmean/atvar)
2311  else: upperfcn = N.less
2312  if dimension == None:
2313  a = N.ravel(a)
2314  dimension = 0
2315  if upperlimit == None:
2316  upperlimit = N.maximum.reduce(N.ravel(a))+1
2317  smallest = N.minimum.reduce(N.ravel(a))
2318  ta = N.where(upperfcn(a,upperlimit),a,smallest)
2319  return N.maximum.reduce(ta,dimension)
2320 
2321 
2322  def atstdev(a,limits=None,inclusive=(1,1)):
2323  """
2324 Returns the standard deviation of all values in an array, ignoring values
2325 strictly outside the sequence passed to 'limits'. Note: either limit
2326 in the sequence, or the value of limits itself, can be set to None. The
2327 inclusive list/tuple determines whether the lower and upper limiting bounds
2328 (respectively) are open/exclusive (0) or closed/inclusive (1).
2329 
2330 Usage: atstdev(a,limits=None,inclusive=(1,1))
2331 """
2332  return N.sqrt(tvar(a,limits,inclusive))
2333 
2334 
2335  def atsem(a,limits=None,inclusive=(1,1)):
2336  """
2337 Returns the standard error of the mean for the values in an array,
2338 (i.e., using N for the denominator), ignoring values strictly outside
2339 the sequence passed to 'limits'. Note: either limit in the sequence,
2340 or the value of limits itself, can be set to None. The inclusive list/tuple
2341 determines whether the lower and upper limiting bounds (respectively) are
2342 open/exclusive (0) or closed/inclusive (1).
2343 
2344 Usage: atsem(a,limits=None,inclusive=(1,1))
2345 """
2346  sd = tstdev(a,limits,inclusive)
2347  if limits == None or limits == [None,None]:
2348  n = float(len(N.ravel(a)))
2349  limits = [min(a)-1, max(a)+1]
2350  assert type(limits) in [ListType,TupleType,N.ndarray], "Wrong type for limits in atsem"
2351  if inclusive[0]: lowerfcn = N.greater_equal
2352  else: lowerfcn = N.greater
2353  if inclusive[1]: upperfcn = N.less_equal
2354  else: upperfcn = N.less
2355  if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2356  raise ValueError, "No array values within given limits (atsem)."
2357  elif limits[0]==None and limits[1]<>None:
2358  mask = upperfcn(a,limits[1])
2359  elif limits[0]<>None and limits[1]==None:
2360  mask = lowerfcn(a,limits[0])
2361  elif limits[0]<>None and limits[1]<>None:
2362  mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2363  term1 = N.add.reduce(N.ravel(a*a*mask))
2364  n = float(N.add.reduce(N.ravel(mask)))
2365  return sd/math.sqrt(n)
2366 
2367 
2368 #####################################
2369 ############ AMOMENTS #############
2370 #####################################
2371 
2372  def amoment(a,moment=1,dimension=None):
2373  """
2374 Calculates the nth moment about the mean for a sample (defaults to the
2375 1st moment). Generally used to calculate coefficients of skewness and
2376 kurtosis. Dimension can equal None (ravel array first), an integer
2377 (the dimension over which to operate), or a sequence (operate over
2378 multiple dimensions).
2379 
2380 Usage: amoment(a,moment=1,dimension=None)
2381 Returns: appropriate moment along given dimension
2382 """
2383  if dimension == None:
2384  a = N.ravel(a)
2385  dimension = 0
2386  if moment == 1:
2387  return 0.0
2388  else:
2389  mn = amean(a,dimension,1) # 1=keepdims
2390  s = N.power((a-mn),moment)
2391  return amean(s,dimension)
2392 
2393 
2394  def avariation(a,dimension=None):
2395  """
2396 Returns the coefficient of variation, as defined in CRC Standard
2397 Probability and Statistics, p.6. Dimension can equal None (ravel array
2398 first), an integer (the dimension over which to operate), or a
2399 sequence (operate over multiple dimensions).
2400 
2401 Usage: avariation(a,dimension=None)
2402 """
2403  return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2404 
2405 
2406  def askew(a,dimension=None):
2407  """
2408 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
2409 weight in the right tail). Use askewtest() to see if it's close enough.
2410 Dimension can equal None (ravel array first), an integer (the
2411 dimension over which to operate), or a sequence (operate over multiple
2412 dimensions).
2413 
2414 Usage: askew(a, dimension=None)
2415 Returns: skew of vals in a along dimension, returning ZERO where all vals equal
2416 """
2417  denom = N.power(amoment(a,2,dimension),1.5)
2418  zero = N.equal(denom,0)
2419  if type(denom) == N.ndarray and asum(zero) <> 0:
2420  print "Number of zeros in askew: ",asum(zero)
2421  denom = denom + zero # prevent divide-by-zero
2422  return N.where(zero, 0, amoment(a,3,dimension)/denom)
2423 
2424 
2425  def akurtosis(a,dimension=None):
2426  """
2427 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
2428 heavier in the tails, and usually more peaked). Use akurtosistest()
2429 to see if it's close enough. Dimension can equal None (ravel array
2430 first), an integer (the dimension over which to operate), or a
2431 sequence (operate over multiple dimensions).
2432 
2433 Usage: akurtosis(a,dimension=None)
2434 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
2435 """
2436  denom = N.power(amoment(a,2,dimension),2)
2437  zero = N.equal(denom,0)
2438  if type(denom) == N.ndarray and asum(zero) <> 0:
2439  print "Number of zeros in akurtosis: ",asum(zero)
2440  denom = denom + zero # prevent divide-by-zero
2441  return N.where(zero,0,amoment(a,4,dimension)/denom)
2442 
2443 
2444  def adescribe(inarray,dimension=None):
2445  """
2446 Returns several descriptive statistics of the passed array. Dimension
2447 can equal None (ravel array first), an integer (the dimension over
2448 which to operate), or a sequence (operate over multiple dimensions).
2449 
2450 Usage: adescribe(inarray,dimension=None)
2451 Returns: n, (min,max), mean, standard deviation, skew, kurtosis
2452 """
2453  if dimension == None:
2454  inarray = N.ravel(inarray)
2455  dimension = 0
2456  n = inarray.shape[dimension]
2457  mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray))
2458  m = amean(inarray,dimension)
2459  sd = astdev(inarray,dimension)
2460  skew = askew(inarray,dimension)
2461  kurt = akurtosis(inarray,dimension)
2462  return n, mm, m, sd, skew, kurt
2463 
2464 
2465 #####################################
2466 ######## NORMALITY TESTS ##########
2467 #####################################
2468 
2469  def askewtest(a,dimension=None):
2470  """
2471 Tests whether the skew is significantly different from a normal
2472 distribution. Dimension can equal None (ravel array first), an
2473 integer (the dimension over which to operate), or a sequence (operate
2474 over multiple dimensions).
2475 
2476 Usage: askewtest(a,dimension=None)
2477 Returns: z-score and 2-tail z-probability
2478 """
2479  if dimension == None:
2480  a = N.ravel(a)
2481  dimension = 0
2482  b2 = askew(a,dimension)
2483  n = float(a.shape[dimension])
2484  y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) )
2485  beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) )
2486  W2 = -1 + N.sqrt(2*(beta2-1))
2487  delta = 1/N.sqrt(N.log(N.sqrt(W2)))
2488  alpha = N.sqrt(2/(W2-1))
2489  y = N.where(y==0,1,y)
2490  Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1))
2491  return Z, (1.0-zprob(Z))*2
2492 
2493 
2494  def akurtosistest(a,dimension=None):
2495  """
2496 Tests whether a dataset has normal kurtosis (i.e.,
2497 kurtosis=3(n-1)/(n+1)). Valid only for n>=20. Dimension can equal None
2498 (ravel array first), an integer (the dimension over which to operate),
2499 or a sequence (operate over multiple dimensions).
2500 
2501 Usage: akurtosistest(a,dimension=None)
2502 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
2503 """
2504  if dimension == None:
2505  a = N.ravel(a)
2506  dimension = 0
2507  n = float(a.shape[dimension])
2508  if n<20:
2509  print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n
2510  b2 = akurtosis(a,dimension)
2511  E = 3.0*(n-1) /(n+1)
2512  varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5))
2513  x = (b2-E)/N.sqrt(varb2)
2514  sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/
2515  (n*(n-2)*(n-3)))
2516  A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2)))
2517  term1 = 1 -2/(9.0*A)
2518  denom = 1 +x*N.sqrt(2/(A-4.0))
2519  denom = N.where(N.less(denom,0), 99, denom)
2520  term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0))
2521  Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A))
2522  Z = N.where(N.equal(denom,99), 0, Z)
2523  return Z, (1.0-zprob(Z))*2
2524 
2525 
2526  def anormaltest(a,dimension=None):
2527  """
2528 Tests whether skew and/OR kurtosis of dataset differs from normal
2529 curve. Can operate over multiple dimensions. Dimension can equal
2530 None (ravel array first), an integer (the dimension over which to
2531 operate), or a sequence (operate over multiple dimensions).
2532 
2533 Usage: anormaltest(a,dimension=None)
2534 Returns: z-score and 2-tail probability
2535 """
2536  if dimension == None:
2537  a = N.ravel(a)
2538  dimension = 0
2539  s,p = askewtest(a,dimension)
2540  k,p = akurtosistest(a,dimension)
2541  k2 = N.power(s,2) + N.power(k,2)
2542  return k2, achisqprob(k2,2)
2543 
2544 
2545 #####################################
2546 ###### AFREQUENCY FUNCTIONS #######
2547 #####################################
2548 
2549  def aitemfreq(a):
2550  """
2551 Returns a 2D array of item frequencies. Column 1 contains item values,
2552 column 2 contains their respective counts. Assumes a 1D array is passed.
2553 @@@sorting OK?
2554 
2555 Usage: aitemfreq(a)
2556 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
2557 """
2558  scores = pstat.aunique(a)
2559  scores = N.sort(scores)
2560  freq = N.zeros(len(scores))
2561  for i in range(len(scores)):
2562  freq[i] = N.add.reduce(N.equal(a,scores[i]))
2563  return N.array(pstat.aabut(scores, freq))
2564 
2565 
2566  def ascoreatpercentile (inarray, percent):
2567  """
2568 Usage: ascoreatpercentile(inarray,percent) 0<percent<100
2569 Returns: score at given percentile, relative to inarray distribution
2570 """
2571  percent = percent / 100.0
2572  targetcf = percent*len(inarray)
2573  h, lrl, binsize, extras = histogram(inarray)
2574  cumhist = cumsum(h*1)
2575  for i in range(len(cumhist)):
2576  if cumhist[i] >= targetcf:
2577  break
2578  score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
2579  return score
2580 
2581 
2582  def apercentileofscore (inarray,score,histbins=10,defaultlimits=None):
2583  """
2584 Note: result of this function depends on the values used to histogram
2585 the data(!).
2586 
2587 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
2588 Returns: percentile-position of score (0-100) relative to inarray
2589 """
2590  h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits)
2591  cumhist = cumsum(h*1)
2592  i = int((score - lrl)/float(binsize))
2593  pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100
2594  return pct
2595 
2596 
2597  def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2598  """
2599 Returns (i) an array of histogram bin counts, (ii) the smallest value
2600 of the histogram binning, and (iii) the bin width (the last 2 are not
2601 necessarily integers). Default number of bins is 10. Defaultlimits
2602 can be None (the routine picks bins spanning all the numbers in the
2603 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the
2604 following: array of bin values, lowerreallimit, binsize, extrapoints.
2605 
2606 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
2607 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range)
2608 """
2609  inarray = N.ravel(inarray) # flatten any >1D arrays
2610  if (defaultlimits <> None):
2611  lowerreallimit = defaultlimits[0]
2612  upperreallimit = defaultlimits[1]
2613  binsize = (upperreallimit-lowerreallimit) / float(numbins)
2614  else:
2615  Min = N.minimum.reduce(inarray)
2616  Max = N.maximum.reduce(inarray)
2617  estbinwidth = float(Max - Min)/float(numbins) + 1e-6
2618  binsize = (Max-Min+estbinwidth)/float(numbins)
2619  lowerreallimit = Min - binsize/2.0 #lower real limit,1st bin
2620  bins = N.zeros(numbins)
2621  extrapoints = 0
2622  for num in inarray:
2623  try:
2624  if (num-lowerreallimit) < 0:
2625  extrapoints = extrapoints + 1
2626  else:
2627  bintoincrement = int((num-lowerreallimit) / float(binsize))
2628  bins[bintoincrement] = bins[bintoincrement] + 1
2629  except: # point outside lower/upper limits
2630  extrapoints = extrapoints + 1
2631  if (extrapoints > 0 and printextras == 1):
2632  print '\nPoints outside given histogram range =',extrapoints
2633  return (bins, lowerreallimit, binsize, extrapoints)
2634 
2635 
2636  def acumfreq(a,numbins=10,defaultreallimits=None):
2637  """
2638 Returns a cumulative frequency histogram, using the histogram function.
2639 Defaultreallimits can be None (use all data), or a 2-sequence containing
2640 lower and upper limits on values to include.
2641 
2642 Usage: acumfreq(a,numbins=10,defaultreallimits=None)
2643 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2644 """
2645  h,l,b,e = histogram(a,numbins,defaultreallimits)
2646  cumhist = cumsum(h*1)
2647  return cumhist,l,b,e
2648 
2649 
2650  def arelfreq(a,numbins=10,defaultreallimits=None):
2651  """
2652 Returns a relative frequency histogram, using the histogram function.
2653 Defaultreallimits can be None (use all data), or a 2-sequence containing
2654 lower and upper limits on values to include.
2655 
2656 Usage: arelfreq(a,numbins=10,defaultreallimits=None)
2657 Returns: array of relative-frequency bin values, lowerreallimit, binsize, extrapoints
2658 """
2659  h,l,b,e = histogram(a,numbins,defaultreallimits)
2660  h = N.array(h/float(a.shape[0]))
2661  return h,l,b,e
2662 
2663 
2664 #####################################
2665 ###### AVARIABILITY FUNCTIONS #####
2666 #####################################
2667 
2668  def aobrientransform(*args):
2669  """
2670 Computes a transform on input data (any number of columns). Used to
2671 test for homogeneity of variance prior to running one-way stats. Each
2672 array in *args is one level of a factor. If an F_oneway() run on the
2673 transformed data comes out significant, the variances are unequal. From
2674 Maxwell and Delaney, p.112.
2675 
2676 Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor
2677 Returns: transformed data for use in an ANOVA
2678 """
2679  TINY = 1e-10
2680  k = len(args)
2681  n = N.zeros(k,N.float_)
2682  v = N.zeros(k,N.float_)
2683  m = N.zeros(k,N.float_)
2684  nargs = []
2685  for i in range(k):
2686  nargs.append(args[i].astype(N.float_))
2687  n[i] = float(len(nargs[i]))
2688  v[i] = var(nargs[i])
2689  m[i] = mean(nargs[i])
2690  for j in range(k):
2691  for i in range(int(n[j])): # n[j] is a float array element; range() needs an int
2692  t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
2693  t2 = 0.5*v[j]*(n[j]-1.0)
2694  t3 = (n[j]-1.0)*(n[j]-2.0)
2695  nargs[j][i] = (t1-t2) / float(t3)
2696  check = 1
2697  for j in range(k):
2698  if v[j] - mean(nargs[j]) > TINY:
2699  check = 0
2700  if check <> 1:
2701  raise ValueError, 'Lack of convergence in obrientransform.'
2702  else:
2703  return N.array(nargs)
2704 
2705 
2706  def asamplevar (inarray,dimension=None,keepdims=0):
2707  """
2708 Returns the sample variance of the values in the passed
2709 array (i.e., using N). Dimension can equal None (ravel array first),
2710 an integer (the dimension over which to operate), or a sequence
2711 (operate over multiple dimensions). Set keepdims=1 to return an array
2712 with the same number of dimensions as inarray.
2713 
2714 Usage: asamplevar(inarray,dimension=None,keepdims=0)
2715 """
2716  if dimension == None:
2717  inarray = N.ravel(inarray)
2718  dimension = 0
2719  if dimension == 1:
2720  mn = amean(inarray,dimension)[:,N.NewAxis]
2721  else:
2722  mn = amean(inarray,dimension,keepdims=1)
2723  deviations = inarray - mn
2724  if type(dimension) == ListType:
2725  n = 1
2726  for d in dimension:
2727  n = n*inarray.shape[d]
2728  else:
2729  n = inarray.shape[dimension]
2730  svar = ass(deviations,dimension,keepdims) / float(n)
2731  return svar
2732 
2733 
2734  def asamplestdev (inarray, dimension=None, keepdims=0):
2735  """
2736 Returns the sample standard deviation of the values in the passed
2737 array (i.e., using N). Dimension can equal None (ravel array first),
2738 an integer (the dimension over which to operate), or a sequence
2739 (operate over multiple dimensions). Set keepdims=1 to return an array
2740 with the same number of dimensions as inarray.
2741 
2742 Usage: asamplestdev(inarray,dimension=None,keepdims=0)
2743 """
2744  return N.sqrt(asamplevar(inarray,dimension,keepdims))
2745 
2746 
2747  def asignaltonoise(instack,dimension=0):
2748  """
2749 Calculates signal-to-noise. Dimension can equal None (ravel array
2750 first), an integer (the dimension over which to operate), or a
2751 sequence (operate over multiple dimensions).
2752 
2753 Usage: asignaltonoise(instack,dimension=0)
2754 Returns: array containing the value of (mean/stdev) along dimension,
2755  or 0 when stdev=0
2756 """
2757  m = mean(instack,dimension)
2758  sd = stdev(instack,dimension)
2759  return N.where(sd==0,0,m/sd)
2760 
2761 
2762  def acov (x,y, dimension=None,keepdims=0):
2763  """
2764 Returns the estimated covariance of the values in the passed
2765 arrays x and y (i.e., using N-1). Dimension can equal None (ravel arrays first), an
2766 integer (the dimension over which to operate), or a sequence (operate
2767 over multiple dimensions). Set keepdims=1 to return an array with the
2768 same number of dimensions as inarray.
2769 
2770 Usage: acov(x,y,dimension=None,keepdims=0)
2771 """
2772  if dimension == None:
2773  x = N.ravel(x)
2774  y = N.ravel(y)
2775  dimension = 0
2776  xmn = amean(x,dimension,1) # keepdims
2777  xdeviations = x - xmn
2778  ymn = amean(y,dimension,1) # keepdims
2779  ydeviations = y - ymn
2780  if type(dimension) == ListType:
2781  n = 1
2782  for d in dimension:
2783  n = n*x.shape[d]
2784  else:
2785  n = x.shape[dimension]
2786  covar = N.sum(xdeviations*ydeviations)/float(n-1)
2787  return covar
2788 
2789 
2790  def avar (inarray, dimension=None,keepdims=0):
2791  """
2792 Returns the estimated population variance of the values in the passed
2793 array (i.e., N-1). Dimension can equal None (ravel array first), an
2794 integer (the dimension over which to operate), or a sequence (operate
2795 over multiple dimensions). Set keepdims=1 to return an array with the
2796 same number of dimensions as inarray.
2797 
2798 Usage: avar(inarray,dimension=None,keepdims=0)
2799 """
2800  if dimension == None:
2801  inarray = N.ravel(inarray)
2802  dimension = 0
2803  mn = amean(inarray,dimension,1)
2804  deviations = inarray - mn
2805  if type(dimension) == ListType:
2806  n = 1
2807  for d in dimension:
2808  n = n*inarray.shape[d]
2809  else:
2810  n = inarray.shape[dimension]
2811  var = ass(deviations,dimension,keepdims)/float(n-1)
2812  return var
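
# Illustrative example (not part of the original module): N-1 in the denominator.
#   avar(N.array([1., 2., 3., 4.])) -> 1.666...   (sum of squared deviations 5.0 / 3)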
2813 
2814 
2815  def astdev (inarray, dimension=None, keepdims=0):
2816  """
2817 Returns the estimated population standard deviation of the values in
2818 the passed array (i.e., N-1). Dimension can equal None (ravel array
2819 first), an integer (the dimension over which to operate), or a
2820 sequence (operate over multiple dimensions). Set keepdims=1 to return
2821 an array with the same number of dimensions as inarray.
2822 
2823 Usage: astdev(inarray,dimension=None,keepdims=0)
2824 """
2825  return N.sqrt(avar(inarray,dimension,keepdims))
2826 
2827 
2828  def asterr (inarray, dimension=None, keepdims=0):
2829  """
2830 Returns the estimated population standard error of the values in the
2831 passed array (i.e., N-1). Dimension can equal None (ravel array
2832 first), an integer (the dimension over which to operate), or a
2833 sequence (operate over multiple dimensions). Set keepdims=1 to return
2834 an array with the same number of dimensions as inarray.
2835 
2836 Usage: asterr(inarray,dimension=None,keepdims=0)
2837 """
2838  if dimension == None:
2839  inarray = N.ravel(inarray)
2840  dimension = 0
2841  return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2842 
2843 
2844  def asem (inarray, dimension=None, keepdims=0):
2845  """
2846 Returns the standard error of the mean (i.e., using N) of the values
2847 in the passed array. Dimension can equal None (ravel array first), an
2848 integer (the dimension over which to operate), or a sequence (operate
2849 over multiple dimensions). Set keepdims=1 to return an array with the
2850 same number of dimensions as inarray.
2851 
2852 Usage: asem(inarray,dimension=None, keepdims=0)
2853 """
2854  if dimension == None:
2855  inarray = N.ravel(inarray)
2856  dimension = 0
2857  if type(dimension) == ListType:
2858  n = 1
2859  for d in dimension:
2860  n = n*inarray.shape[d]
2861  else:
2862  n = inarray.shape[dimension]
2863  s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1)
2864  return s
2865 
2866 
2867  def az (a, score):
2868  """
2869 Returns the z-score of a given input score, given the array from which
2870 that score came. Not appropriate for population calculations, nor for
2871 arrays > 1D.
2872 
2873 Usage: az(a, score)
2874 """
2875  z = (score-amean(a)) / asamplestdev(a)
2876  return z
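
# Illustrative example (not part of the original module): z-score relative to the
# sample (N-denominator) standard deviation.
#   az(N.array([1., 2., 3., 4., 5.]), 5) -> 1.414...   ((5 - 3) / sqrt(2))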
2877 
2878 
2879  def azs (a):
2880  """
2881 Returns a 1D array of z-scores, one for each score in the passed array,
2882 computed relative to the passed array.
2883 
2884 Usage: azs(a)
2885 """
2886  zscores = []
2887  for item in a:
2888  zscores.append(z(a,item))
2889  return N.array(zscores)
2890 
2891 
2892  def azmap (scores, compare, dimension=0):
2893  """
2894 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
2895 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0
2896 of the compare array.
2897 
2898 Usage: azmap(scores, compare, dimension=0)
2899 """
2900  mns = amean(compare,dimension)
2901  sstd = asamplestdev(compare,0)
2902  return (scores - mns) / sstd
2903 
2904 
2905 #####################################
2906 ####### ATRIMMING FUNCTIONS #######
2907 #####################################
2908 
2909 ## deleted around() as it's in numpy now
2910 
2911  def athreshold(a,threshmin=None,threshmax=None,newval=0):
2912  """
2913 Like Numeric.clip() except that values <threshmin or >threshmax are replaced
2914 by newval instead of by threshmin/threshmax (respectively).
2915 
2916 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0)
2917 Returns: a, with values <threshmin or >threshmax replaced with newval
2918 """
2919  mask = N.zeros(a.shape)
2920  if threshmin <> None:
2921  mask = mask + N.where(a<threshmin,1,0)
2922  if threshmax <> None:
2923  mask = mask + N.where(a>threshmax,1,0)
2924  mask = N.clip(mask,0,1)
2925  return N.where(mask,newval,a)
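
# Illustrative example (not part of the original module): out-of-range values are
# replaced by newval rather than clipped to the threshold.
#   athreshold(N.array([1, 5, 10]), threshmin=3) -> [0, 5, 10]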
2926 
2927 
2928  def atrimboth (a,proportiontocut):
2929  """
2930 Slices off the passed proportion of items from BOTH ends of the passed
2931 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
2932 'rightmost' 10% of scores). You must pre-sort the array if you want
2933 "proper" trimming. Slices off LESS if proportion results in a
2934 non-integer slice index (i.e., conservatively slices off
2935 proportiontocut).
2936 
2937 Usage: atrimboth (a,proportiontocut)
2938 Returns: trimmed version of array a
2939 """
2940  lowercut = int(proportiontocut*len(a))
2941  uppercut = len(a) - lowercut
2942  return a[lowercut:uppercut]
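
# Illustrative example (not part of the original module): with proportiontocut=0.2
# on 10 pre-sorted scores, the lowest 2 and highest 2 are dropped.
#   atrimboth(N.arange(10), 0.2) -> [2, 3, 4, 5, 6, 7]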
2943 
2944 
2945  def atrim1 (a,proportiontocut,tail='right'):
2946  """
2947 Slices off the passed proportion of items from ONE end of the passed
2948 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
2949 10% of scores). Slices off LESS if proportion results in a non-integer
2950 slice index (i.e., conservatively slices off proportiontocut).
2951 
2952 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left'
2953 Returns: trimmed version of array a
2954 """
2955  if string.lower(tail) == 'right':
2956  lowercut = 0
2957  uppercut = len(a) - int(proportiontocut*len(a))
2958  elif string.lower(tail) == 'left':
2959  lowercut = int(proportiontocut*len(a))
2960  uppercut = len(a)
2961  return a[lowercut:uppercut]
2962 
2963 
2964 #####################################
2965 ##### ACORRELATION FUNCTIONS ######
2966 #####################################
2967 
2968  def acovariance(X):
2969  """
2970 Computes the covariance matrix of a matrix X. Requires a 2D matrix input.
2971 
2972 Usage: acovariance(X)
2973 Returns: covariance matrix of X
2974 """
2975  if len(X.shape) <> 2:
2976  raise TypeError, "acovariance requires 2D matrices"
2977  n = X.shape[0]
2978  mX = amean(X,0)
2979  return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
2980 
2981 
2982  def acorrelation(X):
2983  """
2984 Computes the correlation matrix of a matrix X. Requires a 2D matrix input.
2985 
2986 Usage: acorrelation(X)
2987 Returns: correlation matrix of X
2988 """
2989  C = acovariance(X)
2990  V = N.diagonal(C)
2991  return C / N.sqrt(N.multiply.outer(V,V))
2992 
2993 
2994  def apaired(x,y):
2995  """
2996 Interactively determines the type of data in x and y, and then runs the
2997 appropriate statistic for paired group data.
2998 
2999 Usage: apaired(x,y) x,y = the two arrays of values to be compared
3000 Returns: appropriate statistic name, value, and probability
3001 """
3002  samples = ''
3003  while samples not in ['i','r','I','R','c','C']:
3004  print '\nIndependent or related samples, or correlation (i,r,c): ',
3005  samples = raw_input()
3006 
3007  if samples in ['i','I','r','R']:
3008  print '\nComparing variances ...',
3009 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & Delaney, p.112
3010  r = obrientransform(x,y)
3011  f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
3012  if p<0.05:
3013  vartype='unequal, p='+str(round(p,4))
3014  else:
3015  vartype='equal'
3016  print vartype
3017  if samples in ['i','I']:
3018  if vartype[0]=='e':
3019  t,p = ttest_ind(x,y,None,0)
3020  print '\nIndependent samples t-test: ', round(t,4),round(p,4)
3021  else:
3022  if len(x)>20 or len(y)>20:
3023  z,p = ranksums(x,y)
3024  print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
3025  else:
3026  u,p = mannwhitneyu(x,y)
3027  print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
3028 
3029  else: # RELATED SAMPLES
3030  if vartype[0]=='e':
3031  t,p = ttest_rel(x,y,0)
3032  print '\nRelated samples t-test: ', round(t,4),round(p,4)
3033  else:
3034  t,p = wilcoxont(x,y) # Wilcoxon signed-rank test for related samples (matches the label printed below)
3035  print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
3036  else: # CORRELATION ANALYSIS
3037  corrtype = ''
3038  while corrtype not in ['c','C','r','R','d','D']:
3039  print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
3040  corrtype = raw_input()
3041  if corrtype in ['c','C']:
3042  m,b,r,p,see = linregress(x,y)
3043  print '\nLinear regression for continuous variables ...'
3044  lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
3045  pstat.printcc(lol)
3046  elif corrtype in ['r','R']:
3047  r,p = spearmanr(x,y)
3048  print '\nCorrelation for ranked variables ...'
3049  print "Spearman's r: ",round(r,4),round(p,4)
3050  else: # DICHOTOMOUS
3051  r,p = pointbiserialr(x,y)
3052  print '\nAssuming x contains a dichotomous variable ...'
3053  print 'Point Biserial r: ',round(r,4),round(p,4)
3054  print '\n\n'
3055  return None
3056 
3057 
3058  def dices(x,y):
3059  """
3060 Calculates Dice's coefficient ... (2*number of common terms)/(number of terms in x +
3061 number of terms in y). Returns a value between 0 (orthogonal) and 1.
3062 
3063 Usage: dices(x,y)
3064 """
3065  import sets
3066  x = sets.Set(x)
3067  y = sets.Set(y)
3068  common = len(x.intersection(y))
3069  total = float(len(x) + len(y))
3070  return 2*common/total
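
# Illustrative example (not part of the original module): 2 shared terms out of
# 3 + 3 total terms.
#   dices([1,2,3], [2,3,4]) -> 0.666...   (2*2 / 6.0)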
3071 
3072 
3073  def icc(x,y=None,verbose=0):
3074  """
3075 Calculates intraclass correlation coefficients using simple, Type I sums of squares.
3076 If only one variable is passed, it is assumed to be an Nx2 matrix.
3077 
3078 Usage: icc(x,y=None,verbose=0)
3079 Returns: icc rho, prob ####PROB IS A GUESS BASED ON PEARSON
3080 """
3081  TINY = 1.0e-20
3082  if y:
3083  all = N.concatenate([x,y],0)
3084  else:
3085  all = x+0
3086  x = all[:,0]
3087  y = all[:,1]
3088  totalss = ass(all-mean(all))
3089  pairmeans = (x+y)/2.
3090  withinss = ass(x-pairmeans) + ass(y-pairmeans)
3091  withindf = float(len(x))
3092  betwdf = float(len(x)-1)
3093  withinms = withinss / withindf
3094  betweenms = (totalss-withinss) / betwdf
3095  rho = (betweenms-withinms)/(withinms+betweenms)
3096  t = rho*math.sqrt(betwdf/((1.0-rho+TINY)*(1.0+rho+TINY)))
3097  prob = abetai(0.5*betwdf,0.5,betwdf/(betwdf+t*t),verbose)
3098  return rho, prob
3099 
3100 
3101  def alincc(x,y):
3102  """
3103 Calculates Lin's concordance correlation coefficient.
3104 
3105 Usage: alincc(x,y) where x, y are equal-length arrays
3106 Returns: Lin's CC
3107 """
3108  x = N.ravel(x)
3109  y = N.ravel(y)
3110  covar = acov(x,y)*(len(x)-1)/float(len(x)) # correct denom to n
3111  xvar = avar(x)*(len(x)-1)/float(len(x)) # correct denom to n
3112  yvar = avar(y)*(len(y)-1)/float(len(y)) # correct denom to n
3113  lincc = (2 * covar) / ((xvar+yvar) +((amean(x)-amean(y))**2))
3114  return lincc
3115 
3116 
3117  def apearsonr(x,y,verbose=1):
3118  """
3119 Calculates a Pearson correlation coefficient and returns p. Taken
3120 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.
3121 
3122 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays
3123 Returns: Pearson's r, two-tailed p-value
3124 """
3125  TINY = 1.0e-20
3126  n = len(x)
3127  xmean = amean(x)
3128  ymean = amean(y)
3129  r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3130  r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3131  r = (r_num / r_den)
3132  df = n-2
3133  t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3134  prob = abetai(0.5*df,0.5,df/(df+t*t),verbose)
3135  return r,prob
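
# Illustrative example (not part of the original module): a perfect linear relation
# gives r = 1.0 and a p-value close to 0.
#   apearsonr(N.array([1., 2., 3., 4.]), N.array([2., 4., 6., 8.])) -> r = 1.0, p ~ 0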
3136 
3137 
3138  def aspearmanr(x,y):
3139  """
3140 Calculates a Spearman rank-order correlation coefficient. Taken
3141 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
3142 
3143 Usage: aspearmanr(x,y) where x,y are equal-length arrays
3144 Returns: Spearman's r, two-tailed p-value
3145 """
3146  TINY = 1e-30
3147  n = len(x)
3148  rankx = rankdata(x)
3149  ranky = rankdata(y)
3150  dsq = N.add.reduce((rankx-ranky)**2)
3151  rs = 1 - 6*dsq / float(n*(n**2-1))
3152  t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
3153  df = n-2
3154  probrs = abetai(0.5*df,0.5,df/(df+t*t))
3155 # probability values for rs are from part 2 of the spearman function in
3156 # Numerical Recipes, p.510. They are close to tables, but not exact. (?)
3157  return rs, probrs
3158 
3159 
3160  def apointbiserialr(x,y):
3161  """
3162 Calculates a point-biserial correlation coefficient and the associated
3163 probability value. Taken from Heiman's Basic Statistics for the Behav.
3164 Sci (1st), p.194.
3165 
3166 Usage: apointbiserialr(x,y) where x,y are equal length arrays
3167 Returns: Point-biserial r, two-tailed p-value
3168 """
3169  TINY = 1e-30
3170  categories = pstat.aunique(x)
3171  data = pstat.aabut(x,y)
3172  if len(categories) <> 2:
3173  raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()."
3174  else: # there are 2 categories, continue
3175  codemap = pstat.aabut(categories,N.arange(2))
3176  recoded = pstat.arecode(data,codemap,0)
3177  x = pstat.alinexand(data,0,categories[0])
3178  y = pstat.alinexand(data,0,categories[1])
3179  xmean = amean(pstat.acolex(x,1))
3180  ymean = amean(pstat.acolex(y,1))
3181  n = len(data)
3182  adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
3183  rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust
3184  df = n-2
3185  t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
3186  prob = abetai(0.5*df,0.5,df/(df+t*t))
3187  return rpb, prob
3188 
3189 
3190  def akendalltau(x,y):
3191  """
3192 Calculates Kendall's tau ... correlation of ordinal data. Adapted
3193 from function kendl1 in Numerical Recipes. Needs good test-cases.@@@
3194 
3195 Usage: akendalltau(x,y)
3196 Returns: Kendall's tau, two-tailed p-value
3197 """
3198  n1 = 0
3199  n2 = 0
3200  iss = 0
3201  for j in range(len(x)-1):
3202  for k in range(j+1,len(y)): # compare each pair once; skip self-comparisons
3203  a1 = x[j] - x[k]
3204  a2 = y[j] - y[k]
3205  aa = a1 * a2
3206  if (aa): # neither array has a tie
3207  n1 = n1 + 1
3208  n2 = n2 + 1
3209  if aa > 0:
3210  iss = iss + 1
3211  else:
3212  iss = iss -1
3213  else:
3214  if (a1):
3215  n1 = n1 + 1
3216  else:
3217  n2 = n2 + 1
3218  tau = iss / math.sqrt(n1*n2)
3219  svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
3220  z = tau / math.sqrt(svar)
3221  prob = erfcc(abs(z)/1.4142136)
3222  return tau, prob
3223 
3224 
3225  def alinregress(*args):
3226  """
3227 Calculates a regression line on two arrays, x and y, corresponding to x,y
3228 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels
3229 and splits data into x,y pairs along that dim.
3230 
3231 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array
3232 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
3233 """
3234  TINY = 1.0e-20
3235  if len(args) == 1: # more than 1D array?
3236  args = args[0]
3237  if len(args) == 2:
3238  x = args[0]
3239  y = args[1]
3240  else:
3241  x = args[:,0]
3242  y = args[:,1]
3243  else:
3244  x = args[0]
3245  y = args[1]
3246  n = len(x)
3247  xmean = amean(x)
3248  ymean = amean(y)
3249  r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3250  r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3251  r = r_num / r_den
3252  z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
3253  df = n-2
3254  t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3255  prob = abetai(0.5*df,0.5,df/(df+t*t))
3256  slope = r_num / (float(n)*ass(x) - asquare_of_sums(x))
3257  intercept = ymean - slope*xmean
3258  sterrest = math.sqrt(1-r*r)*asamplestdev(y)
3259  return slope, intercept, r, prob, sterrest, n
3260 
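# Illustrative usage sketch (added for documentation; not part of the original
# module). For exactly collinear data the fitted line is exact; the arrays
# below are invented.
#
#   >>> x = N.array([0., 1., 2., 3.])
#   >>> y = 2.0*x + 1.0               # perfectly linear: slope 2, intercept 1
#   >>> slope, intercept, r, p, sterr, n = alinregress(x, y)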
3261  def amasslinregress(*args):
3262  """
3263 Calculates a regression line on one 1D array (x) and one N-D array (y).
3264 
3265 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
3266 """
3267  TINY = 1.0e-20
3268  if len(args) == 1: # more than 1D array?
3269  args = args[0]
3270  if len(args) == 2:
3271  x = N.ravel(args[0])
3272  y = args[1]
3273  else:
3274  x = N.ravel(args[:,0])
3275  y = args[:,1]
3276  else:
3277  x = args[0]
3278  y = args[1]
3279  x = x.astype(N.float_)
3280  y = y.astype(N.float_)
3281  n = len(x)
3282  xmean = amean(x)
3283  ymean = amean(y,0)
3284  shp = N.ones(len(y.shape))
3285  shp[0] = len(x)
3286  x.shape = shp
3287 # print x.shape, y.shape
3288  r_num = n*(N.add.reduce(x*y,0)) - N.add.reduce(x)*N.add.reduce(y,0)
3289  r_den = N.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y,0)-asquare_of_sums(y,0)))
3290  zerodivproblem = N.equal(r_den,0)
3291  r_den = N.where(zerodivproblem,1,r_den) # avoid zero-division in 1st place
3292  r = r_num / r_den # need to do this nicely for matrix division
3293  r = N.where(zerodivproblem,0.0,r)
3294  z = 0.5*N.log((1.0+r+TINY)/(1.0-r+TINY))
3295  df = n-2
3296  t = r*N.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3297  prob = abetai(0.5*df,0.5,df/(df+t*t))
3298 
3299  ss = float(n)*ass(x)-asquare_of_sums(x)
3300  s_den = N.where(ss==0,1,ss) # avoid zero-division in 1st place
3301  slope = r_num / s_den
3302  intercept = ymean - slope*xmean
3303  sterrest = N.sqrt(1-r*r)*asamplestdev(y,0)
3304  return slope, intercept, r, prob, sterrest, n
3305 
3306 
3307 #####################################
3308 ##### AINFERENTIAL STATISTICS #####
3309 #####################################
3310 
3311  def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3312  """
3313 Calculates the t-obtained for the single-sample T-test on ONE group
3314 of scores a, given a population mean. If printit=1, results are printed
3315 to the screen. If printit='filename', the results are output to 'filename'
3316 using the given writemode (default=append). Returns t-value, and prob.
3317 
3318 Usage: attest_1samp(a,popmean,printit=0,name='Sample',writemode='a')
3319 Returns: t-value, two-tailed prob
3320 """
3321  if type(a) != N.ndarray:
3322  a = N.array(a)
3323  x = amean(a)
3324  v = avar(a)
3325  n = len(a)
3326  df = n-1
3327  svar = ((n-1)*v) / float(df)
3328  t = (x-popmean)/math.sqrt(svar*(1.0/n))
3329  prob = abetai(0.5*df,0.5,df/(df+t*t))
3330 
3331  if printit <> 0:
3332  statname = 'Single-sample T-test.'
3333  outputpairedstats(printit,writemode,
3334  'Population','--',popmean,0,0,0,
3335  name,n,x,v,N.minimum.reduce(N.ravel(a)),
3336  N.maximum.reduce(N.ravel(a)),
3337  statname,t,prob)
3338  return t,prob
3339 
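# Illustrative usage sketch (added for documentation; not part of the original
# module). Tests whether the invented sample below differs from a population
# mean of 5.0; only attest_1samp()'s (t, two-tailed p) return is relied on.
#
#   >>> a = N.array([4.8, 5.1, 5.4, 4.9, 5.3, 5.0])
#   >>> t, p = attest_1samp(a, 5.0)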
3340 
3341  def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
3342  """
3343 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
3344 a and b. From Numerical Recipes, p.483. If printit=1, results are
3345 printed to the screen. If printit='filename', the results are output
3346 to 'filename' using the given writemode (default=append). Dimension
3347 can equal None (ravel array first), or an integer (the dimension over
3348 which to operate on a and b).
3349 
3350 Usage: attest_ind (a,b,dimension=None,printit=0,
3351  name1='Samp1',name2='Samp2',writemode='a')
3352 Returns: t-value, two-tailed p-value
3353 """
3354  if dimension == None:
3355  a = N.ravel(a)
3356  b = N.ravel(b)
3357  dimension = 0
3358  x1 = amean(a,dimension)
3359  x2 = amean(b,dimension)
3360  v1 = avar(a,dimension)
3361  v2 = avar(b,dimension)
3362  n1 = a.shape[dimension]
3363  n2 = b.shape[dimension]
3364  df = n1+n2-2
3365  svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
3366  zerodivproblem = N.equal(svar,0)
3367  svar = N.where(zerodivproblem,1,svar) # avoid zero-division in 1st place
3368  t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!!
3369  t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0
3370  probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3371 
3372  if type(t) == N.ndarray:
3373  probs = N.reshape(probs,t.shape)
3374  if probs.shape == (1,):
3375  probs = probs[0]
3376 
3377  if printit <> 0:
3378  if type(t) == N.ndarray:
3379  t = t[0]
3380  if type(probs) == N.ndarray:
3381  probs = probs[0]
3382  statname = 'Independent samples T-test.'
3383  outputpairedstats(printit,writemode,
3384  name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)),
3385  N.maximum.reduce(N.ravel(a)),
3386  name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)),
3387  N.maximum.reduce(N.ravel(b)),
3388  statname,t,probs)
3389  return
3390  return t, probs
3391 
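# Illustrative usage sketch (added for documentation; not part of the original
# module). Two invented independent samples; dimension=None ravels the inputs,
# and printit=1 would additionally print a formatted summary table.
#
#   >>> g1 = N.array([3.1, 2.9, 3.4, 3.0, 3.2])
#   >>> g2 = N.array([3.8, 3.6, 4.0, 3.7, 3.9])
#   >>> t, p = attest_ind(g1, g2)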
3392  def ap2t(pval,df):
3393  """
3394 Tries to compute a t-value from a p-value (or pval array) and associated df.
3395 SLOW for large numbers of elements(!) as it re-computes the p-values
3396 repeatedly with progressively smaller step-sizes until it decides it's done.
3397 Keeps the signs of the input array. Returns 1000 (or -1000) if t>100.
3398 
3399 Usage: ap2t(pval,df)
3400 Returns: an array of t-values with the shape of pval
3401  """
3402  pval = N.array(pval)
3403  signs = N.sign(pval)
3404  pval = abs(pval)
3405  t = N.ones(pval.shape,N.float_)*50
3406  step = N.ones(pval.shape,N.float_)*25
3407  print "Initial ap2t() prob calc"
3408  prob = abetai(0.5*df,0.5,float(df)/(df+t*t))
3409  print 'ap2t() iter: ',
3410  for i in range(10):
3411  print i,' ',
3412  t = N.where(pval<prob,t+step,t-step)
3413  prob = abetai(0.5*df,0.5,float(df)/(df+t*t))
3414  step = step/2
3415  print
3416  # since this is an ugly hack, we get ugly boundaries
3417  t = N.where(t>99.9,1000,t) # hit upper-boundary
3418  t = t*signs # restore the sign of each input p-value
3419  return t #, prob, pval
3420 
3421 
3422  def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
3423  """
3424 Calculates the t-obtained T-test on TWO RELATED samples of scores, a
3425 and b. From Numerical Recipes, p.483. If printit=1, results are
3426 printed to the screen. If printit='filename', the results are output
3427 to 'filename' using the given writemode (default=append). Dimension
3428 can equal None (ravel array first), or an integer (the dimension over
3429 which to operate on a and b).
3430 
3431 Usage: attest_rel(a,b,dimension=None,printit=0,
3432  name1='Samp1',name2='Samp2',writemode='a')
3433 Returns: t-value, two-tailed p-value
3434 """
3435  if dimension == None:
3436  a = N.ravel(a)
3437  b = N.ravel(b)
3438  dimension = 0
3439  if len(a)<>len(b):
3440  raise ValueError, 'Unequal length arrays.'
3441  x1 = amean(a,dimension)
3442  x2 = amean(b,dimension)
3443  v1 = avar(a,dimension)
3444  v2 = avar(b,dimension)
3445  n = a.shape[dimension]
3446  df = float(n-1)
3447  d = (a-b).astype('d')
3448 
3449  denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df)
3450  zerodivproblem = N.equal(denom,0)
3451  denom = N.where(zerodivproblem,1,denom) # avoid zero-division in 1st place
3452  t = N.add.reduce(d,dimension) / denom # N-D COMPUTATION HERE!!!!!!
3453  t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0
3454  probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3455  if type(t) == N.ndarray:
3456  probs = N.reshape(probs,t.shape)
3457  if probs.shape == (1,):
3458  probs = probs[0]
3459 
3460  if printit <> 0:
3461  statname = 'Related samples T-test.'
3462  outputpairedstats(printit,writemode,
3463  name1,n,x1,v1,N.minimum.reduce(N.ravel(a)),
3464  N.maximum.reduce(N.ravel(a)),
3465  name2,n,x2,v2,N.minimum.reduce(N.ravel(b)),
3466  N.maximum.reduce(N.ravel(b)),
3467  statname,t,probs)
3468  return
3469  return t, probs
3470 
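# Illustrative usage sketch (added for documentation; not part of the original
# module). Related (paired) samples must be equal length; the before/after
# scores below are invented.
#
#   >>> before = N.array([10., 12., 9., 11., 13.])
#   >>> after  = N.array([11., 13., 9., 12., 14.])
#   >>> t, p = attest_rel(before, after)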
3471 
3472  def achisquare(f_obs,f_exp=None):
3473  """
3474 Calculates a one-way chi square for an array of observed frequencies and returns
3475 the result. If no expected frequencies are given, the total N is assumed to
3476 be equally distributed across all groups.
3477 @@@NOT RIGHT??
3478 
3479 Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq.
3480 Returns: chisquare-statistic, associated p-value
3481 """
3482 
3483  k = len(f_obs)
3484  if f_exp == None:
3485  f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.float_)
3486  f_exp = f_exp.astype(N.float_)
3487  chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
3488  return chisq, achisqprob(chisq, k-1)
3489 
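# Illustrative usage sketch (added for documentation; not part of the original
# module). With no f_exp, achisquare() tests the observed counts against a
# uniform distribution over the k cells; the counts below are invented.
#
#   >>> observed = N.array([18, 22, 20, 20])
#   >>> chisq, p = achisquare(observed)       # df = k-1 = 3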
3490 
3491  def aks_2samp (data1,data2):
3492  """
3493 Computes the Kolmogorov-Smirnov statistic on 2 samples. Modified from
3494 Numerical Recipes in C, page 493. Returns KS D-value, prob. Not ufunc-
3495 like.
3496 
3497 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays
3498 Returns: KS D-value, p-value
3499 """
3500  j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE
3501  j2 = 0 # N.zeros(data2.shape[1:])
3502  fn1 = 0.0 # N.zeros(data1.shape[1:],N.float_)
3503  fn2 = 0.0 # N.zeros(data2.shape[1:],N.float_)
3504  n1 = data1.shape[0]
3505  n2 = data2.shape[0]
3506  en1 = n1*1
3507  en2 = n2*1
3508  d = N.zeros(data1.shape[1:],N.float_)
3509  data1 = N.sort(data1,0)
3510  data2 = N.sort(data2,0)
3511  while j1 < n1 and j2 < n2:
3512  d1=data1[j1]
3513  d2=data2[j2]
3514  if d1 <= d2:
3515  fn1 = (j1)/float(en1)
3516  j1 = j1 + 1
3517  if d2 <= d1:
3518  fn2 = (j2)/float(en2)
3519  j2 = j2 + 1
3520  dt = (fn2-fn1)
3521  if abs(dt) > abs(d):
3522  d = dt
3523 # try:
3524  en = math.sqrt(en1*en2/float(en1+en2))
3525  prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
3526 # except:
3527 # prob = 1.0
3528  return d, prob
3529 
3530 
3531  def amannwhitneyu(x,y):
3532  """
3533 Calculates a Mann-Whitney U statistic on the provided scores and
3534 returns the result. Use only when the n in each condition is < 20 and
3535 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is
3536 significant if the u-obtained is LESS THAN or equal to the critical
3537 value of U.
3538 
3539 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions
3540 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
3541 """
3542  n1 = len(x)
3543  n2 = len(y)
3544  ranked = rankdata(N.concatenate((x,y)))
3545  rankx = ranked[0:n1] # get the x-ranks
3546  ranky = ranked[n1:] # the rest are y-ranks
3547  u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x
3548  u2 = n1*n2 - u1 # remainder is U for y
3549  bigu = max(u1,u2)
3550  smallu = min(u1,u2)
3551  T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores
3552  if T == 0:
3553  raise ValueError, 'All numbers are identical in amannwhitneyu'
3554  sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
3555  z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc
3556  return smallu, 1.0 - azprob(z)
3557 
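# Illustrative usage sketch (added for documentation; not part of the original
# module). Two small invented samples (each n < 20, as the docstring above
# requires); the returned p-value is one-tailed.
#
#   >>> x = N.array([1., 4., 2., 5., 3.])
#   >>> y = N.array([6., 8., 7., 9., 10.])
#   >>> u, p = amannwhitneyu(x, y)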
3558 
3559  def atiecorrect(rankvals):
3560  """
3561 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
3562 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
3563 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c
3564 code.
3565 
3566 Usage: atiecorrect(rankvals)
3567 Returns: T correction factor for U or H
3568 """
3569  sorted,posn = ashellsort(N.array(rankvals))
3570  n = len(sorted)
3571  T = 0.0
3572  i = 0
3573  while (i<n-1):
3574  if sorted[i] == sorted[i+1]:
3575  nties = 1
3576  while (i<n-1) and (sorted[i] == sorted[i+1]):
3577  nties = nties +1
3578  i = i +1
3579  T = T + nties**3 - nties
3580  i = i+1
3581  T = T / float(n**3-n)
3582  return 1.0 - T
3583 
3584 
3585  def aranksums(x,y):
3586  """
3587 Calculates the rank sums statistic on the provided scores and returns
3588 the result.
3589 
3590 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions
3591 Returns: z-statistic, two-tailed p-value
3592 """
3593  n1 = len(x)
3594  n2 = len(y)
3595  alldata = N.concatenate((x,y))
3596  ranked = arankdata(alldata)
3597  x = ranked[:n1]
3598  y = ranked[n1:]
3599  s = sum(x)
3600  expected = n1*(n1+n2+1) / 2.0
3601  z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
3602  prob = 2*(1.0 - azprob(abs(z)))
3603  return z, prob
3604 
3605 
3606  def awilcoxont(x,y):
3607  """
3608 Calculates the Wilcoxon T-test for related samples and returns the
3609 result. A non-parametric T-test.
3610 
3611 Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions
3612 Returns: t-statistic, two-tailed p-value
3613 """
3614  if len(x) <> len(y):
3615  raise ValueError, 'Unequal N in awilcoxont. Aborting.'
3616  d = x-y
3617  d = N.compress(N.not_equal(d,0),d) # Keep all non-zero differences
3618  count = len(d)
3619  absd = abs(d)
3620  absranked = arankdata(absd)
3621  r_plus = 0.0
3622  r_minus = 0.0
3623  for i in range(len(absd)):
3624  if d[i] < 0:
3625  r_minus = r_minus + absranked[i]
3626  else:
3627  r_plus = r_plus + absranked[i]
3628  wt = min(r_plus, r_minus)
3629  mn = count * (count+1) * 0.25
3630  se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
3631  z = math.fabs(wt-mn) / se
3633  prob = 2*(1.0 -zprob(abs(z)))
3634  return wt, prob
3635 
3636 
3637  def akruskalwallish(*args):
3638  """
3639 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
3640 groups, requiring at least 5 subjects in each group. This function
3641 calculates the Kruskal-Wallis H and associated p-value for 3 or more
3642 independent samples.
3643 
3644 Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions
3645 Returns: H-statistic (corrected for ties), associated p-value
3646 """
3647  assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
3648  args = list(args)
3649  n = [0]*len(args)
3650  n = map(len,args)
3651  all = []
3652  for i in range(len(args)):
3653  all = all + args[i].tolist()
3654  ranked = rankdata(all)
3655  T = tiecorrect(ranked)
3656  for i in range(len(args)):
3657  args[i] = ranked[0:n[i]]
3658  del ranked[0:n[i]]
3659  rsums = []
3660  for i in range(len(args)):
3661  rsums.append(sum(args[i])**2)
3662  rsums[i] = rsums[i] / float(n[i])
3663  ssbn = sum(rsums)
3664  totaln = sum(n)
3665  h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
3666  df = len(args) - 1
3667  if T == 0:
3668  raise ValueError, 'All numbers are identical in akruskalwallish'
3669  h = h / float(T)
3670  return h, chisqprob(h,df)
3671 
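# Illustrative usage sketch (added for documentation; not part of the original
# module). Three invented independent groups are passed as separate arrays;
# the docstring above recommends at least 5 subjects per group.
#
#   >>> g1 = N.array([7., 6., 8., 5., 7.])
#   >>> g2 = N.array([9., 8., 10., 9., 11.])
#   >>> g3 = N.array([4., 5., 3., 6., 4.])
#   >>> h, p = akruskalwallish(g1, g2, g3)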
3672 
3673  def afriedmanchisquare(*args):
3674  """
3675 Friedman Chi-Square is a non-parametric, one-way within-subjects
3676 ANOVA. This function calculates the Friedman Chi-square test for
3677 repeated measures and returns the result, along with the associated
3678 probability value. It assumes 3 or more repeated measures. With only 3
3679 levels, a minimum of 10 subjects in the study is required. Four levels
3680 requires 5 subjects per level(??).
3681 
3682 Usage: afriedmanchisquare(*args) args are separate arrays for 3+ conditions
3683 Returns: chi-square statistic, associated p-value
3684 """
3685  k = len(args)
3686  if k < 3:
3687  raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n'
3688  n = len(args[0])
3689  data = apply(pstat.aabut,args)
3690  data = data.astype(N.float_)
3691  for i in range(len(data)):
3692  data[i] = arankdata(data[i])
3693  ssbn = asum(asum(data,0)**2) # sum of squared per-condition rank sums
3694  chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
3695  return chisq, achisqprob(chisq,k-1)
3696 
3697 
3698 #####################################
3699 #### APROBABILITY CALCULATIONS ####
3700 #####################################
3701 
3702  def achisqprob(chisq,df):
3703  """
3704 Returns the (1-tail) probability value associated with the provided chi-square
3705 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can
3706 handle multiple dimensions.
3707 
3708 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom
3709 """
3710  BIG = 200.0
3711  def ex(x):
3712  BIG = 200.0
3713  exponents = N.where(N.less(x,-BIG),-BIG,x)
3714  return N.exp(exponents)
3715 
3716  if type(chisq) == N.ndarray:
3717  arrayflag = 1
3718  else:
3719  arrayflag = 0
3720  chisq = N.array([chisq])
3721  if df < 1:
3722  return N.ones(chisq.shape,N.float)
3723  probs = N.zeros(chisq.shape,N.float_)
3724  probs = N.where(N.less_equal(chisq,0),1.0,probs) # set prob=1 for chisq<0
3725  a = 0.5 * chisq
3726  if df > 1:
3727  y = ex(-a)
3728  if df%2 == 0:
3729  even = 1
3730  s = y*1
3731  s2 = s*1
3732  else:
3733  even = 0
3734  s = 2.0 * azprob(-N.sqrt(chisq))
3735  s2 = s*1
3736  if (df > 2):
3737  chisq = 0.5 * (df - 1.0)
3738  if even:
3739  z = N.ones(probs.shape,N.float_)
3740  else:
3741  z = 0.5 *N.ones(probs.shape,N.float_)
3742  if even:
3743  e = N.zeros(probs.shape,N.float_)
3744  else:
3745  e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.float_)
3746  c = N.log(a)
3747  mask = N.zeros(probs.shape)
3748  a_big = N.greater(a,BIG)
3749  a_big_frozen = -1 *N.ones(probs.shape,N.float_)
3750  totalelements = N.multiply.reduce(N.array(probs.shape))
3751  while asum(mask)<>totalelements:
3752  e = N.log(z) + e
3753  s = s + ex(c*z-a-e)
3754  z = z + 1.0
3755 # print z, e, s
3756  newmask = N.greater(z,chisq)
3757  a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen)
3758  mask = N.clip(newmask+mask,0,1)
3759  if even:
3760  z = N.ones(probs.shape,N.float_)
3761  e = N.ones(probs.shape,N.float_)
3762  else:
3763  z = 0.5 *N.ones(probs.shape,N.float_)
3764  e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.float_)
3765  c = 0.0
3766  mask = N.zeros(probs.shape)
3767  a_notbig_frozen = -1 *N.ones(probs.shape,N.float_)
3768  while asum(mask)<>totalelements:
3769  e = e * (a/z.astype(N.float_))
3770  c = c + e
3771  z = z + 1.0
3772 # print '#2', z, e, c, s, c*y+s2
3773  newmask = N.greater(z,chisq)
3774  a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big),
3775  c*y+s2, a_notbig_frozen)
3776  mask = N.clip(newmask+mask,0,1)
3777  probs = N.where(N.equal(probs,1),1,
3778  N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen))
3779  return probs
3780  else:
3781  return s
3782 
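# Illustrative usage sketch (added for documentation; not part of the original
# module). As a sanity check, the upper-tail probability of a chi-square of
# 9.49 on 4 df is roughly 0.05 (the familiar 5% critical value).
#
#   >>> p = achisqprob(N.array([9.49]), 4)    # p[0] is approximately 0.05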
3783 
3784  def aerfcc(x):
3785  """
3786 Returns the complementary error function erfc(x) with fractional error
3787 everywhere less than 1.2e-7. Adapted from Numerical Recipes. Can
3788 handle multiple dimensions.
3789 
3790 Usage: aerfcc(x)
3791 """
3792  z = abs(x)
3793  t = 1.0 / (1.0+0.5*z)
3794  ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
3795  return N.where(N.greater_equal(x,0), ans, 2.0-ans)
3796 
3797 
3798  def azprob(z):
3799  """
3800 Returns the area under the normal curve 'to the left of' the given z value.
3801 Thus,
3802  for z<0, zprob(z) = 1-tail probability
3803  for z>0, 1.0-zprob(z) = 1-tail probability
3804  for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
3805 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions.
3806 
3807 Usage: azprob(z) where z is a z-value
3808 """
3809  def yfunc(y):
3810  x = (((((((((((((-0.000045255659 * y
3811  +0.000152529290) * y -0.000019538132) * y
3812  -0.000676904986) * y +0.001390604284) * y
3813  -0.000794620820) * y -0.002034254874) * y
3814  +0.006549791214) * y -0.010557625006) * y
3815  +0.011630447319) * y -0.009279453341) * y
3816  +0.005353579108) * y -0.002141268741) * y
3817  +0.000535310849) * y +0.999936657524
3818  return x
3819 
3820  def wfunc(w):
3821  x = ((((((((0.000124818987 * w
3822  -0.001075204047) * w +0.005198775019) * w
3823  -0.019198292004) * w +0.059054035642) * w
3824  -0.151968751364) * w +0.319152932694) * w
3825  -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0
3826  return x
3827 
3828  Z_MAX = 6.0 # maximum meaningful z-value
3829  x = N.zeros(z.shape,N.float_) # initialize
3830  y = 0.5 * N.fabs(z)
3831  x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's
3832  x = N.where(N.greater(y,Z_MAX*0.5),1.0,x) # kill those with big Z
3833  prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5)
3834  return prob
3835 
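# Illustrative usage sketch (added for documentation; not part of the original
# module). azprob() expects an array argument and returns the lower-tail area
# under the standard normal curve, so azprob(0) is 0.5 and azprob(1.96) is
# roughly 0.975.
#
#   >>> azprob(N.array([0.0, 1.96, -1.96]))   # approx. [0.5, 0.975, 0.025]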
3836 
3837  def aksprob(alam):
3838  """
3839 Returns the probability value for a K-S statistic computed via ks_2samp.
3840 Adapted from Numerical Recipes. Can handle multiple dimensions.
3841 
3842 Usage: aksprob(alam)
3843 """
3844  if type(alam) == N.ndarray:
3845  frozen = -1 *N.ones(alam.shape,N.float64)
3846  alam = alam.astype(N.float64)
3847  arrayflag = 1
3848  else:
3849  frozen = N.array(-1.)
3850  alam = N.array(alam,N.float64)
3851  arrayflag = 1
3852  mask = N.zeros(alam.shape)
3853  fac = 2.0 *N.ones(alam.shape,N.float_)
3854  sum = N.zeros(alam.shape,N.float_)
3855  termbf = N.zeros(alam.shape,N.float_)
3856  a2 = N.array(-2.0*alam*alam,N.float64)
3857  totalelements = N.multiply.reduce(N.array(mask.shape))
3858  for j in range(1,201):
3859  if asum(mask) == totalelements:
3860  break
3861  exponents = (a2*j*j)
3862  overflowmask = N.less(exponents,-746)
3863  frozen = N.where(overflowmask,0,frozen)
3864  mask = mask+overflowmask
3865  term = fac*N.exp(exponents)
3866  sum = sum + term
3867  newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
3868  N.less(abs(term),1.0e-8*sum), 1, 0)
3869  frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
3870  mask = N.clip(mask+newmask,0,1)
3871  fac = -fac
3872  termbf = abs(term)
3873  if arrayflag:
3874  return N.where(N.equal(frozen,-1), 1.0, frozen) # 1.0 if doesn't converge
3875  else:
3876  return N.where(N.equal(frozen,-1), 1.0, frozen)[0] # 1.0 if doesn't converge
3877 
3878 
3879  def afprob (dfnum, dfden, F):
3880  """
3881 Returns the 1-tailed significance level (p-value) of an F statistic
3882 given the degrees of freedom for the numerator (dfR-dfF) and the degrees
3883 of freedom for the denominator (dfF). Can handle multiple dims for F.
3884 
3885 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
3886 """
3887  if type(F) == N.ndarray:
3888  return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
3889  else:
3890  return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3891 
3892 
3893  def abetacf(a,b,x,verbose=1):
3894  """
3895 Evaluates the continued fraction form of the incomplete Beta function,
3896 betai. (Adapted from: Numerical Recipes in C.) Can handle multiple
3897 dimensions for x.
3898 
3899 Usage: abetacf(a,b,x,verbose=1)
3900 """
3901  ITMAX = 200
3902  EPS = 3.0e-7
3903 
3904  arrayflag = 1
3905  if type(x) == N.ndarray:
3906  frozen = N.ones(x.shape,N.float_) *-1 #start out w/ -1s, should replace all
3907  else:
3908  arrayflag = 0
3909  frozen = N.array([-1])
3910  x = N.array([x])
3911  mask = N.zeros(x.shape)
3912  bm = az = am = 1.0
3913  qab = a+b
3914  qap = a+1.0
3915  qam = a-1.0
3916  bz = 1.0-qab*x/qap
3917  for i in range(ITMAX+1):
3918  if N.sum(N.ravel(N.equal(frozen,-1)))==0:
3919  break
3920  em = float(i+1)
3921  tem = em + em
3922  d = em*(b-em)*x/((qam+tem)*(a+tem))
3923  ap = az + d*am
3924  bp = bz+d*bm
3925  d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
3926  app = ap+d*az
3927  bpp = bp+d*bz
3928  aold = az*1
3929  am = ap/bpp
3930  bm = bp/bpp
3931  az = app/bpp
3932  bz = 1.0
3933  newmask = N.less(abs(az-aold),EPS*abs(az))
3934  frozen = N.where(newmask*N.equal(mask,0), az, frozen)
3935  mask = N.clip(mask+newmask,0,1)
3936  noconverge = asum(N.equal(frozen,-1))
3937  if noconverge <> 0 and verbose:
3938  print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements'
3939  if arrayflag:
3940  return frozen
3941  else:
3942  return frozen[0]
3943 
3944 
3945  def agammln(xx):
3946  """
3947 Returns the natural log of the gamma function of xx.
3948  Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
3949 Adapted from: Numerical Recipes in C. Can handle multiple dims ... but
3950 probably doesn't normally have to.
3951 
3952 Usage: agammln(xx)
3953 """
3954  coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
3955  0.120858003e-2, -0.536382e-5]
3956  x = xx - 1.0
3957  tmp = x + 5.5
3958  tmp = tmp - (x+0.5)*N.log(tmp)
3959  ser = 1.0
3960  for j in range(len(coeff)):
3961  x = x + 1
3962  ser = ser + coeff[j]/x
3963  return -tmp + N.log(2.50662827465*ser)
3964 
3965 
3966  def abetai(a,b,x,verbose=1):
3967  """
3968 Returns the incomplete beta function:
3969 
3970  I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
3971 
3972 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
3973 function of a. The continued fraction formulation is implemented
3974 here, using the betacf function. (Adapted from: Numerical Recipes in
3975 C.) Can handle multiple dimensions.
3976 
3977 Usage: abetai(a,b,x,verbose=1)
3978 """
3979  TINY = 1e-15
3980  if type(a) == N.ndarray:
3981  if asum(N.less(x,0)+N.greater(x,1)) <> 0:
3982  raise ValueError, 'Bad x in abetai'
3983  x = N.where(N.equal(x,0),TINY,x)
3984  x = N.where(N.equal(x,1.0),1-TINY,x)
3985 
3986  bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1)
3987  exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b*
3988  N.log(1.0-x) )
3989  # exponents below about -745 would underflow exp(), so clip at -740
3990  exponents = N.where(N.less(exponents,-740),-740,exponents)
3991  bt = N.exp(exponents)
3992  if type(x) == N.ndarray:
3993  ans = N.where(N.less(x,(a+1)/(a+b+2.0)),
3994  bt*abetacf(a,b,x,verbose)/float(a),
3995  1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b))
3996  else:
3997  if x<(a+1)/(a+b+2.0):
3998  ans = bt*abetacf(a,b,x,verbose)/float(a)
3999  else:
4000  ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)
4001  return ans
4002 
4003 
4004 #####################################
4005 ####### AANOVA CALCULATIONS #######
4006 #####################################
4007 
4008  import LinearAlgebra, operator
4009  LA = LinearAlgebra
4010 
4011  def aglm(data,para):
4012  """
4013 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
4014 from:
4015  Peterson et al. Statistical limitations in functional neuroimaging
4016  I. Non-inferential methods and statistical models. Phil Trans Royal Soc
4017  Lond B 354: 1239-1260.
4018 
4019 Usage: aglm(data,para)
4020 Returns: statistic, p-value ???
4021 """
4022  if len(para) <> len(data):
4023  print "data and para must be same length in aglm"
4024  return
4025  n = len(para)
4026  p = pstat.aunique(para)
4027  x = N.zeros((n,len(p))) # design matrix
4028  for l in range(len(p)):
4029  x[:,l] = N.equal(para,p[l])
4030  b = N.dot(N.dot(LA.inv(N.dot(N.transpose(x),x)), # i.e., b=inv(X'X)X'Y
4031  N.transpose(x)),
4032  data)
4033  diffs = (data - N.dot(x,b))
4034  s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)
4035 
4036  if len(p) == 2: # ttest_ind
4037  c = N.array([1,-1])
4038  df = n-2
4039  fact = asum(1.0/asum(x,0)) # i.e., 1/n1 + 1/n2 + 1/n3 ...
4040  t = N.dot(c,b) / N.sqrt(s_sq*fact)
4041  probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
4042  return t, probs
4043 
4044 
4045  def aF_oneway(*args):
4046  """
4047 Performs a 1-way ANOVA, returning an F-value and probability given
4048 any number of groups. From Heiman, pp.394-7.
4049 
4050 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per
4051  treatment group
4052 Returns: f-value, probability
4053 """
4054  na = len(args) # ANOVA on 'na' groups, each in its own array
4055  means = [0]*na
4056  vars = [0]*na
4057  ns = [0]*na
4058  alldata = []
4059  tmp = map(N.array,args)
4060  means = map(amean,tmp)
4061  vars = map(avar,tmp)
4062  ns = map(len,args)
4063  alldata = N.concatenate(args)
4064  bign = len(alldata)
4065  sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
4066  ssbn = 0
4067  for a in args:
4068  ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
4069  ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
4070  sswn = sstot-ssbn
4071  dfbn = na-1
4072  dfwn = bign - na
4073  msb = ssbn/float(dfbn)
4074  msw = sswn/float(dfwn)
4075  f = msb/msw
4076  prob = fprob(dfbn,dfwn,f)
4077  return f, prob
4078 
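# Illustrative usage sketch (added for documentation; not part of the original
# module). Three invented treatment groups, each its own array; aF_oneway()
# returns the F statistic and its associated probability.
#
#   >>> g1 = N.array([5., 6., 7., 6.])
#   >>> g2 = N.array([8., 9., 8., 9.])
#   >>> g3 = N.array([4., 5., 4., 5.])
#   >>> f, p = aF_oneway(g1, g2, g3)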
4079 
4080  def aF_value (ER,EF,dfR,dfF):
4081  """
4082 Returns an F-statistic given the following:
4083  ER = error associated with the null hypothesis (the Restricted model)
4084  EF = error associated with the alternate hypothesis (the Full model)
4085  dfR = degrees of freedom associated with the Restricted model
4086  dfF = degrees of freedom associated with the Full model
4087 """
4088  return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
4089 
4090 
4091  def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
4092  Enum = round(Enum,3)
4093  Eden = round(Eden,3)
4094  dfnum = round(dfnum,3)
4095  dfden = round(dfden,3)
4096  f = round(f,3)
4097  prob = round(prob,3)
4098  suffix = '' # for *s after the p-value
4099  if prob < 0.001: suffix = ' ***'
4100  elif prob < 0.01: suffix = ' **'
4101  elif prob < 0.05: suffix = ' *'
4102  title = [['EF/ER','DF','Mean Square','F-value','prob','']]
4103  lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
4104  [Eden, dfden, round(Eden/float(dfden),3),'','','']]
4105  pstat.printcc(lofl)
4106  return
4107 
4108 
4109  def F_value_multivariate(ER, EF, dfnum, dfden):
4110  """
4111 Returns an F-statistic given the following:
4112  ER = error associated with the null hypothesis (the Restricted model)
4113  EF = error associated with the alternate hypothesis (the Full model)
4114  dfR = degrees of freedom associated with the Restricted model
4115  dfF = degrees of freedom associated with the Full model
4116 where ER and EF are matrices from a multivariate F calculation.
4117 """
4118  if type(ER) in [IntType, FloatType]:
4119  ER = N.array([[ER]])
4120  if type(EF) in [IntType, FloatType]:
4121  EF = N.array([[EF]])
4122  n_um = (LA.det(ER) - LA.det(EF)) / float(dfnum)
4123  d_en = LA.det(EF) / float(dfden)
4124  return n_um / d_en
4125 
4126 
4127 #####################################
4128 ####### ASUPPORT FUNCTIONS ########
4129 #####################################
4130 
4131  def asign(a):
4132  """
4133 Usage: asign(a)
4134 Returns: array shape of a, with -1 where a<0 and +1 where a>=0
4135 """
4136  a = N.asarray(a)
4137  if ((type(a) == type(1.4)) or (type(a) == type(1))):
4138  return a-a-N.less(a,0)+N.greater(a,0)
4139  else:
4140  return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
4141 
4142 
4143  def asum (a, dimension=None,keepdims=0):
4144  """
4145 An alternative to the Numeric.add.reduce function, which allows one to
4146 (1) collapse over multiple dimensions at once, and/or (2) to retain
4147 all dimensions in the original array (squashing collapsed dimensions down to size 1).
4148 Dimension can equal None (ravel array first), an integer (the
4149 dimension over which to operate), or a sequence (operate over multiple
4150 dimensions). If keepdims=1, the resulting array will have as many
4151 dimensions as the input array.
4152 
4153 Usage: asum(a, dimension=None, keepdims=0)
4154 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
4155 """
4156  if type(a) == N.ndarray and a.dtype in [N.int_, N.short, N.ubyte]:
4157  a = a.astype(N.float_)
4158  if dimension == None:
4159  s = N.sum(N.ravel(a))
4160  elif type(dimension) in [IntType,FloatType]:
4161  s = N.add.reduce(a, dimension)
4162  if keepdims == 1:
4163  shp = list(a.shape)
4164  shp[dimension] = 1
4165  s = N.reshape(s,shp)
4166  else: # must be a SEQUENCE of dims to sum over
4167  dims = list(dimension)
4168  dims.sort()
4169  dims.reverse()
4170  s = a *1.0
4171  for dim in dims:
4172  s = N.add.reduce(s,dim)
4173  if keepdims == 1:
4174  shp = list(a.shape)
4175  for dim in dims:
4176  shp[dim] = 1
4177  s = N.reshape(s,shp)
4178  return s
4179 
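# Illustrative usage sketch (added for documentation; not part of the original
# module). Summing a 2x3 array along dimension 0 with keepdims=1 keeps the
# result 2-dimensional (shape (1,3)) instead of collapsing to shape (3,).
#
#   >>> m = N.array([[1., 2., 3.], [4., 5., 6.]])
#   >>> asum(m, 0)                 # -> array([5., 7., 9.])
#   >>> asum(m, 0, keepdims=1)     # -> array([[5., 7., 9.]])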
4180 
4181  def acumsum (a,dimension=None):
4182  """
4183 Returns an array consisting of the cumulative sum of the items in the
4184 passed array. Dimension can equal None (ravel array first), an
4185 integer (the dimension over which to operate), or a sequence (operate
4186 over multiple dimensions, but this last one just barely makes sense).
4187 
4188 Usage: acumsum(a,dimension=None)
4189 """
4190  if dimension == None:
4191  a = N.ravel(a)
4192  dimension = 0
4193  if type(dimension) in [ListType, TupleType, N.ndarray]:
4194  dimension = list(dimension)
4195  dimension.sort()
4196  dimension.reverse()
4197  for d in dimension:
4198  a = N.add.accumulate(a,d)
4199  return a
4200  else:
4201  return N.add.accumulate(a,dimension)
4202 
4203 
4204  def ass(inarray, dimension=None, keepdims=0):
4205  """
4206 Squares each value in the passed array, adds these squares & returns
4207 the result. Unfortunate function name. :-) Defaults to ALL values in
4208 the array. Dimension can equal None (ravel array first), an integer
4209 (the dimension over which to operate), or a sequence (operate over
4210 multiple dimensions). Set keepdims=1 to maintain the original number
4211 of dimensions.
4212 
4213 Usage: ass(inarray, dimension=None, keepdims=0)
4214 Returns: sum-along-'dimension' for (inarray*inarray)
4215 """
4216  if dimension == None:
4217  inarray = N.ravel(inarray)
4218  dimension = 0
4219  return asum(inarray*inarray,dimension,keepdims)
4220 
4221 
4222  def asummult (array1,array2,dimension=None,keepdims=0):
4223  """
4224 Multiplies elements in array1 and array2, element by element, and
4225 returns the sum (along 'dimension') of all resulting multiplications.
4226 Dimension can equal None (ravel array first), an integer (the
4227 dimension over which to operate), or a sequence (operate over multiple
4228 dimensions). A trivial function, but included for completeness.
4229 
4230 Usage: asummult(array1,array2,dimension=None,keepdims=0)
4231 """
4232  if dimension == None:
4233  array1 = N.ravel(array1)
4234  array2 = N.ravel(array2)
4235  dimension = 0
4236  return asum(array1*array2,dimension,keepdims)
4237 
4238 
4239  def asquare_of_sums(inarray, dimension=None, keepdims=0):
4240  """
4241 Adds the values in the passed array, squares that sum, and returns the
4242 result. Dimension can equal None (ravel array first), an integer (the
4243 dimension over which to operate), or a sequence (operate over multiple
4244 dimensions). If keepdims=1, the returned array will have the same
4245 NUMBER of dimensions as the original.
4246 
4247 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0)
4248 Returns: the square of the sum over dim(s) in dimension
4249 """
4250  if dimension == None:
4251  inarray = N.ravel(inarray)
4252  dimension = 0
4253  s = asum(inarray,dimension,keepdims)
4254  if type(s) == N.ndarray:
4255  return s.astype(N.float_)*s
4256  else:
4257  return float(s)*s
4258 
4259 
4260  def asumdiffsquared(a,b, dimension=None, keepdims=0):
4261  """
4262 Takes pairwise differences of the values in arrays a and b, squares
4263 these differences, and returns the sum of these squares. Dimension
4264 can equal None (ravel array first), an integer (the dimension over
4265 which to operate), or a sequence (operate over multiple dimensions).
4266 keepdims=1 means the returned array keeps the same number of dimensions as a and b.
4267 
4268 Usage: asumdiffsquared(a,b)
4269 Returns: sum[ravel(a-b)**2]
4270 """
4271  if dimension == None:
4272  a = N.ravel(a); b = N.ravel(b) # operate over ALL values when no dimension is given
4273  dimension = 0
4274  return asum((a-b)**2,dimension,keepdims)
4275 
4276 
4277  def ashellsort(inarray):
4278  """
4279 Shellsort algorithm. Sorts a 1D-array.
4280 
4281 Usage: ashellsort(inarray)
4282 Returns: sorted-inarray, sorting-index-vector (for original array)
4283 """
4284  n = len(inarray)
4285  svec = inarray *1.0
4286  ivec = range(n)
4287  gap = n/2 # integer division needed
4288  while gap >0:
4289  for i in range(gap,n):
4290  for j in range(i-gap,-1,-gap):
4291  while j>=0 and svec[j]>svec[j+gap]:
4292  temp = svec[j]
4293  svec[j] = svec[j+gap]
4294  svec[j+gap] = temp
4295  itemp = ivec[j]
4296  ivec[j] = ivec[j+gap]
4297  ivec[j+gap] = itemp
4298  gap = gap / 2 # integer division needed
4299 # svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]]
4300  return svec, ivec
4301 
4302 
4303  def arankdata(inarray):
4304  """
4305 Ranks the data in inarray, dealing with ties appropriately. Assumes
4306 a 1D inarray. Adapted from Gary Perlman's |Stat ranksort.
4307 
4308 Usage: arankdata(inarray)
4309 Returns: array of length equal to inarray, containing rank scores
4310 """
4311  n = len(inarray)
4312  svec, ivec = ashellsort(inarray)
4313  sumranks = 0
4314  dupcount = 0
4315  newarray = N.zeros(n,N.float_)
4316  for i in range(n):
4317  sumranks = sumranks + i
4318  dupcount = dupcount + 1
4319  if i==n-1 or svec[i] <> svec[i+1]:
4320  averank = sumranks / float(dupcount) + 1
4321  for j in range(i-dupcount+1,i+1):
4322  newarray[ivec[j]] = averank
4323  sumranks = 0
4324  dupcount = 0
4325  return newarray
4326 
4327 
4328  def afindwithin(data):
4329  """
4330 Returns a binary vector, 1=within-subject factor, 0=between. Input
4331 equals the entire data array (i.e., column 1=random factor, last
4332 column = measured values).
4333 
4334 Usage: afindwithin(data) data in |Stat format
4335 """
4336  numfact = len(data[0])-2
4337  withinvec = [0]*numfact
4338  for col in range(1,numfact+1):
4339  rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0]) # get 1 level of this factor
4340  if len(pstat.unique(pstat.colex(rows,0))) < len(rows): # if fewer subjects than scores on this factor
4341  withinvec[col-1] = 1
4342  return withinvec
4343 
4344 
4345  #########################################################
4346  #########################################################
4347  ###### RE-DEFINE DISPATCHES TO INCLUDE ARRAYS #########
4348  #########################################################
4349  #########################################################
4350 
4351 ## CENTRAL TENDENCY:
4352  geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)),
4353  (ageometricmean, (N.ndarray,)) )
4354  harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)),
4355  (aharmonicmean, (N.ndarray,)) )
4356  mean = Dispatch ( (lmean, (ListType, TupleType)),
4357  (amean, (N.ndarray,)) )
4358  median = Dispatch ( (lmedian, (ListType, TupleType)),
4359  (amedian, (N.ndarray,)) )
4360  medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)),
4361  (amedianscore, (N.ndarray,)) )
4362  mode = Dispatch ( (lmode, (ListType, TupleType)),
4363  (amode, (N.ndarray,)) )
4364  tmean = Dispatch ( (atmean, (N.ndarray,)) )
4365  tvar = Dispatch ( (atvar, (N.ndarray,)) )
4366  tstdev = Dispatch ( (atstdev, (N.ndarray,)) )
4367  tsem = Dispatch ( (atsem, (N.ndarray,)) )
4368 
4369 ## VARIATION:
4370  moment = Dispatch ( (lmoment, (ListType, TupleType)),
4371  (amoment, (N.ndarray,)) )
4372  variation = Dispatch ( (lvariation, (ListType, TupleType)),
4373  (avariation, (N.ndarray,)) )
4374  skew = Dispatch ( (lskew, (ListType, TupleType)),
4375  (askew, (N.ndarray,)) )
4376  kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)),
4377  (akurtosis, (N.ndarray,)) )
4378  describe = Dispatch ( (ldescribe, (ListType, TupleType)),
4379  (adescribe, (N.ndarray,)) )
4380 
4381 ## DISTRIBUTION TESTS
4382 
4383  skewtest = Dispatch ( (askewtest, (ListType, TupleType)),
4384  (askewtest, (N.ndarray,)) )
4385  kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)),
4386  (akurtosistest, (N.ndarray,)) )
4387  normaltest = Dispatch ( (anormaltest, (ListType, TupleType)),
4388  (anormaltest, (N.ndarray,)) )
4389 
4390 ## FREQUENCY STATS:
4391  itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)),
4392  (aitemfreq, (N.ndarray,)) )
4393  scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)),
4394  (ascoreatpercentile, (N.ndarray,)) )
4395  percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)),
4396  (apercentileofscore, (N.ndarray,)) )
4397  histogram = Dispatch ( (lhistogram, (ListType, TupleType)),
4398  (ahistogram, (N.ndarray,)) )
4399  cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)),
4400  (acumfreq, (N.ndarray,)) )
4401  relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)),
4402  (arelfreq, (N.ndarray,)) )
4403 
4404 ## VARIABILITY:
4405  obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)),
4406  (aobrientransform, (N.ndarray,)) )
4407  samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)),
4408  (asamplevar, (N.ndarray,)) )
4409  samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)),
4410  (asamplestdev, (N.ndarray,)) )
4411  signaltonoise = Dispatch( (asignaltonoise, (N.ndarray,)),)
4412  var = Dispatch ( (lvar, (ListType, TupleType)),
4413  (avar, (N.ndarray,)) )
4414  stdev = Dispatch ( (lstdev, (ListType, TupleType)),
4415  (astdev, (N.ndarray,)) )
4416  sterr = Dispatch ( (lsterr, (ListType, TupleType)),
4417  (asterr, (N.ndarray,)) )
4418  sem = Dispatch ( (lsem, (ListType, TupleType)),
4419  (asem, (N.ndarray,)) )
4420  z = Dispatch ( (lz, (ListType, TupleType)),
4421  (az, (N.ndarray,)) )
4422  zs = Dispatch ( (lzs, (ListType, TupleType)),
4423  (azs, (N.ndarray,)) )
4424 
4425 ## TRIMMING FCNS:
4426  threshold = Dispatch( (athreshold, (N.ndarray,)),)
4427  trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)),
4428  (atrimboth, (N.ndarray,)) )
4429  trim1 = Dispatch ( (ltrim1, (ListType, TupleType)),
4430  (atrim1, (N.ndarray,)) )
4431 
4432 ## CORRELATION FCNS:
4433  paired = Dispatch ( (lpaired, (ListType, TupleType)),
4434  (apaired, (N.ndarray,)) )
4435  lincc = Dispatch ( (llincc, (ListType, TupleType)),
4436  (alincc, (N.ndarray,)) )
4437  pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)),
4438  (apearsonr, (N.ndarray,)) )
4439  spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)),
4440  (aspearmanr, (N.ndarray,)) )
4441  pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)),
4442  (apointbiserialr, (N.ndarray,)) )
4443  kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)),
4444  (akendalltau, (N.ndarray,)) )
4445  linregress = Dispatch ( (llinregress, (ListType, TupleType)),
4446  (alinregress, (N.ndarray,)) )
4447 
4448 ## INFERENTIAL STATS:
4449  ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)),
4450  (attest_1samp, (N.ndarray,)) )
4451  ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)),
4452  (attest_ind, (N.ndarray,)) )
4453  ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)),
4454  (attest_rel, (N.ndarray,)) )
4455  chisquare = Dispatch ( (lchisquare, (ListType, TupleType)),
4456  (achisquare, (N.ndarray,)) )
4457  ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)),
4458  (aks_2samp, (N.ndarray,)) )
4459  mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)),
4460  (amannwhitneyu, (N.ndarray,)) )
4461  tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)),
4462  (atiecorrect, (N.ndarray,)) )
4463  ranksums = Dispatch ( (lranksums, (ListType, TupleType)),
4464  (aranksums, (N.ndarray,)) )
4465  wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)),
4466  (awilcoxont, (N.ndarray,)) )
4467  kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)),
4468  (akruskalwallish, (N.ndarray,)) )
4469  friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)),
4470  (afriedmanchisquare, (N.ndarray,)) )
4471 
4472 ## PROBABILITY CALCS:
4473  chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)),
4474  (achisqprob, (N.ndarray,)) )
4475  zprob = Dispatch ( (lzprob, (IntType, FloatType)),
4476  (azprob, (N.ndarray,)) )
4477  ksprob = Dispatch ( (lksprob, (IntType, FloatType)),
4478  (aksprob, (N.ndarray,)) )
4479  fprob = Dispatch ( (lfprob, (IntType, FloatType)),
4480  (afprob, (N.ndarray,)) )
4481  betacf = Dispatch ( (lbetacf, (IntType, FloatType)),
4482  (abetacf, (N.ndarray,)) )
4483  betai = Dispatch ( (lbetai, (IntType, FloatType)),
4484  (abetai, (N.ndarray,)) )
4485  erfcc = Dispatch ( (lerfcc, (IntType, FloatType)),
4486  (aerfcc, (N.ndarray,)) )
4487  gammln = Dispatch ( (lgammln, (IntType, FloatType)),
4488  (agammln, (N.ndarray,)) )
4489 
4490 ## ANOVA FUNCTIONS:
4491  F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)),
4492  (aF_oneway, (N.ndarray,)) )
4493  F_value = Dispatch ( (lF_value, (ListType, TupleType)),
4494  (aF_value, (N.ndarray,)) )
4495 
4496 ## SUPPORT FUNCTIONS:
4497  incr = Dispatch ( (lincr, (ListType, TupleType, N.ndarray)), )
4498  sum = Dispatch ( (lsum, (ListType, TupleType)),
4499  (asum, (N.ndarray,)) )
4500  cumsum = Dispatch ( (lcumsum, (ListType, TupleType)),
4501  (acumsum, (N.ndarray,)) )
4502  ss = Dispatch ( (lss, (ListType, TupleType)),
4503  (ass, (N.ndarray,)) )
4504  summult = Dispatch ( (lsummult, (ListType, TupleType)),
4505  (asummult, (N.ndarray,)) )
4506  square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)),
4507  (asquare_of_sums, (N.ndarray,)) )
4508  sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)),
4509  (asumdiffsquared, (N.ndarray,)) )
4510  shellsort = Dispatch ( (lshellsort, (ListType, TupleType)),
4511  (ashellsort, (N.ndarray,)) )
4512  rankdata = Dispatch ( (lrankdata, (ListType, TupleType)),
4513  (arankdata, (N.ndarray,)) )
4514  findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)),
4515  (afindwithin, (N.ndarray,)) )
4516 
4517 ###################### END OF NUMERIC FUNCTION BLOCK #####################
4518 
4519 ###################### END OF STATISTICAL FUNCTIONS ######################
4520 
4521 except ImportError:
4522  pass

