INDEXING(1)           User Contributed Perl Documentation          INDEXING(1)



NAME
       PDL::Indexing - how to index piddles.

DESCRIPTION
       This manpage should serve as a first tutorial on the indexing and threading fea-
       tures of PDL.

       This manpage is still in alpha development and not yet complete. "Meta" comments
       that point out deficiencies/omissions of this document will be surrounded by square
       brackets ([]), e.g. [ Hopefully I will be able to remove this paragraph at some
       time in the future ]. Furthermore, it is possible that there are errors in the code
       examples. Please report any errors to Christian Soeller (c.soeller AT auckland.nz).

       Still to be done are (please bear with us and/or ask on the mailing list, see
       PDL::FAQ):

       *    document perl level threading

       *    threadids

       *    update and correct description of slice

       *    new functions in slice.pd (affine, lag, splitdim)

       *    reworking of paragraph on explicit threading

Indexing and threading with PDL
       A lot of the flexibility and power of PDL relies on the indexing and looping fea-
       tures of the perl extension. Indexing allows access to the data of a pdl object in
       a very flexible way. Threading provides efficient implicit looping functionality
       (since the loops are implemented as optimized C code).

       Pdl objects (later often called "pdls") are perl objects that represent multidimen-
       sional arrays and operations on those. In contrast to simple perl @x style lists
       the array data is compactly stored in a single block of memory thus taking up a lot
       less memory and enabling use of fast C code to implement operations (e.g. addition,
       etc) on pdls.

       pdls can have children

       Central to many of the indexing capabilities of PDL is the relation of "parent"
       and "child" between pdls. Many of the indexing commands create a new pdl from an
       existing pdl. The new pdl is the "child" and the old one is the "parent". The data
       of the new pdl is defined by a transformation that specifies how to generate
       (compute) its data from the parent's data. The relation between the child pdl and
       its parent is often bidirectional, meaning that changes in the child's data are
       propagated back to the parent. (Note: our terminology already aims towards the new
       dataflow features. The kind of dataflow that is used by the indexing commands
       (about which you will learn in a minute) is always in operation, not only when you
       have explicitly switched on dataflow in your pdl by saying "$a->doflow". For
       further information about dataflow check the dataflow manpage.)

       Another way to interpret the pdls created by our indexing commands is to view them
       as a kind of intelligent pointer that points back to some portion or all of its
       parent's data. Therefore, it is not surprising that the parent's data (or a portion
       of it) changes when manipulated through this "pointer". After these introductory
       remarks that hopefully prepared you for what is coming (rather than confuse you too
       much) we are going to dive right in and start with a description of the indexing
       commands and some typical examples how they might be used in PDL programs. We will
       further illustrate the pointer/dataflow analogies in the context of some of the
       examples later on.

       There are two different implementations of this "smart pointer" relationship: the
       first one, which is a little slower but works for any transformation, is simply to
       do the transformation forwards and backwards as necessary. The other is to consider
       the child piddle a "virtual" piddle, which only stores a pointer to the parent
       and access information so that routines which use the child piddle actually
       directly access the data in the parent.  If the virtual piddle is given to a
       routine which cannot use it, PDL transparently physicalizes the virtual piddle
       before letting the routine use it.

       Currently (1.94_01) all transformations which are "affine", i.e. those in which
       the indices of the data item in the parent piddle are determined by a linear
       transformation (plus a constant) from the indices of the child piddle, result in
       virtual piddles. All other indexing routines (e.g. "->index(...)") result in
       physical piddles.  All routines compiled by PP can accept affine piddles (except
       those routines that pass pointers to external library functions).

       Note that whether something is affine or not does not affect the semantics of what
       you do in any way: both

        $a->index(...) .= 5;
        $a->slice(...) .= 5;

       change the data in $a. The affinity does, however, have a significant impact on
       memory usage and performance.
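
       To make the idea of a virtual (affine) piddle more concrete, here is a minimal
       sketch in plain Perl. It is a model only, not PDL's actual implementation: the
       child stores just an offset and one stride per dimension into the parent's flat
       data block. The helper names "make_view", "view_flat_index", "view_get" and
       "view_set" are invented for this illustration.

```perl
use strict;
use warnings;

# Parent: a 5x5 "image" stored as one flat block, dimension 0 running fastest.
my @parent = (0 .. 24);

# A view holds no data of its own, only a reference to the parent block plus
# the affine map: flat_index = offset + sum(index[k] * stride[k]).
sub make_view {
    my ($data, $offset, $dims, $strides) = @_;
    return { data => $data, offset => $offset, dims => $dims, strides => $strides };
}

sub view_flat_index {
    my ($v, @idx) = @_;
    my $flat = $v->{offset};
    $flat += $idx[$_] * $v->{strides}[$_] for 0 .. $#idx;
    return $flat;
}

sub view_get {
    my ($v, @idx) = @_;
    return $v->{data}[ view_flat_index($v, @idx) ];
}

sub view_set {
    my ($v, $val, @idx) = @_;
    $v->{data}[ view_flat_index($v, @idx) ] = $val;
    return;
}

# Model of slice(':,(2)'): line 2 starts at flat offset 2*5 and has stride 1.
my $line   = make_view(\@parent, 10, [5], [1]);
my @before = map { view_get($line, $_) } 0 .. 4;

# Writing through the view changes the parent's data.
view_set($line, 99, 3);

print "line before the write: @before\n";
print "parent[13] after the write: $parent[13]\n";
```

       Because such a view needs only a handful of numbers, creating it costs almost no
       memory, which is the point made above about affinity and performance.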

       Slicing pdls

       Probably the most important application of the concept of parent/child pdls is the
       representation of rectangular slices of a physical pdl by a virtual pdl. Having
       talked long enough about concepts, let's get more specific. Suppose we are working
       with a 2D pdl representing a 5x5 image (it's unusually small so that we can print it
       without filling several screens full of digits ;).

        perldl> $im = sequence(5,5)
        perldl> p $im

        [
         [ 0  1  2  3  4]
         [ 5  6  7  8  9]
         [10 11 12 13 14]
         [15 16 17 18 19]
         [20 21 22 23 24]
        ]

        perldl> help vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $im          Double D [5,5]                P            0.20Kb

       [ here it might be appropriate to quickly talk about the "help vars" command that
       provides information about pdls in the interactive "perldl" shell that comes with
       pdl.  ]

       Now suppose we want to create a 1-D pdl that just references one line of the image,
       say line 2; or a pdl that represents all even lines of the image (imagine we have
       to deal with even and odd frames of an interlaced image due to some peculiar
       behaviour of our frame grabber). As another frequent application of slices we might
       want to create a pdl that represents a rectangular region of the image with top and
       bottom reversed. All these effects (and many more) can be easily achieved with the
       powerful slice function:

        perldl> $line = $im->slice(':,(2)')
        perldl> $even = $im->slice(':,1:-1:2')
        perldl> $area = $im->slice('3:4,3:1')
        perldl> help vars  # or just PDL->vars
        PDL variables in package main::

        Name         Type   Dimension       Flow  State          Mem
        ----------------------------------------------------------------
        $even        Double D [5,2]                -C           0.00Kb
        $im          Double D [5,5]                P            0.20Kb
        $line        Double D [5]                  -C           0.00Kb
        $area        Double D [2,3]                -C           0.00Kb

       All three "child" pdls are children of $im or in the other (largely equivalent)
       interpretation pointers to data of $im.  Operations on those virtual pdls access
       only those portions of the data as specified by the argument to slice. So we can
       just print line 2:

        perldl> p $line
        [10 11 12 13 14]

       Also note the difference in the "Flow State" of $area above and below:

        perldl> p $area
        perldl> help $area
        This variable is Double D [2,3]                VC           0.00Kb

       The following demonstrates that $im and $line really behave as you would expect
       from a pointer-like object (or in the dataflow picture: the changes in $line's data
       are propagated back to $im):

        perldl> $im++
        perldl> p $line
        [11 12 13 14 15]
        perldl> $line += 2
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [13 14 15 16 17]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

       Note how assignment operations on the child virtual pdls change the parent physical
       pdl and vice versa (however, the basic "=" assignment doesn't; use ".=" to obtain
       that effect. See below for the reasons).  The virtual child pdls are something like
       "live links" to the "original" parent pdl. As previously said, they can be thought
       of as working similarly to a C pointer. But in contrast to a C pointer they carry a
       lot more information. Firstly, they specify the structure of the data they
       represent (the dimensionality of the new pdl) and secondly, they specify how to
       create this structure from the parent's data. The way this works is buried in the
       internals of PDL and is not important for you to know, unless you want to hack the
       core in the future or would like to become a PDL guru in general (for a definition
       of this strange creature see PDL::Internals).

       The previous examples have demonstrated typical usage of the slice function. Since
       the slicing functionality is so important here is an explanation of the syntax for
       the string argument to slice:

        $vpdl = $a->slice('ind0,ind1...')

       where "ind0" specifies what to do with index No 0 of the pdl $a, etc. Each element
       of the comma separated list can have one of the following forms:

       ':'   Use the whole dimension

       'n'   Use only index "n". The dimension of this index in the resulting virtual pdl
             is 1. An example involving those first two index formats:

              perldl> $column = $im->slice('2,:')
              perldl> $row = $im->slice(':,0')
              perldl> p $column

              [
               [ 3]
               [ 8]
               [15]
               [18]
               [23]
              ]

              perldl> p $row

              [
               [1 2 3 4 5]
              ]

              perldl> help $column
              This variable is Double D [1,5]                VC           0.00Kb

              perldl> help $row
              This variable is Double D [5,1]                VC           0.00Kb

       '(n)' Use only index "n". This dimension is removed from the resulting pdl (relying
             on the fact that a dimension of size 1 can always be removed). The distinc-
             tion between this case and the previous one becomes important in assignments
             where left and right hand side have to have appropriate dimensions.

              perldl> $line = $im->slice(':,(0)')
              perldl> help $line
              This variable is Double D [5]                  -C           0.00Kb

              perldl> p $line
              [1 2 3 4 5]

             Spot the difference to the previous example?

       'n1:n2' or 'n1:n2:n3'
             Take the range of indices from "n1" to "n2" or (second form) take the range
             of indices from "n1" to "n2" with step "n3". An example for the use of this
             format is the previous definition of the subimage composed of even lines.

              perldl> $even = $im->slice(':,1:-1:2')

             This example also demonstrates that negative indices work like they do for
             normal perl style arrays by counting backwards from the end of the dimension
             (in the example -1 is equivalent to index 4). If "n2" is smaller than "n1"
             the elements in the virtual pdl are effectively reversed with respect to its
             parent.

       '*[n]'
             Add a dummy dimension. The size of this dimension will be 1 by default or
             equal to "n" if the optional numerical argument is given.

             Now, this is really something a bit strange at first sight. What is a dummy
             dimension? A dummy dimension inserts a dimension where there wasn't one
             before. How is that done? Well, in the case of the new dimension having size
             1 it can be easily explained by the way in which you can identify a vector
             (with "m" elements) with a "(1,m)" or "(m,1)" matrix. The same obviously
             holds for higher dimensional objects. More interesting is the case of a dummy
             dimension of size greater than one (e.g. "slice('*5,:')"). This works in the
             same way as a call to the dummy function creates a new dummy dimension.  So
             read on and check its explanation below.

       '([n1:n2[:n3]]=i)'
             [Not yet implemented ??????]  With an argument like this you make generalised
             diagonals. The diagonal will be dimension no. "i" of the new output pdl and
             (if optional part in brackets specified) will extend along the range of
             indices specified of the respective parent pdl's dimension. In general an
             argument like this only makes sense if there are other arguments like this in
             the same call to slice. The part in brackets is optional for this type of
             argument. All arguments of this type that specify the same target dimension
             "i" have to relate to the same number of indices in their parent dimension.
             The best way to explain it is probably to give an example, here we make a pdl
             that refers to the elements along the space diagonal of its parent pdl (a
             cube):

              $cube = zeroes(5,5,5);
              $sdiag = $cube->slice('(=0),(=0),(=0)');

             The above command creates a virtual pdl that represents the diagonal along
             the parent's dimensions no. 0, 1 and 2 and makes it dimension 0 (the only
             dimension) of the new pdl. You use the extended syntax if the parent
             dimensions you want to build the diagonal from have different sizes or
             you want to reverse the sequence of elements in the diagonal, e.g.

              $rect = zeroes(12,3,5,6,2);
              $vpdl = $rect->slice('2:7,(0:1=1),(4),(5:4=1),(=1)');

             So the elements of $vpdl will then be related to those of its parent in a way
             we can express as:

               vpdl(i,j) = rect(i+2,j,4,5-j,j)       0<=i<6, 0<=j<2
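
       The index forms ':', 'n', 'n1:n2' and 'n1:n2:n3' described above can be modelled
       in a few lines of plain Perl. This is a simplified sketch, not the parser PDL
       uses: "expand_range" is a hypothetical helper, it ignores the '(n)', '*[n]' and
       diagonal forms, and it assumes that the direction of a range is derived from
       "n1" and "n2" while the step only gives the stride length.

```perl
use strict;
use warnings;

# expand_range (hypothetical): list the parent indices that a single slice
# element selects along a dimension of the given size.
sub expand_range {
    my ($spec, $size) = @_;
    return (0 .. $size - 1) if $spec eq ':';

    my ($n1, $n2, $step) = split /:/, $spec;
    $n2 = $n1 unless defined $n2;      # plain 'n' selects one index

    # Negative indices count backwards from the end of the dimension.
    $n1 += $size if $n1 < 0;
    $n2 += $size if $n2 < 0;

    # Assumption of this model: only the magnitude of the step is used.
    $step = abs($step || 1);

    my @idx;
    if ($n1 <= $n2) {
        for (my $i = $n1; $i <= $n2; $i += $step) { push @idx, $i }
    }
    else {
        for (my $i = $n1; $i >= $n2; $i -= $step) { push @idx, $i }
    }
    return @idx;
}

my @every_second = expand_range('1:-1:2', 5);   # as in ':,1:-1:2'
my @reversed     = expand_range('3:1',    5);   # as in '3:4,3:1'
my @whole        = expand_range(':',      5);

print "1:-1:2 selects @every_second\n";
print "3:1    selects @reversed\n";
print ":      selects @whole\n";
```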

       [ work in the new index function: "$b = $a->index($c);" ???? ]

       There are different kinds of assignments in PDL

       The previous examples have already shown that virtual pdls can be used to operate
       on or access portions of data of a parent pdl. They can also be used as lvalues in
       assignments (as the use of "++" in some of the examples above has already demon-
       strated). For explicit assignments to the data represented by a virtual pdl you
       have to use the overloaded ".=" operator (which in this context we call propagated
       assignment). Why can't you use the normal assignment operator "="?

       Well, you definitely still can use the '=' operator but it wouldn't do what you
       want. This is due to the fact that the '=' operator cannot be overloaded in the
       same way as other assignment operators. If we tried to use '=' to assign data to a
       portion of a physical pdl through a virtual pdl we wouldn't achieve the desired
       effect. Instead, the variable representing the virtual pdl (a reference to a
       blessed thingy) would after the assignment just contain the reference to another
       blessed thingy, which would behave to future assignments as a "physical" copy of
       the original rvalue [this is actually not yet clear and subject of discussions in
       the PDL developers mailing list]. In that sense it would break the connection of
       the pdl to the parent [ isn't this behaviour in a sense the opposite of what
       happens in dataflow, where ".=" breaks the connection to the parent? ].

       E.g.

        perldl> $line = $im->slice(':,(2)')
        perldl> $line = zeroes(5);
        perldl> $line++;
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [13 14 15 16 17]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

        perldl> p $line
        [1 1 1 1 1]

       But using ".="

        perldl> $line = $im->slice(':,(2)')
        perldl> $line .= zeroes(5)
        perldl> $line++
        perldl> p $im

        [
         [ 1  2  3  4  5]
         [ 6  7  8  9 10]
         [ 1  1  1  1  1]
         [16 17 18 19 20]
         [21 22 23 24 25]
        ]

        perldl> print $line
        [1 1 1 1 1]

       Also, you can substitute

        perldl> $line .= 0;

       for the assignment above (the zero is converted to a scalar piddle, with no dimen-
       sions so it can be assigned to any piddle).
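
       The difference between "=" and ".=" is a property of Perl itself and can be
       reproduced without PDL. The following sketch uses a tiny hypothetical class,
       "TinyView" (not part of PDL), whose overloaded ".=" writes through to the parent's
       storage, while a plain "=" merely rebinds the variable.

```perl
use strict;
use warnings;

package TinyView;   # hypothetical class, invented for this illustration
use overload '.=' => \&assign_through;

sub new {
    my ($class, $data, @indices) = @_;
    return bless { data => $data, indices => [@indices] }, $class;
}

# '.=' can be overloaded, so it is able to copy a value into the parent.
sub assign_through {
    my ($self, $value) = @_;
    $self->{data}[$_] = $value for @{ $self->{indices} };
    return $self;   # the variable keeps pointing at the same view
}

package main;

my @parent = (1 .. 6);
my $view   = TinyView->new(\@parent, 2, 3);   # "points at" elements 2 and 3

$view .= 0;                # propagated assignment: the parent changes
my @after_dot = @parent;

$view = 42;                # plain '=': $view is now just the number 42
my @after_eq = @parent;    # the parent is untouched by the rebinding

print "after .= : @after_dot\n";
print "after =  : @after_eq\n";
```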

       Related to the assignment feature is a little trap for the unwary: since perl cur-
       rently does not allow subroutines to return lvalues the following shortcut of the
       above is flagged as a compile time error:

        perldl> $im->slice(':,(2)') .= zeroes(5)->xvals->float

       instead you have to say something like

        perldl> ($pdl = $im->slice(':,(2)')) .= zeroes(5)->xvals->float

       We hope that future versions of perl will allow the simpler syntax (i.e. allow sub-
       routines to return lvalues).  [Note: perl v5.6.0 does allow this, but it is an
       experimental feature. However, early reports suggest it works in simple situations]

       Note that there can be a problem with assignments like this when lvalue and rvalue
       pdls refer to overlapping portions of data in the parent pdl:

        # reverse the elements of line 1 of $a
        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)');

       Currently, the parent data on the right side of the assignment is not copied
       before the (internal) assignment loop proceeds. Therefore, the outcome of this
       assignment will depend on the sequence in which elements are assigned and will
       almost certainly not be what you wanted.  So the semantics are currently undefined
       and liable to change at any time. To obtain the desired behaviour, use

        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->copy;

       which makes a physical copy of the slice or

        ($tmp = $a->slice(':,(1)')) .= $a->slice('-1:0,(1)')->sever;

       which returns the same slice but severs the connection of the slice to its parent.
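
       The overlap hazard can be seen with an ordinary Perl array. The sketch below is
       an analogy, not PDL code: it assumes one particular assignment order (left to
       right), which is exactly the kind of detail the text above says you must not rely
       on, and compares it with taking a snapshot of the source first.

```perl
use strict;
use warnings;

my @row = (1 .. 5);

# Naive in-place reversal: source and destination share the same storage,
# so later reads see values that earlier writes have already overwritten.
my @naive = @row;
$naive[$_] = $naive[ 4 - $_ ] for 0 .. 4;

# Safe variant: snapshot the reversed source first (the role that ->copy
# plays above), then assign element by element.
my @safe     = @row;
my @snapshot = reverse @safe;
$safe[$_] = $snapshot[$_] for 0 .. 4;

print "naive: @naive\n";
print "safe:  @safe\n";
```

       With this particular order the naive loop yields a palindrome instead of the
       reversed row; a different order would give a different wrong answer.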

       Other functions that manipulate dimensions

       Having talked extensively about the slice function it should be noted that this is
       not the only PDL indexing function. There are additional indexing functions which
       are also useful (especially in the context of threading which we will talk about
       later). Here is a list with some examples of how to use them.

       "dummy"
           inserts a dummy dimension of the size you specify (default 1) at the chosen
           location. You can't wait to hear how that is achieved?  Well, all elements with
           index "(X,x,Y)" ("0<=x<size_of_dummy_dim") just map to the element with index
           "(X,Y)" of the parent pdl (where "X" and "Y" refer to the group of indices
           before and after the location where the dummy dimension was inserted.)

           This example calculates the x coordinate of the centroid of an image (later we
           will learn that we didn't actually need the dummy dimension thanks to the magic
           of implicit threading; but using dummy dimensions the code would also work in a
           threadless world; though once you have worked with PDL threads you wouldn't
           want to live without them again).

            # centroid
            ($xd,$yd) = $im->dims;
            $xc = sum($im*xvals(zeroes($xd))->dummy(1,$yd))/sum($im);

           Let's explain how that works in a little more detail. First, the product:

            $xvs = xvals(zeroes($xd));
            print $xvs->dummy(1,$yd);       # repeat the line $yd times
            $prod = $im*$xvs->dummy(1,$yd); # form the pixelwise product with
                                            # the repeated line of x-values

           The rest is then summing the results of the pixelwise product together and nor-
           malising with the sum of all pixel values in the original image thereby calcu-
           lating the x-coordinate of the "center of mass" of the image (interpreting
           pixel values as local mass) which is known as the centroid of an image.

           Next is a (from the point of view of memory consumption) very cheap conversion
           from greyscale to RGB, i.e. every pixel holds now a triple of values instead of
           a scalar. The three values in the triple are, fortunately, all the same for a
           grey image, so that our trick works well in that it maps all the three members
           of the triple to the same source element:

            # a cheap greyscale to RGB conversion
            $rgb = $grey->dummy(0,3)

           Unfortunately this trick cannot be used to convert your old B/W photos to color
           ones in the way you'd like. :(

           Note that the memory usage of piddles with dummy dimensions is especially
           sensitive to the internal representation. If the piddle can be represented as a
           virtual affine ("vaffine") piddle, only the control structures are stored.
           But if $b in

            $a = zeroes(10000);
            $b = $a->dummy(1,10000);

           is made physical by some routine, you will find that the memory usage of your
           program has suddenly grown by roughly 800 MB (10000 x 10000 double-precision
           elements at 8 bytes each).
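
           One way to picture why a dummy dimension is so cheap while it stays virtual is
           to model it as a dimension with stride 0. The following plain-Perl sketch makes
           that assumption explicit; it is an illustration of the index mapping
           "(X,x,Y)" to "(X,Y)" described above, not PDL's internal code, and "dummy_get"
           is a hypothetical helper.

```perl
use strict;
use warnings;

my @parent = (10, 20, 30);   # a 3-element vector

# Model of dummy(1,4): logical dims (3,4), strides (1,0). A stride of 0 means
# that the index along the dummy dimension never moves within the parent.
my @dims    = (3, 4);
my @strides = (1, 0);

sub dummy_get {
    my ($i, $j) = @_;
    return $parent[ $i * $strides[0] + $j * $strides[1] ];
}

my @rows = map {
    my $j = $_;
    join ' ', map { dummy_get($_, $j) } 0 .. $dims[0] - 1;
} 0 .. $dims[1] - 1;
print "$_\n" for @rows;

my $logical = $dims[0] * $dims[1];   # elements visible through the view
my $stored  = scalar @parent;        # elements actually held in memory
print "logical=$logical stored=$stored\n";
```

           Making such a view physical means writing out all the logical elements, which
           is where the sudden growth in memory comes from.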

       "diagonal"
           replaces two dimensions (which have to be of equal size) by one dimension that
           references all the elements along the "diagonal" along those two dimensions.
           Here, we have two examples which should appear familiar to anyone who has ever
           done some linear algebra. Firstly, make a unity matrix:

            # unity matrix
            $e = zeroes(float, 3, 3); # make everything zero
            ($tmp = $e->diagonal(0,1)) .= 1; # set the elements along the diagonal to 1
            print $e;

           Or the other diagonal:

            ($tmp = $e->slice(':,-1:0')->diagonal(0,1)) .= 2;
            print $e;

           (Did you notice how we used the slice function to reverse the sequence of lines
           before setting the diagonal of the new child, thereby setting the cross
           diagonal of the parent?)  Or a mapping from the space of square matrices to the
           field over which the matrices are defined, the trace of a matrix:

            # trace of a matrix
            $trace = sum($mat->diagonal(0,1));  # sum all the diagonal elements

       "xchg" and "mv"
           xchg exchanges or "transposes" the two  specified dimensions.  A straightfor-
           ward example:

            # transpose a matrix (without explicitly reshuffling data and
            # making a copy)
            $prod = $a x $a->xchg(0,1);

           $prod should now be pretty close to the unity matrix if $a is an orthogonal
           matrix. Often "xchg" will be used in the context of threading but more about
           that later.

           mv works in a similar fashion. It moves a dimension (specified by its number in
           the parent) to a new position in the new child pdl:

            $b = $a->mv(4,0);  # make the 5th dimension of $a the first in the
                               # new child $b

           The difference between "xchg" and "mv" is that "xchg" only swaps the positions
           of two dimensions, whereas "mv" moves the first dimension to the position of
           the second, shifting the other dimensions along accordingly.

       "clump"
           collapses several dimensions into one. Its only argument specifies how many
           dimensions of the source pdl should be collapsed (starting from the first). An
           (admittedly unrealistic) example is a 3D pdl which holds data from a stack of
           image files that you have just read in. However, the data from each image
           really represents a 1D time series and has only been arranged that way because
           it was digitized with a frame grabber. So to have it again as an array of time
           sequences you say

            perldl> $seqs = $stack->clump(2)
            perldl> help vars
            PDL variables in package main::

            Name         Type   Dimension       Flow  State          Mem
            ----------------------------------------------------------------
            $seqs        Double D [8000,50]            -C           0.00Kb
            $stack       Double D [100,80,50]          P            3.05Mb

           Unrealistic as it may seem, our confocal microscope software writes data (some-
           times) this way. But more often you use clump to achieve a certain effect when
           using implicit or explicit threading.
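
           That clump does not have to move any data can be checked with simple index
           arithmetic. The sketch below is plain Perl and assumes the flat storage order
           used throughout this document (dimension 0 running fastest); the small dims
           (3,2,2) stand in for the (100,80,50) stack, and "flat3" and "flat2" are
           hypothetical helpers.

```perl
use strict;
use warnings;

my @dims  = (3, 2, 2);                          # stand-in for (100,80,50)
my @cdims = ($dims[0] * $dims[1], $dims[2]);    # dims after clump(2)

# Flat position of element (i,j,k) in the original 3D layout.
sub flat3 {
    my ($i, $j, $k) = @_;
    return $i + $dims[0] * ($j + $dims[1] * $k);
}

# Flat position of element (c,k) in the clumped 2D layout.
sub flat2 {
    my ($c, $k) = @_;
    return $c + $cdims[0] * $k;
}

# Every (i,j,k) must land on the same flat position as (i + dim0*j, k).
my $mismatches = 0;
for my $k (0 .. $dims[2] - 1) {
    for my $j (0 .. $dims[1] - 1) {
        for my $i (0 .. $dims[0] - 1) {
            $mismatches++ if flat3($i, $j, $k) != flat2($i + $dims[0] * $j, $k);
        }
    }
}
print "clumped dims: @cdims, mismatches: $mismatches\n";
```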

       Calls to indexing functions can be chained

       As you might have noticed in some of the examples above calls to the indexing func-
       tions can be nicely chained since all of these functions return a newly created
       child object. However, when doing extensive index manipulations in a chain be sure
       to keep track of what you are doing, e.g.

        $a->xchg(0,1)->mv(0,4)

       moves the dimension 1 of $a to position 4 since when the second command is executed
       the original dimension 1 has been moved to position 0 of the new child that calls
       the "mv" function. I think you get the idea (in spite of my convoluted explana-
       tions).

       Propagated assignments ('.=') and dummy dimensions

       A subtlety related to indexing is the assignment to pdls containing dummy dimensions
       of size greater than 1. These assignments (using ".=") are forbidden since several
       elements of the lvalue pdl point to the same element of the parent. As a
       consequence the values of those parent elements are potentially ambiguous and would
       depend on the sequence in which the implementation makes the assignments to
       elements. Therefore, an assignment like this:

        $a = pdl [1,2,3];
        $b = $a->dummy(1,4);
        $b .= yvals(zeroes(3,4));

       can produce unexpected results and the results are explicitly undefined by PDL
       because when PDL gets parallel computing features, the current result may well
       change.
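
       The order dependence is easy to reproduce in a plain-Perl model. The sketch below
       is not PDL code: "assign_through_dummy" is a hypothetical helper that writes the
       value j (what "yvals" would supply) into a 3-element parent seen through a (3,4)
       dummy view, visiting the dummy index in whatever order it is given.

```perl
use strict;
use warnings;

# Every value of the dummy index j lands on the same three parent elements,
# so whichever j is written last wins.
sub assign_through_dummy {
    my @j_order = @_;
    my @parent  = (1, 2, 3);
    for my $j (@j_order) {
        $parent[$_] = $j for 0 .. 2;
    }
    return @parent;
}

my @forward  = assign_through_dummy(0 .. 3);           # j = 3 is written last
my @backward = assign_through_dummy(reverse 0 .. 3);   # j = 0 is written last

print "forward order:  @forward\n";
print "backward order: @backward\n";
```

       Both runs perform the same logical assignment, yet the parent ends up with
       different contents, which is why the result is left undefined.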

       From the point of view of dataflow the introduction of greater-size-than-one dummy
       dimensions is regarded as an irreversible transformation (similar to the terminol-
       ogy in thermodynamics) which precludes backward propagation of assignment to a par-
       ent (which you had explicitly requested using the ".=" assignment). A similar prob-
       lem to watch out for occurs in the context of threading where sometimes dummy
       dimensions are created implicitly during the thread loop (see below).

       Reasons for the parent/child (or "pointer") concept

       [ this will have to wait a bit ]

        XXXXX being memory efficient
        XXXXX in the context of threading
        XXXXX very flexible and powerful way of accessing portions of pdl data
              (in much more general way than sec, etc allow)
        XXXXX efficient implementation
        XXXXX difference to section/at, etc.

       How to make things physical again

       [ XXXXX fill in later when everything has settled a bit more ]

        ** When needed (xsub routine interfacing C lib function)
        ** How achieved (->physical)
        ** How to test (isphysical (explain how it works currently))
        ** ->copy and ->sever

Threading
       In the previous paragraph on indexing we have already mentioned the term
       occasionally, but now it's really time to talk explicitly about "threading" with
       pdls. The term threading has many different meanings in different fields of
       computing. Within the framework of PDL it could probably be loosely defined as an
       implicit looping facility. It is implicit because you don't specify anything like
       enclosing for-loops but rather the loops are automatically (or 'magically')
       generated by PDL based on the dimensions of the pdls involved. This should give you
       a first idea why the index/dimension manipulating functions you have met in the
       previous paragraphs are especially important and useful in the context of
       threading.  The other ingredient for threading (apart from the pdls involved) is a
       function that is threading aware (generally, these are PDL::PP compiled functions)
       and that the pdls are "threaded" over.  So much about the terminology and now let's
       try to shed some light on what it all means.

       Implicit threading - a first example

       There are two slightly different variants of threading. We start with what we call
       "implicit threading". Let's pick a practical example that involves looping of a
       function over many elements of a pdl. Suppose we have an RGB image that we want to
       convert to greyscale. The RGB image is represented by a 3-dim pdl "im(3,x,y)" where
       the first dimension contains the three color components of each pixel and "x" and
       "y" are width and height of the image, respectively. Next we need to specify how to
       convert a color-triple at a given pixel into a greyvalue (to be a realistic example
       it should represent the relative intensity with which our color insensitive eye
       cells would detect that color to achieve what we would call a natural conversion
       from color to greyscale). An approximation that works quite well is to compute the
       grey intensity from each RGB triplet (r,g,b) as a weighted sum

        greyvalue = 77/256*r + 150/256*g + 29/256*b =
            inner([77,150,29]/256, [r,g,b])

       where the last form indicates that we can write this as an inner product of the
       3-vector comprising the weights for red, green and blue components with the 3-vec-
       tor containing the color components. Traditionally, we might have written a func-
       tion like the following to process the whole image:

        my @dims = $im->dims;
        # here normally check that first dim has correct size (3), etc
        $grey = zeroes(@dims[1,2]);     # make the pdl for the resulting grey image
        $w = pdl([77,150,29]) / 256;    # the vector of weights
        for ($j=0; $j<$dims[2]; $j++) {
           for ($i=0; $i<$dims[1]; $i++) {
               # compute the pixel value (double quotes so $i and $j interpolate)
               $tmp = inner($w, $im->slice(":,($i),($j)"));
               set($grey, $i, $j, $tmp); # and set it in the greyscale image
           }
        }

       Now we write the same using threading (noting that "inner" is a threading aware
       function defined in the PDL::Primitive package)

        $grey = inner($im, pdl([77,150,29])/256);

       We have ended up with a one-liner that automatically creates the pdl $grey with the
       right number and size of dimensions and performs the loops automatically (these
       loops are implemented as fast C code in the internals of PDL).  Well, we still owe
       you an explanation how this 'magic' is achieved.
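
       As a mental model of what the one-liner computes, the following plain-Perl sketch
       applies a one-dimensional inner product to every pixel of a tiny 2x2 image held in
       nested arrays. It only mirrors the result, not how PDL generates its loops, and
       "inner_1d" is a hypothetical stand-in for the core operation.

```perl
use strict;
use warnings;

my @w = map { $_ / 256 } (77, 150, 29);   # the weights from the text

# The core operation: inner product of two equally long vectors.
sub inner_1d {
    my ($u, $v) = @_;
    my $sum = 0;
    $sum += $u->[$_] * $v->[$_] for 0 .. $#$u;
    return $sum;
}

# A 2x2 image: rows of pixels, each pixel an [r,g,b] triple.
my @im = (
    [ [255, 255, 255], [0,   0, 0] ],
    [ [255,   0,   0], [0, 255, 0] ],
);

# "Threading" by hand: loop the core operation over the two extra dimensions.
my @grey = map {
    my $row = $_;
    [ map { inner_1d($_, \@w) } @$row ];
} @im;

printf "%.4f %.4f\n", @$_ for @grey;
```

       The weights sum to 1, so a white pixel keeps its full intensity and a black pixel
       stays black.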

       How does the example work ?

       The first thing to note is that every function that is threading aware (these are
       without exception functions compiled from concise descriptions by PDL::PP, later
       just called PP-functions) expects a defined (minimum) number of dimensions (we call
       them core dimensions) from each of its pdl arguments. The inner function expects
       two one-dimensional (input) parameters from which it calculates a zero-dimensional
       (output) parameter. We write that symbolically as "inner((n),(n),[o]())" and call
       it "inner"'s signature, where n represents the size of that dimension. n being
       equal in the first and second parameter means that those dimensions have to be of
       equal size in any call. As a different example take the outer product which takes
       two 1D vectors to generate a 2D matrix, symbolically written as
       "outer((n),(m),[o](n,m))". The "[o]" in both examples indicates that this (here
       third) argument is an output argument. In the latter example the dimensions of
       first and second argument don't have to agree but you see how they determine the
       size of the two dimensions of the output pdl.
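
       The two signatures can be observed directly by calling the functions with plain 1D
       vectors. This is a minimal sketch; the input values are made up for illustration.

```perl
use strict;
use warnings;
use PDL;

my $u = pdl(1, 2, 3);
my $v = pdl(4, 5, 6);

# inner((n),(n),[o]()): two 1D inputs of equal size give a 0D result
my $dot = inner($u, $v);

# outer((n),(m),[o](n,m)): sizes 3 and 2 give a 3x2 result
my $prod = outer($u, pdl(10, 20));

print "inner: ndims=", $dot->ndims, " value=", $dot->sclr, "\n";
print "outer: dims=", join(',', $prod->dims), "\n";
```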

       Here is the point when threading finally enters the game. If you call PP-functions
       with pdls that have more than the required core dimensions the first dimensions of
       the pdl arguments are used as the core dimensions and the additional extra dimen-
       sions are threaded over. Let us demonstrate this first with our example above

        $grey = inner($im,$w); # w is the weight vector from above

       In this case $w is 1D and so supplies just the core dimension; $im is 3D, more
       specifically "(3,x,y)". The first dimension (of size 3) is the required core dimen-
       sion that matches (as required by inner) the first (and only) dimension of $w. The
       second dimension is the first thread dimension (of size "x") and the third is here
       the second thread dimension (of size "y"). The output pdl is automatically created
       (as requested by setting $grey to "null" prior to invocation). The output dimen-
       sions are obtained by appending the loop dimensions (here "(x,y)") to the core out-
       put dimensions (here 0D) to yield the final dimensions of the autocreated pdl (here
       "0D+2D=2D" to yield a 2D output of size "(x,y)").

       So the above command calls the core functionality that computes the inner product
       of two 1D vectors "x*y" times with $w and all 1D slices of the form "(':,(i),(j)')"
       of $im and sets the respective elements of the output pdl "$grey(i,j)" to the
       result of each computation. We could write that symbolically as

        $grey(0,0) = f($w,$im(:,(0),(0)))
        $grey(1,0) = f($w,$im(:,(1),(0)))
            .
            .
            .
        $grey(x-2,y-1) = f($w,$im(:,(x-2),(y-1)))
        $grey(x-1,y-1) = f($w,$im(:,(x-1),(y-1)))

       But this is done automatically by PDL without writing any explicit perl loops.  We
       see that the command really creates an output pdl with the right dimensions and
       sets the elements indeed to the result of the computation for each pixel of the
       input image.

       When even more pdls and extra dimensions are involved things get a bit more compli-
       cated. We will first give the general rules how the thread dimensions depend on the
       dimensions of input pdls enabling you to figure out the dimensionality of an
       autocreated output pdl (for any given set of input pdls and core dimensions of the
       PP-function in question). The general rules will most likely appear a bit confusing
       at first sight, so we will illustrate their use with a set of further examples
       (which will hopefully also demonstrate that there are indeed many practical
       situations where threading comes in extremely handy).

       A call for coding discipline

       Before we point out the other technical details of threading, please note this call
       for programming discipline when using threading:

       In order to preserve human readability, PLEASE comment any nontrivial expression in
       your code involving threading.  Most importantly, for any subroutine, include
       information at the beginning about what you expect the dimensions to represent (or
       ranges of dimensions).

       As a warning, look at this undocumented function and try to guess what might be
       going on:

        sub lookup {
          my ($im,$palette) = @_;
          my $res;
          index($palette->xchg(0,1),
                     $im->long->dummy(0,($palette->dims)[0]),
                     ($res=null));
          return $res;
        }

       Would you agree that it might be difficult to figure out expected dimensions, pur-
       pose of the routine, etc ?  (If you want to find out what this piece of code does,
       see below)

       How to figure out the loop dimensions

       There are a couple of rules that allow you to figure out the number and size of loop
       dimensions (and whether the sizes of your input pdls comply with the threading
       rules). In the following, the dimensions of any pdl argument are broken down into
       two groups: core dimensions (as defined by the PP-function, see Appendix B for a
       list of PDL primitives) and extra dimensions, which comprise all remaining
       dimensions of that pdl. For example, calling a function "func" with the signature
       "func((n,m),[o](n))" with a pdl "a(2,4,7,1,3)" as "func($a,($o = null))" splits
       a's dimensions into core dimensions "(2,4)" and extra dimensions "(7,1,3)".

       R0    Core dimensions are identified with the first N dimensions of the respective
             pdl argument (and are required). Any further dimensions are extra dimensions
             and used to determine the loop dimensions.

       R1    The number of (implicit) loop dimensions is equal to the maximal number of
             extra dimensions taken over the set of pdl arguments.

       R2    The size of each of the loop dimensions is derived from the size of the
             respective dimensions of the pdl arguments. The size of a loop dimension is
             given by the maximal size found in any of the pdls having this extra dimen-
             sion.

       R3    For all pdls that have a given extra dimension the size must be equal to the
             size of the loop dimension (as determined by the previous rule) or 1;
             otherwise you raise a runtime exception. If the size of the extra dimension
             in a pdl is one it is implicitly treated as a dummy dimension of size equal
             to that loop dim size when performing the thread loop.

       R4    If a pdl doesn't have a loop dimension, in the thread loop this pdl is
             treated as if having a dummy dimension of size equal to the size of that loop
             dimension.

       R5    If output autocreation is used (by setting the relevant pdl to "PDL->null"
             before invocation) the number of dimensions of the created pdl is equal to
             the sum of the number of core output dimensions + number of loop dimensions.
             The size of the core output dimensions is derived from the relevant dimension
             of input pdls (as specified in the function definition) and the sizes of the
             other dimensions are equal to the size of the loop dimension it is derived
             from. The automatically created pdl will be physical (unless dataflow is in
             operation).

       In this context, note that you can run into the problem with assignment to pdls
       containing greater-than-one dummy dimensions (see above).  Although your output
       pdl(s) didn't contain any dummy dimensions in the first place they may end up with
       implicitly created dummy dimensions according to R4.
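
       The rules can be tried out with ordinary addition, which has no core dimensions, so
       every dimension of its arguments is an extra dimension. The following is a small
       sketch with made-up inputs; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;
use PDL;

# '+' has the signature ((),(),[o]()): no core dims, so every dim is an extra dim
my $p = sequence(3);       # dims (3)
my $q = sequence(1, 4);    # dims (1,4)

# R1: two loop dims; R2: sizes (3,4); R3 stretches the size-1 dim of $q;
# R4 gives $p an implicit dummy second dim; R5 autocreates $sum with dims (3,4)
my $sum = $p + $q;
print "dims of sum: ", join(',', $sum->dims), "\n";

# R3: sizes 3 and 2 in the same loop dim do not fit and raise a runtime exception
my $bad = eval { sequence(3) + sequence(2) };
print "mismatch rejected\n" unless defined $bad;
```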

       As an example, suppose we have a (here unspecified) PP-function with the signature:

        func((m,n),(m,n,o),(m),[o](m,o))

       and you call it with 3 pdls "a(5,3,10,11)", "b(5,3,2,10,1,12)", and "c(5,1,11,12)"
       as

        func($a,$b,$c,($d=null))

       then the number of loop dimensions is 3 (by "R0+R1" from $b and $c) with sizes
       "(10,11,12)" (by R2); the two output core dimensions are "(5,2)" (from the
       signature of func) resulting in a 5-dimensional output pdl $d of size
       "(5,2,10,11,12)" (see R5) and (the automatically created) $d is derived from
       "($a,$b,$c)" in a way that can be expressed in pdl pseudo-code as

        $d(:,:,i,j,k) .= func($a(:,:,i,j),$b(:,:,:,i,0,k),$c(:,0,j,k))
           with 0<=i<10, 0<=j<11, 0<=k<12

       If we analyze the color to greyscale conversion again with these rules in mind we
       note another great advantage of implicit threading.  We can call the conversion
       with a pdl representing a pixel ("im(3)"), a line of RGB pixels ("im(3,x)"), a
       proper color image ("im(3,x,y)") or a whole stack of RGB images ("im(3,x,y,z)"). As
       long as $im is of the form "(3,...)" the automatically created output pdl will have
       the right number of dimensions and contain the intensity data as we expect it,
       since the loops are performed implicitly by threading. You can easily convince
       yourself that when called with a color pixel $grey is 0D, with a line it turns out
       1D "grey(x)", with an image we get "grey(x,y)" and with an image stack we get a
       converted image stack "grey(x,y,z)".
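
       One way to convince yourself is to feed inputs of increasing dimensionality to the
       same call and print the resulting dimensions. The shapes below are arbitrary test
       sizes chosen for this sketch.

```perl
use strict;
use warnings;
use PDL;

my $w = pdl([77, 150, 29]) / 256;

# made-up inputs: a pixel, a line, an image and a stack of images
my %ndims_out;
for my $shape ([3], [3, 4], [3, 4, 5], [3, 4, 5, 2]) {
    my $im   = ones(@$shape);
    my $grey = inner($im, $w);
    $ndims_out{ scalar @$shape } = $grey->ndims;
    printf "im(%s) -> grey(%s)\n", join(',', @$shape), join(',', $grey->dims);
}
```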

       Let's fill these general rules with some more life by going through a couple of
       further examples. The reader may try to figure out equivalent formulations with
       explicit for-looping and compare the flexibility of those routines using implicit
       threading to the explicit formulation. Furthermore, especially when using several
       thread dimensions it is a useful exercise to check the relative speed by doing some
       benchmark tests (which we still have to do).

       First in the row is a slightly reworked centroid example, now coded with threading
       in mind.

        # threaded mult to calculate centroid coords, works for stacks as well
        $xc = sumover(($im*xvals(($im->dims)[0]))->clump(2)) /
              sumover($im->clump(2));

       Let's analyse what's going on step by step. First the product:

        $prod = $im*xvals(zeroes(($im->dims)[0]))

       This will actually work for $im being one, two, three, and higher dimensional. If
       $im is one-dimensional it's just an ordinary product (in the sense that every ele-
       ment of $im is multiplied with the respective element of "xvals(...)"), if $im has
       more dimensions further threading is done by adding appropriate dummy dimensions to
       "xvals(...)"  according to R4.  More importantly, the two sumover operations show a
       first example of how to make use of the dimension manipulating commands. A quick
       look at sumover's signature will remind you that it will only "gobble up" the first
       dimension of a given input pdl. But what if we want to really compute the sum over
       all elements of the first two dimensions? Well, nothing keeps us from passing a
       virtual pdl into sumover which in this case is formed by clumping the first two
       dimensions of the "parent pdl" into one. From the point of view of the parent pdl
       the sum is now computed over the first two dimensions, just as we wanted, though
       sumover has just done the job as specified by its signature. Got it ?

       Another little finesse of writing the code like that: we intentionally used
       "sumover($pdl->clump(2))" instead of "sum($pdl)" so that we can either pass just an
       image "(x,y)" or a stack of images "(x,y,t)" into this routine and get either just
       one x-coordinate or a vector of x-coordinates (of size t) in return.
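
       The following sketch wraps the expression in a helper and applies it to a single
       image and to a stack. The name "centroid_x" and the test data (one bright pixel per
       image) are assumptions made for this illustration.

```perl
use strict;
use warnings;
use PDL;

# centroid_x is a hypothetical helper; expects $im with dims (x,y) or (x,y,t)
sub centroid_x {
    my ($im) = @_;
    return sumover(($im * xvals(($im->dims)[0]))->clump(2))
         / sumover($im->clump(2));
}

# single image with one bright pixel at x=3, y=2
my $im = zeroes(5, 4);
$im->set(3, 2, 1);
my $xc = centroid_x($im);        # 0D result

# stack of two images with bright pixels at x=3 and x=1
my $stack = zeroes(5, 4, 2);
$stack->set(3, 2, 0, 1);
$stack->set(1, 0, 1, 1);
my $xcs = centroid_x($stack);    # 1D result of size t=2

print "image: $xc\n";
print "stack: $xcs\n";
```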

       Another set of common operations are what one could call "projection operations".
       These operations take an N-D pdl as input and return a (N-1)-D "projected" pdl.
       These operations are often performed with functions like sumover, prodover, minimum
       and maximum.  Using again images as examples we might want to calculate the maximum
       pixel value for each line of an image or image stack. We know how to do that

        # maxima of lines (as function of line number and time)
        maximum($stack,($ret=null));

       But what if you want to calculate maxima per column when implicit threading always
       applies the core functionality to the first dimension and threads over all others?
       How can we arrange for the core functionality to be applied to the second dimension
       instead, with threading done over the others? Can you guess it? Yes, we make a
       virtual pdl that has the second dimension of the "parent pdl" as its first
       dimension using the "mv" command.

        # maxima of columns (as function of column number and time)
        maximum($stack->mv(0,1),($ret=null));

       and calculating all the sums of sub-slices over the third dimension is now almost
       too easy

        # sums of pixels in time (assuming time is the third dim)
        sumover($stack->mv(2,0),($ret=null));
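
       The three projections can be compared on a tiny stack. This is a sketch with
       made-up data; note that "mv(2,0)" is used to bring the third (time) dimension to
       the front so that sumover operates on it.

```perl
use strict;
use warnings;
use PDL;

# made-up stack with dims (x,y,t) = (3,2,2); element value is x + 3*y + 6*t
my $stack = sequence(3, 2, 2);

my $line_max = maximum($stack);              # max over x, dims (y,t)
my $col_max  = maximum($stack->mv(0, 1));    # max over y, dims (x,t)
my $time_sum = sumover($stack->mv(2, 0));    # sum over t, dims (x,y)

print "line maxima dims: ",   join(',', $line_max->dims), "\n";
print "column maxima dims: ", join(',', $col_max->dims),  "\n";
print "time sums dims: ",     join(',', $time_sum->dims), "\n";
```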

       Finally, if you want to apply the operation to all elements (like max over all ele-
       ments or sum over all elements) regardless of the dimensions of the pdl in question
       "clump" comes in handy. As an example look at the definition of "sum" (as defined
       in "Basic.pm"):

        sub sum {
          my ($name) = @_;   # the pdl to be summed
          PDL::Primitive::sumover($name->clump(-1),(my $tmp=null));
          return $tmp->at(); # return a perl number, not a 0D pdl
        }

       We have already mentioned that all basic operations support threading and assign-
       ment is no exception. So here are a couple of threaded assignments

        perldl> $im = zeroes(byte, 10,20)
        perldl> $line = exp(-rvals(10)**2/9)
        # threaded assignment
        perldl> $im .= $line      # set every line of $im to $line
        perldl> $im .= 5          # set every element of $im to 5

       By now you probably see how it works and what it does, don't you?
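
       For those who prefer to see it spelled out, here is a small script version of the
       two assignments. The sizes and the contents of $line are made up for this sketch.

```perl
use strict;
use warnings;
use PDL;

my $im   = zeroes(4, 3);
my $line = sequence(4);    # made-up line 0,1,2,3

$im .= $line;              # every line of $im is set to $line
my $first = $im->copy;     # keep a snapshot before overwriting

$im .= 5;                  # every element of $im is set to 5

print $first, $im;
```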

       To finish the examples in this paragraph here is a function to create an RGB image
       from what is called a palette image. The palette image consists of two parts: an
       image of indices into a color lookup table and the color lookup table itself. [
       describe how it works ] We are going to use a PP-function we haven't encountered yet
       in the previous examples. It is the aptly named index function, signature
       "((n),(),[o]())" (see Appendix B) with the core functionality that "index(pdl
       (0,2,4,5),2,($ret=null))" will return the element with index 2 of the first input
       pdl. In this case, $ret will contain the value 4.  So here is the example:

        # a threaded index lookup to generate an RGB, or RGBA or YMCK image
        # from a palette image (represented by a lookup table $palette and
        # a color-index image $im)
        # you can say just dummy(0) since the rules of threading make it fit
        perldl> index($palette->xchg(0,1),
                      $im->long->dummy(0,($palette->dims)[0]),
                      ($res=null));

       Let's go through it and explain the steps involved. Assuming we are dealing with an
       RGB lookup-table $palette is of size "(3,x)". First we exchange the dimensions of
       the palette so that looping is done over the first dimension of $palette (of size 3
       that represent r, g, and b components). Now looking at $im, we add a dummy
       dimension of size equal to the number of components (in the case we are discussing
       here we could have just used the number 3 since we have 3 color components). We can
       use a dummy dimension since for red, green and blue color components
       we use the same index from the original image, e.g.  assuming a certain pixel of
       $im had the value 4 then the lookup should produce the triple

        [palette(0,4),palette(1,4),palette(2,4)]

       for the new red, green and blue components of the output image. Hopefully by now
       you have some sort of idea what the above piece of code is supposed to do (it is
       often actually quite complicated to describe in detail how a piece of threading
       code works; just go ahead and experiment a bit to get a better feeling for it).

       If you have read the threading rules carefully, then you might have noticed that we
       didn't have to explicitly state the size of the dummy dimension that we created
       for $im; when we create it with size 1 (the default) the rules of threading make it
       automatically fit to the desired size (by rule R3, in our example the size would be
       3 assuming a palette of size "(3,x)"). Since situations like this do occur often in
       practice this is actually why rule R3 has been introduced (the part that makes
       dimensions of size 1 fit to the thread loop dim size). So we can just say

        perldl> index($palette->xchg(0,1),$im->long->dummy(0),($res=null));

       Again, you can convince yourself that this routine will create the right output if
       called with a pixel ($im is 0D), a line ($im is 1D), an image ($im is 2D), ..., an
       RGB lookup table (palette is "(3,x)") and RGBA lookup table (palette is "(4,x)",
       see e.g. OpenGL). This flexibility is achieved by the rules of threading which are
       made to do the right thing in most situations.
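
       A minimal sketch of the lookup on made-up data follows. The 4-entry palette and the
       2x2 index image are assumptions for illustration, and the method-call form of
       "index" is used so that there is no confusion with Perl's builtin string function
       of the same name.

```perl
use strict;
use warnings;
use PDL;

# made-up 4-entry RGB palette with dims (3,4): red, green, blue, white
my $palette = pdl([ [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255] ]);

# made-up 2x2 image of palette indices
my $im = pdl([ [0, 1], [2, 3] ]);

# method-call form of index; the size-1 dummy dim is stretched to 3 by R3
my $rgb = $palette->xchg(0, 1)->index($im->long->dummy(0));

print "dims of rgb: ", join(',', $rgb->dims), "\n";
print "pixel (1,0): ", $rgb->slice(":,(1),(0)"), "\n";
```

       The output has the color components as its first dimension, followed by the
       dimensions of the index image.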

       To wrap it all up once again, the general idea is as follows. If you want to
       achieve looping over certain dimensions and have the core functionality applied to
       another specified set of dimensions you use the dimension manipulating commands to
       create a (or several) virtual pdl(s) so that from the point of view of the parent
       pdl(s) you get what you want (always having the signature of the function in ques-
       tion and R1-R5 in mind!). Easy, isn't it ?

       Output autocreation and PP-function calling conventions

       At this point we have to divert to some technical detail that has to do with the
       general calling conventions of PP-functions and the automatic creation of output
       arguments.  Basically, there are two ways of invoking pdl routines, namely

        $result = func($a,$b);

       and

        func($a,$b,$result);

       If you are only using implicit threading then the output variable can be automati-
       cally created by PDL. You flag that to the PP-function by setting the output argu-
       ment to a special kind of pdl that is returned from a call to the function
       "PDL->null" that returns an essentially "empty" pdl (for those interested in
       details there is a flag in the C pdl structure for this). The dimensions of the
       created pdl are determined by the rules of implicit threading: the first dimensions
       are the core output dimensions to which the threading dimensions are appended
       (which are in turn determined by the dimensions of the input pdls as described
       above).  So you can say

        func($a,$b,($result=PDL->null));

       or

        $result = func($a,$b)

       which are exactly equivalent.
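
       As an illustration, both calling styles can be tried with "inner". This is a sketch
       on made-up inputs; the variable names are arbitrary.

```perl
use strict;
use warnings;
use PDL;

my $p = sequence(3, 2);    # made-up input with dims (3,2)
my $q = ones(3);

# functional style: the result is created and returned
my $r1 = inner($p, $q);

# explicit output argument flagged for autocreation with a null pdl
inner($p, $q, (my $r2 = PDL->null));

print "r1 = $r1, r2 = $r2\n";
```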

       Be warned that you can not use output autocreation when using explicit threading
       (for reasons explained in the following section on explicit threading, the second
       variant of threading).

       In "tight" loops you probably want to avoid the implicit creation of a temporary
       pdl in each step of the loop that comes along with the "functional" style but
       rather say

        # create output pdl of appropriate size only at first invocation
        $result = null;
        for (0..$n) {
             func($a,$b,$result); # in all but the first invocation $result
             func2($b);           # is defined and has the right size to
                                  # take the output provided $b's dims don't change
             twiddle($result,$a); # do something from $result to $a for iteration
        }

       The take-home message of this section once more: be aware of the limitation on out-
       put creation when using explicit threading.

       Explicit threading

       Having so far only talked about the first flavour of threading it is now about time
       to introduce the second variant. Instead of shuffling around dimensions all the
       time and relying on the rules of implicit threading to get it all right you some-
       times might want to specify in a more explicit way how to perform the thread loop.
       It is probably not too surprising that this variant of the game is called explicit
       threading.  Now, before we create the wrong impression: it is not either implicit
       or explicit; the two flavours do mix. But more about that later.

       The two most used functions with explicit threading are thread and unthread.  We
       start with an example that illustrates typical usage of the former:

        [ # ** this is the worst possible example to start with ]
        #  but can be used to show that $mat += $line is different from
        #                               $mat->thread(0) += $line
        # explicit threading to add a vector to each column of a matrix
        perldl> $mat  = zeroes(4,3)
        perldl> $line = pdl (3.1416,2,-2)
        perldl> ($tmp = $mat->thread(0)) += $line

       In this example, "$mat->thread(0)" tells PDL that you want the first dimension
       (dimension 0) of this pdl to be threaded over first, leading to a thread loop that
       can be expressed as

        for (j=0; j<3; j++) {
           for (i=0; i<4; i++) {
               mat(i,j) += line(j);
           }
        }

       "thread" takes a list of numbers as arguments which explicitly specify which dimen-
       sions to thread over first. With the introduction of explicit threading the dimen-
       sions of a pdl are conceptually split into three different groups the latter two of
       which we have already encountered: thread dimensions, core dimensions and extra
       dimensions.

       Conceptually, it is best to think of those dimensions of a pdl that have been spec-
       ified in a call to "thread" as being taken away from the set of normal dimensions
       and put on a separate stack. So assuming we have a pdl "a(4,7,2,8)" saying

        $b = $a->thread(2,1)

       creates a new virtual pdl of dimension "b(4,8)" (which we call the remaining dims)
       that also has 2 thread dimensions of size "(2,7)". For the purposes of this docu-
       ment we write that symbolically as "b(4,8){2,7}". An important difference to the
       previous examples where only implicit threading was used is the fact that the core
       dimensions are matched against the remaining dimensions which are not necessarily
       the first dimensions of the pdl. We will now specify how the presence of thread
       dimensions changes the rules R1-R5 for threadloops (which apply to the special case
       where none of the pdl arguments has any thread dimensions).

       T0  Core dimensions are matched against the first n remaining dimensions of the pdl
           argument (note the difference to R0). Any further remaining dimensions are
           extra dimensions and are used to determine the implicit loop dimensions.

       T1a The number of implicit loop dimensions is equal to the maximal number of extra
           dimensions taken over the set of pdl arguments.

       T1b The number of explicit loop dimensions is equal to the maximal number of thread
           dimensions taken over the set of pdl arguments.

       T1c The total number of loop dimensions is equal to the sum of explicit loop dimen-
           sions and implicit loop dimensions. In the thread loop, explicit loop dimen-
           sions are threaded over first followed by implicit loop dimensions.

       T2  The size of each of the loop dimensions is derived from the size of the respec-
           tive dimensions of the pdl arguments. It is given by the maximal size found in
           any pdls having this thread dimension (for explicit loop dimensions) or extra
           dimension (for implicit loop dimensions).

       T3  This rule applies to any explicit loop dimension as well as any implicit loop
           dimension. For all pdls that have a given thread/extra dimension the size must
           be equal to the size of the respective explicit/implicit loop dimension or 1;
           otherwise you raise a runtime exception. If the size of a thread/extra dimen-
           sion of a pdl is one it is implicitly treated as a dummy dimension of size
           equal to the explicit/implicit loop dimension.

       T4  If a pdl doesn't have a thread/extra dimension that corresponds to an
           explicit/implicit loop dimension, in the thread loop this pdl is treated as if
           having a dummy dimension of size equal to the size of that loop dimension.

       T4a All pdls that do have thread dimensions must have the same number of thread
           dimensions.

       T5  Output autocreation cannot be used if any of the pdl arguments has any thread
           dimensions. Otherwise R5 applies.

       The same restrictions apply with regard to implicit dummy dimensions (created by
       application of T4) as already mentioned in the section on implicit threading: if
       any of the output pdls has an (explicit or implicitly created) greater-than-one
       dummy dimension a runtime exception will be raised.

       Let us demonstrate these rules at work in a generic case.  Suppose we have a (here
       unspecified) PP-function with the signature:

        func((m,n),(m),(),[o](m))

       and you call it with 3 pdls "a(5,3,10,11)", "b(3,5,10,1,12)", "c(10)" and an output
       pdl "d(3,11,5,10,12)" (which can here not be automatically created) as

        func($a->thread(1,3),$b->thread(0,3),$c,$d->thread(0,1))

       From the signature of func and the above call the pdls split into the following
       groups of core, extra and thread dimensions (written in the form "pdl(core
       dims){thread dims}[extra dims]"):

        a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]

       With this to help us along (it is in general helpful to write the arguments down
       like this when you start playing with threading and want to keep track of what is
       going on) we further deduce that the number of explicit loop dimensions is 2 (by
       T1b from $a and $b) with sizes "(3,11)" (by T2); 2 implicit loop dimensions (by T1a
       from $b and $d) of size "(10,12)" (by T2) and the elements of $d are computed from the
       input pdls in a way that can be expressed in pdl pseudo-code as

        for (l=0;l<12;l++)
         for (k=0;k<10;k++)
          for (j=0;j<11;j++)         effect of treating it as dummy dim (index j)
           for (i=0;i<3;i++)                         |
              d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))

       Uhhmpf, this example was really not easy in terms of bookkeeping. It serves mostly
       as an example how to figure out what's going on when you encounter a complicated
       looking expression. But now it is really time to show that threading is useful by
       giving some more of our so called "practical" examples.

       [ The following examples will need some additional explanations in the future. For
       the moment please try to live with the comments in the code fragments. ]

       Example 1:

        # inverse of matrix represented by eigvecs and eigvals
        # given a symmetrical matrix M = A^T x diag(lambda_i) x A
        #    =>  inverse M^-1 = A^T x diag(1/lambda_i) x A
        # first $tmp = diag(1/lambda_i)*A
        # then  A^T * $tmp by threaded inner product
        # index handling so that matrices print correctly under pdl
        $inv .= $evecs*0;  # just copy to get appropriately sized output
        $tmp .= $evecs;    # initialise, no backpropagation
        ($tmp2 = $tmp->thread(0)) /= $evals;    #  threaded division
        # and now a matrix multiplication in disguise
        PDL::Primitive::inner($evecs->xchg(0,1)->thread(-1,1),
                              $tmp->thread(0,-1),
                              $inv->thread(0,1));
        # alternative for matrix mult using implicit threading,
        # first xchg only for transpose
        PDL::Primitive::inner($evecs->xchg(0,1)->dummy(1),
                              $tmp->xchg(0,1)->dummy(2),
                              ($inv=null));

       Example 2:

        # outer product by threaded multiplication
        # stress that we need to do it with explicit call to my_biop1
        # when using explicit threading
        $res=zeroes(($a->dims)[0],($b->dims)[0]);
        my_biop1($a->thread(0,-1),$b->thread(-1,0),$res->thread(0,1),"*");
        # similar thing by implicit threading with autocreated pdl
        $res = $a->dummy(1) * $b->dummy(0);

       Example 3:

        # different use of thread and unthread to shuffle a number of
        # dimensions in one go without lots of calls to ->xchg and ->mv

        # use thread/unthread to shuffle dimensions around
        # just try it out and compare the child pdl with its parent
        $trans = $a->thread(4,1,0,3,2)->unthread;

       Example 4:

        # calculate a couple of bounding boxes
        # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax]
        # we use again thread and unthread to shuffle dimensions around
        perldl> $bb = zeroes(double, 2,3 );
        perldl> minimum($vertices->thread(0)->clump->unthread(1),
                        $bb->slice('(0),:'));
        perldl> maximum($vertices->thread(0)->clump->unthread(1),
                        $bb->slice('(1),:'));

       Example 5:

        # calculate a self-ratioed (i.e. self normalized) sequence of images
        # uses explicit threading and an implicitly threaded division
        $stack = read_image_stack();
        # calculate the average (per pixel average) of the first $n+1 images
        $aver = zeroes(($stack->dims)[0,1]);  # make the output pdl
        sumover($stack->slice(":,:,0:$n")->thread(0,1),$aver->thread(0,1));
        $aver /= ($n+1);
        $stack /= $aver;  # normalize the stack by doing a threaded division
        # implicit versus explicit
        # alternatively calculate $aver with implicit threading and autocreation
        sumover($stack->slice(":,:,0:$n")->mv(2,0),($aver=null));
        $aver /= ($n+1);
        #

       Implicit versus explicit threading

       In this paragraph we are going to illustrate when explicit threading is preferable
       over implicit threading and vice versa. But then again, this is probably not the
       best way of putting the case since you already know: the two flavours do mix. So,
       it's more about how to get the best of both worlds and, anyway, in the best of perl
       traditions: TIMTOWTDI !

       [ Sorry, this still has to be filled in in a later release; either refer to above
       examples or choose some new ones ]

       Finally, this may be a good place to justify all the technical detail we have been
       going on about for a couple of pages: why threading ?

       Well, code that uses threading should be (considerably) faster than code that uses
       explicit for-loops (or similar perl constructs) to achieve the same functionality.
       Especially on supercomputers (with vector computing facilities/parallel processing)
       PDL threading will be implemented in a way that takes advantage of the additional
       facilities of these machines. Furthermore, it is a conceptually simple construct
       (though technical details might get involved at times) and can greatly reduce the
       syntactical complexity of PDL code (but keep the admonition for documentation in
       mind). Once you are comfortable with the threading way of thinking (and coding) it
       shouldn't be too difficult to understand code that somebody else has written
       (provided they gave you an idea what the expected input dimensions are, etc.). As a
       general tip to increase the performance of your code: if you have to introduce a loop
       into your code try to reformulate the problem so that you can use threading to per-
       form the loop (as with anything there are exceptions to this rule of thumb; but the
       authors of this document tend to think that these are rare cases ;).
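
       As a minimal sketch of this rule of thumb, the following compares an explicit
       perl loop with the threaded equivalent. The 4x3x5 test stack and the variable
       names are made up for illustration; only sequence, zeroes, slice, mv, sumover
       and all from the standard PDL distribution are used.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical test data: a stack of 5 images, each 4x3 pixels.
my $stack = sequence(4, 3, 5);

# Explicit perl loop: add up the image planes one at a time.
my $sum_loop = zeroes(4, 3);
for my $i (0 .. 4) {
    $sum_loop += $stack->slice(":,:,($i)");
}

# Threaded version: move the plane dimension to the front so that
# sumover (signature (n),[o]()) consumes it and threads over the
# remaining two dimensions.
my $sum_thread = $stack->mv(2, 0)->sumover;

print "loop and threaded results agree\n"
    if all($sum_loop == $sum_thread);
```

       Both versions produce the same 4x3 result; the threaded one leaves the looping
       to compiled code instead of the perl interpreter.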

PDL::PP
       An easy way to define functions that are aware of indexing and threading (and the
       universe and everything)

       PDL::PP is part of the PDL distribution. It is used to generate functions that are
       aware of indexing and threading rules from very concise descriptions. It can be
       useful for you if you want to write your own functions or if you want to interface
       functions from an external library so that they support indexing and threading
       (and maybe dataflow as well, see PDL::Dataflow). For further details check PDL::PP.

Appendix A
       Affine transformations - a special class of simple and powerful transformations

       [ This is also something to be added in future releases. Do we already have the
       general make_affine routine in PDL ? It is possible that we will reference another
       appropriate manpage from here ]

Appendix B
       signatures of standard PDL::PP compiled functions

       A selection of signatures of PDL primitives to show how many dimensions PP compiled
       functions gobble up (and therefore you can figure out what will be threaded over).
       Most of those functions are the basic ones defined in "primitive.pd".

        # functions in primitive.pd
        #
        sumover        ((n),[o]())
        prodover       ((n),[o]())
        axisvalues     ((n))                                   inplace
        inner          ((n),(n),[o]())
        outer          ((n),(m),[o](n,m))
        innerwt        ((n),(n),(n),[o]())
        inner2         ((m),(m,n),(n),[o]())
        inner2t        ((j,n),(n,m),(m,k),[o]())
        index          ((n),(),[o]())
        minimum        ((n),[o]())
        maximum        ((n),[o]())
        wstat          ((n),(n),(),[o]())
        assgn          ((),())

        # basic operations
        binary operations ((),(),[o]())
        unary operations  ((),[o]())
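
       To see how a signature determines what is threaded over, one can simply inspect
       the dimensions of the result. The following is a small sketch with made-up input
       shapes; the variable names are hypothetical and the functions are called as
       exported by a plain "use PDL".

```perl
use strict;
use warnings;
use PDL;

# Hypothetical inputs chosen only to make the shapes easy to follow.
my $x = sequence(3, 4, 2);   # dims (3,4,2)
my $v = sequence(3);         # dims (3)
my $w = sequence(5);         # dims (5)

# sumover ((n),[o]()): dimension 0 is consumed, dims 1 and 2 are threaded over.
my @sum_dims = $x->sumover->dims;
print "sumover: @sum_dims\n";      # sumover: 4 2

# inner ((n),(n),[o]()): $v is matched against dim 0 of $x,
# the remaining dims of $x are threaded over.
my @inner_dims = inner($x, $v)->dims;
print "inner:   @inner_dims\n";    # inner:   4 2

# outer ((n),(m),[o](n,m)): two 1D inputs give one 2D output.
my @outer_dims = outer($v, $w)->dims;
print "outer:   @outer_dims\n";    # outer:   3 5
```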

AUTHOR & COPYRIGHT
       Copyright (C) 1997 Christian Soeller (c.soeller AT auckland.nz) & Tuomas J. Lukka
       (lukka AT fas.edu). All rights reserved. Although destined for release as a
       man page with the standard PDL distribution, it is not public domain. Permission is
       granted to freely distribute verbatim copies of this document provided that no mod-
       ifications outside of formatting be made, and that this notice remain intact.  You
       are permitted and encouraged to use its code and derivatives thereof in your own
       source code for fun or for profit as you see fit.



perl v5.8.8                       2003-05-21                       INDEXING(1)
