Richard Durbin's Thesis - Appendix


A Appendix
    A.1  The statistical test for synapse number correlation with adjacency
    A.2  The sorting algorithm used to order the neural circuitry
    A.3  The method used to determine processing depth
    A.4  The clustering algorithm used to detect bundles

A  Appendix

A.1  The statistical test for synapse number correlation with adjacency

All pairs of neurons A,B in the H series were considered for which there was a synaptic connection both from A to B and from A’ to B’ (A’,B’ are the contralateral homologues of A,B), but where the adjacency between A and B was different from that between A’ and B’. Let $s_1$ be the number of synapses from A to B, $s_2$ be the number from A’ to B’, $a_1$ be the adjacency of A and B, and $a_2$ be the adjacency of A’ and B’. Since each set of four is only counted once, we can assume that $a_1 > a_2$. The $a_i$ are treated as independent variables (i.e. they do not depend on the $s_i$), and the $s_i$ are treated as the outcomes of random variables $S_i$, which are possibly dependent on the $a_i$. There are two hypotheses that will be tested: a proportional relationship between $S_i$ and $a_i$, and independence. More precisely, the proportional model presumes that synapses are made with a certain probability per unit of length of contact. In this case $S_i$ will be Poisson distributed with mean (and variance) proportional to $a_i$. However, the constant of proportionality may differ for different sets of A,B,A’,B’. The independent model proposes that the $S_i$ have mean S, independent of $a_i$, but again possibly different for different sets of neurons.

The test statistic that was used, T, is the sum over all chosen sets of $(a_1 s_2 - a_2 s_1)$.

If $S_i$ is proportional to $a_i$, then T should have mean value zero. Its variance can be estimated as the sum of the variances of the contributing terms, which are $(a_1^2 a_2 m + a_2^2 a_1 m)$, where m is the Poisson rate, best estimated by $(s_1+s_2)/(a_1+a_2)$. This simplifies to the sum over all the sets of $a_1 a_2 (s_1+s_2)$.
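
The simplification can be written out for a single set as follows. This is a sketch of the algebra only, assuming the two synapse counts in a set are independent Poisson variables with means $m a_1$ and $m a_2$; the hat on m (the estimated rate) is notation introduced here.

```latex
\begin{align*}
\operatorname{Var}(a_1 S_2 - a_2 S_1)
  &= a_1^2 \operatorname{Var}(S_2) + a_2^2 \operatorname{Var}(S_1) \\
  &= a_1^2 a_2 m + a_2^2 a_1 m
   = m\, a_1 a_2 (a_1 + a_2), \\
\text{and with } \hat{m} = \frac{s_1 + s_2}{a_1 + a_2}: \quad
\hat{m}\, a_1 a_2 (a_1 + a_2) &= a_1 a_2 (s_1 + s_2).
\end{align*}
```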

If $S_i$ is independent of $a_i$, then T should have mean M, where M is the sum over all the sets of $S(a_1-a_2)$, where S is the mean number of synapses for each set. The best estimator for S is $(s_1+s_2)/2$. In order to estimate the variance of the differences from the mean, (M-T), we must propose a variance for $S_i$. (It cannot be estimated because then we would lose all our degrees of freedom.) It seems reasonable to assume in this case also that the $S_i$ have a Poisson distribution, or in any case that their variance is approximately the same as their mean, S. Then the estimated variance of (M-T) is the sum over all sets of $S(a_1+a_2)^2/2$.
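
One way to arrive at this variance expression, given here as a hedged reconstruction rather than the author's own working, is to substitute the estimator $(s_1+s_2)/2$ for S in the per-set contribution to (M-T) and then use the assumption that each count has variance S. The hat on S is notation introduced here.

```latex
\begin{align*}
\hat{S}(a_1 - a_2) - (a_1 s_2 - a_2 s_1)
  &= \tfrac{1}{2}(s_1 + s_2)(a_1 - a_2) - a_1 s_2 + a_2 s_1 \\
  &= \tfrac{1}{2}(a_1 + a_2)(s_1 - s_2), \\
\operatorname{Var}\!\left[\tfrac{1}{2}(a_1 + a_2)(S_1 - S_2)\right]
  &= \tfrac{1}{4}(a_1 + a_2)^2 \,(S + S)
   = \frac{S\,(a_1 + a_2)^2}{2}.
\end{align*}
```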

To test each hypothesis, the difference between T and its expected value under that hypothesis is divided by the standard error (the square root of the estimated variance) to give a normalised error, U. Since we are adding together hundreds of similar terms, T should be distributed normally, and so theoretically U has a t-distribution, since we have estimated the variance of T. However, because there are hundreds of degrees of freedom (one for each set), U can be tested as if coming from a standard normal distribution.

In total there were 391 sets. The value of T was 7103. If we assume the proportional hypothesis then the standard error is 1324.3 and U is 5.36 which is very significant. We can therefore reject the proportional model. If we assume the independent model then M is 7655 and the standard error is 1338.0 so U is 0.41, which is not significant. So it is quite possible according to this test that the number of synapses formed is independent of adjacency.
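
The calculation described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the program used for the thesis: the function name `adjacency_test`, the tuple layout and the example numbers are all assumptions made here, and the example data are invented.

```python
import math


def adjacency_test(sets):
    """Return T, M and the normalised error U under both hypotheses.

    `sets` holds one (a1, a2, s1, s2) tuple per set of four neurons,
    ordered so that a1 > a2, as in the text.
    """
    sets = list(sets)

    # Test statistic: sum over all sets of (a1*s2 - a2*s1).
    t = sum(a1 * s2 - a2 * s1 for a1, a2, s1, s2 in sets)

    # Proportional model: E[T] = 0, variance estimated by sum of a1*a2*(s1+s2).
    var_prop = sum(a1 * a2 * (s1 + s2) for a1, a2, s1, s2 in sets)
    u_prop = t / math.sqrt(var_prop)

    # Independent model: E[T] = M, with S estimated per set by (s1+s2)/2,
    # and variance of (M - T) estimated by sum of S*(a1+a2)^2/2.
    m = sum((s1 + s2) / 2 * (a1 - a2) for a1, a2, s1, s2 in sets)
    var_ind = sum((s1 + s2) / 2 * (a1 + a2) ** 2 / 2 for a1, a2, s1, s2 in sets)
    u_ind = (m - t) / math.sqrt(var_ind)

    return {"T": t, "M": m, "U_proportional": u_prop, "U_independent": u_ind}


# Invented numbers for illustration only (not data from the thesis).
example = [(4, 2, 3, 3), (6, 3, 2, 4)]
result = adjacency_test(example)
print(result)
```

As a check on the reported figures, 7103 / 1324.3 is about 5.36 and (7655 - 7103) / 1338.0 is about 0.41, matching the values of U quoted above.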

A.2  The sorting algorithm used to order the neural circuitry

The basic method of this algorithm is to start with a randomly ordered list and repeatedly use a simple rearrangement principle to reduce the overall number of upward synapses. The process stops when this number cannot be improved by a rearrangement of the type under consideration. In general this will not give a true optimum order, because the rearrangement principle is not general enough. However, by repeated application of the algorithm to different starting lists one can get an indication of the distribution of final results. If, as in the case under consideration here, the results of these repeated optimisations are very similar, then it is likely that they are near the true minimum. The algorithm was run many times until the lowest value so far had come up repeatedly, at which point it was accepted as the optimum.

The actual rearrangement system chosen in this case is to run through the current list and, for each neuron, determine where in the list it should be placed. If this is different from the current position then it is moved there and the neurons in between are shunted one place back in the list to fill the gap.
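
The procedure can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the original program: it assumes that an "upward" synapse is one whose presynaptic neuron sits later in the list than its postsynaptic partner, that the place a neuron "should" go is the position giving the fewest upward synapses overall, and that a fixed number of restarts stands in for "until the lowest value had come up repeatedly". All function names and the example circuit are hypothetical.

```python
import random


def upward_count(order, synapses):
    """Total synapses whose presynaptic cell is below its target in the list."""
    pos = {n: i for i, n in enumerate(order)}
    return sum(w for (pre, post), w in synapses.items() if pos[pre] > pos[post])


def best_position(order, neuron, synapses):
    """Try every slot for `neuron`; return (index, cost) of the best one."""
    rest = [n for n in order if n != neuron]
    best_i, best_cost = None, None
    for i in range(len(rest) + 1):
        cost = upward_count(rest[:i] + [neuron] + rest[i:], synapses)
        if best_cost is None or cost < best_cost:
            best_i, best_cost = i, cost
    return best_i, best_cost


def improve(start, synapses):
    """Move single neurons until no single move lowers the upward count."""
    order = list(start)
    cost = upward_count(order, synapses)
    improved = True
    while improved:
        improved = False
        for neuron in list(order):
            i, new_cost = best_position(order, neuron, synapses)
            if new_cost < cost:
                order.remove(neuron)
                order.insert(i, neuron)  # neurons in between shift by one place
                cost = new_cost
                improved = True
    return order, cost


def sort_with_restarts(neurons, synapses, restarts=20, seed=0):
    """Repeat from random starting lists and keep the lowest result found."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        start = list(neurons)
        rng.shuffle(start)
        order, cost = improve(start, synapses)
        if best is None or cost < best[1]:
            best = (order, cost)
    return best


# Hypothetical four-neuron chain A -> B -> C -> D, one synapse per connection.
chain = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1}
print(improve(["C", "D", "A", "B"], chain))  # a start that single moves cannot improve
print(sort_with_restarts("ABCD", chain))
```

The first call illustrates why the result need not be a true optimum: from the start C, D, A, B no single-neuron move reduces the count below one, although the order A, B, C, D has no upward synapses at all.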

A.3  The method used to determine processing depth

This method deals with some notional material (sensory influence) which flows down through the network of connections, moving through a synapse at each time step. Each sensory neuron under consideration is given a unit amount of material at time zero. Then at successive time steps the material is redistributed, all the material in each neuron being divided amongst the neurons that it both synapses to and is above in the ordering. The amount that each postsynaptic cell receives is proportional to the number of synapses made. If there are no postsynaptic partners then the material is lost. Clearly material can reunite that has come via different routes but using the same number of synapses from sensory neurons. The requirement that only downward synapses are permitted prevents problems with cycling.

This method makes the assumptions that the influence of a connection is proportional to the number of synapses it contains, and that influence is neither lost nor amplified, merely passing through neurons and being redistributed at each time step. Both these assumptions are neurobiologically unrealistic, but they are probably the best that can be done with the information available. By keeping track of the distribution of material at each time step one can build up a picture of the distribution of time steps required for influence to reach a specific neuron (muscle can be treated as the final postsynaptic neuron), and also of the proportion of influence from the chosen set of sensory neurons that passes through any particular interneuron, or for instance that reaches head muscle as opposed to body muscle.
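
A minimal Python sketch of this redistribution scheme follows. It is illustrative only: the function name `propagate`, the data layout (a dictionary of synapse counts keyed by presynaptic and postsynaptic cell) and the small example circuit are assumptions made here, and "above in the ordering" is taken to mean an earlier position in the list.

```python
from collections import defaultdict


def propagate(order, synapses, sensory):
    """Return a list of {neuron: amount} dictionaries, one per time step."""
    pos = {n: i for i, n in enumerate(order)}

    # Keep only downward synapses (presynaptic cell above the postsynaptic one).
    targets = defaultdict(dict)
    for (pre, post), count in synapses.items():
        if pos[pre] < pos[post]:
            targets[pre][post] = count

    current = {n: 1.0 for n in sensory}  # unit amount at time zero
    history = [dict(current)]
    while current:
        nxt = defaultdict(float)
        for neuron, amount in current.items():
            out = targets.get(neuron)
            if not out:
                continue  # no postsynaptic partners: the material is lost
            total = sum(out.values())
            for post, count in out.items():
                nxt[post] += amount * count / total  # share by synapse number
        current = dict(nxt)
        if current:
            history.append(current)
    return history


# Hypothetical circuit: sensory cell S, interneurons I1 and I2, muscle M.
order = ["S", "I1", "I2", "M"]
synapses = {
    ("S", "I1"): 3, ("S", "I2"): 1,
    ("I1", "I2"): 2, ("I1", "M"): 2,
    ("I2", "M"): 1,
    ("I2", "S"): 5,  # upward synapse, ignored by the method
}
history = propagate(order, synapses, ["S"])
for step, amounts in enumerate(history):
    print(step, amounts)
```

Each entry of `history` gives the distribution of material after that many synapses, so the amounts recorded for a given neuron across entries form its distribution of processing depths. Because only downward synapses are followed, the loop ends after at most as many steps as there are neurons.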

A.4  The clustering algorithm used to detect bundles

This is a hierarchical clustering algorithm (see e.g. Seber, 1984). The principle is to identify the two items that are most likely to belong to the same group and to link them together. Then a new distance, or, in our case, adjacency, is defined between this pair and each of the remaining items. One then returns to the first step and looks for the most adjacent pair in the reduced set of items, which will include a combined pseudo-item. This process of joining the two closest items continues recursively until only one item is left. At each stage a measure of the association of the two items joined together is given by their adjacency, which in general is a combined adjacency.

Different versions of this process vary in the way that the combined adjacency of the merged item to the remaining items is determined. I used a variant of the group average method (Seber, 1984) that was tailored to this particular problem. I kept data on the circumferential zones of the nerve ring in which each process ran (e.g. lower left). This was necessary because it is only possible for two processes to be adjacent to the extent that they run in the same zone. The adjacency between two groups is then defined as the ratio of the total adjacency between their constituent processes to the summed circumferential zone length that they have in common. By keeping the total constituent adjacencies and the summed zonal lengths at each stage these "zonal ratios" can be easily combined when two items are merged. I also prevented the fusion of groups with comparatively small overlaps, because the data for such cases would be correspondingly noisy and if they were to belong to a genuine bundle there would have to be an overlapping intermediate fibre in any case. This zonal ratio system does, however, permit bundles that are longer than some, or even all, of the constituent processes, and this is an important feature of it.
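
The bookkeeping for the zonal ratios can be sketched in Python as follows. This is a sketch under assumptions, not the original implementation: it takes pairwise adjacencies and pairwise shared zone lengths as given inputs, sums both over constituent pairs when groups merge, and uses a simple absolute threshold `min_overlap` to stand in for the rule against fusing groups with comparatively small overlaps. The function name, the inputs and the example values are all hypothetical.

```python
from itertools import combinations


def cluster_bundles(processes, adjacency, overlap, min_overlap=0.0):
    """Agglomerate processes by zonal ratio; return [(members, ratio), ...]."""
    ids = {p: i for i, p in enumerate(processes)}
    groups = {i: (p,) for p, i in ids.items()}

    # num: total adjacency between two groups; den: summed shared zone length.
    num, den = {}, {}
    for table, source in ((num, adjacency), (den, overlap)):
        for (p, q), value in source.items():
            key = frozenset((ids[p], ids[q]))
            table[key] = table.get(key, 0.0) + value

    merges = []
    next_id = len(processes)
    while len(groups) > 1:
        best = None
        for g, h in combinations(groups, 2):
            key = frozenset((g, h))
            shared = den.get(key, 0.0)
            if shared <= min_overlap:
                continue  # overlap too small to trust the ratio
            ratio = num.get(key, 0.0) / shared
            if best is None or ratio > best[0]:
                best = (ratio, g, h)
        if best is None:
            break  # no remaining pair has enough overlap to be fused
        ratio, g, h = best
        merged = groups.pop(g) + groups.pop(h)
        # Combine the stored totals so ratios to the merged item stay exact.
        for k in groups:
            for table in (num, den):
                table[frozenset((next_id, k))] = (
                    table.get(frozenset((g, k)), 0.0)
                    + table.get(frozenset((h, k)), 0.0)
                )
        groups[next_id] = merged
        merges.append((merged, ratio))
        next_id += 1
    return merges


# Invented example: four processes with pairwise adjacency and shared length.
adjacency = {("a", "b"): 8, ("a", "c"): 2, ("b", "c"): 2, ("c", "d"): 6}
overlap = {("a", "b"): 4, ("a", "c"): 4, ("b", "c"): 4,
           ("c", "d"): 4, ("a", "d"): 1, ("b", "d"): 1}
merges = cluster_bundles(["a", "b", "c", "d"], adjacency, overlap)
for members, ratio in merges:
    print(members, ratio)
```

Storing the numerator and denominator separately, rather than the ratio itself, is what allows the combined adjacency of a merged item to be obtained by simple addition.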