# A TDC based on Carry-in Lines of the FPGA

Wei Wang, Hao Zhou, Pinbo Xiong

Chongqing University of Posts and Telecommunications, Chongqing 400065 China

## Abstract

The design and implementation of a high-resolution Time-to-Digital Converter (TDC) in a Field-Programmable Gate Array (FPGA) are proposed. Dedicated carry-in lines in the CARRY4 block of the Virtex-5 FPGA are used for time interpolation, which provides fine time measurement within a system clock period. In addition, Place and Route (PAR) constraints are applied to eliminate the asymmetry of the delay chain, which results in very small integral nonlinearity (INL) and differential nonlinearity (DNL). The simulation results show that the time resolution of the TDC is about 15 ps RMS, or 30 ps per LSB, with -1 LSB < INL < 6.5 LSB and -1 LSB < DNL < 2.5 LSB.

## Keywords

Time-to-Digital Converter (TDC); CARRY4; Place and Route (PAR).

## 1. Introduction

A time-to-digital converter (TDC) is a circuit used to measure time: it converts an analog signal carrying time information into a digital signal, so that the time measurement is realized digitally [1]. Recent application areas of TDCs include time-of-flight (TOF) [2], time-over-threshold (TOT) [2], positron emission tomography (PET) imaging [3], laser radar devices [3], and high-resolution digital oscilloscopes.

At present, there are mainly two kinds of TDC implementation techniques: ASIC-TDCs and FPGA-TDCs. Compared with ASIC-TDCs, FPGA-TDCs have the advantages of reduced cost, shorter development time, and good flexibility [3].

The most common way to design an FPGA-TDC is time interpolation, which includes the tapped delay line TDC, the phase-shift TDC, the Vernier TDC, and so on.

In 2005, J. Wu and Z. Shi et al. [4] used cascade chains to implement a TDC in an Altera Cyclone II FPGA and presented a wave union method to further improve the resolution. In 2008, a TDC with 40 ps resolution, based on two slightly different controllable ring oscillators, was implemented by Junnarkar et al. [5]. In 2010, R. Szplet and K. Klepacki [6] provided a new pulse-shrinking approach to implement a TDC with 42 ps resolution. In 2010, M. Büchele, H. Fischer et al. [7] achieved 160 ps resolution in a Xilinx Virtex-5 device using a phase-shifted clock method.

This paper describes a tapped delay line TDC implemented on a Xilinx Virtex-5 ML507 device. Section 2 presents the architecture of the TDC and the optimized structure of the tapped delay line. Section 3 discusses the experimental results. Finally, Section 4 concludes the paper and summarizes what has been achieved.

## 2. TDC Architecture

The architecture of the proposed TDC circuit is shown in Fig. 1. There are two fine measurement channels and a coarse counter to measure the time interval between the start signal and the stop signal. Each channel mainly consists of a tapped delay line, DFF arrays, an encoder circuit, and a calibration circuit.

#### 2.1 Coarse measurement

The TDC circuit mainly consists of a fine measurement and a coarse measurement. In the coarse measurement, metastability easily occurs because the HIT signal is asynchronous to the system clock. This paper therefore uses two counters (shown in Fig. 2): one counts the rising edges of the system clock, and the other counts the falling edges. Using the result of the fine measurement, a stable coarse time count value can always be obtained [8].
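The selection between the two counters can be illustrated with a minimal sketch. The source does not give the exact selection rule, so everything here is an assumption for illustration: the function name `select_coarse_count`, the quarter-period thresholds, and the model in which `cnt1` increments on rising edges, `cnt2` increments on falling edges half a period later, and `phase` is the fine-measured time from the last rising edge to the hit.

```python
def select_coarse_count(cnt1: int, cnt2: int, phase: float, period: float) -> int:
    """Pick a stable coarse count for a hit with the given clock phase.

    Assumed model (not taken from the paper): cnt1 increments on rising
    edges, cnt2 on falling edges half a period later, and phase is the
    time from the last rising edge to the hit, 0 <= phase < period.
    """
    quarter = period / 4.0
    if phase < quarter:
        # Just after a rising edge: cnt1 may still be in transition, while
        # cnt2 last changed about half a period ago and lags cnt1 by one.
        return cnt2 + 1
    if phase >= 3.0 * quarter:
        # Just before a rising edge: cnt2 changed at least a quarter period
        # ago and already equals the settled value of cnt1.
        return cnt2
    # Middle of the period: cnt1 is far from its own transition.
    return cnt1
```

Under this model the counter that is sampled close to its own clock edge is never used, which is the reason for keeping two counters on opposite edges.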



Fig. 2 Coarse time counter; (b) timing of the coarse time counter (Cnt1 and Cnt2)

#### 2.2 Fine measurement

For fine measurement, J. Wu [4] uses a tapped delay line with wave union, and Junnarkar [5] introduces a Vernier method based on two slightly different controllable ring oscillators. In our design, we focus on the "pure" tapped delay line (TDL) method after making a careful trade-off among resolution, difficulty, and nonlinearity. Dedicated carry-in lines were used for time interpolation to cascade a delay line by J. Wu [4]. This paper uses the dedicated carry-in lines in the CARRY4 block of the Virtex-5 FPGA to cascade a tapped delay line; the architecture of CARRY4 is shown in Fig. 3. Each CARRY4 has ten independent inputs and eight independent outputs [9]. We use the input CYINIT to initiate the tapped delay line and CIN to cascade slices upward to form a longer carry chain.



Fig.3 Block diagram of CARRY4 block

Each CARRY4 mainly consists of four MUXCYs and one extra MUX. Simulation results show that the delay from CIN to COUT in a CARRY4 block is as large as 104 ps. Therefore, mapping the whole CARRY4 block as a basic delay cell is of no use if we want to obtain a high resolution, so we should properly divide the CARRY4 and take finer tap points within a carry chain. In order to shorten the propagation delays and keep them under control, we apply Place and Route (PAR) constraints to all Configurable Logic Blocks (CLBs) of the FPGA. No manual routing was done.

As mentioned above, we need to subdivide the CARRY4 to achieve a finer measurement scale. We take four tapped outputs (C0, C1, C2, C3): the MUXCY is used as the basic delay element, and the signals from the output port CO(3:0) are latched by DFF arrays. In this case, we can calculate the real bin widths by a code density test. Fig. 4 shows the statistical analysis of the TDC bins in this case; the average bin size is around 30 ps. Large variations are present in the bin widths, with a relatively regular pattern of one larger bin. It is important to highlight that almost half of the taps do not present any delay at all. This might occur because of the intrinsic structure of the CARRY4.

To flatten this inhomogeneous delay, two methods are possible. The first is software calibration: we can obtain the asymmetric distribution of the bin widths using MATLAB, analyze its influence on the linearity of the TDC, and then compensate for this delay variation during offline data processing. The second, which is the focus of this design, is hardware compensation, illustrated by Configuration 1 in Fig. 3. We obtain the asymmetric delay distribution inside one CARRY4 and reorganize the delay taps within it: the MUX and one MUXCY form the first group and the other three MUXCYs form the second group, so as to compensate for the asymmetric delay between the two cells. The taps are now taken from the ports 'AQ' and 'DQ'. The statistical analysis of the TDC bins in Configuration 1 is shown in Fig. 5. This compensation requires no extra resources and is thus more attractive.



#### 2.3 Encoder and Calibration

The encoder circuit transforms the thermometer code into a binary code, which can be conveniently stored and processed. A dichotomous search algorithm is usually used for this transformation. Influenced by PVT (process, voltage, temperature) variations, the delay time of each delay element will be different. In order to get an accurate measurement result, the delay line needs to be calibrated before measurement. The code density test, a bin-by-bin calibration method, is used to calibrate the tapped delay line. Its principle is shown in Fig. 6. There is no correlation between the HIT signal and the CLK signal, which means that the HIT signal is equally likely to arrive at any point within a clock cycle. Assume that, in N tests, the code n appears K times in the encoder results. Then the delay time of Bin<sub>n</sub> can be written as

$$T_n = \frac{K}{N}T\tag{1}$$

where T is the cycle time of CLK and n is the encoder result. By using this method, we can calculate the delay time of each delay element.
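As an illustration of Eq. (1), the following minimal sketch estimates bin widths from a list of encoder results. The function names, the simulated hit generator, and the example widths are hypothetical and are not measurements from this design; the simulation only assumes that hits are uniformly distributed over one clock period.

```python
from collections import Counter
import random


def bin_widths_from_codes(codes, num_bins, clock_period_ps):
    """Estimate each bin's delay as (hits in bin / total hits) * T, as in Eq. (1)."""
    total = len(codes)
    counts = Counter(codes)
    return [counts.get(n, 0) / total * clock_period_ps for n in range(num_bins)]


def simulate_codes(true_widths_ps, num_hits, seed=0):
    """Draw hit times uniformly over one period and return the bin each falls in."""
    rng = random.Random(seed)
    period = sum(true_widths_ps)
    codes = []
    for _ in range(num_hits):
        t = rng.uniform(0.0, period)
        edge = 0.0
        for n, width in enumerate(true_widths_ps):
            edge += width
            if t < edge:
                codes.append(n)
                break
        else:
            # Guard against rounding exactly at the upper edge of the period.
            codes.append(len(true_widths_ps) - 1)
    return codes


# Hypothetical bin widths in ps, chosen only to exercise the estimator.
true_widths = [20.0, 45.0, 10.0, 25.0]
codes = simulate_codes(true_widths, num_hits=20000)
estimated = bin_widths_from_codes(codes, len(true_widths), sum(true_widths))
```

With enough uncorrelated hits, the estimated widths converge to the true ones, and they always sum to the clock period because every hit lands in exactly one bin.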

Then the fine measurement result t<sub>n</sub>, which is the center of bin n, can be written as

$$t_n = \frac{T_n}{2} + \sum_{k=0}^{n-1} T_k \tag{2}$$

where n is the encoder result.
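Eq. (2) amounts to building a lookup table of bin centers from the calibrated bin widths. A minimal sketch follows; the function name `bin_center_times` and the example widths are illustrative assumptions.

```python
from itertools import accumulate


def bin_center_times(bin_widths_ps):
    """Return t_n = T_n / 2 + sum of T_k for k < n, for every code n (Eq. (2))."""
    # starts[n] is the accumulated delay of all bins before bin n.
    starts = [0.0] + list(accumulate(bin_widths_ps))[:-1]
    return [start + width / 2.0 for start, width in zip(starts, bin_widths_ps)]


# Hypothetical widths in ps: the bin centers are 10, 40 and 65 ps.
centers = bin_center_times([20.0, 40.0, 10.0])
```

Once the table is built, converting an encoder result to a fine time is a single indexed read, which is why this step is usually done once after calibration rather than per hit.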



Fig.6 The code density test principle of HIT signal
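The dichotomous (binary) search used by the encoder in this section can be sketched in software as follows. This is an illustration only: the function name is hypothetical, and the sketch assumes an ideal thermometer code with no bubbles (isolated 0s or 1s), which a real delay line may produce.

```python
def thermometer_to_binary(bits):
    """Return the number of leading 1s in an ideal thermometer code.

    Assumes bits[i] is 1 for every tap the hit has passed and 0 afterwards.
    """
    lo, hi = 0, len(bits)  # the 1-to-0 boundary lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if bits[mid]:
            lo = mid + 1  # boundary is to the right of mid
        else:
            hi = mid  # boundary is at mid or to its left
    return lo


code = thermometer_to_binary([1, 1, 1, 0, 0, 0, 0, 0])  # three taps passed
```

The search inspects about log2 of the number of taps, which in hardware corresponds to a shallow tree of multiplexers instead of a full population count.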

## 3. Results and Discussion

#### 3.1 Differential Nonlinearity and Integral Nonlinearity

The differential and integral nonlinearities were characterized using the statistical code density test [10]. Typical results, computed according to formulas (3) and (4), are shown in Fig. 7 and Fig. 8, respectively. The differential nonlinearity (DNL) was about (-1, 2.5) LSB, while the integral nonlinearity (INL) was about (-1, 6.5) LSB.

$$DNL_{i} = \frac{d_{i+1} - d_{i} - LSB}{LSB}, \quad i = 0, \ldots, N - 1 \tag{3}$$

$$INL_{i} = \frac{d_{i} - i \cdot LSB}{LSB}, \quad i = 0, \ldots, N - 1 \tag{4}$$

where d<sub>i</sub> is the delay time from the zeroth tap to the ith tap, and LSB is the average delay time of each tap.
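Formulas (3) and (4) can be evaluated directly from the calibrated bin widths. The sketch below is illustrative: the function name `dnl_inl` and the example widths are assumptions, not data from this design.

```python
def dnl_inl(bin_widths_ps):
    """Compute LSB, DNL and INL (both in LSB units) following Eqs. (3) and (4)."""
    n = len(bin_widths_ps)
    lsb = sum(bin_widths_ps) / n  # average delay per tap
    # d[i] is the accumulated delay from tap 0 to tap i, with d[0] = 0.
    d = [0.0]
    for width in bin_widths_ps:
        d.append(d[-1] + width)
    dnl = [(d[i + 1] - d[i] - lsb) / lsb for i in range(n)]
    inl = [(d[i] - i * lsb) / lsb for i in range(n)]
    return lsb, dnl, inl


# Hypothetical widths in ps, including one tap with no delay at all.
lsb, dnl, inl = dnl_inl([30.0, 60.0, 0.0, 30.0])
```

In this example the zero-width tap gives a DNL of exactly -1 LSB, which is the lowest value the DNL can take and matches the lower bound reported above.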



Fig. 7 The DNL of tapped delay line



Fig. 8 The INL of tapped delay line

#### 3.2 The Precision of Time Measurement

The precision of a TDC is the most important parameter of a time measurement system. We evaluate the overall precision by measuring the delay introduced by a pulse generator with a fixed time interval. The single-shot precision is obtained by dividing the precision of the TDC by $\sqrt{2}$ [10]. The typical result, computed according to formulas (5) to (7), is shown in Fig. 9. The precision of the measurement system is 21.6 ps and the single-shot precision is 15 ps. All the results are summarized in Table 1.

$$U_n = \frac{\sum_{i=1}^{n} q_i}{n} \tag{5}$$

$$S_n = \frac{\sum_{i=1}^{n} (q_i - U_n)^2}{n - 1} \tag{6}$$

$$RMS = \sigma = \sqrt{S_n} \tag{7}$$

where $q_i$ is the value of the ith measurement, $U_n$ is the average value of all n measurements, $S_n$ is the sample variance, and RMS is the precision of the TDC.
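The statistics in formulas (5) to (7) and the division by $\sqrt{2}$ can be sketched as follows. The function name and the sample values are hypothetical. The only number taken from the text is the 21.6 ps system precision, which gives about 15.3 ps per channel and is reported as 15 ps.

```python
import math
import statistics


def precision_stats(measurements_ps):
    """Return mean, sample variance, RMS and single-shot precision in ps."""
    mean = statistics.fmean(measurements_ps)  # Eq. (5)
    variance = statistics.variance(measurements_ps)  # Eq. (6), n - 1 denominator
    rms = math.sqrt(variance)  # Eq. (7)
    # An interval uses two channels; dividing by sqrt(2) assumes their
    # errors are independent and of equal size.
    single_shot = rms / math.sqrt(2.0)
    return mean, variance, rms, single_shot


# Hypothetical repeated measurements of one fixed interval, in ps.
mean, variance, rms, single_shot = precision_stats(
    [1000.0, 1010.0, 990.0, 1020.0, 980.0]
)

# Consistency check on the reported figures: 21.6 ps / sqrt(2).
reported_single_shot = 21.6 / math.sqrt(2.0)
```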



Fig. 9 The RMS of Time Measurement

Table 1. TDC Summary

|          | Reference [6] | Reference [8] | This article |
|----------|---------------|---------------|--------------|
| FPGA     | Spartan-3     | Virtex-4      | Virtex-5     |
| LSB (ps) | 42            | 50            | 30           |
| RMS (ps) | 56            | 25            | 15           |
| DNL (ps) | (-42, 42)     | (-50, 100)    | (-30, 75)    |
| INL (ps) | (-210, 168)   | (-75, 150)    | (-30, 195)   |


## 4. Conclusion

A high-resolution TDC circuit with a tapped delay line is implemented on a Xilinx Virtex-5 ML507 development board. The tapped delay line is built from chained CARRY4 blocks. A new strategy is applied to calibrate the non-uniformity of the delay cells: the non-uniformity coming from the asymmetry is compensated directly in the FPGA design. Test results indicate that, for each TDC channel, the LSB is about 30 ps, the resolution is 15 ps, the DNL is about (-1, 2.5) LSB, and the INL is about (-1, 6.5) LSB.

# References

- [1] W. Pan, G. Gong, and J. Li, A 20-ps Time-to-Digital Converter (TDC) implemented in Field-Programmable Gate Array (FPGA) with Automatic Temperature Correction [J]. IEEE Trans. nuclear science, 2014, 61(3):1468-1473.
- [2] M. Rahim, R. Antoine, L. Arnaud, et al. Position sensitive detection coupled to high-resolution time-of-flight mass spectrometry: Imaging for molecular beam deflection experiments[J]. Review of Scientific Instruments, 2009, 75(12):5221-5227.
- [3] L. Zhao, and X. Hu, The Design of a 16-Channel 15 ps TDC Implemented in a 65 nm FPGA[J]. IEEE Trans. nuclear science, 2013, 60(5):3532-3536.
- [4] J. Wu, and Z. Shi, The 10-ps wave union TDC: Improving FPGA TDC resolution beyond its cell delay[C]. IEEE Nucl. Sci. Symp. Conf. Rec, 2008, 3440-3446.
- [5] S. Junnarkar, P. O'Connor, and R. Fontaine, FPGA based selfcalibrating 40 picosecond resolution, wide range time to digital converter[C]. IEEE Nucl. Sci. Symp. Conf. Rec, 2008, 3434-3439.
- [6] R. Szplet, and K. Klepacki, An FPGA-Integrated Time-to-Digital Converter Based on Two-Stage Pulse Shrinking[J]. IEEE Trans. instrumentation and measurement, 2010, 59(6): 1663 -1670.
- [7] M. Büchele, H. Fischer, M. Gorzellik, et al. A 128-channel Time-to-Digital Converter (TDC) inside a Virtex-5 FPGA on the GANDALF module[J]. J. Instrumentation, 2012, 37(3): 1827 -1834.
- [8] J. Wang, and S. Liu, A Fully Fledged TDC Implemented in Field-Programmable Gate Arrays[J]. IEEE Trans. nuclear science, 2010, 57(2):446-450.
- [9] Xilinx. Virtex-5 Libraries Guide for HDL Designs.
- [10] J. Wu, Several Key Issues on Implementing Delay Line Based TDCs Using FPGAs[J]. IEEE Trans. nuclear science, 2010, 57(3):1543-154.