Development of an Autonomous Redundant Attitude Determination System for Cubesats

This paper describes fundamental aspects of the development of an attitude determination system for CubeSat nanosatellites with triple redundancy and fault tolerance tools. The system, named SDATF (from the Portuguese Sistema de Determinação de Atitude com Tolerância a Falhas), calculates the attitude parameters that describe the spatial orientation of the vehicle. The algorithm adopted for attitude determination is the QUEST method, which uses quaternions as parameters; it was tested and compared with other methods before being chosen (Duarte et al. 2009). An in-flight test of the SDATF is planned on the NanosatC-BR2 satellite mission (Fig. 1), part of a program of the Brazilian National Institute for Space Research (INPE, Instituto Nacional de Pesquisas Espaciais). It will fly as an experiment payload, with no responsibility for the attitude determination and control strategy of this CubeSat (Garcia et al. 2018; Schuch et al. 2014). https://doi.org/10.5028/jatm.v12.1166


INTRODUCTION
The primary objective of this system is to provide the satellite attitude with maximum precision using non-space-qualified electronic components, i.e., commercial off-the-shelf (COTS) components. A secondary goal, no less important than the first, is to provide the attitude with maximum reliability and availability during the mission, protecting the system from the harmful effects of space radiation. The main COTS devices in the system are three STM32F303CC ARM Cortex-M4F microcontrollers and one XEN 1210 three-axis magnetometer. Other components complete the system electronics. Figure 2 shows an image of the SDATF board design with the arrangement of its components. When exposed to space radiation, the memory cells of integrated circuits are subject to errors due to the impact of energetic particles, such as heavy ions, neutrons and protons (Mukherjee 2008), leading to temporary errors and unwanted behavior. Such errors are known as single-event effects (SEE). They can be potentially destructive, as in single-event latchups (SEL), or nondestructive, as in single-event transients (SET) and single-event upsets (SEU). This paper focuses on SEU events (Wang and Agrawal 2008).
Excessive radiation loads neutralize the logical state of the volatile memory cells of COTS devices. This phenomenon is logically perceived as the inversion of a memory bit, which may affect the application, causing a system lock or a harmful error in the calculated attitude values and compromising system reliability and availability (Wang and Agrawal 2008). These SEU-type errors are temporary failures that mainly affect the volatile memory of integrated circuits. A fault may go unnoticed by the application if the algorithm does not use the affected memory location at that time, or if it writes other information there, in which case the error caused by the fault disappears. Otherwise, the error affects the calculation results, compromises attitude determination and can cause problems in the satellite control.
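As a minimal illustration of why a single inverted bit can be either almost harmless or severe, the Python sketch below flips one bit in the IEEE 754 single-precision encoding of a value such as a quaternion component. The function name `flip_bit_float32` and the use of 32-bit floats are assumptions made for illustration only; the source does not state how the SDATF stores its attitude values.

```python
import struct

def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` after inverting one bit (0 = least significant)
    of its IEEE 754 single-precision encoding."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the selected bit and reinterpret the result as a float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

q4 = 1.0                             # hypothetical quaternion scalar component
low = flip_bit_float32(q4, 0)        # mantissa LSB: tiny numerical change
half = flip_bit_float32(q4, 23)      # exponent LSB: value becomes 0.5
top = flip_bit_float32(q4, 30)       # exponent MSB: value becomes +infinity
```

The same upset therefore ranges from a negligible perturbation to a value that invalidates the whole attitude solution, depending only on which bit is hit.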
Some studies report the occurrence of disturbances related to this type of event in space vehicles and stratospheric balloon missions (Harboe-Sorensen et al. 1990; Hubert et al. 2014). For instance, during 600 days of the ROSAT satellite operation, 40 SEUs were detected (Sims et al. 1993). Other works address the development of instrumentation to simulate the incidence of the phenomenon on integrated circuits in the laboratory (Velazco et al. 2000; Faure et al. 2005), as well as proposals of engineering solutions to overcome the problem (Duarte et al. 2011; Kastensmidt et al. 2006). However, a methodology is not systematically used to prove that the proposed solution is effective.
This paper seeks to fill a gap in studies involving methodologies to evaluate the effectiveness of solutions that avoid errors associated with SEU events. The main contribution is the detailed description of the choices made in the engineering process of the SDATF autonomous system. Preliminary fault tolerance tests using bit flip injection in the microcontrollers attest to the adequacy of the proposed strategy.

THE SDATF MODULES
The SDATF embedded software is responsible for calculating the spacecraft attitude. This requires several auxiliary code modules for computing attitude parameters. The main module receives the position of the Sun from the on-board computer (OBC). It also receives the modified Julian date (MJD), the time of day in seconds, and two-line element (TLE) orbit data, together with the geomagnetic field vector read by the magnetometer installed on the SDATF board itself.
One module is responsible for providing orbit propagation, i.e. it obtains the position and velocity of the satellite at the present time, as this information is required for calculations of the geomagnetic field and the position of the Sun. The inputs are the last TLE and the time elapsed since the TLE definition.
Another module is responsible for calculating the geomagnetic field vector using a truncated version of the IGRF12 model (Thébault et al. 2015). This vector is used together with the vector measured by the magnetometer in the attitude parameter calculation. The inputs to this module are the time reference of the model (year), the distance from the satellite to the center of the Earth, the colatitude and the east longitude. Another module calculates the direction vector of the Sun in the geocentric inertial reference frame from a mathematical model.
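The geometric inputs of the geomagnetic module (radial distance, colatitude and east longitude) can be obtained from a satellite position vector, as in the sketch below. It assumes the position is already expressed in an Earth-fixed geocentric frame; the rotation from the inertial frame of the orbit propagator is not shown, and the function name `geocentric_spherical` is hypothetical.

```python
import math

def geocentric_spherical(x_km: float, y_km: float, z_km: float):
    """Convert an Earth-fixed position vector into geocentric spherical
    coordinates: (radius in km, colatitude in rad, east longitude in rad)."""
    r = math.sqrt(x_km * x_km + y_km * y_km + z_km * z_km)
    if r == 0.0:
        raise ValueError("position vector must be non-zero")
    # Colatitude is measured from the north pole (+z axis), in [0, pi].
    colatitude = math.acos(max(-1.0, min(1.0, z_km / r)))
    # East longitude is measured from the +x axis, wrapped to [0, 2*pi).
    east_longitude = math.atan2(y_km, x_km) % (2.0 * math.pi)
    return r, colatitude, east_longitude

# Example: a point on the equator at 90 degrees west (270 degrees east).
r, colat, lon = geocentric_spherical(0.0, -7000.0, 0.0)
```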
Auxiliary computations, such as coordinate changes, multiplications, and other mathematical operations involving vectors and matrices, are performed in modules specially programmed for these functions. Finally, the attitude determination module itself calculates the quaternions, which are the attitude parameters, using the available information through the algorithm based on the QUEST method.
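For two vector observations (here, the Sun direction and the geomagnetic field), the QUEST computation can be sketched as follows. This is an illustrative pure-Python version written under stated assumptions, not the SDATF firmware: it uses the closed-form maximum eigenvalue for the two-observation case, equal weights by default, unit input vectors, and a scalar-last quaternion convention. The function and variable names are hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def quest_two_vectors(b1, b2, r1, r2, a1=0.5, a2=0.5):
    """Estimate the quaternion [q1, q2, q3, q4] (scalar last) such that
    b_i = A(q) r_i, from two unit vectors measured in the body frame (b1, b2)
    and the same two unit vectors modelled in the reference frame (r1, r2)."""
    # Attitude profile matrix B and the quantities derived from it.
    B = [[a1 * b1[i] * r1[j] + a2 * b2[i] * r2[j] for j in range(3)]
         for i in range(3)]
    sigma = B[0][0] + B[1][1] + B[2][2]
    S = [[B[i][j] + B[j][i] for j in range(3)] for i in range(3)]
    Z = [B[1][2] - B[2][1], B[2][0] - B[0][2], B[0][1] - B[1][0]]
    # Closed-form maximum eigenvalue, valid for exactly two observations.
    c = dot(b1, b2) * dot(r1, r2) + math.sqrt(
        dot(cross(b1, b2), cross(b1, b2)) * dot(cross(r1, r2), cross(r1, r2)))
    lam = math.sqrt(a1 * a1 + a2 * a2 + 2.0 * a1 * a2 * c)
    # Solve [(lam + sigma) I - S] p = Z for the Gibbs vector p (Cramer's rule).
    M = [[(lam + sigma) * (i == j) - S[i][j] for j in range(3)]
         for i in range(3)]
    d = det3(M)
    if abs(d) < 1e-12:
        raise ValueError("near-singular case (rotation close to 180 degrees)")
    p = [det3([[Z[i] if j == k else M[i][j] for j in range(3)]
               for i in range(3)]) / d for k in range(3)]
    s = 1.0 / math.sqrt(1.0 + dot(p, p))
    return [p[0] * s, p[1] * s, p[2] * s, s]

# Example: the body frame is rotated by 90 degrees about the z axis.
q = quest_two_vectors([0.0, -1.0, 0.0], [1.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

The sketch raises an error near 180-degree rotations, where the Gibbs vector is singular; a flight implementation would need to handle that case, for example by a sequential rotation of the reference frame.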
More details about the SDATF software, as well as tests, results, and analysis regarding the performance and accuracy of calculations from the adopted mathematical models can be found in Garcia et al. (2018).

METHODOLOGY
This section describes the sequence of methodological procedures for building the proposed solution. The level of detail aims to allow and stimulate its evaluation and to present a reference for future developments of similar systems.
The first step in designing a solution was to search for information on the integrated circuit technologies used to fabricate potential COTS candidates for the SDATF board. Next, it was necessary to obtain from the manufacturer the soft error rate, measured in failures in time (FIT) per Mbit, of the COTS components that might be selected for the system design; one FIT corresponds to one failure per billion device-hours. Soft errors, i.e. errors that change the content of a memory cell, occur increasingly on high-density memory cell devices. Integrated circuits of technologies still in wide use, such as the 180 and 40 nm technologies, have soft error rates ranging from 100 FIT/Mbit (180 nm) to 5000 FIT/Mbit (40 nm). The SEU-related soft errors affect SRAM cells (registers, buffers, cache memory, etc.) and the internal logic cells of microcontrollers. As this project considers a microcontroller manufactured with 90 nm technology, the corresponding FIT/Mbit figure is adopted for the next step. Modern microcontrollers, even when equipped with redundancy features to avoid the effects of soft errors, still require additional application-level redundancy techniques to overcome these nondestructive effects.
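The FIT definition translates directly into an expected error count, as the short sketch below shows. The memory size (0.4 Mbit) and the one-year interval are hypothetical values chosen only to exercise the formula; manufacturer FIT figures refer to the manufacturer's reference conditions, so the result is not a prediction for the orbital environment, which is why the survey of the next step is needed.

```python
DEVICE_HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours

def expected_soft_errors(fit_per_mbit: float, memory_mbit: float,
                         hours: float) -> float:
    """Expected number of soft errors for a memory of `memory_mbit` megabits
    operated for `hours`, given a soft error rate in FIT/Mbit."""
    return fit_per_mbit * memory_mbit * hours / DEVICE_HOURS_PER_FIT

HOURS_PER_YEAR = 8760.0
MEMORY_MBIT = 0.4  # hypothetical volatile memory size, for illustration only

# Bounds quoted in the text: 100 FIT/Mbit (180 nm) and 5000 FIT/Mbit (40 nm).
low_rate = expected_soft_errors(100.0, MEMORY_MBIT, HOURS_PER_YEAR)
high_rate = expected_soft_errors(5000.0, MEMORY_MBIT, HOURS_PER_YEAR)
```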
The second step was a bibliographic survey of studies reporting the statistical frequency distribution of SEUs, with the respective measurement parameters, in integrated circuits manufactured with technology similar to that of the candidate COTS components and operating in environments and conditions similar to those the SDATF will face during its operation.
The objective of this step was to obtain failure rates similar to those that the main COTS components of the system will face, in order to emulate this situation in the laboratory and validate the proposed fault-tolerant solution. The adopted reference study was presented by Faure et al. (2005).
The third step comprises the specification of the requirements and limitations imposed by the functionality of the system application, especially regarding its reliability and availability. This step should consider the conditions of the space environment where the system will operate, and provide the nominal and limit conditions of SEU occurrence for the decisions taken in the next step. The SDATF must provide the satellite attitude every second with the highest possible level of reliability and availability, even when the satellite passes through zones of its orbit subject to high levels of radiation that can cause SEUs.
The fourth step consists of selecting a set of well-established and documented redundancy techniques that fully meet all requirements and comply with the limitations imposed by the system application, under the environmental conditions determined in the second and third steps of the methodology. Among the various redundancy techniques available in the fault tolerance area (e.g., Kastensmidt et al. 2006), some hardware and software redundancy techniques were selected: a compensation method based on dual redundancy processing with a spare ready to act; a detection method based on mutual activity monitoring with message forwarding in case of no response; and recording the same result in three different memory locations. Finally, completing the set of selected techniques, a triple vote on the data stored in the three memory positions validates the calculated attitude.
The fifth step deals with the design of a circuit prototype that can be tested under emulated SEUs in the same environmental conditions identified in step 2. In this same step, the circuit can be tested under more extreme conditions, for example by increasing the number of SEUs applied to the system memory cells by two, five, and ten times relative to those reported in the references (Harboe-Sorensen et al. 1990; Hubert et al. 2014; Sims et al. 1993), in order to compare the limits of the proposed solution with those of the selected redundancy techniques. For this purpose, a procedure has been developed to emulate SEUs in prototypes of microcontroller-based systems, such as the SDATF board (Miranda and Duarte 2016). This system randomly generates a number of events according to a Poisson distribution, with the parameters needed to emulate the mission. The application outputs recorded in the SDATF are monitored, and errors and correct operations are reported simultaneously to attest to the effectiveness of the proposed solution.
If the previous step is successful, the process moves on to the physical design of an SDATF engineering model, which will be rigorously tested, followed by a flight model, whose tests will be moderated to avoid damage to the final system model (see Fig. 2). The flowchart of the proposed methodology is shown in Fig. 3.
Figure 3. Flowchart of the proposed methodology.
Step 1: Searching info about technologies of COTS
Step 2: Analysis of the frequency of SEUs
Step 3: Specification of requirements/limitations of the application
Step 4: Selection of set of redundancy techniques
Step 5: Design of circuit prototype, and tests of emulated SEUs

THE PROPOSED SOLUTION
This section describes the proposed solution in more detail, including the hardware structure, the system operation and its interaction with the on-board computer (OBC), the fault tolerance scheme, and future flight testing.

BRIEF DESCRIPTION OF THE ON-BOARD HARDWARE STRUCTURE
Figure 2 presents the representative structure of the SDATF board with its main components. Above the microcontrollers, one can see an I2C bus, called the external I2C bus, which is used for communication between the SDATF system and the OBC. This bus carries the data packets PCK_TASK (containing the information about the calculation task) and PCK_RESULT (containing the results obtained in the calculation tasks). Below the microcontrollers, there is a second I2C bus, called the internal I2C bus (Fig. 4). Through this bus, the microcontrollers exchange data with each other, such as magnetometer measurement data, calculated attitude data, heartbeats, and internal control commands. Also below the microcontrollers is the SPI bus, another communication bus, through which the microcontroller set in Master mode obtains data from the magnetometer. Figure 4 also shows the reset signals of the microcontrollers, which are physically connected to different general-purpose input/output (GPIO) pins of the OBC. This way, the OBC can reset all SDATF microcontrollers, or any one individually, on demand from a telemetry command coming from the base station on Earth. The reset pin of each microcontroller is also physically connected to a GPIO pin of each of the other microcontrollers. Thus, any active microcontroller on the SDATF board can restart another microcontroller whenever it suspects that the active microcontroller it monitors is inactive.

THE INTERACTION BETWEEN SDATF AND OBC
The SDATF acts as an independent attitude calculation server that is asynchronous with respect to the OBC. At any time, the on-board computer may send the SDATF data packets with calculation tasks, containing the measurements and parameters, via I2C. This packet, called PCK_TASK, contains the uplink telemetry commands and solar sensor data, with the respective time reference.
After the time needed by the SDATF to calculate the attitude, the OBC requests the results via a second I2C packet. This packet is called PCK_RESULT and contains the calculated attitude and all data required for the ground validation process via downlink to the base station. The fault tolerance techniques involved in the system design are transparent to the OBC working mode. Therefore, the OBC has no control over the SDATF internal operation, internal exchange modes, or communication between the system calculation nodes. A single hardware reset interface has been added to allow the OBC, via a command sent remotely from Earth, to perform an on-demand reset of any of the three microcontroller units (MCU) individually or a simultaneous reset of all three MCUs. This direct command is issued through the GPIOs connected to the respective microcontroller reset inputs. After power-up, the system executes internal boot routines and waits for the OBC command in standby mode. No further command or intervention from the OBC is required to start the SDATF system.

OPERATION OF THE SDATF
The goal of the SDATF is to calculate the CubeSat three-axis attitude from measurements of the three-axis magnetometer installed on its own board, as well as from measurements of solar sensors. To calculate the attitude, the system must receive the following input data from the OBC:
• The satellite orbital position data;
• The data read from the solar sensors installed on the satellite;
• The time in UTC format associated with the solar sensor measurements;
• The reset signal of the magnetometer and calibration commands.
The same firmware is installed on the three microcontrollers. It provides a differentiated configuration for each microcontroller according to the SDATF operation in flight. During normal CubeSat operation, the system has one of its microcontrollers configured in Master mode, another in Partner mode, and the third in Sleeper mode. As long as the configuration modes of the microcontrollers do not change, this configuration is called the SDATF duty cycle. The MCU configuration modes, with their responsibilities and tasks, are described below:
• Master Mode: The MCU receives from the OBC a data packet, called PCK_TASK, via the external I2C bus. It is responsible for sending another data packet, called PCK_RESULT, to the OBC via the external I2C bus, at the OBC's request. It processes, filters, and calibrates the data received from the magnetometer and associates them with UTC time. These data are obtained at a commonly adopted recording frequency of 40 Hz. The MCU records the processed magnetometer data in three separate memory locations and shares them with the other two microcontrollers via the internal I2C bus. It calculates the attitude after a voting procedure on the processed magnetometer data stored in its memory. It also requests the attitude calculated by the other two microcontrollers and sends them its own calculated attitude for validation via the internal I2C bus. The validation of the calculated attitude against the attitudes received from the other two microcontrollers takes place in the MCU Master. The Master regularly sends activity messages (called heartbeats) to the MCU in Partner mode and monitors the MCU Partner activity by regularly listening for this microcontroller with a maximum timeout. When it suspects that the Partner is not communicating, the MCU Master must reset the suspected locked MCU Partner, which will resume activity in Sleeper mode, and wake up the MCU in Sleeper mode to become the new MCU Partner in the fault-tolerant system.
• Partner Mode: The MCU receives from the OBC the PCK_TASK data packet via the external I2C bus. It receives the processed magnetometer data supplied by the MCU configured in Master mode via the internal I2C bus. The Partner must calculate the attitude using the magnetometer data received from the Master, along with the data received in the PCK_TASK packet. It sends the calculated attitude to the other two microcontrollers and receives their calculated attitudes via the internal I2C bus. The MCU Partner must then vote on its calculated attitude against the attitudes received from the other two microcontrollers. It records the voted attitude as a backup, so that it can prepare the PCK_RESULT packet in case it becomes the new Master in a new round. The Partner regularly sends heartbeat messages to the MCU in Master mode to signal its activity. It also monitors the MCU Master activity by regularly checking for the Master's heartbeat, within a maximum timeout limit.
• Sleeper Mode: The MCU receives from the OBC the PCK_TASK data packet via the external I2C bus. It also receives the processed magnetometer data supplied by the MCU configured in Master mode via the internal I2C bus. The MCU Sleeper also calculates the attitude using the magnetometer data received from the Master, along with the data received in the PCK_TASK packet. It sends the calculated attitude to the other two microcontrollers and receives their calculated attitudes, also via the internal I2C bus. The MCU Sleeper must then vote on its calculated attitude against the attitudes received from the other two microcontrollers. It records the voted attitude as a backup, so that it can prepare the PCK_RESULT packet in case it becomes the new Master in the next round. The rest of the time, it remains in low-power mode, waiting for a new PCK_TASK packet to perform a new duty cycle, or for a control signal from the MCU Master to become the new MCU in Partner mode if the Master of the round notices a communication failure of the Partner.
A microcontroller in Master or Partner configuration mode in a round is called an active node. Each active node is responsible for monitoring the activity of the other active node in the round. There are two ways for one node to monitor the activity of the other. The first is to control the time of useful information exchange between them, where useful information is the processed magnetometer data and the attitude calculated individually by each active node. The second is to control the timeout limit for receiving a heartbeat between the active nodes configured in Master and Partner mode. In this way, one MCU monitors the operational state of the other. After the attitude is calculated and verified by the master node, the Master and Partner microcontrollers monitor each other by heartbeat exchange via the internal I2C bus. The Partner microcontroller continues to process the collected magnetometer data and stores the digitally filtered sensor data in three different positions in its SRAM memory.
To make this monitoring scheme possible, some timer peripherals are configured in the software of each microcontroller. A first timer sets the interval for sending heartbeats to the other active system node. A second timer sets the time limit to wait for a heartbeat from the other active node. A third timer controls the time limit that the microcontroller must wait to receive magnetometer data for the attitude calculation. The same timer also controls the time limit for receiving the attitude calculated by the other two microcontrollers in an SDATF time window.
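The heartbeat timeout and the resulting role exchange can be sketched as below for the case stated in the text, in which the Master detects a silent Partner. This is a host-side illustration under assumptions, not the firmware: the timeout value, the MCU identifiers and the names `HeartbeatMonitor` and `master_check_partner` are hypothetical, and the hardware reset through the GPIO line is represented only by the mode change.

```python
from dataclasses import dataclass

MASTER, PARTNER, SLEEPER = "Master", "Partner", "Sleeper"

@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat received from the monitored active node."""
    timeout_s: float          # maximum wait for a heartbeat (assumed value)
    last_seen_s: float = 0.0

    def heartbeat(self, now_s: float) -> None:
        self.last_seen_s = now_s

    def expired(self, now_s: float) -> bool:
        return now_s - self.last_seen_s > self.timeout_s

def master_check_partner(roles: dict, monitor: HeartbeatMonitor,
                         now_s: float) -> dict:
    """Return the MCU-to-mode mapping after the Master checks its Partner.

    On timeout, the silent Partner is reset and resumes in Sleeper mode,
    and the former Sleeper is woken up as the new Partner."""
    if not monitor.expired(now_s):
        return dict(roles)
    swapped = {}
    for mcu, mode in roles.items():
        if mode == PARTNER:
            swapped[mcu] = SLEEPER
        elif mode == SLEEPER:
            swapped[mcu] = PARTNER
        else:
            swapped[mcu] = mode
    monitor.heartbeat(now_s)  # restart the window for the new Partner
    return swapped

roles = {"MCU1": MASTER, "MCU2": PARTNER, "MCU3": SLEEPER}
monitor = HeartbeatMonitor(timeout_s=0.5)
monitor.heartbeat(1.0)                                    # Partner seen at t = 1.0 s
unchanged = master_check_partner(roles, monitor, 1.2)     # within the timeout
after_fault = master_check_partner(roles, monitor, 2.0)   # Partner went silent
```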
The calculated attitude, as well as the voted attitude, is stored in three distinct volatile memory positions of each MCU. The goal is that every datum used to calculate the attitude, or that may be sent to another MCU or to the OBC, first goes through a voting scheme (described in detail below) to ensure the reliability of the data processed or in use.
The voting scheme consists of two phases. The first phase is a simple vote, in which the values stored in the three volatile memory positions are compared as a whole. The correct value is taken to be the one contained in the majority of the three memory locations. If all three values disagree in this first phase, a second voting phase is required, which consists of a bit-by-bit comparison of the contents of the three memory locations that failed the first phase. For each bit position, the bit value found in the majority of the three locations is taken as correct, yielding the final voted value of the data to be used in the attitude calculation or sent to the other nodes.
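The two voting phases can be expressed compactly on integer words, as in the sketch below. It is a minimal illustration that assumes each stored copy is available as an unsigned integer; the function name `vote3` is hypothetical.

```python
def vote3(a: int, b: int, c: int) -> int:
    """Two-phase vote over three stored copies of the same word."""
    # Phase 1: simple vote. If at least two whole values agree, that value wins.
    if a == b or a == c:
        return a
    if b == c:
        return b
    # Phase 2: all three copies differ, so vote bit by bit. Each output bit
    # takes the value present in at least two of the three copies.
    return (a & b) | (a & c) | (b & c)

# One copy corrupted: phase 1 recovers the value.
single_fault = vote3(0b1010, 0b1010, 0b0000)
# Original word 0b1111 with a different bit flipped in each copy:
# phase 1 fails (triple disagreement) and phase 2 rebuilds the word.
triple_fault = vote3(0b1011, 0b1110, 0b0111)
```

The second phase recovers the word only when no bit position is corrupted in more than one copy; if the same bit is flipped in two copies, the majority for that bit is wrong.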
To validate its calculated attitude against the attitudes calculated by the other system nodes, the Master mode microcontroller applies the same voting scheme described in the previous paragraph. The attitude calculated by the Master is then stored in three different positions of volatile memory to assemble the PCK_RESULT packet, which holds the result that may be requested at any time by the OBC.
The OBC will be responsible for collecting the PCK_RESULT packets, storing them in its internal memory and periodically sending them to the ground telemetry station for ground validation. The data collected from the packets obtained during the flight will be statistically analyzed, and the results will be used for any corrections and improvements in the SDATF system.

FAULT TOLERANCE TESTS RESULTS
To perform the fault tolerance tests, a prototype was assembled based on the PIC24FJ64GA002 microcontroller (Microchip Technology Inc 2007), allowing the mapping of sensitive system memory elements and of the undesirable effects of the injected errors. In this context, bit flip injection was emulated with a radioactive particle arrival rate as proposed by Nicolaidis and Velazco (2007), using an external event interrupt to hit one target of the attitude determination firmware at a time. The bit flips were produced using the portable testbed for harsh environment of single event upset (PORTHES) system, a tool developed at the Intelligent Systems Laboratory of the Federal University of Minas Gerais (UFMG), which allows the application of the code emulating upsets (CEU) methodology proposed by Velazco et al. (2000).
In these batteries of tests, the automatic mode of the PORTHES graphical interface was used to inject the bit flips, with targets selected by random upset injection following a Poisson distribution with a rate of 2.3 upsets per second. This graphical interface made it possible to identify the system behavior and the effects caused by inserting upsets on each specific target. Table 1 presents the results obtained in those SDATF tests. Each battery of tests lasted 2 min, and the results shown are the mean values of the observations over three repetitions of the test procedure.
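A Poisson injection schedule with the parameters quoted above (2.3 upsets per second over 2 min) can be generated by drawing exponentially distributed inter-arrival times, as sketched below. This is not the PORTHES implementation: the uniform choice of target register and bit, the 16-bit register width and the function name `poisson_upset_schedule` are assumptions made for illustration, and the target list only reuses register names mentioned in the test results.

```python
import random

def poisson_upset_schedule(rate_hz, duration_s, targets, word_bits=16, rng=None):
    """Return a list of (time_s, target, bit) upset events.

    Event times form a Poisson process of intensity `rate_hz`, built from
    exponential inter-arrival times; target and bit are drawn uniformly."""
    if rng is None:
        rng = random.Random()
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_hz)   # time until the next upset
        if t >= duration_s:
            break
        events.append((t, rng.choice(targets), rng.randrange(word_bits)))
    return events

TARGETS = ["U2TXREG", "WREG14", "LATA", "IFS0"]
schedule = poisson_upset_schedule(2.3, 120.0, TARGETS, rng=random.Random(1))
```

On average, such a session contains 2.3 × 120 = 276 upsets; the actual count varies from run to run, which is one reason for averaging over repeated sessions.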
It can be seen that failures in some registers caused a system lock after the PORTHES bit flip injection process was terminated. These registers include U2TXREG, used by the UART2 communication peripheral, which transmits data between Master and Sampler; WREG14, used as the user-defined stack frame pointer, which holds the address of the memory allocated to local variables (context change); and LATA, which drives the pins that reset the other two MCUs. These problems indicate the need for design changes in the hardware and for software improvements in the flight release. Special attention was dedicated to providing software redundancy for the cases of low system availability observed in the bit-flip emulation sessions (see last column of Table 1), in order to achieve 100% availability. For the registers in which SEUs led to sequence-loss errors, the fault-tolerant system was able to resume operation without external intervention. For instance, in the case of the IFS0 register, the system recovered and resumed communication with PORTHES after some latency. Overall, the test results are quite positive regarding the capacity of the system to keep operating.
To overcome the detected problems, diverse communication channels and hardware interfaces were adopted, avoiding the low availability observed with UART communication during the test sessions. Because of these low-availability cases, the flight SDATF design was modified to use two I2C communication channels and one SPI channel (as can be seen in Fig. 4). These design modifications provide diverse channels for data exchange and, consequently, higher rates of availability and fault recovery. The fault injection tools and the test sessions performed provide the data needed to enhance the fault tolerance design and achieve the availability required by the mission.

CONCLUSION
The main contribution of this work is the detailed description of the engineering decisions and of the sequence of steps required to design a fault-tolerant scheme for the attitude determination system. The few system locks detected during the SEU emulation campaigns made it possible to identify potential weaknesses in the proposed fault-tolerant solution and to propose software design adjustments to overcome the low-availability situations. The next steps of this system development will include the analysis of experimental data and the validation of the SDATF in the planned flight test as a payload of the NanosatC-BR2 mission.
In addition, a new system proposal, more complete for use on CubeSat nanosatellites, should include a control module, thus constituting an attitude determination and control system (ADCS).