Dataset of coded handwriting features for use in statistical modelling

The data presented here is related to the article titled “Using handwriting to infer a writer's country of origin for forensic intelligence purposes” (Agius et al., 2017) [1]. This article reports original writer, spatial and construction characteristic data for thirty-seven English Australian writers and thirty-seven Vietnamese writers. All of these characteristics were coded and recorded in Microsoft Excel 2013 (version 15.31). The construction characteristics were extracted from only seven characters: ‘g’, ‘h’, ‘th’, ‘M’, ‘0’, ‘7’ and ‘9’. The coded writer, spatial and construction characteristics are made available in this Data in Brief article so that others can perform statistical analyses and modelling to investigate whether there is a relationship between the handwriting features and the nationality of the writer, and whether the two nationalities can be differentiated. The data also allows others to apply mathematical techniques capable of characterising the features extracted from each participant.


Specifications
Subject area: Forensic science
More specific subject area: Handwriting examination, forensic intelligence
Type of data: Handwriting specimens were scanned and coded
Experimental features: Features of each participant's handwriting were selected, extracted and coded
Data source location: Handwriting specimens were collected from people living in the Sydney region (and who had learnt to write English in New South Wales). All original documents are stored at the University of Technology Sydney (UTS), 15 Broadway, Ultimo, Sydney, NSW, 2007.
Data accessibility: The data is made available with this article
Related research article: Agius et al. [1]

Value of the data
This data supports a complementary approach to using handwriting features to obtain useful knowledge about the source of the handwriting.
The data may help to extend the contribution of handwriting examination beyond answering traditional court-oriented questions, allowing it to become more involved in a forensic intelligence framework.
Researchers may use this data as a starting point to trial different statistical techniques and modelling tools in order to determine whether any relationships exist within the dataset.

Data
The data is presented in Excel spreadsheets, in which each column is headed by a particular handwriting characteristic together with a description of the codes used for that feature. Table 1 provides a description of the data in each spreadsheet. The first spreadsheet is titled "Writer & Spatial Characteristics". Writer characteristics are traits specific to the individual writer and were taken from a survey completed by each participant (see below). Spatial characteristics are the height and/or width relationships within and between individual characters and words, and how these are combined to form lines of words and paragraphs. The remaining six spreadsheets ('h' and 'th' share one spreadsheet) contain the construction characteristics extracted for the characters 'g', 'h', 'th', 'M', '0', '7' and '9'. Construction characteristics are handwriting features that indicate how the writer forms a letter, number or symbol, e.g. the number, position, order and direction of strokes. Please note that Vietnamese writer 11 is not included in the construction characteristic spreadsheets, because these characteristics could not be extracted from their handwriting owing to the writing instrument they used.
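Because the writer and spatial characteristics sit on one spreadsheet and the construction characteristics on six others, an analysis will usually need to join them on a writer identifier. The following is a minimal sketch in plain Python, assuming each sheet has already been exported to a list of row dictionaries; the column names `writer_id`, `nationality` and `g_stroke_direction`, the function name `join_on_writer`, and the example values are all hypothetical and are not taken from the spreadsheets. An inner join drops writers who have no construction rows, such as Vietnamese writer 11.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_on_writer(writer_rows: List[Row], construction_rows: List[Row],
                   key: str = "writer_id") -> List[Row]:
    """Inner-join writer/spatial rows with one construction sheet.

    Several construction rows may share one writer (one per coded instance
    of a character); each is paired with that writer's row. Writers absent
    from the construction sheet simply produce no output rows.
    """
    # Index writer-level rows by the (hypothetical) identifier column.
    by_writer = {row[key]: row for row in writer_rows}
    joined: List[Row] = []
    for row in construction_rows:
        writer = by_writer.get(row[key])
        if writer is None:
            continue  # construction row with no matching writer record
        # Merge writer-level and construction-level columns into one row.
        joined.append({**writer, **row})
    return joined


# Hypothetical toy data: writer V11 has no construction rows, as with
# Vietnamese writer 11 in the dataset.
writers = [
    {"writer_id": "V10", "nationality": "Vietnamese"},
    {"writer_id": "V11", "nationality": "Vietnamese"},
]
g_sheet = [{"writer_id": "V10", "g_stroke_direction": 0}]
rows = join_on_writer(writers, g_sheet)
print([r["writer_id"] for r in rows])  # ['V10']
```

With pandas, the same step would typically be a merge on the identifier column; the plain-Python version is shown only to make the join logic explicit.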

Experimental design, materials, and methods
The method for handwriting collection is described in Agius et al. [1]. Briefly, seventy-four writers (37 Vietnamese and 37 English Australian) completed the handwriting collection package. The participants were required to copy out the source document (Fig. 1); the alphabet in upper and lower case and the numbers zero to ten (Fig. 2); and postal addresses (Figs. 2 and 3). The source document contained all twenty-six letters of the alphabet, in upper and lower case, and the numbers 0-9. It was copied onto a sheet of white A4 copy paper ruled with 24 lines spaced 0.8 cm apart and a 2 cm margin [1].
The handwriting features chosen for extraction were separated into writer, spatial and construction characteristics. Each feature was either recorded or coded depending on the type of data: for example, the age of the writer was recorded as is, whereas descriptive information such as the direction of a stroke was coded zero for anticlockwise and one for clockwise.
The spatial characteristics were extracted only from the source document, as it contained a sufficient quantity of writing, unlike the other pages of the collection package. Fig. 4 is a diagram of the spatial characteristic extraction process. Table 2 summarises the codes of the qualitative spatial characteristics. Fig. 5 shows how one feature, the average height of 'a', 'e' and 'o', was measured.
The construction characteristics for 'g', 'h' and 'th' were extracted from the source document and from the individual upper and lowercase sections in Fig. 2. They were not extracted from the postal addresses in Fig. 2, because the limited space meant the writing tended to be cramped, or from the postal addresses in Fig. 3, because many of the participants wrote these addresses in uppercase. The characters 'M', '0', '7' and '9' were taken from all three forms in the collection package.
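The difference between recorded and coded values can be illustrated with a small function that builds one row of data. This is a hedged sketch, not the authors' procedure: only the stroke-direction codes (0 = anticlockwise, 1 = clockwise) come from the text, while the function name `code_participant`, the field names, the use of millimetres, and the assumption that the average height of 'a', 'e' and 'o' is the mean of individually measured instances are illustrative choices.

```python
from statistics import mean
from typing import Dict, List

# Codes stated in the text: stroke direction 0 = anticlockwise, 1 = clockwise.
STROKE_DIRECTION_CODES = {"anticlockwise": 0, "clockwise": 1}


def code_participant(age: int, stroke_direction: str,
                     aeo_heights_mm: List[float]) -> Dict[str, float]:
    """Build one coded row (hypothetical field names).

    - age is quantitative, so it is recorded as is;
    - stroke direction is descriptive, so it is mapped to 0/1;
    - the spatial feature is the mean of the measured heights of
      'a', 'e' and 'o' (assumed here to be per-instance values in mm).
    """
    if stroke_direction not in STROKE_DIRECTION_CODES:
        raise ValueError(f"unknown stroke direction: {stroke_direction!r}")
    if not aeo_heights_mm:
        raise ValueError("at least one height measurement is required")
    return {
        "age": age,
        "stroke_direction": STROKE_DIRECTION_CODES[stroke_direction],
        "avg_height_aeo_mm": mean(aeo_heights_mm),
    }


# Hypothetical example values, not taken from the dataset.
row = code_participant(age=28, stroke_direction="clockwise",
                       aeo_heights_mm=[2.5, 3.0, 3.5])
print(row)  # {'age': 28, 'stroke_direction': 1, 'avg_height_aeo_mm': 3.0}
```

Rejecting unknown labels, rather than silently assigning a default code, keeps coding errors visible before any modelling is done.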
Table 3 shows, for each participant, how many instances of each character had their construction characteristics coded across the collection package.
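A per-writer tally of the kind summarised in Table 3 can be produced from a list of coded instances. The sketch below assumes each instance is stored as a (writer, character, section) tuple; this layout, the function name `count_coded`, and the section labels are hypothetical. It applies the sampling rule described above: 'g', 'h' and 'th' are counted only from the source document and the upper and lowercase sections, whereas 'M', '0', '7' and '9' are counted from every section.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical section labels: "source" (Fig. 1), "alphabet" (upper and
# lowercase sections of Fig. 2), "address_fig2" and "address_fig3"
# (postal addresses).
LETTER_SECTIONS = {"source", "alphabet"}  # sections used for 'g', 'h', 'th'
RESTRICTED_CHARS = {"g", "h", "th"}


def count_coded(instances: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count coded instances per (writer, character), applying the rule above."""
    counts: Counter = Counter()
    for writer, char, section in instances:
        # 'g', 'h' and 'th' were not taken from the postal addresses.
        if char in RESTRICTED_CHARS and section not in LETTER_SECTIONS:
            continue
        counts[(writer, char)] += 1
    return counts


# Hypothetical observations for one writer.
obs = [
    ("A01", "g", "source"),
    ("A01", "g", "alphabet"),
    ("A01", "g", "address_fig3"),  # excluded by the sampling rule
    ("A01", "7", "address_fig3"),  # numbers are counted from all sections
]
counts = count_coded(obs)
print(counts[("A01", "g")], counts[("A01", "7")])  # 2 1
```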
Examples of the construction characteristics extracted for each character and their numerical codes are presented in Table 4. These examples are provided to convey visually how the data was created, so that researchers can make an informed choice of the statistical technique(s) best suited to analysing and modelling the data.
Excerpt of qualitative spatial characteristic codes (paragraph features):
Paragraph (1): 0 = No space between salutation and text body; 1 = Space between salutation and text body
Paragraph (2): 0 = No space between text body and valediction; 1 = Space between text body and valediction
Table 4. Examples of all of the construction characteristics extracted for each character and their respective codes.
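As one example of matching a technique to the data type: many features are coded as categories such as 0/1, so a contingency-table test of association between a writer-level code (for example, a spatial feature such as paragraph spacing) and nationality is a natural first step. The sketch below computes Pearson's chi-square statistic without continuity correction; the function name `pearson_chi_square` is hypothetical and the counts in the example are invented for illustration, not taken from the dataset. Each writer should be counted once, since the test assumes independent observations; construction features with several instances per writer would need aggregating first. A p-value could then be read from a chi-square distribution with (r - 1)(c - 1) degrees of freedom, for instance with `scipy.stats.chi2.sf`, and `scipy.stats.chi2_contingency(table, correction=False)` should return the same statistic.

```python
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts.

    Rows could be nationalities and columns the values of one coded feature.
    The expected count for cell (i, j) is row_total[i] * col_total[j] / n.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table has no observations")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented counts: rows = English Australian / Vietnamese writers,
# columns = feature coded 0 / coded 1.
table = [[30, 7], [10, 27]]
stat = pearson_chi_square(table)
print(round(stat, 2))  # 21.76
```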

Transparency document. Supplementary material
Transparency document associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2017.12.014.

Appendix A. Supplementary material
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2017.12.014.