A social network analysis of Twitter: Mapping the digital humanities community

Abstract: Defining digital humanities might be an endless debate if we stick to the discussion about the boundaries of this concept as an academic “discipline”. In an attempt to concretely identify this field and its actors, this paper shows that it is possible to analyse them through Twitter, a social media platform widely used by this “community of practice”. Based on a network analysis of 2,500 users identified as members of this movement, the visualisation of the “who’s following who?” graph allows us to highlight the structure of the network’s relationships and to identify users who occupy a particular position. Specifically, we show that linguistic groups are key factors in explaining clustering within a network whose characteristics resemble those of a small world.


ABOUT THE AUTHOR
Martin Grandjean is a researcher in intellectual history at the University of Lausanne (Switzerland). He studies the structuration of scientific networks in the interwar period and develops network analysis and visualisation methods for archives and texts. Specialised in data visualisation, he leads parallel experiments in the fields of data-driven journalism, open data and social media analysis. He is a member of the board of Humanistica, the French-speaking digital humanities association.

PUBLIC INTEREST STATEMENT
In recent years, the emergence of new technologies in the humanities and social sciences has caused a major upheaval. Grouped under the term "digital humanities", thousands of researchers worldwide are gradually structuring their community around issues related to the use of new tools and methods. Understanding how this new community organises itself is a challenge because it takes very different forms depending on the institutions and scientific disciplines.
This article analyses the presence of the main actors of this community on Twitter, a social media platform where each user publishes very short messages to their subscribers. By analysing the "who's following who?" network among these 2,500 people, we discover who the most connected individuals are. Language groups are also very visible and allow us to question the homogeneity of this community of practice.
http://dx.doi.org/10.1080/23311983.2016.1171458
[…] us to overcome the disciplinary clashes, it makes it difficult to identify the borders, as the common denominator seems to be the "kinda the intersection of …" definition (Terras, 2010). Our study deliberately chooses to focus on a particular field of expression of this community, a social network that has for many years been regarded as one of the main places of exchange for digital humanities. Our goal is therefore not to draw conclusions that go beyond this very specific object, but to observe it in order to offer a transversal view of this movement, which is otherwise difficult to map with traditional methods. Let us seize the opportunity to set aside the useful "network" metaphor and apprehend it more formally, through a social media platform that embodies these relationships.

Twitter, a growing field of study
Twitter, a social network created in 2006, is a place dedicated to personal expression that brings together hundreds of millions of users around its minimalist concept of microblogging. Its 140-character messages and its principle of "following" users without mandatory reciprocity, coupled with a very open application programming interface (API), make it an ideal medium for the study of online behaviour. Its simplicity makes it a frequently used tool for reporting current events. Hence the many studies analysing the diffusion of information following an event: an earthquake (Sakaki, Okazaki, & Matsuo, 2010), demonstrations such as the London riots (Beguerisse-Diaz, Garduno-Hernandez, Vangelov, Yaliraki, & Barahona, 2014; Casilli & Tubaro, 2012), international conferences (Grandjean & Rochat, 2014; Jussila, Huhtamaki, Henttonen, Karkkainen, & Still, 2014), teaching (Stepanyan, Borau, & Ullrich, 2010) or interactions on a neutral corpus (Darmon, Omodei, & Garland, 2015). These "dynamic" analyses, which typically map out networks of tweets, mentions and retweets, owe their popularity to the availability of the material and the possibility for researchers to analyse its contents. They frequently lead to questions about measuring influence (Subbian & Melville, 2011; Suh, Hong, Pirolli, & Chi, 2010), especially when it comes to political communication (Stieglitz & Dang-Xuan, 2012; Vainikka & Huhtamäki, 2015) or scientometrics (Haustein, Peters, Sugimoto, Thelwall, & Larivière, 2014).
But when it becomes clear that the content of a user's tweets is not always indicative of their field of specialisation (due to the noise produced by the many personal messages, jokes, politics, etc.), we need to turn to a network whose structure seems more readily analysable in terms of "community": the follow graph (Myers, Sharma, Gupta, & Lin, 2014).

Dataset
The prerequisite for this study was the preparation of a list of more than 2,500 Twitter users identified as part of the digital humanities community. We saw above that the definition of this field is subject to many changes: rather than sticking to lists of association members or authors of a set of journals, we listed all users who identify themselves as being directly or indirectly part of this "community of practice". It is in the very short Twitter "bio" (160 characters) that we spotted the vocabulary linking these researchers together. A first selection was made by listing all the followers of the most visible users (national or international institutions, established professors and researchers in the field, Twitter accounts of scientific events, etc.) and reviewing their biographies. Within this corpus, we then randomly selected a number of users and analysed their subscribers as well. This list was then enriched in three ways: by identifying users who tweeted with the hashtags of specific DH conferences; through the self-reporting of users who, following the publication of blog posts about this research, came forward to be included in the corpus; and finally, by harvesting the results of the Twitter search engine for a selection of keywords related to the digital humanities.
By its nature, this corpus cannot aim to be comprehensive, but it should be noted that, unlike most official lists, it includes a segment of the academic population (generally non-institutional) that does not publish or does not normally participate in official events. These users did not wait to receive the DH "label" before assigning it to themselves and seeing themselves as members of this community. Specifically, this article analyses the "who's following who?" relationships inside a Twitter list containing exactly 2,538 Twitter accounts of individuals or institutions (as of 1 October 2015). This network was obtained by downloading, via the Twitter API, the list of all the followers of each account, then keeping only those followers who are themselves members of the list. It therefore only concerns the relations within this corpus, not the tens of thousands of non-DH users who follow these 2,500 accounts.
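The filtering step described above can be illustrated with a minimal sketch. It assumes the follower lists have already been downloaded into a dictionary; the function name `internal_follow_edges` and the toy handles are hypothetical and are not taken from the study's own tooling.

```python
from typing import Dict, Set, Tuple


def internal_follow_edges(followers: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return directed (follower, followed) edges restricted to list members.

    `followers` maps each account of the list to the full set of its
    followers (assumed to have been collected beforehand).
    """
    members = set(followers)
    edges = set()
    for account, account_followers in followers.items():
        # Keep only followers who are themselves on the list.
        for follower in account_followers & members:
            if follower != account:
                edges.add((follower, account))
    return edges


# Toy data with hypothetical handles (not from the corpus).
toy_followers = {
    "alice": {"bob", "carol", "outsider1"},
    "bob": {"alice", "outsider2"},
    "carol": set(),
}
toy_edges = internal_follow_edges(toy_followers)
print(sorted(toy_edges))
```

Accounts such as `outsider1` are dropped because they do not belong to the list, which mirrors the restriction to relations within the corpus.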

Result: an apparent small world
At first glance, the network of digital humanities on Twitter is a form of small world (Milgram, 1967); at least, that is what its visual representation suggests (Figure 1). It indeed shows an extremely dense network. Only one cluster seems to detach itself slightly, while another one nearby somewhat distorts the very circular structure of the network. The size of the circles (vertices) is proportional to the degree centrality of the users (the number of connections, followers and followings together); we note that only 11 of them exceed 1,000 connections. In addition, the colour of the circles shows the in-degree (inbound degree, i.e. followers only), allowing us to see that only 17 users (white circles) are followed by more than one-third of the users in the corpus. The median user follows 59 Twitter accounts from the list and is followed by 39 of them.
Are digital humanities, while describing themselves as a transversal field, ultimately a closed world where everybody knows everybody? In fact, despite its apparent homogeneity (its limited division into small communities), the density of the graph is not extremely high. The density is the ratio of the number of existing edges to the number of possible edges in the network; its value here is 0.036, on a scale from 0 (no edges) to 1 (an edge between every pair of the 2,500 vertices).
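As a rough check of this order of magnitude, the density can be recomputed from the figures given in the text. The sketch below assumes the directed-graph formula (each ordered pair of distinct vertices is a possible edge) and uses the rounded edge count of 236,000 mentioned elsewhere in the article, so the result can only approximate the reported value.

```python
def directed_density(n_vertices: int, n_edges: int) -> float:
    """Share of possible directed edges (ordered pairs of distinct vertices) that exist."""
    return n_edges / (n_vertices * (n_vertices - 1))


# 2,538 accounts in the list; 236,000 is the rounded edge count given in the text.
density = directed_density(2538, 236_000)
print(f"{density:.4f}")
```

With these inputs the value falls between 0.036 and 0.037, in line with the density reported above.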
Even if the network can be structurally considered a small world in the terms of Watts and Strogatz (1998), with a high average clustering coefficient (0.366) and a short average path length (2.297, with a maximal distance of 5), the application of this concept to an asymmetrical social media platform remains unclear.
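To make the two small-world indicators concrete, here is a minimal sketch that computes them on a toy graph. It treats the graph as undirected, which is a simplifying assumption: the text does not specify how the asymmetry of the follow graph was handled, and the function names are illustrative only.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph."""
    total = 0.0
    for vertex, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue  # vertices with fewer than two neighbours count as 0
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def path_length_stats(adj):
    """Average shortest-path length and longest distance over reachable pairs."""
    total, pairs, longest = 0, 0, 0
    for source in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest


# Toy undirected graph: a triangle (a, b, c) with a pendant vertex d.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
clustering = average_clustering(toy_graph)
mean_distance, diameter = path_length_stats(toy_graph)
print(clustering, mean_distance, diameter)
```

A small world in the sense used here combines a high value for the first indicator with a low value for the second.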
These first elements should not make us forget that this network is a visual representation of a set of data whose complexity is not limited to a simple graphical rendering. Beyond a certain aesthetic, sometimes very suggestive, it is in its ability to generate new research questions, pushing the researcher to get back into the data itself, that a network analysis proves its worth.

To follow or to be followed?
Before turning to more advanced structural measures, the first analysis that we propose is the comparison between followers and followings. Figure 2 visualises this relationship as a scatter plot, supported by two bar charts that summarise the distribution of these two values. First observation: more than half of the users follow fewer than 100 people and are themselves followed by fewer than 100 people (category A, 63.1%). The vast majority of the corpus is actually made up of very weakly connected users, information that the network visualisation (Figure 1), with its totalising aim, tends to make us forget.
Figure 2. Followings and followers among peers, and frequency distribution.

Traditionally, it is considered that highly followed users are personalities and institutions whose influence and reputation are greater than those of users who subscribe to a large number of accounts without themselves being widely followed. We can now distinguish six categories of users, based on their followings/followers ratio (category A users are excluded from this ranking due to their insignificant number of connections):
• Category B: users who follow at least four times as many users as they have subscribers (1.3%).
• Category C: users who follow at least twice as many users as they have subscribers (6.6%).
• Category D: users who follow up to twice as many users as they have subscribers (13.8%). This is the largest population in the corpus after category A.
The first three categories bring together users who use Twitter as a technology watch tool. Without necessarily creating content that will make them influencers (even if the two are not incompatible), a significant portion of these users keep informed of the news in their research fields through this social media platform. It should also be noted that following a large number of users obviously has a social function that is in no way pejorative. Subscribing to a large number of users typically increases the number of followers (the people being followed are notified of the subscription, discover their new subscriber and sometimes follow him or her back if interested).
• Category E: users who have up to twice as many followers as accounts they follow (8.7%).
• Category F: users who have at least twice as many followers as accounts they follow (4.1%).
• Category G: users who have at least four times as many followers as accounts they follow (2.3%).
In the last three categories, we find users who are followed by more users than they follow themselves, generally because they occupy a privileged position in the field (journals, institutions, associations, advanced academic positions, prominent figures in the community or content producers). While the border between categories D and E is not very significant, the presence of a user in categories F and G is very indicative of their behaviour on the social network. It is indeed in these last two categories that we find some of the "stars" of the field (in the sense of Moreno, 1934, who laid the foundations of network analysis, where stars are individuals who concentrate incoming relations).
However, taking a step back, it should be noted that belonging to one or the other of these categories is not a definitive marker of the user's position in the field: having a very high ratio does not always mean being an influential person, but sometimes simply reflects a rather elitist attitude (e.g. following hardly anyone) or a popularity due to an external factor (e.g. being an institution renowned outside as well as inside the DH field). Let us also recall that we are only analysing the followers/followings ratio inside our corpus. A user with a very low ratio may well be followed by tens of thousands of Twitter users outside the community (and a "star" user may have no followers outside this network).
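The seven categories discussed in this section can be expressed as a small classification rule. The sketch below is only one possible reading of the thresholds: how ties are handled (for instance, a user with exactly as many followers as followings is placed in category E) is an assumption, and the function name is hypothetical.

```python
def ratio_category(followers: int, followings: int) -> str:
    """Assign a user to categories A-G from in-corpus follower/following counts."""
    if followers < 100 and followings < 100:
        return "A"  # weakly connected users, excluded from the ratio ranking
    if followings >= 4 * followers:
        return "B"
    if followings >= 2 * followers:
        return "C"
    if followings > followers:
        return "D"
    if followers >= 4 * followings:
        return "G"
    if followers >= 2 * followings:
        return "F"
    return "E"  # assumption: equal counts fall on the "followed" side


# Hypothetical (followers, followings) pairs, one per category.
examples = [(50, 80), (100, 450), (100, 250), (100, 150),
            (150, 100), (250, 100), (450, 100)]
labels = [ratio_category(f, g) for f, g in examples]
print(labels)
```

Because the counts are restricted to the corpus, the same account could fall into a different category if its followers outside the list were included.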

A geography of the linguistic communities
Beyond the apparent homogeneity of the network of these 2,500 Twitter users, the geographical, cultural and linguistic distribution must be questioned. While digital humanities are often seen as an essentially English-speaking movement, many local or linguistic communities have emerged in recent years, asserting their specificities so as not to be absorbed into a large English-speaking congregation. While the geographical issues do not always tally with the language issues (French is spoken in Europe, Africa and North America, Spanish and Portuguese in Europe and South America, and English on every continent, at least as a second language), national, regional or linguistic associations are emerging, as well as a "special interest group" of the Alliance of Digital Humanities Organizations (ADHO) dedicated to the promotion of diversity. However, the Internet in general, and Twitter in particular, are highly globalised places. It is not uncommon for a user to overlook national and linguistic borders as he or she follows the publications of a very wide variety of users. Are the language communities therefore discernible in our data-set? And if so, how should we judge their representativeness with regard to the "real world"?
Analysing the language of the tweets posted by users from our corpus over a given period is an unrealistic undertaking, both because of the amount of "noise" to disambiguate and because of the nature of the tweets themselves, which are often multilingual. Fortunately, the Twitter API provides, for each user, the language of the interface used. Even if English is often the default language, the proportion of accounts using another language is significant in our list (27%). Figure 3 visualises, on the same graph as Figure 1, the interface language of our 2,500 Twitter users. It appears very clearly that the two major "clusters" we were already able to distinguish correspond to very well-defined linguistic communities. In particular, the French-speaking community is almost completely detached from the main group. To a lesser extent, the German-speaking community is clearly circumscribed. Far behind, constituting the third largest non-English-speaking community, Spanish-speaking users are also all in the same area of the graph but do not stand apart from the main group as clearly as the previous two. The remaining users, particularly the small Italian-speaking and Dutch-speaking communities, are spread in a kind of "global village" at the intersection of all the other communities.
Two important notes for reading this graph:
• The community of a given language is not limited to the individuals who use Twitter in that language: many French or German speakers in the identified clusters use Twitter in English. We estimate that about 30% can be added to the totals of the linguistic communities presented in Figure 3, reducing the English-speaking community in the same proportion.
• The spatialisation of the network is obtained by a force-directed algorithm, which means that the distance between two vertices on the graph cannot be interpreted as a direct measure of their real proximity: as in all network visualisations, this geography is the result of a complex calculation that takes into account each of the edges (there are 236,000).
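The share of interface languages discussed in this section could be tallied along the following lines. This is a minimal sketch assuming each user record is a dictionary carrying a `lang` key for the interface language; the record layout, the `unknown` fallback and the toy users are illustrative assumptions, not the study's data.

```python
from collections import Counter


def interface_language_shares(users):
    """Share of accounts per interface language, most common first."""
    counts = Counter(user.get("lang", "unknown") for user in users)
    total = sum(counts.values())
    return {lang: count / total for lang, count in counts.most_common()}


# Toy records with hypothetical handles.
toy_users = [
    {"screen_name": "user1", "lang": "en"},
    {"screen_name": "user2", "lang": "en"},
    {"screen_name": "user3", "lang": "fr"},
    {"screen_name": "user4", "lang": "de"},
]
shares = interface_language_shares(toy_users)
non_english = 1 - shares.get("en", 0.0)
print(shares, non_english)
```

As the first note above stresses, such a tally underestimates the non-English communities, since many of their members keep the English interface.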
One thing remains: the French-speaking community is particularly isolated. We can formulate several hypotheses and questions: is English less used there than in other non-English-speaking communities? Or, on the contrary, is French less well mastered in the other regions, so that French-speaking users are followed less because they are less easily understood? Is the French-speaking digital humanities community large and structured enough to be less dependent on English-language references? Or are the practices so different that the need for skills transfer is weaker with this community than with others? Or is it, finally, only a bias related to the social media platform analysed, where behaviours differ according to local "cultures"? We also note that, within the French-speaking cluster, most of the users from France are found in the peripheral group, whereas most of the Swiss, Belgian and Canadian users are positioned at the intersection with the other linguistic communities and are thus less isolated.
If the position of the French-speaking community is surprising, it is also because of the comparison with other language communities, whose presence we would have expected to be stronger. Rather than seeing the French position as abnormal, is it not worrying to observe such a fusion between the Spanish- and German-speaking communities and the main group? And what about the users with an Italian (36), Dutch (24), Portuguese (10) or other marginal-language (40) interface? Note that the language distribution within the digital humanities community on Twitter is not comparable with the general distribution of languages in the world, or with the distribution of languages in the sets of tweets usually studied (Hale, 2014; working on a 2011 data-set). This is not the consequence of a biased data-set but simply reflects a research field that is not (yet) globalised and remains for the most part a European and North American phenomenon (which is also demonstrated by the geography of THATCamps, the emblematic events of this "community of practice"; see Grandjean, 2015b).

Measuring structural features
Using a formal network only to comment on its visual characteristics is to miss its structural characteristics. Centrality is a way to quantify the importance of the vertices in a network: its different variants are frequently used in social network analysis to identify and highlight specific positions (Newman, 2010).
We will therefore seek to go beyond the visual representation in order to list the users of our corpus who hold a remarkable structural position. This process is not restricted to online social networking and has been used since the work of Freeman (1978). As in Rochat (2014), Table 1 shows the values of four centrality measures for our corpus. Two of them, the in-degree and the out-degree, have already been exploited above. These are also the easiest to define, as they are immediately translatable into Twitter's language as "followers" and "followings" respectively. The betweenness centrality, which measures the number of times a vertex lies on the shortest path between two other vertices, highlights users who are structurally in a "bridge" position between the subdivisions of the network. The eigenvector centrality assigns each vertex an authority score based on the scores of the vertices to which it is connected. Table 1 is complemented by Figure 4, which gives readers a sense of the geography of the measurements obtained and clarifies their distribution.
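The two degree measures translate directly into counts over the edge list. The following sketch assumes edges are stored as (follower, followed) pairs; the function name and the toy handles are hypothetical.

```python
from collections import Counter


def degree_centralities(edges):
    """Return (in_degree, out_degree) counters from (follower, followed) pairs."""
    in_degree = Counter()
    out_degree = Counter()
    for follower, followed in edges:
        out_degree[follower] += 1  # one more "following" for the source
        in_degree[followed] += 1   # one more "follower" for the target
    return in_degree, out_degree


# Toy edge list with hypothetical handles.
toy_follow_edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),
    ("carol", "bob"),
]
in_deg, out_deg = degree_centralities(toy_follow_edges)
print(in_deg.most_common(), out_deg.most_common())
```

Ranking accounts by `in_deg` or by `out_deg` generally gives different orderings, which is the contrast the next two subsections describe.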

In-degree
The number of followers decreases very rapidly within the top 100 users. The most followed account is @dhnow (Digital Humanities Now), whose aggregation mission seems to be recognised by the community. It is followed by leading figures, institutions, associations and publishers. Spatially, a high in-degree is not the exclusive privilege of one of the "linguistic communities" studied above. Note that the most followed users are still generally, and logically, located in the "heart" of the network, between the central "global village" and the English-speaking region.

Out-degree
Some massively followed accounts themselves follow very few users from the list. As a consequence, the ranking by out-degree is quite different from the previous one.
The account that follows the largest number of users is @DHInstitute (Digital Humanities Summer Institute, University of Victoria), an event that presumably seeks to bring the community together. The "long tail" distribution of this measure is less pronounced than for the in-degree. This can be explained very naturally: on Twitter, the majority of users follow more people than they have followers. Except for a few users with a high in-degree but a very low out-degree, the distribution of this measure on the graph is very similar to the previous one.

Betweenness
In a harmoniously distributed network, highly connected accounts (with a high degree centrality) are usually also the ones that most often lie on the shortest paths between the vertices of the graph. But we have seen that our network contains clusters that are detached from the main structure. It is therefore logical that we find individuals with high betweenness in the area located at the intersection between the main network and the French and German clusters. These users, often French or German speakers engaged in international structures such as ADHO or EADH, act as transmission belts between different regions of the graph.
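To illustrate how "bridge" users obtain high scores, here is a minimal sketch of betweenness centrality using Brandes' algorithm for unweighted directed graphs. It is offered as one standard way to compute the measure, not as the procedure actually used for Table 1; scores are raw (non-normalised) counts and the toy graph is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm; `adj` maps every vertex to the set of vertices it points to."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Toy chain a <-> b <-> c <-> d with reciprocal follows: b and c are the bridges.
toy_chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
bridge_scores = betweenness_centrality(toy_chain)
print(bridge_scores)
```

In this toy chain the two end vertices never lie between other vertices, while the two middle vertices carry every path that crosses the chain.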

Eigenvector
As the eigenvector centrality is assigned to the vertices according to the scores of their neighbours, it highlights the highly connected users within the largest group of our graph. Here, this measure of authority no longer draws our attention to the periphery and to the inter-community "bridges", but rather to the centre and its English-speaking majority. Apart from a few hyper-connected users who monopolise the top positions in almost every centrality ranking, we see here less cosmopolitan users, better "installed" in their English-speaking environment.
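The recursive definition of this score can be sketched with a simple power iteration. The version below assumes that an edge from u to v passes u's score to v (being followed by well-followed accounts raises the score) and iterates on the adjacency matrix plus the identity, a common trick to avoid oscillation; these choices, the function name and the toy graph are assumptions for illustration.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I); `adj` maps each vertex to the vertices it follows."""
    score = {v: 1.0 / len(adj) for v in adj}
    for _ in range(iterations):
        new = dict(score)  # the "+ I" shift: each vertex keeps its current score
        for u, targets in adj.items():
            for v in targets:
                new[v] += score[u]  # u passes its score to the accounts it follows
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if sum(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score


# Toy graph: everyone follows "hub"; "hub" follows "a"; "a" follows "b"; nobody follows "c".
toy_follow_graph = {
    "hub": {"a"},
    "a": {"hub", "b"},
    "b": {"hub"},
    "c": {"hub"},
}
authority = eigenvector_centrality(toy_follow_graph)
print(sorted(authority, key=authority.get, reverse=True))
```

In this toy case the account that nobody follows ends up with a score close to zero, whatever the number of accounts it follows itself.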
We will avoid considering these measures as indicators of influence. They document the network structure, not the nature and content of the relations themselves. They nevertheless make it possible to pinpoint patterns whose study should be coupled with an analysis of the position of these users in the world of academic hierarchies, publications or co-directions of research projects.

Limits and perspectives
The way the data-set was created may, at least partially, be a factor in the visual impression of a small world: if the majority of the 2,500 users were detected because they were following a more "visible" account, then the high density of the network is logical, even though an effort was made to focus on minorities. Self-identification also has its limits: a person whom all their colleagues would describe as a "digital humanist", but whose Twitter biography does not describe their scientific activity, may slip under the radar of our analysis. Note that it is possible to overcome this problem by no longer focusing on the biographies of registered users but on their structural characteristics themselves. The next step of this analysis could indeed be to list the hundreds of thousands of "off-list" followers of our 2,500 selected profiles and to automatically add to the list those who follow or are followed by a given number of members of the original list. This would be a systematic way of gathering the community of those who, even without being practitioners, "follow" the latest research in DH.
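The structural expansion suggested here could look like the following sketch. The threshold `min_links`, the function name and the toy handles are hypothetical; the text only speaks of "a given number" of links to members of the original list.

```python
def expansion_candidates(members, follows, min_links=3):
    """Off-list accounts linked (in either direction) to at least `min_links` members.

    `follows` is an iterable of directed (follower, followed) pairs.
    """
    members = set(members)
    linked = {}
    for follower, followed in follows:
        if follower in members and followed not in members:
            linked.setdefault(followed, set()).add(follower)
        elif followed in members and follower not in members:
            linked.setdefault(follower, set()).add(followed)
    return {account for account, ms in linked.items() if len(ms) >= min_links}


# Toy data: "x" is tied to three list members, "y" to only one.
toy_members = {"m1", "m2", "m3"}
toy_follows = [("x", "m1"), ("x", "m2"), ("m3", "x"), ("y", "m1"), ("m1", "m2")]
candidates = expansion_candidates(toy_members, toy_follows, min_links=3)
print(candidates)
```

The choice of threshold would determine how far the list drifts from self-declared practitioners towards accounts that merely follow the field.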
Concerning the debate on the linguistic structure of the network, a limitation obviously comes from the language of the author. As a French speaker, he is better able to explore this part of the network than the others, something that could artificially produce a high clustering of his own linguistic community. This risk is minor here thanks to a special effort made to find as many users as possible representing the linguistic diversity of the field, particularly in German-, Italian- and Spanish-speaking areas.
The fact that the list is public is likely to skew the results of this analysis: aware of having been added, some users could use it to discover and follow new users, which would have the effect of increasing the network's density. Similarly, we cannot exclude that the process has led some to discover the list author's account: they may have found the initiative or the profile interesting and followed it, which could have slightly raised its position in the rankings. In the longer term, a public list is problematic because, as it gradually acquires the status of a "reference", it is likely to encourage compulsive subscription behaviours, such as users hoping to be "followed back" by colleagues. In itself, this behaviour is not a problem; it is a networking strategy that can be justified as a way to socialise in a given community. But using only one list for this is problematic: the more it is used for this purpose, the denser the network becomes and the more the diversity of the field is impoverished. On the other hand, keeping this list public is above all a way of giving the community a chance to discover unknown profiles, and is a contribution to the friendly spirit of this social media platform. It also allows other researchers to use this corpus to conduct other types of studies: content analysis, interactions, biographies, shared links, etc. Also note that the representativeness of Twitter is widely debated (Mislove, Jorgensen, Ahn, Onnela, & Rosenquist, 2011; Sloan et al., 2013), and that it is established that the social network's users are not a representative sample of the population (Duggan, Ellison, Lampe, Lenhart, & Madden, 2015; Miller, Ginnis, Stobart, Krasodomski-Jones, & Clemence, 2015). While this representativeness is crucial for drawing political conclusions (Boyadjian, 2014; Vainikka & Huhtamäki, 2015), academia and the digital humanities are themselves such a small fraction of the population that these considerations are difficult to apply here. Hence the need to combine our analysis with a qualitative survey of these areas to assess this very particular representativeness.

Conclusion
In this paper, we found that defining digital humanities as a "community" avoids endless debates on its disciplinary boundaries but does not allow us to know who is practising them today. In an attempt to identify this field, leaving aside the epistemological discussion, our study shows that it can be analysed through a social media platform widely used by the so-called "digital humanists" (2,500 users). In analysing the "who's following who?" network, we found that a small number of individuals and institutions attract so much attention that the graph appears to be very homogeneous around them. The fact remains that many types of behaviour can be deduced from the graph and that the structural characteristics of the network enable us to highlight some users holding remarkable positions. Specifically, we showed that French-speaking users, and to a lesser extent German-speaking users, stand out: the language factor strongly influences the network structure.
Obviously, any quantification leads to a form of objectification whose limits we need to understand. But we note that the availability of this type of data-set and the opportunities offered by tools and theories such as social network analysis allow us to shed new light on this community.

Figure 1. Digital humanities network on Twitter: 2,500 users following each other.

Figure 3. Highlighting the interface language.

Figure 4. Spatial and statistical distribution of the four metrics.